Hi Prav,

Yes, you are correct that DistributedCache does not load files into memory. Using the job configuration and using DistributedCache are two different approaches; I was going by "Hadoop: The Definitive Guide", Chapter 8, "Side Data Distribution" (pages 288-295).

Since you say the DistributedCache methods have now moved to the Job class, could you please share an article or document on that? It would be a great help to my understanding.
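In the meantime, to check my understanding of the point that the cache only places the file on local disk and the task has to read it itself, here is a minimal sketch in plain Java. It uses no Hadoop classes, so it runs standalone. The class name SideDataLookup and the tab-separated "key<TAB>value" file format are my own assumptions for illustration, not anything from Hadoop or the book.

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

/**
 * Hypothetical helper (not a Hadoop class): parses a small tab-separated
 * side file into an in-memory lookup table.
 */
final class SideDataLookup {
    private final Map<String, String> table = new HashMap<>();

    private SideDataLookup() {}

    /** Reads the local copy of the side file; this is the step the task performs itself. */
    static SideDataLookup load(Path localFile) {
        try {
            return fromLines(Files.readAllLines(localFile, StandardCharsets.UTF_8));
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    /** Builds the table from "key<TAB>value" lines, skipping lines with no tab or an empty key. */
    static SideDataLookup fromLines(List<String> lines) {
        SideDataLookup lookup = new SideDataLookup();
        for (String line : lines) {
            int tab = line.indexOf('\t');
            if (tab > 0) {
                lookup.table.put(line.substring(0, tab), line.substring(tab + 1));
            }
        }
        return lookup;
    }

    /** Returns the value stored for key, or fallback when the key is absent. */
    String get(String key, String fallback) {
        return table.getOrDefault(key, fallback);
    }

    /** Number of entries that were parsed successfully. */
    int size() {
        return table.size();
    }
}
```

As far as I understand the Hadoop 2.x API, the driver would register the file with job.addCacheFile(uri), and the mapper's setup(Context) would call something like SideDataLookup.load(...) once on the local copy, with context.getCacheFiles() listing what was registered. Nothing is held in memory until the task builds the table itself, which is different from the job configuration, where every entry is read each time.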
Thanks
Amit

On Thu, Jan 30, 2014 at 5:35 PM, praveenesh kumar <[email protected]> wrote:

> Hi Amit,
>
> I am not sure how they are linked with DistributedCache. The job
> configuration does not upload any data into memory. As far as I am aware of
> how DistributedCache works, nothing gets loaded into memory. The distributed
> cache just copies the files to the slave nodes so that they are accessible
> to mappers/reducers. Usually the location is
> ${hadoop.tmp.dir}/${mapred.local.dir}/tasktracker/archive (it varies from
> distribution to distribution). You always have to read the files in your
> mapper or reducer whenever you want to use them.
>
> What has happened is that the methods of the DistributedCache class have
> now been added to the Job class, and I am assuming they won't change how
> the distributed cache methods used to work; otherwise there would have
> been some nice articles on that, and I don't see any reason to change it
> either. So everything still works the same way. It's just that you use
> the new Job class to access the distributed cache features.
>
> I am not sure which entries exactly you are pointing to. Am I missing
> anything here?
>
> Regards
> Prav
>
> On Thu, Jan 30, 2014 at 6:12 AM, Amit Mittal <[email protected]> wrote:
>
>> Hi Mike & Prav,
>>
>> Although I am new to Hadoop, I would like to add my 2 cents if that
>> helps.
>> We have 2 ways to distribute shared data: one is the job
>> configuration and the other is DistributedCache.
>> The job configuration is read by the JT, TT and child JVMs, and each time
>> the configuration is read, all of its entries are read into memory, even
>> if they are not used. So using the job configuration is not advised if
>> the data is more than a few kilobytes. It is therefore not an alternative
>> to DistributedCache unless the job configuration is modified to address
>> this limitation.
>> So I am also curious to know the alternative to the DistributedCache class.
>>
>> Thanks
>> Amit
>>
>> On Thu, Jan 30, 2014 at 2:43 AM, Giordano, Michael <
>> [email protected]> wrote:
>>
>>> I noticed that in Hadoop 2.2.0
>>> org.apache.hadoop.mapreduce.filecache.DistributedCache has been deprecated.
>>>
>>> (http://hadoop.apache.org/docs/current/api/deprecated-list.html#class)
>>>
>>> Is there a class that provides equivalent functionality? My application
>>> relies heavily on DistributedCache.
>>>
>>> Thanks,
>>>
>>> Mike G.
