Re: Passing data files via the distributed cache

2011-11-28 Andy Doddington
Thanks for that link, Prashant; very useful. Two brief follow-up questions:
1) Having put data in the cache, I would like to be a good citizen and delete it from the cache once I’ve finished. How do I do that?
2) Would it be simpler to pass the data as a value in the JobConf object?

2011-11-28 Robert Evans
There is currently no way to delete the data from the cache when you are done. It is garbage-collected when the cache starts to fill up (in LRU order if you are on a newer release). DistributedCache.addCacheFile modifies the JobConf behind the scenes for you. If you want to dig into
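
To make the "modifies the JobConf behind the scenes" point concrete, here is a toy model, not Hadoop's code: a plain Map stands in for JobConf, and addCacheFile only appends the file's URI to a comma-separated configuration property; the file itself is copied to each node's local disk later, when tasks run. The property name mapred.cache.files is an assumption based on the 0.20/1.x releases and may differ in other versions. The class name and the HDFS paths are hypothetical. For contrast, it also shows question 2's approach of putting the data itself into the configuration under a hypothetical key.

```java
import java.net.URI;
import java.util.HashMap;
import java.util.Map;

// Toy model only: a HashMap stands in for Hadoop's JobConf. addCacheFile
// mimics (as an assumption) the 0.20/1.x DistributedCache behaviour of
// appending the URI to a comma-separated configuration property.
class CacheConfModel {
    // Assumed property name used by the 0.20/1.x DistributedCache.
    static final String CACHE_FILES = "mapred.cache.files";

    static void addCacheFile(URI uri, Map<String, String> conf) {
        String existing = conf.get(CACHE_FILES);
        conf.put(CACHE_FILES,
                 existing == null ? uri.toString() : existing + "," + uri);
    }

    public static void main(String[] args) {
        Map<String, String> conf = new HashMap<>();

        // Distributed-cache route: only a short URI string lands in the conf.
        addCacheFile(URI.create("hdfs:///data/lookup.txt"), conf);    // hypothetical path
        addCacheFile(URI.create("hdfs:///data/stopwords.txt"), conf); // hypothetical path

        // Question 2's route: the data itself goes into the conf under a
        // hypothetical key. The conf travels with the job to every task,
        // so this only suits small values.
        conf.put("example.lookup.data", "alpha=1,beta=2");

        System.out.println(conf.get(CACHE_FILES));
    }
}
```

In this model the two approaches differ mainly in what the configuration carries: a reference to a file versus the data itself.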

2011-11-25 Prashant Kommireddi
I believe you want to ship data to each node in your cluster before the MapReduce job begins, so the mappers can access files local to their machine. The Hadoop tutorial on YDN has some good info on this:
http://developer.yahoo.com/hadoop/tutorial/module5.html#auxdata

-Prashant Kommireddi

On Fri, Nov 25, 2011 at
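
A minimal sketch of the mapper side, under stated assumptions: once the framework has copied a cached file to the node's local disk, a task can read it like any local file, typically once in its setup method. The class name LocalLookup and the file format (one tab-separated key/value pair per line) are hypothetical. In a real 0.20/1.x job the local path would come from DistributedCache.getLocalCacheFiles(conf) inside Mapper.configure(JobConf); that call is only mentioned in a comment here so the example runs without Hadoop on the classpath.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.HashMap;
import java.util.Map;

// Sketch: load a locally cached lookup file into memory once per task.
// In a real job, the Path would be derived from
// DistributedCache.getLocalCacheFiles(conf); here it is passed in directly.
class LocalLookup {
    // Hypothetical file format: one "key<TAB>value" pair per line.
    static Map<String, String> load(Path localFile) throws IOException {
        Map<String, String> lookup = new HashMap<>();
        try (BufferedReader in = Files.newBufferedReader(localFile, StandardCharsets.UTF_8)) {
            String line;
            while ((line = in.readLine()) != null) {
                int tab = line.indexOf('\t');
                if (tab < 0) {
                    continue; // skip lines without a tab separator
                }
                lookup.put(line.substring(0, tab), line.substring(tab + 1));
            }
        }
        return lookup;
    }

    public static void main(String[] args) throws IOException {
        // Temporary file standing in for the localized cache file.
        Path tmp = Files.createTempFile("lookup", ".txt");
        Files.write(tmp, "alpha\t1\nbeta\t2\n".getBytes(StandardCharsets.UTF_8));
        Map<String, String> lookup = load(tmp);
        System.out.println(lookup.get("beta")); // prints 2
        Files.delete(tmp);
    }
}
```

Loading the file once and keeping it in a map means each call to map() does an in-memory lookup rather than re-reading the file.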