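Hmm, DistributedCache.getLocalCacheArchives(conf) should do it. It returns the
local paths of the unpacked cache archives on the node, i.e. the localized copy
itself rather than the per-attempt symlink in the working folder.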
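
A rough sketch of how the right entry could be picked out of that list, in case
more than one archive is shipped. LocalArchiveLocator and the archive name
"lang.tar.gz" are hypothetical names made up for illustration, not Hadoop API.
It also assumes the localized directory keeps the archive's file name as its
last path component, which is worth verifying on your cluster. The only Hadoop
call involved is DistributedCache.getLocalCacheArchives(conf), shown in the
comment so the snippet compiles without Hadoop on the classpath.

```java
import java.util.Optional;

// Hypothetical helper (not part of Hadoop): picks one localized archive
// out of the paths reported for the current task.
final class LocalArchiveLocator {
    private LocalArchiveLocator() {}

    /**
     * @param localizedPaths string forms of the Path[] obtained from
     *        DistributedCache.getLocalCacheArchives(conf), e.g. in the
     *        mapper's setup/configure step; may be null if nothing was cached
     * @param archiveName file name of the shipped archive
     *        (assumed example: "lang.tar.gz")
     * @return the matching local path without a trailing slash, if any
     */
    static Optional<String> locate(String[] localizedPaths, String archiveName) {
        if (localizedPaths == null) {
            return Optional.empty(); // no archives were localized
        }
        for (String p : localizedPaths) {
            // Normalize a trailing slash so the last component is the name.
            String trimmed = p.endsWith("/") ? p.substring(0, p.length() - 1) : p;
            String last = trimmed.substring(trimmed.lastIndexOf('/') + 1);
            if (last.equals(archiveName)) {
                return Optional.of(trimmed);
            }
        }
        return Optional.empty();
    }
}
```

Since the language engine is created once per child JVM, the lookup would
presumably be done once (for example guarded by a static field) and the result
handed to the engine's initialization.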

On Tue, Feb 12, 2013 at 9:28 PM, Saptarshi Guha <[email protected]> wrote:

> Hello,
>
> I'm a bit fuzzy on the details here, so I appreciate your help.
>
> I am embedding a language into the JVM. My Hadoop job will instantiate the
> child JVM once for all tasks assigned (mapred.job.reuse.jvm.num.tasks =
> -1).
>
> So if a node can run 6 parallel JVMs, it will, and these 6 will churn
> through all the tasks assigned to them.
>
> Now, per JVM, the language engine will be instantiated. For this to work,
> I will ship the language distribution to the nodes (the nodes are really
> bare and installing the language on the node is not an option) using the
> distributed cache (as a tar.gz file).
>
> My understanding is that Hadoop MapReduce will unarchive this tgz file and
> then, for every task attempt, symlink it into the task attempt's working
> folder.
>
> However, for the language engine to be successfully initialized I need to
> know the location of the unarchived file, a location that will stay
> constant across all task attempts for that child JVM.
>
> Q: How can I infer this location?
>
> Cheers
> Saptarshi
