Once you have a Configuration object you are good to go. Try any of the static DistributedCache.get*() methods, such as getCacheFiles() or getLocalCacheFiles().
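A minimal sketch of what that lookup could look like, assuming Hadoop is on the classpath and that `conf` is obtained from the running job (e.g. from the task context in a mapper); the class and method names here are illustrative, not part of any existing API:

```java
// Sketch only: requires the Hadoop jars at compile time and a running
// job for the cache to be populated. "CacheLookup" is a made-up name.
import java.io.IOException;
import java.net.URI;

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.filecache.DistributedCache;
import org.apache.hadoop.fs.Path;

public class CacheLookup {

    public static void listCachedFiles(Configuration conf) throws IOException {
        // The URIs as they were registered with the cache at job setup time.
        URI[] cacheFiles = DistributedCache.getCacheFiles(conf);

        // The corresponding local paths on the task node, once localized.
        Path[] localFiles = DistributedCache.getLocalCacheFiles(conf);

        if (localFiles != null) {
            for (Path p : localFiles) {
                System.out.println("cached locally at: " + p);
            }
        }
    }
}
```

Inside a plugin the tricky part is usually getting hold of `conf` at all; if the plugin is instantiated by the job, passing the Configuration down through the plugin's init hook is the simplest route.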
On Thu, Oct 11, 2012 at 5:11 PM, Bai Shen <[email protected]> wrote:

> Okay, I think I can use the DistributedCache, but I'm not sure on one
> thing. How do I get to the cache from the plugin? I need the hadoop
> context in order to make a call to the cache.
>
> On Thu, Oct 11, 2012 at 7:40 AM, Ferdy Galema <[email protected]> wrote:
>
> > There are some options. The best way imho is to use getResourceAsStream().
> > Put the file in the same package as the calling java code. Then do
> > something like
> > getClass().getClassLoader().getResourceAsStream("myfile");
> >
> > If you really want to directly access the file then you can put it in
> > /classes in the job file. This directory is expanded into the current
> > working directory of a running task.
> >
> > Last but not least you are able to use a shared filesystem, for example
> > HDFS or the mapreduce DistributedCache. This is useful if the files are
> > big, or change a lot.
> >
> > On Thu, Oct 11, 2012 at 1:20 PM, Bai Shen <[email protected]> wrote:
> >
> > > I need to reference a file from my plugin. However, when I try to call
> > > it using File(blah.txt), it looks for the file at the location where I
> > > run nutch from, not in the job file.
> > >
> > > What is the proper way to refer to the files in the job file?
> > >
> > > Thanks.
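For the getResourceAsStream() approach quoted above, a small self-contained sketch (the class name and resource names are illustrative; in a real plugin the resource would be your own file bundled into the job jar):

```java
import java.io.IOException;
import java.io.InputStream;

public class PluginResources {

    /**
     * Look up a resource on the classpath rather than on the local
     * filesystem, so it resolves inside the job jar as well.
     * Returns true if the resource was found.
     */
    public static boolean resourceExists(String name) {
        try (InputStream in =
                PluginResources.class.getClassLoader().getResourceAsStream(name)) {
            return in != null;
        } catch (IOException e) {
            return false;
        }
    }

    public static void main(String[] args) {
        // "java/lang/String.class" is always resolvable, which makes the
        // lookup easy to demonstrate without packaging a jar first.
        System.out.println(resourceExists("java/lang/String.class"));
        System.out.println(resourceExists("no/such/resource.txt"));
    }
}
```

The key difference from `new File("blah.txt")` is that the class loader searches the classpath (including the expanded job jar), not the directory nutch happens to be launched from.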

