When you have a Configuration object you are good to go. Try any of the
static DistributedCache.get*() methods, such as getCacheFiles or
getFileStatus.
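For example, a minimal sketch (the helper class CacheLookup is hypothetical; it assumes the old-style org.apache.hadoop.filecache.DistributedCache API from Hadoop 0.20/1.x, and the Configuration would come from the task, e.g. context.getConfiguration() inside a Mapper):

```java
import java.net.URI;
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.filecache.DistributedCache;
import org.apache.hadoop.fs.FileStatus;
import org.apache.hadoop.fs.Path;

public class CacheLookup {
    // Returns the task-local copies of the cached files (readable with
    // plain java.io), printing the registered URIs along the way.
    // Both getters return null when nothing was registered.
    static Path[] localCacheFiles(Configuration conf) throws Exception {
        URI[] registered = DistributedCache.getCacheFiles(conf);
        if (registered != null) {
            for (URI uri : registered) {
                FileStatus status = DistributedCache.getFileStatus(conf, uri);
                System.out.println(uri + " -> " + status.getLen() + " bytes");
            }
        }
        return DistributedCache.getLocalCacheFiles(conf);
    }
}
```

Note that getCacheFiles() returns the URIs as they were registered at job-setup time, while getLocalCacheFiles() returns the paths of the per-task local copies.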

On Thu, Oct 11, 2012 at 5:11 PM, Bai Shen <[email protected]> wrote:

> Okay, I think I can use the DistributedCache, but I'm not sure on one
> thing.  How do I get to the cache from the plugin?  I need the hadoop
> context in order to make a call to the cache.
>
> On Thu, Oct 11, 2012 at 7:40 AM, Ferdy Galema <[email protected]> wrote:
>
> > There are some options. The best way imho is to use getResourceAsStream().
> > Put the file in the same package as the calling java code. Then do
> > something like
> > getClass().getClassLoader().getResourceAsStream("myfile");
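As a self-contained illustration of that lookup (ResourceLookup is a hypothetical class; "myfile" stands for whatever you bundle into the plugin jar, so this sketch probes a class file that every JVM can see on its classpath):

```java
import java.io.InputStream;

public class ResourceLookup {
    // Open a resource bundled on the classpath, e.g. a file packed into
    // the plugin/job jar next to the calling class. Returns null if the
    // resource is not on the classpath.
    static InputStream open(String name) {
        return ResourceLookup.class.getClassLoader().getResourceAsStream(name);
    }

    public static void main(String[] args) throws Exception {
        // In a real plugin you would pass your bundled file's name here.
        try (InputStream in = open("java/lang/Object.class")) {
            System.out.println(in != null ? "found" : "missing");
        }
    }
}
```

Because the stream comes from the classloader, this works no matter where the jar lives on disk, which is exactly what a task running out of an expanded job file needs.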
> >
> > If you really want to directly access the file then you can put it in
> > /classes in the job file. This directory is expanded into the current
> > working directory of a running task.
> >
> > Last but not least, you can use a shared filesystem, for example HDFS,
> > or the mapreduce DistributedCache. This is useful if the files are big
> > or change a lot.
> >
> > On Thu, Oct 11, 2012 at 1:20 PM, Bai Shen <[email protected]> wrote:
> >
> > > I need to reference a file from my plugin.  However, when I try to open
> > > it with new File("blah.txt"), it looks for the file in the directory
> > > where I run nutch from, not in the job file.
> > >
> > > What is the proper way to refer to the files in the job file?
> > >
> > > Thanks.
> > >
> >
>
