Okay, I think I can use the DistributedCache, but I'm not sure on one
thing.  How do I get to the cache from the plugin?  I need the Hadoop
context (or at least the job Configuration) in order to make a call to the cache.
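A rough sketch of what this could look like, assuming the plugin implements Hadoop's `Configurable` interface so that Nutch hands it the job `Configuration` via `setConf()` (the class name `MyPlugin` is hypothetical, and this uses the Hadoop 1.x `DistributedCache` API):

```java
import java.io.IOException;

import org.apache.hadoop.conf.Configurable;
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.filecache.DistributedCache;
import org.apache.hadoop.fs.Path;

// Hypothetical plugin class; Nutch calls setConf() with the job
// Configuration, which is enough context to reach the cache.
public class MyPlugin implements Configurable {
  private Configuration conf;

  @Override
  public void setConf(Configuration conf) {
    this.conf = conf;
    try {
      // Files registered at job-submission time with
      // DistributedCache.addCacheFile(uri, conf) are localized onto
      // each task node and show up here as local paths.
      Path[] cached = DistributedCache.getLocalCacheFiles(conf);
      if (cached != null) {
        for (Path p : cached) {
          // open p with java.io or a local FileSystem as needed
        }
      }
    } catch (IOException e) {
      throw new RuntimeException("Failed to read DistributedCache files", e);
    }
  }

  @Override
  public Configuration getConf() {
    return conf;
  }
}
```

This only works inside a running task; driver-side you would call `DistributedCache.addCacheFile(...)` on the job's `Configuration` before submission.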

On Thu, Oct 11, 2012 at 7:40 AM, Ferdy Galema <[email protected]> wrote:

> There are some options. The best way, imho, is to use getResourceAsStream().
> Put the file in the same package as the calling Java code, then do
> something like
> getClass().getClassLoader().getResourceAsStream("myfile");
>
> If you really want to directly access the file then you can put it in
> /classes in the job file. This directory is expanded into the current
> working directory of a running task.
>
> Last but not least, you can use a shared filesystem, for example HDFS,
> or the MapReduce DistributedCache. This is useful if the files are
> big or change a lot.
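The first option above, reading a file bundled in the job jar through the classloader, can be sketched as follows (the resource name "myfile" comes from the thread; the helper class and method names are my own for illustration):

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.InputStream;
import java.io.InputStreamReader;
import java.nio.charset.StandardCharsets;

// Illustrative helper: loads a file packaged on the classpath (e.g. in
// the job jar) instead of resolving it against the working directory.
public class ResourceReader {

  /** Returns the contents of a classpath resource, or null if absent. */
  public static String readResource(String name) throws IOException {
    InputStream in =
        ResourceReader.class.getClassLoader().getResourceAsStream(name);
    if (in == null) {
      return null; // resource not packaged in the jar/classpath
    }
    StringBuilder sb = new StringBuilder();
    try (BufferedReader r =
        new BufferedReader(new InputStreamReader(in, StandardCharsets.UTF_8))) {
      String line;
      while ((line = r.readLine()) != null) {
        sb.append(line).append('\n');
      }
    }
    return sb.toString();
  }

  public static void main(String[] args) throws IOException {
    // "myfile" is the example name from the thread; a missing
    // resource simply yields null rather than an exception.
    String contents = readResource("myfile");
    System.out.println(contents == null ? "not found" : "found");
  }
}
```

Note that the lookup is relative to the classpath, not the filesystem, which is exactly why `new File("blah.txt")` resolves against the launch directory while this does not.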
>
> On Thu, Oct 11, 2012 at 1:20 PM, Bai Shen <[email protected]> wrote:
>
> > I need to reference a file from my plugin.  However, when I try to
> > open it with new File("blah.txt"), it looks for the file in the
> > directory where I run Nutch from, not in the job file.
> >
> > What is the proper way to refer to the files in the job file?
> >
> > Thanks.
> >
>
