Maybe you can build these resources into the plugin jar, as the
"language-identifier" plugin does, and load them at run time.
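
A rough sketch of what I mean, in case it helps. It assumes the files sit
at the root of the plugin jar and that the library wants a real file path
rather than a stream, so it copies the resource to a temp file first. The
class name PluginResources and the method extractToTempFile are made up for
illustration; they are not part of Nutch or GATE.

```java
import java.io.IOException;
import java.io.InputStream;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardCopyOption;

// Hypothetical helper, not part of Nutch or GATE.
public class PluginResources {

    /**
     * Copies a classpath resource to a temporary file and returns its path.
     * Pass the class loader that loaded the plugin classes, so the lookup
     * searches the plugin jar.
     */
    public static Path extractToTempFile(ClassLoader loader, String resourceName)
            throws IOException {
        try (InputStream in = loader.getResourceAsStream(resourceName)) {
            if (in == null) {
                throw new IOException("Resource not found on classpath: " + resourceName);
            }
            Path target = Files.createTempFile("plugin-resource-", ".tmp");
            // Ask the JVM to remove the copy on normal exit.
            target.toFile().deleteOnExit();
            // createTempFile already created the file, so overwrite it.
            Files.copy(in, target, StandardCopyOption.REPLACE_EXISTING);
            return target;
        }
    }
}
```

In the indexing filter you would call something like
PluginResources.extractToTempFile(getClass().getClassLoader(), "some-config.xml"),
where "some-config.xml" is a placeholder name. As far as I know, Nutch loads
each plugin through its own class loader, so a lookup through the system
class loader may not see files packed in the plugin jar; that might be why
loading from the job jar classpath did not work for you.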


On Sun, May 4, 2014 at 1:52 PM, chethan <[email protected]> wrote:

> I have set up Nutch to crawl on Amazon EMR and I have a plugin that
> uses GATE <https://gate.ac.uk/> for text processing in the indexing
> filters. GATE requires certain static resources (some XML and text
> files) to be loaded for it to be initialized.
> I tried to bundle these resources in the job jar and load them from the
> classpath but that didn't work. I also tried copying them to HDFS and
> loading them from there but that too failed.
>
> What is the best way to bundle such static resources and reference them in
> the Indexing filters? I am working on copying the file to the distributed
> cache and loading it from there but I wanted to know how others are
> handling this. Thanks.
>
> Regards,
>
> --
> Chethan Prasad
>



-- 
Don't Grow Old, Grow Up... :-)
