Maybe you can build these resources into the plugin jar, as the "language-identifier" plugin does, and load them from the plugin's classpath at run time.
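A rough sketch of what I mean, assuming the files are packed at the root of the plugin jar and that the plugin's class loader can see them. The class name PluginResourceLoader and the file name gate-config.xml are made up for illustration; they are not part of Nutch or GATE.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.InputStream;
import java.io.InputStreamReader;
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper, not part of Nutch or GATE.
public class PluginResourceLoader {

    // Opens a resource through the class loader that loaded `owner`
    // (for a Nutch plugin class, that should be the plugin's own loader).
    // Throws instead of returning null when the resource is missing.
    public static InputStream open(Class<?> owner, String resourceName) throws IOException {
        InputStream in = owner.getClassLoader().getResourceAsStream(resourceName);
        if (in == null) {
            throw new IOException("Resource not found on plugin classpath: " + resourceName);
        }
        return in;
    }

    // Reads a UTF-8 text stream into a list of lines and closes the stream.
    public static List<String> readLines(InputStream in) throws IOException {
        List<String> lines = new ArrayList<>();
        try (BufferedReader reader =
                 new BufferedReader(new InputStreamReader(in, StandardCharsets.UTF_8))) {
            String line;
            while ((line = reader.readLine()) != null) {
                lines.add(line);
            }
        }
        return lines;
    }
}
```

From inside the indexing filter you would then call something like PluginResourceLoader.readLines(PluginResourceLoader.open(MyIndexingFilter.class, "gate-config.xml")), where MyIndexingFilter stands for your own filter class. If GATE insists on real files rather than streams, you may have to copy the stream to a local temp directory first and point GATE at that.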

On Sun, May 4, 2014 at 1:52 PM, chethan <[email protected]> wrote:

> I have setup Nutch to crawl on Amazon EMR and I have a plugin that
> uses GATE<https://gate.ac.uk/> for
> text processing in the Indexing filters. GATE requires certain static
> resources (some xmls and text files) to be loaded for it to be initialized.
> I tried to bundle these resources in the job jar and load them from the
> classpath but that didn't work. I also tried copying them to HDFS and
> loading them from there but that too failed.
>
> What is the best way to bundle such static resources and reference them in
> the Indexing filters? I am working on copying the file to the distributed
> cache and loading it from there but I wanted to know how others are
> handling this. Thanks.
>
> Regards,
>
> --
> Chethan Prasad
>

--
Don't Grow Old, Grow Up... :-)

