That's what's done in Behemoth (https://github.com/DigitalPebble/behemoth), e.g. for sharing GATE or UIMA resources through the distributed cache. Its code can be used as an example of how to do this.
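Hadoop's distributed cache copies files registered by the job driver to each node that runs a task. According to the Hadoop MapReduce tutorial, the URI fragment becomes the name of the symlink in the task's working directory, so `hdfs://namenode:port/lib.so.1#lib.so` shows up as `lib.so` (in Hadoop 1.x only once symlinking is enabled, e.g. with `DistributedCache.createSymlink(conf)`). Below is a minimal, stdlib-only Java sketch of how a task could locate a cached model file under that convention. The class name `CachedModelPath`, the HDFS paths, and the fallback to the plain file name when no fragment is given are assumptions of this sketch, not Behemoth's or Hadoop's code.

```java
import java.net.URI;
import java.nio.file.Path;

/**
 * Hypothetical helper (not part of Hadoop or Behemoth): works out where a file
 * registered in the distributed cache should appear in a task's working directory.
 */
final class CachedModelPath {

    private CachedModelPath() {
        // static helpers only
    }

    /**
     * Name of the cached file in the task's working directory: the URI fragment when
     * present (the documented symlink convention), otherwise the last path segment
     * (a fallback assumed by this sketch).
     */
    static String linkName(URI cacheUri) {
        String fragment = cacheUri.getFragment();
        if (fragment != null && !fragment.isEmpty()) {
            return fragment;
        }
        String path = cacheUri.getPath();
        if (path == null || path.isEmpty() || path.endsWith("/")) {
            throw new IllegalArgumentException("no file name in " + cacheUri);
        }
        return path.substring(path.lastIndexOf('/') + 1);
    }

    /** Resolves the link name against the task's working directory. */
    static Path localPath(URI cacheUri, Path taskWorkingDir) {
        return taskWorkingDir.resolve(linkName(cacheUri));
    }
}
```

In a real job the driver would register the model with `DistributedCache.addCacheFile(uri, conf)` (or `Job.addCacheFile(uri)` in the newer API), and the mapper's `setup()` would open `CachedModelPath.localPath(uri, Paths.get("."))` once and build the model from that stream, so each task reads the model from its local working directory.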
Julien

> I think the distributed cache is a good way to do this. I did some similar work on loading Stanford parser models in Hadoop using the distributed cache.
> I think that will solve the problem. But we should be careful: Hadoop systems are normally data-intensive, and NLP processing there may cause high CPU usage and problems for other jobs.
>
> Sheng
>
> > Date: Thu, 7 Jun 2012 20:17:26 -0400
> > From: [email protected]
> > To: [email protected]
> > Subject: Re: openNLP with Hadoop MapReduce Programming
> >
> > Hadoop seems to be geared toward large-scale processing, so the work would be spread across many servers/clients. MapReduce allows the processing on all those servers to run and then be combined to produce the final results. So each process would have to load its own model. Storing the models and the large data collection in HDFS should allow them to be shared among all the processes.
> >
> > On 6/7/2012 3:45 AM, Jörn Kottmann wrote:
> > > On 06/07/2012 05:39 AM, James Kosin wrote:
> > >> Hmm, good idea. I'll have to try that soon... I do create models for my project and have them included in the JAR... but I haven't gotten around to testing with them embedded in the JAR file. I know there will be issues with this, and it is usually best to keep them in the Windows or Linux file system.
> > >> Jörn has the start of support for the web-server side, but I know it is far from complete... he still has this marked as a TODO for the interface, unless I'm a bit behind now.
> > >
> > > I usually load my models from an HTTP server, because they get updated much more frequently than my JARs, but if you use MapReduce you will need to do the loading yourself (very easy in Java).
> > >
> > > Just including a model in a JAR works great, and many people actually do that.
> > >
> > > If you have many threads and want to share the models between them, I am not sure how this is done in MapReduce.
> > >
> > > Jörn

--
Open Source Solutions for Text Engineering

http://digitalpebble.blogspot.com/
http://www.digitalpebble.com
http://twitter.com/digitalpebble
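The two points raised in the thread (loading a model yourself from a JAR, a local file or an HTTP server, and sharing it between threads) can be combined in a small holder class. The Java sketch below is stdlib-only and hypothetical: `SharedModel`, `StreamSource` and `ModelReader` are names invented here, not OpenNLP or Hadoop APIs. It reads the model at most once per JVM using double-checked locking on a `volatile` field, and gives each thread its own tool instance through a `ThreadLocal`. That split matches OpenNLP, whose model objects can be shared between threads while the `*ME` tool classes are not thread-safe.

```java
import java.io.FileNotFoundException;
import java.io.IOException;
import java.io.InputStream;
import java.io.UncheckedIOException;
import java.net.URL;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.function.Function;

/** Hypothetical holder that loads a model lazily, once, and shares it between threads. */
final class SharedModel<M> {

    /** Opens the raw model bytes; see the factory methods below. */
    interface StreamSource {
        InputStream open() throws IOException;
    }

    /** Turns the bytes into a model object, e.g. an OpenNLP model's InputStream constructor. */
    interface ModelReader<T> {
        T read(InputStream in) throws IOException;
    }

    private final StreamSource source;
    private final ModelReader<M> reader;
    private volatile M model; // volatile gives safe publication for double-checked locking

    SharedModel(StreamSource source, ModelReader<M> reader) {
        this.source = source;
        this.reader = reader;
    }

    /** A model packaged inside the job JAR (absolute classpath resource name). */
    static StreamSource fromClasspath(String resource) {
        return () -> {
            InputStream in = SharedModel.class.getResourceAsStream(resource);
            if (in == null) {
                throw new FileNotFoundException("classpath resource not found: " + resource);
            }
            return in;
        };
    }

    /** A model on the local file system, e.g. a distributed-cache symlink. */
    static StreamSource fromFile(Path path) {
        return () -> Files.newInputStream(path);
    }

    /** A model served over HTTP. */
    static StreamSource fromUrl(URL url) {
        return url::openStream;
    }

    /** Returns the model, reading it from the source on the first call only. */
    M get() {
        M m = model;
        if (m == null) {
            synchronized (this) {
                m = model;
                if (m == null) {
                    try (InputStream in = source.open()) {
                        m = reader.read(in);
                    } catch (IOException e) {
                        throw new UncheckedIOException(e);
                    }
                    model = m;
                }
            }
        }
        return m;
    }

    /** One tool per thread, all built from the single shared model. */
    <T> ThreadLocal<T> perThread(Function<? super M, ? extends T> toolFactory) {
        return ThreadLocal.withInitial(() -> toolFactory.apply(get()));
    }
}
```

With OpenNLP, for example, `SentenceModel::new` could serve as the reader (it has an `InputStream` constructor) and `SentenceDetectorME::new` as the per-thread factory. Kept in a `static` field of a mapper class, such a holder would load the model at most once per task JVM, and a mapper run through `MultithreadedMapper` would get one detector per thread.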
