Re: openNLP with Hadoop MapReduce Programming

Julien Nioche Fri, 08 Jun 2012 06:46:39 -0700

That's what's done in Behemoth (https://github.com/DigitalPebble/behemoth)
e.g. for sharing GATE or UIMA resources. The code can be used as an example
of how to do this.
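
For readers who just want the general shape of the distributed cache
approach discussed in the quoted messages below, here is a minimal sketch
using only the JDK. It is not Behemoth's code. It assumes the model file has
already been copied to the task's local disk (which is what the distributed
cache does) and shows loading it once per JVM and reusing it afterwards. The
class name LocalModelCache is hypothetical, and raw bytes stand in for a
parsed GATE, UIMA or OpenNLP model.

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.ConcurrentMap;

// Hypothetical helper, not part of Hadoop, OpenNLP or Behemoth.
final class LocalModelCache {
    // One entry per model file, shared by every thread in this JVM.
    private static final ConcurrentMap<Path, byte[]> MODELS = new ConcurrentHashMap<>();

    private LocalModelCache() {}

    // Returns the cached bytes, reading the file only on the first request.
    // Callers must treat the returned array as read-only.
    static byte[] get(Path localFile) {
        Path key = localFile.toAbsolutePath().normalize();
        return MODELS.computeIfAbsent(key, p -> {
            try {
                return Files.readAllBytes(p);
            } catch (IOException e) {
                throw new UncheckedIOException(e);
            }
        });
    }
}
```

In a real job the lookup would probably happen once in the mapper's setup
step, with the local path obtained from the distributed cache, and the bytes
would be parsed into the actual model object before being cached.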


Julien


> I think the distributed cache is a good way to do this.
> I did some similar work on loading Stanford parser models in Hadoop using
> the distributed cache.
> I think that will solve the problem. But we should be careful, because
> Hadoop clusters are normally data-intensive, and NLP processing there may
> cause high CPU usage and problems for other jobs.
>
> Sheng
>
> > Date: Thu, 7 Jun 2012 20:17:26 -0400
> > From: [email protected]
> > To: [email protected]
> > Subject: Re: openNLP with Hadoop MapReduce Programming
> >
> > Hadoop seems to be a large-scale project, so the work would be spread
> > across many servers / clients.  MapReduce would let the processes run
> > on many servers and then be synchronized to provide the final results.
> > So, each process would have to load its own model.  HDFS should allow
> > the models and large data collections to be shared between them all.
> >
> > On 6/7/2012 3:45 AM, Jörn Kottmann wrote:
> > > On 06/07/2012 05:39 AM, James Kosin wrote:
> > >> Hmm, good idea.  I'll have to try that soon... I do create models for
> > >> my project and have them included in the JAR... but, haven't gotten
> > >> around to testing with them embedded in the JAR file.  I know there
> > >> will be issues with this and it is usually best to keep them in either
> > >> windows or linux file system.
> > >> Jorn has the start of supporting the web-server side; but, I know it
> > >> is far from complete... he still has this marked as a TODO for the
> > >> interface.  Unless I'm a bit behind now.
> > >
> > > I usually load my models from an http server, because
> > > they are getting updated much more frequently than
> > > my jars, but if you use map reduce you will need to do
> > > the loading yourself (very easy in java).
> > >
> > > Just including a model in a jar works great and many
> > > people actually do that.
> > >
> > > If you have many threads, you will want to share the models
> > > between them; I am not sure how this is done in map reduce.
> > >
> > > Jörn
> >
> >
>
>
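
The two loading options Jörn mentions in the quoted text above (a model
bundled in a jar, or a model fetched from an HTTP server) can be sketched
with the JDK alone. This is only an illustration under stated assumptions:
the class name ModelSource and the "classpath:" prefix are made up for this
example, and the returned stream would still have to be handed to whatever
model class the NLP library provides.

```java
import java.io.FileNotFoundException;
import java.io.IOException;
import java.io.InputStream;
import java.net.URI;

// Hypothetical helper; the "classpath:" prefix is a convention invented here.
final class ModelSource {
    private static final String CLASSPATH_PREFIX = "classpath:";

    private ModelSource() {}

    // Opens a model either from the classpath (e.g. inside the job jar)
    // or from any absolute URL such as http://, https:// or file://.
    // The caller is responsible for closing the returned stream.
    static InputStream open(String location) throws IOException {
        if (location.startsWith(CLASSPATH_PREFIX)) {
            String name = location.substring(CLASSPATH_PREFIX.length());
            InputStream in = ModelSource.class.getClassLoader().getResourceAsStream(name);
            if (in == null) {
                throw new FileNotFoundException("not on classpath: " + name);
            }
            return in;
        }
        return URI.create(location).toURL().openStream();
    }
}
```

Keeping the location as a plain string means the same job can switch between
a bundled model and a frequently updated remote one through configuration,
without a code change.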



-- 
Open Source Solutions for Text Engineering

http://digitalpebble.blogspot.com/
http://www.digitalpebble.com
http://twitter.com/digitalpebble
