I've gotten some use out of putting models in JARs so I could use Maven to deploy out to the cluster.
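Loading a model that is packaged in the JAR is just a classpath read instead of a file open; roughly like this (a minimal sketch — the resource path /models/en-token.bin is only an example, use whatever path the model was packaged under):

    import java.io.IOException;
    import java.io.InputStream;
    import opennlp.tools.tokenize.TokenizerModel;

    public class ClasspathModelLoader {
        public static TokenizerModel load() throws IOException {
            // getResourceAsStream resolves inside the JAR, so this works
            // wherever Maven deploys the artifact.
            InputStream in = ClasspathModelLoader.class
                    .getResourceAsStream("/models/en-token.bin");
            if (in == null) {
                throw new IOException("model not found on classpath");
            }
            try {
                return new TokenizerModel(in);
            } finally {
                in.close();
            }
        }
    }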
In either case, JAR files or HDFS, if the code is written to open a java.io.File, some modification will be necessary.

-Chris

________________________________________
From: James Kosin [[email protected]]
Sent: Thursday, June 07, 2012 6:17 PM
To: [email protected]
Subject: Re: openNLP with Hadoop MapReduce Programming

Hadoop seems to be a large-scale project, so the work would be spread across many servers/clients. MapReduce allows the processing to run across many servers and then be synchronized to produce the final results, so each process would have to load its own model. The HDFS file system should allow sharing of the models and the large data collection between them all.

On 6/7/2012 3:45 AM, Jörn Kottmann wrote:
> On 06/07/2012 05:39 AM, James Kosin wrote:
>> Hmm, good idea. I'll have to try that soon... I do create models for my
>> project and have them included in the JAR, but I haven't gotten around
>> to testing with them embedded in the JAR file. I know there will be
>> issues with this, and it is usually best to keep them in either the
>> Windows or Linux file system.
>> Jörn has the start of supporting the web-server side, but I know it is
>> far from complete... he still has this marked as a TODO for the
>> interface. Unless I'm a bit behind now.
>
> I usually load my models from an HTTP server, because
> they are updated much more frequently than
> my jars, but if you use MapReduce you will need to do
> the loading yourself (very easy in Java).
>
> Just including a model in a jar works great, and many
> people actually do that.
>
> If you have many threads and want to share the models
> between them, I am not sure how this is done in MapReduce.
>
> Jörn
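Loading from HDFS inside a map task follows the same pattern: the FileSystem API hands back a plain InputStream, which the model constructors accept directly. A minimal sketch against the org.apache.hadoop.mapreduce API, assuming a tokenizer model has been copied to /models/en-token.bin on HDFS (both the path and the job shape are illustrative):

    import java.io.IOException;
    import java.io.InputStream;
    import org.apache.hadoop.fs.FileSystem;
    import org.apache.hadoop.fs.Path;
    import org.apache.hadoop.io.LongWritable;
    import org.apache.hadoop.io.Text;
    import org.apache.hadoop.mapreduce.Mapper;
    import opennlp.tools.tokenize.TokenizerME;
    import opennlp.tools.tokenize.TokenizerModel;

    public class TokenizingMapper
            extends Mapper<LongWritable, Text, Text, LongWritable> {

        private TokenizerME tokenizer;

        @Override
        protected void setup(Context context) throws IOException {
            // Open the model from HDFS once per task, not once per record.
            FileSystem fs = FileSystem.get(context.getConfiguration());
            InputStream in = fs.open(new Path("/models/en-token.bin"));
            try {
                tokenizer = new TokenizerME(new TokenizerModel(in));
            } finally {
                in.close();
            }
        }

        @Override
        protected void map(LongWritable key, Text value, Context context)
                throws IOException, InterruptedException {
            // Emit one count per token, word-count style.
            for (String token : tokenizer.tokenize(value.toString())) {
                context.write(new Text(token), new LongWritable(1));
            }
        }
    }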
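Jörn's HTTP approach is the same idea with a different stream source; java.net.URL already provides one. A sketch, with the URL being a placeholder:

    import java.io.IOException;
    import java.io.InputStream;
    import java.net.URL;
    import opennlp.tools.tokenize.TokenizerModel;

    public class HttpModelLoader {
        public static TokenizerModel load(String url) throws IOException {
            // URL.openStream() returns a plain InputStream, so the model
            // constructor works unchanged.
            InputStream in = new URL(url).openStream();
            try {
                return new TokenizerModel(in);
            } finally {
                in.close();
            }
        }
    }

Usage would be along the lines of HttpModelLoader.load("http://example.com/models/en-token.bin").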
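As for sharing models between threads: the loaded model objects are immutable and safe to share, but the tools built on top of them (TokenizerME and friends) carry state and are not thread-safe. One workable pattern is a single shared model with one tool instance per thread, for example:

    import opennlp.tools.tokenize.TokenizerME;
    import opennlp.tools.tokenize.TokenizerModel;

    public class SharedModelTokenizer {
        // One model instance shared by every thread in the JVM.
        private final TokenizerModel model;

        // TokenizerME is stateful, so each thread lazily builds its own.
        private final ThreadLocal<TokenizerME> tokenizer =
                new ThreadLocal<TokenizerME>() {
                    @Override
                    protected TokenizerME initialValue() {
                        return new TokenizerME(model);
                    }
                };

        public SharedModelTokenizer(TokenizerModel model) {
            this.model = model;
        }

        public String[] tokenize(String text) {
            return tokenizer.get().tokenize(text);
        }
    }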
