I've gotten some use out of putting models in JARs so I could use Maven to deploy out to the cluster.
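Loading a model that is packaged in the JAR is just a classpath read instead of a file open; roughly like this (a minimal sketch — the resource path /models/en-token.bin is only an example, use whatever path the model was packaged under):

    import java.io.IOException;
    import java.io.InputStream;
    import opennlp.tools.tokenize.TokenizerModel;

    public class ClasspathModelLoader {
        public static TokenizerModel load() throws IOException {
            // getResourceAsStream resolves inside the JAR, so this works
            // wherever Maven deploys the artifact.
            InputStream in = ClasspathModelLoader.class
                    .getResourceAsStream("/models/en-token.bin");
            if (in == null) {
                throw new IOException("model not found on classpath");
            }
            try {
                return new TokenizerModel(in);
            } finally {
                in.close();
            }
        }
    }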
In either case, JAR files or HDFS, if the code is written to open a java.io.File, some modification will be necessary.

-Chris

________________________________________
From: James Kosin [[email protected]]
Sent: Thursday, June 07, 2012 6:17 PM
To: [email protected]
Subject: Re: openNLP with Hadoop MapReduce Programming

Hadoop seems to be a large-scale project, so the work would be spread across many servers/clients. MapReduce allows the processing to run across many servers and then be synchronized to produce the final results, so each process would have to load its own model. The HDFS file system should allow sharing of the models and the large data collection between them all.

On 6/7/2012 3:45 AM, Jörn Kottmann wrote:
> On 06/07/2012 05:39 AM, James Kosin wrote:
>> Hmm, good idea. I'll have to try that soon... I do create models for my
>> project and have them included in the JAR, but I haven't gotten around
>> to testing with them embedded in the JAR file. I know there will be
>> issues with this, and it is usually best to keep them in either the
>> Windows or Linux file system.
>> Jörn has the start of supporting the web-server side, but I know it is
>> far from complete... he still has this marked as a TODO for the
>> interface. Unless I'm a bit behind now.
>
> I usually load my models from an HTTP server, because
> they are updated much more frequently than
> my jars, but if you use MapReduce you will need to do
> the loading yourself (very easy in Java).
>
> Just including a model in a jar works great, and many
> people actually do that.
>
> If you have many threads and want to share the models
> between them, I am not sure how this is done in MapReduce.
>
> Jörn
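Loading from HDFS inside a map task follows the same pattern: the FileSystem API hands back a plain InputStream, which the model constructors accept directly. A minimal sketch against the org.apache.hadoop.mapreduce API, assuming a tokenizer model has been copied to /models/en-token.bin on HDFS (both the path and the job shape are illustrative):

    import java.io.IOException;
    import java.io.InputStream;
    import org.apache.hadoop.fs.FileSystem;
    import org.apache.hadoop.fs.Path;
    import org.apache.hadoop.io.LongWritable;
    import org.apache.hadoop.io.Text;
    import org.apache.hadoop.mapreduce.Mapper;
    import opennlp.tools.tokenize.TokenizerME;
    import opennlp.tools.tokenize.TokenizerModel;

    public class TokenizingMapper
            extends Mapper<LongWritable, Text, Text, LongWritable> {

        private TokenizerME tokenizer;

        @Override
        protected void setup(Context context) throws IOException {
            // Open the model from HDFS once per task, not once per record.
            FileSystem fs = FileSystem.get(context.getConfiguration());
            InputStream in = fs.open(new Path("/models/en-token.bin"));
            try {
                tokenizer = new TokenizerME(new TokenizerModel(in));
            } finally {
                in.close();
            }
        }

        @Override
        protected void map(LongWritable key, Text value, Context context)
                throws IOException, InterruptedException {
            // Emit one count per token, word-count style.
            for (String token : tokenizer.tokenize(value.toString())) {
                context.write(new Text(token), new LongWritable(1));
            }
        }
    }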
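Jörn's HTTP approach is the same idea with a different stream source; java.net.URL already provides one. A sketch, with the URL being a placeholder:

    import java.io.IOException;
    import java.io.InputStream;
    import java.net.URL;
    import opennlp.tools.tokenize.TokenizerModel;

    public class HttpModelLoader {
        public static TokenizerModel load(String url) throws IOException {
            // URL.openStream() returns a plain InputStream, so the model
            // constructor works unchanged.
            InputStream in = new URL(url).openStream();
            try {
                return new TokenizerModel(in);
            } finally {
                in.close();
            }
        }
    }

Usage would be along the lines of HttpModelLoader.load("http://example.com/models/en-token.bin").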
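As for sharing models between threads: the loaded model objects are immutable and safe to share, but the tools built on top of them (TokenizerME and friends) carry state and are not thread-safe. One workable pattern is a single shared model with one tool instance per thread, for example:

    import opennlp.tools.tokenize.TokenizerME;
    import opennlp.tools.tokenize.TokenizerModel;

    public class SharedModelTokenizer {
        // One model instance shared by every thread in the JVM.
        private final TokenizerModel model;

        // TokenizerME is stateful, so each thread lazily builds its own.
        private final ThreadLocal<TokenizerME> tokenizer =
                new ThreadLocal<TokenizerME>() {
                    @Override
                    protected TokenizerME initialValue() {
                        return new TokenizerME(model);
                    }
                };

        public SharedModelTokenizer(TokenizerModel model) {
            this.model = model;
        }

        public String[] tokenize(String text) {
            return tokenizer.get().tokenize(text);
        }
    }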
