Re: 1.6.0 maven repo
+1 to start making a release. I would like to be involved too.

R

On 19 Nov 2014 23:40, "Joern Kottmann" wrote:
> Hello,
>
> yes, that should be the current state.
>
> Can you please elaborate on the issue you have?
> Do you get an old version?
>
> We should try to make a release of 1.6.0. I think most issues
> are already solved, and we will uncover the remaining bugs during the manual
> testing phase.
>
> Jörn
Re: 1.6.0 maven repo
You probably need to include the Apache snapshot repository in your pom to make that work:
https://repository.apache.org/content/repositories/snapshots/

Maybe we should mention that on our site, so people know how to use the latest snapshot version.

Jörn

On 11/20/2014 07:52 AM, Rodrigo Agerri wrote:
> Hi,
>
> Sorry, I was on my mobile. The issue is that when I add version 1.6.0 as a dependency,
>
>     <dependency>
>       <groupId>org.apache.opennlp</groupId>
>       <artifactId>opennlp-tools</artifactId>
>       <version>1.6.0-SNAPSHOT</version>
>       <scope>compile</scope>
>     </dependency>
>
> Maven does not find it. So either the dependency should be specified differently,
> or I need to add a repository (presumably an Apache one), which I could not find
> in the documentation (I seem to remember I could do this for 1.5.3-SNAPSHOT...).
>
> Thanks,
>
> Rodrigo
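Joern's suggestion works because Maven resolves each dependency by mapping its coordinates onto a path inside every repository it knows about (declared in the pom or in settings.xml); the Apache snapshot repository has to be listed under <repositories>, with <snapshots><enabled>true</enabled></snapshots>, before Maven will look there. The minimal Java sketch below, assuming the standard Maven 2 repository layout, prints where the opennlp-tools 1.6.0-SNAPSHOT metadata would be expected in that repository. The class and method names are hypothetical; a 404 at the printed URL would most likely mean no snapshot has been deployed there yet.

```java
// Sketch: where Maven's default (Maven 2) repository layout would look for the
// metadata of a SNAPSHOT dependency in a given remote repository.
final class SnapshotLocator {

    /** Builds {repo}/{groupId as path}/{artifactId}/{version}/maven-metadata.xml. */
    static String metadataUrl(String repoBase, String groupId,
                              String artifactId, String version) {
        // Normalise the base URL so exactly one slash separates it from the path.
        String base = repoBase.endsWith("/") ? repoBase : repoBase + "/";
        // Dots in the groupId become directory separators.
        String groupPath = groupId.replace('.', '/');
        return base + groupPath + "/" + artifactId + "/" + version + "/maven-metadata.xml";
    }

    public static void main(String[] args) {
        String url = metadataUrl(
                "https://repository.apache.org/content/repositories/snapshots/",
                "org.apache.opennlp", "opennlp-tools", "1.6.0-SNAPSHOT");
        // Open this URL in a browser to check whether a snapshot has been published.
        System.out.println(url);
    }
}
```

For the coordinates in Rodrigo's pom, this resolves to .../snapshots/org/apache/opennlp/opennlp-tools/1.6.0-SNAPSHOT/maven-metadata.xml.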
Re: 1.6.0 maven repo
Hi,

Sorry, I was on my mobile. The issue is that when I add version 1.6.0 as a dependency,

    <dependency>
      <groupId>org.apache.opennlp</groupId>
      <artifactId>opennlp-tools</artifactId>
      <version>1.6.0-SNAPSHOT</version>
      <scope>compile</scope>
    </dependency>

Maven does not find it. So either the dependency should be specified differently, or I need to add a repository (presumably an Apache one), which I could not find in the documentation (I seem to remember I could do this for 1.5.3-SNAPSHOT...).

Thanks,

Rodrigo

On Wed, Nov 19, 2014 at 10:38 PM, Joern Kottmann wrote:
> Hello,
>
> yes, that should be the current state.
>
> Can you please elaborate on the issue you have?
> Do you get an old version?
>
> We should try to make a release of 1.6.0. I think most issues
> are already solved, and we will uncover the remaining bugs during the manual
> testing phase.
>
> Jörn
Re: Need to speed up the model creation process of OpenNLP
Training time improves almost in proportion to the number of CPU cores you have. If you have a 4-core CPU, you might come down from 3 hours to 1 hour.

To enable multithreaded training, you need to train with the -params argument and provide a config file for the learner. There are samples shipped with OpenNLP.

Jörn

On Wed, 2014-11-19 at 20:19 +, nikhil jain wrote:
> Hi Rodrigo,
> No, I am not using multithreading; it's a simple Java program, written with help
> from the OpenNLP documentation. It is worth mentioning, though, that because
> the corpus contains 4 million records, my Java program running in Eclipse
> frequently failed with a Java heap space error (out of memory). I investigated
> a bit and found that the process needed around 10 GB of memory to build the
> model, so I increased the heap to 10 GB using the -Xmx parameter. After that it
> worked properly, but it took 3 hours.
> Thanks, Nikhil
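As a hedged sketch of what Joern describes, the Java program below writes a properties-style parameters file for the -params argument, setting one thread per available core. The keys Algorithm, Iterations, Cutoff and Threads are assumptions based on the sample parameter files shipped with OpenNLP; compare them with the samples in your release before relying on them. Iterations and cutoff mirror the defaults mentioned in this thread (100 and 5), and the class name is hypothetical.

```java
import java.io.IOException;
import java.io.Writer;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.util.Properties;

// Sketch: generate a trainer parameters file for the OpenNLP -params argument.
// Key names are assumptions; check them against the samples in your release.
final class TrainerParamsFile {

    static Properties buildParams(int iterations, int cutoff, int threads) {
        Properties p = new Properties();
        p.setProperty("Algorithm", "MAXENT");                  // maxent learner
        p.setProperty("Iterations", Integer.toString(iterations));
        p.setProperty("Cutoff", Integer.toString(cutoff));
        p.setProperty("Threads", Integer.toString(threads));   // train on several cores
        return p;
    }

    public static void main(String[] args) throws IOException {
        // One training thread per core reported by the JVM.
        int cores = Runtime.getRuntime().availableProcessors();
        Properties params = buildParams(100, 5, cores);
        Path out = Paths.get(args.length > 0 ? args[0] : "params.txt");
        try (Writer w = Files.newBufferedWriter(out, StandardCharsets.UTF_8)) {
            params.store(w, "OpenNLP trainer parameters (sketch)");
        }
        System.out.println("Wrote " + out.toAbsolutePath());
    }
}
```

The file could then be passed on the command line, for example opennlp TokenNameFinderTrainer -model custom-ner.bin -lang en -data train.txt -encoding UTF-8 -params params.txt (file names are hypothetical). From the API, the same file should be loadable into a TrainingParameters object and handed to the NameFinderME.train overload that accepts one; the exact signature differs between releases.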
Re: 1.6.0 maven repo
Hello,

yes, that should be the current state.

Can you please elaborate on the issue you have? Do you get an old version?

We should try to make a release of 1.6.0. I think most issues are already solved, and we will uncover the remaining bugs during the manual testing phase.

Jörn

On Wed, 2014-11-19 at 21:20 +0100, Rodrigo Agerri wrote:
> Hi
>
> Any chance to release snapshot repos to maven central? Or to an apache
> snapshots repo?
>
> It would make the use of current trunk via API much easier.
>
> Cheers
>
> Rodrigo
1.6.0 maven repo
Hi,

Any chance of publishing snapshot builds to Maven Central, or to an Apache snapshots repository?

It would make using the current trunk via the API much easier.

Cheers,

Rodrigo
Re: Need to speed up the model creation process of OpenNLP
Hi Rodrigo,

No, I am not using multithreading; it's a simple Java program, written with help from the OpenNLP documentation. It is worth mentioning, though, that because the corpus contains 4 million records, my Java program running in Eclipse frequently failed with a Java heap space error (out of memory). I investigated a bit and found that the process needed around 10 GB of memory to build the model, so I increased the heap to 10 GB using the -Xmx parameter. After that it worked properly, but it took 3 hours.

Thanks,
Nikhil

From: Rodrigo Agerri
To: "dev@opennlp.apache.org"; nikhil jain
Cc: "us...@opennlp.apache.org"
Sent: Wednesday, November 19, 2014 2:17 AM
Subject: Re: Need to speed up the model creation process of OpenNLP

Hi,

Are you using multithreading (lots of threads)? How much RAM?

R

On Tue, Nov 18, 2014 at 5:46 PM, nikhil jain wrote:
> Hi,
> I asked the question below yesterday; did anyone get a chance to look at it?
> I am new to OpenNLP and really need some help. Please provide some clue,
> link, or example.
> Thanks, Nikhil
> From: nikhil jain
> To: "us...@opennlp.apache.org"; Dev at Opennlp Apache
> Sent: Tuesday, November 18, 2014 12:02 AM
> Subject: Need to speed up the model creation process of OpenNLP
>
> Hi,
> I am using the OpenNLP Token Name Finder for parsing unstructured data. I
> have created a corpus of about 4 million records. When I create a model
> from the training set using the OpenNLP API in Eclipse with the default
> settings (cutoff 5 and 100 iterations), the process takes a long time,
> around 2-3 hours.
> Can someone suggest how I can reduce the time? I want to experiment with
> different numbers of iterations, but because model creation takes so much
> time, I am not able to. This is a really time-consuming process.
> Please provide some feedback.
> Thanks in advance, Nikhil Jain
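When training from Eclipse, the -Xmx value has to reach the JVM that runs the training program (the run configuration's VM arguments), not only the IDE itself via eclipse.ini. A minimal Java sketch, with a hypothetical class name, that reports the heap limit the running JVM actually received; the 9 GiB threshold is an assumption that leaves some slack, because Runtime.maxMemory() often reports slightly less than the -Xmx value.

```java
// Sketch: confirm that the -Xmx setting actually reached the JVM doing the training.
final class HeapCheck {

    /** Maximum heap the JVM will attempt to use, in whole GiB (rounded down). */
    static long maxHeapGiB() {
        return Runtime.getRuntime().maxMemory() / (1024L * 1024L * 1024L);
    }

    public static void main(String[] args) {
        long gib = maxHeapGiB();
        System.out.println("Max heap: about " + gib + " GiB");
        // maxMemory() can be a little below -Xmx, so allow some slack
        // relative to the roughly 10 GB reported in this thread.
        if (gib < 9) {
            System.out.println("Heap looks too small; try adding -Xmx10g to the VM arguments.");
        }
    }
}
```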
Re: Need to speed up the model creation process of OpenNLP
Hi Samik,

Thank you so much for the quick feedback.

1. "You can possibly have smaller training sets and see if the models deteriorate substantially."
Yes, I have 4 training sets, each containing 1 million records, but I don't understand how that would help. When I create one model from these 4 training sets, I have to pass all the records at once, so it would still take time, right?

2. "Another strategy is to incrementally introduce training sets containing specific classes of Token Names; that would provide a quicker turnaround."
Right, I am doing what you describe: I have 4 different classes, and each class contains 1 million records. Initially I created a model on 1 million records; it took less time and worked properly. Then I added another set, so the corpus grew to 2 million records, and I created a model from those 2 million records, and so on. The problem is that as I add more records to the corpus, model creation takes longer.

Is it possible to reuse a model with a new training set? For example, if I have a model based on 2 million records, can I reuse the old model and just adjust it based on the new records? If that is possible, then small training sets would be useful, right?

As I mentioned, I am new to OpenNLP and machine learning, so please explain with an example if I am missing something.

Thanks,
Nikhil

From: Samik Raychaudhuri
To: dev@opennlp.apache.org
Sent: Wednesday, November 19, 2014 6:00 AM
Subject: Re: Need to speed up the model creation process of OpenNLP

Hi,

This is essentially a machine learning problem, nothing to do with OpenNLP. If you have such a large corpus, it will take a substantial amount of time to train models. You can possibly have smaller training sets and see if the models deteriorate substantially. Another strategy is to incrementally introduce training sets containing specific classes of Token Names; that would provide a quicker turnaround.

Hope this helps.

Best,
-Samik
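Samik's first suggestion amounts to a learning-curve experiment: train on increasingly large random subsets and watch whether the score still improves. If it levels off well below 4 million records, experiments such as trying different iteration counts could run on that smaller subset. Below is a minimal Java sketch assuming one training sentence per line; trainAndScore is a hypothetical stand-in for training a name finder and measuring its F-measure on a fixed held-out set, and the class name is also hypothetical.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.Random;
import java.util.function.ToDoubleFunction;

// Sketch of a learning-curve experiment: score models trained on growing subsets.
final class LearningCurve {

    static Map<Integer, Double> run(List<String> sentences, double[] fractions,
                                    long seed, ToDoubleFunction<List<String>> trainAndScore) {
        List<String> shuffled = new ArrayList<>(sentences);
        Collections.shuffle(shuffled, new Random(seed));   // fixed seed: repeatable subsets
        Map<Integer, Double> scores = new LinkedHashMap<>();
        for (double f : fractions) {
            int n = (int) Math.round(f * shuffled.size());
            // Prefixes of one shuffle are nested, so each subset contains the smaller ones.
            List<String> subset = shuffled.subList(0, n);
            scores.put(n, trainAndScore.applyAsDouble(subset));
        }
        return scores;
    }

    public static void main(String[] args) {
        List<String> data = new ArrayList<>();
        for (int i = 0; i < 1000; i++) {
            data.add("sentence " + i);
        }
        // Toy scorer standing in for real training plus evaluation.
        ToDoubleFunction<List<String>> toy = s -> Math.log1p(s.size());
        run(data, new double[] {0.1, 0.25, 0.5, 1.0}, 42L, toy)
            .forEach((n, score) -> System.out.printf("%d sentences -> %.3f%n", n, score));
    }
}
```

The toy scorer only lets the sketch run without OpenNLP on the classpath. If the training file separates documents with blank lines, shuffling whole documents rather than individual lines would keep those boundaries intact.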