I am training the model in this way:

opennlp POSTaggerTrainer -type maxent -model /home/damiano/it-pos-maxent-new.bin -lang it -data /home/damiano/postagger.train -encoding UTF-8
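As an aside, the trainer settings can also be kept in a training parameters file passed with `-params` (a sketch, assuming an OpenNLP 1.7.x CLI; the file name and the values shown are hypothetical, and `Threads` only affects the maxent trainer, not the perceptron):

```
# params.txt -- hypothetical example values
Algorithm=MAXENT
Iterations=100
Cutoff=5
Threads=4
```

which would be used as:

opennlp POSTaggerTrainer -params /home/damiano/params.txt -model /home/damiano/it-pos-maxent-new.bin -lang it -data /home/damiano/postagger.train -encoding UTF-8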
2017-01-03 21:01 GMT+01:00 Damiano Porta <damianopo...@gmail.com>:

> I am using the default postagger tool.
>
> I have many sentences like:
>
> word1_pos word2_pos ...
>
> I did not add anything. I know it is a Java problem, but how is it
> possible that 7 GB cannot handle a training corpus of 1 GB?
>
> On 03/Jan/2017 20:48, "William Colen" <william.co...@gmail.com> wrote:
>
>> Review your context generator. Maybe it is getting too many features.
>> Try to keep the strings small in the context generator.
>>
>> 2017-01-03 17:02 GMT-02:00 Damiano Porta <damianopo...@gmail.com>:
>>
>>> I always get Exception in thread "main" java.lang.OutOfMemoryError:
>>> GC overhead limit exceeded.
>>> I am using 5 GB on Xmx for 1 GB of training data... I will try 7 GB
>>> for training.
>>>
>>> Could the number of threads help?
>>>
>>> 2017-01-03 19:57 GMT+01:00 Damiano Porta <damianopo...@gmail.com>:
>>>
>>>> Ok, I think the best value is matching the number of CPU cores, right?
>>>>
>>>> 2017-01-03 19:47 GMT+01:00 Russ, Daniel (NIH/CIT) [E] <dr...@mail.nih.gov>:
>>>>
>>>>> I do not believe the perceptron trainer is multithreaded. But it
>>>>> should be fast.
>>>>>
>>>>> On 1/3/17, 1:44 PM, "Damiano Porta" <damianopo...@gmail.com> wrote:
>>>>>
>>>>> Hi William, thank you!
>>>>> Is there a similar thing for perceptron (perceptron sequence) too?
>>>>>
>>>>> 2017-01-03 19:41 GMT+01:00 William Colen <co...@apache.org>:
>>>>>
>>>>>> Damiano,
>>>>>>
>>>>>> If you are using Maxent, try TrainingParameters.THREADS_PARAM
>>>>>>
>>>>>> https://opennlp.apache.org/documentation/1.7.0/apidocs/opennlp-tools/opennlp/tools/util/TrainingParameters.html#THREADS_PARAM
>>>>>>
>>>>>> William
>>>>>>
>>>>>> 2017-01-03 16:27 GMT-02:00 Damiano Porta <damianopo...@gmail.com>:
>>>>>>
>>>>>>> I am training a new postagger and lemmatizer.
>>>>>>>
>>>>>>> 2017-01-03 19:24 GMT+01:00 Russ, Daniel (NIH/CIT) [E] <dr...@mail.nih.gov>:
>>>>>>>
>>>>>>>> Can you be a little more specific? What trainer are you using?
>>>>>>>> Thanks
>>>>>>>> Daniel
>>>>>>>>
>>>>>>>> On 1/3/17, 1:22 PM, "Damiano Porta" <damianopo...@gmail.com> wrote:
>>>>>>>>
>>>>>>>> Hello,
>>>>>>>> I have a very, very big training set. Is there a way to speed up
>>>>>>>> the training process? I only have changed the Xmx option inside
>>>>>>>> bin/opennlp.
>>>>>>>>
>>>>>>>> Thanks
>>>>>>>> Damiano
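The TrainingParameters.THREADS_PARAM suggestion from the thread above can be sketched via the API as follows (assuming opennlp-tools 1.7.x on the classpath; per the thread, this only helps the maxent trainer, as the perceptron trainer is not multithreaded):

```java
// Sketch: enabling multithreaded maxent training via the OpenNLP API.
import opennlp.tools.util.TrainingParameters;

public class ThreadsParamExample {
    public static void main(String[] args) {
        // Start from the default maxent training parameters.
        TrainingParameters params = TrainingParameters.defaultParams();
        // "Threads" controls how many worker threads the maxent (GIS)
        // trainer uses; matching the number of CPU cores is a common choice.
        params.put(TrainingParameters.THREADS_PARAM, "4");
        System.out.println(params.getSettings().get(TrainingParameters.THREADS_PARAM));
    }
}
```

The resulting `params` object would then be passed to the trainer (e.g. `POSTaggerME.train(...)`) in place of the defaults.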