I am using the default postagger tool. I have many sentences like:
word1_pos word2_pos ...

I did not add anything. I know it is a Java problem, but how is it possible that 7GB cannot handle a training corpus of 1GB?

On 03 Jan 2017 20:48, "William Colen" <william.co...@gmail.com> wrote:

> Review your context generator. Maybe it is getting too many features. Try
> to keep the strings small in the context generator.
>
> 2017-01-03 17:02 GMT-02:00 Damiano Porta <damianopo...@gmail.com>:
>
> > I always get Exception in thread "main" java.lang.OutOfMemoryError: GC
> > overhead limit exceeded
> > I am using 5GB of Xmx for 1GB of training data... I will try 7GB for
> > training.
> >
> > Could the number of threads help?
> >
> > 2017-01-03 19:57 GMT+01:00 Damiano Porta <damianopo...@gmail.com>:
> >
> > > Ok, I think the best value is matching the number of CPU cores, right?
> > >
> > > 2017-01-03 19:47 GMT+01:00 Russ, Daniel (NIH/CIT) [E] <dr...@mail.nih.gov>:
> > >
> > > > I do not believe the perceptron trainer is multithreaded. But it
> > > > should be fast.
> > > >
> > > > On 1/3/17, 1:44 PM, "Damiano Porta" <damianopo...@gmail.com> wrote:
> > > >
> > > > > Hi William, thank you!
> > > > > Is there a similar thing for perceptron (perceptron sequence) too?
> > > > >
> > > > > 2017-01-03 19:41 GMT+01:00 William Colen <co...@apache.org>:
> > > > >
> > > > > > Damiano,
> > > > > >
> > > > > > If you are using Maxent, try TrainingParameters.THREADS_PARAM
> > > > > >
> > > > > > https://opennlp.apache.org/documentation/1.7.0/apidocs/opennlp-tools/opennlp/tools/util/TrainingParameters.html#THREADS_PARAM
> > > > > >
> > > > > > William
> > > > > >
> > > > > > 2017-01-03 16:27 GMT-02:00 Damiano Porta <damianopo...@gmail.com>:
> > > > > >
> > > > > > > I am training a new postagger and lemmatizer.
> > > > > > >
> > > > > > > 2017-01-03 19:24 GMT+01:00 Russ, Daniel (NIH/CIT) [E] <dr...@mail.nih.gov>:
> > > > > > >
> > > > > > > > Can you be a little more specific? What trainer are you using?
> > > > > > > > Thanks
> > > > > > > > Daniel
> > > > > > > >
> > > > > > > > On 1/3/17, 1:22 PM, "Damiano Porta" <damianopo...@gmail.com> wrote:
> > > > > > > >
> > > > > > > > > Hello,
> > > > > > > > > I have a very big training set, is there a way to speed up the
> > > > > > > > > training process? I have only changed the Xmx option inside
> > > > > > > > > bin/opennlp.
> > > > > > > > >
> > > > > > > > > Thanks
> > > > > > > > > Damiano
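
A minimal sketch of William's THREADS_PARAM suggestion applied to POS tagger training. The file name train.txt, the language code "en", and the thread count are placeholders; the calls follow the opennlp-tools 1.7 API, so verify them against the version in use:

    import java.io.File;
    import java.io.IOException;
    import java.nio.charset.StandardCharsets;

    import opennlp.tools.postag.POSModel;
    import opennlp.tools.postag.POSSample;
    import opennlp.tools.postag.POSTaggerFactory;
    import opennlp.tools.postag.POSTaggerME;
    import opennlp.tools.postag.WordTagSampleStream;
    import opennlp.tools.util.MarkableFileInputStreamFactory;
    import opennlp.tools.util.ObjectStream;
    import opennlp.tools.util.PlainTextByLineStream;
    import opennlp.tools.util.TrainingParameters;

    public class MultiThreadedPosTraining {

        public static void main(String[] args) throws IOException {
            // Defaults: maxent algorithm, 100 iterations, cutoff 5.
            TrainingParameters params = TrainingParameters.defaultParams();
            // THREADS_PARAM is only honored by the maxent trainer;
            // the perceptron trainer runs single-threaded.
            params.put(TrainingParameters.THREADS_PARAM, "4"); // e.g. number of CPU cores

            // Training data in word1_pos word2_pos ... format, one sentence per line.
            ObjectStream<String> lines = new PlainTextByLineStream(
                    new MarkableFileInputStreamFactory(new File("train.txt")),
                    StandardCharsets.UTF_8);
            ObjectStream<POSSample> samples = new WordTagSampleStream(lines);

            POSModel model = POSTaggerME.train("en", samples, params, new POSTaggerFactory());
            System.out.println("Trained model for language: " + model.getLanguage());
        }
    }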
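
As a rough illustration of "keep the strings small in the context generator": every training event stores the feature strings the context generator returns, so fewer and shorter strings mean a smaller event set on the heap. The class below is only a sketch of that idea; the getContext signature is the POSContextGenerator one from opennlp-tools 1.7, and wiring it in would go through a custom POSTaggerFactory, which is not shown here.

    import opennlp.tools.postag.POSContextGenerator;

    // Sketch of a deliberately small feature set: a handful of short strings
    // per token instead of many long, concatenated ones.
    public class SmallPosContextGenerator implements POSContextGenerator {

        @Override
        public String[] getContext(int pos, String[] tokens, String[] prevTags, Object[] ac) {
            String prev = pos > 0 ? tokens[pos - 1] : "BOS";
            String next = pos < tokens.length - 1 ? tokens[pos + 1] : "EOS";
            String prevTag = (pos > 0 && prevTags[pos - 1] != null) ? prevTags[pos - 1] : "BOS";
            return new String[] {
                    "w=" + tokens[pos],   // current word
                    "pw=" + prev,         // previous word
                    "nw=" + next,         // next word
                    "pt=" + prevTag       // previous tag
            };
        }
    }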