The error message indicates that there is not enough memory available to the JVM, or perhaps that the JVM is not managing the allocated memory effectively.
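As a quick sanity check, the snippet below (a generic JVM check, not part of OpenNLP; the class name is made up) prints the heap ceiling of the JVM it runs in. Running it with the same JVM options used for training, or adding the same two calls to your own training code, shows whether an -Xmx setting passed via bin/opennlp is actually taking effect:

public class HeapCheck {
    public static void main(String[] args) {
        Runtime rt = Runtime.getRuntime();
        // maxMemory() is the ceiling the JVM will try to use (the effective -Xmx).
        long maxMb = rt.maxMemory() / (1024 * 1024);
        // totalMemory() is the heap currently allocated from the OS.
        long allocatedMb = rt.totalMemory() / (1024 * 1024);
        System.out.println("Max heap (-Xmx): " + maxMb + " MB");
        System.out.println("Heap currently allocated: " + allocatedMb + " MB");
    }
}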
From my experience, increasing the number of threads will speed up the training of a model; however, increasing this parameter can cause this error message to appear when the training corpus reaches a certain critical size. Decreasing the number of threads at that point has enabled me to work around the issue (see the sketch after the quoted thread below for one way to set this parameter programmatically). It appears that there is a trade-off at some point between faster training and effective memory management for the training process on a given machine. Isolating the precise causes of this behaviour and determining how best to control it would be useful research. For a discussion of Java memory management, see this article: http://stackoverflow.com/questions/25033458/memory-consumed-by-a-thread

D.S.

On Tue, Jan 3, 2017 at 2:02 PM, Damiano Porta <damianopo...@gmail.com> wrote:

> I always get Exception in thread "main" java.lang.OutOfMemoryError: GC
> overhead limit exceeded
> I am using 5GB on Xmx for 1GB of training data... I will try 7GB for
> training.
>
> Could the number of threads help?
>
> 2017-01-03 19:57 GMT+01:00 Damiano Porta <damianopo...@gmail.com>:
>
> > Ok, I think the best value is matching the number of CPU cores, right?
> >
> > 2017-01-03 19:47 GMT+01:00 Russ, Daniel (NIH/CIT) [E] <dr...@mail.nih.gov>:
> >
> > > I do not believe the perceptron trainer is multithreaded. But it should
> > > be fast.
> > >
> > > On 1/3/17, 1:44 PM, "Damiano Porta" <damianopo...@gmail.com> wrote:
> > >
> > > Hi William, thank you!
> > > Is there a similar thing for perceptron (perceptron sequence) too?
> > >
> > > 2017-01-03 19:41 GMT+01:00 William Colen <co...@apache.org>:
> > >
> > > > Damiano,
> > > >
> > > > If you are using Maxent, try TrainingParameters.THREADS_PARAM
> > > >
> > > > https://opennlp.apache.org/documentation/1.7.0/apidocs/opennlp-tools/opennlp/tools/util/TrainingParameters.html#THREADS_PARAM
> > > >
> > > > William
> > > >
> > > > 2017-01-03 16:27 GMT-02:00 Damiano Porta <damianopo...@gmail.com>:
> > > >
> > > > > I am training a new postagger and lemmatizer.
> > > > >
> > > > > 2017-01-03 19:24 GMT+01:00 Russ, Daniel (NIH/CIT) [E] <dr...@mail.nih.gov>:
> > > > >
> > > > > > Can you be a little more specific? What trainer are you using?
> > > > > > Thanks
> > > > > > Daniel
> > > > > >
> > > > > > On 1/3/17, 1:22 PM, "Damiano Porta" <damianopo...@gmail.com> wrote:
> > > > > >
> > > > > > Hello,
> > > > > > I have a very very big training set, is there a way to speed up the
> > > > > > training process? I only have changed the Xmx option inside bin/opennlp
> > > > > >
> > > > > > Thanks
> > > > > > Damiano

--
David Sanderson
Natural Language Processing Developer
CrowdCare Corporation
wysdom.com
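A minimal sketch of the TrainingParameters.THREADS_PARAM approach mentioned above, applied to MaxEnt training of a POS tagger (assuming the OpenNLP 1.7 API; the corpus path, language code, thread count, and class name are illustrative, not a verified recipe):

import java.io.BufferedOutputStream;
import java.io.File;
import java.io.FileOutputStream;
import java.io.OutputStream;
import java.nio.charset.StandardCharsets;

import opennlp.tools.postag.POSModel;
import opennlp.tools.postag.POSSample;
import opennlp.tools.postag.POSTaggerFactory;
import opennlp.tools.postag.POSTaggerME;
import opennlp.tools.postag.WordTagSampleStream;
import opennlp.tools.util.MarkableFileInputStreamFactory;
import opennlp.tools.util.ObjectStream;
import opennlp.tools.util.PlainTextByLineStream;
import opennlp.tools.util.TrainingParameters;

public class ThreadedMaxentTraining {

    public static void main(String[] args) throws Exception {
        // Illustrative paths: a word_TAG training file in, a model file out.
        File corpus = new File("en-pos.train");
        File modelFile = new File("en-pos-maxent.bin");

        TrainingParameters params = TrainingParameters.defaultParams();
        // THREADS_PARAM is only honoured by the multithreaded MAXENT trainer;
        // the perceptron trainer ignores it. More threads can train faster but
        // also use more memory, so lower this value (or raise -Xmx) if
        // "GC overhead limit exceeded" appears.
        params.put(TrainingParameters.ALGORITHM_PARAM, "MAXENT");
        params.put(TrainingParameters.THREADS_PARAM, "4");

        ObjectStream<POSSample> samples = new WordTagSampleStream(
                new PlainTextByLineStream(new MarkableFileInputStreamFactory(corpus),
                        StandardCharsets.UTF_8));
        POSModel model;
        try {
            model = POSTaggerME.train("en", samples, params, new POSTaggerFactory());
        } finally {
            samples.close();
        }

        try (OutputStream out = new BufferedOutputStream(new FileOutputStream(modelFile))) {
            model.serialize(out);
        }
    }
}

As noted in the thread, only the MAXENT trainer is multithreaded, so THREADS_PARAM has no effect on perceptron training; and since more threads mean more memory pressure, lowering the thread count or increasing -Xmx is the usual workaround when the GC overhead error occurs.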