The error message indicates that the JVM does not have enough heap for the
job: with "GC overhead limit exceeded" the garbage collector is spending
nearly all of its time collecting while reclaiming very little memory, so the
allocated heap is effectively too small for the training run.
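
One thing worth ruling out first is that the -Xmx setting is not actually
reaching the JVM that runs the trainer. A trivial check (plain Java, nothing
OpenNLP-specific; the class name is just for illustration):

    public class HeapCheck {
        public static void main(String[] args) {
            // Maximum heap the running JVM will attempt to use, in megabytes.
            long maxMb = Runtime.getRuntime().maxMemory() / (1024 * 1024);
            System.out.println("Max heap: " + maxMb + " MB");
        }
    }

If this does not report roughly the value you set, the trainer is running
with the default heap and the error is unsurprising.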

From my experience, increasing the number of threads speeds up model
training; however, raising this parameter can cause this error to appear once
the training corpus reaches a certain critical size. Decreasing the number of
threads at that point has let me work around the issue.

It appears that, beyond a certain corpus size, there is a trade-off between
faster training and effective memory management for the training process on a
given machine.
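
As a rough sketch of that workaround in code: when training with maxent you
can cap the trainer's thread count through TrainingParameters. The corpus
file name, language code, and the value of 2 threads below are only
placeholders, not recommendations:

    import java.io.File;
    import java.nio.charset.StandardCharsets;

    import opennlp.tools.postag.POSModel;
    import opennlp.tools.postag.POSSample;
    import opennlp.tools.postag.POSTaggerFactory;
    import opennlp.tools.postag.POSTaggerME;
    import opennlp.tools.postag.WordTagSampleStream;
    import opennlp.tools.util.MarkableFileInputStreamFactory;
    import opennlp.tools.util.ObjectStream;
    import opennlp.tools.util.PlainTextByLineStream;
    import opennlp.tools.util.TrainingParameters;
    import opennlp.tools.util.model.ModelUtil;

    public class TrainPosWithFewerThreads {
        public static void main(String[] args) throws Exception {
            // One "word_tag" sentence per line, e.g. "The_DT dog_NN barks_VBZ"
            ObjectStream<POSSample> samples = new WordTagSampleStream(
                new PlainTextByLineStream(
                    new MarkableFileInputStreamFactory(new File("corpus.train")),
                    StandardCharsets.UTF_8));

            // Defaults: maxent, 100 iterations, cutoff 5
            TrainingParameters params = ModelUtil.createDefaultTrainingParameters();

            // Fewer trainer threads means fewer per-thread data structures
            // competing for the same heap.
            params.put(TrainingParameters.THREADS_PARAM, "2");

            POSModel model = POSTaggerME.train("en", samples, params,
                new POSTaggerFactory());
            // ... serialize and evaluate the model as usual
        }
    }

Dropping the thread count trades wall-clock speed for a smaller working set,
which is exactly the trade-off described above; the same Threads parameter
can also be set in a -params file for the command-line trainers.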

Isolating the precise causes of this behaviour and determining how to best
control it would be useful research.

For a discussion of Java memory management see this article:
http://stackoverflow.com/questions/25033458/memory-consumed-by-a-thread


D.S.


On Tue, Jan 3, 2017 at 2:02 PM, Damiano Porta <damianopo...@gmail.com>
wrote:

> I always get Exception in thread "main" java.lang.OutOfMemoryError: GC
> overhead limit exceeded
> I am using 5GB on Xmx for a 1GB training data...i will try adding 7GB for
> training.
>
> Could the number of threads helps?
>
> 2017-01-03 19:57 GMT+01:00 Damiano Porta <damianopo...@gmail.com>:
>
> > Ok, i think the best value is matching the number of CPU cores, right?
> >
> > 2017-01-03 19:47 GMT+01:00 Russ, Daniel (NIH/CIT) [E] <dr...@mail.nih.gov>:
> >
> >> I do not believe the perceptron trainer is multithreaded.  But it should
> >> be fast.
> >>
> >> On 1/3/17, 1:44 PM, "Damiano Porta" <damianopo...@gmail.com> wrote:
> >>
> >>     Hi WIlliam, thank you!
> >>     Is there a similar thing for perceptron (perceptron sequence) too?
> >>
> >>     2017-01-03 19:41 GMT+01:00 William Colen <co...@apache.org>:
> >>
> >>     > Damiano,
> >>     >
> >>     > If you are using Maxent, try TrainingParameters.THREADS_PARAM
> >>     >
> >>     > https://opennlp.apache.org/documentation/1.7.0/apidocs/opennlp-tools/opennlp/tools/util/TrainingParameters.html#THREADS_PARAM
> >>     >
> >>     > William
> >>     >
> >>     > 2017-01-03 16:27 GMT-02:00 Damiano Porta <damianopo...@gmail.com>:
> >>     >
> >>     > > I am training a new postagger and lemmatizer.
> >>     > >
> >>     > > 2017-01-03 19:24 GMT+01:00 Russ, Daniel (NIH/CIT) [E] <dr...@mail.nih.gov>:
> >>     > >
> >>     > > > Can you be a little more specific?  What trainer are you using?
> >>     > > > Thanks
> >>     > > > Daniel
> >>     > > >
> >>     > > >     On 1/3/17, 1:22 PM, "Damiano Porta" <damianopo...@gmail.com> wrote:
> >>     > > >
> >>     > > >     Hello,
> >>     > > >     I have a very very big training set, is there a way to
> >>     > > >     speed up the training process? I only have changed the
> >>     > > >     Xmx option inside bin/opennlp
> >>     > > >
> >>     > > >     Thanks
> >>     > > >     Damiano
> >>     > > >
> >>     > > >
> >>     > > >
> >>     > >
> >>     >
> >>
> >>
> >>
> >
>



-- 
David Sanderson
Natural Language Processing Developer
CrowdCare Corporation
wysdom.com
