I am using the default POS tagger tool.

I have many sentences like:

word1_pos word2_pos ...

I did not add anything custom. I know it is a Java problem, but how is it
possible that 7 GB of heap cannot handle a 1 GB training corpus?
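For context, "GC overhead limit exceeded" usually means the store of generated training events (feature strings), not the raw corpus, is filling the heap; in memory these can easily be several times the corpus size. Besides raising -Xmx, a higher feature cutoff discards rare features and shrinks the indexed model, though peak memory during event collection can still be high. A minimal sketch, assuming the opennlp-tools TrainingParameters API (the cutoff value 10 is an arbitrary example):

```java
import opennlp.tools.util.TrainingParameters;

// Sketch only: trades rare features for memory.
// A feature must occur at least "Cutoff" times to be kept
// when the events are indexed (the default cutoff is 5).
public class MemoryFriendlyParams {
    public static TrainingParameters params() {
        TrainingParameters p = TrainingParameters.defaultParams();
        p.put(TrainingParameters.CUTOFF_PARAM, "10");      // arbitrary example
        p.put(TrainingParameters.ITERATIONS_PARAM, "100"); // the default
        return p;
    }
}
```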



On 3 Jan 2017 at 20:48, "William Colen" <william.co...@gmail.com> wrote:

> Review your context generator. Maybe it is generating too many features. Try
> to keep the strings small in the context generator.
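To illustrate the point about small strings: every feature the context generator returns is held as a string in the event store during training, so short feature prefixes and bounded suffix lengths directly reduce heap use. A hedged sketch of compact feature extraction (a hypothetical helper, not the stock OpenNLP DefaultPOSContextGenerator; a real version would sit behind POSContextGenerator.getContext):

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical compact feature extractor. Short prefixes
// ("w=", "p=", "s3=") keep each stored event string small,
// which adds up over millions of training events.
public class CompactContext {
    public static String[] context(int i, String[] toks, String[] tags) {
        List<String> feats = new ArrayList<>();
        feats.add("w=" + toks[i]);                       // current word
        feats.add(i > 0 ? "p=" + tags[i - 1] : "p=BOS"); // previous tag
        int n = toks[i].length();
        feats.add("s3=" + toks[i].substring(Math.max(0, n - 3))); // suffix
        return feats.toArray(new String[0]);
    }
}
```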
>
>
> 2017-01-03 17:02 GMT-02:00 Damiano Porta <damianopo...@gmail.com>:
>
> > I always get Exception in thread "main" java.lang.OutOfMemoryError: GC
> > overhead limit exceeded
> > I am using an Xmx of 5 GB for 1 GB of training data... I will try 7 GB
> > for training.
> >
> > Could increasing the number of threads help?
> >
> > 2017-01-03 19:57 GMT+01:00 Damiano Porta <damianopo...@gmail.com>:
> >
> > > Ok, I think the best value is matching the number of CPU cores, right?
> > >
> > > 2017-01-03 19:47 GMT+01:00 Russ, Daniel (NIH/CIT) [E] <
> > dr...@mail.nih.gov>
> > > :
> > >
> > >> I do not believe the perceptron trainer is multithreaded.  But it
> should
> > >> be fast.
> > >>
> > >> On 1/3/17, 1:44 PM, "Damiano Porta" <damianopo...@gmail.com> wrote:
> > >>
> > >>     Hi William, thank you!
> > >>     Is there a similar thing for perceptron (perceptron sequence) too?
> > >>
> > >>     2017-01-03 19:41 GMT+01:00 William Colen <co...@apache.org>:
> > >>
> > >>     > Damiano,
> > >>     >
> > >>     > If you are using Maxent, try TrainingParameters.THREADS_PARAM
> > >>     >
> > >>     > https://opennlp.apache.org/documentation/1.7.0/apidocs/
> > >>     > opennlp-tools/opennlp/tools/util/TrainingParameters.html#THR
> > >> EADS_PARAM
> > >>     >
> > >>     > William
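For the archive, a sketch of what this suggestion looks like in code, assuming the opennlp-tools 1.7 TrainingParameters API (the thread count 4 is a placeholder; matching physical cores is a reasonable starting point):

```java
import opennlp.tools.util.TrainingParameters;

// Sketch: multi-threaded maxent training parameters.
// THREADS_PARAM is honored by the maxent (GIS) trainer;
// the perceptron trainer runs single-threaded, as Daniel
// notes elsewhere in this thread.
public class MaxentParams {
    public static TrainingParameters params() {
        TrainingParameters p = TrainingParameters.defaultParams();
        p.put(TrainingParameters.ALGORITHM_PARAM, "MAXENT");
        p.put(TrainingParameters.THREADS_PARAM, "4"); // e.g. CPU core count
        return p;
    }
}
```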
> > >>     >
> > >>     > 2017-01-03 16:27 GMT-02:00 Damiano Porta <
> damianopo...@gmail.com
> > >:
> > >>     >
> > >>     > > I am training a new postagger and lemmatizer.
> > >>     > >
> > >>     > > 2017-01-03 19:24 GMT+01:00 Russ, Daniel (NIH/CIT) [E] <
> > >>     > dr...@mail.nih.gov
> > >>     > > >:
> > >>     > >
> > >>     > > > Can you be a little more specific?  What trainer are you
> > using?
> > >>     > > > Thanks
> > >>     > > > Daniel
> > >>     > > >
> > >>     > > > On 1/3/17, 1:22 PM, "Damiano Porta" <damianopo...@gmail.com
> >
> > >> wrote:
> > >>     > > >
> > >>     > > >     Hello,
> > >>     > > >     I have a very very big training set, is there a way to
> > >> speed up the
> > >>     > > >     training process? I only have changed the Xmx option
> > inside
> > >>     > > bin/opennlp
> > >>     > > >
> > >>     > > >     Thanks
> > >>     > > >     Damiano
> > >>     > > >
> > >>     > > >
> > >>     > > >
> > >>     > >
> > >>     >
> > >>
> > >>
> > >>
> > >
> >
>
