I am training the model in this way:

opennlp POSTaggerTrainer -type maxent -model /home/damiano/it-pos-maxent-new.bin \
    -lang it -data /home/damiano/postagger.train -encoding UTF-8
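For reference, the CLI trainers can also read their settings from a
training-parameters file (plain Java-properties syntax) passed via the
-params flag. A minimal sketch that enables multithreaded maxent
training; the parameter names follow the 1.7-era TrainingParameters
constants and are worth double-checking against your version:

    # params.txt
    Algorithm=MAXENT
    Iterations=100
    Cutoff=5
    Threads=4

    opennlp POSTaggerTrainer -params params.txt \
        -model /home/damiano/it-pos-maxent-new.bin \
        -lang it -data /home/damiano/postagger.train -encoding UTF-8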

2017-01-03 21:01 GMT+01:00 Damiano Porta <damianopo...@gmail.com>:

> I am using the default postagger tool.
>
> I have many sentences like:
>
> word1_pos word2_pos ...
>
> I did not add anything. I know it is a Java problem, but how is it
> possible that 7 GB of heap cannot handle a 1 GB training corpus?
>
>
>
> On 03/Jan/2017 20:48, "William Colen" <william.co...@gmail.com> wrote:
>
>> Review your context generator. Maybe it is generating too many features.
>> Try to keep the feature strings short in the context generator.
>>
>>
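To make that concrete, here is a sketch of what a lean context generator
could look like. The class name is hypothetical, and it assumes the
opennlp.tools.postag.POSContextGenerator interface of the 1.7-era API
(the exact method signature may differ in other versions). Every string
returned by getContext becomes a feature the trainer has to hold in
memory, so fewer and shorter strings mean a smaller training footprint:

    import opennlp.tools.postag.POSContextGenerator;

    public class SmallContextGenerator implements POSContextGenerator {

        // Emit a handful of short feature strings per token instead of
        // many long ones.
        @Override
        public String[] getContext(int i, String[] toks, String[] tags,
                                   Object[] additionalContext) {
            String prev = i > 0 ? toks[i - 1] : "BOS";
            String next = i < toks.length - 1 ? toks[i + 1] : "EOS";
            String prevTag = i > 0 ? tags[i - 1] : "BOS";
            return new String[] {
                "w=" + toks[i],   // current word
                "p=" + prev,      // previous word
                "n=" + next,      // next word
                "t-1=" + prevTag  // previous tag
            };
        }
    }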
>> 2017-01-03 17:02 GMT-02:00 Damiano Porta <damianopo...@gmail.com>:
>>
>> > I always get: Exception in thread "main" java.lang.OutOfMemoryError:
>> > GC overhead limit exceeded.
>> > I am using 5 GB on Xmx for 1 GB of training data... I will try adding
>> > 7 GB for training.
>> >
>> > Could the number of threads help?
>> >
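If raising Xmx through bin/opennlp gets tedious, the trainer can also be
launched through the java command directly; a sketch, assuming
opennlp-tools is on the classpath (the jar name here is illustrative)
and using its opennlp.tools.cmdline.CLI entry point:

    java -Xmx7g -cp opennlp-tools-1.7.0.jar opennlp.tools.cmdline.CLI \
        POSTaggerTrainer -type maxent \
        -model /home/damiano/it-pos-maxent-new.bin \
        -lang it -data /home/damiano/postagger.train -encoding UTF-8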
>> > 2017-01-03 19:57 GMT+01:00 Damiano Porta <damianopo...@gmail.com>:
>> >
>> > > OK, I think the best value is to match the number of CPU cores, right?
>> > >
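For what it's worth, that number can be looked up at runtime with plain
Java (nothing OpenNLP-specific):

    // Logical cores visible to the JVM
    int cores = Runtime.getRuntime().availableProcessors();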
>> > > 2017-01-03 19:47 GMT+01:00 Russ, Daniel (NIH/CIT) [E] <dr...@mail.nih.gov>:
>> > >
>> > >> I do not believe the perceptron trainer is multithreaded. But it
>> > >> should be fast.
>> > >>
>> > >> On 1/3/17, 1:44 PM, "Damiano Porta" <damianopo...@gmail.com> wrote:
>> > >>
>> > >>     Hi William, thank you!
>> > >>     Is there a similar thing for perceptron (perceptron sequence) too?
>> > >>
>> > >>     2017-01-03 19:41 GMT+01:00 William Colen <co...@apache.org>:
>> > >>
>> > >>     > Damiano,
>> > >>     >
>> > >>     > If you are using Maxent, try TrainingParameters.THREADS_PARAM
>> > >>     >
>> > >>     > https://opennlp.apache.org/documentation/1.7.0/apidocs/opennlp-tools/opennlp/tools/util/TrainingParameters.html#THREADS_PARAM
>> > >>     >
>> > >>     > William
>> > >>     >
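Spelled out in code, setting that parameter could look like the sketch
below. It assumes the 1.7.0 API (TrainingParameters.defaultParams(),
the word_tag sample stream the CLI trainer also uses, and the
four-argument POSTaggerME.train overload); check the names against your
version:

    import java.io.File;
    import opennlp.tools.postag.POSModel;
    import opennlp.tools.postag.POSSample;
    import opennlp.tools.postag.POSTaggerFactory;
    import opennlp.tools.postag.POSTaggerME;
    import opennlp.tools.postag.WordTagSampleStream;
    import opennlp.tools.util.MarkableFileInputStreamFactory;
    import opennlp.tools.util.ObjectStream;
    import opennlp.tools.util.PlainTextByLineStream;
    import opennlp.tools.util.TrainingParameters;

    TrainingParameters params = TrainingParameters.defaultParams();
    // Multithreading applies to the maxent trainer; the perceptron
    // trainer does not use this parameter.
    params.put(TrainingParameters.THREADS_PARAM, "4");

    // Read word_tag lines from the training file.
    ObjectStream<String> lines = new PlainTextByLineStream(
            new MarkableFileInputStreamFactory(
                    new File("/home/damiano/postagger.train")), "UTF-8");
    ObjectStream<POSSample> samples = new WordTagSampleStream(lines);

    POSModel model = POSTaggerME.train("it", samples, params,
            new POSTaggerFactory());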
>> > >>     > 2017-01-03 16:27 GMT-02:00 Damiano Porta <damianopo...@gmail.com>:
>> > >>     >
>> > >>     > > I am training a new postagger and lemmatizer.
>> > >>     > >
>> > >>     > > 2017-01-03 19:24 GMT+01:00 Russ, Daniel (NIH/CIT) [E] <dr...@mail.nih.gov>:
>> > >>     > >
>> > >>     > > > Can you be a little more specific? What trainer are you using?
>> > >>     > > > Thanks
>> > >>     > > > Daniel
>> > >>     > > >
>> > >>     > > > On 1/3/17, 1:22 PM, "Damiano Porta" <damianopo...@gmail.com> wrote:
>> > >>     > > >
>> > >>     > > >     Hello,
>> > >>     > > >     I have a very, very big training set. Is there a way to
>> > >>     > > >     speed up the training process? I have only changed the Xmx
>> > >>     > > >     option inside bin/opennlp.
>> > >>     > > >
>> > >>     > > >     Thanks
>> > >>     > > >     Damiano
>> > >>     > > >
>> > >>     > > >
>> > >>     > > >
>> > >>     > >
>> > >>     >
>> > >>
>> > >>
>> > >>
>> > >
>> >
>>
>
