Thanks. I used MapReduce to build the training input, but I didn't realize that training itself can also be performed on Hadoop. Can I simply combine the generated models when the job completes?
On Mon, Oct 7, 2013 at 8:00 AM, Miljana Mladenovic <[email protected]> wrote:

> Learn about the map-reduce strategy for big data. For example:
> http://wiki.apache.org/hadoop/MapReduce
>
> Regards, Mixie
>
> On Mon, 07 Oct 2013 13:53:33 +0200, Jeffrey Zemerick <[email protected]> wrote:
>
>> Hi,
>>
>> I'm new to OpenNLP (and NLP in general) and I'm trying to train the
>> NameFinder on a large corpus (nearly 1 GB). After a few hours it fails
>> with a GC overhead limit exception. Do you have any suggestions on how
>> I might accomplish this? Is it possible to train the model on parts of
>> the input at a time? I tried increasing the available memory, but that
>> seemed only to delay the exception.
>>
>> Thanks for any help.
>>
>> Jeff
>
> --
> Using Opera's revolutionary e-mail client: http://www.opera.com/mail/
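[Editor's note: the heap-size workaround Jeff mentions can be sketched as a CLI invocation. This is only an illustration: the jar version, heap size, and file names (`train.txt`, `en-ner-custom.bin`) are hypothetical and must be adapted to your setup.]

```shell
# Invoke the OpenNLP command-line trainer directly via java so the JVM
# heap can be raised with -Xmx, which delays or avoids the
# "GC overhead limit exceeded" error on large corpora.
# Jar version and file names below are placeholders.
java -Xmx8g -cp opennlp-tools-1.5.3.jar \
  opennlp.tools.cmdline.CLI TokenNameFinderTrainer \
  -lang en -encoding UTF-8 \
  -data train.txt \
  -model en-ner-custom.bin
```

Note that raising the heap only postpones the problem if the trainer must hold the whole corpus in memory; splitting the input, as Jeff asks about, is a separate question from merging the resulting models.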
