Hello,

You can train a model only once. After it has been trained, it is not
possible to continue the training by adding more samples to it.

You need to create a stream of DocumentSample objects. A straightforward
way to do that is to collect them all in a collection and then create a
stream from that collection.
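
Since a trained model cannot be extended, one possible way to handle
batches is to keep the raw labelled text from every batch and retrain on
the full set each time. Below is a minimal sketch of that bookkeeping using
only the JDK. SampleStore, Sample, append and loadAll are hypothetical names
made up for illustration, not OpenNLP classes, and the sketch assumes the
category names contain no tab characters.

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardOpenOption;
import java.util.ArrayList;
import java.util.List;

/** Hypothetical helper: keeps every labelled text on disk so a model can be rebuilt from all batches. */
final class SampleStore {

    /** One labelled document: the category plus its whitespace-separated tokens. */
    static final class Sample {
        final String category;
        final String[] tokens;

        Sample(String category, String[] tokens) {
            this.category = category;
            this.tokens = tokens;
        }
    }

    /** Appends one sample as a "category<TAB>text" line; whitespace runs in the text are collapsed. */
    static void append(Path store, String category, String text) {
        String line = category + "\t" + text.trim().replaceAll("\\s+", " ") + System.lineSeparator();
        try {
            Files.write(store, line.getBytes(StandardCharsets.UTF_8),
                    StandardOpenOption.CREATE, StandardOpenOption.APPEND);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    /** Reads back every sample written so far, across all batches. */
    static List<Sample> loadAll(Path store) {
        List<Sample> samples = new ArrayList<>();
        try {
            for (String line : Files.readAllLines(store, StandardCharsets.UTF_8)) {
                int tab = line.indexOf('\t');
                if (tab <= 0) {
                    continue; // skip blank or malformed lines
                }
                samples.add(new Sample(line.substring(0, tab), line.substring(tab + 1).split(" ")));
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return samples;
    }
}
```

With OpenNLP on the classpath, each Sample would then become a
DocumentSample (category plus token array), the list would be wrapped in a
CollectionObjectStream, and that stream passed to
DocumentCategorizerME.train. The exact train overload differs between
OpenNLP releases, so check the Javadoc of the version you use.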

HTH,
Jörn

On Wed, Sep 9, 2015 at 9:51 PM, AJ Weber <awe...@comcast.net> wrote:

> So I'm just getting started with openNLP and trying to spin-up the DocCat.
>
> I would like to process a series of files in batches to train the document
> categorizer.
>
> I assume it is possible to loop through documents:
>
> 1) extract the text (will probably try Tika for this), and then
> 2) send the DocumentSample to the categorizer to add to the model?
>
> I see how I can create a DocumentSample from a category (I will know this
> as part of the batch args) and the extracted text.  However, I can not
> figure out how to incrementally add that sample to a new (or existing)
> model for additional "training".
>
> Obviously, I would then like to save the model between batches so I can
> either leverage it for categorization or incrementally add more
> DocumentSamples to it for further training at some later time.
>
> Does anyone have a java snippet I could look at to help me get started?
>
> Thank you!
>
> -AJ
>
>
