Hello, you can train a model only once. After it has been trained, it is not possible to continue the training by adding more samples to it.
You need to create a stream of DocumentSample objects. A straightforward way to do that is to collect them all in a collection and then create a stream from that collection.

HTH,
Jörn

On Wed, Sep 9, 2015 at 9:51 PM, AJ Weber <awe...@comcast.net> wrote:

> So I'm just getting started with openNLP and trying to spin-up the DocCat.
>
> I would like to process a series of files in batches to train the document
> categorizer.
>
> I assume it is possible to loop through documents:
>
> 1) extract the text (will probably try Tika for this), and then
> 2) send the DocumentSample to the categorizer to add to the model?
>
> I see how I can create a DocumentSample from a category (I will know this
> as part of the batch args) and the extracted text. However, I can not
> figure out how to incrementally add that sample to a new (or existing)
> model for additional "training".
>
> Obviously, I would like to then save the model between batches so I can
> either leverage it for categorization or incrementally add more Document
> Sample's to it for further training at some later time.
>
> Does anyone have a java snippet I could look at to help me get started?
>
> Thank you!
>
> -AJ
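
Since a trained model cannot be extended, one way to handle batches is to save the samples between batches instead of the model, and to retrain from the full set whenever a batch has been added. The following is only a rough sketch of that idea, using plain Java without the OpenNLP classes. SampleStore and its methods are hypothetical names made up for illustration, and the one-document-per-line "category text" layout is an assumption chosen because it resembles the plain-text doccat training format; categories are assumed to contain no whitespace.

```java
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardOpenOption;
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;

// Hypothetical helper (not part of OpenNLP): keeps every training sample on disk
// so that all samples gathered so far are available for a full retrain.
class SampleStore {
    private final Path file;

    SampleStore(Path file) {
        this.file = file;
    }

    // Appends one sample as a single "category text" line.
    // Whitespace runs in the text (including line breaks) are collapsed to one
    // space so that each document stays on exactly one line.
    void append(String category, String text) throws IOException {
        String line = category.trim() + " " + text.trim().replaceAll("\\s+", " ");
        Files.write(file, Collections.singletonList(line), StandardCharsets.UTF_8,
                StandardOpenOption.CREATE, StandardOpenOption.APPEND);
    }

    // Returns all samples collected so far as {category, text} pairs.
    List<String[]> readAll() throws IOException {
        List<String[]> samples = new ArrayList<>();
        if (!Files.exists(file)) {
            return samples;
        }
        for (String line : Files.readAllLines(file, StandardCharsets.UTF_8)) {
            if (line.trim().isEmpty()) {
                continue; // skip blank lines
            }
            // Split at the first space only: category first, the rest is the text.
            samples.add(line.split(" ", 2));
        }
        return samples;
    }

    // With OpenNLP on the classpath, each pair from readAll() would be turned into
    // a DocumentSample, the resulting collection wrapped in an ObjectStream, and
    // that stream passed to DocumentCategorizerME.train(...). The exact constructor
    // and train signatures differ between OpenNLP versions, so they are left out.
}
```

For each batch you would call append(category, extractedText) once per document, then train a fresh model from readAll() and serialize that model for categorization. The earlier model is simply replaced.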