NEWBIE: how to add DocumentSample's to model incrementally?

AJ Weber Wed, 09 Sep 2015 12:52:23 -0700

So I'm just getting started with OpenNLP and trying to spin up the DocCat.

I would like to process a series of files in batches to train the document categorizer.


I assume it is possible to loop through documents:

1) extract the text (will probably try Tika for this), and then
2) send the DocumentSample to the categorizer to add to the model?

I see how I can create a DocumentSample from a category (I will know this as part of the batch args) and the extracted text. However, I cannot figure out how to incrementally add that sample to a new (or existing) model for additional "training".

Obviously, I would then like to save the model between batches so I can either leverage it for categorization or incrementally add more DocumentSamples to it for further training at some later time.
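
For illustration only, here is a minimal sketch of one possible workaround, assuming the categorizer has to be retrained from the full sample set rather than updated in place (I have not confirmed whether OpenNLP can add samples to an existing model). Each batch appends its samples to a plain-text file with one document per line, the category first and then the text, which is the layout the doccat training tools read. The class name SampleStore and both method names are made up for this example and are not part of OpenNLP.

```java
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardOpenOption;
import java.util.Collections;

// Hypothetical helper (not an OpenNLP class): accumulates training samples
// on disk so the categorizer can be retrained after every batch.
public class SampleStore {

    // Builds one training line: "<category> <text on a single line>".
    static String toTrainingLine(String category, String text) {
        if (category == null || category.isEmpty() || category.matches(".*\\s.*")) {
            throw new IllegalArgumentException("category must be a single token");
        }
        // Collapse line breaks and runs of whitespace so the document
        // occupies exactly one line of the training file.
        String flat = text.replaceAll("\\s+", " ").trim();
        return category + " " + flat;
    }

    // Appends one sample to the training file, creating the file if needed.
    static void appendSample(Path trainingFile, String category, String text)
            throws IOException {
        Files.write(trainingFile,
                Collections.singletonList(toTrainingLine(category, text)),
                StandardCharsets.UTF_8,
                StandardOpenOption.CREATE,
                StandardOpenOption.APPEND);
    }
}
```

After each batch, the accumulated file could presumably be read back through a DocumentSampleStream and passed to DocumentCategorizerME.train to rebuild the model; the exact train signature differs between OpenNLP versions, so that step is left out here.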


Does anyone have a Java snippet I could look at to help me get started?

Thank you!

-AJ
