Thanks Michael, that is very helpful.  I'll discuss with my employers whether
we can create a subset of our training set for public distribution.


On Wed, Mar 13, 2013 at 4:34 PM, Michael Schmitz
<[email protected]> wrote:

> I ran the OpenNlp stock models on wikipedia text at one time.  You may be
> able to use this.
>
> https://gist.github.com/3291931
>
> If you create superior models without licensing restrictions, please share.
>
> Peace.  Michael
>
>
> On Wed, Mar 13, 2013 at 12:25 PM, John Helmsen <
> [email protected]> wrote:
>
> > Gentlemen and Ladies,
> >
> > Currently, my group is undertaking a project that involves performing
> > English understanding of sentence fragments.  While the Apache parser
> > with the pre-trained binary is very good, we anticipate the need to
> > retrain the parser eventually on our own data sets to handle special
> > terms and idiosyncrasies that may arise in our particular context.
> >
> > The best way to retrain the parser is to mix our parsing solutions in
> > with the existing parser training set, so that we enhance the already
> > good performance of the parser in the direction of our particular input.
> >
> > Unfortunately, it seems that the online documentation for
> > en-parser-chunking.bin does not include links to the training sets that
> > were used.  Do any of you good people know what these might be? Thanks!
> >
> > John Helmsen
> >
>