Re: abbdict format

Benedict Holland Thu, 13 Apr 2017 08:10:27 -0700

Hello William!

Thank you for the dictionary.

I am not specifically looking for ready-to-use models (or maybe I am). What
I want is to start with all of the input files and generate example output.
Given that you do have ready-to-use models, when I train against new
documents, are the existing model files overwritten or updated? This is
important. I am guessing I will have to update names, and to do so, I will
use the TokenNameFinderTrainer. It takes a model file (en-ner*.bin) as its
output. If the file already exists, does the trainer read in the model and
update it accordingly, or does it replace it?

The problem I am having is following along with the tutorials. I don't
think I have the input data. I do not know where to download it, what is
available, and how to get it into a format used by OpenNLP. For example,
en-sent.train is not available for download but many of the examples refer
to it. The names data requires a $25,000 subscription. It would be very
nice to have a replacement file, just as an example. It would also be nice
to have a single downloadable zip file containing every example file and
document I need to run through the tutorial.
The script files are an excellent way to introduce this but I need data to
work with and I don't know where to get it or even how to create it.
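
My best guess at a stand-in, assuming the sentence detector trainer expects
the format the manual describes (one sentence per line, with an empty line
marking a document boundary), would be something like the sketch below. The
sentences and the helper name write_sentence_train_file are made up for
illustration, and three sentences are far too few to train a useful model.

```python
from pathlib import Path

# Made-up sentences, purely illustrative; a real model needs far more data.
SENTENCES = [
    "Mr. Smith arrived at 9 a.m. on Monday.",
    "He met Dr. Jones at the office.",
    "They discussed the project for two hours.",
]


def write_sentence_train_file(path, documents):
    """Write one sentence per line, with an empty line between documents.

    `documents` is a list of documents, each a list of sentence strings.
    The layout is an assumption based on the manual's description of the
    sentence detector training format.
    """
    blocks = ["\n".join(doc) for doc in documents]
    Path(path).write_text("\n\n".join(blocks) + "\n", encoding="utf-8")
```

Calling write_sentence_train_file("en-sent.train", [SENTENCES[:2],
SENTENCES[2:]]) would write two toy documents separated by a blank line.
Whether that is what the tutorials mean by en-sent.train is exactly what I
would like confirmed.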

It would also be extremely helpful to have links to good tutorials that you
think provide accurate information and a good description. There are many
online and available but I don't know which ones are good.

I don't mean to sound overly harsh or nitpicky. I think this is an awesome
project and I am thankful for it. I just think it could be a bit clearer
and presented in a different way, which would allow people to grasp this
product faster.

Thanks,
~Ben

On Wed, Apr 12, 2017 at 6:59 PM, William Colen <co...@apache.org> wrote:

> Hello, Ben,
>
> We have an example of an abbreviation dictionary in the tests:
> https://github.com/apache/opennlp/blob/master/opennlp-tools/src/test/resources/opennlp/tools/sentdetect/abb.xml
>
> Regarding ready-to-use models, we have many here:
> http://opennlp.sourceforge.net/models-1.5/
>
> If you need a tutorial, there are many online.
>
> Our docs are here. You can find code snippets and information on how to
> use the command line interface.
> https://opennlp.apache.org/documentation/1.7.2/manual/opennlp.html
>
> Regards,
> William
>
> 2017-04-12 18:29 GMT-03:00 Benedict Holland <benedict.m.holl...@gmail.com>:
>
> > Hello All,
> >
> > I am getting into NLP for a project and this is the solution we are going
> > to use. I noticed that in many places there is something called the
> > abbdict flag but there is not a specification for it. I believe it is an
> > xml document. Could someone please provide a sample xml file and a brief
> > description of the file format?
> >
> > In addition, is there a quick guide on starting with text, going through
> > the various learning steps, example files, and expected output? I don't
> > mean the manual but more like a true beginner's guide, with all of the
> > example files, each of the commands run in a particular order, and the
> > expected output. I noticed, for example, that I cannot download the
> > sentence training text en-sent.train because (I think) it is not free or
> > can't be distributed.
> >
> > It would be very helpful to provide .train files for each step of the
> > process, even as a simple example.
> >
> > Thanks,
> > ~Ben
> >
>
