Hello William! Thank you for the dictionary.
I am not looking specifically for ready-to-use models (or maybe I am); what I want is to start with all of the input files and generate example output. Given that you do have ready-to-use models, when I use them in the context of training against new documents, are the model input files overwritten or updated? This is important. I am guessing I will have to update names, and to do so, I will use the TokenNameFinderTrainer. It accepts a model file (en-ner*.bin) as an output. If the file exists, does it read in the model file and update it accordingly, or does it delete it?

The problem I am having is following along with the tutorials. I don't think I have the input data. I do not know where to download it, what is available, or how to get it into a format used by OpenNLP. For example, en-sent.train is not available for download, but many of the examples refer to it. The names data requires a $25,000 subscription. It would be very nice to have a replacement file, just as an example.

It would also be nice to have all of the example files in a single downloadable zip file, so that every document I need to run through the tutorial is in one location. The script files are an excellent way to introduce this, but I need data to work with and I don't know where to get it or even how to create it. It would also be extremely helpful to have links to good tutorials that you think provide accurate information and a good description. There are many available online, but I don't know which ones are good.

I don't mean to sound overly harsh or picky. I think this is an awesome project and I am thankful for it. I just think it could be a bit clearer and presented in a different way, which would allow people to grasp this product faster.

Thanks,
~Ben

On Wed, Apr 12, 2017 at 6:59 PM, William Colen <co...@apache.org> wrote:

> Hello, Ben,
>
> We have an example of an abbreviation dictionary in the tests:
> https://github.com/apache/opennlp/blob/master/opennlp-tools/src/test/resources/opennlp/tools/sentdetect/abb.xml
>
> Regarding ready-to-use models, we have many here:
> http://opennlp.sourceforge.net/models-1.5/
>
> If you need a tutorial, there are many online.
>
> Our docs are here. You can find code snippets and information how to use
> the command line interface.
> https://opennlp.apache.org/documentation/1.7.2/manual/opennlp.html
>
> Regards,
> William
>
> 2017-04-12 18:29 GMT-03:00 Benedict Holland <benedict.m.holl...@gmail.com>:
>
> > Hello All,
> >
> > I am getting into NLP for a project and this is the solution we are going
> > to use. I noticed that in many places there is something called the
> > abbdict flag but there is not a specification for it. I believe it is an
> > xml document. Could someone please provide a sample xml file and a brief
> > description of the file format?
> >
> > In addition, is there a quick guide on starting with text, going through
> > the various learning steps, example files, and expected output? I don't
> > mean the manual but more like a true beginners guide with all of the
> > example files and each of the commands run in a particular order and the
> > expected output? I noticed, for example, I cannot download a sentence
> > learning text en-sent.train because (I think) it is not free or can't be
> > distributed.
> >
> > It would be very helpful to provide .train files for each step of the
> > process, even as a simple example.
> >
> > Thanks,
> > ~Ben