Each tool uses its own training data format. The following patch to Solr
includes sample test data for the sentence detector, tokenizer,
part-of-speech tagger, chunker, and named entity recognition tools.

https://issues.apache.org/jira/browse/LUCENE-2899

Unpack this as a patch directory and ignore all of the "file not found"
messages. The shell script
solr/contrib/opennlp/src/test-files/training/bin/trainall.sh turns all of
these training files into miniature models. I created them only so that the
unit tests would work. There is no real training data here, but the files do
show the formats.
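
To give a rough idea of what three of these formats look like, here is a small Python sketch. It is not taken from the patch: the helper names are hypothetical, and the layouts (one sentence per line with a blank line between documents, `<START:type> ... <END>` markers around entities, and `word_TAG` pairs) are my understanding of the OpenNLP manual, so check them against the sample files before relying on them.

```python
# Sketch only: helper names are hypothetical, and the formats are assumed
# from the OpenNLP manual rather than copied from the LUCENE-2899 patch.


def sentence_training_text(documents):
    """Sentence detector data: one sentence per line, blank line between documents."""
    blocks = ["\n".join(s.strip() for s in doc) for doc in documents]
    return "\n\n".join(blocks) + "\n"


def name_training_line(tokens, spans):
    """Name finder data: space-separated tokens with <START:type> ... <END> markers.

    spans is a list of (start, end, type) token offsets, end exclusive.
    """
    starts = {start: kind for start, _end, kind in spans}
    ends = {end for _start, end, _kind in spans}
    out = []
    for i, tok in enumerate(tokens):
        if i in starts:
            out.append("<START:%s>" % starts[i])
        out.append(tok)
        if i + 1 in ends:
            out.append("<END>")
    return " ".join(out)


def pos_training_line(tagged):
    """POS tagger data: word_TAG pairs separated by spaces, one sentence per line."""
    return " ".join("%s_%s" % (word, tag) for word, tag in tagged)


if __name__ == "__main__":
    docs = [
        ["This is the first sentence.", "Here is a second one."],
        ["A new document starts here."],
    ]
    print(sentence_training_text(docs), end="")
    print(name_training_line(
        ["Pierre", "Vinken", ",", "61", "years", "old"], [(0, 2, "person")]))
    print(pos_training_line([("About", "IN"), ("10", "CD")]))
```

The tokenizer and chunker use their own layouts as well; the sample files in the patch are the better reference for those.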

Lance

----- Original Message -----
| From: "yuelin.sha" <[email protected]>
| To: "users" <[email protected]>
| Sent: Wednesday, October 24, 2012 9:16:38 PM
| Subject: Re:  Re: Can one add trained data into a existing model?
| 
| Hello, James
| Your reply is very helpful. I will try to train a new model, but so
| far I haven't found much information about the training data format
| for OpenNLP. Can you give me some advice on the training data
| format?
| 
| 2012-10-25
| 
| 
| 
| yuelin.sha
| 
| 
| 
| From: James Kosin
| Sent: 2012-10-25 11:51
| Subject: Re: Can one add trained data into a existing model?
| To: "users"<[email protected]>
| Cc:
| 
| Hello Yuelin Sha,
| 
| 1)  Adding data to an existing model is not a trivial process. It
| also isn't supported, and I wouldn't suggest attempting it.
| 
| 2)  The sentence detection model is really quite easy to train, and
| a few hundred really good samples of data should be enough. My own
| sentence detector model is trained on only about 80 sentences, and
| it performs well for what I use it for. Of course, I watch it and
| add new sentences as I find issues.
| 
| 3)  If you really need the detection in the final model, you may
| donate your sentences to the project (I hope this means you are the
| owner of the sentences and that they are your work alone...
| copyright issues). A single example sentence isn't going to be
| enough, though: all the models are trained with several thousand
| sentences and have a default cut-off of 5 when training to eliminate
| rare exceptions. For sentence detection, though, it is mainly the
| punctuation that determines the end of the sentence.
| 
| Hopefully this helps answer some of your questions.
| 
| James
| 
| On 10/24/2012 11:06 PM, yuelin.sha wrote:
| > Hello everyone,
| > 
| > We have been using the official English model for sentence
| > detection, but recently we want to train some more data into the
| > official model. Can someone give me a tip for this? I don't know
| > whether such work is supported by OpenNLP.
| > 
| > Thanks in advance.
| > 
| > 2012-10-25
| > 
| > 
| > 
| > yuelin.sha
| > 
| > --------------------------
| > Information contained in this e-mail (and any attachments) is
| > confidential and is intended for exclusive disclosure to the
| > addressee(s). Any unauthorized disclosure, reproduction,
| > distribution or other dissemination or use of this communication
| > is prohibited. If you have received this communication in error,
| > please delete this communication and contact us by replying to
| > this e-mail immediately.
| > 
| 
