Hi, can you check that your training data is in the format described in
the OpenNLP manual? [0]

As described there, each training sentence goes on one line and should
look something like this:

<START:person> Pierre Vinken <END> , 61 years old , will join the board as a nonexecutive director Nov. 29 .

Here the entity type is "person", and "Pierre Vinken" is one person name
annotated in the training data.

I looked at the links you shared, and your data is in a different format.
Can you make sure your eng.train is in the format above?
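
One quick way to check a file is to count the name spans it contains. The
following is only a rough Python sketch, not part of OpenNLP: the function
name count_name_spans is hypothetical, and it assumes that tags are
whitespace-separated tokens of the form <START:type> (or a bare <START>)
closed by <END>.

```python
import re
from typing import Iterable, Tuple

# Matches "<START:person>" as well as a bare "<START>" token.
START_TAG = re.compile(r"<START(?::([^>\s]+))?>")
END_TAG = "<END>"


def count_name_spans(lines: Iterable[str]) -> Tuple[int, int, int]:
    """Return (non-empty lines, annotated spans, lines with unbalanced tags)."""
    sentences = spans = unbalanced = 0
    for line in lines:
        tokens = line.split()
        if not tokens:
            continue  # empty lines are not counted as sentences
        sentences += 1
        starts = sum(1 for t in tokens if START_TAG.fullmatch(t))
        ends = tokens.count(END_TAG)
        spans += min(starts, ends)
        if starts != ends:
            unbalanced += 1
    return sentences, spans, unbalanced
```

If the second number is 0 for eng.train, the file contains no name
annotations in this markup.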

I think you can write your own code to read the training file and convert
it into the OpenNLP format. Also look at [1] in case you can use one of
the pre-trained models available for OpenNLP.
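
A minimal Python sketch of such a converter, under some assumptions: each
non-empty input line holds whitespace-separated columns with the token
first and the entity tag last (as in your sample), sentences are separated
by empty lines, and lines starting with -DOCSTART- mark document
boundaries. The function names and the type names in TYPE_NAMES are my own
choices for illustration, not something OpenNLP prescribes.

```python
from typing import Iterable, Iterator, List, Optional, Tuple

# Assumed mapping from CoNLL-style tags to type names; adjust as needed.
TYPE_NAMES = {"PER": "person", "LOC": "location",
              "ORG": "organization", "MISC": "misc"}


def sentence_to_opennlp(rows: List[Tuple[str, str]]) -> str:
    """Render (token, tag) pairs of one sentence as one marked-up line."""
    out: List[str] = []
    open_type: Optional[str] = None  # type of the span currently open
    for token, tag in rows:
        prefix, _, etype = tag.partition("-")
        current = None if tag == "O" else etype
        # Close the span when the entity ends, changes type,
        # or a B- tag starts a new entity of the same type.
        if open_type is not None and (current != open_type or prefix == "B"):
            out.append("<END>")
            open_type = None
        if current is not None and open_type is None:
            out.append("<START:%s>" % TYPE_NAMES.get(current, current.lower()))
            open_type = current
        out.append(token)
    if open_type is not None:
        out.append("<END>")
    return " ".join(out)


def convert(lines: Iterable[str]) -> Iterator[str]:
    """Yield one output line per sentence, with "" between documents."""
    rows: List[Tuple[str, str]] = []
    emitted = False
    for line in lines:
        fields = line.split()
        if fields and fields[0] != "-DOCSTART-":
            rows.append((fields[0], fields[-1]))
            continue
        if rows:  # an empty line or -DOCSTART- ends the current sentence
            yield sentence_to_opennlp(rows)
            rows, emitted = [], True
        if fields and emitted:  # -DOCSTART-: separate documents
            yield ""
            emitted = False
    if rows:
        yield sentence_to_opennlp(rows)
```

You could pass the open eng.train file to convert() and write each yielded
line to a new file, then give that file to TokenNameFinderTrainer with the
-data option.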

HTH



[0] https://opennlp.apache.org/documentation/1.7.2/manual/opennlp.html#tools.namefind.training
[1] http://opennlp.sourceforge.net/models-1.5/


--
Madhav Sharan


On Sun, Feb 26, 2017 at 9:42 PM, Madhvi Gupta <mgmahi....@gmail.com> wrote:

> Please let me know if anyone has any idea about this.
>
> With Regards
> Madhvi Gupta
> *(Senior Software Engineer)*
>
> On Tue, Feb 21, 2017 at 10:51 AM, Madhvi Gupta <mgmahi....@gmail.com>
> wrote:
>
> > Hi Joern,
> >
> > The training data generated from the Reuters dataset is in the
> > following format. The process produced three files: eng.train,
> > eng.testa, and eng.testb.
> >
> > A DT I-NP O
> > rare JJ I-NP O
> > early JJ I-NP O
> > handwritten JJ I-NP O
> > draft NN I-NP O
> > of IN I-PP O
> > a DT I-NP O
> > song NN I-NP O
> > by IN I-PP O
> > U.S. NNP I-NP I-LOC
> > guitar NN I-NP O
> > legend NN I-NP O
> > Jimi NNP I-NP I-PER
> >
> > Using this training data file, I ran the command:
> > ./opennlp TokenNameFinderTrainer -model en-ner-person.bin -lang en -data /home/centos/ner/eng.train -encoding UTF-8
> >
> > It gives me the following error:
> > ERROR: Not enough training data
> > The provided training data is not sufficient to create enough events to
> > train a model.
> > To resolve this error use more training data, if this doesn't help
> > there might be some fundamental problem with the training data itself.
> >
> > The format required for training OpenNLP models is sentence-based, but
> > the training data prepared from the Reuters dataset is in the format
> > shown above. Please tell me how to generate training data in the
> > required format, or how the existing training data format can be used
> > to train models.
> >
> > With Regards
> > Madhvi Gupta
> > *(Senior Software Engineer)*
> >
> > On Mon, Feb 20, 2017 at 5:52 PM, Joern Kottmann <kottm...@gmail.com>
> > wrote:
> >
> >> Please explain to us what is not working. Any error messages or
> >> exceptions?
> >>
> >> By default, the name finder trains on the default format, which you
> >> can see in the documentation link I shared.
> >>
> >> Jörn
> >>
> >> On Mon, Feb 20, 2017 at 6:04 AM, Madhvi Gupta <mgmahi....@gmail.com>
> >> wrote:
> >>
> >> > Hi Joern,
> >> >
> >> > I got the data from the following link, which consists of a corpus
> >> > of news articles.
> >> > https://urldefense.proofpoint.com/v2/url?u=http-3A__trec.nist.gov_data_reuters_reuters.html&d=DwIFaQ&c=clK7kQUTWtAVEOVIgvi0NU5BOUHhpN0H8p7CSfnc_gI&r=DhBa2eLkbd4gAFB01lkNgg&m=lMnAklnfFkmS3IfHhJy5PgR6CHe7-61J_5MAe3U8CJI&s=0sEQ0deDkUi3w600SvjaaKSVhtlEHEGzDh-l202X76o&e=
> >> >
> >> > Following the steps given in the link below, I created training and
> >> > test data, but it does not work with the NameFinder of the OpenNLP
> >> > API.
> >> > https://urldefense.proofpoint.com/v2/url?u=http-3A__www.clips.uantwerpen.be_conll2003_ner_000README&d=DwIFaQ&c=clK7kQUTWtAVEOVIgvi0NU5BOUHhpN0H8p7CSfnc_gI&r=DhBa2eLkbd4gAFB01lkNgg&m=lMnAklnfFkmS3IfHhJy5PgR6CHe7-61J_5MAe3U8CJI&s=ijG9-HM4_WRlwIUM6VyvE0YB3arX5Z2BVN5SFKlmzN4&e=
> >> >
> >> > So can you please help me create training data from that corpus
> >> > and use it to build named entity detection models?
> >> >
> >> > With Regards
> >> > Madhvi Gupta
> >> > *(Senior Software Engineer)*
> >> >
> >> > On Mon, Feb 20, 2017 at 1:00 AM, Joern Kottmann <kottm...@gmail.com>
> >> > wrote:
> >> >
> >> > > Hello,
> >> > >
> >> > > To train the name finder you need training data that contains the
> >> > > entities you would like to detect.
> >> > > Is that the case with the data you have?
> >> > >
> >> > > Take a look at our documentation:
> >> > > https://urldefense.proofpoint.com/v2/url?u=https-3A__opennlp.apache.org_documentation_1.7.2_manual_&d=DwIFaQ&c=clK7kQUTWtAVEOVIgvi0NU5BOUHhpN0H8p7CSfnc_gI&r=DhBa2eLkbd4gAFB01lkNgg&m=lMnAklnfFkmS3IfHhJy5PgR6CHe7-61J_5MAe3U8CJI&s=aLn09MB1cLHyZI9a0NT3gLdj5ZNFrR_eg_PhHHQHYC4&e=
> >> > > opennlp.html#tools.namefind.training
> >> > >
> >> > > At the beginning of that section you can see how the data has to
> >> > > be marked up.
> >> > >
> >> > > Please note that you need many sentences to train the name finder.
> >> > >
> >> > > HTH,
> >> > > Jörn
> >> > >
> >> > >
> >> > > On Sat, Feb 18, 2017 at 11:28 AM, Madhvi Gupta <mgmahi....@gmail.com>
> >> > > wrote:
> >> > >
> >> > > > Hi All,
> >> > > >
> >> > > > I have got Reuters data from NIST. Now I want to generate
> >> > > > training data from it to create a model for detecting named
> >> > > > entities. Can anyone tell me how the models can be generated
> >> > > > from it?
> >> > > >
> >> > > > --
> >> > > > With Regards
> >> > > > Madhvi Gupta
> >> > > > *(Senior Software Engineer)*
> >> > > >
> >> > >
> >> >
> >> >
> >> >
> >> > --
> >> >
> >>
> >
> >
>
