I am sorry for the imprecise information I sent earlier, in another thread. The F-score I actually have is lower than 90%. I am still developing it for my MSc thesis, and it certainly still has some bugs.

Although it is simple, and probably naïve, I think it is enough for what I need. Anyway, I would be happy to receive comments, and I would appreciate it if someone could point me to similar work so I can compare it with my results.

I implemented a Portuguese shallow parser using the OpenNLP Chunker tool. To do that, I created a training corpus following the CoNLL-2000 format, so I can train the chunker without changing the code:

    Os det|B-NP B-SUBJ
    fiscais n|I-NP I-SUBJ
    retirar v-inf|B-VP B-P
    propaganda n|B-NP B-ACC
    irregular adj|B-NP I-ACC
    . .|O O

(The inspectors remove illegal advertising.)

The first column is the token, the second is the concatenation of the POS tag and the chunk tag, and the third is the clause tag. The heuristics to extract chunks and clauses from the tree were based on the paper 'A Machine Learning Approach to Portuguese Clause Identification' (Eraldo Fernandes, Cicero Santos and Ruy Milidiú).

The corpus is the Brazilian portion of the Bosque corpus <http://www.linguateca.pt/floresta/ficheiros/gz/Bosque_CF_8.0.ad.txt.gz> (4212 sentences). The complete tagset can be found here:
http://beta.visl.sdu.dk/visl/pt/info/symbolset-floresta.html

I could get F1 > 90% only when finding just the predicator and the subject. The predicator (P) is easy to find, which is why I get 98%, but for the subject (SUBJ) I got only 80.86%. The overall result is 91.8% (10-fold cross-validation):

Evaluated 4212 samples with 13522 entities; found: 12753 entities; correct: 12060.
TOTAL: precision: 94,57%; recall: 89,19%; F1: 91,80%.
P: precision: 98,82%; recall: 98,10%; F1: 98,46%. [target: 8196; tp: 8040; fp: 96]
SUBJ: precision: 87,07%; recall: 75,48%; F1: 80,86%. [target: 5326; tp: 4020; fp: 597]

If I add more clauses, the overall F1 drops to 79.05% (10-fold cross-validation):

Evaluated 4212 samples with 20847 entities; found: 18654 entities; correct: 15613.
TOTAL: precision: 83,70%; recall: 74,89%; F1: 79,05%.
P: precision: 98,78%; recall: 96,79%; F1: 97,78%. [target: 8199; tp: 7936; fp: 98]
SUBJ: precision: 85,03%; recall: 79,56%; F1: 82,20%. [target: 5167; tp: 4111; fp: 724]
ACC: precision: 63,85%; recall: 51,60%; F1: 57,08%. [target: 4663; tp: 2406; fp: 1362]
SC: precision: 65,57%; recall: 46,05%; F1: 54,10%. [target: 1381; tp: 636; fp: 334]
PIV: precision: 50,15%; recall: 38,91%; F1: 43,82%. [target: 1298; tp: 505; fp: 502]
OC: precision: 58,62%; recall: 16,83%; F1: 26,15%. [target: 101; tp: 17; fp: 12]
DAT: precision: 18,18%; recall: 5,26%; F1: 8,16%. [target: 38; tp: 2; fp: 9]

Tags:
P: predicator
SUBJ: subject
ACC: direct object
SC: subject complement
PIV: prepositional object
OC: object complement
DAT: dative object

Again, I would like comments on this approach. Is it useful for somebody else? How can I improve it?

Thank you,
William

On Sun, Jan 8, 2012 at 8:54 PM, william.co...@gmail.com <william.co...@gmail.com> wrote:

> The documentation is here:
> http://incubator.apache.org/opennlp/documentation/1.5.2-incubating/manual/opennlp.html#tools.chunker
>
> I implemented a Portuguese shallow parser using the Chunker, and it is
> performing good enough (F1 > 90%).
>
> First I run POS Tagger and after the Chunker to find noun and verb phrases.
>
> Finally I run another chunker which model I trained to find subject, verb
> and object. I concatenate the POS Tag and the phrase tag and used it in the
> POS Tag field of the chunker.
>
> On Sun, Jan 8, 2012 at 8:37 PM, Sina Bahram <sbah...@nc.rr.com> wrote:
>
>> Incidentally, I am exploring this JavaDoc in the mean-time:
>>
>> http://opennlp.sourceforge.net/api/opennlp/tools/chunker/ChunkerME.html
>>
>> not sure if that's the latest/greatest way to do it?
>>
>> Website: www.SinaBahram.com
>> Twitter: @SinaBahram
>>
>> -----Original Message-----
>> From: Sina Bahram [mailto:sbah...@nc.rr.com]
>> Sent: Sunday, January 08, 2012 5:36 PM
>> To: opennlp-users@incubator.apache.org; johnstew...@aya.yale.edu
>> Subject: RE: deriving commands from sentences
>>
>> Also, is there documentation on the chunker API? On the manual, it says
>> "// todo"
>>
>> Take care,
>> Sina
>>
>> Website: www.SinaBahram.com
>> Twitter: @SinaBahram
>>
>> -----Original Message-----
>> From: John Stewart [mailto:cane.c...@gmail.com]
>> Sent: Sunday, January 08, 2012 4:20 PM
>> To: opennlp-users@incubator.apache.org
>> Subject: Re: deriving commands from sentences
>>
>> I would use the chunker to obtain verb and noun phrases from the
>> input. You can then look for specific verbs and nouns you're
>> interested in.
>>
>> jds
>>
>> On Sun, Jan 8, 2012 at 4:05 PM, Sina Bahram <sbah...@nc.rr.com> wrote:
>> > Hi all,
>> >
>> > I just joined the list after reading through the OpenNlp manual and
>> > playing with some code.
>> >
>> > I think the tools provided by OpenNlp are fantastic; however my
>> > question is this. are there other collections of code, recipes,
>> > papers, or any additional resources at all instructing one how to use
>> > these tools to achieve a specific purpose?
>> >
>> > For example, write now, I'd like to write a simple command parser.
>> >
>> > Maybe things like this:
>> >
>> > Make that bigger
>> > Zoom in
>> > Slice that 9 ways vertically
>> > Give me 10 horizontal slices
>> > Take me to Washington DC
>> > Put North Carolina here
>> >
>> > How can I use the OpenNlp tools to do this? I understand that name
>> > finders can be used to find things like "North Carolina",
>> > although I could also simply use a POS tagger, I'm guessing, and simply
>> > see what got marked as prp ... but how can I put together a
>> > command parser, then start improving upon it through iteration,
>> > inclusion of other heuristics and algorithms, etc. etc.
>> >
>> > Can I use the parser to shortcut through some of this? for example, can
>> > I ask the parser what the direct object of the verb of a
>> > sentence is, and expect it to reliably, within reason of course, give
>> > me the major verb (the action) and major direct object (the
>> > subject, perhaps) of a given sentence?
>> >
>> > For example:
>> >
>> > Input:
>> > Put North Carolina here
>> >
>> > And then I can call my methods like this?
>> > getMainAction() --> returns "put"
>> > getMainSubject() --> returns "north Carolina"
>> >
>> > or whatever ... I understand it's maybe not that simple, but I'm simply
>> > wanting to know how and where to start, if that makes sense?
>> >
>> > Lastly, and definitely most importantly, thanks go to every single
>> > author and contributor of OpenNlp. I'm quite impressed just by
>> > playing with examples.
>> >
>> > Thanks in advance for any help.
>> >
>> > take care,
>> > Sina
>> >
>> > Website: www.SinaBahram.com
>> > Twitter: @SinaBahram
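
For anyone who wants to recheck the per-tag rows reported at the top of this message, here is a minimal Python sketch that recomputes precision, recall and F1 from the bracketed counts. It is not part of OpenNLP, and the function name `prf` is hypothetical. It assumes that `target` is the number of gold entities, `tp` the correctly predicted entities and `fp` the spurious predictions, which is how the evaluator output appears to use them.

```python
def prf(target, tp, fp):
    """Return (precision, recall, F1) as percentages, rounded to 2 places.

    Assumed meaning of the counts (taken from the bracketed fields above):
      target = entities in the gold data
      tp     = predicted entities that are correct
      fp     = predicted entities that are wrong
    """
    precision = tp / (tp + fp)  # correct / found
    recall = tp / target        # correct / gold
    f1 = 2 * precision * recall / (precision + recall)
    return tuple(round(100 * x, 2) for x in (precision, recall, f1))


# Counts copied from the SUBJ and P rows of the first run (P and SUBJ only).
print("SUBJ", prf(target=5326, tp=4020, fp=597))
print("P   ", prf(target=8196, tp=8040, fp=96))
```

With the counts from the first run this gives 87.07 / 75.48 / 80.86 for SUBJ and 98.82 / 98.10 / 98.46 for P, which agrees with the evaluator output quoted above.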