After looking again, I think I didn't have the tags matched up correctly. I just sorted them in Excel and realized that the alignment below probably makes more sense. (Sorry, I've been up for 51 hours straight!)

Penn Treebank Tag Set Definition      Produced by OpenNLP API
Tag     Definition                    Tag
NNS     Noun, plural                  NNS
NP      Proper noun, singular         NNP
NPS     Proper noun, plural           NNPS
PP      Personal pronoun              PRP
PP$     Possessive pronoun            PRP$

Is that right?

-----Original Message-----
From: Jörn Kottmann [mailto:kottm...@gmail.com]
Sent: Tuesday, October 11, 2011 6:13 AM
To: opennlp-users@incubator.apache.org
Subject: EXTERNAL: Re: POS Tags

The English POS Model from the SourceForge download page uses the Penn Treebank Tag Set.
Here is a link which list all tags:
http://www.ims.uni-stuttgart.de/projekte/CorpusWorkbench/CQP-HTMLDemo/PennTreebankTS.html

Jörn

On 10/11/11 6:56 AM, Fotiadis, Konstantinos wrote:
> I am looking around the definition and have not found the definitions for the
> POS tags.
>
> Can you help me with these?
>
> Example:
> "This is not a long sentence. I like turtles. Happiness is great!"
>
> I then call SentenceDetectorME to detect sentences. Then loop through the
> sentences and call Tokenizer on each one. I then pass the token String array
> to POSTaggerME to get the POS. Here is my output:
>
> Number of Sentences=3
> SENTENCE_ID=1 - TOKENS=7 - This is not a long sentence.
> TOKEN_ID=1 - POS=DT - This
> TOKEN_ID=2 - POS=VBZ - is
> TOKEN_ID=3 - POS=RB - not
> TOKEN_ID=4 - POS=DT - a
> TOKEN_ID=5 - POS=JJ - long
> TOKEN_ID=6 - POS=NN - sentence
> TOKEN_ID=7 - POS=. - .
> SENTENCE_ID=2 - TOKENS=4 - I like turtles.
> TOKEN_ID=1 - POS=PRP - I
> TOKEN_ID=2 - POS=IN - like
> TOKEN_ID=3 - POS=NNS - turtles
> TOKEN_ID=4 - POS=. - .
> SENTENCE_ID=3 - TOKENS=4 - Happiness is great!
> TOKEN_ID=1 - POS=NNP - Happiness
> TOKEN_ID=2 - POS=VBZ - is
> TOKEN_ID=3 - POS=JJ - great
> TOKEN_ID=4 - POS=. - !
>
>
> Just curious of the definitions...
>
> Thanks, Kosta
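
P.S. In case it helps anyone doing the same matching, here is a minimal sketch of the tag table at the top of this message as a plain Java lookup. It only covers the five rows listed there. The class name TagLookup and its methods are my own hypothetical helpers, not part of the OpenNLP API, and the alignment itself is still the one I am asking about, so treat it as an assumption.

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical helper, not part of OpenNLP: maps the tag names from the
// linked tag listing to the names the OpenNLP model produced in my output.
public class TagLookup {
    // Listed tag name -> tag name emitted by the model (assumed alignment).
    private static final Map<String, String> ALIASES = new LinkedHashMap<>();
    // Emitted tag name -> definition, taken from the table in this message.
    private static final Map<String, String> DEFINITIONS = new LinkedHashMap<>();

    static {
        add("NNS", "NNS", "Noun, plural");
        add("NP", "NNP", "Proper noun, singular");
        add("NPS", "NNPS", "Proper noun, plural");
        add("PP", "PRP", "Personal pronoun");
        add("PP$", "PRP$", "Possessive pronoun");
    }

    private static void add(String listed, String emitted, String definition) {
        ALIASES.put(listed, emitted);
        DEFINITIONS.put(emitted, definition);
    }

    // Returns the emitted tag name, or the input unchanged if it is not in the table.
    public static String toOpenNlpTag(String listedTag) {
        return ALIASES.getOrDefault(listedTag, listedTag);
    }

    // Accepts either spelling of a tag and returns its definition if known.
    public static String define(String tag) {
        return DEFINITIONS.getOrDefault(toOpenNlpTag(tag), "(not in this table)");
    }

    public static void main(String[] args) {
        for (String tag : new String[] {"NNS", "NP", "PP$", "VBZ"}) {
            System.out.println(tag + " -> " + toOpenNlpTag(tag) + " : " + define(tag));
        }
    }
}
```

Tags outside the five rows (VBZ, for example) pass through unchanged and come back without a definition, so the table would need extending from the linked list before it is useful on real tagger output.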