I have been using these: http://bulba.sdsu.edu/jeanette/thesis/PennTags.html
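
For reading the tagger output quickly, a small lookup table may help. This is only a sketch: `PennTagGlossary` and `describe` are made-up names, not part of the OpenNLP API, and the table covers only the tags that come up in this thread, using the standard Penn Treebank definitions. A complete table would have to be filled in from the full tag list.

```java
import java.util.LinkedHashMap;
import java.util.Map;

/** Hypothetical helper (not part of OpenNLP): maps Penn Treebank tags to short definitions. */
final class PennTagGlossary {

    // Partial table: only the tags mentioned in this thread.
    private static final Map<String, String> DEFINITIONS = new LinkedHashMap<>();

    static {
        DEFINITIONS.put("DT", "Determiner");
        DEFINITIONS.put("IN", "Preposition or subordinating conjunction");
        DEFINITIONS.put("JJ", "Adjective");
        DEFINITIONS.put("NN", "Noun, singular or mass");
        DEFINITIONS.put("NNS", "Noun, plural");
        DEFINITIONS.put("NNP", "Proper noun, singular");
        DEFINITIONS.put("NNPS", "Proper noun, plural");
        DEFINITIONS.put("PRP", "Personal pronoun");
        DEFINITIONS.put("PRP$", "Possessive pronoun");
        DEFINITIONS.put("RB", "Adverb");
        DEFINITIONS.put("VBZ", "Verb, 3rd person singular present");
        DEFINITIONS.put(".", "Sentence-final punctuation");
    }

    /** Returns the definition for a tag, or a placeholder if the tag is not in this partial table. */
    static String describe(String tag) {
        return DEFINITIONS.getOrDefault(tag, "(not in this partial table)");
    }

    public static void main(String[] args) {
        // Token/tag pairs copied from the first sentence of the output quoted in this thread.
        String[] tokens = {"This", "is", "not", "a", "long", "sentence", "."};
        String[] tags = {"DT", "VBZ", "RB", "DT", "JJ", "NN", "."};
        for (int i = 0; i < tokens.length; i++) {
            System.out.printf("%-9s %-5s %s%n", tokens[i], tags[i], describe(tags[i]));
        }
    }
}
```

In a real pipeline the `tags` array would be whatever POSTaggerME returns for the token array; here it is hard-coded so the sketch runs without the OpenNLP jars or model files.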

On Wed, Oct 12, 2011 at 1:32 AM, Jörn Kottmann <kottm...@gmail.com> wrote:

> Sorry, looks like the link I posted here does not match up with the Penn
> Treebank Tag Set we are using. Some of the tags are not even included in
> our training data.
>
> I went on and tried to find a better description and looked at this page:
> http://www.cis.upenn.edu/~treebank/home.html
>
> The tagset described here seems to match with the tags we are using:
> ftp://ftp.cis.upenn.edu/pub/treebank/doc/tagguide.ps.gz
>
> But there seem to be some missing in the list, especially tags for
> punctuation, brackets, etc.
> Does someone know where a complete list with descriptions of the Penn
> Treebank tags can be found?
>
> Jörn
>
> On 10/11/11 1:52 PM, Fotiadis, Konstantinos wrote:
>
>> After looking again, I think I probably didn't have them matched up
>> perfectly. I just did a sort in Excel, and realized that maybe this would
>> make more sense? (Sorry, been up for 51 hours straight!)
>>
>> Penn Treebank Tag Set Definition    Produced by OpenNLP API
>>
>> Tag    Definition                   Tag
>> NNS    Noun, plural                 NNS
>> NP     Proper noun, singular        NNP
>> NPS    Proper noun, plural          NNPS
>> PP     Personal pronoun             PRP
>> PP$    Possessive pronoun           PRP$
>>
>> Is that right?
>>
>> -----Original Message-----
>> From: Jörn Kottmann [mailto:kottm...@gmail.com]
>> Sent: Tuesday, October 11, 2011 6:13 AM
>> To: opennlp-users@incubator.apache.org
>> Subject: EXTERNAL: Re: POS Tags
>>
>> The English POS Model from the SourceForge download page
>> uses the Penn Treebank Tag Set.
>>
>> Here is a link which list all tags:
>> http://www.ims.uni-stuttgart.de/projekte/CorpusWorkbench/CQP-HTMLDemo/PennTreebankTS.html
>>
>> Jörn
>>
>> On 10/11/11 6:56 AM, Fotiadis, Konstantinos wrote:
>>
>>> I am looking around the definition and have not found the definitions for
>>> the POS tags.
>>> Can you help me with these?
>>> Example:
>>> "This is not a long sentence. I like turtles. Happiness is great!"
>>> I then call SentenceDetectorME to detect sentences. Then loop through the
>>> sentences and call Tokenizer on each one. I then pass the token String array
>>> to POSTaggerME to get the POS. Here is my output:
>>> Number of Sentences=3
>>> SENTENCE_ID=1 - TOKENS=7 - This is not a long sentence.
>>> TOKEN_ID=1 - POS=DT - This
>>> TOKEN_ID=2 - POS=VBZ - is
>>> TOKEN_ID=3 - POS=RB - not
>>> TOKEN_ID=4 - POS=DT - a
>>> TOKEN_ID=5 - POS=JJ - long
>>> TOKEN_ID=6 - POS=NN - sentence
>>> TOKEN_ID=7 - POS=. - .
>>> SENTENCE_ID=2 - TOKENS=4 - I like turtles.
>>> TOKEN_ID=1 - POS=PRP - I
>>> TOKEN_ID=2 - POS=IN - like
>>> TOKEN_ID=3 - POS=NNS - turtles
>>> TOKEN_ID=4 - POS=. - .
>>> SENTENCE_ID=3 - TOKENS=4 - Happiness is great!
>>> TOKEN_ID=1 - POS=NNP - Happiness
>>> TOKEN_ID=2 - POS=VBZ - is
>>> TOKEN_ID=3 - POS=JJ - great
>>> TOKEN_ID=4 - POS=. - !
>>> Just curious of the definitions...
>>> Thanks, Kosta

--
Gyuri
274 44 98 06 30 5888 744