I have been using these:
http://bulba.sdsu.edu/jeanette/thesis/PennTags.html
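
In case it helps, here is a minimal sketch of a lookup table for the tags that show up in Kosta's sample output, using the standard Penn Treebank definitions. The class name PennTagGlossary and the method describe are made up for illustration and are not part of the OpenNLP API; the table covers only the tags from this thread, not the full tag set.

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical helper for illustration only; not part of the OpenNLP API.
final class PennTagGlossary {
    // Only the tags that appear in the sample output from this thread.
    private static final Map<String, String> DEFINITIONS = new LinkedHashMap<>();
    static {
        DEFINITIONS.put("DT", "Determiner");
        DEFINITIONS.put("IN", "Preposition or subordinating conjunction");
        DEFINITIONS.put("JJ", "Adjective");
        DEFINITIONS.put("NN", "Noun, singular or mass");
        DEFINITIONS.put("NNS", "Noun, plural");
        DEFINITIONS.put("NNP", "Proper noun, singular");
        DEFINITIONS.put("PRP", "Personal pronoun");
        DEFINITIONS.put("RB", "Adverb");
        DEFINITIONS.put("VBZ", "Verb, 3rd person singular present");
        DEFINITIONS.put(".", "Sentence-final punctuation");
    }

    // Returns the definition for a tag, or a placeholder if the tag is not in the table.
    static String describe(String tag) {
        return DEFINITIONS.getOrDefault(tag, "(no definition listed)");
    }

    public static void main(String[] args) {
        // Tokens and tags copied from the second sentence of the sample output.
        String[] tokens = {"I", "like", "turtles", "."};
        String[] tags = {"PRP", "IN", "NNS", "."};
        for (int i = 0; i < tokens.length; i++) {
            System.out.println(tokens[i] + "\t" + tags[i] + "\t" + describe(tags[i]));
        }
    }
}
```

The tags array here stands in for whatever the tagger returns for the token array, so the same loop could be dropped into the existing per-sentence loop.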

On Wed, Oct 12, 2011 at 1:32 AM, Jörn Kottmann <kottm...@gmail.com> wrote:

> Sorry, looks like the link I posted here does not match up with the Penn
> Treebank Tag Set we are using. Some of the tags are not even included in
> our training data.
>
> I went on and tried to find a better description and looked at this page:
> http://www.cis.upenn.edu/~treebank/home.html
>
> The tagset described here seems to match with the tags we are using:
> ftp://ftp.cis.upenn.edu/pub/treebank/doc/tagguide.ps.gz
>
> But some tags seem to be missing from that list, especially the tags for
> punctuation, brackets, etc.
> Does anyone know where a complete list of the Penn Treebank tags, with
> descriptions, can be found?
>
> Jörn
>
> On 10/11/11 1:52 PM, Fotiadis, Konstantinos wrote:
>
>> After looking again, I think I probably didn't have them matched up
>> perfectly. I just did a sort in Excel, and realized that maybe this would
>> make more sense? (Sorry, been up for 51 hours straight!)
>>
>>
>> Tag (Penn Treebank Tag Set) | Definition            | Tag (produced by OpenNLP API)
>> NNS                         | Noun, plural          | NNS
>> NP                          | Proper noun, singular | NNP
>> NPS                         | Proper noun, plural   | NNPS
>> PP                          | Personal pronoun      | PRP
>> PP$                         | Possessive pronoun    | PRP$
>>
>>
>>
>> Is that right?
>>
>>
>>
>> -----Original Message-----
>> From: Jörn Kottmann [mailto:kottm...@gmail.com]
>> Sent: Tuesday, October 11, 2011 6:13 AM
>> To: opennlp-users@incubator.apache.org
>> Subject: EXTERNAL: Re: POS Tags
>>
>>
>>
>> The English POS Model from the SourceForge download page
>> uses the Penn Treebank Tag Set.
>>
>>
>>
>> Here is a link which lists all tags:
>>
>> http://www.ims.uni-stuttgart.de/projekte/CorpusWorkbench/CQP-HTMLDemo/PennTreebankTS.html
>>
>>
>>
>> Jörn
>>
>>
>>
>> On 10/11/11 6:56 AM, Fotiadis, Konstantinos wrote:
>>
>>> I have been looking around and have not found the definitions for
>>> the POS tags.
>>> Can you help me with these?
>>> Example:
>>> "This is not a long sentence. I like turtles. Happiness is great!"
>>> I then call SentenceDetectorME to detect sentences. Then loop through the
>>> sentences and call Tokenizer on each one. I then pass the token String array
>>> to POSTaggerME to get the POS. Here is my output:
>>> Number of Sentences=3
>>> SENTENCE_ID=1 - TOKENS=7 - This is not a long sentence.
>>>    TOKEN_ID=1 - POS=DT - This
>>>    TOKEN_ID=2 - POS=VBZ - is
>>>    TOKEN_ID=3 - POS=RB - not
>>>    TOKEN_ID=4 - POS=DT - a
>>>    TOKEN_ID=5 - POS=JJ - long
>>>    TOKEN_ID=6 - POS=NN - sentence
>>>    TOKEN_ID=7 - POS=. - .
>>> SENTENCE_ID=2 - TOKENS=4 - I like turtles.
>>>    TOKEN_ID=1 - POS=PRP - I
>>>    TOKEN_ID=2 - POS=IN - like
>>>    TOKEN_ID=3 - POS=NNS - turtles
>>>    TOKEN_ID=4 - POS=. - .
>>> SENTENCE_ID=3 - TOKENS=4 - Happiness is great!
>>>    TOKEN_ID=1 - POS=NNP - Happiness
>>>    TOKEN_ID=2 - POS=VBZ - is
>>>    TOKEN_ID=3 - POS=JJ - great
>>>    TOKEN_ID=4 - POS=. - !
>>> Just curious about the definitions...
>>> Thanks, Kosta
>>>
>>
>>
>>
>


-- 
Gyuri
274 44 98
06 30 5888 744
