The English POS Model from the SourceForge download page
uses the Penn Treebank Tag Set.

Here is a link that lists all the tags:
http://www.ims.uni-stuttgart.de/projekte/CorpusWorkbench/CQP-HTMLDemo/PennTreebankTS.html
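As a minimal sketch, the tags that appear in the output quoted below can be decoded with a small lookup table. The class name `PennTags` and the method `describe` are hypothetical. The table covers only the tags in that output, and the descriptions follow the Penn Treebank tag set; the page linked above has the full list.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical helper: a partial table of Penn Treebank tags, limited to
// the tags that appear in the POSTaggerME output quoted below.
public class PennTags {
    private static final Map<String, String> TAGS = new HashMap<>();

    static {
        TAGS.put("DT", "Determiner");
        TAGS.put("VBZ", "Verb, 3rd person singular present");
        TAGS.put("RB", "Adverb");
        TAGS.put("JJ", "Adjective");
        TAGS.put("NN", "Noun, singular or mass");
        TAGS.put("NNS", "Noun, plural");
        TAGS.put("NNP", "Proper noun, singular");
        TAGS.put("PRP", "Personal pronoun");
        TAGS.put("IN", "Preposition or subordinating conjunction");
        TAGS.put(".", "Sentence-final punctuation");
    }

    // Returns the description for a tag, or a marker for tags missing
    // from this partial table.
    public static String describe(String tag) {
        return TAGS.getOrDefault(tag, "Unknown tag: " + tag);
    }

    public static void main(String[] args) {
        // Tokens and tags copied from sentence 2 of the output below.
        String[] tokens = {"I", "like", "turtles", "."};
        String[] tags = {"PRP", "IN", "NNS", "."};
        for (int i = 0; i < tokens.length; i++) {
            System.out.println(tokens[i] + "/" + tags[i] + " = " + describe(tags[i]));
        }
    }
}
```

A plain map keeps the sketch dependency-free; in practice the table would be extended with the remaining tags from the linked page.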

Jörn

On 10/11/11 6:56 AM, Fotiadis, Konstantinos wrote:
I have been looking around but have not found the definitions for the
POS tags.

Can you help me with these?

Example:
"This is not a long sentence. I like turtles. Happiness is great!"

I call SentenceDetectorME to detect the sentences, then loop through the
sentences and call the Tokenizer on each one. Finally, I pass the token String
array to POSTaggerME to get the POS tags. Here is my output:

Number of Sentences=3
SENTENCE_ID=1 - TOKENS=7 - This is not a long sentence.
   TOKEN_ID=1 - POS=DT - This
   TOKEN_ID=2 - POS=VBZ - is
   TOKEN_ID=3 - POS=RB - not
   TOKEN_ID=4 - POS=DT - a
   TOKEN_ID=5 - POS=JJ - long
   TOKEN_ID=6 - POS=NN - sentence
   TOKEN_ID=7 - POS=. - .
SENTENCE_ID=2 - TOKENS=4 - I like turtles.
   TOKEN_ID=1 - POS=PRP - I
   TOKEN_ID=2 - POS=IN - like
   TOKEN_ID=3 - POS=NNS - turtles
   TOKEN_ID=4 - POS=. - .
SENTENCE_ID=3 - TOKENS=4 - Happiness is great!
   TOKEN_ID=1 - POS=NNP - Happiness
   TOKEN_ID=2 - POS=VBZ - is
   TOKEN_ID=3 - POS=JJ - great
   TOKEN_ID=4 - POS=. - !


Just curious about the definitions...

Thanks, Kosta
