The English POS Model from the SourceForge download page uses the Penn Treebank Tag Set.
Here is a link which lists all tags:
http://www.ims.uni-stuttgart.de/projekte/CorpusWorkbench/CQP-HTMLDemo/PennTreebankTS.html

Jörn

On 10/11/11 6:56 AM, Fotiadis, Konstantinos wrote:
I have been looking around and have not found the definitions for the POS tags. Can you help me with these?

Example: "This is not a long sentence. I like turtles. Happiness is great!"

I call SentenceDetectorME to detect the sentences, then loop through the sentences and call Tokenizer on each one. I then pass the token String array to POSTaggerME to get the POS tags. Here is my output:

Number of Sentences=3
SENTENCE_ID=1 - TOKENS=7 - This is not a long sentence.
TOKEN_ID=1 - POS=DT - This
TOKEN_ID=2 - POS=VBZ - is
TOKEN_ID=3 - POS=RB - not
TOKEN_ID=4 - POS=DT - a
TOKEN_ID=5 - POS=JJ - long
TOKEN_ID=6 - POS=NN - sentence
TOKEN_ID=7 - POS=. - .
SENTENCE_ID=2 - TOKENS=4 - I like turtles.
TOKEN_ID=1 - POS=PRP - I
TOKEN_ID=2 - POS=IN - like
TOKEN_ID=3 - POS=NNS - turtles
TOKEN_ID=4 - POS=. - .
SENTENCE_ID=3 - TOKENS=4 - Happiness is great!
TOKEN_ID=1 - POS=NNP - Happiness
TOKEN_ID=2 - POS=VBZ - is
TOKEN_ID=3 - POS=JJ - great
TOKEN_ID=4 - POS=. - !

Just curious about the definitions...

Thanks,
Kosta
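
For quick reference, the tags that appear in the output above can be put into a small lookup table. This is only a sketch: the class name PennTagGlossary and the method describe are made up for illustration and are not part of OpenNLP, and the map covers only the tags seen in this thread, not the full Penn Treebank tag set. The descriptions follow the standard Penn Treebank tag names.

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical helper, not an OpenNLP class: maps a few Penn Treebank
// tags (only those seen in the output above) to short descriptions.
public class PennTagGlossary {
    private static final Map<String, String> TAGS = new LinkedHashMap<>();
    static {
        TAGS.put("DT", "determiner");
        TAGS.put("VBZ", "verb, 3rd person singular present");
        TAGS.put("RB", "adverb");
        TAGS.put("JJ", "adjective");
        TAGS.put("NN", "noun, singular or mass");
        TAGS.put("NNS", "noun, plural");
        TAGS.put("NNP", "proper noun, singular");
        TAGS.put("PRP", "personal pronoun");
        TAGS.put("IN", "preposition or subordinating conjunction");
        TAGS.put(".", "sentence-final punctuation");
    }

    // Returns the description for a tag, or "unknown tag" if it is not in the map.
    public static String describe(String tag) {
        return TAGS.getOrDefault(tag, "unknown tag");
    }

    public static void main(String[] args) {
        // Tokens and tags copied from the second sentence of the output above.
        String[] tokens = {"I", "like", "turtles", "."};
        String[] tags = {"PRP", "IN", "NNS", "."};
        for (int i = 0; i < tokens.length; i++) {
            System.out.println(tokens[i] + "\t" + tags[i] + "\t" + describe(tags[i]));
        }
    }
}
```

Note that IN is the preposition tag, so in "I like turtles." the tagger appears to have read "like" as a preposition rather than as a verb.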