Hello Jörn,

While testing, I think I found some issues. Here is a made-up sample sentence I tried just now to test punctuation:
" Dr. George wrote this book; it's his second publication after publishing tons of books such as "500 tips" and "kick by kick" on top of the list. " Tokenizer gives this: " Dr. George wrote this book ; it 's his second publication after publishing tons of books such as "500 tips " and "kick by kick " on top of the list . " It seems Dr. and the first double quotes are not tokenized. I guess Dr. should not be tokenized, while the double quotes are missed in this case. Cheers, Gyuri On Mon, Aug 15, 2011 at 2:20 PM, György Chityil <gyorgy.chit...@gmail.com>wrote: > Thanks Jörn, I was unaware I am supposed to tokenize first :) > > Just fed the essay straight to POSTagger. > > Will try to tokenize it now, and report back. > > > On Mon, Aug 15, 2011 at 2:15 PM, Jörn Kottmann <kottm...@gmail.com> wrote: > >> On 8/15/11 2:00 PM, György Chityil wrote: >> >>> I noticed the POSTagger adds info to words next to a punctuation like >>> this >>> >>> questions?_NN >>> specifics,_NN >>> >>> I guess it should be like >>> >>> questions_NN? >>> specifics_NN, >>> >> >> It looks like you don't tokenize the input sentence correctly. Maybe you >> can post >> a little more context, then I can give you a better answer. >> >> Jörn >> > > > > -- > Gyuri > 274 44 98 > 06 30 5888 744 > > -- Gyuri 274 44 98 06 30 5888 744