Hello Jörn,

While testing, I think I found some issues.
Here is a made-up sample sentence I tried just now to test punctuation:

"
Dr. George wrote this book; it's his second publication after publishing
tons of books such as "500 tips" and "kick by kick" on top of the list.
"


The tokenizer gives this:

"
Dr. George wrote this book ; it 's his second publication after publishing
tons of books such as "500 tips " and "kick by kick " on top of the list .
"

It seems "Dr." and the opening double quotes are not split into separate
tokens. I guess "Dr." should stay as one token, but the opening double
quotes are missed in this case (the closing ones are split off correctly).
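
In case it helps to reproduce this, here is a small Python sketch (not
OpenNLP code) that scans whitespace-separated tokenizer output and lists the
tokens that still have punctuation glued to letters or digits. The function
name attached_punctuation and the ABBREVIATIONS allow-list are my own
assumptions for illustration; the input string is the tokenizer output from
above.

```python
import string

# Tokenizer output copied from the message above (whitespace-separated tokens).
TOKENIZED = (
    "Dr. George wrote this book ; it 's his second publication after publishing "
    "tons of books such as \"500 tips \" and \"kick by kick \" on top of the list ."
)

# Hypothetical allow-list: abbreviations whose trailing period should stay attached.
ABBREVIATIONS = {"Dr."}


def attached_punctuation(tokenized, abbreviations=ABBREVIATIONS):
    """Return tokens that mix punctuation with letters or digits."""
    suspicious = []
    for token in tokenized.split():
        if token in abbreviations:
            continue  # e.g. "Dr." is expected to stay in one piece
        if token.startswith("'") and token[1:].isalpha():
            continue  # clitics such as 's are a normal tokenizer result
        has_punct = any(ch in string.punctuation for ch in token)
        has_alnum = any(ch.isalnum() for ch in token)
        if has_punct and has_alnum:
            suspicious.append(token)
    return suspicious


print(attached_punctuation(TOKENIZED))
```

On the output above it reports only the two tokens with the opening quote
still attached, "500 and "kick, which matches what I see by eye.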


Cheers,
Gyuri



On Mon, Aug 15, 2011 at 2:20 PM, György Chityil <gyorgy.chit...@gmail.com> wrote:

> Thanks Jörn, I was unaware that I was supposed to tokenize first :)
>
> Just fed the essay straight to POSTagger.
>
> Will try to tokenize it now, and report back.
>
>
> On Mon, Aug 15, 2011 at 2:15 PM, Jörn Kottmann <kottm...@gmail.com> wrote:
>
>> On 8/15/11 2:00 PM, György Chityil wrote:
>>
>>> I noticed the POSTagger puts the tag after punctuation that is attached
>>> to a word, like this:
>>>
>>> questions?_NN
>>> specifics,_NN
>>>
>>> I guess it should be like
>>>
>>> questions_NN?
>>> specifics_NN,
>>>
>>
>> It looks like you don't tokenize the input sentence correctly. Maybe you
>> can post
>> a little more context, then I can give you a better answer.
>>
>> Jörn
>>
>
>
>
> --
> Gyuri
> 274 44 98
> 06 30 5888 744
>
>


-- 
Gyuri
274 44 98
06 30 5888 744
