Re: Swedish sentence detection model not performing properly

Svetoslav Marinov Thu, 05 Jan 2012 01:59:07 -0800

Thanks Jörn. I updated to 1.5.2, but it still makes the same mistake. I
solved the problem by adding an extra whitespace character between the lines.


Best, 
Svetoslav 
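
A minimal sketch of the workaround described above, assuming the input lines
are joined into one string before sentence detection. The class and method
names (NewlineNormalizer, joinLines) are hypothetical and not part of OpenNLP.
The sketch only covers the preprocessing step and makes no claim about how
the Swedish model behaves on the result.

```java
// Hypothetical helper for illustration; not part of the OpenNLP API.
final class NewlineNormalizer {

    private NewlineNormalizer() {
    }

    // Replaces every run of CR/LF characters with a single space, so that a
    // full stop at the end of one line is followed by whitespace rather than
    // directly by a line break and the first word of the next sentence.
    static String joinLines(String text) {
        return text.replaceAll("[\\r\\n]+", " ");
    }

    public static void main(String[] args) {
        String raw = "Sirius.\nArtemis skyddade de gravida kvinnorna";
        String prepared = joinLines(raw);
        // Prints: Sirius. Artemis skyddade de gravida kvinnorna
        System.out.println(prepared);
        // The prepared string would then be passed to the sentence detector,
        // for example SentenceDetectorME.sentDetect(prepared), with a model
        // loaded separately (not shown here).
    }
}
```

Normalizing the text up front keeps the workaround independent of the model
and of the OpenNLP version in use.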

On 1/4/12 10:49 PM, "Jörn Kottmann" <kottm...@gmail.com> wrote:

>Which OpenNLP version do you use?
>
>We improved the "space" handling in the sentence detector for 1.5.2. If
>you are still on 1.5.1, I suggest that you update.
>
>Jörn
>
>On 1/4/12 12:20 PM, Svetoslav Marinov wrote:
>> Hi all,
>>
>> I have started using OpenNLP with the available Swedish models. One
>>thing I noticed is that the sentence detection model does not perform
>>properly when the full stop is immediately followed by a newline
>>character and the next sentence starts immediately after that. So the
>>following example:
>> ---
>> Hunden blir hundstjärnan, Sirius.
>> Artemis skyddade de gravida kvinnorna
>> ---
>> Will be segmented as:
>>
>> <S>
>> Hunden blir hundstjärnan, Sirius.Artemis
>> </S>
>> <S>
>> skyddade de gravida kvinnorna
>> </S>
>>
>> I am curious whether someone has experienced similar problems with
>>Swedish or other languages. Any ideas why this happens?
>>
>> I wonder how one can alleviate this behaviour. One way is to train a
>>new model, but I doubt this will help. Another way is to substitute
>>all newline characters with spaces. I currently concatenate all lines
>>into a single string, to which I then apply the sentence detection
>>model. Is this the way it should be done, if I read the documentation
>>correctly?
>>
>> Best regards,
>>
>> Svetoslav
>>
>
