Which OpenNLP version do you use? We improved the "space" handling in the sentence detector for 1.5.2. If you are still on 1.5.1, I suggest that you update.
Jörn

On 1/4/12 12:20 PM, Svetoslav Marinov wrote:
Hi all,

I have started using OpenNLP with the available Swedish models. One thing I noticed is that the sentence detection model does not perform properly when the full stop is immediately followed by a newline character and the next sentence starts immediately after that. So the following example:

---
Hunden blir hundstjärnan, Sirius.
Artemis skyddade de gravida kvinnorna
---

will be segmented as:

<S> Hunden blir hundstjärnan, Sirius.Artemis </S>
<S> skyddade de gravida kvinnorna </S>

I am curious whether someone has experienced similar problems with Swedish or other languages. Any ideas why this happens?

I wonder how one can alleviate this behaviour. One way is to train a new model, but I doubt this will help. Or would it? Another way is to substitute all newline characters with spaces.

I concatenate all lines into a single string, to which I subsequently apply the sentence detection model. Is this the way it should be done, if I read the documentation correctly?

Best regards,
Svetoslav
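
For reference, the newline-to-space workaround mentioned above could be sketched as follows. This is only a minimal illustration using the Java standard library; the class name `NewlineNormalizer` and the method `joinLines` are hypothetical and not part of OpenNLP. Whether this step is still needed with 1.5.2 is not confirmed in this thread.

```java
import java.util.regex.Pattern;

// Hypothetical helper (not part of OpenNLP): joins the lines of a text
// with single spaces before the text is handed to a sentence detector.
public class NewlineNormalizer {

    // A line break (\R covers \n, \r\n, \r and Unicode line separators)
    // together with any whitespace directly around it.
    private static final Pattern LINE_BREAKS = Pattern.compile("\\s*\\R\\s*");

    // Replaces every line break, plus adjacent whitespace, with one space.
    // Note: blank lines (paragraph breaks) are collapsed to a space as well.
    public static String joinLines(String text) {
        return LINE_BREAKS.matcher(text.trim()).replaceAll(" ");
    }

    public static void main(String[] args) {
        String raw = "Sirius.\nArtemis skyddade de gravida kvinnorna";
        System.out.println(joinLines(raw));
    }
}
```

The resulting string can then be passed to the sentence detector, for example via `SentenceDetectorME.sentDetect(String)`. Keep in mind that this discards paragraph boundaries, so it may not suit input where an empty line is the only marker between sentences.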