Hello,

Thanks for the reply. To clarify, the abbreviations parameter being passed to 
SentenceModel is not currently used, right? I would like to contribute some 
features if possible. Is this the correct forum for that?

Regards,
Sampath.


----- Original Message ----
From: Jörn Kottmann <[email protected]>
To: [email protected]
Sent: Tue, February 8, 2011 3:53:09 PM
Subject: Re: Abbreviations in SentDetector

On 2/8/11 8:54 AM, Sampath Kumar wrote:
> Hi,
> 
> I looked at the code and could not find the place where the abbreviations in
> SentenceDetector are being used. Are they used for detecting sentences? Also,
> could someone let me know the format for supplying them?

It might be an advantage to pass an abbreviations dictionary to the sentence
detector, but right now we do not support this. The abbreviations could be used
to generate more features. A dictionary could also be used to avoid splitting
inside certain tokens, e.g. abbreviations which contain dots or certain
domain-specific symbols with dots inside.
Abbreviations which end with a dot sometimes also mark a sentence end, so
simply never splitting after them might reduce the accuracy of the sentence
detector.
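
As a rough illustration of the two ideas above, here is a minimal sketch in
Python. It is not OpenNLP code and does not use any OpenNLP API: the
dictionary contents and the names candidate_features and split_allowed are
made up for this example. It assumes tokens are separated by single spaces.
The dictionary is used in two ways: as extra features for each candidate dot,
and as a hard rule that forbids a split only when the dot sits inside a known
abbreviation. A dot at the end of an abbreviation stays a split candidate,
since it may also end the sentence.

```python
# Hypothetical abbreviation dictionary (illustration only, not an OpenNLP resource).
ABBREVIATIONS = {"e.g.", "i.e.", "Dr.", "etc.", "U.S."}


def candidate_features(text, dot_index, abbreviations=ABBREVIATIONS):
    """Build dictionary-based features for the '.' at dot_index."""
    # Expand from the dot to the surrounding whitespace-delimited token.
    start = text.rfind(" ", 0, dot_index) + 1
    end = text.find(" ", dot_index)
    if end == -1:
        end = len(text)
    token = text[start:end]
    next_word = text[end:].lstrip()
    return {
        # True when more characters of the same token follow the dot.
        "inside_token": dot_index < end - 1,
        # True when the whole token is listed in the dictionary.
        "token_is_abbrev": token in abbreviations,
        # Context a statistical model could weigh together with the above.
        "next_capitalized": bool(next_word) and next_word[0].isupper(),
    }


def split_allowed(text, dot_index, abbreviations=ABBREVIATIONS):
    """Forbid a split only for dots inside a known abbreviation."""
    f = candidate_features(text, dot_index, abbreviations)
    return not (f["inside_token"] and f["token_is_abbrev"])
```

With this sketch, the first dot of "e.g." is never a split point, while the
final dot is left for the model to decide using the features.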

Jörn



