Hello,

Thanks for the reply. To clarify, the abbreviations parameter being passed to SentenceModel is not currently used, right?

I would like to contribute some features if possible. Is this the correct forum for that?

Regards,
Sampath.

----- Original Message ----
From: Jörn Kottmann <[email protected]>
To: [email protected]
Sent: Tue, February 8, 2011 3:53:09 PM
Subject: Re: Abbreviations in SentDetector

On 2/8/11 8:54 AM, Sampath Kumar wrote:
> Hi,
>
> I looked at the code and could not find the place where the abbreviations in
> SentenceDetector are being used. Is it used for detecting sentences? Also could
> someone let me know the format for inputting the same.

It might be an advantage to pass an abbreviations dictionary to the sentence detector; right now we do not support this. The abbreviations could be used to generate more features. A dictionary could also be used to avoid splitting inside certain tokens, e.g. shortcuts which contain dots or certain domain-specific symbols with dots inside.

Abbreviations which end with a dot are sometimes used to mark a sentence end, so simply not splitting after these might reduce the accuracy of the sentence detector.

Jörn
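
The idea of using a dictionary to avoid splitting after known abbreviations could be sketched roughly as follows. This is a standalone Python illustration, not OpenNLP code: the function name `split_sentences`, the `ABBREVIATIONS` set, and its entries are all hypothetical, and the rule-based splitting here is a stand-in for the trained model the real sentence detector uses.

```python
import re

# Hypothetical abbreviation dictionary; the entries are illustrative only.
ABBREVIATIONS = {"dr.", "mr.", "e.g.", "etc."}


def split_sentences(text, abbreviations=ABBREVIATIONS):
    """Split after '.', '!' or '?' followed by whitespace, unless the
    token ending at that position is a known abbreviation."""
    sentences = []
    start = 0
    for match in re.finditer(r"[.!?]\s+", text):
        end = match.start() + 1  # include the punctuation character
        words = text[start:end].split()
        # Last whitespace-delimited token before the candidate boundary.
        token = words[-1].lower() if words else ""
        if token in abbreviations:
            continue  # do not split after a known abbreviation
        sentences.append(text[start:end].strip())
        start = match.end()
    tail = text[start:].strip()
    if tail:
        sentences.append(tail)
    return sentences


print(split_sentences("Dr. Smith arrived. He sat down."))
```

As noted in the reply, a hard rule like this would also suppress a split when an abbreviation genuinely ends a sentence, which is one reason to feed the dictionary in as extra features rather than as an absolute filter.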
