Re: sentence detector with abbreviations not working

William Colen Sat, 05 Jan 2013 02:21:30 -0800

Try using the version from trunk. There was a bug that may be related to
this.


I am away from a place where I can check the source.

Regards
William

Sent from mobile.
On 04/01/2013 23:16, "James Kosin" <[email protected]> wrote:

> On 1/4/2013 10:32 AM, Adithya .R wrote:
>
>> Hi All,
>> I am trying to parse a string into sentences using the sentence detector.
>>
>> The data is in English, UTF-8 format, and has many abbreviations (medical
>> text).
>>
>> I need the sentence detector to accept a list of abbreviations. I am using
>> the Dictionary Class like this:
>>
>> Dictionary abbrDict = new Dictionary();
>>
>>          try {
>>              //abbrDict = new Dictionary( FileInputStream(new File(pathToAbbr)));
>>              abbrString = readFile(pathToAbbr).replaceAll("(\\t|\\r?\\n)+", " ");
>>              for (String abbr : abbrString.split(" ")) {
>>                  StringList abbrList = new StringList(abbr);
>>                  System.out.println( abbrList.getToken(0) );
>>                  abbrDict.put(abbrList);
>>
>>              }
>>          } catch (Exception ex) {
>>              ex.printStackTrace();
>>          }
>>
>>          System.out.println( abbrDict.size() + " is the size of dict "  + abbrDict.toString() );
>>
>> ________________________________________________________________________
>>
>> The output of the last line looks like this:
>> 9 is the size of dict [[L.M.P.], [D.O.A.], [L.S.A.], [R.S.T.], [A.G.A.],
>> [R.F.P.], [R.S.P.], [S.L.P.], [R.F.A.]]
>>
>> My question is: is this the right way to do it? If yes, how come the
>> sentence detector still does not split sentences properly with these
>> abbreviations?
>>
>> Any help would be appreciated.
>>
>> Adi
>>
> Adi,
>
> Which sentence detector are you trying to use?
>
> I've been able to train the sentence detector model with many sentences
> and it has managed to figure out how to handle the abbreviations... like
> Inc., etc., and others.
>
> James
>
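
For reference, the list-building step in the quoted snippet can be sketched
in plain Java. The original splits on a single space, which can yield empty
entries when the file has leading, trailing, or repeated whitespace; the
sketch below splits on any whitespace run instead. AbbreviationLoader and
parse are hypothetical names used only for illustration and are not part of
OpenNLP. This covers only building the word list; it does not show how the
dictionary is handed to the sentence detector.

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical helper for illustration; not part of OpenNLP. */
final class AbbreviationLoader {

    private AbbreviationLoader() {}

    /** Splits raw file content on any run of whitespace and drops empty tokens. */
    static List<String> parse(String content) {
        List<String> abbreviations = new ArrayList<>();
        for (String token : content.trim().split("\\s+")) {
            if (!token.isEmpty()) {
                abbreviations.add(token);
            }
        }
        return abbreviations;
    }

    public static void main(String[] args) {
        // Sample content with tabs, blank lines, and doubled spaces (assumed input).
        String content = "L.M.P.\tD.O.A.\r\n\r\nL.S.A.  R.S.T.\n";
        List<String> abbreviations = parse(content);
        // Each entry would then be added as in the original post:
        //   abbrDict.put(new StringList(abbr));
        System.out.println(abbreviations.size() + " abbreviations: " + abbreviations);
    }
}
```

Each returned string can then be wrapped in a StringList and put into the
Dictionary exactly as the original loop does.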
