sentence detector with abbreviations not working

Adithya .R Fri, 04 Jan 2013 07:48:59 -0800

Hi All,
I am trying to split a string into sentences using the sentence detector.

The data is in English, UTF-8 encoded, and contains many abbreviations
(medical text).

I need the sentence detector to accept a list of abbreviations. I am using
the Dictionary class like this:

Dictionary abbrDict = new Dictionary();

        try {
            //abbrDict = new Dictionary( FileInputStream(new File(pathToAbbr)));
            abbrString = readFile(pathToAbbr).replaceAll("(\\t|\\r?\\n)+", " ");
            for (String abbr : abbrString.split(" ")) {
                StringList abbrList = new StringList(abbr);
                System.out.println( abbrList.getToken(0) );
                abbrDict.put(abbrList);

            }
        } catch (Exception ex) {
            ex.printStackTrace();
        }

        System.out.println( abbrDict.size() + " is the size of dict "  + abbrDict.toString() );
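
One detail of the loading step above: after tabs and newlines are replaced by a space, splitting on a single space could yield empty entries if the file contains consecutive or leading whitespace. A minimal plain-Java sketch of a more forgiving split follows, assuming the file holds whitespace-separated abbreviations. The class and method names (AbbreviationParser, parse) are hypothetical and not part of OpenNLP.

```java
import java.util.LinkedHashSet;
import java.util.Set;

// Hypothetical helper, not part of OpenNLP: turns the raw contents of an
// abbreviation file into a set of distinct, non-empty entries.
final class AbbreviationParser {

    static Set<String> parse(String raw) {
        // LinkedHashSet keeps file order and drops duplicate entries.
        Set<String> abbreviations = new LinkedHashSet<>();
        // Split on any run of whitespace so tabs, CRLF line endings and
        // repeated spaces never produce empty entries.
        for (String token : raw.trim().split("\\s+")) {
            if (!token.isEmpty()) {
                abbreviations.add(token);
            }
        }
        return abbreviations;
    }

    public static void main(String[] args) {
        // Illustrative input mixing a tab, a CRLF, a double space and a newline.
        Set<String> parsed = parse("L.M.P.\tD.O.A.\r\nL.S.A.  R.S.T.\n");
        System.out.println(parsed.size() + " entries: " + parsed);
    }
}
```

Each entry returned by parse could then be wrapped in a StringList and added to the Dictionary exactly as in the loop above.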

_______________________________________________________________________________

The output of the last line looks like this:
9 is the size of dict [[L.M.P.], [D.O.A.], [L.S.A.], [R.S.T.], [A.G.A.],
[R.F.P.], [R.S.P.], [S.L.P.], [R.F.A.]]

My question is: is this the right way to do it? If so, why does the
sentence detector still not split sentences properly around these
abbreviations?
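
The behaviour being asked for can be illustrated with a toy splitter written in plain Java. This is only a sketch of abbreviation-aware splitting, not OpenNLP's algorithm or API: the class name NaiveAbbreviationSplitter, the rule it applies, and the sample sentence are all assumptions made for illustration.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

// Toy illustration only: ends a sentence at a token finishing with '.', '!'
// or '?' unless that token is a known abbreviation.
final class NaiveAbbreviationSplitter {

    static List<String> split(String text, Set<String> abbreviations) {
        List<String> sentences = new ArrayList<>();
        StringBuilder current = new StringBuilder();
        for (String token : text.trim().split("\\s+")) {
            if (token.isEmpty()) {
                continue; // only happens for blank input
            }
            if (current.length() > 0) {
                current.append(' ');
            }
            current.append(token);
            boolean endsSentence = token.endsWith(".")
                    || token.endsWith("!")
                    || token.endsWith("?");
            // A known abbreviation never closes the sentence in this sketch.
            if (endsSentence && !abbreviations.contains(token)) {
                sentences.add(current.toString());
                current.setLength(0);
            }
        }
        // Keep any trailing text that had no closing punctuation.
        if (current.length() > 0) {
            sentences.add(current.toString());
        }
        return sentences;
    }

    public static void main(String[] args) {
        Set<String> abbreviations = new HashSet<>(Arrays.asList("L.M.P.", "D.O.A."));
        // Made-up sample text, not taken from the real data.
        String text = "Patient reports L.M.P. two weeks ago. No fever noted.";
        for (String sentence : split(text, abbreviations)) {
            System.out.println(sentence);
        }
    }
}
```

With the abbreviation set populated, the sample text stays in two sentences; with an empty set, the same text breaks after "L.M.P." as well. The sketch never splits after an abbreviation, even when it really does end a sentence, which is one reason a trained detector is preferable.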

Any help would be appreciated.

Adi
