Try using the version from trunk. There was a bug that may be related to this.
I am away from a place where I can check the source.

Regards,
William

Sent from mobile.

On 04/01/2013 23:16, "James Kosin" <[email protected]> wrote:

> On 1/4/2013 10:32 AM, Adithya .R wrote:
>
>> Hi All,
>> I am trying to parse a string into sentences using the sentence detector.
>>
>> The data is in English, UTF-8 format, and has many abbreviations (medical
>> text).
>>
>> I need the sentence detector to accept a list of abbreviations. I am using
>> the Dictionary class like this:
>>
>> Dictionary abbrDict = new Dictionary();
>>
>> try {
>>     //abbrDict = new Dictionary( FileInputStream(new File(pathToAbbr)));
>>     abbrString = readFile(pathToAbbr).replaceAll("(\\t|\\r?\\n)+", " ");
>>     for (String abbr : abbrString.split(" ")) {
>>         StringList abbrList = new StringList(abbr);
>>         System.out.println( abbrList.getToken(0) );
>>         abbrDict.put(abbrList);
>>     }
>> } catch (Exception ex) {
>>     ex.printStackTrace();
>> }
>>
>> System.out.println( abbrDict.size() + " is the size of dict " +
>>     abbrDict.toString() );
>>
>> _______________________________________________________________________________
>>
>> The output of the last line looks like this:
>> 9 is the size of dict [[L.M.P.], [D.O.A.], [L.S.A.], [R.S.T.], [A.G.A.],
>> [R.F.P.], [R.S.P.], [S.L.P.], [R.F.A.]]
>>
>> My question is: is this the right way to do it? If yes, how come the
>> sentence detector still does not split sentences properly with these
>> abbreviations?
>>
>> Any help would be appreciated.
>>
>> Adi
>>
>> Adi,
>
> Which sentence detector are you trying to use?
>
> I've been able to train the sentence detector model with many sentences
> and it has managed to figure out how to handle the abbreviations... like
> Inc., etc., and others.
>
> James
>
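For reference, the abbreviation-loading step in the quoted code can be sketched without OpenNLP. This is only a minimal sketch, assuming the abbreviation file holds whitespace-separated entries; the class name `AbbreviationTokens` and the method `parse` are hypothetical and not part of any library. It splits on any run of whitespace, because the quoted `split(" ")` may yield empty tokens when two spaces appear in a row.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper for illustration; not part of OpenNLP.
class AbbreviationTokens {

    // Splits raw file content on any run of whitespace (spaces, tabs,
    // newlines) and drops empty tokens.
    static List<String> parse(String content) {
        List<String> tokens = new ArrayList<>();
        for (String token : content.trim().split("\\s+")) {
            if (!token.isEmpty()) {
                tokens.add(token);
            }
        }
        return tokens;
    }

    public static void main(String[] args) {
        // Sample input mixing a tab, a Windows line ending, and a double space.
        String sample = "L.M.P.\tD.O.A.\r\nL.S.A.  R.S.T.\n";
        System.out.println(parse(sample));
    }
}
```

Each returned token could then be wrapped in a StringList and added to the Dictionary, as in the quoted loop.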
