Revision: 7132
          http://languagetool.svn.sourceforge.net/languagetool/?rev=7132&view=rev
Author:   milek_pl
Date:     2012-05-31 22:25:42 +0000 (Thu, 31 May 2012)
Log Message:
-----------
[en] fix word tokenization in "they had no use for the glottal stop—the first phoneme of the Phoenician pronunciation of the letter"

Modified Paths:
--------------
    trunk/JLanguageTool/src/java/org/languagetool/tokenizers/en/EnglishWordTokenizer.java

Modified: trunk/JLanguageTool/src/java/org/languagetool/tokenizers/en/EnglishWordTokenizer.java
===================================================================
--- trunk/JLanguageTool/src/java/org/languagetool/tokenizers/en/EnglishWordTokenizer.java	2012-05-31 20:35:03 UTC (rev 7131)
+++ trunk/JLanguageTool/src/java/org/languagetool/tokenizers/en/EnglishWordTokenizer.java	2012-05-31 22:25:42 UTC (rev 7132)
@@ -44,7 +44,7 @@
         + "\u2028\u2029\u202a\u202b\u202c\u202d\u202e\u202f"
         + "\u205F\u2060\u2061\u2062\u2063\u206A\u206b\u206c\u206d"
         + "\u206E\u206F\u3000\u3164\ufeff\uffa0\ufff9\ufffa\ufffb"
-        + ",.;()[]{}!?:\"'’‘„“”…\\/\t\n", true);
+        + "—,.;()[]{}!?:\"'’‘„“”…\\/\t\n", true);
     while (st.hasMoreElements()) {
       tokens.add(st.nextToken());
     }

_______________________________________________
Languagetool-cvs mailing list
Languagetool-cvs@lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/languagetool-cvs
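
Note on the change: the effect of adding the em dash (U+2014) to the delimiter string can be illustrated with a minimal sketch. This is not the actual EnglishWordTokenizer; the class name EmDashTokenizerSketch and the shortened delimiter set (including the plain space) are assumptions made for illustration only. It relies on java.util.StringTokenizer with returnDelims set to true, as in the diff, so each delimiter character comes back as its own token.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.StringTokenizer;

// Hypothetical stand-in for the real tokenizer, for illustration only.
class EmDashTokenizerSketch {

    // Assumed, heavily shortened delimiter set. The real class lists many
    // more characters; \u2014 is the em dash added in this revision.
    private static final String DELIMITERS = " \u2014,.;()[]{}!?:\"'\t\n";

    static List<String> tokenize(String text) {
        List<String> tokens = new ArrayList<>();
        // returnDelims = true: every delimiter is returned as a separate token
        StringTokenizer st = new StringTokenizer(text, DELIMITERS, true);
        while (st.hasMoreElements()) {
            tokens.add(st.nextToken());
        }
        return tokens;
    }

    public static void main(String[] args) {
        // Prints the token list for the phrase from the log message
        System.out.println(tokenize("glottal stop\u2014the first phoneme"));
    }
}
```

With the em dash in the delimiter set, "stop" and "the" become separate tokens with the dash as a token between them. Without it, StringTokenizer would return the whole run "stop", dash, "the" as a single token, which is presumably the behavior this revision fixes.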