You can use an ICU RuleBasedBreakIterator with custom rules. If you'd like to try that, it's best to join the icu-support mailing list.
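The following minimal ICU4J sketch shows roughly what "custom rules" means. It assumes ICU4J is on the classpath. The rule set and the names `CustomBreakDemo` and `segment` are hypothetical and exist only for illustration. As I understand the rule syntax, each rule is a regular expression terminated by `;`. The iterator keeps the longest text matched by a rule together, and a character that no rule matches becomes its own segment.

```java
import com.ibm.icu.text.BreakIterator;
import com.ibm.icu.text.RuleBasedBreakIterator;
import java.util.ArrayList;
import java.util.List;

public class CustomBreakDemo {
    // Hypothetical toy rule set: keep runs of ASCII letters together and
    // runs of ASCII digits together. Characters matched by no rule
    // (e.g. spaces) fall back to one-character segments.
    static final String RULES =
        "$Letter = [a-zA-Z];\n" +
        "$Digit  = [0-9];\n" +
        "$Letter+;\n" +
        "$Digit+;\n";

    // Splits text at the boundaries reported by the custom iterator.
    static List<String> segment(String text) {
        RuleBasedBreakIterator bi = new RuleBasedBreakIterator(RULES);
        bi.setText(text);
        List<String> out = new ArrayList<>();
        int start = bi.first();
        for (int end = bi.next(); end != BreakIterator.DONE; start = end, end = bi.next()) {
            out.add(text.substring(start, end));
        }
        return out;
    }

    public static void main(String[] args) {
        System.out.println(segment("abc 123x"));
    }
}
```

For real grapheme cluster work, it would make more sense to start from ICU's own char.txt rules than from a toy set like this one.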
Boundary analysis (ICU User Guide): http://userguide.icu-project.org/boundaryanalysis
RuleBasedBreakIterator (ICU4J API): http://icu-project.org/apiref/icu4j/com/ibm/icu/text/RuleBasedBreakIterator.html
(There is also a C++ version and a C wrapper.)
Mailing lists: http://site.icu-project.org/contacts
ICU's grapheme cluster break rules: http://bugs.icu-project.org/trac/browser/icu/trunk/source/data/brkitr/char.txt

Best regards,
markus

