Hi,
to remove "category X excluded licenses" from Apache OpenOffice I
replaced the formerly used LGPL-licensed module i18nregexp with the
regular expression engine of module ICU, which is already widely used in
OpenOffice.
The replacement fixes a lot of problems: e.g. in the text "abcabc" a
"find all backwards" search for "b" used to find only the last "b",
whereas now it actually finds all of them. It also introduces some
changes, e.g. i18nregexp had two modes, "classic" and "extended" regexp,
whereas the ICU based engine treats all patterns as extended-regexp.
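
To make the expected result of the "abcabc" example concrete, here is a minimal sketch. It uses Python's re module purely as a stand-in for the ICU engine, and the helper name find_all is hypothetical; it is not the OpenOffice code.

```python
import re

def find_all(pattern, text, backwards=False):
    """Return the (start, end) spans of every match of pattern in text.

    Illustration only: Python's re module stands in for the ICU engine.
    With backwards=True the same matches are reported last-to-first.
    """
    spans = [m.span() for m in re.finditer(pattern, text)]
    return spans[::-1] if backwards else spans

forward = find_all("b", "abcabc")
backward = find_all("b", "abcabc", backwards=True)
```

A backwards "find all" should report both occurrences of "b", only in reverse order, rather than just the last one.
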
I18nregexp used an approach where it transliterated and compared each
codepoint pair of the pattern and text string. The new engine does the
transliteration only once per pattern and text string. This is much
faster, but it only works because the transliteration was tweaked to
preserve the special regexp control characters.
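
The idea of transliterating once while preserving the control characters can be sketched as follows. This is only an illustration under stated assumptions: simple lower-casing stands in for a real transliteration, Python's re module stands in for ICU, and the names fold_text, fold_pattern and find_all are hypothetical.

```python
import re

def fold_text(text):
    # Stand-in transliteration (assumption): plain lower-casing.
    return text.lower()

def fold_pattern(pattern):
    # Apply the same folding to the pattern, but copy backslash escapes
    # such as \W or \S through unchanged. Lower-casing them would turn
    # \W into \w and change what the pattern means.
    out = []
    i = 0
    while i < len(pattern):
        if pattern[i] == "\\" and i + 1 < len(pattern):
            out.append(pattern[i:i + 2])
            i += 2
        else:
            out.append(pattern[i].lower())
            i += 1
    return "".join(out)

def find_all(pattern, text):
    folded_pattern = fold_pattern(pattern)  # done once per pattern
    folded_text = fold_text(text)           # done once per text
    return [m.span() for m in re.finditer(folded_pattern, folded_text)]

spans = find_all(r"ABC\W", "abc! ABCd abc?")
# Naive folding of the whole pattern, for comparison only.
naive = [m.span() for m in re.finditer(r"ABC\W".lower(), "abc! abcd abc?")]
```

With the escape preserved, the search finds "abc!" and "abc?"; the naive folding instead matches only "ABCd", which is not what the pattern asked for.
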
The reporters of any issues in the lists below are encouraged to check
the problems they saw with the new engine.
https://issues.apache.org/ooo/buglist.cgi?quicksearch=regexp
https://issues.apache.org/ooo/buglist.cgi?quicksearch=regular\ expression
Please make sure to have the "More Options -> Regular Expressions"
checkbox activated for testing.
I'm afraid the regexp replacement mostly changes behavior for Japanese
users, because a lot of non-trivial transliterations are active for
that locale. For reference, these are the active rules:
"ProlongedSoundMark", "IterationMark", "Ignore-Width", "BaFa", "SeZe",
"HyuByu", "IandEfollowedByYa" and "KiKuFollowedBySa".
Herbert