Java regex vs. Unicode TR#18 vs. ICU
Hello,

Someone on the ICU team recently compared the use of \w between ICU, Java, and Unicode TR#18 (http://www.unicode.org/reports/tr18/#Compatibility_Properties). The results are in the following ICU bug: http://bugs.icu-project.org/trac/ticket/10006

A question for core-libs-dev is, does Java plan to change the semantics of \w to match TR#18's list?

Thanks,
Steven
Re: Java regex vs. Unicode TR#18 vs. ICU
On 03/06/2013 12:44 PM, Steven R. Loomis wrote:
> A question for core-libs-dev is, does Java plan to change the semantics of \w to match TR#18's list?

It appears the standard has just added one more entry \p{Join_Control} during their last update :-(

I may consider updating the spec/impl to match that. I would assume that no jdk7 application really depends on the updated \w (in jdk7).

-Sherman
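For anyone who wants to check what a given JDK does here, the following is a minimal sketch, not part of the original messages. It compares the default \w with the \w compiled under Pattern.UNICODE_CHARACTER_CLASS (the jdk7 flag that switches the predefined classes to the TR#18 compatibility definitions) and probes the two Join_Control characters, U+200C and U+200D. The class name WordClassCheck and the helper isWord are illustrative names only. No expected output is given for the two Join_Control rows, since that result may differ between JDK releases.

```java
import java.util.regex.Pattern;

// Illustrative helper (hypothetical name, not from the thread).
class WordClassCheck {
    // Default \w: ASCII only, i.e. [a-zA-Z_0-9].
    static final Pattern ASCII_WORD = Pattern.compile("\\w");

    // \w with the Unicode definitions enabled (flag available since Java 7).
    static final Pattern UNICODE_WORD =
            Pattern.compile("\\w", Pattern.UNICODE_CHARACTER_CLASS);

    // Returns true if the whole string s is matched by a single \w.
    static boolean isWord(String s, boolean unicode) {
        Pattern p = unicode ? UNICODE_WORD : ASCII_WORD;
        return p.matcher(s).matches();
    }

    public static void main(String[] args) {
        // 'a', e-acute, ZERO WIDTH NON-JOINER, ZERO WIDTH JOINER
        String[] samples = { "a", "\u00e9", "\u200C", "\u200D" };
        for (String s : samples) {
            System.out.printf("U+%04X ascii=%b unicode=%b%n",
                    s.codePointAt(0), isWord(s, false), isWord(s, true));
        }
    }
}
```

Running this on the JDK in question and comparing the U+200C and U+200D rows against the TR#18 list shows whether that release treats Join_Control as part of \w.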
Re: Java regex vs. Unicode TR#18 vs. ICU
On 3/6/13 1:06 PM, Xueming Shen wrote:
> It appears the standard has just added one more entry \p{Join_Control} during their last update :-(

Thanks, Sherman.

Do you want to open a bug to track this? You can reference the above URLs.

Steven