Status: New
Owner: ----

New issue 3779 by [email protected]: isUnicodeStringWithCJK returns false on a string containing Kanji
http://code.google.com/p/pharo/issues/detail?id=3779

Pharo image: Pharo
Pharo core version: Pharo1.2rc2 build #12336
Virtual machine used: Pharo 1.2's One-click Cog

Steps to reproduce:

'In Japanese, Japanese is written 日本語' isUnicodeStringWithCJK. " returns false "
'日本語' isUnicodeStringWithCJK. " also returns false "

I'm pretty sure it should return true, since both strings contain Kanji.

Contrast the behavior with:

'In Japanese, Japanese is written 日本語' anySatisfy: [ :c | Unicode isUnifiedKanji: c charCode ]. " returns true "

I think the cause is that #isUnicodeStringWithCJK checks both #isUnifiedKanji: and #isTraditionalDomestic, and, to me at least, #isTraditionalDomestic looks wrong.

My guess is that #isTraditionalDomestic did something sensible back in Squeak during the transition to Unicode WideStrings, but now simply tests the wrong thing. It's hard to tell without knowing the history, though.

My suggestion is to rewrite #isUnicodeStringWithCJK purely in terms of

    anySatisfy: [ :c | Unicode isUnifiedKanji: c charCode ]

unless someone can figure out when the system uses an EncodedCharSet subclass that isn't Unicode. Unfortunately, I don't currently understand where EncodedCharSets fit into the system. Hopefully someone who knows more about them than I do can settle this straight away.
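For concreteness, a minimal sketch of the suggested rewrite follows. It is written in the informal `Class >> selector` notation rather than file-in format. It assumes the method lives on String and relies only on the sends already used above (#anySatisfy:, #charCode, Unicode class>>#isUnifiedKanji:). It deliberately treats every character as Unicode, so it leaves the EncodedCharSet question above open.

```smalltalk
String >> isUnicodeStringWithCJK
	"Sketch of the proposed rewrite: answer true if any character's code point
	 is classified as Unified Kanji by Unicode class>>isUnifiedKanji:.
	 Assumption: all characters are Unicode-encoded; characters from other
	 EncodedCharSet subclasses are not treated specially."
	^ self anySatisfy: [ :each | Unicode isUnifiedKanji: each charCode ]
```

With this version, both expressions from the reproduction steps should answer true, matching the #anySatisfy: expression shown earlier.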

