Re: Lightweight detection of whether a keyword is CJK or not (language detection)

2013-03-11 Thread Gili Nachum
This character lies in the CJK_UNIFIED_IDEOGRAPHS_EXTENSION_A block. I added detection for the extension blocks; I assume (without really knowing) that none of these characters are phonetic either.
import java.lang.Character.UnicodeBlock;
import java.util.Arrays;
import java.util.HashSet;
import java.util.Set;
i
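The message above is cut off in the archive, so the following is only a minimal sketch of what set-based detection with the extension blocks could look like, not the poster's actual code. The class name `CjkIdeographDetector`, the method `containsIdeograph`, and the choice of blocks in the set are assumptions made for illustration. It walks code points rather than `char` values, because Extension B lies outside the Basic Multilingual Plane and is encoded as surrogate pairs.

```java
import java.lang.Character.UnicodeBlock;
import java.util.Arrays;
import java.util.HashSet;
import java.util.Set;

// Hypothetical helper, not taken from the thread.
final class CjkIdeographDetector {

    // Assumption: which blocks count as "CJK ideographs" is a choice for this
    // sketch; extend the set if other blocks matter for your data.
    private static final Set<UnicodeBlock> IDEOGRAPH_BLOCKS = new HashSet<>(Arrays.asList(
            UnicodeBlock.CJK_UNIFIED_IDEOGRAPHS,
            UnicodeBlock.CJK_UNIFIED_IDEOGRAPHS_EXTENSION_A,
            UnicodeBlock.CJK_UNIFIED_IDEOGRAPHS_EXTENSION_B));

    /** Returns true if at least one code point of the keyword is in one of the blocks above. */
    static boolean containsIdeograph(String keyword) {
        for (int i = 0; i < keyword.length(); ) {
            int codePoint = keyword.codePointAt(i);
            // UnicodeBlock.of returns null for code points outside any block;
            // Set.contains(null) is simply false for this HashSet.
            if (IDEOGRAPH_BLOCKS.contains(UnicodeBlock.of(codePoint))) {
                return true;
            }
            // Advance by one or two chars depending on the code point.
            i += Character.charCount(codePoint);
        }
        return false;
    }

    public static void main(String[] args) {
        // U+3400 is the first code point of Extension A, used as a stand-in
        // for the Extension A character discussed in the thread.
        System.out.println(containsIdeograph("\u3400"));
        // U+20000 is the first code point of Extension B (a surrogate pair in UTF-16).
        System.out.println(containsIdeograph(new String(Character.toChars(0x20000))));
        System.out.println(containsIdeograph("lucene"));
    }
}
```

Phonetic scripts such as Hiragana, Katakana and Hangul live in their own blocks and are deliberately left out of the set here, in line with the poster's goal of matching non-phonetic characters only.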

Re: Lightweight detection of whether a keyword is CJK or not (language detection)

2013-03-10 Thread Trejkaz
On Sun, Mar 10, 2013 at 8:19 PM, Gili Nachum wrote:
> Answering myself for next generations' sake.
> Character.UnicodeBlock.CJK_UNIFIED_IDEOGRAPHS does the job.

How about 㒨?

TX

Re: Lightweight detection of whether a keyword is CJK or not (language detection)

2013-03-10 Thread Gili Nachum
Answering myself for next generations' sake. Character.UnicodeBlock.CJK_UNIFIED_IDEOGRAPHS does the job. Example:
import junit.framework.Assert;
import org.junit.Test;

public class DetectCJK {
    @Test
    public void test1() {
        Assert.assertEquals(Character.UnicodeBlock.BASIC_LATIN, Ch
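The example above is truncated in the archive. As a separate, minimal sketch of the same idea (not a reconstruction of the original test), the check can be written with `Character.UnicodeBlock.of`. The class name `BasicCjkCheck` and the method `hasUnifiedIdeograph` are hypothetical names chosen for illustration, and `String.codePoints()` assumes Java 8 or later.

```java
import java.lang.Character.UnicodeBlock;

// Hypothetical helper, not taken from the thread.
final class BasicCjkCheck {

    /** Returns true if any code point of the keyword is in the base CJK Unified Ideographs block. */
    static boolean hasUnifiedIdeograph(String keyword) {
        return keyword.codePoints()
                .anyMatch(cp -> UnicodeBlock.of(cp) == UnicodeBlock.CJK_UNIFIED_IDEOGRAPHS);
    }

    public static void main(String[] args) {
        // Plain ASCII falls in BASIC_LATIN.
        System.out.println(hasUnifiedIdeograph("lucene"));       // prints "false"
        // U+4E2D and U+6587 are both in the base block (U+4E00..U+9FFF).
        System.out.println(hasUnifiedIdeograph("\u4E2D\u6587")); // prints "true"
    }
}
```

This covers only the base block, so ideographs that live in other blocks are reported as non-CJK by this particular check.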