Hi,
In Java we can sort by Pinyin like this (Sun provides a Collator, which can be used as a Comparator):
public int compare(String o1, String o2) {
return Collator.getInstance(Locale.CHINESE).compare(o1, o2);
}
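As a minimal sketch of how that Collator could be used to sort a whole list: the class name CollatorSortExample and the helper sortWithCollator are made up for illustration, and the exact order you get depends on the collation data shipped with your JDK.

```java
import java.text.Collator;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.Locale;

class CollatorSortExample {

    // Returns a sorted copy; the input list is left unchanged.
    static List<String> sortWithCollator(List<String> names) {
        List<String> sorted = new ArrayList<>(names);
        // Collator implements Comparator<Object>, so it can be passed directly.
        sorted.sort(Collator.getInstance(Locale.CHINESE));
        return sorted;
    }

    public static void main(String[] args) {
        List<String> names = Arrays.asList("宋", "孟", "张", "怡");
        // The resulting order depends on the JDK's collation rules.
        System.out.println(sortWithCollator(names));
    }
}
```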
But it has some flaws.
As you know, Chinese has a great many homophones,
but Sun's Collator does not treat them as equal to each other:
Assert.assertTrue(comparator.compare("怕", "帕") != 0); //怕 pà 帕 pà
Also, less common Chinese characters are not sorted usefully by this Collator.
One example is '怡':
Assert.assertTrue(comparator.compare("怡", "张") > 0); //怡 yí 张 zhāng
Luckily, there is an open source project on SourceForge:
http://pinyin4j.sourceforge.net/
With it we can convert Chinese characters to Pinyin, and then sorting becomes easy.
Here is some Java code; part of it was not written by me.
-------------------------------------------------------------------------
/**
* @author Jeff
*
* Copyright (c)
*/
package chinese.utility;
import java.util.Comparator;
import net.sourceforge.pinyin4j.PinyinHelper;
public class PinyinComparator implements Comparator<String> {

    public int compare(String o1, String o2) {
        int i = 0;
        while (i < o1.length() && i < o2.length()) {
            // Read whole code points so that surrogate pairs are not split.
            // (charAt() only returns a single char, which is never supplementary.)
            int codePoint1 = o1.codePointAt(i);
            int codePoint2 = o2.codePointAt(i);
            if (codePoint1 != codePoint2) {
                // pinyin4j takes a char, so supplementary characters are
                // compared by code point.
                if (Character.isSupplementaryCodePoint(codePoint1)
                        || Character.isSupplementaryCodePoint(codePoint2)) {
                    return codePoint1 - codePoint2;
                }
                String pinyin1 = pinyin((char) codePoint1);
                String pinyin2 = pinyin((char) codePoint2);
                if (pinyin1 != null && pinyin2 != null) {
                    // Both are Chinese characters; homophones count as equal.
                    if (!pinyin1.equals(pinyin2)) {
                        return pinyin1.compareTo(pinyin2);
                    }
                } else {
                    return codePoint1 - codePoint2;
                }
            }
            // Both code points occupy the same number of chars at this point.
            i += Character.charCount(codePoint1);
        }
        return o1.length() - o2.length();
    }

    /**
     * For a polyphonic character the first reading is used.
     * Returns null if the character is not a Chinese character.
     */
    private String pinyin(char c) {
        String[] pinyins = PinyinHelper.toHanyuPinyinStringArray(c);
        if (pinyins == null || pinyins.length == 0) {
            return null;
        }
        return pinyins[0];
    }
}
-------------------------------------------------------------------
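One consequence of treating homophones as equal: a TreeSet or TreeMap built on such a comparator would keep only one of 怕 and 帕. If that matters, a tie-breaker can be chained on. The sketch below is only an illustration under assumptions: the small PINYIN map is a hypothetical stand-in for the pinyin4j lookup, and the class and field names are made up.

```java
import java.util.Comparator;
import java.util.HashMap;
import java.util.Map;
import java.util.TreeSet;

class HomophoneTieBreakExample {

    // Hypothetical stand-in for a pinyin4j lookup; three sample characters only.
    static final Map<String, String> PINYIN = new HashMap<>();
    static {
        PINYIN.put("怕", "pa4");
        PINYIN.put("帕", "pa4");
        PINYIN.put("张", "zhang1");
    }

    // Compares by Pinyin only, so homophones compare as equal.
    static final Comparator<String> BY_PINYIN =
            Comparator.comparing((String s) -> PINYIN.getOrDefault(s, s));

    // Falls back to the natural String order when the Pinyin is identical.
    static final Comparator<String> WITH_TIE_BREAK =
            BY_PINYIN.thenComparing(Comparator.<String>naturalOrder());

    // Number of elements a TreeSet keeps when ordered by the given comparator.
    static int distinctCount(Comparator<String> comparator, String... values) {
        TreeSet<String> set = new TreeSet<>(comparator);
        for (String v : values) {
            set.add(v);
        }
        return set.size();
    }

    public static void main(String[] args) {
        System.out.println(distinctCount(BY_PINYIN, "怕", "帕", "张"));      // prints 2
        System.out.println(distinctCount(WITH_TIE_BREAK, "怕", "帕", "张")); // prints 3
    }
}
```

For plain list sorting the tie-breaker is optional, since List.sort is stable and simply keeps homophones in their original relative order.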
The JUnit 4 test:
-------------------------------------------------------------------
/**
* @author Jeff
*
* Copyright (c)
*/
package chinese.utility.test;
import java.util.Comparator;
import org.junit.Assert;
import org.junit.Test;
import chinese.utility.PinyinComparator;
public class PinyinComparatorTest {

    private Comparator<String> comparator = new PinyinComparator();

    /**
     * common characters
     */
    @Test
    public void testCommon() {
        Assert.assertTrue(comparator.compare("孟", "宋") < 0);
    }

    /**
     * different lengths
     */
    @Test
    public void testDifferentLength() {
        Assert.assertTrue(comparator.compare("天气真好", "天气真好啊") < 0);
    }

    /**
     * comparison with non-Chinese characters
     */
    @Test
    public void testNoneChinese() {
        Assert.assertTrue(comparator.compare("a", "阿") < 0);
        Assert.assertTrue(comparator.compare("1", "阿") < 0);
    }

    /**
     * less common characters (怡)
     */
    @Test
    public void testNoneCommon() {
        Assert.assertTrue(comparator.compare("怡", "张") < 0);
    }

    /**
     * homophones
     */
    @Test
    public void testSameSound() {
        Assert.assertTrue(comparator.compare("怕", "帕") == 0);
    }

    /**
     * polyphonic characters (曾 [zēng, céng])
     */
    @Test
    public void testMultiSound() {
        Assert.assertTrue(comparator.compare("曾经", "曾迪") > 0);
    }
}
----------------------------------------------------------------------
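To tie this back to the usernames in the original question, here is a rough, self-contained sketch of the same comparison idea applied in descending order. It does not call pinyin4j: the PINYIN table is a hypothetical stand-in that only covers the characters needed for this sample, and the class and method names are made up.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collections;
import java.util.Comparator;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

class UsernameSortExample {

    // Hypothetical stand-in for PinyinHelper.toHanyuPinyinStringArray(c)[0].
    static final Map<Character, String> PINYIN = new HashMap<>();
    static {
        PINYIN.put('镇', "zhen4");
        PINYIN.put('脚', "jiao3");
        PINYIN.put('股', "gu3");
        PINYIN.put('童', "tong2");
        PINYIN.put('票', "piao4");
        PINYIN.put('风', "feng1");
        PINYIN.put('蝶', "die2");
        PINYIN.put('达', "da2");
        PINYIN.put('财', "cai2");
    }

    // Same idea as PinyinComparator: decide on the first differing characters.
    static final Comparator<String> BY_PINYIN = new Comparator<String>() {
        public int compare(String o1, String o2) {
            for (int i = 0; i < o1.length() && i < o2.length(); i++) {
                char c1 = o1.charAt(i);
                char c2 = o2.charAt(i);
                if (c1 == c2) {
                    continue;
                }
                String p1 = PINYIN.get(c1);
                String p2 = PINYIN.get(c2);
                if (p1 == null || p2 == null) {
                    return c1 - c2; // not in the table: fall back to char order
                }
                if (!p1.equals(p2)) {
                    return p1.compareTo(p2);
                }
            }
            return o1.length() - o2.length();
        }
    };

    // Returns a copy sorted like "order by user.username desc".
    static List<String> sortDescending(List<String> names) {
        List<String> sorted = new ArrayList<>(names);
        Collections.sort(sorted, Collections.reverseOrder(BY_PINYIN));
        return sorted;
    }

    public static void main(String[] args) {
        List<String> names = Arrays.asList("风过这头", "镇定的猎豹", "达小鱼儿",
                "财富分享", "蝶儿菲菲", "脚一滑", "股童天尊", "股票赢家888");
        for (String name : sortDescending(names)) {
            System.out.println(name);
        }
    }
}
```

With this sample table the output starts with 镇定的猎豹 and ends with 财富分享, which is the order asked for in the original message.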
2011/9/5 Peter Neubauer <[email protected]>
> Yuanlong,
> can you provide Java code on how to sort Pinyin characters? In that case, I
> am sure there is a way to incorporate it into the Cypher sorting routines.
> It would be very helpful since we don't even know how to test Pinyin
> sorting for correctness :/
>
> Cheers,
>
> /peter neubauer
>
>
> On Mon, Sep 5, 2011 at 4:59 AM, iamyuanlong <[email protected]>
> wrote:
>
> > Hi,
> >
> > Sorry to disturb you, and please excuse my bad English.
> >
> > I'd like to use the CypherParser of Neo4j.
> > When I query the users' info ordered by user.username desc,
> > the result I get differs a little from the result in
> > SQL Server.
> > I would like the result to be sorted by Chinese Pinyin.
> >
> > E.g.,
> > I got:
> > 风过这头
> > 镇定的猎豹
> > 达小鱼儿
> > 财富分享
> > 蝶儿菲菲
> > 脚一滑
> > 股童天尊
> > 股票赢家888
> >
> > I hoped for:
> > 镇定的猎豹
> > 脚一滑
> > 股童天尊
> > 股票赢家888
> > 风过这头
> > 蝶儿菲菲
> > 达小鱼儿
> > 财富分享
> >
> > _______________________________________________
> > Neo4j mailing list
> > [email protected]
> > https://lists.neo4j.org/mailman/listinfo/user
> >