Simon Slavin <slav...@bigfraud.org> wrote:
> On 17 Nov 2009, at 5:52pm, Igor Tandetnik wrote:
> 
>> But for your goals, it has to be sortable, right? In a proper
>> Unicode collation, U+0041 U+0301 would behave quite differently from
>> U+0301 U+0041. Consider "A ' E" (where ' stands for a combining
>> acute accent). In most locales, this would sort between AE and BE.
>> Now, if we reverse it naively, we'll end up with "E ' A", with the
>> accent now attached to E and not A. The result would sort between EA
>> and FA, rather than between EA and EB as you would probably want.   
> 
> Obviously, your routine to reverse a string must be Unicode-aware.

Tim Romano seems to insist on precisely the opposite.

> First split the string into characters, then reassemble them in
> reverse order.

The problem is that in Unicode it's not quite clear what constitutes a
"character". Are we talking about code points, sort elements, or graphemes?
Depending on the application, any of these definitions might make sense.
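
To make the distinction concrete, here is a minimal Python sketch (not code
from this thread, and not SQLite's behaviour) that reverses the "A ' E"
example two ways: by code point, and by base character plus any combining
marks that follow it. The function names are hypothetical, and grouping on
unicodedata.combining() is only a rough approximation of grapheme
segmentation; it does not handle every case covered by the full Unicode
text segmentation rules.

```python
import unicodedata


def reverse_codepoints(s):
    """Naive reversal: treat every code point as a 'character'."""
    return s[::-1]


def reverse_combining_sequences(s):
    """Reverse while keeping each combining mark after its base character.

    Assumption: a 'character' is a base code point followed by zero or
    more code points with a nonzero canonical combining class. This is a
    simplification, not full grapheme cluster segmentation.
    """
    clusters = []
    for ch in s:
        if clusters and unicodedata.combining(ch):
            # Combining mark: attach it to the preceding base character.
            clusters[-1] += ch
        else:
            # New base character starts a new cluster.
            clusters.append(ch)
    return "".join(reversed(clusters))


original = "A\u0301E"  # A, combining acute accent, E

naive = reverse_codepoints(original)           # E, accent, A
aware = reverse_combining_sequences(original)  # E, A, accent

# Print the code points so the position of U+0301 is visible.
print([f"U+{ord(c):04X}" for c in naive])
print([f"U+{ord(c):04X}" for c in aware])
```

In the naive result the accent ends up on the E; in the second result it
stays with the A. Which of the two is "correct" still depends on which
definition of "character" the application needs.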

Igor Tandetnik

_______________________________________________
sqlite-users mailing list
sqlite-users@sqlite.org
http://sqlite.org:8080/cgi-bin/mailman/listinfo/sqlite-users
