Re: [sqlite] The byte order mark problem in fields with ICU collation

Igor Tandetnik Mon, 07 Dec 2009 14:11:25 -0800

Alexey Pechnikov <pechni...@mobigroup.ru> wrote:
> Yes, the BOM is on the original string. But with ICU collation we can
> see that a 17-symbol string is equal to a 16-symbol string. I think
> this result is not right.


What's the basis for this belief? It's not at all uncommon for two Unicode 
strings of different length (in codepoints) to collate equal. For example, 
they could be canonically equivalent but in different normalization forms, or 
contain characters that carry no collation weight, such as a zero-width 
no-break space (U+FEFF), also known as the byte order mark (BOM).
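To make the point concrete, here is a small sketch in Python. It does not use ICU; the collation named APPROX is a hypothetical stand-in that NFC-normalizes both strings and drops format characters (Unicode category "Cf", which includes U+FEFF). It only approximates what an ICU collator does, but it shows strings of different code point length comparing equal while SQLite's default BINARY comparison tells them apart.

```python
import sqlite3
import unicodedata

def fold(s: str) -> str:
    # Hypothetical ICU stand-in: normalize to NFC, then drop format
    # characters (category "Cf"), such as U+FEFF, which carry no weight here.
    s = unicodedata.normalize("NFC", s)
    return "".join(ch for ch in s if unicodedata.category(ch) != "Cf")

def approx_collate(a: str, b: str) -> int:
    # SQLite collation callbacks return negative, zero, or positive.
    fa, fb = fold(a), fold(b)
    return (fa > fb) - (fa < fb)

composed = "caf\u00e9"     # 4 code points: precomposed e-acute
decomposed = "cafe\u0301"  # 5 code points: 'e' + COMBINING ACUTE ACCENT
with_bom = "\ufeffcafe"    # 5 code points: U+FEFF followed by 'cafe'

conn = sqlite3.connect(":memory:")
conn.create_collation("APPROX", approx_collate)

def eq(a: str, b: str, collation: str = "") -> int:
    # Compare two bound strings, optionally with an explicit COLLATE.
    sql = "SELECT ? = ?" + (f" COLLATE {collation}" if collation else "")
    return conn.execute(sql, (a, b)).fetchone()[0]

print(eq(composed, decomposed), eq(composed, decomposed, "APPROX"))
print(eq(with_bom, "cafe"), eq(with_bom, "cafe", "APPROX"))
```

With BINARY both pairs compare unequal (0); with the approximating collation both compare equal (1), even though each pair differs by one code point.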

> Maybe automatically dropping the BOM for
> ICU-collated fields is the more correct way.  

Why don't you do just that in your application?
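A minimal sketch of doing it in the application, assuming Python's sqlite3 module and a hypothetical table t with a TEXT column name: strip a leading U+FEFF before inserting, clean rows already stored with ltrim() and char(65279), or avoid the BOM at decode time with the "utf-8-sig" codec.

```python
import sqlite3

BOM = "\ufeff"  # U+FEFF ZERO WIDTH NO-BREAK SPACE, used as a byte order mark

def strip_bom(s: str) -> str:
    """Remove one leading U+FEFF, if present, before storing the string."""
    return s[1:] if s.startswith(BOM) else s

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE t (name TEXT)")  # hypothetical table

# Clean values on the way in.
raw = ["\ufeffcafe", "cafe"]
conn.executemany("INSERT INTO t (name) VALUES (?)",
                 [(strip_bom(s),) for s in raw])
rows = conn.execute("SELECT name, length(name) FROM t").fetchall()

# Clean rows already stored with a BOM: ltrim(X, Y) strips every
# leading character of X that appears in Y.
conn.execute("CREATE TABLE legacy (name TEXT)")  # hypothetical table
conn.execute("INSERT INTO legacy (name) VALUES (?)", ("\ufeffcafe",))
conn.execute("UPDATE legacy SET name = ltrim(name, char(65279))")
legacy = conn.execute("SELECT name, length(name) FROM legacy").fetchall()

# When reading UTF-8 input, the "utf-8-sig" codec drops a leading BOM.
decoded = b"\xef\xbb\xbfcafe".decode("utf-8-sig")

print(rows, legacy, decoded)
```

Cleaning at insert time keeps the stored data consistent regardless of which collation is later applied to the column.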

Igor Tandetnik

_______________________________________________
sqlite-users mailing list
sqlite-users@sqlite.org
http://sqlite.org:8080/cgi-bin/mailman/listinfo/sqlite-users
