Re: [sqlite] The byte order mark problem in fields with ICU collation

Alexey Pechnikov Mon, 07 Dec 2009 14:25:43 -0800

Hello!

On Tuesday 08 December 2009 01:07:54 Igor Tandetnik wrote:
> Alexey Pechnikov <[email protected]>
> wrote: 
> > Yes, the BOM is on the original string. But with ICU collation we can
> > see that 17 symbols string is equal to 16 symbols string. I think
> > this result is not right.
> 
> What's the basis for this belief? It's not at all uncommon for two Unicode 
> strings of different length (in codepoints) to collate equal - for example, 
> they could be canonically equivalent but in different normalization forms, or 
> contain weightless characters such as a zero-width non-breaking space 
> (U+FEFF), also known as BOM.
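For illustration, a minimal Python sketch of the canonical-equivalence case
described above, using only the standard unicodedata module. It does not
call ICU, so it shows only that the two strings differ in length while being
the same text after normalization; the variable names are hypothetical.

```python
import unicodedata

# The same visible text, "e with acute accent", in two normalization forms.
composed = "\u00e9"      # one codepoint (NFC form)
decomposed = "e\u0301"   # two codepoints: "e" + combining acute (NFD form)

# The raw strings differ in length and compare unequal codepoint by codepoint.
raw_equal = composed == decomposed
lengths = (len(composed), len(decomposed))

# After normalizing both to NFC they are identical.
same_after_nfc = (
    unicodedata.normalize("NFC", composed)
    == unicodedata.normalize("NFC", decomposed)
)

# U+FEFF adds one codepoint to the length; it is a format character (Cf).
with_bom = "\ufeff" + "abc"
bom_category = unicodedata.category("\ufeff")

print(raw_equal, lengths, same_after_nfc)
print(len(with_bom), len("abc"), bom_category)
```

Whether such a pair also collates equal depends on the collation in use;
that part is not shown here.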


Normalization is currently performed on every string operation. It would be
faster and more useful to do it once, when the data is stored.
> 
> > May be automatically dropping the BOM for
> > ICU collated fields is more correct way.  
> 
> Why don't you do just that in your application?

Yes, I have fixed it in my application, but the same problem can occur in any 
application.
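
A minimal sketch of such an application-side fix, assuming Python's standard
sqlite3 and unicodedata modules. The helper name clean_text, the table t and
the sample value are hypothetical, and the choice of NFC is an assumption:
the text is cleaned once, before it is stored.

```python
import sqlite3
import unicodedata

BOM = "\ufeff"  # U+FEFF, zero-width no-break space / byte order mark


def clean_text(value: str) -> str:
    """Strip one leading BOM, then normalize to NFC (assumed target form)."""
    if value.startswith(BOM):
        value = value[len(BOM):]
    return unicodedata.normalize("NFC", value)


conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE t (name TEXT)")

# Input with a leading BOM and a decomposed "e" + combining acute accent.
raw = BOM + "Cafe\u0301"
conn.execute("INSERT INTO t (name) VALUES (?)", (clean_text(raw),))

stored = conn.execute("SELECT name FROM t").fetchone()[0]
print(len(raw), len(stored))
conn.close()
```

Cleaning at insert time keeps the stored length consistent with what the
user sees, whatever collation is later applied to the column.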

Best regards, Alexey Pechnikov.
http://pechnikov.tel/
_______________________________________________
sqlite-users mailing list
[email protected]
http://sqlite.org:8080/cgi-bin/mailman/listinfo/sqlite-users
