Re: [sqlite] SQLITE 3.7.3 bug report (shell) - output in column mode does not align UTF8-strings correctly

2010-11-29 Thread Nicolas Williams
On Fri, Nov 26, 2010 at 06:52:56AM, Niklas Bäckman wrote: > Igor Tandetnik writes: > > Note that counting codepoints, while it happens to help with your particular data, won't help in general. Consider combining diacritics: U+00E4 (small A with diaeresis) looks the
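
To make the combining-diacritics point concrete, here is a minimal Python sketch (not code from the thread or from the SQLite shell). It compares the precomposed U+00E4 with the sequence U+0061 U+0308, which renders identically but has a different code point count. The helper name `display_cells` is hypothetical, and skipping combining marks is only a rough approximation: it ignores wide East Asian characters and full grapheme-cluster segmentation.

```python
import unicodedata

def display_cells(text: str) -> int:
    """Approximate on-screen width: count code points that are not combining marks.

    Assumption: every non-combining code point occupies exactly one cell.
    This is a simplification, not a full grapheme or width algorithm.
    """
    return sum(1 for ch in text if unicodedata.combining(ch) == 0)

precomposed = "\u00e4"   # small a with diaeresis as a single code point
decomposed = "a\u0308"   # small a followed by combining diaeresis

if __name__ == "__main__":
    for label, s in (("precomposed", precomposed), ("decomposed", decomposed)):
        print(label, "code points:", len(s), "cells:", display_cells(s))
```

Both strings count as one cell with this helper, although `len()` reports one code point for the first and two for the second. Normalizing to NFC before counting would also merge this particular pair, but not every base-plus-mark sequence has a precomposed form.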

Re: [sqlite] SQLITE 3.7.3 bug report (shell) - output in column mode does not align UTF8-strings correctly

2010-11-26 Thread Jean-Christophe Deschamps
At 14:26 26/11/2010, you wrote: > N.b., there is a severe bug (pointers calculated based on truncated 16-bit values above plane-0) in a popular Unicode-properties SQLite extension. The extension only attempts covering a few high-plane characters -- if memory serves, three of them in array

Re: [sqlite] SQLITE 3.7.3 bug report (shell) - output in column mode does not align UTF8-strings correctly

2010-11-26 Thread Samuel Adam
On Fri, 26 Nov 2010 07:27:02 -0500, Simon Slavin wrote: > On 26 Nov 2010, at 6:52am, Niklas Bäckman wrote: >> You are right of course. The shell should not count code points, but graphemes. >> http://unicode.org/faq/char_combmark.html#7 >> [snip] >> Or would it

Re: [sqlite] SQLITE 3.7.3 bug report (shell) - output in column mode does not align UTF8-strings correctly

2010-11-26 Thread Simon Slavin
On 26 Nov 2010, at 6:52am, Niklas Bäckman wrote: > You are right of course. The shell should not count code points, but graphemes. > http://unicode.org/faq/char_combmark.html#7 > I guess that this probably falls out of the "lite" scope of SQLITE though? There is absolutely no way

Re: [sqlite] SQLITE 3.7.3 bug report (shell) - output in column mode does not align UTF8-strings correctly

2010-11-25 Thread Igor Tandetnik
Niklas Bäckman wrote: > Columns with special characters like ("å" "ä" "å") get too short widths when output. > I guess this is due to the shell not counting actual UTF8 *characters/code points* when calculating the widths, but instead only counting the plain bytes in
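
The effect described in the report can be reproduced outside SQLite with a short Python sketch. It is an illustration of the byte-versus-code-point difference only, not the shell's actual code; the function names `byte_width`, `codepoint_width`, `pad_by_bytes` and `pad_by_codepoints` are hypothetical.

```python
def byte_width(text: str) -> int:
    """Length of the UTF-8 encoding, which is what a byte-counting padder sees."""
    return len(text.encode("utf-8"))

def codepoint_width(text: str) -> int:
    """Number of Unicode code points in the string."""
    return len(text)

def pad_by_bytes(text: str, width: int) -> str:
    """Pad to `width` while measuring in bytes: non-ASCII text gets too little padding."""
    return text + " " * max(width - byte_width(text), 0)

def pad_by_codepoints(text: str, width: int) -> str:
    """Pad to `width` while measuring in code points."""
    return text + " " * max(width - codepoint_width(text), 0)

if __name__ == "__main__":
    rows = ["abc", "\u00e5\u00e4\u00f6"]  # the second row is 3 code points, 6 bytes
    for row in rows:
        print(pad_by_bytes(row, 8) + "|", pad_by_codepoints(row, 8) + "|")
```

With byte-based padding the second row's column comes out three cells narrower than the first, which matches the "too short widths" in the report. As the replies in the thread point out, counting code points fixes this data but is still not correct in general.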