On 10/28/05, Joe Wilson <[EMAIL PROTECTED]> wrote: > --- Nuno Lucas <[EMAIL PROTECTED]> wrote: > > On 10/25/05, Joe Wilson <[EMAIL PROTECTED]> wrote: > > > The built-in Sqlite length() function works just fine on blobs: > > > > > > sqlite> select length(X'0000000000'); > > > 5 > > > > > > Why do you think it doesn't? > > > > I remember a few months ago noticing it would count UTF-8 chars, not > > bytes, so it would not return the right length for blobs in all cases > > (a '\0' is a valid UTF-8 char, but try with the '(c)' [copyright] sign, > > which is 2 bytes). > > > > I don't have the code in front of me, and can be just my memory, but I > > don't see that behaviour changing or would break a lot of SQL during > > normal text manipulation. > > > > Regards, > > ~Nuno Lucas > > You are mistaken. Text and blobs are different. > TEXT is of type "SQLITE_TEXT". BLOB is of type "SQLITE_BLOB". > length() has always correctly returned the size of a blob - look at the code: >
[EMAIL PROTECTED]:~/src/sqlite$ sqlite3 test.db3 SQLite version 3.2.1 Enter ".help" for instructions sqlite> create table x ( a TEXT, b BLOB ); sqlite> insert into x values ( 'Não', 'Não' ); sqlite> select * from x; Não|Não sqlite> select length(a), length(b) from x; 3|3 sqlite> .q [EMAIL PROTECTED]:~/src/sqlite$ hexdump -c test.db3 0000000 S Q L i t e f o r m a t 3 \0 0000010 004 \0 001 001 \0 @ \0 \0 \0 002 \0 \0 \0 \0 0000020 \0 \0 \0 \0 \0 \0 \0 \0 \0 \0 \0 001 \0 \0 \0 001 0000030 \0 \0 \0 \0 \0 \0 \0 \0 \0 \0 \0 001 \0 \0 \0 \0 0000040 \0 \0 \0 \0 \0 \0 \0 \0 \0 \0 \0 \0 \0 \0 \0 \0 * 0000060 \0 \0 \0 \0 \r \0 \0 \0 001 003 317 \0 003 317 \0 \0 0000070 \0 \0 \0 \0 \0 \0 \0 \0 \0 \0 \0 \0 \0 \0 \0 \0 * 00003c0 \0 \0 \0 \0 \0 \0 \0 \0 \0 \0 \0 \0 \0 \0 \0 / 00003d0 001 006 027 017 017 001 O t a b l e x x 002 C 00003e0 R E A T E T A B L E x ( 00003f0 a T E X T , b B L O B ) 0000400 \r \0 \0 \0 001 003 363 \0 003 363 \0 \0 \0 \0 \0 \0 0000410 \0 \0 \0 \0 \0 \0 \0 \0 \0 \0 \0 \0 \0 \0 \0 \0 * 00007f0 \0 \0 \0 \v 001 003 025 025 N 303 243 o N 303 243 o 0000800 As you see from the hexdump, "Não" is encoded as UTF-8. It's true it works if you insert a literal BLOB, like with x'00112233', but it only means sqlite lacks a byte counting function for the general case. I don't complain about it, as I only use the SQLite C API, but it means there is no coherency if different programs are used to manipulate the database (that could insert the data in different ways). Also note that I've seen a lot of sqlite wrappers/managers in the past that don't care the encoding used (still many people think they can live with only the first 127 ASCII chars), inserting Latin-1 or others literally, making the length function returning the wrong value when accented chars or other symbols - like the (c) symbol - are found. It would be nice if we could make a quick sanity check on the DB by comparing the byte count against the string length. For me, it's just a low priority feature request, but I believe there are others who may think otherwise. Regards, ~Nuno Lucas > /* > ** Implementation of the length() function > */ > static void lengthFunc( > sqlite3_context *context, > int argc, > sqlite3_value **argv > ){ > int len; > > assert( argc==1 ); > switch( sqlite3_value_type(argv[0]) ){ > case SQLITE_BLOB: > case SQLITE_INTEGER: > case SQLITE_FLOAT: { > sqlite3_result_int(context, sqlite3_value_bytes(argv[0])); > break; > } > case SQLITE_TEXT: { > const char *z = sqlite3_value_text(argv[0]); > for(len=0; *z; z++){ if( (0xc0&*z)!=0x80 ) len++; } > sqlite3_result_int(context, len); > break; > } > default: { > sqlite3_result_null(context); > break; > } > } > }