Re: [sqlite] built-in functrion suggestion: size of blob

Nuno Lucas Mon, 31 Oct 2005 00:42:49 -0800

On 10/28/05, Joe Wilson <[EMAIL PROTECTED]> wrote:
> --- Nuno Lucas <[EMAIL PROTECTED]> wrote:
> > On 10/25/05, Joe Wilson <[EMAIL PROTECTED]> wrote:
> > > The built-in Sqlite length() function works just fine on blobs:
> > >
> > >   sqlite> select length(X'0000000000');
> > >   5
> > >
> > > Why do you think it doesn't?
> >
> > I remember a few months ago noticing it would count UTF-8 chars, not
> > bytes, so it would not return the right length for blobs in all cases
> > (a '\0' is a valid UTF-8 char, but try with the '(c)' [copyright] sign,
> > which is 2 bytes).
> >
> > I don't have the code in front of me, and can be just my memory, but I
> > don't see that behaviour changing or would break a lot of SQL during
> > normal text manipulation.
> >
> > Regards,
> > ~Nuno Lucas
>
> You are mistaken. Text and blobs are different.
> TEXT is of type "SQLITE_TEXT". BLOB is of type "SQLITE_BLOB".
> length() has always correctly returned the size of a blob - look at the code:
>


[EMAIL PROTECTED]:~/src/sqlite$ sqlite3 test.db3
SQLite version 3.2.1
Enter ".help" for instructions
sqlite> create table x ( a TEXT, b BLOB );
sqlite> insert into x values ( 'Não', 'Não' );
sqlite> select * from x;
Não|Não
sqlite> select length(a), length(b) from x;
3|3
sqlite> .q
[EMAIL PROTECTED]:~/src/sqlite$ hexdump -c test.db3
0000000   S   Q   L   i   t   e       f   o   r   m   a   t       3  \0
0000010 004  \0 001 001  \0   @          \0  \0  \0 002  \0  \0  \0  \0
0000020  \0  \0  \0  \0  \0  \0  \0  \0  \0  \0  \0 001  \0  \0  \0 001
0000030  \0  \0  \0  \0  \0  \0  \0  \0  \0  \0  \0 001  \0  \0  \0  \0
0000040  \0  \0  \0  \0  \0  \0  \0  \0  \0  \0  \0  \0  \0  \0  \0  \0
*
0000060  \0  \0  \0  \0  \r  \0  \0  \0 001 003 317  \0 003 317  \0  \0
0000070  \0  \0  \0  \0  \0  \0  \0  \0  \0  \0  \0  \0  \0  \0  \0  \0
*
00003c0  \0  \0  \0  \0  \0  \0  \0  \0  \0  \0  \0  \0  \0  \0  \0   /
00003d0 001 006 027 017 017 001   O   t   a   b   l   e   x   x 002   C
00003e0   R   E   A   T   E       T   A   B   L   E       x       (
00003f0   a       T   E   X   T   ,       b       B   L   O   B       )
0000400  \r  \0  \0  \0 001 003 363  \0 003 363  \0  \0  \0  \0  \0  \0
0000410  \0  \0  \0  \0  \0  \0  \0  \0  \0  \0  \0  \0  \0  \0  \0  \0
*
00007f0  \0  \0  \0  \v 001 003 025 025   N 303 243   o   N 303 243   o
0000800

As you see from the hexdump, "Não" is encoded as UTF-8.

It's true it works if you insert a literal BLOB, like with
x'00112233', but it only means sqlite lacks a byte counting function
for the general case.

I don't complain about it, as I only use the SQLite C API, but it
means there is no coherency if different programs are used to
manipulate the database (that could insert the data in different
ways).

Also note that I've seen a lot of sqlite wrappers/managers in the past
that don't care the encoding used (still many people think they can
live with only the first 127 ASCII chars), inserting Latin-1 or others
literally, making the length function returning the wrong value when
accented chars or other symbols - like the (c) symbol - are found. It
would be nice if we could make a quick sanity check on the DB by
comparing the byte count against the string length.

For me, it's just a low priority feature request, but I believe there
are others who may think otherwise.


Regards,
~Nuno Lucas


> /*
> ** Implementation of the length() function
> */
> static void lengthFunc(
>   sqlite3_context *context,
>   int argc,
>   sqlite3_value **argv
> ){
>   int len;
>
>   assert( argc==1 );
>   switch( sqlite3_value_type(argv[0]) ){
>     case SQLITE_BLOB:
>     case SQLITE_INTEGER:
>     case SQLITE_FLOAT: {
>       sqlite3_result_int(context, sqlite3_value_bytes(argv[0]));
>       break;
>     }
>     case SQLITE_TEXT: {
>       const char *z = sqlite3_value_text(argv[0]);
>       for(len=0; *z; z++){ if( (0xc0&*z)!=0x80 ) len++; }
>       sqlite3_result_int(context, len);
>       break;
>     }
>     default: {
>       sqlite3_result_null(context);
>       break;
>     }
>   }
> }

Re: [sqlite] built-in functrion suggestion: size of blob

Reply via email to