Re: [sqlite] UTF8 and NUL

J Decker Fri, 26 Jan 2018 17:42:36 -0800

char inserts two chars for these... so it's hard to generate a sequence
that looks like '1' for length function... inserting a C string that was
'hi\xc2\x93\x93\x93\x93\x93\x93' length would be 3.
'hi\x93\x93\x93\x93\x93\x93' length would be 9.


but without bind... this is as much as I can show.

insert into test (a) values
('hi'||char(192)||char(150)||char(150)||char(150)||char(150)||'there' );
insert into test (a) values ('hi'||char(0)||'there' );
select length(a),a from test;
(output to windows terminal)
12|hiÃ€Â–Â–Â–Â–there
2|hi

(in notepad++, )
12|hiÀ––––there
2|hi

(although that is what I would expect.  I wouldn't suggest changing any of
that, well ya, ...)

2|hi'||char(0)||'there

would look better - but in the select output context there aren't
quotes.... although that does work to preserve data for sqlite backup.

On Fri, Jan 26, 2018 at 5:22 PM, petern <[email protected]> wrote:

> That's an interesting idea, using BLOBs.  BLOB strings would be more
> practical if common SQL scalar operators { || , LIKE, =, <>,...} could be
> overloaded with user definable BLOB specific implementations.  At the same
> time subtype and pointer type would have to be improved to work in all
> cases.  This would be far more general solution than messing up the API
> with ubiquitous length argument.
>
> FYI, here are some current very serious deficiencies with BLOB type system
> including detailed test programs:
>

Blob also is binary, and I'm not dealing with binary, i'm dealing with UTF8
Text.  It is a totally different sort of thing than a BLOB would be.


>
> http://sqlite.1065341.n5.nabble.com/sqlite3-value-
> pointer-metadata-is-also-stripped-by-trivial-cross-
> join-Defect-tt100000.html
>
> http://sqlite.1065341.n5.nabble.com/Defect-trivial-cross-join-strips-BLOB-
> subtype-tt99982.html
>
> Still waiting for an answer about why BLOB types are lost in a trivial
> cross join.  No answer.
>

That's interesting; I have seen, in the shell, that if the column is blob,
it is just not shown.


>
> Is there a legitimate reason why BLOB type information cannot be passed
> through a join?
>
>



> Peter
>
> On Fri, Jan 26, 2018 at 4:36 PM, Keith Medcalf <[email protected]>
> wrote:
>
> >
> > I do not understand this at all.
> >
> > If the definition of a C-String is a "bunch-a-non-zero-byes-
> terminated-by-a-zero-byte",
> > then how is it possible to have a zero/null byte "embedded" within a
> > C-Style String?
> >
> > Similarly, if a C-Style-Wide-String is defined as a
> > "bunch-a-non-zero-words-terminated-by-a-zero-word", then how is it
> > possible to have a zero/null word "embedded" within a
> C-Style-Wide-String?
> >
> > Given that SQLite3 is written in C and uses C-Strings or
> > C-Style-Wide-Strings, then you cannot have zero/null bytes embedded in
> > those strings.
> >
> > You may of course argue that perhaps SQLite3 should use something other
> > than C-Style-Strings, however, this is not what seems to be proposed.  It
> > seems to be proposing the use of some magical C-Style-String that is not
> > actually a C-Style-String, without explicitly stating this.
> >
> > SQLite3 does handle non-C-Ctyle-Strings.  They are called "blobs".
> >
> > ---
> > The fact that there's a Highway to Hell but only a Stairway to Heaven
> says
> > a lot about anticipated traffic volume.
> >
> >
> > >-----Original Message-----
> > >From: sqlite-users [mailto:sqlite-users-
> > >[email protected]] On Behalf Of J Decker
> > >Sent: Friday, 26 January, 2018 17:18
> > >To: SQLite mailing list
> > >Subject: Re: [sqlite] UTF8 and NUL
> > >
> > >On Fri, Jan 26, 2018 at 3:56 PM, Peter Da Silva <
> > >[email protected]> wrote:
> > >
> > >> On 2018-01-26, at 17:05, J Decker <[email protected]> wrote:
> > >> > On Fri, Jan 26, 2018 at 1:21 PM, Peter Da Silva <
> > >> > [email protected]> wrote:
> > >> >> Sqlite uses NUL as the string terminator internally, the
> > >published API
> > >> >> specifies has stuff like this all over the place:
> > >>
> > >> >>> In those routines that have a fourth argument, its value is the
> > >number
> > >> of bytes in the parameter. To be clear: the value is the number of
> > >bytes in
> > >> the value, not the number of characters. If the fourth parameter to
> > >> sqlite3_bind_text() or sqlite3_bind_text16() is negative, then the
> > >length
> > >> of the string is the number of bytes UP TO THE FIRST ZERO
> > >TERMINATOR.
> > >>
> > >> > You stressed the wrong part there - *IS NEGATIVE*
> > >>
> > >> Why? Passing -1 as the length is a common way to tell sqlite3 to
> > >calculate
> > >> the length itself. It's a documented and widely used part of the
> > >API.
> > >
> > >
> > >Exactly, so on neither side, input or output is there a problem
> > >storing a
> > >length of valid characters.
> > >The deficiency is 1) the command line tool for diagnostics
> > >2) always scanning for a nul in prepare() unless the length is before
> > >that.  It's simple to add an option that could change that behavior;
> > >or
> > >move the string measuring up to prepare[_v2,_v3,_v4] and even add a
> > >V5 that
> > >just passes the length passed without a scan.
> > >
> > >The input is read by a tokenizer that returns in-buffer references to
> > >the
> > >next SQL token by length.
> > >Some tokens can be quoted, and those end up being a copy of the
> > >original;
> > >but the length of the SQL statement should already be known, so it
> > >doesn't
> > >need to scan for 0.
> > >
> > >Once tokenized it's converted into expressions; those expressions
> > >(have
> > >previously) stored only the char*.  It's not a lot of places to
> > >change to
> > >include storing the length; which is often known unless the mprintf
> > >internals are used; then any token passed through that does not pass
> > >%s.
> > >So %s cannot be used for UTF8 strings; but rather the literal string
> > >fwrite( buf, 1, stringlen, <output) ; gets all the standard character
> > >treatment as the file was opened with (O_BINARY or not, "b" or "t"
> > >specifiers for fopen, or stderr ).
> > >
> > >fprintf( out, "%s", (string) );
> > >is exactly the same as
> > >fwrite( out, 1, strlen( string ), string );
> > >
> > >(Can anyone dispute that?  I doubt that's specified)
> > >
> > >Other than, the fwrite will include outputing the NUL character and
> > >trust
> > >the length given to it. \n will still get promoted to \r\n depending
> > >on
> > >platform and C library personality.
> > >
> > >
> > >
> > >
> > >
> > >> Therefore:
> > >>
> > >
> > >
> > >>
> > >> >> It would be a huge push-up to change this, it would break
> > >everything,
> > >> >> including extensions. I don't think it would be possible until
> > >something
> > >> >> like sqlite4.
> > >>
> > >
> > >maybe I don't understand what you're saying 'it' is.
> > >
> > >
> > >> _______________________________________________
> > >> sqlite-users mailing list
> > >> [email protected]
> > >> http://mailinglists.sqlite.org/cgi-bin/mailman/listinfo/sqlite-
> > >users
> > >>
> > >_______________________________________________
> > >sqlite-users mailing list
> > >[email protected]
> > >http://mailinglists.sqlite.org/cgi-bin/mailman/listinfo/sqlite-users
> >
> >
> >
> > _______________________________________________
> > sqlite-users mailing list
> > [email protected]
> > http://mailinglists.sqlite.org/cgi-bin/mailman/listinfo/sqlite-users
> >
> _______________________________________________
> sqlite-users mailing list
> [email protected]
> http://mailinglists.sqlite.org/cgi-bin/mailman/listinfo/sqlite-users
>
_______________________________________________
sqlite-users mailing list
[email protected]
http://mailinglists.sqlite.org/cgi-bin/mailman/listinfo/sqlite-users

Re: [sqlite] UTF8 and NUL

Reply via email to