Re: [sqlite] UTF8 and NUL

petern Fri, 26 Jan 2018 19:25:18 -0800

Maybe you'll still miss my point if I don't explain about this.  I
understand you're talking about UTF8.  However,
you are free to define a user defined function which converts whatever
literal arguments into BLOB space and back again.


eg:  D3KRUTF8(<args>) -> TYPED POINTER (BLOB)

where TYPED POINTER is the D3KRUTF8 secret sauce UTF8 encoding that
supports NULs and anything else.

In the D3KRUTF8 BLOB space, any operation whatsoever is possible. Moreover,
queries are free to pass around the D3KRUTF8 BLOB to a nucleus of other
functions which provide a parallel implementation to the familar SUBSTR(),
LIKE(), LENGTH() SQL functions.  What's missing is the ability to overload
the punctuation operators like "||" and "=".

Richard,  why can't UDF's overload '||' (concat()) and '=" (equals())  ?

Peter



On Fri, Jan 26, 2018 at 7:09 PM, petern <peter.nichvolo...@gmail.com> wrote:

> BLOB will store UTF8 or any encoding for that matter.   Are you familiar
> with the general concept of operator overloading?
> https://en.wikipedia.org/wiki/Operator_overloading
>
> Arbitrary BLOB types including unterminated strings could be supported as
> first class object through user defined functions and more universal
> operator overloading.  Although they are presently crippled, user named
> BLOB types are already supported.
>
> https://www.sqlite.org/bindptr.html
> https://www.sqlite.org/c3ref/value_subtype.html
>
> For an example of overloading see the LIKE operator:[from
> https://sqlite.org/lang_expr.html]
> ---------
>  The sqlite3_create_function()
> <https://sqlite.org/c3ref/create_function.html> interface can be used to
> override the like() function and thereby change the operation of the LIKE
> <https://sqlite.org/lang_expr.html#like> operator. When overriding the
> like() function, it may be important to override both the two and three
> argument versions of the like() function. Otherwise, different code may be
> called to implement the LIKE <https://sqlite.org/lang_expr.html#like>
> operator depending on whether or not an ESCAPE clause was specified.
> --------
>
> BTW, a question for anyone familiar with it.  If one overloads the like()
> function, how can one call the default implementation from the overloaded
> function?
> "like()" isn't an API export, and calling exec() on the same DB handle
> will only reenter the user defined like().
> If calling the default implementation from the overloaded one is
> impossible, I'd say the LIKE overload system has a bug.
>
>
> Peter
>
>
>
>
>
>
>
>
>
> On Fri, Jan 26, 2018 at 5:42 PM, J Decker <d3c...@gmail.com> wrote:
>
>> char inserts two chars for these... so it's hard to generate a sequence
>> that looks like '1' for length function... inserting a C string that was
>> 'hi\xc2\x93\x93\x93\x93\x93\x93' length would be 3.
>> 'hi\x93\x93\x93\x93\x93\x93' length would be 9.
>>
>> but without bind... this is as much as I can show.
>>
>> insert into test (a) values
>> ('hi'||char(192)||char(150)||char(150)||char(150)||char(150)||'there' );
>> insert into test (a) values ('hi'||char(0)||'there' );
>> select length(a),a from test;
>> (output to windows terminal)
>> 12|hiÃ€Â–Â–Â–Â–there
>> 2|hi
>>
>> (in notepad++, )
>> 12|hiÀ––––there
>> 2|hi
>>
>> (although that is what I would expect.  I wouldn't suggest changing any of
>> that, well ya, ...)
>>
>> 2|hi'||char(0)||'there
>>
>> would look better - but in the select output context there aren't
>> quotes.... although that does work to preserve data for sqlite backup.
>>
>> On Fri, Jan 26, 2018 at 5:22 PM, petern <peter.nichvolo...@gmail.com>
>> wrote:
>>
>> > That's an interesting idea, using BLOBs.  BLOB strings would be more
>> > practical if common SQL scalar operators { || , LIKE, =, <>,...} could
>> be
>> > overloaded with user definable BLOB specific implementations.  At the
>> same
>> > time subtype and pointer type would have to be improved to work in all
>> > cases.  This would be far more general solution than messing up the API
>> > with ubiquitous length argument.
>> >
>> > FYI, here are some current very serious deficiencies with BLOB type
>> system
>> > including detailed test programs:
>> >
>>
>> Blob also is binary, and I'm not dealing with binary, i'm dealing with
>> UTF8
>> Text.  It is a totally different sort of thing than a BLOB would be.
>>
>>
>> >
>> > http://sqlite.1065341.n5.nabble.com/sqlite3-value-
>> > pointer-metadata-is-also-stripped-by-trivial-cross-
>> > join-Defect-tt100000.html
>> >
>> > http://sqlite.1065341.n5.nabble.com/Defect-trivial-cross-
>> join-strips-BLOB-
>> > subtype-tt99982.html
>> >
>> > Still waiting for an answer about why BLOB types are lost in a trivial
>> > cross join.  No answer.
>> >
>>
>> That's interesting; I have seen, in the shell, that if the column is blob,
>> it is just not shown.
>>
>>
>> >
>> > Is there a legitimate reason why BLOB type information cannot be passed
>> > through a join?
>> >
>> >
>>
>>
>>
>> > Peter
>> >
>> > On Fri, Jan 26, 2018 at 4:36 PM, Keith Medcalf <kmedc...@dessus.com>
>> > wrote:
>> >
>> > >
>> > > I do not understand this at all.
>> > >
>> > > If the definition of a C-String is a "bunch-a-non-zero-byes-
>> > terminated-by-a-zero-byte",
>> > > then how is it possible to have a zero/null byte "embedded" within a
>> > > C-Style String?
>> > >
>> > > Similarly, if a C-Style-Wide-String is defined as a
>> > > "bunch-a-non-zero-words-terminated-by-a-zero-word", then how is it
>> > > possible to have a zero/null word "embedded" within a
>> > C-Style-Wide-String?
>> > >
>> > > Given that SQLite3 is written in C and uses C-Strings or
>> > > C-Style-Wide-Strings, then you cannot have zero/null bytes embedded in
>> > > those strings.
>> > >
>> > > You may of course argue that perhaps SQLite3 should use something
>> other
>> > > than C-Style-Strings, however, this is not what seems to be
>> proposed.  It
>> > > seems to be proposing the use of some magical C-Style-String that is
>> not
>> > > actually a C-Style-String, without explicitly stating this.
>> > >
>> > > SQLite3 does handle non-C-Ctyle-Strings.  They are called "blobs".
>> > >
>> > > ---
>> > > The fact that there's a Highway to Hell but only a Stairway to Heaven
>> > says
>> > > a lot about anticipated traffic volume.
>> > >
>> > >
>> > > >-----Original Message-----
>> > > >From: sqlite-users [mailto:sqlite-users-
>> > > >boun...@mailinglists.sqlite.org] On Behalf Of J Decker
>> > > >Sent: Friday, 26 January, 2018 17:18
>> > > >To: SQLite mailing list
>> > > >Subject: Re: [sqlite] UTF8 and NUL
>> > > >
>> > > >On Fri, Jan 26, 2018 at 3:56 PM, Peter Da Silva <
>> > > >peter.dasi...@flightaware.com> wrote:
>> > > >
>> > > >> On 2018-01-26, at 17:05, J Decker <d3c...@gmail.com> wrote:
>> > > >> > On Fri, Jan 26, 2018 at 1:21 PM, Peter Da Silva <
>> > > >> > peter.dasi...@flightaware.com> wrote:
>> > > >> >> Sqlite uses NUL as the string terminator internally, the
>> > > >published API
>> > > >> >> specifies has stuff like this all over the place:
>> > > >>
>> > > >> >>> In those routines that have a fourth argument, its value is the
>> > > >number
>> > > >> of bytes in the parameter. To be clear: the value is the number of
>> > > >bytes in
>> > > >> the value, not the number of characters. If the fourth parameter to
>> > > >> sqlite3_bind_text() or sqlite3_bind_text16() is negative, then the
>> > > >length
>> > > >> of the string is the number of bytes UP TO THE FIRST ZERO
>> > > >TERMINATOR.
>> > > >>
>> > > >> > You stressed the wrong part there - *IS NEGATIVE*
>> > > >>
>> > > >> Why? Passing -1 as the length is a common way to tell sqlite3 to
>> > > >calculate
>> > > >> the length itself. It's a documented and widely used part of the
>> > > >API.
>> > > >
>> > > >
>> > > >Exactly, so on neither side, input or output is there a problem
>> > > >storing a
>> > > >length of valid characters.
>> > > >The deficiency is 1) the command line tool for diagnostics
>> > > >2) always scanning for a nul in prepare() unless the length is before
>> > > >that.  It's simple to add an option that could change that behavior;
>> > > >or
>> > > >move the string measuring up to prepare[_v2,_v3,_v4] and even add a
>> > > >V5 that
>> > > >just passes the length passed without a scan.
>> > > >
>> > > >The input is read by a tokenizer that returns in-buffer references to
>> > > >the
>> > > >next SQL token by length.
>> > > >Some tokens can be quoted, and those end up being a copy of the
>> > > >original;
>> > > >but the length of the SQL statement should already be known, so it
>> > > >doesn't
>> > > >need to scan for 0.
>> > > >
>> > > >Once tokenized it's converted into expressions; those expressions
>> > > >(have
>> > > >previously) stored only the char*.  It's not a lot of places to
>> > > >change to
>> > > >include storing the length; which is often known unless the mprintf
>> > > >internals are used; then any token passed through that does not pass
>> > > >%s.
>> > > >So %s cannot be used for UTF8 strings; but rather the literal string
>> > > >fwrite( buf, 1, stringlen, <output) ; gets all the standard character
>> > > >treatment as the file was opened with (O_BINARY or not, "b" or "t"
>> > > >specifiers for fopen, or stderr ).
>> > > >
>> > > >fprintf( out, "%s", (string) );
>> > > >is exactly the same as
>> > > >fwrite( out, 1, strlen( string ), string );
>> > > >
>> > > >(Can anyone dispute that?  I doubt that's specified)
>> > > >
>> > > >Other than, the fwrite will include outputing the NUL character and
>> > > >trust
>> > > >the length given to it. \n will still get promoted to \r\n depending
>> > > >on
>> > > >platform and C library personality.
>> > > >
>> > > >
>> > > >
>> > > >
>> > > >
>> > > >> Therefore:
>> > > >>
>> > > >
>> > > >
>> > > >>
>> > > >> >> It would be a huge push-up to change this, it would break
>> > > >everything,
>> > > >> >> including extensions. I don't think it would be possible until
>> > > >something
>> > > >> >> like sqlite4.
>> > > >>
>> > > >
>> > > >maybe I don't understand what you're saying 'it' is.
>> > > >
>> > > >
>> > > >> _______________________________________________
>> > > >> sqlite-users mailing list
>> > > >> sqlite-users@mailinglists.sqlite.org
>> > > >> http://mailinglists.sqlite.org/cgi-bin/mailman/listinfo/sqlite-
>> > > >users
>> > > >>
>> > > >_______________________________________________
>> > > >sqlite-users mailing list
>> > > >sqlite-users@mailinglists.sqlite.org
>> > > >http://mailinglists.sqlite.org/cgi-bin/mailman/listinfo/sqlite-users
>> > >
>> > >
>> > >
>> > > _______________________________________________
>> > > sqlite-users mailing list
>> > > sqlite-users@mailinglists.sqlite.org
>> > > http://mailinglists.sqlite.org/cgi-bin/mailman/listinfo/sqlite-users
>> > >
>> > _______________________________________________
>> > sqlite-users mailing list
>> > sqlite-users@mailinglists.sqlite.org
>> > http://mailinglists.sqlite.org/cgi-bin/mailman/listinfo/sqlite-users
>> >
>> _______________________________________________
>> sqlite-users mailing list
>> sqlite-users@mailinglists.sqlite.org
>> http://mailinglists.sqlite.org/cgi-bin/mailman/listinfo/sqlite-users
>>
>
>
_______________________________________________
sqlite-users mailing list
sqlite-users@mailinglists.sqlite.org
http://mailinglists.sqlite.org/cgi-bin/mailman/listinfo/sqlite-users

Re: [sqlite] UTF8 and NUL

Reply via email to