Re: [sqlite] UTF8 and NUL

2018-01-26 Thread Gary R. Schmidt
On 27/01/2018 05:32, Peter Da Silva wrote: On 1/26/18, 12:31 PM, "sqlite-users on behalf of J Decker" wrote: ctrl-z was end of file text character in DOS (wrote char 26; not FF) DOS wasn't an operating system.

Re: [sqlite] UTF8 and NUL

2018-01-26 Thread petern
> "bunch-a-non-zero-words-terminated-by-a-zero-word", then how is it >> > > possible to have a zero/null word "embedded" within a >> > C-Style-Wide-String? >> > > >> > > Given that SQLite3 is written in C and uses C-Strings or >>

Re: [sqlite] UTF8 and NUL

2018-01-26 Thread J Decker
> Given that SQLite3 is written in C and uses C-Strings or > > > > C-Style-Wide-Strings, then you cannot have zero/null bytes embedded > in > > > > those strings. > > > > > > > > You may of course argue that perhaps SQLite3 should use something
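The point about embedded zero bytes can be shown with a small C sketch. It is only an illustration of how C strings behave, not anything taken from SQLite itself; the function names `cstring_length` and `count_nul_bytes` are made up for this example.

```c
#include <stddef.h>
#include <string.h>

/* Length as a C string sees it: strlen() stops at the first zero byte,
 * so anything after an embedded NUL is invisible to it. */
size_t cstring_length(const char *s)
{
    return strlen(s);
}

/* Hypothetical helper: count the zero bytes in a buffer whose length is
 * given explicitly (the way a blob is handled), so embedded zeros are
 * ordinary data rather than terminators. */
size_t count_nul_bytes(const void *buf, size_t len)
{
    const unsigned char *p = buf;
    size_t n = 0;
    for (size_t i = 0; i < len; i++) {
        if (p[i] == 0) {
            n++;
        }
    }
    return n;
}
```

For the 8-byte array `"abc\0def"` (seven characters plus the implicit terminator), `cstring_length` reports 3, while `count_nul_bytes` over the full array finds both zero bytes. Carrying the length alongside the data is what lets a blob hold bytes a C string cannot.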

Re: [sqlite] UTF8 and NUL

2018-01-26 Thread petern
seems to be proposing the use of some magical C-Style-String that is > not > > > actually a C-Style-String, without explicitly stating this. > > > > > > SQLite3 does handle non-C-Style-Strings. They are called "blobs". > > > > > > --

Re: [sqlite] UTF8 and NUL

2018-01-26 Thread J Decker
agical C-Style-String that is not > > actually a C-Style-String, without explicitly stating this. > > > > SQLite3 does handle non-C-Style-Strings. They are called "blobs". > > > > --- > > The fact that there's a Highway to Hell but only a Stairway to Hea

Re: [sqlite] UTF8 and NUL

2018-01-26 Thread petern
that there's a Highway to Hell but only a Stairway to Heaven says > a lot about anticipated traffic volume. > > > >-Original Message- > >From: sqlite-users [mailto:sqlite-users- > >boun...@mailinglists.sqlite.org] On Behalf Of J Decker > >Sent: Friday, 26 January, 20

Re: [sqlite] UTF8 and NUL

2018-01-26 Thread J Decker
> --- > The fact that there's a Highway to Hell but only a Stairway to Heaven says > a lot about anticipated traffic volume. > > > >-Original Message----- > >From: sqlite-users [mailto:sqlite-users- > >boun...@mailinglists.sqlite.org] On Behalf Of J Decker

Re: [sqlite] UTF8 and NUL

2018-01-26 Thread Keith Medcalf
On Behalf Of J Decker >Sent: Friday, 26 January, 2018 17:18 >To: SQLite mailing list >Subject: Re: [sqlite] UTF8 and NUL > >On Fri, Jan 26, 2018 at 3:56 PM, Peter Da Silva < >peter.dasi...@flightaware.com> wrote: > >> On 2018-01-26, at 17:05, J Decker <

Re: [sqlite] UTF8 and NUL

2018-01-26 Thread J Decker
On Fri, Jan 26, 2018 at 3:56 PM, Peter Da Silva < peter.dasi...@flightaware.com> wrote: > On 2018-01-26, at 17:05, J Decker wrote: > > On Fri, Jan 26, 2018 at 1:21 PM, Peter Da Silva < > > peter.dasi...@flightaware.com> wrote: > >> Sqlite uses NUL as the string terminator

Re: [sqlite] UTF8 and NUL

2018-01-26 Thread Peter Da Silva
On 2018-01-26, at 17:05, J Decker wrote: > On Fri, Jan 26, 2018 at 1:21 PM, Peter Da Silva < > peter.dasi...@flightaware.com> wrote: >> Sqlite uses NUL as the string terminator internally, the published API >> specifies has stuff like this all over the place: >>> In those

Re: [sqlite] UTF8 and NUL

2018-01-26 Thread J Decker
On Fri, Jan 26, 2018 at 1:21 PM, Peter Da Silva < peter.dasi...@flightaware.com> wrote: > Sqlite uses NUL as the string terminator internally, the published API > specifies has stuff like this all over the place: > > > In those routines that have a fourth argument, its value is the number > of

Re: [sqlite] UTF8 and NUL

2018-01-26 Thread Simon Slavin
On 26 Jan 2018, at 9:04pm, J Decker wrote: > I bet windows command line tools still use it because copy has /B and /A on > windows 10. Windows is indeed a problem. I don't know enough about it to know whether the above statement outlines the problem but Windows in general is

Re: [sqlite] UTF8 and NUL

2018-01-26 Thread Peter Da Silva
Sqlite uses NUL as the string terminator internally, the published API specifies has stuff like this all over the place: > In those routines that have a fourth argument, its value is the number of > bytes in the parameter. To be clear: the value is the number of bytes in the > value, not the
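The bytes-versus-characters distinction in the quoted API text can be illustrated in plain C. This is a sketch that assumes the input is valid, NUL-terminated UTF-8; `utf8_byte_length` and `utf8_codepoint_count` are hypothetical names, not SQLite functions. The byte count is the kind of value one would pass as the length argument to a call such as `sqlite3_bind_text()`.

```c
#include <stddef.h>
#include <string.h>

/* Number of bytes before the terminating NUL. */
size_t utf8_byte_length(const char *s)
{
    return strlen(s);
}

/* Number of code points, assuming valid UTF-8: every byte that is not a
 * continuation byte (bit pattern 10xxxxxx) starts a new code point. */
size_t utf8_codepoint_count(const char *s)
{
    size_t n = 0;
    for (const unsigned char *p = (const unsigned char *)s; *p != 0; p++) {
        if ((*p & 0xC0) != 0x80) {
            n++;
        }
    }
    return n;
}
```

For `"caf\xC3\xA9"` (the word "café" in UTF-8) the byte length is 5 and the code point count is 4, so the two numbers differ as soon as the text leaves ASCII.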

Re: [sqlite] UTF8 and NUL

2018-01-26 Thread J Decker
On Fri, Jan 26, 2018 at 11:41 AM, Peter Da Silva < peter.dasi...@flightaware.com> wrote: > On 1/26/18, 1:37 PM, "sqlite-users on behalf of J Decker" < > sqlite-users-boun...@mailinglists.sqlite.org on behalf of d3c...@gmail.com> > wrote: > >doesn't get 26 either. 0x1a > > 26 isn't EOF, it's

Re: [sqlite] UTF8 and NUL

2018-01-26 Thread Peter Da Silva
On 1/26/18, 2:34 PM, "sqlite-users on behalf of J. King" wrote: > Do you have a point in making either statement? If you do, I'm really not > seeing it. The point is that apart from CP/M and derivatives like DOS,

Re: [sqlite] UTF8 and NUL

2018-01-26 Thread J. King
On 2018-01-26 15:13:46, "Peter Da Silva" wrote: On 1/26/18, 2:11 PM, "sqlite-users on behalf of John McKown" wrote: In the distant past (CP/M-80), the filesystem meta data

Re: [sqlite] UTF8 and NUL

2018-01-26 Thread Peter Da Silva
On 1/26/18, 2:11 PM, "sqlite-users on behalf of John McKown" wrote: > In the distant past (CP/M-80), the filesystem meta data did not include the > actual _length_ of the data for a text data file. Since

Re: [sqlite] UTF8 and NUL

2018-01-26 Thread John McKown
On Fri, Jan 26, 2018 at 1:41 PM, Peter Da Silva < peter.dasi...@flightaware.com> wrote: > On 1/26/18, 1:37 PM, "sqlite-users on behalf of J Decker" < > sqlite-users-boun...@mailinglists.sqlite.org on behalf of d3c...@gmail.com> > wrote: > >doesn't get 26 either. 0x1a > > 26 isn't EOF, it's SUB

Re: [sqlite] UTF8 and NUL

2018-01-26 Thread Peter Da Silva
On 1/26/18, 1:37 PM, "sqlite-users on behalf of J Decker" wrote: >doesn't get 26 either. 0x1a 26 isn't EOF, it's SUB (substitute). It was used to represent untranslatable characters when converting (for example)

Re: [sqlite] UTF8 and NUL

2018-01-26 Thread J Decker
On Fri, Jan 26, 2018 at 10:44 AM, Peter Da Silva < peter.dasi...@flightaware.com> wrote: > On 1/26/18, 12:40 PM, "sqlite-users on behalf of J Decker" < > sqlite-users-boun...@mailinglists.sqlite.org on behalf of d3c...@gmail.com> > wrote: > > reads the bytes and does things with them. the EOF

Re: [sqlite] UTF8 and NUL

2018-01-26 Thread Peter Da Silva
On 1/26/18, 12:40 PM, "sqlite-users on behalf of J Decker" wrote: > reads the bytes and does things with them. the EOF would get returned with > fgetc() but not the character. Fgetc returns an int, not a byte. That
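The remark that fgetc() returns an int can be made concrete with a short C sketch. It assumes a hosted C environment where `tmpfile()` succeeds; `count_bytes` and `roundtrip_count` are hypothetical helper names used only for this illustration.

```c
#include <stdio.h>

/* Count the bytes in a stream using fgetc(). The result is kept in an
 * int so the out-of-band EOF value stays distinct from a real 0xFF
 * data byte, which fgetc() returns as 255. */
long count_bytes(FILE *fp)
{
    long n = 0;
    int c; /* int, not char: must hold 0..255 and EOF */
    while ((c = fgetc(fp)) != EOF) {
        n++;
    }
    return n;
}

/* Hypothetical helper: write len bytes to a temporary binary file,
 * rewind, and count them back. Returns -1 if no temporary file could
 * be created. */
long roundtrip_count(const unsigned char *data, size_t len)
{
    FILE *fp = tmpfile();
    if (fp == NULL) {
        return -1;
    }
    fwrite(data, 1, len, fp);
    rewind(fp);
    long n = count_bytes(fp);
    fclose(fp);
    return n;
}
```

Feeding it the four bytes `'a', 0xFF, 0x1A, 'b'` should count all four: neither 0xFF nor 0x1A ends the loop, because end of file is reported by the return value and is not a byte in the data. Storing the result in a `char` instead would make 0xFF indistinguishable from EOF on platforms where `char` is signed.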

Re: [sqlite] UTF8 and NUL

2018-01-26 Thread J Decker
On Fri, Jan 26, 2018 at 10:35 AM, Tim Streater wrote: > On 26 Jan 2018, at 18:12, Keith Medcalf wrote: > > > Actually, EOF (0xFF) *is* part of a text file, and is the byte in an > ASCII > > byte-stream that indicates end-of-file. > > First I've heard

Re: [sqlite] UTF8 and NUL

2018-01-26 Thread Tim Streater
On 26 Jan 2018, at 18:12, Keith Medcalf wrote: > Actually, EOF (0xFF) *is* part of a text file, and is the byte in an ASCII > byte-stream that indicates end-of-file. First I've heard of that. Which systems did that then? EOF is normally indicated by the file system, not by

Re: [sqlite] UTF8 and NUL

2018-01-26 Thread Peter Da Silva
On 1/26/18, 12:31 PM, "sqlite-users on behalf of J Decker" wrote: > ctrl-z was end of file text character in DOS (wrote char 26; not FF) DOS wasn't an operating system.

Re: [sqlite] UTF8 and NUL

2018-01-26 Thread J Decker
On Fri, Jan 26, 2018 at 10:22 AM, Peter Da Silva < peter.dasi...@flightaware.com> wrote: > On 1/26/18, 12:12 PM, "sqlite-users on behalf of Keith Medcalf" < > sqlite-users-boun...@mailinglists.sqlite.org on behalf of > kmedc...@dessus.com> wrote: > > Actually, EOF (0xFF) *is* part of a text file,

Re: [sqlite] UTF8 and NUL

2018-01-26 Thread Peter Da Silva
On 1/26/18, 12:12 PM, "sqlite-users on behalf of Keith Medcalf" wrote: > Actually, EOF (0xFF) *is* part of a text file, and is the byte in an ASCII > byte-stream that indicates end-of-file. In the "old days" the

Re: [sqlite] UTF8 and NUL

2018-01-26 Thread Keith Medcalf
:) ). --- The fact that there's a Highway to Hell but only a Stairway to Heaven says a lot about anticipated traffic volume. >-Original Message- >From: sqlite-users [mailto:sqlite-users- >boun...@mailinglists.sqlite.org] On Behalf Of Peter Da Silva >Sent: Friday, 26 January, 2

Re: [sqlite] UTF8 and NUL

2018-01-26 Thread J Decker
On Fri, Jan 26, 2018 at 5:55 AM, Peter Da Silva < peter.dasi...@flightaware.com> wrote: > What is the goal of this discussion? Changing the string terminator SQLite > uses? I think it's almost 50 years too late for that, but I'm sure that if > Unicode and UTF8 had been a thing in 1970 then C

Re: [sqlite] UTF8 and NUL

2018-01-26 Thread Peter Da Silva
On 1/26/18, 8:24 AM, "sqlite-users on behalf of Gary R. Schmidt" wrote: > But how would you differentiate EOF??? (Let me guess, 0. :-) ) End of file is not part of the contents of the file or a string.

Re: [sqlite] UTF8 and NUL

2018-01-26 Thread Gary R. Schmidt
On 27/01/2018 00:55, Peter Da Silva wrote: What is the goal of this discussion? Changing the string terminator SQLite uses? I think it's almost 50 years too late for that, but I'm sure that if Unicode and UTF8 had been a thing in 1970 then C would have selected FF as the string terminator.

Re: [sqlite] UTF8 and NUL

2018-01-26 Thread Peter Da Silva
What is the goal of this discussion? Changing the string terminator SQLite uses? I think it's almost 50 years too late for that, but I'm sure that if Unicode and UTF8 had been a thing in 1970 then C would have selected FF as the string terminator.

Re: [sqlite] UTF8 and NUL

2018-01-26 Thread Clemens Ladisch
J Decker wrote: > U+009C 156 String Terminator ST "ST is used as the closing delimiter of a control string opened by APPLICATION PROGRAM COMMAND (APC), DEVICE CONTROL STRING (DCS), OPERATING SYSTEM COMMAND (OSC), PRIVACY MESSAGE (PM), or START OF STRING (SOS)." Regards, Clemens

Re: [sqlite] UTF8 and NUL

2018-01-26 Thread J Decker
https://en.wikipedia.org/wiki/List_of_Unicode_characters#Control_codes Even the Control codes within unicode aren't FF. U+009C 156 String Terminator ST literal bytes \xC2\x9c are string terminator ... Was thinking that like APC and ST were higher than that... more in the range of 0xF8-0xFF On
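The byte sequence quoted for U+009C can be checked with a small UTF-8 encoder. This is a minimal sketch of the standard encoding rules; the function name `utf8_encode` is chosen for this example only.

```c
#include <stddef.h>
#include <stdint.h>

/* Encode one Unicode code point as UTF-8 into out (room for 4 bytes).
 * Returns the number of bytes written, or 0 for values that are not
 * valid scalar values (surrogates, or anything above U+10FFFF). */
size_t utf8_encode(uint32_t cp, unsigned char out[4])
{
    if (cp < 0x80) {
        out[0] = (unsigned char)cp;
        return 1;
    }
    if (cp < 0x800) {
        out[0] = (unsigned char)(0xC0 | (cp >> 6));
        out[1] = (unsigned char)(0x80 | (cp & 0x3F));
        return 2;
    }
    if (cp < 0x10000) {
        if (cp >= 0xD800 && cp <= 0xDFFF) {
            return 0; /* surrogates are not encodable */
        }
        out[0] = (unsigned char)(0xE0 | (cp >> 12));
        out[1] = (unsigned char)(0x80 | ((cp >> 6) & 0x3F));
        out[2] = (unsigned char)(0x80 | (cp & 0x3F));
        return 3;
    }
    if (cp <= 0x10FFFF) {
        out[0] = (unsigned char)(0xF0 | (cp >> 18));
        out[1] = (unsigned char)(0x80 | ((cp >> 12) & 0x3F));
        out[2] = (unsigned char)(0x80 | ((cp >> 6) & 0x3F));
        out[3] = (unsigned char)(0x80 | (cp & 0x3F));
        return 4;
    }
    return 0;
}
```

Encoding 0x9C yields the two bytes C2 9C, matching the literal bytes given above. The largest lead byte this encoder can produce is F4 (for U+10FFFF), so bytes in the range F8 to FF never appear in valid UTF-8 output, whatever code point is encoded.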