On 27/01/2018 05:32, Peter Da Silva wrote:
On 1/26/18, 12:31 PM, "sqlite-users on behalf of J Decker"
wrote:
ctrl-z was end of file text character in DOS (wrote char 26; not FF)
DOS wasn't an operating system.
That will come as a surprise to the people who used DOS/360 and DOS/VSE
and th
tyle-Wide-String is defined as a
>> > > "bunch-a-non-zero-words-terminated-by-a-zero-word", then how is it
>> > > possible to have a zero/null word "embedded" within a
>> > C-Style-Wide-String?
>> > >
>> > > Given that SQLite3 i
e String?
> > > >
> > > > Similarly, if a C-Style-Wide-String is defined as a
> > > > "bunch-a-non-zero-words-terminated-by-a-zero-word", then how is it
> > > > possible to have a zero/null word "embedded" within a
> > > C-Style-Wide-String?
> > >
es C-Strings or
> > > C-Style-Wide-Strings, then you cannot have zero/null bytes embedded in
> > > those strings.
> > >
> > > You may of course argue that perhaps SQLite3 should use something other
> > > than C-Style-Strings, however, this is not what seems
posed. It
> > seems to be proposing the use of some magical C-Style-String that is not
> > actually a C-Style-String, without explicitly stating this.
> >
> > SQLite3 does handle non-C-Style-Strings. They are called "blobs".
> >
> > ---
> > The fact that th
to Hell but only a Stairway to Heaven says
> a lot about anticipated traffic volume.
>
>
> >-Original Message-
> >From: sqlite-users [mailto:sqlite-users-
> >boun...@mailinglists.sqlite.org] On Behalf Of J Decker
> >Sent: Friday, 26 January, 2018 17:18
> >
r a decade or more as char.
> ---
> The fact that there's a Highway to Hell but only a Stairway to Heaven says
> a lot about anticipated traffic volume.
>
>
> >-Original Message-
> >From: sqlite-users [mailto:sqlite-users-
> >boun...@mailinglists.sqlite.org] On Be
On Behalf Of J Decker
>Sent: Friday, 26 January, 2018 17:18
>To: SQLite mailing list
>Subject: Re: [sqlite] UTF8 and NUL
>
>On Fri, Jan 26, 2018 at 3:56 PM, Peter Da Silva <
>peter.dasi...@flightaware.com> wrote:
>
>> On 2018-01-26, at 17:05, J Decker wrote:
On Fri, Jan 26, 2018 at 3:56 PM, Peter Da Silva <
peter.dasi...@flightaware.com> wrote:
> On 2018-01-26, at 17:05, J Decker wrote:
> > On Fri, Jan 26, 2018 at 1:21 PM, Peter Da Silva <
> > peter.dasi...@flightaware.com> wrote:
> >> Sqlite uses NUL as the string terminator internally, the publishe
On 2018-01-26, at 17:05, J Decker wrote:
> On Fri, Jan 26, 2018 at 1:21 PM, Peter Da Silva <
> peter.dasi...@flightaware.com> wrote:
>> Sqlite uses NUL as the string terminator internally; the published API
>> has stuff like this all over the place:
>>> In those routines that have a fou
On Fri, Jan 26, 2018 at 1:21 PM, Peter Da Silva <
peter.dasi...@flightaware.com> wrote:
> Sqlite uses NUL as the string terminator internally; the published API
> has stuff like this all over the place:
>
> > In those routines that have a fourth argument, its value is the number
> of byt
On 26 Jan 2018, at 9:04pm, J Decker wrote:
> I bet windows command line tools still use it because copy has /B and /A on
> windows 10.
Windows is indeed a problem. I don't know enough about it to know whether the
above statement outlines the problem but Windows in general is terrifically
diff
Sqlite uses NUL as the string terminator internally; the published API
has stuff like this all over the place:
> In those routines that have a fourth argument, its value is the number of
> bytes in the parameter. To be clear: the value is the number of bytes in the
> value, not the nu
On Fri, Jan 26, 2018 at 11:41 AM, Peter Da Silva <
peter.dasi...@flightaware.com> wrote:
> On 1/26/18, 1:37 PM, "sqlite-users on behalf of J Decker" <
> sqlite-users-boun...@mailinglists.sqlite.org on behalf of d3c...@gmail.com>
> wrote:
> >doesn't get 26 either. 0x1a
>
> 26 isn't EOF, it's SU
On 1/26/18, 2:34 PM, "sqlite-users on behalf of J. King"
wrote:
> Do you have a point in making either statement? If you do, I'm really not
> seeing it.
The point is that apart from CP/M and derivatives like DOS, this kind of
behavior is strictly a leftover from the '60s. And CP/M only had th
On 2018-01-26 15:13:46, "Peter Da Silva"
wrote:
On 1/26/18, 2:11 PM, "sqlite-users on behalf of John McKown"
john.archie.mck...@gmail.com> wrote:
In the distant past (CP/M-80), the filesystem meta data did not
include the actual _length_ of the data for a text data file.
Since DOS wasn't a
On 1/26/18, 2:11 PM, "sqlite-users on behalf of John McKown"
wrote:
> In the distant past (CP/M-80), the filesystem meta data did not include the
> actual _length_ of the data for a text data file.
Since DOS wasn't an OS, then CP/M certainly wasn't.
_
On Fri, Jan 26, 2018 at 1:41 PM, Peter Da Silva <
peter.dasi...@flightaware.com> wr
> On 1/26/18, 1:37 PM, "sqlite-users on behalf of J Decker" <
> sqlite-users-boun...@mailinglists.sqlite.org on behalf of d3c...@gmail.com>
> wrote:
> >doesn't get 26 either. 0x1a
>
> 26 isn't EOF, it's SUB (su
On 1/26/18, 1:37 PM, "sqlite-users on behalf of J Decker"
wrote:
>doesn't get 26 either. 0x1a
26 isn't EOF, it's SUB (substitute). It was used to represent untranslatable
characters when converting (for example) EBCDIC to ASCII.
On Fri, Jan 26, 2018 at 10:44 AM, Peter Da Silva <
peter.dasi...@flightaware.com> wrote:
> On 1/26/18, 12:40 PM, "sqlite-users on behalf of J Decker" <
> sqlite-users-boun...@mailinglists.sqlite.org on behalf of d3c...@gmail.com>
> wrote:
> > reads the bytes and does things with them. the EOF wo
On 1/26/18, 12:40 PM, "sqlite-users on behalf of J Decker"
wrote:
> reads the bytes and does things with them. the EOF would get returned with
> fgetc() but not the character.
Fgetc returns an int, not a byte. That EOF is -1, not 0xFF.
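A minimal sketch of the point above, assuming a hosted C environment where tmpfile() succeeds. The function names count_bytes and demo_ff_and_sub_are_data are illustrative and do not come from the thread. Because fgetc() returns an int, the out-of-band EOF value (a negative int, commonly -1) cannot collide with the data bytes 0xFF or 0x1A:

```c
#include <stdio.h>

/* Count the bytes in a stream. The loop ends only on the out-of-band
 * EOF value, never on a data byte such as 0xFF or 0x1A. */
long count_bytes(FILE *fp)
{
    long n = 0;
    int c;                      /* int, not char: must hold 0..255 and EOF */

    while ((c = fgetc(fp)) != EOF)
        n++;
    return n;
}

/* Write 'A', 0xFF, 0x1A, 'B' to a temporary binary file and count them.
 * Returns -1 if the temporary file cannot be created or written. */
long demo_ff_and_sub_are_data(void)
{
    static const unsigned char bytes[] = { 'A', 0xFF, 0x1A, 'B' };
    FILE *fp = tmpfile();       /* opened in binary update mode ("wb+") */
    long n;

    if (fp == NULL)
        return -1;
    if (fwrite(bytes, 1, sizeof bytes, fp) != sizeof bytes) {
        fclose(fp);
        return -1;
    }
    rewind(fp);
    n = count_bytes(fp);
    fclose(fp);
    return n;
}
```

Storing the result of fgetc() in a char instead of an int is what makes 0xFF look like end of file on platforms where char is signed.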
On Fri, Jan 26, 2018 at 10:35 AM, Tim Streater wrote:
> On 26 Jan 2018, at 18:12, Keith Medcalf wrote:
>
> > Actually, EOF (0xFF) *is* part of a text file, and is the byte in an
> ASCII
> > byte-stream that indicates end-of-file.
>
> First I've heard of that. Which systems did that then? EOF is
On 26 Jan 2018, at 18:12, Keith Medcalf wrote:
> Actually, EOF (0xFF) *is* part of a text file, and is the byte in an ASCII
> byte-stream that indicates end-of-file.
First I've heard of that. Which systems did that then? EOF is normally
indicated by the file system, not by file data.
--
Chee
On 1/26/18, 12:31 PM, "sqlite-users on behalf of J Decker"
wrote:
> ctrl-z was end of file text character in DOS (wrote char 26; not FF)
DOS wasn't an operating system.
___
sqlite-users mailing list
sqlite-users@mailinglists.sqlite.org
http://maili
On Fri, Jan 26, 2018 at 10:22 AM, Peter Da Silva <
peter.dasi...@flightaware.com> wrote:
> On 1/26/18, 12:12 PM, "sqlite-users on behalf of Keith Medcalf" <
> sqlite-users-boun...@mailinglists.sqlite.org on behalf of
> kmedc...@dessus.com> wrote:
> > Actually, EOF (0xFF) *is* part of a text file,
On 1/26/18, 12:12 PM, "sqlite-users on behalf of Keith Medcalf"
wrote:
> Actually, EOF (0xFF) *is* part of a text file, and is the byte in an ASCII
> byte-stream that indicates end-of-file. In the "old days" the bytes
> following the last-byte in a stream and the end of a storage block
> (se
:) ).
---
The fact that there's a Highway to Hell but only a Stairway to Heaven says a
lot about anticipated traffic volume.
>-Original Message-
>From: sqlite-users [mailto:sqlite-users-
>boun...@mailinglists.sqlite.org] On Behalf Of Peter Da Silva
>Sent: Friday, 26 Janua
On Fri, Jan 26, 2018 at 5:55 AM, Peter Da Silva <
peter.dasi...@flightaware.com> wrote:
> What is the goal of this discussion? Changing the string terminator SQLite
> uses? I think it's almost 50 years too late for that, but I'm sure that if
> Unicode and UTF8 had been a thing in 1970 then C would
On 1/26/18, 8:24 AM, "sqlite-users on behalf of Gary R. Schmidt"
wrote:
> But how would you differentiate EOF??? (Let me guess, 0. :-) )
End of file is not part of the contents of the file or a string. It's metadata.
On 27/01/2018 00:55, Peter Da Silva wrote:
What is the goal of this discussion? Changing the string terminator SQLite
uses? I think it's almost 50 years too late for that, but I'm sure that if
Unicode and UTF8 had been a thing in 1970 then C would have selected FF as the
string terminator.
But
What is the goal of this discussion? Changing the string terminator SQLite
uses? I think it's almost 50 years too late for that, but I'm sure that if
Unicode and UTF8 had been a thing in 1970 then C would have selected FF as the
string terminator.
J Decker wrote:
> U+009C 156 String Terminator ST
"ST is used as the closing delimiter of a control string opened by
APPLICATION PROGRAM COMMAND (APC), DEVICE CONTROL STRING (DCS),
OPERATING SYSTEM COMMAND (OSC), PRIVACY MESSAGE (PM), or START OF
STRING (SOS)."
Regards,
Clemens
https://en.wikipedia.org/wiki/List_of_Unicode_characters#Control_codes
Even the Control codes within unicode aren't FF.
U+009C 156 String Terminator ST
literal bytes \xC2\x9c are string terminator ... Was thinking that like
APC and ST were higher than that... more in the range of 0xF8-0xFF
On
NUL is a valid utf8 character
but FF is never valid. (would be like a 36 bit length specification)
and practically anything more than F8 is invalid utf8 character.
Other than BOM
https://en.wikipedia.org/wiki/Byte_order_mark#UTF-8
EF BB BF 239 187 191
// EF - 80 | 3b - 80 | 3f
( 0xfeff )
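The byte arithmetic above can be checked with a small encoder. This is only a sketch: utf8_encode is a hypothetical helper, not an SQLite function, and it follows the current UTF-8 definition (code points up to U+10FFFF, surrogates excluded), under which the lead byte never exceeds 0xF4.

```c
#include <stddef.h>
#include <stdint.h>

/* Encode one code point as UTF-8 into out[].
 * Returns the number of bytes written (1..4), or 0 for a surrogate
 * or a value above U+10FFFF. */
size_t utf8_encode(uint32_t cp, unsigned char out[4])
{
    if (cp < 0x80) {                    /* 0xxxxxxx, includes NUL */
        out[0] = (unsigned char)cp;
        return 1;
    }
    if (cp < 0x800) {                   /* 110xxxxx 10xxxxxx */
        out[0] = (unsigned char)(0xC0 | (cp >> 6));
        out[1] = (unsigned char)(0x80 | (cp & 0x3F));
        return 2;
    }
    if (cp < 0x10000) {                 /* 1110xxxx 10xxxxxx 10xxxxxx */
        if (cp >= 0xD800 && cp <= 0xDFFF)
            return 0;                   /* surrogates are not encodable */
        out[0] = (unsigned char)(0xE0 | (cp >> 12));
        out[1] = (unsigned char)(0x80 | ((cp >> 6) & 0x3F));
        out[2] = (unsigned char)(0x80 | (cp & 0x3F));
        return 3;
    }
    if (cp <= 0x10FFFF) {               /* 11110xxx + 3 continuation bytes */
        out[0] = (unsigned char)(0xF0 | (cp >> 18));
        out[1] = (unsigned char)(0x80 | ((cp >> 12) & 0x3F));
        out[2] = (unsigned char)(0x80 | ((cp >> 6) & 0x3F));
        out[3] = (unsigned char)(0x80 | (cp & 0x3F));
        return 4;
    }
    return 0;
}
```

Encoding 0xFEFF yields EF BB BF, and encoding U+009C yields C2 9C, matching the values quoted in this message. No input produces a byte in the range F5..FF.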
Many W
On 29 Jun 2017 at 08:01, Eric Grange wrote:
>> The sender, however, could be lying, and this needs to be considered
>
> This is an orthogonal problem: if the sender is sending you data that is
> not what it should be, then he could just as well be sending you
> well-encoded and well-formed but in
> The sender, however, could be lying, and this needs to be considered
This is an orthogonal problem: if the sender is sending you data that is
not what it should be, then he could just as well be sending you
well-encoded and well-formed but invalid data, or malware, or
confidential/personal data
On 28 Jun 2017 at 14:20, Rowan Worth wrote:
> On 27 June 2017 at 18:42, Eric Grange wrote:
>
>> So while in theory all the scenarios you describe are interesting, in
>> practice seeing an utf-8 BOM provides an extremely
>> high likeliness that a file will indeed be utf-8. Not always, but a memor
On Tue, Jun 27, 2017 at 4:18 AM, Richard Hipp wrote:
> The CSV import feature of the SQLite command-line shell expects to
> find UTF-8. It does not understand other encodings, and I have no
> plans to add converters for alternative encodings any time soon.
>
> The latest version of trunk skips ov
Thank you.
From: sqlite-users on behalf of
Richard Hipp
Sent: Tuesday, June 27, 2017 5:18:51 AM
To: SQLite mailing list
Subject: Re: [sqlite] UTF8-BOM not disregarded in CSV import
The CSV import feature of the SQLite command-line shell expects to
find UTF-8
The CSV import feature of the SQLite command-line shell expects to
find UTF-8. It does not understand other encodings, and I have no
plans to add converters for alternative encodings any time soon.
The latest version of trunk skips over a UTF-8 BOM at the beginning of
the input file.
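For readers who need the same behaviour in their own import code, skipping a leading UTF-8 BOM can be sketched as follows. This is an assumption-laden illustration, not the code in the SQLite shell; utf8_bom_length is a hypothetical name.

```c
#include <stddef.h>
#include <string.h>

/* Return 3 if the buffer starts with the UTF-8 encoding of U+FEFF
 * (EF BB BF), otherwise 0. The caller starts parsing at buf + result. */
size_t utf8_bom_length(const unsigned char *buf, size_t len)
{
    static const unsigned char bom[3] = { 0xEF, 0xBB, 0xBF };

    if (len >= 3 && memcmp(buf, bom, 3) == 0)
        return 3;
    return 0;
}
```

Only the very start of the input is examined, so a U+FEFF that appears later in the data is left untouched.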
--
D. Richa
Hello,
On 2017-06-26 17:26, Scott Robison wrote:
+1
FAQ quote:
Q: When a BOM is used, is it only in 16-bit Unicode text?
A: No, a BOM can be used as a signature no matter how the Unicode
text is transformed: UTF-16, UTF-8, or UTF-32.
Q: How I should deal with BOMs?
A: Here are some g
On 2017-06-26 15:01, jose isaias cabrera wrote:
I have made a decision to always include the BOM in all my text files
whether they are UTF8, UTF16 or UTF32 little or big endian. I think
all of us should also.
I'm sorry, if I introduced ambiguity, but I had described SQLite's and
SQLite shell'
On Jun 26, 2017 9:02 AM, "Simon Slavin" wrote:
There is no convention for "This software understands both UTF-16BE and
UTF-16LE but nothing else.". If it handles any BOMs, it should handle all
five. However, it can handle them by identifying, for example, UTF-32BE
and returning an error indicat
I didn’t mean to imply you had to scan the whole content for a BOM, but rather
for illegal characters in the absence of a BOM.
On 6/26/17, 10:02 AM, "sqlite-users on behalf of Simon Slavin"
wrote:
Folks, I’m sorry to interrupt but I’ve just woken up to 11 posts in this
thread and I see a
On Jun 26, 2017 4:05 AM, "Rowan Worth" wrote:
On 26 June 2017 at 16:55, Scott Robison wrote:
> Byte Order Mark isn't perfectly descriptive when used with UTF-8. Neither
> is dialing a cell phone. Language evolves.
>
It's not descriptive in the slightest because UTF-8's byte order is
*specified
Folks, I’m sorry to interrupt but I’ve just woken up to 11 posts in this thread
and I see a lot of inaccurate 'facts' posted here. Rather than pick up on
statements in individual posts (which would unfairly pick on some people as
being less accurate than others) I’d like to post facts straight
Just occurred to me: another problem with the BOM is that some people who are
*not* writing UTF-8 are cargo-culting the BOM in anyway. So you may have to
scan the whole file to see if it’s really UTF-8 anyway.
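Such a scan is not much code. The following is a minimal sketch of a whole-buffer check, assuming the current UTF-8 definition (no overlong forms, no surrogates, nothing above U+10FFFF); utf8_is_valid is a hypothetical name and is not part of SQLite.

```c
#include <stddef.h>

/* Return 1 if buf[0..len) is well-formed UTF-8, otherwise 0. */
int utf8_is_valid(const unsigned char *buf, size_t len)
{
    size_t i = 0;

    while (i < len) {
        unsigned char b = buf[i];
        size_t need;            /* continuation bytes expected */
        unsigned long cp, min;  /* decoded value and smallest legal value */

        if (b < 0x80) {                         /* ASCII, including NUL */
            i++;
            continue;
        } else if (b >= 0xC2 && b <= 0xDF) {    /* 2-byte lead */
            need = 1; cp = b & 0x1F; min = 0x80;
        } else if ((b & 0xF0) == 0xE0) {        /* 3-byte lead */
            need = 2; cp = b & 0x0F; min = 0x800;
        } else if (b >= 0xF0 && b <= 0xF4) {    /* 4-byte lead */
            need = 3; cp = b & 0x07; min = 0x10000;
        } else {
            return 0;   /* stray continuation byte, C0/C1, or F5..FF */
        }

        if (len - i <= need)
            return 0;                           /* truncated sequence */
        for (size_t k = 1; k <= need; k++) {
            unsigned char c = buf[i + k];
            if ((c & 0xC0) != 0x80)
                return 0;                       /* not a continuation byte */
            cp = (cp << 6) | (c & 0x3F);
        }
        if (cp < min || cp > 0x10FFFF || (cp >= 0xD800 && cp <= 0xDFFF))
            return 0;                           /* overlong, too large, surrogate */
        i += need + 1;
    }
    return 1;
}
```

Text in a legacy 8-bit encoding that contains accented letters usually fails this check quickly, because a high byte followed by a plain ASCII letter is not a valid sequence. Pure ASCII passes regardless of what the sender intended.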
You’re better off just assuming UTF-8 everywhere, generating an error (and
backing ou
At the bottom...
-Original Message-
From: Eric Grange
Sent: Monday, June 26, 2017 3:09 AM
To: SQLite mailing list
Subject: Re: [sqlite] UTF8-BOM not disregarded in CSV import
Alas, there is no end in sight to the pain for the Unicode decision to not
make the BOM compulsory for UTF-8
On 6/26/17, 2:09 AM, "sqlite-users on behalf of Eric Grange"
wrote:
> Alas, there is no end in sight to the pain for the Unicode decision to not
> make the BOM compulsory for UTF-8.
It’s not actually providing any “byte order” information. It’s only used for
round-tripping conversion from oth
On 6/26/17 3:09 AM, Eric Grange wrote:
Alas, there is no end in sight to the pain for the Unicode decision to not
make the BOM compulsory for UTF-8.
Making it optional or non-necessary basically made every single text file
ambiguous, with non-trivial heuristics and implicit conventions required
>Easily solved by never including a superfluous BOM in UTF-8 text
And that easy option has worked beautifully for 20 years... not.
Yes, BOM is a misnomer, yes it "wastes" 3 bytes, but in the real world
"text files" have a variety of encodings.
No BOM = you have to fire a whole suite of heuristics
On 26 June 2017 at 16:55, Scott Robison wrote:
> Byte Order Mark isn't perfectly descriptive when used with UTF-8. Neither
> is dialing a cell phone. Language evolves.
>
It's not descriptive in the slightest because UTF-8's byte order is
*specified by the encoding*.
I'm not advocating one way
On Jun 25, 2017 1:16 PM, "Cezary H. Noweta" wrote:
Certainly, there are no objections to extend an import's functionality
in such a way that it ignores the initial 0xFEFF. However, an import
should allow ZWNBSP as the first character, in its basic form, to be
conforming to the standard.
If we'
On Jun 26, 2017 1:47 AM, "Rowan Worth" wrote:
On 26 June 2017 at 15:09, Eric Grange wrote:
> Alas, there is no end in sight to the pain for the Unicode decision to not
> make the BOM compulsory for UTF-8.
>
UTF-8 is byte oriented. The very concept of byte order is nonsense in this
context as t
On 26 June 2017 at 15:09, Eric Grange wrote:
> Alas, there is no end in sight to the pain for the Unicode decision to not
> make the BOM compulsory for UTF-8.
>
UTF-8 is byte oriented. The very concept of byte order is nonsense in this
context as there is no multi-byte storage primitives to worr
On Sun, Jun 25, 2017 at 12:16 PM, Cezary H. Noweta
wrote:
> Hello,
>
>
> The standard says: ``Only UTF-16/32 (even not UTF-16/32LE/BE) encoding
> forms can contain BOM''. Let's conform to this.
>
>
I concur with that.
Since UTF-8 is only bytes; what would a BOM even change? certainly longer
val
Alas, there is no end in sight to the pain for the Unicode decision to not
make the BOM compulsory for UTF-8.
Making it optional or non-necessary basically made every single text file
ambiguous, with non-trivial heuristics and implicit conventions required
instead, resulting in character corruptio
Hello,
On 2017-06-23 22:12, Mahmoud Al-Qudsi wrote:
I think you and I are on the same page here, Clemens? I abhor the
BOM, but the question is whether or not SQLite will cater to the fact
that the bigger names in the industry appear hell-bent on shoving it
in users’ documents by default.
Give
” commands, perhaps leeway
can be shown in breaking with standards for the sake of compatibility and
sanity?
Mahmoud
From: Clemens Ladisch
Sent: Friday, June 23, 2017 2:25 AM
To: sqlite-users@mailinglists.sqlite.org
Subject: Re: [sqlite] UTF8-BOM not disregarded in CSV import
Mahmoud Al-Qudsi wrote
Mahmoud Al-Qudsi wrote:
> with `.import ……`, SQLite3 includes a BOM (UTF-8) as part of the first
> column of the first record.
The Unicode Standard 9.0 says in section 3.10:
| When represented in UTF-8, the byte order mark turns into the byte
| sequence <EF BB BF>. Its usage at the beginning of a UTF-8 data
Hello all,
Let me start off with my apologies if this is a documented issue; I did search
the fossil tickets but did not find anything for “BOM”.
As of SQLite 3.19.3, under `.mode csv` and with `.import ……`, SQLite3 includes
a BOM (UTF-8) as part of the first column of the first record.
IMHO,
Vlczech - Tomáš Volf wrote:
> CREATE TABLE people (
> firstname TEXT,
> surname TEXT
> );
> INSERT INTO people VALUES('Tomáš', 'Surname');
>
> "SELECT * FROM people WHERE firstname LIKE ?"
> For binding I use: sqlite3_bind_text(stmt, 1, name.c_str(), -1,
> SQLITE_STATIC);
SQLITE_STATIC works only i
Hello,
I have some strange behaviour in a LIKE query in SQLite. Let's see a very
simplified example:
Let's have a table
CREATE TABLE people (
firstname TEXT,
surname TEXT
);
and in it following data:
INSERT INTO people VALUES('Tomáš', 'Surname');
created by sqlite3_exec() function.
Then I use
Sorry for the "spam"; I hope the previous HTML form of the mail (with bullet lists)
is readable. To be sure, and for better readability in non-HTML clients, here is a plain
text version of the previous mail:
Hello,
I have some strange behaviour in a LIKE query in SQLite. Let's see a very
simplified examp
William Kyngesburye wrote:
> So, sqlite supports UTF8 directly - UTF8 in, UTF8 out.
No. SQLite supports Unicode internally. The APIs let you supply and
receive Unicode strings in UTF8 and UTF16. The actual encoding
serialized to disk depends on a
On Oct 27, 2008, at 10:23 AM, MikeW wrote:
> William Kyngesburye <[EMAIL PROTECTED]> writes:
>
>>
>> Does SQlite support UTF8 directly? Or is this what the ICU extension
>> is for? Does the sqlite3 shell program support UTF8?
>>
>> There is this spatialite extension which includes a modified sql
William Kyngesburye <[EMAIL PROTECTED]> writes:
>
> Does SQlite support UTF8 directly? Or is this what the ICU extension
> is for? Does the sqlite3 shell program support UTF8?
>
> There is this spatialite extension which includes a modified sqlite3
> shell program that "implements full UNI
Does SQlite support UTF8 directly? Or is this what the ICU extension
is for? Does the sqlite3 shell program support UTF8?
There is this spatialite extension which includes a modified sqlite3
shell program that "implements full UNICODE support". So I'm a little
confused.
-
William Kyn
Hi,
I think you should provide sample data in order to dig out the problem.
Regards,
2006/9/20, 卢炎君 <[EMAIL PROTECTED]>:
Hi guys
First of all, all data is stored as UTF-8 in my DB.
Second, when I used the sqlite browser tool (from sourceforge) to browse my
DB, the result of chine
Hi guys
First of all, all data is stored as UTF-8 in my DB.
Second, when I used the sqlite browser tool (from sourceforge) to browse my
DB, the Chinese characters were displayed correctly. Then I wrote a function which
just calls sqlite3_column_text inside it. Demo like below:
const char
Cesar David Rodas Maldonado wrote:
Thanks Daniel!
Now I have another question! Is there any way to serialize all the data,
giving a preference to SELECT and a delay to the INSERT?
I am building a Small Library in C & SQLite that will be under GPL, is
something like Lucene. Please help me how to give a p
Thanks Daniel!
Now I have another question! Is there any way to serialize all the data, giving a
preference to SELECT and a delay to the INSERT?
I am building a Small Library in C & SQLite that will be under GPL, is
something like Lucene. Please help me how to give a preference to SELECT and
a delay to INS
Cesar David Rodas Maldonado wrote:
> I wanted to ask how can i know if a given text is UTF8 or ISO-8859-1?
Well, there might be a way if you only want to know whether the text is UTF-8
or ISO-8859-1 (meaning that you already know it is one or the other).
There are some invalid UTF-8 sequences. If yo
Thanks peter! :D
On 7/26/06, Peter Cunderlik <[EMAIL PROTECTED]> wrote:
> I wanted to ask how can i know if a given text is UTF8 or ISO-8859-1?
If you need conversions, the simplest would be to do it manually using
look-up tables. AFAIK none of the Latin-1 characters take more than 2
bytes in
I wanted to ask how can i know if a given text is UTF8 or ISO-8859-1?
If you need conversions, the simplest would be to do it manually using
look-up tables. AFAIK none of the Latin-1 characters take more than 2
bytes in UTF-8, so having 2*256 bytes long table won't hurt.
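A look-up table is one option. Because ISO-8859-1 bytes map one-to-one onto U+0000..U+00FF, the two output bytes can also be computed directly, which is what this sketch does instead of using a table. The name latin1_to_utf8 is illustrative, and the caller is assumed to supply an output buffer of at least 2*len bytes.

```c
#include <stddef.h>

/* Convert len bytes of ISO-8859-1 to UTF-8.
 * out must have room for 2*len bytes. Returns the number of bytes written. */
size_t latin1_to_utf8(const unsigned char *in, size_t len, unsigned char *out)
{
    size_t o = 0;

    for (size_t i = 0; i < len; i++) {
        unsigned char c = in[i];
        if (c < 0x80) {
            out[o++] = c;                               /* ASCII is unchanged */
        } else {
            out[o++] = (unsigned char)(0xC0 | (c >> 6));   /* lead: C2 or C3 */
            out[o++] = (unsigned char)(0x80 | (c & 0x3F)); /* continuation */
        }
    }
    return o;
}
```

For example, the Latin-1 byte FC (ü) becomes the two bytes C3 BC.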
If you want to decode s
I'm sorry! English is not my first language!! :D
I wanted to ask how can i know if a given text is UTF8 or ISO-8859-1?
Thanks and please forgive me for my english! :D
On 7/26/06, Cory Nelson <[EMAIL PROTECTED]> wrote:
ASCII is completely valid UTF-8, so no conversion is necessary.
On 7/26/06
ASCII is completely valid UTF-8, so no conversion is necessary.
On 7/26/06, Cesar David Rodas Maldonado <[EMAIL PROTECTED]> wrote:
How can i know if a given text is UTF8 or ascii? and how can i convert
between ascii to UTF8?
--
Cory Nelson
http://www.int64.org
How can i know if a given text is UTF8 or ascii? and how can i convert
between ascii to UTF8?
SQLite will store anything UTF-8 or ANSI as long as it's null terminated
and escaped for '. The only place where encoding makes a difference is
in functions like length.
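To illustrate why the encoding matters for length, here is a small sketch that counts characters rather than bytes. It assumes the input is valid, NUL-terminated UTF-8; utf8_count_chars is a hypothetical helper, not an SQLite API.

```c
#include <stddef.h>

/* Count code points in a NUL-terminated UTF-8 string by skipping
 * continuation bytes (those of the form 10xxxxxx). */
size_t utf8_count_chars(const char *s)
{
    size_t n = 0;

    for (; *s != '\0'; s++) {
        if (((unsigned char)*s & 0xC0) != 0x80)
            n++;                /* lead byte or ASCII starts a character */
    }
    return n;
}
```

For a string such as "Günther" stored as UTF-8, strlen() reports 8 bytes while this function reports 7 characters, because ü occupies two bytes.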
Will
Steven Van Ingelgem wrote:
If I have a table like:
CREATE TABLE routing (
FIELD1 VARCHAR(40)
);
what if I get a stri
If I have a table like:
CREATE TABLE routing (
FIELD1 VARCHAR(40)
);
what if I get a string which is in ANSI = 40chars, but in UTF8 > 40chars?
(for example because it uses ü and such characters...)
Does it get stored correctly?
G00fy, (aka KaReL, aka Steven)
Main Webpage : http://komma.cjb.