[sqlite] sqlite3 and Unicode

2015-05-05 Thread Teg
Hello Peter,

I use UTF-8 exclusively in the engine parts my windows applications. I
just translate to UTF-16 for the GUI layer. In that way, no matter
what platform I support, the DB's remain the same.

C



Tuesday, May 5, 2015, 3:29:54 PM, you wrote:

PH> Thanks for all the replies.  I'm still confused on this.  It sounds like
PH> maybe UTF8 is the answer on OSX but UTF-16LE on Windows.

PH> Whatever the answer to that, it seems there isn't a way to control the
PH> encoding within sqlite3 so I guess I won't worry about it.

PH> Pete
PH> lcSQL Software 
PH> Home of lcStackBrowser  and
PH> SQLiteAdmin 
PH> ___
PH> sqlite-users mailing list
PH> sqlite-users at mailinglists.sqlite.org
PH> http://mailinglists.sqlite.org/cgi-bin/mailman/listinfo/sqlite-users



-- 
 Tegmailto:Teg at djii.com



[sqlite] sqlite3 and Unicode

2015-05-05 Thread Luuk
On 5-5-2015 12:25, Clemens Ladisch wrote:
> Staffan Tylen wrote:
>> I must admit that I'm a bit confused here. If I'm not wrong UTF-8 differs
>> from ascii when the value is higher than '7f'x, but storing data in sqlite
>> as text with character values beteen 'x80'x and 'ff'x seems to be no
>> problem. I previously thought that this could only be done in blob format.
>>
>> create table t (a);
>> insert into t values(cast(x'ff' as text));
>> select a,length(a),hex(a) from t;
>>   |1|FF
>>
>> My conclusion is that storing single-byte characters of any value is
>> allowed, is this true?
>
> SQLite assumes that all strings you give it are encoded in UTF-8, and
> does not actually check the encoding.  It gives you the same string
> back, so you could, in theory, use a different encoding, as long as you
> do not use any database string processing functions.  In practice, I
> would not recommend this.
>
>> At the bottom it says: *Note to Windows users:* The encoding used for the
>> filename argument of sqlite3_open() and sqlite3_open_v2() must be UTF-8,
>> not whatever codepage is currently defined. Filenames containing
>> international characters must be converted to UTF-8 prior to passing them
>> into sqlite3_open() or sqlite3_open_v2().
>>
>> So what does "must be converted" mean? I don't know how sqlite3.exe works
>> here but if I do
>>
>> sqlite3  ?.db
>>
>> where '?' as we've seen is '82'x it happily creates a file with a
>> non-displayable character in the first position that seems to be 1 byte
>> long.
>
> The shell assumes that its arguments are UTF-8, and gives the filename
> unchanged to sqlite3_open*().  When you've entered the filename in the
> Windows console with the default settings, it is not encoded in UTF-8.
>
>

now it becomes time that windows will do some things with UTF-8 
(sigh) ;-)

/me currently watching
"Your PC will restart serveral times. Sit back and relax"
while installing updates for technical preview to Windows 10




[sqlite] sqlite3 and Unicode

2015-05-05 Thread Luuk
On 5-5-2015 11:25, Kees Nuyt wrote:
> On Tue, 05 May 2015 10:50:00 +0200, Clemens Ladisch  
> wrote:
>
>> Luuk wrote:
>>> on Windows 7:
>>> C:\temp>sqlite3.exe encoding.sqlite
>>> sqlite> select * from test;
>>> ??n
>>
>> The Windows console does not support UTF-8 with the default settings and
>> the C stdio functions.  Any data you entered in the console is not
>> encoded correctly.
>
> I totally agree. It depends on the shell what representation is used,
> some MS Windows codepage or UTF-8.
>
> The example below is done in bash on MS Windows 7 (using Mobaxterm),
> with a (outdated) Windows sqlite3 executable.
> The result of .dump contains UTF-8.
>
>> sqlite3 t.sqlite
> SQLite version 3.7.13 2012-06-11 02:05:22
> Enter ".help" for instructions
> Enter SQL statements terminated with a ";"
> sqlite> create table t1 (id INTEGER PRIMARY KEY NOT NULL, tx TEXT);
> sqlite> insert into t1 values (1,'??n');
> sqlite> .dump
> PRAGMA foreign_keys=OFF;
> BEGIN TRANSACTION;
> CREATE TABLE t1 (id INTEGER PRIMARY KEY NOT NULL, tx TEXT);
> INSERT INTO "t1" VALUES(1,'??n');
> COMMIT;
> sqlite> .q
>
>> sqlite3 t.sqlite .dump| grep INSERT | od -t x1
> 000 49 4e 53 45 52 54 20 49 4e 54 4f 20 22 74 31 22
> 020 20 56 41 4c 55 45 53 28 31 2c 27 c3 a9 c3 a9 6e
> 040 27 29 3b 0a
> 044
>

OK, thanks


[sqlite] sqlite3 and Unicode

2015-05-05 Thread Peter Haworth
Thanks for all the replies.  I'm still confused on this.  It sounds like
maybe UTF8 is the answer on OSX but UTF-16LE on Windows.

Whatever the answer to that, it seems there isn't a way to control the
encoding within sqlite3 so I guess I won't worry about it.

Pete
lcSQL Software 
Home of lcStackBrowser  and
SQLiteAdmin 


[sqlite] sqlite3 and Unicode

2015-05-05 Thread Clemens Ladisch
Staffan Tylen wrote:
> I must admit that I'm a bit confused here. If I'm not wrong UTF-8 differs
> from ascii when the value is higher than '7f'x, but storing data in sqlite
> as text with character values beteen 'x80'x and 'ff'x seems to be no
> problem. I previously thought that this could only be done in blob format.
>
> create table t (a);
> insert into t values(cast(x'ff' as text));
> select a,length(a),hex(a) from t;
>  |1|FF
>
> My conclusion is that storing single-byte characters of any value is
> allowed, is this true?

SQLite assumes that all strings you give it are encoded in UTF-8, and
does not actually check the encoding.  It gives you the same string
back, so you could, in theory, use a different encoding, as long as you
do not use any database string processing functions.  In practice, I
would not recommend this.

> At the bottom it says: *Note to Windows users:* The encoding used for the
> filename argument of sqlite3_open() and sqlite3_open_v2() must be UTF-8,
> not whatever codepage is currently defined. Filenames containing
> international characters must be converted to UTF-8 prior to passing them
> into sqlite3_open() or sqlite3_open_v2().
>
> So what does "must be converted" mean? I don't know how sqlite3.exe works
> here but if I do
>
> sqlite3  ?.db
>
> where '?' as we've seen is '82'x it happily creates a file with a
> non-displayable character in the first position that seems to be 1 byte
> long.

The shell assumes that its arguments are UTF-8, and gives the filename
unchanged to sqlite3_open*().  When you've entered the filename in the
Windows console with the default settings, it is not encoded in UTF-8.


Regards,
Clemens


[sqlite] sqlite3 and Unicode

2015-05-05 Thread Staffan Tylen
>
>
>>> I think it assumes UTF8 in both cases.
>>>
>>
>> when using '.dump', it does not create a UTF8 file on Windows 7 (sQlite
>> version 3.8.8.3)
>>
>
> It's UTF16-LE data.
>
>
>
I must admit that I'm a bit confused here. If I'm not wrong UTF-8 differs
from ascii when the value is higher than '7f'x, but storing data in sqlite
as text with character values beteen 'x80'x and 'ff'x seems to be no
problem. I previously thought that this could only be done in blob format.

create table t (a);
insert into t values(cast(x'ff' as text));
select a,length(a),hex(a) from t;
 |1|FF

My conclusion is that storing single-byte characters of any value is
allowed, is this true?

Trying to find more info in the sqlite doc I found this page:
https://www.sqlite.org/c3ref/open.html

At the bottom it says: *Note to Windows users:* The encoding used for the
filename argument of sqlite3_open() and sqlite3_open_v2() must be UTF-8,
not whatever codepage is currently defined. Filenames containing
international characters must be converted to UTF-8 prior to passing them
into sqlite3_open() or sqlite3_open_v2().

So what does "must be converted" mean? I don't know how sqlite3.exe works
here but if I do

sqlite3  ?.db

where '?' as we've seen is '82'x it happily creates a file with a
non-displayable character in the first position that seems to be 1 byte
long.

Still confused.
Staffan


[sqlite] sqlite3 and Unicode

2015-05-05 Thread Kees Nuyt
On Tue, 05 May 2015 10:50:00 +0200, Clemens Ladisch  
wrote:

>Luuk wrote:
>> on Windows 7:
>> C:\temp>sqlite3.exe encoding.sqlite
>> sqlite> select * from test;
>> ??n
>
>The Windows console does not support UTF-8 with the default settings and
>the C stdio functions.  Any data you entered in the console is not
>encoded correctly.

I totally agree. It depends on the shell what representation is used,
some MS Windows codepage or UTF-8. 

The example below is done in bash on MS Windows 7 (using Mobaxterm), 
with a (outdated) Windows sqlite3 executable.
The result of .dump contains UTF-8.

> sqlite3 t.sqlite
SQLite version 3.7.13 2012-06-11 02:05:22
Enter ".help" for instructions
Enter SQL statements terminated with a ";"
sqlite> create table t1 (id INTEGER PRIMARY KEY NOT NULL, tx TEXT);
sqlite> insert into t1 values (1,'??n');
sqlite> .dump
PRAGMA foreign_keys=OFF;
BEGIN TRANSACTION;
CREATE TABLE t1 (id INTEGER PRIMARY KEY NOT NULL, tx TEXT);
INSERT INTO "t1" VALUES(1,'??n');
COMMIT;
sqlite> .q

> sqlite3 t.sqlite .dump| grep INSERT | od -t x1
000 49 4e 53 45 52 54 20 49 4e 54 4f 20 22 74 31 22
020 20 56 41 4c 55 45 53 28 31 2c 27 c3 a9 c3 a9 6e
040 27 29 3b 0a
044

-- 
Regards,
Kees Nuyt



[sqlite] sqlite3 and Unicode

2015-05-05 Thread Clemens Ladisch
Luuk wrote:
> on Windows 7:
> C:\temp>sqlite3.exe encoding.sqlite
> sqlite> select * from test;
> ??n

The Windows console does not support UTF-8 with the default settings and
the C stdio functions.  Any data you entered in the console is not
encoded correctly.


Regards,
Clemens


[sqlite] sqlite3 and Unicode

2015-05-05 Thread Jean-Christophe Deschamps
At 09:22 05/05/2015, you wrote:

>On 4-5-2015 20:54, Richard Hipp wrote:
>>On 5/4/15, Peter Haworth  wrote:
>>>When using the .dump command with .output to a filename, what 
>>>encoding does
>>>sqlite3 for the file? Same as the database encoding?  Is it possible to
>>>change whatever encoding is used?
>>>
>>>Similarly, when using the .import command, does sqlite3 assume the input
>>>file is the same encoding as the database?
>>
>>I think it assumes UTF8 in both cases.
>
>when using '.dump', it does not create a UTF8 file on Windows 7 
>(sQlite version 3.8.8.3)

It's UTF16-LE data.


--
jcd at antichoc.net  



[sqlite] sqlite3 and Unicode

2015-05-05 Thread Luuk
On 5-5-2015 09:22, Luuk wrote:
> On 4-5-2015 20:54, Richard Hipp wrote:
>> On 5/4/15, Peter Haworth  wrote:
>>> When using the .dump command with .output to a filename, what
>>> encoding does
>>> sqlite3 for the file? Same as the database encoding?  Is it possible to
>>> change whatever encoding is used?
>>>
>>> Similarly, when using the .import command, does sqlite3 assume the input
>>> file is the same encoding as the database?
>>>
>>
>> I think it assumes UTF8 in both cases.
>>
>
> when using '.dump', it does not create a UTF8 file on Windows 7 (sQlite
> version 3.8.8.3)

or am i doing something wrong?

on Windows 7:
C:\temp>sqlite3.exe encoding.sqlite
SQLite version 3.8.8.3 2015-02-25 13:29:11
Enter ".help" for usage hints.
sqlite> select * from test;
een
??n
sqlite> .dump
PRAGMA foreign_keys=OFF;
BEGIN TRANSACTION;
CREATE TABLE test (x varchar(20));
INSERT INTO "test" VALUES('een');
INSERT INTO "test" VALUES('??n');
COMMIT;
sqlite> select x, length(x), hex(x) from test;
een|3|65656E
??n|3|82826E
sqlite> PRAGMA encoding;
UTF-8
sqlite>

The '82' is not a UTF-8 representation of the '?' (length=3 is correct)


[sqlite] sqlite3 and Unicode

2015-05-05 Thread Luuk
On 4-5-2015 20:54, Richard Hipp wrote:
> On 5/4/15, Peter Haworth  wrote:
>> When using the .dump command with .output to a filename, what encoding does
>> sqlite3 for the file? Same as the database encoding?  Is it possible to
>> change whatever encoding is used?
>>
>> Similarly, when using the .import command, does sqlite3 assume the input
>> file is the same encoding as the database?
>>
>
> I think it assumes UTF8 in both cases.
>

when using '.dump', it does not create a UTF8 file on Windows 7 (sQlite 
version 3.8.8.3)


[sqlite] sqlite3 and Unicode

2015-05-04 Thread Richard Hipp
On 5/4/15, Peter Haworth  wrote:
> When using the .dump command with .output to a filename, what encoding does
> sqlite3 for the file? Same as the database encoding?  Is it possible to
> change whatever encoding is used?
>
> Similarly, when using the .import command, does sqlite3 assume the input
> file is the same encoding as the database?
>

I think it assumes UTF8 in both cases.

-- 
D. Richard Hipp
drh at sqlite.org


[sqlite] sqlite3 and Unicode

2015-05-04 Thread Peter Haworth
When using the .dump command with .output to a filename, what encoding does
sqlite3 for the file? Same as the database encoding?  Is it possible to
change whatever encoding is used?

Similarly, when using the .import command, does sqlite3 assume the input
file is the same encoding as the database?

Thanks,

Pete
lcSQL Software 
Home of lcStackBrowser  and
SQLiteAdmin