On 2015/02/02 19:37, Peter Haworth wrote:
On Mon, Feb 2, 2015 at 9:00 AM, <sqlite-users-requ...@sqlite.org> wrote:

From: RSmith <rsm...@rsweb.co.za>
To: sqlite-users@sqlite.org
Subject: Re: [sqlite] Encoding question
Message-ID: <54cebb71.8060...@rsweb.co.za>
Content-Type: text/plain; charset=windows-1252; format=flowed

In short, the UTF-8 pragma setting /allows/ your data to be interpreted
as such. It doesn't /force/ it, nor does it magically /convert/
the data into UTF-8, and it most certainly does not under any
circumstances *guarantee* the UTF-8-ness of data. (It does, though,
guarantee that /IF/ you put valid UTF-8 data in there, it will be handled
and returned correctly.)
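
To illustrate the point, here is a minimal sketch in Python using the standard sqlite3 module. The table name `t` and the byte 0xE9 are arbitrary choices for the demonstration, not anything from the thread. It stores a byte sequence that is not valid UTF-8 in a TEXT column of a UTF-8 database and reads it back:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# The encoding pragma only has an effect before the first table is created.
conn.execute("PRAGMA encoding = 'UTF-8'")
conn.execute("CREATE TABLE t (x TEXT)")

# 0xE9 is 'é' in windows-1252 but is not a valid UTF-8 sequence on its own.
# CAST(blob AS TEXT) relabels the bytes as text; SQLite does not validate them.
conn.execute("INSERT INTO t VALUES (CAST(X'E9' AS TEXT))")

# Ask for raw bytes so Python does not try (and fail) to decode the value.
conn.text_factory = bytes
row = conn.execute("SELECT x, typeof(x) FROM t").fetchone()
print(row)
```

The stored value is reported as type text, yet it comes back as the same single byte that went in, so nothing was checked or converted along the way.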

Thanks for this and the other responses; that makes sense.  I suppose it's
similar to putting non-integer data into an INTEGER column.

This is in the context of an SQLite utility I sell, which I'm trying to make
Unicode-compatible, so I have no control over the data in the database; I just
have to interpret it as best I can.  I've seen that there are algorithms
out there that will detect different encodings, but it seems that they
are not 100% reliable.
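
One common best-effort approach, sketched here in Python under stated assumptions, is to try a strict UTF-8 decode first and fall back to a legacy code page only if that fails. The function name `decode_best_effort` and the choice of windows-1252 (`cp1252`) as the fallback are illustrative assumptions; the right fallback depends on where the data came from, and the fallback result is still a guess.

```python
def decode_best_effort(raw, fallback="cp1252"):
    """Return (text, encoding_used) for raw bytes read from a TEXT column."""
    try:
        # Strict decoding raises UnicodeDecodeError on any invalid sequence.
        return raw.decode("utf-8"), "utf-8"
    except UnicodeDecodeError:
        # Assumption: non-UTF-8 data is in a single-byte Western code page.
        # errors="replace" substitutes U+FFFD for bytes the code page lacks.
        return raw.decode(fallback, errors="replace"), fallback


print(decode_best_effort("Garçons".encode("utf-8")))
print(decode_best_effort(b"Gar\xe7ons"))
```

To feed this function from SQLite, the bytes would have to be fetched undecoded, for example by setting `text_factory = bytes` on a Python sqlite3 connection.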

I should also have mentioned that the question also covers table names,
column names, constraint names, etc., but I'll assume the same applies to
them as to the data.

Good news here is that if you do set the DB to UTF-8 (with the discussed pragma), all your table names, column names and database objects in general are very much UTF-8 enabled. You can name a table in Hebrew or Chinese without any issues and fill it with UTF-8 data (as long as your program takes care of adding the data as correct UTF-8, it will get it back as correct UTF-8).
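
From the program side, this can be checked with a short sketch such as the following, written in Python with the standard sqlite3 module. The in-memory database and the shortened sample text are assumptions made for the demonstration. It creates objects with non-ASCII names and reads the names and the data back through `sqlite_master` and `PRAGMA table_info`:

```python
import sqlite3

conn = sqlite3.connect(":memory:")  # new databases default to UTF-8
conn.execute('CREATE TABLE "Garçons" ("ID" INTEGER PRIMARY KEY, "пустынных" TEXT)')
conn.execute('CREATE INDEX "我能吞下" ON "Garçons" ("ID", "пустынных")')
conn.execute('INSERT INTO "Garçons" VALUES (1, ?)', ("Sîne klâwen durh die wolken",))

# Object names as stored in the schema table.
object_names = {r[0] for r in conn.execute("SELECT name FROM sqlite_master")}

# Column names: the second field of each table_info row is the name.
column_names = [r[1] for r in conn.execute('PRAGMA table_info("Garçons")')]

# The data itself, read back through a non-ASCII column name.
text = conn.execute('SELECT "пустынных" FROM "Garçons"').fetchone()[0]

print(object_names, column_names, text)
```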

Proof of concept: here is a script I quickly made that adds some poetry from different nations to a table called "Garçons" (Boys), with two columns named in Braille and Russian and an index named in Chinese, and queries it using other UTF-8 SQL statements. You should be able to copy-paste this and run it through another SQLite engine on any UTF-8 enabled DB:

(I hope the mail forum reproduces this right)


  -- Processing Script for File: E:\Documents\SQLiteScripts\UTF8_Test.sql
  -- Script Items: 7          Parameter Count: 0
  -- 2015-02-02 20:48:22.804  |  [Success]    Script Started...
  -- ================================================================================================

DROP TABLE IF EXISTS "Garçons";
CREATE TABLE "Garçons" (
  "ID" INTEGER PRIMARY KEY,
  "⠝⠙⠊⠞" TEXT,
  "пустынных" TEXT
);

CREATE INDEX "我能吞下" ON "Garçons" ("ID","⠝⠙⠊⠞");

INSERT INTO "Garçons" VALUES
 (1,'From the Anglo-Saxon Rune Poem (Rune version):',
'ᚠᛇᚻ᛫ᛒᛦᚦ᛫ᚠᚱᚩᚠᚢᚱ᛫ᚠᛁᚱᚪ᛫ᚷᛖᚻᚹᛦᛚᚳᚢᛗ
ᛋᚳᛖᚪᛚ᛫ᚦᛖᚪᚻ᛫ᛗᚪᚾᚾᚪ᛫ᚷᛖᚻᚹᛦᛚᚳ᛫ᛗᛁᚳᛚᚢᚾ᛫ᚻᛦᛏ᛫ᛞᚫᛚᚪᚾ
ᚷᛁᚠ᛫ᚻᛖ᛫ᚹᛁᛚᛖ᛫ᚠᚩᚱ᛫ᛞᚱᛁᚻᛏᚾᛖ᛫ᛞᚩᛗᛖᛋ᛫ᚻᛚᛇᛏᚪᚾ᛬')

,(2,'From Laȝamon''s Brut (The Chronicles of England, Middle English, West Midlands): ',
'An preost wes on leoden, Laȝamon was ihoten
He wes Leovenaðes sone -- liðe him be Drihten.
He wonede at Ernleȝe at æðelen are chirechen,
Uppen Sevarne staþe, sel þar him þuhte,
Onfest Radestone, þer he bock radde.')

,(3,'From the Tagelied of Wolfram von Eschenbach (Middle High German): ',
'Sîne klâwen durh die wolken sint geslagen,
er stîget ûf mit grôzer kraft,
ich sih in grâwen tägelîch als er wil tagen,
den tac, der im geselleschaft
erwenden wil, dem werden man,
den ich mit sorgen în verliez.
ich bringe in hinnen, ob ich kan.
sîn vil manegiu tugent michz leisten hiez.');


SELECT ID,"⠝⠙⠊⠞",Char(13)||"пустынных"
FROM "Garçons" WHERE "⠝⠙⠊⠞"<>'';

  -- ID    ⠝⠙⠊⠞    Char(13)||"пустынных"
  -- --    ----    ---------------------
  -- 1    From the Anglo-Saxon Rune Poem (Rune version):
ᚠᛇᚻ᛫ᛒᛦᚦ᛫ᚠᚱᚩᚠᚢᚱ᛫ᚠᛁᚱᚪ᛫ᚷᛖᚻᚹᛦᛚᚳᚢᛗ
ᛋᚳᛖᚪᛚ᛫ᚦᛖᚪᚻ᛫ᛗᚪᚾᚾᚪ᛫ᚷᛖᚻᚹᛦᛚᚳ᛫ᛗᛁᚳᛚᚢᚾ᛫ᚻᛦᛏ᛫ᛞᚫᛚᚪᚾ
ᚷᛁᚠ᛫ᚻᛖ᛫ᚹᛁᛚᛖ᛫ᚠᚩᚱ᛫ᛞᚱᛁᚻᛏᚾᛖ᛫ᛞᚩᛗᛖᛋ᛫ᚻᛚᛇᛏᚪᚾ᛬

  -- 2    From Laȝamon's Brut (The Chronicles of England, Middle English, West Midlands):
An preost wes on leoden, Laȝamon was ihoten
He wes Leovenaðes sone -- liðe him be Drihten.
He wonede at Ernleȝe at æðelen are chirechen,
Uppen Sevarne staþe, sel þar him þuhte,
Onfest Radestone, þer he bock radde.

  -- 3    From the Tagelied of Wolfram von Eschenbach (Middle High German):
Sîne klâwen durh die wolken sint geslagen,
er stîget ûf mit grôzer kraft,
ich sih in grâwen tägelîch als er wil tagen,
den tac, der im geselleschaft
erwenden wil, dem werden man,
den ich mit sorgen în verliez.
ich bringe in hinnen, ob ich kan.
sîn vil manegiu tugent michz leisten hiez.

  --    Item Stats:  Item No:           5             Query Size (Chars):  74
  --                 Result Columns:    3             Result Rows:         3
  --                 VM Work Steps:     35            Rows Modified:       0
  --                 Full Query Time:   0d 00h 00m and 00.001s
  --                 Query Result:      Success.
  -- ------------------------------------------------------------------------------------------------


SELECT "пустынных"
FROM "Garçons" WHERE "пустынных" LIKE '%ᚠᚱᚩᚠᚢᚱ%';

  -- пустынных
  -- ---------
  -- ᚠᛇᚻ᛫ᛒᛦᚦ᛫ᚠᚱᚩᚠᚢᚱ᛫ᚠᛁᚱᚪ᛫ᚷᛖᚻᚹᛦᛚᚳᚢᛗ
ᛋᚳᛖᚪᛚ᛫ᚦᛖᚪᚻ᛫ᛗᚪᚾᚾᚪ᛫ᚷᛖᚻᚹᛦᛚᚳ᛫ᛗᛁᚳᛚᚢᚾ᛫ᚻᛦᛏ᛫ᛞᚫᛚᚪᚾ
ᚷᛁᚠ᛫ᚻᛖ᛫ᚹᛁᛚᛖ᛫ᚠᚩᚱ᛫ᛞᚱᛁᚻᛏᚾᛖ᛫ᛞᚩᛗᛖᛋ᛫ᚻᛚᛇᛏᚪᚾ᛬

  --    Item Stats:  Item No:           6             Query Size (Chars):  71
  --                 Result Columns:    1             Result Rows:         1
  --                 VM Work Steps:     23            Rows Modified:       0
  --                 Full Query Time:   -- --- --- --- --.----
  --                 Query Result:      Success.
  -- ------------------------------------------------------------------------------------------------


SELECT "пустынных"
FROM "Garçons" WHERE "пустынных" LIKE '%tägelîch%';

  -- пустынных
  -- ---------
  -- Sîne klâwen durh die wolken sint geslagen,
er stîget ûf mit grôzer kraft,
ich sih in grâwen tägelîch als er wil tagen,
den tac, der im geselleschaft
erwenden wil, dem werden man,
den ich mit sorgen în verliez.
ich bringe in hinnen, ob ich kan.
sîn vil manegiu tugent michz leisten hiez.

  --    Item Stats:  Item No:           7             Query Size (Chars):  73
  --                 Result Columns:    1             Result Rows:         1
  --                 VM Work Steps:     23            Rows Modified:       0
  --                 Full Query Time:   -- --- --- --- --.----
  --                 Query Result:      Success.
  -- ------------------------------------------------------------------------------------------------

  --   Script Stats: Total Script Execution Time:     0d 00h 00m and 00.044s
  --                 Total Script Query Time:         0d 00h 00m and 00.002s
  --                 Total Database Rows Changed:     3
  --                 Total Virtual-Machine Steps:     261
  --                 Last executed Item Index:        7
  --                 Last Script Error:
  -- ------------------------------------------------------------------------------------------------

  -- 2015-02-02 20:48:22.824  |  [Success]    Script Success.
  -- 2015-02-02 20:48:22.825  |  [Success]    Transaction Rolled back.
  -- -------  DB-Engine Logs (Contains logged information from all DB connections during run)  ------
-- [2015-02-02 20:48:22.759] APPLICATION : Script E:\Documents\SQLiteScripts\UTF8_Test.sql started with Initialization at 20:48:22.759 on 02 February.
  -- ================================================================================================
