Re: [sqlite] SQLite 3.0 - Why only one text encoding per database?

D. Richard Hipp Mon, 21 Jun 2004 16:17:47 -0700

Tim wrote:
> It sounds like SQLite 3.0 only supports one text representation per database
> ("Internally and in the disk file, the same text representation is used
> everywhere.").  Is there a particular reason for this limitation?

The reason is to keep the API simple.  If the programmer selects a single
text representation for all data (which they are advised to do) then they
only have to provide a single version for each user-defined function and
user-defined collating sequence.  Even if they do not stick to a single
text representation, the most they'll have to code up is three versions
of each user-defined function and collating sequence.  If we were to
allow mixed text representations in the database, 9 different versions
of each collating function would be required, and up to 3**N versions
of each user-defined function, where N is the number of parameters in
the function.

Of course, the programmer could be lazy and only provide one version of
each function and collating sequence and SQLite would automatically
convert text representations as needed.  But constantly flipping text
representations around is needlessly time consuming.

> What if I want to store one column as UTF8 text and another column as UTF16?
> Other databases offer a separate 16-bit "ntext" column type for this purpose.

You can still store one column as UTF8 and the other as UTF16 using
sqlite3_bind_text() and sqlite3_bind_text16().  It's just that one or
the other or both will be converted into the native representation of
the database before being written to disk.

> I could store anything with BLOBs, but then I'd lose the ability to use SORT.
>

BLOBs sort in SQLite.  They sort in memcmp() order.  And BLOBs sort after
text.

>
> Storing only one type of text data also seems to go against the idea of
> SQLite's "manifest typing" where "datatype is associated with the data
> itself, not with its container."
>

SQLite does not consider text representation part of the datatype.
Apart from performance and database file size, the programmer has
no way of knowing what text representation is being used internally.
The fact that one of three different representations might be used
internally shows through into the user-defined function and collating
sequence interfaces in order to allow programmers to optimize for
performance.  But apart from that, you don't really know what encodings
SQLite is using internally.  SQLite might convert all strings to
EBCDIC internally for all you know.  The main interface would be
the same.

Other database engines claim to store UTF-16 in some fields and UTF-8
in others.  But is that really what they are doing?  Are you certain
they are not converting the UTF-16 data into UTF-8 for storage then
converting it back when you request the data?  How can you tell?

Another way to look at it:  SQLite 2.8 made you choose between UTF-8 and
ISO8859 at compile-time.  SQLite 3.0 lets you choose between UTF-8,
UTF-16BE, and UTF-16LE and it lets you defer the choice to runtime.
But you still have to make the choice.


--
D. Richard Hipp -- [EMAIL PROTECTED] -- 704.948.4565


---------------------------------------------------------------------
To unsubscribe, e-mail: [EMAIL PROTECTED]
For additional commands, e-mail: [EMAIL PROTECTED]

Re: [sqlite] SQLite 3.0 - Why only one text encoding per database?

Reply via email to