Re: [sqlite] Making sqlite support unicode?

Mrs. Brisby Sun, 26 Oct 2003 21:41:38 -0800

On Sun, 2003-10-26 at 19:11, [EMAIL PROTECTED] wrote:
> On Sun, 26 Oct 2003 12:39:40 -0500
> "Mrs. Brisby" <[EMAIL PROTECTED]> wrote:
> > Further: I always read statements like "Microsoft C/C++ is the largest
> > most popular language platform in the world" as foolish sentiment. These
> > people obviously don't know what they're talking about and need a good
> > healthy dose of some reality. SQLite made the right decision to support
> > UTF-8. It did it largely for technical reasons but maybe in SQLite 3.0
> > it'll be able to natively store binary blobs and MAYBE UCS-16 will be
> > possible and convenient at that point.
> 
> What is UCS-16? Do you mean UCS2?


UCS16 is Unicode coding #3; it's almost the same thing as UTF-16, except
it's fixed width (no surrogate pairs) and so has a much smaller range.
UTF-32 is fixed width and covers all of Unicode. AFAIK, UCS2 and UCS16
are the same except (IIRC) UCS2 actually specifies endianness.


> I, for one, would like to see a UCS2-enabled (or native-blob-enabled)
> SQLite just to ease my development on MS Windows. IIRC I've not
> heard anything about SQLite 3.0 and blobs, or any other future design,
> especially from D. Hipp himself. Please don't speak as if it exists now.
> And what's so bad about someone who just wants to ease his burden?

Nothing at all. I too would like the ability to store blobs in SQLite. I
didn't think I spoke as if it were in there now, and again, there isn't
anything wrong with someone who "just wants to ease his burden" - it's
just that I don't see that happening.


> And as for technical reasons, that's a limit of the current SQLite
> implementation and the C language library, not of today's software in
> general. Null-terminated strings can save space and time for relatively
> short strings, which are common in a simple address book database,
> but for better performance with the long strings that SQLite is probably
> heavily conscious of, I still think it's an odd design that all data is
> null-terminated, though I have no solid benchmark. (Note that I'm not
> talking about UCS2 here, because it takes more space on disk and so
> performance degrades.)

Null-terminated strings are good in many cases, especially in collating
and sorting. It helps to understand that in those cases you stop
processing _after_ you see the terminator (and treat the terminator as
what it is: zero).
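
To make that concrete, here is a minimal sketch in C of such a
comparison. The helper name compare_terminated is made up for
illustration and is not part of SQLite; it just does what strcmp() does,
spelled out so the role of the terminator is visible.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical helper, not part of SQLite: compare two null-terminated
 * byte strings byte by byte, the way strcmp() does. */
static int compare_terminated(const char *a, const char *b)
{
    const unsigned char *p = (const unsigned char *)a;
    const unsigned char *q = (const unsigned char *)b;

    assert(a != NULL && b != NULL);

    /* Advance while the bytes match and the first string has not ended. */
    while (*p != 0 && *p == *q) {
        p++;
        q++;
    }

    /* Here the bytes differ, or both are terminators. The terminator
     * takes part in the subtraction as the plain value 0, so a string
     * that is a prefix of another sorts before it. */
    return (int)*p - (int)*q;
}
```

With this, compare_terminated("abc", "abcd") is negative because the
terminator of "abc" (zero) is compared against 'd'; no separate length
check is needed.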

> As for Unicode, UTF-8 is good and popular enough these days to be
> nearly standard, but it's clearly poor in character-wise computation
> performance compared to UTF-16, from a programmer's point of view,
> or for "technical reasons". In UTF-8 you have 1-byte character or 3-byte
> character or 15-byte character, while in UTF-16 you have only 16-bit
> characters or pairs of two 16-bit units. Not only does VB have it as
> its internal character representation, but so do the newer C# and .NET,
> though they can export UTF-8 strings. UTF-16 is also used in MacOS X
> and its HFS+ file system, as in MS Windows and NTFS.

UTF-16 is NOT used in HFS+. HFS+ still uses ASCII with some "tricks".
UFS is what's "preferred" in MacOS X, and it doesn't use UTF-16 either.
UTF-16 isn't what we're talking about anyway, it's UCS16.


> Since there are filesystems with a UTF-16 interface, I'd like to
> see a UCS2/UTF-16 filename interface in SQLite, though I don't want to
> store UCS2 natively as data at this time. Please take a look at
> http://www.sqlite.org/cvstrac/tktview?tn=239
> SQLite *does* provide an MS Windows version anyway, so why not do as
> they do?

There are _better reasons_ to support blobs, and the collating/sorting
rules are simply much, well, simpler than UTF16 and less restrictive
than UCS16. I suggested a way to avoid the performance hit, and I'm
really surprised nobody is trying to disassemble that on its technical
details...
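
As a rough illustration of the sorting point, assuming the comparison
being made is against UTF-8: a plain byte-wise compare of UTF-8 strings
gives code point order, while a plain 16-bit-unit compare of UTF-16 does
not once surrogate pairs are involved. The helper compare_utf16_units and
the sample arrays below are made up for this sketch; only strcmp() is a
real library call.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical helper: compare two zero-terminated UTF-16 strings one
 * 16-bit code unit at a time, with no knowledge of surrogate pairs. */
static int compare_utf16_units(const uint16_t *a, const uint16_t *b)
{
    assert(a != NULL && b != NULL);
    while (*a != 0 && *a == *b) {
        a++;
        b++;
    }
    return (*a > *b) - (*a < *b);
}

/* The code points U+E000 and U+10000 in each encoding. */
static const char utf8_e000[]  = "\xEE\x80\x80";
static const char utf8_10000[] = "\xF0\x90\x80\x80";
static const uint16_t utf16_e000[]  = { 0xE000, 0 };
static const uint16_t utf16_10000[] = { 0xD800, 0xDC00, 0 }; /* surrogate pair */
```

Here strcmp(utf8_e000, utf8_10000) is negative, matching code point
order, but compare_utf16_units(utf16_e000, utf16_10000) is positive,
because the lead surrogate 0xD800 is numerically smaller than 0xE000. A
UTF-16 collation that wants code point order has to special-case
surrogates.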


