> What I would propose is to be more robust in handling such incorrect Unicode
> strings, so that an application can e.g. insert really any string, not only
> those that comply with the Unicode standard.
>

Doing this can potentially lead to security exploits in programs
that use SQLite.  If you want to handle ill-formed UTF8 strings,
use a BLOB.
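
For illustration, a minimal sketch of the BLOB route using Python's standard sqlite3 module. The table name t, the column name payload, and the sample bytes are made up for this example; the bytes are UTF-16LE containing an unpaired high surrogate, so they are not well-formed text.

```python
import sqlite3

# UTF-16LE code units: 'A', a lone high surrogate 0xD800, 'B'.
# The unpaired surrogate makes this ill-formed UTF-16.
raw = b"\x41\x00\x00\xd8\x42\x00"

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE t (payload BLOB)")

# A Python bytes object is bound as a BLOB, so the bytes are stored
# without any text-encoding interpretation.
conn.execute("INSERT INTO t (payload) VALUES (?)", (raw,))

stored, kind = conn.execute(
    "SELECT payload, typeof(payload) FROM t"
).fetchone()
conn.close()
```

The value comes back byte for byte, and typeof() reports it as a blob rather than text.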

1. To explain a bit more: I don't plan to handle ill-formed UTF-16
(really UTF-16, not UTF-8) strings. It's just that the strings a DB
application receives come from various sources, e.g. they are read from
files, and such strings can easily be incorrect. That said, I'd still
expect to be able to insert such strings into an ordinary text field in
the DB.

As for security exploits, I don't see any: the Unicode 4.0 standard allows
applications to ignore such incorrect characters. Citation:

Applications are free to use any of these noncharacter code points
internally but should never attempt to exchange them. If a
noncharacter is received in open interchange, an application is not
required to interpret it in any way. It is good practice, however, to
recognize it as a noncharacter and to take appropriate action, such as
removing it from the text. Note that Unicode conformance freely allows
the removal of these characters. (See C10 in Section 3.2, Conformance
Requirements.) [End of citation]

2. No matter how you feel about 1., there's another problem: SQLite
fails on e.g. the UTF-16 character 0xE000, which, as far as I know, isn't
illegal. As a different example, SQLite doesn't fail on the character
0xFFFF, which the Unicode standard defines as a 'noncharacter' and which
isn't allowed in open interchange of Unicode text data.
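
To make the distinction concrete, here is a rough Python sketch of how the two code points are classified by the Unicode standard. The function names are mine and purely illustrative; nothing here is SQLite code. It assumes the standard definitions: the noncharacters are U+FDD0..U+FDEF plus the last two code points of each of the 17 planes, and U+E000..U+F8FF is the Private Use Area of the BMP.

```python
def is_noncharacter(cp: int) -> bool:
    """Return True if cp is one of the 66 Unicode noncharacters."""
    if not 0 <= cp <= 0x10FFFF:
        return False
    if 0xFDD0 <= cp <= 0xFDEF:
        return True
    # U+nFFFE and U+nFFFF in every plane n = 0..16.
    return (cp & 0xFFFE) == 0xFFFE


def is_bmp_private_use(cp: int) -> bool:
    """Return True if cp lies in the BMP Private Use Area."""
    return 0xE000 <= cp <= 0xF8FF


# Total number of noncharacters across the whole code space.
noncharacter_count = sum(is_noncharacter(cp) for cp in range(0x110000))
```

Under these definitions 0xE000 is a private-use character, not a noncharacter, while 0xFFFF is a noncharacter.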


So the upshot is that I think SQLite should simply discard any
Unicode 'noncharacters' in SQL statements rather than treat such
statements as invalid.
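
As a sketch of what "discard" could mean, here is the same idea done on the application side in Python, before the text ever reaches the database. The function name strip_noncharacters is hypothetical, and this is only an illustration of the proposal, not a description of anything SQLite does.

```python
def strip_noncharacters(text: str) -> str:
    """Return text with all Unicode noncharacters removed."""

    def is_noncharacter(cp: int) -> bool:
        # U+FDD0..U+FDEF, plus U+nFFFE and U+nFFFF in every plane.
        return 0xFDD0 <= cp <= 0xFDEF or (cp & 0xFFFE) == 0xFFFE

    return "".join(ch for ch in text if not is_noncharacter(ord(ch)))


cleaned = strip_noncharacters("SELECT 'a\uffffb'")
kept = strip_noncharacters("x\ue000y")
```

Here the 0xFFFF noncharacter is dropped from the statement text, while the private-use character 0xE000 passes through unchanged.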

Jiri
