On 1/10/20 2:24 PM, Tim Streater wrote:
> On 10 Jan 2020, at 18:55, Keith Medcalf <kmedc...@dessus.com> wrote:
>> On Friday, 10 January, 2020 11:44, Tim Streater <t...@clothears.org.uk> wrote:
>>> On 10 Jan 2020, at 18:03, Richard Hipp <d...@sqlite.org> wrote:
>>>> On 1/10/20, Dominique Devienne <ddevie...@gmail.com> wrote:
>>>>> There's no way at all, to know the length of a text column with
>>>>> embedded NULLs?
>>>> You can find the true length of a string in bytes from C-code using
>>>> the sqlite3_column_bytes() interface. But I cannot, off-hand, think
>>>> of a way to do that from SQL.
>>> But if I store UTF-8 in a TEXT column, surely I'm allowed to include
>>> NULLs in that? They are after all valid UTF-8 characters.
>> No, they are not. The "NUL character" in Modified UTF-8 is the two-byte
>> sequence 0xC0 0x80. This is specifically so that 0x00 can be used as a string
>> terminator. Validly encoded UTF-8 encoded text stored in a C String (0x00
>> terminated sequence of bytes) must not contain an embedded 0x00 byte since
>> that byte terminates the sequence.
> Nice, but Wikipedia has that as a "Derivative" and "incompatible with the UTF-8
> specification and may be rejected by conforming UTF-8 applications." It appears (though I may
> have missed it) not to be mentioned on this handy site either:
> https://www.utf8-chartable.de/unicode-utf8-table.pl
> I shall have to check what my preferred language's wrapper does.
It is incompatible, in the sense that it uses an encoding that the UTF-8
specification says is invalid, and thus an application that fully performs
all the validity tests would reject it. In many ways, though, it is a
compatible extension: if you skip the one test that specifically makes the
form invalid and process it by the general rules of UTF-8, you get the
expected result.
C Strings do not allow 0 bytes in them. This would normally mean that
they do not allow the NUL character to be in a string. This extension
allows a character which would be interpreted as the NUL character to be
represented without needing a 0 byte.
It should be pointed out that most libraries won't check every string
that passes through them for violations of this rule, as that would add
a lot of overhead for very little benefit. Applications are instead
expected to do this sort of test at the borders, where possibly untrusted
strings come in, knowing that if good strings come in, the subsequent
processing will keep them valid.
--
Richard Damon
_______________________________________________
sqlite-users mailing list
sqlite-users@mailinglists.sqlite.org
http://mailinglists.sqlite.org/cgi-bin/mailman/listinfo/sqlite-users