Re: [sqlite] Unexplained table bloat

Richard Damon Fri, 10 Jan 2020 11:54:10 -0800

On 1/10/20 2:24 PM, Tim Streater wrote:

On 10 Jan 2020, at 18:55, Keith Medcalf <kmedc...@dessus.com> wrote:

On Friday, 10 January, 2020 11:44, Tim Streater <t...@clothears.org.uk> wrote:

On 10 Jan 2020, at 18:03, Richard Hipp <d...@sqlite.org> wrote:

On 1/10/20, Dominique Devienne <ddevie...@gmail.com> wrote:

There's no way at all, to know the length of a text column with
embedded NULLs?

You can find the true length of a string in bytes from C-code using
the sqlite3_column_bytes() interface. But I cannot, off-hand, think
of a way to do that from SQL.
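The effect can be observed from Python's standard-library `sqlite3` module. In CPython, a `str` bound as a query parameter should be passed to SQLite with its full UTF-8 length, so an embedded NUL is stored. This is a minimal sketch, assuming Python's bundled SQLite. SQL `length()` on a TEXT value counts only the characters before the first NUL, while `length(CAST(x AS BLOB))` counts every byte; that cast is offered here as one possible SQL-side workaround, not something stated in the thread. The helper name `text_lengths` is hypothetical.

```python
import sqlite3

def text_lengths(s: str):
    """Hypothetical helper: store s in a TEXT column and report what SQL sees."""
    con = sqlite3.connect(":memory:")
    try:
        con.execute("CREATE TABLE t(x TEXT)")
        # Bound parameters carry an explicit byte length, so the embedded NUL is kept.
        con.execute("INSERT INTO t(x) VALUES (?)", (s,))
        # length(x) stops at the first NUL; the BLOB cast reinterprets the same bytes.
        return con.execute(
            "SELECT x, length(x), length(CAST(x AS BLOB)) FROM t"
        ).fetchone()
    finally:
        con.close()

value, chars, nbytes = text_lengths("ab\x00cd")
print(repr(value), chars, nbytes)
```

Reading the value back returns the whole string, because the Python wrapper uses the byte count reported by SQLite rather than scanning for a terminator.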

But if I store UTF-8 in a TEXT column, surely I'm allowed to include
NULLs in that? They are after all valid UTF-8 characters.

No, they are not. The "NUL character" in Modified UTF-8 is the two-byte
sequence 0xC0 0x80. This is specifically so that 0x00 can be used as a string
terminator. Valid UTF-8 text stored in a C string (a 0x00-terminated
sequence of bytes) must not contain an embedded 0x00 byte, since that byte
terminates the sequence.
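The NUL-handling part of Modified UTF-8 can be sketched in a few lines of Python. This is an illustration only: the helper names `encode_nul_as_c080` and `decode_nul_from_c080` are hypothetical, and full Modified UTF-8 (as used by Java) also differs from standard UTF-8 for supplementary characters, which this sketch ignores. A bytewise substitution works because 0x00 never occurs inside a standard UTF-8 multi-byte sequence and 0xC0 never occurs in valid UTF-8 at all.

```python
def encode_nul_as_c080(s: str) -> bytes:
    """Hypothetical helper: standard UTF-8, except U+0000 becomes 0xC0 0x80."""
    raw = s.encode("utf-8")  # U+0000 is the single byte 0x00 here
    # Safe bytewise replace: 0x00 only ever encodes U+0000 in standard UTF-8.
    return raw.replace(b"\x00", b"\xc0\x80")

def decode_nul_from_c080(b: bytes) -> str:
    """Hypothetical helper: undo encode_nul_as_c080, then decode strictly."""
    # 0xC0 is never valid in standard UTF-8, so any 0xC0 0x80 came from the encoder.
    return b.replace(b"\xc0\x80", b"\x00").decode("utf-8")

encoded = encode_nul_as_c080("ab\x00cd")
print(encoded)  # no 0x00 byte, so it fits in a C string
try:
    encoded.decode("utf-8")  # a conforming decoder sees an overlong sequence
except UnicodeDecodeError:
    print("strict UTF-8 decoder rejects 0xC0 0x80")
```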

Nice, but Wikipedia has that as a "Derivative" and "incompatible with the UTF-8 
specification and may be rejected by conforming UTF-8 applications." It appears (though I may 
have missed it) not to be mentioned on this handy site either:

https://www.utf8-chartable.de/unicode-utf8-table.pl

I shall have to check what my preferred language's wrapper does.

It is incompatible in the sense that it uses an encoding that the UTF-8 specification says is invalid, so an application that fully performs all the tests for valid data forms would reject it. In many ways, though, it is a compatible extension: if you leave out the one test that specifically makes this form invalid (the rule against overlong encodings) and otherwise process the bytes by the general rules of UTF-8, you get the expected result.
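To make the "general rules" point concrete, here is a minimal sketch of a lenient decoder that applies only the bit-pattern rule for 1- and 2-byte sequences (110xxxxx 10xxxxxx) and deliberately skips the overlong check. The function `lenient_decode` is hypothetical and handles only the subset needed for the illustration; it is not how any particular library decodes.

```python
def lenient_decode(b: bytes) -> str:
    """Hypothetical decoder for 1- and 2-byte UTF-8 sequences only.

    It applies the general UTF-8 bit rules but omits the overlong-encoding
    check, so 0xC0 0x80 decodes to U+0000 instead of being rejected.
    """
    out = []
    i = 0
    while i < len(b):
        lead = b[i]
        if lead < 0x80:  # 0xxxxxxx: single-byte (ASCII) sequence
            out.append(chr(lead))
            i += 1
        elif 0xC0 <= lead < 0xE0 and i + 1 < len(b) and (b[i + 1] & 0xC0) == 0x80:
            # 110xxxxx 10xxxxxx: take 5 payload bits, then 6 payload bits.
            cp = ((lead & 0x1F) << 6) | (b[i + 1] & 0x3F)
            out.append(chr(cp))  # for 0xC0 0x80 this is (0 << 6) | 0 == 0
            i += 2
        else:
            raise ValueError(f"unsupported byte 0x{lead:02X} at offset {i}")
    return "".join(out)

print(repr(lenient_decode(b"a\xc0\x80b")))
```

Ordinary two-byte characters such as U+00E9 (0xC3 0xA9) come out the same as with a strict decoder; only the overlong form is treated differently.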

C strings do not allow 0 bytes in them, which would normally mean that they cannot contain the NUL character at all. This extension allows a character that would be interpreted as the NUL character to be represented without needing a 0 byte.
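Python's `ctypes` can show the C-string behaviour directly. This sketch assumes CPython's standard `ctypes` module: `create_string_buffer` copies the bytes into a C `char` array and appends a terminating 0x00, and the buffer's `.value` reads it back as a C string, stopping at the first 0x00 byte, while `.raw` returns every byte.

```python
import ctypes

# Standard UTF-8 bytes with an embedded NUL.
plain = ctypes.create_string_buffer(b"ab\x00cd")
# The same text with NUL written as 0xC0 0x80 (Modified UTF-8 style).
modified = ctypes.create_string_buffer(b"ab\xc0\x80cd")

print(plain.value)     # read as a C string: truncated at the embedded 0x00
print(modified.value)  # no 0x00 inside, so every byte survives
print(plain.raw)       # all bytes, including the terminator ctypes appended
```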

It should be pointed out that most libraries won't check every string that passes through them for violations of this rule, since that adds a lot of overhead for very little benefit. Applications are really expected to do this sort of test at the borders, where possibly untrusted strings come in, knowing that if only good strings come in, the subsequent processing will keep them valid.
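A minimal sketch of that "validate at the border" pattern, assuming Python and a hypothetical policy that rejects embedded NULs in addition to malformed UTF-8. The names `accept_untrusted` and `normalize_name` are illustrative only.

```python
def accept_untrusted(raw: bytes) -> str:
    """Hypothetical border check for bytes arriving from outside the program."""
    # Strict decoding rejects malformed and overlong input, including 0xC0 0x80.
    text = raw.decode("utf-8")  # raises UnicodeDecodeError on bad input
    # Assumed policy for this sketch: no embedded NULs past the border.
    if "\x00" in text:
        raise ValueError("embedded NUL not allowed")
    return text

def normalize_name(text: str) -> str:
    """Hypothetical interior routine: trusts its input and does not re-validate."""
    return " ".join(text.split()).title()

for incoming in (b"ada  lovelace", b"bad\xc0\x80name", b"nul\x00here"):
    try:
        print(normalize_name(accept_untrusted(incoming)))
    except ValueError as exc:  # UnicodeDecodeError is a subclass of ValueError
        print("rejected at border:", type(exc).__name__)
```

The check runs once, where data enters; interior functions such as `normalize_name` only ever receive strings that already passed it, which is the overhead trade-off described above.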


--
Richard Damon

_______________________________________________
sqlite-users mailing list
sqlite-users@mailinglists.sqlite.org
http://mailinglists.sqlite.org/cgi-bin/mailman/listinfo/sqlite-users
