Re: [sqlite] Unexplained table bloat

Richard Damon Mon, 13 Jan 2020 04:29:21 -0800

On 1/13/20 5:24 AM, Dominique Devienne wrote:

On Mon, Jan 13, 2020 at 11:07 AM Keith Medcalf <kmedc...@dessus.com> wrote:

On Monday, 13 January, 2020 02:27, Dominique Devienne <ddevie...@gmail.com> wrote:

I'd vote for a lengthof(col) that's always O(1) for both text and blob.

So what should lengthof(something) return: the number of bytes in the 'database
encoding', or something else?

Bytes, of course. Of the data stored, i.e. excluding the header byte
and encoded size (if any) from the file format.
Basically the same as length() *except* for text values, resulting in
O(1) behavior. --DD


PS: I keep forgetting that length(text_val) actually returns the number of
code points :)
PPS: Surrogate pairs count as one or two code points? That's just
bait, I don't really want to know :)))
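The following minimal Python sketch, using the standard `sqlite3` module, shows the distinction discussed above. It assumes a database in the default UTF-8 encoding, and the helper name `char_and_byte_lengths` is hypothetical. On a TEXT value, length() counts characters. Casting the value to BLOB first yields its byte count, which is the quantity the proposed lengthof() would return. The sketch only shows the two values. It does not show whether SQLite can compute either one in O(1) without reading the stored value, which was the point of the proposal.

```python
import sqlite3
from typing import Tuple


def char_and_byte_lengths(text: str) -> Tuple[int, int]:
    """Hypothetical helper: return (length(col), length(CAST(col AS BLOB)))."""
    con = sqlite3.connect(":memory:")  # a new database defaults to UTF-8
    try:
        con.execute("CREATE TABLE t(col TEXT)")
        con.execute("INSERT INTO t(col) VALUES (?)", (text,))
        chars, nbytes = con.execute(
            # length() on TEXT counts characters; on a BLOB it counts bytes
            "SELECT length(col), length(CAST(col AS BLOB)) FROM t"
        ).fetchone()
        return chars, nbytes
    finally:
        con.close()


print(char_and_byte_lengths("héllo"))  # (5, 6): 'é' takes two bytes in UTF-8
```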

Re the PPS: UTF-8 isn't allowed to contain surrogate pairs. Non-BMP characters, which would use surrogate pairs in UTF-16, are supposed to be converted to their fundamental 21-bit value, and that value is then encoded into UTF-8. If the code doesn't validate the data well enough to catch that issue, then I suspect the character counting would count each half of the surrogate pair as a code point.
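The following sketch illustrates the surrogate-pair point. It uses a simplified model of character counting, assumed here for illustration and not copied from SQLite's source: every byte that is not a UTF-8 continuation byte (bit pattern 10xxxxxx) starts a new character. The helper `count_lead_bytes` is hypothetical. U+1F600 encoded correctly is one 4-byte sequence. Writing it instead as a UTF-16 surrogate pair, with each half encoded as its own 3-byte sequence (invalid UTF-8), makes the naive count come out as two. A validating decoder rejects that second form.

```python
def count_lead_bytes(data: bytes) -> int:
    # Hypothetical, simplified model: count every byte that is not a
    # continuation byte (10xxxxxx) as the start of a character.
    return sum(1 for b in data if (b & 0xC0) != 0x80)


# U+1F600 encoded correctly as a single 4-byte UTF-8 sequence.
proper = "\U0001F600".encode("utf-8")

# The same character as a surrogate pair, each half encoded separately as
# 3 bytes. This is not valid UTF-8; Python only produces it with "surrogatepass".
split = "\ud83d\ude00".encode("utf-8", "surrogatepass")

print(len(proper), count_lead_bytes(proper))  # 4 1
print(len(split), count_lead_bytes(split))    # 6 2

# A validating decoder catches the problem instead of miscounting.
try:
    split.decode("utf-8")
except UnicodeDecodeError:
    print("rejected by a validating decoder")
```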


--
Richard Damon

_______________________________________________
sqlite-users mailing list
sqlite-users@mailinglists.sqlite.org
http://mailinglists.sqlite.org/cgi-bin/mailman/listinfo/sqlite-users
