On 1/13/20 5:24 AM, Dominique Devienne wrote:
On Mon, Jan 13, 2020 at 11:07 AM Keith Medcalf <kmedc...@dessus.com> wrote:
On Monday, 13 January, 2020 02:27, Dominique Devienne <ddevie...@gmail.com> wrote:
I'd vote for a lengthof(col) that's always O(1) for both text and blob
So what should lengthof(something) return: the number of bytes in the 'database
encoding', or something else?
Bytes of course. Of the data stored, i.e. excluding the header byte
and encoded size (if any) from the file-format.
Basically the same as length() *except* for text values, resulting in
O(1) behavior. --DD

PS: I keep forgetting length(text_val) returns the number of
code-points in fact :)
PPS: Surrogate pairs count as one or two code points? That's just
bait, I don't really want to know :)))
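
To make the distinction above concrete, here is a minimal sketch using Python's built-in sqlite3 module. It assumes a default (UTF-8) database; the table name `t`, the column name `val`, and the sample string are made up for illustration. Since lengthof() is only a proposal in this thread, the sketch uses the existing length(CAST(... AS BLOB)) idiom to get a byte count. It shows the two values only and says nothing about their cost.

```python
import sqlite3

# In-memory database; assumed to use the default UTF-8 text encoding.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE t(val TEXT)")  # hypothetical table and column

# 7 code points, but 10 bytes in UTF-8 (e-acute is 2 bytes, the euro sign is 3).
text = "héllo €"
conn.execute("INSERT INTO t VALUES (?)", (text,))

# length() on a text value counts characters; on a blob it counts bytes.
chars, nbytes = conn.execute(
    "SELECT length(val), length(CAST(val AS BLOB)) FROM t"
).fetchone()

print(chars, nbytes)
print(len(text), len(text.encode("utf-8")))  # same two numbers, computed in Python
```

A lengthof() as proposed would presumably return the second number for text values as well.
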
Re the PPS: UTF-8 isn't allowed to contain surrogate pairs. Non-BMP characters, which would use surrogate pairs in UTF-16, are supposed to be converted to their 21-bit code point value, and that value is then encoded into UTF-8. If the code doesn't validate the data well enough to catch that issue, then I suspect the character counting would count each half of the surrogate pair as a code point.
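
A small Python sketch of that suspicion follows. The counter below is a hypothetical, non-validating one that simply counts every byte that is not a UTF-8 continuation byte; it is an assumption about how a naive character counter might behave, not a copy of SQLite's code. The sample character U+1F600 is arbitrary.

```python
def count_code_points(data: bytes) -> int:
    """Naive count: every byte that is not a continuation byte (10xxxxxx)."""
    return sum(1 for b in data if (b & 0xC0) != 0x80)

# Well-formed UTF-8 for the non-BMP character U+1F600: one 4-byte sequence.
valid = "\U0001F600".encode("utf-8")

# Ill-formed data: each UTF-16 surrogate half (D83D, DE00) encoded on its own
# as a 3-byte sequence. "surrogatepass" is needed to make Python emit this.
ill_formed = "\ud83d\ude00".encode("utf-8", "surrogatepass")

valid_count = count_code_points(valid)
ill_formed_count = count_code_points(ill_formed)
print(valid.hex(), valid_count)
print(ill_formed.hex(), ill_formed_count)

# A validating decoder rejects the ill-formed bytes outright.
try:
    ill_formed.decode("utf-8")
    rejected = False
except UnicodeDecodeError:
    rejected = True
print(rejected)
```

With this counter the well-formed sequence counts as one code point and the ill-formed one as two, one per surrogate half.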

--
Richard Damon

_______________________________________________
sqlite-users mailing list
sqlite-users@mailinglists.sqlite.org
http://mailinglists.sqlite.org/cgi-bin/mailman/listinfo/sqlite-users
