On Mon, 25 Apr 2016 02:31:25 +0100
Simon Slavin <slavins at bigfraud.org> wrote:

> > These are different concerns, and they don't really pose any
> > difficulty.  Given an encoding, a column of N characters can take
> > up to x * N bytes.  Back in the day, "x" was 1.  Now it's something
> > else.  No big deal.  
> 
> No.  Unicode uses different numbers of bytes to store different
> characters.  You cannot tell from the number of bytes in a string how
> many characters it encodes, and the programming required to work out
> the string length is complicated.  

"up to", I said.  You're right that you can't know the byte-offset for a
letter in a UTF-8 string.  What I'm saying is that given an encoding
and a string, you *do* know the maximum number of bytes required.
From the DBMS's point of view, a string of known size and encoding can
be managed with a fixed-length buffer.  

> I would definitely be reading the documentation for the SQL engine I
> was using.

Well, yeah.  :-)  It's well to know how the software you're using
works, whether it's the DBMS or something else.  

Although I have to say I've never had to worry about the size of my
database as a function of string size.  When size matters, rows
dominate, and large numbers of rows never seem to come with big
strings.  

--jkl
