Re: [sqlite] Re: Re: Why are strings in hexadecimal notation always blobs?

2008-01-18 Thread Nicolas Williams
On Sat, Jan 19, 2008 at 03:23:32AM +0700, Lothar Scholz wrote:
> IT>Does this really make sense to you?
> 
> Yes. The only reason left for a BLOB would be that it contains a zero
> byte or an illegal UTF-8 byte sequence.

Or wanting to avoid collations that are aware of, say, Unicode
normalization, Unicode case transformations, ...

Really, a blob of bytes either is TEXT or not, and this distinction can
make a huge difference for some operations.
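Here is a minimal sketch of the difference, using Python's standard sqlite3 module. The thread itself contains no code, so the choice of client is only illustrative. The same six bytes, once as TEXT and once as a BLOB, compare unequal. The NOCASE collation and length() also treat them differently. This follows SQLite's documented rules: a TEXT value sorts before any BLOB, and collations apply only to TEXT.

```python
import sqlite3

conn = sqlite3.connect(":memory:")

def q(sql, *params):
    """Run a one-row query and return its first column."""
    return conn.execute(sql, params).fetchone()[0]

# Same bytes, different storage classes: never equal, and TEXT sorts first.
same_bytes_equal = q("SELECT 'Lothar' = X'4C6F74686172'")
text_sorts_first = q("SELECT 'Lothar' < X'4C6F74686172'")

# Collations (here the built-in NOCASE) apply only to TEXT; BLOBs use memcmp().
text_nocase = q("SELECT 'Lothar' = 'LOTHAR' COLLATE NOCASE")
blob_nocase = q("SELECT X'4C6F74686172' = CAST('LOTHAR' AS BLOB) COLLATE NOCASE")

# length() counts characters for TEXT but bytes for a BLOB ("\u00e9" is 2 bytes in UTF-8).
text_len = q("SELECT length(?)", "\u00e9")
blob_len = q("SELECT length(CAST(? AS BLOB))", "\u00e9")

print(same_bytes_equal, text_sorts_first)  # 0 1
print(text_nocase, blob_nocase)            # 1 0
print(text_len, blob_len)                  # 1 2
```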

> To me it looks like the current logic was introduced only to stay
> backward compatible with the rule that embedded 0 characters are not
> allowed in a string.

To me it looks like a form of string content-encoding tagging, with a
one-bit content type tag: Unicode vs. arbitrary binary content types.
(SQLite also has a collation tag.)

Additional typing of octet strings could be really useful, or not.  I
don't know.  Perhaps some users have strings in many different codesets
and could use TEXT type variants that include codeset information.  But
such users can always add such tags as columns to existing tables, or
they can convert to UTF-8 (or UTF-16) and live with any lossiness.
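Here is one reading of the "add tags as columns" option, sketched with Python's sqlite3 module. The `docs` table and its `body` and `codeset` columns are hypothetical. The raw bytes stay a BLOB, and a separate TEXT column records their codeset, so decoding happens explicitly when the row is read back.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Hypothetical schema: raw bytes plus an explicit codeset tag.
conn.execute("CREATE TABLE docs (body BLOB, codeset TEXT)")

original = "Gr\u00fc\u00dfe"  # "Grüße", representable in ISO-8859-1
conn.execute(
    "INSERT INTO docs (body, codeset) VALUES (?, ?)",
    (original.encode("iso-8859-1"), "iso-8859-1"),
)

body, codeset = conn.execute("SELECT body, codeset FROM docs").fetchone()
# The bytes come back untouched as a BLOB; the tag says how to decode them.
decoded = body.decode(codeset)
print(type(body).__name__, codeset, decoded == original)  # bytes iso-8859-1 True
```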

> IMHO it would be cleaner if, conceptually, we only had BLOBs and
> checked for other datatypes on demand, maybe with some caching.
> That is exactly what Tcl does: it assumes everything is a string, and
> (since version 8.0) it caches the integer or double values.

That would only be true if the <, >, =, <=, >=, <=>, LIKE, GLOB and
other such operators behaved exactly the same for all strings regardless
of whether they were Unicode strings or blobs.  But they don't (or at
least, I don't want them to).

SQLite does not have a normalization-insensitive string comparison
operation/function today, but it might eventually, in which case even
the basic string equality/inequality comparison operation will behave
differently given TEXT inputs vs. BLOB inputs.
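The scenario above can already be approximated from a client. This is a minimal sketch that assumes Python's sqlite3 module and a custom collation named `nfc`. That name is hypothetical; SQLite does not provide such a collation. Under it, canonically equivalent TEXT values compare equal. The same bytes bound as BLOBs still differ, because SQLite compares BLOBs with memcmp() and ignores collations.

```python
import sqlite3
import unicodedata

def nfc_collation(a: str, b: str) -> int:
    """Hypothetical collation: compare two strings after Unicode NFC normalization."""
    a, b = unicodedata.normalize("NFC", a), unicodedata.normalize("NFC", b)
    return (a > b) - (a < b)

conn = sqlite3.connect(":memory:")
conn.create_collation("nfc", nfc_collation)

composed = "\u00e9"     # "é" as a single code point
decomposed = "e\u0301"  # "e" followed by COMBINING ACUTE ACCENT

# TEXT operands: the collation sees both as the same character.
text_eq = conn.execute("SELECT ? = ? COLLATE nfc", (composed, decomposed)).fetchone()[0]

# BLOB operands: the collation is ignored, and the differing bytes compare unequal.
blob_eq = conn.execute(
    "SELECT ? = ? COLLATE nfc",
    (composed.encode("utf-8"), decomposed.encode("utf-8")),
).fetchone()[0]

print(text_eq, blob_eq)  # 1 0
```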

Please keep TEXT and BLOB as distinct types,

Nico
-- 

-
To unsubscribe, send email to [EMAIL PROTECTED]
-



Re: [sqlite] Re: Re: Why are strings in hexadecimal notation always blobs?

2008-01-18 Thread Lothar Scholz
Hello Igor,

Saturday, January 19, 2008, 12:02:15 AM, you wrote:

IT> You misunderstand the dynamics of datatypes then.

Yes, maybe. With the current implementation I really don't see the
point either way, neither with my understanding nor with yours.

IT> Wait a minute. Didn't you just say that you _want_ text strings to be 
IT> able to contain control characters? So what's left for the BLOB then?

IT> Suppose I want to insert, say, a bitmap image into the database - as a
IT> BLOB, naturally. You are saying that, if it doesn't just happen to 
IT> contain at least one zero byte, it will have to go in as a string. So if
IT> it has a black pixel, it's a BLOB. If it doesn't have any black pixels,
IT> it's a string. Does this really make sense to you?

Yes. The only reason left for a BLOB would be that it contains a zero
byte or an illegal UTF-8 byte sequence.

To me it looks like the current logic was introduced only to stay
backward compatible with the rule that embedded 0 characters are not
allowed in a string.

IMHO it would be cleaner if, conceptually, we only had BLOBs and
checked for other datatypes on demand, maybe with some caching.
That is exactly what Tcl does: it assumes everything is a string, and
(since version 8.0) it caches the integer or double values.
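Here is a rough Python sketch of that Tcl-style model. It assumes nothing about how SQLite or Tcl actually implement it. Each value is held as raw bytes, and a numeric interpretation is parsed on first request and then cached. The `DualValue` class is hypothetical and covers numbers only. Whether TEXT versus BLOB could be inferred the same way is the open question in this thread.

```python
from typing import Optional, Union

Number = Union[int, float]
_UNSET = object()  # sentinel meaning "not parsed yet" (None means "not a number")


class DualValue:
    """Hypothetical Tcl-style value: bytes are canonical, a parsed number is cached."""

    def __init__(self, raw: bytes) -> None:
        self.raw = raw
        self._number = _UNSET

    def as_number(self) -> Optional[Number]:
        if self._number is _UNSET:
            self._number = self._parse(self.raw)  # parse once, reuse afterwards
        return self._number

    @staticmethod
    def _parse(raw: bytes) -> Optional[Number]:
        try:
            text = raw.decode("ascii")
        except UnicodeDecodeError:
            return None
        for convert in (int, float):  # try integer first, then double
            try:
                return convert(text)
            except ValueError:
                pass
        return None


v = DualValue(b"42")
print(v.as_number())  # 42 (parsed now)
print(v.as_number())  # 42 (served from the cache)
print(DualValue(b"Lothar").as_number())  # None
```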

But well, I can code around this as usual, and I will bring up the
topic again if there is ever a discussion about a 4.0 release.


-- 
Best regards,
 Lothar Scholz    mailto:[EMAIL PROTECTED]





[sqlite] Re: Re: Why are strings in hexadecimal notation always blobs?

2008-01-18 Thread Igor Tandetnik

Lothar Scholz wrote:
> Friday, January 18, 2008, 8:09:02 PM, you wrote:
>
>> Lothar Scholz wrote:
>>> It seems that "Lothar" is stored as a TEXT value, but when I store
>>> X'4C6F74686172' it is a BLOB.
>>> What is the reason for it?
>>
>> Same reason 1 is an integer literal but '1' is a string literal.
>> X'4C6F74686172' is a blob literal.
>
> Sorry, but as far as I understand the dynamics of datatypes, they should
> depend on the bytes that are passed, not on the literal that is used
> for notation inside a textual SQL statement.


You misunderstand the dynamics of datatypes then.

How do you plan to determine, just looking at a sequence of bytes, 
whether it looks like a string or not? Especially given your claim below 
that you want to support control characters in a string. In your 
hypothetical implementation, how would I construct a BLOB literal that 
would _not_ be interpreted as a string, but inserted as a BLOB as I 
intended?
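You can check the literal-driven typing directly. This minimal sketch uses Python's sqlite3 module, though any SQLite shell would show the same thing. typeof() follows the literal's syntax. An explicit CAST converts between TEXT and BLOB when the other storage class is wanted.

```python
import sqlite3

conn = sqlite3.connect(":memory:")

row = conn.execute(
    "SELECT typeof(1), typeof('1'), typeof('Lothar'), typeof(X'4C6F74686172'),"
    "       typeof(CAST(X'4C6F74686172' AS TEXT)), CAST(X'4C6F74686172' AS TEXT),"
    "       typeof(CAST('Lothar' AS BLOB))"
).fetchone()
print(row)
# ('integer', 'text', 'text', 'blob', 'text', 'Lothar', 'blob')
```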



> Another question: how would you reliably represent control characters
> in the range 1-31 in a string? It is not really good to add them as
> plain code in text files, and SQLite does not have C-like backslash
> quoting. Especially the automatic CR LF -> LF (%R%N -> %N) conversions
> might be a huge problem. And I don't think we should restrict the TEXT
> data type to anything more than non-zero bytes.
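Here are two possible workarounds, sketched with Python's sqlite3 module; the table `t` is a throwaway example. The first binds the string as a parameter, so its control characters never appear in the SQL text. The second splices them in within SQL by casting a blob literal to TEXT. Later SQLite releases also added a char() function that serves the same purpose.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE t (s TEXT)")

# 1. Parameter binding: tab, CR and LF travel as data, not as SQL source text.
bound = "a\tb\r\nc"
conn.execute("INSERT INTO t VALUES (?)", (bound,))

# 2. Pure SQL: splice a tab (0x09) in via a blob literal cast to TEXT.
conn.execute("INSERT INTO t VALUES ('a' || CAST(X'09' AS TEXT) || 'b')")

rows = conn.execute("SELECT s, typeof(s) FROM t ORDER BY rowid").fetchall()
print(rows)  # [('a\tb\r\nc', 'text'), ('a\tb', 'text')]
```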


>> What do you mean, data content? How is it supposed to know that a
>> particular sequence of bytes is supposed to represent a string,
>> without the help of mind-reading hardware? After all, you don't
>> expect the number 48 to be magically interpreted as a string '0'.
>> You don't, right?
>
> Well, if it looks like a number, it is a number. If it does not look
> like a number, it is TEXT, unless it contains a zero byte (or maybe
> non-text control characters other than the usual \f \v \r \n), in
> which case it is a BLOB. This would make sense to me.
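The proposed rule can be sketched as a classifier over raw bytes. `guess_storage_class` is a hypothetical function, not part of SQLite. It follows the rule above literally, so only \f, \v, \r and \n are tolerated as control characters. It also adds the suggestion, made elsewhere in this thread, that an illegal UTF-8 sequence should mean BLOB. Two tiny stand-in "bitmaps" show the objection raised in reply: the outcome hinges on whether a zero byte happens to occur.

```python
import re

# Control characters the proposal would still accept in TEXT (\f, \v, \r, \n).
# Note that \t is not in that list, so this sketch treats it as non-text.
ALLOWED_CONTROLS = {0x0C, 0x0B, 0x0D, 0x0A}
NUMBER = re.compile(rb"[+-]?(\d+(\.\d*)?|\.\d+)([eE][+-]?\d+)?")

def guess_storage_class(raw: bytes) -> str:
    """Hypothetical content-based typing, as proposed in the thread."""
    if NUMBER.fullmatch(raw):
        return "INTEGER" if raw.lstrip(b"+-").isdigit() else "REAL"
    if any(b < 0x20 and b not in ALLOWED_CONTROLS for b in raw):
        return "BLOB"  # zero byte or other non-text control character
    try:
        raw.decode("utf-8")
    except UnicodeDecodeError:
        return "BLOB"  # illegal UTF-8 sequence
    return "TEXT"

# Two tiny stand-in "images": only the first happens to contain a 0x00 pixel.
with_black_pixel = bytes([0x41, 0x00, 0x43])
without_black_pixel = bytes([0x41, 0x42, 0x43])

print(guess_storage_class(b"48"))                # INTEGER
print(guess_storage_class(b"Lothar"))            # TEXT
print(guess_storage_class(with_black_pixel))     # BLOB
print(guess_storage_class(without_black_pixel))  # TEXT
```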


Wait a minute. Didn't you just say that you _want_ text strings to be 
able to contain control characters? So what's left for the BLOB then?


Suppose I want to insert, say, a bitmap image into the database - as a 
BLOB, naturally. You are saying that, if it doesn't just happen to 
contain at least one zero byte, it will have to go in as a string. So if 
it has a black pixel, it's a BLOB. If it doesn't have any black pixels, 
it's a string. Does this really make sense to you?
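For contrast, here is how a client can avoid the guesswork, sketched with Python's sqlite3 module and a hypothetical `images` table. Data bound as bytes is stored as a BLOB whatever its pixel values are. The same bytes bound as a str are stored as TEXT. The type is decided by how the value is passed, not by its contents.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE images (data BLOB)")

# A stand-in "bitmap" with no zero bytes (no black pixels) that is even valid ASCII.
bitmap = bytes([0x41, 0x42, 0x43, 0x44])
conn.execute("INSERT INTO images (data) VALUES (?)", (bitmap,))
# The same bytes bound as a Python str are stored as TEXT instead.
conn.execute("INSERT INTO images (data) VALUES (?)", (bitmap.decode("ascii"),))

rows = conn.execute("SELECT typeof(data), data FROM images ORDER BY rowid").fetchall()
print(rows)  # [('blob', b'ABCD'), ('text', 'ABCD')]
```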


Igor Tandetnik 


