On 2016/07/04 10:22 AM, Rob Golsteijn wrote:
@Clemens,
It is indeed documented that the behaviour is undefined when using a bind_text
variant. I missed that part of documentation.
Hi Rob,
The behaviour is undefined in ALL instances where you pass null
characters through C strings because of a C string peculiarity, not
because of a shortcoming of SQLite.
I think you are missing an important bit in all of this - the strings in
C is the problem, they think a Null character indicates termination. It
has nothing to do with how SQL stores data - SQLite will store it with
all bytes intact, but you typically retrieve or set it via some C calls
using a C api.. and this is where the problem is. So whenever you want
to push strings into the DB or get them out, and they do contain char(0)
characters, then you need to read them into/from bytestreams, arrays,
blobs, hex-encoded strings, or some or other method that will not be
passing through a standard C string, because at that moment, and unless
you force the length, the string will become shortened to the first zero
byte found.
Thus, it is not the implementation that needs changing, but the usage.
I had this problem in a similar situation where I tried to store MBCS
strings with 16-bit chars and 32-bit (4-byte) character strings. Apart
from the enormous waste in 99% of characters, the trailing bytes were
all Zero bytes (so character 'A' would be represented by 0x65 00 00 00)
and if you try to store 'ABC' like that into the DB and then read it
with just a C string, you end up with just A - but the DB still contains
the full 'ABC', it's only your own string that doesn't know it. More
wicked still, SQLite likely pushes the entire string to the memory
location internally, so it is really there, but whatever next function
operates on that string will only regard everything up to that first
Zero byte thanks to C, and SQLite cannot help for that.
If SQLite could fix this - it wouldn't be documented as undefined, it
would have been fixed.
As far as documenting the above... Any C developer reading this would
probably giggle and think "Thanks captain obvious!", because this is
really first-week stuff in a C-Programming-101 course. However, coming
from other compiling platforms, this may not be very obvious. I prefer
how Lazarus/Delphi does it (wrt the Pascal variant options as opposed to
C++) where a string is a record with first the encoding, the length and
then the actual bytes given. You never have to walk the memory to figure
out the length and never care about null characters, it's all mapped in
one place - but it does add overhead for small-ish strings and they have
that stupid convention where the first character index is at 1 and not 0
- yuck, so pro's and con's for all.
Cheers,
Ryan
_______________________________________________
sqlite-users mailing list
sqlite-users@mailinglists.sqlite.org
http://mailinglists.sqlite.org/cgi-bin/mailman/listinfo/sqlite-users