On 2016/07/04 10:22 AM, Rob Golsteijn wrote:
@Clemens,



It is indeed documented that the behaviour is undefined when using a bind_text 
variant. I missed that part of documentation.

Hi Rob,
The behaviour is undefined in ALL instances where you pass null characters through C strings because of a C string peculiarity, not because of a shortcoming of SQLite.

I think you are missing an important bit in all of this - the strings in C is the problem, they think a Null character indicates termination. It has nothing to do with how SQL stores data - SQLite will store it with all bytes intact, but you typically retrieve or set it via some C calls using a C api.. and this is where the problem is. So whenever you want to push strings into the DB or get them out, and they do contain char(0) characters, then you need to read them into/from bytestreams, arrays, blobs, hex-encoded strings, or some or other method that will not be passing through a standard C string, because at that moment, and unless you force the length, the string will become shortened to the first zero byte found.

Thus, it is not the implementation that needs changing, but the usage.

I had this problem in a similar situation where I tried to store MBCS strings with 16-bit chars and 32-bit (4-byte) character strings. Apart from the enormous waste in 99% of characters, the trailing bytes were all Zero bytes (so character 'A' would be represented by 0x65 00 00 00) and if you try to store 'ABC' like that into the DB and then read it with just a C string, you end up with just A - but the DB still contains the full 'ABC', it's only your own string that doesn't know it. More wicked still, SQLite likely pushes the entire string to the memory location internally, so it is really there, but whatever next function operates on that string will only regard everything up to that first Zero byte thanks to C, and SQLite cannot help for that.

If SQLite could fix this - it wouldn't be documented as undefined, it would have been fixed.

As far as documenting the above... Any C developer reading this would probably giggle and think "Thanks captain obvious!", because this is really first-week stuff in a C-Programming-101 course. However, coming from other compiling platforms, this may not be very obvious. I prefer how Lazarus/Delphi does it (wrt the Pascal variant options as opposed to C++) where a string is a record with first the encoding, the length and then the actual bytes given. You never have to walk the memory to figure out the length and never care about null characters, it's all mapped in one place - but it does add overhead for small-ish strings and they have that stupid convention where the first character index is at 1 and not 0 - yuck, so pro's and con's for all.

Cheers,
Ryan


_______________________________________________
sqlite-users mailing list
sqlite-users@mailinglists.sqlite.org
http://mailinglists.sqlite.org/cgi-bin/mailman/listinfo/sqlite-users

Reply via email to