[sqlite] Feedback from a wrapper writer

Roger Binns Mon, 13 Dec 2004 12:29:13 -0800

These are various issues I had and some suggestions based on
writing a Python wrapper for SQLite 3.  Overall I was very
impressed by the quality of the doc, and of course SQLite 3
itself.  Overall the wrapper ended up being just under 3,000
lines of code and provides all the functionality of the C API
including the experimental functions.

Python supports both a String type (basically char[]) and a Unicode type. I could use the plain or -16 suffix versions of the apis as appropriate. Except that Python can store unicode as UTF16 or as UTF32. Several internal Python functions also put in byte order markers (BOMs). In some cases I have to convert from the String type to Unicode (the encoding of the String type is rarely UTF8) and then for UTF32 strings convert back to UTF8. The SQLite source only copes with UTF16 BOMs, but BOMs exist for UTF8, UTF16, UTF32 and even UTF64. To help get around all this confusion, I suggest another series of APIs for SQLite when dealing with strings. They take a void pointer and the length in bytes. There is also a parameter saying what the encoding is. This encoding is ignored if a BOM is found.

I had to use SQLITE_TRANSIENT for all supplied byte data.
This is because the pointer I supplied is not the pointer
on which the free function needs to be called.  The free
function needs to be called on the containing object (in
my case a PyObject*) of which the pointer is a field.  I
don't have suggestions for how to improve this since they
would just make the code more complicated either for the
wrapper writer or the SQLite author.

Unfortunately I ended up with four places (two times two) of almost duplicate code.

- When implementing user defined functions, I had to convert a sqlite3_value* to a Python type using the sqlite3_value_TYPE functions. For returning results from a row, an sqlite3_value* is not available so I had to use the sqlite3_column_TYPE functions. If possible it would be nice to get a sqlite3_value* for a column (assuming it is stored that way) and then I can use one piece of code for converting from SQLite types to Python types

- A similar situation occurs when going from Python types to SQLite types which happens when setting the results of a user defined function and setting bindings, using sqlite3_result_TYPE and sqlite3_bind_TYPE respectively. Being able to allocate a typed sqlite3_value* and then giving that to sqlite3_result/sqlite3_bind would reduce things to one piece of code. It is very hard for me to do garbage collection when a registered function or collation overwrites one of the same name. For example if I register a function "foo" and then register "FOO" I would assume the earlier one is no longer used in SQLite. I currently hold on to my per callback blocks of data until the whole database is closed, since I never know when an earlier function has been release. The simplest solution would be if I could ask SQLite if a particular void*userdata still belongs to any registered function. That way I would know when SQLite is done with it.

It would be nice to have an error code allocated for user/wrapper. For example when there is an error in a Python callback, I have to set an error code in SQLite. Currently I just use SQLITE_ERROR but later on it isn't clear who generated that code. If there was something like SQLITE_WRAPPER or SQLITE_USER then it would be unambiguous.

The only API I didn't use was sqlite3_collation_needed.
It is very difficult to use in a wrapper library since
I would have to have a way of tracking from the sqlite3*
back to the encapsulating PyObject.  I also remain unconvinced
of its utility.

There are several overlapping ranges of constants. The textual names don't have unique prefixes. For example 2 could be SQLITE_INTERNAL, SQLITE_FLOAT, SQLITE_CREATE_TABLE, SQLITE_IGNORE or one of SQLITE_UTF16BE, SQLITE_UTF16 (the latter two have conflicting definitions in the doc for create_collation and create_function). This makes it somewhat messy converting a number back to a name since you have to be very aware of the context and which textual names are part of which series of constants.

  For example if I wanted to convert the error code 2 back into
  the error code name, I can't just find the first name begining
  with SQLITE_ that has a value of two.  If error code names had
  a unique prefix such as SQLITE_E_ then it would be easy.  This
  issue arises because all the constants share the same name space.
  In my wrapper you just say apsw.SQLITE_INTERNAL rather than
  apsw.errors.SQLITE_INTERNAL.

Roger

[sqlite] Feedback from a wrapper writer

Reply via email to