Jesus Cea wrote:
>
> Working on the 3.0 version of bsddb, I have the following issue.
>
> Until 3.0, keys and values were strings. For bsddb, they are opaque, and
> stored unchanged.
>
> In 3.0 the string type is replaced by unicode. A new "bytes" type is
> added. So, code like "db.put('key','value')" needs to be changed to
> "db.put(bytes('key', 'utf-8'), bytes('value', 'utf-8'))", or something
> similar.
>
> This is ugly and produces code that is incompatible with previous Python
> releases.
>
> I was wondering what to do. The obvious path would be to put a proxy
> object between application code and bsddb, doing the bytes<->unicode
> translation on the fly. This could be problematic when dealing with
> legacy data, since it might not be a validly encoded bytestring. Data
> misrepresentation would be dangerous and could go undetected for a long
> time, slowly corrupting the database data.
>
> Moreover, the data is application specific, so automatic conversion can
> introduce incompatibilities and bugs.
>
> Another approach would be to add a new bsddb method to specify the
> default encoding to use to convert unicode->bytes, and to do the
> conversion internally when getting unicode data as a parameter. The
> issue here is that "u'hi' != b'hi'", so the translation must be done
> both when storing and when retrieving data.
>
> These problems arise because now string != bytes. In fact the
> approach in 3.0 is the right one, and any attempt to hide this difference
> with proxy objects or automatic conversion is going to bite us someday.
>
> So, I'm seriously thinking of accepting *ONLY* "bytes" in the bsddb API
> (when working under Python 3.0), and doing the proxy thing *ONLY* in the
> testsuite, to be able to reuse it.
>
> What do you think?
>
> PS: Since most of the time keys/values are 7-bit, a direct "ascii"
> encoding would be fine... until we are required to store an 8-bit value.

I propose doing something similar to the io.open() function: add two
parameters, 'encoding' and 'errors', that default to "ascii" and "strict".
Then do the conversions, and raise an exception on every failure...

-- 
Amaury Forgeot d'Arc
_______________________________________________
Python-3000 mailing list
Python-3000@python.org
http://mail.python.org/mailman/listinfo/python-3000
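
For illustration only, here is a rough sketch of how the 'encoding' and 'errors' proposal might look. It is not bsddb code: the EncodedDB class, its get_raw() method, and the plain dict standing in for a database handle are all hypothetical names invented for this example. The only real calls are str.encode() and bytes.decode(), which take the same 'encoding' and 'errors' arguments that io.open() exposes.

```python
class EncodedDB:
    """Hypothetical wrapper (not the bsddb API): accepts str or bytes, stores bytes."""

    def __init__(self, store=None, encoding="ascii", errors="strict"):
        # 'store' stands in for a real database handle; a dict keeps the sketch runnable.
        self._store = {} if store is None else store
        self._encoding = encoding
        self._errors = errors

    def _to_bytes(self, obj):
        # bytes pass through untouched; str is encoded with the configured codec.
        if isinstance(obj, bytes):
            return obj
        if isinstance(obj, str):
            return obj.encode(self._encoding, self._errors)
        raise TypeError("expected str or bytes, got %s" % type(obj).__name__)

    def put(self, key, value):
        self._store[self._to_bytes(key)] = self._to_bytes(value)

    def get(self, key):
        # Decoding on the way out is the other half of the translation.
        raw = self._store[self._to_bytes(key)]
        return raw.decode(self._encoding, self._errors)

    def get_raw(self, key):
        # Escape hatch for legacy or binary values that are not valid text.
        return self._store[self._to_bytes(key)]


db = EncodedDB()
db.put("key", "value")
print(db.get("key"))       # text comes back as str
print(db.get_raw(b"key"))  # the stored form is bytes

# With the "ascii"/"strict" defaults, an 8-bit character is rejected on write.
try:
    db.put("cl\u00e9", "valeur")
except UnicodeEncodeError as exc:
    print("rejected on put:", exc.reason)

# Legacy bytes that are not valid in the chosen encoding fail loudly on read.
db.put(b"legacy", b"\xff\xfe")
try:
    db.get("legacy")
except UnicodeDecodeError as exc:
    print("rejected on get:", exc.reason)

# An application that knows its data can opt into another codec explicitly.
latin = EncodedDB(encoding="latin-1")
latin.put("cl\u00e9", "valeur")
```

With strict error handling, bad input raises at the point of conversion instead of being silently mangled, which addresses the slow-corruption worry in the quoted message. Whether bytes arguments should bypass the conversion, as they do in this sketch, is a design choice the thread leaves open.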