Re: [Python-3000] Bytes and unicode conversion in C extensions

2008-08-15 Thread Nick Coghlan
Christian Heimes wrote: > Jesus Cea wrote: >> Current pybsddb code doesn't allow subclassing or adding new attributes to >> a given instance. I will (probably) work on this for a future pybsddb >> version. Pointers to references to do this kind of magic welcomed :-) > > For making it subclass-able yo

Re: [Python-3000] Bytes and unicode conversion in C extensions

2008-08-14 Thread Christian Heimes
Jesus Cea wrote: Current pybsddb code doesn't allow subclassing or adding new attributes to a given instance. I will (probably) work on this for a future pybsddb version. Pointers to references to do this kind of magic welcomed :-) For making it subclass-able you have to add Py_TPFLAGS_BASETYPE to

Re: [Python-3000] Bytes and unicode conversion in C extensions

2008-08-14 Thread M.-A. Lemburg
On 2008-08-14 07:10, Jesus Cea wrote: M.-A. Lemburg wrote: | BTW: If you make the database object subclassable, an application | could easily implement whatever strategy is needed on top of the | bytes-only interface. Current pybsddb code doesn't allow subclassing or adding new attributes to a give
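
The idea quoted in this message, layering an encoding strategy on top of a bytes-only interface by subclassing, might look roughly like the sketch below. Both class names are hypothetical: `BytesDB` is an in-memory stand-in written only for illustration, not the pybsddb API, and `TextDB` is one possible application-level strategy (a single encoding for keys and values).

```python
class BytesDB:
    """Hypothetical bytes-only store; a stand-in, NOT the pybsddb API."""

    def __init__(self):
        self._data = {}

    def put(self, key, value):
        # The base class accepts bytes only and stores them unchanged.
        if not (isinstance(key, bytes) and isinstance(value, bytes)):
            raise TypeError("keys and values must be bytes")
        self._data[key] = value

    def get(self, key):
        if not isinstance(key, bytes):
            raise TypeError("keys must be bytes")
        return self._data.get(key)


class TextDB(BytesDB):
    """Application-level subclass: str in, str out, bytes underneath."""

    def __init__(self, encoding="utf-8"):
        super().__init__()
        self.encoding = encoding

    def put(self, key, value):
        # Encode on the way in so the base class only ever sees bytes.
        super().put(key.encode(self.encoding), value.encode(self.encoding))

    def get(self, key):
        # Encode the key for lookup, decode the stored value on the way out.
        raw = super().get(key.encode(self.encoding))
        return None if raw is None else raw.decode(self.encoding)
```

With this split, the choice of encoding stays in application code, and the underlying store never has to guess how text should be converted.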

Re: [Python-3000] Bytes and unicode conversion in C extensions

2008-08-13 Thread Jesus Cea
M.-A. Lemburg wrote: | Since bsddb is about storing arbitrary data, I think just accepting | bytes for both keys and values is more intuitive. Agreed. This is the approach I've done in current code. | The question of encoding is application and data

Re: [Python-3000] Bytes and unicode conversion in C extensions

2008-08-13 Thread Jesus Cea
Mike Klaas wrote: | You may find this thread to be relevant: | | http://mail.python.org/pipermail/python-3000/2007-August/009197.html Very relevant, indeed. I will think about the "callback" option to convert both keys and values (in both directions

Re: [Python-3000] Bytes and unicode conversion in C extensions

2008-08-13 Thread Jesus Cea
Martin v. Löwis wrote: |> So, I'm seriously thinking of accepting *ONLY* "bytes" in the bsddb API |> (when working under Python 3.0), and do the proxy thing *ONLY* in the |> testsuite, to be able to reuse it. |> |> What do you think? | | I think you

Re: [Python-3000] Bytes and unicode conversion in C extensions

2008-07-30 Thread M.-A. Lemburg
On 2008-07-30 07:17, Andrew McNamara wrote: What about a new keyword argument to the constructor, "encoding". If specified, *only* accept unicode (and do the conversion internally). Would that apply to keys, values, or both? I admit that I deliberately glossed over that. 8-) One option is to

Re: [Python-3000] Bytes and unicode conversion in C extensions

2008-07-29 Thread Andrew McNamara
>> What about a new keyword argument to the constructor, "encoding". If >> specified, *only* accept unicode (and do the conversion internally). > >Would that apply to keys, values, or both? I admit that I deliberately glossed over that. 8-) One option is to say "both", just to keep it simple: if

Re: [Python-3000] Bytes and unicode conversion in C extensions

2008-07-29 Thread Martin v. Löwis
> What about a new keyword argument to the constructor, "encoding". If > specified, *only* accept unicode (and do the conversion internally). Would that apply to keys, values, or both? Regards, Martin

Re: [Python-3000] Bytes and unicode conversion in C extensions

2008-07-29 Thread Andrew McNamara
>Another approach would be to add a new bsddb method to specify the >default encoding to use to convert unicode->bytes, and to do the >conversion internally when getting unicode data as a parameter. The >issue here is that "u'hi' != b'hi'", so the translation must be done >both when storing and whe

Re: [Python-3000] Bytes and unicode conversion in C extensions

2008-07-29 Thread Mike Klaas
On 29-Jul-08, at 7:32 AM, Jesus Cea wrote: Working on the 3.0 version of bsddb, I have the following issue. Until 3.0, keys and values were strings. For bsddb, they are opaque, and stored unchanged. In 3.0 the string type is replaced by unicode. A new "byte" type is added. So, code like "d

Re: [Python-3000] Bytes and unicode conversion in C extensions

2008-07-29 Thread Martin v. Löwis
> So, I'm seriously thinking of accepting *ONLY* "bytes" in the bsddb API > (when working under Python 3.0), and do the proxy thing *ONLY* in the > testsuite, to be able to reuse it. > > What do you think? I think you should write the test suite in terms of bytes. Regards, Martin
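
As a rough illustration of what "a test suite in terms of bytes" could mean, here is a minimal sketch. `BytesOnlyDB` is a hypothetical in-memory stand-in, not the bsddb API; the point is only that the tests pass bytes literals directly, so no str-to-bytes proxy layer is needed.

```python
import unittest


class BytesOnlyDB:
    """Hypothetical in-memory stand-in for a bytes-only database; NOT the bsddb API."""

    def __init__(self):
        self._data = {}

    def put(self, key, value):
        # Reject anything that is not bytes, mirroring a bytes-only interface.
        for item in (key, value):
            if not isinstance(item, bytes):
                raise TypeError("keys and values must be bytes")
        self._data[key] = value

    def get(self, key):
        if not isinstance(key, bytes):
            raise TypeError("keys must be bytes")
        return self._data.get(key)


class PutGetTest(unittest.TestCase):
    def test_roundtrip(self):
        # The test speaks bytes directly: no conversion proxy in between.
        db = BytesOnlyDB()
        db.put(b"key", b"value")
        self.assertEqual(db.get(b"key"), b"value")

    def test_str_rejected(self):
        # Text arguments fail loudly instead of being converted silently.
        db = BytesOnlyDB()
        with self.assertRaises(TypeError):
            db.put("key", "value")
```

The cases can be run with `python -m unittest` once saved to a module.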

Re: [Python-3000] Bytes and unicode conversion in C extensions

2008-07-29 Thread Antoine Pitrou
Hi, > In 3.0 the string type is replaced by unicode. A new "byte" type is > added. So, code like "db.put('key','value')" needs to be changed to > "db.put(bytes('key', 'utf-8'), bytes('value', 'utf-8'))", or something > similar. Why not "db.put(b'key', b'value')"? > This is ugly and generates in

Re: [Python-3000] Bytes and unicode conversion in C extensions

2008-07-29 Thread Amaury Forgeot d'Arc
Jesus Cea wrote: > > Working on the 3.0 version of bsddb, I have the following issue. > > Until 3.0, keys and values were strings. For bsddb, they are opaque, and > stored unchanged. > > In 3.0 the string type is replaced by unicode. A new "byte" type is > added. So, code like "db.put('key','value'