I have been looking at how to build the _ldap module for Python 3.x. It's a pretty straightforward binary port, except for one major headache with strings.

So, here I'd like to propose and explain some API changes for Python 3.x users. I'm focused on the _ldap module right now, but the (pure Python) library modules will experience knock-on effects.

These are my goals with the _ldap module:
  • allow Python 2.x clients to keep working without changes
  • dual environment support: both 2.x and 3.x build environments

First, the easy one: the old int type is gone in 3.x (what Python 3 calls int is the old long). The _ldap module uses int objects to return asynchronous message IDs and to define a bunch of constants. In Python 3.0, longs look and act just like old ints but with more precision, so there should be few, if any, visible problems with this change in the _ldap API. (Unless you are relying on overflow effects, which sounds suspicious anyway.)

1. Python-ldap compiled for 2.x should continue to use int for Message IDs and constants, but when compiled for 3.x will use longs.
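As a rough illustration of why this change should be invisible, here is a minimal sketch of a type check that treats both integer flavours alike. The helper name is_message_id is hypothetical and not part of _ldap or python-ldap.

```python
import numbers


def is_message_id(value):
    """Return True for any built-in integer type, excluding bool."""
    # numbers.Integral covers int and long on 2.x, and the single int on 3.x.
    return isinstance(value, numbers.Integral) and not isinstance(value, bool)


# Hypothetical message ID just past the old 32-bit int range.
big_id = 2 ** 31
print(is_message_id(big_id))
```

Client code written this way should not need to care which integer type the module was compiled to return.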

Next, and much harder to deal with, is the loss of the 8-bit str(). The _ldap module relies heavily on str() objects to hold binary data for attribute values. But Python 3.0 does not have an 8-bit str(). Its str() is what 2.x used to call unicode(). For 8-bit data we have a new type called bytes(). The issue is that conversion between the two is not automatic.

For existing LDAP applications, I expect this to open up a world of porting pain. This is because lots of actual attribute values are ASCII and having the values available in a string type has been a very handy convenience. But, strictly, it is wrong. LDAP attribute values are binary OCTET STRINGs, and in the unicode-only text world of Python 3.x, applications that want text will need to decode these binary strings in accordance with the attribute's schema. This problem is something I'd like feedback on.

Yet there is some good news. The text/binary problem appears to be restricted just to attribute values and to some authentication parameters like SASL passwords. Unaffected are attribute names, DNs and search filters. This is because they are transmitted as the ASN.1 LDAPString type, which is a UTF-8 encoded OCTET STRING. So it makes sense for the _ldap API to accept unicode strings for these. But, attribute values (OCTET STRING) surely must become bytes().
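To make the distinction concrete, here is a small Python 3 sketch. The DN and the photo bytes are made-up sample data; the point is only that an LDAPString always round-trips through UTF-8, while an arbitrary attribute value may not be text at all.

```python
# Hypothetical DN containing a non-ASCII character (e with diaeresis).
dn = "cn=Zo\u00eb,dc=example,dc=org"
wire_dn = dn.encode("utf-8")          # the octets that would go on the wire
round_trip = wire_dn.decode("utf-8")  # always recoverable as text

# Hypothetical binary attribute value: the first bytes of a JPEG file.
photo = b"\xff\xd8\xff\xe0"
try:
    photo.decode("utf-8")
    photo_is_text = True
except UnicodeDecodeError:
    # Not valid UTF-8, so it can only be handled as bytes.
    photo_is_text = False
```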

Using the bytes() type is going to cause much pain, with potentially lots of "TypeError: Can't convert 'bytes' object to str implicitly" messages everywhere. But it seems that this is part and parcel of porting to Python 3.x. If you really need strings, learn the encodings of your attributes and call str(value, "UTF-8") or str(value, "ASCII") to convert them.
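A minimal Python 3 sketch of what this looks like in client code. The attribute value is invented sample data, and I am assuming the application knows the value is UTF-8 text.

```python
# Hypothetical cn value as a 3.x build of _ldap would hand it back.
value = b"Barbara Jensen"

try:
    greeting = "Hello, " + value  # mixing str and bytes
    mixing_failed = False
except TypeError:
    # Python 3 refuses to combine the two types implicitly.
    mixing_failed = True

# Explicit conversion, using an encoding the application knows is right.
text = str(value, "UTF-8")  # same result as value.decode("utf-8")
greeting = "Hello, " + text
```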

So, my other proposed API changes are:

2. Python-ldap compiled for 2.x should continue to accept and produce strings as before.
3. When compiled for 3.x, values that are UTF-8 LDAPString on the wire (attribute names, DNs, search filters, etc) should be passed in and out as (unicode) strings. Attribute value data, and other places where BER binary values are passed, should be passed in and out as bytes(). There should be no automatic conversion between bytes() and unicode str().
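Under proposal 3, an entry seen by a 3.x client would have roughly the shape below. This is only a sketch: check_entry is a hypothetical helper written for illustration, not an existing _ldap function, and the entry is sample data.

```python
def check_entry(dn, attrs):
    """Raise TypeError unless the entry follows the proposed 3.x typing."""
    if not isinstance(dn, str):
        raise TypeError("DN must be str")
    for name, values in attrs.items():
        if not isinstance(name, str):
            raise TypeError("attribute name must be str")
        for value in values:
            if not isinstance(value, bytes):
                raise TypeError("value of %r must be bytes" % name)
    return True


# Text for the DN and attribute names, bytes for every attribute value.
sample_dn = "cn=Barbara Jensen,dc=example,dc=org"
sample_attrs = {"cn": [b"Barbara Jensen"], "jpegPhoto": [b"\xff\xd8\xff\xe0"]}
ok = check_entry(sample_dn, sample_attrs)
```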

A library class that automatically converts attribute values of type bytes() into various python types via an attribute schema is possible, but at the _ldap level this is not necessary. It may even be better for an application tightly coupled to an LDAP schema to do this conversion itself.
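For what it's worth, such a conversion layer could be sketched along these lines. The DECODERS table is a hard-coded stand-in for a real schema lookup, and both it and convert_entry are hypothetical names. Attribute names are matched case-sensitively here for brevity, which a real implementation would not do.

```python
# Stand-in for a schema: attribute name -> function turning bytes into a Python value.
DECODERS = {
    "cn": lambda raw: raw.decode("utf-8"),
    "uidNumber": lambda raw: int(raw.decode("ascii")),
}


def convert_entry(attrs, decoders=DECODERS):
    """Decode known attributes; leave everything else as bytes."""
    converted = {}
    for name, values in attrs.items():
        decode = decoders.get(name, lambda raw: raw)  # unknown: keep bytes
        converted[name] = [decode(value) for value in values]
    return converted


raw_attrs = {"cn": [b"Babs"], "uidNumber": [b"1001"], "jpegPhoto": [b"\xff\xd8"]}
decoded = convert_entry(raw_attrs)
```

Keeping this outside _ldap means the C module never has to guess at an encoding.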

In summary, python-ldap should have no API change visible to 2.x clients. 3.x clients, however, will need to use the bytes() type explicitly for attribute values.


Python-LDAP-dev mailing list
