Re: unicode value

Alain Spineux Fri, 20 Jul 2007 05:52:40 -0700

Hi

First, I'm not upset by anything.


You are responsible for keeping the package in a healthy state.
It is also your responsibility to add or remove features and then
to maintain them.
Thanks for doing that.

As you suggested, I made a wrapper class that keeps both code bases as
independent as possible, and I'm happy with that.

Anyway, I have some comments about your answer ...

On 7/19/07, Michael Ströder <[EMAIL PROTECTED]> wrote:

Alain,

Alain Spineux wrote:
>
> When investigating python and unicode, I read somewhere (in a PEP,
> I think) that python functions should accept and manage unicode strings
> as well as normal strings.

Without knowing the PEP (reference?) I guess this affects functions
which take a string as an argument and process it directly, returning a
result. In context of python-ldap this would be directly applicable to
the functions in modules ldap.dn and ldap.filter.


Unicode strings in Python are designed so that the developer can use them
in a completely transparent way. If the libraries respect this
principle too, the developer can exchange data between different sources
(user input, SQL, LDAP ...) without ever making any conversion.

The problem is that strings are also used for binary storage, and LDAP
makes no distinction between the two usages (there are no charset and
unicode types as in SQL); only the developer knows which is which
and can make the conversion.
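
To illustrate the point that only the developer knows which attributes are textual, here is a minimal sketch of a helper that decodes selected attributes in a search-style result. It is written for Python 3, where bytes and str play the roles of the plain and unicode strings discussed in this thread. The name decode_entries and the sample entry are hypothetical, and the (dn, attrs) shape is only assumed to match what the thread calls the result structure.

```python
def decode_entries(entries, text_attrs, encoding="utf-8"):
    """Return a copy of entries with the values of text_attrs decoded to str.

    entries is assumed to be a list of (dn, {attr_name: [bytes, ...]}) pairs.
    Attributes not named in text_attrs are left as raw bytes.
    """
    decoded = []
    for dn, attrs in entries:
        new_attrs = {}
        for name, values in attrs.items():
            if name in text_attrs:
                # Textual attribute: the caller asked for decoding.
                new_attrs[name] = [value.decode(encoding) for value in values]
            else:
                # Possibly binary attribute: keep the bytes untouched.
                new_attrs[name] = list(values)
        decoded.append((dn, new_attrs))
    return decoded


# Hypothetical sample data: one textual and one binary attribute.
entries = [
    ("cn=Michael,dc=example,dc=com",
     {"sn": [b"Str\xc3\xb6der"], "jpegPhoto": [b"\xff\xd8\xff"]}),
]
result = decode_entries(entries, {"sn"})
```

Only "sn" is decoded here; "jpegPhoto" stays as bytes because the caller did not declare it textual.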


The basic problem here is that, for the sake of backward compatibility with
LDAPv2, the charset has to be passed around as well. That's what I'm doing
in web2ldap.

> Of course, only if these strings could contain
> user-readable characters.

Let's call that "textual strings".

> Anyway I see 2 solutions
>
> 1. Let result() return non-unicode strings. _HERE_ The user knows all
> returned strings are normal utf-8 encoded strings and can do the
> decoding himself. A helper function doing the job for the result
> structure would be welcome.
>
> 2. Do the conversion regarding the info provided in the query, as my
> source sample does.
>
> I answer now some of your previous comment:
>
>> > In this case maybe is it possible to use [ '*', u'givenName', u'sn' ]
>> > to convert only 'givenName' and 'sn'
>
>> But then you will not gain much! Still the application has to know which
>> attributes have to be converted. => It's not worth hiding the conversion
>> within python-ldap.
>
> I don't really hide the conversion, because the user has to request it using
> unicode field name.

I don't like this approach. The type of the attribute names is causing a
type conversion side-effect. I don't consider this to be good design and
I guess most Python developers would not expect something like this.
Think about an application accidentally passing in Unicode strings without
really being prepared to get the Unicode/string mix back.


Today, passing a unicode argument to the ldap functions raises an exception,
so no accident is possible :-)
On the other hand, with unicode support, things could accidentally
work as expected.
But this is only speculation about which inconvenience is worse.
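
For readers following the argument, the convention under discussion (the type of each attribute name decides whether its values are decoded) can be sketched roughly as follows. This is a Python 3 restatement, not the attached code: str names stand in for the unicode names and bytes names for the plain ones, and split_attrlist is a hypothetical name.

```python
def split_attrlist(attrlist):
    """Split an attribute list into wire names and the names to decode.

    A str name means "decode this attribute's values"; a bytes name
    means "leave the values as raw bytes".
    """
    wire_names = []
    to_decode = set()
    for attr in attrlist or []:
        if isinstance(attr, str):
            attr = attr.encode("utf-8")
            to_decode.add(attr)  # remember it for decoding the result later
        wire_names.append(attr)
    return wire_names, to_decode


wire_names, to_decode = split_attrlist([b"*", "givenName", "sn"])
```

The side effect Michael objects to is visible here: merely passing "sn" instead of b"sn" changes the type of what the caller gets back.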


> Do you really consider to add the schema processing for unicode
> integration in the future?

Nope. It's up to the application programmer, especially based on whether
LDAPv2 support is still needed for a particular application or not. I
consider python-ldap to be rather a low-level API.

> Keep in mind, none of my code breaks compatibility with existing applications.

Generally I don't want to discourage people from working on something. But
sorry, I won't add your code to python-ldap's Lib/. I hope you're not
upset. My proposal would be to add it under Demo/ so your work can be
considered for use by others. Or you can put it on your own web page
(for further development) and I'll put a link to it on
http://python-ldap.sourceforge.net/docs.shtml.


I have no website today :-(
Please use the latest version, attached.

Regards


Ciao, Michael.



--
Alain Spineux
aspineux gmail com
May the sources be with you

#!/usr/bin/env python
# 
# This wrapper library brings some Unicode support to python-ldap.
#
# The goal of this wrapper was to spare the author any explicit
# unicode conversion before and after each call.
#
# The main idea is that any unicode argument is encoded
# as utf-8 before it reaches the ldap server.
# In the other direction, results are decoded according to the
# hints you give in the attribute list.
#
# Read the code at the end for more info about
# how to use it.
#
# written by Alain Spineux <alain spineux AT gmail com>
#
# v0.1.0 Fri Jul 20 14:49:06 CEST 2007


__version__ = '0.1.0'

import types, datetime
import ldap, ldapurl, ldap.modlist

from ldap.ldapobject import LDAPObject
from ldap.ldapobject import ReconnectLDAPObject

def unicode2utf8(st):
    """Convert unicode (and only unicode) string into utf-8 raw string as expected by ldap"""
    if isinstance(st, types.UnicodeType):
        return st.encode('utf-8')
    else:
        return st

def utf82unicode(st):
    """Decode the utf-8 raw string st into a unicode string"""
    return st.decode('utf-8')


def encode_modlist(modlist, no_op):
    """encode ldap modlist structure (the list is modified in place)
       set no_op=True for tuples of kind (str, [str,...]), as used by add
       and False for (int, str, [str,...]), as used by modify
    """

    for i, mod in enumerate(modlist):
        if no_op:
            attr_name, attr_values=mod
        else:
            op, attr_name, attr_values=mod
        
        attr_name=unicode2utf8(attr_name)
        if isinstance(attr_values, (types.ListType, types.TupleType)):
            attr_values=map(unicode2utf8, attr_values)
        else:
            attr_values=unicode2utf8(attr_values)
        if no_op:
            modlist[i]=(attr_name, attr_values)
        else:
            modlist[i]=(op, attr_name, attr_values)

    return modlist

def _print_ldap_result(ldap_result):
    for dn, item in ldap_result:
        print 'DN=', repr(dn)
        for k, v in item.iteritems():
            print '\t%s: %s' % (k, repr(v))
        print


class UnicodeLDAPInterface:

    ldapbaseclass=None
    decoder_expiration_delay=300 # the expiration delay for an object in self.unicode_decoder
    
    def __init__(self, uri, **kwargs):
        self.ldapbaseclass.__init__(self, uri, **kwargs)
        self.unicode_decoder={} # { msgid: (msgid, expiration, decoder_data), ... }
        # I use an expiration time to keep the dict from becoming too big when the
        # server doesn't answer some requests

    def _set_unicode_decoder(self, msgid, value):
        """protect unicode_decoder against multi-threading
        update or add the decoder
         """
        self._ldap_object_lock.acquire()
        try:
            self.unicode_decoder[msgid]=value
        finally:
            self._ldap_object_lock.release()

    def _remove_unicode_decoder(self, msgid):
        """protect unicode_decoder against multi-threading
        remove the decoder
         """
        self._ldap_object_lock.acquire()
        try:
            # pop() with a default ignores a msgid that was never registered
            self.unicode_decoder.pop(msgid, None)
        finally:
            self._ldap_object_lock.release()

    def _get_unicode_decoder(self, msgid):
        """protect unicode_decoder against multi-threading
         read the decoder info for msgid
         """
        self._ldap_object_lock.acquire()
        try:
            return self.unicode_decoder[msgid]
        finally:
            self._ldap_object_lock.release()

    def _expire_unicode_decoder(self):
        """cleanup any expired decoder"""
        self._ldap_object_lock.acquire()
        try:
            now=datetime.datetime.now()
            # keys() returns a list copy in Python 2, so deleting while looping is safe
            for msgid in self.unicode_decoder.keys():
                if self.unicode_decoder[msgid][1]<now:
                    del self.unicode_decoder[msgid]
        finally:
            self._ldap_object_lock.release()


    def search_ext(self,base,scope, filterstr, attrlist, *args, **kwargs):
        # base,scope, filterstr='(objectClass=*)',attrlist=None,attrsonly=0,serverctrls=None,clientctrls=None,timeout=-1,sizelimit=0
        
        # convert filter
        filterstr_u=unicode2utf8(filterstr)
        
        # convert arglist and keep a copy of original values for later decoding
        attrlist_u=[]
        decoder={} # will keep only fields to decode
        if attrlist!=None:
            for attr in attrlist:
                if isinstance(attr, types.UnicodeType):
                    attr=attr.encode('utf-8')
                    decoder[attr]=True
                attrlist_u.append(attr)

        msgid=self.ldapbaseclass.search_ext(self,base,scope, filterstr_u, attrlist_u, *args, **kwargs)

        if decoder:
            timeout=kwargs.get('timeout', None)
            if timeout==None or timeout<=0:
                timeout=self.decoder_expiration_delay
            self._set_unicode_decoder(msgid,(msgid, datetime.datetime.now()+datetime.timedelta(seconds=timeout), decoder))
        return msgid

    def result3(self, *args, **kwargs):
        # kwargs=(self, msgid=_ldap.RES_ANY,all=1,timeout=None):
        rtype, rdata, rmsgid, decoded_serverctrls=self.ldapbaseclass.result3(self, *args, **kwargs)

        try:
            msgid, expire, decoder=self._get_unicode_decoder(rmsgid)
        except KeyError:
            pass
            # no decoder for this => nothing to decode
        else:
            if rtype not in [ ldap.RES_SEARCH_ENTRY, ldap.RES_SEARCH_REFERENCE ]:
                # this was the last result
                self._remove_unicode_decoder(rmsgid)
            else:
                # reset the timeout
                timeout=kwargs.get('timeout', None)
                if timeout==None or timeout<=0:
                    timeout=self.decoder_expiration_delay
                self._set_unicode_decoder(msgid, (msgid, datetime.datetime.now()+datetime.timedelta(seconds=timeout), decoder))
            
            # now decode the result
            if rdata:
                if rtype in [ldap.RES_SEARCH_ENTRY, ldap.RES_SEARCH_REFERENCE, ldap.RES_SEARCH_RESULT]:
                    # FIXME: I don't know what a RES_SEARCH_REFERENCE is
                    for i, (dn, attrs) in enumerate(rdata):
                        # FIXME: should I handle the 'dn' the same way
                        if decoder.has_key('dn'):
                            dn=utf82unicode(dn)
                        for key in attrs.keys():
                            if decoder.has_key(key):
                                attrs[key]=map(utf82unicode, attrs[key])
                        # print '\tITEM=', dn, attrs
                        rdata[i]=(dn, attrs)

        self._expire_unicode_decoder()
        
        return rtype, rdata, rmsgid, decoded_serverctrls

    def add_ext(self, dn, modlist, *args, **kwargs):
        # args=(self,dn,modlist,serverctrls=None,clientctrls=None)
        dn=unicode2utf8(dn)
        # print 'MODLIST', modlist
        modlist=encode_modlist(modlist, True)
        # print 'MODLIST unicode', modlist
        return self.ldapbaseclass.add_ext(self, dn, modlist, *args, **kwargs)

    def modify_ext(self, dn, modlist, *args, **kwargs):
        # args=(self,dn,modlist,serverctrls=None,clientctrls=None)
        dn=unicode2utf8(dn)
        # print 'MODLIST', modlist
        modlist=encode_modlist(modlist, False)
        # print 'MODLIST unicode', modlist
        return self.ldapbaseclass.modify_ext(self, dn, modlist, *args, **kwargs)

    def delete_ext(self, dn, *args, **kwargs):
        # args=(self,dn,serverctrls=None,clientctrls=None)
        dn=unicode2utf8(dn)
        return self.ldapbaseclass.delete_ext(self, dn, *args, **kwargs)

    def abandon_ext(self, msgid, *args, **kwargs):
        # args=(self,msgid,serverctrls=None,clientctrls=None)
        result=self.ldapbaseclass.abandon_ext(self, msgid, *args, **kwargs)
        self._remove_unicode_decoder(msgid)
        return result

    def cancel_ext(self, cancelid, *args, **kwargs):
        # args=(self,msgid,serverctrls=None,clientctrls=None)
        result=self.ldapbaseclass.cancel_ext(self, cancelid, *args, **kwargs)
        self._remove_unicode_decoder(cancelid)
        return result

class UnicodeLDAPObject(UnicodeLDAPInterface, LDAPObject):
    ldapbaseclass=LDAPObject
    
class UnicodeReconnectLDAPObject(UnicodeLDAPInterface, ReconnectLDAPObject):
    ldapbaseclass=ReconnectLDAPObject


if __name__=='__main__':

    import sys, os, time
    
    host='localhost'
    port=389
    base_dn='dc=asxnet,dc=loc'
    
    if True:
        who='cn=manager,cn=internal,dc=asxnet,dc=loc'
        cred='vishnou'
    else:
        who='cn=nobody,cn=internal,dc=asxnet,dc=loc'
        cred='iMmTWz5pJ+lwY7i6M/BU61ngo1aBLyqQhRrrKbEc'
    
    ldap_url=ldapurl.LDAPUrl('ldap://%s:%d/%s' % (host, port, base_dn))
    ldap_url.applyDefaults({
       'who': who,
       'cred' : cred, })
    print ldap_url
    #l=LDAPObject(ldap_url.initializeUrl())
    #l=UnicodeLDAPObject(ldap_url.initializeUrl())
    l=UnicodeReconnectLDAPObject(ldap_url.initializeUrl())
    l.simple_bind_s(ldap_url.who, ldap_url.cred)
    print 'Connected as', l.whoami_s()
    
    first_name='Michael'
    first_name2=u'Micha\xebl'
    last_name=u'Str\xf6der'
    email='[EMAIL PROTECTED]'
    street=u'Hauptstra\xdfe'  # \xdf is the sharp s; \xe1 would be a-acute
    country='Germany'
    
    cn='%s %s' %(first_name, last_name)
    dn='cn=%s,%s' %(cn, base_dn)
    info={
        u'cn' : (cn, ),
        'mail' : (email, ),
        'objectClass' : ('top', 'inetOrgPerson', 'kolabInetOrgPerson',), 
        u'sn' : (last_name, ),
        u'givenName' : (first_name, ),
        u'street': (street, ),
        'c': (country, ),
        'telephoneNumber': '+49 1111111111',
    }
    
    ldap_result=l.search_s(base_dn, ldap.SCOPE_ONELEVEL, '(cn=%s)' % (cn,) , info.keys())
    if ldap_result:
        print '== Found'
        _print_ldap_result(ldap_result)
        l.delete_s(dn)    
        print '== Deleted'
        
    l.add_s(dn, ldap.modlist.addModlist(info))    
    print '== Created'
    ldap_result=l.search_s(base_dn, ldap.SCOPE_ONELEVEL, '(cn=%s)' % (cn,) , info.keys())
    _print_ldap_result(ldap_result)
    
    l.modify_s(dn, [(ldap.MOD_REPLACE, u'givenName', first_name2),
                    (ldap.MOD_ADD, 'telephoneNumber', ( '+49 1234567890', )),
                     ])
    
    print '==Modified'
    ldap_result=l.search_s(base_dn, ldap.SCOPE_ONELEVEL, '(cn=%s)' % (cn,) , info.keys())
    _print_ldap_result(ldap_result)

    print '==Display once more'
    ldap_result=l.search_s(base_dn, ldap.SCOPE_ONELEVEL, '(cn=%s)' % (cn,) , ['*', '+', u'dn', u'givenName', u'creatorsName'] )
    _print_ldap_result(ldap_result)

_______________________________________________
Python-LDAP-dev mailing list
[email protected]
https://lists.sourceforge.net/lists/listinfo/python-ldap-dev
