Re: How to handle character set in perl-ldap?

Chris Ridd Wed, 06 Aug 2003 09:30:44 -0700

On 6/8/03 4:08 pm, Graham Barr <[EMAIL PROTECTED]> wrote:

>> From: Dan Oscarsson <[EMAIL PROTECTED]>
>> Date: Wed Aug 6, 2003  12:34:54 Europe/London
>> To: [EMAIL PROTECTED]
>> Subject: How to handle character set in perl-ldap?
>> Reply-To: Dan Oscarsson <[EMAIL PROTECTED]>
>> 
>> Hi
>> 
>> I have been using my own LDAP module in perl for many years.
>> Now I am looking at whether I could use perl-ldap (Net::LDAP) instead.
>> I have looked at the code and much looks good. So far I have found one
>> minor thing and one major. As the code is quite advanced I am unsure
>> of the best way to do things. So I hope you can help me.
>> 
>> Minor detail: Why is deleteoldrdn=1 not the default in moddn?
>> In all the LDAP code I have written, I have never allowed the old RDN
>> to remain. I would expect most people to want the old RDN to be deleted
>> automatically.


There's no default specified in the ASN.1. It is probably dangerous to have
a default which removes data, but otherwise I can't think of a good reason
either way. 

You'd possibly break a lot of scripts if you changed the default :-(
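
If you always want the old RDN removed, a script can say so explicitly and not depend on the default at all. A minimal sketch, assuming $ldap is an already-connected Net::LDAP object; rename_entry is a hypothetical helper name, not part of perl-ldap:

```perl
use strict;
use warnings;

# Hypothetical helper: rename an entry and always drop the old RDN value.
# $ldap is assumed to be a connected Net::LDAP object (not created here).
sub rename_entry {
    my ($ldap, $dn, $newrdn) = @_;
    return $ldap->moddn(
        $dn,
        newrdn       => $newrdn,
        deleteoldrdn => 1,    # stated explicitly, so the default is irrelevant
    );
}
```

A call would look like rename_entry($ldap, $old_dn, 'cn=New Name'), and existing scripts that rely on the current default are left alone.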

>> Major problem: Character set conversion.
>> On my system I use ISO 8859-1 as the character set. But LDAPv3 uses
>> UTF-8.
>> So in my own LDAP code, the translation between protocol format (UTF-8)
>> and system character set (ISO 8859-1) is done automatically inside
>> the LDAP module. This means that all strings used (in DNs, attributes,
>> filters) are in ISO 8859-1. It is only internally, when the LDAP server
>> is called, that the strings are translated to/from UTF-8.
>> I have looked at the code for perl-ldap to see where this translation
>> could be added. I would prefer to add it just before the data is
>> converted into ASN, and when decoded from ASN, but I am not sure if
>> this is easily done as the code is written now.
>> Do you have any suggestions where it could be done or how it should be
>> done?
>> If you are interested I can send my patches to you to include in
>> perl-ldap, if I can find a good way to add it in.

This is a bad idea IMHO.

Firstly, not everyone uses ISO 8859-1 as their local character set, so you'd
have to make this switchable.

Secondly, you'll break LDAPv2 because it doesn't use UTF-8. There's no good
reason to break LDAPv2 support.

(You can get around those two issues with suitable switches etc.)

Thirdly, how are you going to decide what needs translating into UTF-8 and
what doesn't? I don't think there's enough information in the ASN.1 to let
you make that decision. For example AttributeValue (used by modify, add,
search result, etc) uses OCTET STRING, and that must be able to carry text
as well as binary values, for example (but not limited to) JPEGs, passwords,
and certificates.
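
The calling script is the one place that does know which values are text, so it can do the conversion itself. A rough sketch using the core Encode module; the %is_text list and the to_wire/from_wire names are assumptions made up for illustration, and a real script would need its own list:

```perl
use strict;
use warnings;
use Encode qw(encode decode);

# Hypothetical list of attributes this script knows to hold text.
# Anything not listed (jpegPhoto, userCertificate, ...) is passed through raw.
my %is_text = map { lc($_) => 1 } qw(cn sn description);

# Local ISO 8859-1 bytes -> UTF-8 bytes, for text attributes only.
sub to_wire {
    my ($attr, $value) = @_;
    return $value unless $is_text{ lc $attr };
    return encode('UTF-8', decode('ISO-8859-1', $value));
}

# UTF-8 bytes from the server -> local ISO 8859-1 bytes, for text attributes only.
sub from_wire {
    my ($attr, $value) = @_;
    return $value unless $is_text{ lc $attr };
    return encode('ISO-8859-1', decode('UTF-8', $value));
}
```

The decision is keyed on the attribute name, which is information the script has and the ASN.1 layer does not.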

If you go for a half-way solution, such as handling anything that LDAPv3
defines as LDAPString (DNs, attribute types) as UTF-8 and treating everything
else as raw octets, it will get very confusing for the calling script. What
should be in UTF-8, and what shouldn't? Should 'cn=Chris Ridd' as a value of
seeAlso be encoded in UTF-8 or sent raw? In this case it doesn't make a
difference, because the value is pure ASCII, but you can see how it might.
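
To see where it starts to matter, compare an ASCII-only DN with one containing an ISO 8859-1 character. A small sketch with the core Encode module; the second DN is a made-up example and latin1_to_utf8 is a hypothetical helper:

```perl
use strict;
use warnings;
use Encode qw(encode decode);

# Convert ISO 8859-1 bytes to UTF-8 bytes.
sub latin1_to_utf8 {
    my ($bytes) = @_;
    return encode('UTF-8', decode('ISO-8859-1', $bytes));
}

my $ascii_dn  = 'cn=Chris Ridd';
my $latin1_dn = "cn=Dan,l=Malm\xF6";    # made-up DN with an o-umlaut (0xF6)

# 1 if the bytes are identical before and after conversion, 0 otherwise.
my $ascii_same  = latin1_to_utf8($ascii_dn)  eq $ascii_dn  ? 1 : 0;
my $latin1_same = latin1_to_utf8($latin1_dn) eq $latin1_dn ? 1 : 0;

print "ASCII DN unchanged by conversion:   $ascii_same\n";
print "Latin-1 DN unchanged by conversion: $latin1_same\n";
```

The ASCII value has the same bytes either way, so a wrong guess goes unnoticed; the second value changes, so a wrong guess sends different octets to the server.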


>> Regards,
>> 
>>    Dan
>> --
>> Dan Oscarsson
>> Ki Consulting & Solutions AB         Email: [EMAIL PROTECTED]
>> Box 85
>> 201 20  Malmo, Sweden
>> 
>> 
> 
> 

Cheers,

Chris
