Chris wrote:
>>> Minor detail: Why is not deleteoldrdn=1 default in moddn?
>>> I have never in any code I have using LDAP allowed old rdn to remain.
>>> I would expect most people to want the old rdn to be deleted
>>> automatically.
>
>There's no default specified in the ASN.1. It is probably dangerous to have
>a default which removes data, but otherwise I can't think of a good reason
>either way.

When you change the rdn, the data in the attribute would be expected
to change as well. So I see removing the old name as the expected case.

>
>You'd possibly break a lot of scripts if you changed the default :-(

I doubt a lot, but probably some.

>
>>> Major problem: Character set conversion.
>>> On my system I use ISO 8859-1 as the character set. But LDAPv3 uses
>>> UTF-8.
>>> So in my own LDAP code, the translation between protocol format (UTF-8)
>>> and system character set (ISO 8859-1) is done automatically inside
>>> the LDAP module. This means that all strings used (in DN, attributes,
>>> filters)
>>> are in ISO 8859-1. It is only internally when the LDAP server is called
>>> the strings are translated to/from UTF-8.
>>> I have looked at the code for perl-ldap to see where this translation
>>> could be added. I would prefer to add it just before the data is
>>> converted into ASN, and when decoded from ASN, but I am not sure if
>>> this
>>> is easily done as the code is written now.
>>> Do you have any suggestions where it could be done or how it should be
>>> done?
>>> If you are interested I can send my patches to you to include in
>>> perl-ldap,
>>> if I can find a good way to add it in.
>
>This is a bad idea IMHO.
>
>Firstly, not everyone uses ISO 8859-1 as their local character set, so you'd
>have to make this switchable.

In the general case the translation should be between UTF-8 and the
character set of the locale.

>
>Secondly, you'll break LDAPv2 because it doesn't use UTF-8. There's no good
>reason to break LDAPv2 support.

In my own LDAP module, choosing LDAPv2 translates between T.61 and the
locale character set instead. There are, though, some LDAPv2 servers that
use other, non-standard character sets (T.61 was the X.500 standard).

>
>(You can get around those two issues with suitable switches etc.)
>
>Thirdly, how are you going to decide what needs translating into UTF-8 and
>what doesn't? I don't think there's enough information in the ASN.1 to let
>you make that decision. For example AttributeValue (used by modify, add,
>search result, etc) uses OCTET STRING, and that must be able to carry text
>as well as binary values, for example (but not limited to) JPEGs, passwords,
>and certificates.

The LDAP standard is unfortunately bad in this area. It should have used
OCTET STRING for binary data and UTF-8 STRING for text data. And when
LDAPv3 was introduced they did not even require ;binary on binary data
(it ought to be jpegPhoto;binary). So it is a problem.

The only good way I can see is to define that every DN, RDN and password,
and every attribute named in a list, is translated (I might have forgotten
something). The list could instead name the binary attributes, as most
attributes are text.

>
>If you go for a half-way solution, like anything in LDAPv3 defined as
>LDAPString (like DNs, attribute types) being handled as UTF-8 and everything
>else being raw octets, it will get very confusing to the calling script.
>What should be in UTF-8, what shouldn't be?? Should 'cn=Chris Ridd' as a
>value of seeAlso be encoded in UTF-8 or sent raw? In this case it doesn't
>make a difference, but you can see how it might.

I cannot see that it would be confusing. Everything that is a string is
passed between user code and the LDAP module the way strings are normally
coded (using the character set of all other strings). Everything that is
binary data is returned as binary data (in perl strings, as the interface
does not have a separate data type for binary data).

As it is now it is very error prone and confusing.
For every call I make I have to remember to call translate-to-utf-8(string)
on every parameter that is a string. For example:

    $ldap->bind(String2utf8("cn=xåx,o=example"),
                password => String2utf8("myåpasswd"));
    $ldap->add(String2utf8("cn=xö,o=example"),
               attrs => [ sn => String2utf8('xää') ]);

If I forget one String2utf8 above, everything ends up wrong.
In my directory I have non-ASCII everywhere. It is very messy to have
to translate to UTF-8 (or T.61 in LDAPv2) everywhere.

An API should use the local character set by default and internally
convert to the protocol character set. Exposing the protocol character
set could be an option.

Dan
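
To make the proposal above concrete, here is a minimal sketch of such a translation layer. It is not the actual patch and not part of perl-ldap. It assumes the Encode module (core in current Perl versions), a fixed ISO-8859-1 local character set, and a hypothetical list of binary attributes. The names to_wire, from_wire and attrs_to_wire are made up for illustration. T.61 for LDAPv2 is left out.

```perl
use strict;
use warnings;
use Encode qw(encode decode);

# Assumption: fixed local character set; a real version would
# take it from the locale or from an option.
my $LOCAL_CHARSET = 'ISO-8859-1';

# Hypothetical list of binary attributes that are never translated.
my %BINARY_ATTR = map { lc($_) => 1 }
    qw(jpegPhoto userCertificate cACertificate);

# Local character set -> UTF-8 octets for the protocol.
sub to_wire {
    my ($local) = @_;
    return encode('UTF-8', decode($LOCAL_CHARSET, $local));
}

# UTF-8 octets from the protocol -> local character set.
sub from_wire {
    my ($wire) = @_;
    return encode($LOCAL_CHARSET, decode('UTF-8', $wire));
}

# Translate an attribute list [ type => value, ... ] before sending.
# Values may be scalars or array refs; binary attributes and
# anything carrying the ;binary option are passed through as-is.
sub attrs_to_wire {
    my ($attrs) = @_;
    my @out;
    for (my $i = 0; $i < @$attrs; $i += 2) {
        my ($type, $val) = @{$attrs}[$i, $i + 1];
        (my $base = lc $type) =~ s/;.*//;    # strip options
        my $binary = $BINARY_ATTR{$base} || $type =~ /;binary/i;
        my @vals = ref $val eq 'ARRAY' ? @$val : ($val);
        @vals = map { to_wire($_) } @vals unless $binary;
        push @out, $type, (ref $val eq 'ARRAY' ? \@vals : $vals[0]);
    }
    return \@out;
}
```

With helpers like these living inside the LDAP module, the add call above would shrink to $ldap->add("cn=xö,o=example", attrs => [ sn => 'xää' ]), and the module would call to_wire on the DN and attrs_to_wire on the attribute list itself.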