Re: How to handle character set in perl-ldap?

Dan Oscarsson Sat, 09 Aug 2003 13:34:44 -0700

Chris wrote:


>> The only good way I can see is to define that all DN, RDN, password and
>> all attributes in a list (I might have forgotten something) to be translated.
>> The attribute list could actually list the binary attributes as most are
>> text.
>
>The only way you can do it is use the schema, and then build in knowledge
>about what each syntax requires. You'd have to make that extensible, because
>new syntaxes are defined in some environments and markets.

Or let it be an option given when calling Net::LDAP->new.


>I meant if you had to encode attribute values (OCTET STRINGs) yourself,
>while the API magically treated LDAPString values as UTF-8, then you'd need:
>
>    $ldap->add ( 'cn=Chris Ridd',
>                 attrs => [
>                   objectClass => [qw(top person extensibleObject)],
>                   cn => String2utf8('Chris Ridd'),
>                   sn => String2utf8('Ridd'),
>                   myAttr => $foo,
>                   seeAlso => String2utf8('cn=Chris Ridd')
>                 ]
>               );
>
>You would have to wrap the seeAlso value in something (eg String2utf8) to
>translate the string into UTF-8 before sending to the directory, while you
>would *not* have to translate the entry's DN into UTF-8 as you're proposing
>that the API does that.

In my proposal all strings would be translated (except those representing
binary values), so I would not wrap any data with "String2utf8" and
nothing about it would be confusing.


>> As it is now it is very error prone and confusing. For every call
>> I make I have to remember to call translate-to-utf-8(string) on every
>> parameter that is a string. For example:
>> $ldap->bind(String2utf8("cn=xåx,o=example"),
>>             password=>String2utf8("myåpasswd"));
>
>Password's an interesting example because it is defined to be OCTET STRING
>and the character set used in it is defined to be a "local matter".

Yes, I know passwords are difficult. I have many problems with them
in my mixed Unix/MS Windows/Mac environment.
But the only way to get it to work is to use the same character set
for all passwords in a database. Since LDAPv3 specifies UTF-8 for strings,
and passwords entered by humans are normally strings, I would expect the
normal case to be UTF-8 encoded passwords.


>> An API should use the local character set as default and internally convert
>> to protocol character set. It could be an option to enable exposing
>> protocol character set.
>
>That isn't the case with other APIs. The C libldap API just sends the raw
>bytes to and fro. The various Java APIs use Unicode strings, because that's
>what's native to Java anyway.

I know that many API implementors do not think about this. I am very tired
of having to translate between the system character set and the various
protocol character sets myself. Protocol character sets and formats should
be hidden from the programmer.

In Java it works as it should: the system character set is UTF-16 and
the Java APIs do the translation to the protocol character set.
Here you do not have to think about character set issues.
The Java LDAP API (JNDI) has a list of attributes that are known
to be non-string (and you can add to that list). Those attributes will
not be translated.

>There's certainly scope for doing all this in a schema-aware version of the
>Net::LDAP API, but I think the way Net::LDAP currently works without messing
>around with the data is also very useful.

A schema-aware version could be done, though schema support is bad in some
LDAP servers. It would also slow down opening an LDAP connection if the
schema must be fetched. It is simpler to do as Java does: have a list of
the known non-string attributes, with a possibility for the user to add more.
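
A rough sketch of that last idea, using only the core Encode module. None
of this is part of Net::LDAP: the names to_utf8, translate_attrs and
%binary_attr are made up for illustration, and the two attributes in the
list are just examples of a default that a user could extend. It assumes
the caller's strings are bytes in some local character set such as
ISO-8859-1.

```perl
use strict;
use warnings;
use Encode qw(decode encode);

# Hypothetical list of attributes known to hold binary data.
# Names are compared case-insensitively; users could add their own.
my %binary_attr = map { lc($_) => 1 } qw(jpegPhoto userCertificate);

# Convert a byte string from the local character set to UTF-8 bytes.
sub to_utf8 {
    my ($charset, $bytes) = @_;
    return encode('UTF-8', decode($charset, $bytes));
}

# Take a flat attribute list (name => value or name => [values]) and
# return a new array reference in which every text value is UTF-8.
# Values of attributes listed in %binary_attr are passed through as-is.
sub translate_attrs {
    my ($charset, $attrs) = @_;
    my @out;
    for (my $i = 0; $i < @$attrs; $i += 2) {
        my ($name, $value) = @{$attrs}[$i, $i + 1];
        (my $base = lc $name) =~ s/;.*//;    # drop options such as ";binary"
        if (!$binary_attr{$base}) {
            $value = ref($value) eq 'ARRAY'
                ? [ map { to_utf8($charset, $_) } @$value ]
                : to_utf8($charset, $value);
        }
        push @out, $name, $value;
    }
    return \@out;
}

# "\xE5" is a-ring in ISO-8859-1; in UTF-8 it is the two bytes C3 A5.
my $attrs = translate_attrs('iso-8859-1', [
    cn        => "x\xE5x",
    sn        => [ "Ridd" ],
    jpegPhoto => "\xFF\xD8\xFF",
]);
```

The resulting list could then be handed to $ldap->add as the attrs value.
A DN or password would still need to go through to_utf8 separately, which
is the part a translating API would take over.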

   Dan
