Hi,

On Friday, 20. January 2006 17:31, Chris Ridd wrote:
> On 20/1/06 4:13, Peter Marschall <[EMAIL PROTECTED]> wrote:
> > On Friday, 20. January 2006 15:00, Chris Ridd wrote:
> >> On 20/1/06 1:49, Peter Marschall <[EMAIL PROTECTED]> wrote:
> >>> as announced in my post from 28. October (see below) I have committed
> >>> a patch to perl-ldap's SVN that allows getting attribute values from
> >>> LDAP directories and LDIF files correctly encoded as Perl strings.
> >>>
> >>> This is especially interesting for people who have strings with
> >>> non-ASCII values in their directories.
> >>
> >> This sounds like a good idea, but the name "binary" is horribly
> >> overloaded and misused in LDAP.
> >>
> >> Can you think of a better name? Is something like "non-utf8" better?
> >
> > I was pondering quite a long time about the name of this option.
> > "binary" sounded the most correct name for it, since I consider the
> > values to be binary.
>
> Sure, but then the LDAP RFC defined "binary" as a mechanism for
> transferring attribute values in BER, and then there are the
> "non-human-readable" attributes defined as OCTET STRINGs, like userPassword
> and jpegPhoto (and audio). It is a horrible mess.
>
> > (I ruled out "exact" since it might cause people to think that the other
> > attributes are not returned exactly but only as approximations ;-)
> >
> > But I am open for discussion. If possible I prefer names that can be used
> > without quotes as hash keys. (i.e. binary => qr/..../ vs. "non-utf8" =>
> > qr/..../)
> >
> > Other ideas welcome ?
> > Something along the lines "bitwise-identical", "one-to-one", .... only
> > shorter and without "-" ;-)
>
> How about "bytes" or "raw"?

I like "raw" ;-)

> >> You're also assuming that the received values will be UTF-8, which is
> >> not correct if you're talking to an LDAPv2 server.
> >
> > I am aware of that but I did not want to restrict this option to LDAPv3
> > only as it may have some uses even with v2. E.g. I know of at least one
> > LDAP implementation that sends UTF8 even with LDAPv2.
>
> OK, so there are some very broken servers out there.
>
> > Maybe I should document it better ;-)
>
> Yes.
>
> > BTW did I mention that this option is only effective with Perl 5.8+ ?
>
> Vaguely. Any specific version of 5.8? I'm not really familiar with how perl
> handles UTF-8 nowadays...

It depends on the UTF-8 support that started with Perl 5.8, where each
value of a variable knows whether it is treated as bytes or as characters.
The whole trick is the Encode module, which has all the necessary functions
available starting with Perl 5.8.0 (for some LDIF writing stuff it might be
5.8.1).

> > Another point worth mentioning is that with directories it only works for
> > reading, while with LDIF files it works for reading and writing.
> > The writing case with directories is IMHO handled by Convert::ASN1.
>
> How does that know which scalars are "bytes" or need converting from the
> locale into UTF-8 or T.61 (or ISO 8859-1 if the server's broken)?

The necessary information is:
* starting from Perl 5.8 each value knows whether it is treated as
  "bytes" or "characters".
* Net::LDAP::search() returns values as "bytes" from the server
* LDAPv3 servers use UTF8 for most internationalized attributes
  (this is why the "binary"/"raw" regex is required ;-)
* Perl internally uses UTF-8 flagged as "characters" to store non-ASCII
  strings (with a bit of a kludge for Latin-1)

Then all I need to do is to flag the value as "characters" !
It's a bit of a trick, but it works for the majority of attributes in LDAPv3.
And it is harmless even for values that follow another syntax but consist
entirely of ASCII characters.

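
To make the idea concrete, here is a minimal sketch of that step. It is not
the code from the patch: decode_values() is a hypothetical helper, the
sample entry is made up, and it uses Encode::decode() (which validates the
bytes) rather than merely switching on the flag. The regex plays the role
of the "binary"/"raw" option discussed above.

```perl
use strict;
use warnings;
use Encode qw(decode);

# Hypothetical helper, not part of Net::LDAP: convert the byte strings of
# an entry into Perl character strings, except for attributes whose name
# matches the $raw regex, which are passed through untouched.
sub decode_values {
    my ($attrs, $raw) = @_;
    my %out;
    for my $name (keys %$attrs) {
        my @values = @{ $attrs->{$name} };    # copy, never touch the input
        if (defined $raw && $name =~ $raw) {
            $out{$name} = [@values];          # stays "bytes"
        }
        else {
            # Assumption: the server sent UTF-8 (the usual LDAPv3 case).
            $out{$name} = [ map { decode('UTF-8', $_) } @values ];
        }
    }
    return \%out;
}

# Made-up sample data, as it might come off the wire.
my $entry = {
    cn        => ["J\xC3\xBCrgen"],       # 7 bytes of UTF-8
    jpegPhoto => ["\xFF\xD8\xFF\xE0"],    # binary data, not text
};

my $decoded = decode_values($entry, qr/^jpegPhoto/i);

# cn is now 6 characters long; jpegPhoto still has its 4 bytes.
printf "cn: %d, jpegPhoto: %d\n",
    length $decoded->{cn}[0], length $decoded->{jpegPhoto}[0];
```

Values that consist only of ASCII come out unchanged in content, which is
why the conversion is harmless for them.
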
> I think if you tell the search operation, you need to tell all the operations.

It is not necessary, as all other operations only send data to the directory
server, and that data is converted in Convert::ASN1.
All I get back from bind(), add(), modify(), ... is the status code and
possibly some error string. (I did not deal with this string now, but this
should not be too much of a problem. I guess it will be in English in
99.9+% of the cases anyway.)
The only difference is search(): it queries the server for data with
differing syntax.

> Ooh - what happens to the DN arguments, do they get translated?

Here I used a little trick. As no argument in LDAP may have the name 'dn',
I used this name to check the regex for DNs.
So the answer is: yes, provided it does not match the "binary"/"raw" regex.

Peter

-- 
Peter Marschall
eMail: [EMAIL PROTECTED]