Hi Graham,

On Wednesday 13 August 2003 22:01, Graham Barr wrote:
> On Wed, 2003-08-13 at 20:50, Kurt D. Zeilenga wrote:
> > I don't think the LDAP API should do any transcoding, the API should
> > be a simple conduit between the application and the wire. Any
> > transcoding desires should be done by the application (by directly
> > calling APIs specifically designed to do transcoding).
>
> I have been following this thread a bit. And while I would agree that
> the API (Net::LDAP in this case) should not do transcoding, I don't see
> any reason why it cannot provide hooks to make the application
> developer's life easier.

I remember a thread this spring about perl-ldap, Convert::ASN1 and
Unicode support. There were some issues with character semantics.
Are they still there, or is it possible to feed strings with character
semantics into perl-ldap and get strings in character semantics back?

Oops, wanting them back in character semantics is dangerous (as I wrote
in various previous mails in this thread ;-)), because it needs
knowledge of the data (schema, ...). But maybe an option to get_value()
can help here.

The idea is to take a string in character semantics, e.g. "Hägar",
whether Perl holds it internally as Latin1 or as UTF8 (in UTF8 it is
still only 5 characters long because of the character semantics: the
"ä" is 2 bytes, but only one character), and accept this regular Perl
string as an input to operations in perl-ldap. Of course even a string
like "Any \x{0021} string \N{WHITE SMILING FACE}" should work ;-)

I know it works with byte semantics, where "Hägar" is a string of
length() six and Perl has no idea that the "Ã" and the "¤" are actually
the UTF8 encoding of the Latin1 "ä".

I also know that character semantics will not work with Perl versions
< 5.8, because there the Unicode support was not so complete (IIRC the
utf8 flag was lexically scoped and not an attribute of each variable).

To check whether a string is in byte or character semantics, and to
convert from character semantics to byte semantics, the Encode module
(or the utf8 package of Perl 5.8.1) should do, e.g. with Encode:

    use Encode qw(encode is_utf8);

    # convert to byte semantics if string is in character semantics
    $octets = encode("utf8", $string) if (is_utf8($string));

When reading an attribute's value, an additional option [e.g.
chars => 1] can tell get_value() to use character semantics instead of
the default byte semantics. This allows the user to get Perl strings
from attributes he knows to be encoded in UTF8. For example:

    # get givenName as a string in character semantics
    # (chars => 1 is the option proposed in this mail)
    $string = $entry->get_value('givenname', chars => 1);

    # get jpegPhoto as a sequence of bytes
    $octets = $entry->get_value('jpegPhoto');

Peter

PS: Having written this I notice that converting character semantics
strings to UTF8-encoded byte semantics in perl-ldap when writing might
be risky too, since it assumes that the attribute is UTF8 encoded in
LDAP. Here, I think, the risk is tolerable as long as byte semantics is
supported in perl-ldap and the behaviour with character semantics is
explained in the man page.

PPS: For me this is all pure theory since I only have Perl 5.6.0 at work
(and for compatibility's sake at home too).

-- 
Peter Marschall
eMail: [EMAIL PROTECTED]
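
To make the byte versus character semantics distinction above concrete,
here is a minimal sketch that uses only the Encode module and assumes
Perl 5.8 or later. It makes no perl-ldap calls; the variable names are
purely illustrative. It also shows one thing to watch for: is_utf8()
reports Perl's internal flag, so a character string that Perl happens to
hold as Latin1 is not flagged, and a guard based on is_utf8() alone
would pass it through unencoded.

```perl
use strict;
use warnings;
use Encode qw(encode decode is_utf8);

# "Hägar" as a character string; the "ä" is given by its code point.
# Perl holds this internally as Latin1, so the utf8 flag is off.
my $chars = "H\x{e4}gar";

# UTF-8 octets as they would travel over the wire: the "ä" becomes
# the two bytes 0xC3 0xA4.
my $octets = encode("UTF-8", $chars);

# Decoding the octets gives back a character string, which is what a
# hypothetical chars => 1 option would have to do when reading.
my $back = decode("UTF-8", $octets);

printf "characters: %d\n", length $chars;     # 5
printf "octets:     %d\n", length $octets;    # 6
printf "flag on \$chars: %s\n", is_utf8($chars) ? "on" : "off";
printf "flag on \$back:  %s\n", is_utf8($back)  ? "on" : "off";
printf "round trip equal: %s\n", $back eq $chars ? "yes" : "no";
```

Both $chars and $back compare equal and have length 5, although only
$back carries the internal flag. That suggests the flag is better
treated as an implementation detail than as a reliable test of whether
the application handed over characters or bytes.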