Hi,

On Monday 11 August 2003 13:27, Dan Oscarsson wrote:
> >At a first glance the second case seems easier for the application
> >programmer, but it is really broken:
> >Consider the following cases:
> >1) A German and a Czech shall be added to the directory
> >   during the same connection.
> >   Each one might have attributes that need to be
> >   converted from the resp. char set (Latin1 vs Latin2)
> >2) LDAP allows different representations for the same attribute
> >   (using the lang-.. qualifiers). Now, try to add a Chinese
> >   name and its transcription to Latin1 (or Latin2 ....)
> >   in one LDAP session.
>
> All attributes use UTF-8 in LDAP, even with different lang- qualifiers.
> If you have a program with one connection internally in the program
> you do not have German and Czech strings using different character sets.
> So neither 1) nor 2) is any problem.
You are right. With the current API it isn't a problem.

But IIRC this thread started with your complaint that perl-ldap does no
automatic conversion (controlled by parameters to Net::LDAP->new) of
string attributes from the local character set to UTF-8.

If I got that right, that means that you do not convert from your "local
character set" to UTF-8 in your application but let your version of
perl-ldap do it. That in turn means that your version of perl-ldap
expects strings in your local character set and interprets the strings
it gets in that character set.

Now if you have a string in a different character set, what do you do?
That's what my example with the German and the Czech, with the condition
"during the same connection", is about.

> >Both cases are absolutely legal and possible with the current API.
> >With an API that is not absolutely transparent, they will not work.
>
> What is the problem? All attributes in LDAPv3 use UTF-8 encoding.

The problem is that your version of perl-ldap cannot be fed strings that
cannot be represented in the "local" character set. It also cannot
correctly interpret strings in character sets different from your local
character set.

E.g. if you have perl-ldap set to convert strings from Latin1 to UTF-8,
it will interpret 0xD2 as "LATIN CAPITAL LETTER O WITH GRAVE".
Unfortunately the string containing 0xD2 was a Latin2 string, where 0xD2
is "LATIN CAPITAL LETTER N WITH CARON". Oops, information lost!

And what about characters that do not fit into a single byte: Chinese,
Japanese or mathematical symbols like U+2226 "NOT PARALLEL TO"?
When converting from Latin1, those cannot even be fed into the API.

> >But then we have the same problem: How can I enter a character
> >of a character set different from my default input character set ?
>
> Inside an application I prefer to use ONE character set encoding as
> working with character data gets so much easier then.

Accepted.
But this character set must be capable of representing all legal
characters for the API, or else you will lose information or forbid
some use cases. That means using Latin1 (or any other 8-bit character
set) will restrict your possible use cases (see my examples).
If you have to use Unicode anyway, you can then choose between its
various representations. So why not take UTF-8, since it is the optimal
encoding for perl-ldap (it does not need any mapping)?

> So it is during input/output the translation takes place.
> If my program uses a protocol, the translation from internal
> character encoding to protocol encoding will take place when data
> is to be transferred to the protocol. ...

Why not do it right: do the conversion when reading from byte-oriented
input or writing to byte-oriented output such as files and terminals,
and work with UTF-8 inside your application? Once the data crosses the
border from the outside (file, terminal) you are safe, since you are in
UTF-8.

> ...If possible I want the fact that
> the protocol uses another format to be invisible. It should be hidden
> under the APIs. Sometimes you want to access the protocol encoding
> directly, so an API should allow it to be exposed.
> Most people will prefer not to have to think about it.

This is a kludge. You are fattening the API, making it more error prone,
more obscure for more complicated use cases, and slower, and you
restrict use cases in general, just to stay in 8 bit instead of doing it
correctly with UTF-8.

But wait: AFAIK Net::LDAP is subclassable. You do not need to change
Net::LDAP but can make your private subclass that does the conversion
if you want it.

Peter

-- 
Peter Marschall
eMail: [EMAIL PROTECTED]
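
P.S. To make the byte 0xD2 example and the "convert at the border" idea
concrete, here is a minimal sketch that uses only the core Encode module.
It does not touch Net::LDAP at all. The helper names from_outside, to_ldap
and hex_bytes are made up for illustration and are not part of perl-ldap,
and the two sample names are invented input. One assumption to note: the
sketch keeps Perl character strings inside the program and produces UTF-8
bytes only on the way out, which is a small variation on "UTF-8 inside the
application".

```perl
use strict;
use warnings;
use Encode qw(decode encode);

# 1) One byte, two meanings: only the charset label says what 0xD2 is.
my $byte      = "\xD2";
my $as_latin1 = decode('iso-8859-1', $byte);   # LATIN CAPITAL LETTER O WITH GRAVE
my $as_latin2 = decode('iso-8859-2', $byte);   # LATIN CAPITAL LETTER N WITH CARON
printf "0xD2 read as Latin1: U+%04X\n", ord($as_latin1);
printf "0xD2 read as Latin2: U+%04X\n", ord($as_latin2);

# 2) A character outside Latin1 cannot be squeezed into it.
#    FB_CROAK makes encode() die instead of silently substituting;
#    LEAVE_SRC keeps $symbol unmodified.
my $symbol = "\x{2226}";                       # NOT PARALLEL TO
my $fits   = eval {
    encode('iso-8859-1', $symbol,
           Encode::FB_CROAK() | Encode::LEAVE_SRC());
    1;
};
print "U+2226 in Latin1: ", ($fits ? "ok" : "not representable"), "\n";

# 3) Convert at the border (hypothetical helpers, not a perl-ldap API).
#    Bytes from a file or terminal become characters on the way in ...
sub from_outside {
    my ($charset, $bytes) = @_;
    return decode($charset, $bytes);
}

#    ... and characters become UTF-8 bytes on the way out to the protocol.
sub to_ldap {
    my ($chars) = @_;
    return encode('UTF-8', $chars);
}

# Render a byte string as space-separated hex, for display only.
sub hex_bytes {
    my ($bytes) = @_;
    return join ' ', map { sprintf '%02X', ord($_) } split //, $bytes;
}

# A German and a Czech name, each arriving in its own 8-bit charset,
# handled in the same program run.
my $german = from_outside('iso-8859-1', "M\xFCller");
my $czech  = from_outside('iso-8859-2', "Dvo\xF8\xE1k");

print "German name as UTF-8: ", hex_bytes(to_ldap($german)), "\n";
print "Czech name as UTF-8:  ", hex_bytes(to_ldap($czech)),  "\n";
```

The point of step 3 is that the charset is named once per input source,
at the place where it is actually known. Everything after that, including
whatever gets handed to the LDAP layer, no longer depends on a single
"local" character set.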