>I fully concur with Chris' opinion about APIs:
>They have to be transparent.
>Character set conversion should happen at the input side of the application,
>not at the interface between application and API.
>The API cannot guess what the application programmer wants.

No, but the programmer can tell the API what is wanted. I have now extended
perl-ldap so that when calling new you can, with some options, define
translation to/from UTF-8 if you want. Those who do not do that get the same
behaviour as before, and those who want a simplified translation can use it.

>At a first glance the second case seems easier for the application
>programmer, but it is really broken:
>Consider the following cases:
>1) A German and a Czech shall be added to the directory
>   during the same connection.
>   Each one might have attributes that need to be
>   converted from the resp. char set (Latin1 vs Latin2)
>2) LDAP allows different representations for the same attribute
>   (using the lang-.. qualifiers). Now, try to add a Chinese
>   name and its transcription to Latin1 (or Latin2 ....)
>   in one LDAP session.

All attributes use UTF-8 in LDAP, even with different lang- qualifiers.
If your program has one connection, then internally in the program you do
not have German and Czech strings in different character sets. So neither
1) nor 2) is a problem.

>Both cases are absolutely legal and possible with the current API.
>With an API that is not absolutely transparent, they will not work.

What is the problem? All attributes in LDAPv3 use UTF-8 encoding.

>> Yes I know password is a difficult thing. I have many problems with that
>> in my mixed Unix/MS Windows/Mac environment.
>> But the only way to get it to work, is to use the same character set
>> for all passwords in a database. ...
>What do you do in an international company with different character sets ?
>Restrict all passwords to plain ASCII ?
>Not the best idea, IMHO.

Store passwords using, for example, UTF-8 encoded UCS. UCS can handle all
characters. That is why UTF-8 is used in LDAP. If you do not do that, a
person visiting a colleague in the same company but in a different locale
cannot log in.

>> ...
>> As LDAPv3 says UTF-8 for strings and
>> passwords entered by humans normally are strings, I would expect the
>> normal case to be UTF-8 encoded passwords.
>Who does the encoding from the local character set to UTF-8 ?

It is best if the APIs do it. In the extension I have made to perl-ldap you
supply the translation routine, so you (the programmer) can select which
encodings you want to convert from and to (this is because I found no easy
way to get a standard local-character-set-to-UTF-8 conversion routine in
Perl).

>> In Java it works as it should, system character set is UTF-16 and
>> the Java APIs do the translation to the protocol character set.
>> Here you do not have to think about character set issues.
>> In the Java LDAP API (JNDI) it has a list of attributes that are known
>> to be non-string (and you can add to that list). Those attributes will
>> not be translated.
>
>To be correct: the hard work of character set conversion needs to happen when
>entering data into a Java application i.e. the Java application has to know
>about my "default input character set" and interpret the data accordingly.
>I doubt if the Java VM reprograms my keyboard to send UTF-16 ;-)
>But then we have the same problem: How can I enter a character
>of a character set different from my default input character set ?

Inside an application I prefer to use ONE character set encoding, as working
with character data gets so much easier then. So it is during input/output
that the translation takes place. If my program uses a protocol, the
translation from the internal character encoding to the protocol encoding
takes place when data is handed over to the protocol. If possible, I want
the fact that the protocol uses another format to be invisible. It should be
hidden under the APIs. Sometimes you want to access the protocol encoding
directly, so an API should allow it to be exposed. Most people will prefer
not to have to think about it.

   Dan
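
P.S. To make the "translation hidden under the API" idea concrete, here is a
minimal sketch. It is written in Python rather than Perl and it is not the
perl-ldap interface: FakeDirectory, TranslatingConnection, and the to_wire,
from_wire and binary_attrs parameters are hypothetical names made up for
illustration. It assumes that the program decodes its input into one internal
encoding, that the caller supplies the translation routines, and that
attributes listed as non-string are passed through untouched.

```python
from typing import Callable, Dict, FrozenSet, Optional, Union

Value = Union[str, bytes]


class FakeDirectory:
    """Hypothetical stand-in for an LDAPv3 server; keeps values as received."""

    def __init__(self) -> None:
        self.entries: Dict[str, Dict[str, Value]] = {}

    def add(self, dn: str, attrs: Dict[str, Value]) -> None:
        self.entries[dn] = dict(attrs)


class TranslatingConnection:
    """Hypothetical wrapper applying caller-supplied translation routines."""

    def __init__(
        self,
        server: FakeDirectory,
        to_wire: Optional[Callable[[str], bytes]] = None,
        from_wire: Optional[Callable[[bytes], str]] = None,
        binary_attrs: FrozenSet[str] = frozenset({"jpegphoto"}),
    ) -> None:
        self.server = server
        self.to_wire = to_wire        # None means: no translation, as before
        self.from_wire = from_wire
        self.binary_attrs = frozenset(a.lower() for a in binary_attrs)

    def _is_text(self, name: str) -> bool:
        # Attribute names are compared case-insensitively.
        return name.lower() not in self.binary_attrs

    def add(self, dn: str, attrs: Dict[str, Value]) -> None:
        wire: Dict[str, Value] = {}
        for name, value in attrs.items():
            if self.to_wire is not None and self._is_text(name):
                value = self.to_wire(value)   # internal form -> protocol form
            wire[name] = value
        self.server.add(dn, wire)

    def get(self, dn: str, name: str) -> Value:
        value = self.server.entries[dn][name]
        if self.from_wire is not None and self._is_text(name):
            value = self.from_wire(value)     # protocol form -> internal form
        return value


server = FakeDirectory()
conn = TranslatingConnection(
    server,
    to_wire=lambda s: s.encode("utf-8"),
    from_wire=lambda b: b.decode("utf-8"),
)

# Input side: each source is decoded from its own character set into the
# single internal representation before it reaches the connection.
german = b"J\xfcrgen".decode("iso8859_1")   # Latin1 input
czech = b"Ji\xf8\xed".decode("iso8859_2")   # Latin2 input

conn.add("cn=german,o=example", {"cn": german})
conn.add("cn=czech,o=example", {"cn": czech, "jpegPhoto": b"\xff\xd8\xff"})
```

Both names go over the same connection as UTF-8, the photo bytes are left
alone, and a connection created without to_wire/from_wire behaves as a plain
pass-through, so the protocol encoding can still be accessed directly.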