Hi,

On Saturday, 19. August 2006 19:42, Peter Mogensen wrote:
> But since RFC 2252 explicitly specifies that syntax
> 1.3.6.1.4.1.1466.115.121.1.15 is an UTF-8 string it would be very
> usefull to have an option which no matter what the attribute is named
> would set the UTF-8 flag on all "Directory string" attributes. This way
> you could avoid defining a lot of regex's if you just used your
> LDAP-server correct.

I thought about this approach too, but I saw a few disadvantages with it:

- IIRC the schema is only part of LDAPv3.
  Parsing would then be impossible for LDAPv2.
- schema parsing would be quite complex
  (attribute type definitions can refer to other attribute type definitions)
- schema parsing would give me a potentially long list of attribute names
  (I know servers with more than 1500 attribute type definitions)
- schema parsing does not take attribute options into account
  (things like e.g. "givenname;x"givenname;binary")
- schema parsing would force me to find out all attribute definitions
  that might be UTF-8-encoded. In fact most are (if you consider ASCII
  a subset of UTF-8).
  "Directory String" is not alone. What about "Distinguished Name", ...?
- sometimes the (local) use of attributes might differ from their definition.
  E.g. userPassword is an "Octet String", but I guess I am not the only one
  who would interpret the 2-byte sequence "\xC3\xA4" as "ä".

Considering all these things I found the solution with _one_ regex quite simple.
It even gives me the opportunity to treat some attributes as bytes even if
they are UTF-8 by their attribute type definition, while at the same time
treating other attributes as UTF-8.

Did you try the CVS version?

Regards
Peter
-- 
Peter Marschall
[EMAIL PROTECTED]
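
Addendum, for illustration only: a minimal Python sketch of the single-regex idea described above. It is not the perl-ldap code. The pattern, the attribute names in it, and the decode_entry helper are all hypothetical and chosen just for this example. The sketch assumes an entry is a dict mapping attribute names (possibly with options) to lists of raw byte values.

```python
import re

# Hypothetical pattern: attribute names whose values should be treated as
# UTF-8 text. Options such as ";lang-de" are accepted, ";binary" is not.
UTF8_ATTRS = re.compile(
    r"^(?:cn|sn|givenname|description)(?:;(?!binary)[a-z0-9-]+)*$",
    re.IGNORECASE,
)


def decode_entry(entry, pattern=UTF8_ATTRS):
    """Return a copy of entry: values of matching attributes become str,
    values of all other attributes stay as bytes."""
    decoded = {}
    for name, values in entry.items():
        if pattern.search(name):
            decoded[name] = [v.decode("utf-8") for v in values]
        else:
            decoded[name] = list(values)
    return decoded


entry = {
    "cn": [b"J\xc3\xa4ger"],
    "givenName;lang-de": [b"J\xc3\xb6rg"],
    "givenname;binary": [b"\xc3\xa4"],
    "userPassword": [b"\xc3\xa4"],
}
result = decode_entry(entry)
```

Here "cn" and "givenName;lang-de" are decoded to text, while "givenname;binary" and "userPassword" stay as bytes. Someone who wants to read userPassword as text locally would only need to add it to the pattern; no schema lookup is involved.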