Hi,

On Saturday, 19. August 2006 19:42, Peter Mogensen wrote:
> But since RFC 2252 explicitly specifies that syntax
> 1.3.6.1.4.1.1466.115.121.1.15 is an UTF-8 string it would be very
> usefull to have an option which no matter what the attribute is named
> would set the UTF-8 flag on all "Directory string" attributes. This way
> you could avoid defining a lot of regex's if you just used your
> LDAP-server correct.

I thought about this approach too, but I saw a few disadvantages with that:
- IIRC the schema is only part of LDAPv3
  Parsing would then be impossible for LDAPv2
- schema parsing would be quite complex (attribute type definitions can
  refer to other attribute type definitions)
- schema parsing would give me a potentially long list of attribute names
  (I know servers with more than 1500 attribute type definitions)
- schema parsing does not take attribute options into account
  (Things like e.g. "givenname;x" or "givenname;binary")
- schema parsing would force me to find out all attribute definitions
  that might be UTF8-encoded. In fact most are (if you consider ASCII
  a subset of UTF-8)
  "Directory String" is not alone. What about "Distinguished name", ... ?
- sometimes the (local) use of attributes might differ from their definition
  E.g. userPassword is an "Octet String", but I guess I am not the only one
  who would interpret a 2 byte sequence "\xC3\xA4" as "ä"
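
To illustrate the points about attribute options and about definitions
referring to other definitions, here is a rough Python sketch of what a
schema-based lookup would have to do. It is not Net::LDAP code: the SCHEMA
dict and the function name effective_syntax are made up for the example,
and a real schema would first have to be fetched from the server and parsed.

```python
# Hypothetical, hand-written excerpt of already parsed attribute type
# definitions (keys lower-cased). Only the fields needed here are kept.
DIRECTORY_STRING = "1.3.6.1.4.1.1466.115.121.1.15"
OCTET_STRING = "1.3.6.1.4.1.1466.115.121.1.40"

SCHEMA = {
    "name": {"sup": None, "syntax": DIRECTORY_STRING},
    "givenname": {"sup": "name", "syntax": None},  # inherits from "name"
    "userpassword": {"sup": None, "syntax": OCTET_STRING},
}


def effective_syntax(attr_desc, schema=SCHEMA):
    """Return the syntax OID for an attribute description, or None."""
    # The schema knows nothing about options, so ";binary" etc. must go.
    base = attr_desc.split(";", 1)[0].lower()
    seen = set()
    # Follow SUP references until a definition carries its own syntax.
    while base is not None and base not in seen:
        seen.add(base)
        definition = schema.get(base)
        if definition is None:
            return None
        if definition["syntax"]:
            return definition["syntax"]
        base = definition["sup"]
    return None
```

Even with this in place, the result only says what the schema declares,
not how the values are actually used locally.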

Considering all these things I found the solution with _one_ regex quite
simple. It even gives me the opportunity to treat some attributes as bytes
even if they are UTF-8 by their attribute type definition, while at the same
time treating other attributes as UTF-8.
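
The idea can be sketched in a few lines. The following is a Python
illustration of the principle only, not the code in CVS: the pattern RAW,
the function decode_entry and the sample entry are assumptions made up for
this example. Attribute names matching the regex stay raw bytes, everything
else is decoded as UTF-8.

```python
import re

# Assumed example pattern: these attribute descriptions are kept as bytes.
RAW = re.compile(r"^(jpegPhoto|userCertificate)|;binary", re.IGNORECASE)


def decode_entry(entry, raw=RAW):
    """Decode all values as UTF-8 unless the attribute name matches raw."""
    decoded = {}
    for attr, values in entry.items():
        if raw.search(attr):
            # Matches the regex: leave the values untouched.
            decoded[attr] = list(values)
        else:
            # Anything else is treated as UTF-8 text.
            # (Invalid UTF-8 would raise UnicodeDecodeError here.)
            decoded[attr] = [v.decode("utf-8") for v in values]
    return decoded


# Made-up sample entry as it might come off the wire.
entry = {
    "cn": [b"J\xc3\xa4ger"],
    "userPassword": [b"\xc3\xa4"],
    "jpegPhoto": [b"\xff\xd8\xff"],
}
result = decode_entry(entry)
```

Here userPassword ends up as text although the schema calls it an
"Octet String", simply because the regex does not list it. Adding it to the
pattern would switch it back to bytes without touching anything else.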

Did you try the CVS version?

Regards
Peter

-- 
Peter Marschall
[EMAIL PROTECTED]
