On May 5, 10:44 pm, John Machin <[EMAIL PROTECTED]> wrote:
> "UTF-8 Unicode" is meaningless. Python has internal unicode string
> objects, with comprehensive support for converting to/from str (8-bit)
> string objects. The re module supports unicode patterns and strings.
> PCRE "supports" patterns and strings which are encoded in UTF-8. This
> is quite different, a kludge, incomparable. Operations which inspect/
> modify UTF-8-encoded data are of interest only to folk who are
> constrained to use a language which has nothing resembling a proper
> unicode datatype.

Sure, I know it's mediocre Unicode support for an application, but
we're not talking about an application here. If I get the PCRE module
done, I'll just PyArg_ParseTuple(args, "et#", "utf-8", &str, &len),
which will be fine for both Python's Unicode support and what PCRE
does. I won't have to deal with the string at all, so I couldn't care
less how it's encoded or whether I have proper Unicode support in C.

(I'm unsure of how Pyrex or SWIG would treat this so I'll just
hand-craft it. It's not like it would be complex; most of the magic
will be pure C, dealing with PCRE's API.)

> There's also the YAGNI factor; most folk would restrict using regular
> expressions to simple grep-like functionality and data validation --
> e.g. re.match("[A-Z][A-Z]?[0-9]{6}[0-9A]$", idno). The few who want to
> recognise yet another little language tend to reach for parsers, using
> regular expressions only in the lexing phase.

Well, I find these features very useful. I've used a complex LALR
parser to parse complex grammars, but I've solved many problems with
just the PCRE lib. Either way, seeing that nobody's interested in these
features, I'll see if I can expose PCRE to Python myself. It sounds
like the fairest solution because it doesn't even touch the re module:
you can do whatever you want with it (though I'd rather have it stay
as it is or be enhanced), and I'll still have PCRE. That's if I find
the time to do it though, even having no life.

--
http://mail.python.org/mailman/listinfo/python-list
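
For anyone following along, here is a rough Python sketch of the two things discussed above: the grep-like validation pattern from the quoted text, and roughly what a UTF-8 buffer plus byte length (what "et#" with "utf-8" would pass down to C) looks like next to the original string. It assumes Python 3, where str is the unicode type, and the names is_valid_id and as_utf8_buffer are made up for illustration; neither comes from the re module or from PCRE.

```python
import re

# Pattern taken from the quoted message. re.match anchors at the start
# of the string and the trailing $ anchors at the end.
ID_PATTERN = re.compile(r"[A-Z][A-Z]?[0-9]{6}[0-9A]$")


def is_valid_id(idno):
    """Hypothetical helper: True if idno matches the quoted pattern."""
    return ID_PATTERN.match(idno) is not None


def as_utf8_buffer(text):
    """Hypothetical helper: the encoded bytes and their length in bytes,
    roughly what a C function would receive for a UTF-8 encoded argument."""
    data = text.encode("utf-8")
    return data, len(data)


if __name__ == "__main__":
    print(is_valid_id("AB1234567"))
    print(is_valid_id("AB12345"))

    text = "na\u00efve"  # five characters, one of them non-ASCII
    data, nbytes = as_utf8_buffer(text)
    print(len(text), nbytes)
```

The point of the second helper is only that the length seen on the C side counts bytes, not characters, so anything that indexes into the buffer has to be UTF-8 aware.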