>But Xerces-J (IBM regex4j) implements the \p feature, so I implemented it
>too. With Xerces-J, '\d' matches only the ASCII characters '0'-'9', but \p
>matches all Unicode numeric characters. '\d' in ORO does not behave that
>way.
I'll have to review what regex4j does (I didn't even know the package
existed, even though I use Xerces). I am guessing it implements \p
because it processes a raw byte stream coming from a file. I still
hold that \p has no meaning when your input is always in Unicode. I guess
this brings to the fore the need to write up what being "compatible"
with Perl means for the org.apache.oro.text.regex package. The general
idea is to omit those things present in Perl regular expressions that
don't make any sense in the Java environment (e.g., we will never
implement (?{ code })).
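
For concreteness, the '\d' versus \p distinction raised in the quoted text
can be sketched as follows. This is only an illustration and rests on an
assumption: it uses java.util.regex as a stand-in, not ORO or regex4j, so it
says nothing about how either of those packages behaves. The class and method
names are hypothetical. In java.util.regex, a plain \d matches only ASCII
'0'-'9' by default, while \p{Nd} matches any Unicode decimal digit.

```java
import java.util.regex.Pattern;

// Illustration only: java.util.regex is used as a stand-in (an assumption);
// this is not ORO or regex4j, and the names here are hypothetical.
final class DigitClassDemo {
    // With default flags, \d matches only the ASCII digits '0'-'9'.
    private static final Pattern ASCII_DIGIT = Pattern.compile("\\d");

    // \p{Nd} matches any character in the Unicode "decimal digit" category.
    private static final Pattern UNICODE_DIGIT = Pattern.compile("\\p{Nd}");

    // True if the whole string is a single digit according to \d.
    static boolean isAsciiDigit(String s) {
        return ASCII_DIGIT.matcher(s).matches();
    }

    // True if the whole string is a single digit according to \p{Nd}.
    static boolean isUnicodeDigit(String s) {
        return UNICODE_DIGIT.matcher(s).matches();
    }

    public static void main(String[] args) {
        String arabicIndicThree = "\u0663"; // ARABIC-INDIC DIGIT THREE

        System.out.println(isAsciiDigit("3"));                // true
        System.out.println(isUnicodeDigit("3"));              // true
        System.out.println(isAsciiDigit(arabicIndicThree));   // false
        System.out.println(isUnicodeDigit(arabicIndicThree)); // true
    }
}
```

Whether the two classes should differ at all once the input is already
Unicode is exactly the design question under discussion here.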
>I can remove it. I never use \p expressions with ORO :)
>What do you think about this?
I think we should keep it out until there's a compelling reason to put
it in, since I would posit no one will ever use it unless they have a
bunch of Perl regular expressions stored in a file somewhere that they
feed as input to a Java rewrite of a Perl program. I'd rather focus on
adding things like zero-width lookbehind assertions that people have been
clamoring for.
daniel