Re: What are we doing in regards to JDK 1.4?
At 17:33 20-6-2001 -0400, Berin wrote:

> I have been looking through JDK 1.4, and there are a few instances
> where what is included in the JDK steps on some of our projects.
> Most notably:
>
> java.util.regex
> ---------------
> This steps on the toes of both jakarta-regexp and jakarta-oro. In
> fact, by its existence in the JDK, the Apache projects will either
> lose mindshare or remain static in their mindshare.

org.apache.regexp 1.2 is pretty much broken. It has had some major flaws since 1.0, and they are still not addressed.

See http://nagoya.betaversion.org/bugzilla/buglist.cgi?product=Regexp for a list of bugs (BTW, none of them is assigned).

Maybe it's better to just throw regexp out of the window and use Sun's official regex.

Bye,
Edwin.
(Who really does like all the other Jakarta projects.)

---------------------------------------------------------------------
To unsubscribe, e-mail: [EMAIL PROTECTED]
For additional commands, e-mail: [EMAIL PROTECTED]
Re: What are we doing in regards to JDK 1.4?
At 09:42 21-6-2001 -0700, Jon wrote:

> Edwin, on 6/21/01 7:16 AM, Edwin Martin [EMAIL PROTECTED] wrote:
>
>> org.apache.regexp 1.2 is pretty much broken. It has had some major
>> flaws since 1.0, and they are still not addressed.
>>
>> See http://nagoya.betaversion.org/bugzilla/buglist.cgi?product=Regexp
>> for a list of bugs (BTW, none of them is assigned).
>
> Sending in bug reports doesn't get the problems fixed. This is a
> community of VOLUNTEERS. You can't just magically put in a bug report
> and then someone is going to jump up and fix it...you have to submit
> patches or try to nicely motivate people to fix it for you.
>
> http://jakarta.apache.org/site/understandingopensource.html
>
> With the opensource system, if you find any deficiency in the
> project, the onus is on you to redress that deficiency.

I thought submitting bug reports was also an important way to support Open Source.

Well, I looked at the regexp code and saw one of the bugs.

RECompiler.java, line 664:

    // Premature end of range. define up to Character.MAX_VALUE
    if ((idx + 1) < len && pattern.charAt(++idx) == ']')
    {
        simpleChar = Character.MAX_VALUE;
        break;
    }

The code turns any minus into a range. The RE [a-] becomes the character a and anything after it. A minus at the beginning or the end should be just a minus.

The code should be something like this:

    // Premature end of range. define up to Character.MAX_VALUE
    if ((idx + 1) < len && pattern.charAt(++idx) == ']')
    {
        definingRange = false;
        break;
    }

Furthermore, RECompiler.java, line 697:

    if ((idx + 1) >= len || pattern.charAt(idx + 1) != '-')

should become something like:

    if ((idx + 1) >= len
        || !(pattern.charAt(idx + 1) == '-'
             && !((idx + 2) < len && pattern.charAt(idx + 2) == ']')))

Which means: do not include a char when it is followed by a minus, but DO include the char when that minus is followed by a ']'.

The code still does not address the possibility of a charclass which starts with a minus, like [-a] or [^-a], but that shouldn't be too difficult to implement.

It isn't really that hard to fix these bugs; I just wonder whether anybody is responsible for the regexp package.

And by the way, you don't have to shout.

Bye,
Edwin Martin.
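The rule described in this message (a minus is a range operator only when it has a character on both sides, and is a plain minus at the start or end of the class) can be sketched in isolation. This is a hypothetical mini-parser written for illustration, not the RECompiler code and not a patch for it: the class name `CharClassSketch` and the method `parse` are invented here, and negation (`^`) and escapes are deliberately left out.

```java
import java.util.BitSet;

/** Hypothetical illustration only; not part of org.apache.regexp. */
final class CharClassSketch {

    /**
     * Parses the text between '[' and ']' (for example "a-z0-9-") into the
     * set of characters it accepts. A '-' forms a range only when a
     * character precedes it and another character follows it; a '-' at the
     * start or end of the body is a literal minus.
     */
    static BitSet parse(String body) {
        BitSet accepted = new BitSet(128);
        int i = 0;
        while (i < body.length()) {
            char lo = body.charAt(i);
            // A range needs "lo", then '-', then one more character after it.
            boolean isRange = i + 2 < body.length() && body.charAt(i + 1) == '-';
            if (isRange) {
                char hi = body.charAt(i + 2);
                if (hi < lo) {
                    throw new IllegalArgumentException("bad range: " + lo + "-" + hi);
                }
                accepted.set(lo, hi + 1); // upper bound of set(from, to) is exclusive
                i += 3;
            } else {
                accepted.set(lo);         // single character, including a literal '-'
                i++;
            }
        }
        return accepted;
    }

    public static void main(String[] args) {
        for (String body : new String[] {"a-", "-a", "a-z0-9-", "a-z0-9.-"}) {
            BitSet set = parse(body);
            System.out.println("[" + body + "] accepts '-': " + set.get('-')
                    + ", accepts 'b': " + set.get('b'));
        }
    }
}
```

Deciding "range or literal" by looking ahead two characters keeps the whole decision in one place, which is the same idea as the `idx + 1` / `idx + 2` checks proposed above.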
Regexp 1.2 weirdness
[I already posted this problem to the regexp mailing list, without results. Maybe somebody on this list can help?]

Hello,

I stumbled upon a problem with regexp 1.2 which I can't reconcile with any general regex documentation, either old or new.

In short: [a-z0-9-] doesn't match alphanumerics and '-'.

Here's a JSP test page I made:

--- begin retest.jsp ---
<%@ page import="org.apache.regexp.*" %>
<h2>RE test</h2>
<%
String s = "&lt;[EMAIL PROTECTED]&gt;";
out.print(s);

out.print("<p>1<br>");
RE emailRE1 = new RE("([a-z0-9]+)@");
if ( emailRE1.match( s ) ) out.print( emailRE1.getParen(1) );

out.print("<p>2<br>");
RE emailRE2 = new RE("([a-z0-9.]+)@");
if ( emailRE2.match( s ) ) out.print( emailRE2.getParen(1) );

out.print("<p>3<br>");
RE emailRE3 = new RE("([a-z0-9.-]+)@");
if ( emailRE3.match( s ) ) out.print( emailRE3.getParen(1) );

out.print("<p>4<br>");
RE emailRE4 = new RE("([a-z0-9-]+)@");
if ( emailRE4.match( s ) ) out.print( emailRE4.getParen(1) );

s = "&lt;[EMAIL PROTECTED]&gt;";
out.print("<hr>");
out.print(s);

out.print("<p>5<br>");
RE emailRE5 = new RE("([a-z0-9-]+)@");
if ( emailRE5.match( s ) ) out.print( emailRE5.getParen(1) );

out.print("<p>6<br>");
RE emailRE6 = new RE("([a-z0-9.-]+)@");
if ( emailRE6.match( s ) ) out.print( emailRE6.getParen(1) );

out.print("<p>7<br>");
RE emailRE7 = new RE("([a-z0-9.]+)@");
if ( emailRE7.match( s ) ) out.print( emailRE7.getParen(1) );
%>
--- end retest.jsp ---

This is the output:

--- begin output ---
RE test

<[EMAIL PROTECTED]>

1
002

2
001.002

3
001.002

4
john.doe-001.002

----------
<[EMAIL PROTECTED]>

5
john.doe.001-002

6
002

7
002
--- end output ---

Points 1 and 2 are as expected.
Point 3 should match john.doe-001.002.
Point 4 (removing the dot) matches everything!

Points 5, 6 and 7 were added to see what happens when the dot and the minus are swapped. The same strange behavior :-(

Am I overlooking something?

Bye,
Edwin Martin.
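For comparison, the same patterns can be run through java.util.regex from JDK 1.4, which treats a minus at the end of a character class as a literal. This is a minimal cross-check under stated assumptions, not a statement about how regexp 1.2 should be fixed: the class name `HyphenClassCheck` and the helper `firstGroup` are invented for this sketch, the local parts of the addresses are taken from the output in this message, and the domain `example.com` is a placeholder because the real addresses are redacted.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

/** Hypothetical cross-check of the test-page patterns with java.util.regex. */
final class HyphenClassCheck {

    /** Returns group 1 of the first match of regex anywhere in input, or null. */
    static String firstGroup(String regex, String input) {
        Matcher m = Pattern.compile(regex).matcher(input);
        return m.find() ? m.group(1) : null;
    }

    public static void main(String[] args) {
        // Assumed inputs: local parts from the reported output, placeholder domain.
        String[] inputs = {
            "<john.doe-001.002@example.com>",
            "<john.doe.001-002@example.com>"
        };
        String[] regexes = {
            "([a-z0-9]+)@", "([a-z0-9.]+)@", "([a-z0-9.-]+)@", "([a-z0-9-]+)@"
        };
        for (String input : inputs) {
            System.out.println(input);
            for (String regex : regexes) {
                System.out.println("  " + regex + " -> " + firstGroup(regex, input));
            }
        }
    }
}
```

With the first address, `([a-z0-9.-]+)@` captures `john.doe-001.002` and `([a-z0-9-]+)@` captures only `002`, which is the result points 3 and 4 were expected to give.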