Re: What are we doing in regards to JDK 1.4?

2001-06-21 Thread Edwin Martin

At 17:33 20-6-2001 -0400, Berin wrote:
> I have been looking through JDK 1.4, and there are a few
> instances where what is included in the JDK steps on some
> of our projects.  Most notably:
>
> java.util.regex
> ---
> This steps on the toes of both jakarta-regexp and jakarta-oro.
> In fact by its existence in the JDK, the Apache projects will
> either lose mindshare, or remain static in their mindshare.

org.apache.regexp 1.2 is pretty much broken. It has some
major flaws since 1.0 and they are still not addressed.

See http://nagoya.betaversion.org/bugzilla/buglist.cgi?product=Regexp
for a list of bugs (BTW none of them is assigned).

Maybe it's better to just throw regexp out of the window and
use Sun's official regex.

Bye,
Edwin.

(Who really does like all the other Jakarta projects.)





-
To unsubscribe, e-mail: [EMAIL PROTECTED]
For additional commands, e-mail: [EMAIL PROTECTED]




Re: What are we doing in regards to JDK 1.4?

2001-06-21 Thread Edwin Martin

At 09:42 21-6-2001 -0700, Jon wrote:
> Edwin,
>
> on 6/21/01 7:16 AM, Edwin Martin [EMAIL PROTECTED] wrote:
>
> > org.apache.regexp 1.2 is pretty much broken. It has some
> > major flaws since 1.0 and they are still not addressed.
> >
> > See http://nagoya.betaversion.org/bugzilla/buglist.cgi?product=Regexp
> > for a list of bugs (BTW none of them is assigned).
>
> Sending in bug reports doesn't get the problems fixed. This is a community
> of VOLUNTEERS. You can't just magically put in a bug report and then someone
> is going to jump up and fix it...you have to submit patches or try to nicely
> motivate people to fix it for you.
>
> http://jakarta.apache.org/site/understandingopensource.html
>
> With the opensource system, if you find any deficiency in the project, the
> onus is on you to redress that deficiency.

I thought submitting bug reports is also an important
way to support Open Source.

Well, I looked at the regexp-code and saw one of the bugs:

RECompiler.java, line 664:

    // Premature end of range. define up to Character.MAX_VALUE
    if ((idx + 1) < len && pattern.charAt(++idx) == ']')
    {
        simpleChar = Character.MAX_VALUE;
        break;
    }

The code makes any minus a range.

The RE [a-] becomes the character a and anything after it.

A minus at the beginning or the end should be just a minus.
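For comparison, here is a minimal sketch of that convention (a minus at the start or end of a character class is a literal minus), checked against java.util.regex from JDK 1.4 rather than org.apache.regexp. The class name HyphenInClassDemo and the helper allInClass are made up for this illustration only.

```java
import java.util.regex.Pattern;

final class HyphenInClassDemo {
    /** True when every character of input belongs to the given character class. */
    static boolean allInClass(String charClass, String input) {
        // Appending '+' turns the class into "one or more characters from it".
        return Pattern.compile(charClass + "+").matcher(input).matches();
    }

    public static void main(String[] args) {
        // Trailing minus: the class holds 'a' and '-', nothing else.
        System.out.println(allInClass("[a-]", "a-a"));
        System.out.println(allInClass("[a-]", "b"));
        // Leading minus, plain and negated.
        System.out.println(allInClass("[-a]", "-a"));
        System.out.println(allInClass("[^-a]", "xyz"));
    }
}
```

With this behavior, [a-] accepts only 'a' and '-', so it does not turn into an open-ended range.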

The code should be something like this:

    // Premature end of range. define up to Character.MAX_VALUE
    if ((idx + 1) < len && pattern.charAt(++idx) == ']')
    {
        definingRange = false;
        break;
    }

Furthermore, RECompiler.java, line 697:

    if ((idx + 1) >= len || pattern.charAt(idx + 1) != '-')

Should become something like:

    if ((idx + 1) >= len || !(pattern.charAt(idx + 1) == '-' &&
        !((idx + 2) < len && pattern.charAt(idx + 2) == ']')))

Which means: Do not include a char when followed by a minus, but DO include
the char when the minus is followed by a ']'.

The code still does not address the possibility of a charclass which starts
with a minus, like [-a] or [^-a], but that shouldn't be too difficult to
implement.
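To make the intended rule concrete, here is a small stand-alone sketch of a character class parser that treats a minus as a range operator only when a real end character follows it. This is not the RECompiler code: the names CharClassSketch, parse and contains are invented for the illustration, and negation and escapes are left out on purpose.

```java
import java.util.ArrayList;
import java.util.List;

final class CharClassSketch {
    /** Parses a class such as "[a-z0-9-]" into inclusive {low, high} pairs. */
    static List<char[]> parse(String pattern) {
        int end = pattern.length() - 1; // index of the closing ']'
        if (end < 1 || pattern.charAt(0) != '[' || pattern.charAt(end) != ']') {
            throw new IllegalArgumentException("expected [...]: " + pattern);
        }
        List<char[]> ranges = new ArrayList<>();
        int idx = 1;
        while (idx < end) {
            char low = pattern.charAt(idx);
            // A '-' starts a range only if another class member follows it,
            // i.e. the '-' is not the last character before ']'.
            boolean isRange = idx + 2 < end && pattern.charAt(idx + 1) == '-';
            if (isRange) {
                char high = pattern.charAt(idx + 2);
                if (high < low) {
                    throw new IllegalArgumentException("bad range in " + pattern);
                }
                ranges.add(new char[] {low, high});
                idx += 3;
            } else {
                // Single character, including a leading or trailing '-'.
                ranges.add(new char[] {low, low});
                idx++;
            }
        }
        return ranges;
    }

    /** True when c falls inside one of the parsed ranges. */
    static boolean contains(List<char[]> ranges, char c) {
        for (char[] r : ranges) {
            if (c >= r[0] && c <= r[1]) {
                return true;
            }
        }
        return false;
    }

    public static void main(String[] args) {
        List<char[]> cls = parse("[a-z0-9-]");
        System.out.println(contains(cls, '-')); // literal minus is a member
        System.out.println(contains(cls, '.')); // the dot is not
    }
}
```

A leading minus needs no special case in this sketch, because the character after it is never another '-' followed by a range end in inputs like [-a].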

It isn't really that hard to fix these bugs; I just wonder whether anybody
is responsible for the regexp package.

And by the way, you don't have to shout.

Bye,
Edwin Martin.







Regexp 1.2 weirdness

2001-06-06 Thread Edwin Martin

[I already posted this problem to the regexp mailing list,
  without results. Maybe somebody on this list can help?]

Hello,

I stumbled upon a problem with regexp 1.2 which I can't reconcile
with any general regex documentation, either old or new.

In short: [a-z0-9-] doesn't match alphanumerics and '-'.

Here's a JSP test page I made:

 begin retest.jsp 

<%@ page import="org.apache.regexp.*" %>

<h2>RE test</h2>

<%

String s = "&lt;[EMAIL PROTECTED]&gt;";

out.print(s);

out.print("<p>1<br>");
RE emailRE1 = new RE("([a-z0-9]+)@");
if ( emailRE1.match( s ) )
    out.print( emailRE1.getParen(1) );

out.print("<p>2<br>");
RE emailRE2 = new RE("([a-z0-9.]+)@");
if ( emailRE2.match( s ) )
    out.print( emailRE2.getParen(1) );

out.print("<p>3<br>");
RE emailRE3 = new RE("([a-z0-9.-]+)@");
if ( emailRE3.match( s ) )
    out.print( emailRE3.getParen(1) );

out.print("<p>4<br>");
RE emailRE4 = new RE("([a-z0-9-]+)@");
if ( emailRE4.match( s ) )
    out.print( emailRE4.getParen(1) );

s = "&lt;[EMAIL PROTECTED]&gt;";

out.print("<hr>");
out.print(s);

out.print("<p>5<br>");
RE emailRE5 = new RE("([a-z0-9-]+)@");
if ( emailRE5.match( s ) )
    out.print( emailRE5.getParen(1) );

out.print("<p>6<br>");
RE emailRE6 = new RE("([a-z0-9.-]+)@");
if ( emailRE6.match( s ) )
    out.print( emailRE6.getParen(1) );

out.print("<p>7<br>");
RE emailRE7 = new RE("([a-z0-9.]+)@");
if ( emailRE7.match( s ) )
    out.print( emailRE7.getParen(1) );
%>

 end retest.jsp 

This is the output:

 begin output  
RE test
[EMAIL PROTECTED]
1
002
2
001.002
3
001.002
4
john.doe-001.002
-
[EMAIL PROTECTED]
5
john.doe.001-002
6
002
7
002
 end output 

Points 1 and 2 are as expected.

Point 3 should match john.doe-001.002, but it only returns 001.002.

Point 4 (removing the dot) matches everything, dots included!

Points 5, 6 and 7 were added to see what happens when
the dot and minus are swapped. The same strange
behavior :-(

Am I overlooking something?
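As a cross-check, here is a minimal sketch that runs the same seven patterns through java.util.regex from JDK 1.4 instead of org.apache.regexp. The local parts of the two addresses are taken from the output above; the domain example.com, the class name EmailClassCheck and the helper firstGroup are assumptions made for this illustration.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

final class EmailClassCheck {
    /** Returns group 1 of the first match found in input, or null if none. */
    static String firstGroup(String regex, String input) {
        Matcher m = Pattern.compile(regex).matcher(input);
        return m.find() ? m.group(1) : null;
    }

    public static void main(String[] args) {
        // Hypothetical stand-ins for the two redacted test addresses.
        String s1 = "<john.doe-001.002@example.com>";
        String s2 = "<john.doe.001-002@example.com>";

        System.out.println("1 " + firstGroup("([a-z0-9]+)@", s1));
        System.out.println("2 " + firstGroup("([a-z0-9.]+)@", s1));
        System.out.println("3 " + firstGroup("([a-z0-9.-]+)@", s1));
        System.out.println("4 " + firstGroup("([a-z0-9-]+)@", s1));
        System.out.println("5 " + firstGroup("([a-z0-9-]+)@", s2));
        System.out.println("6 " + firstGroup("([a-z0-9.-]+)@", s2));
        System.out.println("7 " + firstGroup("([a-z0-9.]+)@", s2));
    }
}
```

Under these assumptions, point 3 yields john.doe-001.002 and point 4 yields only 002, which is the behavior expected from a literal trailing minus.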

Bye,
Edwin Martin.

