DO NOT REPLY TO THIS EMAIL, BUT PLEASE POST YOUR BUG
RELATED COMMENTS THROUGH THE WEB INTERFACE AVAILABLE AT
<http://nagoya.apache.org/bugzilla/show_bug.cgi?id=3730>.
ANY REPLY MADE TO THIS MESSAGE WILL NOT BE COLLECTED AND
INSERTED IN THE BUG DATABASE.

           Summary: Perl5Matcher sometimes confuses the begin/end offsets on
                    similar sub patterns in a regular expression
           Product: ORO
           Version: 2.0
          Platform: Other
        OS/Version: Other
            Status: NEW
          Severity: Normal
          Priority: Other
         Component: Main
        AssignedTo: [EMAIL PROTECTED]
        ReportedBy: [EMAIL PROTECTED]


Here is the test program:

import com.oroinc.text.regex.*;
import java.io.*;

public class bug_report
{
    public static void main(String[] args) throws Exception
    {
        String regex = "\010[(]GAME +GID:([^;]+); +GDATE:([^;]*); +GSTART:([^;]*); +GSITE:([^;]*); +GNEUTRAL:([^;]*); +GSTAT:([^;]*); +GPERIOD:([^;]*);[^\r\n]*[\r\n]+"
            +"("
            +"(\010[(]TEAM +TNAME:([^;]*);( +[^:]+:[^;]*;){3} +THOME: *([Yy][Ee][Ss]); +TSCORE:([^;]*); +TSTAT:([^;]*)[^\r\n]*[\r\n]+)"
            +"|"
            +"(\010[(]TEAM +TNAME:([^;]*);( +[^:]+:[^;]*;){3} +THOME: *([Nn][Oo]); +TSCORE:([^;]*); +TSTAT:([^;]*)[^\r\n]*[\r\n]+)"
            +"){2}";

        String input = "(GAME GID:13805; GDATE:11/01/2000; GSTART:19:30; GSITE:Charlotte Coliseum; GNEUTRAL:NO; GSTAT:Final; GPERIOD:4; \n"
            +"(TEAM TNAME:Hornets; TLOCALE:Charlotte; TCONF:Eastern; TDIV:Central; THOME:YES; TSCORE:77; TSTAT:LOST; TID:9;)\n"
            +"(TEAM TNAME:Wizards; TLOCALE:Washington; TCONF:Eastern; TDIV:Atlantic; THOME:NO; TSCORE:95; TSTAT:WON; TID:7;))\n";

        String input2 = "(GAME GID:13789; GDATE:10/31/2000; GSTART:19:30; GSITE:TD Waterhouse Centre; GNEUTRAL:NO; GSTAT:Final; GPERIOD:4; \n"
            +"(TEAM TNAME:Magic; TLOCALE:Orlando; TCONF:Eastern; TDIV:Atlantic; THOME:YES; TSCORE:97; TSTAT:WON; TID:5;)\n"
            +"(TEAM TNAME:Wizards; TLOCALE:Washington; TCONF:Eastern; TDIV:Atlantic; THOME:NO; TSCORE:86; TSTAT:LOST; TID:7;))\n";

        Perl5Compiler p5compiler = new Perl5Compiler();
        Perl5Pattern p5pattern = null;
        Perl5Matcher p5matcher = new Perl5Matcher();
        PatternMatcherInput p5input = new PatternMatcherInput(input2);

        try
        {
            p5pattern = (Perl5Pattern) p5compiler.compile(regex,
                Perl5Compiler.SINGLELINE_MASK | Perl5Compiler.READ_ONLY_MASK );
        }
        catch(MalformedPatternException e)
        {
            System.out.println("Error: Bad Perl5 pattern.");
            System.out.println(e.getMessage());
            return; // nothing to match against without a compiled pattern
        }

        boolean result = p5matcher.matchesPrefix(p5input, p5pattern);
        if( result )
        {
            MatchResult mr = p5matcher.getMatch();
            int groups = mr.groups();
            int start = -1;
            int end = -1;
            String matchStr = null;

            // Print the offsets of every group and flag any inverted range.
            for( int x = 0; x < groups; x++ )
            {
                start = mr.beginOffset(x);
                end = mr.endOffset(x);
                //matchStr = mr.group(x);
                //System.out.print("Pos: "+x+"\tStart: "+start+"\tEnd: "+end+"\tMatch: "+matchStr);
                System.out.print("Pos: "+x+"\tStart: "+start+"\tEnd: "+end);
                if( start > end )
                    System.out.println( " -- ERROR" );
                else
                    System.out.println();
            }
        }
        else
        {
            System.out.println("No Match");
        }

        System.out.println("Program terminating");
    }
}

And here is some output:

Pos: 0   Start: 0     End: 338
Pos: 1   Start: 11    End: 16
Pos: 2   Start: 24    End: 34
Pos: 3   Start: 43    End: 48
Pos: 4   Start: 56    End: 76
Pos: 5   Start: 87    End: 89
Pos: 6   Start: 97    End: 102
Pos: 7   Start: 112   End: 113
Pos: 8   Start: 224   End: 338
Pos: 9   Start: 224   End: 224
Pos: 10  Start: 237   End: 237
Pos: 11  Start: 280   End: 295
Pos: 12  Start: 302   End: 192 -- ERROR
Pos: 13  Start: 201   End: 203
Pos: 14  Start: 211   End: 214
Pos: 15  Start: 224   End: 338
Pos: 16  Start: 237   End: 244
Pos: 17  Start: 280   End: 295
Pos: 18  Start: 302   End: 304
Pos: 19  Start: 313   End: 315
Pos: 20  Start: 323   End: 327
Program terminating

Notice that Pos 12 and Pos 18 share the same Start value (302), and that
Pos 12 ends (192) before it starts. In the regex, these two groups are the
corresponding sub patterns of the two alternatives. Granted, there are many
similar sub patterns; as a matter of fact, lines 2 and 3 of the pattern are
almost exactly the same except for [Yy][Ee][Ss] and [Nn][Oo]...
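
For comparison, the same pattern can be run through a different regex engine to see what consistent offsets would look like. The sketch below is not part of the original report: it swaps ORO for java.util.regex, uses Matcher.lookingAt() as the nearest analogue of matchesPrefix(), and drops SINGLELINE_MASK because the pattern contains no dot. The class name OffsetCrossCheck and its helpers team(), matchPrefix() and invertedGroups() are made up for illustration. It also assumes that every record in the input begins with a backspace character (\010); the input as posted does not show one, but the regex requires it and the reported offsets (for example GID starting at 11 rather than 10) are consistent with it.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Cross-check sketch: runs the report's regex through java.util.regex
// (not ORO) and lists any group whose start offset exceeds its end offset.
class OffsetCrossCheck {
    // Builds one TEAM alternative; homeFlag is the only part that differs.
    static String team(String homeFlag) {
        return "(\010[(]TEAM +TNAME:([^;]*);( +[^:]+:[^;]*;){3} +THOME: *(" + homeFlag
            + "); +TSCORE:([^;]*); +TSTAT:([^;]*)[^\r\n]*[\r\n]+)";
    }

    // Same group numbering as the report: 1-7 GAME fields, 8 the repeated
    // alternation, 9-14 the YES alternative, 15-20 the NO alternative.
    static final String REGEX =
        "\010[(]GAME +GID:([^;]+); +GDATE:([^;]*); +GSTART:([^;]*); +GSITE:([^;]*);"
        + " +GNEUTRAL:([^;]*); +GSTAT:([^;]*); +GPERIOD:([^;]*);[^\r\n]*[\r\n]+"
        + "(" + team("[Yy][Ee][Ss]") + "|" + team("[Nn][Oo]") + "){2}";

    // ASSUMPTION: each record starts with \010 (backspace). This is inferred
    // from the regex and the reported offsets, not shown in the posted input.
    static final String INPUT2 =
        "\010(GAME GID:13789; GDATE:10/31/2000; GSTART:19:30; GSITE:TD Waterhouse Centre;"
        + " GNEUTRAL:NO; GSTAT:Final; GPERIOD:4; \n"
        + "\010(TEAM TNAME:Magic; TLOCALE:Orlando; TCONF:Eastern; TDIV:Atlantic;"
        + " THOME:YES; TSCORE:97; TSTAT:WON; TID:5;)\n"
        + "\010(TEAM TNAME:Wizards; TLOCALE:Washington; TCONF:Eastern; TDIV:Atlantic;"
        + " THOME:NO; TSCORE:86; TSTAT:LOST; TID:7;))\n";

    // Returns the matcher after a successful prefix match, or null if none.
    static Matcher matchPrefix(String input) {
        Matcher m = Pattern.compile(REGEX).matcher(input);
        return m.lookingAt() ? m : null;
    }

    // Collects the numbers of groups whose start offset is greater than
    // their end offset. Unmatched groups report -1/-1 and are not flagged.
    static List<Integer> invertedGroups(Matcher m) {
        List<Integer> bad = new ArrayList<>();
        for (int g = 0; g <= m.groupCount(); g++) {
            if (m.start(g) > m.end(g)) {
                bad.add(g);
            }
        }
        return bad;
    }

    public static void main(String[] args) {
        Matcher m = matchPrefix(INPUT2);
        if (m == null) {
            System.out.println("No Match");
            return;
        }
        for (int g = 0; g <= m.groupCount(); g++) {
            System.out.println("Pos: " + g + "\tStart: " + m.start(g) + "\tEnd: " + m.end(g));
        }
        System.out.println("Inverted groups: " + invertedGroups(m));
    }
}
```

One possible reading of the ORO output, offered as a guess rather than a confirmed diagnosis: on the second pass through the {2} loop the YES alternative is tried against the Wizards line first and gets as far as THOME before failing, and the offsets it touched on the way (Pos 9 to 11 and the Start of Pos 12) appear not to be restored when the matcher backtracks into the NO alternative. If another engine reports start <= end for every group on the same input, that would point at the matcher rather than at the pattern.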
