Hi,

Jon Stevens > Q: There are two tests that don't pass. Why?

Having got home and looked at the tests, I think I did not answer the question properly earlier. I assume you mean: why should these not match?

Jon Stevens > #159

If so: in #159 the www is preceded by a period (http://.www.test.com). The regex requires the first character of the domain name to be alphanumeric or a hyphen, so this input correctly does not match.

Jon Stevens > #161

The regex only matches the ftp and http protocols (in any letter case), not FtTP, so this input correctly does not match either.

NOTE: "Match: NO" means a successful non-match, i.e. the test passed. If one of the tests genuinely fails, a big

*********************************************
************Failure*************************
*********************************************

type message appears.

As to test() as opposed to new RETest( args ): I have included another patch to clean this up (it repeats the previous patch, with additions). I think someone intended to clean this up earlier but did not finish or was distracted, as the javadocs say one thing and the code does another. I think it is on track now...

Michael
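If you want to double-check #159 to #162 outside the test harness, here is a minimal sketch. It uses java.util.regex instead of org.apache.regexp.RE, on the assumption that both engines treat these simple patterns the same way. `UrlPatternDemo` and `matches` are hypothetical names. The sketch uses the fully clustered pattern from #160 to #162. The #159 pattern differs only in which groups capture, and that does not change whether it matches.

```java
import java.util.regex.Pattern;

public class UrlPatternDemo {
    // Anchored pattern from tests #160-#162, written as a Java string literal.
    // Assumption: java.util.regex stands in for org.apache.regexp.RE here.
    static final Pattern URL = Pattern.compile(
        "^(?:(?:[hH][tT]{2}[pP]|[fF][tT][pP]):\\/\\/)?[a-zA-Z0-9\\-]+(?:\\.[a-zA-Z0-9\\-]+)*$");

    // The ^ and $ anchors make find() a whole-string test.
    static boolean matches(String input) {
        return URL.matcher(input).find();
    }

    public static void main(String[] args) {
        String[] inputs = {
            "http://.www.test.com", // #159: domain starts with '.', not alphanumeric or '-'
            "FtP://www.test.com",   // #160: ftp in mixed case
            "FtTP://www.test.com",  // #161: neither http nor ftp
            "www.test.com"          // #162: the protocol part is optional
        };
        for (String s : inputs) {
            System.out.println(s + " -> " + (matches(s) ? "YES" : "NO"));
        }
    }
}
```

Running it prints NO, YES, NO, YES for the four inputs. These are the same expectations the test file records for #159 to #162.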
Index: docs/RETest.txt
===================================================================
RCS file: /home/cvspublic/jakarta-regexp/docs/RETest.txt,v
retrieving revision 1.1
diff -r1.1 RETest.txt
886a887,980
>
> #149
> (?:a)
> a
> YES
> a
>
> #150
> (?:a)
> aa
> YES
> a
>
> #151
> (?:\w)
> abc
> YES
> a
>
> #152
> (?:\w\s\w)+
> a b c
> YES
> a b
>
> #153
> (a\w)(?:,(a\w))+
> ab,ac,ad
> YES
> ab,ac,ad
> ab
> ad
>
> #154
> z(\w\s+(?:\w\s+\w)+)z
> za b bc cd dz
> YES
> za b bc cd dz
> a b bc cd d
>
> #155
> (([hH][tT]{2}[pP]|[fF][tT][pP]):\/\/)?[a-zA-Z0-9\-]+(\.[a-zA-Z0-9\-]+)*
> http://www.test.com
> YES
> http://www.test.com
> http://
> http
> .com
>
> #156
> ((?:[hH][tT]{2}[pP]|[fF][tT][pP]):\/\/)?[a-zA-Z0-9\-]+(\.[a-zA-Z0-9\-]+)*
> ftp://www.test.com
> YES
> ftp://www.test.com
> ftp://
> .com
>
> #157
> (([hH][tT]{2}[pP]|[fF][tT][pP]):\/\/)?[a-zA-Z0-9\-]+(?:\.[a-zA-Z0-9\-]+)*
> htTp://www.test.com
> YES
> htTp://www.test.com
> htTp://
> htTp
>
> #158
> (?:([hH][tT]{2}[pP]|[fF][tT][pP]):\/\/)?[a-zA-Z0-9\-]+(\.[a-zA-Z0-9\-]+)*
> FTP://www.test.com
> YES
> FTP://www.test.com
> FTP
> .com
>
> #159
> ^(?:([hH][tT]{2}[pP]|[fF][tT][pP]):\/\/)?[a-zA-Z0-9\-]+(\.[a-zA-Z0-9\-]+)*$
> http://.www.test.com
> NO
>
> #160
> ^(?:(?:[hH][tT]{2}[pP]|[fF][tT][pP]):\/\/)?[a-zA-Z0-9\-]+(?:\.[a-zA-Z0-9\-]+)*$
> FtP://www.test.com
> YES
> FtP://www.test.com
>
> #161
> ^(?:(?:[hH][tT]{2}[pP]|[fF][tT][pP]):\/\/)?[a-zA-Z0-9\-]+(?:\.[a-zA-Z0-9\-]+)*$
> FtTP://www.test.com
> NO
>
> #162
> ^(?:(?:[hH][tT]{2}[pP]|[fF][tT][pP]):\/\/)?[a-zA-Z0-9\-]+(?:\.[a-zA-Z0-9\-]+)*$
> www.test.com
> YES
> www.test.com
Index: src/java/org/apache/regexp/RE.java
===================================================================
RCS file: /home/cvspublic/jakarta-regexp/src/java/org/apache/regexp/RE.java,v
retrieving revision 1.6
diff -r1.6 RE.java
176,186c176,186
< * [:alnum:] Alphanumeric characters.
< * [:alpha:] Alphabetic characters.
< * [:blank:] Space and tab characters.
< * [:cntrl:] Control characters.
< * [:digit:] Numeric characters.
< * [:graph:] Characters that are printable and are also visible. (A space is printable, but not visible, while an `a' is both.)
< * [:lower:] Lower-case alphabetic characters.
< * [:print:] Printable characters (characters that are not control characters.)
< * [:punct:] Punctuation characters (characters that are not letter, digits, control characters, or space characters).
< * [:space:] Space characters (such as space, tab, and formfeed, to name a few).
< * [:upper:] Upper-case alphabetic characters.
---
> * [:alnum:] Alphanumeric characters.
> * [:alpha:] Alphabetic characters.
> * [:blank:] Space and tab characters.
> * [:cntrl:] Control characters.
> * [:digit:] Numeric characters.
> * [:graph:] Characters that are printable and are also visible. (A space is printable, but not visible, while an `a' is both.)
> * [:lower:] Lower-case alphabetic characters.
> * [:print:] Printable characters (characters that are not control characters.)
> * [:punct:] Punctuation characters (characters that are not letter, digits, control characters, or space characters).
> * [:space:] Space characters (such as space, tab, and formfeed, to name a few).
> * [:upper:] Upper-case alphabetic characters.
188c188
< *
---
> *
199c199
< *
---
> *
254a255
> * (?:A) Used for subexpression clustering (just like grouping but no backrefs)
399a401
> static final char OP_OPEN_CLUSTER = '<'; // opening cluster
400a403
> static final char OP_CLOSE_CLUSTER = '>'; // closing cluster
421c424
< static final char POSIX_CLASS_ALPHA = 'a'; // Alphabetics
---
> static final char POSIX_CLASS_ALPHA = 'a'; // Alphabetics
947a951,955
>
> case OP_OPEN_CLUSTER:
> case OP_CLOSE_CLUSTER:
> // starting or ending the matching of a subexpression which has no backref.
> return matchNodes( next, maxNode, idx );
Index: src/java/org/apache/regexp/RECompiler.java
===================================================================
RCS file: /home/cvspublic/jakarta-regexp/src/java/org/apache/regexp/RECompiler.java,v
retrieving revision 1.2
diff -r1.2 RECompiler.java
1191c1191
< boolean paren = false;
---
> int paren = -1;
1196,1198c1196,1208
< idx++;
< paren = true;
< ret = node(RE.OP_OPEN, parens++);
---
> // if its a cluster ( rather than a proper subexpression ie with backrefs )
> if ( idx + 2 < len && pattern.charAt( idx + 1 ) == '?' && pattern.charAt( idx + 2 ) == ':' )
> {
> paren = 2;
> idx += 3;
> ret = node( RE.OP_OPEN_CLUSTER, 0 );
> }
> else
> {
> paren = 1;
> idx++;
> ret = node(RE.OP_OPEN, parens++);
> }
1223c1233
< if (paren)
---
> if ( paren > 0 )
1233c1243,1250
< end = node(RE.OP_CLOSE, closeParens);
---
> if ( paren == 1 )
> {
> end = node(RE.OP_CLOSE, closeParens);
> }
> else
> {
> end = node( RE.OP_CLOSE_CLUSTER, 0 );
> }
Index: src/java/org/apache/regexp/RETest.java
===================================================================
RCS file: /home/cvspublic/jakarta-regexp/src/java/org/apache/regexp/RETest.java,v
retrieving revision 1.2
diff -r1.2 RETest.java
58c58
< */
---
> */
85c85
< public static void main(String[] arg)
---
> public static void main(String[] args)
89,90c89
< //new RETest(arg);
< test();
---
> test( args );
103c102
< public static boolean test() throws Exception
---
> public static boolean test( String[] args ) throws Exception
106c105,121
< test.runAutomatedTests("docs/RETest.txt");
---
> // Run interactive tests against a single regexp
> if (args.length == 2)
> {
> test.runInteractiveTests(args[1]);
> }
> else if (args.length == 1)
> {
> // Run automated tests
> test.runAutomatedTests(args[0]);
> }
> else
> {
> System.out.println( "Usage: RETest ([-i] [regex]) ([/path/to/testfile.txt])" );
> System.out.println( "By Default will run automated tests from file 'docs/RETest.txt' ..." );
> System.out.println();
> test.runAutomatedTests("docs/RETest.txt");
> }
118,146d132
< * Constructor for test
< * @param arg Command line arguments
< */
< public RETest(String[] arg)
< {
< try
< {
< // Run interactive tests against a single regexp
< if (arg.length == 2)
< {
< runInteractiveTests(arg[1]);
< }
< else if (arg.length == 1)
< {
< // Run automated tests
< runAutomatedTests(arg[0]);
< }
< else
< {
< System.out.println ( "Usage: RETest ([-i] [regex]) ([/path/to/testfile.txt])" );
< }
< }
< catch (Exception e)
< {
< e.printStackTrace();
< }
< }
<
< /**
162c148
<
---
>
Index: xdocs/RETest.txt
===================================================================
RCS file: /home/cvspublic/jakarta-regexp/xdocs/RETest.txt,v
retrieving revision 1.1
diff -r1.1 RETest.txt
886a887,980
>
> #149
> (?:a)
> a
> YES
> a
>
> #150
> (?:a)
> aa
> YES
> a
>
> #151
> (?:\w)
> abc
> YES
> a
>
> #152
> (?:\w\s\w)+
> a b c
> YES
> a b
>
> #153
> (a\w)(?:,(a\w))+
> ab,ac,ad
> YES
> ab,ac,ad
> ab
> ad
>
> #154
> z(\w\s+(?:\w\s+\w)+)z
> za b bc cd dz
> YES
> za b bc cd dz
> a b bc cd d
>
> #155
> (([hH][tT]{2}[pP]|[fF][tT][pP]):\/\/)?[a-zA-Z0-9\-]+(\.[a-zA-Z0-9\-]+)*
> http://www.test.com
> YES
> http://www.test.com
> http://
> http
> .com
>
> #156
> ((?:[hH][tT]{2}[pP]|[fF][tT][pP]):\/\/)?[a-zA-Z0-9\-]+(\.[a-zA-Z0-9\-]+)*
> ftp://www.test.com
> YES
> ftp://www.test.com
> ftp://
> .com
>
> #157
> (([hH][tT]{2}[pP]|[fF][tT][pP]):\/\/)?[a-zA-Z0-9\-]+(?:\.[a-zA-Z0-9\-]+)*
> htTp://www.test.com
> YES
> htTp://www.test.com
> htTp://
> htTp
>
> #158
> (?:([hH][tT]{2}[pP]|[fF][tT][pP]):\/\/)?[a-zA-Z0-9\-]+(\.[a-zA-Z0-9\-]+)*
> FTP://www.test.com
> YES
> FTP://www.test.com
> FTP
> .com
>
> #159
> ^(?:([hH][tT]{2}[pP]|[fF][tT][pP]):\/\/)?[a-zA-Z0-9\-]+(\.[a-zA-Z0-9\-]+)*$
> http://.www.test.com
> NO
>
> #160
> ^(?:(?:[hH][tT]{2}[pP]|[fF][tT][pP]):\/\/)?[a-zA-Z0-9\-]+(?:\.[a-zA-Z0-9\-]+)*$
> FtP://www.test.com
> YES
> FtP://www.test.com
>
> #161
> ^(?:(?:[hH][tT]{2}[pP]|[fF][tT][pP]):\/\/)?[a-zA-Z0-9\-]+(?:\.[a-zA-Z0-9\-]+)*$
> FtTP://www.test.com
> NO
>
> #162
> ^(?:(?:[hH][tT]{2}[pP]|[fF][tT][pP]):\/\/)?[a-zA-Z0-9\-]+(?:\.[a-zA-Z0-9\-]+)*$
> www.test.com
> YES
> www.test.com
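The (?:...) clustering added by this patch corresponds to what Perl and java.util.regex call a non-capturing group. The sketch below is a rough cross-check of the expected values in #152 to #154. It runs the same patterns through java.util.regex, not the patched org.apache.regexp.RE. It assumes the two engines agree on these patterns, and in particular that a repeated capturing group keeps only its last iteration. `ClusterDemo` and `groups` are hypothetical names.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class ClusterDemo {
    // Finds the first match of regex in input and returns groups 0..n,
    // or null when there is no match. (?:...) clusters add no entries.
    static String[] groups(String regex, String input) {
        Matcher m = Pattern.compile(regex).matcher(input);
        if (!m.find()) {
            return null;
        }
        String[] out = new String[m.groupCount() + 1];
        for (int i = 0; i <= m.groupCount(); i++) {
            out[i] = m.group(i);
        }
        return out;
    }

    public static void main(String[] args) {
        // #152: the cluster repeats as a unit; only group 0 exists.
        System.out.println(String.join(" | ", groups("(?:\\w\\s\\w)+", "a b c")));
        // #153: the inner (a\w) inside the repeated cluster keeps its last iteration.
        System.out.println(String.join(" | ", groups("(a\\w)(?:,(a\\w))+", "ab,ac,ad")));
        // #154: only the outer parentheses capture.
        System.out.println(String.join(" | ", groups("z(\\w\\s+(?:\\w\\s+\\w)+)z", "za b bc cd dz")));
    }
}
```

For #153 the second group reports only "ad", because the repeated cluster overwrites it on each pass. That is also the value the test file expects.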