Re: RFR (JAXP): 8035577: Xerces Update: impl/xpath/regex/RangeToken.java
I think this OK. The comments with the o--o did not do much for me though and found them a bit confusing but perhaps I need more coffee this morning ? Also, not sure we need the @author tag but I think its usage varies in the workspace Best Lance On Mar 19, 2014, at 7:10 PM, David Li wrote: Hi, This is an update from Xerces for file impl/xpath/regex/TokenRange.java. For details, please refer to: https://bugs.openjdk.java.net/browse/JDK-8035577. Webrevs: http://cr.openjdk.java.net/~joehw/jdk9/8035577/webrev/ Existing tests: JAXP SQE and unit tests passed. Test cases added for typo fix in RangeToken.intersectRanges. Code also updated to fix a bug where regular expression intersection returns incorrect value when first range ends later than second range. Example below. Test cases have been added to cover any scenarios that the code changes affect. new RegularExpression((?[b-d][a-r])); - returns [b-d] (Correct) new RegularExpression((?[a-r][b-d])); - returns [b-de-r] (Incorrect) Thanks, David P.S. Notes on bug fixes. 1) Line 404 removal of while loop. This fixes a new bug where incorrect results are given when first range ends later than second range. In the old code we got (?[a-r][b-d]) - returns [b-de-r] By removing the while loop, we get [b-d]. This while loop looks like a copy-paste error from subtractRanges. In subtractRanges we need to keep the leftover portion from the first range, but this does not apply to intersection. 2) Line 388, addition of src2 += 2; This code change affects anything of the form (?[a-r][b-eg-j]). The code execution is diagrammed below. oo (src1) o--o o--o (src2) For the first match we get oo (src1) o--o (src2) Next we want to run src2+=2 to get the second pair of endpoints (since the first two endpoints are already used). Notice how src1begin has been updated to this.ranges[src1] = src2end+1, which is directly from the code. o--o (src1) o--o (src2) The src2+=2 statement was left out of the old code, and is added in this webrev. If we leave out the src2+=2 at line 388, on the next iteration of the large while loop we will reach case } else if (src2end src1begin) { which also executes src2+=2. This means the correct final result is generated, but on a later loop. We want to add the new code because it's better to have all associated variable updated in the sameloop. In addition, all the other conditions have similar src1 or src2 updates. Lance Andersen| Principal Member of Technical Staff | +1.781.442.2037 Oracle Java Engineering 1 Network Drive Burlington, MA 01803 lance.ander...@oracle.com
Re: RFR (JAXP): 8035577: Xerces Update: impl/xpath/regex/RangeToken.java
On 3/20/2014 8:40 AM, Lance Andersen - Oracle wrote: I think this OK. The comments with the o--o did not do much for me though and found them a bit confusing but perhaps I need more coffee this morning ? Black or white? :-) It illustrates an intersection. For example, the result of the following would be o--o: o--o (src1) o--o (src2) -Joe Also, not sure we need the @author tag but I think its usage varies in the workspace Best Lance On Mar 19, 2014, at 7:10 PM, David Li wrote: Hi, This is an update from Xerces for file impl/xpath/regex/TokenRange.java. For details, please refer to: https://bugs.openjdk.java.net/browse/JDK-8035577. Webrevs: http://cr.openjdk.java.net/~joehw/jdk9/8035577/webrev/ Existing tests: JAXP SQE and unit tests passed. Test cases added for typo fix in RangeToken.intersectRanges. Code also updated to fix a bug where regular expression intersection returns incorrect value when first range ends later than second range. Example below. Test cases have been added to cover any scenarios that the code changes affect. new RegularExpression((?[b-d][a-r])); - returns [b-d] (Correct) new RegularExpression((?[a-r][b-d])); - returns [b-de-r] (Incorrect) Thanks, David P.S. Notes on bug fixes. 1) Line 404 removal of while loop. This fixes a new bug where incorrect results are given when first range ends later than second range. In the old code we got (?[a-r][b-d]) - returns [b-de-r] By removing the while loop, we get [b-d]. This while loop looks like a copy-paste error from subtractRanges. In subtractRanges we need to keep the leftover portion from the first range, but this does not apply to intersection. 2) Line 388, addition of src2 += 2; This code change affects anything of the form (?[a-r][b-eg-j]). The code execution is diagrammed below. oo (src1) o--o o--o (src2) For the first match we get oo (src1) o--o (src2) Next we want to run src2+=2 to get the second pair of endpoints (since the first two endpoints are already used). Notice how src1begin has been updated to this.ranges[src1] = src2end+1, which is directly from the code. o--o (src1) o--o (src2) The src2+=2 statement was left out of the old code, and is added in this webrev. If we leave out the src2+=2 at line 388, on the next iteration of the large while loop we will reach case } else if (src2end src1begin) { which also executes src2+=2. This means the correct final result is generated, but on a later loop. We want to add the new code because it's better to have all associated variable updated in the sameloop. In addition, all the other conditions have similar src1 or src2 updates. Lance Andersen| Principal Member of Technical Staff | +1.781.442.2037 Oracle Java Engineering 1 Network Drive Burlington, MA 01803 lance.ander...@oracle.com
RFR (JAXP): 8035577: Xerces Update: impl/xpath/regex/RangeToken.java
Hi, This is an update from Xerces for file impl/xpath/regex/TokenRange.java. For details, please refer to: https://bugs.openjdk.java.net/browse/JDK-8035577. Webrevs: http://cr.openjdk.java.net/~joehw/jdk9/8035577/webrev/ Existing tests: JAXP SQE and unit tests passed. Test cases added for typo fix in RangeToken.intersectRanges. Code also updated to fix a bug where regular expression intersection returns incorrect value when first range ends later than second range. Example below. Test cases have been added to cover any scenarios that the code changes affect. new RegularExpression((?[b-d][a-r])); - returns [b-d] (Correct) new RegularExpression((?[a-r][b-d])); - returns [b-de-r] (Incorrect) Thanks, David P.S. Notes on bug fixes. 1) Line 404 removal of while loop. This fixes a new bug where incorrect results are given when first range ends later than second range. In the old code we got (?[a-r][b-d]) - returns [b-de-r] By removing the while loop, we get [b-d]. This while loop looks like a copy-paste error from subtractRanges. In subtractRanges we need to keep the leftover portion from the first range, but this does not apply to intersection. 2) Line 388, addition of src2 += 2; This code change affects anything of the form (?[a-r][b-eg-j]). The code execution is diagrammed below. oo (src1) o--o o--o (src2) For the first match we get oo (src1) o--o (src2) Next we want to run src2+=2 to get the second pair of endpoints (since the first two endpoints are already used). Notice how src1begin has been updated to this.ranges[src1] = src2end+1, which is directly from the code. o--o (src1) o--o (src2) The src2+=2 statement was left out of the old code, and is added in this webrev. If we leave out the src2+=2 at line 388, on the next iteration of the large while loop we will reach case } else if (src2end src1begin) { which also executes src2+=2. This means the correct final result is generated, but on a later loop. We want to add the new code because it's better to have all associated variable updated in the sameloop. In addition, all the other conditions have similar src1 or src2 updates.
Re: RFR (JAXP): 8035577: Xerces Update: impl/xpath/regex/RangeToken.java
On 13/03/2014 23:07, David Li wrote: Hi, This is an update from Xerces for file impl/xpath/regex/TokenRange.java. For details, please refer to: https://bugs.openjdk.java.net/browse/JDK-8035577. Webrevs: http://cr.openjdk.java.net/~joehw/jdk9/8035577/webrev/ New test case was added for code change in RangeToken.intersectRanges. Other minor issues from last review comments are updated. Existing tests: JAXP SQE and unit tests passed. A bit of side question but this regex code seems very old (Hashtable, Vector ...) and I'm wondering how feasible it would be to replace it completely with java.util.regex. Joe might know if this has ever been discussed upstream (and I appreciate there is a desire to keep the code the same where possible). -Alan
Re: RFR (JAXP): 8035577: Xerces Update: impl/xpath/regex/RangeToken.java
On 3/14/2014 7:38 AM, Alan Bateman wrote: On 13/03/2014 23:07, David Li wrote: Hi, This is an update from Xerces for file impl/xpath/regex/TokenRange.java. For details, please refer to: https://bugs.openjdk.java.net/browse/JDK-8035577. Webrevs: http://cr.openjdk.java.net/~joehw/jdk9/8035577/webrev/ New test case was added for code change in RangeToken.intersectRanges. Other minor issues from last review comments are updated. Existing tests: JAXP SQE and unit tests passed. A bit of side question but this regex code seems very old (Hashtable, Vector ...) and I'm wondering how feasible it would be to replace it completely with java.util.regex. Joe might know if this has ever been discussed upstream (and I appreciate there is a desire to keep the code the same where possible). We are open to newer features. Xerces still support JDK 1.4, but we have moved on and accepted JDK 7 features. We will update and replace the old code such as Hashtable, Vector along the way and where possible. Note that, I mentioned to the corelib before, that Xalan has a strip-down version of Hashtable that actually performs better than Hashmap. As for the Xerces-flavor of regex, it's a XML Schema conformant implementation. It supports that defined in the XML Schema specification: http://www.w3.org/TR/2000/WD-xmlschema-2-2407/#regexs. It may be possible to replace with java.util.regex but not without some sort of pattern-translation mechanism. -Joe -Alan
Re: RFR (JAXP): 8035577: Xerces Update: impl/xpath/regex/RangeToken.java
I may be wrong, but the test passes without the patch. It looks like we may need to fix the bug David found: https://bugs.openjdk.java.net/browse/JDK-8037324 -Joe On 3/14/2014 10:01 AM, huizhe wang wrote: On 3/14/2014 7:38 AM, Alan Bateman wrote: On 13/03/2014 23:07, David Li wrote: Hi, This is an update from Xerces for file impl/xpath/regex/TokenRange.java. For details, please refer to: https://bugs.openjdk.java.net/browse/JDK-8035577. Webrevs: http://cr.openjdk.java.net/~joehw/jdk9/8035577/webrev/ New test case was added for code change in RangeToken.intersectRanges. Other minor issues from last review comments are updated. Existing tests: JAXP SQE and unit tests passed. A bit of side question but this regex code seems very old (Hashtable, Vector ...) and I'm wondering how feasible it would be to replace it completely with java.util.regex. Joe might know if this has ever been discussed upstream (and I appreciate there is a desire to keep the code the same where possible). We are open to newer features. Xerces still support JDK 1.4, but we have moved on and accepted JDK 7 features. We will update and replace the old code such as Hashtable, Vector along the way and where possible. Note that, I mentioned to the corelib before, that Xalan has a strip-down version of Hashtable that actually performs better than Hashmap. As for the Xerces-flavor of regex, it's a XML Schema conformant implementation. It supports that defined in the XML Schema specification: http://www.w3.org/TR/2000/WD-xmlschema-2-2407/#regexs. It may be possible to replace with java.util.regex but not without some sort of pattern-translation mechanism. -Joe -Alan
Re: RFR (JAXP): 8035577: Xerces Update: impl/xpath/regex/RangeToken.java
Yeah, the test passes without the patch. That is my error. The test I wrote is for testing the correctness of the code I added. It doesn't test the specific change. I can add another test for the change. I can also move the bug I filed to this bug fix. -David On 3/14/2014 2:02 PM, huizhe wang wrote: I may be wrong, but the test passes without the patch. It looks like we may need to fix the bug David found: https://bugs.openjdk.java.net/browse/JDK-8037324 -Joe On 3/14/2014 10:01 AM, huizhe wang wrote: On 3/14/2014 7:38 AM, Alan Bateman wrote: On 13/03/2014 23:07, David Li wrote: Hi, This is an update from Xerces for file impl/xpath/regex/TokenRange.java. For details, please refer to: https://bugs.openjdk.java.net/browse/JDK-8035577. Webrevs: http://cr.openjdk.java.net/~joehw/jdk9/8035577/webrev/ New test case was added for code change in RangeToken.intersectRanges. Other minor issues from last review comments are updated. Existing tests: JAXP SQE and unit tests passed. A bit of side question but this regex code seems very old (Hashtable, Vector ...) and I'm wondering how feasible it would be to replace it completely with java.util.regex. Joe might know if this has ever been discussed upstream (and I appreciate there is a desire to keep the code the same where possible). We are open to newer features. Xerces still support JDK 1.4, but we have moved on and accepted JDK 7 features. We will update and replace the old code such as Hashtable, Vector along the way and where possible. Note that, I mentioned to the corelib before, that Xalan has a strip-down version of Hashtable that actually performs better than Hashmap. As for the Xerces-flavor of regex, it's a XML Schema conformant implementation. It supports that defined in the XML Schema specification: http://www.w3.org/TR/2000/WD-xmlschema-2-2407/#regexs. It may be possible to replace with java.util.regex but not without some sort of pattern-translation mechanism. -Joe -Alan
Re: RFR (JAXP): 8035577: Xerces Update: impl/xpath/regex/RangeToken.java
Thanks David for the update! Joe On 3/13/2014 4:07 PM, David Li wrote: Hi, This is an update from Xerces for file impl/xpath/regex/TokenRange.java. For details, please refer to: https://bugs.openjdk.java.net/browse/JDK-8035577. Webrevs: http://cr.openjdk.java.net/~joehw/jdk9/8035577/webrev/ New test case was added for code change in RangeToken.intersectRanges. Other minor issues from last review comments are updated. Existing tests: JAXP SQE and unit tests passed. Thanks, David
Re: RFR (JAXP): 8035577: Xerces Update: impl/xpath/regex/RangeToken.java
On 05/03/2014 22:17, huizhe wang wrote: On 3/5/2014 1:38 PM, Alan Bateman wrote: On 05/03/2014 20:18, David Li wrote: Hi, This is an update from Xerces for file impl/xpath/regex/TokenRange.java. For details, please refer to: https://bugs.openjdk.java.net/browse/JDK-8035577. Webrevs: http://cr.openjdk.java.net/~joehw/jdk9/8035577/webrev/ No new tests. There were none added in Xerces. Existing tests: JAXP SQE and unit tests passed. Do you know if these tests exercise this code? I realize the Xerces didn't come with tests but I do have a concern about changing regex code without knowing that it is at least exercised by some tests. There are over 100 tests in SQE suite in this area, we'll take a look. If you can check then that would be good as I think I can be forgiven for being nervous about bringing in patches that don't have tests. -Alan.
Re: RFR (JAXP): 8035577: Xerces Update: impl/xpath/regex/RangeToken.java
On 3/6/2014 12:31 AM, Alan Bateman wrote: On 05/03/2014 22:17, huizhe wang wrote: On 3/5/2014 1:38 PM, Alan Bateman wrote: On 05/03/2014 20:18, David Li wrote: Hi, This is an update from Xerces for file impl/xpath/regex/TokenRange.java. For details, please refer to: https://bugs.openjdk.java.net/browse/JDK-8035577. Webrevs: http://cr.openjdk.java.net/~joehw/jdk9/8035577/webrev/ No new tests. There were none added in Xerces. Existing tests: JAXP SQE and unit tests passed. Do you know if these tests exercise this code? I realize the Xerces didn't come with tests but I do have a concern about changing regex code without knowing that it is at least exercised by some tests. There are over 100 tests in SQE suite in this area, we'll take a look. If you can check then that would be good as I think I can be forgiven for being nervous about bringing in patches that don't have tests. Understand. We'll look into adding new tests. -Joe -Alan.
RFR (JAXP): 8035577: Xerces Update: impl/xpath/regex/RangeToken.java
Hi, This is an update from Xerces for file impl/xpath/regex/TokenRange.java. For details, please refer to: https://bugs.openjdk.java.net/browse/JDK-8035577. Webrevs: http://cr.openjdk.java.net/~joehw/jdk9/8035577/webrev/ No new tests. There were none added in Xerces. Existing tests: JAXP SQE and unit tests passed. Thanks, David
Re: RFR (JAXP): 8035577: Xerces Update: impl/xpath/regex/RangeToken.java
nitpicking, (1) shouldn't the variable at #468 to be updated to lch instead of uch as well? (2) StringBuilder can be used to replace the StringBuffer in toString(). -Sherman On 03/05/2014 12:18 PM, David Li wrote: Hi, This is an update from Xerces for file impl/xpath/regex/TokenRange.java. For details, please refer to: https://bugs.openjdk.java.net/browse/JDK-8035577. Webrevs: http://cr.openjdk.java.net/~joehw/jdk9/8035577/webrev/ No new tests. There were none added in Xerces. Existing tests: JAXP SQE and unit tests passed. Thanks, David
Re: RFR (JAXP): 8035577: Xerces Update: impl/xpath/regex/RangeToken.java
On Mar 5, 2014, at 3:37 PM, Xueming Shen wrote: nitpicking, (1) shouldn't the variable at #468 to be updated to lch instead of uch as well? I would agree given you are now calling Character.toLowerCase (2) StringBuilder can be used to replace the StringBuffer in toString(). Agree, but I think this could be considered optional for this putback... -Sherman On 03/05/2014 12:18 PM, David Li wrote: Hi, This is an update from Xerces for file impl/xpath/regex/TokenRange.java. For details, please refer to: https://bugs.openjdk.java.net/browse/JDK-8035577. Webrevs: http://cr.openjdk.java.net/~joehw/jdk9/8035577/webrev/ No new tests. There were none added in Xerces. Existing tests: JAXP SQE and unit tests passed. Thanks, David Lance Andersen| Principal Member of Technical Staff | +1.781.442.2037 Oracle Java Engineering 1 Network Drive Burlington, MA 01803 lance.ander...@oracle.com
Re: RFR (JAXP): 8035577: Xerces Update: impl/xpath/regex/RangeToken.java
On 05/03/2014 20:18, David Li wrote: Hi, This is an update from Xerces for file impl/xpath/regex/TokenRange.java. For details, please refer to: https://bugs.openjdk.java.net/browse/JDK-8035577. Webrevs: http://cr.openjdk.java.net/~joehw/jdk9/8035577/webrev/ No new tests. There were none added in Xerces. Existing tests: JAXP SQE and unit tests passed. Do you know if these tests exercise this code? I realize the Xerces didn't come with tests but I do have a concern about changing regex code without knowing that it is at least exercised by some tests. -Alan.
Re: RFR (JAXP): 8035577: Xerces Update: impl/xpath/regex/RangeToken.java
On 3/5/2014 12:46 PM, Lance Andersen - Oracle wrote: On Mar 5, 2014, at 3:37 PM, Xueming Shen wrote: nitpicking, (1) shouldn't the variable at #468 to be updated to lch instead of uch as well? I would agree given you are now calling Character.toLowerCase Also, at line 463 and 464, it looks like 'uppers' meant to be 'lowers' defined at 462. (2) StringBuilder can be used to replace the StringBuffer in toString(). Agree, but I think this could be considered optional for this putback... Yes, feel free to replace StringBuffer with StringBuilder, Vector to ArrayList and etc. -Joe -Sherman On 03/05/2014 12:18 PM, David Li wrote: Hi, This is an update from Xerces for file impl/xpath/regex/TokenRange.java. For details, please refer to: https://bugs.openjdk.java.net/browse/JDK-8035577. Webrevs: http://cr.openjdk.java.net/~joehw/jdk9/8035577/webrev/ No new tests. There were none added in Xerces. Existing tests: JAXP SQE and unit tests passed. Thanks, David Lance Andersen| Principal Member of Technical Staff | +1.781.442.2037 Oracle Java Engineering 1 Network Drive Burlington, MA 01803 lance.ander...@oracle.com
Re: RFR (JAXP): 8035577: Xerces Update: impl/xpath/regex/RangeToken.java
On 3/5/2014 1:38 PM, Alan Bateman wrote: On 05/03/2014 20:18, David Li wrote: Hi, This is an update from Xerces for file impl/xpath/regex/TokenRange.java. For details, please refer to: https://bugs.openjdk.java.net/browse/JDK-8035577. Webrevs: http://cr.openjdk.java.net/~joehw/jdk9/8035577/webrev/ No new tests. There were none added in Xerces. Existing tests: JAXP SQE and unit tests passed. Do you know if these tests exercise this code? I realize the Xerces didn't come with tests but I do have a concern about changing regex code without knowing that it is at least exercised by some tests. There are over 100 tests in SQE suite in this area, we'll take a look. -Joe -Alan.
Re: RFR (JAXP): 8035577: Xerces Update: impl/xpath/regex/RangeToken.java
On Mar 5, 2014, at 5:10 PM, huizhe wang wrote: On 3/5/2014 12:46 PM, Lance Andersen - Oracle wrote: On Mar 5, 2014, at 3:37 PM, Xueming Shen wrote: nitpicking, (1) shouldn't the variable at #468 to be updated to lch instead of uch as well? I would agree given you are now calling Character.toLowerCase Also, at line 463 and 464, it looks like 'uppers' meant to be 'lowers' defined at 462. I think we should add tests with this change it looks like we really need to test this if your suggestion above is correct. (2) StringBuilder can be used to replace the StringBuffer in toString(). Agree, but I think this could be considered optional for this putback... Yes, feel free to replace StringBuffer with StringBuilder, Vector to ArrayList and etc. -Joe -Sherman On 03/05/2014 12:18 PM, David Li wrote: Hi, This is an update from Xerces for file impl/xpath/regex/TokenRange.java. For details, please refer to: https://bugs.openjdk.java.net/browse/JDK-8035577. Webrevs: http://cr.openjdk.java.net/~joehw/jdk9/8035577/webrev/ No new tests. There were none added in Xerces. Existing tests: JAXP SQE and unit tests passed. Thanks, David Lance Andersen| Principal Member of Technical Staff | +1.781.442.2037 Oracle Java Engineering 1 Network Drive Burlington, MA 01803 lance.ander...@oracle.com Lance Andersen| Principal Member of Technical Staff | +1.781.442.2037 Oracle Java Engineering 1 Network Drive Burlington, MA 01803 lance.ander...@oracle.com