Re: RFR (JAXP): 8035577: Xerces Update: impl/xpath/regex/RangeToken.java

2014-03-20 Thread Lance Andersen - Oracle
I think this OK.

The comments with the o--o did not do much for me though and found them a bit 
confusing but perhaps I need more coffee this morning ?

Also, not sure we need the @author tag but I think its usage varies in the 
workspace

Best
Lance
On Mar 19, 2014, at 7:10 PM, David Li wrote:

 Hi,
 
 This is an update from Xerces for file impl/xpath/regex/TokenRange.java.  For 
 details, please refer to: https://bugs.openjdk.java.net/browse/JDK-8035577.
 
 Webrevs: http://cr.openjdk.java.net/~joehw/jdk9/8035577/webrev/
 
 Existing tests: JAXP SQE and unit tests passed.
 
 Test cases added for typo fix in RangeToken.intersectRanges.  Code also 
 updated to fix a bug where regular expression intersection returns incorrect 
 value when first range ends later than second range.   Example below. Test 
 cases have been added to cover any scenarios that the code changes affect.
 
 new RegularExpression((?[b-d][a-r])); - returns [b-d] (Correct)
 new RegularExpression((?[a-r][b-d])); - returns [b-de-r] (Incorrect)
 
 Thanks,
 David
 
 P.S. Notes on bug fixes.
 1) Line 404 removal of while loop.
 This fixes a new bug where incorrect results are given when first range ends 
 later than second range.  In the old code we got
 (?[a-r][b-d]) - returns [b-de-r]
 By removing the while loop, we get [b-d].
 This while loop looks like a copy-paste error from subtractRanges. In 
 subtractRanges we need to keep the leftover portion from the first range, but 
 this does not apply to intersection.
 
 2) Line 388, addition of src2 += 2;
 This code change affects anything of the form (?[a-r][b-eg-j]).  The code 
 execution is diagrammed below.
 oo  (src1)
  o--o o--o (src2)
 For the first match we get
 oo  (src1)
  o--o  (src2)
 Next we want to run src2+=2 to get the second pair of endpoints (since the 
 first two endpoints are already used).  Notice how src1begin has been updated 
 to this.ranges[src1] = src2end+1, which is directly from the code.
  o--o  (src1)
   o--o (src2)
 The src2+=2 statement was left out of the old code, and is added in this 
 webrev.  If we leave out the src2+=2 at line 388, on the next iteration of 
 the large while loop we will reach case } else if (src2end  src1begin) { 
 which also executes src2+=2.  This means the correct final result is 
 generated, but on a later loop. We want to add the new code because it's 
 better to have all associated variable updated in the sameloop.  In addition, 
 all the other conditions have similar src1 or src2 updates.
 


Lance Andersen| Principal Member of Technical Staff | +1.781.442.2037
Oracle Java Engineering 
1 Network Drive 
Burlington, MA 01803
lance.ander...@oracle.com



Re: RFR (JAXP): 8035577: Xerces Update: impl/xpath/regex/RangeToken.java

2014-03-20 Thread huizhe wang


On 3/20/2014 8:40 AM, Lance Andersen - Oracle wrote:

I think this OK.

The comments with the o--o did not do much for me though and found them a bit 
confusing but perhaps I need more coffee this morning ?


Black or white? :-)

It illustrates an intersection. For example, the result of the following 
would be o--o:


 o--o  (src1)
  o--o (src2)


-Joe



Also, not sure we need the @author tag but I think its usage varies in the 
workspace

Best
Lance
On Mar 19, 2014, at 7:10 PM, David Li wrote:


Hi,

This is an update from Xerces for file impl/xpath/regex/TokenRange.java.  For 
details, please refer to: https://bugs.openjdk.java.net/browse/JDK-8035577.

Webrevs: http://cr.openjdk.java.net/~joehw/jdk9/8035577/webrev/

Existing tests: JAXP SQE and unit tests passed.

Test cases added for typo fix in RangeToken.intersectRanges.  Code also updated 
to fix a bug where regular expression intersection returns incorrect value when 
first range ends later than second range.   Example below. Test cases have been 
added to cover any scenarios that the code changes affect.

new RegularExpression((?[b-d][a-r])); - returns [b-d] (Correct)
new RegularExpression((?[a-r][b-d])); - returns [b-de-r] (Incorrect)

Thanks,
David

P.S. Notes on bug fixes.
1) Line 404 removal of while loop.
This fixes a new bug where incorrect results are given when first range ends 
later than second range.  In the old code we got
(?[a-r][b-d]) - returns [b-de-r]
By removing the while loop, we get [b-d].
This while loop looks like a copy-paste error from subtractRanges. In 
subtractRanges we need to keep the leftover portion from the first range, but 
this does not apply to intersection.

2) Line 388, addition of src2 += 2;
This code change affects anything of the form (?[a-r][b-eg-j]).  The code 
execution is diagrammed below.
oo  (src1)
  o--o o--o (src2)
For the first match we get
oo  (src1)
  o--o  (src2)
Next we want to run src2+=2 to get the second pair of endpoints (since the 
first two endpoints are already used).  Notice how src1begin has been updated 
to this.ranges[src1] = src2end+1, which is directly from the code.
  o--o  (src1)
   o--o (src2)
The src2+=2 statement was left out of the old code, and is added in this webrev.  If we leave out the 
src2+=2 at line 388, on the next iteration of the large while loop we will reach case } else if 
(src2end  src1begin) { which also executes src2+=2.  This means the correct 
final result is generated, but on a later loop. We want to add the new code because it's better to 
have all associated variable updated in the sameloop.  In addition, all the other conditions have 
similar src1 or src2 updates.




Lance Andersen| Principal Member of Technical Staff | +1.781.442.2037
Oracle Java Engineering
1 Network Drive
Burlington, MA 01803
lance.ander...@oracle.com





RFR (JAXP): 8035577: Xerces Update: impl/xpath/regex/RangeToken.java

2014-03-19 Thread David Li

Hi,

This is an update from Xerces for file 
impl/xpath/regex/TokenRange.java.  For details, please refer to: 
https://bugs.openjdk.java.net/browse/JDK-8035577.


Webrevs: http://cr.openjdk.java.net/~joehw/jdk9/8035577/webrev/

Existing tests: JAXP SQE and unit tests passed.

Test cases added for typo fix in RangeToken.intersectRanges.  Code also 
updated to fix a bug where regular expression intersection returns 
incorrect value when first range ends later than second range.   Example 
below. Test cases have been added to cover any scenarios that the code 
changes affect.


new RegularExpression((?[b-d][a-r])); - returns [b-d] (Correct)
new RegularExpression((?[a-r][b-d])); - returns [b-de-r] (Incorrect)

Thanks,
David

P.S. Notes on bug fixes.
1) Line 404 removal of while loop.
This fixes a new bug where incorrect results are given when first range 
ends later than second range.  In the old code we got

(?[a-r][b-d]) - returns [b-de-r]
By removing the while loop, we get [b-d].
This while loop looks like a copy-paste error from subtractRanges. In 
subtractRanges we need to keep the leftover portion from the first 
range, but this does not apply to intersection.


2) Line 388, addition of src2 += 2;
This code change affects anything of the form (?[a-r][b-eg-j]).  The 
code execution is diagrammed below.

oo  (src1)
  o--o o--o (src2)
For the first match we get
oo  (src1)
  o--o  (src2)
Next we want to run src2+=2 to get the second pair of endpoints (since 
the first two endpoints are already used).  Notice how src1begin has 
been updated to this.ranges[src1] = src2end+1, which is directly from 
the code.

  o--o  (src1)
   o--o (src2)
The src2+=2 statement was left out of the old code, and is added in this 
webrev.  If we leave out the src2+=2 at line 388, on the next iteration 
of the large while loop we will reach case } else if (src2end  
src1begin) { which also executes src2+=2.  This means the correct 
final result is generated, but on a later loop. We want to add the new 
code because it's better to have all associated variable updated in the 
sameloop.  In addition, all the other conditions have similar src1 or 
src2 updates.




Re: RFR (JAXP): 8035577: Xerces Update: impl/xpath/regex/RangeToken.java

2014-03-14 Thread Alan Bateman

On 13/03/2014 23:07, David Li wrote:

Hi,

This is an update from Xerces for file 
impl/xpath/regex/TokenRange.java.  For details, please refer to: 
https://bugs.openjdk.java.net/browse/JDK-8035577.


Webrevs: http://cr.openjdk.java.net/~joehw/jdk9/8035577/webrev/

New test case was added for code change in 
RangeToken.intersectRanges.  Other minor issues from last review 
comments are updated.


Existing tests: JAXP SQE and unit tests passed.
A bit of side question but this regex code seems very old (Hashtable, 
Vector ...) and I'm wondering how feasible it would be to replace it 
completely with java.util.regex. Joe might know if this has ever been 
discussed upstream (and I appreciate there is a desire to keep the code 
the same where possible).


-Alan


Re: RFR (JAXP): 8035577: Xerces Update: impl/xpath/regex/RangeToken.java

2014-03-14 Thread huizhe wang


On 3/14/2014 7:38 AM, Alan Bateman wrote:

On 13/03/2014 23:07, David Li wrote:

Hi,

This is an update from Xerces for file 
impl/xpath/regex/TokenRange.java.  For details, please refer to: 
https://bugs.openjdk.java.net/browse/JDK-8035577.


Webrevs: http://cr.openjdk.java.net/~joehw/jdk9/8035577/webrev/

New test case was added for code change in 
RangeToken.intersectRanges.  Other minor issues from last review 
comments are updated.


Existing tests: JAXP SQE and unit tests passed.
A bit of side question but this regex code seems very old (Hashtable, 
Vector ...) and I'm wondering how feasible it would be to replace it 
completely with java.util.regex. Joe might know if this has ever been 
discussed upstream (and I appreciate there is a desire to keep the 
code the same where possible).


We are open to newer features. Xerces still support JDK 1.4, but we have 
moved on and accepted JDK 7 features. We will update and replace the old 
code such as Hashtable, Vector along the way and where possible. Note 
that, I mentioned to the corelib before, that Xalan has a strip-down 
version of Hashtable that actually performs better than Hashmap.


As for the Xerces-flavor of regex, it's a XML Schema conformant 
implementation. It supports that defined in the XML Schema 
specification: 
http://www.w3.org/TR/2000/WD-xmlschema-2-2407/#regexs. It may be 
possible to replace with java.util.regex but not without some sort of 
pattern-translation mechanism.


-Joe



-Alan




Re: RFR (JAXP): 8035577: Xerces Update: impl/xpath/regex/RangeToken.java

2014-03-14 Thread huizhe wang
I may be wrong, but the test passes without the patch. It looks like we 
may need to fix the bug David found: 
https://bugs.openjdk.java.net/browse/JDK-8037324


-Joe


On 3/14/2014 10:01 AM, huizhe wang wrote:


On 3/14/2014 7:38 AM, Alan Bateman wrote:

On 13/03/2014 23:07, David Li wrote:

Hi,

This is an update from Xerces for file 
impl/xpath/regex/TokenRange.java.  For details, please refer to: 
https://bugs.openjdk.java.net/browse/JDK-8035577.


Webrevs: http://cr.openjdk.java.net/~joehw/jdk9/8035577/webrev/

New test case was added for code change in 
RangeToken.intersectRanges.  Other minor issues from last review 
comments are updated.


Existing tests: JAXP SQE and unit tests passed.
A bit of side question but this regex code seems very old (Hashtable, 
Vector ...) and I'm wondering how feasible it would be to replace it 
completely with java.util.regex. Joe might know if this has ever been 
discussed upstream (and I appreciate there is a desire to keep the 
code the same where possible).


We are open to newer features. Xerces still support JDK 1.4, but we 
have moved on and accepted JDK 7 features. We will update and replace 
the old code such as Hashtable, Vector along the way and where 
possible. Note that, I mentioned to the corelib before, that Xalan has 
a strip-down version of Hashtable that actually performs better than 
Hashmap.


As for the Xerces-flavor of regex, it's a XML Schema conformant 
implementation. It supports that defined in the XML Schema 
specification: 
http://www.w3.org/TR/2000/WD-xmlschema-2-2407/#regexs. It may be 
possible to replace with java.util.regex but not without some sort of 
pattern-translation mechanism.


-Joe



-Alan






Re: RFR (JAXP): 8035577: Xerces Update: impl/xpath/regex/RangeToken.java

2014-03-14 Thread David Li
Yeah, the test passes without the patch.  That is my error.  The test I 
wrote is for testing the correctness of the code I added.  It doesn't 
test the specific change.  I can add another test for the change.


I can also move the bug I filed to this bug fix.

-David

On 3/14/2014 2:02 PM, huizhe wang wrote:
I may be wrong, but the test passes without the patch. It looks like 
we may need to fix the bug David found: 
https://bugs.openjdk.java.net/browse/JDK-8037324


-Joe


On 3/14/2014 10:01 AM, huizhe wang wrote:


On 3/14/2014 7:38 AM, Alan Bateman wrote:

On 13/03/2014 23:07, David Li wrote:

Hi,

This is an update from Xerces for file 
impl/xpath/regex/TokenRange.java.  For details, please refer to: 
https://bugs.openjdk.java.net/browse/JDK-8035577.


Webrevs: http://cr.openjdk.java.net/~joehw/jdk9/8035577/webrev/

New test case was added for code change in 
RangeToken.intersectRanges.  Other minor issues from last review 
comments are updated.


Existing tests: JAXP SQE and unit tests passed.
A bit of side question but this regex code seems very old 
(Hashtable, Vector ...) and I'm wondering how feasible it would be 
to replace it completely with java.util.regex. Joe might know if 
this has ever been discussed upstream (and I appreciate there is a 
desire to keep the code the same where possible).


We are open to newer features. Xerces still support JDK 1.4, but we 
have moved on and accepted JDK 7 features. We will update and replace 
the old code such as Hashtable, Vector along the way and where 
possible. Note that, I mentioned to the corelib before, that Xalan 
has a strip-down version of Hashtable that actually performs better 
than Hashmap.


As for the Xerces-flavor of regex, it's a XML Schema conformant 
implementation. It supports that defined in the XML Schema 
specification: 
http://www.w3.org/TR/2000/WD-xmlschema-2-2407/#regexs. It may be 
possible to replace with java.util.regex but not without some sort of 
pattern-translation mechanism.


-Joe



-Alan








Re: RFR (JAXP): 8035577: Xerces Update: impl/xpath/regex/RangeToken.java

2014-03-13 Thread huizhe wang

Thanks David for the update!

Joe

On 3/13/2014 4:07 PM, David Li wrote:

Hi,

This is an update from Xerces for file 
impl/xpath/regex/TokenRange.java.  For details, please refer to: 
https://bugs.openjdk.java.net/browse/JDK-8035577.


Webrevs: http://cr.openjdk.java.net/~joehw/jdk9/8035577/webrev/

New test case was added for code change in 
RangeToken.intersectRanges.  Other minor issues from last review 
comments are updated.


Existing tests: JAXP SQE and unit tests passed.

Thanks,
David




Re: RFR (JAXP): 8035577: Xerces Update: impl/xpath/regex/RangeToken.java

2014-03-06 Thread Alan Bateman

On 05/03/2014 22:17, huizhe wang wrote:


On 3/5/2014 1:38 PM, Alan Bateman wrote:

On 05/03/2014 20:18, David Li wrote:

Hi,

This is an update from Xerces for file 
impl/xpath/regex/TokenRange.java.  For details, please refer to: 
https://bugs.openjdk.java.net/browse/JDK-8035577.


Webrevs: http://cr.openjdk.java.net/~joehw/jdk9/8035577/webrev/

No new tests.  There were none added in Xerces.

Existing tests: JAXP SQE and unit tests passed.
Do you know if these tests exercise this code? I realize the Xerces 
didn't come with tests but I do have a concern about changing regex 
code without knowing that it is at least exercised by some tests.


There are over 100 tests in SQE suite in this area, we'll take a look.
If you can check then that would be good as I think I can be forgiven 
for being nervous about bringing in patches that don't have tests.


-Alan.


Re: RFR (JAXP): 8035577: Xerces Update: impl/xpath/regex/RangeToken.java

2014-03-06 Thread huizhe wang


On 3/6/2014 12:31 AM, Alan Bateman wrote:

On 05/03/2014 22:17, huizhe wang wrote:


On 3/5/2014 1:38 PM, Alan Bateman wrote:

On 05/03/2014 20:18, David Li wrote:

Hi,

This is an update from Xerces for file 
impl/xpath/regex/TokenRange.java.  For details, please refer to: 
https://bugs.openjdk.java.net/browse/JDK-8035577.


Webrevs: http://cr.openjdk.java.net/~joehw/jdk9/8035577/webrev/

No new tests.  There were none added in Xerces.

Existing tests: JAXP SQE and unit tests passed.
Do you know if these tests exercise this code? I realize the Xerces 
didn't come with tests but I do have a concern about changing regex 
code without knowing that it is at least exercised by some tests.


There are over 100 tests in SQE suite in this area, we'll take a look.
If you can check then that would be good as I think I can be forgiven 
for being nervous about bringing in patches that don't have tests.


Understand. We'll look into adding new tests.

-Joe



-Alan.




RFR (JAXP): 8035577: Xerces Update: impl/xpath/regex/RangeToken.java

2014-03-05 Thread David Li

Hi,

This is an update from Xerces for file 
impl/xpath/regex/TokenRange.java.  For details, please refer to: 
https://bugs.openjdk.java.net/browse/JDK-8035577.


Webrevs: http://cr.openjdk.java.net/~joehw/jdk9/8035577/webrev/

No new tests.  There were none added in Xerces.

Existing tests: JAXP SQE and unit tests passed.

Thanks,
David


Re: RFR (JAXP): 8035577: Xerces Update: impl/xpath/regex/RangeToken.java

2014-03-05 Thread Xueming Shen

nitpicking,

(1) shouldn't the variable at #468 to be updated to lch instead of uch as 
well?
(2) StringBuilder can be used to replace the StringBuffer in toString().

-Sherman

On 03/05/2014 12:18 PM, David Li wrote:

Hi,

This is an update from Xerces for file impl/xpath/regex/TokenRange.java.  For 
details, please refer to: https://bugs.openjdk.java.net/browse/JDK-8035577.

Webrevs: http://cr.openjdk.java.net/~joehw/jdk9/8035577/webrev/

No new tests.  There were none added in Xerces.

Existing tests: JAXP SQE and unit tests passed.

Thanks,
David




Re: RFR (JAXP): 8035577: Xerces Update: impl/xpath/regex/RangeToken.java

2014-03-05 Thread Lance Andersen - Oracle

On Mar 5, 2014, at 3:37 PM, Xueming Shen wrote:

 nitpicking,
 
 (1) shouldn't the variable at #468 to be updated to lch instead of uch as 
 well?

I would agree given you are now calling Character.toLowerCase
 (2) StringBuilder can be used to replace the StringBuffer in toString().
Agree, but I think this could be considered optional for this putback...


 
 -Sherman
 
 On 03/05/2014 12:18 PM, David Li wrote:
 Hi,
 
 This is an update from Xerces for file impl/xpath/regex/TokenRange.java.  
 For details, please refer to: 
 https://bugs.openjdk.java.net/browse/JDK-8035577.
 
 Webrevs: http://cr.openjdk.java.net/~joehw/jdk9/8035577/webrev/
 
 No new tests.  There were none added in Xerces.
 
 Existing tests: JAXP SQE and unit tests passed.
 
 Thanks,
 David
 


Lance Andersen| Principal Member of Technical Staff | +1.781.442.2037
Oracle Java Engineering 
1 Network Drive 
Burlington, MA 01803
lance.ander...@oracle.com



Re: RFR (JAXP): 8035577: Xerces Update: impl/xpath/regex/RangeToken.java

2014-03-05 Thread Alan Bateman

On 05/03/2014 20:18, David Li wrote:

Hi,

This is an update from Xerces for file 
impl/xpath/regex/TokenRange.java.  For details, please refer to: 
https://bugs.openjdk.java.net/browse/JDK-8035577.


Webrevs: http://cr.openjdk.java.net/~joehw/jdk9/8035577/webrev/

No new tests.  There were none added in Xerces.

Existing tests: JAXP SQE and unit tests passed.
Do you know if these tests exercise this code? I realize the Xerces 
didn't come with tests but I do have a concern about changing regex code 
without knowing that it is at least exercised by some tests.


-Alan.


Re: RFR (JAXP): 8035577: Xerces Update: impl/xpath/regex/RangeToken.java

2014-03-05 Thread huizhe wang


On 3/5/2014 12:46 PM, Lance Andersen - Oracle wrote:

On Mar 5, 2014, at 3:37 PM, Xueming Shen wrote:


nitpicking,

(1) shouldn't the variable at #468 to be updated to lch instead of uch as 
well?

I would agree given you are now calling Character.toLowerCase


Also, at line 463 and 464, it looks like 'uppers' meant to be 'lowers' 
defined at 462.



(2) StringBuilder can be used to replace the StringBuffer in toString().

Agree, but I think this could be considered optional for this putback...


Yes, feel free to replace StringBuffer with StringBuilder, Vector to 
ArrayList and etc.


-Joe





-Sherman

On 03/05/2014 12:18 PM, David Li wrote:

Hi,

This is an update from Xerces for file impl/xpath/regex/TokenRange.java.  For 
details, please refer to: https://bugs.openjdk.java.net/browse/JDK-8035577.

Webrevs: http://cr.openjdk.java.net/~joehw/jdk9/8035577/webrev/

No new tests.  There were none added in Xerces.

Existing tests: JAXP SQE and unit tests passed.

Thanks,
David



Lance Andersen| Principal Member of Technical Staff | +1.781.442.2037
Oracle Java Engineering
1 Network Drive
Burlington, MA 01803
lance.ander...@oracle.com





Re: RFR (JAXP): 8035577: Xerces Update: impl/xpath/regex/RangeToken.java

2014-03-05 Thread huizhe wang


On 3/5/2014 1:38 PM, Alan Bateman wrote:

On 05/03/2014 20:18, David Li wrote:

Hi,

This is an update from Xerces for file 
impl/xpath/regex/TokenRange.java.  For details, please refer to: 
https://bugs.openjdk.java.net/browse/JDK-8035577.


Webrevs: http://cr.openjdk.java.net/~joehw/jdk9/8035577/webrev/

No new tests.  There were none added in Xerces.

Existing tests: JAXP SQE and unit tests passed.
Do you know if these tests exercise this code? I realize the Xerces 
didn't come with tests but I do have a concern about changing regex 
code without knowing that it is at least exercised by some tests.


There are over 100 tests in SQE suite in this area, we'll take a look.

-Joe



-Alan.




Re: RFR (JAXP): 8035577: Xerces Update: impl/xpath/regex/RangeToken.java

2014-03-05 Thread Lance Andersen - Oracle

On Mar 5, 2014, at 5:10 PM, huizhe wang wrote:

 
 On 3/5/2014 12:46 PM, Lance Andersen - Oracle wrote:
 On Mar 5, 2014, at 3:37 PM, Xueming Shen wrote:
 
 nitpicking,
 
 (1) shouldn't the variable at #468 to be updated to lch instead of uch 
 as well?
 I would agree given you are now calling Character.toLowerCase
 
 Also, at line 463 and 464, it looks like 'uppers' meant to be 'lowers' 
 defined at 462.

I think we should add tests with this change it looks like we really need to 
test this if your suggestion above is correct.  
 
 (2) StringBuilder can be used to replace the StringBuffer in toString().
 Agree, but I think this could be considered optional for this putback...
 
 Yes, feel free to replace StringBuffer with StringBuilder, Vector to 
 ArrayList and etc.
 
 -Joe
 
 
 
 -Sherman
 
 On 03/05/2014 12:18 PM, David Li wrote:
 Hi,
 
 This is an update from Xerces for file impl/xpath/regex/TokenRange.java.  
 For details, please refer to: 
 https://bugs.openjdk.java.net/browse/JDK-8035577.
 
 Webrevs: http://cr.openjdk.java.net/~joehw/jdk9/8035577/webrev/
 
 No new tests.  There were none added in Xerces.
 
 Existing tests: JAXP SQE and unit tests passed.
 
 Thanks,
 David
 
 
 Lance Andersen| Principal Member of Technical Staff | +1.781.442.2037
 Oracle Java Engineering
 1 Network Drive
 Burlington, MA 01803
 lance.ander...@oracle.com
 
 


Lance Andersen| Principal Member of Technical Staff | +1.781.442.2037
Oracle Java Engineering 
1 Network Drive 
Burlington, MA 01803
lance.ander...@oracle.com