Re: RFR 8071479: Stream and lamdification improvements to j.u.regex.Matcher

2015-03-03 Thread Paul Sandoz

On Mar 3, 2015, at 3:09 AM, Stuart Marks stuart.ma...@oracle.com wrote:

 On 3/2/15 1:49 AM, Paul Sandoz wrote:
 On Feb 28, 2015, at 4:40 AM, Xueming Shen xueming.s...@oracle.com wrote:
 Updated to a static private class for the toMatchResult(). Added a private 
 field MatchResult for the anonymous MatchResult
 wrapper.
 
 http://cr.openjdk.java.net/~sherman/regex.stream/src/java.base/share/classes/java/util/regex/Matcher.java.sdiff.html
 
 
 Many thanks, i took most of that code and updated:
 
 http://cr.openjdk.java.net/~psandoz/jdk9/JDK-8071479--Matcher-stream-results/webrev/src/java.base/share/classes/java/util/regex/Matcher.java.sdiff.html
 
 - additional documentation for replacer function parameter is moved to the 
 main body
 
 - there is no need for an internal MatchResult instance wrapper. It does not 
 protect against state modification, does not protect against escape, creates 
 another object for all matcher instances, and results in more wrapping. The 
 only way we can avoid the first 2 is by using an immutable match result, 
 which has it's own cost that unfortunately we cannot avoid in the results().
 
 
 Even though there's more overhead, I like the idea of creating a distinct 
 match result for each stream value. Consider what might happen if somebody 
 decided to collect the match results into a list at the end of the pipeline.

Right, we cannot avoid it (sequentially or for parallel).


 (Did the earlier versions pass this down the stream?

No.


 If so, boy, was that a latent bug.)


 I'm reminded of one of the old Map entrySet iterators that reused the same 
 Map.Entry instance for the duration of the iteration. That caused some 
 problems.
 

See ServiceLoader :-)


 Anyway, this latest version looks fine.
 

Thanks!
Paul.


Re: RFR 8071479: Stream and lamdification improvements to j.u.regex.Matcher

2015-03-02 Thread Stuart Marks

On 3/2/15 1:49 AM, Paul Sandoz wrote:

On Feb 28, 2015, at 4:40 AM, Xueming Shen xueming.s...@oracle.com wrote:

Updated to a static private class for the toMatchResult(). Added a private 
field MatchResult for the anonymous MatchResult
wrapper.

http://cr.openjdk.java.net/~sherman/regex.stream/src/java.base/share/classes/java/util/regex/Matcher.java.sdiff.html



Many thanks, i took most of that code and updated:

http://cr.openjdk.java.net/~psandoz/jdk9/JDK-8071479--Matcher-stream-results/webrev/src/java.base/share/classes/java/util/regex/Matcher.java.sdiff.html

- additional documentation for replacer function parameter is moved to the main 
body

- there is no need for an internal MatchResult instance wrapper. It does not 
protect against state modification, does not protect against escape, creates 
another object for all matcher instances, and results in more wrapping. The 
only way we can avoid the first 2 is by using an immutable match result, which 
has it's own cost that unfortunately we cannot avoid in the results().



Even though there's more overhead, I like the idea of creating a distinct match 
result for each stream value. Consider what might happen if somebody decided to 
collect the match results into a list at the end of the pipeline. (Did the 
earlier versions pass this down the stream? If so, boy, was that a latent 
bug.) I'm reminded of one of the old Map entrySet iterators that reused the same 
Map.Entry instance for the duration of the iteration. That caused some problems.


Anyway, this latest version looks fine.

s'marks


Re: RFR 8071479: Stream and lamdification improvements to j.u.regex.Matcher

2015-03-02 Thread Paul Sandoz
On Feb 28, 2015, at 4:40 AM, Xueming Shen xueming.s...@oracle.com wrote:
 Updated to a static private class for the toMatchResult(). Added a private 
 field MatchResult for the anonymous MatchResult
 wrapper.
 
 http://cr.openjdk.java.net/~sherman/regex.stream/src/java.base/share/classes/java/util/regex/Matcher.java.sdiff.html
 

Many thanks, i took most of that code and updated:

http://cr.openjdk.java.net/~psandoz/jdk9/JDK-8071479--Matcher-stream-results/webrev/src/java.base/share/classes/java/util/regex/Matcher.java.sdiff.html

- additional documentation for replacer function parameter is moved to the main 
body

- there is no need for an internal MatchResult instance wrapper. It does not 
protect against state modification, does not protect against escape, creates 
another object for all matcher instances, and results in more wrapping. The 
only way we can avoid the first 2 is by using an immutable match result, which 
has it's own cost that unfortunately we cannot avoid in the results().

Paul.


Re: RFR 8071479: Stream and lamdification improvements to j.u.regex.Matcher

2015-03-02 Thread Xueming Shen

Hi Paul, it looks good to me.

Thanks,
-Sherman

On 03/02/2015 01:49 AM, Paul Sandoz wrote:

On Feb 28, 2015, at 4:40 AM, Xueming Shenxueming.s...@oracle.com  wrote:

Updated to a static private class for the toMatchResult(). Added a private 
field MatchResult for the anonymous MatchResult
wrapper.

http://cr.openjdk.java.net/~sherman/regex.stream/src/java.base/share/classes/java/util/regex/Matcher.java.sdiff.html


Many thanks, i took most of that code and updated:

http://cr.openjdk.java.net/~psandoz/jdk9/JDK-8071479--Matcher-stream-results/webrev/src/java.base/share/classes/java/util/regex/Matcher.java.sdiff.html

- additional documentation for replacer function parameter is moved to the main 
body

- there is no need for an internal MatchResult instance wrapper. It does not 
protect against state modification, does not protect against escape, creates 
another object for all matcher instances, and results in more wrapping. The 
only way we can avoid the first 2 is by using an immutable match result, which 
has it's own cost that unfortunately we cannot avoid in the results().

Paul.




Re: RFR 8071479: Stream and lamdification improvements to j.u.regex.Matcher

2015-02-27 Thread Xueming Shen

On 2/27/15 3:19 PM, Stuart Marks wrote:

On 2/27/15 12:40 PM, Xueming Shen wrote:

On 02/27/2015 11:21 AM, Xueming Shen wrote:

On 02/27/2015 10:55 AM, Paul Sandoz wrote:


What about a light wright immutable MatchResult? is that possible?


Should be possible. I can give it try.


too repetitive?

http://cr.openjdk.java.net/~sherman/regex.stream/src/java.base/share/classes/java/util/regex/Matcher.java.sdiff.html 



Not too bad. I know I was surprised to see that Matcher itself is the 
only MatchResult implementation, and that there wasn't a lightweight 
immutable MatchResult separate from Matcher. So I'm glad to see this 
added.


(One thing to think about, not part of this particular change, is 
whether named matching groups could be added to or extend MatchResult. 
Doing that would change the implementation of the lightweight 
MatchResult. See JDK-8065554.)


A few comments on the implementation.

At line 302, end(group) calls the groupCount() method instead of 
returning the captured groupCount local.


The MatchResult anonymous class unavoidably captures a reference to 
'this', which is the enclosing Matcher instance. I don't think it 
needs that, but it does enable methods like groupCount() on the 
enclosing class to be called inadvertently.


For these reasons the lightweight MatchResult might better be 
refactored to be a (named) static nested class (or even a top-level 
class). You'll have to write a constructor and declare fields instead 
of capturing locals though, but I think this makes it a bit more clear 
as to what's actually going on. For example, you could make all the 
fields final to make it clear that the object really is immutable.


s'marks


Updated to a static private class for the toMatchResult(). Added a 
private field MatchResult for the anonymous MatchResult

wrapper.

http://cr.openjdk.java.net/~sherman/regex.stream/src/java.base/share/classes/java/util/regex/Matcher.java.sdiff.html

-Sherman


Re: RFR 8071479: Stream and lamdification improvements to j.u.regex.Matcher

2015-02-27 Thread Paul Sandoz
Hi,

On Feb 13, 2015, at 8:26 PM, Stuart Marks stuart.ma...@oracle.com wrote:
 OK, this looks great. Thanks for the updates.
 
 There is also
 
in same order - in the same order
 
 in the doc for the results() method, as Brian pointed out internally.
 
 No need for another webrev.
 

Alas there is :-) I made some updates:

1) Improving the documentation based on feedback from Brian; and

2) added co-mod checking to the replace* methods.

http://cr.openjdk.java.net/~psandoz/jdk9/JDK-8071479--Matcher-stream-results/webrev

Paul.



Re: RFR 8071479: Stream and lamdification improvements to j.u.regex.Matcher

2015-02-27 Thread Xueming Shen

On 02/27/2015 10:34 AM, Paul Sandoz wrote:

On Feb 27, 2015, at 7:18 PM, Xueming Shenxueming.s...@oracle.com  wrote:


Hi Paul,

1133  * @param  replacer
1134  * The function to be applied to the match result of this 
matcher
1135  * that returns a replacement string.
1136  *
1137  *p   The function should not modify this matcher's state during
1138  * replacement.  This method will, on a best-effort basis, 
throw a
1139  * {@link java.util.ConcurrentModificationException} if such
1140  * modification is detected.
1141  *
1142  *p   The state of the match result is guaranteed to be constant
1143  * only for the duration of the function call and only if the
1144  * function does not modify this matcher's state.


1151  * @throws ConcurrentModificationException if it is detected, on a
1152  * best-effort basis, that the replacer function modified this
1153  * matcher's state


Just wonder from API point of view, in theory the replacer should not be able 
to modify
this matcher's state via a MatchResult, it is the side-effect of our 
implementation detail
that happens to return this matcher as a MatchResult. For example, it should 
be possible
to simply return a wrapper MatchResult on top of this matcher to only expose 
the read-only
MatchResult methods, so the replacer will never be possible to modify the 
matcher.


It's not really about casting a MatchResult to Matcher it's about the function 
capturing the Matcher instance and operating on it.



The replacer does not have an explicit reference to the matcher... and
from the implementation it is not really about the replacer to change the
state of the matcher, but the matcher's state gets changed during replacing?
it might be more clear/obvious to move those warning notes up to the method
doc as the matcher state should not be updated/changed during the replaceAll
is invoked...?

-sherman


Re: RFR 8071479: Stream and lamdification improvements to j.u.regex.Matcher

2015-02-27 Thread Paul Sandoz

On Feb 27, 2015, at 7:48 PM, Xueming Shen xueming.s...@oracle.com wrote:

 On 02/27/2015 10:34 AM, Paul Sandoz wrote:
 On Feb 27, 2015, at 7:18 PM, Xueming Shenxueming.s...@oracle.com  wrote:
 
 Hi Paul,
 
 1133  * @param  replacer
 1134  * The function to be applied to the match result of this 
 matcher
 1135  * that returns a replacement string.
 1136  *
 1137  *p   The function should not modify this matcher's state during
 1138  * replacement.  This method will, on a best-effort basis, 
 throw a
 1139  * {@link java.util.ConcurrentModificationException} if 
 such
 1140  * modification is detected.
 1141  *
 1142  *p   The state of the match result is guaranteed to be constant
 1143  * only for the duration of the function call and only if 
 the
 1144  * function does not modify this matcher's state.
 
 
 1151  * @throws ConcurrentModificationException if it is detected, on a
 1152  * best-effort basis, that the replacer function modified 
 this
 1153  * matcher's state
 
 
 Just wonder from API point of view, in theory the replacer should not be 
 able to modify
 this matcher's state via a MatchResult, it is the side-effect of our 
 implementation detail
 that happens to return this matcher as a MatchResult. For example, it 
 should be possible
 to simply return a wrapper MatchResult on top of this matcher to only 
 expose the read-only
 MatchResult methods, so the replacer will never be possible to modify the 
 matcher.
 
 It's not really about casting a MatchResult to Matcher it's about the 
 function capturing the Matcher instance and operating on it.
 
 
 The replacer does not have an explicit reference to the matcher...

Correct, just like a Consumer does not have an Iterable with Iterable.forEach.


 and
 from the implementation it is not really about the replacer to change the
 state of the matcher, but the matcher's state gets changed during replacing?

It's both, A function could do something silly, such as:

 Matcher m = ...
 m.replaceAll(mr - { 
m.find(); // bad
// mr state is now messed up
return mr.group();
 });


 List l = ...
 m.replaceAll(mr - { 
  l.add(mr); // bad, mr escapes
  return mr.group();
 });


 it might be more clear/obvious to move those warning notes up to the method
 doc as the matcher state should not be updated/changed during the replaceAll
 is invoked...?
 

Moving them up would be clearer, i will do that.

What about a light wright immutable MatchResult? is that possible?

Paul.



Re: RFR 8071479: Stream and lamdification improvements to j.u.regex.Matcher

2015-02-27 Thread Paul Sandoz

On Feb 27, 2015, at 7:18 PM, Xueming Shen xueming.s...@oracle.com wrote:

 Hi Paul,
 
 1133  * @param  replacer
 1134  * The function to be applied to the match result of this 
 matcher
 1135  * that returns a replacement string.
 1136  *
 1137  *p  The function should not modify this matcher's state during
 1138  * replacement.  This method will, on a best-effort basis, 
 throw a
 1139  * {@link java.util.ConcurrentModificationException} if such
 1140  * modification is detected.
 1141  *
 1142  *p  The state of the match result is guaranteed to be constant
 1143  * only for the duration of the function call and only if the
 1144  * function does not modify this matcher's state.
 
 
 1151  * @throws ConcurrentModificationException if it is detected, on a
 1152  * best-effort basis, that the replacer function modified 
 this
 1153  * matcher's state
 
 
 Just wonder from API point of view, in theory the replacer should not be able 
 to modify
 this matcher's state via a MatchResult, it is the side-effect of our 
 implementation detail
 that happens to return this matcher as a MatchResult. For example, it 
 should be possible
 to simply return a wrapper MatchResult on top of this matcher to only expose 
 the read-only
 MatchResult methods, so the replacer will never be possible to modify the 
 matcher.
 

It's not really about casting a MatchResult to Matcher it's about the function 
capturing the Matcher instance and operating on it.


 The modCount might still be good to have to catch the possible concurrent 
 modification
 of the matcher while iterating, though the existing implementation does not 
 do that for
 the original methods
 
 The results currently returns heavy clone of this matcher, it might be 
 ideal
 to have a light weight MatchResult implementation with less fields and 
 especially
 to a single text.toString(), this might be helpful when the text is huge 
 and it
 is not a String object.

Can you suggest, via code, such a lighter weight immutable implementation?

If there was a lighter weight alternative we could apply that to replace* 
functions as well.

Paul.


Re: RFR 8071479: Stream and lamdification improvements to j.u.regex.Matcher

2015-02-27 Thread Xueming Shen

Hi Paul,

1133  * @param  replacer
1134  * The function to be applied to the match result of this 
matcher
1135  * that returns a replacement string.
1136  *
1137  *p  The function should not modify this matcher's state during
1138  * replacement.  This method will, on a best-effort basis, 
throw a
1139  * {@link java.util.ConcurrentModificationException} if such
1140  * modification is detected.
1141  *
1142  *p  The state of the match result is guaranteed to be constant
1143  * only for the duration of the function call and only if the
1144  * function does not modify this matcher's state.


1151  * @throws ConcurrentModificationException if it is detected, on a
1152  * best-effort basis, that the replacer function modified this
1153  * matcher's state


Just wonder from API point of view, in theory the replacer should not be able 
to modify
this matcher's state via a MatchResult, it is the side-effect of our 
implementation detail
that happens to return this matcher as a MatchResult. For example, it should 
be possible
to simply return a wrapper MatchResult on top of this matcher to only expose 
the read-only
MatchResult methods, so the replacer will never be possible to modify the 
matcher.

The modCount might still be good to have to catch the possible concurrent 
modification
of the matcher while iterating, though the existing implementation does not 
do that for
the original methods

The results currently returns heavy clone of this matcher, it might be ideal
to have a light weight MatchResult implementation with less fields and 
especially
to a single text.toString(), this might be helpful when the text is huge 
and it
is not a String object.

-Sherman

On 02/27/2015 08:32 AM, Paul Sandoz wrote:

Hi,

On Feb 13, 2015, at 8:26 PM, Stuart Marksstuart.ma...@oracle.com  wrote:

OK, this looks great. Thanks for the updates.

There is also

in same order -  in the same order

in the doc for the results() method, as Brian pointed out internally.

No need for another webrev.


Alas there is :-) I made some updates:

1) Improving the documentation based on feedback from Brian; and

2) added co-mod checking to the replace* methods.

http://cr.openjdk.java.net/~psandoz/jdk9/JDK-8071479--Matcher-stream-results/webrev

Paul.





Re: RFR 8071479: Stream and lamdification improvements to j.u.regex.Matcher

2015-02-27 Thread Xueming Shen

On 02/27/2015 11:21 AM, Xueming Shen wrote:

On 02/27/2015 10:55 AM, Paul Sandoz wrote:


What about a light wright immutable MatchResult? is that possible?



Should be possible. I can give it try.



too repetitive?

http://cr.openjdk.java.net/~sherman/regex.stream/src/java.base/share/classes/java/util/regex/Matcher.java.sdiff.html


Re: RFR 8071479: Stream and lamdification improvements to j.u.regex.Matcher

2015-02-27 Thread Xueming Shen

On 02/27/2015 10:55 AM, Paul Sandoz wrote:


What about a light wright immutable MatchResult? is that possible?



Should be possible. I can give it try.



Re: RFR 8071479: Stream and lamdification improvements to j.u.regex.Matcher

2015-02-27 Thread Stuart Marks

On 2/27/15 12:40 PM, Xueming Shen wrote:

On 02/27/2015 11:21 AM, Xueming Shen wrote:

On 02/27/2015 10:55 AM, Paul Sandoz wrote:


What about a light wright immutable MatchResult? is that possible?


Should be possible. I can give it try.


too repetitive?

http://cr.openjdk.java.net/~sherman/regex.stream/src/java.base/share/classes/java/util/regex/Matcher.java.sdiff.html


Not too bad. I know I was surprised to see that Matcher itself is the only 
MatchResult implementation, and that there wasn't a lightweight immutable 
MatchResult separate from Matcher. So I'm glad to see this added.


(One thing to think about, not part of this particular change, is whether named 
matching groups could be added to or extend MatchResult. Doing that would change 
the implementation of the lightweight MatchResult. See JDK-8065554.)


A few comments on the implementation.

At line 302, end(group) calls the groupCount() method instead of returning the 
captured groupCount local.


The MatchResult anonymous class unavoidably captures a reference to 'this', 
which is the enclosing Matcher instance. I don't think it needs that, but it 
does enable methods like groupCount() on the enclosing class to be called 
inadvertently.


For these reasons the lightweight MatchResult might better be refactored to be a 
(named) static nested class (or even a top-level class). You'll have to write a 
constructor and declare fields instead of capturing locals though, but I think 
this makes it a bit more clear as to what's actually going on. For example, you 
could make all the fields final to make it clear that the object really is 
immutable.


s'marks


Re: RFR 8071479: Stream and lamdification improvements to j.u.regex.Matcher

2015-02-13 Thread Paul Sandoz

On Feb 13, 2015, at 1:20 AM, Stuart Marks stuart.ma...@oracle.com wrote:

 
 
 On 2/12/15 3:15 AM, Paul Sandoz wrote:
   
 http://cr.openjdk.java.net/~psandoz/jdk9/JDK-8071479--Matcher-stream-results/webrev/
 
 OK, overall looks pretty good. Two minor comments on Matcher.java:
 
 1202   if (expectedCount = 0  expectedCount != matchOrResetCount)
 1203   return true;
 
 This is a concurrent modification check, so other cases that test this 
 condition will throw CME. Should this also throw CME or should it indeed 
 return true?
 

The latter, so a CME is only thrown on data returning methods. 

I have added a comment:

// Defer throwing ConcurrentModificationException to when
// next is called.  The is consistent with other fail-fast
// implementations.
if (expectedCount = 0  expectedCount != matchOrResetCount)
return true;

Given that the iterator is never exposed directly it most likely does not 
matter, but i wanted to be consistent.


 1224 // Perform a first find if required
 1225 if (s  1  !find())
 1226 return;
 
 If I understand this correctly, the state field can have values -1, 0, or 1; 
 and 's' is a local variable that's initialized from the state field.
 
 I was confused by this code because it ends up calling find() when s == 0, 
 which means not found ... so why is find() being called again in this case?
 
 However, the s == 0 case is dispensed with earlier, so it actually cannot 
 occur here. I think it would be clearer, and have the same behavior, if the 
 condition were changed to
 
if (s  0  !find())
 

Ok.

Webrev updated in place:

http://cr.openjdk.java.net/~psandoz/jdk9/JDK-8071479--Matcher-stream-results/webrev/src/java.base/share/classes/java/util/regex/Matcher.java.sdiff.html

Thanks,
Paul.


Re: RFR 8071479: Stream and lamdification improvements to j.u.regex.Matcher

2015-02-13 Thread Stuart Marks

OK, this looks great. Thanks for the updates.

There is also

in same order - in the same order

in the doc for the results() method, as Brian pointed out internally.

No need for another webrev.

s'marks

On 2/13/15 1:17 AM, Paul Sandoz wrote:


On Feb 13, 2015, at 1:20 AM, Stuart Marks stuart.ma...@oracle.com wrote:




On 2/12/15 3:15 AM, Paul Sandoz wrote:

   
http://cr.openjdk.java.net/~psandoz/jdk9/JDK-8071479--Matcher-stream-results/webrev/


OK, overall looks pretty good. Two minor comments on Matcher.java:

1202   if (expectedCount = 0  expectedCount != matchOrResetCount)
1203   return true;

This is a concurrent modification check, so other cases that test this 
condition will throw CME. Should this also throw CME or should it indeed return 
true?



The latter, so a CME is only thrown on data returning methods.

I have added a comment:

// Defer throwing ConcurrentModificationException to when
// next is called.  The is consistent with other fail-fast
// implementations.
if (expectedCount = 0  expectedCount != matchOrResetCount)
 return true;

Given that the iterator is never exposed directly it most likely does not 
matter, but i wanted to be consistent.



1224 // Perform a first find if required
1225 if (s  1  !find())
1226 return;

If I understand this correctly, the state field can have values -1, 0, or 1; 
and 's' is a local variable that's initialized from the state field.

I was confused by this code because it ends up calling find() when s == 0, which means 
not found ... so why is find() being called again in this case?

However, the s == 0 case is dispensed with earlier, so it actually cannot occur 
here. I think it would be clearer, and have the same behavior, if the condition 
were changed to

if (s  0  !find())



Ok.

Webrev updated in place:

http://cr.openjdk.java.net/~psandoz/jdk9/JDK-8071479--Matcher-stream-results/webrev/src/java.base/share/classes/java/util/regex/Matcher.java.sdiff.html

Thanks,
Paul.



Re: RFR 8071479: Stream and lamdification improvements to j.u.regex.Matcher

2015-02-12 Thread Paul Sandoz

On Feb 12, 2015, at 9:43 AM, Peter Levart peter.lev...@gmail.com wrote:

 
 On 02/11/2015 08:23 PM, Stuart Marks wrote:
 1.1) Change the specification of Matcher.results to reset the stream before 
 matching, making it consistent with the replace* methods. 
 
 I'm not sure about this. The current replaceAll/replaceFirst methods reset 
 the matcher before doing any matching, so the lambda-based overloads should 
 do the same. 
 
 However, the model for 
 
 StreamMatchResult results() 
 
 seems to me to be a stream of matches that would be returned by successive 
 calls to find(). (Indeed, that's how it's implemented.) The no-arg find() 
 call doesn't reset the Matcher, and it respects the existing region of the 
 Matcher. I think results() should do the same.
 
 Hi,
 
 What about two methods?
 
 StreamMatchResult remainingResults(); // doesn't reset the Matcher
 StreamMatchResult [all]results(); // resets Matcher and calls 
 remainingResults()
 

I would prefer to stick with just one, given that it is very easy to reset the 
matcher, and the most common case is to start with a new matcher.

Paul.


Re: RFR 8071479: Stream and lamdification improvements to j.u.regex.Matcher

2015-02-12 Thread Paul Sandoz
On Feb 12, 2015, at 3:18 AM, Stuart Marks stuart.ma...@oracle.com wrote:
 On 2/11/15 12:45 PM, Paul Sandoz wrote:
 On Feb 11, 2015, at 8:23 PM, Stuart Marks stuart.ma...@oracle.com wrote:
 That matches my original thinking on the matter and is reflected in the 
 patch. It's very simple to support. If the method was named findAll then 
 it would be misleading and imply a reset was needed.
 
 OK, good, I see that, so it doesn't need to be changed.
 
 Ok, the first webrev i sent is already implemented as agreed so lets review 
 that code.
 
 I looked at:
 
 http://cr.openjdk.java.net/~psandoz/jdk9/JDK-8071479--Matcher-stream-results/webrev/
 
 I think this was before you added Pattern.replaceFirst(), so that should be 
 retrofitted here.
 

Thanks, added in place:

  
http://cr.openjdk.java.net/~psandoz/jdk9/JDK-8071479--Matcher-stream-results/webrev/


 Regarding the disparity between MatchResult and Matcher. I think that 
 would require a new sub-interface of MatchResult from which Matcher 
 extends from and returns. If we think it important we should do that for 9 
 otherwise we will be stuck for the stream-based methods.
 
 Why do you think we need a new sub-interface? Wouldn't it be sufficient to 
 add default methods?
 
 What would the defaults do? Throwing an exception seems a poor solution to 
 me.
 
 My guess it will be possible to evolve Matcher in binary and source 
 compatible way to use the sub-type.
 
 OK, I filed an RFE to cover this, then immediately closed it as a duplicate 
 :-) because I failed to search for the existing RFE that covers this:
 
https://bugs.openjdk.java.net/browse/JDK-8065554
 
 Either we add some default methods that throw exceptions (gross) or we add a 
 sub-interface (also gross). It might be a matter of taste, or bike shed 
 painting.

Or a matter of how much the grossness is contained. In that respect the latter 
is more self-contained, the latter spreads out to all independent consumers of 
MatchResult.

Paul.


Re: RFR 8071479: Stream and lamdification improvements to j.u.regex.Matcher

2015-02-12 Thread Stuart Marks



On 2/12/15 3:15 AM, Paul Sandoz wrote:

   
http://cr.openjdk.java.net/~psandoz/jdk9/JDK-8071479--Matcher-stream-results/webrev/


OK, overall looks pretty good. Two minor comments on Matcher.java:

1202   if (expectedCount = 0  expectedCount != matchOrResetCount)
1203   return true;

This is a concurrent modification check, so other cases that test this condition 
will throw CME. Should this also throw CME or should it indeed return true?


1224 // Perform a first find if required
1225 if (s  1  !find())
1226 return;

If I understand this correctly, the state field can have values -1, 0, or 1; and 
's' is a local variable that's initialized from the state field.


I was confused by this code because it ends up calling find() when s == 0, which 
means not found ... so why is find() being called again in this case?


However, the s == 0 case is dispensed with earlier, so it actually cannot occur 
here. I think it would be clearer, and have the same behavior, if the condition 
were changed to


if (s  0  !find())

s'marks








Regarding the disparity between MatchResult and Matcher. I think that would 
require a new sub-interface of MatchResult from which Matcher extends from and 
returns. If we think it important we should do that for 9 otherwise we will be 
stuck for the stream-based methods.


Why do you think we need a new sub-interface? Wouldn't it be sufficient to add 
default methods?


What would the defaults do? Throwing an exception seems a poor solution to me.

My guess it will be possible to evolve Matcher in binary and source compatible 
way to use the sub-type.


OK, I filed an RFE to cover this, then immediately closed it as a duplicate :-) 
because I failed to search for the existing RFE that covers this:

https://bugs.openjdk.java.net/browse/JDK-8065554

Either we add some default methods that throw exceptions (gross) or we add a 
sub-interface (also gross). It might be a matter of taste, or bike shed 
painting.


Or a matter of how much the grossness is contained. In that respect the latter 
is more self-contained, the latter spreads out to all independent consumers of 
MatchResult.

Paul.



Re: RFR 8071479: Stream and lamdification improvements to j.u.regex.Matcher

2015-02-12 Thread Peter Levart


On 02/11/2015 08:23 PM, Stuart Marks wrote:
1.1) Change the specification of Matcher.results to reset the stream 
before matching, making it consistent with the replace* methods.


I'm not sure about this. The current replaceAll/replaceFirst methods 
reset the matcher before doing any matching, so the lambda-based 
overloads should do the same.


However, the model for

StreamMatchResult results()

seems to me to be a stream of matches that would be returned by 
successive calls to find(). (Indeed, that's how it's implemented.) The 
no-arg find() call doesn't reset the Matcher, and it respects the 
existing region of the Matcher. I think results() should do the same. 


Hi,

What about two methods?

StreamMatchResult remainingResults(); // doesn't reset the Matcher
StreamMatchResult [all]results(); // resets Matcher and calls 
remainingResults()



Peter



Re: RFR 8071479: Stream and lamdification improvements to j.u.regex.Matcher

2015-02-11 Thread Stuart Marks

On 2/11/15 2:25 AM, Paul Sandoz wrote:

Hi Stuart,

Thanks for the detailed review.

Here is a possible way forward:

1) Add the methods to Matcher, as proposed in the initial webrev.


Yes, I think this is the way to go, and it seems that Sherman concurs.


1.1) Change the specification of Matcher.results to reset the stream before 
matching, making it consistent with the replace* methods.


I'm not sure about this. The current replaceAll/replaceFirst methods reset the 
matcher before doing any matching, so the lambda-based overloads should do the same.


However, the model for

StreamMatchResult results()

seems to me to be a stream of matches that would be returned by successive calls 
to find(). (Indeed, that's how it's implemented.) The no-arg find() call doesn't 
reset the Matcher, and it respects the existing region of the Matcher. I think 
results() should do the same.


Now there's also a find(int start) overload that does reset the matcher 
(discarding the region) but starts at the given position. This suggests another 
overload,


StreamMatchResult results(int start)

which also resets the matcher. I don't think this is necessary, though, since I 
believe the equivalent effect can be achieved by setting the region before 
calling the no-arg results() -- as long as it doesn't reset the matcher.


No matter what, I think results() will have to check for concurrent 
modification. It seems to be in the nature of this API, sigh.


By the way, I think Matcher.results() is a fine name. The overload of 
Pattern.matches() is what bothers me.



2) Add convenience methods for all replace*() and matches() on Pattern that 
defer to those on Matcher.


I'm not sure convenience methods are necessary. After all, there are the 
existing replaceFirst/replaceAll methods on Matcher that aren't on Pattern.


In addition, if you don't need to hang onto the Pattern or Matcher instances, my 
example from earlier can be compressed to:


String result = Pattern.compile(a*b)
.matcher(input).replaceAll(mr - mr.group().toUpperCase());

which ain't too bad.



We can do that in two stages, focusing on 1) in this review.


Yes.


I was not too concerned about the static method and the stream returning method having the same 
name as the context is quite different. For stream returning methods there is a de-facto pattern of 
using a plural of the stream element kind, so i prefer that to findAll. What about the name 
Pattern.matchResults? which chains well with Pattern.match(...).results().

--

Regarding the disparity between MatchResult and Matcher. I think that would 
require a new sub-interface of MatchResult from which Matcher extends from and 
returns. If we think it important we should do that for 9 otherwise we will be 
stuck for the stream-based methods.


Why do you think we need a new sub-interface? Wouldn't it be sufficient to add 
default methods?


There is of course the possibility of existing 3rd party classes that implement 
MatchResult having conflicting methods. But I don't think MatchResult is 
implemented quite as often as the collection interfaces, where we've been 
reticent to add default methods because of so many implementations.


I did a quick web search and I did find a few implementations/extensions of 
MatchResult, but there were no obvious conflicts. This isn't definitive, of course.


The default implementations of group(String), start(String), and end(String) 
could throw IllegalArgumentException, since that's what the ones on Matcher do 
if there's no such named capture group.


I admit this is a little weird, since it's only safe to use these new methods if 
you know where the MatchResult instance came from. So logically perhaps it's a 
different type. My hunch, though, is that MatchResults aren't passed around, 
they're created from Patterns/Matchers and processed within the same code, so in 
practice this won't be a problem.


s'marks



Paul.

On Feb 11, 2015, at 2:02 AM, Stuart Marks stuart.ma...@oracle.com wrote:


Hi Paul,

I spent some time looking at this API. Overall it seems to me that things work 
a bit more nicely when these methods are added to Pattern instead of Matcher. 
Unfortunately there are some odd things with the existing API that make this 
tradeoff not so obvious.

First, here's what a simple replacement operation looks like when replaceAll() 
is added to Matcher:

String input = foobbfooabbbfoo;
Pattern p = Pattern.compile(a*b);
Matcher m = p.matcher(input);
String result = m.replaceAll(mr - mr.group().toUpperCase());

But if replaceAll() is on Pattern, we can skip a step:

String input = foobbfooabbbfoo;
Pattern p = Pattern.compile(a*b);
String result = p.replaceAll(input, mr - mr.group().toUpperCase());

Getting stream of match results is similar. So yes, I agree that it simplifies 
things to have these be on Pattern instead of Matcher.

An advantage of having these on Pattern is 

Re: RFR 8071479: Stream and lamdification improvements to j.u.regex.Matcher

2015-02-11 Thread Xueming Shen

Hi

It might be more consistent with the existing design to add those 
methods into Matcher. I agree the

better name for the stream return method is findAll.

-sherman


On 2/11/15 2:25 AM, Paul Sandoz wrote:

Hi Stuart,

Thanks for the detailed review.

Here is a possible way forward:

1) Add the methods to Matcher, as proposed in the initial webrev.
1.1) Change the specification of Matcher.results to reset the stream before 
matching, making it consistent with the replace* methods.

2) Add convenience methods for all replace*() and matches() on Pattern that 
defer to those on Matcher.

We can do that in two stages, focusing on 1) in this review.

I was not too concerned about the static method and the stream returning method having the same 
name as the context is quite different. For stream returning methods there is a de-facto pattern of 
using a plural of the stream element kind, so i prefer that to findAll. What about the name 
Pattern.matchResults? which chains well with Pattern.match(...).results().

--

Regarding the disparity between MatchResult and Matcher. I think that would 
require a new sub-interface of MatchResult from which Matcher extends from and 
returns. If we think it important we should do that for 9 otherwise we will be 
stuck for the stream-based methods.

Paul.

On Feb 11, 2015, at 2:02 AM, Stuart Marks stuart.ma...@oracle.com wrote:


Hi Paul,

I spent some time looking at this API. Overall it seems to me that things work 
a bit more nicely when these methods are added to Pattern instead of Matcher. 
Unfortunately there are some odd things with the existing API that make this 
tradeoff not so obvious.

First, here's what a simple replacement operation looks like when replaceAll() 
is added to Matcher:

String input = foobbfooabbbfoo;
Pattern p = Pattern.compile(a*b);
Matcher m = p.matcher(input);
String result = m.replaceAll(mr - mr.group().toUpperCase());

But if replaceAll() is on Pattern, we can skip a step:

String input = foobbfooabbbfoo;
Pattern p = Pattern.compile(a*b);
String result = p.replaceAll(input, mr - mr.group().toUpperCase());

Getting stream of match results is similar. So yes, I agree that it simplifies 
things to have these be on Pattern instead of Matcher.

An advantage of having these on Pattern is that the matcher that gets created 
is encapsulated, and its state isn't exposed to being mucked about by the 
application. Thus you can avoid the additional concurrent modification checks 
that you have to do if replaceAll et. al. are on Matcher.

Unfortunately, putting these on Pattern now creates some difficulties meshing 
with the existing API.

One issue is that Matcher already has replaceAll(String) and 
replaceFirst(String). It would be strange to have these here and to have 
replaceAll(replacer) and replaceFirst(replacer) on Pattern.

Another issue is that Matcher supports matching on region (subrange) of its 
input. For example, today you can do this:

pattern.matcher(input).region(start, end)

The region will constrain the matching for certain operations, such as find() 
(but not replaceAll or replaceFirst). If something like results() were added to 
Matcher, I'd expect that it would respect the Matcher's region, but if 
results() (or matches() as you called it) were on Pattern, the region 
constraint would be lacking.

Also note that Pattern already has this:

static boolean matches(regex, input)

so I don't think an overload of matches() that returns a Stream would be a good 
idea. (Maybe findAll()?)

Another issue, not directly related to where the new lambda/streams methods get 
added, is that MatchResult allows references only numbered capturing groups. 
Matcher, which implements MatchResult, also supports named capturing groups, 
with the new overloaded methods group(String), start(String), and end(String). 
These were added in Java 7. Logically these also belong on MatchResult, but 
they probably weren't added because of the compatibility issue of adding 
methods to interfaces. Maybe we should add these as default methods to 
MatchResult.

(But what would the supported implementation be? Just throw 
UnsupportedOperationException?)

--

I'm not entirely sure where this leaves things. It certainly seems more 
convenient to have the new methods on Pattern. But given the way the existing 
API is set up, it seems like it's a better fit to add them to Matcher.

s'marks



On 2/9/15 6:18 AM, Paul Sandoz wrote:

Here is an alternative that pushes the methods on to Pattern instead:

   
http://cr.openjdk.java.net/~psandoz/jdk9/JDK-8071479--Matcher-stream-results/on-Pattern/webrev/

(Whe webrev reports some files as empty, please ingore those, i have this 
webrev stacked on the previous one.)

I have also included replaceFirst.

This simplifies things for streaming on the match results and also for 
replacing.

Note that the existing replace* methods on Matcher reset the matcher 

Re: RFR 8071479: Stream and lamdification improvements to j.u.regex.Matcher

2015-02-11 Thread Paul Sandoz

On Feb 11, 2015, at 8:23 PM, Stuart Marks stuart.ma...@oracle.com wrote:

 On 2/11/15 2:25 AM, Paul Sandoz wrote:
 Hi Stuart,
 
 Thanks for the detailed review.
 
 Here is a possible way forward:
 
 1) Add the methods to Matcher, as proposed in the initial webrev.
 
 Yes, I think this is the way to go, and it seems that Sherman concurs.
 
 1.1) Change the specification of Matcher.results to reset the stream before 
 matching, making it consistent with the replace* methods.
 
 I'm not sure about this. The current replaceAll/replaceFirst methods reset 
 the matcher before doing any matching, so the lambda-based overloads should 
 do the same.

Yes, we are in agreement on that.


 
 However, the model for
 
StreamMatchResult results()
 
 seems to me to be a stream of matches that would be returned by successive 
 calls to find(). (Indeed, that's how it's implemented.) The no-arg find() 
 call doesn't reset the Matcher,

It cannot, otherwise it would not work for repeated calls to obtain subsequent 
matches. 


 and it respects the existing region of the Matcher. I think results() should 
 do the same.
 

That matches my original thinking on the matter and is reflected in the 
patch. It's very simple to support. If the method was named findAll then it 
would be misleading and imply a reset was needed.


 Now there's also a find(int start) overload that does reset the matcher 
 (discarding the region) but starts at the given position. This suggests 
 another overload,
 
StreamMatchResult results(int start)
 
 which also resets the matcher. I don't think this is necessary, though, since 
 I believe the equivalent effect can be achieved by setting the region before 
 calling the no-arg results() -- as long as it doesn't reset the matcher.
 

Yes.


 No matter what, I think results() will have to check for concurrent 
 modification.

Yes, as in the patch.


 It seems to be in the nature of this API, sigh.
 
 By the way, I think Matcher.results() is a fine name. The overload of 
 Pattern.matches() is what bothers me.

Ok.


 
 2) Add convenience methods for all replace*() and matches() on Pattern that 
 defer to those on Matcher.
 
 I'm not sure convenience methods are necessary. After all, there are the 
 existing replaceFirst/replaceAll methods on Matcher that aren't on Pattern.
 
 In addition, if you don't need to hang onto the Pattern or Matcher instances, 
 my example from earlier can be compressed to:
 
String result = Pattern.compile(a*b)
.matcher(input).replaceAll(mr - mr.group().toUpperCase());
 
 which ain't too bad.
 

Agreed.


 
 We can do that in two stages, focusing on 1) in this review.
 
 Yes.
 

Ok, the first webrev i sent is already implemented as agreed so lets review 
that code.


 I was not too concerned about the static method and the stream returning 
 method having the same name as the context is quite different. For stream 
 returning methods there is a de-facto pattern of using a plural of the 
 stream element kind, so i prefer that to findAll. What about the name 
 Pattern.matchResults? which chains well with 
 Pattern.match(...).results().
 
 --
 
 Regarding the disparity between MatchResult and Matcher. I think that would 
 require a new sub-interface of MatchResult from which Matcher extends from 
 and returns. If we think it important we should do that for 9 otherwise we 
 will be stuck for the stream-based methods.
 
 Why do you think we need a new sub-interface? Wouldn't it be sufficient to 
 add default methods?

What would the defaults do? Throwing an exception seems a poor solution to me.

My guess it will be possible to evolve Matcher in binary and source compatible 
way to use the sub-type.

Paul.

 
 There is of course the possibility of existing 3rd party classes that 
 implement MatchResult having conflicting methods. But I don't think 
 MatchResult is implemented quite as often as the collection interfaces, where 
 we've been reticent to add default methods because of so many implementations.
 
 I did a quick web search and I did find a few implementations/extensions of 
 MatchResult, but there were no obvious conflicts. This isn't definitive, of 
 course.
 
 The default implementations of group(String), start(String), and end(String) 
 could throw IllegalArgumentException, since that's what the ones on Matcher 
 do if there's no such named capture group.
 
 I admit this is a little weird, since it's only safe to use these new methods 
 if you know where the MatchResult instance came from. So logically perhaps 
 it's a different type. My hunch, though, is that MatchResults aren't passed 
 around, they're created from Patterns/Matchers and processed within the same 
 code, so in practice this won't be a problem.
 
 s'marks



Re: RFR 8071479: Stream and lamdification improvements to j.u.regex.Matcher

2015-02-11 Thread Stuart Marks

On 2/11/15 12:45 PM, Paul Sandoz wrote:

On Feb 11, 2015, at 8:23 PM, Stuart Marks stuart.ma...@oracle.com wrote:
That matches my original thinking on the matter and is reflected in the patch. It's 
very simple to support. If the method was named findAll then it would be misleading and 
imply a reset was needed.


OK, good, I see that, so it doesn't need to be changed.


Ok, the first webrev i sent is already implemented as agreed so lets review 
that code.


I looked at:


http://cr.openjdk.java.net/~psandoz/jdk9/JDK-8071479--Matcher-stream-results/webrev/

I think this was before you added Pattern.replaceFirst(), so that should be 
retrofitted here.



Regarding the disparity between MatchResult and Matcher. I think that would 
require a new sub-interface of MatchResult from which Matcher extends from and 
returns. If we think it important we should do that for 9 otherwise we will be 
stuck for the stream-based methods.


Why do you think we need a new sub-interface? Wouldn't it be sufficient to add 
default methods?


What would the defaults do? Throwing an exception seems a poor solution to me.

My guess it will be possible to evolve Matcher in binary and source compatible 
way to use the sub-type.


OK, I filed an RFE to cover this, then immediately closed it as a duplicate :-) 
because I failed to search for the existing RFE that covers this:


https://bugs.openjdk.java.net/browse/JDK-8065554

Either we add some default methods that throw exceptions (gross) or we add a 
sub-interface (also gross). It might be a matter of taste, or bike shed 
painting. Either way, I think this needs to be done.


s'marks



Paul.



There is of course the possibility of existing 3rd party classes that implement 
MatchResult having conflicting methods. But I don't think MatchResult is 
implemented quite as often as the collection interfaces, where we've been 
reticent to add default methods because of so many implementations.

I did a quick web search and I did find a few implementations/extensions of 
MatchResult, but there were no obvious conflicts. This isn't definitive, of 
course.

The default implementations of group(String), start(String), and end(String) 
could throw IllegalArgumentException, since that's what the ones on Matcher do 
if there's no such named capture group.

I admit this is a little weird, since it's only safe to use these new methods 
if you know where the MatchResult instance came from. So logically perhaps it's 
a different type. My hunch, though, is that MatchResults aren't passed around, 
they're created from Patterns/Matchers and processed within the same code, so 
in practice this won't be a problem.

s'marks




Re: RFR 8071479: Stream and lamdification improvements to j.u.regex.Matcher

2015-02-11 Thread Paul Sandoz
Hi Stuart,

Thanks for the detailed review.

Here is a possible way forward:

1) Add the methods to Matcher, as proposed in the initial webrev.
1.1) Change the specification of Matcher.results to reset the stream before 
matching, making it consistent with the replace* methods. 

2) Add convenience methods for all replace*() and matches() on Pattern that 
defer to those on Matcher.

We can do that in two stages, focusing on 1) in this review.

I was not too concerned about the static method and the stream returning method 
having the same name as the context is quite different. For stream returning 
methods there is a de-facto pattern of using a plural of the stream element 
kind, so i prefer that to findAll. What about the name Pattern.matchResults? 
which chains well with Pattern.match(...).results().

--

Regarding the disparity between MatchResult and Matcher. I think that would 
require a new sub-interface of MatchResult from which Matcher extends from and 
returns. If we think it important we should do that for 9 otherwise we will be 
stuck for the stream-based methods.

Paul.

On Feb 11, 2015, at 2:02 AM, Stuart Marks stuart.ma...@oracle.com wrote:

 Hi Paul,
 
 I spent some time looking at this API. Overall it seems to me that things 
 work a bit more nicely when these methods are added to Pattern instead of 
 Matcher. Unfortunately there are some odd things with the existing API that 
 make this tradeoff not so obvious.
 
 First, here's what a simple replacement operation looks like when 
 replaceAll() is added to Matcher:
 
String input = foobbfooabbbfoo;
Pattern p = Pattern.compile(a*b);
Matcher m = p.matcher(input);
String result = m.replaceAll(mr - mr.group().toUpperCase());
 
 But if replaceAll() is on Pattern, we can skip a step:
 
String input = foobbfooabbbfoo;
Pattern p = Pattern.compile(a*b);
String result = p.replaceAll(input, mr - mr.group().toUpperCase());
 
 Getting stream of match results is similar. So yes, I agree that it 
 simplifies things to have these be on Pattern instead of Matcher.
 
 An advantage of having these on Pattern is that the matcher that gets created 
 is encapsulated, and its state isn't exposed to being mucked about by the 
 application. Thus you can avoid the additional concurrent modification checks 
 that you have to do if replaceAll et. al. are on Matcher.
 
 Unfortunately, putting these on Pattern now creates some difficulties meshing 
 with the existing API.
 
 One issue is that Matcher already has replaceAll(String) and 
 replaceFirst(String). It would be strange to have these here and to have 
 replaceAll(replacer) and replaceFirst(replacer) on Pattern.
 
 Another issue is that Matcher supports matching on region (subrange) of its 
 input. For example, today you can do this:
 
pattern.matcher(input).region(start, end)
 
 The region will constrain the matching for certain operations, such as find() 
 (but not replaceAll or replaceFirst). If something like results() were added 
 to Matcher, I'd expect that it would respect the Matcher's region, but if 
 results() (or matches() as you called it) were on Pattern, the region 
 constraint would be lacking.
 
 Also note that Pattern already has this:
 
static boolean matches(regex, input)
 
 so I don't think an overload of matches() that returns a Stream would be a 
 good idea. (Maybe findAll()?)
 
 Another issue, not directly related to where the new lambda/streams methods 
 get added, is that MatchResult allows references only numbered capturing 
 groups. Matcher, which implements MatchResult, also supports named capturing 
 groups, with the new overloaded methods group(String), start(String), and 
 end(String). These were added in Java 7. Logically these also belong on 
 MatchResult, but they probably weren't added because of the compatibility 
 issue of adding methods to interfaces. Maybe we should add these as default 
 methods to MatchResult.
 
 (But what would the supported implementation be? Just throw 
 UnsupportedOperationException?)
 
 --
 
 I'm not entirely sure where this leaves things. It certainly seems more 
 convenient to have the new methods on Pattern. But given the way the existing 
 API is set up, it seems like it's a better fit to add them to Matcher.
 
 s'marks
 
 
 
 On 2/9/15 6:18 AM, Paul Sandoz wrote:
 Here is an alternative that pushes the methods on to Pattern instead:
 
   
 http://cr.openjdk.java.net/~psandoz/jdk9/JDK-8071479--Matcher-stream-results/on-Pattern/webrev/
 
 (Whe webrev reports some files as empty, please ingore those, i have this 
 webrev stacked on the previous one.)
 
 I have also included replaceFirst.
 
 This simplifies things for streaming on the match results and also for 
 replacing.
 
 Note that the existing replace* methods on Matcher reset the matcher before 
 matching and indicate that the matcher should be reset afterwards for reuse. 
 Thus there is no loss in functionality moving 

Re: RFR 8071479: Stream and lamdification improvements to j.u.regex.Matcher

2015-02-10 Thread Stuart Marks

Hi Paul,

I spent some time looking at this API. Overall it seems to me that things work a 
bit more nicely when these methods are added to Pattern instead of Matcher. 
Unfortunately there are some odd things with the existing API that make this 
tradeoff not so obvious.


First, here's what a simple replacement operation looks like when replaceAll() 
is added to Matcher:


String input = foobbfooabbbfoo;
Pattern p = Pattern.compile(a*b);
Matcher m = p.matcher(input);
String result = m.replaceAll(mr - mr.group().toUpperCase());

But if replaceAll() is on Pattern, we can skip a step:

String input = foobbfooabbbfoo;
Pattern p = Pattern.compile(a*b);
String result = p.replaceAll(input, mr - mr.group().toUpperCase());

Getting stream of match results is similar. So yes, I agree that it simplifies 
things to have these be on Pattern instead of Matcher.


An advantage of having these on Pattern is that the matcher that gets created is 
encapsulated, and its state isn't exposed to being mucked about by the 
application. Thus you can avoid the additional concurrent modification checks 
that you have to do if replaceAll et. al. are on Matcher.


Unfortunately, putting these on Pattern now creates some difficulties meshing 
with the existing API.


One issue is that Matcher already has replaceAll(String) and 
replaceFirst(String). It would be strange to have these here and to have 
replaceAll(replacer) and replaceFirst(replacer) on Pattern.


Another issue is that Matcher supports matching on region (subrange) of its 
input. For example, today you can do this:


pattern.matcher(input).region(start, end)

The region will constrain the matching for certain operations, such as find() 
(but not replaceAll or replaceFirst). If something like results() were added to 
Matcher, I'd expect that it would respect the Matcher's region, but if results() 
(or matches() as you called it) were on Pattern, the region constraint would be 
lacking.


Also note that Pattern already has this:

static boolean matches(regex, input)

so I don't think an overload of matches() that returns a Stream would be a good 
idea. (Maybe findAll()?)


Another issue, not directly related to where the new lambda/streams methods get 
added, is that MatchResult allows references only numbered capturing groups. 
Matcher, which implements MatchResult, also supports named capturing groups, 
with the new overloaded methods group(String), start(String), and end(String). 
These were added in Java 7. Logically these also belong on MatchResult, but they 
probably weren't added because of the compatibility issue of adding methods to 
interfaces. Maybe we should add these as default methods to MatchResult.


(But what would the supported implementation be? Just throw 
UnsupportedOperationException?)


--

I'm not entirely sure where this leaves things. It certainly seems more 
convenient to have the new methods on Pattern. But given the way the existing 
API is set up, it seems like it's a better fit to add them to Matcher.


s'marks



On 2/9/15 6:18 AM, Paul Sandoz wrote:

Here is an alternative that pushes the methods on to Pattern instead:

   
http://cr.openjdk.java.net/~psandoz/jdk9/JDK-8071479--Matcher-stream-results/on-Pattern/webrev/

(Whe webrev reports some files as empty, please ingore those, i have this 
webrev stacked on the previous one.)

I have also included replaceFirst.

This simplifies things for streaming on the match results and also for 
replacing.

Note that the existing replace* methods on Matcher reset the matcher before 
matching and indicate that the matcher should be reset afterwards for reuse. 
Thus there is no loss in functionality moving such lambda accepting methods 
from Matcher to Pattern. It comes down to the performance of reusing a matcher, 
which does not seems so compelling to me.

Paul.

On Feb 5, 2015, at 11:59 AM, Paul Sandoz paul.san...@oracle.com wrote:


Hi.

Please review these stream/lambda enhancements on Matcher:

  
http://cr.openjdk.java.net/~psandoz/jdk9/JDK-8071479--Matcher-stream-results/webrev/

Two new methods are added to Matcher:

1) replaceAll(FunctionMatchResult, String ) that is more flexible than the 
existing replaceAll that accepts a single value.

2) StreamMatchResult results() that returns a stream of MatchResult for all 
matches.

The former does introduce a minor source incompatibility for a null argument, 
but then so did the new append methods accepting StringBuilder that were 
recently added (see JDK-8039124).

For the latter i opted to place the method on Matcher rather than Pattern as i 
think that is a better fit with current usages of Matcher and operating on a 
MatchResult. That marginally increases the complexity since co-modification 
checking is required.

I update the test PatternStreamTest to derive the expected result.


I suppose i could add another method replaceFirst(FunctionMatchResult, String 
) if anyone 

Re: RFR 8071479: Stream and lamdification improvements to j.u.regex.Matcher

2015-02-09 Thread Paul Sandoz
Here is an alternative that pushes the methods on to Pattern instead:

  
http://cr.openjdk.java.net/~psandoz/jdk9/JDK-8071479--Matcher-stream-results/on-Pattern/webrev/

(Whe webrev reports some files as empty, please ingore those, i have this 
webrev stacked on the previous one.)

I have also included replaceFirst. 

This simplifies things for streaming on the match results and also for 
replacing. 

Note that the existing replace* methods on Matcher reset the matcher before 
matching and indicate that the matcher should be reset afterwards for reuse. 
Thus there is no loss in functionality moving such lambda accepting methods 
from Matcher to Pattern. It comes down to the performance of reusing a matcher, 
which does not seems so compelling to me.

Paul. 

On Feb 5, 2015, at 11:59 AM, Paul Sandoz paul.san...@oracle.com wrote:

 Hi.
 
 Please review these stream/lambda enhancements on Matcher:
 
  
 http://cr.openjdk.java.net/~psandoz/jdk9/JDK-8071479--Matcher-stream-results/webrev/
 
 Two new methods are added to Matcher:
 
 1) replaceAll(FunctionMatchResult, String ) that is more flexible than the 
 existing replaceAll that accepts a single value.
 
 2) StreamMatchResult results() that returns a stream of MatchResult for all 
 matches.
 
 The former does introduce a minor source incompatibility for a null argument, 
 but then so did the new append methods accepting StringBuilder that were 
 recently added (see JDK-8039124).
 
 For the latter i opted to place the method on Matcher rather than Pattern as 
 i think that is a better fit with current usages of Matcher and operating on 
 a MatchResult. That marginally increases the complexity since co-modification 
 checking is required.
 
 I update the test PatternStreamTest to derive the expected result.
 
 
 I suppose i could add another method replaceFirst(FunctionMatchResult, 
 String ) if anyone feels strongly about that. Consistency-wise it seems the 
 right thing to do.
 
 Paul.



RFR 8071479: Stream and lamdification improvements to j.u.regex.Matcher

2015-02-05 Thread Paul Sandoz
Hi.

Please review these stream/lambda enhancements on Matcher:

  
http://cr.openjdk.java.net/~psandoz/jdk9/JDK-8071479--Matcher-stream-results/webrev/

Two new methods are added to Matcher:

1) replaceAll(FunctionMatchResult, String ) that is more flexible than the 
existing replaceAll that accepts a single value.

2) StreamMatchResult results() that returns a stream of MatchResult for all 
matches.

The former does introduce a minor source incompatibility for a null argument, 
but then so did the new append methods accepting StringBuilder that were 
recently added (see JDK-8039124).
 
For the latter i opted to place the method on Matcher rather than Pattern as i 
think that is a better fit with current usages of Matcher and operating on a 
MatchResult. That marginally increases the complexity since co-modification 
checking is required.

I update the test PatternStreamTest to derive the expected result.


I suppose i could add another method replaceFirst(FunctionMatchResult, String 
) if anyone feels strongly about that. Consistency-wise it seems the right 
thing to do.

Paul.