Re: [PATCH] regex matcher opt: remove redundant StringBuilder

2018-07-18 Thread Isaac Levy
I still think it would be valuable to land a patch for replaceAll to
avoid temporary StringBuilders, is there anyone who wants to help me
land it?

Isaac


On Sun, Apr 29, 2018 at 10:24 PM, Isaac Levy  wrote:
> Your patch looks good to me, and will get the majority of performance
> benefits without any API issues.
>
> Isaac
>
>
> On Tue, Apr 24, 2018 at 9:38 PM, Xueming Shen  wrote:
>> for String based replaceAll/First() it might be worth doing something like
>>
>> http://cr.openjdk.java.net/~sherman/regex_removesb/webrev/
>>
>>
>> On 4/24/18, 10:47 AM, Isaac Levy wrote:
>>>
>>> Hi Sherman,
>>>
>>> Thanks for clarifying. Looks like exceptions are caused by invalid
>>> format string. I wouldn't expect most programs to be catching this and
>>> preserving their buffer, but dunno.
>>>
>>> How much does it affect perf? Well it depends on use case, a jmh of
>>> replaceAll with a length 200 string of digits and regex "(\w)" shows
>>> about 30% speedup.
>>>
>>> [info] Benchmark  Mode  Cnt   Score   Error  Units
>>> [info] origM  avgt   10  11.669 ± 0.211  us/op
>>> [info] newM   avgt   10   8.926 ± 0.095  us/op
>>>
>>> Isaac
>>>
>>>
>>> On Tue, Apr 24, 2018 at 12:53 PM, Xueming Shen
>>> wrote:

 Hi Isaac,

 I actually meant to say "we are not supposed to output the partial text
 into
 the output buffer in case of an exception". It has nothing to do with the
 changeset you cited. This has been the behavior since day one/JDK1.4,
 though it is not specified explicitly in the API doc. The newly added
 StringBuilder variant simply follows this behavior. If it's really
 desired
 it
 is kinda doable to save that StringBuilder without the "incompatible"
 behavior
 change but just wonder if it is really worth the effort.

 Thanks,
 Sherman


 On 4/24/18, 9:11 AM, Isaac Levy wrote:
>
> (moving this to a separate discussion)
>
>
> --- a/src/java.base/share/classes/java/util/regex/Matcher.java
> +++ b/src/java.base/share/classes/java/util/regex/Matcher.java
> @@ -993,13 +993,11 @@
>   public Matcher appendReplacement(StringBuilder sb, String
> replacement) {
>// If no match, return error
>if (first<   0)
>throw new IllegalStateException("No match available");
> -StringBuilder result = new StringBuilder();
> -appendExpandedReplacement(replacement, result);
>// Append the intervening text
>sb.append(text, lastAppendPosition, first);
>// Append the match substitution
> +appendExpandedReplacement(replacement, sb);
> -sb.append(result);
>
>
>
> On Mon, Apr 23, 2018 at 5:05 PM Xueming Shen
> wrote:
>>
>>
>> I would assume in case of an exception thrown from
>> appendExpandedReplacement() we don't
>> want "text" to be pushed into the "sb".
>>
>> -sherman
>
>
> Perhaps. Though the behavior under exception is undefined and this
> function is probably primarily used though the replaceAll API, which
> wouldn’t return the intermediate sb under the case of exception.
>
> My reading of the blame was the temp StringBuilder was an artifact of
> the api previously using StringBuffer externally.  See
> http://hg.openjdk.java.net/jdk9/dev/jdk/rev/763c564451b3


>>


Re: [PATCH] regex matcher opt: remove redundant StringBuilder

2018-04-29 Thread Isaac Levy
Your patch looks good to me, and will get the majority of performance
benefits without any API issues.

Isaac


On Tue, Apr 24, 2018 at 9:38 PM, Xueming Shen  wrote:
> for String based replaceAll/First() it might be worth doing something like
>
> http://cr.openjdk.java.net/~sherman/regex_removesb/webrev/
>
>
> On 4/24/18, 10:47 AM, Isaac Levy wrote:
>>
>> Hi Sherman,
>>
>> Thanks for clarifying. Looks like exceptions are caused by invalid
>> format string. I wouldn't expect most programs to be catching this and
>> preserving their buffer, but dunno.
>>
>> How much does it affect perf? Well it depends on use case, a jmh of
>> replaceAll with a length 200 string of digits and regex "(\w)" shows
>> about 30% speedup.
>>
>> [info] Benchmark  Mode  Cnt   Score   Error  Units
>> [info] origM  avgt   10  11.669 ± 0.211  us/op
>> [info] newM   avgt   10   8.926 ± 0.095  us/op
>>
>> Isaac
>>
>>
>> On Tue, Apr 24, 2018 at 12:53 PM, Xueming Shen
>> wrote:
>>>
>>> Hi Isaac,
>>>
>>> I actually meant to say "we are not supposed to output the partial text
>>> into
>>> the output buffer in case of an exception". It has nothing to do with the
>>> changeset you cited. This has been the behavior since day one/JDK1.4,
>>> though it is not specified explicitly in the API doc. The newly added
>>> StringBuilder variant simply follows this behavior. If it's really
>>> desired
>>> it
>>> is kinda doable to save that StringBuilder without the "incompatible"
>>> behavior
>>> change but just wonder if it is really worth the effort.
>>>
>>> Thanks,
>>> Sherman
>>>
>>>
>>> On 4/24/18, 9:11 AM, Isaac Levy wrote:

 (moving this to a separate discussion)


 --- a/src/java.base/share/classes/java/util/regex/Matcher.java
 +++ b/src/java.base/share/classes/java/util/regex/Matcher.java
 @@ -993,13 +993,11 @@
   public Matcher appendReplacement(StringBuilder sb, String
 replacement) {
// If no match, return error
if (first<   0)
throw new IllegalStateException("No match available");
 -StringBuilder result = new StringBuilder();
 -appendExpandedReplacement(replacement, result);
// Append the intervening text
sb.append(text, lastAppendPosition, first);
// Append the match substitution
 +appendExpandedReplacement(replacement, sb);
 -sb.append(result);



 On Mon, Apr 23, 2018 at 5:05 PM Xueming Shen
 wrote:
>
>
> I would assume in case of an exception thrown from
> appendExpandedReplacement() we don't
> want "text" to be pushed into the "sb".
>
> -sherman


 Perhaps. Though the behavior under exception is undefined and this
 function is probably primarily used though the replaceAll API, which
 wouldn’t return the intermediate sb under the case of exception.

 My reading of the blame was the temp StringBuilder was an artifact of
 the api previously using StringBuffer externally.  See
 http://hg.openjdk.java.net/jdk9/dev/jdk/rev/763c564451b3
>>>
>>>
>


Re: [PATCH] regex matcher opt: remove redundant StringBuilder

2018-04-25 Thread Isaac Levy
yeah perhaps this is enough.  Though users of appendReplacement are
presumably after high performance loops or they'd be using a higher
level API like replaceAll.  I just can't imagine people using this API
hit the format code exception in normal usage, and it's not like java
is dumping junk into the StringBuilder when it throws the exception --
it pushed the correct replacement up to the error.

Maybe I can catch exceptions, roll back the builder using setLength,
and rethrow.

Isaac


On Tue, Apr 24, 2018 at 9:38 PM, Xueming Shen  wrote:
> for String based replaceAll/First() it might be worth doing something like
>
> http://cr.openjdk.java.net/~sherman/regex_removesb/webrev/
>
>
> On 4/24/18, 10:47 AM, Isaac Levy wrote:
>>
>> Hi Sherman,
>>
>> Thanks for clarifying. Looks like exceptions are caused by invalid
>> format string. I wouldn't expect most programs to be catching this and
>> preserving their buffer, but dunno.
>>
>> How much does it affect perf? Well it depends on use case, a jmh of
>> replaceAll with a length 200 string of digits and regex "(\w)" shows
>> about 30% speedup.
>>
>> [info] Benchmark  Mode  Cnt   Score   Error  Units
>> [info] origM  avgt   10  11.669 ± 0.211  us/op
>> [info] newM   avgt   10   8.926 ± 0.095  us/op
>>
>> Isaac
>>
>>
>> On Tue, Apr 24, 2018 at 12:53 PM, Xueming Shen
>> wrote:
>>>
>>> Hi Isaac,
>>>
>>> I actually meant to say "we are not supposed to output the partial text
>>> into
>>> the output buffer in case of an exception". It has nothing to do with the
>>> changeset you cited. This has been the behavior since day one/JDK1.4,
>>> though it is not specified explicitly in the API doc. The newly added
>>> StringBuilder variant simply follows this behavior. If it's really
>>> desired
>>> it
>>> is kinda doable to save that StringBuilder without the "incompatible"
>>> behavior
>>> change but just wonder if it is really worth the effort.
>>>
>>> Thanks,
>>> Sherman
>>>
>>>
>>> On 4/24/18, 9:11 AM, Isaac Levy wrote:

 (moving this to a separate discussion)


 --- a/src/java.base/share/classes/java/util/regex/Matcher.java
 +++ b/src/java.base/share/classes/java/util/regex/Matcher.java
 @@ -993,13 +993,11 @@
   public Matcher appendReplacement(StringBuilder sb, String
 replacement) {
// If no match, return error
if (first<   0)
throw new IllegalStateException("No match available");
 -StringBuilder result = new StringBuilder();
 -appendExpandedReplacement(replacement, result);
// Append the intervening text
sb.append(text, lastAppendPosition, first);
// Append the match substitution
 +appendExpandedReplacement(replacement, sb);
 -sb.append(result);



 On Mon, Apr 23, 2018 at 5:05 PM Xueming Shen
 wrote:
>
>
> I would assume in case of an exception thrown from
> appendExpandedReplacement() we don't
> want "text" to be pushed into the "sb".
>
> -sherman


 Perhaps. Though the behavior under exception is undefined and this
 function is probably primarily used though the replaceAll API, which
 wouldn’t return the intermediate sb under the case of exception.

 My reading of the blame was the temp StringBuilder was an artifact of
 the api previously using StringBuffer externally.  See
 http://hg.openjdk.java.net/jdk9/dev/jdk/rev/763c564451b3
>>>
>>>
>


Re: [PATCH] regex matcher opt: remove redundant StringBuilder

2018-04-24 Thread Xueming Shen

for String based replaceAll/First() it might be worth doing something like

http://cr.openjdk.java.net/~sherman/regex_removesb/webrev/

On 4/24/18, 10:47 AM, Isaac Levy wrote:

Hi Sherman,

Thanks for clarifying. Looks like exceptions are caused by invalid
format string. I wouldn't expect most programs to be catching this and
preserving their buffer, but dunno.

How much does it affect perf? Well it depends on use case, a jmh of
replaceAll with a length 200 string of digits and regex "(\w)" shows
about 30% speedup.

[info] Benchmark  Mode  Cnt   Score   Error  Units
[info] origM  avgt   10  11.669 ± 0.211  us/op
[info] newM   avgt   10   8.926 ± 0.095  us/op

Isaac


On Tue, Apr 24, 2018 at 12:53 PM, Xueming Shen  wrote:

Hi Isaac,

I actually meant to say "we are not supposed to output the partial text into
the output buffer in case of an exception". It has nothing to do with the
changeset you cited. This has been the behavior since day one/JDK1.4,
though it is not specified explicitly in the API doc. The newly added
StringBuilder variant simply follows this behavior. If it's really desired
it
is kinda doable to save that StringBuilder without the "incompatible"
behavior
change but just wonder if it is really worth the effort.

Thanks,
Sherman


On 4/24/18, 9:11 AM, Isaac Levy wrote:

(moving this to a separate discussion)


--- a/src/java.base/share/classes/java/util/regex/Matcher.java
+++ b/src/java.base/share/classes/java/util/regex/Matcher.java
@@ -993,13 +993,11 @@
  public Matcher appendReplacement(StringBuilder sb, String
replacement) {
   // If no match, return error
   if (first<   0)
   throw new IllegalStateException("No match available");
-StringBuilder result = new StringBuilder();
-appendExpandedReplacement(replacement, result);
   // Append the intervening text
   sb.append(text, lastAppendPosition, first);
   // Append the match substitution
+appendExpandedReplacement(replacement, sb);
-sb.append(result);



On Mon, Apr 23, 2018 at 5:05 PM Xueming Shen
wrote:


I would assume in case of an exception thrown from
appendExpandedReplacement() we don't
want "text" to be pushed into the "sb".

-sherman


Perhaps. Though the behavior under exception is undefined and this
function is probably primarily used though the replaceAll API, which
wouldn’t return the intermediate sb under the case of exception.

My reading of the blame was the temp StringBuilder was an artifact of
the api previously using StringBuffer externally.  See
http://hg.openjdk.java.net/jdk9/dev/jdk/rev/763c564451b3






Re: [PATCH] regex matcher opt: remove redundant StringBuilder

2018-04-24 Thread Isaac Levy
Hi Sherman,

Thanks for clarifying. Looks like exceptions are caused by invalid
format string. I wouldn't expect most programs to be catching this and
preserving their buffer, but dunno.

How much does it affect perf? Well it depends on use case, a jmh of
replaceAll with a length 200 string of digits and regex "(\w)" shows
about 30% speedup.

[info] Benchmark  Mode  Cnt   Score   Error  Units
[info] origM  avgt   10  11.669 ± 0.211  us/op
[info] newM   avgt   10   8.926 ± 0.095  us/op

Isaac


On Tue, Apr 24, 2018 at 12:53 PM, Xueming Shen  wrote:
> Hi Isaac,
>
> I actually meant to say "we are not supposed to output the partial text into
> the output buffer in case of an exception". It has nothing to do with the
> changeset you cited. This has been the behavior since day one/JDK1.4,
> though it is not specified explicitly in the API doc. The newly added
> StringBuilder variant simply follows this behavior. If it's really desired
> it
> is kinda doable to save that StringBuilder without the "incompatible"
> behavior
> change but just wonder if it is really worth the effort.
>
> Thanks,
> Sherman
>
>
> On 4/24/18, 9:11 AM, Isaac Levy wrote:
>>
>> (moving this to a separate discussion)
>>
>>
>> --- a/src/java.base/share/classes/java/util/regex/Matcher.java
>> +++ b/src/java.base/share/classes/java/util/regex/Matcher.java
>> @@ -993,13 +993,11 @@
>>  public Matcher appendReplacement(StringBuilder sb, String
>> replacement) {
>>   // If no match, return error
>>   if (first<  0)
>>   throw new IllegalStateException("No match available");
>> -StringBuilder result = new StringBuilder();
>> -appendExpandedReplacement(replacement, result);
>>   // Append the intervening text
>>   sb.append(text, lastAppendPosition, first);
>>   // Append the match substitution
>> +appendExpandedReplacement(replacement, sb);
>> -sb.append(result);
>>
>>
>>
>> On Mon, Apr 23, 2018 at 5:05 PM Xueming Shen
>> wrote:
>>>
>>>
>>> I would assume in case of an exception thrown from
>>> appendExpandedReplacement() we don't
>>> want "text" to be pushed into the "sb".
>>>
>>> -sherman
>>
>>
>> Perhaps. Though the behavior under exception is undefined and this
>> function is probably primarily used though the replaceAll API, which
>> wouldn’t return the intermediate sb under the case of exception.
>>
>> My reading of the blame was the temp StringBuilder was an artifact of
>> the api previously using StringBuffer externally.  See
>> http://hg.openjdk.java.net/jdk9/dev/jdk/rev/763c564451b3
>
>


Re: [PATCH] regex matcher opt: remove redundant StringBuilder

2018-04-24 Thread Xueming Shen

Hi Isaac,

I actually meant to say "we are not supposed to output the partial text into
the output buffer in case of an exception". It has nothing to do with the
changeset you cited. This has been the behavior since day one/JDK1.4,
though it is not specified explicitly in the API doc. The newly added
StringBuilder variant simply follows this behavior. If it's really 
desired it
is kinda doable to save that StringBuilder without the "incompatible" 
behavior

change but just wonder if it is really worth the effort.

Thanks,
Sherman

On 4/24/18, 9:11 AM, Isaac Levy wrote:

(moving this to a separate discussion)


--- a/src/java.base/share/classes/java/util/regex/Matcher.java
+++ b/src/java.base/share/classes/java/util/regex/Matcher.java
@@ -993,13 +993,11 @@
 public Matcher appendReplacement(StringBuilder sb, String replacement) {
  // If no match, return error
  if (first<  0)
  throw new IllegalStateException("No match available");
-StringBuilder result = new StringBuilder();
-appendExpandedReplacement(replacement, result);
  // Append the intervening text
  sb.append(text, lastAppendPosition, first);
  // Append the match substitution
+appendExpandedReplacement(replacement, sb);
-sb.append(result);



On Mon, Apr 23, 2018 at 5:05 PM Xueming Shen  wrote:


I would assume in case of an exception thrown from appendExpandedReplacement() 
we don't
want "text" to be pushed into the "sb".

-sherman


Perhaps. Though the behavior under exception is undefined and this
function is probably primarily used though the replaceAll API, which
wouldn’t return the intermediate sb under the case of exception.

My reading of the blame was the temp StringBuilder was an artifact of
the api previously using StringBuffer externally.  See
http://hg.openjdk.java.net/jdk9/dev/jdk/rev/763c564451b3




[PATCH] regex matcher opt: remove redundant StringBuilder

2018-04-24 Thread Isaac Levy
(moving this to a separate discussion)


--- a/src/java.base/share/classes/java/util/regex/Matcher.java
+++ b/src/java.base/share/classes/java/util/regex/Matcher.java
@@ -993,13 +993,11 @@
public Matcher appendReplacement(StringBuilder sb, String replacement) {
 // If no match, return error
 if (first < 0)
 throw new IllegalStateException("No match available");
-StringBuilder result = new StringBuilder();
-appendExpandedReplacement(replacement, result);
 // Append the intervening text
 sb.append(text, lastAppendPosition, first);
 // Append the match substitution
+appendExpandedReplacement(replacement, sb);
-sb.append(result);



On Mon, Apr 23, 2018 at 5:05 PM Xueming Shen  wrote:
>
>
> I would assume in case of an exception thrown from 
> appendExpandedReplacement() we don't
> want "text" to be pushed into the "sb".
>
> -sherman


Perhaps. Though the behavior under exception is undefined and this
function is probably primarily used though the replaceAll API, which
wouldn’t return the intermediate sb under the case of exception.

My reading of the blame was the temp StringBuilder was an artifact of
the api previously using StringBuffer externally.  See
http://hg.openjdk.java.net/jdk9/dev/jdk/rev/763c564451b3