Re: RFR [15] 8242230: Whitespace typos, relaxed javadoc, formatting

2020-04-07 Thread Ivan Gerasimov

Looks good to me, thanks!

Kind regards,

Ivan

On 4/7/20 8:28 AM, Pavel Rappo wrote:

Hi Ivan,


On 7 Apr 2020, at 09:11, Ivan Gerasimov  wrote:

Hi Pavel!

A couple of comments.

1)
java/util/logging/Formatter.java

This has one extra open curly brace:

"{{@literal }"

The leftmost curly brace is not a part of the "literal" inline tag, but rather
a part of the message format string. Have a look at the method that the comment
belongs to. Note what that method is looking for in a message string.


2)
grep finds some more typos of the same kind that you've spotted.

a)  rgrep '^[ ]*\*'|grep ' ,'|less

This find number of potential typos.  For example, the javadoc for VarHandle 
[1] has 53 occurrances of space-before-comma.  A few more are found in 
j.l.Thread, j.io.DataOutput, j.l.String, etc.

b)  rgrep '^[ ]*\*'|grep '\w- '|less

This find the word 'network' broken with a hyphen at [2] and also in 
share/classes/sun/net/util/IPAddressUtil.java

(I only searched under src/java.base)

Thanks. I've included some of those: true positive, non-controversial,
and significant cases where I believe I sufficiently understood context.

For instance, I was not sure if I reliably grasped the applicability of the
"statically represented using varargs" phrase used widely across the VarHandle
type. It looks like that phrase was blankedly added to @return and @return tags.
So, I left it out for now.

The updated patch can be found here:

 http://cr.openjdk.java.net/~prappo/8242230/webrev.01/

***

Some of the cases this patch addresses suggest that we might need to do
something about how the Standard Doclet treats newline characters in source
files. More often than not, newline characters are purely to control the width
of the source lines. When carried over to the output, they may produce
undesirable effects. Punctuation marks and contents of the Standard Doclet tags
may be affected.

That problem [1] is neither new nor trivial to address. Still, we should add
something to the Standard Doclet Specification [2].

I'm not sure what we can do about it now other than work around the current
behavior where it is significant.

-Pavel

P.S. CC'ing to javadoc-...@openjdk.java.net

---
[1] https://en.wikipedia.org/wiki/Line_wrap_and_word_wrap
[2] 
https://docs.oracle.com/en/java/javase/14/docs/specs/javadoc/doc-comment-spec.html


With kind regards,

Ivan

[1] 
https://docs.oracle.com/en/java/javase/14/docs/api/java.base/java/lang/invoke/VarHandle.html

[2] 
https://docs.oracle.com/en/java/javase/14/docs/api/java.base/java/net/Inet4Address.html

On 4/6/20 11:28 AM, Pavel Rappo wrote:

Hello,

Please review the change for https://bugs.openjdk.java.net/browse/JDK-8242230:

   http://cr.openjdk.java.net/~prappo/8242230/webrev.00/

This is a documentation cleanup. There are no code changes involved,
and the changes in documentation are mostly trivial.

The following packages are affected:

 java.lang,
 java.text,
 java.util,
 java.util.logging,
 javax.lang.model.util,
 jdk.internal.reflect

-Pavel


--
With kind regards,
Ivan Gerasimov


--
With kind regards,
Ivan Gerasimov



Re: RFR [15] 8242230: Whitespace typos, relaxed javadoc, formatting

2020-04-07 Thread Ivan Gerasimov

Hi Pavel!

A couple of comments.

1)
java/util/logging/Formatter.java

This has one extra open curly brace:

"{{@literal }"

2)
grep finds some more typos of the same kind that you've spotted.

a)  rgrep '^[ ]*\*'|grep ' ,'|less

This find number of potential typos.  For example, the javadoc for 
VarHandle [1] has 53 occurrances of space-before-comma.  A few more are 
found in j.l.Thread, j.io.DataOutput, j.l.String, etc.


b)  rgrep '^[ ]*\*'|grep '\w- '|less

This find the word 'network' broken with a hyphen at [2] and also in 
share/classes/sun/net/util/IPAddressUtil.java


(I only searched under src/java.base)

With kind regards,

Ivan

[1] 
https://docs.oracle.com/en/java/javase/14/docs/api/java.base/java/lang/invoke/VarHandle.html


[2] 
https://docs.oracle.com/en/java/javase/14/docs/api/java.base/java/net/Inet4Address.html


On 4/6/20 11:28 AM, Pavel Rappo wrote:

Hello,

Please review the change for https://bugs.openjdk.java.net/browse/JDK-8242230:

   http://cr.openjdk.java.net/~prappo/8242230/webrev.00/

This is a documentation cleanup. There are no code changes involved,
and the changes in documentation are mostly trivial.

The following packages are affected:

 java.lang,
 java.text,
 java.util,
 java.util.logging,
 javax.lang.model.util,
 jdk.internal.reflect

-Pavel


--
With kind regards,
Ivan Gerasimov



Re: [15] RFR 8241727 : Typos: empty lines in javadoc, inconsistent indents, etc. (core-libs only)

2020-03-28 Thread Ivan Gerasimov

Thank you Pavel!

Pushed.

On 3/27/20 5:09 AM, Pavel Rappo wrote:

140  a successful * query to a constant will always remain successful.

Made me pause and read a bit of the surrounding context
to make sure there's no "*" type of query.

Looks good to me. Thanks for doing this.

-Pavel


On 27 Mar 2020, at 10:26, Ivan Gerasimov  wrote:

Hello!

This is a javadoc-only fix.

There are a few changes that will actually be visible (for example [1], [2]), 
but the vast majority of changes are to remove redundant empty lines, correct 
indentation, or otherwise restore harmony.

Would you please help review this rather technical fix?

BUGURL: https://bugs.openjdk.java.net/browse/JDK-8241727
WEBREV: http://cr.openjdk.java.net/~igerasim/8241727/00/webrev/

Thank in advance!

[1] Extra '+' in text: 
https://docs.oracle.com/en/java/javase/13/docs/api/java.base/java/io/ObjectInputStream.html#readObjectOverride()

[2] Extra 'I' in text: 
https://docs.oracle.com/en/java/javase/13/docs/api/java.base/java/lang/annotation/Annotation.html#hashCode()

--

With kind regards,
Ivan Gerasimov


--
With kind regards,
Ivan Gerasimov



[15] RFR 8241727 : Typos: empty lines in javadoc, inconsistent indents, etc. (core-libs only)

2020-03-27 Thread Ivan Gerasimov

Hello!

This is a javadoc-only fix.

There are a few changes that will actually be visible (for example [1], 
[2]), but the vast majority of changes are to remove redundant empty 
lines, correct indentation, or otherwise restore harmony.


Would you please help review this rather technical fix?

BUGURL: https://bugs.openjdk.java.net/browse/JDK-8241727
WEBREV: http://cr.openjdk.java.net/~igerasim/8241727/00/webrev/

Thank in advance!

[1] Extra '+' in text: 
https://docs.oracle.com/en/java/javase/13/docs/api/java.base/java/io/ObjectInputStream.html#readObjectOverride()


[2] Extra 'I' in text: 
https://docs.oracle.com/en/java/javase/13/docs/api/java.base/java/lang/annotation/Annotation.html#hashCode()


--

With kind regards,
Ivan Gerasimov



Re: RFR 8237599 : Greedy matching against supplementary chars fails to respect the region

2020-03-25 Thread Ivan Gerasimov

Thank you Roger for review!


On 3/25/20 6:56 AM, Roger Riggs wrote:

Hi Ivan,

Looks fine.

Interesting edge case, would never be seen with 8 bit charsets.

Thanks, Roger


On 3/21/20 3:15 AM, Ivan Gerasimov wrote:

Gentle ping.

The webrev was rebased to accommodate recent changes in RegExTest.java.

The fix is to handle an edge case situation, which is supposedly not 
too common.


Nevertheless, I think, it is important to handle it correctly.

Thanks in advance!

Ivan


On 1/22/20 8:23 PM, Ivan Gerasimov wrote:

Hello everyone!

When the input of a j.u.regex.Matcher is restricted with .region() 
method, it can possibly cut off a half of a surrogate pair.


It turns out that greedy matching implemented in the 
Pattern.CharPropertyGreedy class fails to recognize this edge case 
in two scenarios:


1) When it greedily consumes the input and meets a higher half of a 
surrogate pair that was cut off at the end of input, and


2) When it backs off and meets a lower half of a surrogate pair at 
the very beginning of input.


In both cases, the engine reads the entire codepoint, crossing the 
boundaries of the set region.


Instead, it should only read the half of the surrogate pair that 
lies inside the region and ignore the other half.


Would you please help review the fix?

BUGURL: https://bugs.openjdk.java.net/browse/JDK-8237599
WEBREV: http://cr.openjdk.java.net/~igerasim/8237599/00/webrev/

Thanks in advance!




--
With kind regards,
Ivan Gerasimov



Re: RFR 8237599 : Greedy matching against supplementary chars fails to respect the region

2020-03-21 Thread Ivan Gerasimov

Gentle ping.

The webrev was rebased to accommodate recent changes in RegExTest.java.

The fix is to handle an edge case situation, which is supposedly not too 
common.


Nevertheless, I think, it is important to handle it correctly.

Thanks in advance!

Ivan


On 1/22/20 8:23 PM, Ivan Gerasimov wrote:

Hello everyone!

When the input of a j.u.regex.Matcher is restricted with .region() 
method, it can possibly cut off a half of a surrogate pair.


It turns out that greedy matching implemented in the 
Pattern.CharPropertyGreedy class fails to recognize this edge case in 
two scenarios:


1) When it greedily consumes the input and meets a higher half of a 
surrogate pair that was cut off at the end of input, and


2) When it backs off and meets a lower half of a surrogate pair at the 
very beginning of input.


In both cases, the engine reads the entire codepoint, crossing the 
boundaries of the set region.


Instead, it should only read the half of the surrogate pair that lies 
inside the region and ignore the other half.


Would you please help review the fix?

BUGURL: https://bugs.openjdk.java.net/browse/JDK-8237599
WEBREV: http://cr.openjdk.java.net/~igerasim/8237599/00/webrev/

Thanks in advance!


--
With kind regards,
Ivan Gerasimov



Re: RFR [15] 8241014: Miscellaneous typos in documentation comments

2020-03-20 Thread Ivan Gerasimov

Thank you Paul!

grep found a few more occurrences of  'equals to'  across java.base, so 
I fixed them as well.


Here's the updated webrev:

http://cr.openjdk.java.net/~igerasim/XXX-typos/01/webrev/


Pavel, I checked your portion of correction, everything looks good to me!

One minor nit:  In 
src/java.base/share/classes/java/util/StringJoiner.java can you please 
wrap the modified line, so it won't be that long?



With kind regards,

Ivan


On 3/20/20 10:16 AM, Paul Sandoz wrote:

--- a/src/java.base/share/classes/java/lang/invoke/MethodType.java
+++ b/src/java.base/share/classes/java/lang/invoke/MethodType.java
@@ -1379,12 +1379,12 @@
  
  /**

   * This implementation returns {@code true} if {@code obj} is 
another
- * {@code WeakEntry} whose referent is equals to this referent, or
- * if {@code obj} is equals to the referent of this. This allows
+ * {@code WeakEntry} whose referent equals to this referent, or
+ * if {@code obj} equals to the referent of this. This allows
   * lookups to be made without wrapping in a {@code WeakEntry}.
   *
   * @param obj the object to compare
- * @return true if {@code obj} is equals to this or the referent 
of this
+ * @return true if {@code obj} equals to this or the referent of 
this
   * @see MethodType#equals(Object)
   * @see Object#equals(Object)
  
Use either:


   whose referent is equal to this referent,

or

   whose referent equals this referent,

The former is easier just delete the ’s’.

Other bits look good.

Paul.


On Mar 13, 2020, at 7:03 PM, Ivan Gerasimov  wrote:

Hi Pavel!

Can this please be combined with my collection of typos?

http://cr.openjdk.java.net/~igerasim/XXX-typos/00/webrev/

Just to save cycles on reviewing :)

With kind regards,

Ivan


On 3/13/20 8:42 AM, Pavel Rappo wrote:

Hello,

Please review the change for https://bugs.openjdk.java.net/browse/JDK-8241014:

   http://cr.openjdk.java.net/~prappo/8241014/webrev.00/

This is a documentation cleanup. There are no code changes involved,
and the changes in documentation are mostly trivial.

The following packages are affected:

 java.lang,
 java.nio.file,
 java.nio.file.attribute,
 java.security,
 java.time.chrono,
 java.time.temporal,
 java.util,
 java.util.regex,
 java.util.stream,
 javax.crypto,
 javax.security.cert,
 javax.tools

That said, there are two changes that I'd prefer to be carefully reviewed by
the experts in the corresponding areas.

The first one is for a suspected typo in the javax.crypto.CryptoPolicyParser
class, "AlgrithomParameterSpec". It is not unheard-of for typos to be kept and
supported for the sake of backward compatibility. Sadly, we have a number of
those in OpenJDK. Even though I performed reasonable checks, the proposed fix
should better be verified by the security folk.

The second one is for the doc comment for the java.util.stream.Stream.collect 
method.

  @apiNote
  The following will accumulate strings into an ArrayList:

  List asList = stringStream.collect(Collectors.toList());

Given that the spec for Collectors.toList() clearly says that

 ...There are no guarantees on the type, mutability, serializability, or
 thread-safety of the List returned;...

I'd assume that @apiNote should be fixed as proposed.

-Pavel

P.S. Apologies for spamming multiple mailing lists.


--
With kind regards,
Ivan Gerasimov


--
With kind regards,
Ivan Gerasimov



Re: RFR 8214245 : (regex) Case insensitive matching doesn't work correctly for some character classes

2020-03-18 Thread Ivan Gerasimov

Thank you Roger!

Pushed.

With kind regards,

Ivan


On 3/17/20 5:20 PM, Roger Riggs wrote:

Hi Ivan,

I see the CSR is approved.

I'm fine with the change.

Regards, Roger


On 2/25/20 3:18 PM, Ivan Gerasimov wrote:

Thank you Roger for reviewing CSR and the release note!


On 2/11/20 12:49 PM, Roger Riggs wrote:

Hi Ivan,

Will this have enough of a compatibility concern to warrant a CSR?
It will change the behavor of these cases.

In the RegExTest, the failures should print which case is failing. 
(Line 4961, 4990).


I agree that many testcases in RegExTest could provide better 
diagnostics in a case of a failure.


I think, it maybe done as a separate cleanup.

In the added testcase I made sure that both the input string and the 
pattern are printed upon failure.


With kind regards,

Ivan



Regards, Roger


On 2/7/20 3:05 PM, Ivan Gerasimov wrote:

Gentle ping.

I had to rebase the fix, as the code has diverged since the RFR was 
sent out 10 months ago.


Also, the test was slightly modified to cover more cases.

BUGURL: https://bugs.openjdk.java.net/browse/JDK-8214245
WEBREV: http://cr.openjdk.java.net/~igerasim/8214245/01/webrev/

Thanks in advance to the volunteer to review the fix!

With kind regards,

Ivan

On 4/21/19 7:50 PM, Ivan Gerasimov wrote:

Hello!

It turns out, that the case-insensitive j.u.regex.Pattern still 
pays attention to the characters case when certain char classes 
are used.
For example \p{IsLowerCase}, \p{IsUpperCase} and \p{IsTitleCase} 
continue to recognize only lower, upper and title case characters, 
even in case-insensitive context.


For example, for POSIX char classes this behavior contradicts this 
paragraph:

"""
9.2 Regular Expression General Requirements
...
When a standard utility or function that uses regular expressions 
specifies that pattern matching shall be performed without regard 
to the case (uppercase or lowercase) of either data or patterns, 
then when each character in the string is matched against the 
pattern, not only the character, but also its case counterpart (if 
any), shall be matched. This definition of case-insensitive 
processing is intended to allow matching of multi-character 
collating elements as well as characters, as each character in the 
string is matched using both its cases.

...
"""

I also checked how Perl is dealing with in such situation, and 
yes, it ignores the case with various \p{} classes when they are 
used in case-insensitive context, so all these tests run fine:

'A' =~ /\p{Lower}/i or die;
'a' =~ /\p{Upper}/i or die;
'A' =~ /\p{gc=Lt}/i or die; # title case
'a' =~ /\p{IsTitlecase}/i or die;
'Lj' =~ /\p{Lower}/i or die; # title-cased digraph
'lj' =~ /\p{Upper}/i or die;
'LJ' =~ /\p{Lt}/i or die;

For reference, here's a lengthy document, describing precise rules 
used by Perl to deal with \p{} char classes:
https://perldoc.perl.org/perluniprops.html#Properties-accessible-through-%5cp%7b%7d-and-%5cP%7b%7d 



So, for any Lower, Upper or Title case chars in case-insensitive 
context Perl uses set of "Cased Letters", with is just a 
combination of these three categories (aka "LC" general category).


Would you please help review the patch?

BUGURL: https://bugs.openjdk.java.net/browse/JDK-8214245
WEBREV: http://cr.openjdk.java.net/~igerasim/8214245/00/webrev/






--
With kind regards,
Ivan Gerasimov



Re: RFR: 8196334: Optimize UUID#fromString

2020-02-28 Thread Ivan Gerasimov

Sounds reasonable.

I think the current proposal is clearly a progress, so it looks good.

I agree that any further optimization can be considered within a 
separate enhancement request.


With kind regards,

Ivan

On 2/28/20 2:59 PM, Claes Redestad wrote:

Hi Ivan,

I've considered trying it out, but it gets complicated as we don't want
to penalize -XX:-UseCompactStrings.

I think adding special methods for UUID inside StringLatin1 is a step
in the wrong direction, but maybe there's something more generic we can
consider. However, short of adding String.value()+coder() access via
JavaLangAccess I'm not sure there's anything that'd net us any
significant gain. And I don't think it's worth it to open up that
particular can of worms right now just for this.

/Claes

On 2020-02-28 20:51, Ivan Gerasimov wrote:

Hi Claes and Andriy!

It looks good overall.

I wonder if it'll be even faster if the fast path were implemented in 
StringLatin1 (for compact strings only) and was called via 
SharedSecrets/JavaLangAccess?


Then, you could operate directly on bytes and avoid dealing with longs.

With kind regards,

Ivan


On 2/28/20 6:22 AM, Claes Redestad wrote:

Hi all,

please review this patch to optimize UUID#fromString.

Jon Chambers originally proposed a patch that used a strict parser to
get a similar speed-up, but I failed to adapt it in a way that
could fall back to the less strict behavior while maintaining a
reasonable speed-up in the fast-path case. Sorry, Jon!

The patch proposed here was recently contributed by Andriy Plokhotnyuk
(OCA signed), and manages to get more than a 3x speed-up on the new
fromString microbenchmark, while falling back gently to the current,
less strict implementation if ever needed. I've done some light edits,
and added a simple microbenchmark.

Bug:    https://bugs.openjdk.java.net/browse/JDK-8196334
Webrev: http://cr.openjdk.java.net/~redestad/8196334/open.00/

Testing: tier1-3

Thanks!

/Claes



--
With kind regards,
Ivan Gerasimov



Re: RFR: 8196334: Optimize UUID#fromString

2020-02-28 Thread Ivan Gerasimov

Hi Claes and Andriy!

It looks good overall.

I wonder if it'll be even faster if the fast path were implemented in 
StringLatin1 (for compact strings only) and was called via 
SharedSecrets/JavaLangAccess?


Then, you could operate directly on bytes and avoid dealing with longs.

With kind regards,

Ivan


On 2/28/20 6:22 AM, Claes Redestad wrote:

Hi all,

please review this patch to optimize UUID#fromString.

Jon Chambers originally proposed a patch that used a strict parser to
get a similar speed-up, but I failed to adapt it in a way that
could fall back to the less strict behavior while maintaining a
reasonable speed-up in the fast-path case. Sorry, Jon!

The patch proposed here was recently contributed by Andriy Plokhotnyuk
(OCA signed), and manages to get more than a 3x speed-up on the new
fromString microbenchmark, while falling back gently to the current,
less strict implementation if ever needed. I've done some light edits,
and added a simple microbenchmark.

Bug:    https://bugs.openjdk.java.net/browse/JDK-8196334
Webrev: http://cr.openjdk.java.net/~redestad/8196334/open.00/

Testing: tier1-3

Thanks!

/Claes


--
With kind regards,
Ivan Gerasimov



Re: RFR: 8240094: Optimize empty substring

2020-02-26 Thread Ivan Gerasimov

Looks good to me!

The copyright years need to be updated.

With kind regards,

Ivan


On 2/26/20 6:12 AM, Claes Redestad wrote:

Hi,

I took the liberty to file 
https://bugs.openjdk.java.net/browse/JDK-8240094 for this.


Webrev: http://cr.openjdk.java.net/~redestad/8240094/open.00/

I added a trivial micro and verified that there's a roughly 2.5x
improvement in this targeted test on my machine (~10 ns/op -> ~4ns/op),
while staying neutral on a selection of benchmarks that exercise
newString.

/Claes

On 2020-02-26 14:19, Claes Redestad wrote:

On 2020-02-26 11:01, Сергей Цыпанов wrote:

Currently we have

public static String stripLeading(byte[] value) {
   int left = indexOfNonWhitespace(value);
   if (left == value.length) {
 return "";
   }
   return (left != 0) ? newString(value, left, value.length - left) 
: null;

}

With the patch we change behaviour of this method for the case when 
value.length == 0:


public static String stripLeading(byte[] value) {
   int left = indexOfNonWhitespace(value);
   return (left != 0) ? newString(value, left, value.length - left) 
: null;

}

Unlike original method this code returns null instead of "" for 
empty array.
This does not affect caller (String.stripLeading()) i. e. visible 
behaviour

remains the same for String user, but is it OK in general?


One observable difference on the public API I think will happen here is
that new String("").stripLeading() == "" will change from true to false,
since the null return from the inner method mean we return the argument.

 From a compatibility point of view I think this should be fine, as
the identity of the returned empty string isn't specified. I don't think
a CSR is required, but bringing it up since others think otherwise.

Overall I like how the patch now cleans things up a bit on top of
(likely) optimizing a few edge cases, and can volunteer to sponsor it if
no one else has spoken up already.

Doing some quick performance testing and possibly adding a
microbenchmark before push would be good.

Thanks!

/Claes


--
With kind regards,
Ivan Gerasimov



Re: RFR 8214245 : (regex) Case insensitive matching doesn't work correctly for some character classes

2020-02-25 Thread Ivan Gerasimov

Thank you Roger for reviewing CSR and the release note!


On 2/11/20 12:49 PM, Roger Riggs wrote:

Hi Ivan,

Will this have enough of a compatibility concern to warrant a CSR?
It will change the behavor of these cases.

In the RegExTest, the failures should print which case is failing. 
(Line 4961, 4990).


I agree that many testcases in RegExTest could provide better 
diagnostics in a case of a failure.


I think, it maybe done as a separate cleanup.

In the added testcase I made sure that both the input string and the 
pattern are printed upon failure.


With kind regards,

Ivan



Regards, Roger


On 2/7/20 3:05 PM, Ivan Gerasimov wrote:

Gentle ping.

I had to rebase the fix, as the code has diverged since the RFR was 
sent out 10 months ago.


Also, the test was slightly modified to cover more cases.

BUGURL: https://bugs.openjdk.java.net/browse/JDK-8214245
WEBREV: http://cr.openjdk.java.net/~igerasim/8214245/01/webrev/

Thanks in advance to the volunteer to review the fix!

With kind regards,

Ivan

On 4/21/19 7:50 PM, Ivan Gerasimov wrote:

Hello!

It turns out, that the case-insensitive j.u.regex.Pattern still pays 
attention to the characters case when certain char classes are used.
For example \p{IsLowerCase}, \p{IsUpperCase} and \p{IsTitleCase} 
continue to recognize only lower, upper and title case characters, 
even in case-insensitive context.


For example, for POSIX char classes this behavior contradicts this 
paragraph:

"""
9.2 Regular Expression General Requirements
...
When a standard utility or function that uses regular expressions 
specifies that pattern matching shall be performed without regard to 
the case (uppercase or lowercase) of either data or patterns, then 
when each character in the string is matched against the pattern, 
not only the character, but also its case counterpart (if any), 
shall be matched. This definition of case-insensitive processing is 
intended to allow matching of multi-character collating elements as 
well as characters, as each character in the string is matched using 
both its cases.

...
"""

I also checked how Perl is dealing with in such situation, and yes, 
it ignores the case with various \p{} classes when they are used in 
case-insensitive context, so all these tests run fine:

'A' =~ /\p{Lower}/i or die;
'a' =~ /\p{Upper}/i or die;
'A' =~ /\p{gc=Lt}/i or die; # title case
'a' =~ /\p{IsTitlecase}/i or die;
'Lj' =~ /\p{Lower}/i or die; # title-cased digraph
'lj' =~ /\p{Upper}/i or die;
'LJ' =~ /\p{Lt}/i or die;

For reference, here's a lengthy document, describing precise rules 
used by Perl to deal with \p{} char classes:
https://perldoc.perl.org/perluniprops.html#Properties-accessible-through-%5cp%7b%7d-and-%5cP%7b%7d 



So, for any Lower, Upper or Title case chars in case-insensitive 
context Perl uses set of "Cased Letters", with is just a combination 
of these three categories (aka "LC" general category).


Would you please help review the patch?

BUGURL: https://bugs.openjdk.java.net/browse/JDK-8214245
WEBREV: http://cr.openjdk.java.net/~igerasim/8214245/00/webrev/




--
With kind regards,
Ivan Gerasimov



Re: [TRIVIAL] Fast-path for String.subsring(n,n)

2020-02-25 Thread Ivan Gerasimov

Hi Sergei!

Wouldn't it make sense to incorporate this fast path into 
StringUTF16.newString/StringLatin1.newString?


This way other callers of these methods will also benefit from it.

In particular, StringLatin1/StringUTF16.stripLeading(), 
StringLatin1/StringUTF16.stripTrailing() and maybe others could be 
slightly simplified.


With kind regards,

Ivan


On 2/25/20 7:42 AM, Сергей Цыпанов wrote:

Hello,

current implementation of String.substring(int,int) has a fast-path for the 
case when beginIndex = 0 and endIndex = length.

I think there should be similar fast-path for the case beginIndex = endIndex 
(asuming both are valid):


--
With kind regards,
Ivan Gerasimov



Re: RFR 8214245 : (regex) Case insensitive matching doesn't work correctly for some character classes

2020-02-24 Thread Ivan Gerasimov

Thank you Roger and Joe for the feedback!

May I please ask you to review the CSR draft [1] and the Release Notes 
[2] for this issue:


[1] https://bugs.openjdk.java.net/browse/JDK-8238984

[2] https://bugs.openjdk.java.net/browse/JDK-8239887

Thanks in advance!

Ivan


On 2/11/20 3:10 PM, Joe Darcy wrote:

Hello,

Yes, I believe this change should have a CSR, and most likely a 
release note.


Thanks,

-Joe

On 2/11/2020 12:49 PM, Roger Riggs wrote:

Hi Ivan,

Will this have enough of a compatibility concern to warrant a CSR?
It will change the behavor of these cases.

In the RegExTest, the failures should print which case is failing. 
(Line 4961, 4990).


Regards, Roger


On 2/7/20 3:05 PM, Ivan Gerasimov wrote:

Gentle ping.

I had to rebase the fix, as the code has diverged since the RFR was 
sent out 10 months ago.


Also, the test was slightly modified to cover more cases.

BUGURL: https://bugs.openjdk.java.net/browse/JDK-8214245
WEBREV: http://cr.openjdk.java.net/~igerasim/8214245/01/webrev/

Thanks in advance to the volunteer to review the fix!

With kind regards,

Ivan

On 4/21/19 7:50 PM, Ivan Gerasimov wrote:

Hello!

It turns out, that the case-insensitive j.u.regex.Pattern still 
pays attention to the characters case when certain char classes are 
used.
For example \p{IsLowerCase}, \p{IsUpperCase} and \p{IsTitleCase} 
continue to recognize only lower, upper and title case characters, 
even in case-insensitive context.


For example, for POSIX char classes this behavior contradicts this 
paragraph:

"""
9.2 Regular Expression General Requirements
...
When a standard utility or function that uses regular expressions 
specifies that pattern matching shall be performed without regard 
to the case (uppercase or lowercase) of either data or patterns, 
then when each character in the string is matched against the 
pattern, not only the character, but also its case counterpart (if 
any), shall be matched. This definition of case-insensitive 
processing is intended to allow matching of multi-character 
collating elements as well as characters, as each character in the 
string is matched using both its cases.

...
"""

I also checked how Perl is dealing with in such situation, and yes, 
it ignores the case with various \p{} classes when they are used in 
case-insensitive context, so all these tests run fine:

'A' =~ /\p{Lower}/i or die;
'a' =~ /\p{Upper}/i or die;
'A' =~ /\p{gc=Lt}/i or die; # title case
'a' =~ /\p{IsTitlecase}/i or die;
'Lj' =~ /\p{Lower}/i or die; # title-cased digraph
'lj' =~ /\p{Upper}/i or die;
'LJ' =~ /\p{Lt}/i or die;

For reference, here's a lengthy document, describing precise rules 
used by Perl to deal with \p{} char classes:
https://perldoc.perl.org/perluniprops.html#Properties-accessible-through-%5cp%7b%7d-and-%5cP%7b%7d 



So, for any Lower, Upper or Title case chars in case-insensitive 
context Perl uses set of "Cased Letters", with is just a 
combination of these three categories (aka "LC" general category).


Would you please help review the patch?

BUGURL: https://bugs.openjdk.java.net/browse/JDK-8214245
WEBREV: http://cr.openjdk.java.net/~igerasim/8214245/00/webrev/




--
With kind regards,
Ivan Gerasimov



Re: RFR 8235812 : (regex) Unicode linebreak with quantifier does not match valid input

2020-02-10 Thread Ivan Gerasimov

Thank you Roger for review!

I've adjusted the test as you suggested and pushed the fix.

With kind regards,
Ivan

On 2/10/20 1:11 PM, Roger Riggs wrote:

Hi Ivan,

This look fine.

In the test TegExTest: 5074, I would output the failed cases to 
System.err.

That way they get properly interleaved with the test progress output.

No need for another review.

Thanks, Roger



On 2/5/20 8:22 PM, Ivan Gerasimov wrote:

Hello!

j.u.regex.Pattern supports a special char class \R, which is 
specified to be equal to 
\u000D\u000A|[\u000A\u000B\u000C\u000D\u0085\u2028\u2029].


In particular, this means that the input "\r\n" must match to both 
patterns \R and \R\R.


(In the later case, first \R matches \r and second \R matches \n.)

A pattern \R{2} is expected to be equal to \R\R.

However with the current implementation this does not hold (so, 
Pattern.matches("\\R{2}", "\r\n") == false, while 
Pattern.matches("\\R\\R", "\r\n") == true).


The root cause of this bug is that the special char class \R is 
handled via dedicated class LineEnding, which is not able to 
correctly handle backtracking in  presence of quantifiers).


A simple solution is to treat \R with quantifiers as an anonymous 
group, which will make it comply with the specification.


Without quantifiers, \R is still handled via more efficient 
implementation of LineEnding.


Would you please help review the fix?

Some minor cleanup was done along the way in the affected code.

BUGURL: https://bugs.openjdk.java.net/browse/JDK-8235812
WEBREV: http://cr.openjdk.java.net/~igerasim/8235812/00/webrev/

Control build and testing (tiers1-4) are all green.




--
With kind regards,
Ivan Gerasimov



Re: RFR 8214245 : (regex) Case insensitive matching doesn't work correctly for some character classes

2020-02-07 Thread Ivan Gerasimov

Gentle ping.

I had to rebase the fix, as the code has diverged since the RFR was sent 
out 10 months ago.


Also, the test was slightly modified to cover more cases.

BUGURL: https://bugs.openjdk.java.net/browse/JDK-8214245
WEBREV: http://cr.openjdk.java.net/~igerasim/8214245/01/webrev/

Thanks in advance to the volunteer to review the fix!

With kind regards,

Ivan

On 4/21/19 7:50 PM, Ivan Gerasimov wrote:

Hello!

It turns out, that the case-insensitive j.u.regex.Pattern still pays 
attention to the characters case when certain char classes are used.
For example \p{IsLowerCase}, \p{IsUpperCase} and \p{IsTitleCase} 
continue to recognize only lower, upper and title case characters, 
even in case-insensitive context.


For example, for POSIX char classes this behavior contradicts this 
paragraph:

"""
9.2 Regular Expression General Requirements
...
When a standard utility or function that uses regular expressions 
specifies that pattern matching shall be performed without regard to 
the case (uppercase or lowercase) of either data or patterns, then 
when each character in the string is matched against the pattern, not 
only the character, but also its case counterpart (if any), shall be 
matched. This definition of case-insensitive processing is intended to 
allow matching of multi-character collating elements as well as 
characters, as each character in the string is matched using both its 
cases.

...
"""

I also checked how Perl is dealing with in such situation, and yes, it 
ignores the case with various \p{} classes when they are used in 
case-insensitive context, so all these tests run fine:

'A' =~ /\p{Lower}/i or die;
'a' =~ /\p{Upper}/i or die;
'A' =~ /\p{gc=Lt}/i or die; # title case
'a' =~ /\p{IsTitlecase}/i or die;
'Lj' =~ /\p{Lower}/i or die; # title-cased digraph
'lj' =~ /\p{Upper}/i or die;
'LJ' =~ /\p{Lt}/i or die;

For reference, here's a lengthy document, describing precise rules 
used by Perl to deal with \p{} char classes:
https://perldoc.perl.org/perluniprops.html#Properties-accessible-through-%5cp%7b%7d-and-%5cP%7b%7d 



So, for any Lower, Upper or Title case chars in case-insensitive 
context Perl uses set of "Cased Letters", with is just a combination 
of these three categories (aka "LC" general category).


Would you please help review the patch?

BUGURL: https://bugs.openjdk.java.net/browse/JDK-8214245
WEBREV: http://cr.openjdk.java.net/~igerasim/8214245/00/webrev/


--
With kind regards,
Ivan Gerasimov



RFR 8235812 : (regex) Unicode linebreak with quantifier does not match valid input

2020-02-05 Thread Ivan Gerasimov

Hello!

j.u.regex.Pattern supports a special char class \R, which is specified 
to be equal to \u000D\u000A|[\u000A\u000B\u000C\u000D\u0085\u2028\u2029].


In particular, this means that the input "\r\n" must match to both 
patterns \R and \R\R.


(In the later case, first \R matches \r and second \R matches \n.)

A pattern \R{2} is expected to be equal to \R\R.

However with the current implementation this does not hold (so, 
Pattern.matches("\\R{2}", "\r\n") == false, while 
Pattern.matches("\\R\\R", "\r\n") == true).


The root cause of this bug is that the special char class \R is handled 
via dedicated class LineEnding, which is not able to correctly handle 
backtracking in  presence of quantifiers).


A simple solution is to treat \R with quantifiers as an anonymous group, 
which will make it comply with the specification.


Without quantifiers, \R is still handled via more efficient 
implementation of LineEnding.


Would you please help review the fix?

Some minor cleanup was done along the way in the affected code.

BUGURL: https://bugs.openjdk.java.net/browse/JDK-8235812
WEBREV: http://cr.openjdk.java.net/~igerasim/8235812/00/webrev/

Control build and testing (tiers1-4) are all green.

--
With kind regards,
Ivan Gerasimov



RFR 8237599 : Greedy matching against supplementary chars fails to respect the region

2020-01-22 Thread Ivan Gerasimov

Hello everyone!

When the input of a j.u.regex.Matcher is restricted with .region() 
method, it can possibly cut off a half of a surrogate pair.


It turns out that greedy matching implemented in the 
Pattern.CharPropertyGreedy class fails to recognize this edge case in 
two scenarios:


1) When it greedily consumes the input and meets a higher half of a 
surrogate pair that was cut off at the end of input, and


2) When it backs off and meets a lower half of a surrogate pair at the 
very beginning of input.


In both cases, the engine reads the entire codepoint, crossing the 
boundaries of the set region.


Instead, it should only read the half of the surrogate pair that lies 
inside the region and ignore the other half.


Would you please help review the fix?

BUGURL: https://bugs.openjdk.java.net/browse/JDK-8237599
WEBREV: http://cr.openjdk.java.net/~igerasim/8237599/00/webrev/

Thanks in advance!

--
With kind regards,
Ivan Gerasimov



Re: RFR 8236034 : Use optimized Ques node for curly {0,1} quantifier

2020-01-22 Thread Ivan Gerasimov

Thank you Roger for review!

Pushed.

--
With kind regards,
Ivan Gerasimov



Re: RFR 8234423 : Modifying ArrayList.subList().subList() resets modCount of subList

2020-01-09 Thread Ivan Gerasimov

Thank you Roger for your review!

Happy New Year to you too! :)

With kind regards,

Ivan

On 1/9/20 10:36 AM, Roger Riggs wrote:

Hi Ivan,

Happy New Year!

This change looks fine.

Roger


On 11/23/19 2:30 AM, Ivan Gerasimov wrote:
Re-sending the request with the correct Subject line - the bug is not 
(only) about the iterator.



On 11/20/19 10:11 PM, Ivan Gerasimov wrote:

Hello!

When a sublist of a sublist of an ArrayList is created, its modCount 
is initialized from the ArrayList root, and not from its immediate 
parent.


This means that modifying that 2nd level sublist will reset 
modCounts of the entire chain up to the root, and consequently the 
1st level sublist won't detect concurrent modification done to the 
ArrayList root.


Along with the fix, handling of modCount is slightly improved in a 
few other places.


Would you please help review the fix?

BUGURL: https://bugs.openjdk.java.net/browse/JDK-8234423
WEBREV: http://cr.openjdk.java.net/~igerasim/8234423/00/webrev/




--
With kind regards,
Ivan Gerasimov



Re: Use single character replace variant in Resources.toPackageName(String)

2020-01-09 Thread Ivan Gerasimov

FYI.  I changed the year to 2020 and pushed the fix.

Thank you Christoph and Alan!

On 1/6/20 11:49 PM, Alan Bateman wrote:

On 07/01/2020 07:09, Ivan Gerasimov wrote:

So, I filed a Jira bug:

BUGURL: https://bugs.openjdk.java.net/browse/JDK-8236705
WEBREV: http://cr.openjdk.java.net/~igerasim/8236705/00/webrev/

I added these source files in JDK 9, it was just an oversight that I 
didn't use the single char replace. My guess is that the change 
ModulePath deriveModuleDescriptor won't be noticeable in any real 
benchmark. This method is deriving a module descriptor for a legacy 
JAR file on the module path so it's scanning the contents of the JAR 
file and all attributes in its main manifest. The Resources 
toPackageName method is used more widely so it may help in a few 
cases. Anyway, this change looks fine except the copyright year 
(you've changed it to 2019 rather than 2020).


-Alan


--
With kind regards,
Ivan Gerasimov



Re: Use single character replace variant in Resources.toPackageName(String)

2020-01-06 Thread Ivan Gerasimov

So, I filed a Jira bug:

BUGURL: https://bugs.openjdk.java.net/browse/JDK-8236705
WEBREV: http://cr.openjdk.java.net/~igerasim/8236705/00/webrev/

If we agree on the content of the fix, I can push it after a sanity build.

With kind regards,

Ivan


On 1/6/20 10:41 PM, Ivan Gerasimov wrote:

Hi Christoph!

First, the same optimization can be done in 
src/java.base/share/classes/jdk/internal/module/ModulePath.java:

    mainClass = mainClass.replace("/", ".");

Second, I wonder what was the JDK version you ran your benchmark on?

After the fix for JDK-8222955 the method replace(CharSequence, 
CharSequence) now detects one-char arguments as a special case.


Thus, I think, more work should actually reduce the difference in 
performance between two versions of replace.


Still, I think this optimization is worthwhile.  I can sponsor it for 
you.


With kind regards,

Ivan


On 1/6/20 9:33 AM, Christoph Dreis wrote:

Hi and a happy new year,

I recently noticed that 
jdk.internal.module.Resources.toPackageName(String) makes use of 
String.replace(CharSequence, CharSequence) while it could use the 
single char variant in my opinion:


diff --git 
a/src/java.base/share/classes/jdk/internal/module/Resources.java 
b/src/java.base/share/classes/jdk/internal/module/Resources.java

--- a/src/java.base/share/classes/jdk/internal/module/Resources.java
+++ b/src/java.base/share/classes/jdk/internal/module/Resources.java
@@ -64,7 +64,7 @@
  if (index == -1 || index == name.length()-1) {
  return "";
  } else {
-    return name.substring(0, index).replace("/", ".");
+    return name.substring(0, index).replace('/', '.');
  }
  }

I ran an isolated benchmark with some test data on it, which shows 
the following results


Benchmark (param) Mode  Cnt Score Error   Units
MyBenchmark.testNew java/lang  avgt  10    14,905 ±   0,130   ns/op
MyBenchmark.testNew:·gc.alloc.rate.norm    java/lang  avgt 10    
48,000 ±   0,001    B/op

MyBenchmark.testNew a/b/c/d/e  avgt  10    23,122 ±   0,389   ns/op
MyBenchmark.testNew:·gc.alloc.rate.norm   a/b/c/d/e  avgt 10    
96,000 ±   0,001    B/op

MyBenchmark.testOld java/lang  avgt   10    16,614 ±   0,420   ns/op
MyBenchmark.testOld:·gc.alloc.rate.norm java/lang avgt   
10    48,000 ±   0,001    B/op

MyBenchmark.testOld a/b/c/d/e  avgt   10    84,745 ±   1,329   ns/op
MyBenchmark.testOld:·gc.alloc.rate.norm    a/b/c/d/e  avgt 10   
120,000 ±   0,001    B/op


As you can see the more replacing needs to be done the better the 
newer version seems to be perform.


In case this is considered worthwhile I would need someone to sponsor 
the patch. If not, I'm sorry for the noise.


Cheers,
Christoph Dreis




--
With kind regards,
Ivan Gerasimov



Re: Use single character replace variant in Resources.toPackageName(String)

2020-01-06 Thread Ivan Gerasimov

Hi Christoph!

First, the same optimization can be done in 
src/java.base/share/classes/jdk/internal/module/ModulePath.java:

    mainClass = mainClass.replace("/", ".");

Second, I wonder what was the JDK version you ran your benchmark on?

After the fix for JDK-8222955 the method replace(CharSequence, 
CharSequence) now detects one-char arguments as a special case.


Thus, I think, more work should actually reduce the difference in 
performance between two versions of replace.


Still, I think this optimization is worthwhile.  I can sponsor it for you.

With kind regards,

Ivan


On 1/6/20 9:33 AM, Christoph Dreis wrote:

Hi and a happy new year,

I recently noticed that jdk.internal.module.Resources.toPackageName(String) 
makes use of String.replace(CharSequence, CharSequence) while it could use the 
single char variant in my opinion:

diff --git a/src/java.base/share/classes/jdk/internal/module/Resources.java 
b/src/java.base/share/classes/jdk/internal/module/Resources.java
--- a/src/java.base/share/classes/jdk/internal/module/Resources.java
+++ b/src/java.base/share/classes/jdk/internal/module/Resources.java
@@ -64,7 +64,7 @@
  if (index == -1 || index == name.length()-1) {
  return "";
  } else {
-return name.substring(0, index).replace("/", ".");
+return name.substring(0, index).replace('/', '.');
  }
  }

I ran an isolated benchmark with some test data on it, which shows the 
following results

Benchmark(param)
 Mode  Cnt Score Error   Units
MyBenchmark.testNew   java/lang  avgt  
1014,905 ±   0,130   ns/op
MyBenchmark.testNew:·gc.alloc.rate.normjava/lang  avgt  1048,000 ±  
 0,001B/op
MyBenchmark.testNew  a/b/c/d/e  avgt  
1023,122 ±   0,389   ns/op
MyBenchmark.testNew:·gc.alloc.rate.norm   a/b/c/d/e  avgt  1096,000 ±   
0,001B/op
MyBenchmark.testOldjava/lang  avgt  
 1016,614 ±   0,420   ns/op
MyBenchmark.testOld:·gc.alloc.rate.norm java/lang  avgt   1048,000 
±   0,001B/op
MyBenchmark.testOld   a/b/c/d/e  avgt   
1084,745 ±   1,329   ns/op
MyBenchmark.testOld:·gc.alloc.rate.norma/b/c/d/e  avgt   10   120,000 ± 
  0,001B/op

As you can see the more replacing needs to be done the better the newer version 
seems to be perform.

In case this is considered worthwhile I would need someone to sponsor the 
patch. If not, I'm sorry for the noise.

Cheers,
Christoph Dreis




--
With kind regards,
Ivan Gerasimov



Re: RFR 8225466 : Optimize matching BMP Slice nodes

2019-12-18 Thread Ivan Gerasimov

Hi Roger.

Thank you for taking a look!

The variant with a single loop was the first thing I tried (should have 
mentioned that in the review request).


Unfortunately, that showed performance decrease.

I suspect that hitting the end of the input should be a less common 
thing, that's why it it beneficial to separate it as a slow path.


With kind regards,

Ivan


On 12/18/19 7:48 AM, Roger Riggs wrote:

Hi Ivan,

Though the new code has a good effect, the asymmetry and duplication 
seems unnecessary.
Can it be structured to have a single copy of the loop comparing the 
available range

and still get the desired performance improvement.

Like:

boolean match(Matcher matcher,int i, CharSequence seq) {
    int[] buf =buffer;
    int len = buf.length;
    for (int j =0; j < Math.min(len, matcher.to); j++) {
    if (buf[j] != seq.charAt(i+j))
    return false;
    }
    if (len >= matcher.to) {
    matcher.hitEnd =true;
    return false;
    }
    return next.match(matcher, i+len, seq);
}

Regards, Roger


On 10/28/19 9:03 PM, Ivan Gerasimov wrote:

Hello!

When building a Pattern object, the regex parser recognizes "slices" 
- continuous char subsequences, which all have to be matched 
case-sensitively/case-insensitively.  Matching with such a slice is 
implemented as a simple loop over a portion of the input.


In the current implementation, on each iteration of the loop it is 
checked if we have hit the end of the input (which is an uncommon case).


This check can be done only once, before the loop, which will make 
the loop lighter.


Benchmark shows up to +4% to the throughput for the case-insensitive 
matching.


Would you please help review the enhancement?

BUGURL: https://bugs.openjdk.java.net/browse/JDK-8225466
WEBREV: http://cr.openjdk.java.net/~igerasim/8225466/00/webrev/


--- benchmark results ---

UNFIXED
Benchmark    Mode  Cnt    Score   Error  Units
PatternBench.sliceIFind  avgt   16  190.612 ? 0.336  ns/op

FIXED
Benchmark    Mode  Cnt    Score   Error  Units
PatternBench.sliceIFind  avgt   16  182.954 ? 0.493  ns/op
---




--
With kind regards,
Ivan Gerasimov



Re: RFR 8234423 : Modifying ArrayList.subList().subList() resets modCount of subList

2019-12-16 Thread Ivan Gerasimov

Gentle ping.

Will someone please volunteer to review this fix of a regression?

Thanks in advance!



On 11/20/19 10:11 PM, Ivan Gerasimov wrote:

Hello!

When a sublist of a sublist of an ArrayList is created, its modCount 
is initialized from the ArrayList root, and not from its immediate 
parent.


This means that modifying that 2nd level sublist will reset modCounts 
of the entire chain up to the root, and consequently the 1st level 
sublist won't detect concurrent modification done to the ArrayList root.


Along with the fix, handling of modCount is slightly improved in a 
few other places.


Would you please help review the fix?

BUGURL: https://bugs.openjdk.java.net/browse/JDK-8234423
WEBREV: http://cr.openjdk.java.net/~igerasim/8234423/00/webrev/


--
With kind regards,
Ivan Gerasimov



RFR 8236034 : Use optimized Ques node for curly {0,1} quantifier

2019-12-16 Thread Ivan Gerasimov

Hello!

In the regular expressions, the quantifier ? can be equally written as 
{0,1}.


While for the former variant an optimized algorithm is used (coded in 
the Pattern.Ques class), the later variant uses more general and less 
efficient implementation (coded in the Pattern.Curly class).


It would be beneficial, if we unify the implementation for these two 
variants of quantifier and make them both use Ques node.


Also, I couldn't resist to do some refactoring in the affected potion of 
the code to eliminate the repetition of logics.


The regression test was modified to exercise both forms of quantifiers 
equally well.


Would you please help review this enhancement?

BUGURL: https://bugs.openjdk.java.net/browse/JDK-8236034
WEBREV: http://cr.openjdk.java.net/~igerasim/8236034/00/webrev/

Thanks in advance!

--
With kind regards,
Ivan Gerasimov



Re: RFR 8235930 : java.util.regex.PrintPattern does not print a link to the next node

2019-12-15 Thread Ivan Gerasimov

Thank you Martin!

Pushed.

On 12/14/19 10:45 PM, Martin Buchholz wrote:

Looks Good To Me.



--
With kind regards,
Ivan Gerasimov



Re: RFR 8235930 : java.util.regex.PrintPattern does not print a link to the next node

2019-12-14 Thread Ivan Gerasimov

Thank you Martin for looking at this!

On 12/14/19 6:58 AM, Martin Buchholz wrote:

Hi Ivan,

Did you mean to change from System.err to System.out?

Yes.  The first string that this utility prints in main() goes to 
System.out.

Not sure why the rest would need to go to another stream.

I might have fixed the overloading by giving one of the overloaded 
methods a better name.  How about

private static void printIndented(String s, int depth)


Now, after removing the overload `print(String fmt, Object ... args)` 
(which was just an alias of printf(String fmt, Object ... args)), we 
have two print() methods, and both of them indent the text.


I am not planning to invest much into this utility, just wanted to fix 
that annoying glitch.


With kind regards,
Ivan




On Fri, Dec 13, 2019 at 12:11 PM Ivan Gerasimov 
mailto:ivan.gerasi...@oracle.com>> wrote:


Hello!

The java.util.regex package contains a debugging utility PrintPattern
which is used to display the internal representation structure of a
compiled regex pattern.

When it prints a node that is linked to another node that has already
been printed, it is supposed to print a link, but fails.

The error is due to ambigous overloading of print() method.

Would you please help review a trivial fix?

No regression test because the utility is not something officially
supported.

BUGURL: https://bugs.openjdk.java.net/browse/JDK-8235930
WEBREV: http://cr.openjdk.java.net/~igerasim/8235930/00/webrev/

-- 
With kind regards,

    Ivan Gerasimov


--
With kind regards,
Ivan Gerasimov



RFR 8235930 : java.util.regex.PrintPattern does not print a link to the next node

2019-12-13 Thread Ivan Gerasimov

Hello!

The java.util.regex package contains a debugging utility PrintPattern 
which is used to display the internal representation structure of a 
compiled regex pattern.


When it prints a node that is linked to another node that has already 
been printed, it is supposed to print a link, but fails.


The error is due to ambigous overloading of print() method.

Would you please help review a trivial fix?

No regression test because the utility is not something officially 
supported.


BUGURL: https://bugs.openjdk.java.net/browse/JDK-8235930
WEBREV: http://cr.openjdk.java.net/~igerasim/8235930/00/webrev/

--
With kind regards,
Ivan Gerasimov



Re: RFR 8234147 : Avoid looking up standard charsets in core libraries

2019-12-01 Thread Ivan Gerasimov

Hi Ulf!

Thank you for the links!

The enhancement requests that you pointed out are about making the 
lookup faster.


My current proposal, however, is to completely avoid the lookup of the 
standard charsets in certain specific cases.


With kind regards,

Ivan

On 11/30/19 1:32 PM, Ulf Zibis wrote:

Hi Ivan,

you may compare your proposal with:
https://bugs.java.com/bugdatabase/view_bug.do?bug_id=6790402
https://bugs.java.com/bugdatabase/view_bug.do?bug_id=6850361
https://bugs.java.com/bugdatabase/view_bug.do?bug_id=6796087

-Ulf

Am 27.11.19 um 05:39 schrieb Ivan Gerasimov:

Hello!

It is a cleanup fix with mostly two kinds of changes:
- when a standard charset is specified by its name, use a 
preinitialized Charset constant instead,
- replace the usage of StandardCharset.* constants with their 
sun.nio.cs.* equivalents to avoid accidental early initialization of 
rarely-used charsets.


BUGURL: https://bugs.openjdk.java.net/browse/JDK-8234147
WEBREV: http://cr.openjdk.java.net/~igerasim/8234147/00/webrev/

I also had to modify one regression test that relied on a private 
auxiliary method, which was removed with the fix.


Mach5 control build looks green.

Would you please help review the fix?


--
With kind regards,
Ivan Gerasimov



Re: RFR 8234147 : Avoid looking up standard charsets in core libraries

2019-12-01 Thread Ivan Gerasimov

Thank you Alan for the further review!

On 11/27/19 12:53 PM, Alan Bateman wrote:

On 27/11/2019 11:52, Ivan Gerasimov wrote:

:

It's not clear how to distinguish the classes inside the java.base 
module that do not have to depend on sun.nio.cs.  If you feel strong 
about NTLM and XML, I can replace sun.nio.cs.* instances with 
corresponding java.charset.StandardCharsets.* constants in these 
classes.
There isn't any way to say what is core and non-core in java.base so 
you have to use your judgement. My personal view is that the NTLM, 
XML, SOCKS and several others in this patch should stick to the 
standard APIs if possible as their performance will likely be 
dominated by other factors.



Personally, I think that using constants sun.nio.cs.xxx.INSTANCE is not 
too bad even in the code that is unlikely to be executed during the VM 
startup.  One small advantage is that if the code is copy/pasted within 
the java.base module, it will not bring the risk of early initialization 
of StandardCharsets.


With NTLM, however, switching to StandardCharsets allows to remove 
sun.nio.cs.UTF_16LE.INSTANCE and all other corresponding modifications.


So, I used StandardCharsets in NTLM (and in XML and SOCKS, as you 
suggested), and left sun.nio.cs constants in all other places.


Here's the updated webrev:

http://cr.openjdk.java.net/~igerasim/8234147/02/webrev/

It builds fine, tests run fine.

--
With kind regards,
Ivan Gerasimov



Re: RFR 8234147 : Avoid looking up standard charsets in core libraries

2019-11-27 Thread Ivan Gerasimov

Thank you Alan for reviewing!

Please see the comments inline.

On 11/26/19 11:54 PM, Alan Bateman wrote:

On 27/11/2019 04:39, Ivan Gerasimov wrote:

Hello!

It is a cleanup fix with mostly two kinds of changes:
- when a standard charset is specified by its name, use a 
preinitialized Charset constant instead,
- replace the usage of StandardCharset.* constants with their 
sun.nio.cs.* equivalents to avoid accidental early initialization of 
rarely-used charsets.


BUGURL: https://bugs.openjdk.java.net/browse/JDK-8234147
WEBREV: http://cr.openjdk.java.net/~igerasim/8234147/00/webrev/

I also had to modify one regression test that relied on a private 
auxiliary method, which was removed with the fix.


Mach5 control build looks green.

Would you please help review the fix?
The motivation is okay and most of the changes are okay but it does 
create a needless dependency on sun.nio.* classes in places that are 
unlikely (or impossible) to use during early startup. I think the 
change to the JDBC CachedRow implementation, prefs, and the 
jdk.net.httpserver module should be dropped as want fewer classes 
outside of java.base depending on java.base internals, not more. I 
also wonder about the NTLM, XML properties and a few more that 
shouldn't be dependency on sun.nio.* classes if possible.


Please note that in fact sun.nio.cs.* was not used in java.net.http, 
java.prefs and java.sql.rowset.  I only used those sun.nio constants 
inside the java.base module, exactly for the reason of avoiding 
additional dependencies.


It's not clear how to distinguish the classes inside the java.base 
module that do not have to depend on sun.nio.cs.  If you feel strong 
about NTLM and XML, I can replace sun.nio.cs.* instances with 
corresponding java.charset.StandardCharsets.* constants in these classes.





Minot nit but if this is pushed then it would be good to keep the 
imports consistent with the existing style where you can, e.g. I see 
several files where the import sun.nio.* is put at the top of source 
files that already group imports of JDK internal classes together 
further down.


Thanks, done.  Unfortunately, imports conventions are not very 
consistent across the JDK code, so I mostly tried to preserve the 
pre-existing order of imports in each file.


Mandy might want to comment on TrySetAccessibleTest. If I'm not 
mistaken it should say "non-exported" rather than "exported". Also I 
think we want this test to have a dependency on a few JDK internal and 
private methods as possible as it's a maintenance issue to keep it up 
to date (as you've experienced here with the removal of Perf.getBytes).


If I got it correct, the test was meant to access a private static 
method, and after removing Perf.getBytes there were no such methods left 
in the Perf class.  That's why I had to pick some other method to test, 
so I chose ModulePath.packageName instead.


Here's the updated webrev with reorganized imports:

http://cr.openjdk.java.net/~igerasim/8234147/01/webrev/

Thanks again!

--
With kind regards,
Ivan Gerasimov



RFR 8234147 : Avoid looking up standard charsets in core libraries

2019-11-26 Thread Ivan Gerasimov

Hello!

It is a cleanup fix with mostly two kinds of changes:
- when a standard charset is specified by its name, use a preinitialized 
Charset constant instead,
- replace the usage of StandardCharset.* constants with their 
sun.nio.cs.* equivalents to avoid accidental early initialization of 
rarely-used charsets.


BUGURL: https://bugs.openjdk.java.net/browse/JDK-8234147
WEBREV: http://cr.openjdk.java.net/~igerasim/8234147/00/webrev/

I also had to modify one regression test that relied on a private 
auxiliary method, which was removed with the fix.


Mach5 control build looks green.

Would you please help review the fix?

--
With kind regards,
Ivan Gerasimov



Re: RFR 8234423 : Modifying ArrayList.subList().subList() resets modCount of subList

2019-11-22 Thread Ivan Gerasimov
Re-sending the request with the correct Subject line - the bug is not 
(only) about the iterator.



On 11/20/19 10:11 PM, Ivan Gerasimov wrote:

Hello!

When a sublist of a sublist of an ArrayList is created, its modCount 
is initialized from the ArrayList root, and not from its immediate 
parent.


This means that modifying that 2nd level sublist will reset modCounts 
of the entire chain up to the root, and consequently the 1st level 
sublist won't detect concurrent modification done to the ArrayList root.


Along with the fix, handling of modCount is slightly improved in a few 
other places.


Would you please help review the fix?

BUGURL: https://bugs.openjdk.java.net/browse/JDK-8234423
WEBREV: http://cr.openjdk.java.net/~igerasim/8234423/00/webrev/


--
With kind regards,
Ivan Gerasimov



RFR 8234423 : More accurate handling of modification counter in ArrayList.SubList.Iterator

2019-11-20 Thread Ivan Gerasimov

Hello!

When a sublist of a sublist of an ArrayList is created, its modCount is 
initialized from the ArrayList root, and not from its immediate parent.


This means that modifying that 2nd level sublist will reset modCounts of 
the entire chain up to the root, and consequently the 1st level sublist 
won't detect concurrent modification done to the ArrayList root.


Along with the fix, handling of modCount is slightly improved in a few 
other places.


Would you please help review the fix?

BUGURL: https://bugs.openjdk.java.net/browse/JDK-8234423
WEBREV: http://cr.openjdk.java.net/~igerasim/8234423/00/webrev/

--
With kind regards,
Ivan Gerasimov



Re: RFR (XXS) 8233658 : Escape + in the expression describing Runtime.Version string

2019-11-06 Thread Ivan Gerasimov

Thank you Naoto!

Pushed.

On 11/6/19 8:24 AM, naoto.s...@oracle.com wrote:

Hi Ivan,

Looks good to me.

Naoto

On 11/5/19 9:39 PM, Ivan Gerasimov wrote:

Hello!

Here's yet another javadoc-only fix.

Format of the Runtime.Version string is described as a list of 
regular expressions [1]:


  $VNUM(-$PRE)?\+$BUILD(-$OPT)?
  $VNUM-$PRE(-$OPT)?
  $VNUM(+-$OPT)?

The character + in the last line should be escaped to be consistent 
with the first line and to avoid confusion with RE quantifiers.


[1] 
https://docs.oracle.com/en/java/javase/13/docs/api/java.base/java/lang/Runtime.Version.html 



Would you please help review this trivial one-char fix?

BUGURL: https://bugs.openjdk.java.net/browse/JDK-8233658
WEBREV: http://cr.openjdk.java.net/~igerasim/8233658/00/webrev/




--
With kind regards,
Ivan Gerasimov



RFR (XXS) 8233658 : Escape + in the expression describing Runtime.Version string

2019-11-05 Thread Ivan Gerasimov

Hello!

Here's yet another javadoc-only fix.

Format of the Runtime.Version string is described as a list of regular 
expressions [1]:


 $VNUM(-$PRE)?\+$BUILD(-$OPT)?
 $VNUM-$PRE(-$OPT)?
 $VNUM(+-$OPT)?

The character + in the last line should be escaped to be consistent with 
the first line and to avoid confusion with RE quantifiers.


[1] 
https://docs.oracle.com/en/java/javase/13/docs/api/java.base/java/lang/Runtime.Version.html


Would you please help review this trivial one-char fix?

BUGURL: https://bugs.openjdk.java.net/browse/JDK-8233658
WEBREV: http://cr.openjdk.java.net/~igerasim/8233658/00/webrev/

--
With kind regards,
Ivan Gerasimov



Re: RFR (S) 8233650 : Javadoc for Math.floorMod(int, int) gives wrong example

2019-11-05 Thread Ivan Gerasimov

Thank you Brian!

Pushed.

On 11/5/19 4:05 PM, Brian Burkhalter wrote:

Hi Ivan,

I think this looks good.

Brian


On Nov 5, 2019, at 3:35 PM, Ivan Gerasimov  wrote:

The javadoc for the method Math.floorMod(int, int) [1] contains an inaccuracy.

There are two sets of examples given:  One for the case when signs of the 
arguments are the same and the other for the case when signs of the arguments 
are different.
The second set of examples wrongly contains an example from the first set.

Would you please help review this javadoc-only fix?

BUGURL: https://bugs.openjdk.java.net/browse/JDK-8233650
WEBREV: http://cr.openjdk.java.net/~igerasim/8233650/00/webrev/

[1] 
https://docs.oracle.com/en/java/javase/13/docs/api/java.base/java/lang/Math.html#floorMod(int,int)


--
With kind regards,
Ivan Gerasimov



RFR (S) 8233650 : Javadoc for Math.floorMod(int, int) gives wrong example

2019-11-05 Thread Ivan Gerasimov

Hello!

The javadoc for the method Math.floorMod(int, int) [1] contains an 
inaccuracy.


There are two sets of examples given:  One for the case when signs of 
the arguments are the same and the other for the case when signs of the 
arguments are different.

The second set of examples wrongly contains an example from the first set.

Would you please help review this javadoc-only fix?

BUGURL: https://bugs.openjdk.java.net/browse/JDK-8233650
WEBREV: http://cr.openjdk.java.net/~igerasim/8233650/00/webrev/

[1] 
https://docs.oracle.com/en/java/javase/13/docs/api/java.base/java/lang/Math.html#floorMod(int,int)


--
With kind regards,
Ivan Gerasimov



RFR 8225466 : Optimize matching BMP Slice nodes

2019-10-28 Thread Ivan Gerasimov

Hello!

When building a Pattern object, the regex parser recognizes "slices" - 
continuous char subsequences, which all have to be matched 
case-sensitively/case-insensitively.  Matching with such a slice is 
implemented as a simple loop over a portion of the input.


In the current implementation, on each iteration of the loop it is 
checked if we have hit the end of the input (which is an uncommon case).


This check can be done only once, before the loop, which will make the 
loop lighter.


Benchmark shows up to +4% to the throughput for the case-insensitive 
matching.


Would you please help review the enhancement?

BUGURL: https://bugs.openjdk.java.net/browse/JDK-8225466
WEBREV: http://cr.openjdk.java.net/~igerasim/8225466/00/webrev/


--- benchmark results ---

UNFIXED
Benchmark    Mode  Cnt    Score   Error  Units
PatternBench.sliceIFind  avgt   16  190.612 ? 0.336  ns/op

FIXED
Benchmark    Mode  Cnt    Score   Error  Units
PatternBench.sliceIFind  avgt   16  182.954 ? 0.493  ns/op
---

--
With kind regards,
Ivan Gerasimov



Re: RFR (S) 8230407 : SocketPermission and FilePermission action list allows leading comma

2019-10-03 Thread Ivan Gerasimov

Hi Chris!

On 10/3/19 8:05 AM, Chris Hegarty wrote:

Ivan,


On 3 Oct 2019, at 04:41, Ivan Gerasimov  wrote:

...

So, I filed CSR: https://bugs.openjdk.java.net/browse/JDK-8231805 to cover the 
addition of @throws paragraph in the javadoc of SocketPermission.

I would really appreciate it, if someone helped to review it.


Since we’re here ... ;-)
It would be good to specify the NPE behavior of the constructor. Here are the 
changes for SocketPermission. If you agree, fold them into your patch and CSR. 
( I’ve included test changes to verify the new tighter spec )

https://cr.openjdk.java.net/~chegar/8230407.extra/


Yes, it's a good point, thanks!

I've adopted your suggested changes and the test:
http://cr.openjdk.java.net/~igerasim/8230407/02/webrev/

CSR was also updated accordingly:
https://bugs.openjdk.java.net/browse/JDK-8231805

With kind regards,

Ivan



-Chris.




--
With kind regards,
Ivan Gerasimov



Re: RFR (S) 8230407 : SocketPermission and FilePermission action list allows leading comma

2019-10-03 Thread Ivan Gerasimov

Thanks Peter, your variant looks cute!

Let's keep this code in mind as a candidate for refactoring once the 
switch expression will make its way to the standard.


I would hesitate to allow using experimental features in building the 
JDK just for that code :)


With kind regards,

Ivan


On 10/2/19 11:40 PM, Peter Levart wrote:

Hi Ivan,

On 10/1/19 10:26 PM, Ivan Gerasimov wrote:

Hello!

The constructors of SocketPermission and FilePermission expect a 
String argument with comma-separated list of actions.


If the list is malformed, then the constructors throw 
IllegalArgumentException.


It turns out that the current implementation fails to throw IAE if 
the list starts with a leading comma.


Would you please help review a simple fix, which will make the 
behavior more consistent?


BUGURL: https://bugs.openjdk.java.net/browse/JDK-8230407
WEBREV: http://cr.openjdk.java.net/~igerasim/8230407/00/webrev/



With new switch expressions the logic could be a little clearer. For 
example:


    for (boolean seencomma = false; i >= matchlen && 
!seencomma; i--) {

    seencomma = switch (a[i - matchlen]) {
    case ' ', '\r', '\n', '\f', '\t': yield false;
    case ',': if (i > matchlen) yield true;
    default: throw new IllegalArgumentException(
    "invalid permission: " + actions);
    };
    }


This is still experimental, but I think it will be proposed to be 
standard soon.


If you want to backport it later, then maybe you don't want to do that 
now.


Regards, Peter


--
With kind regards,
Ivan Gerasimov



Re: RFR (S) 8230407 : SocketPermission and FilePermission action list allows leading comma

2019-10-02 Thread Ivan Gerasimov

Thank you Joe for checking it!

On 10/2/19 4:38 PM, Joe Darcy wrote:

Hello,

At least from a quick reading, either the spec change or the behavior 
change would seem to merit a CSR.



Sigh.  I was hopping it'll be a quick fix :-)

So, I filed CSR: https://bugs.openjdk.java.net/browse/JDK-8231805 to 
cover the addition of @throws paragraph in the javadoc of SocketPermission.


I would really appreciate it, if someone helped to review it.

W.r.t the behavior change, I don't think the fix has to be counted as 
such.   Current implementation already would throw 
IllegalArgumentException if the action list were malformed (for example 
if it were " , accept", or "connect,,accept", or "connect,", etc.)  The 
only case when it would *not* throw IAE is when the argument 
*immediately* starts with a comma, and that's what the fix is about.


It's not like if we used to allow commas in arbitrary places and stopped 
doing that.  Instead, it just turned out that the code fails to catch 
one specific pattern of malformed action list.


With kind regards,

Ivan



Cheers,

-Joe

On 10/2/2019 4:26 PM, Ivan Gerasimov wrote:

Hi Chris!

Thank you very much for review!

I agree that it makes sense to update the javadoc for consistency.

I don't think CSR is required in this case, is it? (IAE is unchecked 
anyway, and the fix doesn't really change the behavior.)


Here's the updated webrev:

http://cr.openjdk.java.net/~igerasim/8230407/01/webrev/

With kind regards,

Ivan


On 10/2/19 6:44 AM, Chris Hegarty wrote:

Ivan,

On 01/10/2019 21:26, Ivan Gerasimov wrote:

Hello!

The constructors of SocketPermission and FilePermission expect a 
String argument with comma-separated list of actions.


If the list is malformed, then the constructors throw 
IllegalArgumentException.


It turns out that the current implementation fails to throw IAE if 
the list starts with a leading comma.


Would you please help review a simple fix, which will make the 
behavior more consistent?


BUGURL: https://bugs.openjdk.java.net/browse/JDK-8230407
WEBREV: http://cr.openjdk.java.net/~igerasim/8230407/00/webrev/


The implementation changes look ok.

The SocketPermission constructor should be updated to specify IAE 
too, right?


-Chris.





--
With kind regards,
Ivan Gerasimov



Re: RFR (S) 8230407 : SocketPermission and FilePermission action list allows leading comma

2019-10-02 Thread Ivan Gerasimov

Hi Chris!

Thank you very much for review!

I agree that it makes sense to update the javadoc for consistency.

I don't think CSR is required in this case, is it? (IAE is unchecked 
anyway, and the fix doesn't really change the behavior.)


Here's the updated webrev:

http://cr.openjdk.java.net/~igerasim/8230407/01/webrev/

With kind regards,

Ivan


On 10/2/19 6:44 AM, Chris Hegarty wrote:

Ivan,

On 01/10/2019 21:26, Ivan Gerasimov wrote:

Hello!

The constructors of SocketPermission and FilePermission expect a 
String argument with comma-separated list of actions.


If the list is malformed, then the constructors throw 
IllegalArgumentException.


It turns out that the current implementation fails to throw IAE if 
the list starts with a leading comma.


Would you please help review a simple fix, which will make the 
behavior more consistent?


BUGURL: https://bugs.openjdk.java.net/browse/JDK-8230407
WEBREV: http://cr.openjdk.java.net/~igerasim/8230407/00/webrev/


The implementation changes look ok.

The SocketPermission constructor should be updated to specify IAE too, 
right?


-Chris.



--
With kind regards,
Ivan Gerasimov



Re: 8229022: BufferedReader performance can be improved by using StringBuilder

2019-10-01 Thread Ivan Gerasimov

Hi Brian!

This looks good to me!

With kind regards,

Ivan


On 10/1/19 5:13 PM, Brian Burkhalter wrote:

While the performance improvement that I measured for this proposed change [1] 
using JMH was only in the 2% to 8% range as opposed to the 13% claimed in the 
issue description, given the simplicity of the change [2] it is probably worth 
doing. All use of the StringBuilder is within a synchronized block within a 
single package-scope method, so there should be no problem with access by 
multiple threads.

Thanks,

Brian

[1] https://bugs.openjdk.java.net/browse/JDK-8229022
[2] diff

--- a/src/java.base/share/classes/java/io/BufferedReader.java
+++ b/src/java.base/share/classes/java/io/BufferedReader.java
@@ -314,7 +314,7 @@
   * @throws IOException  If an I/O error occurs
   */
  String readLine(boolean ignoreLF, boolean[] term) throws IOException {
-StringBuffer s = null;
+StringBuilder s = null;
  int startChar;
  
  synchronized (lock) {

@@ -372,7 +372,7 @@
  }
  
  if (s == null)

-s = new StringBuffer(defaultExpectedLineLength);
+s = new StringBuilder(defaultExpectedLineLength);
  s.append(cb, startChar, i - startChar);
  }
  }




RFR (S) 8230407 : SocketPermission and FilePermission action list allows leading comma

2019-10-01 Thread Ivan Gerasimov

Hello!

The constructors of SocketPermission and FilePermission expect a String 
argument with comma-separated list of actions.


If the list is malformed, then the constructors throw 
IllegalArgumentException.


It turns out that the current implementation fails to throw IAE if the 
list starts with a leading comma.


Would you please help review a simple fix, which will make the behavior 
more consistent?


BUGURL: https://bugs.openjdk.java.net/browse/JDK-8230407
WEBREV: http://cr.openjdk.java.net/~igerasim/8230407/00/webrev/

--
With kind regards,
Ivan Gerasimov



Re: RFR (XS) 8230415 : Avoid redundant permission checking in FilePermissionCollection and SocketPermissionCollection

2019-09-30 Thread Ivan Gerasimov

Thank you Sean for reviewing!

With kind regards,

Ivan


On 9/27/19 7:20 AM, Sean Mullan wrote:

Hi Ivan,

The fix looks good. Good catch.

--Sean

On 8/30/19 7:32 PM, Ivan Gerasimov wrote:

Hello!

In the two implementations of 
PermissionCollection.implies(Permission), all the permissions are 
traversed, and their corresponding bit mask are checked.


For example, here's a snippet from FilePermission.java:

 int desired = fperm.getMask();
 int effective = 0;
 int needed = desired;

 for (Permission perm : perms.values()) {
 FilePermission fp = (FilePermission)perm;
 if (((needed & fp.getMask()) != 0) && 
fp.impliesIgnoreMask(fperm)) {

 effective |= fp.getMask();
 if ((effective & desired) == desired) {
 return true;
 }
 needed = (desired ^ effective);// <<< should be 
(desired & ~effective)

 }
 }

Here, if a permission's mask `fp.getMask()` intersects with `needed`, 
but does not fully cover all the needed bits, the variable `needed` 
is updated as XOR of desired and effective. This can raise a 
not-really-needed bits in the `needed` mask, so that for all 
subsequent permissions from the collection with that unneeded bits in 
the mask, an expensive fp.impliesIgnoreMask(fperm) will be executed.


The fix does not change the behavior, but helps avoid unnecessary 
calls to impliesIgnoreMask().


Would you please help review a trivial fix?

BUGURL: https://bugs.openjdk.java.net/browse/JDK-8230415
WEBREV: http://cr.openjdk.java.net/~igerasim/8230415/00/webrev/

Thanks in advance!




--
With kind regards,
Ivan Gerasimov



Re: RFR 8230829 : Matcher matches a surrogate pair that crosses border of the region

2019-09-12 Thread Ivan Gerasimov

Thank you Naoto for reviewing!

With kind regards,

Ivan

On 9/12/19 8:02 AM, naoto.s...@oracle.com wrote:

Looks good to me, Ivan.

Naoto

On 9/10/19 9:13 PM, Ivan Gerasimov wrote:

Hello!

When the regex.Matcher matches a supplementary codepoint, it may 
inadvertently cross the border of the explicitly specified region 
without noticing.


Would you please help review a simple fix?

BUGURL: https://bugs.openjdk.java.net/browse/JDK-8230829
WEBREV: http://cr.openjdk.java.net/~igerasim/8230829/00/webrev/

Thanks in advance!




--
With kind regards,
Ivan Gerasimov



RFR 8230829 : Matcher matches a surrogate pair that crosses border of the region

2019-09-10 Thread Ivan Gerasimov

Hello!

When the regex.Matcher matches a supplementary codepoint, it may 
inadvertently cross the border of the explicitly specified region 
without noticing.


Would you please help review a simple fix?

BUGURL: https://bugs.openjdk.java.net/browse/JDK-8230829
WEBREV: http://cr.openjdk.java.net/~igerasim/8230829/00/webrev/

Thanks in advance!

--
With kind regards,
Ivan Gerasimov



Re: RFR 8230365 : Pattern for a control-char matches non-control characters

2019-09-09 Thread Ivan Gerasimov

Thank you Stuart for the analysis!

Please see my comments inline.

On 9/9/19 4:39 PM, Stuart Marks wrote:



On 9/5/19 1:43 PM, Ivan Gerasimov wrote:

Personally, I don't have a strong preference here.

The compatibility property are meant to be temporary anyways.

Maybe if we have a single option that will control several different 
aspects of behavior, it will be harder to get rid of it?


Partially, because it will be tempting to reuse it for other similar 
changes, should they be needed.


OK, let's take an inventory of what behavior changes are being 
contemplated for regexes:


JDK-8230675 restrict IDs for control chars
JDK-xxx allow case-insensitive IDs for control chars *NOTE*
JDK-8225021 Treat ambiguous embedded flags as parse syntax errors
JDK-8214245 Case insensitive matching doesn't work correctly for some 
character classes


I quickly searched JBS and found several more bugs/enhancements requests 
that, if implemented, may result in the behavior changes.


Here's (presumably incomplete) list:

JDK-8218146  $ matches before end of line, even without MULTILINE mode
JDK-8217977  Matcher matching trailing high surrogate reports false for 
requireEnd()

JDK-8217501  Matcher.hitEnd returns false for incomplete surrogate pairs
JDK-8217496  Matcher.group() can return null after usePattern
JDK-8216332  Grapheme regex does not work with emoji sequences
JDK-8199594  Regex Pattern class improperly ignores spaces in character 
classes

JDK-8187083  Regex: Capturing groups inside a lookahead aren't backtracked
JDK-8187082  Regex: Nested capturing groups under lazy repetition aren't 
backtracked

JDK-8183391  Regex: End of line found more than once for non-multiline regex
JDK-8179668  Valid regex patterns match the latter half of complete 
surrogate pairs

JDK-8029966  Broken supplementary character support in regex
JDK-6919621  Matcher find returns wrong result in java 1.6 for certain 
patterns


All of them are of low priorities, so I don't anticipate active work on 
these bugs in the near future.
Though at least some of them, if fixed, would make the Java regexp 
engine better, so it probably wouldn't make sense to just abandon these 
request because of the compatibility reasons.


*NOTE* this was part of the original JDK-8230675 proposal, but you 
removed it after discussion. I don't know if we decided never to do 
this, or whether we're merely considering it separately. It seemed to 
me that there was a possibility that we'd do this in the future.


I was thinking of filling an enhancement request with the fix version 
set to TBD, so we can return to this proposal in some future release, if 
desirable.



Is this all the behavior changes being contemplated, or is this simply 
the set that we happened to have stumbled across recently? Put another 
way, if we decided to do some further analysis of regexes, would we 
run across other issues where we might say, "Yeah, we ought to fix 
that, but that would be a potentially incompatible behavior change, so 
we need to add another property." ?


In practice, such properties are only removed after a very long time, 
or perhaps even "never." It's not like this change would be added in 
this release (JDK 14), with backward compatibility support removed in 
a year (say, JDK 16) along with the property. The property, and the 
backward compatibility mode, would stick around in the code for many 
years.


What I want to avoid doing is to introduce behavior changes -- and 
properties to control them -- in a piecemeal fashion. It looks like we 
might have three or four candidates already. Would we want to live 
with three or four properties? If we did this and continued with 
additional changes, we might end up with six or eight or ten 
properties over time.


I'd like to see us look ahead a bit and take stock of what changes 
we're contemplating, and then decided how to deal with compatibility 
and migration based on that. I'd like to avoid making individual 
changes (and adding properties) one at a time, with decisions made in 
isolation, because that will lead to a proliferation of properties.



So, there are two alternatives at the table at this time:
1) A single compatibility property to revert to the old behavior; The 
property is reused for each of listed above bugs, so with each fix a 
portion of revert logic is added to the property.


PROS:  Easy to implement and maintain.
CONS:  Over time, can become hard to track, what exactly the property 
controls, so may be hard to use.  For example, if a user turns on this 
property to revert a single aspect of behavior, one will get all other 
behavior oddities.


2) Individual properties for every change of behavior.

PROS:  If needed, the behavior can be fine-grained.  Easier to 
understand what the expected behavior would be with every set of 
properties set.
CONS:  Complex to maintain.  For the majority of cases would be just an 
overkill.  Also, can greatly increase number of testi

Re: RFR 8230365 : Pattern for a control-char matches non-control characters

2019-09-05 Thread Ivan Gerasimov

Hi Stuart!

Thanks for the comments, and please see my answers inline:

On 9/5/19 1:18 PM, Stuart Marks wrote:


Conceptually I think having the restriction is fine. If we were 
designing this as a new feature, I'd recommend putting in the 
restrictions from the very beginning.


However, since the old behavior has been out there for 20 years,  my 
main concern is compatibility. Having system properties to control 
this is ... ok ... I guess. I wish there were a better way, but it 
seems the simplest. However, I'd strongly advise making the behavior 
change and the compatibility mode(s) as simple as possible. Having 
more configuration options complicates the code and complicates the 
testing, and I really don't see the need for it.


So yes, I agree with abandoning ALLOW_LOWERCASE_CONTROL_CHAR_IDS.



Okay.  I've just sent out the updated webrev without this option:
http://cr.openjdk.java.net/~igerasim/8230365/03/webrev/

And filed the CSR for the spec change and the new property:
https://bugs.openjdk.java.net/browse/JDK-8230675


I guess now we get to have a bikeshed on the property name. Does this 
apply only to mapping of control characters, or is there some other 
form of pattern syntax that could be made more strict? Ivan, I seem to 
recall you talking to me about making some changes in the syntax or 
interpretation of the construct that enables/disables various flags. 
This one:


|(?idmsuxU-idmsuxU)|

Would it make sense to have a single property, e.g. 
jdk.util.regex.strictSyntax, that governs this one as well? And are 
there other possibilities for tuning up the parsing behavior that we 
should be taking?



Yes, that was this bug:

https://bugs.openjdk.java.net/browse/JDK-8225021 (Treat ambiguous 
embedded flags as parse syntax errors)


And actually I have yet another bug of similar flavor:

https://bugs.openjdk.java.net/browse/JDK-8214245 (Case insensitive 
matching doesn't work correctly for some character classes)


However, this last one is about changing the matching rules, and not 
about the syntax (And it is still waiting for review!).




I'd rather not have a situation where we fix up one area of syntax and 
add a property for it, fix up another area and add a different 
property, leading to property proliferation.



Personally, I don't have a strong preference here.

The compatibility property are meant to be temporary anyways.

Maybe if we have a single option that will control several different 
aspects of behavior, it will be harder to get rid of it?


Partially, because it will be tempting to reuse it for other similar 
changes, should they be needed.


With kind regards,

Ivan



s'marks


On 9/4/19 9:00 PM, Martin Buchholz wrote:

Thanks, Ivan.  We're mostly in agreement.

+ * If {@code true} then lower-case control-character ids are mapped to the
+ * their upper-case counterparts.
Extra "the".

After all these decades I only now realize that c ^= 0x40 moves '?' 
to the end of the ASCII range and all the other controls to the start!


Should we support lower-case controls? Compatibility with perl regex 
still matters, but a lot less than in 2003.  But the key is that we 
got the WRONG ANSWER previously, so when we restrict the control ids 
let's just make lower case controls syntax errors.  Silently changing 
behavior is bad for users.  ... so let's abandon 
ALLOW_LOWERCASE_CONTROL_CHAR_IDS.

An alternative:
int ch = read() ^ 0x40;
if (!RESTRICTED_CONTROL_CHAR_IDS || ch < 0x20 || ch == 0x7f) return ch;



On Wed, Sep 4, 2019 at 6:49 PM Ivan Gerasimov 
mailto:ivan.gerasi...@oracle.com>> wrote:


Thank you Martin!

On 8/30/19 6:19 PM, Martin Buchholz wrote:
> There's a strong expectation that ctrl-A and ctrl-a both map to
> \u0001, so I support Ivan's initiative.
> I'm surprised java regex gets this wrong.
> Might need a transitional system property.
>
Right.  I think it would be best to introduce two system properties:

The first, to turn on/off the restrictions on the control-char
names.
This will be enabled by default, and will permit names from the
limited
list: capital letters and a few other special characters.

The second one, to enable mapping of lower-case control-char
names to
their upper-case counterpart.  This option should be turned off by
default for the current release of JDK, and then turned on by
default
for some subsequent release (when, presumably, most applications
that
use this kind of regexp are fixed).

This all feels like a little bit too much for such a rarely used
feature, but probably is a proper thing to do anyway :-)

If we have an agreement on these system properties, I can create a
separate test to verify all possible combinations.


> What's the best bit-twiddle?  Untested:
> if ((c ^= 0x40) < 0x20) return c;
> if ((c ^=0x20)  <= 26 && c > 0) return c;
>

Re: RFR 8230365 : Pattern for a control-char matches non-control characters

2019-09-05 Thread Ivan Gerasimov

Thank you Martin again!

Here's the updated webrev without the lower-case control char ids:

http://cr.openjdk.java.net/~igerasim/8230365/03/webrev/

I've also filed a CSR to record the changes in bahavior:

https://bugs.openjdk.java.net/browse/JDK-8230675

Could you please help review it?


On 9/4/19 9:00 PM, Martin Buchholz wrote:

Thanks, Ivan.  We're mostly in agreement.

+ * If {@code true} then lower-case control-character ids are mapped to the
+ * their upper-case counterparts.
Extra "the".

After all these decades I only now realize that c ^= 0x40 moves '?' to 
the end of the ASCII range and all the other controls to the start!


Should we support lower-case controls?  Compatibility with perl regex 
still matters, but a lot less than in 2003.  But the key is that we 
got the WRONG ANSWER previously, so when we restrict the control ids 
let's just make lower case controls syntax errors.  Silently changing 
behavior is bad for users. ... so let's abandon 
ALLOW_LOWERCASE_CONTROL_CHAR_IDS.

An alternative:
int ch = read() ^ 0x40;
if (!RESTRICTED_CONTROL_CHAR_IDS || ch < 0x20 || ch == 0x7f) return ch;




This code will probably be most efficient for the common case.

However, I'd prefer to use the auxiliary method isCntrlId() in this 
case, as it is self-documenting and still efficient enough.


With kind regards,

Ivan




Re: RFR 8230365 : Pattern for a control-char matches non-control characters

2019-09-05 Thread Ivan Gerasimov

Hello Bernd!

Thank you for your comments!

I'm going to proceed with only the restriction part of the change for 
now, so no blind conversion of lower-case control chars will happen.


A system property will allow the users to return to the previous less 
restrictive behavior, should they decide to keep malformed patterns 
unchanged.


I'll post the updated webrev and CSR request shortly.

With kind regards,
Ivan


On 9/4/19 10:54 PM, Bernd Eckenfels wrote:

Hallo,

Since not all combinations make sense (Exception+convert) a multi value might 
be better:

jdk.regex.control=WARN|EXCEPTION|STANDARD|LEGACY

With Exception generating an error, Standard beeing the planned new default 
(treating upper/lower same and error on all undefined chars) and legacy beeing 
the manual fallback to current behavior and WARN the same fallback but with 
logging.

I guess some form of early feedback like EXCPETION or WARN is needed, even when 
it is between a rock and a hard place. Maybe have at least one iteration where 
it defaults to LEGACY (+Release Notes announcement), then WARN and then finally 
STANDARD?

Gruss
Bernd


--
http://bernd.eckenfels.net


Von: core-libs-dev  im Auftrag von Ivan 
Gerasimov 
Gesendet: Donnerstag, September 5, 2019 4:00 AM
An: Martin Buchholz; Stuart Marks
Cc: core-libs-dev
Betreff: Re: RFR 8230365 : Pattern for a control-char matches non-control 
characters

Thank you Martin!

On 8/30/19 6:19 PM, Martin Buchholz wrote:

There's a strong expectation that ctrl-A and ctrl-a both map to
\u0001, so I support Ivan's initiative.
I'm surprised java regex gets this wrong.
Might need a transitional system property.


Right.  I think it would be best to introduce two system properties:

The first, to turn on/off the restrictions on the control-char names.
This will be enabled by default, and will permit names from the limited
list: capital letters and a few other special characters.

The second one, to enable mapping of lower-case control-char names to
their upper-case counterpart.  This option should be turned off by
default for the current release of JDK, and then turned on by default
for some subsequent release (when, presumably, most applications that
use this kind of regexp are fixed).

This all feels like a little bit too much for such a rarely used
feature, but probably is a proper thing to do anyway :-)

If we have an agreement on these system properties, I can create a
separate test to verify all possible combinations.



What's the best bit-twiddle?  Untested:
if ((c ^= 0x40) < 0x20) return c;
if ((c ^=0x20)  <= 26 && c > 0) return c;

0x40 is more readable than 64.


`((ch-0x3f)|(0x5f-ch)) >= 0` does the trick for regular (non-lower-case)
ids.


Does ctrol-? get mapped to 0x7f ?


Yes. I've got it in the test at the end of the line 4997.

Would you please help review the updated webrev:

http://cr.openjdk.java.net/~igerasim/8230365/02/webrev/

With kind regards,

Ivan




--
With kind regards,
Ivan Gerasimov



Re: Reply: what to do next to fix JDK-8230557. thanks

2019-09-05 Thread Ivan Gerasimov

Hi Peter!

On 9/5/19 7:24 AM, Peter Levart wrote:

Hi Ivan,

On 9/5/19 11:22 AM, Ivan Gerasimov wrote:

Hello!

BitSet is known to be flawed in many ways:?0?2 its size(), length(), 
cardinality() and nextClearBit?6?7() can return meaningless negative 
values.


I was thinking about disallowing setting any index greater than 
(Integer.MAX_VALUE - 63), for example by throwing OutOfMemoryError.


An index of Integer.MAX_VALUE - 64 would be the greatest index then, 
an index of Integer.MAX_VALUE - 63 already needs an array of longs of 
length 2^25 which results in BitSet size() of 2^31 ...




We could do it without changing the specification.


The calls to: new BitSet(Integer.MAX_VALUE - 63) ... new 
BitSet(Integer.MAX_VALUE) would also have to throw then.
BitSet.valueOf(...) methods don't even check the passed in arguments 
lengths, so size() can really return a meaningless negative or 
positive number. They all would have to throw if the passed-in length 
of array/buffer is too large.


So would you not specify when those methods throw?

Yes.?0?2 My point was that any method that requires memory allocation 
should be expected to possibly throw OOM, so we don't have to specify it 
explicitly.


A similar thing happened with compact Strings, for example.?0?2 When the 
internal storage was changed to byte[] array, all of a sudden, it became 
impossible to create UTF16 Strings (or StringBuilders) of length > 
(Integer.MAX_VALUE / 2).


W.r.t. BitSet, personally, I would rather not fix its current behavior 
unless there is a very strong demand for the change.


Just wanted to mention one approach, which would limit usage of this 
class to the boundaries it was originally designed for.


With kind regards,

Ivan



Regards, Peter



With kind regards,

Ivan


On 9/5/19 1:16 AM,  wrote:

Hi, Peter.


I understand your point, but I think it's unreasonable for the 
reason that source code compatibility problem, it's really a bug.



User can't understand why the size method return a negative number.






Best, lamber-ken




----
??:"Peter Levart"??:""<2217232...@qq.com;"core-libs-dev"

:"David Holmes"indexes is: 0 ... 2^31 - 1 (0 ... Integer.MAX_VALUE). The maximum 
"size"

of BitSet is therefore 2^31. Unfortunately, this value can't be
"corectly" represented with signed 32 bit integer (int). Only in this
corner case, - 2^31 (Integer.MIN_VALUE) is the interpreted value
returned from size(). If one would interpret it as unsigned 32 bit
integer value, it is entirely correct. For example, this holds:

Integer.toUnsignedLong(new BitSet(Integer.MAX_VALUE).size()) == 1L 
<< 31


It is also always true what javadoc says about size(): "The maximum
element in the set is the size - 1st element"

The following holds also for this corner case:

new BitSet(Integer.MAX_VALUE).size() - 1 == Integer.MAX_VALUE;

So perhaps, the fix could be just to describe this corner case in the
spec. And perhaps, to support the following use case in the corner 
case:


BitSet set1 = ...
...

BitSet set2 = new BitSet(set1.size());

... by modifying the BitSet constructor to accept the Integer.MIN_VALUE
in addition to all the non-negative values as the 'nbits' parameter.

What do you think?

Regards, Peter

On 9/5/19 8:31 AM, David Holmes wrote:
 Hi,

 On 5/09/2019 4:11 pm,  wrote:

 Thanks very much.

 *BUG-LINK:* https://bugs.openjdk.java.net/browse/JDK-8230557

 *Describe: *
 the return type of the method BitSet#size is int, so the 
method may

 return a negative value in some case, for example, will return
 -2147483648.
 ```
 BitSet bitSet = new BitSet(Integer.MAX_VALUE);
 for (int i = 0; i < Integer.MAX_VALUE - 1000; i++) {
 bitSet.set(i);
 }
 System.out.println(bitSet.size());
 ```
 EXPECTED: 2147483648, but ACTUAL: -2147483648.

 *FIX*
 I have put the patch in the attachment of the mail.

 In case the attachment got stripped form the mailing list the 
proposed

 fix is:

 - public int size() {
 - return words.length 
* BITS_PER_WORD;

 + public long size() {
 + return (long) 
words.length * BITS_PER_WORD;


 Unfortunately this simple fix it not possible. You can't just 
change
 the return type of the method to long as that is a 
source-incompatible
 change and would not be approved [1]. Instead the return value 
should
 be capped at Integer.MAX_VALUE - but I'll leave that for 
someone from

 core-libs team to pick up. Also look at the evaluation in:

 https://bugs.openjdk.java.net/browse/JDK-4213570

 Cheers,
 David

 [1] https://wiki.openjdk.java.net/display/csr/CSR+FAQs




 ----
 *??:*"David Holmes" 
*??:*""<2217232...@qq.com;"core-libs-dev"
 d...@openjdk.java.net;
 *:*Re: ?? what to do next to fix JDK-8230557. 
thanks


 On 5/09/2019 3:46 pm,  wrote:
 
  hi, develope

Re: Reply: what to do next to fix JDK-8230557. thanks

2019-09-05 Thread Ivan Gerasimov

Hello!

BitSet is known to be flawed in many ways:?0?2 its size(), length(), 
cardinality() and nextClearBit?6?7() can return meaningless negative values.


I was thinking about disallowing setting any index greater than 
(Integer.MAX_VALUE - 63), for example by throwing OutOfMemoryError.


We could do it without changing the specification.

With kind regards,

Ivan


On 9/5/19 1:16 AM,  wrote:

Hi, Peter.


I understand your point, but I think it's unreasonable for the reason that 
source code compatibility problem, it's really a bug.


User can't understand why the size method return a negative number.






Best, lamber-ken




----
??:"Peter Levart"https://bugs.openjdk.java.net/browse/JDK-8230557

 *Describe: *
 the return type of the method BitSet#size is int, so the method may
 return a negative value in some case, for example, will return
 -2147483648.
 ```
 BitSet bitSet = new BitSet(Integer.MAX_VALUE);
 for (int i = 0; i < Integer.MAX_VALUE - 1000; i++) {
 bitSet.set(i);
 }
 System.out.println(bitSet.size());
 ```
 EXPECTED: 2147483648, but ACTUAL: -2147483648.

 *FIX*
 I have put the patch in the attachment of the mail.

 In case the attachment got stripped form the mailing list the proposed
 fix is:

 - public int size() {
 - return words.length * 
BITS_PER_WORD;
 + public long size() {
 + return (long) words.length * 
BITS_PER_WORD;

 Unfortunately this simple fix it not possible. You can't just change
 the return type of the method to long as that is a source-incompatible
 change and would not be approved [1]. Instead the return value should
 be capped at Integer.MAX_VALUE - but I'll leave that for someone from
 core-libs team to pick up. Also look at the evaluation in:

 https://bugs.openjdk.java.net/browse/JDK-4213570

 Cheers,
 David

 [1] https://wiki.openjdk.java.net/display/csr/CSR+FAQs




 ----
 *??:*"David Holmes"http://openjdk.java.net/contribute/
 
  and you would seem to be up to step 2. :)
 
  Cheers,
  David
 
  
  
  
  
   
[1]https://bugs.openjdk.java.net/browse/JDK-8230557
  
   [2]http://openjdk.java.net/contribute/
   
[3]https://www.oracle.com/technetwork/community/oca-486395.html
  
  
  
  
  
   
--nbsp;nbsp;--
   ??:nbsp;"Bug Report
  Notification"

--
With kind regards,
Ivan Gerasimov



Re: RFR 8230365 : Pattern for a control-char matches non-control characters

2019-09-04 Thread Ivan Gerasimov

Thank you Martin!

On 8/30/19 6:19 PM, Martin Buchholz wrote:
There's a strong expectation that ctrl-A and ctrl-a both map to 
\u0001, so I support Ivan's initiative.

I'm surprised java regex gets this wrong.
Might need a transitional system property.


Right.  I think it would be best to introduce two system properties:

The first, to turn on/off the restrictions on the control-char names.  
This will be enabled by default, and will permit names from the limited 
list: capital letters and a few other special characters.


The second one, to enable mapping of lower-case control-char names to 
their upper-case counterpart.  This option should be turned off by 
default for the current release of JDK, and then turned on by default 
for some subsequent release (when, presumably, most applications that 
use this kind of regexp are fixed).


This all feels like a little bit too much for such a rarely used 
feature, but probably is a proper thing to do anyway :-)


If we have an agreement on these system properties, I can create a 
separate test to verify all possible combinations.




What's the best bit-twiddle?  Untested:
if ((c ^= 0x40) < 0x20) return c;
if ((c ^=0x20)  <= 26 && c > 0) return c;

0x40 is more readable than 64.

`((ch-0x3f)|(0x5f-ch)) >= 0` does the trick for regular (non-lower-case) 
ids.



Does ctrol-? get mapped to 0x7f ?


Yes. I've got it in the test at the end of the line 4997.

Would you please help review the updated webrev:

http://cr.openjdk.java.net/~igerasim/8230365/02/webrev/

With kind regards,

Ivan




RFR (XS) 8230415 : Avoid redundant permission checking in FilePermissionCollection and SocketPermissionCollection

2019-08-30 Thread Ivan Gerasimov

Hello!

In the two implementations of PermissionCollection.implies(Permission), 
all the permissions are traversed, and their corresponding bit mask are 
checked.


For example, here's a snippet from FilePermission.java:

    int desired = fperm.getMask();
    int effective = 0;
    int needed = desired;

    for (Permission perm : perms.values()) {
    FilePermission fp = (FilePermission)perm;
    if (((needed & fp.getMask()) != 0) && 
fp.impliesIgnoreMask(fperm)) {

    effective |= fp.getMask();
    if ((effective & desired) == desired) {
    return true;
    }
    needed = (desired ^ effective);// <<< should be 
(desired & ~effective)

    }
    }

Here, if a permission's mask `fp.getMask()` intersects with `needed`, 
but does not fully cover all the needed bits, the variable `needed` is 
updated as XOR of desired and effective. This can raise a 
not-really-needed bits in the `needed` mask, so that for all subsequent 
permissions from the collection with that unneeded bits in the mask, an 
expensive fp.impliesIgnoreMask(fperm) will be executed.


The fix does not change the behavior, but helps avoid unnecessary calls 
to impliesIgnoreMask().


Would you please help review a trivial fix?

BUGURL: https://bugs.openjdk.java.net/browse/JDK-8230415
WEBREV: http://cr.openjdk.java.net/~igerasim/8230415/00/webrev/

Thanks in advance!

--
With kind regards,
Ivan Gerasimov



Re: RFR 8230365 : Pattern for a control-char matches non-control characters

2019-08-29 Thread Ivan Gerasimov

Hi Stuart!

Thank you for your comments on the proposal!
I totally agree with you that the fixes that result in changing the 
behavior have to be carefully planned and well thought.

Please see my answers inline.

On 8/29/19 6:15 PM, Stuart Marks wrote:

Hi Ivan,

This change certainly makes regex patterns more rigorous, but I'm 
concerned about the compatibility. This is a spec change and also a 
behavior change. While the current behavior might not strictly be 
correct, it does have some characteristics that applications might be 
depending on -- perhaps even by accident. If this change is made, it 
might cause subtle issues in applications that would be quite 
difficult to diagnose.



There are two types of changes in the proposal:
First, make \cx construct more restrictive w.r.t. possible values of x.
Second, make it case-insensitive, so \cz will mean the same as \cZ.

I agree that the later type of changes can potentially cause hard to 
diagnose failures in the existing applications.
The reason for this later change was an attempt to make Java regexp more 
Perl-compatible (and Perl does exactly this:  Treats \cx with a 
lower-case x as \cX with its upper-case counterpart).


The former type of change, on the other hand, might actually be useful 
for the existing applications, as it may allow to see (otherwise hard to 
find) bugs in the code.


For example, if some existing application has a regexp, which contains 
"\\c\t", this is most likely a programming error (or a nasty hack to 
code a char 'I'), so it would be beneficial to report it via throwing a 
PatternSyntaxException.




Examples of changes I'm concerned about are:

pattern \ca currently matches '!' would now match \u0001
pattern \cÀ currently matches \u0080 would now throw exception
pattern \c0 currently matches 'p' would now throw exception

and so forth. That is, using \c with characters in the range [a-z] 
would now match different characters from before, and using \c with 
characters outside the set that correspond to C0 control characters 
would now throw an exception whereas before they would matching 
something that was predictable, if in some sense incorrect.


There are some ways to mitigate the incompatibility, for example, by 
adding a system property, or by adding a Pattern flag that explicitly 
enables this behavior, though I'm not sure that either is worthwhile. 
Maybe there are less intrusive ways that I haven't thought of.


I think it would make sense to first separate the changes mentioned 
above, so it will be easier to reason about them.


While I think that making lower-case \\cx a synonym for the upper-case 
\\cX is a good thing to have (mainly for additional compatibility with 
Perl-style regexps), it's supposed be deferred to a later time.




The current behavior seems to be have been established around 1999 
(JDK 1.3?) so it's been around a long time, plenty of time for 
applications to have formed inadvertent dependencies on this behavior. 
An alternative would be simply to document the current behavior, even 
though it's arguably incorrect.


Is there some benefit to this change, for example, does it enable one 
to write an application that wasn't possible before because of this bug?



In my opinion, the benefits are:
1) Fail-early approach, which will let the developers catch the bugs 
earlier,
2) Better compatibility with Perl-style regexps, so that a wider class 
of regular expressions can be used across different platforms,
3) Well defined strict rules will allow IDEs to implement additional 
edit-time checks of regexp, which will again help developers.


Please find the updated webrev here:
http://cr.openjdk.java.net/~igerasim/8230365/01/webrev/

With kind regards,
Ivan



s'marks


On 8/29/19 4:39 PM, Ivan Gerasimov wrote:

Hello!

In a regular expression pattern a sequence of the form \\cx is 
allowed to specify a control character that corresponds to the name 
char x.


Current implementation has a few issues with that:
1)  It allows x to be just any character, including non-printable ones;
2)  The produced regexp may correspond to a non-control characters;
3)  The expression is case-sensitive, so, for example \\cA differs 
from \\ca, while they both have to produce ctrl-A.


It is proposed to make parsing more strict and reject invalid values 
of x, and also clarify the documentation to explicitly list valid 
values of x.


If we agree on this proposal, then a CSR will probably need to be 
filed to capture the changes in the regexp parsing.


Would you please help review the fix?

BUGURL: https://bugs.openjdk.java.net/browse/JDK-8230365
WEBREV: http://cr.openjdk.java.net/~igerasim/8230365/00/webrev/




--
With kind regards,
Ivan Gerasimov



RFR 8230365 : Pattern for a control-char matches non-control characters

2019-08-29 Thread Ivan Gerasimov

Hello!

In a regular expression pattern a sequence of the form \\cx is allowed 
to specify a control character that corresponds to the name char x.


Current implementation has a few issues with that:
1)  It allows x to be just any character, including non-printable ones;
2)  The produced regexp may correspond to a non-control characters;
3)  The expression is case-sensitive, so, for example \\cA differs from 
\\ca, while they both have to produce ctrl-A.


It is proposed to make parsing more strict and reject invalid values of 
x, and also clarify the documentation to explicitly list valid values of x.


If we agree on this proposal, then a CSR will probably need to be filed 
to capture the changes in the regexp parsing.


Would you please help review the fix?

BUGURL: https://bugs.openjdk.java.net/browse/JDK-8230365
WEBREV: http://cr.openjdk.java.net/~igerasim/8230365/00/webrev/

--
With kind regards,
Ivan Gerasimov



Re: RFR (S) 8230338 : Accurate error message about bad Unicode block name

2019-08-29 Thread Ivan Gerasimov

Thank you Roger!


On 8/29/19 6:39 AM, Roger Riggs wrote:

Looks fine Ivan,

Thanks, Roger



--
With kind regards,
Ivan Gerasimov



RFR (S) 8230338 : Accurate error message about bad Unicode block name

2019-08-29 Thread Ivan Gerasimov

Hello!

When Pattern.compile complains about a wrong Unicode block name, it 
includes the prefix "In" twice, like this:

''Unknown character property name {*In*/Is*In*BadBlockName} near index NN ''

Would you please help review a trivial fix, which makes the message more 
accurate?


BUGURL: https://bugs.openjdk.java.net/browse/JDK-8230338
WEBREV: http://cr.openjdk.java.net/~igerasim/8230338/00/webrev/

Thanks in advance!

--
With kind regards,
Ivan Gerasimov



Re: RFR (S) 8226831 : Use Objects.equals() when appropriate

2019-08-27 Thread Ivan Gerasimov

Thank you Roger and Brian!

The fix has been pushed.

With kind regards,

Ivan


On 8/27/19 1:12 PM, Brian Burkhalter wrote:

Hi Ivan,

+1

Brian


On Aug 27, 2019, at 12:41 PM, Roger Riggs  wrote:

Hi Ivan,

These look fine.

Regards, Roger


On 8/27/19 2:21 PM, Ivan Gerasimov wrote:

Hello!

null-aware comparison of Objects is widely used, so it's tempting to use 
Objects.equals for that.

Unfortunately, there are concerns w.r.t performance (see JDK-8015417), so it 
seems to better avoid this kind of refactoring in the performance critical code.

Please find the webrev with a few places cleaned up (I had more, but decided to 
limit this patch due to the concern mentioned above).

BUGURL: https://bugs.openjdk.java.net/browse/JDK-8226831
WEBREV: http://cr.openjdk.java.net/~igerasim/8226831/00/webrev/

Thanks in advance!




--
With kind regards,
Ivan Gerasimov



RFR (S) 8226831 : Use Objects.equals() when appropriate

2019-08-27 Thread Ivan Gerasimov

Hello!

null-aware comparison of Objects is widely used, so it's tempting to use 
Objects.equals for that.


Unfortunately, there are concerns w.r.t performance (see JDK-8015417), 
so it seems to better avoid this kind of refactoring in the performance 
critical code.


Please find the webrev with a few places cleaned up (I had more, but 
decided to limit this patch due to the concern mentioned above).


BUGURL: https://bugs.openjdk.java.net/browse/JDK-8226831
WEBREV: http://cr.openjdk.java.net/~igerasim/8226831/00/webrev/

Thanks in advance!

--
With kind regards,
Ivan Gerasimov



Re: 8229845: Decrease memory consumption of BigInteger.toString()

2019-08-23 Thread Ivan Gerasimov

Thank you Brian!

This looks good to me.

With kind regards,

Ivan

On 8/23/19 1:33 PM, Brian Burkhalter wrote:

Hi Ivan,

On Aug 23, 2019, at 12:41 PM, Ivan Gerasimov 
mailto:ivan.gerasi...@oracle.com>> wrote:


One minor comment.  Here:
3990 if (digits > 0) {
3991 padWithZeros(buf, digits);
3992 } else {
3993 buf.append('0');
3994 }

I cannot really see how digits may be <= 0, so it seems to me it can 
be safely replaced by just one line `padWithZeros(buf, digits);`.


Actually I don’t see how digits may be non-positive for signum == 0 
either. I’ve reduced the above to one line (L3990) in [1]. That is the 
only change versus version .04.


Alternatively, the entire branch `if (signum == 0) {` can be removed 
from smallToString (so that this method will only work with strictly 
positive numbers), and done only for results[1] at line 4083 because 
it is the only possible source of ZERO values in the process.


It also seems that the arithmetic in the very internal loop at 
4005-4017 can be slightly optimized, but this probably can be left 
for another enhancement.


I think I’ll leave it as is for now.

Thanks!

Brian

[1] http://cr.openjdk.java.net/~bpb/8229845/webrev.05/


--
With kind regards,
Ivan Gerasimov



Re: 8229845: Decrease memory consumption of BigInteger.toString()

2019-08-23 Thread Ivan Gerasimov

Hi Brian.
This looks good to me, thanks!

One minor comment.  Here:
3990 if (digits > 0) {
3991 padWithZeros(buf, digits);
3992 } else {
3993 buf.append('0');
3994 }

I cannot really see how digits may be <= 0, so it seems to me it can be 
safely replaced by just one line `padWithZeros(buf, digits);`.


Alternatively, the entire branch `if (signum == 0) {` can be removed 
from smallToString (so that this method will only work with strictly 
positive numbers), and done only for results[1] at line 4083 because it 
is the only possible source of ZERO values in the process.



It also seems that the arithmetic in the very internal loop at 4005-4017 
can be slightly optimized, but this probably can be left for another 
enhancement.


With kind regards,
Ivan


On 8/23/19 10:11 AM, Brian Burkhalter wrote:

Hi Ivan,

On Aug 22, 2019, at 1:24 PM, Ivan Gerasimov 
mailto:ivan.gerasi...@oracle.com>> wrote:


I have a few further suggestions how to simplify the code.

[…]


Those are all good suggestions: thanks!


Here's the patch, which illustrates all the suggestions above:
http://cr.openjdk.java.net/~igerasim/8229845/00/webrev/
(I have only run the tests from test/jdk/java/math/BigInteger to 
verify it).


This patch has a problem:
4023 padWithZeros(buf, digits - s.length() + (numGroups - 1) * 
digitsPerLong[radix]);

It is missing parentheses:

4023         padWithZeros(buf, digits - (s.length() + (numGroups - 1) 
* digitsPerLong[radix]));


I was unable to find this by debugging but implemented your 
suggestions by hand and then found it by diff-ing the patches. My 
updated versions with your suggestions incorporated are: [1] delta, 
[2] absolute. I also increased the size of some of the values tested 
at line 798 of BigIntegerTest.


Thanks,

Brian

[1] http://cr.openjdk.java.net/~bpb/8229845/webrev.03-04/
[2] http://cr.openjdk.java.net/~bpb/8229845/webrev.04/


--
With kind regards,
Ivan Gerasimov



Re: RFR: 8229485: Add decrementExact(), incrementExact(), and negateExact() to java.lang.StrictMath

2019-08-23 Thread Ivan Gerasimov

Hi Julia!

On 8/23/19 3:36 AM, Julia Boes wrote:

Hi Ivan,

The change of the javadoc was made per Brian's request for consistency 
with java.lang.Math, see the description of the bug: 
https://bugs.openjdk.java.net/browse/JDK-8229485


Ah, I didn't realize that the operations listed in that paragraph are 
those that *do not* have a corresponding -Exact method.


The fix looks good then, thanks.

With kind regards,

Ivan



Best,

Julia

On 22/08/2019 17:25, Ivan Gerasimov wrote:

Is there a reason to change the text at lines 72-75?

Since you've just added the increment/decrement/negate methods, the 
old text seems to be more complete.


With kind regards,

Ivan 



--
With kind regards,
Ivan Gerasimov



Re: 8229845: Decrease memory consumption of BigInteger.toString()

2019-08-22 Thread Ivan Gerasimov

Thank you Brian, this looks much better!

I have a few further suggestions how to simplify the code.

1)
If this BigInteger is negative, it is negated in two places: @ 3943 and 
either @ 4003 or @ 4061.
It is better to keep abs value at the very beginning and make private 
toString(...) and smallToString() to only accept non-negative numbers.


2)
Once the 1) is done, it is possible to remove lines 3948-3954, and 
always call private toString(...), which will immediately delegate to 
smallToString() if necessary.  This is just to make the code shorter.


3)
In padWithZeros(), the condition check `if (digits > 0 && len < digits) 
{` is not necessary.
Actually, I would suggest to make a single argument, say, 
numLeadingZeros (in the current code it is `m`).


4)
I think that logic with making the argument `digits` == -1 for the first 
chunk of digits is over-complication:  `digits` can always be safely set 
to 1 in the initial call to the private toString() or smallToString().

Then the variable atBeginning is not needed at all.

5)
If 3) and 4) are done, the two lines 3988 and 3989 can be reduced to 
`padWithZeros(buf, digits);`


Here's the patch, which illustrates all the suggestions above:
http://cr.openjdk.java.net/~igerasim/8229845/00/webrev/
(I have only run the tests from test/jdk/java/math/BigInteger to verify it).

With kind regards,
Ivan


On 8/22/19 11:04 AM, Brian Burkhalter wrote:

Hi Ivan,

Please see the delta [1] and absolute [2] webrevs.

On Aug 21, 2019, at 7:38 PM, Ivan Gerasimov 
mailto:ivan.gerasi...@oracle.com>> wrote:


A couple of comments

1)
3974 if (signum == 0) {
3975 buf.append('0');
3976 return;
3977 }

Shouldn't it be padded with zeroes, if digits > 0?
If I'm not mistaken, it could happen if result[1] at the end of 
toString() happens to be ZERO.


Good catch: you are absolutely correct.

It that's true, then it may be a good idea to add a testcase for a 
number with many trailing zeros in a string representation. 
Currently, test/jdk/java/math/BigInteger/BigIntegerTest.java in 
stringConv() operates on random numbers, which are unlikely to have 
huge number of trailing zeros.


I added testing this to stringConv(). The test fails without the most 
recent delta applied to the implementation.



2)
4120 /* zero is a string of NUM_ZEROS consecutive zeros. */
4121 private static final String zeros = "0".repeat(NUM_ZEROS);

Nit in the comment:  name of the constant should be zero*s*.


I changed the variable name to ZEROS as it is a constant.

Thanks,

Brian

[1] http://cr.openjdk.java.net/~bpb/8229845/webrev.02-03/
[2] http://cr.openjdk.java.net/~bpb/8229845/webrev.03/


--
With kind regards,
Ivan Gerasimov



Re: RFR: 8229485: Add decrementExact(), incrementExact(), and negateExact() to java.lang.StrictMath

2019-08-22 Thread Ivan Gerasimov

Thank you Julia!

On 8/22/19 2:06 AM, Julia Boes wrote:

Hi Ivan,

Thanks for pointing that out, the links are now added.

Updated webrev: 
http://cr.openjdk.java.net/~dfuchs/jboes/8229485/webrev.03/



Is there a reason to change the text at lines 72-75?

Since you've just added the increment/decrement/negate methods, the old 
text seems to be more complete.


With kind regards,

Ivan



Regards,

Julia

On 21/08/2019 20:33, Ivan Gerasimov wrote:

Hi Julia!

This looks good to me.

For all the other methods from StrictMath that have a corresponding 
method in Math there is a link in javadoc `@see Math#xxx`.


I think, it makes sense to provide such links for all new methods for 
consistency.


With kind regards,

Ivan


On 8/21/19 6:29 AM, Julia Boes wrote:

Hi all,

Thanks for the review. I incorporated the following changes:

- added "@since 14" tag

- removed test code from the CSR specification

- minor fixes in StrictMath.java, ExactArithTests.java as per Ivan's 
comments. The related fix in Math.java is included in this CSR: 
https://bugs.openjdk.java.net/browse/JDK-8229473


Updated webrev: 
http://cr.openjdk.java.net/~dfuchs/jboes/8229485/webrev.02/index.html


Regards,

Julia

On 14/08/2019 20:45, Joe Wang wrote:

Hi Daniel,

That makes sense. So it meant to say "converts the long to int only 
if it's within the range of an integer" (to be "exact"). It may 
then add "throws an exception if not". But the original is fine, 
just my 2 cents.


Best,
Joe

On 8/14/19 12:21 PM, Daniel Fuchs wrote:

Hi Joe,

On 14/08/2019 20:15, Joe Wang wrote:
The 2nd part is not necessary as that's what the "@throws" tag is 
for.


Not withstanding the fact that this is a copy of the API doc from
java.lang.Math, I'd argue that the second part is quite important.
It's why the method is called exact, and is probably the only
reason why you would call that method in the first place.
So I think the 2nd part must be in the synopsis too.

best regards,

-- daniel

On 8/14/19 9:01 AM, Julia Boes wrote:
>
> Hi,
>
> This fix adds decrementExact(), incrementExact(), and 
negateExact() to java.lang.StrictMath. The methods were added to 
java.lang.Math previously [1] and should have been added to 
java.lang.StrictMath for consistency.

>
> Bug: https://bugs.openjdk.java.net/browse/JDK-8229485
>
> Webrev: http://cr.openjdk.java.net/~dfuchs/jboes/8229485/webrev.01/
>
> CSR: https://bugs.openjdk.java.net/browse/JDK-8229702
>
> Thanks,
>
> Julia
>
> [1] https://bugs.openjdk.java.net/browse/JDK-8022109







--
With kind regards,
Ivan Gerasimov



Re: 8229845: Decrease memory consumption of BigInteger.toString()

2019-08-21 Thread Ivan Gerasimov

Hi Brian!

A couple of comments

1)
3974 if (signum == 0) {
3975 buf.append('0');
3976 return;
3977 }

Shouldn't it be padded with zeroes, if digits > 0?
If I'm not mistaken, it could happen if result[1] at the end of 
toString() happens to be ZERO.


It that's true, then it may be a good idea to add a testcase for a 
number with many trailing zeros in a string representation.
Currently, test/jdk/java/math/BigInteger/BigIntegerTest.java in 
stringConv() operates on random numbers, which are unlikely to have huge 
number of trailing zeros.


2)
4120 /* zero is a string of NUM_ZEROS consecutive zeros. */
4121 private static final String zeros = "0".repeat(NUM_ZEROS);

Nit in the comment:  name of the constant should be zero*s*.

With kind regards,
Ivan


On 8/21/19 5:28 PM, Brian Burkhalter wrote:

Hello,


On Aug 20, 2019, at 4:31 PM, Brian Burkhalter  
wrote:


On Aug 20, 2019, at 2:45 PM, Ivan Gerasimov mailto:ivan.gerasi...@oracle.com>> wrote:

Would it make sense to add an argument `digits` to smallToString (like the same 
named argument of toString, the minimum number of digits to pad to), so it 
could zero-pad the result itself?

This way we could avoid inserting zeros into the middle of a StringBuilder 
after a call to smallToString.

Indeed I do not like the zero insertion either. I’ll investigate your idea.

I modified the patch so that characters are exclusively appended to the 
StringBuilder thereby eliminating copying of characters which are already 
present after the insertion point, viz., the padding with zeros after calling 
smallToString() in the recursive path. A delta versus version .01 is at [1] and 
the diff versus the tip at [2].

Thanks,

Brian

[1] http://cr.openjdk.java.net/~bpb/8229845/webrev.01-02/
[2] http://cr.openjdk.java.net/~bpb/8229845/webrev.02/


--
With kind regards,
Ivan Gerasimov



Re: RFR: 8229485: Add decrementExact(), incrementExact(), and negateExact() to java.lang.StrictMath

2019-08-21 Thread Ivan Gerasimov

Hi Julia!

This looks good to me.

For all the other methods from StrictMath that have a corresponding 
method in Math there is a link in javadoc `@see Math#xxx`.


I think, it makes sense to provide such links for all new methods for 
consistency.


With kind regards,

Ivan


On 8/21/19 6:29 AM, Julia Boes wrote:

Hi all,

Thanks for the review. I incorporated the following changes:

- added "@since 14" tag

- removed test code from the CSR specification

- minor fixes in StrictMath.java, ExactArithTests.java as per Ivan's 
comments. The related fix in Math.java is included in this CSR: 
https://bugs.openjdk.java.net/browse/JDK-8229473


Updated webrev: 
http://cr.openjdk.java.net/~dfuchs/jboes/8229485/webrev.02/index.html


Regards,

Julia

On 14/08/2019 20:45, Joe Wang wrote:

Hi Daniel,

That makes sense. So it meant to say "converts the long to int only 
if it's within the range of an integer" (to be "exact"). It may then 
add "throws an exception if not". But the original is fine, just my 2 
cents.


Best,
Joe

On 8/14/19 12:21 PM, Daniel Fuchs wrote:

Hi Joe,

On 14/08/2019 20:15, Joe Wang wrote:

The 2nd part is not necessary as that's what the "@throws" tag is for.


Not withstanding the fact that this is a copy of the API doc from
java.lang.Math, I'd argue that the second part is quite important.
It's why the method is called exact, and is probably the only
reason why you would call that method in the first place.
So I think the 2nd part must be in the synopsis too.

best regards,

-- daniel

On 8/14/19 9:01 AM, Julia Boes wrote:
>
> Hi,
>
> This fix adds decrementExact(), incrementExact(), and 
negateExact() to java.lang.StrictMath. The methods were added to 
java.lang.Math previously [1] and should have been added to 
java.lang.StrictMath for consistency.

>
> Bug: https://bugs.openjdk.java.net/browse/JDK-8229485
>
> Webrev: http://cr.openjdk.java.net/~dfuchs/jboes/8229485/webrev.01/
>
> CSR: https://bugs.openjdk.java.net/browse/JDK-8229702
>
> Thanks,
>
> Julia
>
> [1] https://bugs.openjdk.java.net/browse/JDK-8022109





--
With kind regards,
Ivan Gerasimov



Re: RFR 8229958: Provider.getService() scalability issue for legacy algorithms and message digests

2019-08-21 Thread Ivan Gerasimov

Hi Letu!

The fix introduces a read of non-volatile variable legacyChanged outside 
of synchronized block, which is not guaranteed to produce consistent 
results.


(Please note that in the mentioned fix for JDK-7092821 the variable 
servicesChanged was made volatile, so that it could be accessed outside 
the synchronized blocks.)


With kind regards,

Ivan

On 8/20/19 10:18 PM, Yang, Letu wrote:

Hi,

Please review the fix of https://bugs.openjdk.java.net/browse/JDK-8229958 where 
I made the change to allow majority of calls don't have to acquire the locks 
when checking the availability of the Provider object. Similar effort had been 
made in fixing https://bugs.openjdk.java.net/browse/JDK-7092821 , but it only 
helped the calls for new encryption algorithms. Xin had helped me to upload the 
CR: https://cr.openjdk.java.net/~xliu/8229958/webrev/ .

I had run tier-1 tests, and also a JMH test to confirm its performance 
improvement. We have also had it running in a load test environment of an 
application for a couple of days, and the improvement was confirmed as well.

Best regards,
Letu


--
With kind regards,
Ivan Gerasimov



Re: 8229845: Decrease memory consumption of BigInteger.toString()

2019-08-20 Thread Ivan Gerasimov

Hi Brian!

Would it make sense to add an argument `digits` to smallToString (like 
the same named argument of toString, the minimum number of digits to pad 
to), so it could zero-pad the result itself?


This way we could avoid inserting zeros into the middle of a 
StringBuilder after a call to smallToString.


With kind regards,
Ivan

On 8/20/19 11:51 AM, Brian Burkhalter wrote:

Delta [1] and revised [2] patches:

[1] http://cr.openjdk.java.net/~bpb/8229845/webrev.00-01/
[2] http://cr.openjdk.java.net/~bpb/8229845/webrev.01/

On Aug 20, 2019, at 1:05 AM, Aleksey Shipilev <mailto:sh...@redhat.com>> wrote:


On 8/19/19 10:08 PM, Brian Burkhalter wrote:

[2]http://cr.openjdk.java.net/~bpb/8229845/webrev.00/


Two drive-by comments:

*) It seems the "signum == 0" case in smallToString is regressing? 
Before, it just returned the

constant-pooled "0", now it goes via StringBuilder;


Yes, and I don’t see any way not to have it do so. Note that due to 
lines 3933-3936 in [2] this case will never be hit for values which 
have no more than 20 ints in their magnitude, i.e., those which do not 
recurse in toString(BigInteger,StringBuilder,int,int).



*) This might be "static final", while we are at it:

4109 private static String[] zeros = new String[64];


Above made final at L4111 in [2].

On Aug 20, 2019, at 10:28 AM, Ivan Gerasimov 
mailto:ivan.gerasi...@oracle.com>> wrote:


While we're here.

On 8/20/19 1:05 AM, Aleksey Shipilev wrote:

 *) This might be "static final", while we are at it:

4109 private static String[] zeros = new String[64];


It may make sense to create the only string as

private static final String zeros = "0".repeat(63);


Changed at L4111 in [2].


and then replace all

sb.append(zeros[x]) with sb.append(zeros, 0, x) and


L4006 in [2].


sb.insert(pos, zeros[x]) with sb.insert(pos, zeros, 0, x).


L4049 and L4054 in [2].

The difference would be in one extra bounds check, and as both 
append() and insert() are relatively expensive, this check would 
hardly be noticeable.


Thanks,

Brian


--
With kind regards,
Ivan Gerasimov



RFR 8211360 : Change #if DEF to #if defined(DEF)

2019-08-20 Thread Ivan Gerasimov

Hello!

It's a followup for JDK-8211146.

With that fix several C-preprocessor statements of form #elif __linux__ 
were changed to more accurate #elif defined(__linux__).


grep found a few more occurrences of the same pattern, which also would 
better be cleaned up.


BUGURL: https://bugs.openjdk.java.net/browse/JDK-8211360
WEBREV: http://cr.openjdk.java.net/~igerasim/8211360/00/webrev/

Mach5 control build/testing went fine on all platforms.
Would you please help review?

--
With kind regards,
Ivan Gerasimov



Re: 8229845: Decrease memory consumption of BigInteger.toString()

2019-08-20 Thread Ivan Gerasimov

While we're here.

On 8/20/19 1:05 AM, Aleksey Shipilev wrote:

  *) This might be "static final", while we are at it:

4109 private static String[] zeros = new String[64];


It may make sense to create the only string as

private static final String zeros = "0".repeat(63);

and then replace all

sb.append(zeros[x]) with sb.append(zeros, 0, x) and

sb.insert(pos, zeros[x]) with sb.insert(pos, zeros, 0, x).

The difference would be in one extra bounds check, and as both append() 
and insert() are relatively expensive, this check would hardly be 
noticeable.


--

With kind regards,
Ivan Gerasimov



Re: RFR: 8229485: Add decrementExact(), incrementExact(), and negateExact() to java.lang.StrictMath

2019-08-14 Thread Ivan Gerasimov

A tiny addition:

On 8/14/19 10:56 AM, Ivan Gerasimov wrote:

Hi Julia!

In ExactArithTests.java
1)
169 long neg2 = -((long)x) ;
please remove the superfluous space at the end.

2)
176 long neg2 = (long) x - 1L;
should be `long neg2 = -((long)x);`

In StrictMath.java
3)

while you're here, could you please replace semicolon with comma:
 916  * Returns the value of the {@code long} argument;
 917  * throwing an exception if the value overflows an {@code int}.



In src/java.base/share/classes/java/lang/Math.java
4)

at line 1061
 * Returns the value of the {@code long} argument;
 * throwing an exception if the value overflows an {@code int}.

analogously to (3) above, could you please s/;/,/ ?

Thanks!



With kind regards,
Ivan


On 8/14/19 9:01 AM, Julia Boes wrote:


Hi,

This fix adds decrementExact(), incrementExact(), and negateExact() 
to java.lang.StrictMath. The methods were added to java.lang.Math 
previously [1] and should have been added to java.lang.StrictMath for 
consistency.


Bug: https://bugs.openjdk.java.net/browse/JDK-8229485

Webrev: http://cr.openjdk.java.net/~dfuchs/jboes/8229485/webrev.01/

CSR: https://bugs.openjdk.java.net/browse/JDK-8229702

Thanks,

Julia

[1] https://bugs.openjdk.java.net/browse/JDK-8022109



--
With kind regards,
Ivan Gerasimov



Re: RFR: 8229485: Add decrementExact(), incrementExact(), and negateExact() to java.lang.StrictMath

2019-08-14 Thread Ivan Gerasimov

Hi Julia!

In ExactArithTests.java
1)
169 long neg2 = -((long)x) ;
please remove the superfluous space at the end.

2)
176 long neg2 = (long) x - 1L;
should be `long neg2 = -((long)x);`

In StrictMath.java
3)

while you're here, could you please replace semicolon with comma:
 916  * Returns the value of the {@code long} argument;
 917  * throwing an exception if the value overflows an {@code int}.

With kind regards,
Ivan


On 8/14/19 9:01 AM, Julia Boes wrote:


Hi,

This fix adds decrementExact(), incrementExact(), and negateExact() to 
java.lang.StrictMath. The methods were added to java.lang.Math 
previously [1] and should have been added to java.lang.StrictMath for 
consistency.


Bug: https://bugs.openjdk.java.net/browse/JDK-8229485

Webrev: http://cr.openjdk.java.net/~dfuchs/jboes/8229485/webrev.01/

CSR: https://bugs.openjdk.java.net/browse/JDK-8229702

Thanks,

Julia

[1] https://bugs.openjdk.java.net/browse/JDK-8022109



--
With kind regards,
Ivan Gerasimov



Re: RFR (XS) 8221307 : String.substring() OOB exception on start index reports improper information

2019-08-13 Thread Ivan Gerasimov

Hello!

A simple benchmark did not detect any negative impact of delegating 
substring(int) to substring(int, int):


Before fix:
Benchmark Mode  Cnt   Score   Error Units
StringSubstring.from26toEnd0  avgt   12  20.279 ± 0.306 ns/op
StringSubstring.from26toEnd1  avgt   12  20.677 ± 1.924 ns/op

After fix:
Benchmark Mode  Cnt   Score   Error Units
StringSubstring.from26toEnd0  avgt   12  19.835 ± 0.135 ns/op
StringSubstring.from26toEnd1  avgt   12  20.784 ± 1.791 ns/op


So here's the updated webrev: 
http://cr.openjdk.java.net/~igerasim/8221307/01/webrev/


Will you approve it to go?

With kind regards,

Ivan


On 8/8/19 6:56 AM, Claes Redestad wrote:

Hi,

perhaps a stupid question, but why isn't String::substring(int) 
calling substring(beginIndex, length())? That'd ensure consistent error

messages, but otherwise preserve semantics.

/Claes

On 2019-08-08 02:17, Ivan Gerasimov wrote:

Hello!

The exception thrown by substring(int) may look confusing.
For example it produces "String index out of range: -1" when the 
index is length+1.


It is proposed to make the error message more clear, similar to what 
we have for substring(int, int).


BUGURL: https://bugs.openjdk.java.net/browse/JDK-8221307
WEBREV: http://cr.openjdk.java.net/~igerasim/8221307/00/webrev/

Would you please help review?




--
With kind regards,
Ivan Gerasimov



Re: RFR (XS) 8221307 : String.substring() OOB exception on start index reports improper information

2019-08-08 Thread Ivan Gerasimov

Hi Roger!  Thanks for the comments!

On 8/8/19 6:50 AM, Roger Riggs wrote:

Hi Ivan,

To be consistent with other checks of the index, can you use the 
checkIndex(index, length) method?

The message it produces would be appropriate for both cases.

Unfortunately, checkIndex(int, int) wouldn't be appropriate here because 
it only allows index < length, while we need index <= length here.


I think it makes sense to adopt the suggestion from Claes and just 
delegate to substring(int,int).




While you are there can you add a space after the comma
in lines 3680 and 3691.


Yes, done.



Is there an appropriate test?  There should be.


The substring() method is used across many regression tests, so we do 
have sanity checking.


I didn't think it's worth it to check for specific error message.

With kind regards,

Ivan




Thanks, Roger


On 8/7/19 8:17 PM, Ivan Gerasimov wrote:

Hello!

The exception thrown by substring(int) may look confusing.
For example it produces "String index out of range: -1" when the 
index is length+1.


It is proposed to make the error message more clear, similar to what 
we have for substring(int, int).


BUGURL: https://bugs.openjdk.java.net/browse/JDK-8221307
WEBREV: http://cr.openjdk.java.net/~igerasim/8221307/00/webrev/

Would you please help review?





--
With kind regards,
Ivan Gerasimov



Re: RFR (XS) 8221307 : String.substring() OOB exception on start index reports improper information

2019-08-08 Thread Ivan Gerasimov

Hi Claes!

On 8/8/19 6:56 AM, Claes Redestad wrote:

Hi,

perhaps a stupid question, but why isn't String::substring(int) 
calling substring(beginIndex, length())? That'd ensure consistent error

messages, but otherwise preserve semantics.


It's a good point, actually!

I suspect that originally substring(int) had a separate implementation 
for performance reason.


I agree, it would make most sense to implement one via another unless 
there's a performance penalty.


Let me do some measurements, and then I'll report back.

With kind regards,

Ivan



/Claes

On 2019-08-08 02:17, Ivan Gerasimov wrote:

Hello!

The exception thrown by substring(int) may look confusing.
For example it produces "String index out of range: -1" when the 
index is length+1.


It is proposed to make the error message more clear, similar to what 
we have for substring(int, int).


BUGURL: https://bugs.openjdk.java.net/browse/JDK-8221307
WEBREV: http://cr.openjdk.java.net/~igerasim/8221307/00/webrev/

Would you please help review?




--
With kind regards,
Ivan Gerasimov



RFR (XS) 8221307 : String.substring() OOB exception on start index reports improper information

2019-08-07 Thread Ivan Gerasimov

Hello!

The exception thrown by substring(int) may look confusing.
For example it produces "String index out of range: -1" when the index 
is length+1.


It is proposed to make the error message more clear, similar to what we 
have for substring(int, int).


BUGURL: https://bugs.openjdk.java.net/browse/JDK-8221307
WEBREV: http://cr.openjdk.java.net/~igerasim/8221307/00/webrev/

Would you please help review?

--
With kind regards,
Ivan Gerasimov



Re: RFR (XS) 8228352 : CANON_EQ breaks when pattern contains supplementary codepoint

2019-08-01 Thread Ivan Gerasimov

Thank you Naoto for the review!

With kind regards,

Ivan


On 8/1/19 10:02 AM, naoto.s...@oracle.com wrote:

Hi Ivan,

The change looks good to me.

Naoto

On 7/31/19 9:21 PM, Ivan Gerasimov wrote:

Hello!

Pattern.compile fails with IOOB when normalizing a pattern that 
contains a surrogate pair.


Would you please help review a trivial 1-line fix?

BUGURL: https://bugs.openjdk.java.net/browse/JDK-8228352
WEBREV: http://cr.openjdk.java.net/~igerasim/8228352/00/webrev/




--
With kind regards,
Ivan Gerasimov



RFR (XS) 8228352 : CANON_EQ breaks when pattern contains supplementary codepoint

2019-07-31 Thread Ivan Gerasimov

Hello!

Pattern.compile fails with IOOB when normalizing a pattern that contains 
a surrogate pair.


Would you please help review a trivial 1-line fix?

BUGURL: https://bugs.openjdk.java.net/browse/JDK-8228352
WEBREV: http://cr.openjdk.java.net/~igerasim/8228352/00/webrev/

--
With kind regards,
Ivan Gerasimov



Re: 8193072: File.delete() should remove its path from DeleteOnExitHook.files

2019-07-10 Thread Ivan Gerasimov



On 7/10/19 5:17 PM, Brian Burkhalter wrote:

I incorporated Peter’s version, adding the security check in 
cancelDeleteOnExit(), tweaking its verbiage along with that of deleteOnExit(), 
and modified the test DeleteOnExit to verify the new method. The updated 
version is here:

http://cr.openjdk.java.net/~bpb/8193072/webrev.03/ 
<http://cr.openjdk.java.net/~bpb/8193072/webrev.03/>

There is possibility of a race here in a scenario like this:

File dir = new File("dir");
File file = new File("dir/file");

-- thread 1 --
dir.deleteOnExit();
file.deleteOnExit();
///
dir.cancelDeleteOnExit();
  thread 2 intervenes
file.cancelDeleteOnExit();
-- end --

-- thread 2 --
dir.deleteOnExit();
file.deleteOnExit();
-- end --

The result will be that the order of the registered files will change, 
and JVM will try to delete dir first (this will fail as it is not empty).


Of course it could be avoided, if cancellation were done in reverse 
order, though it's not immediately obvious from the documentation.


With kind regards,
Ivan

Thanks,

Brian


On Jul 10, 2019, at 11:17 AM, Brian Burkhalter  
wrote:

On Jul 10, 2019, at 5:36 AM, Peter Levart  wrote:

There are various interleavings of threads that could cause the file to be left 
undeleted when VM exits.

To cover concurrent scenarios such as above, the code could use reference 
counting. Like in the following patch:

http://cr.openjdk.java.net/~plevart/jdk-dev/8193072_File.undoDeleteOnExit/webrev.01/ 
<http://cr.openjdk.java.net/~plevart/jdk-dev/8193072_File.undoDeleteOnExit/webrev.01/>

This looks good to me modulo adding this
SecurityManager security = System.getSecurityManager();
if (security != null) {
security.checkDelete(path);
}
to cancelDeleteOnExit() as suggested below.


On Jul 10, 2019, at 7:51 AM, Sean Mullan  wrote:

On 7/9/19 7:40 PM, Brian Burkhalter wrote:

I don’t know. On the one hand this does not take an action like reading, 
writing, or deleting, but on the other it could end up causing files to be left 
lying around after VM termination which were expected to be deleted. I suppose 
that could be considered to be some sort of security issue.

Yes I think so.

I would probably just use the same permission (FilePermission(file,"delete")). 
If you have been granted permission to delete a file, then you should have permission to 
cancel that deletion as well.

That’s a  good idea.


--
With kind regards,
Ivan Gerasimov



Re: 8193072: File.delete() should remove its path from DeleteOnExitHook.files

2019-07-08 Thread Ivan Gerasimov

Hi Brian!

I believe this would introduce a behavior change in a scenario lile:
File f = ...;
f.deleteOnExit();
f.delete();
f.createNewFile();

I.e. when the with the same name is recreated.  Current behavior is to 
unlink such a file before exiting, no matter if it were explicitly 
deleted and then recreated or not.


With kind regards,
Ivan


On 7/8/19 1:11 PM, Brian Burkhalter wrote:

https://bugs.openjdk.java.net/browse/JDK-8193072 
<https://bugs.openjdk.java.net/browse/JDK-8193072>

There does appear to be a memory leak of sorts if one does something like

—

File[] files;
for (int i = 0; i < largeNumber; i++) {
 files[i] = File.createTempFile(“blah”, null);
 files[i].deleteOnExit();
}

// do something

for (int i = 0; i < largeNumber; i++) {
 files[i].delete();
}

// do something else before shutdown

—

The LinkedHashSet in DeleteOnExitHook will contain at least largeNumber Files 
until the VM shuts down even though the files were deleted.

The potential change is included below. The additional call to 
DeleteOnExitHook.remove() in File.delete() does not appear to have a measurable 
performance impact, at least trivially and in isolation.

Thanks,

Brian

--- a/src/java.base/share/classes/java/io/DeleteOnExitHook.java
+++ b/src/java.base/share/classes/java/io/DeleteOnExitHook.java
@@ -64,6 +64,15 @@
  files.add(file);
  }
  
+static synchronized void remove(String file) {

+if(files == null) {
+// DeleteOnExitHook is running. Too late to remove a file
+throw new IllegalStateException("Shutdown in progress");
+}
+
+files.remove(file);
+}
+
  static void runHooks() {
  LinkedHashSet theFiles;
  
--- a/src/java.base/share/classes/java/io/File.java

+++ b/src/java.base/share/classes/java/io/File.java
@@ -1050,6 +1050,7 @@
  if (isInvalid()) {
  return false;
  }
+DeleteOnExitHook.remove(path);
  return fs.delete(this);
  }



--
With kind regards,
Ivan Gerasimov



Re: RFR (XS) 8224849 : Flag (?U:...) is allowed for non-capturing groups

2019-06-28 Thread Ivan Gerasimov

Thank you Stuart for review and the edits!


On 6/28/19 4:30 PM, Stuart Marks wrote:

On 6/27/19 2:43 PM, Ivan Gerasimov wrote:

Yes, good point about the spec change.

Here's the CSR filed: https://bugs.openjdk.java.net/browse/JDK-8226914

Can you please review it at your convenience?


Thanks for filing this. I've made some minor edits and have marked it 
reviewed.


Note that when filing CSRs there are a few fields that need to be 
filled in but that aren't normally visible when viewing the issue; you 
have to press the Edit button to see them. These fields include 
Compatibility Risk, Compatibility Risk Description, Scope, and 
Reviewed By. I've filled these in appropriately.


I think this is OK for you to mark Finalized now.

Thanks,

s'marks



--
With kind regards,
Ivan Gerasimov



Re: RFR (XS) 8224849 : Flag (?U:...) is allowed for non-capturing groups

2019-06-27 Thread Ivan Gerasimov

Hi Stuart!


On 6/27/19 2:25 PM, Stuart Marks wrote:

Hi Ivan,

I think this fix is correct and clearly the spec needed to be fixed. 
Since it is a change to the normative portion of the spec, though, it 
needs a CSR. This is probably mostly a formality to make sure the 
change is tracked for TCK purposes. Can you file a CSR for this please?


Let me know if you need a review for it.


Yes, good point about the spec change.

Here's the CSR filed: https://bugs.openjdk.java.net/browse/JDK-8226914

Can you please review it at your convenience?

With kind regards,
Ivan


Thanks,

s'marks




--
With kind regards,
Ivan Gerasimov



Re: RFR (XS) 8224849 : Flag (?U:...) is allowed for non-capturing groups

2019-06-25 Thread Ivan Gerasimov

Thank you Brent for review!


On 6/25/19 10:40 AM, Brent Christian wrote:

That looks fine, Ivan.

-Brent

On 6/24/19 10:46 PM, Ivan Gerasimov wrote:

Hello!

Would someone volunteer to review this extra-small doc-only fix?

Thanks in advance!

With kind regards,

Ivan


On 5/28/19 9:33 AM, Ivan Gerasimov wrote:

Hello!

When the fix for JDK-7039066 (which added support for the flag 
UNICODE_CHARACTER_CLASS and corresponding embedded expression (?U)) 
was integrated back in JDK 7, the documentation for the 
non-capturing groups was not updated accordingly.


Would you please help review the fix, which only updates the javadoc?

BUGURL: https://bugs.openjdk.java.net/browse/JDK-8224849
WEBREV: http://cr.openjdk.java.net/~igerasim/8224849/00/webrev/

Thanks in advance!







--
With kind regards,
Ivan Gerasimov



Re: RFR (XS) 8224716 : Javadoc of Int/Long/DoubleSummaryStatistics should mention possible overflow of count

2019-06-25 Thread Ivan Gerasimov

Hello!

On 6/25/19 3:33 PM, Joseph D. Darcy wrote:

Hello,

Please file a CSR for this change; thanks,



Filed: https://bugs.openjdk.java.net/browse/JDK-8226784

Please take a look at your convenience.

With kind regards,
Ivan


-Joe

On 6/25/2019 11:30 AM, Brent Christian wrote:
I was musing about this myself.  If the changes are within the 
@implNote(s), then perhaps not?


-Brent

On 6/25/19 11:15 AM, Brian Burkhalter wrote:
>

Is a CSR in order?

Thanks,

Brian

On Jun 25, 2019, at 5:49 AM, Ivan Gerasimov 
 wrote:


Hello!

Would someone volunteer to review this extra-small doc-only fix?

Thanks in advance!

With kind regards,
Ivan


On 5/29/19 2:39 PM, Ivan Gerasimov wrote:

Hello!

In the implNote section of the javadoc for xxxSummaryStatistics 
classes it is mentioned that the sum is not checked for overflow.
It would be more accurate to add that neither does the 
implementation check the count for overflow.


Would you please help review this doc-only change?

BUGURL: https://bugs.openjdk.java.net/browse/JDK-8224716
WEBREV: http://cr.openjdk.java.net/~igerasim/8224716/00/webrev/



--
With kind regards,
Ivan Gerasimov







--
With kind regards,
Ivan Gerasimov



Re: RFR (XS) 8224716 : Javadoc of Int/Long/DoubleSummaryStatistics should mention possible overflow of count

2019-06-25 Thread Ivan Gerasimov

Hello!

Would someone volunteer to review this extra-small doc-only fix?

Thanks in advance!

With kind regards,
Ivan


On 5/29/19 2:39 PM, Ivan Gerasimov wrote:

Hello!

In the implNote section of the javadoc for xxxSummaryStatistics 
classes it is mentioned that the sum is not checked for overflow.
It would be more accurate to add that neither does the implementation 
check the count for overflow.


Would you please help review this doc-only change?

BUGURL: https://bugs.openjdk.java.net/browse/JDK-8224716
WEBREV: http://cr.openjdk.java.net/~igerasim/8224716/00/webrev/



--
With kind regards,
Ivan Gerasimov



Re: RFR (XS) 8224849 : Flag (?U:...) is allowed for non-capturing groups

2019-06-24 Thread Ivan Gerasimov

Hello!

Would someone volunteer to review this extra-small doc-only fix?

Thanks in advance!

With kind regards,

Ivan


On 5/28/19 9:33 AM, Ivan Gerasimov wrote:

Hello!

When the fix for JDK-7039066 (which added support for the flag 
UNICODE_CHARACTER_CLASS and corresponding embedded expression (?U)) 
was integrated back in JDK 7, the documentation for the non-capturing 
groups was not updated accordingly.


Would you please help review the fix, which only updates the javadoc?

BUGURL: https://bugs.openjdk.java.net/browse/JDK-8224849
WEBREV: http://cr.openjdk.java.net/~igerasim/8224849/00/webrev/

Thanks in advance!



--
With kind regards,
Ivan Gerasimov



Re: RFR (s) 8225397 : Integer value miscalculation in toString() method of BitSet

2019-06-06 Thread Ivan Gerasimov

On 6/6/19 8:39 AM, Andrew Haley wrote:


On 6/6/19 12:35 PM, Ivan Gerasimov wrote:

Could you please check it if it looks better now?

http://cr.openjdk.java.net/~igerasim/8225397/00/webrev/

OK, it's just a hint, fair enough. Patch is OK.

Thank you!


For an extra gold star (:-) you could consider adding a comment like
"avoid overflow in the case of a humongous numBits". There's no need
for another review if you do so.

You got it!
Pushed with the suggested comment added.

With kind regards,
Ivan



Re: RFR (s) 8225397 : Integer value miscalculation in toString() method of BitSet

2019-06-06 Thread Ivan Gerasimov

Thank you Andrew!

The multiplier 6 was pre-existent, and I don't really want to change 
behavior in this legacy class.


The high limit of (Integer.MAX_VALUE - 20) was arbitrarily chosen.

I don't think it matters much in this case, but I can change it to 
(Integer.MAX_VALUE - 8), which is used in several other places.


I've just updated the code in-place to make the intent of the fix more 
clear.


Could you please check it if it looks better now?

http://cr.openjdk.java.net/~igerasim/8225397/00/webrev/

Thanks in advance!

Ivan



On 6/6/19 3:57 AM, Andrew Haley wrote:

On 6/6/19 10:18 AM, Ivan Gerasimov wrote:

Hello!

It is yet another instance of integer overflow under certain extreme
circumstances.

This time it is when calculating the initial capacity of a StringBuilder
in BitSet.toString.

If there are too many elements in the set, we can't do much anyway.

The best effort is to avoid confusing NegativeArraySizeException and let
the method throw OOM.

Would you please help review the fix?

BUGURL: https://bugs.openjdk.java.net/browse/JDK-8225397
WEBREV: http://cr.openjdk.java.net/~igerasim/8225397/00/webrev/

@@ -1184,7 +1184,9 @@

  int numBits = (wordsInUse > 128) ?
  cardinality() : wordsInUse * BITS_PER_WORD;
-StringBuilder b = new StringBuilder(6*numBits + 2);
+int sizeHint = (numBits <= (Integer.MAX_VALUE - 22) / 6) ?
+6 * numBits + 2 : Integer.MAX_VALUE - 20;
+StringBuilder b = new StringBuilder(sizeHint);
  b.append('{');

  int i = nextSetBit(0);

This needs a comment. What is significant about 6 and 22?



--
With kind regards,
Ivan Gerasimov



RFR (s) 8225397 : Integer value miscalculation in toString() method of BitSet

2019-06-06 Thread Ivan Gerasimov

Hello!

It is yet another instance of integer overflow under certain extreme 
circumstances.


This time it is when calculating the initial capacity of a StringBuilder 
in BitSet.toString.


If there are too many elements in the set, we can't do much anyway.

The best effort is to avoid confusing NegativeArraySizeException and let 
the method throw OOM.


Would you please help review the fix?

BUGURL: https://bugs.openjdk.java.net/browse/JDK-8225397
WEBREV: http://cr.openjdk.java.net/~igerasim/8225397/00/webrev/

--
With kind regards,
Ivan Gerasimov



Re: RFR: 8225179: (regex) Minor Pattern cleanup

2019-06-04 Thread Ivan Gerasimov

Hi Claes!

Looks good to me, thanks!

LookBehindNode may be better named LookBehindEndNode, as it should only 
match at the very end of the look-behind token.


With kind regards,

Ivan


On 6/4/19 1:58 AM, Claes Redestad wrote:

Hi,

please review this j.u.regex.Pattern cleanup.

- refactor BitClass to be a BmpCharPredicate (which allows removing
two identical(!) lambdas), which improves startup and reduces
allocations when compiling Patterns.
- remove unused GroupRef class
- made anonymous lookbehindEnd Node instance into an explicit class
which will be lazily rather than eagerly loaded
- various cleanups of unused variables, methods and redundant
inititialization

Webrev: http://cr.openjdk.java.net/~redestad/8225179/open.00/
Bug:https://bugs.openjdk.java.net/browse/JDK-8225179

Testing: tier1-3

Thanks!

/Claes



--
With kind regards,
Ivan Gerasimov



Re: RFR 8225198 : Optimize regex tree for greedy quantifiers of type {N, }

2019-06-04 Thread Ivan Gerasimov

Thank you Claes and Brent for reviewing!

I'll edit the comment as suggested before pushing.

And yes, we do have a few testcases to verify this optimization in 
RegExTest.java / BMPTestCases.txt.


With kind regards,

Ivan


On 6/4/19 2:30 PM, Brent Christian wrote:

Hi, Ivan

The change looks fine.

I would call this "noreg-cleanup".  Tests are needed to confirm the 
correctness of the new code, which I imagine we already have.



One small suggestion:

src/java.base/share/classes/java/util/regex/Pattern.java


4271  * and unlimited maximum, for *, + and {N,} quantifiers.


If the maximum is still MAX_REPS, I'd prefer to continue to say that 
(rather than, "unlimited.")


Thanks.
-Brent

On 6/3/19 12:54 PM, Ivan Gerasimov wrote:

Hello!

When building a match-tree, the regex engine optimizes '*' and '+' 
greedy quantifiers by using special node of type CharPropertyGreedy.


This later class can be used unmodified for other greedy quantifiers 
without the upper limit, i.e. of type '{N,}'.


Would you please help review this simple optimization?

BUGURL: https://bugs.openjdk.java.net/browse/JDK-8225198
WEBREV: http://cr.openjdk.java.net/~igerasim/8225198/00/webrev/

(With the fix, one unused class was also removed.  This class was to 
implement conditionals, which were never supported by Java regexs.)






--
With kind regards,
Ivan Gerasimov



RFR 8225198 : Optimize regex tree for greedy quantifiers of type {N, }

2019-06-03 Thread Ivan Gerasimov

Hello!

When building a match-tree, the regex engine optimizes '*' and '+' 
greedy quantifiers by using special node of type CharPropertyGreedy.


This later class can be used unmodified for other greedy quantifiers 
without the upper limit, i.e. of type '{N,}'.


Would you please help review this simple optimization?

BUGURL: https://bugs.openjdk.java.net/browse/JDK-8225198
WEBREV: http://cr.openjdk.java.net/~igerasim/8225198/00/webrev/

(With the fix, one unused class was also removed.  This class was to 
implement conditionals, which were never supported by Java regexs.)


--
With kind regards,
Ivan Gerasimov



RFR 8224716 : Javadoc of Int/Long/DoubleSummaryStatistics should mention possible overflow of count

2019-05-29 Thread Ivan Gerasimov

Hello!

In the implNote section of the javadoc for xxxSummaryStatistics classes 
it is mentioned that the sum is not checked for overflow.
It would be more accurate to add that neither does the implementation 
check the count for overflow.


Would you please help review this doc-only change?

BUGURL: https://bugs.openjdk.java.net/browse/JDK-8224716
WEBREV: http://cr.openjdk.java.net/~igerasim/8224716/00/webrev/

--
With kind regards,
Ivan Gerasimov



Re: RFR: 8224986: (str) optimize StringBuilder.append(CharSequence, int, int) for String arguments

2019-05-29 Thread Ivan Gerasimov

Hi Claes!

It looks like for the cases when this.isLatin1() != s.isLatin1() the 
code is essentially the same as in appendChars(CharSequence, int, int).
So, to save some byte codes (which can actually help with inlining), it 
might have been written as:


private final void appendChars(String s, int off, int end) {
if (isLatin1() ^ s.isLatin1()) {
appendChars((CharSequence)s, off, end);
} else {
System.arraycopy(s.value(), off << coder, this.value, 
this.count << coder, (end - off) << coder);

count += end - off;
}
}

With kind regards,
Ivan

On 5/29/19 8:36 AM, Claes Redestad wrote:

Hi,

it's been pointed out[1] that append(string.substring(start, end)) can
outperform append(string, start, end) in some circumstances. In part due
the former being intrinsified, but also I think due it tickling some
string concat optimizations that allow for eliding the StringBuilder
itself completely when there's only one non-constant argument.

While not ruling out intrinsification of append(String, int, int), I was
able to achieve a reasonable speed-up by peeling off String arguments
to append(CharSequence, int, int) and dropping into a routine that uses
System.arraycopy when appropriate.

This seems like a reasonable improvement in the interim, and an
opportunity to incorporate the relevant microbenchmarks.

Bug:https://bugs.openjdk.java.net/browse/JDK-8224986
Webrev: http://cr.openjdk.java.net/~redestad/8224986/open.00/

I extended and incorporated the microbenchmark provided in [1] to also 
include variants with UTF16 Strings and mixes of arguments, ensuring

speedups on all appendBounds* variants[2].

Thanks!

/Claes

[1] 
http://mail.openjdk.java.net/pipermail/core-libs-dev/2019-January/057834.html

[2]
Baseline:

Benchmark(length)  Mode  Cnt Score Error Units
appendBounds 1000  avgt   10   576.939 ±  21.125 ns/op
 ·gc.alloc.rate.norm 1000  avgt   10  2144.000 ±   0.001 B/op
appendBoundsMix  1000  avgt   10   855.308 ±  27.071 ns/op
 ·gc.alloc.rate.norm 1000  avgt   10  3160.000 ±   0.001 B/op
appendBoundsUtf161000  avgt   10  1424.518 ±  33.319 ns/op
 ·gc.alloc.rate.norm 1000  avgt   10  5192.000 ±   0.001 B/op

Patch:

Benchmark(length)  Mode  Cnt Score Error Units
appendBounds 1000  avgt   10   466.640 ±  15.069 ns/op
 ·gc.alloc.rate.norm 1000  avgt   10  2120.000 ±   0.001 B/op
appendBoundsMix  1000  avgt   10   758.886 ±  22.983 ns/op
 ·gc.alloc.rate.norm 1000  avgt   10  3160.000 ±   0.001 B/op
appendBoundsUtf161000  avgt   10  1320.360 ±  49.097 ns/op
 ·gc.alloc.rate.norm 1000  avgt   10  5192.000 ±   0.001 B/op



--
With kind regards,
Ivan Gerasimov



Re: RFR 8224789 : Parsing repetition count in regex does not detect numeric overflow

2019-05-28 Thread Ivan Gerasimov

Hi Brent!


On 5/28/19 4:06 PM, Brent Christian wrote:

Hi, Ivan

I agree with Roger that there are more test cases than necessary. 
Otherwise I think it looks pretty good.


Okay.  Then let's make the list of invalid ranges shorter, but add a 
randomly generated value!


Please find the updated webrev here:
http://cr.openjdk.java.net/~igerasim/8224789/01/webrev/

With kind regards,
Ivan

I find the addExact/multiplyExact code less readable. I'm not sure 
what could be done about it - maybe some different indentation:


cmin = Math.addExact(
 Math.multiplyExact(cmin, 10),
 ch - '0');

though that makes for some long lines.  Just a thought.

Thanks,
-Brent

On 5/24/19 11:28 PM, Ivan Gerasimov wrote:

Hello!

When Pattern.compile() parses the repetition count in the expressions 
like '.{100}', '.{1,2}' or '.{3,}' it fails to detect numeric 
overflow if the result is still non-negative.


Could you please help review the patch?

BUGURL: https://bugs.openjdk.java.net/browse/JDK-8224789
WEBREV: http://cr.openjdk.java.net/~igerasim/8224789/00/webrev/

Also, reading a char at line 3274 is done with skip(), so the 
exception thrown at 3315 displays the position of the error more 
accurately.






--
With kind regards,
Ivan Gerasimov



Re: RFR 8224789 : Parsing repetition count in regex does not detect numeric overflow

2019-05-28 Thread Ivan Gerasimov

Thank you Roger for reviewing!


On 5/28/19 9:33 AM, Roger Riggs wrote:

Hi Ivan,

test/jdk/java/util/regex/RegExTest.java:

4954: The test should print the failing exception information and
4951: a message if the Pattern.compile does not fail to distinguish 
that failure from any others.



Yes, it will be useful, if the test fails.
So, I added some debug output.

The webrev was updated in place:
http://cr.openjdk.java.net/~igerasim/8224789/00/webrev/

I don't think there is a need for so many cases of values > 2^31; 
there is no different code path for all those values.

They are fairly cheap, but redundant.

I wanted to cover cases when the number would overflow 64-bit integer, 
so that the test would fail if the count were implemented as a long.


Also, I tried to cover different configurations of the quantifier {N}, 
{N,}, {N,M} with both N and M overflowing int, unsigned int, long, 
unsigned long.


With kind regards,
Ivan


Otherwise, looks fine.

Thanks, Roger


On 05/25/2019 02:28 AM, Ivan Gerasimov wrote:

Hello!

When Pattern.compile() parses the repetition count in the expressions 
like '.{100}', '.{1,2}' or '.{3,}' it fails to detect numeric 
overflow if the result is still non-negative.


Could you please help review the patch?

BUGURL: https://bugs.openjdk.java.net/browse/JDK-8224789
WEBREV: http://cr.openjdk.java.net/~igerasim/8224789/00/webrev/

Also, reading a char at line 3274 is done with skip(), so the 
exception thrown at 3315 displays the position of the error more 
accurately.







--
With kind regards,
Ivan Gerasimov



  1   2   3   4   5   6   7   8   9   >