Re: RFR [15] 8242230: Whitespace typos, relaxed javadoc, formatting
Looks good to me, thanks! Kind regards, Ivan On 4/7/20 8:28 AM, Pavel Rappo wrote: Hi Ivan, On 7 Apr 2020, at 09:11, Ivan Gerasimov wrote: Hi Pavel! A couple of comments. 1) java/util/logging/Formatter.java This has one extra open curly brace: "{{@literal }" The leftmost curly brace is not a part of the "literal" inline tag, but rather a part of the message format string. Have a look at the method that the comment belongs to. Note what that method is looking for in a message string. 2) grep finds some more typos of the same kind that you've spotted. a) rgrep '^[ ]*\*'|grep ' ,'|less This find number of potential typos. For example, the javadoc for VarHandle [1] has 53 occurrances of space-before-comma. A few more are found in j.l.Thread, j.io.DataOutput, j.l.String, etc. b) rgrep '^[ ]*\*'|grep '\w- '|less This find the word 'network' broken with a hyphen at [2] and also in share/classes/sun/net/util/IPAddressUtil.java (I only searched under src/java.base) Thanks. I've included some of those: true positive, non-controversial, and significant cases where I believe I sufficiently understood context. For instance, I was not sure if I reliably grasped the applicability of the "statically represented using varargs" phrase used widely across the VarHandle type. It looks like that phrase was blankedly added to @return and @return tags. So, I left it out for now. The updated patch can be found here: http://cr.openjdk.java.net/~prappo/8242230/webrev.01/ *** Some of the cases this patch addresses suggest that we might need to do something about how the Standard Doclet treats newline characters in source files. More often than not, newline characters are purely to control the width of the source lines. When carried over to the output, they may produce undesirable effects. Punctuation marks and contents of the Standard Doclet tags may be affected. That problem [1] is neither new nor trivial to address. Still, we should add something to the Standard Doclet Specification [2]. I'm not sure what we can do about it now other than work around the current behavior where it is significant. -Pavel P.S. CC'ing to javadoc-...@openjdk.java.net --- [1] https://en.wikipedia.org/wiki/Line_wrap_and_word_wrap [2] https://docs.oracle.com/en/java/javase/14/docs/specs/javadoc/doc-comment-spec.html With kind regards, Ivan [1] https://docs.oracle.com/en/java/javase/14/docs/api/java.base/java/lang/invoke/VarHandle.html [2] https://docs.oracle.com/en/java/javase/14/docs/api/java.base/java/net/Inet4Address.html On 4/6/20 11:28 AM, Pavel Rappo wrote: Hello, Please review the change for https://bugs.openjdk.java.net/browse/JDK-8242230: http://cr.openjdk.java.net/~prappo/8242230/webrev.00/ This is a documentation cleanup. There are no code changes involved, and the changes in documentation are mostly trivial. The following packages are affected: java.lang, java.text, java.util, java.util.logging, javax.lang.model.util, jdk.internal.reflect -Pavel -- With kind regards, Ivan Gerasimov -- With kind regards, Ivan Gerasimov
Re: RFR [15] 8242230: Whitespace typos, relaxed javadoc, formatting
Hi Pavel! A couple of comments. 1) java/util/logging/Formatter.java This has one extra open curly brace: "{{@literal }" 2) grep finds some more typos of the same kind that you've spotted. a) rgrep '^[ ]*\*'|grep ' ,'|less This find number of potential typos. For example, the javadoc for VarHandle [1] has 53 occurrances of space-before-comma. A few more are found in j.l.Thread, j.io.DataOutput, j.l.String, etc. b) rgrep '^[ ]*\*'|grep '\w- '|less This find the word 'network' broken with a hyphen at [2] and also in share/classes/sun/net/util/IPAddressUtil.java (I only searched under src/java.base) With kind regards, Ivan [1] https://docs.oracle.com/en/java/javase/14/docs/api/java.base/java/lang/invoke/VarHandle.html [2] https://docs.oracle.com/en/java/javase/14/docs/api/java.base/java/net/Inet4Address.html On 4/6/20 11:28 AM, Pavel Rappo wrote: Hello, Please review the change for https://bugs.openjdk.java.net/browse/JDK-8242230: http://cr.openjdk.java.net/~prappo/8242230/webrev.00/ This is a documentation cleanup. There are no code changes involved, and the changes in documentation are mostly trivial. The following packages are affected: java.lang, java.text, java.util, java.util.logging, javax.lang.model.util, jdk.internal.reflect -Pavel -- With kind regards, Ivan Gerasimov
Re: [15] RFR 8241727 : Typos: empty lines in javadoc, inconsistent indents, etc. (core-libs only)
Thank you Pavel! Pushed. On 3/27/20 5:09 AM, Pavel Rappo wrote: 140 a successful * query to a constant will always remain successful. Made me pause and read a bit of the surrounding context to make sure there's no "*" type of query. Looks good to me. Thanks for doing this. -Pavel On 27 Mar 2020, at 10:26, Ivan Gerasimov wrote: Hello! This is a javadoc-only fix. There are a few changes that will actually be visible (for example [1], [2]), but the vast majority of changes are to remove redundant empty lines, correct indentation, or otherwise restore harmony. Would you please help review this rather technical fix? BUGURL: https://bugs.openjdk.java.net/browse/JDK-8241727 WEBREV: http://cr.openjdk.java.net/~igerasim/8241727/00/webrev/ Thank in advance! [1] Extra '+' in text: https://docs.oracle.com/en/java/javase/13/docs/api/java.base/java/io/ObjectInputStream.html#readObjectOverride() [2] Extra 'I' in text: https://docs.oracle.com/en/java/javase/13/docs/api/java.base/java/lang/annotation/Annotation.html#hashCode() -- With kind regards, Ivan Gerasimov -- With kind regards, Ivan Gerasimov
[15] RFR 8241727 : Typos: empty lines in javadoc, inconsistent indents, etc. (core-libs only)
Hello! This is a javadoc-only fix. There are a few changes that will actually be visible (for example [1], [2]), but the vast majority of changes are to remove redundant empty lines, correct indentation, or otherwise restore harmony. Would you please help review this rather technical fix? BUGURL: https://bugs.openjdk.java.net/browse/JDK-8241727 WEBREV: http://cr.openjdk.java.net/~igerasim/8241727/00/webrev/ Thank in advance! [1] Extra '+' in text: https://docs.oracle.com/en/java/javase/13/docs/api/java.base/java/io/ObjectInputStream.html#readObjectOverride() [2] Extra 'I' in text: https://docs.oracle.com/en/java/javase/13/docs/api/java.base/java/lang/annotation/Annotation.html#hashCode() -- With kind regards, Ivan Gerasimov
Re: RFR 8237599 : Greedy matching against supplementary chars fails to respect the region
Thank you Roger for review! On 3/25/20 6:56 AM, Roger Riggs wrote: Hi Ivan, Looks fine. Interesting edge case, would never be seen with 8 bit charsets. Thanks, Roger On 3/21/20 3:15 AM, Ivan Gerasimov wrote: Gentle ping. The webrev was rebased to accommodate recent changes in RegExTest.java. The fix is to handle an edge case situation, which is supposedly not too common. Nevertheless, I think, it is important to handle it correctly. Thanks in advance! Ivan On 1/22/20 8:23 PM, Ivan Gerasimov wrote: Hello everyone! When the input of a j.u.regex.Matcher is restricted with .region() method, it can possibly cut off a half of a surrogate pair. It turns out that greedy matching implemented in the Pattern.CharPropertyGreedy class fails to recognize this edge case in two scenarios: 1) When it greedily consumes the input and meets a higher half of a surrogate pair that was cut off at the end of input, and 2) When it backs off and meets a lower half of a surrogate pair at the very beginning of input. In both cases, the engine reads the entire codepoint, crossing the boundaries of the set region. Instead, it should only read the half of the surrogate pair that lies inside the region and ignore the other half. Would you please help review the fix? BUGURL: https://bugs.openjdk.java.net/browse/JDK-8237599 WEBREV: http://cr.openjdk.java.net/~igerasim/8237599/00/webrev/ Thanks in advance! -- With kind regards, Ivan Gerasimov
Re: RFR 8237599 : Greedy matching against supplementary chars fails to respect the region
Gentle ping. The webrev was rebased to accommodate recent changes in RegExTest.java. The fix is to handle an edge case situation, which is supposedly not too common. Nevertheless, I think, it is important to handle it correctly. Thanks in advance! Ivan On 1/22/20 8:23 PM, Ivan Gerasimov wrote: Hello everyone! When the input of a j.u.regex.Matcher is restricted with .region() method, it can possibly cut off a half of a surrogate pair. It turns out that greedy matching implemented in the Pattern.CharPropertyGreedy class fails to recognize this edge case in two scenarios: 1) When it greedily consumes the input and meets a higher half of a surrogate pair that was cut off at the end of input, and 2) When it backs off and meets a lower half of a surrogate pair at the very beginning of input. In both cases, the engine reads the entire codepoint, crossing the boundaries of the set region. Instead, it should only read the half of the surrogate pair that lies inside the region and ignore the other half. Would you please help review the fix? BUGURL: https://bugs.openjdk.java.net/browse/JDK-8237599 WEBREV: http://cr.openjdk.java.net/~igerasim/8237599/00/webrev/ Thanks in advance! -- With kind regards, Ivan Gerasimov
Re: RFR [15] 8241014: Miscellaneous typos in documentation comments
Thank you Paul! grep found a few more occurrences of 'equals to' across java.base, so I fixed them as well. Here's the updated webrev: http://cr.openjdk.java.net/~igerasim/XXX-typos/01/webrev/ Pavel, I checked your portion of correction, everything looks good to me! One minor nit: In src/java.base/share/classes/java/util/StringJoiner.java can you please wrap the modified line, so it won't be that long? With kind regards, Ivan On 3/20/20 10:16 AM, Paul Sandoz wrote: --- a/src/java.base/share/classes/java/lang/invoke/MethodType.java +++ b/src/java.base/share/classes/java/lang/invoke/MethodType.java @@ -1379,12 +1379,12 @@ /** * This implementation returns {@code true} if {@code obj} is another - * {@code WeakEntry} whose referent is equals to this referent, or - * if {@code obj} is equals to the referent of this. This allows + * {@code WeakEntry} whose referent equals to this referent, or + * if {@code obj} equals to the referent of this. This allows * lookups to be made without wrapping in a {@code WeakEntry}. * * @param obj the object to compare - * @return true if {@code obj} is equals to this or the referent of this + * @return true if {@code obj} equals to this or the referent of this * @see MethodType#equals(Object) * @see Object#equals(Object) Use either: whose referent is equal to this referent, or whose referent equals this referent, The former is easier just delete the ’s’. Other bits look good. Paul. On Mar 13, 2020, at 7:03 PM, Ivan Gerasimov wrote: Hi Pavel! Can this please be combined with my collection of typos? http://cr.openjdk.java.net/~igerasim/XXX-typos/00/webrev/ Just to save cycles on reviewing :) With kind regards, Ivan On 3/13/20 8:42 AM, Pavel Rappo wrote: Hello, Please review the change for https://bugs.openjdk.java.net/browse/JDK-8241014: http://cr.openjdk.java.net/~prappo/8241014/webrev.00/ This is a documentation cleanup. There are no code changes involved, and the changes in documentation are mostly trivial. The following packages are affected: java.lang, java.nio.file, java.nio.file.attribute, java.security, java.time.chrono, java.time.temporal, java.util, java.util.regex, java.util.stream, javax.crypto, javax.security.cert, javax.tools That said, there are two changes that I'd prefer to be carefully reviewed by the experts in the corresponding areas. The first one is for a suspected typo in the javax.crypto.CryptoPolicyParser class, "AlgrithomParameterSpec". It is not unheard-of for typos to be kept and supported for the sake of backward compatibility. Sadly, we have a number of those in OpenJDK. Even though I performed reasonable checks, the proposed fix should better be verified by the security folk. The second one is for the doc comment for the java.util.stream.Stream.collect method. @apiNote The following will accumulate strings into an ArrayList: List asList = stringStream.collect(Collectors.toList()); Given that the spec for Collectors.toList() clearly says that ...There are no guarantees on the type, mutability, serializability, or thread-safety of the List returned;... I'd assume that @apiNote should be fixed as proposed. -Pavel P.S. Apologies for spamming multiple mailing lists. -- With kind regards, Ivan Gerasimov -- With kind regards, Ivan Gerasimov
Re: RFR 8214245 : (regex) Case insensitive matching doesn't work correctly for some character classes
Thank you Roger! Pushed. With kind regards, Ivan On 3/17/20 5:20 PM, Roger Riggs wrote: Hi Ivan, I see the CSR is approved. I'm fine with the change. Regards, Roger On 2/25/20 3:18 PM, Ivan Gerasimov wrote: Thank you Roger for reviewing CSR and the release note! On 2/11/20 12:49 PM, Roger Riggs wrote: Hi Ivan, Will this have enough of a compatibility concern to warrant a CSR? It will change the behavor of these cases. In the RegExTest, the failures should print which case is failing. (Line 4961, 4990). I agree that many testcases in RegExTest could provide better diagnostics in a case of a failure. I think, it maybe done as a separate cleanup. In the added testcase I made sure that both the input string and the pattern are printed upon failure. With kind regards, Ivan Regards, Roger On 2/7/20 3:05 PM, Ivan Gerasimov wrote: Gentle ping. I had to rebase the fix, as the code has diverged since the RFR was sent out 10 months ago. Also, the test was slightly modified to cover more cases. BUGURL: https://bugs.openjdk.java.net/browse/JDK-8214245 WEBREV: http://cr.openjdk.java.net/~igerasim/8214245/01/webrev/ Thanks in advance to the volunteer to review the fix! With kind regards, Ivan On 4/21/19 7:50 PM, Ivan Gerasimov wrote: Hello! It turns out, that the case-insensitive j.u.regex.Pattern still pays attention to the characters case when certain char classes are used. For example \p{IsLowerCase}, \p{IsUpperCase} and \p{IsTitleCase} continue to recognize only lower, upper and title case characters, even in case-insensitive context. For example, for POSIX char classes this behavior contradicts this paragraph: """ 9.2 Regular Expression General Requirements ... When a standard utility or function that uses regular expressions specifies that pattern matching shall be performed without regard to the case (uppercase or lowercase) of either data or patterns, then when each character in the string is matched against the pattern, not only the character, but also its case counterpart (if any), shall be matched. This definition of case-insensitive processing is intended to allow matching of multi-character collating elements as well as characters, as each character in the string is matched using both its cases. ... """ I also checked how Perl is dealing with in such situation, and yes, it ignores the case with various \p{} classes when they are used in case-insensitive context, so all these tests run fine: 'A' =~ /\p{Lower}/i or die; 'a' =~ /\p{Upper}/i or die; 'A' =~ /\p{gc=Lt}/i or die; # title case 'a' =~ /\p{IsTitlecase}/i or die; 'Lj' =~ /\p{Lower}/i or die; # title-cased digraph 'lj' =~ /\p{Upper}/i or die; 'LJ' =~ /\p{Lt}/i or die; For reference, here's a lengthy document, describing precise rules used by Perl to deal with \p{} char classes: https://perldoc.perl.org/perluniprops.html#Properties-accessible-through-%5cp%7b%7d-and-%5cP%7b%7d So, for any Lower, Upper or Title case chars in case-insensitive context Perl uses set of "Cased Letters", with is just a combination of these three categories (aka "LC" general category). Would you please help review the patch? BUGURL: https://bugs.openjdk.java.net/browse/JDK-8214245 WEBREV: http://cr.openjdk.java.net/~igerasim/8214245/00/webrev/ -- With kind regards, Ivan Gerasimov
Re: RFR: 8196334: Optimize UUID#fromString
Sounds reasonable. I think the current proposal is clearly a progress, so it looks good. I agree that any further optimization can be considered within a separate enhancement request. With kind regards, Ivan On 2/28/20 2:59 PM, Claes Redestad wrote: Hi Ivan, I've considered trying it out, but it gets complicated as we don't want to penalize -XX:-UseCompactStrings. I think adding special methods for UUID inside StringLatin1 is a step in the wrong direction, but maybe there's something more generic we can consider. However, short of adding String.value()+coder() access via JavaLangAccess I'm not sure there's anything that'd net us any significant gain. And I don't think it's worth it to open up that particular can of worms right now just for this. /Claes On 2020-02-28 20:51, Ivan Gerasimov wrote: Hi Claes and Andriy! It looks good overall. I wonder if it'll be even faster if the fast path were implemented in StringLatin1 (for compact strings only) and was called via SharedSecrets/JavaLangAccess? Then, you could operate directly on bytes and avoid dealing with longs. With kind regards, Ivan On 2/28/20 6:22 AM, Claes Redestad wrote: Hi all, please review this patch to optimize UUID#fromString. Jon Chambers originally proposed a patch that used a strict parser to get a similar speed-up, but I failed to adapt it in a way that could fall back to the less strict behavior while maintaining a reasonable speed-up in the fast-path case. Sorry, Jon! The patch proposed here was recently contributed by Andriy Plokhotnyuk (OCA signed), and manages to get more than a 3x speed-up on the new fromString microbenchmark, while falling back gently to the current, less strict implementation if ever needed. I've done some light edits, and added a simple microbenchmark. Bug: https://bugs.openjdk.java.net/browse/JDK-8196334 Webrev: http://cr.openjdk.java.net/~redestad/8196334/open.00/ Testing: tier1-3 Thanks! /Claes -- With kind regards, Ivan Gerasimov
Re: RFR: 8196334: Optimize UUID#fromString
Hi Claes and Andriy! It looks good overall. I wonder if it'll be even faster if the fast path were implemented in StringLatin1 (for compact strings only) and was called via SharedSecrets/JavaLangAccess? Then, you could operate directly on bytes and avoid dealing with longs. With kind regards, Ivan On 2/28/20 6:22 AM, Claes Redestad wrote: Hi all, please review this patch to optimize UUID#fromString. Jon Chambers originally proposed a patch that used a strict parser to get a similar speed-up, but I failed to adapt it in a way that could fall back to the less strict behavior while maintaining a reasonable speed-up in the fast-path case. Sorry, Jon! The patch proposed here was recently contributed by Andriy Plokhotnyuk (OCA signed), and manages to get more than a 3x speed-up on the new fromString microbenchmark, while falling back gently to the current, less strict implementation if ever needed. I've done some light edits, and added a simple microbenchmark. Bug: https://bugs.openjdk.java.net/browse/JDK-8196334 Webrev: http://cr.openjdk.java.net/~redestad/8196334/open.00/ Testing: tier1-3 Thanks! /Claes -- With kind regards, Ivan Gerasimov
Re: RFR: 8240094: Optimize empty substring
Looks good to me! The copyright years need to be updated. With kind regards, Ivan On 2/26/20 6:12 AM, Claes Redestad wrote: Hi, I took the liberty to file https://bugs.openjdk.java.net/browse/JDK-8240094 for this. Webrev: http://cr.openjdk.java.net/~redestad/8240094/open.00/ I added a trivial micro and verified that there's a roughly 2.5x improvement in this targeted test on my machine (~10 ns/op -> ~4ns/op), while staying neutral on a selection of benchmarks that exercise newString. /Claes On 2020-02-26 14:19, Claes Redestad wrote: On 2020-02-26 11:01, Сергей Цыпанов wrote: Currently we have public static String stripLeading(byte[] value) { int left = indexOfNonWhitespace(value); if (left == value.length) { return ""; } return (left != 0) ? newString(value, left, value.length - left) : null; } With the patch we change behaviour of this method for the case when value.length == 0: public static String stripLeading(byte[] value) { int left = indexOfNonWhitespace(value); return (left != 0) ? newString(value, left, value.length - left) : null; } Unlike original method this code returns null instead of "" for empty array. This does not affect caller (String.stripLeading()) i. e. visible behaviour remains the same for String user, but is it OK in general? One observable difference on the public API I think will happen here is that new String("").stripLeading() == "" will change from true to false, since the null return from the inner method mean we return the argument. From a compatibility point of view I think this should be fine, as the identity of the returned empty string isn't specified. I don't think a CSR is required, but bringing it up since others think otherwise. Overall I like how the patch now cleans things up a bit on top of (likely) optimizing a few edge cases, and can volunteer to sponsor it if no one else has spoken up already. Doing some quick performance testing and possibly adding a microbenchmark before push would be good. Thanks! /Claes -- With kind regards, Ivan Gerasimov
Re: RFR 8214245 : (regex) Case insensitive matching doesn't work correctly for some character classes
Thank you Roger for reviewing CSR and the release note! On 2/11/20 12:49 PM, Roger Riggs wrote: Hi Ivan, Will this have enough of a compatibility concern to warrant a CSR? It will change the behavor of these cases. In the RegExTest, the failures should print which case is failing. (Line 4961, 4990). I agree that many testcases in RegExTest could provide better diagnostics in a case of a failure. I think, it maybe done as a separate cleanup. In the added testcase I made sure that both the input string and the pattern are printed upon failure. With kind regards, Ivan Regards, Roger On 2/7/20 3:05 PM, Ivan Gerasimov wrote: Gentle ping. I had to rebase the fix, as the code has diverged since the RFR was sent out 10 months ago. Also, the test was slightly modified to cover more cases. BUGURL: https://bugs.openjdk.java.net/browse/JDK-8214245 WEBREV: http://cr.openjdk.java.net/~igerasim/8214245/01/webrev/ Thanks in advance to the volunteer to review the fix! With kind regards, Ivan On 4/21/19 7:50 PM, Ivan Gerasimov wrote: Hello! It turns out, that the case-insensitive j.u.regex.Pattern still pays attention to the characters case when certain char classes are used. For example \p{IsLowerCase}, \p{IsUpperCase} and \p{IsTitleCase} continue to recognize only lower, upper and title case characters, even in case-insensitive context. For example, for POSIX char classes this behavior contradicts this paragraph: """ 9.2 Regular Expression General Requirements ... When a standard utility or function that uses regular expressions specifies that pattern matching shall be performed without regard to the case (uppercase or lowercase) of either data or patterns, then when each character in the string is matched against the pattern, not only the character, but also its case counterpart (if any), shall be matched. This definition of case-insensitive processing is intended to allow matching of multi-character collating elements as well as characters, as each character in the string is matched using both its cases. ... """ I also checked how Perl is dealing with in such situation, and yes, it ignores the case with various \p{} classes when they are used in case-insensitive context, so all these tests run fine: 'A' =~ /\p{Lower}/i or die; 'a' =~ /\p{Upper}/i or die; 'A' =~ /\p{gc=Lt}/i or die; # title case 'a' =~ /\p{IsTitlecase}/i or die; 'Lj' =~ /\p{Lower}/i or die; # title-cased digraph 'lj' =~ /\p{Upper}/i or die; 'LJ' =~ /\p{Lt}/i or die; For reference, here's a lengthy document, describing precise rules used by Perl to deal with \p{} char classes: https://perldoc.perl.org/perluniprops.html#Properties-accessible-through-%5cp%7b%7d-and-%5cP%7b%7d So, for any Lower, Upper or Title case chars in case-insensitive context Perl uses set of "Cased Letters", with is just a combination of these three categories (aka "LC" general category). Would you please help review the patch? BUGURL: https://bugs.openjdk.java.net/browse/JDK-8214245 WEBREV: http://cr.openjdk.java.net/~igerasim/8214245/00/webrev/ -- With kind regards, Ivan Gerasimov
Re: [TRIVIAL] Fast-path for String.subsring(n,n)
Hi Sergei! Wouldn't it make sense to incorporate this fast path into StringUTF16.newString/StringLatin1.newString? This way other callers of these methods will also benefit from it. In particular, StringLatin1/StringUTF16.stripLeading(), StringLatin1/StringUTF16.stripTrailing() and maybe others could be slightly simplified. With kind regards, Ivan On 2/25/20 7:42 AM, Сергей Цыпанов wrote: Hello, current implementation of String.substring(int,int) has a fast-path for the case when beginIndex = 0 and endIndex = length. I think there should be similar fast-path for the case beginIndex = endIndex (asuming both are valid): -- With kind regards, Ivan Gerasimov
Re: RFR 8214245 : (regex) Case insensitive matching doesn't work correctly for some character classes
Thank you Roger and Joe for the feedback! May I please ask you to review the CSR draft [1] and the Release Notes [2] for this issue: [1] https://bugs.openjdk.java.net/browse/JDK-8238984 [2] https://bugs.openjdk.java.net/browse/JDK-8239887 Thanks in advance! Ivan On 2/11/20 3:10 PM, Joe Darcy wrote: Hello, Yes, I believe this change should have a CSR, and most likely a release note. Thanks, -Joe On 2/11/2020 12:49 PM, Roger Riggs wrote: Hi Ivan, Will this have enough of a compatibility concern to warrant a CSR? It will change the behavor of these cases. In the RegExTest, the failures should print which case is failing. (Line 4961, 4990). Regards, Roger On 2/7/20 3:05 PM, Ivan Gerasimov wrote: Gentle ping. I had to rebase the fix, as the code has diverged since the RFR was sent out 10 months ago. Also, the test was slightly modified to cover more cases. BUGURL: https://bugs.openjdk.java.net/browse/JDK-8214245 WEBREV: http://cr.openjdk.java.net/~igerasim/8214245/01/webrev/ Thanks in advance to the volunteer to review the fix! With kind regards, Ivan On 4/21/19 7:50 PM, Ivan Gerasimov wrote: Hello! It turns out, that the case-insensitive j.u.regex.Pattern still pays attention to the characters case when certain char classes are used. For example \p{IsLowerCase}, \p{IsUpperCase} and \p{IsTitleCase} continue to recognize only lower, upper and title case characters, even in case-insensitive context. For example, for POSIX char classes this behavior contradicts this paragraph: """ 9.2 Regular Expression General Requirements ... When a standard utility or function that uses regular expressions specifies that pattern matching shall be performed without regard to the case (uppercase or lowercase) of either data or patterns, then when each character in the string is matched against the pattern, not only the character, but also its case counterpart (if any), shall be matched. This definition of case-insensitive processing is intended to allow matching of multi-character collating elements as well as characters, as each character in the string is matched using both its cases. ... """ I also checked how Perl is dealing with in such situation, and yes, it ignores the case with various \p{} classes when they are used in case-insensitive context, so all these tests run fine: 'A' =~ /\p{Lower}/i or die; 'a' =~ /\p{Upper}/i or die; 'A' =~ /\p{gc=Lt}/i or die; # title case 'a' =~ /\p{IsTitlecase}/i or die; 'Lj' =~ /\p{Lower}/i or die; # title-cased digraph 'lj' =~ /\p{Upper}/i or die; 'LJ' =~ /\p{Lt}/i or die; For reference, here's a lengthy document, describing precise rules used by Perl to deal with \p{} char classes: https://perldoc.perl.org/perluniprops.html#Properties-accessible-through-%5cp%7b%7d-and-%5cP%7b%7d So, for any Lower, Upper or Title case chars in case-insensitive context Perl uses set of "Cased Letters", with is just a combination of these three categories (aka "LC" general category). Would you please help review the patch? BUGURL: https://bugs.openjdk.java.net/browse/JDK-8214245 WEBREV: http://cr.openjdk.java.net/~igerasim/8214245/00/webrev/ -- With kind regards, Ivan Gerasimov
Re: RFR 8235812 : (regex) Unicode linebreak with quantifier does not match valid input
Thank you Roger for review! I've adjusted the test as you suggested and pushed the fix. With kind regards, Ivan On 2/10/20 1:11 PM, Roger Riggs wrote: Hi Ivan, This look fine. In the test TegExTest: 5074, I would output the failed cases to System.err. That way they get properly interleaved with the test progress output. No need for another review. Thanks, Roger On 2/5/20 8:22 PM, Ivan Gerasimov wrote: Hello! j.u.regex.Pattern supports a special char class \R, which is specified to be equal to \u000D\u000A|[\u000A\u000B\u000C\u000D\u0085\u2028\u2029]. In particular, this means that the input "\r\n" must match to both patterns \R and \R\R. (In the later case, first \R matches \r and second \R matches \n.) A pattern \R{2} is expected to be equal to \R\R. However with the current implementation this does not hold (so, Pattern.matches("\\R{2}", "\r\n") == false, while Pattern.matches("\\R\\R", "\r\n") == true). The root cause of this bug is that the special char class \R is handled via dedicated class LineEnding, which is not able to correctly handle backtracking in presence of quantifiers). A simple solution is to treat \R with quantifiers as an anonymous group, which will make it comply with the specification. Without quantifiers, \R is still handled via more efficient implementation of LineEnding. Would you please help review the fix? Some minor cleanup was done along the way in the affected code. BUGURL: https://bugs.openjdk.java.net/browse/JDK-8235812 WEBREV: http://cr.openjdk.java.net/~igerasim/8235812/00/webrev/ Control build and testing (tiers1-4) are all green. -- With kind regards, Ivan Gerasimov
Re: RFR 8214245 : (regex) Case insensitive matching doesn't work correctly for some character classes
Gentle ping. I had to rebase the fix, as the code has diverged since the RFR was sent out 10 months ago. Also, the test was slightly modified to cover more cases. BUGURL: https://bugs.openjdk.java.net/browse/JDK-8214245 WEBREV: http://cr.openjdk.java.net/~igerasim/8214245/01/webrev/ Thanks in advance to the volunteer to review the fix! With kind regards, Ivan On 4/21/19 7:50 PM, Ivan Gerasimov wrote: Hello! It turns out, that the case-insensitive j.u.regex.Pattern still pays attention to the characters case when certain char classes are used. For example \p{IsLowerCase}, \p{IsUpperCase} and \p{IsTitleCase} continue to recognize only lower, upper and title case characters, even in case-insensitive context. For example, for POSIX char classes this behavior contradicts this paragraph: """ 9.2 Regular Expression General Requirements ... When a standard utility or function that uses regular expressions specifies that pattern matching shall be performed without regard to the case (uppercase or lowercase) of either data or patterns, then when each character in the string is matched against the pattern, not only the character, but also its case counterpart (if any), shall be matched. This definition of case-insensitive processing is intended to allow matching of multi-character collating elements as well as characters, as each character in the string is matched using both its cases. ... """ I also checked how Perl is dealing with in such situation, and yes, it ignores the case with various \p{} classes when they are used in case-insensitive context, so all these tests run fine: 'A' =~ /\p{Lower}/i or die; 'a' =~ /\p{Upper}/i or die; 'A' =~ /\p{gc=Lt}/i or die; # title case 'a' =~ /\p{IsTitlecase}/i or die; 'Lj' =~ /\p{Lower}/i or die; # title-cased digraph 'lj' =~ /\p{Upper}/i or die; 'LJ' =~ /\p{Lt}/i or die; For reference, here's a lengthy document, describing precise rules used by Perl to deal with \p{} char classes: https://perldoc.perl.org/perluniprops.html#Properties-accessible-through-%5cp%7b%7d-and-%5cP%7b%7d So, for any Lower, Upper or Title case chars in case-insensitive context Perl uses set of "Cased Letters", with is just a combination of these three categories (aka "LC" general category). Would you please help review the patch? BUGURL: https://bugs.openjdk.java.net/browse/JDK-8214245 WEBREV: http://cr.openjdk.java.net/~igerasim/8214245/00/webrev/ -- With kind regards, Ivan Gerasimov
RFR 8235812 : (regex) Unicode linebreak with quantifier does not match valid input
Hello! j.u.regex.Pattern supports a special char class \R, which is specified to be equal to \u000D\u000A|[\u000A\u000B\u000C\u000D\u0085\u2028\u2029]. In particular, this means that the input "\r\n" must match to both patterns \R and \R\R. (In the later case, first \R matches \r and second \R matches \n.) A pattern \R{2} is expected to be equal to \R\R. However with the current implementation this does not hold (so, Pattern.matches("\\R{2}", "\r\n") == false, while Pattern.matches("\\R\\R", "\r\n") == true). The root cause of this bug is that the special char class \R is handled via dedicated class LineEnding, which is not able to correctly handle backtracking in presence of quantifiers). A simple solution is to treat \R with quantifiers as an anonymous group, which will make it comply with the specification. Without quantifiers, \R is still handled via more efficient implementation of LineEnding. Would you please help review the fix? Some minor cleanup was done along the way in the affected code. BUGURL: https://bugs.openjdk.java.net/browse/JDK-8235812 WEBREV: http://cr.openjdk.java.net/~igerasim/8235812/00/webrev/ Control build and testing (tiers1-4) are all green. -- With kind regards, Ivan Gerasimov
RFR 8237599 : Greedy matching against supplementary chars fails to respect the region
Hello everyone! When the input of a j.u.regex.Matcher is restricted with .region() method, it can possibly cut off a half of a surrogate pair. It turns out that greedy matching implemented in the Pattern.CharPropertyGreedy class fails to recognize this edge case in two scenarios: 1) When it greedily consumes the input and meets a higher half of a surrogate pair that was cut off at the end of input, and 2) When it backs off and meets a lower half of a surrogate pair at the very beginning of input. In both cases, the engine reads the entire codepoint, crossing the boundaries of the set region. Instead, it should only read the half of the surrogate pair that lies inside the region and ignore the other half. Would you please help review the fix? BUGURL: https://bugs.openjdk.java.net/browse/JDK-8237599 WEBREV: http://cr.openjdk.java.net/~igerasim/8237599/00/webrev/ Thanks in advance! -- With kind regards, Ivan Gerasimov
Re: RFR 8236034 : Use optimized Ques node for curly {0,1} quantifier
Thank you Roger for review! Pushed. -- With kind regards, Ivan Gerasimov
Re: RFR 8234423 : Modifying ArrayList.subList().subList() resets modCount of subList
Thank you Roger for your review! Happy New Year to you too! :) With kind regards, Ivan On 1/9/20 10:36 AM, Roger Riggs wrote: Hi Ivan, Happy New Year! This change looks fine. Roger On 11/23/19 2:30 AM, Ivan Gerasimov wrote: Re-sending the request with the correct Subject line - the bug is not (only) about the iterator. On 11/20/19 10:11 PM, Ivan Gerasimov wrote: Hello! When a sublist of a sublist of an ArrayList is created, its modCount is initialized from the ArrayList root, and not from its immediate parent. This means that modifying that 2nd level sublist will reset modCounts of the entire chain up to the root, and consequently the 1st level sublist won't detect concurrent modification done to the ArrayList root. Along with the fix, handling of modCount is slightly improved in a few other places. Would you please help review the fix? BUGURL: https://bugs.openjdk.java.net/browse/JDK-8234423 WEBREV: http://cr.openjdk.java.net/~igerasim/8234423/00/webrev/ -- With kind regards, Ivan Gerasimov
Re: Use single character replace variant in Resources.toPackageName(String)
FYI. I changed the year to 2020 and pushed the fix. Thank you Christoph and Alan! On 1/6/20 11:49 PM, Alan Bateman wrote: On 07/01/2020 07:09, Ivan Gerasimov wrote: So, I filed a Jira bug: BUGURL: https://bugs.openjdk.java.net/browse/JDK-8236705 WEBREV: http://cr.openjdk.java.net/~igerasim/8236705/00/webrev/ I added these source files in JDK 9, it was just an oversight that I didn't use the single char replace. My guess is that the change ModulePath deriveModuleDescriptor won't be noticeable in any real benchmark. This method is deriving a module descriptor for a legacy JAR file on the module path so it's scanning the contents of the JAR file and all attributes in its main manifest. The Resources toPackageName method is used more widely so it may help in a few cases. Anyway, this change looks fine except the copyright year (you've changed it to 2019 rather than 2020). -Alan -- With kind regards, Ivan Gerasimov
Re: Use single character replace variant in Resources.toPackageName(String)
So, I filed a Jira bug: BUGURL: https://bugs.openjdk.java.net/browse/JDK-8236705 WEBREV: http://cr.openjdk.java.net/~igerasim/8236705/00/webrev/ If we agree on the content of the fix, I can push it after a sanity build. With kind regards, Ivan On 1/6/20 10:41 PM, Ivan Gerasimov wrote: Hi Christoph! First, the same optimization can be done in src/java.base/share/classes/jdk/internal/module/ModulePath.java: mainClass = mainClass.replace("/", "."); Second, I wonder what was the JDK version you ran your benchmark on? After the fix for JDK-8222955 the method replace(CharSequence, CharSequence) now detects one-char arguments as a special case. Thus, I think, more work should actually reduce the difference in performance between two versions of replace. Still, I think this optimization is worthwhile. I can sponsor it for you. With kind regards, Ivan On 1/6/20 9:33 AM, Christoph Dreis wrote: Hi and a happy new year, I recently noticed that jdk.internal.module.Resources.toPackageName(String) makes use of String.replace(CharSequence, CharSequence) while it could use the single char variant in my opinion: diff --git a/src/java.base/share/classes/jdk/internal/module/Resources.java b/src/java.base/share/classes/jdk/internal/module/Resources.java --- a/src/java.base/share/classes/jdk/internal/module/Resources.java +++ b/src/java.base/share/classes/jdk/internal/module/Resources.java @@ -64,7 +64,7 @@ if (index == -1 || index == name.length()-1) { return ""; } else { - return name.substring(0, index).replace("/", "."); + return name.substring(0, index).replace('/', '.'); } } I ran an isolated benchmark with some test data on it, which shows the following results Benchmark (param) Mode Cnt Score Error Units MyBenchmark.testNew java/lang avgt 10 14,905 ± 0,130 ns/op MyBenchmark.testNew:·gc.alloc.rate.norm java/lang avgt 10 48,000 ± 0,001 B/op MyBenchmark.testNew a/b/c/d/e avgt 10 23,122 ± 0,389 ns/op MyBenchmark.testNew:·gc.alloc.rate.norm a/b/c/d/e avgt 10 96,000 ± 0,001 B/op MyBenchmark.testOld java/lang avgt 10 16,614 ± 0,420 ns/op MyBenchmark.testOld:·gc.alloc.rate.norm java/lang avgt 10 48,000 ± 0,001 B/op MyBenchmark.testOld a/b/c/d/e avgt 10 84,745 ± 1,329 ns/op MyBenchmark.testOld:·gc.alloc.rate.norm a/b/c/d/e avgt 10 120,000 ± 0,001 B/op As you can see the more replacing needs to be done the better the newer version seems to be perform. In case this is considered worthwhile I would need someone to sponsor the patch. If not, I'm sorry for the noise. Cheers, Christoph Dreis -- With kind regards, Ivan Gerasimov
Re: Use single character replace variant in Resources.toPackageName(String)
Hi Christoph! First, the same optimization can be done in src/java.base/share/classes/jdk/internal/module/ModulePath.java: mainClass = mainClass.replace("/", "."); Second, I wonder what was the JDK version you ran your benchmark on? After the fix for JDK-8222955 the method replace(CharSequence, CharSequence) now detects one-char arguments as a special case. Thus, I think, more work should actually reduce the difference in performance between two versions of replace. Still, I think this optimization is worthwhile. I can sponsor it for you. With kind regards, Ivan On 1/6/20 9:33 AM, Christoph Dreis wrote: Hi and a happy new year, I recently noticed that jdk.internal.module.Resources.toPackageName(String) makes use of String.replace(CharSequence, CharSequence) while it could use the single char variant in my opinion: diff --git a/src/java.base/share/classes/jdk/internal/module/Resources.java b/src/java.base/share/classes/jdk/internal/module/Resources.java --- a/src/java.base/share/classes/jdk/internal/module/Resources.java +++ b/src/java.base/share/classes/jdk/internal/module/Resources.java @@ -64,7 +64,7 @@ if (index == -1 || index == name.length()-1) { return ""; } else { -return name.substring(0, index).replace("/", "."); +return name.substring(0, index).replace('/', '.'); } } I ran an isolated benchmark with some test data on it, which shows the following results Benchmark(param) Mode Cnt Score Error Units MyBenchmark.testNew java/lang avgt 1014,905 ± 0,130 ns/op MyBenchmark.testNew:·gc.alloc.rate.normjava/lang avgt 1048,000 ± 0,001B/op MyBenchmark.testNew a/b/c/d/e avgt 1023,122 ± 0,389 ns/op MyBenchmark.testNew:·gc.alloc.rate.norm a/b/c/d/e avgt 1096,000 ± 0,001B/op MyBenchmark.testOldjava/lang avgt 1016,614 ± 0,420 ns/op MyBenchmark.testOld:·gc.alloc.rate.norm java/lang avgt 1048,000 ± 0,001B/op MyBenchmark.testOld a/b/c/d/e avgt 1084,745 ± 1,329 ns/op MyBenchmark.testOld:·gc.alloc.rate.norma/b/c/d/e avgt 10 120,000 ± 0,001B/op As you can see the more replacing needs to be done the better the newer version seems to be perform. In case this is considered worthwhile I would need someone to sponsor the patch. If not, I'm sorry for the noise. Cheers, Christoph Dreis -- With kind regards, Ivan Gerasimov
Re: RFR 8225466 : Optimize matching BMP Slice nodes
Hi Roger. Thank you for taking a look! The variant with a single loop was the first thing I tried (should have mentioned that in the review request). Unfortunately, that showed performance decrease. I suspect that hitting the end of the input should be a less common thing, that's why it it beneficial to separate it as a slow path. With kind regards, Ivan On 12/18/19 7:48 AM, Roger Riggs wrote: Hi Ivan, Though the new code has a good effect, the asymmetry and duplication seems unnecessary. Can it be structured to have a single copy of the loop comparing the available range and still get the desired performance improvement. Like: boolean match(Matcher matcher,int i, CharSequence seq) { int[] buf =buffer; int len = buf.length; for (int j =0; j < Math.min(len, matcher.to); j++) { if (buf[j] != seq.charAt(i+j)) return false; } if (len >= matcher.to) { matcher.hitEnd =true; return false; } return next.match(matcher, i+len, seq); } Regards, Roger On 10/28/19 9:03 PM, Ivan Gerasimov wrote: Hello! When building a Pattern object, the regex parser recognizes "slices" - continuous char subsequences, which all have to be matched case-sensitively/case-insensitively. Matching with such a slice is implemented as a simple loop over a portion of the input. In the current implementation, on each iteration of the loop it is checked if we have hit the end of the input (which is an uncommon case). This check can be done only once, before the loop, which will make the loop lighter. Benchmark shows up to +4% to the throughput for the case-insensitive matching. Would you please help review the enhancement? BUGURL: https://bugs.openjdk.java.net/browse/JDK-8225466 WEBREV: http://cr.openjdk.java.net/~igerasim/8225466/00/webrev/ --- benchmark results --- UNFIXED Benchmark Mode Cnt Score Error Units PatternBench.sliceIFind avgt 16 190.612 ? 0.336 ns/op FIXED Benchmark Mode Cnt Score Error Units PatternBench.sliceIFind avgt 16 182.954 ? 0.493 ns/op --- -- With kind regards, Ivan Gerasimov
Re: RFR 8234423 : Modifying ArrayList.subList().subList() resets modCount of subList
Gentle ping. Will someone please volunteer to review this fix of a regression? Thanks in advance! On 11/20/19 10:11 PM, Ivan Gerasimov wrote: Hello! When a sublist of a sublist of an ArrayList is created, its modCount is initialized from the ArrayList root, and not from its immediate parent. This means that modifying that 2nd level sublist will reset modCounts of the entire chain up to the root, and consequently the 1st level sublist won't detect concurrent modification done to the ArrayList root. Along with the fix, handling of modCount is slightly improved in a few other places. Would you please help review the fix? BUGURL: https://bugs.openjdk.java.net/browse/JDK-8234423 WEBREV: http://cr.openjdk.java.net/~igerasim/8234423/00/webrev/ -- With kind regards, Ivan Gerasimov
RFR 8236034 : Use optimized Ques node for curly {0,1} quantifier
Hello! In the regular expressions, the quantifier ? can be equally written as {0,1}. While for the former variant an optimized algorithm is used (coded in the Pattern.Ques class), the later variant uses more general and less efficient implementation (coded in the Pattern.Curly class). It would be beneficial, if we unify the implementation for these two variants of quantifier and make them both use Ques node. Also, I couldn't resist to do some refactoring in the affected potion of the code to eliminate the repetition of logics. The regression test was modified to exercise both forms of quantifiers equally well. Would you please help review this enhancement? BUGURL: https://bugs.openjdk.java.net/browse/JDK-8236034 WEBREV: http://cr.openjdk.java.net/~igerasim/8236034/00/webrev/ Thanks in advance! -- With kind regards, Ivan Gerasimov
Re: RFR 8235930 : java.util.regex.PrintPattern does not print a link to the next node
Thank you Martin! Pushed. On 12/14/19 10:45 PM, Martin Buchholz wrote: Looks Good To Me. -- With kind regards, Ivan Gerasimov
Re: RFR 8235930 : java.util.regex.PrintPattern does not print a link to the next node
Thank you Martin for looking at this! On 12/14/19 6:58 AM, Martin Buchholz wrote: Hi Ivan, Did you mean to change from System.err to System.out? Yes. The first string that this utility prints in main() goes to System.out. Not sure why the rest would need to go to another stream. I might have fixed the overloading by giving one of the overloaded methods a better name. How about private static void printIndented(String s, int depth) Now, after removing the overload `print(String fmt, Object ... args)` (which was just an alias of printf(String fmt, Object ... args)), we have two print() methods, and both of them indent the text. I am not planning to invest much into this utility, just wanted to fix that annoying glitch. With kind regards, Ivan On Fri, Dec 13, 2019 at 12:11 PM Ivan Gerasimov mailto:ivan.gerasi...@oracle.com>> wrote: Hello! The java.util.regex package contains a debugging utility PrintPattern which is used to display the internal representation structure of a compiled regex pattern. When it prints a node that is linked to another node that has already been printed, it is supposed to print a link, but fails. The error is due to ambigous overloading of print() method. Would you please help review a trivial fix? No regression test because the utility is not something officially supported. BUGURL: https://bugs.openjdk.java.net/browse/JDK-8235930 WEBREV: http://cr.openjdk.java.net/~igerasim/8235930/00/webrev/ -- With kind regards, Ivan Gerasimov -- With kind regards, Ivan Gerasimov
RFR 8235930 : java.util.regex.PrintPattern does not print a link to the next node
Hello! The java.util.regex package contains a debugging utility PrintPattern which is used to display the internal representation structure of a compiled regex pattern. When it prints a node that is linked to another node that has already been printed, it is supposed to print a link, but fails. The error is due to ambigous overloading of print() method. Would you please help review a trivial fix? No regression test because the utility is not something officially supported. BUGURL: https://bugs.openjdk.java.net/browse/JDK-8235930 WEBREV: http://cr.openjdk.java.net/~igerasim/8235930/00/webrev/ -- With kind regards, Ivan Gerasimov
Re: RFR 8234147 : Avoid looking up standard charsets in core libraries
Hi Ulf! Thank you for the links! The enhancement requests that you pointed out are about making the lookup faster. My current proposal, however, is to completely avoid the lookup of the standard charsets in certain specific cases. With kind regards, Ivan On 11/30/19 1:32 PM, Ulf Zibis wrote: Hi Ivan, you may compare your proposal with: https://bugs.java.com/bugdatabase/view_bug.do?bug_id=6790402 https://bugs.java.com/bugdatabase/view_bug.do?bug_id=6850361 https://bugs.java.com/bugdatabase/view_bug.do?bug_id=6796087 -Ulf Am 27.11.19 um 05:39 schrieb Ivan Gerasimov: Hello! It is a cleanup fix with mostly two kinds of changes: - when a standard charset is specified by its name, use a preinitialized Charset constant instead, - replace the usage of StandardCharset.* constants with their sun.nio.cs.* equivalents to avoid accidental early initialization of rarely-used charsets. BUGURL: https://bugs.openjdk.java.net/browse/JDK-8234147 WEBREV: http://cr.openjdk.java.net/~igerasim/8234147/00/webrev/ I also had to modify one regression test that relied on a private auxiliary method, which was removed with the fix. Mach5 control build looks green. Would you please help review the fix? -- With kind regards, Ivan Gerasimov
Re: RFR 8234147 : Avoid looking up standard charsets in core libraries
Thank you Alan for the further review! On 11/27/19 12:53 PM, Alan Bateman wrote: On 27/11/2019 11:52, Ivan Gerasimov wrote: : It's not clear how to distinguish the classes inside the java.base module that do not have to depend on sun.nio.cs. If you feel strong about NTLM and XML, I can replace sun.nio.cs.* instances with corresponding java.charset.StandardCharsets.* constants in these classes. There isn't any way to say what is core and non-core in java.base so you have to use your judgement. My personal view is that the NTLM, XML, SOCKS and several others in this patch should stick to the standard APIs if possible as their performance will likely be dominated by other factors. Personally, I think that using constants sun.nio.cs.xxx.INSTANCE is not too bad even in the code that is unlikely to be executed during the VM startup. One small advantage is that if the code is copy/pasted within the java.base module, it will not bring the risk of early initialization of StandardCharsets. With NTLM, however, switching to StandardCharsets allows to remove sun.nio.cs.UTF_16LE.INSTANCE and all other corresponding modifications. So, I used StandardCharsets in NTLM (and in XML and SOCKS, as you suggested), and left sun.nio.cs constants in all other places. Here's the updated webrev: http://cr.openjdk.java.net/~igerasim/8234147/02/webrev/ It builds fine, tests run fine. -- With kind regards, Ivan Gerasimov
Re: RFR 8234147 : Avoid looking up standard charsets in core libraries
Thank you Alan for reviewing! Please see the comments inline. On 11/26/19 11:54 PM, Alan Bateman wrote: On 27/11/2019 04:39, Ivan Gerasimov wrote: Hello! It is a cleanup fix with mostly two kinds of changes: - when a standard charset is specified by its name, use a preinitialized Charset constant instead, - replace the usage of StandardCharset.* constants with their sun.nio.cs.* equivalents to avoid accidental early initialization of rarely-used charsets. BUGURL: https://bugs.openjdk.java.net/browse/JDK-8234147 WEBREV: http://cr.openjdk.java.net/~igerasim/8234147/00/webrev/ I also had to modify one regression test that relied on a private auxiliary method, which was removed with the fix. Mach5 control build looks green. Would you please help review the fix? The motivation is okay and most of the changes are okay but it does create a needless dependency on sun.nio.* classes in places that are unlikely (or impossible) to use during early startup. I think the change to the JDBC CachedRow implementation, prefs, and the jdk.net.httpserver module should be dropped as want fewer classes outside of java.base depending on java.base internals, not more. I also wonder about the NTLM, XML properties and a few more that shouldn't be dependency on sun.nio.* classes if possible. Please note that in fact sun.nio.cs.* was not used in java.net.http, java.prefs and java.sql.rowset. I only used those sun.nio constants inside the java.base module, exactly for the reason of avoiding additional dependencies. It's not clear how to distinguish the classes inside the java.base module that do not have to depend on sun.nio.cs. If you feel strong about NTLM and XML, I can replace sun.nio.cs.* instances with corresponding java.charset.StandardCharsets.* constants in these classes. Minot nit but if this is pushed then it would be good to keep the imports consistent with the existing style where you can, e.g. I see several files where the import sun.nio.* is put at the top of source files that already group imports of JDK internal classes together further down. Thanks, done. Unfortunately, imports conventions are not very consistent across the JDK code, so I mostly tried to preserve the pre-existing order of imports in each file. Mandy might want to comment on TrySetAccessibleTest. If I'm not mistaken it should say "non-exported" rather than "exported". Also I think we want this test to have a dependency on a few JDK internal and private methods as possible as it's a maintenance issue to keep it up to date (as you've experienced here with the removal of Perf.getBytes). If I got it correct, the test was meant to access a private static method, and after removing Perf.getBytes there were no such methods left in the Perf class. That's why I had to pick some other method to test, so I chose ModulePath.packageName instead. Here's the updated webrev with reorganized imports: http://cr.openjdk.java.net/~igerasim/8234147/01/webrev/ Thanks again! -- With kind regards, Ivan Gerasimov
RFR 8234147 : Avoid looking up standard charsets in core libraries
Hello! It is a cleanup fix with mostly two kinds of changes: - when a standard charset is specified by its name, use a preinitialized Charset constant instead, - replace the usage of StandardCharset.* constants with their sun.nio.cs.* equivalents to avoid accidental early initialization of rarely-used charsets. BUGURL: https://bugs.openjdk.java.net/browse/JDK-8234147 WEBREV: http://cr.openjdk.java.net/~igerasim/8234147/00/webrev/ I also had to modify one regression test that relied on a private auxiliary method, which was removed with the fix. Mach5 control build looks green. Would you please help review the fix? -- With kind regards, Ivan Gerasimov
Re: RFR 8234423 : Modifying ArrayList.subList().subList() resets modCount of subList
Re-sending the request with the correct Subject line - the bug is not (only) about the iterator. On 11/20/19 10:11 PM, Ivan Gerasimov wrote: Hello! When a sublist of a sublist of an ArrayList is created, its modCount is initialized from the ArrayList root, and not from its immediate parent. This means that modifying that 2nd level sublist will reset modCounts of the entire chain up to the root, and consequently the 1st level sublist won't detect concurrent modification done to the ArrayList root. Along with the fix, handling of modCount is slightly improved in a few other places. Would you please help review the fix? BUGURL: https://bugs.openjdk.java.net/browse/JDK-8234423 WEBREV: http://cr.openjdk.java.net/~igerasim/8234423/00/webrev/ -- With kind regards, Ivan Gerasimov
RFR 8234423 : More accurate handling of modification counter in ArrayList.SubList.Iterator
Hello! When a sublist of a sublist of an ArrayList is created, its modCount is initialized from the ArrayList root, and not from its immediate parent. This means that modifying that 2nd level sublist will reset modCounts of the entire chain up to the root, and consequently the 1st level sublist won't detect concurrent modification done to the ArrayList root. Along with the fix, handling of modCount is slightly improved in a few other places. Would you please help review the fix? BUGURL: https://bugs.openjdk.java.net/browse/JDK-8234423 WEBREV: http://cr.openjdk.java.net/~igerasim/8234423/00/webrev/ -- With kind regards, Ivan Gerasimov
Re: RFR (XXS) 8233658 : Escape + in the expression describing Runtime.Version string
Thank you Naoto! Pushed. On 11/6/19 8:24 AM, naoto.s...@oracle.com wrote: Hi Ivan, Looks good to me. Naoto On 11/5/19 9:39 PM, Ivan Gerasimov wrote: Hello! Here's yet another javadoc-only fix. Format of the Runtime.Version string is described as a list of regular expressions [1]: $VNUM(-$PRE)?\+$BUILD(-$OPT)? $VNUM-$PRE(-$OPT)? $VNUM(+-$OPT)? The character + in the last line should be escaped to be consistent with the first line and to avoid confusion with RE quantifiers. [1] https://docs.oracle.com/en/java/javase/13/docs/api/java.base/java/lang/Runtime.Version.html Would you please help review this trivial one-char fix? BUGURL: https://bugs.openjdk.java.net/browse/JDK-8233658 WEBREV: http://cr.openjdk.java.net/~igerasim/8233658/00/webrev/ -- With kind regards, Ivan Gerasimov
RFR (XXS) 8233658 : Escape + in the expression describing Runtime.Version string
Hello! Here's yet another javadoc-only fix. Format of the Runtime.Version string is described as a list of regular expressions [1]: $VNUM(-$PRE)?\+$BUILD(-$OPT)? $VNUM-$PRE(-$OPT)? $VNUM(+-$OPT)? The character + in the last line should be escaped to be consistent with the first line and to avoid confusion with RE quantifiers. [1] https://docs.oracle.com/en/java/javase/13/docs/api/java.base/java/lang/Runtime.Version.html Would you please help review this trivial one-char fix? BUGURL: https://bugs.openjdk.java.net/browse/JDK-8233658 WEBREV: http://cr.openjdk.java.net/~igerasim/8233658/00/webrev/ -- With kind regards, Ivan Gerasimov
Re: RFR (S) 8233650 : Javadoc for Math.floorMod(int, int) gives wrong example
Thank you Brian! Pushed. On 11/5/19 4:05 PM, Brian Burkhalter wrote: Hi Ivan, I think this looks good. Brian On Nov 5, 2019, at 3:35 PM, Ivan Gerasimov wrote: The javadoc for the method Math.floorMod(int, int) [1] contains an inaccuracy. There are two sets of examples given: One for the case when signs of the arguments are the same and the other for the case when signs of the arguments are different. The second set of examples wrongly contains an example from the first set. Would you please help review this javadoc-only fix? BUGURL: https://bugs.openjdk.java.net/browse/JDK-8233650 WEBREV: http://cr.openjdk.java.net/~igerasim/8233650/00/webrev/ [1] https://docs.oracle.com/en/java/javase/13/docs/api/java.base/java/lang/Math.html#floorMod(int,int) -- With kind regards, Ivan Gerasimov
RFR (S) 8233650 : Javadoc for Math.floorMod(int, int) gives wrong example
Hello! The javadoc for the method Math.floorMod(int, int) [1] contains an inaccuracy. There are two sets of examples given: One for the case when signs of the arguments are the same and the other for the case when signs of the arguments are different. The second set of examples wrongly contains an example from the first set. Would you please help review this javadoc-only fix? BUGURL: https://bugs.openjdk.java.net/browse/JDK-8233650 WEBREV: http://cr.openjdk.java.net/~igerasim/8233650/00/webrev/ [1] https://docs.oracle.com/en/java/javase/13/docs/api/java.base/java/lang/Math.html#floorMod(int,int) -- With kind regards, Ivan Gerasimov
RFR 8225466 : Optimize matching BMP Slice nodes
Hello! When building a Pattern object, the regex parser recognizes "slices" - continuous char subsequences, which all have to be matched case-sensitively/case-insensitively. Matching with such a slice is implemented as a simple loop over a portion of the input. In the current implementation, on each iteration of the loop it is checked if we have hit the end of the input (which is an uncommon case). This check can be done only once, before the loop, which will make the loop lighter. Benchmark shows up to +4% to the throughput for the case-insensitive matching. Would you please help review the enhancement? BUGURL: https://bugs.openjdk.java.net/browse/JDK-8225466 WEBREV: http://cr.openjdk.java.net/~igerasim/8225466/00/webrev/ --- benchmark results --- UNFIXED Benchmark Mode Cnt Score Error Units PatternBench.sliceIFind avgt 16 190.612 ? 0.336 ns/op FIXED Benchmark Mode Cnt Score Error Units PatternBench.sliceIFind avgt 16 182.954 ? 0.493 ns/op --- -- With kind regards, Ivan Gerasimov
Re: RFR (S) 8230407 : SocketPermission and FilePermission action list allows leading comma
Hi Chris! On 10/3/19 8:05 AM, Chris Hegarty wrote: Ivan, On 3 Oct 2019, at 04:41, Ivan Gerasimov wrote: ... So, I filed CSR: https://bugs.openjdk.java.net/browse/JDK-8231805 to cover the addition of @throws paragraph in the javadoc of SocketPermission. I would really appreciate it, if someone helped to review it. Since we’re here ... ;-) It would be good to specify the NPE behavior of the constructor. Here are the changes for SocketPermission. If you agree, fold them into your patch and CSR. ( I’ve included test changes to verify the new tighter spec ) https://cr.openjdk.java.net/~chegar/8230407.extra/ Yes, it's a good point, thanks! I've adopted your suggested changes and the test: http://cr.openjdk.java.net/~igerasim/8230407/02/webrev/ CSR was also updated accordingly: https://bugs.openjdk.java.net/browse/JDK-8231805 With kind regards, Ivan -Chris. -- With kind regards, Ivan Gerasimov
Re: RFR (S) 8230407 : SocketPermission and FilePermission action list allows leading comma
Thanks Peter, your variant looks cute! Let's keep this code in mind as a candidate for refactoring once the switch expression will make its way to the standard. I would hesitate to allow using experimental features in building the JDK just for that code :) With kind regards, Ivan On 10/2/19 11:40 PM, Peter Levart wrote: Hi Ivan, On 10/1/19 10:26 PM, Ivan Gerasimov wrote: Hello! The constructors of SocketPermission and FilePermission expect a String argument with comma-separated list of actions. If the list is malformed, then the constructors throw IllegalArgumentException. It turns out that the current implementation fails to throw IAE if the list starts with a leading comma. Would you please help review a simple fix, which will make the behavior more consistent? BUGURL: https://bugs.openjdk.java.net/browse/JDK-8230407 WEBREV: http://cr.openjdk.java.net/~igerasim/8230407/00/webrev/ With new switch expressions the logic could be a little clearer. For example: for (boolean seencomma = false; i >= matchlen && !seencomma; i--) { seencomma = switch (a[i - matchlen]) { case ' ', '\r', '\n', '\f', '\t': yield false; case ',': if (i > matchlen) yield true; default: throw new IllegalArgumentException( "invalid permission: " + actions); }; } This is still experimental, but I think it will be proposed to be standard soon. If you want to backport it later, then maybe you don't want to do that now. Regards, Peter -- With kind regards, Ivan Gerasimov
Re: RFR (S) 8230407 : SocketPermission and FilePermission action list allows leading comma
Thank you Joe for checking it! On 10/2/19 4:38 PM, Joe Darcy wrote: Hello, At least from a quick reading, either the spec change or the behavior change would seem to merit a CSR. Sigh. I was hopping it'll be a quick fix :-) So, I filed CSR: https://bugs.openjdk.java.net/browse/JDK-8231805 to cover the addition of @throws paragraph in the javadoc of SocketPermission. I would really appreciate it, if someone helped to review it. W.r.t the behavior change, I don't think the fix has to be counted as such. Current implementation already would throw IllegalArgumentException if the action list were malformed (for example if it were " , accept", or "connect,,accept", or "connect,", etc.) The only case when it would *not* throw IAE is when the argument *immediately* starts with a comma, and that's what the fix is about. It's not like if we used to allow commas in arbitrary places and stopped doing that. Instead, it just turned out that the code fails to catch one specific pattern of malformed action list. With kind regards, Ivan Cheers, -Joe On 10/2/2019 4:26 PM, Ivan Gerasimov wrote: Hi Chris! Thank you very much for review! I agree that it makes sense to update the javadoc for consistency. I don't think CSR is required in this case, is it? (IAE is unchecked anyway, and the fix doesn't really change the behavior.) Here's the updated webrev: http://cr.openjdk.java.net/~igerasim/8230407/01/webrev/ With kind regards, Ivan On 10/2/19 6:44 AM, Chris Hegarty wrote: Ivan, On 01/10/2019 21:26, Ivan Gerasimov wrote: Hello! The constructors of SocketPermission and FilePermission expect a String argument with comma-separated list of actions. If the list is malformed, then the constructors throw IllegalArgumentException. It turns out that the current implementation fails to throw IAE if the list starts with a leading comma. Would you please help review a simple fix, which will make the behavior more consistent? BUGURL: https://bugs.openjdk.java.net/browse/JDK-8230407 WEBREV: http://cr.openjdk.java.net/~igerasim/8230407/00/webrev/ The implementation changes look ok. The SocketPermission constructor should be updated to specify IAE too, right? -Chris. -- With kind regards, Ivan Gerasimov
Re: RFR (S) 8230407 : SocketPermission and FilePermission action list allows leading comma
Hi Chris! Thank you very much for review! I agree that it makes sense to update the javadoc for consistency. I don't think CSR is required in this case, is it? (IAE is unchecked anyway, and the fix doesn't really change the behavior.) Here's the updated webrev: http://cr.openjdk.java.net/~igerasim/8230407/01/webrev/ With kind regards, Ivan On 10/2/19 6:44 AM, Chris Hegarty wrote: Ivan, On 01/10/2019 21:26, Ivan Gerasimov wrote: Hello! The constructors of SocketPermission and FilePermission expect a String argument with comma-separated list of actions. If the list is malformed, then the constructors throw IllegalArgumentException. It turns out that the current implementation fails to throw IAE if the list starts with a leading comma. Would you please help review a simple fix, which will make the behavior more consistent? BUGURL: https://bugs.openjdk.java.net/browse/JDK-8230407 WEBREV: http://cr.openjdk.java.net/~igerasim/8230407/00/webrev/ The implementation changes look ok. The SocketPermission constructor should be updated to specify IAE too, right? -Chris. -- With kind regards, Ivan Gerasimov
Re: 8229022: BufferedReader performance can be improved by using StringBuilder
Hi Brian! This looks good to me! With kind regards, Ivan On 10/1/19 5:13 PM, Brian Burkhalter wrote: While the performance improvement that I measured for this proposed change [1] using JMH was only in the 2% to 8% range as opposed to the 13% claimed in the issue description, given the simplicity of the change [2] it is probably worth doing. All use of the StringBuilder is within a synchronized block within a single package-scope method, so there should be no problem with access by multiple threads. Thanks, Brian [1] https://bugs.openjdk.java.net/browse/JDK-8229022 [2] diff --- a/src/java.base/share/classes/java/io/BufferedReader.java +++ b/src/java.base/share/classes/java/io/BufferedReader.java @@ -314,7 +314,7 @@ * @throws IOException If an I/O error occurs */ String readLine(boolean ignoreLF, boolean[] term) throws IOException { -StringBuffer s = null; +StringBuilder s = null; int startChar; synchronized (lock) { @@ -372,7 +372,7 @@ } if (s == null) -s = new StringBuffer(defaultExpectedLineLength); +s = new StringBuilder(defaultExpectedLineLength); s.append(cb, startChar, i - startChar); } }
RFR (S) 8230407 : SocketPermission and FilePermission action list allows leading comma
Hello! The constructors of SocketPermission and FilePermission expect a String argument with comma-separated list of actions. If the list is malformed, then the constructors throw IllegalArgumentException. It turns out that the current implementation fails to throw IAE if the list starts with a leading comma. Would you please help review a simple fix, which will make the behavior more consistent? BUGURL: https://bugs.openjdk.java.net/browse/JDK-8230407 WEBREV: http://cr.openjdk.java.net/~igerasim/8230407/00/webrev/ -- With kind regards, Ivan Gerasimov
Re: RFR (XS) 8230415 : Avoid redundant permission checking in FilePermissionCollection and SocketPermissionCollection
Thank you Sean for reviewing! With kind regards, Ivan On 9/27/19 7:20 AM, Sean Mullan wrote: Hi Ivan, The fix looks good. Good catch. --Sean On 8/30/19 7:32 PM, Ivan Gerasimov wrote: Hello! In the two implementations of PermissionCollection.implies(Permission), all the permissions are traversed, and their corresponding bit mask are checked. For example, here's a snippet from FilePermission.java: int desired = fperm.getMask(); int effective = 0; int needed = desired; for (Permission perm : perms.values()) { FilePermission fp = (FilePermission)perm; if (((needed & fp.getMask()) != 0) && fp.impliesIgnoreMask(fperm)) { effective |= fp.getMask(); if ((effective & desired) == desired) { return true; } needed = (desired ^ effective);// <<< should be (desired & ~effective) } } Here, if a permission's mask `fp.getMask()` intersects with `needed`, but does not fully cover all the needed bits, the variable `needed` is updated as XOR of desired and effective. This can raise a not-really-needed bits in the `needed` mask, so that for all subsequent permissions from the collection with that unneeded bits in the mask, an expensive fp.impliesIgnoreMask(fperm) will be executed. The fix does not change the behavior, but helps avoid unnecessary calls to impliesIgnoreMask(). Would you please help review a trivial fix? BUGURL: https://bugs.openjdk.java.net/browse/JDK-8230415 WEBREV: http://cr.openjdk.java.net/~igerasim/8230415/00/webrev/ Thanks in advance! -- With kind regards, Ivan Gerasimov
Re: RFR 8230829 : Matcher matches a surrogate pair that crosses border of the region
Thank you Naoto for reviewing! With kind regards, Ivan On 9/12/19 8:02 AM, naoto.s...@oracle.com wrote: Looks good to me, Ivan. Naoto On 9/10/19 9:13 PM, Ivan Gerasimov wrote: Hello! When the regex.Matcher matches a supplementary codepoint, it may inadvertently cross the border of the explicitly specified region without noticing. Would you please help review a simple fix? BUGURL: https://bugs.openjdk.java.net/browse/JDK-8230829 WEBREV: http://cr.openjdk.java.net/~igerasim/8230829/00/webrev/ Thanks in advance! -- With kind regards, Ivan Gerasimov
RFR 8230829 : Matcher matches a surrogate pair that crosses border of the region
Hello! When the regex.Matcher matches a supplementary codepoint, it may inadvertently cross the border of the explicitly specified region without noticing. Would you please help review a simple fix? BUGURL: https://bugs.openjdk.java.net/browse/JDK-8230829 WEBREV: http://cr.openjdk.java.net/~igerasim/8230829/00/webrev/ Thanks in advance! -- With kind regards, Ivan Gerasimov
Re: RFR 8230365 : Pattern for a control-char matches non-control characters
Thank you Stuart for the analysis! Please see my comments inline. On 9/9/19 4:39 PM, Stuart Marks wrote: On 9/5/19 1:43 PM, Ivan Gerasimov wrote: Personally, I don't have a strong preference here. The compatibility property are meant to be temporary anyways. Maybe if we have a single option that will control several different aspects of behavior, it will be harder to get rid of it? Partially, because it will be tempting to reuse it for other similar changes, should they be needed. OK, let's take an inventory of what behavior changes are being contemplated for regexes: JDK-8230675 restrict IDs for control chars JDK-xxx allow case-insensitive IDs for control chars *NOTE* JDK-8225021 Treat ambiguous embedded flags as parse syntax errors JDK-8214245 Case insensitive matching doesn't work correctly for some character classes I quickly searched JBS and found several more bugs/enhancements requests that, if implemented, may result in the behavior changes. Here's (presumably incomplete) list: JDK-8218146 $ matches before end of line, even without MULTILINE mode JDK-8217977 Matcher matching trailing high surrogate reports false for requireEnd() JDK-8217501 Matcher.hitEnd returns false for incomplete surrogate pairs JDK-8217496 Matcher.group() can return null after usePattern JDK-8216332 Grapheme regex does not work with emoji sequences JDK-8199594 Regex Pattern class improperly ignores spaces in character classes JDK-8187083 Regex: Capturing groups inside a lookahead aren't backtracked JDK-8187082 Regex: Nested capturing groups under lazy repetition aren't backtracked JDK-8183391 Regex: End of line found more than once for non-multiline regex JDK-8179668 Valid regex patterns match the latter half of complete surrogate pairs JDK-8029966 Broken supplementary character support in regex JDK-6919621 Matcher find returns wrong result in java 1.6 for certain patterns All of them are of low priorities, so I don't anticipate active work on these bugs in the near future. Though at least some of them, if fixed, would make the Java regexp engine better, so it probably wouldn't make sense to just abandon these request because of the compatibility reasons. *NOTE* this was part of the original JDK-8230675 proposal, but you removed it after discussion. I don't know if we decided never to do this, or whether we're merely considering it separately. It seemed to me that there was a possibility that we'd do this in the future. I was thinking of filling an enhancement request with the fix version set to TBD, so we can return to this proposal in some future release, if desirable. Is this all the behavior changes being contemplated, or is this simply the set that we happened to have stumbled across recently? Put another way, if we decided to do some further analysis of regexes, would we run across other issues where we might say, "Yeah, we ought to fix that, but that would be a potentially incompatible behavior change, so we need to add another property." ? In practice, such properties are only removed after a very long time, or perhaps even "never." It's not like this change would be added in this release (JDK 14), with backward compatibility support removed in a year (say, JDK 16) along with the property. The property, and the backward compatibility mode, would stick around in the code for many years. What I want to avoid doing is to introduce behavior changes -- and properties to control them -- in a piecemeal fashion. It looks like we might have three or four candidates already. Would we want to live with three or four properties? If we did this and continued with additional changes, we might end up with six or eight or ten properties over time. I'd like to see us look ahead a bit and take stock of what changes we're contemplating, and then decided how to deal with compatibility and migration based on that. I'd like to avoid making individual changes (and adding properties) one at a time, with decisions made in isolation, because that will lead to a proliferation of properties. So, there are two alternatives at the table at this time: 1) A single compatibility property to revert to the old behavior; The property is reused for each of listed above bugs, so with each fix a portion of revert logic is added to the property. PROS: Easy to implement and maintain. CONS: Over time, can become hard to track, what exactly the property controls, so may be hard to use. For example, if a user turns on this property to revert a single aspect of behavior, one will get all other behavior oddities. 2) Individual properties for every change of behavior. PROS: If needed, the behavior can be fine-grained. Easier to understand what the expected behavior would be with every set of properties set. CONS: Complex to maintain. For the majority of cases would be just an overkill. Also, can greatly increase number of testi
Re: RFR 8230365 : Pattern for a control-char matches non-control characters
Hi Stuart! Thanks for the comments, and please see my answers inline: On 9/5/19 1:18 PM, Stuart Marks wrote: Conceptually I think having the restriction is fine. If we were designing this as a new feature, I'd recommend putting in the restrictions from the very beginning. However, since the old behavior has been out there for 20 years, my main concern is compatibility. Having system properties to control this is ... ok ... I guess. I wish there were a better way, but it seems the simplest. However, I'd strongly advise making the behavior change and the compatibility mode(s) as simple as possible. Having more configuration options complicates the code and complicates the testing, and I really don't see the need for it. So yes, I agree with abandoning ALLOW_LOWERCASE_CONTROL_CHAR_IDS. Okay. I've just sent out the updated webrev without this option: http://cr.openjdk.java.net/~igerasim/8230365/03/webrev/ And filed the CSR for the spec change and the new property: https://bugs.openjdk.java.net/browse/JDK-8230675 I guess now we get to have a bikeshed on the property name. Does this apply only to mapping of control characters, or is there some other form of pattern syntax that could be made more strict? Ivan, I seem to recall you talking to me about making some changes in the syntax or interpretation of the construct that enables/disables various flags. This one: |(?idmsuxU-idmsuxU)| Would it make sense to have a single property, e.g. jdk.util.regex.strictSyntax, that governs this one as well? And are there other possibilities for tuning up the parsing behavior that we should be taking? Yes, that was this bug: https://bugs.openjdk.java.net/browse/JDK-8225021 (Treat ambiguous embedded flags as parse syntax errors) And actually I have yet another bug of similar flavor: https://bugs.openjdk.java.net/browse/JDK-8214245 (Case insensitive matching doesn't work correctly for some character classes) However, this last one is about changing the matching rules, and not about the syntax (And it is still waiting for review!). I'd rather not have a situation where we fix up one area of syntax and add a property for it, fix up another area and add a different property, leading to property proliferation. Personally, I don't have a strong preference here. The compatibility property are meant to be temporary anyways. Maybe if we have a single option that will control several different aspects of behavior, it will be harder to get rid of it? Partially, because it will be tempting to reuse it for other similar changes, should they be needed. With kind regards, Ivan s'marks On 9/4/19 9:00 PM, Martin Buchholz wrote: Thanks, Ivan. We're mostly in agreement. + * If {@code true} then lower-case control-character ids are mapped to the + * their upper-case counterparts. Extra "the". After all these decades I only now realize that c ^= 0x40 moves '?' to the end of the ASCII range and all the other controls to the start! Should we support lower-case controls? Compatibility with perl regex still matters, but a lot less than in 2003. But the key is that we got the WRONG ANSWER previously, so when we restrict the control ids let's just make lower case controls syntax errors. Silently changing behavior is bad for users. ... so let's abandon ALLOW_LOWERCASE_CONTROL_CHAR_IDS. An alternative: int ch = read() ^ 0x40; if (!RESTRICTED_CONTROL_CHAR_IDS || ch < 0x20 || ch == 0x7f) return ch; On Wed, Sep 4, 2019 at 6:49 PM Ivan Gerasimov mailto:ivan.gerasi...@oracle.com>> wrote: Thank you Martin! On 8/30/19 6:19 PM, Martin Buchholz wrote: > There's a strong expectation that ctrl-A and ctrl-a both map to > \u0001, so I support Ivan's initiative. > I'm surprised java regex gets this wrong. > Might need a transitional system property. > Right. I think it would be best to introduce two system properties: The first, to turn on/off the restrictions on the control-char names. This will be enabled by default, and will permit names from the limited list: capital letters and a few other special characters. The second one, to enable mapping of lower-case control-char names to their upper-case counterpart. This option should be turned off by default for the current release of JDK, and then turned on by default for some subsequent release (when, presumably, most applications that use this kind of regexp are fixed). This all feels like a little bit too much for such a rarely used feature, but probably is a proper thing to do anyway :-) If we have an agreement on these system properties, I can create a separate test to verify all possible combinations. > What's the best bit-twiddle? Untested: > if ((c ^= 0x40) < 0x20) return c; > if ((c ^=0x20) <= 26 && c > 0) return c; >
Re: RFR 8230365 : Pattern for a control-char matches non-control characters
Thank you Martin again! Here's the updated webrev without the lower-case control char ids: http://cr.openjdk.java.net/~igerasim/8230365/03/webrev/ I've also filed a CSR to record the changes in bahavior: https://bugs.openjdk.java.net/browse/JDK-8230675 Could you please help review it? On 9/4/19 9:00 PM, Martin Buchholz wrote: Thanks, Ivan. We're mostly in agreement. + * If {@code true} then lower-case control-character ids are mapped to the + * their upper-case counterparts. Extra "the". After all these decades I only now realize that c ^= 0x40 moves '?' to the end of the ASCII range and all the other controls to the start! Should we support lower-case controls? Compatibility with perl regex still matters, but a lot less than in 2003. But the key is that we got the WRONG ANSWER previously, so when we restrict the control ids let's just make lower case controls syntax errors. Silently changing behavior is bad for users. ... so let's abandon ALLOW_LOWERCASE_CONTROL_CHAR_IDS. An alternative: int ch = read() ^ 0x40; if (!RESTRICTED_CONTROL_CHAR_IDS || ch < 0x20 || ch == 0x7f) return ch; This code will probably be most efficient for the common case. However, I'd prefer to use the auxiliary method isCntrlId() in this case, as it is self-documenting and still efficient enough. With kind regards, Ivan
Re: RFR 8230365 : Pattern for a control-char matches non-control characters
Hello Bernd! Thank you for your comments! I'm going to proceed with only the restriction part of the change for now, so no blind conversion of lower-case control chars will happen. A system property will allow the users to return to the previous less restrictive behavior, should they decide to keep malformed patterns unchanged. I'll post the updated webrev and CSR request shortly. With kind regards, Ivan On 9/4/19 10:54 PM, Bernd Eckenfels wrote: Hallo, Since not all combinations make sense (Exception+convert) a multi value might be better: jdk.regex.control=WARN|EXCEPTION|STANDARD|LEGACY With Exception generating an error, Standard beeing the planned new default (treating upper/lower same and error on all undefined chars) and legacy beeing the manual fallback to current behavior and WARN the same fallback but with logging. I guess some form of early feedback like EXCPETION or WARN is needed, even when it is between a rock and a hard place. Maybe have at least one iteration where it defaults to LEGACY (+Release Notes announcement), then WARN and then finally STANDARD? Gruss Bernd -- http://bernd.eckenfels.net Von: core-libs-dev im Auftrag von Ivan Gerasimov Gesendet: Donnerstag, September 5, 2019 4:00 AM An: Martin Buchholz; Stuart Marks Cc: core-libs-dev Betreff: Re: RFR 8230365 : Pattern for a control-char matches non-control characters Thank you Martin! On 8/30/19 6:19 PM, Martin Buchholz wrote: There's a strong expectation that ctrl-A and ctrl-a both map to \u0001, so I support Ivan's initiative. I'm surprised java regex gets this wrong. Might need a transitional system property. Right. I think it would be best to introduce two system properties: The first, to turn on/off the restrictions on the control-char names. This will be enabled by default, and will permit names from the limited list: capital letters and a few other special characters. The second one, to enable mapping of lower-case control-char names to their upper-case counterpart. This option should be turned off by default for the current release of JDK, and then turned on by default for some subsequent release (when, presumably, most applications that use this kind of regexp are fixed). This all feels like a little bit too much for such a rarely used feature, but probably is a proper thing to do anyway :-) If we have an agreement on these system properties, I can create a separate test to verify all possible combinations. What's the best bit-twiddle? Untested: if ((c ^= 0x40) < 0x20) return c; if ((c ^=0x20) <= 26 && c > 0) return c; 0x40 is more readable than 64. `((ch-0x3f)|(0x5f-ch)) >= 0` does the trick for regular (non-lower-case) ids. Does ctrol-? get mapped to 0x7f ? Yes. I've got it in the test at the end of the line 4997. Would you please help review the updated webrev: http://cr.openjdk.java.net/~igerasim/8230365/02/webrev/ With kind regards, Ivan -- With kind regards, Ivan Gerasimov
Re: Reply: what to do next to fix JDK-8230557. thanks
Hi Peter! On 9/5/19 7:24 AM, Peter Levart wrote: Hi Ivan, On 9/5/19 11:22 AM, Ivan Gerasimov wrote: Hello! BitSet is known to be flawed in many ways:?0?2 its size(), length(), cardinality() and nextClearBit?6?7() can return meaningless negative values. I was thinking about disallowing setting any index greater than (Integer.MAX_VALUE - 63), for example by throwing OutOfMemoryError. An index of Integer.MAX_VALUE - 64 would be the greatest index then, an index of Integer.MAX_VALUE - 63 already needs an array of longs of length 2^25 which results in BitSet size() of 2^31 ... We could do it without changing the specification. The calls to: new BitSet(Integer.MAX_VALUE - 63) ... new BitSet(Integer.MAX_VALUE) would also have to throw then. BitSet.valueOf(...) methods don't even check the passed in arguments lengths, so size() can really return a meaningless negative or positive number. They all would have to throw if the passed-in length of array/buffer is too large. So would you not specify when those methods throw? Yes.?0?2 My point was that any method that requires memory allocation should be expected to possibly throw OOM, so we don't have to specify it explicitly. A similar thing happened with compact Strings, for example.?0?2 When the internal storage was changed to byte[] array, all of a sudden, it became impossible to create UTF16 Strings (or StringBuilders) of length > (Integer.MAX_VALUE / 2). W.r.t. BitSet, personally, I would rather not fix its current behavior unless there is a very strong demand for the change. Just wanted to mention one approach, which would limit usage of this class to the boundaries it was originally designed for. With kind regards, Ivan Regards, Peter With kind regards, Ivan On 9/5/19 1:16 AM, wrote: Hi, Peter. I understand your point, but I think it's unreasonable for the reason that source code compatibility problem, it's really a bug. User can't understand why the size method return a negative number. Best, lamber-ken ---- ??:"Peter Levart"??:""<2217232...@qq.com;"core-libs-dev" :"David Holmes"indexes is: 0 ... 2^31 - 1 (0 ... Integer.MAX_VALUE). The maximum "size" of BitSet is therefore 2^31. Unfortunately, this value can't be "corectly" represented with signed 32 bit integer (int). Only in this corner case, - 2^31 (Integer.MIN_VALUE) is the interpreted value returned from size(). If one would interpret it as unsigned 32 bit integer value, it is entirely correct. For example, this holds: Integer.toUnsignedLong(new BitSet(Integer.MAX_VALUE).size()) == 1L << 31 It is also always true what javadoc says about size(): "The maximum element in the set is the size - 1st element" The following holds also for this corner case: new BitSet(Integer.MAX_VALUE).size() - 1 == Integer.MAX_VALUE; So perhaps, the fix could be just to describe this corner case in the spec. And perhaps, to support the following use case in the corner case: BitSet set1 = ... ... BitSet set2 = new BitSet(set1.size()); ... by modifying the BitSet constructor to accept the Integer.MIN_VALUE in addition to all the non-negative values as the 'nbits' parameter. What do you think? Regards, Peter On 9/5/19 8:31 AM, David Holmes wrote: Hi, On 5/09/2019 4:11 pm, wrote: Thanks very much. *BUG-LINK:* https://bugs.openjdk.java.net/browse/JDK-8230557 *Describe: * the return type of the method BitSet#size is int, so the method may return a negative value in some case, for example, will return -2147483648. ``` BitSet bitSet = new BitSet(Integer.MAX_VALUE); for (int i = 0; i < Integer.MAX_VALUE - 1000; i++) { bitSet.set(i); } System.out.println(bitSet.size()); ``` EXPECTED: 2147483648, but ACTUAL: -2147483648. *FIX* I have put the patch in the attachment of the mail. In case the attachment got stripped form the mailing list the proposed fix is: - public int size() { - return words.length * BITS_PER_WORD; + public long size() { + return (long) words.length * BITS_PER_WORD; Unfortunately this simple fix it not possible. You can't just change the return type of the method to long as that is a source-incompatible change and would not be approved [1]. Instead the return value should be capped at Integer.MAX_VALUE - but I'll leave that for someone from core-libs team to pick up. Also look at the evaluation in: https://bugs.openjdk.java.net/browse/JDK-4213570 Cheers, David [1] https://wiki.openjdk.java.net/display/csr/CSR+FAQs ---- *??:*"David Holmes" *??:*""<2217232...@qq.com;"core-libs-dev" d...@openjdk.java.net; *:*Re: ?? what to do next to fix JDK-8230557. thanks On 5/09/2019 3:46 pm, wrote: hi, develope
Re: Reply: what to do next to fix JDK-8230557. thanks
Hello! BitSet is known to be flawed in many ways:?0?2 its size(), length(), cardinality() and nextClearBit?6?7() can return meaningless negative values. I was thinking about disallowing setting any index greater than (Integer.MAX_VALUE - 63), for example by throwing OutOfMemoryError. We could do it without changing the specification. With kind regards, Ivan On 9/5/19 1:16 AM, wrote: Hi, Peter. I understand your point, but I think it's unreasonable for the reason that source code compatibility problem, it's really a bug. User can't understand why the size method return a negative number. Best, lamber-ken ---- ??:"Peter Levart"https://bugs.openjdk.java.net/browse/JDK-8230557 *Describe: * the return type of the method BitSet#size is int, so the method may return a negative value in some case, for example, will return -2147483648. ``` BitSet bitSet = new BitSet(Integer.MAX_VALUE); for (int i = 0; i < Integer.MAX_VALUE - 1000; i++) { bitSet.set(i); } System.out.println(bitSet.size()); ``` EXPECTED: 2147483648, but ACTUAL: -2147483648. *FIX* I have put the patch in the attachment of the mail. In case the attachment got stripped form the mailing list the proposed fix is: - public int size() { - return words.length * BITS_PER_WORD; + public long size() { + return (long) words.length * BITS_PER_WORD; Unfortunately this simple fix it not possible. You can't just change the return type of the method to long as that is a source-incompatible change and would not be approved [1]. Instead the return value should be capped at Integer.MAX_VALUE - but I'll leave that for someone from core-libs team to pick up. Also look at the evaluation in: https://bugs.openjdk.java.net/browse/JDK-4213570 Cheers, David [1] https://wiki.openjdk.java.net/display/csr/CSR+FAQs ---- *??:*"David Holmes"http://openjdk.java.net/contribute/ and you would seem to be up to step 2. :) Cheers, David [1]https://bugs.openjdk.java.net/browse/JDK-8230557 [2]http://openjdk.java.net/contribute/ [3]https://www.oracle.com/technetwork/community/oca-486395.html --nbsp;nbsp;-- ??:nbsp;"Bug Report Notification" -- With kind regards, Ivan Gerasimov
Re: RFR 8230365 : Pattern for a control-char matches non-control characters
Thank you Martin! On 8/30/19 6:19 PM, Martin Buchholz wrote: There's a strong expectation that ctrl-A and ctrl-a both map to \u0001, so I support Ivan's initiative. I'm surprised java regex gets this wrong. Might need a transitional system property. Right. I think it would be best to introduce two system properties: The first, to turn on/off the restrictions on the control-char names. This will be enabled by default, and will permit names from the limited list: capital letters and a few other special characters. The second one, to enable mapping of lower-case control-char names to their upper-case counterpart. This option should be turned off by default for the current release of JDK, and then turned on by default for some subsequent release (when, presumably, most applications that use this kind of regexp are fixed). This all feels like a little bit too much for such a rarely used feature, but probably is a proper thing to do anyway :-) If we have an agreement on these system properties, I can create a separate test to verify all possible combinations. What's the best bit-twiddle? Untested: if ((c ^= 0x40) < 0x20) return c; if ((c ^=0x20) <= 26 && c > 0) return c; 0x40 is more readable than 64. `((ch-0x3f)|(0x5f-ch)) >= 0` does the trick for regular (non-lower-case) ids. Does ctrol-? get mapped to 0x7f ? Yes. I've got it in the test at the end of the line 4997. Would you please help review the updated webrev: http://cr.openjdk.java.net/~igerasim/8230365/02/webrev/ With kind regards, Ivan
RFR (XS) 8230415 : Avoid redundant permission checking in FilePermissionCollection and SocketPermissionCollection
Hello! In the two implementations of PermissionCollection.implies(Permission), all the permissions are traversed, and their corresponding bit mask are checked. For example, here's a snippet from FilePermission.java: int desired = fperm.getMask(); int effective = 0; int needed = desired; for (Permission perm : perms.values()) { FilePermission fp = (FilePermission)perm; if (((needed & fp.getMask()) != 0) && fp.impliesIgnoreMask(fperm)) { effective |= fp.getMask(); if ((effective & desired) == desired) { return true; } needed = (desired ^ effective);// <<< should be (desired & ~effective) } } Here, if a permission's mask `fp.getMask()` intersects with `needed`, but does not fully cover all the needed bits, the variable `needed` is updated as XOR of desired and effective. This can raise a not-really-needed bits in the `needed` mask, so that for all subsequent permissions from the collection with that unneeded bits in the mask, an expensive fp.impliesIgnoreMask(fperm) will be executed. The fix does not change the behavior, but helps avoid unnecessary calls to impliesIgnoreMask(). Would you please help review a trivial fix? BUGURL: https://bugs.openjdk.java.net/browse/JDK-8230415 WEBREV: http://cr.openjdk.java.net/~igerasim/8230415/00/webrev/ Thanks in advance! -- With kind regards, Ivan Gerasimov
Re: RFR 8230365 : Pattern for a control-char matches non-control characters
Hi Stuart! Thank you for your comments on the proposal! I totally agree with you that the fixes that result in changing the behavior have to be carefully planned and well thought. Please see my answers inline. On 8/29/19 6:15 PM, Stuart Marks wrote: Hi Ivan, This change certainly makes regex patterns more rigorous, but I'm concerned about the compatibility. This is a spec change and also a behavior change. While the current behavior might not strictly be correct, it does have some characteristics that applications might be depending on -- perhaps even by accident. If this change is made, it might cause subtle issues in applications that would be quite difficult to diagnose. There are two types of changes in the proposal: First, make \cx construct more restrictive w.r.t. possible values of x. Second, make it case-insensitive, so \cz will mean the same as \cZ. I agree that the later type of changes can potentially cause hard to diagnose failures in the existing applications. The reason for this later change was an attempt to make Java regexp more Perl-compatible (and Perl does exactly this: Treats \cx with a lower-case x as \cX with its upper-case counterpart). The former type of change, on the other hand, might actually be useful for the existing applications, as it may allow to see (otherwise hard to find) bugs in the code. For example, if some existing application has a regexp, which contains "\\c\t", this is most likely a programming error (or a nasty hack to code a char 'I'), so it would be beneficial to report it via throwing a PatternSyntaxException. Examples of changes I'm concerned about are: pattern \ca currently matches '!' would now match \u0001 pattern \cÀ currently matches \u0080 would now throw exception pattern \c0 currently matches 'p' would now throw exception and so forth. That is, using \c with characters in the range [a-z] would now match different characters from before, and using \c with characters outside the set that correspond to C0 control characters would now throw an exception whereas before they would matching something that was predictable, if in some sense incorrect. There are some ways to mitigate the incompatibility, for example, by adding a system property, or by adding a Pattern flag that explicitly enables this behavior, though I'm not sure that either is worthwhile. Maybe there are less intrusive ways that I haven't thought of. I think it would make sense to first separate the changes mentioned above, so it will be easier to reason about them. While I think that making lower-case \\cx a synonym for the upper-case \\cX is a good thing to have (mainly for additional compatibility with Perl-style regexps), it's supposed be deferred to a later time. The current behavior seems to be have been established around 1999 (JDK 1.3?) so it's been around a long time, plenty of time for applications to have formed inadvertent dependencies on this behavior. An alternative would be simply to document the current behavior, even though it's arguably incorrect. Is there some benefit to this change, for example, does it enable one to write an application that wasn't possible before because of this bug? In my opinion, the benefits are: 1) Fail-early approach, which will let the developers catch the bugs earlier, 2) Better compatibility with Perl-style regexps, so that a wider class of regular expressions can be used across different platforms, 3) Well defined strict rules will allow IDEs to implement additional edit-time checks of regexp, which will again help developers. Please find the updated webrev here: http://cr.openjdk.java.net/~igerasim/8230365/01/webrev/ With kind regards, Ivan s'marks On 8/29/19 4:39 PM, Ivan Gerasimov wrote: Hello! In a regular expression pattern a sequence of the form \\cx is allowed to specify a control character that corresponds to the name char x. Current implementation has a few issues with that: 1) It allows x to be just any character, including non-printable ones; 2) The produced regexp may correspond to a non-control characters; 3) The expression is case-sensitive, so, for example \\cA differs from \\ca, while they both have to produce ctrl-A. It is proposed to make parsing more strict and reject invalid values of x, and also clarify the documentation to explicitly list valid values of x. If we agree on this proposal, then a CSR will probably need to be filed to capture the changes in the regexp parsing. Would you please help review the fix? BUGURL: https://bugs.openjdk.java.net/browse/JDK-8230365 WEBREV: http://cr.openjdk.java.net/~igerasim/8230365/00/webrev/ -- With kind regards, Ivan Gerasimov
RFR 8230365 : Pattern for a control-char matches non-control characters
Hello! In a regular expression pattern a sequence of the form \\cx is allowed to specify a control character that corresponds to the name char x. Current implementation has a few issues with that: 1) It allows x to be just any character, including non-printable ones; 2) The produced regexp may correspond to a non-control characters; 3) The expression is case-sensitive, so, for example \\cA differs from \\ca, while they both have to produce ctrl-A. It is proposed to make parsing more strict and reject invalid values of x, and also clarify the documentation to explicitly list valid values of x. If we agree on this proposal, then a CSR will probably need to be filed to capture the changes in the regexp parsing. Would you please help review the fix? BUGURL: https://bugs.openjdk.java.net/browse/JDK-8230365 WEBREV: http://cr.openjdk.java.net/~igerasim/8230365/00/webrev/ -- With kind regards, Ivan Gerasimov
Re: RFR (S) 8230338 : Accurate error message about bad Unicode block name
Thank you Roger! On 8/29/19 6:39 AM, Roger Riggs wrote: Looks fine Ivan, Thanks, Roger -- With kind regards, Ivan Gerasimov
RFR (S) 8230338 : Accurate error message about bad Unicode block name
Hello! When Pattern.compile complains about a wrong Unicode block name, it includes the prefix "In" twice, like this: ''Unknown character property name {*In*/Is*In*BadBlockName} near index NN '' Would you please help review a trivial fix, which makes the message more accurate? BUGURL: https://bugs.openjdk.java.net/browse/JDK-8230338 WEBREV: http://cr.openjdk.java.net/~igerasim/8230338/00/webrev/ Thanks in advance! -- With kind regards, Ivan Gerasimov
Re: RFR (S) 8226831 : Use Objects.equals() when appropriate
Thank you Roger and Brian! The fix has been pushed. With kind regards, Ivan On 8/27/19 1:12 PM, Brian Burkhalter wrote: Hi Ivan, +1 Brian On Aug 27, 2019, at 12:41 PM, Roger Riggs wrote: Hi Ivan, These look fine. Regards, Roger On 8/27/19 2:21 PM, Ivan Gerasimov wrote: Hello! null-aware comparison of Objects is widely used, so it's tempting to use Objects.equals for that. Unfortunately, there are concerns w.r.t performance (see JDK-8015417), so it seems to better avoid this kind of refactoring in the performance critical code. Please find the webrev with a few places cleaned up (I had more, but decided to limit this patch due to the concern mentioned above). BUGURL: https://bugs.openjdk.java.net/browse/JDK-8226831 WEBREV: http://cr.openjdk.java.net/~igerasim/8226831/00/webrev/ Thanks in advance! -- With kind regards, Ivan Gerasimov
RFR (S) 8226831 : Use Objects.equals() when appropriate
Hello! null-aware comparison of Objects is widely used, so it's tempting to use Objects.equals for that. Unfortunately, there are concerns w.r.t performance (see JDK-8015417), so it seems to better avoid this kind of refactoring in the performance critical code. Please find the webrev with a few places cleaned up (I had more, but decided to limit this patch due to the concern mentioned above). BUGURL: https://bugs.openjdk.java.net/browse/JDK-8226831 WEBREV: http://cr.openjdk.java.net/~igerasim/8226831/00/webrev/ Thanks in advance! -- With kind regards, Ivan Gerasimov
Re: 8229845: Decrease memory consumption of BigInteger.toString()
Thank you Brian! This looks good to me. With kind regards, Ivan On 8/23/19 1:33 PM, Brian Burkhalter wrote: Hi Ivan, On Aug 23, 2019, at 12:41 PM, Ivan Gerasimov mailto:ivan.gerasi...@oracle.com>> wrote: One minor comment. Here: 3990 if (digits > 0) { 3991 padWithZeros(buf, digits); 3992 } else { 3993 buf.append('0'); 3994 } I cannot really see how digits may be <= 0, so it seems to me it can be safely replaced by just one line `padWithZeros(buf, digits);`. Actually I don’t see how digits may be non-positive for signum == 0 either. I’ve reduced the above to one line (L3990) in [1]. That is the only change versus version .04. Alternatively, the entire branch `if (signum == 0) {` can be removed from smallToString (so that this method will only work with strictly positive numbers), and done only for results[1] at line 4083 because it is the only possible source of ZERO values in the process. It also seems that the arithmetic in the very internal loop at 4005-4017 can be slightly optimized, but this probably can be left for another enhancement. I think I’ll leave it as is for now. Thanks! Brian [1] http://cr.openjdk.java.net/~bpb/8229845/webrev.05/ -- With kind regards, Ivan Gerasimov
Re: 8229845: Decrease memory consumption of BigInteger.toString()
Hi Brian. This looks good to me, thanks! One minor comment. Here: 3990 if (digits > 0) { 3991 padWithZeros(buf, digits); 3992 } else { 3993 buf.append('0'); 3994 } I cannot really see how digits may be <= 0, so it seems to me it can be safely replaced by just one line `padWithZeros(buf, digits);`. Alternatively, the entire branch `if (signum == 0) {` can be removed from smallToString (so that this method will only work with strictly positive numbers), and done only for results[1] at line 4083 because it is the only possible source of ZERO values in the process. It also seems that the arithmetic in the very internal loop at 4005-4017 can be slightly optimized, but this probably can be left for another enhancement. With kind regards, Ivan On 8/23/19 10:11 AM, Brian Burkhalter wrote: Hi Ivan, On Aug 22, 2019, at 1:24 PM, Ivan Gerasimov mailto:ivan.gerasi...@oracle.com>> wrote: I have a few further suggestions how to simplify the code. […] Those are all good suggestions: thanks! Here's the patch, which illustrates all the suggestions above: http://cr.openjdk.java.net/~igerasim/8229845/00/webrev/ (I have only run the tests from test/jdk/java/math/BigInteger to verify it). This patch has a problem: 4023 padWithZeros(buf, digits - s.length() + (numGroups - 1) * digitsPerLong[radix]); It is missing parentheses: 4023 padWithZeros(buf, digits - (s.length() + (numGroups - 1) * digitsPerLong[radix])); I was unable to find this by debugging but implemented your suggestions by hand and then found it by diff-ing the patches. My updated versions with your suggestions incorporated are: [1] delta, [2] absolute. I also increased the size of some of the values tested at line 798 of BigIntegerTest. Thanks, Brian [1] http://cr.openjdk.java.net/~bpb/8229845/webrev.03-04/ [2] http://cr.openjdk.java.net/~bpb/8229845/webrev.04/ -- With kind regards, Ivan Gerasimov
Re: RFR: 8229485: Add decrementExact(), incrementExact(), and negateExact() to java.lang.StrictMath
Hi Julia! On 8/23/19 3:36 AM, Julia Boes wrote: Hi Ivan, The change of the javadoc was made per Brian's request for consistency with java.lang.Math, see the description of the bug: https://bugs.openjdk.java.net/browse/JDK-8229485 Ah, I didn't realize that the operations listed in that paragraph are those that *do not* have a corresponding -Exact method. The fix looks good then, thanks. With kind regards, Ivan Best, Julia On 22/08/2019 17:25, Ivan Gerasimov wrote: Is there a reason to change the text at lines 72-75? Since you've just added the increment/decrement/negate methods, the old text seems to be more complete. With kind regards, Ivan -- With kind regards, Ivan Gerasimov
Re: 8229845: Decrease memory consumption of BigInteger.toString()
Thank you Brian, this looks much better! I have a few further suggestions how to simplify the code. 1) If this BigInteger is negative, it is negated in two places: @ 3943 and either @ 4003 or @ 4061. It is better to keep abs value at the very beginning and make private toString(...) and smallToString() to only accept non-negative numbers. 2) Once the 1) is done, it is possible to remove lines 3948-3954, and always call private toString(...), which will immediately delegate to smallToString() if necessary. This is just to make the code shorter. 3) In padWithZeros(), the condition check `if (digits > 0 && len < digits) {` is not necessary. Actually, I would suggest to make a single argument, say, numLeadingZeros (in the current code it is `m`). 4) I think that logic with making the argument `digits` == -1 for the first chunk of digits is over-complication: `digits` can always be safely set to 1 in the initial call to the private toString() or smallToString(). Then the variable atBeginning is not needed at all. 5) If 3) and 4) are done, the two lines 3988 and 3989 can be reduced to `padWithZeros(buf, digits);` Here's the patch, which illustrates all the suggestions above: http://cr.openjdk.java.net/~igerasim/8229845/00/webrev/ (I have only run the tests from test/jdk/java/math/BigInteger to verify it). With kind regards, Ivan On 8/22/19 11:04 AM, Brian Burkhalter wrote: Hi Ivan, Please see the delta [1] and absolute [2] webrevs. On Aug 21, 2019, at 7:38 PM, Ivan Gerasimov mailto:ivan.gerasi...@oracle.com>> wrote: A couple of comments 1) 3974 if (signum == 0) { 3975 buf.append('0'); 3976 return; 3977 } Shouldn't it be padded with zeroes, if digits > 0? If I'm not mistaken, it could happen if result[1] at the end of toString() happens to be ZERO. Good catch: you are absolutely correct. It that's true, then it may be a good idea to add a testcase for a number with many trailing zeros in a string representation. Currently, test/jdk/java/math/BigInteger/BigIntegerTest.java in stringConv() operates on random numbers, which are unlikely to have huge number of trailing zeros. I added testing this to stringConv(). The test fails without the most recent delta applied to the implementation. 2) 4120 /* zero is a string of NUM_ZEROS consecutive zeros. */ 4121 private static final String zeros = "0".repeat(NUM_ZEROS); Nit in the comment: name of the constant should be zero*s*. I changed the variable name to ZEROS as it is a constant. Thanks, Brian [1] http://cr.openjdk.java.net/~bpb/8229845/webrev.02-03/ [2] http://cr.openjdk.java.net/~bpb/8229845/webrev.03/ -- With kind regards, Ivan Gerasimov
Re: RFR: 8229485: Add decrementExact(), incrementExact(), and negateExact() to java.lang.StrictMath
Thank you Julia! On 8/22/19 2:06 AM, Julia Boes wrote: Hi Ivan, Thanks for pointing that out, the links are now added. Updated webrev: http://cr.openjdk.java.net/~dfuchs/jboes/8229485/webrev.03/ Is there a reason to change the text at lines 72-75? Since you've just added the increment/decrement/negate methods, the old text seems to be more complete. With kind regards, Ivan Regards, Julia On 21/08/2019 20:33, Ivan Gerasimov wrote: Hi Julia! This looks good to me. For all the other methods from StrictMath that have a corresponding method in Math there is a link in javadoc `@see Math#xxx`. I think, it makes sense to provide such links for all new methods for consistency. With kind regards, Ivan On 8/21/19 6:29 AM, Julia Boes wrote: Hi all, Thanks for the review. I incorporated the following changes: - added "@since 14" tag - removed test code from the CSR specification - minor fixes in StrictMath.java, ExactArithTests.java as per Ivan's comments. The related fix in Math.java is included in this CSR: https://bugs.openjdk.java.net/browse/JDK-8229473 Updated webrev: http://cr.openjdk.java.net/~dfuchs/jboes/8229485/webrev.02/index.html Regards, Julia On 14/08/2019 20:45, Joe Wang wrote: Hi Daniel, That makes sense. So it meant to say "converts the long to int only if it's within the range of an integer" (to be "exact"). It may then add "throws an exception if not". But the original is fine, just my 2 cents. Best, Joe On 8/14/19 12:21 PM, Daniel Fuchs wrote: Hi Joe, On 14/08/2019 20:15, Joe Wang wrote: The 2nd part is not necessary as that's what the "@throws" tag is for. Not withstanding the fact that this is a copy of the API doc from java.lang.Math, I'd argue that the second part is quite important. It's why the method is called exact, and is probably the only reason why you would call that method in the first place. So I think the 2nd part must be in the synopsis too. best regards, -- daniel On 8/14/19 9:01 AM, Julia Boes wrote: > > Hi, > > This fix adds decrementExact(), incrementExact(), and negateExact() to java.lang.StrictMath. The methods were added to java.lang.Math previously [1] and should have been added to java.lang.StrictMath for consistency. > > Bug: https://bugs.openjdk.java.net/browse/JDK-8229485 > > Webrev: http://cr.openjdk.java.net/~dfuchs/jboes/8229485/webrev.01/ > > CSR: https://bugs.openjdk.java.net/browse/JDK-8229702 > > Thanks, > > Julia > > [1] https://bugs.openjdk.java.net/browse/JDK-8022109 -- With kind regards, Ivan Gerasimov
Re: 8229845: Decrease memory consumption of BigInteger.toString()
Hi Brian! A couple of comments 1) 3974 if (signum == 0) { 3975 buf.append('0'); 3976 return; 3977 } Shouldn't it be padded with zeroes, if digits > 0? If I'm not mistaken, it could happen if result[1] at the end of toString() happens to be ZERO. It that's true, then it may be a good idea to add a testcase for a number with many trailing zeros in a string representation. Currently, test/jdk/java/math/BigInteger/BigIntegerTest.java in stringConv() operates on random numbers, which are unlikely to have huge number of trailing zeros. 2) 4120 /* zero is a string of NUM_ZEROS consecutive zeros. */ 4121 private static final String zeros = "0".repeat(NUM_ZEROS); Nit in the comment: name of the constant should be zero*s*. With kind regards, Ivan On 8/21/19 5:28 PM, Brian Burkhalter wrote: Hello, On Aug 20, 2019, at 4:31 PM, Brian Burkhalter wrote: On Aug 20, 2019, at 2:45 PM, Ivan Gerasimov mailto:ivan.gerasi...@oracle.com>> wrote: Would it make sense to add an argument `digits` to smallToString (like the same named argument of toString, the minimum number of digits to pad to), so it could zero-pad the result itself? This way we could avoid inserting zeros into the middle of a StringBuilder after a call to smallToString. Indeed I do not like the zero insertion either. I’ll investigate your idea. I modified the patch so that characters are exclusively appended to the StringBuilder thereby eliminating copying of characters which are already present after the insertion point, viz., the padding with zeros after calling smallToString() in the recursive path. A delta versus version .01 is at [1] and the diff versus the tip at [2]. Thanks, Brian [1] http://cr.openjdk.java.net/~bpb/8229845/webrev.01-02/ [2] http://cr.openjdk.java.net/~bpb/8229845/webrev.02/ -- With kind regards, Ivan Gerasimov
Re: RFR: 8229485: Add decrementExact(), incrementExact(), and negateExact() to java.lang.StrictMath
Hi Julia! This looks good to me. For all the other methods from StrictMath that have a corresponding method in Math there is a link in javadoc `@see Math#xxx`. I think, it makes sense to provide such links for all new methods for consistency. With kind regards, Ivan On 8/21/19 6:29 AM, Julia Boes wrote: Hi all, Thanks for the review. I incorporated the following changes: - added "@since 14" tag - removed test code from the CSR specification - minor fixes in StrictMath.java, ExactArithTests.java as per Ivan's comments. The related fix in Math.java is included in this CSR: https://bugs.openjdk.java.net/browse/JDK-8229473 Updated webrev: http://cr.openjdk.java.net/~dfuchs/jboes/8229485/webrev.02/index.html Regards, Julia On 14/08/2019 20:45, Joe Wang wrote: Hi Daniel, That makes sense. So it meant to say "converts the long to int only if it's within the range of an integer" (to be "exact"). It may then add "throws an exception if not". But the original is fine, just my 2 cents. Best, Joe On 8/14/19 12:21 PM, Daniel Fuchs wrote: Hi Joe, On 14/08/2019 20:15, Joe Wang wrote: The 2nd part is not necessary as that's what the "@throws" tag is for. Not withstanding the fact that this is a copy of the API doc from java.lang.Math, I'd argue that the second part is quite important. It's why the method is called exact, and is probably the only reason why you would call that method in the first place. So I think the 2nd part must be in the synopsis too. best regards, -- daniel On 8/14/19 9:01 AM, Julia Boes wrote: > > Hi, > > This fix adds decrementExact(), incrementExact(), and negateExact() to java.lang.StrictMath. The methods were added to java.lang.Math previously [1] and should have been added to java.lang.StrictMath for consistency. > > Bug: https://bugs.openjdk.java.net/browse/JDK-8229485 > > Webrev: http://cr.openjdk.java.net/~dfuchs/jboes/8229485/webrev.01/ > > CSR: https://bugs.openjdk.java.net/browse/JDK-8229702 > > Thanks, > > Julia > > [1] https://bugs.openjdk.java.net/browse/JDK-8022109 -- With kind regards, Ivan Gerasimov
Re: RFR 8229958: Provider.getService() scalability issue for legacy algorithms and message digests
Hi Letu! The fix introduces a read of non-volatile variable legacyChanged outside of synchronized block, which is not guaranteed to produce consistent results. (Please note that in the mentioned fix for JDK-7092821 the variable servicesChanged was made volatile, so that it could be accessed outside the synchronized blocks.) With kind regards, Ivan On 8/20/19 10:18 PM, Yang, Letu wrote: Hi, Please review the fix of https://bugs.openjdk.java.net/browse/JDK-8229958 where I made the change to allow majority of calls don't have to acquire the locks when checking the availability of the Provider object. Similar effort had been made in fixing https://bugs.openjdk.java.net/browse/JDK-7092821 , but it only helped the calls for new encryption algorithms. Xin had helped me to upload the CR: https://cr.openjdk.java.net/~xliu/8229958/webrev/ . I had run tier-1 tests, and also a JMH test to confirm its performance improvement. We have also had it running in a load test environment of an application for a couple of days, and the improvement was confirmed as well. Best regards, Letu -- With kind regards, Ivan Gerasimov
Re: 8229845: Decrease memory consumption of BigInteger.toString()
Hi Brian! Would it make sense to add an argument `digits` to smallToString (like the same named argument of toString, the minimum number of digits to pad to), so it could zero-pad the result itself? This way we could avoid inserting zeros into the middle of a StringBuilder after a call to smallToString. With kind regards, Ivan On 8/20/19 11:51 AM, Brian Burkhalter wrote: Delta [1] and revised [2] patches: [1] http://cr.openjdk.java.net/~bpb/8229845/webrev.00-01/ [2] http://cr.openjdk.java.net/~bpb/8229845/webrev.01/ On Aug 20, 2019, at 1:05 AM, Aleksey Shipilev <mailto:sh...@redhat.com>> wrote: On 8/19/19 10:08 PM, Brian Burkhalter wrote: [2]http://cr.openjdk.java.net/~bpb/8229845/webrev.00/ Two drive-by comments: *) It seems the "signum == 0" case in smallToString is regressing? Before, it just returned the constant-pooled "0", now it goes via StringBuilder; Yes, and I don’t see any way not to have it do so. Note that due to lines 3933-3936 in [2] this case will never be hit for values which have no more than 20 ints in their magnitude, i.e., those which do not recurse in toString(BigInteger,StringBuilder,int,int). *) This might be "static final", while we are at it: 4109 private static String[] zeros = new String[64]; Above made final at L4111 in [2]. On Aug 20, 2019, at 10:28 AM, Ivan Gerasimov mailto:ivan.gerasi...@oracle.com>> wrote: While we're here. On 8/20/19 1:05 AM, Aleksey Shipilev wrote: *) This might be "static final", while we are at it: 4109 private static String[] zeros = new String[64]; It may make sense to create the only string as private static final String zeros = "0".repeat(63); Changed at L4111 in [2]. and then replace all sb.append(zeros[x]) with sb.append(zeros, 0, x) and L4006 in [2]. sb.insert(pos, zeros[x]) with sb.insert(pos, zeros, 0, x). L4049 and L4054 in [2]. The difference would be in one extra bounds check, and as both append() and insert() are relatively expensive, this check would hardly be noticeable. Thanks, Brian -- With kind regards, Ivan Gerasimov
RFR 8211360 : Change #if DEF to #if defined(DEF)
Hello! It's a followup for JDK-8211146. With that fix several C-preprocessor statements of form #elif __linux__ were changed to more accurate #elif defined(__linux__). grep found a few more occurrences of the same pattern, which also would better be cleaned up. BUGURL: https://bugs.openjdk.java.net/browse/JDK-8211360 WEBREV: http://cr.openjdk.java.net/~igerasim/8211360/00/webrev/ Mach5 control build/testing went fine on all platforms. Would you please help review? -- With kind regards, Ivan Gerasimov
Re: 8229845: Decrease memory consumption of BigInteger.toString()
While we're here. On 8/20/19 1:05 AM, Aleksey Shipilev wrote: *) This might be "static final", while we are at it: 4109 private static String[] zeros = new String[64]; It may make sense to create the only string as private static final String zeros = "0".repeat(63); and then replace all sb.append(zeros[x]) with sb.append(zeros, 0, x) and sb.insert(pos, zeros[x]) with sb.insert(pos, zeros, 0, x). The difference would be in one extra bounds check, and as both append() and insert() are relatively expensive, this check would hardly be noticeable. -- With kind regards, Ivan Gerasimov
Re: RFR: 8229485: Add decrementExact(), incrementExact(), and negateExact() to java.lang.StrictMath
A tiny addition: On 8/14/19 10:56 AM, Ivan Gerasimov wrote: Hi Julia! In ExactArithTests.java 1) 169 long neg2 = -((long)x) ; please remove the superfluous space at the end. 2) 176 long neg2 = (long) x - 1L; should be `long neg2 = -((long)x);` In StrictMath.java 3) while you're here, could you please replace semicolon with comma: 916 * Returns the value of the {@code long} argument; 917 * throwing an exception if the value overflows an {@code int}. In src/java.base/share/classes/java/lang/Math.java 4) at line 1061 * Returns the value of the {@code long} argument; * throwing an exception if the value overflows an {@code int}. analogously to (3) above, could you please s/;/,/ ? Thanks! With kind regards, Ivan On 8/14/19 9:01 AM, Julia Boes wrote: Hi, This fix adds decrementExact(), incrementExact(), and negateExact() to java.lang.StrictMath. The methods were added to java.lang.Math previously [1] and should have been added to java.lang.StrictMath for consistency. Bug: https://bugs.openjdk.java.net/browse/JDK-8229485 Webrev: http://cr.openjdk.java.net/~dfuchs/jboes/8229485/webrev.01/ CSR: https://bugs.openjdk.java.net/browse/JDK-8229702 Thanks, Julia [1] https://bugs.openjdk.java.net/browse/JDK-8022109 -- With kind regards, Ivan Gerasimov
Re: RFR: 8229485: Add decrementExact(), incrementExact(), and negateExact() to java.lang.StrictMath
Hi Julia! In ExactArithTests.java 1) 169 long neg2 = -((long)x) ; please remove the superfluous space at the end. 2) 176 long neg2 = (long) x - 1L; should be `long neg2 = -((long)x);` In StrictMath.java 3) while you're here, could you please replace semicolon with comma: 916 * Returns the value of the {@code long} argument; 917 * throwing an exception if the value overflows an {@code int}. With kind regards, Ivan On 8/14/19 9:01 AM, Julia Boes wrote: Hi, This fix adds decrementExact(), incrementExact(), and negateExact() to java.lang.StrictMath. The methods were added to java.lang.Math previously [1] and should have been added to java.lang.StrictMath for consistency. Bug: https://bugs.openjdk.java.net/browse/JDK-8229485 Webrev: http://cr.openjdk.java.net/~dfuchs/jboes/8229485/webrev.01/ CSR: https://bugs.openjdk.java.net/browse/JDK-8229702 Thanks, Julia [1] https://bugs.openjdk.java.net/browse/JDK-8022109 -- With kind regards, Ivan Gerasimov
Re: RFR (XS) 8221307 : String.substring() OOB exception on start index reports improper information
Hello! A simple benchmark did not detect any negative impact of delegating substring(int) to substring(int, int): Before fix: Benchmark Mode Cnt Score Error Units StringSubstring.from26toEnd0 avgt 12 20.279 ± 0.306 ns/op StringSubstring.from26toEnd1 avgt 12 20.677 ± 1.924 ns/op After fix: Benchmark Mode Cnt Score Error Units StringSubstring.from26toEnd0 avgt 12 19.835 ± 0.135 ns/op StringSubstring.from26toEnd1 avgt 12 20.784 ± 1.791 ns/op So here's the updated webrev: http://cr.openjdk.java.net/~igerasim/8221307/01/webrev/ Will you approve it to go? With kind regards, Ivan On 8/8/19 6:56 AM, Claes Redestad wrote: Hi, perhaps a stupid question, but why isn't String::substring(int) calling substring(beginIndex, length())? That'd ensure consistent error messages, but otherwise preserve semantics. /Claes On 2019-08-08 02:17, Ivan Gerasimov wrote: Hello! The exception thrown by substring(int) may look confusing. For example it produces "String index out of range: -1" when the index is length+1. It is proposed to make the error message more clear, similar to what we have for substring(int, int). BUGURL: https://bugs.openjdk.java.net/browse/JDK-8221307 WEBREV: http://cr.openjdk.java.net/~igerasim/8221307/00/webrev/ Would you please help review? -- With kind regards, Ivan Gerasimov
Re: RFR (XS) 8221307 : String.substring() OOB exception on start index reports improper information
Hi Roger! Thanks for the comments! On 8/8/19 6:50 AM, Roger Riggs wrote: Hi Ivan, To be consistent with other checks of the index, can you use the checkIndex(index, length) method? The message it produces would be appropriate for both cases. Unfortunately, checkIndex(int, int) wouldn't be appropriate here because it only allows index < length, while we need index <= length here. I think it makes sense to adopt the suggestion from Claes and just delegate to substring(int,int). While you are there can you add a space after the comma in lines 3680 and 3691. Yes, done. Is there an appropriate test? There should be. The substring() method is used across many regression tests, so we do have sanity checking. I didn't think it's worth it to check for specific error message. With kind regards, Ivan Thanks, Roger On 8/7/19 8:17 PM, Ivan Gerasimov wrote: Hello! The exception thrown by substring(int) may look confusing. For example it produces "String index out of range: -1" when the index is length+1. It is proposed to make the error message more clear, similar to what we have for substring(int, int). BUGURL: https://bugs.openjdk.java.net/browse/JDK-8221307 WEBREV: http://cr.openjdk.java.net/~igerasim/8221307/00/webrev/ Would you please help review? -- With kind regards, Ivan Gerasimov
Re: RFR (XS) 8221307 : String.substring() OOB exception on start index reports improper information
Hi Claes! On 8/8/19 6:56 AM, Claes Redestad wrote: Hi, perhaps a stupid question, but why isn't String::substring(int) calling substring(beginIndex, length())? That'd ensure consistent error messages, but otherwise preserve semantics. It's a good point, actually! I suspect that originally substring(int) had a separate implementation for performance reason. I agree, it would make most sense to implement one via another unless there's a performance penalty. Let me do some measurements, and then I'll report back. With kind regards, Ivan /Claes On 2019-08-08 02:17, Ivan Gerasimov wrote: Hello! The exception thrown by substring(int) may look confusing. For example it produces "String index out of range: -1" when the index is length+1. It is proposed to make the error message more clear, similar to what we have for substring(int, int). BUGURL: https://bugs.openjdk.java.net/browse/JDK-8221307 WEBREV: http://cr.openjdk.java.net/~igerasim/8221307/00/webrev/ Would you please help review? -- With kind regards, Ivan Gerasimov
RFR (XS) 8221307 : String.substring() OOB exception on start index reports improper information
Hello! The exception thrown by substring(int) may look confusing. For example it produces "String index out of range: -1" when the index is length+1. It is proposed to make the error message more clear, similar to what we have for substring(int, int). BUGURL: https://bugs.openjdk.java.net/browse/JDK-8221307 WEBREV: http://cr.openjdk.java.net/~igerasim/8221307/00/webrev/ Would you please help review? -- With kind regards, Ivan Gerasimov
Re: RFR (XS) 8228352 : CANON_EQ breaks when pattern contains supplementary codepoint
Thank you Naoto for the review! With kind regards, Ivan On 8/1/19 10:02 AM, naoto.s...@oracle.com wrote: Hi Ivan, The change looks good to me. Naoto On 7/31/19 9:21 PM, Ivan Gerasimov wrote: Hello! Pattern.compile fails with IOOB when normalizing a pattern that contains a surrogate pair. Would you please help review a trivial 1-line fix? BUGURL: https://bugs.openjdk.java.net/browse/JDK-8228352 WEBREV: http://cr.openjdk.java.net/~igerasim/8228352/00/webrev/ -- With kind regards, Ivan Gerasimov
RFR (XS) 8228352 : CANON_EQ breaks when pattern contains supplementary codepoint
Hello! Pattern.compile fails with IOOB when normalizing a pattern that contains a surrogate pair. Would you please help review a trivial 1-line fix? BUGURL: https://bugs.openjdk.java.net/browse/JDK-8228352 WEBREV: http://cr.openjdk.java.net/~igerasim/8228352/00/webrev/ -- With kind regards, Ivan Gerasimov
Re: 8193072: File.delete() should remove its path from DeleteOnExitHook.files
On 7/10/19 5:17 PM, Brian Burkhalter wrote: I incorporated Peter’s version, adding the security check in cancelDeleteOnExit(), tweaking its verbiage along with that of deleteOnExit(), and modified the test DeleteOnExit to verify the new method. The updated version is here: http://cr.openjdk.java.net/~bpb/8193072/webrev.03/ <http://cr.openjdk.java.net/~bpb/8193072/webrev.03/> There is possibility of a race here in a scenario like this: File dir = new File("dir"); File file = new File("dir/file"); -- thread 1 -- dir.deleteOnExit(); file.deleteOnExit(); /// dir.cancelDeleteOnExit(); thread 2 intervenes file.cancelDeleteOnExit(); -- end -- -- thread 2 -- dir.deleteOnExit(); file.deleteOnExit(); -- end -- The result will be that the order of the registered files will change, and JVM will try to delete dir first (this will fail as it is not empty). Of course it could be avoided, if cancellation were done in reverse order, though it's not immediately obvious from the documentation. With kind regards, Ivan Thanks, Brian On Jul 10, 2019, at 11:17 AM, Brian Burkhalter wrote: On Jul 10, 2019, at 5:36 AM, Peter Levart wrote: There are various interleavings of threads that could cause the file to be left undeleted when VM exits. To cover concurrent scenarios such as above, the code could use reference counting. Like in the following patch: http://cr.openjdk.java.net/~plevart/jdk-dev/8193072_File.undoDeleteOnExit/webrev.01/ <http://cr.openjdk.java.net/~plevart/jdk-dev/8193072_File.undoDeleteOnExit/webrev.01/> This looks good to me modulo adding this SecurityManager security = System.getSecurityManager(); if (security != null) { security.checkDelete(path); } to cancelDeleteOnExit() as suggested below. On Jul 10, 2019, at 7:51 AM, Sean Mullan wrote: On 7/9/19 7:40 PM, Brian Burkhalter wrote: I don’t know. On the one hand this does not take an action like reading, writing, or deleting, but on the other it could end up causing files to be left lying around after VM termination which were expected to be deleted. I suppose that could be considered to be some sort of security issue. Yes I think so. I would probably just use the same permission (FilePermission(file,"delete")). If you have been granted permission to delete a file, then you should have permission to cancel that deletion as well. That’s a good idea. -- With kind regards, Ivan Gerasimov
Re: 8193072: File.delete() should remove its path from DeleteOnExitHook.files
Hi Brian! I believe this would introduce a behavior change in a scenario lile: File f = ...; f.deleteOnExit(); f.delete(); f.createNewFile(); I.e. when the with the same name is recreated. Current behavior is to unlink such a file before exiting, no matter if it were explicitly deleted and then recreated or not. With kind regards, Ivan On 7/8/19 1:11 PM, Brian Burkhalter wrote: https://bugs.openjdk.java.net/browse/JDK-8193072 <https://bugs.openjdk.java.net/browse/JDK-8193072> There does appear to be a memory leak of sorts if one does something like — File[] files; for (int i = 0; i < largeNumber; i++) { files[i] = File.createTempFile(“blah”, null); files[i].deleteOnExit(); } // do something for (int i = 0; i < largeNumber; i++) { files[i].delete(); } // do something else before shutdown — The LinkedHashSet in DeleteOnExitHook will contain at least largeNumber Files until the VM shuts down even though the files were deleted. The potential change is included below. The additional call to DeleteOnExitHook.remove() in File.delete() does not appear to have a measurable performance impact, at least trivially and in isolation. Thanks, Brian --- a/src/java.base/share/classes/java/io/DeleteOnExitHook.java +++ b/src/java.base/share/classes/java/io/DeleteOnExitHook.java @@ -64,6 +64,15 @@ files.add(file); } +static synchronized void remove(String file) { +if(files == null) { +// DeleteOnExitHook is running. Too late to remove a file +throw new IllegalStateException("Shutdown in progress"); +} + +files.remove(file); +} + static void runHooks() { LinkedHashSet theFiles; --- a/src/java.base/share/classes/java/io/File.java +++ b/src/java.base/share/classes/java/io/File.java @@ -1050,6 +1050,7 @@ if (isInvalid()) { return false; } +DeleteOnExitHook.remove(path); return fs.delete(this); } -- With kind regards, Ivan Gerasimov
Re: RFR (XS) 8224849 : Flag (?U:...) is allowed for non-capturing groups
Thank you Stuart for review and the edits! On 6/28/19 4:30 PM, Stuart Marks wrote: On 6/27/19 2:43 PM, Ivan Gerasimov wrote: Yes, good point about the spec change. Here's the CSR filed: https://bugs.openjdk.java.net/browse/JDK-8226914 Can you please review it at your convenience? Thanks for filing this. I've made some minor edits and have marked it reviewed. Note that when filing CSRs there are a few fields that need to be filled in but that aren't normally visible when viewing the issue; you have to press the Edit button to see them. These fields include Compatibility Risk, Compatibility Risk Description, Scope, and Reviewed By. I've filled these in appropriately. I think this is OK for you to mark Finalized now. Thanks, s'marks -- With kind regards, Ivan Gerasimov
Re: RFR (XS) 8224849 : Flag (?U:...) is allowed for non-capturing groups
Hi Stuart! On 6/27/19 2:25 PM, Stuart Marks wrote: Hi Ivan, I think this fix is correct and clearly the spec needed to be fixed. Since it is a change to the normative portion of the spec, though, it needs a CSR. This is probably mostly a formality to make sure the change is tracked for TCK purposes. Can you file a CSR for this please? Let me know if you need a review for it. Yes, good point about the spec change. Here's the CSR filed: https://bugs.openjdk.java.net/browse/JDK-8226914 Can you please review it at your convenience? With kind regards, Ivan Thanks, s'marks -- With kind regards, Ivan Gerasimov
Re: RFR (XS) 8224849 : Flag (?U:...) is allowed for non-capturing groups
Thank you Brent for review! On 6/25/19 10:40 AM, Brent Christian wrote: That looks fine, Ivan. -Brent On 6/24/19 10:46 PM, Ivan Gerasimov wrote: Hello! Would someone volunteer to review this extra-small doc-only fix? Thanks in advance! With kind regards, Ivan On 5/28/19 9:33 AM, Ivan Gerasimov wrote: Hello! When the fix for JDK-7039066 (which added support for the flag UNICODE_CHARACTER_CLASS and corresponding embedded expression (?U)) was integrated back in JDK 7, the documentation for the non-capturing groups was not updated accordingly. Would you please help review the fix, which only updates the javadoc? BUGURL: https://bugs.openjdk.java.net/browse/JDK-8224849 WEBREV: http://cr.openjdk.java.net/~igerasim/8224849/00/webrev/ Thanks in advance! -- With kind regards, Ivan Gerasimov
Re: RFR (XS) 8224716 : Javadoc of Int/Long/DoubleSummaryStatistics should mention possible overflow of count
Hello! On 6/25/19 3:33 PM, Joseph D. Darcy wrote: Hello, Please file a CSR for this change; thanks, Filed: https://bugs.openjdk.java.net/browse/JDK-8226784 Please take a look at your convenience. With kind regards, Ivan -Joe On 6/25/2019 11:30 AM, Brent Christian wrote: I was musing about this myself. If the changes are within the @implNote(s), then perhaps not? -Brent On 6/25/19 11:15 AM, Brian Burkhalter wrote: > Is a CSR in order? Thanks, Brian On Jun 25, 2019, at 5:49 AM, Ivan Gerasimov wrote: Hello! Would someone volunteer to review this extra-small doc-only fix? Thanks in advance! With kind regards, Ivan On 5/29/19 2:39 PM, Ivan Gerasimov wrote: Hello! In the implNote section of the javadoc for xxxSummaryStatistics classes it is mentioned that the sum is not checked for overflow. It would be more accurate to add that neither does the implementation check the count for overflow. Would you please help review this doc-only change? BUGURL: https://bugs.openjdk.java.net/browse/JDK-8224716 WEBREV: http://cr.openjdk.java.net/~igerasim/8224716/00/webrev/ -- With kind regards, Ivan Gerasimov -- With kind regards, Ivan Gerasimov
Re: RFR (XS) 8224716 : Javadoc of Int/Long/DoubleSummaryStatistics should mention possible overflow of count
Hello! Would someone volunteer to review this extra-small doc-only fix? Thanks in advance! With kind regards, Ivan On 5/29/19 2:39 PM, Ivan Gerasimov wrote: Hello! In the implNote section of the javadoc for xxxSummaryStatistics classes it is mentioned that the sum is not checked for overflow. It would be more accurate to add that neither does the implementation check the count for overflow. Would you please help review this doc-only change? BUGURL: https://bugs.openjdk.java.net/browse/JDK-8224716 WEBREV: http://cr.openjdk.java.net/~igerasim/8224716/00/webrev/ -- With kind regards, Ivan Gerasimov
Re: RFR (XS) 8224849 : Flag (?U:...) is allowed for non-capturing groups
Hello! Would someone volunteer to review this extra-small doc-only fix? Thanks in advance! With kind regards, Ivan On 5/28/19 9:33 AM, Ivan Gerasimov wrote: Hello! When the fix for JDK-7039066 (which added support for the flag UNICODE_CHARACTER_CLASS and corresponding embedded expression (?U)) was integrated back in JDK 7, the documentation for the non-capturing groups was not updated accordingly. Would you please help review the fix, which only updates the javadoc? BUGURL: https://bugs.openjdk.java.net/browse/JDK-8224849 WEBREV: http://cr.openjdk.java.net/~igerasim/8224849/00/webrev/ Thanks in advance! -- With kind regards, Ivan Gerasimov
Re: RFR (s) 8225397 : Integer value miscalculation in toString() method of BitSet
On 6/6/19 8:39 AM, Andrew Haley wrote: On 6/6/19 12:35 PM, Ivan Gerasimov wrote: Could you please check it if it looks better now? http://cr.openjdk.java.net/~igerasim/8225397/00/webrev/ OK, it's just a hint, fair enough. Patch is OK. Thank you! For an extra gold star (:-) you could consider adding a comment like "avoid overflow in the case of a humongous numBits". There's no need for another review if you do so. You got it! Pushed with the suggested comment added. With kind regards, Ivan
Re: RFR (s) 8225397 : Integer value miscalculation in toString() method of BitSet
Thank you Andrew! The multiplier 6 was pre-existent, and I don't really want to change behavior in this legacy class. The high limit of (Integer.MAX_VALUE - 20) was arbitrarily chosen. I don't think it matters much in this case, but I can change it to (Integer.MAX_VALUE - 8), which is used in several other places. I've just updated the code in-place to make the intent of the fix more clear. Could you please check it if it looks better now? http://cr.openjdk.java.net/~igerasim/8225397/00/webrev/ Thanks in advance! Ivan On 6/6/19 3:57 AM, Andrew Haley wrote: On 6/6/19 10:18 AM, Ivan Gerasimov wrote: Hello! It is yet another instance of integer overflow under certain extreme circumstances. This time it is when calculating the initial capacity of a StringBuilder in BitSet.toString. If there are too many elements in the set, we can't do much anyway. The best effort is to avoid confusing NegativeArraySizeException and let the method throw OOM. Would you please help review the fix? BUGURL: https://bugs.openjdk.java.net/browse/JDK-8225397 WEBREV: http://cr.openjdk.java.net/~igerasim/8225397/00/webrev/ @@ -1184,7 +1184,9 @@ int numBits = (wordsInUse > 128) ? cardinality() : wordsInUse * BITS_PER_WORD; -StringBuilder b = new StringBuilder(6*numBits + 2); +int sizeHint = (numBits <= (Integer.MAX_VALUE - 22) / 6) ? +6 * numBits + 2 : Integer.MAX_VALUE - 20; +StringBuilder b = new StringBuilder(sizeHint); b.append('{'); int i = nextSetBit(0); This needs a comment. What is significant about 6 and 22? -- With kind regards, Ivan Gerasimov
RFR (s) 8225397 : Integer value miscalculation in toString() method of BitSet
Hello! It is yet another instance of integer overflow under certain extreme circumstances. This time it is when calculating the initial capacity of a StringBuilder in BitSet.toString. If there are too many elements in the set, we can't do much anyway. The best effort is to avoid confusing NegativeArraySizeException and let the method throw OOM. Would you please help review the fix? BUGURL: https://bugs.openjdk.java.net/browse/JDK-8225397 WEBREV: http://cr.openjdk.java.net/~igerasim/8225397/00/webrev/ -- With kind regards, Ivan Gerasimov
Re: RFR: 8225179: (regex) Minor Pattern cleanup
Hi Claes! Looks good to me, thanks! LookBehindNode may be better named LookBehindEndNode, as it should only match at the very end of the look-behind token. With kind regards, Ivan On 6/4/19 1:58 AM, Claes Redestad wrote: Hi, please review this j.u.regex.Pattern cleanup. - refactor BitClass to be a BmpCharPredicate (which allows removing two identical(!) lambdas), which improves startup and reduces allocations when compiling Patterns. - remove unused GroupRef class - made anonymous lookbehindEnd Node instance into an explicit class which will be lazily rather than eagerly loaded - various cleanups of unused variables, methods and redundant inititialization Webrev: http://cr.openjdk.java.net/~redestad/8225179/open.00/ Bug:https://bugs.openjdk.java.net/browse/JDK-8225179 Testing: tier1-3 Thanks! /Claes -- With kind regards, Ivan Gerasimov
Re: RFR 8225198 : Optimize regex tree for greedy quantifiers of type {N, }
Thank you Claes and Brent for reviewing! I'll edit the comment as suggested before pushing. And yes, we do have a few testcases to verify this optimization in RegExTest.java / BMPTestCases.txt. With kind regards, Ivan On 6/4/19 2:30 PM, Brent Christian wrote: Hi, Ivan The change looks fine. I would call this "noreg-cleanup". Tests are needed to confirm the correctness of the new code, which I imagine we already have. One small suggestion: src/java.base/share/classes/java/util/regex/Pattern.java 4271 * and unlimited maximum, for *, + and {N,} quantifiers. If the maximum is still MAX_REPS, I'd prefer to continue to say that (rather than, "unlimited.") Thanks. -Brent On 6/3/19 12:54 PM, Ivan Gerasimov wrote: Hello! When building a match-tree, the regex engine optimizes '*' and '+' greedy quantifiers by using special node of type CharPropertyGreedy. This later class can be used unmodified for other greedy quantifiers without the upper limit, i.e. of type '{N,}'. Would you please help review this simple optimization? BUGURL: https://bugs.openjdk.java.net/browse/JDK-8225198 WEBREV: http://cr.openjdk.java.net/~igerasim/8225198/00/webrev/ (With the fix, one unused class was also removed. This class was to implement conditionals, which were never supported by Java regexs.) -- With kind regards, Ivan Gerasimov
RFR 8225198 : Optimize regex tree for greedy quantifiers of type {N, }
Hello! When building a match-tree, the regex engine optimizes '*' and '+' greedy quantifiers by using special node of type CharPropertyGreedy. This later class can be used unmodified for other greedy quantifiers without the upper limit, i.e. of type '{N,}'. Would you please help review this simple optimization? BUGURL: https://bugs.openjdk.java.net/browse/JDK-8225198 WEBREV: http://cr.openjdk.java.net/~igerasim/8225198/00/webrev/ (With the fix, one unused class was also removed. This class was to implement conditionals, which were never supported by Java regexs.) -- With kind regards, Ivan Gerasimov
RFR 8224716 : Javadoc of Int/Long/DoubleSummaryStatistics should mention possible overflow of count
Hello! In the implNote section of the javadoc for xxxSummaryStatistics classes it is mentioned that the sum is not checked for overflow. It would be more accurate to add that neither does the implementation check the count for overflow. Would you please help review this doc-only change? BUGURL: https://bugs.openjdk.java.net/browse/JDK-8224716 WEBREV: http://cr.openjdk.java.net/~igerasim/8224716/00/webrev/ -- With kind regards, Ivan Gerasimov
Re: RFR: 8224986: (str) optimize StringBuilder.append(CharSequence, int, int) for String arguments
Hi Claes! It looks like for the cases when this.isLatin1() != s.isLatin1() the code is essentially the same as in appendChars(CharSequence, int, int). So, to save some byte codes (which can actually help with inlining), it might have been written as: private final void appendChars(String s, int off, int end) { if (isLatin1() ^ s.isLatin1()) { appendChars((CharSequence)s, off, end); } else { System.arraycopy(s.value(), off << coder, this.value, this.count << coder, (end - off) << coder); count += end - off; } } With kind regards, Ivan On 5/29/19 8:36 AM, Claes Redestad wrote: Hi, it's been pointed out[1] that append(string.substring(start, end)) can outperform append(string, start, end) in some circumstances. In part due the former being intrinsified, but also I think due it tickling some string concat optimizations that allow for eliding the StringBuilder itself completely when there's only one non-constant argument. While not ruling out intrinsification of append(String, int, int), I was able to achieve a reasonable speed-up by peeling off String arguments to append(CharSequence, int, int) and dropping into a routine that uses System.arraycopy when appropriate. This seems like a reasonable improvement in the interim, and an opportunity to incorporate the relevant microbenchmarks. Bug:https://bugs.openjdk.java.net/browse/JDK-8224986 Webrev: http://cr.openjdk.java.net/~redestad/8224986/open.00/ I extended and incorporated the microbenchmark provided in [1] to also include variants with UTF16 Strings and mixes of arguments, ensuring speedups on all appendBounds* variants[2]. Thanks! /Claes [1] http://mail.openjdk.java.net/pipermail/core-libs-dev/2019-January/057834.html [2] Baseline: Benchmark(length) Mode Cnt Score Error Units appendBounds 1000 avgt 10 576.939 ± 21.125 ns/op ·gc.alloc.rate.norm 1000 avgt 10 2144.000 ± 0.001 B/op appendBoundsMix 1000 avgt 10 855.308 ± 27.071 ns/op ·gc.alloc.rate.norm 1000 avgt 10 3160.000 ± 0.001 B/op appendBoundsUtf161000 avgt 10 1424.518 ± 33.319 ns/op ·gc.alloc.rate.norm 1000 avgt 10 5192.000 ± 0.001 B/op Patch: Benchmark(length) Mode Cnt Score Error Units appendBounds 1000 avgt 10 466.640 ± 15.069 ns/op ·gc.alloc.rate.norm 1000 avgt 10 2120.000 ± 0.001 B/op appendBoundsMix 1000 avgt 10 758.886 ± 22.983 ns/op ·gc.alloc.rate.norm 1000 avgt 10 3160.000 ± 0.001 B/op appendBoundsUtf161000 avgt 10 1320.360 ± 49.097 ns/op ·gc.alloc.rate.norm 1000 avgt 10 5192.000 ± 0.001 B/op -- With kind regards, Ivan Gerasimov
Re: RFR 8224789 : Parsing repetition count in regex does not detect numeric overflow
Hi Brent! On 5/28/19 4:06 PM, Brent Christian wrote: Hi, Ivan I agree with Roger that there are more test cases than necessary. Otherwise I think it looks pretty good. Okay. Then let's make the list of invalid ranges shorter, but add a randomly generated value! Please find the updated webrev here: http://cr.openjdk.java.net/~igerasim/8224789/01/webrev/ With kind regards, Ivan I find the addExact/multiplyExact code less readable. I'm not sure what could be done about it - maybe some different indentation: cmin = Math.addExact( Math.multiplyExact(cmin, 10), ch - '0'); though that makes for some long lines. Just a thought. Thanks, -Brent On 5/24/19 11:28 PM, Ivan Gerasimov wrote: Hello! When Pattern.compile() parses the repetition count in the expressions like '.{100}', '.{1,2}' or '.{3,}' it fails to detect numeric overflow if the result is still non-negative. Could you please help review the patch? BUGURL: https://bugs.openjdk.java.net/browse/JDK-8224789 WEBREV: http://cr.openjdk.java.net/~igerasim/8224789/00/webrev/ Also, reading a char at line 3274 is done with skip(), so the exception thrown at 3315 displays the position of the error more accurately. -- With kind regards, Ivan Gerasimov
Re: RFR 8224789 : Parsing repetition count in regex does not detect numeric overflow
Thank you Roger for reviewing! On 5/28/19 9:33 AM, Roger Riggs wrote: Hi Ivan, test/jdk/java/util/regex/RegExTest.java: 4954: The test should print the failing exception information and 4951: a message if the Pattern.compile does not fail to distinguish that failure from any others. Yes, it will be useful, if the test fails. So, I added some debug output. The webrev was updated in place: http://cr.openjdk.java.net/~igerasim/8224789/00/webrev/ I don't think there is a need for so many cases of values > 2^31; there is no different code path for all those values. They are fairly cheap, but redundant. I wanted to cover cases when the number would overflow 64-bit integer, so that the test would fail if the count were implemented as a long. Also, I tried to cover different configurations of the quantifier {N}, {N,}, {N,M} with both N and M overflowing int, unsigned int, long, unsigned long. With kind regards, Ivan Otherwise, looks fine. Thanks, Roger On 05/25/2019 02:28 AM, Ivan Gerasimov wrote: Hello! When Pattern.compile() parses the repetition count in the expressions like '.{100}', '.{1,2}' or '.{3,}' it fails to detect numeric overflow if the result is still non-negative. Could you please help review the patch? BUGURL: https://bugs.openjdk.java.net/browse/JDK-8224789 WEBREV: http://cr.openjdk.java.net/~igerasim/8224789/00/webrev/ Also, reading a char at line 3274 is done with skip(), so the exception thrown at 3315 displays the position of the error more accurately. -- With kind regards, Ivan Gerasimov