Re: How to extract matches from (\d+[hms])+ ?
java/time/Duration.java has the pattern for the duration, which is similar to Guy's suggestion.

On Sep 24, 2014, at 10:08 PM, Wang Weijun weijun.w...@oracle.com wrote:

> On Sep 25, 2014, at 13:06, Guy Steele guy.ste...@oracle.com wrote:
>
>> (A lurker sticking his nose in here! :-)
>>
>> Is it your intent also to match 30s1h or 20m30m as a time duration? If not,
>> you might be better off with a pattern such as ((\\d+)h)?((\\d+)m)?((\\d+)s)?
>> and then the whole problem caused by the outer + iteration disappears (but
>> you may need to check whether the original string was empty).
>
> Yes, this is much better.
>
>> But maybe that takes all the fun out of it.
>
> Let someone else enjoy it then. :-)
>
> Thanks
> Max
>
>> --Guy Steele
>>
>> On Sep 25, 2014, at 12:51 AM, Wang Weijun weijun.w...@oracle.com wrote:
>>
>>> Hi Sherman
>>>
>>> I want to match a time duration like 1h20m30s and 2h. It looks like if I
>>> directly use the pattern ((\\d+)([hms]))+, group(2) and group(3) only
>>> return the last match (i.e. 30 and s for 1h20m30s). So I tried multiple
>>> matching with (\\d+)([hms]) only, but find() does not always match from
>>> the beginning, and lookingAt() does not advance after one call. This is
>>> my code now:
>>>
>>>     int start = 0;
>>>     while (true) {
>>>         if (!m.find() || m.start() != start) {
>>>             throw new Exception();
>>>         }
>>>         start = m.end();
>>>         print(m.group(1), m.group(2));
>>>         if (m.hitEnd()) break;
>>>     }
>>>     print("Done");
>>>
>>> Is this the correct way?
>>>
>>> Thanks
>>> Max
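A minimal, self-contained sketch of Guy's suggested approach, for readers landing here from the archive. The class name, test input, and the empty-match check are illustrative, not code from the thread; the pattern is the one Guy proposes:

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class DurationParse {
    // Fixed-order optional components, as Guy suggests: this rejects
    // out-of-order inputs like "30s1h" or repeats like "20m30m".
    static final Pattern DURATION =
            Pattern.compile("((\\d+)h)?((\\d+)m)?((\\d+)s)?");

    public static void main(String[] args) {
        Matcher m = DURATION.matcher("1h20m30s");
        // All components are optional, so matches() succeeds on the
        // empty string too; reject the empty match explicitly.
        if (m.matches() && !m.group().isEmpty()) {
            System.out.println("h=" + m.group(2));  // prints h=1
            System.out.println("m=" + m.group(4));  // prints m=20
            System.out.println("s=" + m.group(6));  // prints s=30
        }
    }
}
```

Unlike the `((\d+)([hms]))+` form, every component lands in its own capturing group (2, 4, 6), so nothing is lost to the repeated-group problem Max describes.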
Re: RFR (XS) CR 8058643: (str) Re-examine hashCode implementation
Hi Peter,

On 09/25/2014 02:46 AM, Peter Levart wrote:
> http://cr.openjdk.java.net/~plevart/misc/StringHash/HashBench.java

Interesting. I have to say it once again:

a) It is *an error* to use static finals as the benchmark input. They are
perfectly constant-foldable in way too many cases. Break this habit, please.

b) Explicit Blackholes are not needed, and just returning the int from a
@Benchmark method helps readability a lot. Please break this habit as well.
Having readable and maintainable benchmarks is key.

>> This is really great! Couldn't this be a tweak via HotSpot, instead of
>> uglifying and bloating the Java and hence the byte code?
>
>> +1
>
> This is for HotSpot compiler guys to answer. Theoretically I think it is
> possible. But it would have to be tailored to the very specific use case
> and I don't know if such a specific transformation would have wide-enough
> applicability. If it would only speed up String.hashCode and very similar
> loops, it is less trouble to do that by hand in one or a few places...

I would think this happens in user-specified hashCode() over arrays. IDEs
would routinely inject a loop like that, or delegate to Arrays.hashCode,
which does the same loop. In other words, I would like to see this fixed on
the compiler side. This seems to be the strength reduction playing tricks
with loop unrolling; I'll submit a bug shortly.

-Aleksey.
Re: RFR (XS) CR 8058643: (str) Re-examine hashCode implementation
On 09/25/2014 11:40 AM, Aleksey Shipilev wrote:
> [...]
> In other words, I would like to see this fixed on the compiler side. This
> seems to be the strength reduction playing tricks with loop unrolling;
> I'll submit a bug shortly.

https://bugs.openjdk.java.net/browse/JDK-8059113

-Aleksey.
Re: RFR (XS) CR 8058643: (str) Re-examine hashCode implementation
On 09/25/2014 09:40 AM, Aleksey Shipilev wrote:
> Hi Peter,
>
> On 09/25/2014 02:46 AM, Peter Levart wrote:
>> http://cr.openjdk.java.net/~plevart/misc/StringHash/HashBench.java
>
> Interesting. I have to say it once again:
>
> a) It is *an error* to use static finals as the benchmark input. They are
> perfectly constant-foldable in way too many cases. Break this habit, please.

Hi Aleksey,

The constant in this example is only the reference to the char[] array. Its
content is not. In String, this is a final instance field, which behaves
similarly inside an instance method (usually it is read only once per method
invocation).

> b) Explicit Blackholes are not needed, and just returning the int from a
> @Benchmark method helps readability a lot. Please break this habit as well.
> Having readable and maintainable benchmarks is key.

Ok, here's a modified benchmark, which behaves similarly:

http://cr.openjdk.java.net/~plevart/misc/StringHash/HashBench2.java

Here are its results:

 * Results on JDK9, Linux, i7-2600K CPU, JMH args: -f 1 -wi 5 -i 8 -gc true
 *
 * Benchmark                   Mode  Samples         Score  Score error  Units
 * j.t.HashBench2._hashCode   thrpt        8   8308858.217   353019.084  ops/s
 * j.t.HashBench2.hashCode0   thrpt        8   8207337.729   217048.634  ops/s
 * j.t.HashBench2.hashCode1   thrpt        8  13359572.359   345736.675  ops/s
 * j.t.HashBench2.hashCode2   thrpt        8  15310621.202   238369.017  ops/s
 * j.t.HashBench2.hashCode3   thrpt        8  17637944.829   232155.847  ops/s
 * j.t.HashBench2.hashCode3i  thrpt        8  17724181.444   509913.288  ops/s
 * j.t.HashBench2.hashCode3x  thrpt        8   8344128.432   159508.813  ops/s
 * j.t.HashBench2.hashCode4   thrpt        8  16526850.489   969549.448  ops/s
 * j.t.HashBench2.hashCode5   thrpt        8  17567765.554   917934.885  ops/s
 * j.t.HashBench2.hashCode6   thrpt        8  17705074.332   419405.652  ops/s
 * j.t.HashBench2.hashCode7   thrpt        8  18805633.563   209181.299  ops/s
 * j.t.HashBench2.hashCode8   thrpt        8  18300123.201   376681.550  ops/s

It would be interesting to see how it behaves on different CPUs.

>> This is really great! Couldn't this be a tweak via HotSpot, instead of
>> uglifying and bloating the Java and hence the byte code?
>
>> +1
>
> This is for HotSpot compiler guys to answer. Theoretically I think it is
> possible. But it would have to be tailored to the very specific use case
> and I don't know if such a specific transformation would have wide-enough
> applicability. If it would only speed up String.hashCode and very similar
> loops, it is less trouble to do that by hand in one or a few places...
>
> I would think this happens in user-specified hashCode() over arrays. IDEs
> would routinely inject a loop like that, or delegate to Arrays.hashCode,
> which does the same loop.

Arrays.hashCode() can be improved this way too.

> In other words, I would like to see this fixed on the compiler side. This
> seems to be the strength reduction playing tricks with loop unrolling;
> I'll submit a bug shortly.

As I said, I don't think it has anything to do with loop unrolling. The
transformation I applied in hashCode1, 2, 3, ... 8 produces code that
executes 2, 3, 4, ... 9 independent multiplications in each chunk, which
allows the CPU's pipeline to execute them in parallel. I had to manually
unroll the loop a bit just to achieve this transformation, but my manual
unrolling does not bring the speed-up per se; the parallelization of the
multiplications does. This can be seen by observing the score of the
hashCode3x benchmark, which has the same loop structure as hashCode3, but
performs the multiplications in a way where each of them depends on the
result of the previous one, which prevents the CPU from parallelizing them.

This is not to say that such a transformation couldn't be done on the JIT
side. I just have a feeling that such a transformation won't be widely
applicable because it is very specific. It can only be used within integer
arithmetic of homogeneous width (since it changes the order of operations
applied, the final result depends on which width is used when overflow
happens). Floating-point arithmetic is equally unsuitable for such
transformations that change the order of operations. It can only help when
a sequence of operations that are dependent on one another is changed into
a sequence of independent operations, and those operations have a weight
that matters (such as multiplication).

Regards, Peter
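The shape of Peter's transformation can be illustrated with a hand-rolled sketch. This is not the actual HashBench2 code; the method names and the unroll factor of 4 are chosen for illustration. The key property is that the four products in each chunk have no data dependency on each other:

```java
public class HashSketch {
    // Baseline loop: each multiplication depends on the previous result,
    // so the CPU must execute them strictly one after another.
    static int hashSerial(char[] v) {
        int h = 0;
        for (char c : v) {
            h = 31 * h + c;
        }
        return h;
    }

    // Unrolled by 4 with precomputed powers of 31. The products h*31^4,
    // v[i]*31^3, v[i+1]*31^2 and v[i+2]*31 are mutually independent, so
    // the CPU can overlap them. The result is bit-for-bit identical to
    // the serial loop because int overflow wraps mod 2^32 either way.
    static int hashParallel(char[] v) {
        int h = 0;
        int i = 0;
        for (; i + 4 <= v.length; i += 4) {
            h = h * (31 * 31 * 31 * 31)
                + v[i]     * (31 * 31 * 31)
                + v[i + 1] * (31 * 31)
                + v[i + 2] * 31
                + v[i + 3];
        }
        for (; i < v.length; i++) {   // leftover tail, at most 3 chars
            h = 31 * h + v[i];
        }
        return h;
    }

    public static void main(String[] args) {
        char[] chars = "The quick brown fox".toCharArray();
        System.out.println(hashSerial(chars) == hashParallel(chars));              // true
        System.out.println(hashSerial(chars) == "The quick brown fox".hashCode()); // true
    }
}
```

This is exactly why the transformation only works for wrap-around integer arithmetic of one width, as Peter notes: regrouping `((((h*31+c0)*31+c1)*31+c2)*31+c3)` into the sum of independent products is an identity in Z/2^32, but not in floating point.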
Re: Lower overhead String encoding/decoding
Hi Alan,

Thanks for the feedback.

> The direction seems reasonable but I wonder about the offset/length (and
> destOffset) parameters as this isn't consistent with how ByteBuffers were
> originally intended to be used. That is, when you read the bytes from the
> wire into a ByteBuffer and flip it, then the position and limit will
> delimit the bytes to be decoded. If the constructor is changed to
> String(ByteBuffer in, Charset cs) and decodes the remaining bytes in the
> buffer to a String using the specified Charset, then I think it would be
> more consistent. Also I think this would give you a solution to the
> underflow case.

I've updated the webrev to reflect this, removing the offset and length
parameters and using position() and limit() instead.

http://cr.openjdk.java.net/~rwarburton/string-patch-webrev-6/

> Similarly, if getBytes is replaced with a getBytes or encode(ByteBuffer,
> Charset cs), then it would encode as many characters as possible into the
> output buffer, which I think would be more consistent and would also help
> with the overflow case.

I've also applied this to the getBytes() method. I chose the getBytes()
method name for consistency with the existing getBytes() method that
returns a byte[]. To my mind encode() is a more natural name for the
method, which you mention in your email; do people have a preference here?

regards,

Richard Warburton
http://insightfullogic.com
@RichardWarburto http://twitter.com/richardwarburto
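The String(ByteBuffer, Charset) constructor under review here is a proposal, not an existing JDK API. For comparison, the position/limit semantics Alan describes are the ones the long-standing Charset.decode convenience method already follows: it decodes the remaining bytes (position to limit) and advances the position. A small sketch (class name and test data are illustrative):

```java
import java.nio.ByteBuffer;
import java.nio.charset.StandardCharsets;

public class DecodeSketch {
    public static void main(String[] args) {
        // Pretend these bytes just arrived off the wire and were flipped.
        ByteBuffer buf = ByteBuffer.wrap("hëllo".getBytes(StandardCharsets.UTF_8));

        // Decodes position..limit with malformed input replaced,
        // and advances the buffer's position as it consumes bytes.
        String s = StandardCharsets.UTF_8.decode(buf).toString();

        System.out.println(s);                  // prints hëllo
        System.out.println(buf.hasRemaining()); // prints false
    }
}
```

The cost this thread is trying to avoid is the intermediate CharBuffer and the extra copy; the proposed constructor would decode straight into the String's internal array.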
Re: RFR (XS) CR 8058643: (str) Re-examine hashCode implementation
Note that the compiler does not use multiplication here. This is strength
reduced into a bit shift followed by a subtraction (i.e. h * 31 = h * 32
(i.e. sal 5) - h). The point about allowing execution units to run in
parallel (given the break in the data dependency) is very valid though.

For kicks and giggles, I looked at what gcc, clang, and icc generate for
similar C code at O2 and O3, and did not observe them making any
transformations to break the data dependency.

While it would be cool if the compiler could do this transformation on its
own, as you say Peter, the scope of where this transform is applicable is
somewhat narrow(?). If speeding up the non-cached String.hashCode() or
Arrays.hashCode() methods is desired, perhaps intrinsics could be
implemented (the utility for String.hashCode seems low though). In that
case, one could then do this transformation manually and also vectorize it
for long inputs.

On Thu, Sep 25, 2014 at 7:09 AM, Peter Levart peter.lev...@gmail.com wrote:

> [...]
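Vitaly's strength-reduction identity is easy to check from Java itself (whether HotSpot actually emits the shift is only visible in the disassembly). The identity holds for every int, overflow included, because both sides are computed mod 2^32:

```java
public class StrengthReduction {
    public static void main(String[] args) {
        // h * 31 == (h << 5) - h, i.e. h * 31 == h * 32 - h, for all int h.
        int[] samples = {0, 1, -1, 31, 123456789,
                         Integer.MAX_VALUE, Integer.MIN_VALUE};
        for (int h : samples) {
            System.out.println((31 * h) == ((h << 5) - h));  // true for each
        }
    }
}
```

This is why 31 is a convenient hash multiplier in the first place: the multiply can always be replaced by a cheaper shift-and-subtract.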
Re: Lower overhead String encoding/decoding
Hi Richard,

A couple of comments after a quick scan:

(1) #474, the IOBE probably is no longer needed in the case of the
ByteBuffer parameter.

(2) For methods that take a ByteBuffer as input, it would be desirable to
specify clearly that the bytes are read from position to limit, and whether
or not the position will be advanced after decoding/encoding (important).

(3) getBytes(byte[], offset, charset): while I understand it might be
useful to have such a method in certain circumstances, it is usually
complicated to make it right/easy for the user to actually use. Consider
the fact that the returned number of bytes encoded has no straightforward
connection to how many underlying chars have been encoded (so the user of
this method really has no idea how many underlying chars have been encoded
into the dest buffer: is that enough? how big does the buffer need to be to
encode the whole string?), especially given the possibility that the last
couple of bytes of space may fall short of encoding a char. Unlike
getChars(), which has an easy, clear and direct link between the outgoing
chars and the underlying chars. I would suggest it might be better to
leave it out.

(4) StringCoding.decode() #239: should remaining() be used to return
limit - position?

(5) In the case of untrusted, it might be more straightforward to get all
the bytes out of the buffer first (you are allocating a byte buffer here
anyway; I don't see an obvious benefit to getting a direct buffer, btw) and
then pass them to the original/existing byte[]->char[] decoding
implementation.

We probably will take a deep look at the implementation later when the
public API is settled.

-Sherman

Richard Warburton wrote:
> [...]
Re: Lower overhead String encoding/decoding
On 9/25/14 9:24 AM, Xueming Shen wrote:
> Hi Richard,
>
> A couple of comments after a quick scan:
>
> (1) #474, the IOBE probably is no longer needed in the case of the
> ByteBuffer parameter.
>
> (2) For methods that take a ByteBuffer as input, it would be desirable to
> specify clearly that the bytes are read from position to limit, and
> whether or not the position will be advanced after decoding/encoding
> (important).
>
> (3) getBytes(byte[], offset, charset): while I understand it might be
> useful to have such a method in certain circumstances, it is usually
> complicated to make it right/easy for the user to actually use. Consider
> the fact that the returned number of bytes encoded has no straightforward
> connection to how many underlying chars have been encoded (so the user of
> this method really has no idea how many underlying chars have been
> encoded into the dest buffer: is that enough? how big does the buffer
> need to be to encode the whole string?), especially given the possibility
> that the last couple of bytes of space may fall short of encoding a char.
> Unlike getChars(), which has an easy, clear and direct link between the
> outgoing chars and the underlying chars. I would suggest it might be
> better to leave it out.

I would assume this is true for the ByteBuffer version as well... I just
doubt the usefulness of a method that might only give you the bytes of part
of the target string object. Given that the getBytes(ByteBuffer) version
has the additional position/limit info from the buffer itself, a workaround
is to return the number of underlying chars copied... then you need one
more parameter to let the user copy from 'index' for the next round.

-sherman

> (4) StringCoding.decode() #239: should remaining() be used to return
> limit - position?
>
> (5) In the case of untrusted, it might be more straightforward to get all
> the bytes out of the buffer first (you are allocating a byte buffer here
> anyway; I don't see an obvious benefit to getting a direct buffer, btw)
> and then pass them to the original/existing byte[]->char[] decoding
> implementation.
>
> We probably will take a deep look at the implementation later when the
> public API is settled.
>
> -Sherman
>
> Richard Warburton wrote:
>> [...]
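Sherman's partial-encode concern can be seen with the existing CharsetEncoder API, where the consumed-character count is conveyed through the input buffer's position rather than a return value; that is exactly the bookkeeping a plain "bytes written" int from getBytes(byte[], offset, charset) cannot carry. The buffer size below is chosen just to force overflow mid-string:

```java
import java.nio.ByteBuffer;
import java.nio.CharBuffer;
import java.nio.charset.CharsetEncoder;
import java.nio.charset.CoderResult;
import java.nio.charset.StandardCharsets;

public class EncodeOverflow {
    public static void main(String[] args) {
        CharBuffer in = CharBuffer.wrap("grüß dich, Welt");
        ByteBuffer out = ByteBuffer.allocate(8);   // deliberately too small

        CharsetEncoder enc = StandardCharsets.UTF_8.newEncoder();
        CoderResult result = enc.encode(in, out, true);

        // The encoder stops at the overflow. How many chars were actually
        // consumed is visible only via in.position(); a caller resuming the
        // encode continues from there with a fresh output buffer.
        System.out.println(result.isOverflow());               // prints true
        System.out.println("chars consumed: " + in.position());
        System.out.println("bytes written:  " + out.position());
    }
}
```

With multi-byte charsets, a chunk boundary can also fall in the middle of a character's byte sequence, which is the "last couple of bytes of space" case Sherman points out.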
Re: RFR (XS) CR 8058643: (str) Re-examine hashCode implementation
On 09/25/2014 06:00 PM, Vitaly Davidovich wrote:
> Note that the compiler does not use multiplication here. This is strength
> reduced into a bit shift followed by a subtraction (i.e. h * 31 = h * 32
> (i.e. sal 5) - h).

I've noticed that in the disassembly files provided by Aleksey. Good to
know, so one is not bothered to use bit shifts in situations where
multiplication / division look clearer in the source code.

> The point about allowing execution units to run in parallel (given the
> break in the data dependency) is very valid though. For kicks and
> giggles, I looked at what gcc, clang, and icc generate for similar C code
> at O2 and O3, and did not observe them making any transformations to
> break the data dependency.

So HotSpot can break the ice here ;-)

> While it would be cool if the compiler could do this transformation on
> its own, as you say Peter, the scope of where this transform is
> applicable is somewhat narrow(?). If speeding up the non-cached
> String.hashCode() or Arrays.hashCode() methods is desired, perhaps
> intrinsics could be implemented (the utility for String.hashCode seems
> low though). In that case, one could then do this transformation manually
> and also vectorize it for long inputs.

I see there are multiple options, each with its own benefits and drawbacks.
Hard to choose.

Regards, Peter

> On Thu, Sep 25, 2014 at 7:09 AM, Peter Levart peter.lev...@gmail.com wrote:
>
>> [...]