Re: Backport of JEP 254 (Compact Strings) to OpenJDK 8
On 01/12/2017 08:11 PM, John Platts wrote:
> I am interested in OpenJDK 8 builds with JEP 254 (Compact Strings) support
> backported from OpenJDK 9. I like the compact strings work being done in
> JDK 9, but I am interested in an OpenJDK 8 build with backported JEP 254
> support, since I am working with Java applications that might not work on
> OpenJDK 9 yet, and since I would like to take advantage of the feature in a
> backported OpenJDK 8 build instead of having to wait for the JDK 9 release.

This was discussed a number of times, both internally and externally, and the maintainers' consensus was that the feature is too intrusive on both the VM and JDK sides. Backporting it would be very painful, and probably destabilizing for JDK 8. Think of the work one does enabling applications to run on JDK 9 as the price of getting Compact Strings :)

Thanks,
-Aleksey
Backport of JEP 254 (Compact Strings) to OpenJDK 8
I am interested in OpenJDK 8 builds with JEP 254 (Compact Strings) support backported from OpenJDK 9. I like the compact strings work being done in JDK 9, but I am interested in an OpenJDK 8 build with backported JEP 254 support, since I am working with Java applications that might not work on OpenJDK 9 yet, and since I would like to take advantage of the feature in a backported OpenJDK 8 build instead of having to wait for the JDK 9 release.
Re: JEP 254: Compact Strings - length limits
On Sep 6, 2016, at 2:18 PM, Tim Ellison wrote:
> People stash all sorts of things in (immutable) Strings. Reducing the
> limits in JDK 9 seems like a regression. Was there any consideration to
> using the older Java 8 StringCoding APIs for UTF-16 strings (already
> highly perf-tuned) and adding additional methods for compact strings,
> rather than rewriting everything as byte[]s?

It doesn't help now, but https://bugs.openjdk.java.net/browse/JDK-8161256 proposes a better way to stash immutable bits: CONSTANT_Data. (Caveat: language bindings not yet included.) Eventually we'll get there.

— John
Re: JEP 254: Compact Strings - length limits
On 9/6/16, 2:18 PM, Tim Ellison wrote:
>> Do we have a real use case that is impacted by this change?
>
> People stash all sorts of things in (immutable) Strings. Reducing the
> limits in JDK 9 seems like a regression. Was there any consideration to
> using the older Java 8 StringCoding APIs for UTF-16 strings (already
> highly perf-tuned) and adding additional methods for compact strings,
> rather than rewriting everything as byte[]s?

Hi Tim,

I'm sorry, I don't get the idea of "using StringCoding APIs for UTF-16 strings"; can you explain in a little more detail? We did try various approaches (byte[] + flag, byte[] + coder, char[] + coder, etc.); the current one appears to be the best so far based on our measurements.

Regards,
Sherman
Re: JEP 254: Compact Strings - length limits
On 06/09/16 19:04, Xueming Shen wrote:
> On 9/6/16, 10:09 AM, Tim Ellison wrote:
>> Has it been noted that while JEP 254 reduces the space occupied by
>> one-byte-per-character strings, moving from a char[] to byte[]
>> representation universally means that the maximum length of a UTF-16
>> (two bytes per char) string is now halved?

Hey Sherman,

> Yes, it's a known "limit" given the nature of the approach. It is
> not considered to be an "incompatible change", because the max length
> the String class and the corresponding buffer/builder classes can
> support is really an implementation detail, not a spec requirement.

Don't confuse spec compliance with compatibility. Of course the JEP should not break the formally specified behavior of String etc., but the goal was to ensure that the implementation be compatible with prior behavior. As you know, there are many places where compatible behavior beyond the spec is important to maintain.

> The conclusion from the discussion back then was this is something we
> can trade off for the benefits we gain from the approach.

Out of curiosity, where was that? I did search for previous discussion of this topic but didn't see it -- it may just be my poor search-fu.

> Do we have a real use case that is impacted by this change?

People stash all sorts of things in (immutable) Strings. Reducing the limits in JDK 9 seems like a regression. Was there any consideration to using the older Java 8 StringCoding APIs for UTF-16 strings (already highly perf-tuned) and adding additional methods for compact strings, rather than rewriting everything as byte[]s?

Regards,
Tim

>> Since the goal is "preserving full compatibility", this has been missed
>> by failing to allow for UTF-16 strings of length greater than
>> Integer.MAX_VALUE / 2.
>>
>> Regards,
>> Tim
Re: JEP 254: Compact Strings - length limits
On 9/6/16, 12:58 PM, Charles Oliver Nutter wrote:
> On Tue, Sep 6, 2016 at 1:04 PM, Xueming Shen wrote:
>> Yes, it's a known "limit" given the nature of the approach. It is not
>> considered to be an "incompatible change", because the max length the
>> String class and the corresponding buffer/builder classes can support is
>> really an implementation detail, not a spec requirement. The conclusion
>> from the discussion back then was this is something we can trade off for
>> the benefits we gain from the approach. Do we have a real use case that
>> is impacted by this change?
>
> Well, doesn't this mean that any code out there consuming String data
> that's longer than Integer.MAX_VALUE / 2 will suddenly start failing on
> OpenJDK 9?

Yes, true. But arguably, code that uses huge Strings should have fallback code to handle the potential OOM exception when the VM can't handle the size, as there is really no guarantee the VM can handle Strings longer than Integer.MAX_VALUE / 2.

> Not that such a case is a particularly good pattern, but I'm sure there's
> code out there doing it. On JRuby we routinely get bug reports complaining
> that we can't support strings larger than 2GB (and we have used byte[] for
> strings since 2006).

That was a trade-off decision to make. Does JRuby have any better solution for such complaints? Did you ever consider going back to char[] to "fix" the problem, or some workaround, such as adding another byte[], for example? Btw, single-byte-only strings should work just fine :-) or :-( depending on the character set used.

Sherman
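Sherman's suggestion above — that code building huge strings should have a fallback rather than assume the VM can represent them — can be sketched as a simple guard. This is a hypothetical helper, not JDK API; the class name, method name, and `SAFE_MAX_CHARS` constant are invented for illustration.

```java
// Hypothetical guard for code that concatenates very large strings.
// Under a byte[]-backed String, a UTF-16 string stores 2 bytes per char,
// so at most ~Integer.MAX_VALUE / 2 chars fit in one backing array.
public class HugeStringGuard {
    // Conservative cap on representable UTF-16 string length.
    static final int SAFE_MAX_CHARS = Integer.MAX_VALUE / 2;

    // Returns true when concatenating a and b stays within the cap.
    static boolean canConcat(String a, String b) {
        // Use long arithmetic so the sum cannot overflow int.
        return (long) a.length() + b.length() <= SAFE_MAX_CHARS;
    }

    public static void main(String[] args) {
        assert canConcat("foo", "bar");
    }
}
```

Callers would check `canConcat` (or catch `OutOfMemoryError` as a last resort) before building the result, instead of assuming any `int`-sized length is representable.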
Re: JEP 254: Compact Strings - length limits
On Sep 6, 2016, at 12:58 PM, Charles Oliver Nutter wrote:
>
> On Tue, Sep 6, 2016 at 1:04 PM, Xueming Shen wrote:
>
>> Yes, it's a known "limit" given the nature of the approach. It is not
>> considered to be an "incompatible change", because the max length the
>> String class and the corresponding buffer/builder classes can support is
>> really an implementation detail, not a spec requirement. The conclusion
>> from the discussion back then was this is something we can trade off for
>> the benefits we gain from the approach. Do we have a real use case that
>> is impacted by this change?
>
> Well, doesn't this mean that any code out there consuming String data
> that's longer than Integer.MAX_VALUE / 2 will suddenly start failing on
> OpenJDK 9?
>
> Not that such a case is a particularly good pattern, but I'm sure there's
> code out there doing it. On JRuby we routinely get bug reports complaining
> that we can't support strings larger than 2GB (and we have used byte[] for
> strings since 2006).
>
> - Charlie

The most basic scale requirement for strings is that they support class-file constants, which top out at a UTF-8 length of 2**16. Lengths beyond that, to fill up the 'int' return value of String::length, are less well specified.

FTR, we could have chosen char[], int[], or long[] (not byte[]) as the backing store for string data. With long[] we could have strings above 4G chars. But it would have come with a perf tax, since the T[].length field would need to be combined with an extra bit or two (from a flag byte) to complete the length. That's 2-3 extra instructions for loading a string length, or else a redundant length field. So it's a trade-off.

Likewise, choosing a third format deepens branch depth in order to get to the payload. Likewise, making the second format (of two) carry a length field embedded in the payload section requires a conditional load or branch in order to load the string length. Again, more instructions.
The team has looked at 20 possibilities like these. The current design is fastest. I hope it flies. — John
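John's point about length encoding can be made concrete with a sketch of two of the schemes he contrasts. Scheme A mirrors the shipped two-coder design, where the backing array's own length plus a 0/1 coder yield the string length in a single shift; scheme B is an invented illustration of a long[] backing for >4G-char strings, where extra bits from a flags byte must be folded in, costing extra loads and arithmetic on every length query. The class and the `flags` encoding are illustrative, not real JDK code.

```java
// Two hypothetical length schemes for a compact-string layout.
public class LengthSchemes {
    // Scheme A: byte[] backing, coder in {0, 1} (0 = 1 byte/char,
    // 1 = 2 bytes/char). One shift, no extra fields.
    static int lengthA(byte[] value, byte coder) {
        return value.length >> coder;
    }

    // Scheme B (invented): long[] backing, 4 UTF-16 chars per long.
    // The low bits of `flags` say how many char slots of the final long
    // are unused, so length() needs an extra load, mask, and subtract.
    static long lengthB(long[] value, byte flags) {
        long slack = flags & 0x3;          // 0..3 unused chars in last word
        return value.length * 4L - slack;  // total char slots minus slack
    }

    public static void main(String[] args) {
        assert lengthA(new byte[10], (byte) 1) == 5; // 10 bytes, UTF-16
        assert lengthB(new long[2], (byte) 1) == 7;  // 8 slots, 1 unused
    }
}
```

The extra mask-and-subtract in scheme B is exactly the "2-3 extra instructions for loading a string length" John mentions.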
Re: JEP 254: Compact Strings - length limits
On Tue, Sep 6, 2016 at 1:04 PM, Xueming Shen wrote:
> Yes, it's a known "limit" given the nature of the approach. It is not
> considered to be an "incompatible change", because the max length the
> String class and the corresponding buffer/builder classes can support is
> really an implementation detail, not a spec requirement. The conclusion
> from the discussion back then was this is something we can trade off for
> the benefits we gain from the approach. Do we have a real use case that
> is impacted by this change?

Well, doesn't this mean that any code out there consuming String data that's longer than Integer.MAX_VALUE / 2 will suddenly start failing on OpenJDK 9?

Not that such a case is a particularly good pattern, but I'm sure there's code out there doing it. On JRuby we routinely get bug reports complaining that we can't support strings larger than 2GB (and we have used byte[] for strings since 2006).

- Charlie
Re: JEP 254: Compact Strings - length limits
On 9/6/16, 10:09 AM, Tim Ellison wrote:
> Has it been noted that while JEP 254 reduces the space occupied by
> one-byte-per-character strings, moving from a char[] to byte[]
> representation universally means that the maximum length of a UTF-16
> (two bytes per char) string is now halved?

Hi Tim,

Yes, it's a known "limit" given the nature of the approach. It is not considered to be an "incompatible change", because the max length the String class and the corresponding buffer/builder classes can support is really an implementation detail, not a spec requirement. The conclusion from the discussion back then was this is something we can trade off for the benefits we gain from the approach. Do we have a real use case that is impacted by this change?

Thanks,
Sherman

> Since the goal is "preserving full compatibility", this has been missed
> by failing to allow for UTF-16 strings of length greater than
> Integer.MAX_VALUE / 2.
>
> Regards,
> Tim
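The arithmetic behind the halved limit can be shown in a few lines. The array bound is treated nominally as `Integer.MAX_VALUE` for illustration (real VMs reserve a few header words, so the exact ceiling is slightly lower and VM-dependent).

```java
// Why moving from char[] to byte[] halves the max UTF-16 string length:
// Java arrays are indexed by int, so any backing array tops out near
// Integer.MAX_VALUE elements. A char[] spends one slot per char; a byte[]
// holding UTF-16 data spends two slots per char.
public class LengthLimit {
    static final long MAX_ARRAY_ELEMENTS = Integer.MAX_VALUE; // nominal bound

    // Max string length when each char occupies `slotsPerChar` array slots.
    static long maxChars(long slotsPerChar) {
        return MAX_ARRAY_ELEMENTS / slotsPerChar;
    }

    public static void main(String[] args) {
        long withCharArray = maxChars(1); // JDK 8: char[] backing
        long withByteArray = maxChars(2); // JDK 9: byte[] backing, UTF-16 coder
        assert withByteArray == withCharArray / 2;
    }
}
```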
Re: JEP 254: Compact Strings thoughts: character ranges outside ASCII + EASCII blocks
Hi Simon,

On 09/25/2015 01:01 AM, Simon Spero wrote:
> [Some of this may be simple or prohibitively tricksy depending on alignment
> constraints (even though it's restricted to the Prime Multilingual Plane :-) ]
>
> For some not-unrealistic use cases, the most significant bytes of all the
> characters in a string are identical, even if the string is non-Latin. For
> example, all the characters may be in the range U+0400--U+04FF, or
> U+0500--U+05FF. In these cases, it may be feasible to save the upper byte,
> then splat it into place when reconstituting the UTF-16 chars.
>
> Because of the assignment of Unicode code points, this technique is not as
> big a win as it might have been. Unlike (e.g.) 8859-5 or 8859-8, there are
> no punctuation marks, digits, or whitespace characters, which restricts use
> cases to very short strings (the lack of whitespace is the biggest
> problem). For the 254-like coding system I was experimenting with, for the
> cases where I didn't fall back to UTF-16, the savings were overwhelmed by
> the cost of header words and padding.
>
> It is possible to handle some of these mixtures, on some architectures,
> without resorting to LUTs or branches, but that's well into non-goal
> territory for JEP 254. There might be some useful win just from being able
> to have an offset added to the packed value based on whether the high bit
> is set or not. Anyone here from Москва?

Sure, many theoretical constructions may be devised. Not many of them are practical. JEP 254 wins big time exactly because many strings *are* single-byte-storable in ASCII/8859-1, *especially* those with long lengths. So the very first thing you have to do is prove that an alternative scheme successfully encodes a fair amount of real strings. Otherwise, it is not worth exploring any further. As you say, the lack of "usual" characters like whitespace may be the deal-breaker.
Adding an alternative coder is easy, but making sure it does not regress the prevailing cases of 8859-1/UTF16 strings is much harder. Think about branching costs, eliminating the bit tricks that are employed now with binary 0/1 coder, etc. Thanks, -Aleksey
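To make the branching concern concrete, here is a minimal two-coder sketch. It is illustrative only, not the real java.lang.String: every charAt pays exactly one coder branch, and the length trick relies on the coder being exactly 0 or 1 — which is precisely what a third coder would break.

```java
// Minimal sketch of a two-coder compact string (illustrative names).
public class TwoCoderSketch {
    static final byte LATIN1 = 0; // 1 byte per char
    static final byte UTF16  = 1; // 2 bytes per char (big-endian here)

    final byte[] value;
    final byte coder;

    TwoCoderSketch(String s) {
        if (s.chars().allMatch(c -> c < 256)) {
            coder = LATIN1;
            value = new byte[s.length()];
            for (int i = 0; i < s.length(); i++) value[i] = (byte) s.charAt(i);
        } else {
            coder = UTF16;
            value = new byte[s.length() * 2];
            for (int i = 0; i < s.length(); i++) {
                char c = s.charAt(i);
                value[2 * i]     = (byte) (c >> 8);
                value[2 * i + 1] = (byte) c;
            }
        }
    }

    // One predictable two-way branch per access; a third coder would
    // deepen this into a branch tree on every call.
    char charAt(int i) {
        if (coder == LATIN1) {
            return (char) (value[i] & 0xFF);
        }
        return (char) (((value[2 * i] & 0xFF) << 8) | (value[2 * i + 1] & 0xFF));
    }

    // The 0/1 bit trick: works only because coder is exactly 0 or 1.
    int length() {
        return value.length >> coder;
    }

    public static void main(String[] args) {
        TwoCoderSketch a = new TwoCoderSketch("abc");
        TwoCoderSketch b = new TwoCoderSketch("Москва");
        assert a.length() == 3 && a.charAt(1) == 'b';
        assert b.length() == 6 && b.charAt(0) == 'М';
    }
}
```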
JEP 254: Compact Strings thoughts: character ranges outside ASCII + EASCII blocks
[Some of this may be simple or prohibitively tricksy depending on alignment constraints (even though it's restricted to the Prime Multilingual Plane :-) ]

For some not-unrealistic use cases, the most significant bytes of all the characters in a string are identical, even if the string is non-Latin. For example, all the characters may be in the range U+0400--U+04FF, or U+0500--U+05FF. In these cases, it may be feasible to save the upper byte, then splat it into place when reconstituting the UTF-16 chars.

Because of the assignment of Unicode code points, this technique is not as big a win as it might have been. Unlike (e.g.) 8859-5 or 8859-8, there are no punctuation marks, digits, or whitespace characters, which restricts use cases to very short strings (the lack of whitespace is the biggest problem). For the 254-like coding system I was experimenting with, for the cases where I didn't fall back to UTF-16, the savings were overwhelmed by the cost of header words and padding.

It is possible to handle some of these mixtures, on some architectures, without resorting to LUTs or branches, but that's well into non-goal territory for JEP 254. There might be some useful win just from being able to have an offset added to the packed value based on whether the high bit is set or not. Anyone here from Москва?

Simon

p.s. As part of the replacement for sun.misc.Unsafe, could we get a jdk.infernal/...ABitDodgy, which would allow the full set of SIMD instructions to be generated in an architecture-independent fashion? (By architecture-independent I mean: if you ask for a NEON instruction on amd64, or an SSE 4.2 string primitive on SPARC, that's what gets emitted.)
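Simon's "save the upper byte, splat it back" scheme can be sketched as follows. The code is purely illustrative (class and method names invented, not part of JEP 254), and the second check in `main` demonstrates exactly the problem he describes: one whitespace or digit character, with a different high byte, forces a fallback.

```java
// Illustrative packing: when every char in a string shares one high byte
// (e.g. all in U+0400..U+04FF), store only the low bytes plus that single
// shared high byte, and reconstitute UTF-16 chars on demand.
public class HighByteSplat {
    final byte high;   // shared most-significant byte
    final byte[] low;  // per-char least-significant bytes

    private HighByteSplat(byte high, byte[] low) {
        this.high = high;
        this.low = low;
    }

    // Returns null when the chars do not all share one high byte.
    static HighByteSplat tryPack(String s) {
        if (s.isEmpty()) return null;
        int high = s.charAt(0) >> 8;
        byte[] low = new byte[s.length()];
        for (int i = 0; i < s.length(); i++) {
            char c = s.charAt(i);
            if ((c >> 8) != high) return null; // mixed blocks: bail out
            low[i] = (byte) c;
        }
        return new HighByteSplat((byte) high, low);
    }

    // "Splat" the shared high byte back into each char.
    String unpack() {
        char[] out = new char[low.length];
        for (int i = 0; i < low.length; i++) {
            out[i] = (char) (((high & 0xFF) << 8) | (low[i] & 0xFF));
        }
        return new String(out);
    }

    public static void main(String[] args) {
        HighByteSplat p = tryPack("Москва");   // all chars in U+0400..U+04FF
        assert p != null && p.unpack().equals("Москва");
        assert tryPack("Москва 2024") == null; // space/digits break the scheme
    }
}
```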
Re: JEP 254: Compact Strings
TL;DR: In principle, we'd love to do more early testing of Hotspot / JDK features, but our benchmarks are, honestly, not all that great. We end up having to test against live services, which makes this sort of thing really hard. More info than you need: There are two real problems here: 1) To do apples-to-apples comparisons, we have to make sure that *our* patches all work with whatever version of Hotspot we're testing. 2) Pulling down a new JDK9 - even an official release - usually means that there are a lot of instabilities, half-finished work, and inefficiencies, so we can't really run tests very well against it. That's not a knock on Hotspot developers; the only way to know about some of these problems is to run the JDK in infrastructure like ours. ( An example of something that hit us hard that no one else would notice: http://hg.openjdk.java.net/jdk9/hs/hotspot/rev/5ba37c4c0578 ) It took us months to forward port all of our patches to JDK8, and we've spent the last six months getting it to the point that we're comfortable enough to ship to our users (just in time for JDK7 EOL!). That's required disabling tiered compilation, heavily tweaking code cache flushing (which is still causing us CPU regressions), rewriting various parts of the metaspace to behave more efficiently, and fixing various incompatibilities with our internal patches. That's completely apart from the dozens of backwards incompatibilities introduced in JDK8 that triggered a very, very, very large code cleanup effort, from the new hash iteration order to the unicode update to the fact that if you call flush on a closed BufferedOutputStream it now throws an exception. (We actually ended up randomizing our hash iteration order, which helps us guard against broken code, is slightly more secure, and means that we never get bitten by that as part of an upgrade again.) In short, upgrading is in no sense cheap for us, and trying out new features is hard. 
We usually restrict ourselves to using new features that can be more-or-less cleanly patched into the version of the JDK we're using and hidden behind a flag. This is an important enough change that we might be able to make some effort, but we'll have to see how it goes.

Jeremy

On Mon, Jun 1, 2015 at 1:31 AM, Aleksey Shipilev aleksey.shipi...@oracle.com wrote:
> (getting back to this)
>
> Hi Jeremy,
>
> On 05/16/2015 03:34 AM, Jeremy Manson wrote:
>> So, I'm pretty dubious, mostly because of the risks mentioned in the JEP.
>> If you need a flag-check and two code paths for every String method,
>> that's going to make the String class slower and more bloated, and make
>> it very difficult for the JIT compiler to do its job inlining and
>> intrinsifying calls to String methods.
>
> Yes, we know that's a potential problem, e.g. outlined here:
> http://cr.openjdk.java.net/~shade/density/equals.txt
>
> The hope is that the string coder check would be amortized by the
> substantial performance improvement from the ubiquitous Latin-1
> (optimized) operations. Also, getting a few code generation quirks kicked
> out may further offset the perceived performance costs of doing this (you
> can do such a trick every so often, but not all the time).
>
>> The proposed change here has the potential of doing the opposite with
>> most String operations - trading off less GC overhead for more mutator
>> cost. But String operations are a pretty big chunk of CPU time, on the
>> whole.
>
> The thing is, many mutator ops on Strings are also improved, because the
> data becomes more easily cacheable and/or requires fewer steps to
> complete (think vectorization that takes 2x fewer instructions).
>
>> Does anyone really have a sense of how to make this kind of decision?
>> The JEP seems mostly to be hoping that other organizations will do the
>> testing for you.
>
> It is not true that the JEP hopes to have other organizations do the
> testing for it. The JEP tries to illuminate that this is a
> performance-sensitive change, so early testing and feedback is very
> appreciated. So, if you have String-intensive workloads in your org, can
> you try to run the prototype JDK against them? Our early runs on our
> workloads of interest show appealing improvements.
>
> That is, the decision to integrate this is not done yet, as we don't have
> the complete performance picture and/or a fully-tested prototype. In
> other words, there are quite a few blank spots to fill out. Your data may
> be part of that picture when we decide to integrate in JDK 9.
>
>> (I agree that it is worth doing some experimentation in this area, but I
>> wanted to say this early, because if I could reach back in time and tell
>> you *not* to make the substring change, I would. We seriously considered
>> simply backing it out locally, but it would have been a lot of effort
>> for us to maintain that kind of patch, and we didn't want our
>> performance tradeoffs to be that much different from the stock JDK's.)
>
> This is your golden ticket: if you come back with concrete data in your
> hands saying that the particular tradeoff the JEP made is not sensible
> for your applications, it would be considered in the decision to
> integrate.
Re: JEP 254: Compact Strings
Hi Aleksey,

While it's true that the denser format will require fewer cachelines, my experience is that most strings are smaller than a single cacheline worth of storage, maybe two lines in some cases; there's just a ton of them in the heap. So the heap footprint should be substantially reduced, but I'm not sure the cache pollution will be significantly reduced.

There's currently no vectorization of char[] scanning (or any vectorization other than memcpy, for that matter) - are you referring to the recent Intel contributions here, or is there a plan to further improve vectorization in time for this JEP? Just curious.

I agree that string fusion is separate from this change, and we've discussed this before. It just seems to me like that's a bigger perf problem today, since even tiny/small strings (very common, IME) incur the indirection and bloat overhead, so I would have liked to see that addressed first. If you're saying that's fully on Valhalla's plate, OK, but I haven't seen anything proposed there yet.

Thanks

sent from my phone

On Jun 1, 2015 4:50 AM, Aleksey Shipilev aleksey.shipi...@oracle.com wrote:
> On 05/18/2015 05:35 PM, Vitaly Davidovich wrote:
>> This part is a bit unclear for the proposed changes. While it's true
>> that single-byte encoding will be denser than two-byte, most string ops
>> end up walking the backing store linearly; prefetch (either implicit h/w
>> or software-assisted) could hide the memory access latency.
>
> It will still pollute the caches, though, and generally incur more
> instructions to be executed (e.g. think about the vectorized scan of the
> char[] array -- the compressed version will take 2x fewer instructions).
>
>> Personally, what I'd like to see is fusing storage of String with its
>> backing data, irrespective of encoding (i.e. removing the indirection to
>> fetch the char[] or byte[]).
>
> This is not the target for this JEP, and the groundwork for String-char[]
> fusion is handled elsewhere (I put my hopes on Valhalla, which will
> explore the exact path to add exotic object shapes into the runtime).
> String-char[] fusion neither conflicts with the Compact String
> optimization nor provides an alternative to it. Removing the excess
> headers from the backing char[] array would solve the static overhead in
> Strings, while String compaction would further compact the backing
> storage.
>
> Thanks,
> -Aleksey.
Re: JEP 254: Compact Strings
On 05/18/2015 05:35 PM, Vitaly Davidovich wrote:
> This part is a bit unclear for the proposed changes. While it's true that
> single-byte encoding will be denser than two-byte, most string ops end up
> walking the backing store linearly; prefetch (either implicit h/w or
> software-assisted) could hide the memory access latency.

It will still pollute the caches, though, and generally incur more instructions to be executed (e.g. think about the vectorized scan of the char[] array -- the compressed version will take 2x fewer instructions).

> Personally, what I'd like to see is fusing storage of String with its
> backing data, irrespective of encoding (i.e. removing the indirection to
> fetch the char[] or byte[]).

This is not the target for this JEP, and the groundwork for String-char[] fusion is handled elsewhere (I put my hopes on Valhalla, which will explore the exact path to add exotic object shapes into the runtime). String-char[] fusion neither conflicts with the Compact String optimization nor provides an alternative to it. Removing the excess headers from the backing char[] array would solve the static overhead in Strings, while String compaction would further compact the backing storage.

Thanks,
-Aleksey.
Re: JEP 254: Compact Strings
(getting back to this)

Hi Jeremy,

On 05/16/2015 03:34 AM, Jeremy Manson wrote:
> So, I'm pretty dubious, mostly because of the risks mentioned in the JEP.
> If you need a flag-check and two code paths for every String method,
> that's going to make the String class slower and more bloated, and make it
> very difficult for the JIT compiler to do its job inlining and
> intrinsifying calls to String methods.

Yes, we know that's a potential problem, e.g. outlined here:
http://cr.openjdk.java.net/~shade/density/equals.txt

The hope is that the string coder check would be amortized by the substantial performance improvement from the ubiquitous Latin-1 (optimized) operations. Also, getting a few code generation quirks kicked out may further offset the perceived performance costs of doing this (you can do such a trick every so often, but not all the time).

> The proposed change here has the potential of doing the opposite with most
> String operations - trading off less GC overhead for more mutator cost.
> But String operations are a pretty big chunk of CPU time, on the whole.

The thing is, many mutator ops on Strings are also improved, because the data becomes more easily cacheable and/or requires fewer steps to complete (think vectorization that takes 2x fewer instructions).

> Does anyone really have a sense of how to make this kind of decision? The
> JEP seems mostly to be hoping that other organizations will do the testing
> for you.

It is not true that the JEP hopes to have other organizations do the testing for it. The JEP tries to illuminate that this is a performance-sensitive change, so early testing and feedback is very appreciated. So, if you have String-intensive workloads in your org, can you try to run the prototype JDK against them? Our early runs on our workloads of interest show appealing improvements.

That is, the decision to integrate this is not done yet, as we don't have the complete performance picture and/or a fully-tested prototype. In other words, there are quite a few blank spots to fill out. Your data may be part of that picture when we decide to integrate in JDK 9.

> (I agree that it is worth doing some experimentation in this area, but I
> wanted to say this early, because if I could reach back in time and tell
> you *not* to make the substring change, I would. We seriously considered
> simply backing it out locally, but it would have been a lot of effort for
> us to maintain that kind of patch, and we didn't want our performance
> tradeoffs to be that much different from the stock JDK's.)

This is your golden ticket: if you come back with concrete data in your hands saying that the particular tradeoff the JEP made is not sensible for your applications, it would be considered in the decision to integrate. But it should be real data and/or a contrived benchmark simulating the real-world scenario, not just theoretical appeals -- we know how misguided those can get.

Thanks,
-Aleksey
Re: JEP 254: Compact Strings
On 06/01/2015 03:54 PM, Vitaly Davidovich wrote:
> While it's true that the denser format will require fewer cachelines, my
> experience is that most strings are smaller than a single cacheline worth
> of storage, maybe two lines in some cases; there's just a ton of them in
> the heap. So the heap footprint should be substantially reduced, but I'm
> not sure the cache pollution will be significantly reduced.

This calculation assumes object allocations are granular to the cache lines. They are not: if a String takes less space within a cache line, it allows *more* object data to be squeezed in there. In other words, with compact Strings, the entire dataset can take fewer cache lines, thus improving performance.

> There's currently no vectorization of char[] scanning (or any
> vectorization other than memcpy, for that matter) - are you referring to
> the recent Intel contributions here, or is there a plan to further improve
> vectorization in time for this JEP? Just curious.

String methods are intensely intrinsified (and vectorized in those implementations). String::equals, String::compareTo, and some encoding/decoding come to mind. I really, really invite you to read the collateral materials from the JEP, where we have explored quite a few performance characteristics already.

Thanks,
-Aleksey.
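For concreteness, a back-of-envelope sketch of the backing-array footprint both sides are discussing. The 16-byte array header and 8-byte alignment are assumptions for a typical 64-bit HotSpot layout; exact numbers vary with compressed oops and VM version, and this counts only the backing array, not the String object itself.

```java
// Rough footprint of the backing array for an n-char ASCII string,
// under assumed 64-bit layout constants (illustrative, VM-dependent).
public class FootprintSketch {
    static final int ARRAY_HEADER = 16; // assumed: object header + length
    static final int ALIGN = 8;         // assumed object alignment

    static int align(int n) { return (n + ALIGN - 1) / ALIGN * ALIGN; }

    static int charBacking(int n) { return align(ARRAY_HEADER + 2 * n); }
    static int byteBacking(int n) { return align(ARRAY_HEADER + n); }

    public static void main(String[] args) {
        // A 16-char ASCII string: 48 bytes of char[] vs 32 bytes of byte[].
        // Both fit under one 64-byte cache line, which is Vitaly's point;
        // the win comes from denser packing across the whole heap, which
        // is Aleksey's.
        assert charBacking(16) == 48;
        assert byteBacking(16) == 32;
    }
}
```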
Re: JEP 254: Compact Strings
My calculation doesn't assume cacheline granularity; I'm looking at strictly the strings. What's allocated next to/around them is completely arbitrary, circumstantial, uncontrollable to a large extent, and often not repeatable. If you're claiming that some second- or even third-order locality effects will be measurable, I don't know how :). I'm sure there will be some, as theoretically it's possible, but it'll be hard to demonstrate that on anything other than specially crafted microbenchmarks.

OK, you're talking about some string intrinsics and not general char[] being vectorized - fair enough.

sent from my phone

On Jun 1, 2015 9:31 AM, Aleksey Shipilev aleksey.shipi...@oracle.com wrote:
> On 06/01/2015 03:54 PM, Vitaly Davidovich wrote:
>> While it's true that the denser format will require fewer cachelines, my
>> experience is that most strings are smaller than a single cacheline worth
>> of storage, maybe two lines in some cases; there's just a ton of them in
>> the heap. So the heap footprint should be substantially reduced, but I'm
>> not sure the cache pollution will be significantly reduced.
>
> This calculation assumes object allocations are granular to the cache
> lines. They are not: if a String takes less space within a cache line, it
> allows *more* object data to be squeezed in there. In other words, with
> compact Strings, the entire dataset can take fewer cache lines, thus
> improving performance.
>
>> There's currently no vectorization of char[] scanning (or any
>> vectorization other than memcpy, for that matter) - are you referring to
>> the recent Intel contributions here, or is there a plan to further
>> improve vectorization in time for this JEP? Just curious.
>
> String methods are intensely intrinsified (and vectorized in those
> implementations). String::equals, String::compareTo, and some
> encoding/decoding come to mind. I really, really invite you to read the
> collateral materials from the JEP, where we explored quite a few
> performance characteristics already.
>
> Thanks,
> -Aleksey.
RE: JEP 254: Compact Strings
For what it's worth, we would welcome this change. We took a large memory hit and a small performance hit when we upgraded from 1.6 to 1.7 in some of our memory-bound applications.

From a purely performance perspective, the most expensive CPU operations these days are memory accesses. Anything that halves memory reads will likely produce better performance.

From an implementation perspective, having used 1.6's compressed strings feature in production, we are comfortable that neither our code nor any of our dependencies relies on String's internal representation in such a way as to cause a significant backward-compatibility issue.

Thanks
Moh

-----Original Message-----
From: core-libs-dev [mailto:core-libs-dev-boun...@openjdk.java.net] On Behalf Of mark.reinh...@oracle.com
Sent: Thursday, May 14, 2015 7:05 PM
To: xueming.s...@oracle.com
Cc: core-libs-dev@openjdk.java.net
Subject: JEP 254: Compact Strings

New JEP Candidate: http://openjdk.java.net/jeps/254

- Mark
Re: JEP 254: Compact Strings
> From a purely performance perspective, the most expensive CPU operations
> are memory accesses these days.

Very true ... for random accesses.

> Anything that halves memory reads will likely produce better performance.

This part is a bit unclear for the proposed changes. While it's true that single-byte encoding will be denser than two-byte, most string ops end up walking the backing store linearly; prefetch (either implicit h/w or software-assisted) could hide the memory access latency.

Personally, what I'd like to see is fusing storage of String with its backing data, irrespective of encoding (i.e. removing the indirection to fetch the char[] or byte[]).

On Mon, May 18, 2015 at 10:24 AM, Rezaei, Mohammad A. mohammad.rez...@gs.com wrote:
> For what it's worth, we would welcome this change. We took a large memory
> hit and a small performance hit when we upgraded from 1.6 to 1.7 in some
> of our memory-bound applications.
>
> From a purely performance perspective, the most expensive CPU operations
> are memory accesses these days. Anything that halves memory reads will
> likely produce better performance.
>
> From an implementation perspective, having used 1.6's compressed strings
> feature in production, we are comfortable that neither our code nor any of
> our dependencies relies on String's internal representation in such a way
> as to cause a significant backward-compatibility issue.
>
> Thanks
> Moh
>
> -----Original Message-----
> From: core-libs-dev [mailto:core-libs-dev-boun...@openjdk.java.net] On Behalf Of mark.reinh...@oracle.com
> Sent: Thursday, May 14, 2015 7:05 PM
> To: xueming.s...@oracle.com
> Cc: core-libs-dev@openjdk.java.net
> Subject: JEP 254: Compact Strings
>
> New JEP Candidate: http://openjdk.java.net/jeps/254
>
> - Mark
Re: JEP 254: Compact Strings
So, I'm pretty dubious, mostly because of the risks mentioned in the JEP. If you need a flag-check and two code paths for every String method, that's going to make the String class slower and more bloated, and make it very difficult for the JIT compiler to do its job inlining and intrinsifying calls to String methods.

At Google, we spent a fair bit of time last year climbing out of the performance hole that trimming substrings dropped us into - we had a fair bit of code that was based around substrings being approximately memory-neutral, and it cost us a lot of GC overhead and rewriting to make the change. The JDK itself still has exposed APIs that make tradeoffs based on cheap substrings (the URL(String) constructor does a lot of this, for example).

The proposed change here has the potential of doing the opposite with most String operations - trading off less GC overhead for more mutator cost. But String operations are a pretty big chunk of CPU time, on the whole. Does anyone really have a sense of how to make this kind of decision? The JEP seems mostly to be hoping that other organizations will do the testing for you.

(I agree that it is worth doing some experimentation in this area, but I wanted to say this early, because if I could reach back in time and tell you *not* to make the substring change, I would. We seriously considered simply backing it out locally, but it would have been a lot of effort for us to maintain that kind of patch, and we didn't want our performance tradeoffs to be that much different from the stock JDK's.)

Jeremy

On Thu, May 14, 2015 at 4:05 PM, mark.reinh...@oracle.com wrote:
> New JEP Candidate: http://openjdk.java.net/jeps/254
>
> - Mark
JEP 254: Compact Strings
New JEP Candidate: http://openjdk.java.net/jeps/254 - Mark