Re: More memory-efficient internal representation for Strings: call for more data
Potentially in the future. It has been on a list of candidate enhancements for quite some time. As Aleksey just mentioned in his response (he beat me to the punch), that work is not in scope as part of this project. I should also mention that the work from this project would not prohibit such an enhancement.

hths,

charlie

On Dec 2, 2014, at 4:13 PM, Vitaly Davidovich vita...@gmail.com wrote:
> Any consideration towards removing the char[] (or byte[]) indirection altogether? .NET, for example, stores the bytes inline with the instance.
>
> Sent from my phone
>
> On Dec 2, 2014 4:59 PM, Aleksey Shipilev aleksey.shipi...@oracle.com wrote:
>> Hi,
>>
>> As you may already know, we are looking into a more memory-efficient representation for Strings:
>> https://bugs.openjdk.java.net/browse/JDK-8054307
>>
>> As part of the preliminary performance work for this JEP, we have to collect empirical data on the usual characteristics of the Strings and char[]-s that normal applications have, as well as figure out early estimates for the improvements based on that data. What we have so far is written up here:
>> http://cr.openjdk.java.net/~shade/density/string-density-report.pdf
>>
>> We would appreciate it if people who are interested in this JEP could provide additional data on their applications. It is doubly interesting to have data for applications that process String data outside the Latin1 plane. Our current data says these cases are rather rare.
>>
>> Please read the current report draft, and try to process your own heap dumps using the instructions in the Appendix.
>>
>> Thanks,
>> -Aleksey.
More memory-efficient internal representation for Strings: call for more data
Hi,

As you may already know, we are looking into a more memory-efficient representation for Strings:
https://bugs.openjdk.java.net/browse/JDK-8054307

As part of the preliminary performance work for this JEP, we have to collect empirical data on the usual characteristics of the Strings and char[]-s that normal applications have, as well as figure out early estimates for the improvements based on that data. What we have so far is written up here:
http://cr.openjdk.java.net/~shade/density/string-density-report.pdf

We would appreciate it if people who are interested in this JEP could provide additional data on their applications. It is doubly interesting to have data for applications that process String data outside the Latin1 plane. Our current data says these cases are rather rare.

Please read the current report draft, and try to process your own heap dumps using the instructions in the Appendix.

Thanks,
-Aleksey.
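For readers who want a quick feel for the "outside the Latin1 plane" question before working with heap dumps, here is a minimal Java sketch. It is not the methodology from the report (which works on heap dumps per its Appendix); the class name `StringDensitySketch` and both method names are hypothetical, and the only assumption is that "Latin1" means every char has a value of at most 0xFF.

```java
import java.util.Arrays;
import java.util.List;

public class StringDensitySketch {
    /** True if every char of s has a value <= 0xFF, i.e. fits in one Latin1 byte. */
    public static boolean fitsLatin1(String s) {
        for (int i = 0; i < s.length(); i++) {
            if (s.charAt(i) > 0xFF) {
                return false;
            }
        }
        return true;
    }

    /** Counts how many strings in the sample consist only of Latin1 chars. */
    public static long countLatin1(List<String> sample) {
        return sample.stream().filter(StringDensitySketch::fitsLatin1).count();
    }

    public static void main(String[] args) {
        // Illustrative sample only: one ASCII string, one Latin1 string, one non-Latin1 string.
        List<String> sample = Arrays.asList("SELECT 1", "caf\u00e9", "\u65e5\u672c\u8a9e");
        System.out.println(countLatin1(sample) + " of " + sample.size()
                + " strings fit in Latin1");
    }
}
```

Running something like this over strings an application actually handles gives only a rough ratio; it says nothing about how long those strings live or how much of the heap they occupy.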
Re: More memory-efficient internal representation for Strings: call for more data
Any consideration towards removing the char[] (or byte[]) indirection altogether? .NET, for example, stores the bytes inline with the instance.

Sent from my phone

On Dec 2, 2014 4:59 PM, Aleksey Shipilev aleksey.shipi...@oracle.com wrote:
> Hi,
>
> As you may already know, we are looking into a more memory-efficient representation for Strings:
> https://bugs.openjdk.java.net/browse/JDK-8054307
>
> As part of the preliminary performance work for this JEP, we have to collect empirical data on the usual characteristics of the Strings and char[]-s that normal applications have, as well as figure out early estimates for the improvements based on that data. What we have so far is written up here:
> http://cr.openjdk.java.net/~shade/density/string-density-report.pdf
>
> We would appreciate it if people who are interested in this JEP could provide additional data on their applications. It is doubly interesting to have data for applications that process String data outside the Latin1 plane. Our current data says these cases are rather rare.
>
> Please read the current report draft, and try to process your own heap dumps using the instructions in the Appendix.
>
> Thanks,
> -Aleksey.
Re: More memory-efficient internal representation for Strings: call for more data
Hi Vitaly,

Please read the JEP proposal. String/char[] fusion (that's what you are describing) is out of scope for this work. Baby steps. Careful baby steps.

-Aleksey.

On 03.12.2014 01:13, Vitaly Davidovich wrote:
> Any consideration towards removing the char[] (or byte[]) indirection altogether? .NET, for example, stores the bytes inline with the instance.
>
> Sent from my phone
>
> On Dec 2, 2014 4:59 PM, Aleksey Shipilev aleksey.shipi...@oracle.com wrote:
>> Hi,
>>
>> As you may already know, we are looking into a more memory-efficient representation for Strings:
>> https://bugs.openjdk.java.net/browse/JDK-8054307
>>
>> As part of the preliminary performance work for this JEP, we have to collect empirical data on the usual characteristics of the Strings and char[]-s that normal applications have, as well as figure out early estimates for the improvements based on that data. What we have so far is written up here:
>> http://cr.openjdk.java.net/~shade/density/string-density-report.pdf
>>
>> We would appreciate it if people who are interested in this JEP could provide additional data on their applications. It is doubly interesting to have data for applications that process String data outside the Latin1 plane. Our current data says these cases are rather rare.
>>
>> Please read the current report draft, and try to process your own heap dumps using the instructions in the Appendix.
>>
>> Thanks,
>> -Aleksey.
Re: More memory-efficient internal representation for Strings: call for more data
String construction is a big performance issue for JDBC drivers. Most queries return some number of Strings. The overwhelming majority of those Strings will be short lived. The cost of constructing these Strings from network bytes is a large fraction of total execution time. Any increase in the cost of constructing a String will far outweigh any reduction in memory use, at least for query results.

All of the proposed compression methods require an additional scan of the entire string. That's exactly the wrong direction. Something like the following pseudo-code is common inside a driver.

    {
      char[] c = new char[n];
      for (int i = 0; i < n; i++) c[i] = charSource.next();
      return new String(c);
    }

The array copy inside the String constructor is a significant fraction of JDBC driver execution time. Adding an additional scan on top of it makes things worse regardless of the transient benefit of more compact storage. In the case of a query result the String will likely never be promoted out of new space; the benefit of compression would be minimal.

I don't dispute that Strings occupy a significant fraction of the heap or that a lot of those bytes are zero. And I certainly agree that reducing memory footprint is valuable, but any worsening of String construction time will likely be a problem.
Douglas

At 02:13 PM 12/2/2014, core-libs-dev-requ...@openjdk.java.net wrote:
> Date: Wed, 03 Dec 2014 00:59:10 +0300
> From: Aleksey Shipilev aleksey.shipi...@oracle.com
> To: Java Core Libs core-libs-dev@openjdk.java.net
> Cc: charlie hunt charlie.h...@oracle.com
> Subject: More memory-efficient internal representation for Strings: call for more data
> Message-ID: 547e362e.5010...@oracle.com
> Content-Type: text/plain; charset=utf-8
>
> Hi,
>
> As you may already know, we are looking into a more memory-efficient representation for Strings:
> https://bugs.openjdk.java.net/browse/JDK-8054307
>
> As part of the preliminary performance work for this JEP, we have to collect empirical data on the usual characteristics of the Strings and char[]-s that normal applications have, as well as figure out early estimates for the improvements based on that data. What we have so far is written up here:
> http://cr.openjdk.java.net/~shade/density/string-density-report.pdf
>
> We would appreciate it if people who are interested in this JEP could provide additional data on their applications. It is doubly interesting to have data for applications that process String data outside the Latin1 plane. Our current data says these cases are rather rare.
>
> Please read the current report draft, and try to process your own heap dumps using the instructions in the Appendix.
>
> Thanks,
> -Aleksey.
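To make the cost concern in the message above concrete, the following Java sketch puts the driver pattern next to the kind of extra pass a compressed representation might need at construction time. This is an illustration under stated assumptions, not the JEP's implementation: `CharSource`, `fromSource` and `compressIfLatin1` are hypothetical names, and the compression shown is simply "one byte per char when every char is at most 0xFF".

```java
public class DriverStringSketch {
    /** Hypothetical stand-in for the driver's decoded character stream. */
    public interface CharSource {
        char next();
    }

    /** The pattern from the message: fill a char[] and hand it to String. */
    public static String fromSource(CharSource charSource, int n) {
        char[] c = new char[n];
        for (int i = 0; i < n; i++) {
            c[i] = charSource.next();
        }
        // String(char[]) copies the array so later changes to c cannot affect the String.
        return new String(c);
    }

    /**
     * Hypothetical extra pass: narrows the chars to bytes if they all fit in
     * Latin1, and returns null as soon as one does not.
     */
    public static byte[] compressIfLatin1(char[] c) {
        byte[] out = new byte[c.length];
        for (int i = 0; i < c.length; i++) {
            if (c[i] > 0xFF) {
                return null; // at least one char needs two bytes
            }
            out[i] = (byte) c[i];
        }
        return out;
    }
}
```

Here the scan and the narrowing copy are fused into a single loop, so the open question for a benchmark is how this loop compares with the plain array copy that `new String(char[])` already performs.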
Re: More memory-efficient internal representation for Strings: call for more data
Hi Douglas,

On 12/03/2014 02:24 AM, Douglas Surber wrote:
> String construction is a big performance issue for JDBC drivers. Most queries return some number of Strings. The overwhelming majority of those Strings will be short lived. The cost of constructing these Strings from network bytes is a large fraction of total execution time. Any increase in the cost of constructing a String will far outweigh any reduction in memory use, at least for query results.

You will also have to take into account that shorter (compressed) Strings allow for more efficient operations on them. Not to mention that GC costs are usually hidden from naive performance estimates: even though you can perceive the mutator spending more time doing work, that might be offset by an easier job for the GC.

> All of the proposed compression methods require an additional scan of the entire string. That's exactly the wrong direction. Something like the following pseudo-code is common inside a driver.
>
>     {
>       char[] c = new char[n];
>       for (int i = 0; i < n; i++) c[i] = charSource.next();
>       return new String(c);
>     }

Good to know. We will be assessing the String(char[]) construction performance in the course of this performance work. What would you say is a characteristic high-level benchmark for the scenario you are describing?

> The array copy inside the String constructor is a significant fraction of JDBC driver execution time. Adding an additional scan on top of it makes things worse regardless of the transient benefit of more compact storage. In the case of a query result the String will likely never be promoted out of new space; the benefit of compression would be minimal.

It's hard to say at this point. We want to understand what footprint improvements we are talking about. I agree that if the cost-benefit analysis says performance degrades beyond sane limits, even if we are happy with the memory savings, there is little reason to push this into the general JDK.

Thanks,
-Aleksey
Re: More memory-efficient internal representation for Strings: call for more data
The most common operation on most Strings in query results is to do nothing. Just construct the String, hold onto it while the rest of the transaction completes, then drop it on the floor. Probably the next most common is to encode the chars to write them to an OutputStream or send them back to the database. I'd be curious how a compact representation would help those operations.

SPECjEnterprise is a widely used standard benchmark. It probably uses mostly (or even entirely) ASCII characters, so it's not representative of many customers.

My definition of sane limits might be different than yours. As far as I'm concerned String construction is already too slow and should be made faster by eliminating the char[] copy when possible.

Douglas

At 03:47 PM 12/2/2014, Aleksey Shipilev wrote:
> Hi Douglas,
>
> On 12/03/2014 02:24 AM, Douglas Surber wrote:
>> String construction is a big performance issue for JDBC drivers. Most queries return some number of Strings. The overwhelming majority of those Strings will be short lived. The cost of constructing these Strings from network bytes is a large fraction of total execution time. Any increase in the cost of constructing a String will far outweigh any reduction in memory use, at least for query results.
>
> You will also have to take into account that shorter (compressed) Strings allow for more efficient operations on them. Not to mention that GC costs are usually hidden from naive performance estimates: even though you can perceive the mutator spending more time doing work, that might be offset by an easier job for the GC.
>
>> All of the proposed compression methods require an additional scan of the entire string. That's exactly the wrong direction. Something like the following pseudo-code is common inside a driver.
>>
>>     {
>>       char[] c = new char[n];
>>       for (int i = 0; i < n; i++) c[i] = charSource.next();
>>       return new String(c);
>>     }
>
> Good to know. We will be assessing the String(char[]) construction performance in the course of this performance work. What would you say is a characteristic high-level benchmark for the scenario you are describing?
>
>> The array copy inside the String constructor is a significant fraction of JDBC driver execution time. Adding an additional scan on top of it makes things worse regardless of the transient benefit of more compact storage. In the case of a query result the String will likely never be promoted out of new space; the benefit of compression would be minimal.
>
> It's hard to say at this point. We want to understand what footprint improvements we are talking about. I agree that if the cost-benefit analysis says performance degrades beyond sane limits, even if we are happy with the memory savings, there is little reason to push this into the general JDK.
>
> Thanks,
> -Aleksey
Re: More memory-efficient internal representation for Strings: call for more data
On 12/02/2014 04:42 PM, Douglas Surber wrote:
> The most common operation on most Strings in query results is to do nothing. Just construct the String, hold onto it while the rest of the transaction completes, then drop it on the floor. Probably the next most common is to encode the chars to write them to an OutputStream or send them back to the database. I'd be curious how a compact representation would help those operations.

It depends on what is inside those query results. If most of them are ASCII and only a small portion is double-byte user data (which is true, for example, of most UTF-8 XML files), you might be able to save CPU time and improve throughput by copying only half as many bytes around during their life cycle, especially if copying is the only operation performed on them.

-Sherman

> SPECjEnterprise is a widely used standard benchmark. It probably uses mostly (or even entirely) ASCII characters, so it's not representative of many customers.
>
> My definition of sane limits might be different than yours. As far as I'm concerned String construction is already too slow and should be made faster by eliminating the char[] copy when possible.
>
> Douglas
>
> At 03:47 PM 12/2/2014, Aleksey Shipilev wrote:
>> Hi Douglas,
>>
>> On 12/03/2014 02:24 AM, Douglas Surber wrote:
>>> String construction is a big performance issue for JDBC drivers. Most queries return some number of Strings. The overwhelming majority of those Strings will be short lived. The cost of constructing these Strings from network bytes is a large fraction of total execution time. Any increase in the cost of constructing a String will far outweigh any reduction in memory use, at least for query results.
>>
>> You will also have to take into account that shorter (compressed) Strings allow for more efficient operations on them. Not to mention that GC costs are usually hidden from naive performance estimates: even though you can perceive the mutator spending more time doing work, that might be offset by an easier job for the GC.
>>
>>> All of the proposed compression methods require an additional scan of the entire string. That's exactly the wrong direction. Something like the following pseudo-code is common inside a driver.
>>>
>>>     {
>>>       char[] c = new char[n];
>>>       for (int i = 0; i < n; i++) c[i] = charSource.next();
>>>       return new String(c);
>>>     }
>>
>> Good to know. We will be assessing the String(char[]) construction performance in the course of this performance work. What would you say is a characteristic high-level benchmark for the scenario you are describing?
>>
>>> The array copy inside the String constructor is a significant fraction of JDBC driver execution time. Adding an additional scan on top of it makes things worse regardless of the transient benefit of more compact storage. In the case of a query result the String will likely never be promoted out of new space; the benefit of compression would be minimal.
>>
>> It's hard to say at this point. We want to understand what footprint improvements we are talking about. I agree that if the cost-benefit analysis says performance degrades beyond sane limits, even if we are happy with the memory savings, there is little reason to push this into the general JDK.
>>
>> Thanks,
>> -Aleksey
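Sherman's "half the bytes" argument can be put into rough numbers with a small Java sketch. It counts character payload only and ignores object and array headers; it also assumes a hypothetical compact form that stores one byte per char when every char is at most 0xFF and falls back to two bytes per char otherwise. The class and method names are made up for illustration.

```java
public class PayloadSketch {
    /** Payload bytes when the characters are held in a char[] (two bytes per char). */
    public static int utf16PayloadBytes(String s) {
        return 2 * s.length();
    }

    /**
     * Payload bytes under the assumed compact form: one byte per char if all
     * chars fit in Latin1, otherwise the same two bytes per char as before.
     */
    public static int compactPayloadBytes(String s) {
        for (int i = 0; i < s.length(); i++) {
            if (s.charAt(i) > 0xFF) {
                return 2 * s.length();
            }
        }
        return s.length();
    }

    public static void main(String[] args) {
        String ascii = "hello";
        String other = "\u65e5\u672c";
        System.out.println(utf16PayloadBytes(ascii) + " vs " + compactPayloadBytes(ascii));
        System.out.println(utf16PayloadBytes(other) + " vs " + compactPayloadBytes(other));
    }
}
```

Under these assumptions an all-ASCII result halves its payload, while a single non-Latin1 character leaves a string at its current size, so the benefit depends heavily on the mix of data an application sees.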