Re: Backport of JEP 254 (Compact Strings) to OpenJDK 8

2017-01-12 Thread Aleksey Shipilev
On 01/12/2017 08:11 PM, John Platts wrote:
> I am interested in OpenJDK 8 builds with JEP 254 (Compact Strings) support
> backported from OpenJDK 9. I like the compact strings work that is being done
> in JDK 9, but I am interested in an OpenJDK 8 build with backported JEP 254
> support since I am working with Java applications that might not work in
> OpenJDK 9 yet and since I would like to take advantage of the feature in a
> backported OpenJDK 8 build instead of having to wait for the JDK 9 release.

This was discussed a number of times both internally and externally, and the
maintainers' consensus was that the feature is too intrusive, both VM- and JDK-wise.
Backporting it would be very painful, and probably destabilizing for JDK 8.

Think about the work one does enabling applications to run on JDK 9 as the price
to get Compact Strings :)

Thanks,
-Aleksey



Backport of JEP 254 (Compact Strings) to OpenJDK 8

2017-01-12 Thread John Platts
I am interested in OpenJDK 8 builds with JEP 254 (Compact Strings) support 
backported from OpenJDK 9. I like the compact strings work that is being done 
in JDK 9, but I am interested in an OpenJDK 8 build with backported JEP 254
support since I am working with Java applications that might not work in
OpenJDK 9 yet and since I would like to take advantage of the feature in a
backported OpenJDK 8 build instead of having to wait for the JDK 9 release.


Re: JEP 254: Compact Strings - length limits

2016-09-06 Thread John Rose
On Sep 6, 2016, at 2:18 PM, Tim Ellison wrote:
> 
> People stash all sorts of things in (immutable) Strings. Reducing the
> limits in JDK9 seems like a regression.  Was there any consideration to
> using the older Java 8 StringCoding APIs for UTF-16 strings (already
> highly perf tuned) and adding additional methods for compact strings
> rather than rewriting everything as byte[]'s?

It doesn't help now, but https://bugs.openjdk.java.net/browse/JDK-8161256
proposes a better way to stash immutable bits, CONSTANT_Data.
(Caveat:  Language bindings not yet included.)  Eventually we'll get there.

— John

Re: JEP 254: Compact Strings - length limits

2016-09-06 Thread Xueming Shen

On 9/6/16, 2:18 PM, Tim Ellison wrote:
>> Do we have a real use case that is impacted by this change?
>
> People stash all sorts of things in (immutable) Strings. Reducing the
> limits in JDK9 seems like a regression.  Was there any consideration to
> using the older Java 8 StringCoding APIs for UTF-16 strings (already
> highly perf tuned) and adding additional methods for compact strings
> rather than rewriting everything as byte[]'s?

Hi Tim,

I'm sorry, I don't get the idea of "using StringCoding APIs for UTF-16
strings". Can you explain in a little more detail? We did try various
approaches (byte[] + flag, byte[] + coder, coder, char[] + coder, etc.); the
current one appears to be the best so far based on our measurements.

Regards,
Sherman



Re: JEP 254: Compact Strings - length limits

2016-09-06 Thread Tim Ellison
On 06/09/16 19:04, Xueming Shen wrote:
> On 9/6/16, 10:09 AM, Tim Ellison wrote:
>> Has it been noted that while JEP 254 reduces the space occupied by one
>> byte per character strings, moving from a char[] to byte[]
>> representation universally means that the maximum length of a UTF-16
>> (two bytes per char) string is now halved?

Hey Sherman,

> Yes, it's a known "limit" given the nature of the approach. It is
> not considered to be an "incompatible change", because the max length
> the String class and the corresponding buffer/builder classes can
> support is really an implementation detail, not a spec requirement.

Don't confuse spec compliance with compatibility.  Of course, the JEP
should not break the formally specified behavior of String etc., but the
goal was to ensure that the implementation be compatible with prior
behavior. As you know, there are many places where compatible behavior
beyond the spec is important to maintain.

> The conclusion from the discussion back then was this is something we
> can trade off for the benefits we gain from the approach. 

Out of curiosity, where was that?  I did search for previous discussion
of this topic but didn't see it -- it may be just my poor search-fu.

> Do we have a real use case that is impacted by this change?

People stash all sorts of things in (immutable) Strings. Reducing the
limits in JDK9 seems like a regression.  Was there any consideration to
using the older Java 8 StringCoding APIs for UTF-16 strings (already
highly perf tuned) and adding additional methods for compact strings
rather than rewriting everything as byte[]'s?

Regards,
Tim

>> Since the goal is "preserving full compatibility", this has been missed
>> by failing to allow for UTF-16 strings of length greater than
>> Integer.MAX_VALUE / 2.
>>
>> Regards,
>> Tim
>>
>>
> 
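
The halving Tim describes follows from simple arithmetic on the backing array. The sketch below is a minimal illustration, not the JDK source: the class and method names are hypothetical, and it assumes an idealized maximum array length of Integer.MAX_VALUE (real VMs cap arrays slightly lower).

```java
public class CompactLengthLimits {
    // Coder values: 0 means one byte per char, 1 means two bytes per char.
    static final int LATIN1 = 0;
    static final int UTF16 = 1;

    /** Largest char count that fits in a byte[] of the given length. */
    static int maxChars(int byteArrayLength, int coder) {
        return byteArrayLength >> coder;
    }

    public static void main(String[] args) {
        int maxArray = Integer.MAX_VALUE; // assumption: idealized upper bound
        System.out.println("byte[] + LATIN1 max chars: " + maxChars(maxArray, LATIN1));
        System.out.println("byte[] + UTF16  max chars: " + maxChars(maxArray, UTF16));
        System.out.println("char[]          max chars: " + maxArray);
    }
}
```

Under these assumptions a UTF-16 string tops out at Integer.MAX_VALUE / 2 chars, while a single-byte string keeps the old limit.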


Re: JEP 254: Compact Strings - length limits

2016-09-06 Thread Xueming Shen

On 9/6/16, 12:58 PM, Charles Oliver Nutter wrote:
> On Tue, Sep 6, 2016 at 1:04 PM, Xueming Shen wrote:
>
>> Yes, it's a known "limit" given the nature of the approach. It is not
>> considered to be an "incompatible change", because the max length the
>> String class and the corresponding buffer/builder classes can support is
>> really an implementation detail, not a spec requirement. The conclusion
>> from the discussion back then was this is something we can trade off for
>> the benefits we gain from the approach.
>> Do we have a real use case that is impacted by this change?
>
> Well, doesn't this mean that any code out there consuming String data
> that's longer than Integer.MAX_VALUE / 2 will suddenly start failing
> on OpenJDK 9?

Yes, true. But arguably code that uses huge Strings should have fallback
code to handle the potential OutOfMemoryError when the VM can't handle the
size, as there is really no guarantee that the VM can handle a String longer
than Integer.MAX_VALUE / 2.

> Not that such a case is a particularly good pattern, but I'm sure
> there's code out there doing it. On JRuby we routinely get bug reports
> complaining that we can't support strings larger than 2GB (and we have
> used byte[] for strings since 2006).

That was a trade-off decision to make.

Does JRuby have any better solution for such complaints? Did you ever
consider going back to char[] to "fix" the problem, or some workaround such
as adding another byte[], for example?

By the way, a single-byte-only string should work just fine :-) or :-(
depending on the character set used.

Sherman
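
The "fallback code" suggested above might look roughly like the following. This is only a sketch under stated assumptions: the class and method names are hypothetical, and catching OutOfMemoryError is shown purely to illustrate the point about having a fallback, not as a recommended general practice.

```java
import java.util.Arrays;

public class HugeStringFallback {
    /**
     * Returns a String made of `length` copies of `fill`, or null when the VM
     * cannot allocate it, so the caller can switch to another strategy.
     */
    static String tryRepeat(char fill, int length) {
        try {
            char[] buf = new char[length];
            Arrays.fill(buf, fill);
            return new String(buf);
        } catch (OutOfMemoryError e) {
            // Fallback path: signal the caller to process the data in chunks
            // or stream it instead of holding it in one String.
            return null;
        }
    }

    public static void main(String[] args) {
        String s = tryRepeat('x', 8);
        System.out.println(s == null ? "fallback needed" : "allocated " + s.length() + " chars");
    }
}
```

A caller would treat a null result as the cue to process the data in smaller pieces.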


Re: JEP 254: Compact Strings - length limits

2016-09-06 Thread John Rose
On Sep 6, 2016, at 12:58 PM, Charles Oliver Nutter wrote:
> 
> On Tue, Sep 6, 2016 at 1:04 PM, Xueming Shen wrote:
>
>> Yes, it's a known "limit" given the nature of the approach. It is not
>> considered to be an "incompatible change", because the max length the
>> String class and the corresponding buffer/builder classes can support is
>> really an implementation detail, not a spec requirement. The conclusion
>> from the discussion back then was this is something we can trade off for
>> the benefits we gain from the approach.
>> Do we have a real use case that is impacted by this change?
>> 
> 
> Well, doesn't this mean that any code out there consuming String data
> that's longer than Integer.MAX_VALUE / 2 will suddenly start failing on
> OpenJDK 9?
> 
> Not that such a case is a particularly good pattern, but I'm sure there's
> code out there doing it. On JRuby we routinely get bug reports complaining
> that we can't support strings larger than 2GB (and we have used byte[] for
> strings since 2006).
> 
> - Charlie

The most basic scale requirement for strings is that they support class-file
constants, which top out at a UTF8-length of 2**16.  Lengths beyond that,
to fill up the 'int' return value of String::length, are less well specified.

FTR, we could have chosen char[], int[], or long[] (not byte[]) as the backing
store for string data.  With long[] we could have strings above 4G-chars.

But it would have come with a perf. tax, since the T[].length field would need
to be combined with an extra bit or two (from a flag byte) to complete the
length.  That's 2-3 extra instructions for loading a string length, or else a
redundant length field.  So it's a trade-off.

Likewise, choosing a third format deepens branch depth in order to get to
payload.

Likewise, making the second format (of two) have a length field embedded in the
payload section requires a conditional load or branch, in order to load the
string length.  Again, more instructions.

The team has looked at 20 possibilities like these.  The current design is
fastest.  I hope it flies.

— John
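
To make the length-loading cost concrete, here is a rough sketch comparing the two layouts. The long[] variant is purely hypothetical (four UTF-16 chars per long, plus a small count of unused slots standing in for the "extra bit or two"); none of these names come from the JDK.

```java
public class BackingStoreLength {
    /** byte[] backing store: a single shift combines array length and coder (0 or 1). */
    static int lengthByteBacked(byte[] value, int coder) {
        return value.length >> coder;
    }

    /**
     * Hypothetical long[] backing store: four UTF-16 chars per long, plus a
     * separately stored count (0..3) of unused char slots in the last word.
     * The length needs a widening, a shift and a subtraction.
     */
    static long lengthLongBacked(long[] words, int unusedSlots) {
        return ((long) words.length << 2) - unusedSlots;
    }

    public static void main(String[] args) {
        System.out.println(lengthByteBacked(new byte[10], 1)); // prints 5
        System.out.println(lengthLongBacked(new long[3], 1));  // prints 11
    }
}
```

In this model the long[] form could address well over 4G chars, but its length no longer fits in the int returned by String::length and takes more work to compute.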

Re: JEP 254: Compact Strings - length limits

2016-09-06 Thread Charles Oliver Nutter
On Tue, Sep 6, 2016 at 1:04 PM, Xueming Shen wrote:

> Yes, it's a known "limit" given the nature of the approach. It is not
> considered to be an "incompatible change", because the max length the
> String class and the corresponding buffer/builder classes can support is
> really an implementation detail, not a spec requirement. The conclusion
> from the discussion back then was this is something we can trade off for
> the benefits we gain from the approach.
> Do we have a real use case that is impacted by this change?
>

Well, doesn't this mean that any code out there consuming String data
that's longer than Integer.MAX_VALUE / 2 will suddenly start failing on
OpenJDK 9?

Not that such a case is a particularly good pattern, but I'm sure there's
code out there doing it. On JRuby we routinely get bug reports complaining
that we can't support strings larger than 2GB (and we have used byte[] for
strings since 2006).

- Charlie


Re: JEP 254: Compact Strings - length limits

2016-09-06 Thread Xueming Shen

On 9/6/16, 10:09 AM, Tim Ellison wrote:
> Has it been noted that while JEP 254 reduces the space occupied by one
> byte per character strings, moving from a char[] to byte[]
> representation universally means that the maximum length of a UTF-16
> (two bytes per char) string is now halved?

Hi Tim,

Yes, it's a known "limit" given the nature of the approach. It is not
considered to be an "incompatible change", because the max length the String
class and the corresponding buffer/builder classes can support is really an
implementation detail, not a spec requirement. The conclusion from the
discussion back then was that this is something we can trade off for the
benefits we gain from the approach.

Do we have a real use case that is impacted by this change?

Thanks,
Sherman

> Since the goal is "preserving full compatibility", this has been missed
> by failing to allow for UTF-16 strings of length greater than
> Integer.MAX_VALUE / 2.
>
> Regards,
> Tim






Re: JEP 254: Compact Strings thoughts: character ranges outside ASCII + EASCII blocks

2015-09-25 Thread Aleksey Shipilev
Hi Simon,

On 09/25/2015 01:01 AM, Simon Spero wrote:
> [Some of this may be simple or prohibitively tricksy depending on alignment
> constraints (even though it's restricted to the Basic Multilingual Plane :-) ]
> 
> For some not un-realistic use cases, the most significant bytes for all the
> characters in a string are identical, even if the string is non-Latin. For
> example, all the characters may be in the range U+0400--U+04FF, or
> U+0500--U+05FF.
> In these cases, it may be feasible to save the upper byte, then splat it
> into place when reconstituting the UTF-16 chars.
> 
> Because of the assignment of Unicode code points, this technique is not as
> big a win as it might have been. Unlike (e.g.) 8859-5 or 8859-8, there are
> no punctuation marks, digits, or whitespace characters, which restricts use
> cases to very short strings (the lack of whitespace is the biggest
> problem). For the 254-like coding system I was experimenting with, for the
> cases where I didn't fall back to UTF-16, the savings were overwhelmed by
> the cost of header words and padding.
> 
> It is possible to handle some of these mixtures, on some architectures,
> without resorting to LUTs or branches, but that's well into non-goal
> territory for JEP-254. There might be some useful win just from being able
> to have an offset added to the packed value based on whether the high bit is
> set.  Anyone here from Москва?

Sure, many theoretical constructions may be devised. Not many of them
are practical.

JEP-254 wins big time exactly because many strings *are* single-byte
storeable in ASCII/8859-1, *especially* those with long lengths. So, the
very first thing you have to do is prove that an alternative scheme
successfully encodes a fair amount of real strings. Otherwise, it is
not worth exploring any further. As you say, a lack of "usual"
characters like whitespace may be the deal breaker.

Adding an alternative coder is easy, but making sure it does not regress
the prevailing cases of 8859-1/UTF16 strings is much harder. Think about
branching costs, eliminating the bit tricks that are employed now with
binary 0/1 coder, etc.

Thanks,
-Aleksey
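
The shared-upper-byte scheme Simon describes can be sketched as below. This is a minimal illustration with hypothetical names, not anything proposed for the JDK; it also shows the whitespace problem mentioned in the thread, since a single space forces the fallback.

```java
public class SharedHighByte {
    /** Hypothetical packed form: one shared upper byte plus each char's low byte. */
    static final class Packed {
        final int high;
        final byte[] low;

        Packed(int high, byte[] low) {
            this.high = high;
            this.low = low;
        }
    }

    /** Returns the packed form, or null when the chars do not share an upper byte. */
    static Packed pack(String s) {
        int high = s.isEmpty() ? 0 : s.charAt(0) >>> 8;
        byte[] low = new byte[s.length()];
        for (int i = 0; i < s.length(); i++) {
            char c = s.charAt(i);
            if ((c >>> 8) != high) {
                return null; // caller falls back to plain UTF-16
            }
            low[i] = (byte) c;
        }
        return new Packed(high, low);
    }

    /** Splats the saved upper byte back onto every low byte. */
    static String unpack(Packed p) {
        char[] chars = new char[p.low.length];
        for (int i = 0; i < chars.length; i++) {
            chars[i] = (char) ((p.high << 8) | (p.low[i] & 0xFF));
        }
        return new String(chars);
    }

    public static void main(String[] args) {
        // "Moskva" in Cyrillic; every char lies in U+0400..U+04FF.
        String moskva = "\u041c\u043e\u0441\u043a\u0432\u0430";
        Packed p = pack(moskva);
        System.out.println(unpack(p).equals(moskva));       // true
        System.out.println(pack(moskva + " 2015") == null); // true: space and digits are in U+0000..U+00FF
    }
}
```

The packed form spends one byte per char plus the shared byte, but any mixed-block string gets no benefit, which is the practicality concern raised in the reply.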




JEP 254: Compact Strings thoughts: character ranges outside ASCII + EASCII blocks

2015-09-24 Thread Simon Spero
[Some of this may be simple or prohibitively tricksy depending on alignment
constraints (even though it's restricted to the Basic Multilingual Plane :-) ]

For some not un-realistic use cases, the most significant bytes for all the
characters in a string are identical, even if the string is non-Latin. For
example, all the characters may be in the range U+0400--U+04FF, or
U+0500--U+05FF.
In these cases, it may be feasible to save the upper byte, then splat it
into place when reconstituting the UTF-16 chars.

Because of the assignment of Unicode code points, this technique is not as
big a win as it might have been. Unlike (e.g.) 8859-5 or 8859-8, there are
no punctuation marks, digits, or whitespace characters, which restricts use
cases to very short strings (the lack of whitespace is the biggest
problem). For the 254-like coding system I was experimenting with, for the
cases where I didn't fall back to UTF-16, the savings were overwhelmed by
the cost of header words and padding.

It is possible to handle some of these mixtures, on some architectures,
without resorting to LUTs or branches, but that's well into non-goal
territory for JEP-254. There might be some useful win just from being able
to have an offset added to the packed value based on whether the high bit is
set.  Anyone here from Москва?

Simon
p.s.
   As part of the replacement for sun.misc.Unsafe, could we get a
jdk.infernal/...ABitDodgy, which would allow the full set of SIMD
instructions to be generated in an architecture independent fashion? (By
architecture independent I mean if you ask for a NEON instruction on an
amd64, or an SSE 4.2 string primitive on SPARC, that's what gets emitted).


Re: JEP 254: Compact Strings

2015-06-02 Thread Jeremy Manson
TL;DR: In principle, we'd love to do more early testing of Hotspot / JDK
features, but our benchmarks are, honestly, not all that great.  We end up
having to test against live services, which makes this sort of thing really
hard.

More info than you need:

There are two real problems here:

1) To do apples-to-apples comparisons, we have to make sure that *our*
patches all work with whatever version of Hotspot we're testing.

2) Pulling down a new JDK9 - even an official release - usually means that
there are a lot of instabilities, half-finished work, and inefficiencies,
so we can't really run tests very well against it.  That's not a knock on
Hotspot developers; the only way to know about some of these problems is to
run the JDK in infrastructure like ours.  ( An example of something that
hit us hard that no one else would notice:
http://hg.openjdk.java.net/jdk9/hs/hotspot/rev/5ba37c4c0578 )

It took us months to forward port all of our patches to JDK8, and we've
spent the last six months getting it to the point that we're comfortable
enough to ship to our users (just in time for JDK7 EOL!).  That's required
disabling tiered compilation, heavily tweaking code cache flushing (which
is still causing us CPU regressions), rewriting various parts of the
metaspace to behave more efficiently, and fixing various incompatibilities
with our internal patches.  That's completely apart from the dozens of
backwards incompatibilities introduced in JDK8 that triggered a very, very,
very large code cleanup effort, from the new hash iteration order to the
unicode update to the fact that if you call flush on a closed
BufferedOutputStream it now throws an exception.

(We actually ended up randomizing our hash iteration order, which helps us
guard against broken code, is slightly more secure, and means that we never
get bitten by that as part of an upgrade again.)

In short, upgrading is in no sense cheap for us, and trying out new
features is hard.  We usually restrict ourselves to using new features that
can be more-or-less cleanly patched to the version of the JDK we're using
and hidden behind a flag.

This is an important enough change that we might be able to make some
effort, but we'll have to see how it goes.

Jeremy

On Mon, Jun 1, 2015 at 1:31 AM, Aleksey Shipilev 
aleksey.shipi...@oracle.com wrote:

 (getting back to this)

 Hi Jeremy,

 On 05/16/2015 03:34 AM, Jeremy Manson wrote:
  So, I'm pretty dubious, mostly because of the risks mentioned in the JEP.
  If you need a flag-check and two code paths for every String method, that's
  going to make the String class more slow and bloated, and make it very
  difficult for the JIT compiler to do its job inlining and intrinsifying
  calls to String methods.

 Yes, we know that's a potential problem, e.g. outlined here:
  http://cr.openjdk.java.net/~shade/density/equals.txt

 The hope is that the string coder check would be amortized by the
 substantial performance improvement with the ubiquitous Latin1
 (optimized) operations. Also, getting a few code generation quirks
 kicked out may further offset the perceived performance costs of doing
 this (you can do such a trick every so often, but not all the time).


  The proposed change here has the potential of doing the opposite with most
  String operations - trading off less GC overhead for more mutator cost.
  But String operations are a pretty big chunk of CPU time, on the whole.

 The thing is, many mutator ops on Strings are also improved, because the
 data become more easily cacheable and/or require less steps to complete
 (think vectorization that takes 2x less instructions).


  Does anyone really have a sense of how to make this kind of decision?  The
  JEP seems mostly to be hoping that other organizations will do the testing
  for you.

 It is not true that JEP hopes to have other organizations to do testing
 for it. The JEP tries to illuminate that this is a performance-sensitive
 change, so early testing and feedback is very appreciated. So, if you
 have the String-intensive workloads in your org, can you try and run the
 prototype JDK against it? Our early runs on our workloads of interest
 show the appealing improvements.

 That is, the decision to integrate this is not done yet, as we don't
 have the complete performance picture and/or fully-tested prototype. In
 other words, there are quite a few blank spots to fill out. Your data
 may be the part of that picture when we decide to integrate in JDK 9.


  (I agree that it is worth doing some experimentation in this area, but I
  wanted to say this early, because if I could reach back in time and tell
  you *not* to make the substring change, I would.  We seriously considered
  simply backing it out locally, but it would have been a lot of effort for
  us to maintain that kind of patch, and we didn't want our performance
  tradeoffs to be that much different from the stock JDK's.)

 This is your golden ticket: if you come back with concrete data in your

Re: JEP 254: Compact Strings

2015-06-01 Thread Vitaly Davidovich
Hi Aleksey,

While it's true that the denser format will require fewer cachelines, my
experience is that most strings are smaller than a single cacheline worth
of storage, maybe two lines in some cases; there's just a ton of them in
the heap.  So the heap footprint should be substantially reduced, but I'm
not sure the cache pollution will be significantly reduced.

There's currently no vectorization of char[] scanning (or any vectorization
other than memcpy for that matter) - are you referring to the recent Intel
contributions here, or is there a plan to further improve vectorization in
time for this JEP? Just curious.

I agree that string fusion is separate from this change, and we've
discussed this before.  It just seems to me like that's a bigger perf
problem today since even tiny/small strings (very common, IME) incur the
indirection and bloat overhead, so would have liked to see that addressed
first.  If you're saying that's fully on valhalla's plate, ok, but I
haven't seen anything proposed there yet.

Thanks

sent from my phone
On Jun 1, 2015 4:50 AM, Aleksey Shipilev aleksey.shipi...@oracle.com
wrote:

 On 05/18/2015 05:35 PM, Vitaly Davidovich wrote:
  This part is a bit unclear for the proposed changes.  While it's true that
  single byte encoding will be denser than two byte, most string ops end up
  walking the backing store linearly; prefetch (either implicit h/w or
  software-assisted) could hide the memory access latency.

 It will still pollute the caches though, and generally incur more
 instructions to be executed (e.g. think about the vectorized scan of the
 char[] array -- the compressed version will take 2x less instructions).


  Personally, what I'd like to see is fusing storage of String with its
  backing data, irrespective of encoding (i.e. removing the indirection to
  fetch the char[] or byte[]).

 This is not the target for this JEP, and the groundwork for
 String-char[] fusion is handled elsewhere (I put my hopes at Valhalla
 that will explore the exact path to add the exotic object shapes into
 the runtime).

 String-char[] fusion neither conflicts with the Compact String
 optimization, nor provides the alternative. Removing the excess
 headers from backing char[] array would solve the static overhead in
 Strings, while the String compaction would further compact the backing
 storage.

 Thanks,
 -Aleksey.





Re: JEP 254: Compact Strings

2015-06-01 Thread Aleksey Shipilev
On 05/18/2015 05:35 PM, Vitaly Davidovich wrote:
 This part is a bit unclear for the proposed changes.  While it's true that
 single byte encoding will be denser than two byte, most string ops end up
 walking the backing store linearly; prefetch (either implicit h/w or
 software-assisted) could hide the memory access latency.

It will still pollute the caches though, and generally incur more
instructions to be executed (e.g. think about the vectorized scan of the
char[] array -- the compressed version will take 2x less instructions).
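
The "2x" figure can be checked with back-of-the-envelope arithmetic. The sketch below is a simplified model with hypothetical names: it only counts fixed-width vector loads over the payload and assumes a 32-byte vector width, ignoring alignment, tails and everything else a real intrinsic has to handle.

```java
public class VectorScanCost {
    /** Number of fixed-width vector loads needed to scan `chars` characters. */
    static long vectorLoads(long chars, int bytesPerChar, int vectorBytes) {
        long payloadBytes = chars * bytesPerChar;
        return (payloadBytes + vectorBytes - 1) / vectorBytes; // ceiling division
    }

    public static void main(String[] args) {
        int vectorBytes = 32; // assumption: 256-bit vectors
        System.out.println(vectorLoads(1024, 1, vectorBytes)); // prints 32 (one byte per char)
        System.out.println(vectorLoads(1024, 2, vectorBytes)); // prints 64 (two bytes per char)
    }
}
```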


 Personally, what I'd like to see is fusing storage of String with its
 backing data, irrespective of encoding (i.e. removing the indirection to
 fetch the char[] or byte[]).

This is not the target for this JEP, and the groundwork for
String-char[] fusion is handled elsewhere (I put my hopes at Valhalla
that will explore the exact path to add the exotic object shapes into
the runtime).

String-char[] fusion neither conflicts with the Compact String
optimization, nor provides an alternative to it. Removing the excess
headers from backing char[] array would solve the static overhead in
Strings, while the String compaction would further compact the backing
storage.

Thanks,
-Aleksey.




Re: JEP 254: Compact Strings

2015-06-01 Thread Aleksey Shipilev
(getting back to this)

Hi Jeremy,

On 05/16/2015 03:34 AM, Jeremy Manson wrote:
 So, I'm pretty dubious, mostly because of the risks mentioned in the JEP.
 If you need a flag-check and two code paths for every String method, that's
 going to make the String class more slow and bloated, and make it very
 difficult for the JIT compiler to do its job inlining and intrinsifying
 calls to String methods.

Yes, we know that's a potential problem, e.g. outlined here:
 http://cr.openjdk.java.net/~shade/density/equals.txt

The hope is that the string coder check would be amortized by the
substantial performance improvement with the ubiquitous Latin1
(optimized) operations. Also, getting a few code generation quirks
kicked out may further offset the perceived performance costs of doing
this (you can do such a trick every so often, but not all the time).
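
For readers who have not seen the prototype, the "flag-check and two code paths" concern looks roughly like this. It is a toy sketch with hypothetical names, not the JDK implementation; in particular the big-endian byte layout for the two-byte case is an assumption made for the example.

```java
public class DualPathString {
    static final byte LATIN1 = 0;
    static final byte UTF16 = 1;

    final byte[] value;
    final byte coder;

    DualPathString(String s) {
        boolean latin1 = true;
        for (int i = 0; i < s.length(); i++) {
            if (s.charAt(i) > 0xFF) {
                latin1 = false;
                break;
            }
        }
        coder = latin1 ? LATIN1 : UTF16;
        value = new byte[s.length() << coder];
        for (int i = 0; i < s.length(); i++) {
            char c = s.charAt(i);
            if (latin1) {
                value[i] = (byte) c;
            } else {
                value[2 * i] = (byte) (c >>> 8); // assumption: big-endian layout
                value[2 * i + 1] = (byte) c;
            }
        }
    }

    /** No branch needed: the 0/1 coder doubles as a shift amount. */
    int length() {
        return value.length >> coder;
    }

    /** The coder check: every access picks one of two code paths. */
    char charAt(int index) {
        if (coder == LATIN1) {
            return (char) (value[index] & 0xFF);
        }
        return (char) (((value[2 * index] & 0xFF) << 8) | (value[2 * index + 1] & 0xFF));
    }

    public static void main(String[] args) {
        DualPathString a = new DualPathString("abc");
        DualPathString b = new DualPathString("a\u0416");
        System.out.println(a.coder + " " + a.length() + " " + a.charAt(1));
        System.out.println(b.coder + " " + b.length() + " " + (int) b.charAt(1));
    }
}
```

The branch in charAt is the per-operation cost being debated; the hope expressed above is that the cheaper single-byte path pays for it.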


 The proposed change here has the potential of doing the opposite with most
 String operations - trading off less GC overhead for more mutator cost.
 But String operations are a pretty big chunk of CPU time, on the whole.

The thing is, many mutator ops on Strings are also improved, because the
data become more easily cacheable and/or require fewer steps to complete
(think vectorization that takes half as many instructions).


 Does anyone really have a sense of how to make this kind of decision?  The
 JEP seems mostly to be hoping that other organizations will do the testing
 for you.

It is not true that the JEP hopes to have other organizations do the testing
for it. The JEP tries to illuminate that this is a performance-sensitive
change, so early testing and feedback are very much appreciated. So, if you
have String-intensive workloads in your org, can you try and run the
prototype JDK against them? Our early runs on our workloads of interest
show appealing improvements.

That is, the decision to integrate this has not been made yet, as we don't
have the complete performance picture and/or a fully-tested prototype. In
other words, there are quite a few blank spots to fill in. Your data
may be part of that picture when we decide whether to integrate in JDK 9.


 (I agree that it is worth doing some experimentation in this area, but I
 wanted to say this early, because if I could reach back in time and tell
 you *not* to make the substring change, I would.  We seriously considered
 simply backing it out locally, but it would have been a lot of effort for
 us to maintain that kind of patch, and we didn't want our performance
 tradeoffs to be that much different from the stock JDK's.)

This is your golden ticket: if you come back with concrete data in your
hands saying that the particular tradeoff the JEP made is not sensible
for your applications, it would be considered in the decision to
integrate. But it should be real data and/or a contrived benchmark
simulating a real-world scenario, not just theoretical appeals -- we
know how misguided those can get.


Thanks,
-Aleksey



Re: JEP 254: Compact Strings

2015-06-01 Thread Aleksey Shipilev
On 06/01/2015 03:54 PM, Vitaly Davidovich wrote:
 While it's true that the denser format will require fewer cachelines, my
 experience is that most strings are smaller than a single cacheline
 worth of storage, maybe two lines in some cases; there's just a ton of
 them in the heap.  So the heap footprint should be substantially
 reduced, but I'm not sure the cache pollution will be significantly reduced.

This calculation assumes object allocations are granular to the cache
lines. They are not: if String takes less space within the cache line,
it allows *more* object data to be squeezed there. In other words, with
compact Strings, the entire dataset can take less cache lines, thus
improving performance.
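
A rough way to see the dataset-level effect is to count cache lines for many small backing arrays laid out back to back. Everything in this sketch is an assumption for illustration: the names are hypothetical, and the 64-byte cache line, 16-byte array header, 8-byte alignment and contiguous placement are a simplified model, not measurements.

```java
public class CacheLineFootprint {
    static final int CACHE_LINE = 64;   // assumption: 64-byte cache lines
    static final int ARRAY_HEADER = 16; // assumption: 16-byte array header
    static final int ALIGNMENT = 8;     // assumption: 8-byte object alignment

    /** Aligned size in bytes of one backing array holding `chars` characters. */
    static long arrayBytes(int chars, int bytesPerChar) {
        long raw = ARRAY_HEADER + (long) chars * bytesPerChar;
        return (raw + ALIGNMENT - 1) / ALIGNMENT * ALIGNMENT;
    }

    /** Cache lines covered by `count` such arrays placed contiguously. */
    static long cacheLines(long count, int chars, int bytesPerChar) {
        long total = count * arrayBytes(chars, bytesPerChar);
        return (total + CACHE_LINE - 1) / CACHE_LINE;
    }

    public static void main(String[] args) {
        // 1000 strings of 20 chars each: every payload fits in one cache line,
        // yet the dataset as a whole still spans fewer lines when compacted.
        System.out.println(cacheLines(1000, 20, 1)); // prints 625
        System.out.println(cacheLines(1000, 20, 2)); // prints 875
    }
}
```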


 There's currently no vectorization of char[] scanning (or any
 vectorization other than memcpy for that matter) - are you referring to
 the recent Intel contributions here or there's a plan to further improve
 vectorization in time for this JEP? Just curious.

String methods are intensely intrinsified (and vectorized in those
implementations). String::equals, String::compareTo, and some
encoding/decoding come to mind.

I really, really invite you to read the collateral materials from the
JEP, where we explored quite a few performance characteristics already.


Thanks,
-Aleksey.




Re: JEP 254: Compact Strings

2015-06-01 Thread Vitaly Davidovich
My calculation doesn't assume cacheline granularity; I'm looking at
strictly the strings.  What's allocated next to/around them is completely
arbitrary, circumstantial, uncontrollable to a large extent, and often not
repeatable.  If you're claiming that some second or even third order
locality effects will be measurable, I don't know how :).  I'm sure there
will be some as theoretically it's possible, but it'll be hard to
demonstrate that on anything other than specially crafted microbenchmarks.

Ok, you're talking about some string intrinsics and not general char[]
being vectorized - fair enough.

sent from my phone
On Jun 1, 2015 9:31 AM, Aleksey Shipilev aleksey.shipi...@oracle.com
wrote:

 On 06/01/2015 03:54 PM, Vitaly Davidovich wrote:
  While it's true that the denser format will require fewer cachelines, my
  experience is that most strings are smaller than a single cacheline
  worth of storage, maybe two lines in some cases; there's just a ton of
  them in the heap.  So the heap footprint should be substantially
  reduced, but I'm not sure the cache pollution will be significantly reduced.

 This calculation assumes object allocations are granular to the cache
 lines. They are not: if String takes less space within the cache line,
 it allows *more* object data to be squeezed there. In other words, with
 compact Strings, the entire dataset can take less cache lines, thus
 improving performance.


  There's currently no vectorization of char[] scanning (or any
  vectorization other than memcpy for that matter) - are you referring to
  the recent Intel contributions here or there's a plan to further improve
  vectorization in time for this JEP? Just curious.

 String methods are intensely intrinsified (and vectorized in those
 implementations). String::equals, String::compareTo, and some
 encoding/decoding come to mind.

 I really, really invite you to read the collateral materials from the
 JEP, where we explored quite a few performance characteristics already.


 Thanks,
 -Aleksey.





RE: JEP 254: Compact Strings

2015-05-18 Thread Rezaei, Mohammad A.
For what it's worth, we would welcome this change. We took a large memory hit 
and a small performance hit when we upgraded from 1.6 to 1.7 in some of our 
memory-bound applications.

From a purely performance perspective, the most expensive CPU operations are 
memory access these days. Anything that halves memory reads will likely 
produce better performance.

From an implementation perspective, having used 1.6's compressed strings 
feature in production, we are comfortable that none of our code, nor any of 
our dependencies rely on String internal representation in such a way as to 
cause a significant backward compatibility issue.

Thanks
Moh

-Original Message-
From: core-libs-dev [mailto:core-libs-dev-boun...@openjdk.java.net] On Behalf
Of mark.reinh...@oracle.com
Sent: Thursday, May 14, 2015 7:05 PM
To: xueming.s...@oracle.com
Cc: core-libs-dev@openjdk.java.net
Subject: JEP 254: Compact Strings

New JEP Candidate: http://openjdk.java.net/jeps/254

- Mark


Re: JEP 254: Compact Strings

2015-05-18 Thread Vitaly Davidovich

 From a purely performance perspective, the most expensive CPU operations
 are memory access these days.


Very true ... for random accesses.

Anything that halves memory reads will likely produce better performance.


This part is a bit unclear for the proposed changes.  While it's true that
single byte encoding will be denser than two byte, most string ops end up
walking the backing store linearly; prefetch (either implicit h/w or
software-assisted) could hide the memory access latency.

Personally, what I'd like to see is fusing storage of String with its
backing data, irrespective of encoding (i.e. removing the indirection to
fetch the char[] or byte[]).

On Mon, May 18, 2015 at 10:24 AM, Rezaei, Mohammad A. 
mohammad.rez...@gs.com wrote:

 For what it's worth, we would welcome this change. We took a large memory
 hit and a small performance hit when we upgraded from 1.6 to 1.7 in some of
 our memory-bound applications.

 From a purely performance perspective, the most expensive CPU operations
 are memory access these days. Anything that halves memory reads will likely
 produce better performance.

 From an implementation perspective, having used 1.6's compressed strings
 feature in production, we are comfortable that none of our code, nor any of
 our dependencies rely on String internal representation in such a way as to
 cause a significant backward compatibility issue.

 Thanks
 Moh

 -Original Message-
 From: core-libs-dev [mailto:core-libs-dev-boun...@openjdk.java.net] On Behalf
 Of mark.reinh...@oracle.com
 Sent: Thursday, May 14, 2015 7:05 PM
 To: xueming.s...@oracle.com
 Cc: core-libs-dev@openjdk.java.net
 Subject: JEP 254: Compact Strings
 
 New JEP Candidate: http://openjdk.java.net/jeps/254
 
 - Mark



Re: JEP 254: Compact Strings

2015-05-15 Thread Jeremy Manson
So, I'm pretty dubious, mostly because of the risks mentioned in the JEP.
If you need a flag-check and two code paths for every String method, that's
going to make the String class more slow and bloated, and make it very
difficult for the JIT compiler to do its job inlining and intrinsifying
calls to String methods.

At Google, we spent a fair bit of time last year climbing out of the
performance hole that trimming substrings dropped us into - we had a fair
bit of code that was based around substrings being approximately
memory-neutral, and it cost us a lot of GC overhead and rewriting to make
the change.  The JDK itself still has exposed APIs that make tradeoffs
based on cheap substrings (the URL(String) constructor does a lot of this,
for example).
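
To illustrate what "memory-neutral" substrings mean here, the sketch below contrasts two strategies. Both classes are hypothetical models, not JDK code, and the sketch assumes the change being referred to is the move from substrings that share the parent's char[] to substrings that copy a trimmed array.

```java
import java.util.Arrays;

public class SubstringStrategies {
    /** Hypothetical view that shares the parent's char[] via offset and count. */
    static final class SharedView {
        final char[] value;
        final int offset;
        final int count;

        SharedView(char[] value, int offset, int count) {
            this.value = value;
            this.offset = offset;
            this.count = count;
        }

        SharedView substring(int begin, int end) {
            // Constant time, no new array; the whole parent array stays reachable.
            return new SharedView(value, offset + begin, end - begin);
        }

        @Override
        public String toString() {
            return new String(value, offset, count);
        }
    }

    /** Hypothetical copying variant: every substring owns a trimmed array. */
    static final class TrimmedCopy {
        final char[] value;

        TrimmedCopy(char[] value) {
            this.value = value;
        }

        TrimmedCopy substring(int begin, int end) {
            // Linear time and a fresh allocation per call.
            return new TrimmedCopy(Arrays.copyOfRange(value, begin, end));
        }

        @Override
        public String toString() {
            return new String(value);
        }
    }

    public static void main(String[] args) {
        char[] text = "hello world".toCharArray();
        SharedView shared = new SharedView(text, 0, text.length).substring(6, 11);
        TrimmedCopy copied = new TrimmedCopy(text).substring(6, 11);
        System.out.println(shared + " shares array: " + (shared.value == text));
        System.out.println(copied + " shares array: " + (copied.value == text));
    }
}
```

Code written around the first model allocates almost nothing per substring, which is why switching to the second can show up as extra GC overhead.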

The proposed change here has the potential of doing the opposite with most
String operations - trading off less GC overhead for more mutator cost.
But String operations are a pretty big chunk of CPU time, on the whole.
Does anyone really have a sense of how to make this kind of decision?  The
JEP seems mostly to be hoping that other organizations will do the testing
for you.

(I agree that it is worth doing some experimentation in this area, but I
wanted to say this early, because if I could reach back in time and tell
you *not* to make the substring change, I would.  We seriously considered
simply backing it out locally, but it would have been a lot of effort for
us to maintain that kind of patch, and we didn't want our performance
tradeoffs to be that much different from the stock JDK's.)

Jeremy

On Thu, May 14, 2015 at 4:05 PM, mark.reinh...@oracle.com wrote:

 New JEP Candidate: http://openjdk.java.net/jeps/254

 - Mark



JEP 254: Compact Strings

2015-05-14 Thread mark . reinhold
New JEP Candidate: http://openjdk.java.net/jeps/254

- Mark