Re: The store for byte strings

2018-06-10 Thread Florian Weimer
* John Rose:
> In https://bugs.openjdk.java.net/browse/JDK-8161256 I discuss
> this nascent API under the name "ByteSequence", which is analogous
> to CharSequence, but doesn't mention the types 'char' or 'String'.
Very interesting. What's the specification for toString() and hashCode()? One

Re: The store for byte strings

2018-06-09 Thread John Rose
On Jun 9, 2018, at 12:18 PM, Xueming Shen wrote:
>
> Ideally I would assume we would want to have a utf-8 internal storage for
> String, even in theory utf8 is supposed to be used externally and utf16
> to be the internal one.
Separately from my point about ByteSequence, I agree that "doubling

Re: The store for byte strings

2018-06-09 Thread John Rose
I'm glad to see you are thinking about this, Florian. You appear to be aiming at a way to compactly store and manipulate series of octets (in an arbitrary encoding) with an emphasis on using those octets to represent strings, in the usual sense of character sequences. Would you agree that this

Re: The store for byte strings

2018-06-09 Thread Xueming Shen
On 6/9/18, 3:27 AM, Florian Weimer wrote:
Lately I've been thinking about string representation. The world turned out not to be UCS-2 or UTF-16, after all, and we often have to deal with strings generally encoded as ASCII or UTF-8, but they aren't always encoded this way (and there might not even

The store for byte strings

2018-06-09 Thread Florian Weimer
Lately I've been thinking about string representation. The world turned out not to be UCS-2 or UTF-16, after all, and we often have to deal with strings generally encoded as ASCII or UTF-8, but they aren't always encoded this way (and there might not even be a charset declaration, see the ELF