Re: Codereview request: CR 7040220 java/char_encodin Optimize UTF-8 charset for String.getBytes()/toCharArray()

2011-05-02 Thread Alan Bateman
Xueming Shen wrote: Hi This is motivated by Neil's request to optimize common-case UTF8 path for native ZipFile.getEntry calls [1]. As I said in my replying email [2] I believe a better approach might be to patch UTF8 charset directly to implement sun.nio.cs.ArrayDecoder/Encoder interface to

Re: Codereview request: CR 7040220 java/char_encodin Optimize UTF-8 charset for String.getBytes()/toCharArray()

2011-05-02 Thread Xueming Shen
On 5/2/2011 7:31 AM, Alan Bateman wrote: Xueming Shen wrote: Hi This is motivated by Neil's request to optimize common-case UTF8 path for native ZipFile.getEntry calls [1]. As I said in my replying email [2] I believe a better approach might be to patch UTF8 charset directly to implement

Re: Codereview request: CR 7040220 java/char_encodin Optimize UTF-8 charset for String.getBytes()/toCharArray()

2011-05-02 Thread Alan Bateman
Xueming Shen wrote: : Webrev has been updated accordingly. I renamed the getBBuffer to getByteBuffer, now it looks better:-) The updated webrev looks fine to me. Personally I wouldn't have cached the byte buffer as the wrapping is not expensive (doesn't copy the array) and it's only used
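
The remark that wrapping does not copy the array can be checked with a short sketch. The class and method names below are hypothetical and are not taken from the webrev; the only API used is the standard `java.nio.ByteBuffer`.

```java
import java.nio.ByteBuffer;

// Hypothetical demo class, not part of the reviewed change.
class WrapDemo {
    // Returns true if the wrapped buffer is backed by the very same array,
    // i.e. ByteBuffer.wrap() did not copy the bytes.
    static boolean wrapSharesArray() {
        byte[] src = {0x41, 0x42, 0x43};
        ByteBuffer bb = ByteBuffer.wrap(src);
        // Write through the buffer using an absolute put.
        bb.put(0, (byte) 0x5a);
        // The write is visible in the original array, and array() returns it.
        return src[0] == 0x5a && bb.array() == src;
    }

    public static void main(String[] args) {
        System.out.println(wrapSharesArray());
    }
}
```

Because the wrapper only holds a reference plus position and limit, creating one per call is a small allocation rather than a copy of the data.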

Re: Codereview request: CR 7040220 java/char_encodin Optimize UTF-8 charset for String.getBytes()/toCharArray()

2011-04-30 Thread Ulf Zibis
On 29.04.2011 02:11, Xueming Shen wrote: On 04-28-2011 3:46 PM, Ulf Zibis wrote: It's safe to say that java.nio.cs.StandardCharset is not for String.getBytes()/toCharArray() only, so the fact that cs variant of String.getBytes()/toCharArray() is slower than its csn variant arguably might

Re: Codereview request: CR 7040220 java/char_encodin Optimize UTF-8 charset for String.getBytes()/toCharArray()

2011-04-29 Thread Neil Richards
On 28 April 2011 12:01, Alan Bateman alan.bate...@oracle.com wrote: I skimmed through the webrev and I agree this is a better approach. I will try to do a detailed review before Monday. It would be great if others on the list could jump in and help too as we are running out of time. Neil - I

Codereview request: CR 7040220 java/char_encodin Optimize UTF-8 charset for String.getBytes()/toCharArray()

2011-04-28 Thread Xueming Shen
Hi This is motivated by Neil's request to optimize common-case UTF8 path for native ZipFile.getEntry calls [1]. As I said in my replying email [2] I believe a better approach might be to patch UTF8 charset directly to implement sun.nio.cs.ArrayDecoder/Encoder interface to speed up the coding

Re: Codereview request: CR 7040220 java/char_encodin Optimize UTF-8 charset for String.getBytes()/toCharArray()

2011-04-28 Thread Alan Bateman
Xueming Shen wrote: Hi This is motivated by Neil's request to optimize common-case UTF8 path for native ZipFile.getEntry calls [1]. As I said in my replying email [2] I believe a better approach might be to patch UTF8 charset directly to implement sun.nio.cs.ArrayDecoder/Encoder interface to

Re: Codereview request: CR 7040220 java/char_encodin Optimize UTF-8 charset for String.getBytes()/toCharArray()

2011-04-28 Thread Ulf Zibis
Interesting results! Some days ago we had the discussion about constants for standard Charsets. Looking at your results, I see that with *charset name constants* the conversion mostly performs a little better (up to 25%) than with *charset constants*. So again my question: Why do we need
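
For readers unfamiliar with the two variants being compared, here is a minimal sketch. It assumes the comparison is between `String.getBytes(String charsetName)` and `String.getBytes(Charset)`; the class and method names are hypothetical, and the sketch only shows that both calls yield the same bytes. It makes no claim about their relative speed.

```java
import java.io.UnsupportedEncodingException;
import java.nio.charset.Charset;
import java.util.Arrays;

// Hypothetical demo class illustrating the two String.getBytes() variants.
class GetBytesVariants {
    // Variant taking a charset name (the "csn" form).
    static byte[] byName(String s) {
        try {
            return s.getBytes("UTF-8");
        } catch (UnsupportedEncodingException e) {
            // UTF-8 is required on every Java platform, so this is unexpected.
            throw new IllegalStateException(e);
        }
    }

    // Variant taking a Charset object (the "cs" form).
    static byte[] byCharset(String s) {
        return s.getBytes(Charset.forName("UTF-8"));
    }

    public static void main(String[] args) {
        String s = "caf\u00e9";
        System.out.println(Arrays.equals(byName(s), byCharset(s)));
    }
}
```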

Re: Codereview request: CR 7040220 java/char_encodin Optimize UTF-8 charset for String.getBytes()/toCharArray()

2011-04-28 Thread Ulf Zibis
According to the comments in 6795537, I additionally assume that else if (b1 (byte)0xc2) should be a little faster than else if ((b1 >> 5) == -2), and that if (isMalformed2(b1, b2)) could be replaced by if (isNotContinuation(b2)). -Ulf On 28.04.2011 14:44, Ulf Zibis wrote: Interesting
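
Assuming the shift test discussed in this message is `(b1 >> 5) == -2` applied to a signed Java `byte`, the following sketch checks which lead bytes it accepts. The class and method names are hypothetical and are not the decoder's own code.

```java
// Hypothetical demo class; it only exercises the arithmetic of the shift test.
class LeadByteCheck {
    // True when the top three bits of b1 are 110, the UTF-8 two-byte lead pattern.
    // b1 is sign-extended to int before the arithmetic shift.
    static boolean isTwoByteLead(byte b1) {
        return (b1 >> 5) == -2;
    }

    // Compares the shift test against an explicit unsigned range check
    // over all 256 byte values.
    static boolean agreesWithRange() {
        for (int i = 0; i < 256; i++) {
            boolean expected = i >= 0xC0 && i <= 0xDF;
            if (isTwoByteLead((byte) i) != expected) {
                return false;
            }
        }
        return true;
    }

    public static void main(String[] args) {
        System.out.println(agreesWithRange());
    }
}
```

Note that the range 0xC0..0xDF still includes 0xC0 and 0xC1, which can only start overlong encodings, so a decoder needs a further check to reject them.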

Re: Codereview request: CR 7040220 java/char_encodin Optimize UTF-8 charset for String.getBytes()/toCharArray()

2011-04-28 Thread Xueming Shen
On 04/28/2011 05:44 AM, Ulf Zibis wrote: In malformed(byte[] src, int sp, int nb) I think you could cache the ByteBuffer bb, instead of instantiating a new one all the time. For this the method should not be static to ensure thread-safety. I was assuming that in scenario that you have

Re: Codereview request: CR 7040220 java/char_encodin Optimize UTF-8 charset for String.getBytes()/toCharArray()

2011-04-28 Thread Ulf Zibis
On 28.04.2011 21:56, Xueming Shen wrote: That said, you do have the point, we should do better even in malformed case, ... Yes, that's what I wanted to point out. But I thought you could go one step further, declaring bb as a member of UTF_8.Decoder. Then it should be guaranteed that a decoder

Re: Codereview request: CR 7040220 java/char_encodin Optimize UTF-8 charset for String.getBytes()/toCharArray()

2011-04-28 Thread Xueming Shen
On 04/28/2011 01:55 PM, Ulf Zibis wrote: On 28.04.2011 21:56, Xueming Shen wrote: That said, you do have the point, we should do better even in malformed case, ... Yes, that's what I wanted to point out. But I thought you could go one step further, declaring bb as a member of UTF_8.Decoder.

Re: Codereview request: CR 7040220 java/char_encodin Optimize UTF-8 charset for String.getBytes()/toCharArray()

2011-04-28 Thread Ulf Zibis
On 28.04.2011 23:28, Xueming Shen wrote: On 04/28/2011 01:55 PM, Ulf Zibis wrote: On 28.04.2011 21:56, Xueming Shen wrote: That said, you do have the point, we should do better even in malformed case, ... Yes, that's what I wanted to point out. But I thought you could go one step further,