Re: [protobuf] Re: Java UTF-8 encoding/decoding: possible performance improvements

2010-05-17 Thread Kenton Varda
I see. So in fact your code is quite possibly slower in non-ASCII cases? In fact, it sounds like having even one non-ASCII character would force extra copies to occur, which I would guess would defeat the benefit, but we'd need benchmarks to tell for sure. On Fri, May 7, 2010 at 6:21 PM, Evan
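
To make the extra-copy concern concrete, here is a minimal sketch of an ASCII fast path for UTF-8 encoding. It is an illustration only, not the patch under discussion. The class name `AsciiFastPathEncoder` is hypothetical. The sketch assumes the fast path falls back to `String.getBytes` as soon as it sees a non-ASCII char, which is one possible design. With that fallback, a single non-ASCII character discards the partially filled array and pays for a second full encode.

```java
import java.nio.charset.StandardCharsets;

// Hypothetical helper, for illustration only.
final class AsciiFastPathEncoder {
  private AsciiFastPathEncoder() {}

  /**
   * Encodes s as UTF-8. While every char is ASCII (< 0x80), each char maps to
   * exactly one byte, so the output can be sized up front and filled directly.
   */
  static byte[] encode(String s) {
    byte[] out = new byte[s.length()];
    for (int i = 0; i < s.length(); i++) {
      char c = s.charAt(i);
      if (c >= 0x80) {
        // Non-ASCII: the bytes written so far are wasted, and the JDK
        // encoder re-encodes the whole string (the extra copy in question).
        return s.getBytes(StandardCharsets.UTF_8);
      }
      out[i] = (byte) c;
    }
    return out;
  }

  public static void main(String[] args) {
    System.out.println(encode("hello").length);      // 5
    System.out.println(encode("h\u00e9llo").length); // 6 ("\u00e9" takes 2 bytes)
  }
}
```

Whether the fast path wins overall depends on how often real inputs contain non-ASCII text. That is exactly the question a benchmark would have to answer.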

2010-05-17 Thread Evan Jones
On May 17, 2010, at 15:38 , Kenton Varda wrote: I see. So in fact your code is quite possibly slower in non-ASCII cases? In fact, it sounds like having even one non-ASCII character would force extra copies to occur, which I would guess would defeat the benefit, but we'd need benchmarks to

2010-05-07 Thread Kenton Varda
Yeah I don't think we should add a way to inject decoders into ByteString... I'd be very interested to hear why the JDK is not optimal here. On Mon, May 3, 2010 at 6:16 PM, Evan Jones ev...@mit.edu wrote: On May 3, 2010, at 21:11 , Evan Jones wrote: Yes, I actually changed ByteString, since
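
The decoding side of the subject line can take a similar fast path without injecting anything into ByteString. The following is a minimal sketch. The class name `AsciiFastPathDecoder` is hypothetical, and this is not presented as what the thread adopted. The sketch assumes the input is a plain `byte[]`. If every byte has its high bit clear, each byte is one char. Otherwise the whole input is handed to the JDK decoder.

```java
import java.nio.charset.StandardCharsets;

// Hypothetical helper, for illustration only.
final class AsciiFastPathDecoder {
  private AsciiFastPathDecoder() {}

  /** Decodes UTF-8 bytes, using a direct byte-to-char loop when all bytes are ASCII. */
  static String decode(byte[] bytes) {
    char[] chars = new char[bytes.length];
    for (int i = 0; i < bytes.length; i++) {
      byte b = bytes[i];
      if (b < 0) {
        // High bit set (>= 0x80 as unsigned): a multi-byte sequence starts here,
        // so let the JDK decoder handle the whole input.
        return new String(bytes, StandardCharsets.UTF_8);
      }
      chars[i] = (char) b;
    }
    // Note: new String(char[]) still copies the array; there is no public
    // zero-copy String constructor.
    return new String(chars);
  }
}
```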

2010-05-07 Thread Evan Jones
On May 7, 2010, at 18:54 , Kenton Varda wrote: I'd be very interested to hear why the JDK is not optimal here. I dug into this. I *think* the problem is that the JDK ends up allocating a huge temporary array for the UTF-8 data. Hence, the garbage collection cost is higher for the JDK's
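
One general way to avoid a large per-call temporary array is to reuse a scratch buffer across calls. The buffer is sized to a worst-case bound from `CharsetEncoder.maxBytesPerChar()`, and only an exact-size result is copied out. The sketch below is an assumption-labeled illustration, not the JDK's internals or the change discussed here. The class name `ReusableUtf8Encoder` is hypothetical. The class is not thread-safe, so it assumes one instance per thread (for example via a `ThreadLocal`).

```java
import java.nio.ByteBuffer;
import java.nio.CharBuffer;
import java.nio.charset.CharsetEncoder;
import java.nio.charset.CoderResult;
import java.nio.charset.CodingErrorAction;
import java.nio.charset.StandardCharsets;

// Hypothetical helper, for illustration only. Not thread-safe: use one per thread.
final class ReusableUtf8Encoder {
  // REPLACE mirrors String.getBytes, which substitutes bad input instead of throwing.
  private final CharsetEncoder encoder = StandardCharsets.UTF_8.newEncoder()
      .onMalformedInput(CodingErrorAction.REPLACE)
      .onUnmappableCharacter(CodingErrorAction.REPLACE);
  private ByteBuffer scratch = ByteBuffer.allocate(256);

  byte[] encode(String s) {
    // Grow the scratch buffer only when a longer string arrives, so large
    // allocations happen rarely instead of on every call.
    int worstCase = (int) Math.ceil(s.length() * (double) encoder.maxBytesPerChar());
    if (scratch.capacity() < worstCase) {
      scratch = ByteBuffer.allocate(worstCase);
    }
    scratch.clear();
    encoder.reset();
    CoderResult result = encoder.encode(CharBuffer.wrap(s), scratch, true);
    if (result.isUnderflow()) {
      result = encoder.flush(scratch);
    }
    if (!result.isUnderflow()) {
      // Should not happen: the buffer is worst-case sized and errors are replaced.
      throw new IllegalStateException("unexpected coder result: " + result);
    }
    scratch.flip();
    byte[] out = new byte[scratch.remaining()]; // only the exact-size copy escapes
    scratch.get(out);
    return out;
  }
}
```

The trade-off is that the scratch buffer stays alive between calls. This design exchanges steady memory use for less garbage per encode.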

2010-05-03 Thread Kenton Varda
Interesting. Since this seems like a JVM implementation issue, I wonder if the results are different on Dalvik (Android). Also, the extra code sounds undesirable for lite mode, but my guess is that you had to place this code inside CodedOutputStream which is shared by lite mode. So yeah, there
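
Checking whether the results differ between HotSpot and Dalvik needs a harness that runs on both. Below is a deliberately crude `System.nanoTime` sketch. The names `Utf8Bench` and `Encoder` are hypothetical. It assumes a warm-up phase is enough for the JIT to settle. On HotSpot, a dedicated harness such as JMH gives far more trustworthy numbers than a hand-rolled loop like this.

```java
import java.nio.charset.StandardCharsets;

// Hypothetical micro-benchmark harness, for illustration only.
final class Utf8Bench {
  interface Encoder { byte[] encode(String s); }

  // Written after timing so the JIT cannot treat the encode calls as dead code.
  static volatile long blackhole;

  /** Returns the average nanoseconds per encode call, measured after a warm-up phase. */
  static double nanosPerCall(Encoder enc, String input, int warmup, int iterations) {
    long sink = 0;
    for (int i = 0; i < warmup; i++) sink += enc.encode(input).length;
    long start = System.nanoTime();
    for (int i = 0; i < iterations; i++) sink += enc.encode(input).length;
    long elapsed = System.nanoTime() - start;
    blackhole = sink;
    return (double) elapsed / iterations;
  }

  public static void main(String[] args) {
    Encoder jdk = s -> s.getBytes(StandardCharsets.UTF_8);
    String ascii = "the quick brown fox jumps over the lazy dog";
    String mixed = "the quick br\u00f6wn fox jumps over the lazy dog";
    System.out.printf("jdk, ascii: %.1f ns/call%n", nanosPerCall(jdk, ascii, 10_000, 100_000));
    System.out.printf("jdk, mixed: %.1f ns/call%n", nanosPerCall(jdk, mixed, 10_000, 100_000));
  }
}
```

A candidate encoder can be passed as another `Encoder` and timed on the same ASCII and mixed inputs. Results from this sketch are only indicative.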

2010-05-03 Thread Evan Jones
On May 3, 2010, at 21:11 , Evan Jones wrote: Yes, I actually changed ByteString, since ByteString.copyFromUtf8 is how protocol buffers get UTF-8 encoded strings at this point. Although now that I think about it, I think it might be possible to enable this only for SPEED messages, if that