Re: [protobuf] Java UTF-8 encoding/decoding: possible performance improvements

2010-06-03 Thread Kenton Varda
Please don't use reflection to reach into private internals of classes you don't maintain. We have public and private for a reason. Furthermore, this access may throw a SecurityException if a SecurityManager is in use. On Mon, May 31, 2010 at 11:25 AM, David Dabbs dmda...@gmail.com wrote:

RE: [protobuf] Java UTF-8 encoding/decoding: possible performance improvements

2010-06-01 Thread David Dabbs
of strings to en/decode was small. Did the same hold true when using a ThreadLocal? David -Original Message- From: Evan Jones [mailto:ev...@mit.edu] Sent: Monday, May 31, 2010 4:32 PM To: David Dabbs Cc: Protocol Buffers Subject: Re: [protobuf] Java UTF-8 encoding/decoding: possible

Re: [protobuf] Java UTF-8 encoding/decoding: possible performance improvements

2010-06-01 Thread Evan Jones
On Jun 1, 2010, at 2:29 , David Dabbs wrote: Even with the extra call to access the offset, I would think there would be some advantage to not making the data copies, which generate garbage cruft. However, the way I am doing it doesn't generate any garbage: I keep a temporary char[]

RE: [protobuf] Java UTF-8 encoding/decoding: possible performance improvements

2010-05-31 Thread David Dabbs
The approach I found worked the best: 1. Copy the string into a pre-allocated and re-used char[] array. This is needed since the JDK does not permit access to the String's char[], to enforce immutability. This is a performance loss vs. the JDK, which can access the char[] directly
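
The first step described above (copying the string into a re-used char[] before encoding) might look roughly like the following. This is a minimal sketch, not the code from the actual patch: the class name ReusableUtf8Encoder, the initial buffer size of 256, and the use of CharsetEncoder for the encoding step are all assumptions made for illustration. StandardCharsets requires Java 7 or later; at the time of this thread, Charset.forName("UTF-8") would have been used instead.

```java
import java.nio.ByteBuffer;
import java.nio.CharBuffer;
import java.nio.charset.CharsetEncoder;
import java.nio.charset.CoderResult;
import java.nio.charset.StandardCharsets;
import java.util.Arrays;

// Hypothetical helper (not part of protobuf): encodes Strings to UTF-8
// while re-using one char[] scratch buffer across calls.
final class ReusableUtf8Encoder {
    // Scratch buffer that grows on demand and is kept between calls.
    private char[] chars = new char[256];
    private final CharsetEncoder encoder = StandardCharsets.UTF_8.newEncoder();

    byte[] encode(String s) {
        int len = s.length();
        if (chars.length < len) {
            chars = new char[Math.max(len, chars.length * 2)];
        }
        // The String's internal array is not accessible, so copy it out.
        s.getChars(0, len, chars, 0);

        // UTF-8 needs at most 3 bytes per UTF-16 char
        // (a surrogate pair is 2 chars and encodes to 4 bytes).
        ByteBuffer out = ByteBuffer.allocate(len * 3);
        encoder.reset();
        CoderResult result = encoder.encode(CharBuffer.wrap(chars, 0, len), out, true);
        if (result.isError()) {
            throw new IllegalArgumentException("cannot encode input: " + result);
        }
        encoder.flush(out);
        return Arrays.copyOf(out.array(), out.position());
    }
}
```

An instance holds mutable state, so it is not safe to share between threads; a caller would need one instance per thread (for example via a ThreadLocal) or external locking. The sketch still allocates the output array on every call, so it only illustrates the char[] re-use, not a fully garbage-free encoder.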

Re: [protobuf] Java UTF-8 encoding/decoding: possible performance improvements

2010-05-31 Thread Evan Jones
On May 31, 2010, at 14:25 , David Dabbs wrote: you may access a String's internals via reflection in a safe, albeit potentially implementation-specific way. See class code below. As long as your java.lang.String uses value for the char[] and offset for the storage offset, this should work. No

Re: [protobuf] Java UTF-8 encoding/decoding: possible performance improvements

2010-05-30 Thread Evan Jones
On May 18, 2010, at 0:33 , Kenton Varda wrote: What if you did a fast scan of the bytes first to see if any are non-ASCII? Maybe only do this fast scan if the data is short enough to fit in L1 cache? I didn't try this exact idea, but my guess is that it may not be a win: to get fast

Re: [protobuf] Java UTF-8 encoding/decoding: possible performance improvements

2010-05-18 Thread Christopher Smith
That seems simple enough and likely to produce a net win often enough. --Chris On May 17, 2010, at 9:33 PM, Kenton Varda ken...@google.com wrote: What if you did a fast scan of the bytes first to see if any are non-ASCII? Maybe only do this fast scan if the data is short enough to fit in L1

Re: [protobuf] Java UTF-8 encoding/decoding: possible performance improvements

2010-05-17 Thread Christopher Smith
This does somewhat suggest that it might be worthwhile specifically tagging a field as ASCII only. There are enough cases of this that it could be a huge win. On 5/17/10, Evan Jones ev...@mit.edu wrote: On May 17, 2010, at 15:38 , Kenton Varda wrote: I see. So in fact your code is quite

Re: [protobuf] Java UTF-8 encoding/decoding: possible performance improvements

2010-05-17 Thread Kenton Varda
What if you did a fast scan of the bytes first to see if any are non-ASCII? Maybe only do this fast scan if the data is short enough to fit in L1 cache? On Mon, May 17, 2010 at 7:59 PM, Christopher Smith cbsm...@gmail.com wrote: This does somewhat suggest that it might be worthwhile
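
The fast-scan idea proposed here could be sketched as follows. This is only an illustration of the suggestion, not code from the thread or from protobuf: the class name AsciiFastPathDecoder and the FAST_SCAN_LIMIT threshold (standing in for "short enough to fit in L1 cache") are hypothetical, and whether the extra pass is actually a win is exactly what the thread leaves open.

```java
import java.nio.charset.StandardCharsets;

// Hypothetical helper (not part of protobuf): decodes UTF-8 bytes,
// taking a cheaper path when a pre-scan finds only ASCII.
final class AsciiFastPathDecoder {
    // Assumed cut-off for the pre-scan; the value is a placeholder,
    // not a measured L1 cache size.
    static final int FAST_SCAN_LIMIT = 4096;

    // Returns true if every byte in the range has its high bit clear.
    static boolean isAscii(byte[] bytes, int offset, int length) {
        for (int i = offset; i < offset + length; i++) {
            if (bytes[i] < 0) { // Java bytes are signed: high bit set means negative
                return false;
            }
        }
        return true;
    }

    static String decode(byte[] bytes, int offset, int length) {
        if (length <= FAST_SCAN_LIMIT && isAscii(bytes, offset, length)) {
            // ASCII is a subset of ISO-8859-1, which maps each byte
            // directly to one char with no multi-byte handling.
            return new String(bytes, offset, length, StandardCharsets.ISO_8859_1);
        }
        // General path: full UTF-8 decoding.
        return new String(bytes, offset, length, StandardCharsets.UTF_8);
    }
}
```

Both paths return the same result for valid input; the pre-scan only changes which decoder does the work, so any benefit would have to be confirmed by benchmarking on representative data.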

Re: [protobuf] Java UTF-8 encoding/decoding: possible performance improvements

2010-01-19 Thread Kenton Varda
I think the 30-80% speed boost would be well worth the extra code size / memory overhead. Please send me the patch! On Sun, Jan 17, 2010 at 9:33 AM, Evan Jones ev...@mit.edu wrote: I've implemented a rough prototype of an optimization to the Java implementation that serializes Java strings

Re: [protobuf] Java UTF-8 encoding/decoding: possible performance improvements

2009-12-23 Thread Kenton Varda
On Wed, Dec 23, 2009 at 7:44 AM, Evan Jones ev...@mit.edu wrote: On Dec 22, 2009, at 19:59 , Kenton Varda wrote: I wonder if we can safely discard the cached byte array during serialization on the assumption that most messages are serialized only once? This is a good idea, and it seems to

Re: [protobuf] Java UTF-8 encoding/decoding: possible performance improvements

2009-12-22 Thread David Yu
On Wed, Dec 23, 2009 at 1:26 AM, Evan Jones ev...@mit.edu wrote: I've done some quick and dirty benchmarking of Java string encoding/decoding to/from UTF-8 for an unrelated project, but I've realized that these performance improvements could be added to protobufs. The easy way to do UTF-8

Re: [protobuf] Java UTF-8 encoding/decoding: possible performance improvements

2009-12-22 Thread Kenton Varda
On Tue, Dec 22, 2009 at 7:06 PM, David Yu david.yu@gmail.com wrote: There should be a writeByteArray(int fieldNumber, byte[] value) in CodedOutputStream so that the cached bytes of strings would be written directly. The ByteString would not help, it adds more memory since it creates a

Re: [protobuf] Java UTF-8 encoding/decoding: possible performance improvements

2009-12-22 Thread Kenton Varda
On Tue, Dec 22, 2009 at 8:18 PM, David Yu david.yu@gmail.com wrote: On Wed, Dec 23, 2009 at 11:14 AM, Kenton Varda ken...@google.com wrote: On Tue, Dec 22, 2009 at 7:06 PM, David Yu david.yu@gmail.com wrote: There should be a writeByteArray(int fieldNumber, byte[] value) in

Re: [protobuf] Java UTF-8 encoding/decoding: possible performance improvements

2009-12-22 Thread David Yu
On Wed, Dec 23, 2009 at 12:21 PM, Kenton Varda ken...@google.com wrote: On Tue, Dec 22, 2009 at 8:18 PM, David Yu david.yu@gmail.com wrote: On Wed, Dec 23, 2009 at 11:14 AM, Kenton Varda ken...@google.com wrote: On Tue, Dec 22, 2009 at 7:06 PM, David Yu david.yu@gmail.com wrote: