Re: [protobuf] EnumValueDescriptor doesn't provide toString()?
Right, because none of the other field value types are descriptors. I see your point -- since getField() returns an Object, it would certainly be nice to be able to call toString() on it without knowing the type. But it's also important that EnumValueDescriptor be consistent with the other descriptor classes, so we want to be careful not to break that consistency. Instead of calling toString(), you could call TextFormat.printFieldToString() to get a string representation of the field, although it will include the field name.

On Mon, May 10, 2010 at 9:17 PM, Christopher Smith cbsm...@gmail.com wrote:
Actually, toString() seems to work for me for every other value I get from a dynamic message *except* enums. --Chris

On May 10, 2010, at 8:32 PM, Kenton Varda ken...@google.com wrote:
I don't think we should add toString() to any of the descriptor classes unless we are going to implement it for *all* of them in some consistent way. If we fill them in ad hoc then they may be inconsistent, and we may not be able to change them to make them consistent without breaking users.

On Mon, May 10, 2010 at 9:49 AM, Christopher Smith cbsm...@gmail.com wrote:
I noticed EnumValueDescriptor uses the default toString() method. Why not override it to call getFullName()? --Chris

-- You received this message because you are subscribed to the Google Groups Protocol Buffers group. To post to this group, send email to protobuf@googlegroups.com. To unsubscribe from this group, send email to protobuf+unsubscr...@googlegroups.com. For more options, visit this group at http://groups.google.com/group/protobuf?hl=en.
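The behavior Christopher is asking for can be sketched with a toy Python stand-in for the descriptor class (the class and names here are hypothetical illustrations, not the real Java com.google.protobuf API):

```python
class EnumValueDescriptor:
    """Toy stand-in for the Java descriptor class discussed above."""

    def __init__(self, full_name: str):
        self.full_name = full_name

    # The Python equivalent of overriding toString() to call getFullName():
    def __str__(self) -> str:
        return self.full_name


value = EnumValueDescriptor("my.package.Color.RED")
```

With such an override, printing the object yields "my.package.Color.RED" instead of a default object identity string, which is exactly the dynamic-message convenience being debated.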
Re: [protobuf] Re: Java UTF-8 encoding/decoding: possible performance improvements
I see. So in fact your code is quite possibly slower in non-ASCII cases? In fact, it sounds like having even one non-ASCII character would force extra copies to occur, which I would guess would defeat the benefit, but we'd need benchmarks to tell for sure.

On Fri, May 7, 2010 at 6:21 PM, Evan Jones ev...@mit.edu wrote:
On May 7, 2010, at 18:54, Kenton Varda wrote: I'd be very interested to hear why the JDK is not optimal here.

I dug into this. I *think* the problem is that the JDK ends up allocating a huge temporary array for the UTF-8 data, so the garbage collection cost is higher for the JDK's implementation than for mine. Basically the code does this:

* allocate a new byte[] array that is string length * max bytes per character (= 4 for the UTF-8 encoder)
* use the java.nio.charset.CharsetEncoder to encode the char[] into the byte[] (wrapped in CharBuffer / ByteBuffer)
* copy the exact number of bytes out of the byte[] into a new byte[], and return that

The only trick the JDK gets to use that normal Java code can't is that it can access the string's char[] buffer directly, whereas I need to copy it out into a char[] array. Hence, I think what is happening is that the JDK allocates 4-5 times as much memory per encode as I do. In the cases where the data is ASCII, my code is faster, since it allocates exactly the right amount of space and doesn't need an extra copy. When the data is not ASCII, my code may still be faster, since it doesn't overallocate quite as much (in exchange, my code does many copies).

Conclusion: there is a legitimate reason for this code to be faster than the JDK's code. But it still may not be worth including this patch in the mainline protocol buffer code.

Evan -- Evan Jones http://evanjones.ca/
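A rough Python model of the JDK strategy Evan describes — worst-case buffer, encode, then copy the exact bytes out. The buffer arithmetic is the point; Python's own encoder merely stands in for the CharsetEncoder pass, so this is a sketch of the allocation pattern, not of the JDK internals:

```python
def encode_like_jdk(s: str) -> bytes:
    # Step 1: allocate a worst-case buffer, 4 bytes per char for UTF-8,
    # exactly as in the three steps listed above.
    buf = bytearray(4 * len(s))
    # Step 2: encode into the buffer (stand-in for CharsetEncoder).
    encoded = s.encode("utf-8")
    buf[:len(encoded)] = encoded
    # Step 3: copy the exact number of bytes out into a fresh buffer.
    return bytes(buf[:len(encoded)])
```

For ASCII input the worst-case buffer is four times larger than needed, which is the overallocation (and extra garbage-collection pressure) Evan is measuring against.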
Re: [protobuf] EnumValueDescriptor doesn't provide toString()?
I grok the problem now. This is the only descriptor that is also a value. Probably should have a method/visitor specifically for getting the value string of an object that isn't implemented by descriptors. --Chris
[protobuf] Re: Issue 188 in protobuf: protobuf fails to link after compiling with LDFLAGS=-Wl,--as-needed because of missing -lpthread
Updates:
    Status: NeedPatchFromUser

Comment #2 on issue 188 by ken...@google.com: protobuf fails to link after compiling with LDFLAGS=-Wl,--as-needed because of missing -lpthread
http://code.google.com/p/protobuf/issues/detail?id=188

We can't just switch the order, because if -pthread exists as a compiler flag, then it is essential that we use it. Just -lpthread is not good enough, because -pthread tells GCC to output thread-safe code. It sounds like acx_pthread.m4 may need to be refactored somewhat to get this right. Please feel free to submit a patch.
[protobuf] Re: Issue 187 in protobuf: Command-line argument to override the optimize_for option
Comment #4 on issue 187 by ken...@google.com: Command-line argument to override the optimize_for option
http://code.google.com/p/protobuf/issues/detail?id=187

I agree, we should be able to override options on the command line. The only problem is that it's unclear how far this support needs to go. Should you be able to override message-level and field-level options, or just file-level? Should an override apply to an individual file, or should it apply to all the files it imports, too? In the case of optimize_for, we'd probably want the override to apply to imports, but for something like java_package we probably don't.
[protobuf] Re: Issue 188 in protobuf: protobuf fails to link after compiling with LDFLAGS=-Wl,--as-needed because of missing -lpthread
Comment #3 on issue 188 by xarthisius.kk: protobuf fails to link after compiling with LDFLAGS=-Wl,--as-needed because of missing -lpthread
http://code.google.com/p/protobuf/issues/detail?id=188

On an x86 Linux machine, -pthread does two things:
1. defines _REENTRANT for the preprocessor during compiling
2. adds -lpthread when passed during linking

It does no other magic, and quoting man gcc: ... This option does not affect the thread safety of object code produced by the compiler or that of libraries supplied with it. Best regards, Kacper Kowalik
[protobuf] Re: Issue 59 in protobuf: Add another option to support java_implement_interface
Comment #13 on issue 59 by aantono: Add another option to support java_implement_interface
http://code.google.com/p/protobuf/issues/detail?id=59

Just an FYI: as it's mentioned in issue 82, there is already a set of formatters for JSON, XML, etc., as part of the http://code.google.com/p/protobuf-java-format/ project. I've been toying around with the idea of making a common interface that they would all implement, so maybe then we could enhance the code generation part to accept any formatter/codec class coded to a well-known interface.
[protobuf] systesting custom utf8 validation on remote c++ node using protocol buffers from python
Hello, (I submitted this already via the protobuf google group web form, but I think I screwed up. If not, sorry for the double post) I have a C++-based server using protocol buffers as the IDL, and I'm trying to ensure that it rejects invalid UTF-8 strings. My systest library is written in Python. The C++ protocol buffer library does not seem to do any UTF-8 string checking on string types, whereas the Python library does. So I added some UTF-8 validation testing to the C++ server-side and I want to check that it works (in case a C++ client sends invalid UTF-8). Whenever I inject invalid UTF-8 into the Python systests to make sure the server rejects the string, the Python library complains. Is there a way to override this behavior? I don't want to change my protocol buffer definitions to be the bytes type, because these really should be strings, and the Python library is doing exactly what I want for the general case. -JT
[protobuf] Protocol buffers and large data sets
I wanted to get some opinions on large data sets and protocol buffers. The Protocol Buffers project page by Google says that for data over 1 megabyte, one should consider something different, but it doesn't mention what would happen if one crosses this limit. Are there any known failure modes when it comes to large data sets? What are your observations and recommendations from your experience on this front?
[protobuf] systesting utf8 validation on remote node using protocol buffers from python
Hello, I have a C++-based server using protocol buffers as the IDL, and I'm trying to ensure that it rejects invalid UTF-8 strings. My systest library is written in Python. The C++ protocol buffer library does not seem to do any UTF-8 string checking on string types, whereas the Python library does. Whenever I inject invalid UTF-8 into the Python systests to make sure the server rejects the string, the Python library complains. Is there a way to override this behavior? I don't want to change my protocol buffer definitions to be the bytes type, because these really should be strings, and the Python library is doing exactly what I want for the general case. -JT
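The validation JT is running into is a strict UTF-8 decode of string-typed fields. The kind of check involved can be demonstrated with plain Python, no protobuf library required (this sketches the check itself, not the protobuf internals):

```python
def is_valid_utf8(payload: bytes) -> bool:
    """True if payload decodes as strict UTF-8 -- the property the
    Python protobuf library enforces on string-typed fields."""
    try:
        payload.decode("utf-8")
        return True
    except UnicodeDecodeError:
        return False


# 0xC3 starts a two-byte sequence, but 0x28 ('(') is not a valid
# continuation byte, so this payload is rejected.
assert not is_valid_utf8(b"\xc3\x28")
```

This is why the systest can't smuggle arbitrary bytes through a string field: the library decodes (and so validates) the value before it ever reaches the wire.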
Re: [protobuf] systesting custom utf8 validation on remote c++ node using protocol buffers from python
If you compile with the macro GOOGLE_PROTOBUF_UTF8_VALIDATION_ENABLED defined, the C++ code will do UTF-8 validation. However, it doesn't prevent the data from serializing or parsing; it will simply log an error message. How would you like it to fail?
Re: [protobuf] Protocol buffers and large data sets
There is a default byte size limit of 64MB when parsing protocol buffers - if a message is larger than that, it will fail to parse. This can be configured if you really need to parse larger messages, but it is generally not recommended. Additionally, ByteSize() returns a 32-bit integer, so there's an implicit limit on the size of data that can be serialized. You can certainly use protocol buffers in large data sets, but it's not recommended to have your entire data set be represented by a single message. Instead, see if you can break it up into smaller messages.
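Breaking a large data set into smaller messages usually means streaming them with a length prefix so each one can be parsed independently. A minimal sketch of that framing in Python — 4-byte big-endian prefixes here for simplicity; real protobuf streams commonly use varint lengths instead:

```python
import struct


def write_frames(messages):
    """Concatenate serialized messages, each prefixed with its length."""
    out = bytearray()
    for msg in messages:
        out += struct.pack(">I", len(msg))  # 4-byte length prefix
        out += msg
    return bytes(out)


def read_frames(buf):
    """Split a framed stream back into the individual messages."""
    msgs, off = [], 0
    while off < len(buf):
        (length,) = struct.unpack_from(">I", buf, off)
        off += 4
        msgs.append(buf[off:off + length])
        off += length
    return msgs
```

Each frame stays well under the parser's size limit, and a reader only ever holds one message in memory at a time instead of the whole data set.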
Re: [protobuf] Re: Java UTF-8 encoding/decoding: possible performance improvements
On May 17, 2010, at 15:38, Kenton Varda wrote: I see. So in fact your code is quite possibly slower in non-ASCII cases? In fact, it sounds like having even one non-ASCII character would force extra copies to occur, which I would guess would defeat the benefit, but we'd need benchmarks to tell for sure.

Yes. I've been playing with this a bit in my spare time since the last email, but I don't have any results I'm happy with yet. Rough notes:

* Encoding is (quite a bit?) faster than String.getBytes() if you assume one byte per character.
* If you guess the number of bytes per character poorly and have to do multiple allocations and copies, the regular Java version will win. If you get it right (even if you first guess 1 byte per character) it looks like it can be slightly faster or on par with the Java version.
* Re-using a temporary byte[] for string encoding may be faster than String.getBytes(), which effectively allocates a temporary byte[] each time.

I'm going to try to rework my code with a slightly different policy:

a) Assume 1 byte per character and attempt the encode. If we run out of space:
b) Use a shared temporary buffer and continue the encode. If we run out of space:
c) Allocate a worst-case 4-byte-per-character buffer and finish the encode.

This should be much better than the JDK version for ASCII, a bit better for short strings that fit in the shared temporary buffer, and not significantly worse for the rest, but I'll need to test it to be sure. This is sort of just a fun experiment for me at this point, so who knows when I may get around to actually finishing this. Evan -- Evan Jones http://evanjones.ca/
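The staged policy above can be caricatured in a few lines of Python. The real patch manages byte[] buffers by hand in Java, but the control flow is the same idea: take the cheap one-byte-per-character path optimistically and fall back only when it fails:

```python
def encode_staged(s: str) -> bytes:
    # Stage (a): assume one byte per character (pure ASCII).
    try:
        return s.encode("ascii")
    except UnicodeEncodeError:
        # Stages (b)/(c), collapsed here: fall back to a full UTF-8
        # encode. The actual patch would first continue into a shared
        # temporary buffer, then into a worst-case 4-bytes-per-char
        # allocation, before giving up and copying.
        return s.encode("utf-8")
```

The win comes from the common case: for ASCII strings the first stage allocates exactly the right amount of space and never copies.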
Re: [protobuf] systesting custom utf8 validation on remote c++ node using protocol buffers from python
Okay, well it's slightly more complicated. My C++ application needs to accept the technically invalid code points U+ and U+FFFE. Otherwise, I need my server application to know when invalid UTF-8 has happened. That's all fine; I have that implemented. The problem is that I want to exercise that behavior from my Python systest framework, and the Python libs are trying to be too helpful. While I normally want them to do UTF-8 validation, I *don't* want them to during the systests, because I want to send bad UTF-8 to the server. Make sense? I'm trying to do bad things to make sure stuff still works in a systest environment. -JT
Re: [protobuf] systesting custom utf8 validation on remote c++ node using protocol buffers from python
It looks like I figured out a solution, though I'm not sure this is the best way. I have:

    pbuf = MyProtoBuf()
    # to make sure pbuf initialization stuff works (sets _has_string_field, etc.)
    pbuf.string_field = ""
    pbuf._value_string_field = "bad utf8"
    f = pbuf.DESCRIPTOR.fields_by_number[pbuf.STRING_FIELD_NUMBER]
    f.type = f.TYPE_BYTES

-JT
Re: [protobuf] Java UTF-8 encoding/decoding: possible performance improvements
This does somewhat suggest that it might be worthwhile to specifically tag a field as ASCII-only. There are enough cases of this that it could be a huge win.

-- Sent from my mobile device
Chris
Re: [protobuf] Java UTF-8 encoding/decoding: possible performance improvements
What if you did a fast scan of the bytes first to see if any are non-ASCII? Maybe only do this fast scan if the data is short enough to fit in L1 cache?
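The fast-scan suggestion amounts to one cheap read-only pass before choosing an encode path. A Python sketch of the idea (the L1-cache size heuristic is left out, and Python's encoder stands in for the hand-written Java one):

```python
def is_ascii(s: str) -> bool:
    # Fast pre-scan: a single read-only pass, no allocation.
    return all(ord(ch) < 128 for ch in s)


def encode_with_prescan(s: str) -> bytes:
    if is_ascii(s):
        # Safe to take the exact-size one-byte-per-char path:
        # no risk of running out of space mid-encode.
        return s.encode("ascii")
    # Otherwise pay for the general UTF-8 path up front,
    # avoiding the failed-attempt-then-retry copies.
    return s.encode("utf-8")
```

Compared with the attempt-and-fall-back policy, the pre-scan trades one extra pass over the data for never having to abandon a partially filled buffer.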