Status: New
Owner: ken...@google.com
Labels: Type-Defect Priority-Medium
New issue 198 by sri...@malhar.net: Unnecessarily inefficient calculation
of UTF-8 encoded length
http://code.google.com/p/protobuf/issues/detail?id=198
Version: 2.3.0
File: CodedOutputStream.java
public static int computeStringSizeNoTag(final String value) {
  try {
    final byte[] bytes = value.getBytes("UTF-8");
    return computeRawVarint32Size(bytes.length) +
           bytes.length;
    ...
In order to compute the length of the corresponding utf-8 encoding, you
don't have to encode the string and create an array. The following is
enough:
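public static int utf8len(String str) {
  int len = str.length();
  int utf8len = len;  // one byte per char if the string is pure ASCII
  for (int i = 0; i < len; i++) {
    char c = str.charAt(i);
    if (c < 0x80) continue;  // ASCII: already counted
    if (c < 0x800) {
      utf8len += 1;  // two-byte sequence
    } else if (Character.isHighSurrogate(c) && i + 1 < len
        && Character.isLowSurrogate(str.charAt(i + 1))) {
      // A surrogate pair is one code point >= 0x10000: four bytes
      // for two chars, so only two extra bytes in total.
      utf8len += 2;
      i++;  // skip the low surrogate
    } else {
      // Three-byte sequence. An unpaired surrogate also lands here;
      // the encoder may emit something different for it.
      utf8len += 2;
    }
  }
  return utf8len;
}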
In the most common case of the string being ASCII, it amounts to a single
scan of the string, with no byte array allocated.
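
One way to sanity-check a length routine like this is to compare it with the
length of the bytes the encoder actually produces. The following is only a
sketch: the class name Utf8LengthCheck and the method utf8Length are
hypothetical and not part of protobuf. It counts a surrogate pair as one
four-byte code point and assumes the input contains no unpaired surrogates.

```java
import java.nio.charset.StandardCharsets;

// Hypothetical helper class for illustration; not part of protobuf.
final class Utf8LengthCheck {

  // Counts UTF-8 bytes without encoding. Assumes well-formed UTF-16 input
  // (no unpaired surrogates).
  static int utf8Length(String s) {
    int len = s.length();
    int bytes = len;  // one byte per char if everything is ASCII
    for (int i = 0; i < len; i++) {
      char c = s.charAt(i);
      if (c < 0x80) {
        continue;     // ASCII, already counted
      } else if (c < 0x800) {
        bytes += 1;   // two-byte sequence
      } else if (Character.isHighSurrogate(c) && i + 1 < len
          && Character.isLowSurrogate(s.charAt(i + 1))) {
        bytes += 2;   // two chars encode to four bytes
        i++;          // skip the low surrogate
      } else {
        bytes += 2;   // three-byte sequence
      }
    }
    return bytes;
  }

  public static void main(String[] args) {
    // Empty, ASCII, two-byte, three-byte, and supplementary samples.
    String[] samples = {"", "hello", "caf\u00e9", "\u20ac100", "\uD83D\uDE00"};
    for (String s : samples) {
      int expected = s.getBytes(StandardCharsets.UTF_8).length;
      int actual = utf8Length(s);
      if (expected != actual) {
        throw new AssertionError("mismatch: " + expected + " vs " + actual);
      }
    }
    System.out.println("all samples match");
  }
}
```

The comparison against getBytes is only for testing; the point of the
counting loop is that production code would not need the byte array at all.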