Status: New
Owner: ken...@google.com
Labels: Type-Defect Priority-Medium

New issue 198 by sri...@malhar.net: Unnecessarily inefficient calculation of utf-8 encoded length
http://code.google.com/p/protobuf/issues/detail?id=198

Version: 2.3.0

File: CodedOutputStream.java

  public static int computeStringSizeNoTag(final String value) {
    try {
      final byte[] bytes = value.getBytes("UTF-8");
      return computeRawVarint32Size(bytes.length) +
             bytes.length;
    ...

To compute the length of the corresponding UTF-8 encoding, you don't need to encode the string and allocate a byte array. The following is enough:

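  public static int utf8len(String str) {
      int len = str.length();
      int utf8len = len;  // one byte per char, which is exact for pure ASCII
      for (int i = 0; i < len; i++) {
          char c = str.charAt(i);
          if (c < 0x80) continue;  // ASCII: already counted

          if (c < 0x800) {
              utf8len += 1;  // 2-byte sequence
          } else if (Character.isHighSurrogate(c) && i + 1 < len
                  && Character.isLowSurrogate(str.charAt(i + 1))) {
              // A surrogate pair is 2 chars but encodes to 4 bytes, not 6.
              utf8len += 2;
              i++;  // skip the low surrogate
          } else {
              // 3-byte sequence. Lone surrogates also land here; getBytes()
              // typically substitutes a single '?' for them instead.
              utf8len += 2;
          }
      }
      return utf8len;
  }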

In the most common case, where the string is pure ASCII, this amounts to a single scan of the string with no allocation.
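
For illustration, here is a minimal sketch of how the size computation could use such a scan instead of getBytes(). The class Utf8Size and the methods utf8Length, varint32Size and stringSizeNoTag are hypothetical names made up for this example, not part of the protobuf API; varint32Size is only a stand-in for computeRawVarint32Size. The sketch assumes that a lone surrogate should count as one byte, on the assumption that the JDK encoder replaces it with '?'.

```java
import java.nio.charset.StandardCharsets;

/** Sketch only: all names in this class are hypothetical, not protobuf API. */
final class Utf8Size {

    /** Number of bytes str occupies in UTF-8, computed without encoding it. */
    static int utf8Length(String str) {
        int len = str.length();
        int bytes = len; // one byte per UTF-16 code unit, exact for ASCII
        for (int i = 0; i < len; i++) {
            char c = str.charAt(i);
            if (c < 0x80) {
                continue; // ASCII: already counted
            }
            if (c < 0x800) {
                bytes += 1; // two-byte sequence
            } else if (Character.isHighSurrogate(c) && i + 1 < len
                    && Character.isLowSurrogate(str.charAt(i + 1))) {
                bytes += 2; // surrogate pair: two code units, four bytes
                i++;        // skip the low surrogate
            } else if (Character.isHighSurrogate(c) || Character.isLowSurrogate(c)) {
                // Lone surrogate. Assumption: the encoder substitutes a
                // one-byte '?', so nothing is added to the count.
            } else {
                bytes += 2; // three-byte sequence
            }
        }
        return bytes;
    }

    /** Bytes needed to write value as a base-128 varint. */
    static int varint32Size(int value) {
        int size = 1;
        while ((value & ~0x7F) != 0) {
            value >>>= 7;
            size++;
        }
        return size;
    }

    /** Length prefix plus payload, with no intermediate byte array. */
    static int stringSizeNoTag(String value) {
        int n = utf8Length(value);
        return varint32Size(n) + n;
    }

    public static void main(String[] args) {
        String[] samples = {"", "hello", "caf\u00e9", "\u20ac", "\ud83d\ude00"};
        for (String s : samples) {
            int encoded = s.getBytes(StandardCharsets.UTF_8).length;
            System.out.println(utf8Length(s) + " (scan) vs " + encoded + " (getBytes)");
        }
    }
}
```

Running main prints the scanned length next to the length reported by getBytes() for each sample, which is a quick way to check that the two agree on ASCII, two-byte, three-byte and supplementary characters.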


