Status: New
Labels: Type-Defect Priority-Medium

New issue 198: Unnecessarily inefficient calculation of utf-8 encoded length

Version: 2.3.0


  public static int computeStringSizeNoTag(final String value) {
    try {
      final byte[] bytes = value.getBytes("UTF-8");
      return computeRawVarint32Size(bytes.length) +

To compute the length of the corresponding UTF-8 encoding, you don't have to encode the string and allocate a byte array. The following is enough:

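  public static int utf8len(String str) {
      int len = str.length();
      int utf8len = len;  // every char contributes at least one byte
      for (int i = 0; i < len; i++) {
          char c = str.charAt(i);
          if (c < 0x80) continue;  // ASCII: 1 byte, already counted

          int extra;
          if (c < 0x800) {
              extra = 1;  // 2-byte sequence
          } else if (Character.isHighSurrogate(c) && i + 1 < len
                  && Character.isLowSurrogate(str.charAt(i + 1))) {
              // A char is never >= 0x10000; supplementary code points
              // arrive as a surrogate pair: 2 chars encode as 4 bytes.
              extra = 2;
              i++;  // skip the low surrogate
          } else {
              extra = 2;  // 3-byte sequence
          }
          utf8len += extra;
      }
      return utf8len;
  }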

In the most common case of an ASCII-only string, it amounts to a single scan of the string with no allocation.
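
A rough sketch of how the size computation might use such a counter, and how it could be checked against the encoder. The class name Utf8LengthSketch and the methods utf8Length, varint32Size and stringSizeNoTag are made up for illustration; varint32Size is only a stand-in for the library's computeRawVarint32Size, and the sketch assumes the size is the varint length prefix plus the payload byte count. It also assumes well-formed UTF-16 input: an unpaired surrogate is counted as 3 bytes here, which may differ from what getBytes produces.

```java
import java.nio.charset.StandardCharsets;

class Utf8LengthSketch {
    // Counts UTF-8 bytes without allocating a byte array.
    // Assumption: input is well-formed UTF-16 (unpaired surrogates count as 3 bytes).
    static int utf8Length(String s) {
        int len = s.length();
        int bytes = len; // one byte per char, extras added below
        for (int i = 0; i < len; i++) {
            char c = s.charAt(i);
            if (c < 0x80) continue;      // ASCII
            if (c < 0x800) {
                bytes += 1;              // 2-byte sequence
            } else {
                bytes += 2;              // 3-byte sequence, or first half of a pair
                if (Character.isHighSurrogate(c) && i + 1 < len
                        && Character.isLowSurrogate(s.charAt(i + 1))) {
                    i++;                 // pair: 2 chars -> 4 bytes; skip the low surrogate
                }
            }
        }
        return bytes;
    }

    // Hypothetical stand-in for the library's varint size helper.
    static int varint32Size(int value) {
        if ((value & (0xffffffff << 7)) == 0) return 1;
        if ((value & (0xffffffff << 14)) == 0) return 2;
        if ((value & (0xffffffff << 21)) == 0) return 3;
        if ((value & (0xffffffff << 28)) == 0) return 4;
        return 5;
    }

    // Length prefix plus payload, computed without encoding the string.
    static int stringSizeNoTag(String value) {
        int n = utf8Length(value);
        return varint32Size(n) + n;
    }

    public static void main(String[] args) {
        String[] samples = {"", "hello", "h\u00e9llo", "\u20ac", "\ud83d\ude00"};
        for (String s : samples) {
            int expected = s.getBytes(StandardCharsets.UTF_8).length;
            System.out.println(utf8Length(s) + " vs getBytes: " + expected);
        }
    }
}
```

Comparing against getBytes on a handful of samples, as main does, is a cheap way to catch mistakes in the branch boundaries before relying on the faster path.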
