Tie Liu created AVRO-1411:
-----------------------------

             Summary: org.apache.avro.util.Utf8 performance improvement by 
removing the private Charset in the class
                 Key: AVRO-1411
                 URL: https://issues.apache.org/jira/browse/AVRO-1411
             Project: Avro
          Issue Type: Improvement
          Components: java
    Affects Versions: 1.7.5
            Reporter: Tie Liu
            Priority: Minor


The org.apache.avro.util.Utf8 class has a private static field defined 
as: private static final Charset UTF8 = Charset.forName("UTF-8");

and it's used as:
  public static final byte[] getBytesFor(String str) {
    return str.getBytes(UTF8);
  }

I guess the intention behind this field is to avoid repeated object creation. 
But when we dive into the String.getBytes code, we see that when it is called 
with a Charset, it actually creates a new StringEncoder on every call, in 
java.lang.StringCoding:
    static byte[] encode(Charset cs, char[] ca, int off, int len) {
        StringEncoder se = new StringEncoder(cs, cs.name());
        char[] c = Arrays.copyOf(ca, ca.length);
        return se.encode(c, off, len);
    }

If we instead call it with the string literal "UTF-8", it reuses the 
thread-local StringEncoder.
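
A minimal sketch of the change we have in mind, next to the current approach. 
The class name Utf8BytesSketch and both method names are made up for this 
illustration and are not part of Avro. Whether the thread-local encoder is 
actually reused depends on the JDK version, so treat this as a sketch under 
that assumption rather than a definitive patch:

```java
import java.io.UnsupportedEncodingException;
import java.nio.charset.Charset;

// Hypothetical stand-in for the relevant part of org.apache.avro.util.Utf8.
final class Utf8BytesSketch {
  private static final Charset UTF8 = Charset.forName("UTF-8");

  // Current approach: pass the cached Charset object to String.getBytes.
  static byte[] getBytesWithCharset(String str) {
    return str.getBytes(UTF8);
  }

  // Proposed approach: pass the charset name instead of the Charset object.
  static byte[] getBytesWithName(String str) {
    try {
      return str.getBytes("UTF-8");
    } catch (UnsupportedEncodingException e) {
      // Every Java platform is required to support UTF-8, so this
      // branch is not expected to be reached.
      throw new IllegalStateException("UTF-8 is not supported", e);
    }
  }
}
```

Both methods return the same bytes for the same input. The only intended 
difference is which code path inside the JDK does the encoding.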

We tried overriding this class with a copy that passes the string literal, and 
confirmed that those short-lived StringEncoder objects are no longer created. 
We would like Apache to fix this so that we no longer need to override the 
class.

--
This message was sent by Atlassian JIRA
(v6.1.4#6159)
