Tie Liu created AVRO-1411:
-----------------------------

             Summary: org.apache.avro.util.Utf8 performance improvement by remove private Charset in class
                 Key: AVRO-1411
                 URL: https://issues.apache.org/jira/browse/AVRO-1411
             Project: Avro
          Issue Type: Improvement
          Components: java
    Affects Versions: 1.7.5
            Reporter: Tie Liu
            Priority: Minor

The org.apache.avro.util.Utf8 class has a private static field defined as:

    private static final Charset UTF8 = Charset.forName("UTF-8");

which is used as:

    public static final byte[] getBytesFor(String str) {
      return str.getBytes(UTF8);
    }

I guess the intention of caching this object is to avoid object creation. However, when we dig into the String.getBytes code, calling it with a Charset actually creates a new StringEncoder in java.lang.StringCoding on each call:

    static byte[] encode(Charset cs, char[] ca, int off, int len) {
      StringEncoder se = new StringEncoder(cs, cs.name());
      char[] c = Arrays.copyOf(ca, ca.length);
      return se.encode(c, off, len);
    }

If we instead call it with the string literal "UTF-8", it reuses the thread-local StringEncoder. We tried overriding this class to pass the string literal and confirmed that those short-lived StringEncoder objects are no longer created. We would like Apache to fix this so that we no longer need to override the class.
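
For illustration, here is a minimal sketch of the two call forms compared above. The class name Utf8EncodeSketch and the methods viaCharset and viaName are hypothetical and not part of Avro. The sketch only checks that both forms produce identical bytes, so switching between them would not change the output. It does not measure allocations, and the StringEncoder behavior described above is specific to the JDK examined in this report and may differ in other JDK releases.

```java
import java.io.UnsupportedEncodingException;
import java.nio.charset.Charset;
import java.util.Arrays;

// Hypothetical class for illustration only; not part of Avro.
public class Utf8EncodeSketch {
    // Same shape as the private field described in the report.
    private static final Charset UTF8 = Charset.forName("UTF-8");

    // Current form: passes the cached Charset object.
    static byte[] viaCharset(String str) {
        return str.getBytes(UTF8);
    }

    // Proposed form: passes the charset name as a string literal.
    static byte[] viaName(String str) {
        try {
            return str.getBytes("UTF-8");
        } catch (UnsupportedEncodingException e) {
            // Every Java platform is required to support UTF-8,
            // so this is not expected to happen.
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        String s = "h\u00e9llo";
        // Both forms encode to the same UTF-8 bytes.
        System.out.println(Arrays.equals(viaCharset(s), viaName(s))); // prints "true"
    }
}
```

One difference to keep in mind: getBytes(String) declares the checked UnsupportedEncodingException, so the string-literal form needs a try/catch that the Charset form does not.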