BELUGA BEHR created AVRO-2049:
---------------------------------

             Summary: Remove Superfluous Configuration From AvroSerializer
                 Key: AVRO-2049
                 URL: https://issues.apache.org/jira/browse/AVRO-2049
             Project: Avro
          Issue Type: Improvement
          Components: java
    Affects Versions: 1.8.2, 1.7.7
            Reporter: BELUGA BEHR
            Priority: Trivial

In the class {{org.apache.avro.hadoop.io.AvroSerializer}}, the Avro block size is configured with a hard-coded value, and a TODO asks for benchmarking of different buffer sizes.

{code:title=org.apache.avro.hadoop.io.AvroSerializer}
  /**
   * The block size for the Avro encoder.
   *
   * This number was copied from the AvroSerialization of org.apache.avro.mapred in Avro 1.5.1.
   *
   * TODO(gwu): Do some benchmarking with different numbers here to see if it is important.
   */
  private static final int AVRO_ENCODER_BLOCK_SIZE_BYTES = 512;

  /** An factory for creating Avro datum encoders. */
  private static EncoderFactory mEncoderFactory
      = new EncoderFactory().configureBlockSize(AVRO_ENCODER_BLOCK_SIZE_BYTES);
{code}

However, there is no need to benchmark: this setting is superfluous and is ignored by the current implementation.

{code:title=org.apache.avro.hadoop.io.AvroSerializer}
  @Override
  public void open(OutputStream outputStream) throws IOException {
    mOutputStream = outputStream;
    mAvroEncoder = mEncoderFactory.binaryEncoder(outputStream, mAvroEncoder);
  }
{code}

{{org.apache.avro.io.EncoderFactory.binaryEncoder}} ignores this setting. It is only relevant for calls to {{org.apache.avro.io.EncoderFactory.blockingBinaryEncoder}}, which uses the configured "Block Size" when binary-encoding blocked Array types, as laid out in the [specs|https://avro.apache.org/docs/1.8.2/spec.html#binary_encode_complex].

The {{AVRO_ENCODER_BLOCK_SIZE_BYTES}} constant and the {{configureBlockSize}} call can simply be removed.
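
For readers unfamiliar with the blocked array encoding that the spec describes, the following is a minimal, stdlib-only sketch of why a block size matters only to a blocking encoder. It does not use Avro. The class name {{BlockedArraySketch}}, its method names, and the "flush when the byte budget would be exceeded" rule are illustrative assumptions, not the logic of Avro's own blocking encoder. Only the wire layout (zig-zag variable-length longs, a negative item count followed by a byte size, and a zero terminator) follows the spec.

```java
import java.io.ByteArrayOutputStream;

/** Illustrative sketch only; names and the flushing rule are hypothetical, not Avro's code. */
final class BlockedArraySketch {

    /** Writes a long using the zig-zag, variable-length encoding described in the Avro spec. */
    static void writeLong(ByteArrayOutputStream out, long n) {
        long z = (n << 1) ^ (n >> 63); // zig-zag: small magnitudes become small unsigned values
        while ((z & ~0x7FL) != 0) {
            out.write((int) ((z & 0x7F) | 0x80)); // low 7 bits, continuation bit set
            z >>>= 7;
        }
        out.write((int) z);
    }

    /** Plain encoding: a single block with a positive count, then a zero terminator. */
    static byte[] encodePlain(long[] items) {
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        if (items.length > 0) {
            writeLong(out, items.length);
            for (long item : items) {
                writeLong(out, item);
            }
        }
        writeLong(out, 0); // a block count of zero marks the end of the array
        return out.toByteArray();
    }

    /**
     * Blocked encoding: items are buffered and flushed as size-prefixed blocks.
     * ASSUMPTION: a block is flushed when adding the next item would exceed
     * blockSizeBytes; this rule is made up for the illustration.
     */
    static byte[] encodeBlocked(long[] items, int blockSizeBytes) {
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        ByteArrayOutputStream block = new ByteArrayOutputStream();
        long count = 0;
        for (long item : items) {
            ByteArrayOutputStream one = new ByteArrayOutputStream();
            writeLong(one, item);
            if (count > 0 && block.size() + one.size() > blockSizeBytes) {
                flushBlock(out, block, count);
                count = 0;
            }
            block.write(one.toByteArray(), 0, one.size());
            count++;
        }
        if (count > 0) {
            flushBlock(out, block, count);
        }
        writeLong(out, 0); // end-of-array terminator
        return out.toByteArray();
    }

    /** Emits one block: negative item count, payload size in bytes, then the payload. */
    private static void flushBlock(ByteArrayOutputStream out, ByteArrayOutputStream block, long count) {
        writeLong(out, -count);
        writeLong(out, block.size());
        out.write(block.toByteArray(), 0, block.size());
        block.reset();
    }

    public static void main(String[] args) {
        long[] items = {1, 2, 3};
        System.out.println(encodePlain(items).length);        // no block size involved
        System.out.println(encodeBlocked(items, 512).length); // everything fits in one block
        System.out.println(encodeBlocked(items, 2).length);   // budget forces two blocks
    }
}
```

In this sketch the plain path has no block-size parameter at all, which mirrors the point of this issue: a block size configured on the factory can only influence output produced through a blocking encoder.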