BELUGA BEHR created AVRO-2049:
---------------------------------

             Summary: Remove Superfluous Configuration From AvroSerializer
                 Key: AVRO-2049
                 URL: https://issues.apache.org/jira/browse/AVRO-2049
             Project: Avro
          Issue Type: Improvement
          Components: java
    Affects Versions: 1.8.2, 1.7.7
            Reporter: BELUGA BEHR
            Priority: Trivial


In the class {{org.apache.avro.hadoop.io.AvroSerializer}}, the Avro encoder 
block size is hard-coded to 512 bytes, and a TODO comment asks for benchmarking 
with different values.

{code:title=org.apache.avro.hadoop.io.AvroSerializer}
  /**
   * The block size for the Avro encoder.
   *
   * This number was copied from the AvroSerialization of
   * org.apache.avro.mapred in Avro 1.5.1.
   *
   * TODO(gwu): Do some benchmarking with different numbers here to see if it
   * is important.
   */
  private static final int AVRO_ENCODER_BLOCK_SIZE_BYTES = 512;

  /** An factory for creating Avro datum encoders. */
  private static EncoderFactory mEncoderFactory
      = new EncoderFactory().configureBlockSize(AVRO_ENCODER_BLOCK_SIZE_BYTES);
{code}

However, there is no need to benchmark: the setting is superfluous because the 
current implementation ignores it.

{code:title=org.apache.avro.hadoop.io.AvroSerializer}
  @Override
  public void open(OutputStream outputStream) throws IOException {
    mOutputStream = outputStream;
    mAvroEncoder = mEncoderFactory.binaryEncoder(outputStream, mAvroEncoder);
  }
{code}

{{org.apache.avro.io.EncoderFactory.binaryEncoder}} ignores this setting.  The 
block size is only relevant for calls to 
{{org.apache.avro.io.EncoderFactory.blockingBinaryEncoder}}, which uses the 
configured "Block Size" when binary-encoding blocked Array types, as laid out 
in the 
[specs|https://avro.apache.org/docs/1.8.2/spec.html#binary_encode_complex].  
The constant and the {{configureBlockSize}} call can simply be removed.
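
For context on what "blocked" means here, the following is a minimal, 
self-contained sketch of the two array encodings described in the spec. It does 
not use the Avro library. The class {{BlockedArraySketch}} and its method names 
are hypothetical and exist only for illustration. It also always emits a single 
block, whereas a real blocking encoder would use the configured block size to 
decide when to close one block and start the next.

```java
import java.io.ByteArrayOutputStream;

// Hypothetical illustration only; this is not Avro's implementation.
final class BlockedArraySketch {

  /** Writes a long as a zig-zag variable-length integer, as the Avro spec describes. */
  static void writeLong(ByteArrayOutputStream out, long n) {
    long z = (n << 1) ^ (n >> 63);            // zig-zag: small magnitudes get small codes
    while ((z & ~0x7FL) != 0) {
      out.write((int) ((z & 0x7F) | 0x80));   // 7 payload bits plus a continuation bit
      z >>>= 7;
    }
    out.write((int) z);
  }

  /** Plain form: a positive item count, the items, then a zero count as terminator. */
  static byte[] encodePlain(long[] items) {
    ByteArrayOutputStream out = new ByteArrayOutputStream();
    if (items.length > 0) {
      writeLong(out, items.length);
      for (long item : items) {
        writeLong(out, item);
      }
    }
    writeLong(out, 0);
    return out.toByteArray();
  }

  /** Blocked form: a negative item count, the block's byte size, the items, then zero. */
  static byte[] encodeBlocked(long[] items) {
    ByteArrayOutputStream out = new ByteArrayOutputStream();
    if (items.length > 0) {
      ByteArrayOutputStream body = new ByteArrayOutputStream();
      for (long item : items) {
        writeLong(body, item);
      }
      writeLong(out, -items.length);          // negative count signals that a size follows
      writeLong(out, body.size());            // lets a reader skip the block without decoding
      out.write(body.toByteArray(), 0, body.size());
    }
    writeLong(out, 0);
    return out.toByteArray();
  }

  /** Formats bytes as space-separated lowercase hex, for inspection. */
  static String hex(byte[] bytes) {
    StringBuilder sb = new StringBuilder();
    for (int i = 0; i < bytes.length; i++) {
      if (i > 0) {
        sb.append(' ');
      }
      sb.append(String.format("%02x", bytes[i] & 0xFF));
    }
    return sb.toString();
  }
}
```

For the spec's own example, an array of longs containing 3 and 27, 
{{encodePlain}} produces {{04 06 36 00}} and {{encodeBlocked}} produces 
{{03 04 06 36 00}}. The only difference is the negative count and the extra 
byte-size field.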



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)
