[ 
https://issues.apache.org/jira/browse/BEAM-3874?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Marian Dvorsky updated BEAM-3874:
---------------------------------
    Description: 
AvroIO currently uses CodecFactory.deflateCodec(6) as the default codec for 
writes.

That compresses well, but is quite expensive.

The Snappy codec compresses less tightly but much faster, and is typically a 
better CPU/storage tradeoff except for very long-lived files. 

We should consider switching the default to Snappy.
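
A rough way to see the CPU/storage tradeoff described above is to time the same 
payload at two compression settings. The sketch below is only an illustration 
under stated assumptions: Snappy is not part of the JDK, so deflate level 1 
stands in for the "faster, weaker" setting, and the class name 
CodecTradeoffSketch and the synthetic sampleRecords data are hypothetical. It 
does not go through Avro or Beam, so real numbers for AvroIO may differ.

```java
import java.io.ByteArrayOutputStream;
import java.nio.charset.StandardCharsets;
import java.util.zip.DataFormatException;
import java.util.zip.Deflater;
import java.util.zip.Inflater;

// Hypothetical helper, not part of Beam or Avro.
class CodecTradeoffSketch {

    /** Builds repetitive, record-like text so the compressor has something to work with. */
    static byte[] sampleRecords(int count) {
        StringBuilder sb = new StringBuilder();
        for (int i = 0; i < count; i++) {
            sb.append("{\"id\":").append(i)
              .append(",\"user\":\"user-").append(i % 97)
              .append("\",\"status\":\"OK\"}\n");
        }
        return sb.toString().getBytes(StandardCharsets.UTF_8);
    }

    /** Compresses input with zlib/deflate at the given level (1 = fastest, 9 = smallest). */
    static byte[] deflate(byte[] input, int level) {
        Deflater deflater = new Deflater(level);
        deflater.setInput(input);
        deflater.finish();
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        byte[] buffer = new byte[8192];
        while (!deflater.finished()) {
            int n = deflater.deflate(buffer);
            out.write(buffer, 0, n);
        }
        deflater.end();
        return out.toByteArray();
    }

    /** Decompresses data produced by deflate(), used here only to check the round trip. */
    static byte[] inflate(byte[] compressed) {
        Inflater inflater = new Inflater();
        inflater.setInput(compressed);
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        byte[] buffer = new byte[8192];
        try {
            while (!inflater.finished()) {
                int n = inflater.inflate(buffer);
                if (n == 0 && (inflater.needsInput() || inflater.needsDictionary())) {
                    break; // truncated or unexpected input; stop instead of spinning
                }
                out.write(buffer, 0, n);
            }
        } catch (DataFormatException e) {
            throw new IllegalStateException("corrupt compressed data", e);
        } finally {
            inflater.end();
        }
        return out.toByteArray();
    }

    public static void main(String[] args) {
        byte[] data = sampleRecords(200_000);
        // Level 6 matches the level named in this issue; level 1 is the stand-in for a faster codec.
        for (int level : new int[] {6, 1}) {
            long start = System.nanoTime();
            byte[] compressed = deflate(data, level);
            long micros = (System.nanoTime() - start) / 1_000;
            System.out.printf("level=%d in=%d out=%d time=%dus%n",
                    level, data.length, compressed.length, micros);
        }
    }
}
```

Independently of the default, a pipeline can already pick its codec per write, 
for example with AvroIO.write(...).withCodec(CodecFactory.snappyCodec()), so 
this issue only concerns what happens when no codec is specified.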

  was:
AvroIO currently uses 
[CodecFactory|https://cs.corp.google.com/piper///depot/google3/third_party/java_src/apache_beam/project_root/sdks/java/core/src/main/java/org/apache/beam/sdk/io/AvroIO.java?l=851&gs=kythe%253A%252F%252Fgoogle3%253Flang%253Djava%253Fpath%253Dorg.apache.avro.file.CodecFactory%2523b8636ed8a0357a3a3806fb8ad152a1e38d3b4fa39a6a66d189c040aee9687823&gsn=CodecFactory&ct=xref_usages].[deflateCodec|https://cs.corp.google.com/piper///depot/google3/third_party/java_src/apache_beam/project_root/sdks/java/core/src/main/java/org/apache/beam/sdk/io/AvroIO.java?l=851&gs=kythe%253A%252F%252Fgoogle3%253Flang%253Djava%253Fpath%253Dorg.apache.avro.file.CodecFactory%25239fc62def2276bb77cc0f71b21660540e246046da139bfed9b0f33c7f8dbb4550&gsn=deflateCodec&ct=xref_usages](6)
 as the default codec for writes.

That compresses well, but is quite expensive.

The Snappy codec compresses less tightly but much faster, and is typically a 
better CPU/storage tradeoff except for very long-lived files. 

We should consider switching the default to Snappy.


> Switch AvroIO sink default codec to Snappy
> ------------------------------------------
>
>                 Key: BEAM-3874
>                 URL: https://issues.apache.org/jira/browse/BEAM-3874
>             Project: Beam
>          Issue Type: Improvement
>          Components: io-java-avro
>            Reporter: Marian Dvorsky
>            Assignee: Eugene Kirpichov
>            Priority: Minor
>
> AvroIO currently uses CodecFactory.deflateCodec(6) as the default codec for 
> writes.
> That compresses well, but is quite expensive.
> The Snappy codec compresses less tightly but much faster, and is typically a 
> better CPU/storage tradeoff except for very long-lived files. 
> We should consider switching the default to Snappy.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)
