[jira] [Resolved] (BEAM-1010) Custom FileSinks should respect AUTO compression
[ https://issues.apache.org/jira/browse/BEAM-1010?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Konstantinos Katsiapis resolved BEAM-1010.
------------------------------------------
Resolution: Fixed
Fix Version/s: 0.4.0-incubating

This was fixed by the merge of https://github.com/apache/incubator-beam/pull/1392 as https://github.com/apache/incubator-beam/commit/8e88c7b035e76c6e15d03a79f9751c6e53786859

> Custom FileSinks should respect AUTO compression
> ------------------------------------------------
>
> Key: BEAM-1010
> URL: https://issues.apache.org/jira/browse/BEAM-1010
> Project: Beam
> Issue Type: Bug
> Reporter: Konstantinos Katsiapis
> Assignee: Konstantinos Katsiapis
> Fix For: 0.4.0-incubating
>
> Currently AUTO compression is respected by Native FileSinks but not Custom FileSinks.
> Also, it would be good for Beam's _CompressedFile to support usage with the Python "with" statement (i.e. implement __enter__ and __exit__).

--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
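For readers unfamiliar with what "respecting AUTO compression" means in BEAM-1010, the idea is roughly that the sink picks a codec from the output file's extension unless the caller names one explicitly. The following is only a minimal sketch under that assumption; the constants, the extension table, and the function name resolve_compression are hypothetical and are not taken from the Beam SDK.

```python
import os

# Hypothetical labels for illustration only; not the SDK's actual constants.
AUTO = "auto"
GZIP = "gzip"
BZIP2 = "bzip2"
UNCOMPRESSED = "uncompressed"

# Assumed extension table; the real SDK may recognise a different set.
_BY_EXTENSION = {".gz": GZIP, ".bz2": BZIP2}


def resolve_compression(path, compression_type=AUTO):
    """Return the compression to use for `path`.

    An explicit compression type is returned unchanged. AUTO is resolved
    from the file extension, falling back to UNCOMPRESSED.
    """
    if compression_type != AUTO:
        return compression_type
    _, extension = os.path.splitext(path)
    return _BY_EXTENSION.get(extension.lower(), UNCOMPRESSED)
```

In this sketch a custom sink would call resolve_compression once per output path, so native and custom sinks would make the same decision for the same file name.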
[jira] [Updated] (BEAM-1016) Use Content-Type and Content-Encoding (as opposed to overriding Content-Type) for compressed files
[ https://issues.apache.org/jira/browse/BEAM-1016?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Konstantinos Katsiapis updated BEAM-1016:
-----------------------------------------
Priority: Minor (was: Major)

> Use Content-Type and Content-Encoding (as opposed to overriding Content-Type) for compressed files
> --------------------------------------------------------------------------------------------------
>
> Key: BEAM-1016
> URL: https://issues.apache.org/jira/browse/BEAM-1016
> Project: Beam
> Issue Type: New Feature
> Reporter: Konstantinos Katsiapis
> Priority: Minor
>
> Currently the Content-Type for compressed files overrides the original Content-Type.
> So
> Content-Type: text/plain
> becomes
> Content-Type: application/gzip
> We should instead consider keeping
> Content-Type: text/plain
> and adding
> Content-Encoding: gzip
> This will play nicely with Cloud Storage automatic transcoding:
> https://cloud.google.com/storage/docs/transcoding
[jira] [Created] (BEAM-1016) Use Content-Type and Content-Encoding (as opposed to overriding Content-Type) for compressed files
Konstantinos Katsiapis created BEAM-1016:
-----------------------------------------

Summary: Use Content-Type and Content-Encoding (as opposed to overriding Content-Type) for compressed files
Key: BEAM-1016
URL: https://issues.apache.org/jira/browse/BEAM-1016
Project: Beam
Issue Type: New Feature
Reporter: Konstantinos Katsiapis

Currently the Content-Type for compressed files overrides the original Content-Type.
So
Content-Type: text/plain
becomes
Content-Type: application/gzip
We should instead consider keeping
Content-Type: text/plain
and adding
Content-Encoding: gzip
This will play nicely with Cloud Storage automatic transcoding:
https://cloud.google.com/storage/docs/transcoding
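The header change proposed in BEAM-1016 can be illustrated with a small sketch. It assumes the uploader builds a plain dictionary of HTTP headers; the function name upload_headers and its parameters are hypothetical and do not correspond to any Beam or Cloud Storage API.

```python
def upload_headers(content_type, gzip_compressed):
    """Build upload headers that keep the original Content-Type.

    Compression is reported through Content-Encoding instead of replacing
    Content-Type with application/gzip.
    """
    headers = {"Content-Type": content_type}
    if gzip_compressed:
        headers["Content-Encoding"] = "gzip"
    return headers
```

With this shape, a gzip-compressed text file keeps "Content-Type: text/plain" and gains "Content-Encoding: gzip", which is the combination the issue suggests for automatic transcoding.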
[jira] [Updated] (BEAM-1010) Custom FileSinks should respect AUTO compression
[ https://issues.apache.org/jira/browse/BEAM-1010?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Konstantinos Katsiapis updated BEAM-1010:
-----------------------------------------
Description:
Currently AUTO compression is respected by Native FileSinks but not Custom FileSinks.
Also, it would be good for Beam's _CompressedFile to support usage with the Python "with" statement (i.e. implement __enter__ and __exit__).

> Custom FileSinks should respect AUTO compression
> ------------------------------------------------
>
> Key: BEAM-1010
> URL: https://issues.apache.org/jira/browse/BEAM-1010
> Project: Beam
> Issue Type: Bug
> Reporter: Konstantinos Katsiapis
> Assignee: Konstantinos Katsiapis
>
> Currently AUTO compression is respected by Native FileSinks but not Custom FileSinks.
> Also, it would be good for Beam's _CompressedFile to support usage with the Python "with" statement (i.e. implement __enter__ and __exit__).
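The second request in the description, support for the "with" statement, amounts to adding two methods to the file wrapper. The sketch below shows the general pattern on a hypothetical stand-in class, CompressedFileSketch, built on the standard gzip module; it is not Beam's _CompressedFile and makes no claim about how the merged fix was written.

```python
import gzip
import io


class CompressedFileSketch:
    """Hypothetical gzip-writing wrapper used only to show the protocol."""

    def __init__(self, fileobj):
        # GzipFile writes compressed bytes into the wrapped file object.
        self._file = gzip.GzipFile(fileobj=fileobj, mode="wb")

    def write(self, data):
        self._file.write(data)

    def close(self):
        # Flushes the gzip trailer; the wrapped file object stays open.
        self._file.close()

    def __enter__(self):
        # The object bound by "with ... as f" is the wrapper itself.
        return self

    def __exit__(self, exc_type, exc_value, traceback):
        # Always close, and return False so exceptions still propagate.
        self.close()
        return False


buffer = io.BytesIO()
with CompressedFileSketch(buffer) as f:
    f.write(b"hello")
```

Closing in __exit__ means the gzip trailer is written even when the body of the "with" block raises, which is the main benefit over a manual close() call.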
[jira] [Created] (BEAM-1010) Custom FileSinks should respect AUTO compression
Konstantinos Katsiapis created BEAM-1010:
-----------------------------------------

Summary: Custom FileSinks should respect AUTO compression
Key: BEAM-1010
URL: https://issues.apache.org/jira/browse/BEAM-1010
Project: Beam
Issue Type: Bug
Reporter: Konstantinos Katsiapis
Assignee: Konstantinos Katsiapis
[jira] [Commented] (BEAM-570) Update AvroSource to support more compression types
[ https://issues.apache.org/jira/browse/BEAM-570?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15582770#comment-15582770 ]

Konstantinos Katsiapis commented on BEAM-570:
---------------------------------------------

Support is now in for both zlib and snappy (for both sources and sinks). Closing.

> Update AvroSource to support more compression types
> ---------------------------------------------------
>
> Key: BEAM-570
> URL: https://issues.apache.org/jira/browse/BEAM-570
> Project: Beam
> Issue Type: Improvement
> Components: sdk-py
> Reporter: Chamikara Jayalath
> Assignee: Konstantinos Katsiapis
>
> Python AvroSource [1] currently only supports 'deflate' compression. We should update it to support other compression types supported by the Avro library (e.g.: snappy, bzip2).
> [1] https://github.com/apache/incubator-beam/blob/python-sdk/sdks/python/apache_beam/io/avroio.py
[jira] [Commented] (BEAM-570) Update AvroSource to support more compression types
[ https://issues.apache.org/jira/browse/BEAM-570?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15582808#comment-15582808 ]

Konstantinos Katsiapis commented on BEAM-570:
---------------------------------------------

[~chamikara] I am unable to close this (I can't figure out how; I seem to have only "assign" powers). Is it because I am not the original reporter? If so, can you close it or mark it as fixed?

> Update AvroSource to support more compression types
> ---------------------------------------------------
>
> Key: BEAM-570
> URL: https://issues.apache.org/jira/browse/BEAM-570
> Project: Beam
> Issue Type: Improvement
> Components: sdk-py
> Reporter: Chamikara Jayalath
> Assignee: Konstantinos Katsiapis
>
> Python AvroSource [1] currently only supports 'deflate' compression. We should update it to support other compression types supported by the Avro library (e.g.: snappy, bzip2).
> [1] https://github.com/apache/incubator-beam/blob/python-sdk/sdks/python/apache_beam/io/avroio.py
[jira] [Commented] (BEAM-570) Update AvroSource to support more compression types
[ https://issues.apache.org/jira/browse/BEAM-570?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15573243#comment-15573243 ]

Konstantinos Katsiapis commented on BEAM-570:
---------------------------------------------

Thanks Frances, PR https://github.com/apache/incubator-beam/pull/1007 has been merged into Beam and my https://github.com/apache/incubator-beam/pull/1053 is now also ready for a merge.

> Update AvroSource to support more compression types
> ---------------------------------------------------
>
> Key: BEAM-570
> URL: https://issues.apache.org/jira/browse/BEAM-570
> Project: Beam
> Issue Type: Improvement
> Components: sdk-py
> Reporter: Chamikara Jayalath
> Assignee: Konstantinos Katsiapis
>
> Python AvroSource [1] currently only supports 'deflate' compression. We should update it to support other compression types supported by the Avro library (e.g.: snappy, bzip2).
> [1] https://github.com/apache/incubator-beam/blob/python-sdk/sdks/python/apache_beam/io/avroio.py
[jira] [Resolved] (BEAM-625) Make Dataflow Python Materialized PCollection representation more efficient
[ https://issues.apache.org/jira/browse/BEAM-625?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Konstantinos Katsiapis resolved BEAM-625.
-----------------------------------------

This is fixed as of Google Cloud Dataflow 0.4.2
https://cloud.google.com/dataflow/release-notes/release-notes-python#042

> Make Dataflow Python Materialized PCollection representation more efficient
> ---------------------------------------------------------------------------
>
> Key: BEAM-625
> URL: https://issues.apache.org/jira/browse/BEAM-625
> Project: Beam
> Issue Type: Improvement
> Components: sdk-py
> Reporter: Konstantinos Katsiapis
> Assignee: Frances Perry
> Fix For: 0.3.0-incubating
>
> This will be a several step process which will involve adding better support for compression as well as Avro.
[jira] [Created] (BEAM-629) Cleanup SDK regarding lint warnings (and lint warning suppression).
Konstantinos Katsiapis created BEAM-629:
----------------------------------------

Summary: Cleanup SDK regarding lint warnings (and lint warning suppression).
Key: BEAM-629
URL: https://issues.apache.org/jira/browse/BEAM-629
Project: Beam
Issue Type: Improvement
Reporter: Konstantinos Katsiapis
Priority: Minor

As discussed in https://github.com/apache/incubator-beam/pull/946
[jira] [Commented] (BEAM-570) Update AvroSource to support more compression types
[ https://issues.apache.org/jira/browse/BEAM-570?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15484712#comment-15484712 ]

Konstantinos Katsiapis commented on BEAM-570:
---------------------------------------------

According to the Avro specification, the required codecs are 'null' and 'deflate', and the only optional codec is 'snappy'.
See: https://avro.apache.org/docs/1.8.1/spec.html

Python _AvroSource already supports 'null' and 'deflate'. The following PR adds support for 'snappy': https://github.com/apache/incubator-beam/pull/946

[~altay], [~chamikara]
You also mention that bzip2 should be supported (similar to how it's done for Dataflow Java?), but that doesn't seem to be part of the specification mentioned above.
Should we limit the scope of this bug to just adding 'snappy', or is there precedent for supporting 'bzip2'? Any pointers to the Java code that supports 'bzip2' so that we can get more background there?

Thanks,
Gus

> Update AvroSource to support more compression types
> ---------------------------------------------------
>
> Key: BEAM-570
> URL: https://issues.apache.org/jira/browse/BEAM-570
> Project: Beam
> Issue Type: Improvement
> Components: sdk-py
> Reporter: Chamikara Jayalath
> Assignee: Chamikara Jayalath
>
> Python AvroSource [1] currently only supports 'deflate' compression. We should update it to support other compression types supported by the Avro library (e.g.: snappy, bzip2).
> [1] https://github.com/apache/incubator-beam/blob/python-sdk/sdks/python/apache_beam/io/avroio.py
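As a rough illustration of the codec dispatch discussed in this comment, the sketch below decodes one data block for the two required codecs using only the standard library. It assumes the Avro 'deflate' codec is raw DEFLATE (RFC 1951, no zlib header or checksum), as the Avro specification describes. The function name decode_avro_block is hypothetical, 'snappy' is deliberately left unimplemented because it needs a third-party library, and none of this is taken from _AvroSource.

```python
import zlib


def decode_avro_block(codec, data):
    """Return the uncompressed bytes of one Avro data block.

    Handles only the two codecs the specification requires.
    """
    if codec == "null":
        # 'null' means the block is stored without compression.
        return data
    if codec == "deflate":
        # Negative wbits selects raw DEFLATE with no zlib header or trailer.
        return zlib.decompress(data, -15)
    raise ValueError("unsupported codec in this sketch: %r" % (codec,))
```

Any optional codec would be added as a further branch; in this sketch an unknown codec name fails loudly instead of returning undecoded bytes.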
[jira] [Created] (BEAM-625) Make Dataflow Python Materialized PCollection representation more efficient
Konstantinos Katsiapis created BEAM-625:
----------------------------------------

Summary: Make Dataflow Python Materialized PCollection representation more efficient
Key: BEAM-625
URL: https://issues.apache.org/jira/browse/BEAM-625
Project: Beam
Issue Type: Improvement
Components: sdk-py
Reporter: Konstantinos Katsiapis
Assignee: Frances Perry

This will be a multi-step process which will involve adding better support for compression as well as Avro.