[jira] [Resolved] (BEAM-1010) Custom FileSinks should respect AUTO compression

2016-11-21 Thread Konstantinos Katsiapis (JIRA)

 [ 
https://issues.apache.org/jira/browse/BEAM-1010?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Konstantinos Katsiapis resolved BEAM-1010.
--
   Resolution: Fixed
Fix Version/s: 0.4.0-incubating

This was fixed by the merge of 
https://github.com/apache/incubator-beam/pull/1392 as commit 
https://github.com/apache/incubator-beam/commit/8e88c7b035e76c6e15d03a79f9751c6e53786859

> Custom FileSinks should respect AUTO compression
> 
>
> Key: BEAM-1010
> URL: https://issues.apache.org/jira/browse/BEAM-1010
> Project: Beam
> Issue Type: Bug
> Reporter: Konstantinos Katsiapis
> Assignee: Konstantinos Katsiapis
> Fix For: 0.4.0-incubating
>
>
> Currently AUTO compression is respected by Native FileSinks but not Custom 
> FileSinks.
> Also, it would be good for Beam's _CompressedFile to support use in a Python 
> "with" statement (i.e., implement __enter__ and __exit__).
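
The "with" support requested above can be sketched as follows. This is a minimal illustration, not Beam's implementation: the class name CompressedFileSketch and the helper roundtrip are hypothetical, and gzip is assumed as the only compression format.

```python
import gzip
import io


class CompressedFileSketch(object):
    """Hypothetical stand-in for a compressed-file wrapper (not Beam's class)."""

    def __init__(self, fileobj):
        # Wrap an already-open binary file object with a gzip writer.
        self._gzip = gzip.GzipFile(fileobj=fileobj, mode='wb')

    def write(self, data):
        self._gzip.write(data)

    def close(self):
        # Writes the gzip trailer; the wrapped file object stays open.
        self._gzip.close()

    def __enter__(self):
        return self

    def __exit__(self, exc_type, exc_value, traceback):
        self.close()
        return False  # Do not suppress exceptions raised inside the block.


def roundtrip(payload):
    """Compresses payload through the wrapper, then decompresses it again."""
    buf = io.BytesIO()
    with CompressedFileSketch(buf) as f:
        f.write(payload)
    return gzip.GzipFile(fileobj=io.BytesIO(buf.getvalue())).read()
```

With __enter__ and __exit__ in place, the file is closed (and the compressed stream finalized) even if the body of the "with" block raises.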



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (BEAM-1016) Use Content-Type and Content-Encoding (as opposed to overriding Content-Type) for compressed files

2016-11-19 Thread Konstantinos Katsiapis (JIRA)

 [ 
https://issues.apache.org/jira/browse/BEAM-1016?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Konstantinos Katsiapis updated BEAM-1016:
-
Priority: Minor  (was: Major)

> Use Content-Type and Content-Encoding (as opposed to overriding Content-Type) 
> for compressed files
> --
>
> Key: BEAM-1016
> URL: https://issues.apache.org/jira/browse/BEAM-1016
> Project: Beam
> Issue Type: New Feature
> Reporter: Konstantinos Katsiapis
> Priority: Minor
>
> Currently the Content-Type for compressed files overrides the original 
> Content-Type.
> So
> Content-Type: text/plain
> becomes
> Content-Type: application/gzip
> We should instead consider keeping
> Content-Type: text/plain
> and adding
> Content-Encoding: gzip
> This will play nicely with Cloud Storage automatic transcoding:
> https://cloud.google.com/storage/docs/transcoding





[jira] [Created] (BEAM-1016) Use Content-Type and Content-Encoding (as opposed to overriding Content-Type) for compressed files

2016-11-19 Thread Konstantinos Katsiapis (JIRA)
Konstantinos Katsiapis created BEAM-1016:


 Summary: Use Content-Type and Content-Encoding (as opposed to 
overriding Content-Type) for compressed files
 Key: BEAM-1016
 URL: https://issues.apache.org/jira/browse/BEAM-1016
 Project: Beam
  Issue Type: New Feature
Reporter: Konstantinos Katsiapis


Currently the Content-Type for compressed files overrides the original 
Content-Type. So

    Content-Type: text/plain

becomes

    Content-Type: application/gzip

We should instead consider keeping

    Content-Type: text/plain

and adding

    Content-Encoding: gzip

This will play nicely with Cloud Storage automatic transcoding:
https://cloud.google.com/storage/docs/transcoding
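
To make the proposal concrete, here is a small sketch of the two header choices. The function names current_headers and proposed_headers are hypothetical and exist only for illustration; they are not part of Beam or of any Cloud Storage client.

```python
def current_headers(content_type, compressed):
    """Sketch of the behavior described above: the type is overridden."""
    if compressed:
        return {'Content-Type': 'application/gzip'}
    return {'Content-Type': content_type}


def proposed_headers(content_type, compressed):
    """Sketch of the proposal: keep the type, describe compression separately."""
    headers = {'Content-Type': content_type}
    if compressed:
        headers['Content-Encoding'] = 'gzip'
    return headers
```

With the proposed variant, a reader can still tell that the decompressed payload is text/plain, which is the information that is lost when the type is overridden.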





[jira] [Updated] (BEAM-1010) Custom FileSinks should respect AUTO compression

2016-11-18 Thread Konstantinos Katsiapis (JIRA)

 [ 
https://issues.apache.org/jira/browse/BEAM-1010?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Konstantinos Katsiapis updated BEAM-1010:
-
Description: 
Currently AUTO compression is respected by Native FileSinks but not Custom 
FileSinks.
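
As an illustration of what respecting AUTO could mean for a sink, the sketch below infers the compression type from the output path's extension. The names CompressionTypes and resolve_compression, and the extension mapping, are assumptions made for this example and are not taken from the Beam code.

```python
import os


class CompressionTypes(object):
    """Hypothetical constants for the options discussed in this issue."""
    AUTO = 'auto'
    GZIP = 'gzip'
    BZIP2 = 'bzip2'
    UNCOMPRESSED = 'uncompressed'


# Assumed mapping from file extension to compression type.
_EXTENSION_TO_COMPRESSION = {
    '.gz': CompressionTypes.GZIP,
    '.bz2': CompressionTypes.BZIP2,
}


def resolve_compression(path, compression_type=CompressionTypes.AUTO):
    """Returns the concrete compression type to use when writing to path."""
    if compression_type != CompressionTypes.AUTO:
        # An explicit choice always wins over the extension.
        return compression_type
    _, extension = os.path.splitext(path)
    return _EXTENSION_TO_COMPRESSION.get(
        extension.lower(), CompressionTypes.UNCOMPRESSED)
```

A custom sink that called such a helper before opening its output would treat AUTO the same way regardless of whether it is a native or a custom sink.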

Also, it would be good for Beam's _CompressedFile to support use in a Python 
"with" statement (i.e., implement __enter__ and __exit__).

> Custom FileSinks should respect AUTO compression
> 
>
> Key: BEAM-1010
> URL: https://issues.apache.org/jira/browse/BEAM-1010
> Project: Beam
> Issue Type: Bug
> Reporter: Konstantinos Katsiapis
> Assignee: Konstantinos Katsiapis
>
> Currently AUTO compression is respected by Native FileSinks but not Custom 
> FileSinks.
> Also, it would be good for Beam's _CompressedFile to support use in a Python 
> "with" statement (i.e., implement __enter__ and __exit__).





[jira] [Created] (BEAM-1010) Custom FileSinks should respect AUTO compression

2016-11-18 Thread Konstantinos Katsiapis (JIRA)
Konstantinos Katsiapis created BEAM-1010:


 Summary: Custom FileSinks should respect AUTO compression
 Key: BEAM-1010
 URL: https://issues.apache.org/jira/browse/BEAM-1010
 Project: Beam
  Issue Type: Bug
Reporter: Konstantinos Katsiapis
Assignee: Konstantinos Katsiapis








[jira] [Commented] (BEAM-570) Update AvroSource to support more compression types

2016-10-17 Thread Konstantinos Katsiapis (JIRA)

[ 
https://issues.apache.org/jira/browse/BEAM-570?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15582770#comment-15582770
 ] 

Konstantinos Katsiapis commented on BEAM-570:
-

Support is now in for both zlib and snappy (for both sources and sinks). 
Closing.

> Update AvroSource to support more compression types
> ---
>
> Key: BEAM-570
> URL: https://issues.apache.org/jira/browse/BEAM-570
> Project: Beam
> Issue Type: Improvement
> Components: sdk-py
> Reporter: Chamikara Jayalath
> Assignee: Konstantinos Katsiapis
>
> Python AvroSource [1] currently only supports 'deflate' compression. We should 
> update it to support other compression types supported by the Avro library 
> (e.g., snappy, bzip2).
> [1] 
> https://github.com/apache/incubator-beam/blob/python-sdk/sdks/python/apache_beam/io/avroio.py





[jira] [Commented] (BEAM-570) Update AvroSource to support more compression types

2016-10-17 Thread Konstantinos Katsiapis (JIRA)

[ 
https://issues.apache.org/jira/browse/BEAM-570?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15582808#comment-15582808
 ] 

Konstantinos Katsiapis commented on BEAM-570:
-

[~chamikara] I am unable to close this (can't figure out how, I seem to only 
have "assign" powers). Is it because I am not the original reporter? If so, can 
you close or mark-as-fixed?

> Update AvroSource to support more compression types
> ---
>
> Key: BEAM-570
> URL: https://issues.apache.org/jira/browse/BEAM-570
> Project: Beam
> Issue Type: Improvement
> Components: sdk-py
> Reporter: Chamikara Jayalath
> Assignee: Konstantinos Katsiapis
>
> Python AvroSource [1] currently only supports 'deflate' compression. We should 
> update it to support other compression types supported by the Avro library 
> (e.g., snappy, bzip2).
> [1] 
> https://github.com/apache/incubator-beam/blob/python-sdk/sdks/python/apache_beam/io/avroio.py





[jira] [Commented] (BEAM-570) Update AvroSource to support more compression types

2016-10-13 Thread Konstantinos Katsiapis (JIRA)

[ 
https://issues.apache.org/jira/browse/BEAM-570?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15573243#comment-15573243
 ] 

Konstantinos Katsiapis commented on BEAM-570:
-

Thanks Frances,

PR https://github.com/apache/incubator-beam/pull/1007 has been merged into beam 
and my https://github.com/apache/incubator-beam/pull/1053 is now also ready for 
a merge.

> Update AvroSource to support more compression types
> ---
>
> Key: BEAM-570
> URL: https://issues.apache.org/jira/browse/BEAM-570
> Project: Beam
> Issue Type: Improvement
> Components: sdk-py
> Reporter: Chamikara Jayalath
> Assignee: Konstantinos Katsiapis
>
> Python AvroSource [1] currently only supports 'deflate' compression. We should 
> update it to support other compression types supported by the Avro library 
> (e.g., snappy, bzip2).
> [1] 
> https://github.com/apache/incubator-beam/blob/python-sdk/sdks/python/apache_beam/io/avroio.py





[jira] [Resolved] (BEAM-625) Make Dataflow Python Materialized PCollection representation more efficient

2016-09-29 Thread Konstantinos Katsiapis (JIRA)

 [ 
https://issues.apache.org/jira/browse/BEAM-625?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Konstantinos Katsiapis resolved BEAM-625.
-

This is fixed as of Google Cloud Dataflow 0.4.2

https://cloud.google.com/dataflow/release-notes/release-notes-python#042

> Make Dataflow Python Materialized PCollection representation more efficient
> ---
>
> Key: BEAM-625
> URL: https://issues.apache.org/jira/browse/BEAM-625
> Project: Beam
> Issue Type: Improvement
> Components: sdk-py
> Reporter: Konstantinos Katsiapis
> Assignee: Frances Perry
> Fix For: 0.3.0-incubating
>
>
> This will be a multi-step process that will involve adding better support 
> for compression as well as Avro.





[jira] [Created] (BEAM-629) Cleanup SDK regarding lint warnings (and lint warning suppression).

2016-09-13 Thread Konstantinos Katsiapis (JIRA)
Konstantinos Katsiapis created BEAM-629:
---

 Summary: Cleanup SDK regarding lint warnings (and lint warning 
suppression).
 Key: BEAM-629
 URL: https://issues.apache.org/jira/browse/BEAM-629
 Project: Beam
  Issue Type: Improvement
Reporter: Konstantinos Katsiapis
Priority: Minor


As discussed in https://github.com/apache/incubator-beam/pull/946





[jira] [Commented] (BEAM-570) Update AvroSource to support more compression types

2016-09-12 Thread Konstantinos Katsiapis (JIRA)

[ 
https://issues.apache.org/jira/browse/BEAM-570?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15484712#comment-15484712
 ] 

Konstantinos Katsiapis commented on BEAM-570:
-

According to the Avro specification, the required codecs are 'null' and 
'deflate', and the only optional codec is 'snappy'.
See: https://avro.apache.org/docs/1.8.1/spec.html

Python _AvroSource already supports 'null' and 'deflate'.
The following PR adds support for 'snappy': 
https://github.com/apache/incubator-beam/pull/946
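
For background, the three codecs named above differ only in how a data block's bytes are stored. The sketch below shows one way to decode a block per codec. The function names decode_block and encode_deflate are hypothetical and this is not the _AvroSource code. The 'snappy' branch assumes the third-party python-snappy package is installed, and it relies on the specification's rule that each snappy block is followed by a 4-byte big-endian CRC32 of the uncompressed data.

```python
import struct
import zlib


def decode_block(codec, data):
    """Returns the uncompressed bytes of one Avro data block."""
    if codec == 'null':
        return data
    if codec == 'deflate':
        # Avro stores raw DEFLATE data (RFC 1951) without a zlib header,
        # hence the negative window-bits argument.
        return zlib.decompress(data, -15)
    if codec == 'snappy':
        import snappy  # Third-party python-snappy; assumed to be installed.
        payload = data[:-4]
        stored_crc = struct.unpack('>I', data[-4:])[0]
        result = snappy.decompress(payload)
        if zlib.crc32(result) & 0xffffffff != stored_crc:
            raise ValueError('CRC32 mismatch in snappy block')
        return result
    raise ValueError('Unsupported codec: %r' % codec)


def encode_deflate(data):
    """Compresses data as raw DEFLATE, the inverse of the 'deflate' branch."""
    compressor = zlib.compressobj(9, zlib.DEFLATED, -15)
    return compressor.compress(data) + compressor.flush()
```

Supporting another codec such as 'bzip2' would amount to adding one more branch, which is why the open question below is about scope rather than difficulty.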

[~altay], [~chamikara] You also mention that bzip2 should be supported (similar 
to how it's done for Dataflow Java?), but that doesn't seem to be part of the 
specification (mentioned above).

Should we limit the scope of this bug to just adding 'snappy', or is there 
precedent for supporting 'bzip2'?
Any pointers to the Java code that supports 'bzip2' so that we can get more 
background there?

Thanks,
Gus

> Update AvroSource to support more compression types
> ---
>
> Key: BEAM-570
> URL: https://issues.apache.org/jira/browse/BEAM-570
> Project: Beam
> Issue Type: Improvement
> Components: sdk-py
> Reporter: Chamikara Jayalath
> Assignee: Chamikara Jayalath
>
> Python AvroSource [1] currently only supports 'deflate' compression. We should 
> update it to support other compression types supported by the Avro library 
> (e.g., snappy, bzip2).
> [1] 
> https://github.com/apache/incubator-beam/blob/python-sdk/sdks/python/apache_beam/io/avroio.py





[jira] [Created] (BEAM-625) Make Dataflow Python Materialized PCollection representation more efficient

2016-09-09 Thread Konstantinos Katsiapis (JIRA)
Konstantinos Katsiapis created BEAM-625:
---

 Summary: Make Dataflow Python Materialized PCollection 
representation more efficient
 Key: BEAM-625
 URL: https://issues.apache.org/jira/browse/BEAM-625
 Project: Beam
  Issue Type: Improvement
  Components: sdk-py
Reporter: Konstantinos Katsiapis
Assignee: Frances Perry


This will be a multi-step process that will involve adding better support 
for compression as well as Avro.


