[jira] [Commented] (BEAM-570) Update AvroSource to support more compression types

2016-10-17 Thread Konstantinos Katsiapis (JIRA)

[ 
https://issues.apache.org/jira/browse/BEAM-570?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15582770#comment-15582770
 ] 

Konstantinos Katsiapis commented on BEAM-570:
-

Support is now in for both zlib and snappy (for both sources and sinks). 
Closing.
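For snappy, the Avro spec requires each compressed block to be followed by a 4-byte, big-endian CRC32 of the uncompressed data. Sinks therefore have to append the checksum and sources have to verify it. The sketch below shows that framing on both sides. It assumes the actual Snappy functions are passed in (for example `snappy.compress` / `snappy.decompress` from the python-snappy package). `frame_snappy_block` and `unframe_snappy_block` are hypothetical helper names, not Beam's or the avro library's API.

```python
import struct
import zlib


def frame_snappy_block(data, compress):
    # Sink side (hypothetical helper): compress the block, then append a
    # 4-byte big-endian CRC32 of the *uncompressed* bytes.
    crc = zlib.crc32(data) & 0xFFFFFFFF
    return compress(data) + struct.pack('>I', crc)


def unframe_snappy_block(block, decompress):
    # Source side (hypothetical helper): strip the trailing checksum,
    # decompress, and verify the CRC32 against the decompressed bytes.
    if len(block) < 4:
        raise ValueError('snappy block too short to hold a CRC32')
    payload, (expected,) = block[:-4], struct.unpack('>I', block[-4:])
    data = decompress(payload)
    if zlib.crc32(data) & 0xFFFFFFFF != expected:
        raise ValueError('snappy block failed CRC32 check')
    return data
```

Passing the compressor in keeps the framing logic testable without the snappy dependency. Any pair of inverse functions can stand in for it.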

> Update AvroSource to support more compression types
> ---
>
> Key: BEAM-570
> URL: https://issues.apache.org/jira/browse/BEAM-570
> Project: Beam
>  Issue Type: Improvement
>  Components: sdk-py
>Reporter: Chamikara Jayalath
>Assignee: Konstantinos Katsiapis
>
> Python AvroSource [1] currently only supports 'deflate' compression. We should 
> update it to support other compression types supported by the Avro library 
> (e.g. snappy, bzip2).
> [1] 
> https://github.com/apache/incubator-beam/blob/python-sdk/sdks/python/apache_beam/io/avroio.py



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (BEAM-570) Update AvroSource to support more compression types

2016-10-17 Thread Konstantinos Katsiapis (JIRA)

[ 
https://issues.apache.org/jira/browse/BEAM-570?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15582808#comment-15582808
 ] 

Konstantinos Katsiapis commented on BEAM-570:
-

[~chamikara] I am unable to close this (I can't figure out how; I seem to have 
only "assign" permissions). Is it because I am not the original reporter? If so, 
can you close it or mark it as fixed?



[jira] [Commented] (BEAM-570) Update AvroSource to support more compression types

2016-10-17 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/BEAM-570?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15582765#comment-15582765
 ] 

ASF GitHub Bot commented on BEAM-570:
-

Github user katsiapis closed the pull request at:

https://github.com/apache/incubator-beam/pull/1053




[jira] [Commented] (BEAM-570) Update AvroSource to support more compression types

2016-10-13 Thread Konstantinos Katsiapis (JIRA)

[ 
https://issues.apache.org/jira/browse/BEAM-570?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15573243#comment-15573243
 ] 

Konstantinos Katsiapis commented on BEAM-570:
-

Thanks Frances,

PR https://github.com/apache/incubator-beam/pull/1007 has been merged into Beam, 
and my https://github.com/apache/incubator-beam/pull/1053 is now also ready to 
be merged.



[jira] [Commented] (BEAM-570) Update AvroSource to support more compression types

2016-10-06 Thread Frances Perry (JIRA)

[ 
https://issues.apache.org/jira/browse/BEAM-570?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15553995#comment-15553995
 ] 

Frances Perry commented on BEAM-570:


Assigning to Konstantinos to follow up after #1053 is in.



[jira] [Commented] (BEAM-570) Update AvroSource to support more compression types

2016-10-04 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/BEAM-570?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15547423#comment-15547423
 ] 

ASF GitHub Bot commented on BEAM-570:
-

GitHub user katsiapis opened a pull request:

https://github.com/apache/incubator-beam/pull/1053

[BEAM-570] Title of the pull request

- Getting rid of CompressionTypes.ZLIB and CompressionTypes.NO_COMPRESSION.
- Introducing BZIP2 compression in analogy to Dataflow Java's BZIP2, 
towards resolution of https://issues.apache.org/jira/browse/BEAM-570.
- Introducing SNAPPY codec support for AVRO conciseness and in order to 
fully resolve https://issues.apache.org/jira/browse/BEAM-570.
- Moving avroio from compression_type to codec as per various discussions.
- A few cleanups in avroio.
- Making textio more DRY and doing a few cleanups.
- Raising exceptions when splitting is requested for a compressed source, 
since that should never happen (guaranteed by the service for the supported 
compression types).
- Using cStringIO instead of StringIO in various places as decided in some 
other discussions.
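The compression bullets above (BZIP2 alongside gzip, and refusing to split compressed sources) can be illustrated with a small, standard-library-only sketch. `CompressionType`, `detect_compression_type`, `open_for_read`, `decompress_bytes` and `HypotheticalTextSource` are made-up names for illustration, not the classes changed in PR 1053. The sketch also assumes AUTO is resolved from the file extension.

```python
import bz2
import enum
import gzip
import io


class CompressionType(enum.Enum):
    # Hypothetical enum; AUTO means "infer from the file extension".
    AUTO = 'auto'
    UNCOMPRESSED = 'uncompressed'
    GZIP = 'gzip'
    BZIP2 = 'bzip2'


# Assumed extension conventions used to resolve AUTO.
_EXTENSIONS = {'.gz': CompressionType.GZIP, '.bz2': CompressionType.BZIP2}


def detect_compression_type(path):
    """Resolve AUTO to a concrete type based on the file name."""
    for suffix, ctype in _EXTENSIONS.items():
        if path.endswith(suffix):
            return ctype
    return CompressionType.UNCOMPRESSED


def open_for_read(raw, ctype):
    """Wrap a binary file object so that reads yield decompressed bytes."""
    if ctype == CompressionType.GZIP:
        return gzip.GzipFile(fileobj=raw, mode='rb')
    if ctype == CompressionType.BZIP2:
        return bz2.BZ2File(raw, mode='rb')
    return raw


def decompress_bytes(blob, ctype):
    """Convenience wrapper: decompress an in-memory blob."""
    with open_for_read(io.BytesIO(blob), ctype) as f:
        return f.read()


class HypotheticalTextSource:
    def __init__(self, path, ctype=CompressionType.AUTO):
        self.path = path
        if ctype == CompressionType.AUTO:
            ctype = detect_compression_type(path)
        self.ctype = ctype

    def split(self, total_size, bundle_size):
        # A compressed stream cannot be entered at an arbitrary byte offset,
        # so splitting a compressed source is treated as an error.
        if self.ctype != CompressionType.UNCOMPRESSED:
            raise ValueError('cannot split compressed source %s' % self.path)
        # Uncompressed input: hand out contiguous byte ranges.
        return [(start, min(start + bundle_size, total_size))
                for start in range(0, total_size, bundle_size)]
```

Resolving AUTO at construction time means the split check only has to compare against UNCOMPRESSED.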

You can merge this pull request into a Git repository by running:

$ git pull https://github.com/katsiapis/incubator-beam bz2

Alternatively you can review and apply these changes as the patch at:

https://github.com/apache/incubator-beam/pull/1053.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

This closes #1053


commit bd44e76b80e4edf4f922b9a26f7b359c4ede2008
Author: Gus Katsiapis 
Date:   2016-10-05T02:41:07Z

Several enhancements to Dataflow (part 2 of 2).




> Update AvroSource to support more compression types
> ---
>
> Key: BEAM-570
> URL: https://issues.apache.org/jira/browse/BEAM-570
> Project: Beam
>  Issue Type: Improvement
>  Components: sdk-py
>Reporter: Chamikara Jayalath
>Assignee: Chamikara Jayalath
>
> Python AvroSource [1] currently only supports 'deflate' compression. We should 
> update it to support other compression types supported by the Avro library 
> (e.g. snappy, bzip2).
> [1] 
> https://github.com/apache/incubator-beam/blob/python-sdk/sdks/python/apache_beam/io/avroio.py





[jira] [Commented] (BEAM-570) Update AvroSource to support more compression types

2016-09-12 Thread Chamikara Jayalath (JIRA)

[ 
https://issues.apache.org/jira/browse/BEAM-570?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15484736#comment-15484736
 ] 

Chamikara Jayalath commented on BEAM-570:
-

Java appears to support bzip2 [1], but it seems Python won't be able to add it, 
since the Python Avro implementation doesn't support it [2]. So we should just 
add support for snappy and close this JIRA issue.

[1] 
https://github.com/apache/incubator-beam/blob/master/sdks/java/core/src/main/java/org/apache/beam/sdk/io/AvroSource.java
[2] https://github.com/cavorite/python-avro/blob/master/src/avro/datafile.py#L46



[jira] [Commented] (BEAM-570) Update AvroSource to support more compression types

2016-09-12 Thread Konstantinos Katsiapis (JIRA)

[ 
https://issues.apache.org/jira/browse/BEAM-570?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15484712#comment-15484712
 ] 

Konstantinos Katsiapis commented on BEAM-570:
-

According to the Avro specification, the required codecs are 'null' and 
'deflate', and the optional codec is 'snappy'.
See: https://avro.apache.org/docs/1.8.1/spec.html

Python _AvroSource already supports 'null' and 'deflate'.
The following PR adds support for 'snappy': 
https://github.com/apache/incubator-beam/pull/946
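Here is a sketch of the codec situation described above. It assumes a simple name-to-functions registry, and the names `SPEC_CODECS`, `CODECS` and `get_codec` are hypothetical, not the Python avro library's API. It implements 'null' and 'deflate'. Avro's deflate codec is raw DEFLATE per RFC 1951, with no zlib header or checksum. The sketch rejects names such as 'bzip2' that the 1.8.1 spec does not list.

```python
import zlib

# Codec names from the Avro 1.8.1 spec: 'null' and 'deflate' are required,
# 'snappy' is optional. 'bzip2' is not listed.
SPEC_CODECS = {'null', 'deflate', 'snappy'}


def _deflate_compress(data):
    # Negative wbits makes zlib emit raw DEFLATE (no zlib header/trailer).
    compressor = zlib.compressobj(zlib.Z_DEFAULT_COMPRESSION, zlib.DEFLATED, -15)
    return compressor.compress(data) + compressor.flush()


def _deflate_decompress(data):
    # Matching negative wbits: expect raw DEFLATE input.
    return zlib.decompress(data, -15)


# Hypothetical registry: codec name -> (compress, decompress).
# A 'snappy' entry would be added here once a Snappy library is available.
CODECS = {
    'null': (lambda b: b, lambda b: b),
    'deflate': (_deflate_compress, _deflate_decompress),
}


def get_codec(name):
    """Return (compress, decompress) for a codec name, or raise ValueError."""
    if name not in SPEC_CODECS:
        raise ValueError('%r is not a codec in the Avro 1.8.1 spec' % name)
    if name not in CODECS:
        raise ValueError('codec %r is valid but not available here' % name)
    return CODECS[name]
```

Separating "not in the spec" from "in the spec but not installed" mirrors the distinction drawn in this thread between bzip2 and snappy.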

[~altay], [~chamikara] You also mention that bzip2 should be supported (similar 
to how it's done for Dataflow Java?), but that doesn't seem to be part of the 
specification (mentioned above).

Should we limit the scope of this bug to just adding 'snappy', or is there 
precedent for supporting 'bzip2'?
Any pointers to the Java code that supports 'bzip2' so that we can get more 
background there?

Thanks,
Gus
