[jira] [Commented] (BEAM-360) Add a framework for creating Python-SDK sources for new file types

2016-07-19 Thread ASF GitHub Bot (JIRA)

[ https://issues.apache.org/jira/browse/BEAM-360?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15385300#comment-15385300 ]

ASF GitHub Bot commented on BEAM-360:
-------------------------------------

GitHub user chamikaramj closed the pull request at:

https://github.com/apache/incubator-beam/pull/672


> Add a framework for creating Python-SDK sources for new file types
> ------------------------------------------------------------------
>
>                 Key: BEAM-360
>                 URL: https://issues.apache.org/jira/browse/BEAM-360
>             Project: Beam
>          Issue Type: New Feature
>          Components: sdk-py
>            Reporter: Chamikara Jayalath
>            Assignee: Chamikara Jayalath
>
> We already have a framework for creating new sources for the Beam Python SDK:
> https://github.com/apache/incubator-beam/blob/python-sdk/sdks/python/apache_beam/io/iobase.py#L326
> It would be great to add a framework on top of this that encapsulates the
> logic common to file-based sources. This framework could include the
> following features, which are common to file-based sources:
> (1) glob expansion
> (2) support for new file systems
> (3) dynamic work rebalancing based on byte offsets
> (4) support for reading compressed files
> The Java SDK has a similar framework, available at:
> https://github.com/apache/incubator-beam/blob/master/sdks/java/core/src/main/java/org/apache/beam/sdk/io/FileBasedSource.java
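
As a rough illustration of features (1), (3), and (4) in the description
above, the sketch below expands a glob and cuts each matched file into byte
ranges that could be handed to different workers. It uses only the Python
standard library. The names 'expand_glob', 'split_into_bundles', and
'COMPRESSED_SUFFIXES' are hypothetical and are not taken from the Beam SDK,
and treating compressed files as unsplittable is an assumption of this
sketch.

```python
import glob
import os

# Hypothetical list of suffixes treated as compressed (an assumption).
COMPRESSED_SUFFIXES = (".gz", ".bz2")


def expand_glob(pattern):
    """Return (path, size_in_bytes) for every regular file matching pattern."""
    return [(p, os.path.getsize(p))
            for p in sorted(glob.glob(pattern))
            if os.path.isfile(p)]


def split_into_bundles(pattern, bundle_size):
    """Yield (path, start, stop) byte ranges covering every matched file."""
    for path, size in expand_glob(pattern):
        if path.endswith(COMPRESSED_SUFFIXES):
            # A compressed stream cannot be entered at an arbitrary byte
            # offset, so the whole file stays in one range.
            yield (path, 0, size)
            continue
        start = 0
        while start < size:
            stop = min(start + bundle_size, size)
            yield (path, start, stop)
            start = stop
```

Each range is half-open, so adjacent ranges of the same file neither overlap
nor leave gaps.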



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (BEAM-360) Add a framework for creating Python-SDK sources for new file types

2016-07-18 Thread ASF GitHub Bot (JIRA)

[ https://issues.apache.org/jira/browse/BEAM-360?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15382011#comment-15382011 ]

ASF GitHub Bot commented on BEAM-360:
-------------------------------------

GitHub user chamikaramj opened a pull request:

https://github.com/apache/incubator-beam/pull/672

[BEAM-360] Adds a PTransform for Avro source and updates snippets.

Wrapping a custom source in a 'PTransform' is better than using the source
directly with 'df.Read', since the 'PTransform' can be extended without
breaking end-user code.

Updates the documentation of avroio module.

Adds 'PTransform' wrappers to custom sources and sinks in 'snippets.py'.
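
The reasoning behind the wrapper can be sketched without any Beam dependency.
In the library-free example below, every class is a hypothetical stand-in
(none of them is a Beam class): 'Read' plays the role of a generic read
transform, 'CountingSource' plays the role of a custom source, and
'ReadFromCounting' is the wrapper that end users call.

```python
class CountingSource:
    """Stand-in for a custom source: yields the integers 0..count-1."""

    def __init__(self, count):
        self.count = count

    def read(self):
        return iter(range(self.count))


class Read:
    """Stand-in for a generic read transform that takes a source object."""

    def __init__(self, source):
        self.source = source

    def expand(self):
        return list(self.source.read())


class ReadFromCounting:
    """Wrapper transform: users depend only on its constructor arguments."""

    def __init__(self, count, transform_fn=None):
        # transform_fn stands for an option added in a later release;
        # callers written against the older signature keep working.
        self._source = CountingSource(count)
        self._transform_fn = transform_fn

    def expand(self):
        records = Read(self._source).expand()
        if self._transform_fn is not None:
            records = [self._transform_fn(r) for r in records]
        return records
```

Code that calls 'ReadFromCounting(3)' never names the source class, so the
source can be replaced or the wrapper can gain post-processing steps without
any change on the caller's side.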

You can merge this pull request into a Git repository by running:

$ git pull https://github.com/chamikaramj/incubator-beam snippets_source_sink_ptransforms

Alternatively you can review and apply these changes as the patch at:

https://github.com/apache/incubator-beam/pull/672.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

This closes #672


commit 0f206bd4581a341cec0c29348aa68d552b7f5a00
Author: Charles Chen 
Date:   2016-07-15T21:37:43Z

Adds a PTransform for Avro source.

Wrapping a custom source in a PTransform is better than using the source
directly with df.Read, since the PTransform can be extended without breaking
end-user code.

Updates the documentation of avroio module.

Adds PTransform wrappers to custom sources and sinks in snippets.py.






[jira] [Commented] (BEAM-360) Add a framework for creating Python-SDK sources for new file types

2016-07-15 Thread ASF GitHub Bot (JIRA)

[ https://issues.apache.org/jira/browse/BEAM-360?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15379759#comment-15379759 ]

ASF GitHub Bot commented on BEAM-360:
-------------------------------------

GitHub user chamikaramj closed the pull request at:

https://github.com/apache/incubator-beam/pull/599




[jira] [Commented] (BEAM-360) Add a framework for creating Python-SDK sources for new file types

2016-07-08 Thread ASF GitHub Bot (JIRA)

[ https://issues.apache.org/jira/browse/BEAM-360?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15368706#comment-15368706 ]

ASF GitHub Bot commented on BEAM-360:
-------------------------------------

GitHub user chamikaramj reopened a pull request:

https://github.com/apache/incubator-beam/pull/599

[BEAM-360] Some updates related to dynamic work rebalancing of custom 
sources.

Adds a class 'iobase.BoundedSourceSplit' to represent dynamic work
rebalancing results of custom sources.

Updates Dataflow runner specific code (apiclient.py) to support dynamic
work rebalancing of custom sources.

Updates 'OffsetRangeTracker' so that the result of
'position_at_fraction()' is a 'long' instead of a 'float'.
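
To show why an integer result matters for byte offsets, here is a minimal
sketch of a range tracker whose 'position_at_fraction()' returns a whole
offset. 'SimpleOffsetRangeTracker' is a hypothetical stand-in, not the SDK's
'OffsetRangeTracker'. Rounding up is an assumption of this sketch, and in
Python 3 the built-in 'int' plays the role of Python 2's 'long'.

```python
import math


class SimpleOffsetRangeTracker:
    """Simplified stand-in for a tracker over the offsets [start, stop)."""

    def __init__(self, start, stop):
        if start > stop:
            raise ValueError("start must not exceed stop")
        self.start = start
        self.stop = stop

    def position_at_fraction(self, fraction):
        """Return the integer offset at the given fraction of the range."""
        if not 0.0 <= fraction <= 1.0:
            raise ValueError("fraction must be in [0, 1]")
        # A float such as 15.5 is not a valid byte offset, so round up
        # to the next whole position (assumed rounding direction).
        return int(math.ceil(self.start + fraction * (self.stop - self.start)))
```

With a range of [10, 21), the midpoint 15.5 is reported as offset 16, which
can be used directly as a seek position.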

You can merge this pull request into a Git repository by running:

$ git pull https://github.com/chamikaramj/incubator-beam custom_sources_dwr

Alternatively you can review and apply these changes as the patch at:

https://github.com/apache/incubator-beam/pull/599.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

This closes #599


commit 19a41ccf5bcf00192e3646258eae0cbce85da23b
Author: Chamikara Jayalath 
Date:   2016-07-07T03:25:04Z

Adds a class 'iobase.BoundedSourceSplit' to represent dynamic work
rebalancing results of custom sources.

Updates Dataflow runner specific code (apiclient.py) to support dynamic
work rebalancing of custom sources.

Updates 'OffsetRangeTracker' so that the result of
'position_at_fraction()' is a 'long' instead of a 'float'.

commit 4415989ef0dfd656643e6e8575b6e2090b4437b5
Author: Chamikara Jayalath 
Date:   2016-07-07T03:34:21Z

Adds more comments.

commit 6aa697465e88f827a3121a1de8bad1b810d904da
Author: Chamikara Jayalath 
Date:   2016-07-07T04:41:20Z

Some updates related to dynamic work rebalancing of custom sources.

Adds a class 'iobase.BoundedSourceSplit' to represent dynamic work
rebalancing results of custom sources.

Updates Dataflow runner specific code (apiclient.py) to support dynamic
work rebalancing of custom sources.

Updates 'OffsetRangeTracker' so that the result of
'position_at_fraction()' is a 'long' instead of a 'float'.

commit 1e01b1f5cd70e5b39cd064577110898c623e524a
Author: Chamikara Jayalath 
Date:   2016-07-08T19:01:42Z

Reverting some updates.

commit 171df1ecedd51c7c72db309d526dfa9badf1
Author: Chamikara Jayalath 
Date:   2016-07-08T22:34:52Z

Adds a method 'fileio.ChannelFactory.size_in_bytes()' that can be used to
determine the size of a single file.

Updates 'filebasedsource' to use this method when determining the size of
files.
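
One way to picture a single entry point for file sizes is a small lookup
keyed by URL scheme, so that other file systems can be plugged in later. The
sketch below is an assumption about the general idea only. It does not
reproduce 'fileio.ChannelFactory', and 'register_size_lookup' and the 'mem'
scheme in the usage note are hypothetical.

```python
import os

# Maps a URL scheme to a function that returns a file's size in bytes.
_SIZE_LOOKUPS = {"file": os.path.getsize}


def register_size_lookup(scheme, lookup):
    """Register a size function for paths of the form '<scheme>://...'."""
    _SIZE_LOOKUPS[scheme] = lookup


def size_in_bytes(path):
    """Return the size of a single file, dispatching on the path's scheme."""
    scheme, sep, rest = path.partition("://")
    if not sep:
        # No scheme present: treat the path as a local file.
        scheme, rest = "file", path
    if scheme not in _SIZE_LOOKUPS:
        raise ValueError("no size lookup registered for scheme %r" % scheme)
    lookup = _SIZE_LOOKUPS[scheme]
    # Local lookups get the bare path; other schemes get the full URL.
    return lookup(rest if scheme == "file" else path)
```

For example, after 'register_size_lookup("mem", lambda path: 42)', a call to
'size_in_bytes("mem://bucket/x")' is routed to the registered function, while
plain local paths continue to use 'os.path.getsize'.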






[jira] [Commented] (BEAM-360) Add a framework for creating Python-SDK sources for new file types

2016-07-08 Thread ASF GitHub Bot (JIRA)

[ https://issues.apache.org/jira/browse/BEAM-360?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15368612#comment-15368612 ]

ASF GitHub Bot commented on BEAM-360:
-------------------------------------

GitHub user chamikaramj closed the pull request at:

https://github.com/apache/incubator-beam/pull/599




[jira] [Commented] (BEAM-360) Add a framework for creating Python-SDK sources for new file types

2016-07-06 Thread ASF GitHub Bot (JIRA)

[ https://issues.apache.org/jira/browse/BEAM-360?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15365604#comment-15365604 ]

ASF GitHub Bot commented on BEAM-360:
-------------------------------------

GitHub user chamikaramj opened a pull request:

https://github.com/apache/incubator-beam/pull/599

[BEAM-360] Some updates related to dynamic work rebalancing of custom 
sources.

Adds a class 'iobase.BoundedSourceSplit' to represent dynamic work
rebalancing results of custom sources.

Updates Dataflow runner specific code (apiclient.py) to support dynamic
work rebalancing of custom sources.

Updates 'OffsetRangeTracker' so that the result of
'position_at_fraction()' is a 'long' instead of a 'float'.
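
The general shape of a dynamic split can be sketched as follows: a reader
claims offsets as it goes, and a split request succeeds only if it falls
strictly beyond what has already been claimed. 'SplitResult' and
'SplittableRange' are hypothetical names for this illustration. They are not
'iobase.BoundedSourceSplit' or any other SDK class, whose fields are not
described in this thread.

```python
from collections import namedtuple

# Hypothetical result type: the current reader keeps 'primary' and the
# runner may hand 'residual' to another worker.
SplitResult = namedtuple("SplitResult", ["primary", "residual"])


class SplittableRange:
    """Half-open offset range [start, stop) that can shrink while in use."""

    def __init__(self, start, stop):
        self.start = start
        self.stop = stop
        self.last_claimed = None  # offset of the last record the reader began

    def try_claim(self, offset):
        """Called by the reader before reading the record at 'offset'."""
        if offset >= self.stop:
            return False
        self.last_claimed = offset
        return True

    def try_split(self, split_offset):
        """Shrink the range to end at 'split_offset' if that is still safe."""
        already_read = (self.last_claimed is not None
                        and split_offset <= self.last_claimed)
        if already_read or not (self.start < split_offset < self.stop):
            return None
        result = SplitResult((self.start, split_offset),
                             (split_offset, self.stop))
        self.stop = split_offset
        return result
```

A rejected split returns None and leaves the range untouched, so the reader
never has to undo work it has already done.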

You can merge this pull request into a Git repository by running:

$ git pull https://github.com/chamikaramj/incubator-beam custom_sources_dwr

Alternatively you can review and apply these changes as the patch at:

https://github.com/apache/incubator-beam/pull/599.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

This closes #599


commit e51d4acf12133a79671c567c9ff709c941c54f8c
Author: Chamikara Jayalath 
Date:   2016-06-21T01:09:50Z

Implements a framework for developing sources for new file types.

Module 'filebasedsource' provides a framework for creating sources for new
file types. This framework readily implements several features common to many 
sources based on files.

Additionally, module 'avroio' contains a new source, 'AvroSource', that is 
implemented using the framework described above. 'AvroSource' is a source for 
reading Avro files.

Adds many unit tests for 'filebasedsource' and 'avroio' modules.

commit cacb613448b47592f8415570f7b64bc6de797f91
Author: Chamikara Jayalath 
Date:   2016-07-07T03:25:04Z

Adds a class 'iobase.BoundedSourceSplit' to represent dynamic work
rebalancing results of custom sources.

Updates Dataflow runner specific code (apiclient.py) to support dynamic
work rebalancing of custom sources.

Updates 'OffsetRangeTracker' so that the result of
'position_at_fraction()' is a 'long' instead of a 'float'.

commit 264b4afc17c255e568a490e02ce47e9fb4b1e17a
Author: Chamikara Jayalath 
Date:   2016-07-07T03:34:21Z

Adds more comments.

commit 49e097f9c5c3d8c2bca48d3416b4934a4d86ed34
Author: Chamikara Jayalath 
Date:   2016-07-07T04:41:06Z

Some updates related to dynamic work rebalancing of custom sources.

Adds a class 'iobase.BoundedSourceSplit' to represent dynamic work
rebalancing results of custom sources.

Updates Dataflow runner specific code (apiclient.py) to support dynamic
work rebalancing of custom sources.

Updates 'OffsetRangeTracker' so that the result of
'position_at_fraction()' is a 'long' instead of a 'float'.

commit c9696c9e17c9c7a6fc13d53d4da21ac9b325c73c
Author: Chamikara Jayalath 
Date:   2016-07-07T04:41:20Z

Some updates related to dynamic work rebalancing of custom sources.

Adds a class 'iobase.BoundedSourceSplit' to represent dynamic work
rebalancing results of custom sources.

Updates Dataflow runner specific code (apiclient.py) to support dynamic
work rebalancing of custom sources.

Updates 'OffsetRangeTracker' so that the result of
'position_at_fraction()' is a 'long' instead of a 'float'.






[jira] [Commented] (BEAM-360) Add a framework for creating Python-SDK sources for new file types

2016-06-21 Thread ASF GitHub Bot (JIRA)

[ https://issues.apache.org/jira/browse/BEAM-360?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15343782#comment-15343782 ]

ASF GitHub Bot commented on BEAM-360:
-------------------------------------

GitHub user chamikaramj closed the pull request at:

https://github.com/apache/incubator-beam/pull/507




[jira] [Commented] (BEAM-360) Add a framework for creating Python-SDK sources for new file types

2016-06-20 Thread ASF GitHub Bot (JIRA)

[ https://issues.apache.org/jira/browse/BEAM-360?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15340898#comment-15340898 ]

ASF GitHub Bot commented on BEAM-360:
-------------------------------------

GitHub user chamikaramj opened a pull request:

https://github.com/apache/incubator-beam/pull/507

[BEAM-360] Implements a framework for developing Python SDK sources for new 
file types

Module 'filebasedsource' provides a framework for creating sources for new
file types. This framework implements several features common to many sources 
based on files.

Additionally, module 'avroio' contains a new source, 'AvroSource', that is 
implemented using the framework described above. 'AvroSource' is a source for 
reading Avro files.

Adds many unit tests for 'filebasedsource' and 'avroio' modules.
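
The division of labour such a framework implies can be sketched as a base
class that owns the file and byte-range handling, plus a subclass that only
knows how to parse records. 'SimpleFileBasedSource' and 'LineSource' below
are hypothetical stand-ins, not the classes in 'filebasedsource' or 'avroio'.
The rule that a record belongs to the range containing its first byte is an
assumption chosen so that adjacent ranges never duplicate or drop a record.

```python
import abc


class SimpleFileBasedSource(abc.ABC):
    """Stand-in base class: owns the byte range; subclasses parse records."""

    def __init__(self, path, start=0, stop=None):
        self.path = path
        self.start = start
        self.stop = stop  # None means "read to the end of the file"

    def read(self):
        with open(self.path, "rb") as f:
            f.seek(0, 2)  # whence=2: jump to the end to learn the size
            size = f.tell()
            stop = size if self.stop is None else min(self.stop, size)
            yield from self.read_records(f, self.start, stop)

    @abc.abstractmethod
    def read_records(self, f, start, stop):
        """Yield every record whose first byte lies in [start, stop)."""


class LineSource(SimpleFileBasedSource):
    """Newline-delimited records."""

    def read_records(self, f, start, stop):
        if start > 0:
            # Back up one byte and discard through the next newline. A line
            # beginning exactly at 'start' is kept; a line straddling
            # 'start' is left to the reader of the previous range.
            f.seek(start - 1)
            f.readline()
        else:
            f.seek(0)
        while f.tell() < stop:
            line = f.readline()
            if not line:
                break
            yield line.rstrip(b"\n")
```

Reading the ranges [0, 4) and [4, end) of a file containing the lines 'aa',
'bbb', and 'cc' yields the first two lines from the first range and the last
line from the second, even though offset 4 falls in the middle of 'bbb'.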

You can merge this pull request into a Git repository by running:

$ git pull https://github.com/chamikaramj/incubator-beam filebasedsource

Alternatively you can review and apply these changes as the patch at:

https://github.com/apache/incubator-beam/pull/507.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

This closes #507


commit e51d4acf12133a79671c567c9ff709c941c54f8c
Author: Chamikara Jayalath 
Date:   2016-06-21T01:09:50Z

Implements a framework for developing sources for new file types.

Module 'filebasedsource' provides a framework for creating sources for new
file types. This framework readily implements several features common to many 
sources based on files.

Additionally, module 'avroio' contains a new source, 'AvroSource', that is 
implemented using the framework described above. 'AvroSource' is a source for 
reading Avro files.

Adds many unit tests for 'filebasedsource' and 'avroio' modules.



