[jira] [Commented] (BEAM-360) Add a framework for creating Python-SDK sources for new file types
[ https://issues.apache.org/jira/browse/BEAM-360?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15385300#comment-15385300 ]

ASF GitHub Bot commented on BEAM-360:
-------------------------------------

Github user chamikaramj closed the pull request at:

    https://github.com/apache/incubator-beam/pull/672

> Add a framework for creating Python-SDK sources for new file types
> ------------------------------------------------------------------
>
>                 Key: BEAM-360
>                 URL: https://issues.apache.org/jira/browse/BEAM-360
>             Project: Beam
>          Issue Type: New Feature
>          Components: sdk-py
>            Reporter: Chamikara Jayalath
>            Assignee: Chamikara Jayalath
>
> We already have a framework for creating new sources for Beam Python SDK -
> https://github.com/apache/incubator-beam/blob/python-sdk/sdks/python/apache_beam/io/iobase.py#L326
> It would be great if we can add a framework on top of this that encapsulates
> logic common to sources that are based on files. This framework can include
> following features that are common to sources based on files.
> (1) glob expansion
> (2) support for new file-systems
> (3) dynamic work rebalancing based on byte offsets
> (4) support for reading compressed files.
> Java SDK has a similar framework and it's available at -
> https://github.com/apache/incubator-beam/blob/master/sdks/java/core/src/main/java/org/apache/beam/sdk/io/FileBasedSource.java

--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
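Features (1) and (3) of the issue description can be pictured with a small standard-library sketch. This is not the framework's code: `expand_and_split` and `bundle_size` are hypothetical names, and the sketch only shows an initial split into fixed byte ranges, which is the kind of unit that rebalancing by byte offset would later operate on.

```python
import glob
import os
import tempfile


def expand_and_split(pattern, bundle_size):
    """Expand a glob and cut every matched file into byte-offset ranges.

    Hypothetical helper, not part of the Beam SDK. Returns a list of
    (path, start_offset, stop_offset) tuples; stop_offset is exclusive.
    """
    if bundle_size <= 0:
        raise ValueError("bundle_size must be positive")
    bundles = []
    for path in sorted(glob.glob(pattern)):
        size = os.path.getsize(path)
        start = 0
        while start < size:
            stop = min(start + bundle_size, size)
            bundles.append((path, start, stop))
            start = stop
    return bundles


# Demo on two throwaway files of 25 and 10 bytes.
demo_dir = tempfile.mkdtemp()
for name, size in (("a.dat", 25), ("b.dat", 10)):
    with open(os.path.join(demo_dir, name), "wb") as handle:
        handle.write(b"x" * size)

bundles = expand_and_split(os.path.join(demo_dir, "*.dat"), bundle_size=10)
for path, start, stop in bundles:
    print(os.path.basename(path), start, stop)
```

A real source would also have to align each range with record boundaries, which this sketch ignores.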
[jira] [Commented] (BEAM-360) Add a framework for creating Python-SDK sources for new file types
[ https://issues.apache.org/jira/browse/BEAM-360?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15382011#comment-15382011 ]

ASF GitHub Bot commented on BEAM-360:
-------------------------------------

GitHub user chamikaramj opened a pull request:

    https://github.com/apache/incubator-beam/pull/672

    [BEAM-360] Adds a PTransform for Avro source and updates snippets.

    Wrapping a custom source as a 'PTransform' is better than using the
    source directly through 'df.Read', since the 'PTransform' can be
    extended without breaking end-user code.

    Updates the documentation of the avroio module.

    Adds 'PTransform' wrappers to custom sources and sinks in 'snippets.py'.

You can merge this pull request into a Git repository by running:

    $ git pull https://github.com/chamikaramj/incubator-beam snippets_source_sink_ptransforms

Alternatively you can review and apply these changes as the patch at:

    https://github.com/apache/incubator-beam/pull/672.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

    This closes #672

----
commit 0f206bd4581a341cec0c29348aa68d552b7f5a00
Author: Charles Chen
Date:   2016-07-15T21:37:43Z

    Adds a PTransform for Avro source.

    Wrapping a custom source as a PTransform is better than using the
    source directly through df.Read, since the PTransform can be extended
    without breaking end-user code.

    Updates the documentation of the avroio module.

    Adds PTransform wrappers to custom sources and sinks in snippets.py.
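The reasoning in pull request 672 (a wrapper transform can grow without breaking callers) can be illustrated with plain-Python stand-ins. None of the classes below are Beam APIs: `SketchRead` merely plays the role of 'df.Read', and `ListSource`, `ReadRecords` and `skip_empty` are hypothetical names chosen for this sketch.

```python
class ListSource(object):
    """Hypothetical custom source that yields records from a list."""

    def __init__(self, records):
        self._records = list(records)

    def read(self):
        return iter(self._records)


class SketchRead(object):
    """Stand-in for a generic read transform applied directly to a source."""

    def __init__(self, source):
        self._source = source

    def apply(self):
        return list(self._source.read())


class ReadRecords(object):
    """Wrapper that end users construct instead of touching the source.

    Because callers never see ListSource or SketchRead, an option such as
    skip_empty can be added later, with a default, without changing any
    existing call site.
    """

    def __init__(self, records, skip_empty=False):
        self._source = ListSource(records)
        self._skip_empty = skip_empty

    def apply(self):
        output = SketchRead(self._source).apply()
        if self._skip_empty:
            output = [record for record in output if record]
        return output


# Existing call sites keep working; newer ones can opt in to the new option.
old_style = ReadRecords(["a", "", "b"]).apply()
new_style = ReadRecords(["a", "", "b"], skip_empty=True).apply()
print(old_style, new_style)
```

Had users written the equivalent of `SketchRead(ListSource(...))` themselves, adding the filtering step would have required editing every pipeline.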
[jira] [Commented] (BEAM-360) Add a framework for creating Python-SDK sources for new file types
[ https://issues.apache.org/jira/browse/BEAM-360?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15379759#comment-15379759 ]

ASF GitHub Bot commented on BEAM-360:
-------------------------------------

Github user chamikaramj closed the pull request at:

    https://github.com/apache/incubator-beam/pull/599
[jira] [Commented] (BEAM-360) Add a framework for creating Python-SDK sources for new file types
[ https://issues.apache.org/jira/browse/BEAM-360?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15368706#comment-15368706 ]

ASF GitHub Bot commented on BEAM-360:
-------------------------------------

GitHub user chamikaramj reopened a pull request:

    https://github.com/apache/incubator-beam/pull/599

    [BEAM-360] Some updates related to dynamic work rebalancing of custom sources.

    Adds a class 'iobase.BoundedSourceSplit' to represent dynamic work
    rebalancing results of custom sources.

    Updates Dataflow runner specific code (apiclient.py) to support dynamic
    work rebalancing of custom sources.

    Updates 'OffsetRangeTracker' so that the result of
    'position_at_fraction()' is a 'long' instead of a 'float'.

You can merge this pull request into a Git repository by running:

    $ git pull https://github.com/chamikaramj/incubator-beam custom_sources_dwr

Alternatively you can review and apply these changes as the patch at:

    https://github.com/apache/incubator-beam/pull/599.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

    This closes #599

----
commit 19a41ccf5bcf00192e3646258eae0cbce85da23b
Author: Chamikara Jayalath
Date:   2016-07-07T03:25:04Z

    Adds a class 'iobase.BoundedSourceSplit' to represent dynamic work
    rebalancing result of custom sources.

    Updates Dataflow runner specific code (apiclient.py) to support dynamic
    work rebalancing of custom sources.

    Updates 'OffsetRangeTracker' so that the result of
    'position_at_fraction()' is a 'long' instead of a 'float'.

commit 4415989ef0dfd656643e6e8575b6e2090b4437b5
Author: Chamikara Jayalath
Date:   2016-07-07T03:34:21Z

    Adds more comments.

commit 6aa697465e88f827a3121a1de8bad1b810d904da
Author: Chamikara Jayalath
Date:   2016-07-07T04:41:20Z

    Some updates related to dynamic work rebalancing of custom sources.

    Adds a class 'iobase.BoundedSourceSplit' to represent dynamic work
    rebalancing result of custom sources.

    Updates Dataflow runner specific code (apiclient.py) to support dynamic
    work rebalancing of custom sources.

    Updates 'OffsetRangeTracker' so that the result of
    'position_at_fraction()' is a 'long' instead of a 'float'.

commit 1e01b1f5cd70e5b39cd064577110898c623e524a
Author: Chamikara Jayalath
Date:   2016-07-08T19:01:42Z

    Reverting some updates.

commit 171df1ecedd51c7c72db309d526dfa9badf1
Author: Chamikara Jayalath
Date:   2016-07-08T22:34:52Z

    Adds a method 'fileio.ChannelFactory.size_in_bytes()' that can be used
    to determine the size of a single file.

    Updates 'filebasedsource' to use this method when determining the size
    of files.
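Why 'position_at_fraction()' should return an integer rather than a float becomes clearer with a toy tracker: a split point is a byte offset, so it has to be a whole number. The class below is a simplified sketch, not the SDK's 'OffsetRangeTracker'. The name `SketchOffsetRangeTracker`, the rounding with `math.ceil`, and the `try_claim`/`try_split` rules are assumptions made for illustration (Python 3 `int` stands in for 'long').

```python
import math


class SketchOffsetRangeTracker(object):
    """Toy tracker for the half-open byte range [start, stop)."""

    def __init__(self, start, stop):
        self.start = start
        self.stop = stop
        self.last_claimed = None

    def position_at_fraction(self, fraction):
        # Assumed rounding rule: round up, so the result is always a whole
        # byte offset that can be used directly as a split point.
        return int(math.ceil(self.start + fraction * (self.stop - self.start)))

    def try_claim(self, offset):
        # A reader may only process records that start before 'stop'.
        if offset >= self.stop:
            return False
        self.last_claimed = offset
        return True

    def try_split(self, split_offset):
        # Reject splits at or before work that was already claimed, and
        # splits outside the remaining range.
        already_done = self.start if self.last_claimed is None else self.last_claimed
        if split_offset <= already_done or split_offset >= self.stop:
            return None
        residual = (split_offset, self.stop)
        self.stop = split_offset  # the primary range shrinks to [start, split)
        return residual


tracker = SketchOffsetRangeTracker(0, 100)
tracker.try_claim(10)
split_at = tracker.position_at_fraction(0.5)
residual = tracker.try_split(split_at)
print(split_at, residual, tracker.try_claim(60))
```

After the split, the original reader stops at the split offset and the residual range can be handed to another worker.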
[jira] [Commented] (BEAM-360) Add a framework for creating Python-SDK sources for new file types
[ https://issues.apache.org/jira/browse/BEAM-360?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15368612#comment-15368612 ]

ASF GitHub Bot commented on BEAM-360:
-------------------------------------

Github user chamikaramj closed the pull request at:

    https://github.com/apache/incubator-beam/pull/599
[jira] [Commented] (BEAM-360) Add a framework for creating Python-SDK sources for new file types
[ https://issues.apache.org/jira/browse/BEAM-360?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15365604#comment-15365604 ]

ASF GitHub Bot commented on BEAM-360:
-------------------------------------

GitHub user chamikaramj opened a pull request:

    https://github.com/apache/incubator-beam/pull/599

    [BEAM-360] Some updates related to dynamic work rebalancing of custom sources.

    Adds a class 'iobase.BoundedSourceSplit' to represent dynamic work
    rebalancing results of custom sources.

    Updates Dataflow runner specific code (apiclient.py) to support dynamic
    work rebalancing of custom sources.

    Updates 'OffsetRangeTracker' so that the result of
    'position_at_fraction()' is a 'long' instead of a 'float'.

You can merge this pull request into a Git repository by running:

    $ git pull https://github.com/chamikaramj/incubator-beam custom_sources_dwr

Alternatively you can review and apply these changes as the patch at:

    https://github.com/apache/incubator-beam/pull/599.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

    This closes #599

----
commit e51d4acf12133a79671c567c9ff709c941c54f8c
Author: Chamikara Jayalath
Date:   2016-06-21T01:09:50Z

    Implements a framework for developing sources for new file types.

    Module 'filebasedsource' provides a framework for creating sources for
    new file types. This framework readily implements several features
    common to many sources based on files.

    Additionally, module 'avroio' contains a new source, 'AvroSource', that
    is implemented using the framework described above. 'AvroSource' is a
    source for reading Avro files.

    Adds many unit tests for 'filebasedsource' and 'avroio' modules.

commit cacb613448b47592f8415570f7b64bc6de797f91
Author: Chamikara Jayalath
Date:   2016-07-07T03:25:04Z

    Adds a class 'iobase.BoundedSourceSplit' to represent dynamic work
    rebalancing result of custom sources.

    Updates Dataflow runner specific code (apiclient.py) to support dynamic
    work rebalancing of custom sources.

    Updates 'OffsetRangeTracker' so that the result of
    'position_at_fraction()' is a 'long' instead of a 'float'.

commit 264b4afc17c255e568a490e02ce47e9fb4b1e17a
Author: Chamikara Jayalath
Date:   2016-07-07T03:34:21Z

    Adds more comments.

commit 49e097f9c5c3d8c2bca48d3416b4934a4d86ed34
Author: Chamikara Jayalath
Date:   2016-07-07T04:41:06Z

    Some updates related to dynamic work rebalancing of custom sources.

    Adds a class 'iobase.BoundedSourceSplit' to represent dynamic work
    rebalancing result of custom sources.

    Updates Dataflow runner specific code (apiclient.py) to support dynamic
    work rebalancing of custom sources.

    Updates 'OffsetRangeTracker' so that the result of
    'position_at_fraction()' is a 'long' instead of a 'float'.

commit c9696c9e17c9c7a6fc13d53d4da21ac9b325c73c
Author: Chamikara Jayalath
Date:   2016-07-07T04:41:20Z

    Some updates related to dynamic work rebalancing of custom sources.

    Adds a class 'iobase.BoundedSourceSplit' to represent dynamic work
    rebalancing result of custom sources.

    Updates Dataflow runner specific code (apiclient.py) to support dynamic
    work rebalancing of custom sources.

    Updates 'OffsetRangeTracker' so that the result of
    'position_at_fraction()' is a 'long' instead of a 'float'.
[jira] [Commented] (BEAM-360) Add a framework for creating Python-SDK sources for new file types
[ https://issues.apache.org/jira/browse/BEAM-360?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15343782#comment-15343782 ]

ASF GitHub Bot commented on BEAM-360:
-------------------------------------

Github user chamikaramj closed the pull request at:

    https://github.com/apache/incubator-beam/pull/507
[jira] [Commented] (BEAM-360) Add a framework for creating Python-SDK sources for new file types
[ https://issues.apache.org/jira/browse/BEAM-360?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15340898#comment-15340898 ]

ASF GitHub Bot commented on BEAM-360:
-------------------------------------

GitHub user chamikaramj opened a pull request:

    https://github.com/apache/incubator-beam/pull/507

    [BEAM-360] Implements a framework for developing Python SDK sources for new file types

    Module 'filebasedsource' provides a framework for creating sources for
    new file types. This framework implements several features common to
    many sources based on files.

    Additionally, module 'avroio' contains a new source, 'AvroSource', that
    is implemented using the framework described above. 'AvroSource' is a
    source for reading Avro files.

    Adds many unit tests for 'filebasedsource' and 'avroio' modules.

You can merge this pull request into a Git repository by running:

    $ git pull https://github.com/chamikaramj/incubator-beam filebasedsource

Alternatively you can review and apply these changes as the patch at:

    https://github.com/apache/incubator-beam/pull/507.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

    This closes #507

----
commit e51d4acf12133a79671c567c9ff709c941c54f8c
Author: Chamikara Jayalath
Date:   2016-06-21T01:09:50Z

    Implements a framework for developing sources for new file types.

    Module 'filebasedsource' provides a framework for creating sources for
    new file types. This framework readily implements several features
    common to many sources based on files.

    Additionally, module 'avroio' contains a new source, 'AvroSource', that
    is implemented using the framework described above. 'AvroSource' is a
    source for reading Avro files.

    Adds many unit tests for 'filebasedsource' and 'avroio' modules.
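The division of labour described for pull request 507 (shared file handling in a base class, format-specific parsing in a subclass) might look roughly like the sketch below. It is an illustration built only on the standard library, not the 'filebasedsource' module itself: `SketchFileSource`, `LineSource` and `read_records` as written here are hypothetical, and choosing gzip by the '.gz' suffix is an assumption.

```python
import glob
import gzip
import os
import tempfile


class SketchFileSource(object):
    """Base class: expands the glob and opens files, compressed or not."""

    def __init__(self, pattern):
        self._pattern = pattern

    def _open(self, path):
        # Assumption for this sketch: compression is detected by suffix.
        if path.endswith(".gz"):
            return gzip.open(path, "rb")
        return open(path, "rb")

    def read(self):
        for path in sorted(glob.glob(self._pattern)):
            with self._open(path) as handle:
                for record in self.read_records(handle):
                    yield record

    def read_records(self, handle):
        """Format-specific hook that each new file type overrides."""
        raise NotImplementedError


class LineSource(SketchFileSource):
    """Example file type: one UTF-8 record per line."""

    def read_records(self, handle):
        for line in handle:
            yield line.rstrip(b"\n").decode("utf-8")


# Demo with one plain file and one gzip-compressed file.
demo_dir = tempfile.mkdtemp()
with open(os.path.join(demo_dir, "a.txt"), "wb") as handle:
    handle.write(b"one\ntwo\n")
with gzip.open(os.path.join(demo_dir, "b.txt.gz"), "wb") as handle:
    handle.write(b"three\n")

records = list(LineSource(os.path.join(demo_dir, "*")).read())
print(records)
```

A source for another format would subclass `SketchFileSource` and override only `read_records`, which is the kind of reuse the issue asks for.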