[jira] [Comment Edited] (BEAM-7758) Table returns PCollectionView in BeamSQL
[ https://issues.apache.org/jira/browse/BEAM-7758?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16889642#comment-16889642 ]

Rui Wang edited comment on BEAM-7758 at 7/21/19 5:20 AM:
---------------------------------------------------------

1. I would suggest you start from [here|https://github.com/apache/beam/blob/master/sdks/java/extensions/sql/src/main/java/org/apache/beam/sdk/extensions/sql/meta/provider/bigquery/BigQueryTableProvider.java#L54] and trace down from there to understand BeamSQL's current table providers. In prod we don't need an addTable implementation for table providers.

2. Windowing properties for the unbounded table are supposed to be solved by https://s.apache.org/streaming-beam-sql. That proposal lets us specify windowing properties in pure SQL, so you don't need to worry about the windowing property now.

3. I actually think you can explore whether you can do "unbounded_PCollection.apply(SqlTransform.query().withSlowChangingTableProvider())". However, I don't like such syntax. The more natural API would be "pipeline.apply(Join.of(unbounded_PC, slow_changing_PC).ON(join_condition))". However, it is out of scope for now.

> Table returns PCollectionView in BeamSQL
> ----------------------------------------
>
>                 Key: BEAM-7758
>                 URL: https://issues.apache.org/jira/browse/BEAM-7758
>             Project: Beam
>          Issue Type: New Feature
>          Components: dsl-sql
>            Reporter: Rui Wang
>            Assignee: Rahul Patwari
>            Priority: Major
>
> We might be able to define a table with properties that say this table returns a PCollectionView. By doing so we will have a trigger-based PCollectionView available in SQL rel nodes.
> Relevant thread: https://lists.apache.org/thread.html/602121ee49886590ce4975f66aa0d270b4a5d64575337fca8bef1232@%3Cdev.beam.apache.org%3E

--
This message was sent by Atlassian JIRA
(v7.6.14#76016)
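For anyone tracing point 1, a rough sketch of what a table provider boils down to follows. This is illustrative only and not compiled here: it assumes the Beam SQL extension types (TableProvider, Table, BeamSqlTable) roughly as BigQueryTableProvider uses them, and SlowChangingTableProvider/SlowChangingTable are hypothetical names.

{code:java}
// Illustrative sketch only (names and overrides modeled on
// BigQueryTableProvider): a table provider's core job is to map declared
// table metadata to a BeamSqlTable that knows how to produce a PCollection.
public class SlowChangingTableProvider extends InMemoryMetaTableProvider {
  @Override
  public String getTableType() {
    // The TYPE string used when declaring the table in DDL.
    return "slowChanging";
  }

  @Override
  public BeamSqlTable buildBeamSqlTable(Table table) {
    // Hypothetical table implementation built from the declared schema.
    return new SlowChangingTable(table.getSchema(), table.getProperties());
  }
}
{code}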
[jira] [Commented] (BEAM-7758) Table returns PCollectionView in BeamSQL
[ https://issues.apache.org/jira/browse/BEAM-7758?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16889634#comment-16889634 ]

Rahul Patwari commented on BEAM-7758:
-------------------------------------

"_All existing table provider can be used in this programmatic way. SqlTransform is not limited to "SELECT * FROM PCOLLECTION" but can do "SELECT * FROM TABLE_A", if you have a TABLE_A in table provider._"

But none of the existing Table Providers exposes a method to add a Table to the TableProvider. To do so, the TableProvider has to be extended like this: [https://github.com/apache/beam/blob/c2f0d282337f3ae0196a7717712396a5a41fdde1/sdks/java/extensions/sql/src/test/java/org/apache/beam/sdk/extensions/sql/meta/provider/bigquery/BigQueryTestTableProvider.java#L41]

"_You don't need to apply sql query to unbounded PCollection. Your unbounded Pcollection are created within SqlTransform from a Table which is gotten from table provider._"

Do you mean to say that SlowChangingCacheTableProvider should be constructed with two tables (one for the unbounded PCollection, one for the slowChangingCacheTable PCollectionView) and perform a join query? If so, we have to take the window properties of the unbounded table into account to perform the join.

{code:java}
Pipeline pipeline = Pipeline.create(options);

SlowChangingCacheTableProvider provider =
    SlowChangingCacheTableProvider.builder()
        .withUnboundedTable(unboundedTable)
        .withCacheTable(cacheTable)
        .build();

SqlTransform query =
    SqlTransform.query("join_query")
        .withDefaultTableProvider("slowChangingCache", provider);

PCollection queryResult = pipeline.apply("Run SQL Query", query);
{code}

Doesn't this limit joining a [slowly changing cache table] only with an [unbounded PCollection created from a Table]? How do we join a [slowly changing cache table] with an [unbounded PCollection not created from a Table (SqlTransform applied to an unbounded PCollection)]?
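The test-only pattern referenced above (a provider exposing addTable so callers can register tables programmatically before running a SQL query) can be sketched minimally as follows. This is not Beam code: "Table" here is a placeholder standing in for Beam's table-metadata class, and the provider name is hypothetical.

```java
import java.util.HashMap;
import java.util.Map;

// Placeholder standing in for Beam's table-metadata class; illustrative only.
class Table {
    final String name;
    Table(String name) { this.name = name; }
}

// Sketch of the test-only pattern: a registry of tables that callers can
// populate before a query runs, in the spirit of BigQueryTestTableProvider.
class ProgrammaticTableProvider {
    private final Map<String, Table> tables = new HashMap<>();

    void addTable(Table table) {
        tables.put(table.name, table);
    }

    Table getTable(String name) {
        return tables.get(name); // null when the table is not registered
    }
}
```

The real question in the thread is whether such an addTable belongs on production providers at all, or only in tests.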
[jira] [Comment Edited] (BEAM-7758) Table returns PCollectionView in BeamSQL
[ https://issues.apache.org/jira/browse/BEAM-7758?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16889595#comment-16889595 ]

Rui Wang edited comment on BEAM-7758 at 7/20/19 11:44 PM:
----------------------------------------------------------

Here is an example of a pubsub unbounded PCollection that is created from a Table (rather than applying a SqlTransform to an unbounded PC):

https://github.com/apache/beam/blob/master/sdks/java/extensions/sql/src/main/java/org/apache/beam/sdk/extensions/sql/meta/provider/pubsub/PubsubIOJsonTable.java#L145
[jira] [Comment Edited] (BEAM-7758) Table returns PCollectionView in BeamSQL
[ https://issues.apache.org/jira/browse/BEAM-7758?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16889590#comment-16889590 ]

Rui Wang edited comment on BEAM-7758 at 7/20/19 9:22 PM:
---------------------------------------------------------

Yes. I think your code shows the expected workflow, with one modification:

{code:java}
Pipeline pipeline = Pipeline.create(options);

SqlTransform query =
    SqlTransform.query("join_query")
        .withDefaultTableProvider("your_table_provider_name", /* create your table_provider */);

PCollection queryResult = pipeline.apply("Run SQL Query", query);
{code}

You don't need to apply a sql query to the unbounded PCollection. Your unbounded PCollections are created within SqlTransform from a Table obtained from the table provider.

All existing table providers can be used in this programmatic way. SqlTransform is not limited to "SELECT * FROM PCOLLECTION" but can do "SELECT * FROM TABLE_A", if you have a TABLE_A in the table provider.
[jira] [Commented] (BEAM-6783) byte[] breaks in BeamSQL codegen
[ https://issues.apache.org/jira/browse/BEAM-6783?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16889589#comment-16889589 ]

Rui Wang commented on BEAM-6783:
--------------------------------

[~sridharG] Please let me know if you need any help.

> byte[] breaks in BeamSQL codegen
> --------------------------------
>
>                 Key: BEAM-6783
>                 URL: https://issues.apache.org/jira/browse/BEAM-6783
>             Project: Beam
>          Issue Type: Bug
>          Components: dsl-sql
>            Reporter: Rui Wang
>            Assignee: sridhar Reddy
>            Priority: Major
>
> Calcite will call `byte[].toString` because BeamSQL codegen reads byte[] from Row into Calcite (see: https://github.com/apache/beam/blob/master/sdks/java/extensions/sql/src/main/java/org/apache/beam/sdk/extensions/sql/impl/rel/BeamCalcRel.java#L334).
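A self-contained illustration of the underlying problem: Java arrays inherit Object.toString(), so calling toString() on a byte[] yields a type tag plus identity hash rather than the contents, which is why codegen that ends up routing a byte[] through toString() produces garbage.

```java
import java.util.Arrays;

public class ByteArrayToStringDemo {
    public static void main(String[] args) {
        byte[] data = {1, 2, 3};
        // Object.toString() on a byte[] gives something like "[B@1b6d3586":
        // the element type tag plus an identity hash, not the bytes.
        String viaToString = data.toString();
        System.out.println(viaToString.startsWith("[B@")); // true
        // A readable rendering of the contents requires Arrays.toString:
        System.out.println(Arrays.toString(data)); // [1, 2, 3]
    }
}
```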
[jira] [Updated] (BEAM-7790) Make debugging subprocess workers easier
[ https://issues.apache.org/jira/browse/BEAM-7790?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Kyle Weaver updated BEAM-7790:
------------------------------
    Description:
1. The output of the SDK workers is currently invisible due to the output and logging setup.
2. The dockerized version of the Python SDK worker sets up an HTTP server to let the user view stack traces for all of the worker's threads [1]. It would be useful if this were available for other execution modes as well.
3. Make the above items more usable with multiple subprocesses by identifying them with worker ids.

[1] [https://github.com/apache/beam/blob/9f4ce1c6fc2fb195e218783a6e9ce6104ddb4d1e/sdks/python/apache_beam/runners/worker/sdk_worker_main.py#L46-L89]

  was:
1. The output of the SDK workers is currently invisible [1].
2. The dockerized version of the Python SDK worker sets up an HTTP server to let the user view stack traces for all of the worker's threads [2]. It would be useful if this were available for other execution modes as well.
3. Make the above items more usable with multiple subprocesses by identifying them with worker ids.

> Make debugging subprocess workers easier
> ----------------------------------------
>
>                 Key: BEAM-7790
>                 URL: https://issues.apache.org/jira/browse/BEAM-7790
>             Project: Beam
>          Issue Type: Improvement
>          Components: sdk-py-harness
>            Reporter: Kyle Weaver
>            Assignee: Kyle Weaver
>            Priority: Minor
[jira] [Work logged] (BEAM-6611) A Python Sink for BigQuery with File Loads in Streaming
[ https://issues.apache.org/jira/browse/BEAM-6611?focusedWorklogId=280132&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-280132 ]

ASF GitHub Bot logged work on BEAM-6611:
----------------------------------------
                Author: ASF GitHub Bot
            Created on: 20/Jul/19 21:01
            Start Date: 20/Jul/19 21:01
    Worklog Time Spent: 10m

Work Description: ttanay commented on issue #8871: [BEAM-6611] BigQuery file loads in Streaming for Python SDK
URL: https://github.com/apache/beam/pull/8871#issuecomment-513498980

Run Python 2 PostCommit

This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org

Issue Time Tracking
-------------------
    Worklog Id:     (was: 280132)
    Time Spent: 7h 20m  (was: 7h 10m)

> A Python Sink for BigQuery with File Loads in Streaming
> -------------------------------------------------------
>
>                 Key: BEAM-6611
>                 URL: https://issues.apache.org/jira/browse/BEAM-6611
>             Project: Beam
>          Issue Type: Improvement
>          Components: sdk-py-core
>            Reporter: Pablo Estrada
>            Assignee: Tanay Tummalapalli
>            Priority: Major
>              Labels: gsoc, gsoc2019, mentor
>          Time Spent: 7h 20m
>  Remaining Estimate: 0h
>
> The Java SDK supports a bunch of methods for writing data into BigQuery, while the Python SDK supports the following:
> - Streaming inserts for streaming pipelines [As seen in [bigquery.py and BigQueryWriteFn|https://github.com/apache/beam/blob/master/sdks/python/apache_beam/io/gcp/bigquery.py#L649-L813]]
> - File loads for batch pipelines [As implemented in [PR 7655|https://github.com/apache/beam/pull/7655]]
> Quick and dirty early design doc: https://s.apache.org/beam-bqfl-py-streaming
> The Java SDK also supports File Loads for Streaming pipelines [see BatchLoads application|https://github.com/apache/beam/blob/master/sdks/java/io/google-cloud-platform/src/main/java/org/apache/beam/sdk/io/gcp/bigquery/BigQueryIO.java#L1709-L1776].
> File loads have the advantage of being much cheaper than streaming inserts (although they are also slower for the records to show up in the table).
[jira] [Created] (BEAM-7790) Make debugging subprocess workers easier
Kyle Weaver created BEAM-7790:
---------------------------------

             Summary: Make debugging subprocess workers easier
                 Key: BEAM-7790
                 URL: https://issues.apache.org/jira/browse/BEAM-7790
             Project: Beam
          Issue Type: Improvement
          Components: sdk-py-harness
            Reporter: Kyle Weaver
            Assignee: Kyle Weaver

1. The output of the SDK workers is currently invisible [1].
2. The dockerized version of the Python SDK worker sets up an HTTP server to let the user view stack traces for all of the worker's threads [2]. It would be useful if this were available for other execution modes as well.
3. Make the above items more usable with multiple subprocesses by identifying them with worker ids.
[jira] [Work logged] (BEAM-7747) ERROR: test_sink_transform (apache_beam.io.avroio_test.TestFastAvro) Fails on Windows
[ https://issues.apache.org/jira/browse/BEAM-7747?focusedWorklogId=280129&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-280129 ]

ASF GitHub Bot logged work on BEAM-7747:
----------------------------------------
                Author: ASF GitHub Bot
            Created on: 20/Jul/19 20:37
            Start Date: 20/Jul/19 20:37
    Worklog Time Spent: 10m

Work Description: tvalentyn commented on issue #9111: [BEAM-7747] Close the file handle owned by fastavro.write.Writer in _FastAvroSink.close().
URL: https://github.com/apache/beam/pull/9111#issuecomment-513497441

Run Python 3.5 PostCommit

This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org

Issue Time Tracking
-------------------
    Worklog Id:     (was: 280129)
    Time Spent: 1h 20m  (was: 1h 10m)

> ERROR: test_sink_transform (apache_beam.io.avroio_test.TestFastAvro) Fails on Windows
> -------------------------------------------------------------------------------------
>
>                 Key: BEAM-7747
>                 URL: https://issues.apache.org/jira/browse/BEAM-7747
>             Project: Beam
>          Issue Type: Bug
>          Components: io-python-avro, test-failures
>            Reporter: Valentyn Tymofieiev
>            Priority: Major
>          Time Spent: 1h 20m
>  Remaining Estimate: 0h
>
> {code}
> ======================================================================
> ERROR: test_sink_transform (apache_beam.io.avroio_test.TestFastAvro)
> ----------------------------------------------------------------------
> Traceback (most recent call last):
>   File "C:\projects\beam\sdks\python\apache_beam\io\avroio_test.py", line 436, in test_sink_transform
>     | avroio.WriteToAvro(path, self.SCHEMA, use_fastavro=self.use_fastavro)
>   File "C:\projects\beam\sdks\python\apache_beam\pipeline.py", line 426, in __exit__
>     self.run().wait_until_finish()
>   File "C:\projects\beam\sdks\python\apache_beam\testing\test_pipeline.py", line 107, in run
>     else test_runner_api))
>   File "C:\projects\beam\sdks\python\apache_beam\pipeline.py", line 406, in run
>     self._options).run(False)
>   File "C:\projects\beam\sdks\python\apache_beam\pipeline.py", line 419, in run
>     return self.runner.run_pipeline(self, self._options)
>   File "C:\projects\beam\sdks\python\apache_beam\runners\direct\direct_runner.py", line 128, in run_pipeline
>     return runner.run_pipeline(pipeline, options)
>   File "C:\projects\beam\sdks\python\apache_beam\runners\portability\fn_api_runner.py", line 319, in run_pipeline
>     default_environment=self._default_environment))
>   File "C:\projects\beam\sdks\python\apache_beam\runners\portability\fn_api_runner.py", line 326, in run_via_runner_api
>     return self.run_stages(stage_context, stages)
>   File "C:\projects\beam\sdks\python\apache_beam\runners\portability\fn_api_runner.py", line 408, in run_stages
>     stage_context.safe_coders)
>   File "C:\projects\beam\sdks\python\apache_beam\runners\portability\fn_api_runner.py", line 681, in _run_stage
>     result, splits = bundle_manager.process_bundle(data_input, data_output)
>   File "C:\projects\beam\sdks\python\apache_beam\runners\portability\fn_api_runner.py", line 1562, in process_bundle
>     part_inputs):
>   File "C:\venv\newenv1\lib\site-packages\concurrent\futures\_base.py", line 641, in result_iterator
>     yield fs.pop().result()
>   File "C:\venv\newenv1\lib\site-packages\concurrent\futures\_base.py", line 462, in result
>     return self.__get_result()
>   File "C:\venv\newenv1\lib\site-packages\concurrent\futures\thread.py", line 63, in run
>     result = self.fn(*self.args, **self.kwargs)
>   File "C:\projects\beam\sdks\python\apache_beam\runners\portability\fn_api_runner.py", line 1561, in
>     self._registered).process_bundle(part, expected_outputs),
>   File "C:\projects\beam\sdks\python\apache_beam\runners\portability\fn_api_runner.py", line 1500, in process_bundle
>     result_future = self._controller.control_handler.push(process_bundle_req)
>   File "C:\projects\beam\sdks\python\apache_beam\runners\portability\fn_api_runner.py", line 1017, in push
>     response = self.worker.do_instruction(request)
>   File "C:\projects\beam\sdks\python\apache_beam\runners\worker\sdk_worker.py", line 342, in do_instruction
> {code}
request.instruction_id) > File > "C:\projects\beam\sdks\python\apache_beam\runners\worker\sdk_worker.py", line > 368, in process_bundle > bundle_processor.process_bundle(instruction_id)) > File > "C:\projects\beam\sdks\python\apache_beam\runners\worker\bundle_processor.py", > line 593, in process_bundle >
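The fix referenced in PR #9111 — closing the file handle owned by fastavro.write.Writer in `_FastAvroSink.close()` — matters on Windows because an open handle blocks renaming or deleting the file. A minimal sketch of the idea, with hypothetical stand-in classes (`BufferingWriter`, `FastAvroSinkSketch`) in place of the real fastavro and Beam types:

```python
import io

class BufferingWriter:
    """Hypothetical stand-in for fastavro.write.Writer: buffers records,
    writes them to the file object on flush, but never closes the file."""
    def __init__(self, fo):
        self.fo = fo
        self._buffer = []

    def write(self, record):
        self._buffer.append(record)

    def flush(self):
        for record in self._buffer:
            self.fo.write(record)
        self._buffer = []

class FastAvroSinkSketch:
    """Sketch of the fix: the sink owns the file handle, so its close()
    must both flush the writer and close the handle explicitly."""
    def __init__(self, fo):
        self._fo = fo
        self._writer = BufferingWriter(fo)

    def write(self, record):
        self._writer.write(record)

    def close(self):
        self._writer.flush()
        self._fo.close()  # on Windows, an open handle blocks rename/delete

buf = io.BytesIO()
sink = FastAvroSinkSketch(buf)
sink.write(b"avro block")
sink.close()
```

Without the explicit `self._fo.close()`, the handle would only be released at garbage collection, which is what made the Windows test flaky.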
[jira] [Commented] (BEAM-7742) BigQuery File Loads to work well with load job size limits
[ https://issues.apache.org/jira/browse/BEAM-7742?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16889583#comment-16889583 ]

Tanay Tummalapalli commented on BEAM-7742:
--
Yes, I'll work on this. I'm a bit late on the design doc; I'll work on it tomorrow.

> BigQuery File Loads to work well with load job size limits
> ------
> Key: BEAM-7742
> URL: https://issues.apache.org/jira/browse/BEAM-7742
> Project: Beam
> Issue Type: Improvement
> Components: io-python-gcp
> Reporter: Pablo Estrada
> Assignee: Tanay Tummalapalli
> Priority: Major
>
> Load jobs into BigQuery have a number of limitations: https://cloud.google.com/bigquery/quotas#load_jobs
>
> Currently, the Python BQ sink implemented in `bigquery_file_loads.py` does not handle these limitations well. Improvements need to be made to the implementation, to:
> * Decide to use temp_tables dynamically at pipeline execution
> * Add code to determine when a load job to a single destination needs to be partitioned into multiple jobs.
> * When this happens, we definitely need to use temp_tables, in case one of the load jobs fails and the pipeline is rerun.
> Tanay, would you be able to look at this?
--
This message was sent by Atlassian JIRA (v7.6.14#76016)
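The partitioning improvement described in that issue — splitting the files destined for one table across multiple load jobs — can be sketched as a greedy partitioner. This is an illustrative helper, not the actual `bigquery_file_loads.py` code; the default caps are placeholders standing in for the per-job limits documented at the quotas link above:

```python
def partition_files(file_sizes, max_files=10000, max_bytes=15 * 2**40):
    """Greedily group (filename, byte_size) pairs so that each group
    fits within one BigQuery load job's file-count and byte caps.

    Returns a list of partitions; each partition is a list of filenames.
    """
    partitions = []
    current, current_bytes = [], 0
    for name, size in file_sizes:
        # Start a new partition if adding this file would exceed either cap.
        if current and (len(current) >= max_files or
                        current_bytes + size > max_bytes):
            partitions.append(current)
            current, current_bytes = [], 0
        current.append(name)
        current_bytes += size
    if current:
        partitions.append(current)
    return partitions
```

When this yields more than one partition, each partition must be loaded into its own temporary table and the results copied into the destination afterwards — which is why the issue's third bullet ties multi-job destinations to temp_tables: if one of several direct loads failed and the pipeline were rerun, the destination would be left partially written.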
[jira] [Commented] (BEAM-7752) Java Validates Direct Runner: testTeardownCalledAfterExceptionInFinishBundleStateful flaky?
[ https://issues.apache.org/jira/browse/BEAM-7752?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16889577#comment-16889577 ]

Michael Luckey commented on BEAM-7752:
--
[~ŁukaszG] These tests seem to be flaky on the direct runner. We need to look into the implementation there to see whether the assumptions made by the original test author hold. IIRC the direct runner was problematic with timings on these tests.

> Java Validates Direct Runner: testTeardownCalledAfterExceptionInFinishBundleStateful flaky?
> ------
> Key: BEAM-7752
> URL: https://issues.apache.org/jira/browse/BEAM-7752
> Project: Beam
> Issue Type: Bug
> Components: test-failures
> Reporter: Lukasz Gajowy
> Priority: Major
> Labels: sickbay
>
> See: https://builds.apache.org/job/beam_PostCommit_Java_ValidatesRunner_Direct/663/
> Another run on the same master state was successful. As I see in the job's history, the test fails from time to time.
--
This message was sent by Atlassian JIRA (v7.6.14#76016)
[jira] [Work logged] (BEAM-6611) A Python Sink for BigQuery with File Loads in Streaming
[ https://issues.apache.org/jira/browse/BEAM-6611?focusedWorklogId=280126&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-280126 ]

ASF GitHub Bot logged work on BEAM-6611:
Author: ASF GitHub Bot
Created on: 20/Jul/19 19:20
Start Date: 20/Jul/19 19:20
Worklog Time Spent: 10m
Work Description: ttanay commented on pull request #8871: [BEAM-6611] BigQuery file loads in Streaming for Python SDK URL: https://github.com/apache/beam/pull/8871#discussion_r304995214

## File path: sdks/python/apache_beam/io/gcp/bigquery_file_loads.py ##
@@ -622,8 +653,9 @@ def expand(self, pcoll):
     test_client=self.test_client,
     temporary_tables=self.temp_tables,
     additional_bq_parameters=self.additional_bq_parameters),
-load_job_name_pcv, *self.schema_side_inputs).with_outputs(
-TriggerLoadJobs.TEMP_TABLES, main='main')
+load_job_name_pcv, self.is_streaming_pipeline,

Review comment: According to the guide to generating Job IDs for BigQuery [1], the Job ID should be a human-readable prefix with a GUID/UUID suffix (e.g. `daily_import_job_1447971251`). If we can generate the job ID as:

```python
"{}_{}".format(load_job_name_prefix, str(uuid.uuid4()))
```

we can do away with passing the `is_streaming_pipeline` flag altogether. What are your thoughts? Will it make it harder for users to debug their jobs?

[1] https://cloud.google.com/bigquery/docs/running-jobs#generate-jobid

This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org

Issue Time Tracking
---
Worklog Id: (was: 280126)
Time Spent: 7h 10m (was: 7h)

> A Python Sink for BigQuery with File Loads in Streaming
> ------
> Key: BEAM-6611
> URL: https://issues.apache.org/jira/browse/BEAM-6611
> Project: Beam
> Issue Type: Improvement
> Components: sdk-py-core
> Reporter: Pablo Estrada
> Assignee: Tanay Tummalapalli
> Priority: Major
> Labels: gsoc, gsoc2019, mentor
> Time Spent: 7h 10m
> Remaining Estimate: 0h
>
> The Java SDK supports a bunch of methods for writing data into BigQuery, while the Python SDK supports the following:
> - Streaming inserts for streaming pipelines [As seen in [bigquery.py and BigQueryWriteFn|https://github.com/apache/beam/blob/master/sdks/python/apache_beam/io/gcp/bigquery.py#L649-L813]]
> - File loads for batch pipelines [As implemented in [PR 7655|https://github.com/apache/beam/pull/7655]]
> Quick-and-dirty early design doc: https://s.apache.org/beam-bqfl-py-streaming
> The Java SDK also supports File Loads for Streaming pipelines [see BatchLoads application|https://github.com/apache/beam/blob/master/sdks/java/io/google-cloud-platform/src/main/java/org/apache/beam/sdk/io/gcp/bigquery/BigQueryIO.java#L1709-L1776].
> File loads have the advantage of being much cheaper than streaming inserts (although they are also slower for the records to show up in the table).
--
This message was sent by Atlassian JIRA (v7.6.14#76016)
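The naming scheme proposed in the review comment above can be sketched as a small helper. The function name and the example prefix are illustrative, not the actual Beam code:

```python
import uuid

def generate_load_job_id(prefix):
    """Build a BigQuery job ID: a human-readable prefix plus a GUID suffix.

    A fresh UUID per attempt keeps IDs unique without needing a
    streaming-vs-batch flag, while the prefix keeps jobs easy to find
    when debugging.
    """
    return "{}_{}".format(prefix, uuid.uuid4())

job_id = generate_load_job_id("beam_bq_job_LOAD_daily_import")
```

The trade-off raised in the comment is visible here: uniqueness no longer depends on pipeline mode, but users must rely on the prefix alone to correlate a job with its pipeline.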
[jira] [Work logged] (BEAM-6611) A Python Sink for BigQuery with File Loads in Streaming
[ https://issues.apache.org/jira/browse/BEAM-6611?focusedWorklogId=280123&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-280123 ]

ASF GitHub Bot logged work on BEAM-6611:
Author: ASF GitHub Bot
Created on: 20/Jul/19 19:13
Start Date: 20/Jul/19 19:13
Worklog Time Spent: 10m
Work Description: ttanay commented on issue #8871: [BEAM-6611] BigQuery file loads in Streaming for Python SDK URL: https://github.com/apache/beam/pull/8871#issuecomment-513492186

Run Python 3.5 PostCommit

This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org

Issue Time Tracking
---
Worklog Id: (was: 280123)
Time Spent: 6h 40m (was: 6.5h)

> A Python Sink for BigQuery with File Loads in Streaming
> ------
> Key: BEAM-6611
> Time Spent: 6h 40m
> Remaining Estimate: 0h
[jira] [Work logged] (BEAM-6611) A Python Sink for BigQuery with File Loads in Streaming
[ https://issues.apache.org/jira/browse/BEAM-6611?focusedWorklogId=280125&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-280125 ]

ASF GitHub Bot logged work on BEAM-6611:
Author: ASF GitHub Bot
Created on: 20/Jul/19 19:13
Start Date: 20/Jul/19 19:13
Worklog Time Spent: 10m
Work Description: ttanay commented on issue #8871: [BEAM-6611] BigQuery file loads in Streaming for Python SDK URL: https://github.com/apache/beam/pull/8871#issuecomment-513492200

Run Python 3.7 PostCommit

This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org

Issue Time Tracking
---
Worklog Id: (was: 280125)
Time Spent: 7h (was: 6h 50m)

> A Python Sink for BigQuery with File Loads in Streaming
> ------
> Key: BEAM-6611
> Time Spent: 7h
> Remaining Estimate: 0h
[jira] [Work logged] (BEAM-6611) A Python Sink for BigQuery with File Loads in Streaming
[ https://issues.apache.org/jira/browse/BEAM-6611?focusedWorklogId=280124&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-280124 ]

ASF GitHub Bot logged work on BEAM-6611:
Author: ASF GitHub Bot
Created on: 20/Jul/19 19:13
Start Date: 20/Jul/19 19:13
Worklog Time Spent: 10m
Work Description: ttanay commented on issue #8871: [BEAM-6611] BigQuery file loads in Streaming for Python SDK URL: https://github.com/apache/beam/pull/8871#issuecomment-513492189

Run Python 3.6 PostCommit

This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org

Issue Time Tracking
---
Worklog Id: (was: 280124)
Time Spent: 6h 50m (was: 6h 40m)

> A Python Sink for BigQuery with File Loads in Streaming
> ------
> Key: BEAM-6611
> Time Spent: 6h 50m
> Remaining Estimate: 0h
[jira] [Work logged] (BEAM-6611) A Python Sink for BigQuery with File Loads in Streaming
[ https://issues.apache.org/jira/browse/BEAM-6611?focusedWorklogId=280122&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-280122 ]

ASF GitHub Bot logged work on BEAM-6611:
Author: ASF GitHub Bot
Created on: 20/Jul/19 19:12
Start Date: 20/Jul/19 19:12
Worklog Time Spent: 10m
Work Description: ttanay commented on issue #8871: [BEAM-6611] BigQuery file loads in Streaming for Python SDK URL: https://github.com/apache/beam/pull/8871#issuecomment-513492112

Run Python 2 PostCommit

This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org

Issue Time Tracking
---
Worklog Id: (was: 280122)
Time Spent: 6.5h (was: 6h 20m)

> A Python Sink for BigQuery with File Loads in Streaming
> ------
> Key: BEAM-6611
> Time Spent: 6.5h
> Remaining Estimate: 0h
[jira] [Work logged] (BEAM-6611) A Python Sink for BigQuery with File Loads in Streaming
[ https://issues.apache.org/jira/browse/BEAM-6611?focusedWorklogId=280121&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-280121 ]

ASF GitHub Bot logged work on BEAM-6611:
Author: ASF GitHub Bot
Created on: 20/Jul/19 19:12
Start Date: 20/Jul/19 19:12
Worklog Time Spent: 10m
Work Description: ttanay commented on issue #8871: [BEAM-6611] BigQuery file loads in Streaming for Python SDK URL: https://github.com/apache/beam/pull/8871#issuecomment-513492093

Run Python PostCommit

This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org

Issue Time Tracking
---
Worklog Id: (was: 280121)
Time Spent: 6h 20m (was: 6h 10m)

> A Python Sink for BigQuery with File Loads in Streaming
> ------
> Key: BEAM-6611
> Time Spent: 6h 20m
> Remaining Estimate: 0h
[jira] [Work logged] (BEAM-6611) A Python Sink for BigQuery with File Loads in Streaming
[ https://issues.apache.org/jira/browse/BEAM-6611?focusedWorklogId=280113&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-280113 ]

ASF GitHub Bot logged work on BEAM-6611:
Author: ASF GitHub Bot
Created on: 20/Jul/19 17:33
Start Date: 20/Jul/19 17:33
Worklog Time Spent: 10m
Work Description: ttanay commented on issue #8871: [BEAM-6611] BigQuery file loads in Streaming for Python SDK URL: https://github.com/apache/beam/pull/8871#issuecomment-513485596

I'm unable to run the IT tests for BQFL (except for `test_one_job_fails_all_jobs_fail`) locally (even on clean master) and also on a VM, due to this error:

```
ERROR: test_multiple_destinations_transform (apache_beam.io.gcp.bigquery_file_loads_test.BigQueryFileLoadsIT)
--
Traceback (most recent call last):
  File "/home/tanay/Coding/beam-umbrella/beam-git/beam/sdks/python/apache_beam/io/gcp/bigquery_file_loads_test.py", line 467, in test_multiple_destinations_transform
    max_files_per_bundle=-1))
  File "/home/tanay/Coding/beam-umbrella/beam-git/beam/sdks/python/apache_beam/pipeline.py", line 426, in __exit__
    self.run().wait_until_finish()
  File "/home/tanay/Coding/beam-umbrella/beam-git/beam/sdks/python/apache_beam/pipeline.py", line 406, in run
    self._options).run(False)
  File "/home/tanay/Coding/beam-umbrella/beam-git/beam/sdks/python/apache_beam/pipeline.py", line 419, in run
    return self.runner.run_pipeline(self, self._options)
  File "/home/tanay/Coding/beam-umbrella/beam-git/beam/sdks/python/apache_beam/runners/direct/test_direct_runner.py", line 43, in run_pipeline
    self.result = super(TestDirectRunner, self).run_pipeline(pipeline, options)
  File "/home/tanay/Coding/beam-umbrella/beam-git/beam/sdks/python/apache_beam/runners/direct/direct_runner.py", line 128, in run_pipeline
    return runner.run_pipeline(pipeline, options)
  File "/home/tanay/Coding/beam-umbrella/beam-git/beam/sdks/python/apache_beam/runners/portability/fn_api_runner.py", line 319, in run_pipeline
    default_environment=self._default_environment))
  File "/home/tanay/Coding/beam-umbrella/beam-git/beam/sdks/python/apache_beam/runners/portability/fn_api_runner.py", line 326, in run_via_runner_api
    return self.run_stages(stage_context, stages)
  File "/home/tanay/Coding/beam-umbrella/beam-git/beam/sdks/python/apache_beam/runners/portability/fn_api_runner.py", line 408, in run_stages
    stage_context.safe_coders)
  File "/home/tanay/Coding/beam-umbrella/beam-git/beam/sdks/python/apache_beam/runners/portability/fn_api_runner.py", line 681, in _run_stage
    result, splits = bundle_manager.process_bundle(data_input, data_output)
  File "/home/tanay/Coding/beam-umbrella/beam-git/beam/sdks/python/apache_beam/runners/portability/fn_api_runner.py", line 1562, in process_bundle
    part_inputs):
  File "/usr/local/lib/python3.7/concurrent/futures/_base.py", line 586, in result_iterator
    yield fs.pop().result()
  File "/usr/local/lib/python3.7/concurrent/futures/_base.py", line 432, in result
    return self.__get_result()
  File "/usr/local/lib/python3.7/concurrent/futures/_base.py", line 384, in __get_result
    raise self._exception
  File "/usr/local/lib/python3.7/concurrent/futures/thread.py", line 57, in run
    result = self.fn(*self.args, **self.kwargs)
  File "/home/tanay/Coding/beam-umbrella/beam-git/beam/sdks/python/apache_beam/runners/portability/fn_api_runner.py", line 1561, in
    self._registered).process_bundle(part, expected_outputs),
  File "/home/tanay/Coding/beam-umbrella/beam-git/beam/sdks/python/apache_beam/runners/portability/fn_api_runner.py", line 1500, in process_bundle
    result_future = self._controller.control_handler.push(process_bundle_req)
  File "/home/tanay/Coding/beam-umbrella/beam-git/beam/sdks/python/apache_beam/runners/portability/fn_api_runner.py", line 1017, in push
    response = self.worker.do_instruction(request)
  File "/home/tanay/Coding/beam-umbrella/beam-git/beam/sdks/python/apache_beam/runners/worker/sdk_worker.py", line 342, in do_instruction
    request.instruction_id)
  File "/home/tanay/Coding/beam-umbrella/beam-git/beam/sdks/python/apache_beam/runners/worker/sdk_worker.py", line 368, in process_bundle
    bundle_processor.process_bundle(instruction_id))
  File "/home/tanay/Coding/beam-umbrella/beam-git/beam/sdks/python/apache_beam/runners/worker/bundle_processor.py", line 593, in process_bundle
    data.ptransform_id].process_encoded(data.data)
  File "/home/tanay/Coding/beam-umbrella/beam-git/beam/sdks/python/apache_beam/runners/worker/bundle_processor.py", line 143, in process_encoded
    self.output(decoded_value)
  File
[jira] [Comment Edited] (BEAM-7758) Table returns PCollectionView in BeamSQL
[ https://issues.apache.org/jira/browse/BEAM-7758?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16889468#comment-16889468 ]

Rahul Patwari edited comment on BEAM-7758 at 7/20/19 4:27 PM:
--
[~amaliujia] Is this the expected flow:

{code:java}
PCollection unboundedPcollection = ...;
org.apache.beam.sdk.extensions.sql.meta.Table table = Table.builder(). ... .build(); // Schema, Properties, ... are given
org.apache.beam.sdk.extensions.sql.meta.provider.SlowChangingCacheTableProvider provider =
    SlowChangingCacheTableProvider.builder().withTable(table).build();
unboundedPcollection.apply(SqlTransform.query("JOIN query").withTableProvider("SlowChangingCache", provider).build());
{code}

Can the existing table providers like 'KafkaTableProvider' and 'TextTableProvider' be used in a programmatic way (using SqlTransform.query().withTableProvider())?

was (Author: rahul8383):
[~amaliujia] Is this the expected flow:

{code:java}
PCollection unboundedPcollection = ...;
org.apache.beam.sdk.extensions.sql.meta.Table table = Table.builder(). ... .build(); // Schema, Properties, ... are given
org.apache.beam.sdk.extensions.sql.meta.provider.SlowChangingCacheTableProvider provider =
    SlowChangingCacheTableProvider.builder().withTable(table).build();
unboundedPcollection.apply(SqlTransform.query("JOIN query").withTableProvider("SlowChangingCacheTableName", provider).build());
{code}

Can the existing table providers like 'KafkaTableProvider' and 'TextTableProvider' be used in a programmatic way (using SqlTransform.query().withTableProvider())?

> Table returns PCollectionView in BeamSQL
> ------
> Key: BEAM-7758
> URL: https://issues.apache.org/jira/browse/BEAM-7758
> Project: Beam
> Issue Type: New Feature
> Components: dsl-sql
> Reporter: Rui Wang
> Assignee: Rahul Patwari
> Priority: Major
>
> We might be able to define a table with properties that say this table returns a PCollectionView. By doing so we will have a trigger-based PCollectionView available in SQL rel nodes.
> Relevant thread: https://lists.apache.org/thread.html/602121ee49886590ce4975f66aa0d270b4a5d64575337fca8bef1232@%3Cdev.beam.apache.org%3E
--
This message was sent by Atlassian JIRA (v7.6.14#76016)
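At its core, the feature discussed in BEAM-7758 — joining an unbounded stream against a slowly changing table exposed as a PCollectionView — is a lookup join against a periodically refreshed snapshot. A plain-Python sketch of that semantics (no Beam APIs; all names are illustrative):

```python
def lookup_join(stream_records, table_snapshot, key):
    """Join each streamed record against the current snapshot of a
    slow-changing table, keyed by `key`. Records with no match are
    dropped (inner-join semantics, as in the SQL JOIN under discussion)."""
    index = {row[key]: row for row in table_snapshot}
    joined = []
    for record in stream_records:
        match = index.get(record[key])
        if match is not None:
            merged = dict(match)   # table columns first,
            merged.update(record)  # stream columns take precedence
            joined.append(merged)
    return joined

# The snapshot can be swapped out as the table changes; records that
# arrive later then join against the refreshed version — the role a
# trigger-based PCollectionView would play inside the SQL rel nodes.
snapshot = [{"id": 1, "name": "alpha"}, {"id": 2, "name": "beta"}]
result = lookup_join([{"id": 2, "v": 10}, {"id": 3, "v": 5}], snapshot, "id")
```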
[jira] [Comment Edited] (BEAM-7758) Table returns PCollectionView in BeamSQL
[ https://issues.apache.org/jira/browse/BEAM-7758?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16889468#comment-16889468 ] Rahul Patwari edited comment on BEAM-7758 at 7/20/19 4:17 PM: -- [~amaliujia] Is this the expected flow: {code:java} PCollection unboundedPcollection = ...; org.apache.beam.sdk.extensions.sql.meta.Table table = Table.builder(). ... .build();//Schema, Properties ... are given org.apache.beam.sdk.extensions.sql.meta.provider.SlowChangingCacheTableProvider provider = SlowChangingCacheTableProvider.builder().withTable(table).build(); unboundedPcollection.apply(SqlTransform.query("JOIN query").withTableProvider("SlowChangingCacheTableName", provider).build()); {code} Can the existing Table Providers like 'KafkaTableProvider', 'TextTableProvider' be used in a programmatic way(using SqlTransform.query().withTableProvider())? was (Author: rahul8383): [~amaliujia] Is this the expected flow: {code:java} PCollection unboundedPcollection = ...; org.apache.beam.sdk.extensions.sql.meta.Table table = Table.builder(). ... .build();//Schema, Properties ... are given org.apache.beam.sdk.extensions.sql.meta.provider.SlowChangingCacheTableProvider provider = SlowChangingCacheTableProvider.builder().withTable(table).build(); unboundedPcollection.apply(SqlTransform.query("JOIN query").withTableProvider("SlowChangingCacheTableName", provider).build()); {code} Can the existing Table Providers like 'KafkaTableProvider', 'TextTableProvider' be used in a programmatic way(using SqlTransform.query().withTableProvider())? > Table returns PCollectionView in BeamSQL > > > Key: BEAM-7758 > URL: https://issues.apache.org/jira/browse/BEAM-7758 > Project: Beam > Issue Type: New Feature > Components: dsl-sql >Reporter: Rui Wang >Assignee: Rahul Patwari >Priority: Major > > We might be able to define a table with properties that says this table > return a PCollectionView. By doing so we will have a trigger based > PCollectionView available in SQL rel nodes. 
> Relevant thread: > https://lists.apache.org/thread.html/602121ee49886590ce4975f66aa0d270b4a5d64575337fca8bef1232@%3Cdev.beam.apache.org%3E -- This message was sent by Atlassian JIRA (v7.6.14#76016)
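For readers following the thread, the programmatic flow being asked about can be sketched as below. This is a sketch under assumptions: `SlowChangingTableProvider` here stands in for the hypothetical provider discussed in this thread (it is not an existing Beam class), and the table name `slow_table` and join columns are invented for illustration. `SqlTransform.query(...).withTableProvider(name, provider)` and the implicit `PCOLLECTION` table name for the applied PCollection are real Beam SQL API.

```java
import org.apache.beam.sdk.extensions.sql.SqlTransform;
import org.apache.beam.sdk.extensions.sql.meta.provider.TableProvider;
import org.apache.beam.sdk.values.PCollection;
import org.apache.beam.sdk.values.Row;

public class SlowChangingJoinSketch {
  // Join an unbounded PCollection against a table exposed by a custom
  // TableProvider. The provider is registered under a name; tables it
  // exposes become visible to the SQL planner alongside PCOLLECTION.
  public static PCollection<Row> applyJoin(
      PCollection<Row> unbounded, TableProvider slowChangingProvider) {
    return unbounded.apply(
        SqlTransform.query(
                "SELECT u.*, s.v FROM PCOLLECTION u JOIN slow_table s ON u.k = s.k")
            .withTableProvider("slowcache", slowChangingProvider));
  }
}
```

Note that `withTableProvider` returns a new `SqlTransform`; no terminal `build()` call is needed.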
[jira] [Comment Edited] (BEAM-7758) Table returns PCollectionView in BeamSQL
[ https://issues.apache.org/jira/browse/BEAM-7758?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16889468#comment-16889468 ] Rahul Patwari edited comment on BEAM-7758 at 7/20/19 4:16 PM: --
[~amaliujia] Is this the expected flow:
{code:java}
// Schema, properties, etc. are given
PCollection unboundedPcollection = ...;
org.apache.beam.sdk.extensions.sql.meta.Table table = Table.builder(). ... .build();
org.apache.beam.sdk.extensions.sql.meta.provider.SlowChangingCacheTableProvider provider =
    SlowChangingCacheTableProvider.builder().withTable(table).build();
unboundedPcollection.apply(
    SqlTransform.query("JOIN query")
        .withTableProvider("SlowChangingCacheTableName", provider)
        .build());
{code}
Can the existing TableProviders, such as 'KafkaTableProvider' and 'TextTableProvider', be used programmatically (via SqlTransform.query().withTableProvider())?

was (Author: rahul8383):
[~amaliujia] When should the Table for the PCollectionView get created? What changes are needed in BeamSqlTable? Can the existing TableProviders, like "KafkaTableProvider" or "TextTableProvider", be used from a Java program [https://github.com/apache/beam/blob/master/sdks/java/extensions/sql/src/main/java/org/apache/beam/sdk/extensions/sql/SqlTransform.java#L186]?
Is this the expected flow:
{{PCollection unboundedPcollection = ...;}}
{{slowChangingTableProvider.createTable(table);}}
{{unboundedPcollection.apply(SqlTransform.query("JOIN query").withTableProvider("SlowChangingTableName", slowChangingTableProvider).build());}}
[jira] [Work logged] (BEAM-6611) A Python Sink for BigQuery with File Loads in Streaming
[ https://issues.apache.org/jira/browse/BEAM-6611?focusedWorklogId=280094=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-280094 ] ASF GitHub Bot logged work on BEAM-6611:
Author: ASF GitHub Bot
Created on: 20/Jul/19 14:40
Start Date: 20/Jul/19 14:40
Worklog Time Spent: 10m
Work Description: ttanay commented on issue #8871: [BEAM-6611] BigQuery file loads in Streaming for Python SDK URL: https://github.com/apache/beam/pull/8871#issuecomment-513473121
Run Python 2 PostCommit
Issue Time Tracking ---
Worklog Id: (was: 280094)
Time Spent: 5h 50m (was: 5h 40m)

> A Python Sink for BigQuery with File Loads in Streaming
>
> Key: BEAM-6611
> URL: https://issues.apache.org/jira/browse/BEAM-6611
> Project: Beam
> Issue Type: Improvement
> Components: sdk-py-core
> Reporter: Pablo Estrada
> Assignee: Tanay Tummalapalli
> Priority: Major
> Labels: gsoc, gsoc2019, mentor
> Time Spent: 5h 50m
> Remaining Estimate: 0h
>
> The Java SDK supports a bunch of methods for writing data into BigQuery, while the Python SDK supports the following:
> - Streaming inserts for streaming pipelines [As seen in [bigquery.py and BigQueryWriteFn|https://github.com/apache/beam/blob/master/sdks/python/apache_beam/io/gcp/bigquery.py#L649-L813]]
> - File loads for batch pipelines [As implemented in [PR 7655|https://github.com/apache/beam/pull/7655]]
> Quick and dirty early design doc: https://s.apache.org/beam-bqfl-py-streaming
> The Java SDK also supports File Loads for Streaming pipelines [see BatchLoads application|https://github.com/apache/beam/blob/master/sdks/java/io/google-cloud-platform/src/main/java/org/apache/beam/sdk/io/gcp/bigquery/BigQueryIO.java#L1709-L1776].
> File loads have the advantage of being much cheaper than streaming inserts (although records also take longer to show up in the table).
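For reference, the Java-SDK behavior this ticket wants to bring to Python can be sketched with the real `BigQueryIO.Write` API. The destination table name below is hypothetical; `Method.FILE_LOADS` on an unbounded input requires a triggering frequency, which controls how often accumulated files are loaded into BigQuery.

```java
import com.google.api.services.bigquery.model.TableRow;
import org.apache.beam.sdk.io.gcp.bigquery.BigQueryIO;
import org.apache.beam.sdk.values.PCollection;
import org.joda.time.Duration;

public class StreamingFileLoadsSketch {
  public static void write(PCollection<TableRow> rows) {
    rows.apply(
        BigQueryIO.writeTableRows()
            .to("my-project:my_dataset.my_table") // hypothetical destination
            // FILE_LOADS on an unbounded PCollection: rows are staged as
            // files and a load job runs on each triggering-frequency tick.
            .withMethod(BigQueryIO.Write.Method.FILE_LOADS)
            .withTriggeringFrequency(Duration.standardMinutes(10))
            .withNumFileShards(100)
            .withCreateDisposition(BigQueryIO.Write.CreateDisposition.CREATE_IF_NEEDED)
            .withWriteDisposition(BigQueryIO.Write.WriteDisposition.WRITE_APPEND));
  }
}
```

This is the trade-off the quoted description names: load jobs are much cheaper than streaming inserts, but rows only become visible once a load job completes.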
[jira] [Commented] (BEAM-7758) Table returns PCollectionView in BeamSQL
[ https://issues.apache.org/jira/browse/BEAM-7758?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16889468#comment-16889468 ] Rahul Patwari commented on BEAM-7758: -
[~amaliujia] When should the Table for the PCollectionView get created? What changes are needed in BeamSqlTable? Can the existing TableProviders, like "KafkaTableProvider" or "TextTableProvider", be used from a Java program [https://github.com/apache/beam/blob/master/sdks/java/extensions/sql/src/main/java/org/apache/beam/sdk/extensions/sql/SqlTransform.java#L186]?
Is this the expected flow:
{{PCollection unboundedPcollection = ...;}}
{{slowChangingTableProvider.createTable(table);}}
{{unboundedPcollection.apply(SqlTransform.query("JOIN query").withTableProvider("SlowChangingTableName", slowChangingTableProvider).build());}}
[jira] [Assigned] (BEAM-6857) Support dynamic timers
[ https://issues.apache.org/jira/browse/BEAM-6857?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Shehzaad Nakhoda reassigned BEAM-6857: -- Assignee: Shehzaad Nakhoda

> Support dynamic timers
>
> Key: BEAM-6857
> URL: https://issues.apache.org/jira/browse/BEAM-6857
> Project: Beam
> Issue Type: Bug
> Components: sdk-java-core
> Reporter: Reuven Lax
> Assignee: Shehzaad Nakhoda
> Priority: Major
>
> The Beam timers API currently requires each timer to be statically specified in the DoFn. The user must provide a separate callback method per timer. For example:
> {code:java}
> DoFn() {
>   @TimerId("timer1") private final TimerSpec timer1 = TimerSpecs.timer(...);
>   @TimerId("timer2") private final TimerSpec timer2 = TimerSpecs.timer(...);
>   // ... set timers in processElement
>   @OnTimer("timer1") public void onTimer1() { ... }
>   @OnTimer("timer2") public void onTimer2() { ... }
> }
> {code}
> However there are many cases where the user does not know the set of timers statically when writing their code. This happens when the timer tag should be based on the data. It also happens when writing a DSL on top of Beam, where the DSL author has to create DoFns but does not know statically which timers their users will want to set (e.g. Scio).
> The goal is to support dynamic timers, something like the following:
> {code:java}
> DoFn() {
>   @TimerId("timer") private final TimerSpec timer = TimerSpecs.dynamicTimer(...);
>   @ProcessElement public void process(@TimerId("timer") DynamicTimer timer) {
>     timer.set("tag1", ts);
>     timer.set("tag2", ts);
>   }
>   @OnTimer("timer") public void onTimer1(@TimerTag String tag) { ... }
> }
> {code}
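Until dynamic timers exist, one possible workaround (a sketch, not an API from this ticket) is to multiplex all logical "tagged" timers onto the single timer that can be declared statically, keeping per-tag deadlines in `MapState` and always arming the physical timer for the earliest pending deadline. `MapState`, `TimerSpecs.timer`, and `Timer.set` are real Beam state/timer API; the 5-minute delay and the output format are illustrative choices.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import org.apache.beam.sdk.state.MapState;
import org.apache.beam.sdk.state.StateSpec;
import org.apache.beam.sdk.state.StateSpecs;
import org.apache.beam.sdk.state.TimeDomain;
import org.apache.beam.sdk.state.Timer;
import org.apache.beam.sdk.state.TimerSpec;
import org.apache.beam.sdk.state.TimerSpecs;
import org.apache.beam.sdk.transforms.DoFn;
import org.apache.beam.sdk.values.KV;
import org.joda.time.Duration;
import org.joda.time.Instant;

// Emulate data-driven timer tags with one statically declared timer.
class MultiplexedTimerFn extends DoFn<KV<String, String>, String> {

  @StateId("deadlines")
  private final StateSpec<MapState<String, Instant>> deadlinesSpec = StateSpecs.map();

  @TimerId("mux")
  private final TimerSpec muxSpec = TimerSpecs.timer(TimeDomain.EVENT_TIME);

  @ProcessElement
  public void process(
      @Element KV<String, String> element,
      @Timestamp Instant ts,
      @StateId("deadlines") MapState<String, Instant> deadlines,
      @TimerId("mux") Timer mux) {
    // The tag comes from the data -- the part static @TimerId cannot express.
    deadlines.put(element.getValue(), ts.plus(Duration.standardMinutes(5)));
    rearm(deadlines, mux);
  }

  @OnTimer("mux")
  public void onMux(
      OnTimerContext ctx,
      @StateId("deadlines") MapState<String, Instant> deadlines,
      @TimerId("mux") Timer mux) {
    Instant now = ctx.timestamp();
    // Fire and clear every tag whose deadline has arrived.
    List<String> expired = new ArrayList<>();
    for (Map.Entry<String, Instant> e : deadlines.entries().read()) {
      if (!e.getValue().isAfter(now)) {
        expired.add(e.getKey());
      }
    }
    for (String tag : expired) {
      ctx.output("fired: " + tag);
      deadlines.remove(tag);
    }
    rearm(deadlines, mux); // re-arm for the earliest remaining tag, if any
  }

  // Keep the single physical timer set to the minimum pending deadline.
  private static void rearm(MapState<String, Instant> deadlines, Timer mux) {
    Instant min = null;
    for (Map.Entry<String, Instant> e : deadlines.entries().read()) {
      if (min == null || e.getValue().isBefore(min)) {
        min = e.getValue();
      }
    }
    if (min != null) {
      mux.set(min);
    }
  }
}
```

The cost of this pattern is a full scan of the deadline map on every firing, which the proposed dynamic-timer API would avoid.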
[jira] [Assigned] (BEAM-7758) Table returns PCollectionView in BeamSQL
[ https://issues.apache.org/jira/browse/BEAM-7758?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Rahul Patwari reassigned BEAM-7758: --- Assignee: Rahul Patwari
[jira] [Commented] (BEAM-7758) Table returns PCollectionView in BeamSQL
[ https://issues.apache.org/jira/browse/BEAM-7758?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16889383#comment-16889383 ] Rui Wang commented on BEAM-7758: -
I think this JIRA could start from thinking about two changes:
1. How to change [BeamSqlTable|https://github.com/apache/beam/blob/master/sdks/java/extensions/sql/src/main/java/org/apache/beam/sdk/extensions/sql/BeamSqlTable.java]. Note that every table in SQL implements this interface.
2. How to propagate the PCollectionView through the SQL query plan, especially starting from [BeamIOSourceRel|https://github.com/apache/beam/blob/master/sdks/java/extensions/sql/src/main/java/org/apache/beam/sdk/extensions/sql/impl/rel/BeamIOSourceRel.java].
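Purely as a hypothetical sketch of what the first change could look like (the interface name `BeamSqlViewTable`, the method `buildIOView`, and its return type are invented for illustration, not Beam API): a table could expose, alongside the usual reader that yields a `PCollection<Row>`, an entry point that yields a trigger-refreshed `PCollectionView`, which `BeamIOSourceRel` would then hand to downstream rel nodes as a side input.

```java
import java.util.List;
import org.apache.beam.sdk.schemas.Schema;
import org.apache.beam.sdk.values.PBegin;
import org.apache.beam.sdk.values.PCollection;
import org.apache.beam.sdk.values.PCollectionView;
import org.apache.beam.sdk.values.Row;

// Hypothetical extension of the BeamSqlTable idea, for discussion only.
interface BeamSqlViewTable {
  Schema getSchema();

  // Existing-style entry point: read the table as a PCollection of rows.
  PCollection<Row> buildIOReader(PBegin begin);

  // Invented entry point: read the table as a slowly changing side input,
  // re-materialized on each trigger firing of the backing source.
  PCollectionView<List<Row>> buildIOView(PBegin begin);
}
```

Whether the view-producing path belongs on the table interface itself or on a table property consumed by `BeamIOSourceRel` is exactly the design question this comment raises.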