[jira] [Comment Edited] (BEAM-7758) Table returns PCollectionView in BeamSQL

2019-07-20 Thread Rui Wang (JIRA)


[ 
https://issues.apache.org/jira/browse/BEAM-7758?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16889642#comment-16889642
 ] 

Rui Wang edited comment on BEAM-7758 at 7/21/19 5:20 AM:
-

1. I would suggest you start from 
[here|https://github.com/apache/beam/blob/master/sdks/java/extensions/sql/src/main/java/org/apache/beam/sdk/extensions/sql/meta/provider/bigquery/BigQueryTableProvider.java#L54]
 and trace down from there to understand BeamSQL's current table providers. In prod we don't 
need an addTable implementation for table providers.

2. Windowing properties for the unbounded table are supposed to be addressed by 
https://s.apache.org/streaming-beam-sql, the proposal for specifying windowing 
properties in pure SQL. Therefore you don't need to worry about windowing 
properties now.

3. I actually think you can explore whether 
"unbounded_PCollection.apply(SqlTransform.query().withSlowChangingTableProvider())" 
can work. However, I don't like that syntax. The more natural API would be 
"pipeline.apply(Join.of(unbounded_PC, slow_changing_PC).on(join_condition))". 
However, that is out of scope for now.
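For concreteness, the first option in point 3 could be sketched on top of SqlTransform's existing withTableProvider hook. This is an illustrative fragment, not verified code: the query text, the "slowcache"/"slow_table" names, and the slowChangingTableProvider instance are all assumed placeholders; only SqlTransform.query(...) and withTableProvider(...) are existing Beam SQL entry points.

{code:java}
// Sketch only: slowChangingTableProvider is a hypothetical TableProvider
// instance registered under the name "slowcache"; "PCOLLECTION" refers to the
// PCollection the transform is applied to.
PCollection<Row> joined =
    unboundedPCollection.apply(
        SqlTransform.query(
                "SELECT p.k, t.v FROM PCOLLECTION p JOIN slow_table t ON p.k = t.k")
            .withTableProvider("slowcache", slowChangingTableProvider));
{code}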



> Table returns PCollectionView in BeamSQL
> 
>
> Key: BEAM-7758
> URL: https://issues.apache.org/jira/browse/BEAM-7758
> Project: Beam
>  Issue Type: New Feature
>  Components: dsl-sql
>Reporter: Rui Wang
>Assignee: Rahul Patwari
>Priority: Major
>
> We might be able to define a table with properties that say this table 
> returns a PCollectionView. By doing so we will have a trigger-based 
> PCollectionView available in SQL rel nodes. 
> Relevant thread: 
> https://lists.apache.org/thread.html/602121ee49886590ce4975f66aa0d270b4a5d64575337fca8bef1232@%3Cdev.beam.apache.org%3E



--
This message was sent by Atlassian JIRA
(v7.6.14#76016)



[jira] [Commented] (BEAM-7758) Table returns PCollectionView in BeamSQL

2019-07-20 Thread Rahul Patwari (JIRA)


[ 
https://issues.apache.org/jira/browse/BEAM-7758?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16889634#comment-16889634
 ] 

Rahul Patwari commented on BEAM-7758:
-

"_All existing table provider can be used in this programmatic way. 
SqlTransform is not limited to "SELECT * FROM PCOLLECTION" but can do "SELECT * 
FROM TABLE_A", if you have a TABLE_A in table provider._"

 - But none of the existing Table Providers expose any methods to add a Table 
to the TableProvider. To do so, the TableProvider has to be extended like this: 
 
[https://github.com/apache/beam/blob/c2f0d282337f3ae0196a7717712396a5a41fdde1/sdks/java/extensions/sql/src/test/java/org/apache/beam/sdk/extensions/sql/meta/provider/bigquery/BigQueryTestTableProvider.java#L41]

"_You don't need to apply sql query to unbounded PCollection. Your unbounded 
Pcollection are created within SqlTransform from a Table which is gotten from 
table provider._"

 - Do you mean to say that SlowChangingCacheTableProvider should be constructed 
with two tables (one for the unbounded PCollection, one for the slowChangingCacheTable 
PCollectionView) and perform a join query? If so, we have to take the window 
properties of the unbounded table into account to perform the join.
{code:java}
Pipeline pipeline = Pipeline.create(options);

SlowChangingCacheTableProvider provider =
    SlowChangingCacheTableProvider.builder()
        .withUnboundedTable(unboundedTable)
        .withCacheTable(cacheTable)
        .build();

SqlTransform query =
    SqlTransform.query("join_query")
        .withDefaultTableProvider("slowChangingCache", provider);

PCollection<Row> queryResult = pipeline.apply("Run SQL Query", query);
{code}
 

Doesn't this limit joining [the slowly changing cache table] only with [an unbounded 
PCollection created from a Table]? How do we join [the slowly changing cache table] 
with [an unbounded PCollection not created from a Table (SqlTransform applied to an 
unbounded PCollection)]?
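As a reference point, the test-only pattern linked above has roughly the following shape: a provider subclass keeping its own map of tables and exposing an addTable-style method that the stock providers lack. This is a paraphrased sketch of the linked BigQueryTestTableProvider, not a drop-in implementation; the class name, the chosen base class, and the exact override set are assumptions to be checked against the actual Beam source.

{code:java}
// Hypothetical sketch modeled on BigQueryTestTableProvider (see link above).
class SlowChangingCacheTableProvider extends InMemoryMetaTableProvider {
  private final Map<String, Table> tables = new HashMap<>();

  // The addTable-style hook that the stock providers do not expose.
  void addTable(String name, Table table) {
    tables.put(name, table);
  }

  @Override
  public String getTableType() {
    return "slowChangingCache";
  }

  @Override
  public BeamSqlTable buildBeamSqlTable(Table table) {
    // Resolve the registered Table into a concrete BeamSqlTable here.
    throw new UnsupportedOperationException("table resolution not sketched");
  }
}
{code}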



[jira] [Comment Edited] (BEAM-7758) Table returns PCollectionView in BeamSQL

2019-07-20 Thread Rui Wang (JIRA)


[ 
https://issues.apache.org/jira/browse/BEAM-7758?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16889595#comment-16889595
 ] 

Rui Wang edited comment on BEAM-7758 at 7/20/19 11:44 PM:
--

Here is an example of a pubsub unbounded PCollection that is created from a 
Table (rather than applying a SqlTransform to an unbounded PC): 
https://github.com/apache/beam/blob/master/sdks/java/extensions/sql/src/main/java/org/apache/beam/sdk/extensions/sql/meta/provider/pubsub/PubsubIOJsonTable.java#L145




[jira] [Comment Edited] (BEAM-7758) Table returns PCollectionView in BeamSQL

2019-07-20 Thread Rui Wang (JIRA)


[ 
https://issues.apache.org/jira/browse/BEAM-7758?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16889590#comment-16889590
 ] 

Rui Wang edited comment on BEAM-7758 at 7/20/19 9:22 PM:
-

Yes. I think your code shows the expected workflow, with one modification:


{code:java}
Pipeline pipeline = Pipeline.create(options);

SqlTransform query =
    SqlTransform.query("join_query")
        .withDefaultTableProvider(
            "your_table_provider_name", /* create your table provider here */ tableProvider);

PCollection<Row> queryResult = pipeline.apply("Run SQL Query", query);
{code}



You don't need to apply the SQL query to an unbounded PCollection. Your unbounded 
PCollection is created within SqlTransform from a Table obtained from the 
table provider.

All existing table providers can be used in this programmatic way. SqlTransform 
is not limited to "SELECT * FROM PCOLLECTION"; it can also run "SELECT * FROM 
TABLE_A" if you have a TABLE_A in the table provider.






[jira] [Commented] (BEAM-6783) byte[] breaks in BeamSQL codegen

2019-07-20 Thread Rui Wang (JIRA)


[ 
https://issues.apache.org/jira/browse/BEAM-6783?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16889589#comment-16889589
 ] 

Rui Wang commented on BEAM-6783:


[~sridharG]

Please let me know if you need any help.

> byte[] breaks in BeamSQL codegen
> 
>
> Key: BEAM-6783
> URL: https://issues.apache.org/jira/browse/BEAM-6783
> Project: Beam
>  Issue Type: Bug
>  Components: dsl-sql
>Reporter: Rui Wang
>Assignee: sridhar Reddy
>Priority: Major
>
> Calcite will call `byte[].toString()` because the BeamSQL codegen reads a byte[] 
> out of the Row and hands it to Calcite (see: 
> https://github.com/apache/beam/blob/master/sdks/java/extensions/sql/src/main/java/org/apache/beam/sdk/extensions/sql/impl/rel/BeamCalcRel.java#L334).
>  
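The failure mode in the quoted description can be reproduced with plain JDK code: Java arrays inherit Object.toString(), so byte[].toString() yields a type-and-hash identity string like "[B@1b6d3586" instead of the contents, which is why codegen that implicitly stringifies a byte[] column produces garbage. This self-contained illustration is not Beam's actual codegen, just the underlying JDK behavior:

{code:java}
import java.util.Arrays;

public class ByteArrayToStringDemo {
    public static void main(String[] args) {
        byte[] payload = {72, 105};  // "Hi" in ASCII

        // Default toString() is Object.toString(): "[B@<hashcode>", not the contents.
        String broken = payload.toString();
        System.out.println(broken.startsWith("[B@"));  // true

        // Rendering the contents requires an explicit conversion.
        System.out.println(Arrays.toString(payload));  // [72, 105]
        System.out.println(new String(payload));       // Hi
    }
}
{code}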





[jira] [Updated] (BEAM-7790) Make debugging subprocess workers easier

2019-07-20 Thread Kyle Weaver (JIRA)


 [ 
https://issues.apache.org/jira/browse/BEAM-7790?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Kyle Weaver updated BEAM-7790:
--
Description: 
1. The output of the SDK workers is currently invisible due to the output and 
logging setup.

2. The dockerized version of the Python SDK worker sets up an HTTP server to 
let the user view stack traces for all of the worker's threads [1]. It would be 
useful if this was available for other execution modes as well.

3. Make the above items more usable with multiple subprocesses by identifying 
them with worker ids.

 

[1] 
[https://github.com/apache/beam/blob/9f4ce1c6fc2fb195e218783a6e9ce6104ddb4d1e/sdks/python/apache_beam/runners/worker/sdk_worker_main.py#L46-L89]
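For item 2, the dockerized worker's status page is essentially a small HTTP endpoint that dumps every thread's stack. A minimal stdlib-only sketch of the same idea follows; the function names, handler, and ephemeral-port choice are illustrative, not the actual sdk_worker_main.py code:

{code:python}
import sys
import threading
import traceback
from http.server import BaseHTTPRequestHandler, HTTPServer


def dump_all_threads():
    """Render a stack trace for every live thread, keyed by thread name."""
    names = {t.ident: t.name for t in threading.enumerate()}
    lines = []
    for ident, frame in sys._current_frames().items():
        lines.append("--- Thread: %s (%s) ---" % (names.get(ident, "unknown"), ident))
        lines.extend(l.rstrip() for l in traceback.format_stack(frame))
    return "\n".join(lines)


class StatusHandler(BaseHTTPRequestHandler):
    def do_GET(self):
        body = dump_all_threads().encode("utf-8")
        self.send_response(200)
        self.send_header("Content-Type", "text/plain; charset=utf-8")
        self.end_headers()
        self.wfile.write(body)


def start_status_server(port=0):
    """Serve thread dumps from a daemon thread; returns the bound port."""
    server = HTTPServer(("localhost", port), StatusHandler)
    threading.Thread(target=server.serve_forever, daemon=True).start()
    return server.server_port
{code}

Exposing the same endpoint outside the Docker path (and tagging output with worker ids, per item 3) would cover the other execution modes.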


> Make debugging subprocess workers easier
> 
>
> Key: BEAM-7790
> URL: https://issues.apache.org/jira/browse/BEAM-7790
> Project: Beam
>  Issue Type: Improvement
>  Components: sdk-py-harness
>Reporter: Kyle Weaver
>Assignee: Kyle Weaver
>Priority: Minor
>
> 1. The output of the SDK workers is currently invisible due to the output and 
> logging setup.
> 2. The dockerized version of the Python SDK worker sets up an HTTP server to 
> let the user view stack traces for all of the worker's threads [1]. It would 
> be useful if this was available for other execution modes as well.
> 3. Make the above items more usable with multiple subprocesses by identifying 
> them with worker ids.
>  
> [1] 
> [https://github.com/apache/beam/blob/9f4ce1c6fc2fb195e218783a6e9ce6104ddb4d1e/sdks/python/apache_beam/runners/worker/sdk_worker_main.py#L46-L89]





[jira] [Work logged] (BEAM-6611) A Python Sink for BigQuery with File Loads in Streaming

2019-07-20 Thread ASF GitHub Bot (JIRA)


 [ 
https://issues.apache.org/jira/browse/BEAM-6611?focusedWorklogId=280132&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-280132
 ]

ASF GitHub Bot logged work on BEAM-6611:


Author: ASF GitHub Bot
Created on: 20/Jul/19 21:01
Start Date: 20/Jul/19 21:01
Worklog Time Spent: 10m 
  Work Description: ttanay commented on issue #8871: [BEAM-6611] BigQuery 
file loads in Streaming for Python SDK
URL: https://github.com/apache/beam/pull/8871#issuecomment-513498980
 
 
   Run Python 2 PostCommit
 

This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 280132)
Time Spent: 7h 20m  (was: 7h 10m)

> A Python Sink for BigQuery with File Loads in Streaming
> ---
>
> Key: BEAM-6611
> URL: https://issues.apache.org/jira/browse/BEAM-6611
> Project: Beam
>  Issue Type: Improvement
>  Components: sdk-py-core
>Reporter: Pablo Estrada
>Assignee: Tanay Tummalapalli
>Priority: Major
>  Labels: gsoc, gsoc2019, mentor
>  Time Spent: 7h 20m
>  Remaining Estimate: 0h
>
> The Java SDK supports a bunch of methods for writing data into BigQuery, 
> while the Python SDK supports the following:
> - Streaming inserts for streaming pipelines [As seen in [bigquery.py and 
> BigQueryWriteFn|https://github.com/apache/beam/blob/master/sdks/python/apache_beam/io/gcp/bigquery.py#L649-L813]]
> - File loads for batch pipelines [As implemented in [PR 
> 7655|https://github.com/apache/beam/pull/7655]]
> Quick-and-dirty early design doc: https://s.apache.org/beam-bqfl-py-streaming
> The Java SDK also supports File Loads for Streaming pipelines [see BatchLoads 
> application|https://github.com/apache/beam/blob/master/sdks/java/io/google-cloud-platform/src/main/java/org/apache/beam/sdk/io/gcp/bigquery/BigQueryIO.java#L1709-L1776].
> File loads have the advantage of being much cheaper than streaming inserts 
> (although they also are slower for the records to show up in the table).



--
This message was sent by Atlassian JIRA
(v7.6.14#76016)


[jira] [Created] (BEAM-7790) Make debugging subprocess workers easier

2019-07-20 Thread Kyle Weaver (JIRA)
Kyle Weaver created BEAM-7790:
-

 Summary: Make debugging subprocess workers easier
 Key: BEAM-7790
 URL: https://issues.apache.org/jira/browse/BEAM-7790
 Project: Beam
  Issue Type: Improvement
  Components: sdk-py-harness
Reporter: Kyle Weaver
Assignee: Kyle Weaver


1. The output of the SDK workers is currently invisible [1].

2. The dockerized version of the Python SDK worker sets up an HTTP server to 
let the user view stack traces for all of the worker's threads [2]. It would be 
useful if this was available for other execution modes as well.

3. Make the above items more usable with multiple subprocesses by identifying 
them with worker ids. 
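The HTTP endpoint mentioned in (2) essentially serves a stack dump of all live threads. A minimal sketch of such a dump in plain Python (the function name is illustrative, not Beam's actual implementation):

```python
import sys
import threading
import traceback

def dump_thread_stacks():
    """Return a text dump of the current stack of every live thread,
    similar to what a worker status HTTP endpoint could serve."""
    names = {t.ident: t.name for t in threading.enumerate()}
    chunks = []
    for thread_id, frame in sys._current_frames().items():
        chunks.append("--- Thread %s (id %s) ---\n"
                      % (names.get(thread_id, "unknown"), thread_id))
        chunks.extend(traceback.format_stack(frame))
    return "".join(chunks)

print(dump_thread_stacks())
```

With multiple subprocesses, prefixing each dump with a worker id (item 3) would make the combined output attributable.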





[jira] [Work logged] (BEAM-7747) ERROR: test_sink_transform (apache_beam.io.avroio_test.TestFastAvro) Fails on Windows

2019-07-20 Thread ASF GitHub Bot (JIRA)


 [ 
https://issues.apache.org/jira/browse/BEAM-7747?focusedWorklogId=280129&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-280129
 ]

ASF GitHub Bot logged work on BEAM-7747:


Author: ASF GitHub Bot
Created on: 20/Jul/19 20:37
Start Date: 20/Jul/19 20:37
Worklog Time Spent: 10m 
  Work Description: tvalentyn commented on issue #9111: [BEAM-7747] Close 
the file handle owned by fastavro.write.Writer in _FastAvroSink.close().
URL: https://github.com/apache/beam/pull/9111#issuecomment-513497441
 
 
   Run Python 3.5 PostCommit
 



Issue Time Tracking
---

Worklog Id: (was: 280129)
Time Spent: 1h 20m  (was: 1h 10m)

> ERROR: test_sink_transform (apache_beam.io.avroio_test.TestFastAvro) Fails on 
> Windows
> -
>
> Key: BEAM-7747
> URL: https://issues.apache.org/jira/browse/BEAM-7747
> Project: Beam
>  Issue Type: Bug
>  Components: io-python-avro, test-failures
>Reporter: Valentyn Tymofieiev
>Priority: Major
>  Time Spent: 1h 20m
>  Remaining Estimate: 0h
>
> ==
> ERROR: test_sink_transform (apache_beam.io.avroio_test.TestFastAvro)
> --
> Traceback (most recent call last):
>   File "C:\projects\beam\sdks\python\apache_beam\io\avroio_test.py", line 
> 436, in test_sink_transform
> | avroio.WriteToAvro(path, self.SCHEMA, use_fastavro=self.use_fastavro)
>   File "C:\projects\beam\sdks\python\apache_beam\pipeline.py", line 426, in 
> __exit__
> self.run().wait_until_finish()
>   File "C:\projects\beam\sdks\python\apache_beam\testing\test_pipeline.py", 
> line 107, in run
> else test_runner_api))
>   File "C:\projects\beam\sdks\python\apache_beam\pipeline.py", line 406, in 
> run
> self._options).run(False)
>   File "C:\projects\beam\sdks\python\apache_beam\pipeline.py", line 419, in 
> run
> return self.runner.run_pipeline(self, self._options)
>   File 
> "C:\projects\beam\sdks\python\apache_beam\runners\direct\direct_runner.py", 
> line 128, in run_pipeline
> return runner.run_pipeline(pipeline, options)
>   File 
> "C:\projects\beam\sdks\python\apache_beam\runners\portability\fn_api_runner.py",
>  line 319, in run_pipeline
> default_environment=self._default_environment))
>   File 
> "C:\projects\beam\sdks\python\apache_beam\runners\portability\fn_api_runner.py",
>  line 326, in run_via_runner_api
> return self.run_stages(stage_context, stages)
>   File 
> "C:\projects\beam\sdks\python\apache_beam\runners\portability\fn_api_runner.py",
>  line 408, in run_stages
> stage_context.safe_coders)
>   File 
> "C:\projects\beam\sdks\python\apache_beam\runners\portability\fn_api_runner.py",
>  line 681, in _run_stage
> result, splits = bundle_manager.process_bundle(data_input, data_output)
>   File 
> "C:\projects\beam\sdks\python\apache_beam\runners\portability\fn_api_runner.py",
>  line 1562, in process_bundle
> part_inputs):
>   File "C:\venv\newenv1\lib\site-packages\concurrent\futures\_base.py", line 
> 641, in result_iterator
> yield fs.pop().result()
>   File "C:\venv\newenv1\lib\site-packages\concurrent\futures\_base.py", line 
> 462, in result
> return self.__get_result()
>   File "C:\venv\newenv1\lib\site-packages\concurrent\futures\thread.py", line 
> 63, in run
> result = self.fn(*self.args, **self.kwargs)
>   File 
> "C:\projects\beam\sdks\python\apache_beam\runners\portability\fn_api_runner.py",
>  line 1561, in 
> self._registered).process_bundle(part, expected_outputs),
>   File 
> "C:\projects\beam\sdks\python\apache_beam\runners\portability\fn_api_runner.py",
>  line 1500, in process_bundle
> result_future = self._controller.control_handler.push(process_bundle_req)
>   File 
> "C:\projects\beam\sdks\python\apache_beam\runners\portability\fn_api_runner.py",
>  line 1017, in push
> response = self.worker.do_instruction(request)
>   File 
> "C:\projects\beam\sdks\python\apache_beam\runners\worker\sdk_worker.py", line 
> 342, in do_instruction
> request.instruction_id)
>   File 
> "C:\projects\beam\sdks\python\apache_beam\runners\worker\sdk_worker.py", line 
> 368, in process_bundle
> bundle_processor.process_bundle(instruction_id))
>   File 
> "C:\projects\beam\sdks\python\apache_beam\runners\worker\bundle_processor.py",
>  line 593, in process_bundle
> 

[jira] [Commented] (BEAM-7742) BigQuery File Loads to work well with load job size limits

2019-07-20 Thread Tanay Tummalapalli (JIRA)


[ 
https://issues.apache.org/jira/browse/BEAM-7742?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16889583#comment-16889583
 ] 

Tanay Tummalapalli commented on BEAM-7742:
--

Yes, I'll work on this. I'm a bit late on the design doc. 
I'll work on it tomorrow.

> BigQuery File Loads to work well with load job size limits
> --
>
> Key: BEAM-7742
> URL: https://issues.apache.org/jira/browse/BEAM-7742
> Project: Beam
>  Issue Type: Improvement
>  Components: io-python-gcp
>Reporter: Pablo Estrada
>Assignee: Tanay Tummalapalli
>Priority: Major
>
> Load jobs into BigQuery have a number of limitations: 
> [https://cloud.google.com/bigquery/quotas#load_jobs]
>  
> Currently, the python BQ sink implemented in `bigquery_file_loads.py` does 
> not handle these limitations well. Improvements need to be made to the 
> implementation, to:
>  * Decide dynamically at pipeline execution time whether to use temp_tables
>  * Add code to determine when a load job to a single destination needs to be 
> partitioned into multiple jobs.
> When this happens, we definitely need to use temp_tables, in case one of 
> the load jobs fails and the pipeline is rerun.
> Tanay, would you be able to look at this?
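For the partitioning item above, one plausible shape is a greedy grouping of files into jobs that each respect the per-job limits. A sketch under assumed limits (the actual BigQuery quota values should be taken from the quotas page, not from these defaults):

```python
def partition_files(file_sizes, max_files_per_job=10000,
                    max_bytes_per_job=15 * 2**40):
    """Greedily group file sizes into load jobs, starting a new job
    whenever adding a file would exceed the per-job file-count or
    byte-size limit. Limits are illustrative defaults."""
    jobs, current, current_bytes = [], [], 0
    for size in file_sizes:
        if current and (len(current) >= max_files_per_job
                        or current_bytes + size > max_bytes_per_job):
            jobs.append(current)
            current, current_bytes = [], 0
        current.append(size)
        current_bytes += size
    if current:
        jobs.append(current)
    return jobs
```

When more than one job results for a destination, writing each job to a temp table and atomically copying into the destination avoids partial results if one job fails.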





[jira] [Commented] (BEAM-7752) Java Validates Direct Runner: testTeardownCalledAfterExceptionInFinishBundleStateful flaky?

2019-07-20 Thread Michael Luckey (JIRA)


[ 
https://issues.apache.org/jira/browse/BEAM-7752?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16889577#comment-16889577
 ] 

Michael Luckey commented on BEAM-7752:
--

[~ŁukaszG]
These tests seem to be flaky on the direct runner. We need to look into the 
implementation there to see whether the assumptions made by the original test 
author hold. IIRC, the direct runner was problematic with timings in these tests.

> Java Validates Direct Runner: 
> testTeardownCalledAfterExceptionInFinishBundleStateful flaky?
> ---
>
> Key: BEAM-7752
> URL: https://issues.apache.org/jira/browse/BEAM-7752
> Project: Beam
>  Issue Type: Bug
>  Components: test-failures
>Reporter: Lukasz Gajowy
>Priority: Major
>  Labels: sickbay
>
> See: 
> [https://builds.apache.org/job/beam_PostCommit_Java_ValidatesRunner_Direct/663/]
> another run on the same master state was successful. As I see the job's 
> history, the test fails from time to time. 
>  
>  





[jira] [Work logged] (BEAM-6611) A Python Sink for BigQuery with File Loads in Streaming

2019-07-20 Thread ASF GitHub Bot (JIRA)


 [ 
https://issues.apache.org/jira/browse/BEAM-6611?focusedWorklogId=280126&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-280126
 ]

ASF GitHub Bot logged work on BEAM-6611:


Author: ASF GitHub Bot
Created on: 20/Jul/19 19:20
Start Date: 20/Jul/19 19:20
Worklog Time Spent: 10m 
  Work Description: ttanay commented on pull request #8871: [BEAM-6611] 
BigQuery file loads in Streaming for Python SDK
URL: https://github.com/apache/beam/pull/8871#discussion_r304995214
 
 

 ##
 File path: sdks/python/apache_beam/io/gcp/bigquery_file_loads.py
 ##
 @@ -622,8 +653,9 @@ def expand(self, pcoll):
 test_client=self.test_client,
 temporary_tables=self.temp_tables,
 additional_bq_parameters=self.additional_bq_parameters),
-load_job_name_pcv, *self.schema_side_inputs).with_outputs(
-TriggerLoadJobs.TEMP_TABLES, main='main')
+load_job_name_pcv, self.is_streaming_pipeline,
 
 Review comment:
   According to the guide to generating Job IDs for BigQuery[1], the Job ID 
should be a human-readable prefix with a GUID/UUID suffix (e.g. 
`daily_import_job_1447971251`).
   If we can generate the job ID as:
   ```python
   "{}_{}".format(load_job_name_prefix, str(uuid.uuid4()))
   ```
we can do away with passing the `is_streaming_pipeline` flag altogether. 
   What are your thoughts? Will it make it harder for users to debug their jobs?
   
   [1] https://cloud.google.com/bigquery/docs/running-jobs#generate-jobid
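A minimal sketch of that ID scheme (the prefix value and helper name are illustrative):

```python
import uuid

def generate_load_job_id(prefix):
    """Build a load-job ID as a human-readable prefix plus a UUID
    suffix, so every call yields a distinct ID."""
    return "{}_{}".format(prefix, uuid.uuid4())

job_id = generate_load_job_id("daily_import_job")
```

Because the suffix is unique per call in both batch and streaming modes, no mode flag would be needed to keep IDs from colliding.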
 



Issue Time Tracking
---

Worklog Id: (was: 280126)
Time Spent: 7h 10m  (was: 7h)

> A Python Sink for BigQuery with File Loads in Streaming
> ---
>
> Key: BEAM-6611
> URL: https://issues.apache.org/jira/browse/BEAM-6611
> Project: Beam
>  Issue Type: Improvement
>  Components: sdk-py-core
>Reporter: Pablo Estrada
>Assignee: Tanay Tummalapalli
>Priority: Major
>  Labels: gsoc, gsoc2019, mentor
>  Time Spent: 7h 10m
>  Remaining Estimate: 0h
>
> The Java SDK supports a bunch of methods for writing data into BigQuery, 
> while the Python SDK supports the following:
> - Streaming inserts for streaming pipelines [As seen in [bigquery.py and 
> BigQueryWriteFn|https://github.com/apache/beam/blob/master/sdks/python/apache_beam/io/gcp/bigquery.py#L649-L813]]
> - File loads for batch pipelines [As implemented in [PR 
> 7655|https://github.com/apache/beam/pull/7655]]
> Quick-and-dirty early design doc: https://s.apache.org/beam-bqfl-py-streaming
> The Java SDK also supports File Loads for Streaming pipelines [see BatchLoads 
> application|https://github.com/apache/beam/blob/master/sdks/java/io/google-cloud-platform/src/main/java/org/apache/beam/sdk/io/gcp/bigquery/BigQueryIO.java#L1709-L1776].
> File loads have the advantage of being much cheaper than streaming inserts 
> (although they also are slower for the records to show up in the table).





[jira] [Work logged] (BEAM-6611) A Python Sink for BigQuery with File Loads in Streaming

2019-07-20 Thread ASF GitHub Bot (JIRA)


 [ 
https://issues.apache.org/jira/browse/BEAM-6611?focusedWorklogId=280123&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-280123
 ]

ASF GitHub Bot logged work on BEAM-6611:


Author: ASF GitHub Bot
Created on: 20/Jul/19 19:13
Start Date: 20/Jul/19 19:13
Worklog Time Spent: 10m 
  Work Description: ttanay commented on issue #8871: [BEAM-6611] BigQuery 
file loads in Streaming for Python SDK
URL: https://github.com/apache/beam/pull/8871#issuecomment-513492186
 
 
   Run Python 3.5 PostCommit
 



Issue Time Tracking
---

Worklog Id: (was: 280123)
Time Spent: 6h 40m  (was: 6.5h)

> A Python Sink for BigQuery with File Loads in Streaming
> ---
>
> Key: BEAM-6611
> URL: https://issues.apache.org/jira/browse/BEAM-6611
> Project: Beam
>  Issue Type: Improvement
>  Components: sdk-py-core
>Reporter: Pablo Estrada
>Assignee: Tanay Tummalapalli
>Priority: Major
>  Labels: gsoc, gsoc2019, mentor
>  Time Spent: 6h 40m
>  Remaining Estimate: 0h
>
> The Java SDK supports a bunch of methods for writing data into BigQuery, 
> while the Python SDK supports the following:
> - Streaming inserts for streaming pipelines [As seen in [bigquery.py and 
> BigQueryWriteFn|https://github.com/apache/beam/blob/master/sdks/python/apache_beam/io/gcp/bigquery.py#L649-L813]]
> - File loads for batch pipelines [As implemented in [PR 
> 7655|https://github.com/apache/beam/pull/7655]]
> Quick-and-dirty early design doc: https://s.apache.org/beam-bqfl-py-streaming
> The Java SDK also supports File Loads for Streaming pipelines [see BatchLoads 
> application|https://github.com/apache/beam/blob/master/sdks/java/io/google-cloud-platform/src/main/java/org/apache/beam/sdk/io/gcp/bigquery/BigQueryIO.java#L1709-L1776].
> File loads have the advantage of being much cheaper than streaming inserts 
> (although they also are slower for the records to show up in the table).





[jira] [Work logged] (BEAM-6611) A Python Sink for BigQuery with File Loads in Streaming

2019-07-20 Thread ASF GitHub Bot (JIRA)


 [ 
https://issues.apache.org/jira/browse/BEAM-6611?focusedWorklogId=280125&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-280125
 ]

ASF GitHub Bot logged work on BEAM-6611:


Author: ASF GitHub Bot
Created on: 20/Jul/19 19:13
Start Date: 20/Jul/19 19:13
Worklog Time Spent: 10m 
  Work Description: ttanay commented on issue #8871: [BEAM-6611] BigQuery 
file loads in Streaming for Python SDK
URL: https://github.com/apache/beam/pull/8871#issuecomment-513492200
 
 
   Run Python 3.7 PostCommit
 



Issue Time Tracking
---

Worklog Id: (was: 280125)
Time Spent: 7h  (was: 6h 50m)

> A Python Sink for BigQuery with File Loads in Streaming
> ---
>
> Key: BEAM-6611
> URL: https://issues.apache.org/jira/browse/BEAM-6611
> Project: Beam
>  Issue Type: Improvement
>  Components: sdk-py-core
>Reporter: Pablo Estrada
>Assignee: Tanay Tummalapalli
>Priority: Major
>  Labels: gsoc, gsoc2019, mentor
>  Time Spent: 7h
>  Remaining Estimate: 0h
>
> The Java SDK supports a bunch of methods for writing data into BigQuery, 
> while the Python SDK supports the following:
> - Streaming inserts for streaming pipelines [As seen in [bigquery.py and 
> BigQueryWriteFn|https://github.com/apache/beam/blob/master/sdks/python/apache_beam/io/gcp/bigquery.py#L649-L813]]
> - File loads for batch pipelines [As implemented in [PR 
> 7655|https://github.com/apache/beam/pull/7655]]
> Quick-and-dirty early design doc: https://s.apache.org/beam-bqfl-py-streaming
> The Java SDK also supports File Loads for Streaming pipelines [see BatchLoads 
> application|https://github.com/apache/beam/blob/master/sdks/java/io/google-cloud-platform/src/main/java/org/apache/beam/sdk/io/gcp/bigquery/BigQueryIO.java#L1709-L1776].
> File loads have the advantage of being much cheaper than streaming inserts 
> (although they also are slower for the records to show up in the table).





[jira] [Work logged] (BEAM-6611) A Python Sink for BigQuery with File Loads in Streaming

2019-07-20 Thread ASF GitHub Bot (JIRA)


 [ 
https://issues.apache.org/jira/browse/BEAM-6611?focusedWorklogId=280124&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-280124
 ]

ASF GitHub Bot logged work on BEAM-6611:


Author: ASF GitHub Bot
Created on: 20/Jul/19 19:13
Start Date: 20/Jul/19 19:13
Worklog Time Spent: 10m 
  Work Description: ttanay commented on issue #8871: [BEAM-6611] BigQuery 
file loads in Streaming for Python SDK
URL: https://github.com/apache/beam/pull/8871#issuecomment-513492189
 
 
   Run Python 3.6 PostCommit
 



Issue Time Tracking
---

Worklog Id: (was: 280124)
Time Spent: 6h 50m  (was: 6h 40m)

> A Python Sink for BigQuery with File Loads in Streaming
> ---
>
> Key: BEAM-6611
> URL: https://issues.apache.org/jira/browse/BEAM-6611
> Project: Beam
>  Issue Type: Improvement
>  Components: sdk-py-core
>Reporter: Pablo Estrada
>Assignee: Tanay Tummalapalli
>Priority: Major
>  Labels: gsoc, gsoc2019, mentor
>  Time Spent: 6h 50m
>  Remaining Estimate: 0h
>
> The Java SDK supports a bunch of methods for writing data into BigQuery, 
> while the Python SDK supports the following:
> - Streaming inserts for streaming pipelines [As seen in [bigquery.py and 
> BigQueryWriteFn|https://github.com/apache/beam/blob/master/sdks/python/apache_beam/io/gcp/bigquery.py#L649-L813]]
> - File loads for batch pipelines [As implemented in [PR 
> 7655|https://github.com/apache/beam/pull/7655]]
> Quick-and-dirty early design doc: https://s.apache.org/beam-bqfl-py-streaming
> The Java SDK also supports File Loads for Streaming pipelines [see BatchLoads 
> application|https://github.com/apache/beam/blob/master/sdks/java/io/google-cloud-platform/src/main/java/org/apache/beam/sdk/io/gcp/bigquery/BigQueryIO.java#L1709-L1776].
> File loads have the advantage of being much cheaper than streaming inserts 
> (although they also are slower for the records to show up in the table).





[jira] [Work logged] (BEAM-6611) A Python Sink for BigQuery with File Loads in Streaming

2019-07-20 Thread ASF GitHub Bot (JIRA)


 [ 
https://issues.apache.org/jira/browse/BEAM-6611?focusedWorklogId=280122&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-280122
 ]

ASF GitHub Bot logged work on BEAM-6611:


Author: ASF GitHub Bot
Created on: 20/Jul/19 19:12
Start Date: 20/Jul/19 19:12
Worklog Time Spent: 10m 
  Work Description: ttanay commented on issue #8871: [BEAM-6611] BigQuery 
file loads in Streaming for Python SDK
URL: https://github.com/apache/beam/pull/8871#issuecomment-513492112
 
 
   Run Python 2 PostCommit
 



Issue Time Tracking
---

Worklog Id: (was: 280122)
Time Spent: 6.5h  (was: 6h 20m)

> A Python Sink for BigQuery with File Loads in Streaming
> ---
>
> Key: BEAM-6611
> URL: https://issues.apache.org/jira/browse/BEAM-6611
> Project: Beam
>  Issue Type: Improvement
>  Components: sdk-py-core
>Reporter: Pablo Estrada
>Assignee: Tanay Tummalapalli
>Priority: Major
>  Labels: gsoc, gsoc2019, mentor
>  Time Spent: 6.5h
>  Remaining Estimate: 0h
>
> The Java SDK supports a bunch of methods for writing data into BigQuery, 
> while the Python SDK supports the following:
> - Streaming inserts for streaming pipelines [As seen in [bigquery.py and 
> BigQueryWriteFn|https://github.com/apache/beam/blob/master/sdks/python/apache_beam/io/gcp/bigquery.py#L649-L813]]
> - File loads for batch pipelines [As implemented in [PR 
> 7655|https://github.com/apache/beam/pull/7655]]
> Quick-and-dirty early design doc: https://s.apache.org/beam-bqfl-py-streaming
> The Java SDK also supports File Loads for Streaming pipelines [see BatchLoads 
> application|https://github.com/apache/beam/blob/master/sdks/java/io/google-cloud-platform/src/main/java/org/apache/beam/sdk/io/gcp/bigquery/BigQueryIO.java#L1709-L1776].
> File loads have the advantage of being much cheaper than streaming inserts 
> (although they also are slower for the records to show up in the table).





[jira] [Work logged] (BEAM-6611) A Python Sink for BigQuery with File Loads in Streaming

2019-07-20 Thread ASF GitHub Bot (JIRA)


 [ 
https://issues.apache.org/jira/browse/BEAM-6611?focusedWorklogId=280121&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-280121
 ]

ASF GitHub Bot logged work on BEAM-6611:


Author: ASF GitHub Bot
Created on: 20/Jul/19 19:12
Start Date: 20/Jul/19 19:12
Worklog Time Spent: 10m 
  Work Description: ttanay commented on issue #8871: [BEAM-6611] BigQuery 
file loads in Streaming for Python SDK
URL: https://github.com/apache/beam/pull/8871#issuecomment-513492093
 
 
   Run Python PostCommit
 



Issue Time Tracking
---

Worklog Id: (was: 280121)
Time Spent: 6h 20m  (was: 6h 10m)

> A Python Sink for BigQuery with File Loads in Streaming
> ---
>
> Key: BEAM-6611
> URL: https://issues.apache.org/jira/browse/BEAM-6611
> Project: Beam
>  Issue Type: Improvement
>  Components: sdk-py-core
>Reporter: Pablo Estrada
>Assignee: Tanay Tummalapalli
>Priority: Major
>  Labels: gsoc, gsoc2019, mentor
>  Time Spent: 6h 20m
>  Remaining Estimate: 0h
>
> The Java SDK supports a bunch of methods for writing data into BigQuery, 
> while the Python SDK supports the following:
> - Streaming inserts for streaming pipelines [As seen in [bigquery.py and 
> BigQueryWriteFn|https://github.com/apache/beam/blob/master/sdks/python/apache_beam/io/gcp/bigquery.py#L649-L813]]
> - File loads for batch pipelines [As implemented in [PR 
> 7655|https://github.com/apache/beam/pull/7655]]
> Quick-and-dirty early design doc: https://s.apache.org/beam-bqfl-py-streaming
> The Java SDK also supports File Loads for Streaming pipelines [see BatchLoads 
> application|https://github.com/apache/beam/blob/master/sdks/java/io/google-cloud-platform/src/main/java/org/apache/beam/sdk/io/gcp/bigquery/BigQueryIO.java#L1709-L1776].
> File loads have the advantage of being much cheaper than streaming inserts 
> (although they also are slower for the records to show up in the table).





[jira] [Work logged] (BEAM-6611) A Python Sink for BigQuery with File Loads in Streaming

2019-07-20 Thread ASF GitHub Bot (JIRA)


 [ 
https://issues.apache.org/jira/browse/BEAM-6611?focusedWorklogId=280113&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-280113
 ]

ASF GitHub Bot logged work on BEAM-6611:


Author: ASF GitHub Bot
Created on: 20/Jul/19 17:33
Start Date: 20/Jul/19 17:33
Worklog Time Spent: 10m 
  Work Description: ttanay commented on issue #8871: [BEAM-6611] BigQuery 
file loads in Streaming for Python SDK
URL: https://github.com/apache/beam/pull/8871#issuecomment-513485596
 
 
   I'm unable to run the IT tests for BQFL (except for 
`test_one_job_fails_all_jobs_fail`) locally (even on a clean master) and also 
on a VM, due to this error:
   ```
   ERROR: test_multiple_destinations_transform 
(apache_beam.io.gcp.bigquery_file_loads_test.BigQueryFileLoadsIT)
   --
   Traceback (most recent call last):
 File 
"/home/tanay/Coding/beam-umbrella/beam-git/beam/sdks/python/apache_beam/io/gcp/bigquery_file_loads_test.py",
 line 467, in test_multiple_destinations_transform
   max_files_per_bundle=-1))
 File 
"/home/tanay/Coding/beam-umbrella/beam-git/beam/sdks/python/apache_beam/pipeline.py",
 line 426, in __exit__
   self.run().wait_until_finish()
 File 
"/home/tanay/Coding/beam-umbrella/beam-git/beam/sdks/python/apache_beam/pipeline.py",
 line 406, in run
   self._options).run(False)
 File 
"/home/tanay/Coding/beam-umbrella/beam-git/beam/sdks/python/apache_beam/pipeline.py",
 line 419, in run
   return self.runner.run_pipeline(self, self._options)
 File 
"/home/tanay/Coding/beam-umbrella/beam-git/beam/sdks/python/apache_beam/runners/direct/test_direct_runner.py",
 line 43, in run_pipeline
   self.result = super(TestDirectRunner, self).run_pipeline(pipeline, 
options)
 File 
"/home/tanay/Coding/beam-umbrella/beam-git/beam/sdks/python/apache_beam/runners/direct/direct_runner.py",
 line 128, in run_pipeline
   return runner.run_pipeline(pipeline, options)
 File 
"/home/tanay/Coding/beam-umbrella/beam-git/beam/sdks/python/apache_beam/runners/portability/fn_api_runner.py",
 line 319, in run_pipeline
   default_environment=self._default_environment))
 File 
"/home/tanay/Coding/beam-umbrella/beam-git/beam/sdks/python/apache_beam/runners/portability/fn_api_runner.py",
 line 326, in run_via_runner_api
   return self.run_stages(stage_context, stages)
 File 
"/home/tanay/Coding/beam-umbrella/beam-git/beam/sdks/python/apache_beam/runners/portability/fn_api_runner.py",
 line 408, in run_stages
   stage_context.safe_coders)
 File 
"/home/tanay/Coding/beam-umbrella/beam-git/beam/sdks/python/apache_beam/runners/portability/fn_api_runner.py",
 line 681, in _run_stage
   result, splits = bundle_manager.process_bundle(data_input, data_output)
 File 
"/home/tanay/Coding/beam-umbrella/beam-git/beam/sdks/python/apache_beam/runners/portability/fn_api_runner.py",
 line 1562, in process_bundle
   part_inputs):
 File "/usr/local/lib/python3.7/concurrent/futures/_base.py", line 586, in 
result_iterator
   yield fs.pop().result()
 File "/usr/local/lib/python3.7/concurrent/futures/_base.py", line 432, in 
result
   return self.__get_result()
 File "/usr/local/lib/python3.7/concurrent/futures/_base.py", line 384, in 
__get_result
   raise self._exception
 File "/usr/local/lib/python3.7/concurrent/futures/thread.py", line 57, in 
run
   result = self.fn(*self.args, **self.kwargs)
 File 
"/home/tanay/Coding/beam-umbrella/beam-git/beam/sdks/python/apache_beam/runners/portability/fn_api_runner.py",
 line 1561, in 
   self._registered).process_bundle(part, expected_outputs),
 File 
"/home/tanay/Coding/beam-umbrella/beam-git/beam/sdks/python/apache_beam/runners/portability/fn_api_runner.py",
 line 1500, in process_bundle
   result_future = self._controller.control_handler.push(process_bundle_req)
 File 
"/home/tanay/Coding/beam-umbrella/beam-git/beam/sdks/python/apache_beam/runners/portability/fn_api_runner.py",
 line 1017, in push
   response = self.worker.do_instruction(request)
 File 
"/home/tanay/Coding/beam-umbrella/beam-git/beam/sdks/python/apache_beam/runners/worker/sdk_worker.py",
 line 342, in do_instruction
   request.instruction_id)
 File 
"/home/tanay/Coding/beam-umbrella/beam-git/beam/sdks/python/apache_beam/runners/worker/sdk_worker.py",
 line 368, in process_bundle
   bundle_processor.process_bundle(instruction_id))
 File 
"/home/tanay/Coding/beam-umbrella/beam-git/beam/sdks/python/apache_beam/runners/worker/bundle_processor.py",
 line 593, in process_bundle
   data.ptransform_id].process_encoded(data.data)
 File 
"/home/tanay/Coding/beam-umbrella/beam-git/beam/sdks/python/apache_beam/runners/worker/bundle_processor.py",
 line 143, in process_encoded
   self.output(decoded_value)
 File 

[jira] [Work logged] (BEAM-6611) A Python Sink for BigQuery with File Loads in Streaming

2019-07-20 Thread ASF GitHub Bot (JIRA)


 [ 
https://issues.apache.org/jira/browse/BEAM-6611?focusedWorklogId=280112&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-280112
 ]

ASF GitHub Bot logged work on BEAM-6611:


Author: ASF GitHub Bot
Created on: 20/Jul/19 17:32
Start Date: 20/Jul/19 17:32
Worklog Time Spent: 10m 
  Work Description: ttanay commented on issue #8871: [BEAM-6611] BigQuery 
file loads in Streaming for Python SDK
URL: https://github.com/apache/beam/pull/8871#issuecomment-513485596
 
 
   I'm unable to run the IT tests for BQFL locally (even on a clean master) and 
also on a VM, due to this error:
   ```
   ERROR: test_multiple_destinations_transform 
(apache_beam.io.gcp.bigquery_file_loads_test.BigQueryFileLoadsIT)
   --
   Traceback (most recent call last):
 File 
"/home/tanay/Coding/beam-umbrella/beam-git/beam/sdks/python/apache_beam/io/gcp/bigquery_file_loads_test.py",
 line 467, in test_multiple_destinations_transform
   max_files_per_bundle=-1))
 File 
"/home/tanay/Coding/beam-umbrella/beam-git/beam/sdks/python/apache_beam/pipeline.py",
 line 426, in __exit__
   self.run().wait_until_finish()
 File 
"/home/tanay/Coding/beam-umbrella/beam-git/beam/sdks/python/apache_beam/pipeline.py",
 line 406, in run
   self._options).run(False)
 File 
"/home/tanay/Coding/beam-umbrella/beam-git/beam/sdks/python/apache_beam/pipeline.py",
 line 419, in run
   return self.runner.run_pipeline(self, self._options)
 File 
"/home/tanay/Coding/beam-umbrella/beam-git/beam/sdks/python/apache_beam/runners/direct/test_direct_runner.py",
 line 43, in run_pipeline
   self.result = super(TestDirectRunner, self).run_pipeline(pipeline, 
options)
 File 
"/home/tanay/Coding/beam-umbrella/beam-git/beam/sdks/python/apache_beam/runners/direct/direct_runner.py",
 line 128, in run_pipeline
   return runner.run_pipeline(pipeline, options)
 File 
"/home/tanay/Coding/beam-umbrella/beam-git/beam/sdks/python/apache_beam/runners/portability/fn_api_runner.py",
 line 319, in run_pipeline
   default_environment=self._default_environment))
 File 
"/home/tanay/Coding/beam-umbrella/beam-git/beam/sdks/python/apache_beam/runners/portability/fn_api_runner.py",
 line 326, in run_via_runner_api
   return self.run_stages(stage_context, stages)
 File 
"/home/tanay/Coding/beam-umbrella/beam-git/beam/sdks/python/apache_beam/runners/portability/fn_api_runner.py",
 line 408, in run_stages
   stage_context.safe_coders)
 File 
"/home/tanay/Coding/beam-umbrella/beam-git/beam/sdks/python/apache_beam/runners/portability/fn_api_runner.py",
 line 681, in _run_stage
   result, splits = bundle_manager.process_bundle(data_input, data_output)
 File 
"/home/tanay/Coding/beam-umbrella/beam-git/beam/sdks/python/apache_beam/runners/portability/fn_api_runner.py",
 line 1562, in process_bundle
   part_inputs):
 File "/usr/local/lib/python3.7/concurrent/futures/_base.py", line 586, in 
result_iterator
   yield fs.pop().result()
 File "/usr/local/lib/python3.7/concurrent/futures/_base.py", line 432, in 
result
   return self.__get_result()
 File "/usr/local/lib/python3.7/concurrent/futures/_base.py", line 384, in 
__get_result
   raise self._exception
 File "/usr/local/lib/python3.7/concurrent/futures/thread.py", line 57, in 
run
   result = self.fn(*self.args, **self.kwargs)
 File 
"/home/tanay/Coding/beam-umbrella/beam-git/beam/sdks/python/apache_beam/runners/portability/fn_api_runner.py",
 line 1561, in 
   self._registered).process_bundle(part, expected_outputs),
 File 
"/home/tanay/Coding/beam-umbrella/beam-git/beam/sdks/python/apache_beam/runners/portability/fn_api_runner.py",
 line 1500, in process_bundle
   result_future = self._controller.control_handler.push(process_bundle_req)
 File 
"/home/tanay/Coding/beam-umbrella/beam-git/beam/sdks/python/apache_beam/runners/portability/fn_api_runner.py",
 line 1017, in push
   response = self.worker.do_instruction(request)
 File 
"/home/tanay/Coding/beam-umbrella/beam-git/beam/sdks/python/apache_beam/runners/worker/sdk_worker.py",
 line 342, in do_instruction
   request.instruction_id)
 File 
"/home/tanay/Coding/beam-umbrella/beam-git/beam/sdks/python/apache_beam/runners/worker/sdk_worker.py",
 line 368, in process_bundle
   bundle_processor.process_bundle(instruction_id))
 File 
"/home/tanay/Coding/beam-umbrella/beam-git/beam/sdks/python/apache_beam/runners/worker/bundle_processor.py",
 line 593, in process_bundle
   data.ptransform_id].process_encoded(data.data)
 File 
"/home/tanay/Coding/beam-umbrella/beam-git/beam/sdks/python/apache_beam/runners/worker/bundle_processor.py",
 line 143, in process_encoded
   self.output(decoded_value)
 File 

[jira] [Comment Edited] (BEAM-7758) Table returns PCollectionView in BeamSQL

2019-07-20 Thread Rahul Patwari (JIRA)


[ 
https://issues.apache.org/jira/browse/BEAM-7758?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16889468#comment-16889468
 ] 

Rahul Patwari edited comment on BEAM-7758 at 7/20/19 4:27 PM:
--

[~amaliujia] 

Is this the expected flow:
{code:java}
PCollection unboundedPcollection = ...;

// Schema, properties, etc. are given
org.apache.beam.sdk.extensions.sql.meta.Table table = Table.builder(). ... .build();

org.apache.beam.sdk.extensions.sql.meta.provider.SlowChangingCacheTableProvider provider =
    SlowChangingCacheTableProvider.builder().withTable(table).build();

unboundedPcollection.apply(
    SqlTransform.query("JOIN query")
        .withTableProvider("SlowChangingCache", provider)
        .build());
{code}
Can the existing table providers, such as 'KafkaTableProvider' and 'TextTableProvider', be used programmatically (via SqlTransform.query().withTableProvider())?
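To reason about what such a registration would do, the sketch below models the name-to-provider lookup as a tiny standalone registry. This is plain Java with no Beam dependency: SqlTransformModel, SqlTable, and SqlTableProvider are simplified stand-ins invented for this illustration, not the real Beam SQL types, and the real SqlTransform resolves providers during query planning rather than through a public resolve() call.

```java
import java.util.HashMap;
import java.util.Map;

// Simplified stand-ins for the Beam SQL interfaces — NOT the real API.
interface SqlTable { String name(); }
interface SqlTableProvider { SqlTable getTable(String tableName); }

/** Toy model of a SqlTransform-style builder keeping a name -> provider registry. */
class SqlTransformModel {
  private final String query;
  private final Map<String, SqlTableProvider> providers = new HashMap<>();

  private SqlTransformModel(String query) { this.query = query; }

  static SqlTransformModel query(String q) { return new SqlTransformModel(q); }

  /** Registering a provider is just a map insert; nothing is read yet. */
  SqlTransformModel withTableProvider(String name, SqlTableProvider provider) {
    providers.put(name, provider);
    return this;
  }

  /** At planning time, a table reference is routed to its named provider. */
  SqlTable resolve(String providerName, String tableName) {
    return providers.get(providerName).getTable(tableName);
  }
}
```

Under this model, registering 'KafkaTableProvider' or 'TextTableProvider' programmatically is no different from registering a custom provider: each is just another entry in the map.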


was (Author: rahul8383):
[~amaliujia] 

Is this the expected flow:
{code:java}
PCollection unboundedPcollection = ...;

// Schema, properties, etc. are given
org.apache.beam.sdk.extensions.sql.meta.Table table = Table.builder(). ... .build();

org.apache.beam.sdk.extensions.sql.meta.provider.SlowChangingCacheTableProvider provider =
    SlowChangingCacheTableProvider.builder().withTable(table).build();

unboundedPcollection.apply(
    SqlTransform.query("JOIN query")
        .withTableProvider("SlowChangingCacheTableName", provider)
        .build());
{code}
Can the existing table providers, such as 'KafkaTableProvider' and 'TextTableProvider', be used programmatically (via SqlTransform.query().withTableProvider())?

> Table returns PCollectionView in BeamSQL
> 
>
> Key: BEAM-7758
> URL: https://issues.apache.org/jira/browse/BEAM-7758
> Project: Beam
>  Issue Type: New Feature
>  Components: dsl-sql
>Reporter: Rui Wang
>Assignee: Rahul Patwari
>Priority: Major
>
> We might be able to define a table with properties that say this table 
> returns a PCollectionView. By doing so, we will have a trigger-based 
> PCollectionView available in SQL rel nodes. 
> Relevant thread: 
> https://lists.apache.org/thread.html/602121ee49886590ce4975f66aa0d270b4a5d64575337fca8bef1232@%3Cdev.beam.apache.org%3E



--
This message was sent by Atlassian JIRA
(v7.6.14#76016)


[jira] [Comment Edited] (BEAM-7758) Table returns PCollectionView in BeamSQL

2019-07-20 Thread Rahul Patwari (JIRA)


[ 
https://issues.apache.org/jira/browse/BEAM-7758?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16889468#comment-16889468
 ] 

Rahul Patwari edited comment on BEAM-7758 at 7/20/19 4:17 PM:
--

[~amaliujia] 

Is this the expected flow:
{code:java}
PCollection unboundedPcollection = ...;

// Schema, properties, etc. are given
org.apache.beam.sdk.extensions.sql.meta.Table table = Table.builder(). ... .build();

org.apache.beam.sdk.extensions.sql.meta.provider.SlowChangingCacheTableProvider provider =
    SlowChangingCacheTableProvider.builder().withTable(table).build();

unboundedPcollection.apply(
    SqlTransform.query("JOIN query")
        .withTableProvider("SlowChangingCacheTableName", provider)
        .build());
{code}
Can the existing table providers, such as 'KafkaTableProvider' and 'TextTableProvider', be used programmatically (via SqlTransform.query().withTableProvider())?


was (Author: rahul8383):
[~amaliujia] 

Is this the expected flow:
{code:java}
PCollection unboundedPcollection = ...;

// Schema, properties, etc. are given
org.apache.beam.sdk.extensions.sql.meta.Table table = Table.builder(). ... .build();

org.apache.beam.sdk.extensions.sql.meta.provider.SlowChangingCacheTableProvider provider =
    SlowChangingCacheTableProvider.builder().withTable(table).build();

unboundedPcollection.apply(
    SqlTransform.query("JOIN query")
        .withTableProvider("SlowChangingCacheTableName", provider)
        .build());
{code}
Can the existing table providers, such as 'KafkaTableProvider' and 'TextTableProvider', be used programmatically (via SqlTransform.query().withTableProvider())?

> Table returns PCollectionView in BeamSQL
> 
>
> Key: BEAM-7758
> URL: https://issues.apache.org/jira/browse/BEAM-7758
> Project: Beam
>  Issue Type: New Feature
>  Components: dsl-sql
>Reporter: Rui Wang
>Assignee: Rahul Patwari
>Priority: Major
>
> We might be able to define a table with properties that say this table 
> returns a PCollectionView. By doing so, we will have a trigger-based 
> PCollectionView available in SQL rel nodes. 
> Relevant thread: 
> https://lists.apache.org/thread.html/602121ee49886590ce4975f66aa0d270b4a5d64575337fca8bef1232@%3Cdev.beam.apache.org%3E





[jira] [Comment Edited] (BEAM-7758) Table returns PCollectionView in BeamSQL

2019-07-20 Thread Rahul Patwari (JIRA)


[ 
https://issues.apache.org/jira/browse/BEAM-7758?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16889468#comment-16889468
 ] 

Rahul Patwari edited comment on BEAM-7758 at 7/20/19 4:16 PM:
--

[~amaliujia] 

Is this the expected flow:
{code:java}
PCollection unboundedPcollection = ...;

// Schema, properties, etc. are given
org.apache.beam.sdk.extensions.sql.meta.Table table = Table.builder(). ... .build();

org.apache.beam.sdk.extensions.sql.meta.provider.SlowChangingCacheTableProvider provider =
    SlowChangingCacheTableProvider.builder().withTable(table).build();

unboundedPcollection.apply(
    SqlTransform.query("JOIN query")
        .withTableProvider("SlowChangingCacheTableName", provider)
        .build());
{code}
Can the existing table providers, such as 'KafkaTableProvider' and 'TextTableProvider', be used programmatically (via SqlTransform.query().withTableProvider())?


was (Author: rahul8383):
[~amaliujia]

When should the Table for the PCollectionView get created?

What are the changes needed in BeamSqlTable?

Can the existing TableProviders, like "KafkaTableProvider" or "TextTableProvider", be used in a Java program: 
[https://github.com/apache/beam/blob/master/sdks/java/extensions/sql/src/main/java/org/apache/beam/sdk/extensions/sql/SqlTransform.java#L186]

Is this the expected flow:
{code:java}
PCollection unboundedPcollection = ...;

slowChangingTableProvider.createTable(table);

unboundedPcollection.apply(
    SqlTransform.query("JOIN query")
        .withTableProvider("SlowChangingTableName", slowChangingTableProvider)
        .build());
{code}

> Table returns PCollectionView in BeamSQL
> 
>
> Key: BEAM-7758
> URL: https://issues.apache.org/jira/browse/BEAM-7758
> Project: Beam
>  Issue Type: New Feature
>  Components: dsl-sql
>Reporter: Rui Wang
>Assignee: Rahul Patwari
>Priority: Major
>
> We might be able to define a table with properties that say this table 
> returns a PCollectionView. By doing so, we will have a trigger-based 
> PCollectionView available in SQL rel nodes. 
> Relevant thread: 
> https://lists.apache.org/thread.html/602121ee49886590ce4975f66aa0d270b4a5d64575337fca8bef1232@%3Cdev.beam.apache.org%3E





[jira] [Work logged] (BEAM-6611) A Python Sink for BigQuery with File Loads in Streaming

2019-07-20 Thread ASF GitHub Bot (JIRA)


 [ 
https://issues.apache.org/jira/browse/BEAM-6611?focusedWorklogId=280094=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-280094
 ]

ASF GitHub Bot logged work on BEAM-6611:


Author: ASF GitHub Bot
Created on: 20/Jul/19 14:40
Start Date: 20/Jul/19 14:40
Worklog Time Spent: 10m 
  Work Description: ttanay commented on issue #8871: [BEAM-6611] BigQuery 
file loads in Streaming for Python SDK
URL: https://github.com/apache/beam/pull/8871#issuecomment-513473121
 
 
   Run Python 2 PostCommit
 

This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 280094)
Time Spent: 5h 50m  (was: 5h 40m)

> A Python Sink for BigQuery with File Loads in Streaming
> ---
>
> Key: BEAM-6611
> URL: https://issues.apache.org/jira/browse/BEAM-6611
> Project: Beam
>  Issue Type: Improvement
>  Components: sdk-py-core
>Reporter: Pablo Estrada
>Assignee: Tanay Tummalapalli
>Priority: Major
>  Labels: gsoc, gsoc2019, mentor
>  Time Spent: 5h 50m
>  Remaining Estimate: 0h
>
> The Java SDK supports a bunch of methods for writing data into BigQuery, 
> while the Python SDK supports the following:
> - Streaming inserts for streaming pipelines [As seen in [bigquery.py and 
> BigQueryWriteFn|https://github.com/apache/beam/blob/master/sdks/python/apache_beam/io/gcp/bigquery.py#L649-L813]]
> - File loads for batch pipelines [As implemented in [PR 
> 7655|https://github.com/apache/beam/pull/7655]]
> Quick and dirty early design doc: https://s.apache.org/beam-bqfl-py-streaming
> The Java SDK also supports File Loads for Streaming pipelines [see BatchLoads 
> application|https://github.com/apache/beam/blob/master/sdks/java/io/google-cloud-platform/src/main/java/org/apache/beam/sdk/io/gcp/bigquery/BigQueryIO.java#L1709-L1776].
> File loads have the advantage of being much cheaper than streaming inserts 
> (although records take longer to appear in the table).





[jira] [Commented] (BEAM-7758) Table returns PCollectionView in BeamSQL

2019-07-20 Thread Rahul Patwari (JIRA)


[ 
https://issues.apache.org/jira/browse/BEAM-7758?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16889468#comment-16889468
 ] 

Rahul Patwari commented on BEAM-7758:
-

[~amaliujia]

When should the Table for the PCollectionView get created?

What are the changes needed in BeamSqlTable?

Can the existing TableProviders, like "KafkaTableProvider" or "TextTableProvider", be used in a Java program: 
[https://github.com/apache/beam/blob/master/sdks/java/extensions/sql/src/main/java/org/apache/beam/sdk/extensions/sql/SqlTransform.java#L186]

Is this the expected flow:
{code:java}
PCollection unboundedPcollection = ...;

slowChangingTableProvider.createTable(table);

unboundedPcollection.apply(
    SqlTransform.query("JOIN query")
        .withTableProvider("SlowChangingTableName", slowChangingTableProvider)
        .build());
{code}

> Table returns PCollectionView in BeamSQL
> 
>
> Key: BEAM-7758
> URL: https://issues.apache.org/jira/browse/BEAM-7758
> Project: Beam
>  Issue Type: New Feature
>  Components: dsl-sql
>Reporter: Rui Wang
>Assignee: Rahul Patwari
>Priority: Major
>
> We might be able to define a table with properties that say this table 
> returns a PCollectionView. By doing so, we will have a trigger-based 
> PCollectionView available in SQL rel nodes. 
> Relevant thread: 
> https://lists.apache.org/thread.html/602121ee49886590ce4975f66aa0d270b4a5d64575337fca8bef1232@%3Cdev.beam.apache.org%3E





[jira] [Assigned] (BEAM-6857) Support dynamic timers

2019-07-20 Thread Shehzaad Nakhoda (JIRA)


 [ 
https://issues.apache.org/jira/browse/BEAM-6857?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Shehzaad Nakhoda reassigned BEAM-6857:
--

Assignee: Shehzaad Nakhoda

> Support dynamic timers
> --
>
> Key: BEAM-6857
> URL: https://issues.apache.org/jira/browse/BEAM-6857
> Project: Beam
>  Issue Type: Bug
>  Components: sdk-java-core
>Reporter: Reuven Lax
>Assignee: Shehzaad Nakhoda
>Priority: Major
>
> The Beam timers API currently requires each timer to be statically specified 
> in the DoFn. The user must provide a separate callback method per timer. For 
> example:
> DoFn() {
>   @TimerId("timer1") private final TimerSpec timer1 = TimerSpecs.timer(...);
>   @TimerId("timer2") private final TimerSpec timer2 = TimerSpecs.timer(...);
>   // ... set timers in processElement
>   @OnTimer("timer1") public void onTimer1() { ... }
>   @OnTimer("timer2") public void onTimer2() { ... }
> }
> However, there are many cases where the user does not know the set of timers 
> statically when writing their code. This happens when the timer tag should be 
> based on the data. It also happens when writing a DSL on top of Beam, where 
> the DSL author has to create DoFns but does not know statically which timers 
> their users will want to set (e.g. Scio).
>  
> The goal is to support dynamic timers. Something like the following:
> DoFn() {
>   @TimerId("timer") private final TimerSpec timer = TimerSpecs.dynamicTimer(...);
>   @ProcessElement public void process(@TimerId("timer") DynamicTimer timer) {
>     timer.set("tag1", ts);
>     timer.set("tag2", ts);
>   }
>   @OnTimer("timer") public void onTimer(@TimerTag String tag) { ... }
> }
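The proposed semantics can be modeled without any Beam machinery. The sketch below is plain Java with no Beam dependency; DynamicTimerModel is an illustrative name invented here, not a Beam class. It captures the essential idea: one logical timer, arbitrarily many data-derived tags, and a single callback that receives the tag of the firing timer.

```java
import java.time.Instant;
import java.util.HashMap;
import java.util.Map;
import java.util.function.Consumer;

/**
 * Runner-free model of dynamic-timer semantics: tags need not be known
 * statically, and one callback (playing the @OnTimer role) receives the tag.
 */
class DynamicTimerModel {
  private final Map<String, Instant> pending = new HashMap<>();
  private final Consumer<String> onTimer;  // single callback for all tags

  DynamicTimerModel(Consumer<String> onTimer) { this.onTimer = onTimer; }

  /** Corresponds to timer.set(tag, ts): the tag can come from the data. */
  void set(String tag, Instant ts) { pending.put(tag, ts); }

  /** Fire every timer whose timestamp is at or before the watermark. */
  void advanceWatermarkTo(Instant watermark) {
    pending.entrySet().removeIf(e -> {
      if (!e.getValue().isAfter(watermark)) {
        onTimer.accept(e.getKey());  // the callback learns which tag fired
        return true;
      }
      return false;
    });
  }
}
```

A real runner also has to persist the tag-to-timestamp map per key and window, which is the part this toy model deliberately leaves out.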





[jira] [Assigned] (BEAM-7758) Table returns PCollectionView in BeamSQL

2019-07-20 Thread Rahul Patwari (JIRA)


 [ 
https://issues.apache.org/jira/browse/BEAM-7758?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Rahul Patwari reassigned BEAM-7758:
---

Assignee: Rahul Patwari

> Table returns PCollectionView in BeamSQL
> 
>
> Key: BEAM-7758
> URL: https://issues.apache.org/jira/browse/BEAM-7758
> Project: Beam
>  Issue Type: New Feature
>  Components: dsl-sql
>Reporter: Rui Wang
>Assignee: Rahul Patwari
>Priority: Major
>
> We might be able to define a table with properties that say this table 
> returns a PCollectionView. By doing so, we will have a trigger-based 
> PCollectionView available in SQL rel nodes. 
> Relevant thread: 
> https://lists.apache.org/thread.html/602121ee49886590ce4975f66aa0d270b4a5d64575337fca8bef1232@%3Cdev.beam.apache.org%3E





[jira] [Commented] (BEAM-7758) Table returns PCollectionView in BeamSQL

2019-07-20 Thread Rui Wang (JIRA)


[ 
https://issues.apache.org/jira/browse/BEAM-7758?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16889383#comment-16889383
 ] 

Rui Wang commented on BEAM-7758:


I think this JIRA could start from thinking about two changes:
1. How to change 
[BeamSqlTable|https://github.com/apache/beam/blob/master/sdks/java/extensions/sql/src/main/java/org/apache/beam/sdk/extensions/sql/BeamSqlTable.java].
 Note that every table in SQL implements this interface.

2. How to propagate the PCollectionView through the SQL query plan, especially 
starting from 
[BeamIOSourceRel|https://github.com/apache/beam/blob/master/sdks/java/extensions/sql/src/main/java/org/apache/beam/sdk/extensions/sql/impl/rel/BeamIOSourceRel.java].
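For a sense of what change (1) could look like, here is one hypothetical shape, sketched with stub types so it stands alone. This is NOT the real BeamSqlTable interface — the stubs and the SketchSqlTable name are invented for illustration — but it shows the kind of addition being discussed: a side-input path next to buildIOReader.

```java
// Stub stand-ins so the sketch compiles on its own; the real types live in
// org.apache.beam.sdk.values and the Beam SQL extension.
class PBegin {}
class Row {}
class PCollection<T> {}
class PCollectionView<T> {}

/**
 * Hypothetical table interface: alongside buildIOReader, a table could
 * advertise that it materializes as a PCollectionView and build one.
 */
interface SketchSqlTable {
  PCollection<Row> buildIOReader(PBegin begin);

  /** Most tables stay as plain PCollections. */
  default boolean isSideInputTable() { return false; }

  /** Side-input-style tables would override this to return a view. */
  default PCollectionView<Row> buildIOView(PBegin begin) {
    throw new UnsupportedOperationException("not a side-input table");
  }
}
```

With default methods like these, existing table implementations would keep compiling unchanged, while BeamIOSourceRel could branch on isSideInputTable() when producing its output.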

> Table returns PCollectionView in BeamSQL
> 
>
> Key: BEAM-7758
> URL: https://issues.apache.org/jira/browse/BEAM-7758
> Project: Beam
>  Issue Type: New Feature
>  Components: dsl-sql
>Reporter: Rui Wang
>Priority: Major
>
> We might be able to define a table with properties that say this table 
> returns a PCollectionView. By doing so, we will have a trigger-based 
> PCollectionView available in SQL rel nodes. 
> Relevant thread: 
> https://lists.apache.org/thread.html/602121ee49886590ce4975f66aa0d270b4a5d64575337fca8bef1232@%3Cdev.beam.apache.org%3E


