[jira] [Created] (BEAM-8408) [RedisIO][java] Writing with method SADD doesn't set TTL

2019-10-15 Thread Andrey Sokolov (Jira)
Andrey Sokolov created BEAM-8408:


 Summary: [RedisIO][java] Writing with method SADD doesn't set TTL
 Key: BEAM-8408
 URL: https://issues.apache.org/jira/browse/BEAM-8408
 Project: Beam
  Issue Type: Bug
  Components: io-java-redis
Affects Versions: 2.15.0, 2.14.0, 2.16.0
Reporter: Andrey Sokolov


[https://github.com/apache/beam/blob/master/sdks/java/io/redis/src/main/java/org/apache/beam/sdk/io/redis/RedisIO.java]
{code:java}
private void writeUsingSaddCommand(KV<String, String> record, Long expireTime) {
  String key = record.getKey();
  String value = record.getValue();

  pipeline.sadd(key, value);
}
{code}
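The snippet shows the bug: {{writeUsingSaddCommand}} receives {{expireTime}} but never applies it. Below is a minimal runnable sketch of the likely fix pattern, in Python for illustration only; FakePipeline is a hypothetical stand-in for a Jedis pipeline, not RedisIO's actual code.

```python
# Sketch of the likely fix, not RedisIO's actual implementation: when an
# expiration is configured, the SADD write path should also issue PEXPIRE.
# FakePipeline records commands so the example is self-contained.
class FakePipeline:
    def __init__(self):
        self.commands = []  # every command issued, in order

    def sadd(self, key, value):
        self.commands.append(("SADD", key, value))

    def pexpire(self, key, millis):
        self.commands.append(("PEXPIRE", key, millis))


def write_using_sadd(pipeline, record, expire_time_millis=None):
    """Mirrors writeUsingSaddCommand, but applies the TTL when one is set."""
    key, value = record
    pipeline.sadd(key, value)
    if expire_time_millis is not None and expire_time_millis > 0:
        # The step missing in the reported code: set the TTL on the set key.
        pipeline.pexpire(key, expire_time_millis)


pipe = FakePipeline()
write_using_sadd(pipe, ("colors", "red"), expire_time_millis=60000)
print(pipe.commands)
# [('SADD', 'colors', 'red'), ('PEXPIRE', 'colors', 60000)]
```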



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Work logged] (BEAM-8403) Race condition in request id generation of GrpcStateRequestHandler

2019-10-15 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/BEAM-8403?focusedWorklogId=328887&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-328887
 ]

ASF GitHub Bot logged work on BEAM-8403:


Author: ASF GitHub Bot
Created on: 16/Oct/19 00:57
Start Date: 16/Oct/19 00:57
Worklog Time Spent: 10m 
  Work Description: tweise commented on issue #9800: [BEAM-8403] Guard 
request id generation to prevent concurrent worker access
URL: https://github.com/apache/beam/pull/9800#issuecomment-542464476
 
 
   @mxm No occurrences of the error during testing with this change; we should 
be good to go.
 

This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 328887)
Time Spent: 1h 10m  (was: 1h)

> Race condition in request id generation of GrpcStateRequestHandler
> --
>
> Key: BEAM-8403
> URL: https://issues.apache.org/jira/browse/BEAM-8403
> Project: Beam
>  Issue Type: Bug
>  Components: sdk-py-harness
>Reporter: Maximilian Michels
>Assignee: Maximilian Michels
>Priority: Major
> Fix For: 2.17.0
>
>  Time Spent: 1h 10m
>  Remaining Estimate: 0h
>
> There is a race condition in {{GrpcStateRequestHandler}} which surfaced after 
> the recent changes to process append/clear state request asynchronously. The 
> race condition can occur if multiple Runner workers process a transform with 
> state requests with the same SDK Harness. For example, this setup occurs with 
> Flink when a TaskManager has multiple task slots and two or more of those 
> slots process the same stateful stage against an SDK Harness.
> CC [~robertwb]
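The race described above is the classic unguarded shared counter: two workers can read the same last id before either writes the increment back. A minimal illustrative sketch of the fix pattern (plain Python, not Beam's actual GrpcStateRequestHandler):

```python
import threading

# Illustration of the fix pattern only: guarding request id generation
# with a lock so concurrent workers never receive the same id.
class RequestIdGenerator:
    def __init__(self):
        self._lock = threading.Lock()
        self._last_id = 0

    def next_id(self):
        with self._lock:  # serialize concurrent callers
            self._last_id += 1
            return self._last_id


gen = RequestIdGenerator()
ids = []
ids_lock = threading.Lock()

def worker(n):
    for _ in range(n):
        i = gen.next_id()
        with ids_lock:
            ids.append(i)

threads = [threading.Thread(target=worker, args=(1000,)) for _ in range(4)]
for t in threads:
    t.start()
for t in threads:
    t.join()
print(len(ids), len(set(ids)))  # equal counts -> no duplicate request ids
```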



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Work logged] (BEAM-8382) Add polling interval to KinesisIO.Read

2019-10-15 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/BEAM-8382?focusedWorklogId=328881&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-328881
 ]

ASF GitHub Bot logged work on BEAM-8382:


Author: ASF GitHub Bot
Created on: 16/Oct/19 00:31
Start Date: 16/Oct/19 00:31
Worklog Time Spent: 10m 
  Work Description: jfarr commented on issue #9765: [BEAM-8382] Add polling 
interval to KinesisIO.Read
URL: https://github.com/apache/beam/pull/9765#issuecomment-542445144
 
 
   @aromanenko-dev Kinesis limits us to 5 getRecords calls per second per 
shard. If we're the only consumer on the stream we can call getRecords every 
200ms without getting throttled (in theory). If we introduce a 1 sec delay on 
the first KMSThrottlingException then we have an unnecessary 800ms of latency. 
Let's say we reduce the initial delay to 100ms. Assuming a successful call 
takes about 10ms, we'll hit an exception after 50ms and for the next 950ms 
thereafter. On the 4th retry we'll succeed but we'll be at 800ms. Reduce it to 
10ms and on the 7th retry we'll succeed but we'll be at 640ms. In any case 
we've introduced another knob (initial backoff delay time).
   
   So that's assuming delay time starts at zero and only increases when we get 
a KMSThrottlingException. Since that's 100% guaranteed to happen we could try 
with a non-zero initial delay instead. Maybe if we start at 200ms we won't get 
throttled at all and we won't overshoot. Another knob.
   
   OK so now we've overshot. Maybe we can ease back on the delay. We can 
speculatively try a shorter delay time and see if we still get throttled. How 
often to try? How much to ease back? More knobs.
   
   You can fiddle with these knobs and come up with something that works well 
when you're the only consumer, but as soon as you have 2 or more consumers you 
have to throw that out the window. If you're pulling back more or less records 
that may take more or less time and throw off your timing. You can't really 
hardcode anything because what works well in one scenario may not work well in 
another.
   
   So those are my initial thoughts. I'm totally open to the idea that I'm just 
overthinking it. I think the ultimate test would be to try it out and see what 
works. But if you want to minimize latency (and we do) I think the 1 second 
rolling window introduces a feedback delay that makes this a little more 
complicated than first glance. What do you think?
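The backoff arithmetic in the comment can be made concrete. A small sketch (illustrative only, not the proposed KinesisIO code) of the sleep a doubling backoff accumulates, showing why the initial delay is one more knob to tune:

```python
def backoff_delays(initial_ms, retries):
    """Delays for a simple doubling backoff: initial, 2x, 4x, ..."""
    return [initial_ms * 2 ** k for k in range(retries)]

# With a 100 ms initial delay, four throttled attempts accumulate
# 100 + 200 + 400 + 800 = 1500 ms of sleep before the next try.
# Shrinking the initial delay trades shorter waits for more throttled
# calls: the "another knob" trade-off described above.
for initial in (100, 10):
    delays = backoff_delays(initial, 4)
    print(initial, delays, sum(delays))
```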
 

This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 328881)
Time Spent: 1h 10m  (was: 1h)

> Add polling interval to KinesisIO.Read
> --
>
> Key: BEAM-8382
> URL: https://issues.apache.org/jira/browse/BEAM-8382
> Project: Beam
>  Issue Type: Improvement
>  Components: io-java-kinesis
>Affects Versions: 2.13.0, 2.14.0, 2.15.0
>Reporter: Jonothan Farr
>Assignee: Jonothan Farr
>Priority: Major
>  Time Spent: 1h 10m
>  Remaining Estimate: 0h
>
> With the current implementation we are observing Kinesis throttling due to 
> ReadProvisionedThroughputExceeded on the order of hundreds of times per 
> second, regardless of the actual Kinesis throughput. This is because the 
> ShardReadersPool readLoop() method is polling getRecords() as fast as 
> possible.
> From the KDS documentation:
> {quote}Each shard can support up to five read transactions per second.
> {quote}
> and
> {quote}For best results, sleep for at least 1 second (1,000 milliseconds) 
> between calls to getRecords to avoid exceeding the limit on getRecords 
> frequency.
> {quote}
> [https://docs.aws.amazon.com/streams/latest/dev/service-sizes-and-limits.html]
> [https://docs.aws.amazon.com/streams/latest/dev/developing-consumers-with-sdk.html]
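The requested polling interval amounts to sleeping whatever remains of the interval after each getRecords call. A sketch with a pure helper (hypothetical name, not the ShardReadersPool API) so the timing logic is easy to check:

```python
def remaining_sleep_ms(interval_ms, elapsed_ms):
    """Sleep needed so successive getRecords calls are >= interval_ms apart."""
    return max(0, interval_ms - elapsed_ms)

# Kinesis allows 5 read transactions/sec/shard, i.e. a 200 ms interval:
# a call that took 50 ms should be followed by a 150 ms sleep, while a
# slow 250 ms call needs no sleep at all.
print(remaining_sleep_ms(200, 50), remaining_sleep_ms(200, 250))  # 150 0
```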



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Work logged] (BEAM-8365) Add project push-down capability to IO APIs

2019-10-15 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/BEAM-8365?focusedWorklogId=328875&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-328875
 ]

ASF GitHub Bot logged work on BEAM-8365:


Author: ASF GitHub Bot
Created on: 16/Oct/19 00:17
Start Date: 16/Oct/19 00:17
Worklog Time Spent: 10m 
  Work Description: 11moon11 commented on issue #9764: [BEAM-8365] Project 
push-down for TestTableProvider
URL: https://github.com/apache/beam/pull/9764#issuecomment-542456459
 
 
   Run Java_Examples_Dataflow PreCommit
 

This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 328875)
Time Spent: 7h 20m  (was: 7h 10m)

> Add project push-down capability to IO APIs
> ---
>
> Key: BEAM-8365
> URL: https://issues.apache.org/jira/browse/BEAM-8365
> Project: Beam
>  Issue Type: New Feature
>  Components: dsl-sql
>Reporter: Kirill Kozlov
>Assignee: Kirill Kozlov
>Priority: Major
>  Time Spent: 7h 20m
>  Remaining Estimate: 0h
>
> * InMemoryTable should implement the following method:
> {code:java}
> public PCollection<Row> buildIOReader(
> PBegin begin, BeamSqlTableFilter filters, List<String> fieldNames);{code}
> which should return a `PCollection` containing only the fields specified in 
> the `fieldNames` list.
>  * Create a rule to push fields used by a Calc (in projects and in a 
> condition) down into the TestTable IO.
>  * Update that same Calc (from the previous step) to have proper input and 
> output schemas, removing unused fields.
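The buildIOReader contract above, returning only the fields named in fieldNames, is essentially row projection performed at the scan. A sketch in Python with plain dicts standing in for Beam Rows (names are illustrative, not Beam's API):

```python
def project_rows(rows, field_names):
    """Keep only field_names in each row, in the requested order."""
    return [{name: row[name] for name in field_names} for row in rows]

rows = [
    {"id": 1, "name": "a", "unused": 10},
    {"id": 2, "name": "b", "unused": 20},
]
# A Calc selecting only (name, id) would push those two fields into the
# scan, so the unused column is never materialized.
print(project_rows(rows, ["name", "id"]))
# [{'name': 'a', 'id': 1}, {'name': 'b', 'id': 2}]
```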



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Work logged] (BEAM-8365) Add project push-down capability to IO APIs

2019-10-15 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/BEAM-8365?focusedWorklogId=328874&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-328874
 ]

ASF GitHub Bot logged work on BEAM-8365:


Author: ASF GitHub Bot
Created on: 16/Oct/19 00:17
Start Date: 16/Oct/19 00:17
Worklog Time Spent: 10m 
  Work Description: 11moon11 commented on issue #9764: [BEAM-8365] Project 
push-down for TestTableProvider
URL: https://github.com/apache/beam/pull/9764#issuecomment-542456459
 
 
   Run Java_Examples_Dataflow PreCommit
 

This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 328874)
Time Spent: 7h 10m  (was: 7h)

> Add project push-down capability to IO APIs
> ---
>
> Key: BEAM-8365
> URL: https://issues.apache.org/jira/browse/BEAM-8365
> Project: Beam
>  Issue Type: New Feature
>  Components: dsl-sql
>Reporter: Kirill Kozlov
>Assignee: Kirill Kozlov
>Priority: Major
>  Time Spent: 7h 10m
>  Remaining Estimate: 0h
>
> * InMemoryTable should implement the following method:
> {code:java}
> public PCollection<Row> buildIOReader(
> PBegin begin, BeamSqlTableFilter filters, List<String> fieldNames);{code}
> which should return a `PCollection` containing only the fields specified in 
> the `fieldNames` list.
>  * Create a rule to push fields used by a Calc (in projects and in a 
> condition) down into the TestTable IO.
>  * Update that same Calc (from the previous step) to have proper input and 
> output schemas, removing unused fields.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Work logged] (BEAM-8367) Python BigQuery sink should use unique IDs for mode STREAMING_INSERTS

2019-10-15 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/BEAM-8367?focusedWorklogId=328872&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-328872
 ]

ASF GitHub Bot logged work on BEAM-8367:


Author: ASF GitHub Bot
Created on: 16/Oct/19 00:16
Start Date: 16/Oct/19 00:16
Worklog Time Spent: 10m 
  Work Description: pabloem commented on pull request #9797: [BEAM-8367] 
Using insertId for BQ streaming inserts
URL: https://github.com/apache/beam/pull/9797#discussion_r335231248
 
 

 ##
 File path: sdks/python/apache_beam/io/gcp/bigquery_test.py
 ##
 @@ -451,7 +451,7 @@ def test_dofn_client_finish_bundle_flush_called(self):
 
 # Destination is a tuple of (destination, schema) to ensure the table is
 # created.
-fn.process(('project_id:dataset_id.table_id', {'month': 1}))
+fn.process(('project_id:dataset_id.table_id', ({'month': 1}, 'insertid3')))
 
 Review comment:
   This is the input that we're passing to the function, so we can pass 
anything we want : )
 

This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 328872)
Time Spent: 1h 10m  (was: 1h)

> Python BigQuery sink should use unique IDs for mode STREAMING_INSERTS
> -
>
> Key: BEAM-8367
> URL: https://issues.apache.org/jira/browse/BEAM-8367
> Project: Beam
>  Issue Type: Bug
>  Components: sdk-py-core
>Reporter: Chamikara Madhusanka Jayalath
>Assignee: Pablo Estrada
>Priority: Blocker
> Fix For: 2.17.0
>
>  Time Spent: 1h 10m
>  Remaining Estimate: 0h
>
> Unique IDs ensure (best-effort) that writes to BigQuery are idempotent: for 
> example, we don't write the same record twice after a VM failure.
>  
> Currently the Python BQ sink inserts BQ IDs here, but they'll be regenerated 
> after a VM failure, resulting in data duplication.
> [https://github.com/apache/beam/blob/master/sdks/python/apache_beam/io/gcp/bigquery.py#L766]
>  
> The correct fix is to do a Reshuffle to checkpoint the unique IDs once they 
> are generated, similar to how the Java BQ sink operates.
> [https://github.com/apache/beam/blob/dcf6ad301069e4d2cfaec5db6b178acb7bb67f49/sdks/java/io/google-cloud-platform/src/main/java/org/apache/beam/sdk/io/gcp/bigquery/StreamingWriteTables.java#L225]
>  
> Pablo, can you do an initial assessment here?
> I think this is a relatively small fix, but I might be wrong.
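The idempotency argument can be sketched: if an insert ID is generated once and reused on retry, a dedupe-by-ID sink sees no duplicates; regenerating IDs on retry duplicates rows. Illustrative Python only, with a dict standing in for BigQuery's best-effort insertId dedupe, not the Beam sink:

```python
import uuid

def attach_insert_ids(records):
    """Tag each record once; retries must reuse these ids, not regenerate."""
    return [(str(uuid.uuid4()), r) for r in records]

def streaming_insert(table, tagged_records):
    """Fake BigQuery-style sink: best-effort dedupe on insertId."""
    for insert_id, record in tagged_records:
        table.setdefault(insert_id, record)

table = {}
batch = attach_insert_ids([{"month": 1}, {"month": 2}])
streaming_insert(table, batch)
streaming_insert(table, batch)  # retry after a (simulated) VM failure
print(len(table))  # 2: ids were checkpointed and reused, so no duplicates

bad_retry = attach_insert_ids([{"month": 1}, {"month": 2}])  # ids regenerated
streaming_insert(table, bad_retry)
print(len(table))  # 4: regenerated ids defeat the dedupe, duplicating data
```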



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Work logged] (BEAM-8367) Python BigQuery sink should use unique IDs for mode STREAMING_INSERTS

2019-10-15 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/BEAM-8367?focusedWorklogId=328873&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-328873
 ]

ASF GitHub Bot logged work on BEAM-8367:


Author: ASF GitHub Bot
Created on: 16/Oct/19 00:16
Start Date: 16/Oct/19 00:16
Worklog Time Spent: 10m 
  Work Description: pabloem commented on pull request #9797: [BEAM-8367] 
Using insertId for BQ streaming inserts
URL: https://github.com/apache/beam/pull/9797#discussion_r335231285
 
 

 ##
 File path: sdks/python/apache_beam/io/gcp/bigquery_test.py
 ##
 @@ -425,8 +425,8 @@ def test_dofn_client_process_flush_called(self):
 test_client=client)
 
 fn.start_bundle()
-fn.process(('project_id:dataset_id.table_id', {'month': 1}))
-fn.process(('project_id:dataset_id.table_id', {'month': 2}))
+fn.process(('project_id:dataset_id.table_id', ({'month': 1}, 'insertid1')))
 
 Review comment:
   That makes sense. I'll add a test.
 

This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 328873)
Time Spent: 1h 20m  (was: 1h 10m)

> Python BigQuery sink should use unique IDs for mode STREAMING_INSERTS
> -
>
> Key: BEAM-8367
> URL: https://issues.apache.org/jira/browse/BEAM-8367
> Project: Beam
>  Issue Type: Bug
>  Components: sdk-py-core
>Reporter: Chamikara Madhusanka Jayalath
>Assignee: Pablo Estrada
>Priority: Blocker
> Fix For: 2.17.0
>
>  Time Spent: 1h 20m
>  Remaining Estimate: 0h
>
> Unique IDs ensure (best-effort) that writes to BigQuery are idempotent: for 
> example, we don't write the same record twice after a VM failure.
>  
> Currently the Python BQ sink inserts BQ IDs here, but they'll be regenerated 
> after a VM failure, resulting in data duplication.
> [https://github.com/apache/beam/blob/master/sdks/python/apache_beam/io/gcp/bigquery.py#L766]
>  
> The correct fix is to do a Reshuffle to checkpoint the unique IDs once they 
> are generated, similar to how the Java BQ sink operates.
> [https://github.com/apache/beam/blob/dcf6ad301069e4d2cfaec5db6b178acb7bb67f49/sdks/java/io/google-cloud-platform/src/main/java/org/apache/beam/sdk/io/gcp/bigquery/StreamingWriteTables.java#L225]
>  
> Pablo, can you do an initial assessment here?
> I think this is a relatively small fix, but I might be wrong.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Work logged] (BEAM-8405) Python: Datastore: add support for embedded entities

2019-10-15 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/BEAM-8405?focusedWorklogId=328871&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-328871
 ]

ASF GitHub Bot logged work on BEAM-8405:


Author: ASF GitHub Bot
Created on: 16/Oct/19 00:13
Start Date: 16/Oct/19 00:13
Worklog Time Spent: 10m 
  Work Description: udim commented on issue #9805: [BEAM-8405] Support 
embedded Datastore entities
URL: https://github.com/apache/beam/pull/9805#issuecomment-542455698
 
 
   R: @chamikaramj 
 

This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 328871)
Time Spent: 20m  (was: 10m)

> Python: Datastore: add support for embedded entities 
> -
>
> Key: BEAM-8405
> URL: https://issues.apache.org/jira/browse/BEAM-8405
> Project: Beam
>  Issue Type: Bug
>  Components: io-py-gcp
>Reporter: Udi Meiri
>Assignee: Udi Meiri
>Priority: Major
>  Time Spent: 20m
>  Remaining Estimate: 0h
>
> The conversion methods to/from the client entity type should be updated to 
> support an embedded Entity.
> https://github.com/apache/beam/blob/603d68aafe9bdcd124d28ad62ad36af01e7a7403/sdks/python/apache_beam/io/gcp/datastore/v1new/types.py#L216-L240
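The needed change, recursing when a property value is itself an entity, can be sketched as follows. These are plain Python stand-ins for illustration, not the actual v1new types or client API:

```python
class Entity:
    """Stand-in for the Beam-side Datastore entity type (illustrative only)."""
    def __init__(self, properties):
        self.properties = properties

def to_client_value(value):
    # The key change for embedded entities: recurse into nested entities
    # (and lists of them) instead of rejecting them.
    if isinstance(value, Entity):
        return {k: to_client_value(v) for k, v in value.properties.items()}
    if isinstance(value, list):
        return [to_client_value(v) for v in value]
    return value

nested = Entity({"name": "inner", "tags": ["a", "b"]})
outer = Entity({"title": "outer", "child": nested})
print(to_client_value(outer))
# {'title': 'outer', 'child': {'name': 'inner', 'tags': ['a', 'b']}}
```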



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Work logged] (BEAM-8405) Python: Datastore: add support for embedded entities

2019-10-15 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/BEAM-8405?focusedWorklogId=328869&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-328869
 ]

ASF GitHub Bot logged work on BEAM-8405:


Author: ASF GitHub Bot
Created on: 16/Oct/19 00:11
Start Date: 16/Oct/19 00:11
Worklog Time Spent: 10m 
  Work Description: udim commented on pull request #9805: [BEAM-8405] 
Support embedded Datastore entities
URL: https://github.com/apache/beam/pull/9805
 
 
   The Python SDK already supports keys as property values; add entities as well.
   
   
   
   Thank you for your contribution! Follow this checklist to help us 
incorporate your contribution quickly and easily:
   
- [ ] [**Choose 
reviewer(s)**](https://beam.apache.org/contribute/#make-your-change) and 
mention them in a comment (`R: @username`).
- [ ] Format the pull request title like `[BEAM-XXX] Fixes bug in 
ApproximateQuantiles`, where you replace `BEAM-XXX` with the appropriate JIRA 
issue, if applicable. This will automatically link the pull request to the 
issue.
- [ ] If this contribution is large, please file an Apache [Individual 
Contributor License Agreement](https://www.apache.org/licenses/icla.pdf).
   
   Post-Commit Tests Status (on master branch)
   

   
   [Build status badge table truncated in the archive; omitted]

[jira] [Work logged] (BEAM-8367) Python BigQuery sink should use unique IDs for mode STREAMING_INSERTS

2019-10-15 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/BEAM-8367?focusedWorklogId=328865&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-328865
 ]

ASF GitHub Bot logged work on BEAM-8367:


Author: ASF GitHub Bot
Created on: 16/Oct/19 00:01
Start Date: 16/Oct/19 00:01
Worklog Time Spent: 10m 
  Work Description: chamikaramj commented on pull request #9797: 
[BEAM-8367] Using insertId for BQ streaming inserts
URL: https://github.com/apache/beam/pull/9797#discussion_r335227521
 
 

 ##
 File path: sdks/python/apache_beam/io/gcp/bigquery_test.py
 ##
 @@ -451,7 +451,7 @@ def test_dofn_client_finish_bundle_flush_called(self):
 
 # Destination is a tuple of (destination, schema) to ensure the table is
 # created.
-fn.process(('project_id:dataset_id.table_id', {'month': 1}))
+fn.process(('project_id:dataset_id.table_id', ({'month': 1}, 'insertid3')))
 
 Review comment:
   I'm wondering why this ends up being 'insertid3' and not 'insertid1'. 
Shouldn't the sequence be restarted from 0 for the new object?
 

This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 328865)

> Python BigQuery sink should use unique IDs for mode STREAMING_INSERTS
> -
>
> Key: BEAM-8367
> URL: https://issues.apache.org/jira/browse/BEAM-8367
> Project: Beam
>  Issue Type: Bug
>  Components: sdk-py-core
>Reporter: Chamikara Madhusanka Jayalath
>Assignee: Pablo Estrada
>Priority: Blocker
> Fix For: 2.17.0
>
>  Time Spent: 1h
>  Remaining Estimate: 0h
>
> Unique IDs ensure (best-effort) that writes to BigQuery are idempotent: for 
> example, we don't write the same record twice after a VM failure.
>  
> Currently the Python BQ sink inserts BQ IDs here, but they'll be regenerated 
> after a VM failure, resulting in data duplication.
> [https://github.com/apache/beam/blob/master/sdks/python/apache_beam/io/gcp/bigquery.py#L766]
>  
> The correct fix is to do a Reshuffle to checkpoint the unique IDs once they 
> are generated, similar to how the Java BQ sink operates.
> [https://github.com/apache/beam/blob/dcf6ad301069e4d2cfaec5db6b178acb7bb67f49/sdks/java/io/google-cloud-platform/src/main/java/org/apache/beam/sdk/io/gcp/bigquery/StreamingWriteTables.java#L225]
>  
> Pablo, can you do an initial assessment here?
> I think this is a relatively small fix, but I might be wrong.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Work logged] (BEAM-8367) Python BigQuery sink should use unique IDs for mode STREAMING_INSERTS

2019-10-15 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/BEAM-8367?focusedWorklogId=328864&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-328864
 ]

ASF GitHub Bot logged work on BEAM-8367:


Author: ASF GitHub Bot
Created on: 16/Oct/19 00:01
Start Date: 16/Oct/19 00:01
Worklog Time Spent: 10m 
  Work Description: chamikaramj commented on pull request #9797: 
[BEAM-8367] Using insertId for BQ streaming inserts
URL: https://github.com/apache/beam/pull/9797#discussion_r335228241
 
 

 ##
 File path: sdks/python/apache_beam/io/gcp/bigquery_test.py
 ##
 @@ -425,8 +425,8 @@ def test_dofn_client_process_flush_called(self):
 test_client=client)
 
 fn.start_bundle()
-fn.process(('project_id:dataset_id.table_id', {'month': 1}))
-fn.process(('project_id:dataset_id.table_id', {'month': 2}))
+fn.process(('project_id:dataset_id.table_id', ({'month': 1}, 'insertid1')))
 
 Review comment:
   Can we add a test that explicitly checks that insert IDs get added as 
expected? (These tests also check for other things.)
   
   Also, optionally, it's good if we can somehow add a unit test that induces a 
failure in a direct-runner-based job (in a step before the BQ sink) to confirm 
that we do not regenerate the insert ID for the same element.
 

This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 328864)
Time Spent: 1h  (was: 50m)

> Python BigQuery sink should use unique IDs for mode STREAMING_INSERTS
> -
>
> Key: BEAM-8367
> URL: https://issues.apache.org/jira/browse/BEAM-8367
> Project: Beam
>  Issue Type: Bug
>  Components: sdk-py-core
>Reporter: Chamikara Madhusanka Jayalath
>Assignee: Pablo Estrada
>Priority: Blocker
> Fix For: 2.17.0
>
>  Time Spent: 1h
>  Remaining Estimate: 0h
>
> Unique IDs ensure (best-effort) that writes to BigQuery are idempotent: for 
> example, we don't write the same record twice after a VM failure.
>  
> Currently the Python BQ sink inserts BQ IDs here, but they'll be regenerated 
> after a VM failure, resulting in data duplication.
> [https://github.com/apache/beam/blob/master/sdks/python/apache_beam/io/gcp/bigquery.py#L766]
>  
> The correct fix is to do a Reshuffle to checkpoint the unique IDs once they 
> are generated, similar to how the Java BQ sink operates.
> [https://github.com/apache/beam/blob/dcf6ad301069e4d2cfaec5db6b178acb7bb67f49/sdks/java/io/google-cloud-platform/src/main/java/org/apache/beam/sdk/io/gcp/bigquery/StreamingWriteTables.java#L225]
>  
> Pablo, can you do an initial assessment here?
> I think this is a relatively small fix, but I might be wrong.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Work logged] (BEAM-8335) Add streaming support to Interactive Beam

2019-10-15 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/BEAM-8335?focusedWorklogId=328859&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-328859
 ]

ASF GitHub Bot logged work on BEAM-8335:


Author: ASF GitHub Bot
Created on: 16/Oct/19 00:00
Start Date: 16/Oct/19 00:00
Worklog Time Spent: 10m 
  Work Description: rohdesamuel commented on pull request #9720: 
[BEAM-8335] Add initial modules for interactive streaming support
URL: https://github.com/apache/beam/pull/9720#discussion_r335228033
 
 

 ##
 File path: 
sdks/python/apache_beam/runners/interactive/caching/streaming_cache.py
 ##
 @@ -0,0 +1,123 @@
+#
+# Licensed to the Apache Software Foundation (ASF) under one or more
+# contributor license agreements.  See the NOTICE file distributed with
+# this work for additional information regarding copyright ownership.
+# The ASF licenses this file to You under the Apache License, Version 2.0
+# (the "License"); you may not use this file except in compliance with
+# the License.  You may obtain a copy of the License at
+#
+#http://www.apache.org/licenses/LICENSE-2.0
+#
+# Unless required by applicable law or agreed to in writing, software
+# distributed under the License is distributed on an "AS IS" BASIS,
+# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+# See the License for the specific language governing permissions and
+# limitations under the License.
+#
+
+from apache_beam.portability.api.beam_interactive_api_pb2 import 
InteractiveStreamRecord
+from apache_beam.portability.api.beam_runner_api_pb2 import TestStreamPayload
+from apache_beam.utils import timestamp
+
+from google.protobuf import timestamp_pb2
+
+
+def to_timestamp(timestamp_secs):
+  """Converts seconds since epoch to an apache_beam.util.Timestamp.
+  """
+  return timestamp.Timestamp.of(timestamp_secs)
+
+def from_timestamp_proto(timestamp_proto):
+  return timestamp.Timestamp(seconds=timestamp_proto.seconds,
+ micros=timestamp_proto.nanos * 1000)
+
+def to_timestamp_usecs(ts):
+  """Converts a google.protobuf.Timestamp and
+ apache_beam.util.timestamp.Timestamp to seconds since epoch.
+  """
+  if isinstance(ts, timestamp_pb2.Timestamp):
+return (ts.seconds * 10**6) + (ts.nanos * 10**-3)
+  if isinstance(ts, timestamp.Timestamp):
+return ts.micros
+
+class StreamingCache(object):
+  """Abstraction that holds the logic for reading and writing to cache.
+  """
+  def __init__(self, readers):
+self._readers = readers
+
+  class Reader(object):
+"""Abstraction that reads from PCollection readers.
+
+This class is an Abstraction layer over multiple PCollection readers to be
+used for supplying the Interactive Service with TestStream events.
+
+This class is also responsible for holding the state of the clock, 
injecting
+clock advancement events, and watermark advancement events.
+"""
+def __init__(self, readers):
+  self._readers = [reader.read() for reader in readers]
 
 Review comment:
   No, the elements in self._readers are generators that produce the events we 
want. These fit the interface in https://github.com/apache/beam/pull/8884 
for the PCollectionCache.
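Merging several per-PCollection readers into one timestamp-ordered event stream, as described above, is a job for heapq.merge. A sketch under the assumption that each reader yields (timestamp, event) pairs in timestamp order (an illustrative simplification of the actual Reader):

```python
import heapq

def merged_events(readers):
    """Merge already-sorted (timestamp, event) generators into one stream."""
    return heapq.merge(*readers, key=lambda pair: pair[0])

# Two per-PCollection readers, each yielding events in timestamp order:
r1 = iter([(1, "a1"), (4, "a2")])
r2 = iter([(2, "b1"), (3, "b2")])
print(list(merged_events([r1, r2])))
# [(1, 'a1'), (2, 'b1'), (3, 'b2'), (4, 'a2')]
```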
 

This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 328859)
Time Spent: 3h 10m  (was: 3h)

> Add streaming support to Interactive Beam
> -
>
> Key: BEAM-8335
> URL: https://issues.apache.org/jira/browse/BEAM-8335
> Project: Beam
>  Issue Type: Improvement
>  Components: runner-py-interactive
>Reporter: Sam Rohde
>Assignee: Sam Rohde
>Priority: Major
>  Time Spent: 3h 10m
>  Remaining Estimate: 0h
>
> This issue tracks the work items to introduce streaming support to the 
> Interactive Beam experience. This will allow users to:
>  * Write and run a streaming job in IPython
>  * Automatically cache records from unbounded sources
>  * Add a replay experience that replays all cached records to simulate the 
> original pipeline execution
>  * Add controls to play/pause/stop/step individual elements from the cached 
> records
>  * Add ability to inspect/visualize unbounded PCollections



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Work logged] (BEAM-8365) Add project push-down capability to IO APIs

2019-10-15 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/BEAM-8365?focusedWorklogId=328858=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-328858
 ]

ASF GitHub Bot logged work on BEAM-8365:


Author: ASF GitHub Bot
Created on: 16/Oct/19 00:00
Start Date: 16/Oct/19 00:00
Worklog Time Spent: 10m 
  Work Description: 11moon11 commented on issue #9764: [BEAM-8365] Project 
push-down for TestTableProvider
URL: https://github.com/apache/beam/pull/9764#issuecomment-542452951
 
 
   Run JavaPortabilityApi PreCommit
 



Issue Time Tracking
---

Worklog Id: (was: 328858)
Time Spent: 6h 50m  (was: 6h 40m)

> Add project push-down capability to IO APIs
> ---
>
> Key: BEAM-8365
> URL: https://issues.apache.org/jira/browse/BEAM-8365
> Project: Beam
>  Issue Type: New Feature
>  Components: dsl-sql
>Reporter: Kirill Kozlov
>Assignee: Kirill Kozlov
>Priority: Major
>  Time Spent: 6h 50m
>  Remaining Estimate: 0h
>
> * InMemoryTable should implement the following method:
> {code:java}
> public PCollection<Row> buildIOReader(
> PBegin begin, BeamSqlTableFilter filters, List<String> fieldNames);{code}
> Which should return a `PCollection<Row>` with fields specified in the `fieldNames` 
> list.
>  * Create a rule to push fields used by a Calc (in projects and in a 
> condition) down into TestTable IO.
>  * Update that same Calc (from the previous step) to have proper input and 
> output schemas, removing unused fields.
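
The contract described above — return only the fields named in `fieldNames` — amounts to per-row projection. A Python sketch with dicts standing in for Beam Rows (illustrative only; the real method operates on a `PCollection` carrying a Schema):

```python
def project_rows(rows, field_names):
    """Keep only the requested fields in each row, in the requested order."""
    return [{name: row[name] for name in field_names} for row in rows]

# Hypothetical input rows with an extra field the query never touches.
rows = [{'id': 1, 'name': 'a', 'unused': 'x'},
        {'id': 2, 'name': 'b', 'unused': 'y'}]
projected = project_rows(rows, ['id', 'name'])
```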





[jira] [Work logged] (BEAM-7847) Generate Python SDK docs using Python 3

2019-10-15 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/BEAM-7847?focusedWorklogId=328862=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-328862
 ]

ASF GitHub Bot logged work on BEAM-7847:


Author: ASF GitHub Bot
Created on: 16/Oct/19 00:01
Start Date: 16/Oct/19 00:01
Worklog Time Spent: 10m 
  Work Description: lazylynx commented on pull request #9804: 
[WIP][BEAM-7847] generate SDK docs with Python3
URL: https://github.com/apache/beam/pull/9804
 
 
   **Please** add a meaningful description for your change here
   
   
   
   Thank you for your contribution! Follow this checklist to help us 
incorporate your contribution quickly and easily:
   
- [ ] [**Choose 
reviewer(s)**](https://beam.apache.org/contribute/#make-your-change) and 
mention them in a comment (`R: @username`).
- [ ] Format the pull request title like `[BEAM-XXX] Fixes bug in 
ApproximateQuantiles`, where you replace `BEAM-XXX` with the appropriate JIRA 
issue, if applicable. This will automatically link the pull request to the 
issue.
- [ ] If this contribution is large, please file an Apache [Individual 
Contributor License Agreement](https://www.apache.org/licenses/icla.pdf).
   

[jira] [Work logged] (BEAM-8365) Add project push-down capability to IO APIs

2019-10-15 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/BEAM-8365?focusedWorklogId=328860=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-328860
 ]

ASF GitHub Bot logged work on BEAM-8365:


Author: ASF GitHub Bot
Created on: 16/Oct/19 00:00
Start Date: 16/Oct/19 00:00
Worklog Time Spent: 10m 
  Work Description: 11moon11 commented on issue #9764: [BEAM-8365] Project 
push-down for TestTableProvider
URL: https://github.com/apache/beam/pull/9764#issuecomment-542452951
 
 
   Run JavaPortabilityApi PreCommit
 



Issue Time Tracking
---

Worklog Id: (was: 328860)
Time Spent: 7h  (was: 6h 50m)

> Add project push-down capability to IO APIs
> ---
>
> Key: BEAM-8365
> URL: https://issues.apache.org/jira/browse/BEAM-8365
> Project: Beam
>  Issue Type: New Feature
>  Components: dsl-sql
>Reporter: Kirill Kozlov
>Assignee: Kirill Kozlov
>Priority: Major
>  Time Spent: 7h
>  Remaining Estimate: 0h
>
> * InMemoryTable should implement the following method:
> {code:java}
> public PCollection<Row> buildIOReader(
> PBegin begin, BeamSqlTableFilter filters, List<String> fieldNames);{code}
> Which should return a `PCollection<Row>` with fields specified in the `fieldNames` 
> list.
>  * Create a rule to push fields used by a Calc (in projects and in a 
> condition) down into TestTable IO.
>  * Update that same Calc (from the previous step) to have proper input and 
> output schemas, removing unused fields.





[jira] [Work logged] (BEAM-8335) Add streaming support to Interactive Beam

2019-10-15 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/BEAM-8335?focusedWorklogId=328854=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-328854
 ]

ASF GitHub Bot logged work on BEAM-8335:


Author: ASF GitHub Bot
Created on: 15/Oct/19 23:59
Start Date: 15/Oct/19 23:59
Worklog Time Spent: 10m 
  Work Description: rohdesamuel commented on pull request #9720: 
[BEAM-8335] Add initial modules for interactive streaming support
URL: https://github.com/apache/beam/pull/9720#discussion_r335227669
 
 

 ##
 File path: 
sdks/python/apache_beam/runners/interactive/caching/streaming_cache.py
 ##
 @@ -0,0 +1,123 @@
+#
+# Licensed to the Apache Software Foundation (ASF) under one or more
+# contributor license agreements.  See the NOTICE file distributed with
+# this work for additional information regarding copyright ownership.
+# The ASF licenses this file to You under the Apache License, Version 2.0
+# (the "License"); you may not use this file except in compliance with
+# the License.  You may obtain a copy of the License at
+#
+#http://www.apache.org/licenses/LICENSE-2.0
+#
+# Unless required by applicable law or agreed to in writing, software
+# distributed under the License is distributed on an "AS IS" BASIS,
+# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+# See the License for the specific language governing permissions and
+# limitations under the License.
+#
+
+from apache_beam.portability.api.beam_interactive_api_pb2 import 
InteractiveStreamRecord
+from apache_beam.portability.api.beam_runner_api_pb2 import TestStreamPayload
+from apache_beam.utils import timestamp
+
+from google.protobuf import timestamp_pb2
+
+
+def to_timestamp(timestamp_secs):
+  """Converts seconds since epoch to an apache_beam.util.Timestamp.
+  """
+  return timestamp.Timestamp.of(timestamp_secs)
+
+def from_timestamp_proto(timestamp_proto):
+  return timestamp.Timestamp(seconds=timestamp_proto.seconds,
+ micros=timestamp_proto.nanos // 1000)
+
+def to_timestamp_usecs(ts):
 
 Review comment:
   I moved the methods to be on the timestamp.Timestamp object.
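
Wherever these helpers end up living, the unit arithmetic is the same: protobuf Timestamps carry (seconds, nanos) while Beam timestamps carry microseconds. A rough sketch of the conversions (illustrative names, not the final timestamp.Timestamp methods); note the integer division when narrowing nanos to micros:

```python
def proto_to_micros(seconds, nanos):
    """Convert a protobuf-style (seconds, nanos) pair to microseconds since epoch."""
    return seconds * 10**6 + nanos // 1000

def micros_to_proto(micros):
    """Convert microseconds since epoch back to a (seconds, nanos) pair."""
    seconds, rem = divmod(micros, 10**6)
    return seconds, rem * 1000

usecs = proto_to_micros(12, 345000)   # 12 s + 345000 ns -> 12000345 usecs
roundtrip = micros_to_proto(usecs)    # back to (12, 345000)
```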
 



Issue Time Tracking
---

Worklog Id: (was: 328854)
Time Spent: 3h  (was: 2h 50m)

> Add streaming support to Interactive Beam
> -
>
> Key: BEAM-8335
> URL: https://issues.apache.org/jira/browse/BEAM-8335
> Project: Beam
>  Issue Type: Improvement
>  Components: runner-py-interactive
>Reporter: Sam Rohde
>Assignee: Sam Rohde
>Priority: Major
>  Time Spent: 3h
>  Remaining Estimate: 0h
>
> This issue tracks the work items to introduce streaming support to the 
> Interactive Beam experience. This will allow users to:
>  * Write and run a streaming job in IPython
>  * Automatically cache records from unbounded sources
>  * Add a replay experience that replays all cached records to simulate the 
> original pipeline execution
>  * Add controls to play/pause/stop/step individual elements from the cached 
> records
>  * Add ability to inspect/visualize unbounded PCollections





[jira] [Updated] (BEAM-8407) [SQL] Some Hive tests throw NullPointerException, but get marked as passing (Direct Runner)

2019-10-15 Thread Kirill Kozlov (Jira)


 [ 
https://issues.apache.org/jira/browse/BEAM-8407?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Kirill Kozlov updated BEAM-8407:

Description: 
One of the tests with this issue:
{code:java}
./gradlew -p sdks/java/extensions/sql/ test --tests 
org.apache.beam.sdk.extensions.sql.meta.provider.hcatalog.BeamSqlHiveSchemaTest.testSelectFromImplicitDefaultDb
 --info
{code}
Stack trace:
{code:java}
[direct-runner-worker] ERROR hive.log - Got exception: java.lang.NullPointerException null
[direct-runner-worker] ERROR hive.log - Got exception: java.lang.NullPointerException null
java.lang.NullPointerException
at 
org.apache.hadoop.hive.ql.security.authorization.plugin.AuthorizationMetaStoreFilterHook.getFilteredObjects(AuthorizationMetaStoreFilterHook.java:77)
 ~[hive-exec-2.1.0.jar:2.1.0] 
at 
org.apache.hadoop.hive.ql.security.authorization.plugin.AuthorizationMetaStoreFilterHook.filterDatabases(AuthorizationMetaStoreFilterHook.java:54)
 ~[hive-exec-2.1.0.jar:2.1.0] 
at 
org.apache.hadoop.hive.metastore.HiveMetaStoreClient.getDatabases(HiveMetaStoreClient.java:1126)
 [hive-metastore-2.1.0.jar:2.1.0] 
at 
org.apache.hive.hcatalog.common.HiveClientCache$CacheableHiveMetaStoreClient.isOpen(HiveClientCache.java:369)
 [hive-hcatalog-core-2.1.0.jar:2.1.0] 
at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method) ~[?:1.8.0_222] 
at 
sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62) 
~[?:1.8.0_222] 
at 
sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
 ~[?:1.8.0_222] 
at java.lang.reflect.Method.invoke(Method.java:498) ~[?:1.8.0_222] 
at 
org.apache.hadoop.hive.metastore.RetryingMetaStoreClient.invoke(RetryingMetaStoreClient.java:150)
 [hive-metastore-2.1.0.jar:2.1.0] 
at com.sun.proxy.$Proxy56.isOpen(Unknown Source) [?:?] 
at 
org.apache.hive.hcatalog.common.HiveClientCache.get(HiveClientCache.java:205) 
[hive-hcatalog-core-2.1.0.jar:2.1.0] 
at 
org.apache.hive.hcatalog.common.HCatUtil.getHiveMetastoreClient(HCatUtil.java:558)
 [hive-hcatalog-core-2.1.0.jar:2.1.0] 
at 
org.apache.beam.sdk.io.hcatalog.HCatalogUtils.createMetaStoreClient(HCatalogUtils.java:42)
 [beam-sdks-java-io-hcatalog-2.17.0-SNAPSHOT.jar:?] 
at 
org.apache.beam.sdk.io.hcatalog.HCatalogIO$BoundedHCatalogSource.getEstimatedSizeBytes(HCatalogIO.java:323)
 [beam-sdks-java-io-hcatalog-2.17.0-SNAPSHOT.jar:?] 
at 
org.apache.beam.runners.direct.BoundedReadEvaluatorFactory$BoundedReadEvaluator.startDynamicSplitThread(BoundedReadEvaluatorFactory.java:172)
 [beam-runners-direct-java-2.17.0-SNAPSHOT-unshaded.jar:?] 
at 
org.apache.beam.runners.direct.BoundedReadEvaluatorFactory$BoundedReadEvaluator.processElement(BoundedReadEvaluatorFactory.java:148)
 [beam-runners-direct-java-2.17.0-SNAPSHOT-unshaded.jar:?] 
at 
org.apache.beam.runners.direct.DirectTransformExecutor.processElements(DirectTransformExecutor.java:160)
 [beam-runners-direct-java-2.17.0-SNAPSHOT-unshaded.jar:?] 
at 
org.apache.beam.runners.direct.DirectTransformExecutor.run(DirectTransformExecutor.java:124)
 [beam-runners-direct-java-2.17.0-SNAPSHOT-unshaded.jar:?] 
at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511) 
[?:1.8.0_222] 
at java.util.concurrent.FutureTask.run(FutureTask.java:266) [?:1.8.0_222] 
at 
java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149) 
[?:1.8.0_222] 
at 
java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624) 
[?:1.8.0_222] 
at java.lang.Thread.run(Thread.java:748) [?:1.8.0_222]

... more NullPointerExceptions{code}

  was:
One of the tests with this issue:
{code:java}
./gradlew -p sdks/java/extensions/sql/ test --tests 
org.apache.beam.sdk.extensions.sql.meta.provider.hcatalog.BeamSqlHiveSchemaTest.testSelectFromImplicitDefaultDb
 --info
{code}
Stack trace:
{code:java}
[direct-runner-worker] ERROR hive.log - Got exception: 
java.lang.NullPointerException null[direct-runner-worker] ERROR hive.log - Got 
exception: java.lang.NullPointerException nulljava.lang.NullPointerException at 
org.apache.hadoop.hive.ql.security.authorization.plugin.AuthorizationMetaStoreFilterHook.getFilteredObjects(AuthorizationMetaStoreFilterHook.java:77)
 ~[hive-exec-2.1.0.jar:2.1.0] at 
org.apache.hadoop.hive.ql.security.authorization.plugin.AuthorizationMetaStoreFilterHook.filterDatabases(AuthorizationMetaStoreFilterHook.java:54)
 ~[hive-exec-2.1.0.jar:2.1.0] at 
org.apache.hadoop.hive.metastore.HiveMetaStoreClient.getDatabases(HiveMetaStoreClient.java:1126)
 [hive-metastore-2.1.0.jar:2.1.0] at 
org.apache.hive.hcatalog.common.HiveClientCache$CacheableHiveMetaStoreClient.isOpen(HiveClientCache.java:369)
 [hive-hcatalog-core-2.1.0.jar:2.1.0] at 
sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method) ~[?:1.8.0_222] at 
sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62) 
~[?:1.8.0_222] at 

[jira] [Created] (BEAM-8407) [SQL] Some Hive tests throw NullPointerException, but get marked as passing (Direct Runner)

2019-10-15 Thread Kirill Kozlov (Jira)
Kirill Kozlov created BEAM-8407:
---

 Summary: [SQL] Some Hive tests throw NullPointerException, but get 
marked as passing (Direct Runner)
 Key: BEAM-8407
 URL: https://issues.apache.org/jira/browse/BEAM-8407
 Project: Beam
  Issue Type: Bug
  Components: dsl-sql
Affects Versions: 2.15.0
Reporter: Kirill Kozlov


One of the tests with this issue:
{code:java}
./gradlew -p sdks/java/extensions/sql/ test --tests 
org.apache.beam.sdk.extensions.sql.meta.provider.hcatalog.BeamSqlHiveSchemaTest.testSelectFromImplicitDefaultDb
 --info
{code}
Stack trace:
{code:java}
[direct-runner-worker] ERROR hive.log - Got exception: java.lang.NullPointerException null
[direct-runner-worker] ERROR hive.log - Got exception: java.lang.NullPointerException null
java.lang.NullPointerException at 
org.apache.hadoop.hive.ql.security.authorization.plugin.AuthorizationMetaStoreFilterHook.getFilteredObjects(AuthorizationMetaStoreFilterHook.java:77)
 ~[hive-exec-2.1.0.jar:2.1.0] at 
org.apache.hadoop.hive.ql.security.authorization.plugin.AuthorizationMetaStoreFilterHook.filterDatabases(AuthorizationMetaStoreFilterHook.java:54)
 ~[hive-exec-2.1.0.jar:2.1.0] at 
org.apache.hadoop.hive.metastore.HiveMetaStoreClient.getDatabases(HiveMetaStoreClient.java:1126)
 [hive-metastore-2.1.0.jar:2.1.0] at 
org.apache.hive.hcatalog.common.HiveClientCache$CacheableHiveMetaStoreClient.isOpen(HiveClientCache.java:369)
 [hive-hcatalog-core-2.1.0.jar:2.1.0] at 
sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method) ~[?:1.8.0_222] at 
sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62) 
~[?:1.8.0_222] at 
sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
 ~[?:1.8.0_222] at java.lang.reflect.Method.invoke(Method.java:498) 
~[?:1.8.0_222] at 
org.apache.hadoop.hive.metastore.RetryingMetaStoreClient.invoke(RetryingMetaStoreClient.java:150)
 [hive-metastore-2.1.0.jar:2.1.0] at com.sun.proxy.$Proxy56.isOpen(Unknown 
Source) [?:?] at 
org.apache.hive.hcatalog.common.HiveClientCache.get(HiveClientCache.java:205) 
[hive-hcatalog-core-2.1.0.jar:2.1.0] at 
org.apache.hive.hcatalog.common.HCatUtil.getHiveMetastoreClient(HCatUtil.java:558)
 [hive-hcatalog-core-2.1.0.jar:2.1.0] at 
org.apache.beam.sdk.io.hcatalog.HCatalogUtils.createMetaStoreClient(HCatalogUtils.java:42)
 [beam-sdks-java-io-hcatalog-2.17.0-SNAPSHOT.jar:?] at 
org.apache.beam.sdk.io.hcatalog.HCatalogIO$BoundedHCatalogSource.getEstimatedSizeBytes(HCatalogIO.java:323)
 [beam-sdks-java-io-hcatalog-2.17.0-SNAPSHOT.jar:?] at 
org.apache.beam.runners.direct.BoundedReadEvaluatorFactory$BoundedReadEvaluator.startDynamicSplitThread(BoundedReadEvaluatorFactory.java:172)
 [beam-runners-direct-java-2.17.0-SNAPSHOT-unshaded.jar:?] at 
org.apache.beam.runners.direct.BoundedReadEvaluatorFactory$BoundedReadEvaluator.processElement(BoundedReadEvaluatorFactory.java:148)
 [beam-runners-direct-java-2.17.0-SNAPSHOT-unshaded.jar:?] at 
org.apache.beam.runners.direct.DirectTransformExecutor.processElements(DirectTransformExecutor.java:160)
 [beam-runners-direct-java-2.17.0-SNAPSHOT-unshaded.jar:?] at 
org.apache.beam.runners.direct.DirectTransformExecutor.run(DirectTransformExecutor.java:124)
 [beam-runners-direct-java-2.17.0-SNAPSHOT-unshaded.jar:?] at 
java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511) 
[?:1.8.0_222] at java.util.concurrent.FutureTask.run(FutureTask.java:266) 
[?:1.8.0_222] at 
java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149) 
[?:1.8.0_222] at 
java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624) 
[?:1.8.0_222] at java.lang.Thread.run(Thread.java:748) [?:1.8.0_222]

... more NullPointerExceptions{code}





[jira] [Work logged] (BEAM-8365) Add project push-down capability to IO APIs

2019-10-15 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/BEAM-8365?focusedWorklogId=328846=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-328846
 ]

ASF GitHub Bot logged work on BEAM-8365:


Author: ASF GitHub Bot
Created on: 15/Oct/19 23:32
Start Date: 15/Oct/19 23:32
Worklog Time Spent: 10m 
  Work Description: 11moon11 commented on pull request #9764: [BEAM-8365] 
Project push-down for TestTableProvider
URL: https://github.com/apache/beam/pull/9764#discussion_r335221391
 
 

 ##
 File path: 
sdks/java/extensions/sql/src/main/java/org/apache/beam/sdk/extensions/sql/impl/rule/BeamIOPushDownRule.java
 ##
 @@ -0,0 +1,236 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ * http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+package org.apache.beam.sdk.extensions.sql.impl.rule;
+
+import java.util.ArrayDeque;
+import java.util.ArrayList;
+import java.util.HashSet;
+import java.util.List;
+import java.util.Queue;
+import java.util.Set;
+import java.util.stream.Collectors;
+import org.apache.beam.sdk.extensions.sql.impl.rel.BeamIOSourceRel;
+import org.apache.beam.sdk.extensions.sql.impl.utils.CalciteUtils;
+import org.apache.beam.sdk.extensions.sql.meta.BeamSqlTable;
+import org.apache.beam.sdk.schemas.FieldAccessDescriptor;
+import org.apache.beam.sdk.schemas.FieldAccessDescriptor.FieldDescriptor;
+import org.apache.beam.sdk.schemas.Schema;
+import org.apache.beam.sdk.schemas.utils.SelectHelpers;
+import 
org.apache.beam.vendor.calcite.v1_20_0.com.google.common.collect.ImmutableList;
+import 
org.apache.beam.vendor.calcite.v1_20_0.org.apache.calcite.plan.RelOptRule;
+import 
org.apache.beam.vendor.calcite.v1_20_0.org.apache.calcite.plan.RelOptRuleCall;
+import org.apache.beam.vendor.calcite.v1_20_0.org.apache.calcite.rel.RelNode;
+import org.apache.beam.vendor.calcite.v1_20_0.org.apache.calcite.rel.core.Calc;
+import 
org.apache.beam.vendor.calcite.v1_20_0.org.apache.calcite.rel.core.RelFactories;
+import 
org.apache.beam.vendor.calcite.v1_20_0.org.apache.calcite.rel.type.RelDataType;
+import 
org.apache.beam.vendor.calcite.v1_20_0.org.apache.calcite.rel.type.RelDataTypeField;
+import 
org.apache.beam.vendor.calcite.v1_20_0.org.apache.calcite.rel.type.RelRecordType;
+import org.apache.beam.vendor.calcite.v1_20_0.org.apache.calcite.rex.RexCall;
+import 
org.apache.beam.vendor.calcite.v1_20_0.org.apache.calcite.rex.RexInputRef;
+import 
org.apache.beam.vendor.calcite.v1_20_0.org.apache.calcite.rex.RexLocalRef;
+import org.apache.beam.vendor.calcite.v1_20_0.org.apache.calcite.rex.RexNode;
+import 
org.apache.beam.vendor.calcite.v1_20_0.org.apache.calcite.rex.RexProgram;
+import 
org.apache.beam.vendor.calcite.v1_20_0.org.apache.calcite.tools.RelBuilder;
+import 
org.apache.beam.vendor.calcite.v1_20_0.org.apache.calcite.tools.RelBuilderFactory;
+import org.apache.beam.vendor.calcite.v1_20_0.org.apache.calcite.util.Pair;
+import 
org.apache.beam.vendor.guava.v26_0_jre.com.google.common.annotations.VisibleForTesting;
+
+public class BeamIOPushDownRule extends RelOptRule {
+  // ~ Static fields/initializers -
+
+  public static final BeamIOPushDownRule INSTANCE =
+  new BeamIOPushDownRule(RelFactories.LOGICAL_BUILDER);
+
+  // ~ Constructors ---
+
+  public BeamIOPushDownRule(RelBuilderFactory relBuilderFactory) {
+super(operand(Calc.class, operand(BeamIOSourceRel.class, any())), 
relBuilderFactory, null);
+  }
+
+  // ~ Methods 
+
+  @Override
+  public void onMatch(RelOptRuleCall call) {
+final Calc calc = call.rel(0);
+final BeamIOSourceRel ioSourceRel = call.rel(1);
+final BeamSqlTable beamSqlTable = ioSourceRel.getBeamSqlTable();
+final RexProgram program = calc.getProgram();
+final Pair, ImmutableList> projectFilter = 
program.split();
+final RelDataType calcInputRowType = program.getInputRowType();
+RelBuilder relBuilder = call.builder();
+
+if (!beamSqlTable.supportsProjects()) {
+  return;
+}
+
+// Nested rows are not supported at the moment
+for 

[jira] [Work logged] (BEAM-8382) Add polling interval to KinesisIO.Read

2019-10-15 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/BEAM-8382?focusedWorklogId=328844=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-328844
 ]

ASF GitHub Bot logged work on BEAM-8382:


Author: ASF GitHub Bot
Created on: 15/Oct/19 23:24
Start Date: 15/Oct/19 23:24
Worklog Time Spent: 10m 
  Work Description: jfarr commented on issue #9765: [BEAM-8382] Add polling 
interval to KinesisIO.Read
URL: https://github.com/apache/beam/pull/9765#issuecomment-542445144
 
 
   @aromanenko-dev If we're the only consumer on the stream we can call 
getRecords every 200ms without getting throttled (in theory). If we introduce a 
1 sec delay on the first KMSThrottlingException then we have an unnecessary 
800ms of latency. Let's say we reduce the initial delay to 100ms. Assuming a 
successful call takes about 10ms, we'll hit an exception after 50ms and for the 
next 950ms thereafter. On the 4th retry we'll succeed but we'll be at 800ms. 
Reduce it to 10ms and on the 7th retry we'll succeed but we'll be at 640ms. In 
any case we've introduced another knob (initial backoff delay time).
   
   So that's assuming delay time starts at zero and only increases when we get 
a KMSThrottlingException. Since that's 100% guaranteed to happen we could try 
with a non-zero initial delay instead. Maybe if we start at 200ms we won't get 
throttled at all and we won't overshoot. Another knob.
   
   OK so now we've overshot. Maybe we can ease back on the delay. We can 
speculatively try a shorter delay time and see if we still get throttled. How 
often to try? How much to ease back? More knobs.
   
   You can fiddle with these knobs and come up with something that works well 
when you're the only consumer, but as soon as you have 2 or more consumers you 
have to throw that out the window. If you're pulling back more or less records 
that may take more or less time and throw off your timing. You can't really 
hardcode anything because what works well in one scenario may not work well in 
another.
   
   So those are my initial thoughts. I'm totally open to the idea that I'm just 
overthinking it. I think the ultimate test would be to try it out and see what 
works. But if you want to minimize latency (and we do) I think the 1 second 
rolling window introduces a feedback delay that makes this a little more 
complicated than first glance. What do you think?
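
The backoff dynamics being debated can be sketched as a simple policy: back off exponentially after each throttled getRecords call and reset after a success (the function and constants below are illustrative, not KinesisIO's actual retry code):

```python
def next_delay_ms(current_ms, throttled, initial_ms=100, max_ms=1000):
    """Return the next polling delay: double on throttling, reset on success."""
    if not throttled:
        return 0  # poll again immediately after a successful getRecords
    if current_ms == 0:
        return initial_ms  # first throttling error: start at the initial delay
    return min(current_ms * 2, max_ms)  # otherwise double, capped at max_ms

# Simulated sequence: four throttled polls, one success, then throttled again.
delays = []
d = 0
for throttled in [True, True, True, True, False, True]:
    d = next_delay_ms(d, throttled)
    delays.append(d)
```

This also makes the "knobs" in the discussion concrete: `initial_ms` and `max_ms` are exactly the parameters whose tuning depends on how many consumers share the stream.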
 



Issue Time Tracking
---

Worklog Id: (was: 328844)
Time Spent: 1h  (was: 50m)

> Add polling interval to KinesisIO.Read
> --
>
> Key: BEAM-8382
> URL: https://issues.apache.org/jira/browse/BEAM-8382
> Project: Beam
>  Issue Type: Improvement
>  Components: io-java-kinesis
>Affects Versions: 2.13.0, 2.14.0, 2.15.0
>Reporter: Jonothan Farr
>Assignee: Jonothan Farr
>Priority: Major
>  Time Spent: 1h
>  Remaining Estimate: 0h
>
> With the current implementation we are observing Kinesis throttling due to 
> ReadProvisionedThroughputExceeded on the order of hundreds of times per 
> second, regardless of the actual Kinesis throughput. This is because the 
> ShardReadersPool readLoop() method is polling getRecords() as fast as 
> possible.
> From the KDS documentation:
> {quote}Each shard can support up to five read transactions per second.
> {quote}
> and
> {quote}For best results, sleep for at least 1 second (1,000 milliseconds) 
> between calls to getRecords to avoid exceeding the limit on getRecords 
> frequency.
> {quote}
> [https://docs.aws.amazon.com/streams/latest/dev/service-sizes-and-limits.html]
> [https://docs.aws.amazon.com/streams/latest/dev/developing-consumers-with-sdk.html]





[jira] [Work logged] (BEAM-8365) Add project push-down capability to IO APIs

2019-10-15 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/BEAM-8365?focusedWorklogId=328842=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-328842
 ]

ASF GitHub Bot logged work on BEAM-8365:


Author: ASF GitHub Bot
Created on: 15/Oct/19 23:19
Start Date: 15/Oct/19 23:19
Worklog Time Spent: 10m 
  Work Description: 11moon11 commented on pull request #9764: [BEAM-8365] 
Project push-down for TestTableProvider
URL: https://github.com/apache/beam/pull/9764#discussion_r335218379
 
 

 ##
 File path: 
sdks/java/extensions/sql/src/main/java/org/apache/beam/sdk/extensions/sql/impl/rel/BeamIOSourceRel.java
 ##
 @@ -109,10 +137,11 @@ public RelOptCost computeSelfCost(RelOptPlanner planner, 
RelMetadataQuery mq) {
   @Override
   public BeamCostModel beamComputeSelfCost(RelOptPlanner planner, 
RelMetadataQuery mq) {
 NodeStats estimates = BeamSqlRelUtils.getNodeStats(this, mq);
-return BeamCostModel.FACTORY.makeCost(estimates.getRowCount(), 
estimates.getRate());
+return BeamCostModel.FACTORY.makeCost(
+estimates.getRowCount() * getRowType().getFieldCount(), 
estimates.getRate());
   }
 
 Review comment:
   Updated to multiply the total cost by `getRowType().getFieldCount()` instead 
of just `getRowCount`.
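
The effect of this change can be shown with a toy cost function (illustrative only, not Beam's BeamCostModel): once cost scales with field count, a pushed-down read that projects fewer fields is estimated cheaper than a full scan, so the planner prefers it:

```python
def io_source_cost(row_count, field_count):
    """Toy cost model: cost grows with both rows read and fields materialized."""
    return row_count * field_count

full_scan = io_source_cost(1000, 10)   # hypothetical table with 10 fields
pushed_down = io_source_cost(1000, 2)  # only 2 projected fields survive
```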
 



Issue Time Tracking
---

Worklog Id: (was: 328842)
Time Spent: 6.5h  (was: 6h 20m)

> Add project push-down capability to IO APIs
> ---
>
> Key: BEAM-8365
> URL: https://issues.apache.org/jira/browse/BEAM-8365
> Project: Beam
>  Issue Type: New Feature
>  Components: dsl-sql
>Reporter: Kirill Kozlov
>Assignee: Kirill Kozlov
>Priority: Major
>  Time Spent: 6.5h
>  Remaining Estimate: 0h
>
> * InMemoryTable should implement the following method:
> {code:java}
> public PCollection<Row> buildIOReader(
> PBegin begin, BeamSqlTableFilter filters, List<String> fieldNames);{code}
> which should return a `PCollection<Row>` containing only the fields specified in the
> `fieldNames` list.
>  * Create a rule to push fields used by a Calc (in projects and in a
> condition) down into TestTable IO.
>  * Update that same Calc (from the previous step) to have proper input and
> output schemas, removing unused fields.
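
As a rough standalone sketch of what such a projecting reader does, with plain Java collections standing in for Beam's `PCollection`/`Row` (the names here are illustrative, not the Beam API):

```java
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.stream.Collectors;

// Hypothetical sketch: rows modeled as field-name -> value maps; keep only
// the fields requested by the push-down, in the requested order.
public class ProjectionSketch {
    static List<Map<String, Object>> project(
            List<Map<String, Object>> rows, List<String> fieldNames) {
        return rows.stream()
                .map(row -> {
                    Map<String, Object> projected = new LinkedHashMap<>();
                    for (String field : fieldNames) {
                        projected.put(field, row.get(field)); // drop unused fields
                    }
                    return projected;
                })
                .collect(Collectors.toList());
    }

    public static void main(String[] args) {
        Map<String, Object> row = Map.of("id", 1, "name", "a", "unused", "x");
        List<Map<String, Object>> out =
                project(List.of(row), List.of("id", "name"));
        System.out.println(out.get(0).keySet()); // prints "[id, name]"
    }
}
```

The real implementation operates on `Row` values with a projected `Schema`, but the contract is the same: the output carries exactly the pushed-down fields.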





[jira] [Work logged] (BEAM-8365) Add project push-down capability to IO APIs

2019-10-15 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/BEAM-8365?focusedWorklogId=328836&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-328836
 ]

ASF GitHub Bot logged work on BEAM-8365:


Author: ASF GitHub Bot
Created on: 15/Oct/19 23:15
Start Date: 15/Oct/19 23:15
Worklog Time Spent: 10m 
  Work Description: 11moon11 commented on pull request #9764: [BEAM-8365] 
Project push-down for TestTableProvider
URL: https://github.com/apache/beam/pull/9764#discussion_r335217549
 
 

 ##
 File path: 
sdks/java/extensions/sql/src/main/java/org/apache/beam/sdk/extensions/sql/impl/rel/BeamIOSourceRel.java
 ##
 @@ -94,7 +116,19 @@ public NodeStats estimateNodeStats(RelMetadataQuery mq) {
   "Should not have received input for %s: %s",
   BeamIOSourceRel.class.getSimpleName(),
   input);
-  return beamTable.buildIOReader(input.getPipeline().begin());
+
+  PBegin begin = input.getPipeline().begin();
 
 Review comment:
   Changed local variables to be `final`.
 



Issue Time Tracking
---

Worklog Id: (was: 328836)
Time Spent: 6h 20m  (was: 6h 10m)

> Add project push-down capability to IO APIs
> ---
>
> Key: BEAM-8365
> URL: https://issues.apache.org/jira/browse/BEAM-8365
> Project: Beam
>  Issue Type: New Feature
>  Components: dsl-sql
>Reporter: Kirill Kozlov
>Assignee: Kirill Kozlov
>Priority: Major
>  Time Spent: 6h 20m
>  Remaining Estimate: 0h
>
> * InMemoryTable should implement the following method:
> {code:java}
> public PCollection<Row> buildIOReader(
> PBegin begin, BeamSqlTableFilter filters, List<String> fieldNames);{code}
> which should return a `PCollection<Row>` containing only the fields specified in the
> `fieldNames` list.
>  * Create a rule to push fields used by a Calc (in projects and in a
> condition) down into TestTable IO.
>  * Update that same Calc (from the previous step) to have proper input and
> output schemas, removing unused fields.





[jira] [Work logged] (BEAM-8365) Add project push-down capability to IO APIs

2019-10-15 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/BEAM-8365?focusedWorklogId=328831&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-328831
 ]

ASF GitHub Bot logged work on BEAM-8365:


Author: ASF GitHub Bot
Created on: 15/Oct/19 23:15
Start Date: 15/Oct/19 23:15
Worklog Time Spent: 10m 
  Work Description: 11moon11 commented on pull request #9764: [BEAM-8365] 
Project push-down for TestTableProvider
URL: https://github.com/apache/beam/pull/9764#discussion_r335217375
 
 

 ##
 File path: 
sdks/java/extensions/sql/src/main/java/org/apache/beam/sdk/extensions/sql/impl/rule/BeamIOPushDownRule.java
 ##
 @@ -0,0 +1,236 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ * http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+package org.apache.beam.sdk.extensions.sql.impl.rule;
+
+import java.util.ArrayDeque;
+import java.util.ArrayList;
+import java.util.HashSet;
+import java.util.List;
+import java.util.Queue;
+import java.util.Set;
+import java.util.stream.Collectors;
+import org.apache.beam.sdk.extensions.sql.impl.rel.BeamIOSourceRel;
+import org.apache.beam.sdk.extensions.sql.impl.utils.CalciteUtils;
+import org.apache.beam.sdk.extensions.sql.meta.BeamSqlTable;
+import org.apache.beam.sdk.schemas.FieldAccessDescriptor;
+import org.apache.beam.sdk.schemas.FieldAccessDescriptor.FieldDescriptor;
+import org.apache.beam.sdk.schemas.Schema;
+import org.apache.beam.sdk.schemas.utils.SelectHelpers;
+import 
org.apache.beam.vendor.calcite.v1_20_0.com.google.common.collect.ImmutableList;
+import 
org.apache.beam.vendor.calcite.v1_20_0.org.apache.calcite.plan.RelOptRule;
+import 
org.apache.beam.vendor.calcite.v1_20_0.org.apache.calcite.plan.RelOptRuleCall;
+import org.apache.beam.vendor.calcite.v1_20_0.org.apache.calcite.rel.RelNode;
+import org.apache.beam.vendor.calcite.v1_20_0.org.apache.calcite.rel.core.Calc;
+import 
org.apache.beam.vendor.calcite.v1_20_0.org.apache.calcite.rel.core.RelFactories;
+import 
org.apache.beam.vendor.calcite.v1_20_0.org.apache.calcite.rel.type.RelDataType;
+import 
org.apache.beam.vendor.calcite.v1_20_0.org.apache.calcite.rel.type.RelDataTypeField;
+import 
org.apache.beam.vendor.calcite.v1_20_0.org.apache.calcite.rel.type.RelRecordType;
+import org.apache.beam.vendor.calcite.v1_20_0.org.apache.calcite.rex.RexCall;
+import 
org.apache.beam.vendor.calcite.v1_20_0.org.apache.calcite.rex.RexInputRef;
+import 
org.apache.beam.vendor.calcite.v1_20_0.org.apache.calcite.rex.RexLocalRef;
+import org.apache.beam.vendor.calcite.v1_20_0.org.apache.calcite.rex.RexNode;
+import 
org.apache.beam.vendor.calcite.v1_20_0.org.apache.calcite.rex.RexProgram;
+import 
org.apache.beam.vendor.calcite.v1_20_0.org.apache.calcite.tools.RelBuilder;
+import 
org.apache.beam.vendor.calcite.v1_20_0.org.apache.calcite.tools.RelBuilderFactory;
+import org.apache.beam.vendor.calcite.v1_20_0.org.apache.calcite.util.Pair;
+import 
org.apache.beam.vendor.guava.v26_0_jre.com.google.common.annotations.VisibleForTesting;
+
+public class BeamIOPushDownRule extends RelOptRule {
+  // ~ Static fields/initializers -
+
+  public static final BeamIOPushDownRule INSTANCE =
+  new BeamIOPushDownRule(RelFactories.LOGICAL_BUILDER);
+
+  // ~ Constructors ---
+
+  public BeamIOPushDownRule(RelBuilderFactory relBuilderFactory) {
+super(operand(Calc.class, operand(BeamIOSourceRel.class, any())), 
relBuilderFactory, null);
+  }
+
+  // ~ Methods 
+
+  @Override
+  public void onMatch(RelOptRuleCall call) {
+final Calc calc = call.rel(0);
+final BeamIOSourceRel ioSourceRel = call.rel(1);
+final BeamSqlTable beamSqlTable = ioSourceRel.getBeamSqlTable();
+final RexProgram program = calc.getProgram();
+final Pair<ImmutableList<RexNode>, ImmutableList<RexNode>> projectFilter = 
program.split();
+final RelDataType calcInputRowType = program.getInputRowType();
+RelBuilder relBuilder = call.builder();
+
+if (!beamSqlTable.supportsProjects()) {
+  return;
+}
+
+// Nested rows are not supported at the moment
+for 

[jira] [Work logged] (BEAM-8365) Add project push-down capability to IO APIs

2019-10-15 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/BEAM-8365?focusedWorklogId=328833&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-328833
 ]

ASF GitHub Bot logged work on BEAM-8365:


Author: ASF GitHub Bot
Created on: 15/Oct/19 23:15
Start Date: 15/Oct/19 23:15
Worklog Time Spent: 10m 
  Work Description: 11moon11 commented on pull request #9764: [BEAM-8365] 
Project push-down for TestTableProvider
URL: https://github.com/apache/beam/pull/9764#discussion_r335217402
 
 

 ##
 File path: 
sdks/java/extensions/sql/src/main/java/org/apache/beam/sdk/extensions/sql/impl/rule/BeamIOPushDownRule.java
 ##
 @@ -0,0 +1,236 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ * http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+package org.apache.beam.sdk.extensions.sql.impl.rule;
+
+import java.util.ArrayDeque;
+import java.util.ArrayList;
+import java.util.HashSet;
+import java.util.List;
+import java.util.Queue;
+import java.util.Set;
+import java.util.stream.Collectors;
+import org.apache.beam.sdk.extensions.sql.impl.rel.BeamIOSourceRel;
+import org.apache.beam.sdk.extensions.sql.impl.utils.CalciteUtils;
+import org.apache.beam.sdk.extensions.sql.meta.BeamSqlTable;
+import org.apache.beam.sdk.schemas.FieldAccessDescriptor;
+import org.apache.beam.sdk.schemas.FieldAccessDescriptor.FieldDescriptor;
+import org.apache.beam.sdk.schemas.Schema;
+import org.apache.beam.sdk.schemas.utils.SelectHelpers;
+import 
org.apache.beam.vendor.calcite.v1_20_0.com.google.common.collect.ImmutableList;
+import 
org.apache.beam.vendor.calcite.v1_20_0.org.apache.calcite.plan.RelOptRule;
+import 
org.apache.beam.vendor.calcite.v1_20_0.org.apache.calcite.plan.RelOptRuleCall;
+import org.apache.beam.vendor.calcite.v1_20_0.org.apache.calcite.rel.RelNode;
+import org.apache.beam.vendor.calcite.v1_20_0.org.apache.calcite.rel.core.Calc;
+import 
org.apache.beam.vendor.calcite.v1_20_0.org.apache.calcite.rel.core.RelFactories;
+import 
org.apache.beam.vendor.calcite.v1_20_0.org.apache.calcite.rel.type.RelDataType;
+import 
org.apache.beam.vendor.calcite.v1_20_0.org.apache.calcite.rel.type.RelDataTypeField;
+import 
org.apache.beam.vendor.calcite.v1_20_0.org.apache.calcite.rel.type.RelRecordType;
+import org.apache.beam.vendor.calcite.v1_20_0.org.apache.calcite.rex.RexCall;
+import 
org.apache.beam.vendor.calcite.v1_20_0.org.apache.calcite.rex.RexInputRef;
+import 
org.apache.beam.vendor.calcite.v1_20_0.org.apache.calcite.rex.RexLocalRef;
+import org.apache.beam.vendor.calcite.v1_20_0.org.apache.calcite.rex.RexNode;
+import 
org.apache.beam.vendor.calcite.v1_20_0.org.apache.calcite.rex.RexProgram;
+import 
org.apache.beam.vendor.calcite.v1_20_0.org.apache.calcite.tools.RelBuilder;
+import 
org.apache.beam.vendor.calcite.v1_20_0.org.apache.calcite.tools.RelBuilderFactory;
+import org.apache.beam.vendor.calcite.v1_20_0.org.apache.calcite.util.Pair;
+import 
org.apache.beam.vendor.guava.v26_0_jre.com.google.common.annotations.VisibleForTesting;
+
+public class BeamIOPushDownRule extends RelOptRule {
+  // ~ Static fields/initializers -
+
+  public static final BeamIOPushDownRule INSTANCE =
+  new BeamIOPushDownRule(RelFactories.LOGICAL_BUILDER);
+
+  // ~ Constructors ---
+
+  public BeamIOPushDownRule(RelBuilderFactory relBuilderFactory) {
+super(operand(Calc.class, operand(BeamIOSourceRel.class, any())), 
relBuilderFactory, null);
+  }
+
+  // ~ Methods 
+
+  @Override
+  public void onMatch(RelOptRuleCall call) {
+final Calc calc = call.rel(0);
+final BeamIOSourceRel ioSourceRel = call.rel(1);
+final BeamSqlTable beamSqlTable = ioSourceRel.getBeamSqlTable();
+final RexProgram program = calc.getProgram();
+final Pair<ImmutableList<RexNode>, ImmutableList<RexNode>> projectFilter = 
program.split();
+final RelDataType calcInputRowType = program.getInputRowType();
+RelBuilder relBuilder = call.builder();
+
+if (!beamSqlTable.supportsProjects()) {
+  return;
+}
+
+// Nested rows are not supported at the moment
+for 

[jira] [Work logged] (BEAM-8365) Add project push-down capability to IO APIs

2019-10-15 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/BEAM-8365?focusedWorklogId=328834&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-328834
 ]

ASF GitHub Bot logged work on BEAM-8365:


Author: ASF GitHub Bot
Created on: 15/Oct/19 23:15
Start Date: 15/Oct/19 23:15
Worklog Time Spent: 10m 
  Work Description: 11moon11 commented on pull request #9764: [BEAM-8365] 
Project push-down for TestTableProvider
URL: https://github.com/apache/beam/pull/9764#discussion_r335217433
 
 

 ##
 File path: 
sdks/java/extensions/sql/src/main/java/org/apache/beam/sdk/extensions/sql/impl/rule/BeamIOPushDownRule.java
 ##
 @@ -0,0 +1,236 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ * http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+package org.apache.beam.sdk.extensions.sql.impl.rule;
+
+import java.util.ArrayDeque;
+import java.util.ArrayList;
+import java.util.HashSet;
+import java.util.List;
+import java.util.Queue;
+import java.util.Set;
+import java.util.stream.Collectors;
+import org.apache.beam.sdk.extensions.sql.impl.rel.BeamIOSourceRel;
+import org.apache.beam.sdk.extensions.sql.impl.utils.CalciteUtils;
+import org.apache.beam.sdk.extensions.sql.meta.BeamSqlTable;
+import org.apache.beam.sdk.schemas.FieldAccessDescriptor;
+import org.apache.beam.sdk.schemas.FieldAccessDescriptor.FieldDescriptor;
+import org.apache.beam.sdk.schemas.Schema;
+import org.apache.beam.sdk.schemas.utils.SelectHelpers;
+import 
org.apache.beam.vendor.calcite.v1_20_0.com.google.common.collect.ImmutableList;
+import 
org.apache.beam.vendor.calcite.v1_20_0.org.apache.calcite.plan.RelOptRule;
+import 
org.apache.beam.vendor.calcite.v1_20_0.org.apache.calcite.plan.RelOptRuleCall;
+import org.apache.beam.vendor.calcite.v1_20_0.org.apache.calcite.rel.RelNode;
+import org.apache.beam.vendor.calcite.v1_20_0.org.apache.calcite.rel.core.Calc;
+import 
org.apache.beam.vendor.calcite.v1_20_0.org.apache.calcite.rel.core.RelFactories;
+import 
org.apache.beam.vendor.calcite.v1_20_0.org.apache.calcite.rel.type.RelDataType;
+import 
org.apache.beam.vendor.calcite.v1_20_0.org.apache.calcite.rel.type.RelDataTypeField;
+import 
org.apache.beam.vendor.calcite.v1_20_0.org.apache.calcite.rel.type.RelRecordType;
+import org.apache.beam.vendor.calcite.v1_20_0.org.apache.calcite.rex.RexCall;
+import 
org.apache.beam.vendor.calcite.v1_20_0.org.apache.calcite.rex.RexInputRef;
+import 
org.apache.beam.vendor.calcite.v1_20_0.org.apache.calcite.rex.RexLocalRef;
+import org.apache.beam.vendor.calcite.v1_20_0.org.apache.calcite.rex.RexNode;
+import 
org.apache.beam.vendor.calcite.v1_20_0.org.apache.calcite.rex.RexProgram;
+import 
org.apache.beam.vendor.calcite.v1_20_0.org.apache.calcite.tools.RelBuilder;
+import 
org.apache.beam.vendor.calcite.v1_20_0.org.apache.calcite.tools.RelBuilderFactory;
+import org.apache.beam.vendor.calcite.v1_20_0.org.apache.calcite.util.Pair;
+import 
org.apache.beam.vendor.guava.v26_0_jre.com.google.common.annotations.VisibleForTesting;
+
+public class BeamIOPushDownRule extends RelOptRule {
+  // ~ Static fields/initializers -
+
+  public static final BeamIOPushDownRule INSTANCE =
+  new BeamIOPushDownRule(RelFactories.LOGICAL_BUILDER);
+
+  // ~ Constructors ---
+
+  public BeamIOPushDownRule(RelBuilderFactory relBuilderFactory) {
+super(operand(Calc.class, operand(BeamIOSourceRel.class, any())), 
relBuilderFactory, null);
+  }
+
+  // ~ Methods 
+
+  @Override
+  public void onMatch(RelOptRuleCall call) {
+final Calc calc = call.rel(0);
+final BeamIOSourceRel ioSourceRel = call.rel(1);
+final BeamSqlTable beamSqlTable = ioSourceRel.getBeamSqlTable();
+final RexProgram program = calc.getProgram();
+final Pair<ImmutableList<RexNode>, ImmutableList<RexNode>> projectFilter = 
program.split();
 
 Review comment:
   Done, moved before the first use.
 


[jira] [Work logged] (BEAM-8365) Add project push-down capability to IO APIs

2019-10-15 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/BEAM-8365?focusedWorklogId=328830&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-328830
 ]

ASF GitHub Bot logged work on BEAM-8365:


Author: ASF GitHub Bot
Created on: 15/Oct/19 23:15
Start Date: 15/Oct/19 23:15
Worklog Time Spent: 10m 
  Work Description: 11moon11 commented on pull request #9764: [BEAM-8365] 
Project push-down for TestTableProvider
URL: https://github.com/apache/beam/pull/9764#discussion_r335217355
 
 

 ##
 File path: 
sdks/java/extensions/sql/src/main/java/org/apache/beam/sdk/extensions/sql/meta/provider/test/TestTableProvider.java
 ##
 @@ -157,16 +169,18 @@ public BeamTableStatistics 
getTableStatistics(PipelineOptions options) {
public PCollection<Row> buildIOReader(
PBegin begin, BeamSqlTableFilter filters, List<String> fieldNames) {
  PCollection<Row> withAllFields = buildIOReader(begin);
-  if (fieldNames.isEmpty() && filters instanceof DefaultTableFilter) {
+  if (options == PushDownOptions.NONE) { // needed for testing purposes
 return withAllFields;
   }
 
 
 Review comment:
   Added checks that throw a `RuntimeException` when an invalid scenario is 
encountered.
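
A minimal sketch of that kind of fail-fast guard, assuming a hypothetical `PushDownOptions` enum and field list (names illustrative, not TestTableProvider's actual code):

```java
import java.util.List;

// Hedged sketch of a validation check that rejects an invalid scenario:
// a push-down field list arriving at a table configured not to support it.
public class PushDownGuard {
    enum PushDownOptions { NONE, PROJECT }

    static void checkPushDown(PushDownOptions options, List<String> fieldNames) {
        if (options == PushDownOptions.NONE && !fieldNames.isEmpty()) {
            throw new RuntimeException(
                "Project push-down is disabled, but a field list was pushed: "
                    + fieldNames);
        }
    }

    public static void main(String[] args) {
        checkPushDown(PushDownOptions.PROJECT, List.of("id")); // valid, no-op
        try {
            checkPushDown(PushDownOptions.NONE, List.of("id")); // invalid
        } catch (RuntimeException e) {
            System.out.println("rejected: " + e.getMessage());
        }
    }
}
```

Throwing eagerly here surfaces misconfigured push-down rules at planning time instead of producing silently wrong projections.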
 



Issue Time Tracking
---

Worklog Id: (was: 328830)
Time Spent: 5h 40m  (was: 5.5h)

> Add project push-down capability to IO APIs
> ---
>
> Key: BEAM-8365
> URL: https://issues.apache.org/jira/browse/BEAM-8365
> Project: Beam
>  Issue Type: New Feature
>  Components: dsl-sql
>Reporter: Kirill Kozlov
>Assignee: Kirill Kozlov
>Priority: Major
>  Time Spent: 5h 40m
>  Remaining Estimate: 0h
>
> * InMemoryTable should implement the following method:
> {code:java}
> public PCollection<Row> buildIOReader(
> PBegin begin, BeamSqlTableFilter filters, List<String> fieldNames);{code}
> which should return a `PCollection<Row>` containing only the fields specified in the
> `fieldNames` list.
>  * Create a rule to push fields used by a Calc (in projects and in a
> condition) down into TestTable IO.
>  * Update that same Calc (from the previous step) to have proper input and
> output schemas, removing unused fields.





[jira] [Work logged] (BEAM-8365) Add project push-down capability to IO APIs

2019-10-15 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/BEAM-8365?focusedWorklogId=328828&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-328828
 ]

ASF GitHub Bot logged work on BEAM-8365:


Author: ASF GitHub Bot
Created on: 15/Oct/19 23:11
Start Date: 15/Oct/19 23:11
Worklog Time Spent: 10m 
  Work Description: 11moon11 commented on pull request #9764: [BEAM-8365] 
Project push-down for TestTableProvider
URL: https://github.com/apache/beam/pull/9764#discussion_r335213030
 
 

 ##
 File path: 
sdks/java/extensions/sql/src/main/java/org/apache/beam/sdk/extensions/sql/impl/rule/BeamIOPushDownRule.java
 ##
 @@ -0,0 +1,236 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ * http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+package org.apache.beam.sdk.extensions.sql.impl.rule;
+
+import java.util.ArrayDeque;
+import java.util.ArrayList;
+import java.util.HashSet;
+import java.util.List;
+import java.util.Queue;
+import java.util.Set;
+import java.util.stream.Collectors;
+import org.apache.beam.sdk.extensions.sql.impl.rel.BeamIOSourceRel;
+import org.apache.beam.sdk.extensions.sql.impl.utils.CalciteUtils;
+import org.apache.beam.sdk.extensions.sql.meta.BeamSqlTable;
+import org.apache.beam.sdk.schemas.FieldAccessDescriptor;
+import org.apache.beam.sdk.schemas.FieldAccessDescriptor.FieldDescriptor;
+import org.apache.beam.sdk.schemas.Schema;
+import org.apache.beam.sdk.schemas.utils.SelectHelpers;
+import 
org.apache.beam.vendor.calcite.v1_20_0.com.google.common.collect.ImmutableList;
+import 
org.apache.beam.vendor.calcite.v1_20_0.org.apache.calcite.plan.RelOptRule;
+import 
org.apache.beam.vendor.calcite.v1_20_0.org.apache.calcite.plan.RelOptRuleCall;
+import org.apache.beam.vendor.calcite.v1_20_0.org.apache.calcite.rel.RelNode;
+import org.apache.beam.vendor.calcite.v1_20_0.org.apache.calcite.rel.core.Calc;
+import 
org.apache.beam.vendor.calcite.v1_20_0.org.apache.calcite.rel.core.RelFactories;
+import 
org.apache.beam.vendor.calcite.v1_20_0.org.apache.calcite.rel.type.RelDataType;
+import 
org.apache.beam.vendor.calcite.v1_20_0.org.apache.calcite.rel.type.RelDataTypeField;
+import 
org.apache.beam.vendor.calcite.v1_20_0.org.apache.calcite.rel.type.RelRecordType;
+import org.apache.beam.vendor.calcite.v1_20_0.org.apache.calcite.rex.RexCall;
+import 
org.apache.beam.vendor.calcite.v1_20_0.org.apache.calcite.rex.RexInputRef;
+import 
org.apache.beam.vendor.calcite.v1_20_0.org.apache.calcite.rex.RexLocalRef;
+import org.apache.beam.vendor.calcite.v1_20_0.org.apache.calcite.rex.RexNode;
+import 
org.apache.beam.vendor.calcite.v1_20_0.org.apache.calcite.rex.RexProgram;
+import 
org.apache.beam.vendor.calcite.v1_20_0.org.apache.calcite.tools.RelBuilder;
+import 
org.apache.beam.vendor.calcite.v1_20_0.org.apache.calcite.tools.RelBuilderFactory;
+import org.apache.beam.vendor.calcite.v1_20_0.org.apache.calcite.util.Pair;
+import 
org.apache.beam.vendor.guava.v26_0_jre.com.google.common.annotations.VisibleForTesting;
+
+public class BeamIOPushDownRule extends RelOptRule {
+  // ~ Static fields/initializers -
+
+  public static final BeamIOPushDownRule INSTANCE =
+  new BeamIOPushDownRule(RelFactories.LOGICAL_BUILDER);
+
+  // ~ Constructors ---
+
+  public BeamIOPushDownRule(RelBuilderFactory relBuilderFactory) {
+super(operand(Calc.class, operand(BeamIOSourceRel.class, any())), 
relBuilderFactory, null);
+  }
+
+  // ~ Methods 
+
+  @Override
+  public void onMatch(RelOptRuleCall call) {
+final Calc calc = call.rel(0);
+final BeamIOSourceRel ioSourceRel = call.rel(1);
+final BeamSqlTable beamSqlTable = ioSourceRel.getBeamSqlTable();
+final RexProgram program = calc.getProgram();
+final Pair<ImmutableList<RexNode>, ImmutableList<RexNode>> projectFilter = 
program.split();
+final RelDataType calcInputRowType = program.getInputRowType();
+RelBuilder relBuilder = call.builder();
+
+if (!beamSqlTable.supportsProjects()) {
+  return;
+}
+
+// Nested rows are not supported at the moment
+for 

[jira] [Work logged] (BEAM-8365) Add project push-down capability to IO APIs

2019-10-15 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/BEAM-8365?focusedWorklogId=328829&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-328829
 ]

ASF GitHub Bot logged work on BEAM-8365:


Author: ASF GitHub Bot
Created on: 15/Oct/19 23:11
Start Date: 15/Oct/19 23:11
Worklog Time Spent: 10m 
  Work Description: 11moon11 commented on pull request #9764: [BEAM-8365] 
Project push-down for TestTableProvider
URL: https://github.com/apache/beam/pull/9764#discussion_r335213030
 
 

 ##
 File path: 
sdks/java/extensions/sql/src/main/java/org/apache/beam/sdk/extensions/sql/impl/rule/BeamIOPushDownRule.java
 ##
 @@ -0,0 +1,236 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ * http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+package org.apache.beam.sdk.extensions.sql.impl.rule;
+
+import java.util.ArrayDeque;
+import java.util.ArrayList;
+import java.util.HashSet;
+import java.util.List;
+import java.util.Queue;
+import java.util.Set;
+import java.util.stream.Collectors;
+import org.apache.beam.sdk.extensions.sql.impl.rel.BeamIOSourceRel;
+import org.apache.beam.sdk.extensions.sql.impl.utils.CalciteUtils;
+import org.apache.beam.sdk.extensions.sql.meta.BeamSqlTable;
+import org.apache.beam.sdk.schemas.FieldAccessDescriptor;
+import org.apache.beam.sdk.schemas.FieldAccessDescriptor.FieldDescriptor;
+import org.apache.beam.sdk.schemas.Schema;
+import org.apache.beam.sdk.schemas.utils.SelectHelpers;
+import 
org.apache.beam.vendor.calcite.v1_20_0.com.google.common.collect.ImmutableList;
+import 
org.apache.beam.vendor.calcite.v1_20_0.org.apache.calcite.plan.RelOptRule;
+import 
org.apache.beam.vendor.calcite.v1_20_0.org.apache.calcite.plan.RelOptRuleCall;
+import org.apache.beam.vendor.calcite.v1_20_0.org.apache.calcite.rel.RelNode;
+import org.apache.beam.vendor.calcite.v1_20_0.org.apache.calcite.rel.core.Calc;
+import 
org.apache.beam.vendor.calcite.v1_20_0.org.apache.calcite.rel.core.RelFactories;
+import 
org.apache.beam.vendor.calcite.v1_20_0.org.apache.calcite.rel.type.RelDataType;
+import 
org.apache.beam.vendor.calcite.v1_20_0.org.apache.calcite.rel.type.RelDataTypeField;
+import 
org.apache.beam.vendor.calcite.v1_20_0.org.apache.calcite.rel.type.RelRecordType;
+import org.apache.beam.vendor.calcite.v1_20_0.org.apache.calcite.rex.RexCall;
+import 
org.apache.beam.vendor.calcite.v1_20_0.org.apache.calcite.rex.RexInputRef;
+import 
org.apache.beam.vendor.calcite.v1_20_0.org.apache.calcite.rex.RexLocalRef;
+import org.apache.beam.vendor.calcite.v1_20_0.org.apache.calcite.rex.RexNode;
+import 
org.apache.beam.vendor.calcite.v1_20_0.org.apache.calcite.rex.RexProgram;
+import 
org.apache.beam.vendor.calcite.v1_20_0.org.apache.calcite.tools.RelBuilder;
+import 
org.apache.beam.vendor.calcite.v1_20_0.org.apache.calcite.tools.RelBuilderFactory;
+import org.apache.beam.vendor.calcite.v1_20_0.org.apache.calcite.util.Pair;
+import 
org.apache.beam.vendor.guava.v26_0_jre.com.google.common.annotations.VisibleForTesting;
+
+public class BeamIOPushDownRule extends RelOptRule {
+  // ~ Static fields/initializers -
+
+  public static final BeamIOPushDownRule INSTANCE =
+  new BeamIOPushDownRule(RelFactories.LOGICAL_BUILDER);
+
+  // ~ Constructors ---
+
+  public BeamIOPushDownRule(RelBuilderFactory relBuilderFactory) {
+super(operand(Calc.class, operand(BeamIOSourceRel.class, any())), 
relBuilderFactory, null);
+  }
+
+  // ~ Methods 
+
+  @Override
+  public void onMatch(RelOptRuleCall call) {
+final Calc calc = call.rel(0);
+final BeamIOSourceRel ioSourceRel = call.rel(1);
+final BeamSqlTable beamSqlTable = ioSourceRel.getBeamSqlTable();
+final RexProgram program = calc.getProgram();
+final Pair<ImmutableList<RexNode>, ImmutableList<RexNode>> projectFilter = 
program.split();
+final RelDataType calcInputRowType = program.getInputRowType();
+RelBuilder relBuilder = call.builder();
+
+if (!beamSqlTable.supportsProjects()) {
+  return;
+}
+
+// Nested rows are not supported at the moment
+for 

[jira] [Work logged] (BEAM-8365) Add project push-down capability to IO APIs

2019-10-15 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/BEAM-8365?focusedWorklogId=328827&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-328827
 ]

ASF GitHub Bot logged work on BEAM-8365:


Author: ASF GitHub Bot
Created on: 15/Oct/19 23:10
Start Date: 15/Oct/19 23:10
Worklog Time Spent: 10m 
  Work Description: 11moon11 commented on pull request #9764: [BEAM-8365] 
Project push-down for TestTableProvider
URL: https://github.com/apache/beam/pull/9764#discussion_r335213030
 
 

 ##
 File path: 
sdks/java/extensions/sql/src/main/java/org/apache/beam/sdk/extensions/sql/impl/rule/BeamIOPushDownRule.java
 ##
 @@ -0,0 +1,236 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ * http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+package org.apache.beam.sdk.extensions.sql.impl.rule;
+
+import java.util.ArrayDeque;
+import java.util.ArrayList;
+import java.util.HashSet;
+import java.util.List;
+import java.util.Queue;
+import java.util.Set;
+import java.util.stream.Collectors;
+import org.apache.beam.sdk.extensions.sql.impl.rel.BeamIOSourceRel;
+import org.apache.beam.sdk.extensions.sql.impl.utils.CalciteUtils;
+import org.apache.beam.sdk.extensions.sql.meta.BeamSqlTable;
+import org.apache.beam.sdk.schemas.FieldAccessDescriptor;
+import org.apache.beam.sdk.schemas.FieldAccessDescriptor.FieldDescriptor;
+import org.apache.beam.sdk.schemas.Schema;
+import org.apache.beam.sdk.schemas.utils.SelectHelpers;
+import org.apache.beam.vendor.calcite.v1_20_0.com.google.common.collect.ImmutableList;
+import org.apache.beam.vendor.calcite.v1_20_0.org.apache.calcite.plan.RelOptRule;
+import org.apache.beam.vendor.calcite.v1_20_0.org.apache.calcite.plan.RelOptRuleCall;
+import org.apache.beam.vendor.calcite.v1_20_0.org.apache.calcite.rel.RelNode;
+import org.apache.beam.vendor.calcite.v1_20_0.org.apache.calcite.rel.core.Calc;
+import org.apache.beam.vendor.calcite.v1_20_0.org.apache.calcite.rel.core.RelFactories;
+import org.apache.beam.vendor.calcite.v1_20_0.org.apache.calcite.rel.type.RelDataType;
+import org.apache.beam.vendor.calcite.v1_20_0.org.apache.calcite.rel.type.RelDataTypeField;
+import org.apache.beam.vendor.calcite.v1_20_0.org.apache.calcite.rel.type.RelRecordType;
+import org.apache.beam.vendor.calcite.v1_20_0.org.apache.calcite.rex.RexCall;
+import org.apache.beam.vendor.calcite.v1_20_0.org.apache.calcite.rex.RexInputRef;
+import org.apache.beam.vendor.calcite.v1_20_0.org.apache.calcite.rex.RexLocalRef;
+import org.apache.beam.vendor.calcite.v1_20_0.org.apache.calcite.rex.RexNode;
+import org.apache.beam.vendor.calcite.v1_20_0.org.apache.calcite.rex.RexProgram;
+import org.apache.beam.vendor.calcite.v1_20_0.org.apache.calcite.tools.RelBuilder;
+import org.apache.beam.vendor.calcite.v1_20_0.org.apache.calcite.tools.RelBuilderFactory;
+import org.apache.beam.vendor.calcite.v1_20_0.org.apache.calcite.util.Pair;
+import org.apache.beam.vendor.guava.v26_0_jre.com.google.common.annotations.VisibleForTesting;
+
+public class BeamIOPushDownRule extends RelOptRule {
+  // ~ Static fields/initializers ---------------------------------------------
+
+  public static final BeamIOPushDownRule INSTANCE =
+      new BeamIOPushDownRule(RelFactories.LOGICAL_BUILDER);
+
+  // ~ Constructors -----------------------------------------------------------
+
+  public BeamIOPushDownRule(RelBuilderFactory relBuilderFactory) {
+    super(operand(Calc.class, operand(BeamIOSourceRel.class, any())), relBuilderFactory, null);
+  }
+
+  // ~ Methods ----------------------------------------------------------------
+
+  @Override
+  public void onMatch(RelOptRuleCall call) {
+    final Calc calc = call.rel(0);
+    final BeamIOSourceRel ioSourceRel = call.rel(1);
+    final BeamSqlTable beamSqlTable = ioSourceRel.getBeamSqlTable();
+    final RexProgram program = calc.getProgram();
+    final Pair<ImmutableList<RexNode>, ImmutableList<RexNode>> projectFilter = program.split();
+    final RelDataType calcInputRowType = program.getInputRowType();
+    RelBuilder relBuilder = call.builder();
+
+    if (!beamSqlTable.supportsProjects()) {
+      return;
+    }
+
+    // Nested rows are not supported at the moment
+for 
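The quoted rule is cut off at this point, but the effect it implements can be illustrated with a small, purely hypothetical sketch: with project push-down, the IO source itself materializes only the fields the query needs, instead of emitting full rows for a downstream Calc to trim. The `scan` function below stands in for an IO source and is not a Beam API.

```python
# Hypothetical sketch of project push-down: the source emits only the
# requested fields, rather than full rows that a later projection drops.
def scan(rows, fields=None):
    for row in rows:
        # With push-down, only the requested fields are read from the source.
        yield {f: row[f] for f in fields} if fields else dict(row)

table = [{"id": 1, "name": "a", "payload": "x" * 1000}]

# Without push-down: full rows leave the source, then a projection drops columns.
projected_later = [{"id": r["id"]} for r in scan(table)]

# With push-down: the source itself only materializes the 'id' field.
pushed_down = list(scan(table, fields=("id",)))

assert projected_later == pushed_down == [{"id": 1}]
```

Either path yields the same result; push-down simply avoids reading (and shipping) the wide `payload` column at all.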

[jira] [Work logged] (BEAM-8365) Add project push-down capability to IO APIs

2019-10-15 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/BEAM-8365?focusedWorklogId=328820=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-328820
 ]

ASF GitHub Bot logged work on BEAM-8365:


Author: ASF GitHub Bot
Created on: 15/Oct/19 22:56
Start Date: 15/Oct/19 22:56
Worklog Time Spent: 10m 
  Work Description: 11moon11 commented on pull request #9764: [BEAM-8365] 
Project push-down for TestTableProvider
URL: https://github.com/apache/beam/pull/9764#discussion_r335213030
 
 

 ##
 File path: 
sdks/java/extensions/sql/src/main/java/org/apache/beam/sdk/extensions/sql/impl/rule/BeamIOPushDownRule.java
 ##
 @@ -0,0 +1,236 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ * http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+package org.apache.beam.sdk.extensions.sql.impl.rule;
+
+import java.util.ArrayDeque;
+import java.util.ArrayList;
+import java.util.HashSet;
+import java.util.List;
+import java.util.Queue;
+import java.util.Set;
+import java.util.stream.Collectors;
+import org.apache.beam.sdk.extensions.sql.impl.rel.BeamIOSourceRel;
+import org.apache.beam.sdk.extensions.sql.impl.utils.CalciteUtils;
+import org.apache.beam.sdk.extensions.sql.meta.BeamSqlTable;
+import org.apache.beam.sdk.schemas.FieldAccessDescriptor;
+import org.apache.beam.sdk.schemas.FieldAccessDescriptor.FieldDescriptor;
+import org.apache.beam.sdk.schemas.Schema;
+import org.apache.beam.sdk.schemas.utils.SelectHelpers;
+import org.apache.beam.vendor.calcite.v1_20_0.com.google.common.collect.ImmutableList;
+import org.apache.beam.vendor.calcite.v1_20_0.org.apache.calcite.plan.RelOptRule;
+import org.apache.beam.vendor.calcite.v1_20_0.org.apache.calcite.plan.RelOptRuleCall;
+import org.apache.beam.vendor.calcite.v1_20_0.org.apache.calcite.rel.RelNode;
+import org.apache.beam.vendor.calcite.v1_20_0.org.apache.calcite.rel.core.Calc;
+import org.apache.beam.vendor.calcite.v1_20_0.org.apache.calcite.rel.core.RelFactories;
+import org.apache.beam.vendor.calcite.v1_20_0.org.apache.calcite.rel.type.RelDataType;
+import org.apache.beam.vendor.calcite.v1_20_0.org.apache.calcite.rel.type.RelDataTypeField;
+import org.apache.beam.vendor.calcite.v1_20_0.org.apache.calcite.rel.type.RelRecordType;
+import org.apache.beam.vendor.calcite.v1_20_0.org.apache.calcite.rex.RexCall;
+import org.apache.beam.vendor.calcite.v1_20_0.org.apache.calcite.rex.RexInputRef;
+import org.apache.beam.vendor.calcite.v1_20_0.org.apache.calcite.rex.RexLocalRef;
+import org.apache.beam.vendor.calcite.v1_20_0.org.apache.calcite.rex.RexNode;
+import org.apache.beam.vendor.calcite.v1_20_0.org.apache.calcite.rex.RexProgram;
+import org.apache.beam.vendor.calcite.v1_20_0.org.apache.calcite.tools.RelBuilder;
+import org.apache.beam.vendor.calcite.v1_20_0.org.apache.calcite.tools.RelBuilderFactory;
+import org.apache.beam.vendor.calcite.v1_20_0.org.apache.calcite.util.Pair;
+import org.apache.beam.vendor.guava.v26_0_jre.com.google.common.annotations.VisibleForTesting;
+
+public class BeamIOPushDownRule extends RelOptRule {
+  // ~ Static fields/initializers ---------------------------------------------
+
+  public static final BeamIOPushDownRule INSTANCE =
+      new BeamIOPushDownRule(RelFactories.LOGICAL_BUILDER);
+
+  // ~ Constructors -----------------------------------------------------------
+
+  public BeamIOPushDownRule(RelBuilderFactory relBuilderFactory) {
+    super(operand(Calc.class, operand(BeamIOSourceRel.class, any())), relBuilderFactory, null);
+  }
+
+  // ~ Methods ----------------------------------------------------------------
+
+  @Override
+  public void onMatch(RelOptRuleCall call) {
+    final Calc calc = call.rel(0);
+    final BeamIOSourceRel ioSourceRel = call.rel(1);
+    final BeamSqlTable beamSqlTable = ioSourceRel.getBeamSqlTable();
+    final RexProgram program = calc.getProgram();
+    final Pair<ImmutableList<RexNode>, ImmutableList<RexNode>> projectFilter = program.split();
+    final RelDataType calcInputRowType = program.getInputRowType();
+    RelBuilder relBuilder = call.builder();
+
+    if (!beamSqlTable.supportsProjects()) {
+      return;
+    }
+
+    // Nested rows are not supported at the moment
+for 

[jira] [Updated] (BEAM-8367) Python BigQuery sink should use unique IDs for mode STREAMING_INSERTS

2019-10-15 Thread Chamikara Madhusanka Jayalath (Jira)


 [ 
https://issues.apache.org/jira/browse/BEAM-8367?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Chamikara Madhusanka Jayalath updated BEAM-8367:

Fix Version/s: 2.17.0

> Python BigQuery sink should use unique IDs for mode STREAMING_INSERTS
> -
>
> Key: BEAM-8367
> URL: https://issues.apache.org/jira/browse/BEAM-8367
> Project: Beam
>  Issue Type: Bug
>  Components: sdk-py-core
>Reporter: Chamikara Madhusanka Jayalath
>Assignee: Pablo Estrada
>Priority: Blocker
> Fix For: 2.17.0
>
>  Time Spent: 50m
>  Remaining Estimate: 0h
>
> Unique IDs ensure (best effort) that writes to BigQuery are idempotent; for 
> example, we don't write the same record twice after a VM failure.
>  
> Currently the Python BQ sink inserts BQ IDs here, but they'll be re-generated 
> after a VM failure, resulting in data duplication.
> [https://github.com/apache/beam/blob/master/sdks/python/apache_beam/io/gcp/bigquery.py#L766]
>  
> The correct fix is to do a Reshuffle to checkpoint unique IDs once they are 
> generated, similar to how the Java BQ sink operates.
> [https://github.com/apache/beam/blob/dcf6ad301069e4d2cfaec5db6b178acb7bb67f49/sdks/java/io/google-cloud-platform/src/main/java/org/apache/beam/sdk/io/gcp/bigquery/StreamingWriteTables.java#L225]
>  
> Pablo, can you do an initial assessment here?
> I think this is a relatively small fix, but I might be wrong.
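The idempotency argument above can be sketched with a toy sink. BigQuery's best-effort streaming de-duplication keys on the insert ID, so a retry stays idempotent only if the same ID survives the retry; the `FakeBigQuery` class below is purely illustrative and not a Beam or BigQuery API.

```python
import uuid

class FakeBigQuery:
    """Toy stand-in for BigQuery's best-effort streaming de-duplication."""

    def __init__(self):
        self.rows = {}

    def insert(self, insert_id, row):
        # Rows sharing an insert_id are de-duplicated, not appended twice.
        self.rows[insert_id] = row

bq = FakeBigQuery()

# ID generated once (and, per the proposed fix, checkpointed via Reshuffle):
insert_id = str(uuid.uuid4())
bq.insert(insert_id, {"v": 1})
bq.insert(insert_id, {"v": 1})  # retry after a VM failure: same ID, no duplicate
assert len(bq.rows) == 1

# ID re-generated on each attempt (the behavior described above): duplicate row.
bq.insert(str(uuid.uuid4()), {"v": 2})
bq.insert(str(uuid.uuid4()), {"v": 2})
assert len(bq.rows) == 3
```

Checkpointing the IDs before the write step is what turns the second pattern into the first.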



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Updated] (BEAM-8367) Python BigQuery sink should use unique IDs for mode STREAMING_INSERTS

2019-10-15 Thread Chamikara Madhusanka Jayalath (Jira)


 [ 
https://issues.apache.org/jira/browse/BEAM-8367?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Chamikara Madhusanka Jayalath updated BEAM-8367:

Priority: Blocker  (was: Major)

> Python BigQuery sink should use unique IDs for mode STREAMING_INSERTS
> -
>
> Key: BEAM-8367
> URL: https://issues.apache.org/jira/browse/BEAM-8367
> Project: Beam
>  Issue Type: Bug
>  Components: sdk-py-core
>Reporter: Chamikara Madhusanka Jayalath
>Assignee: Pablo Estrada
>Priority: Blocker
>  Time Spent: 50m
>  Remaining Estimate: 0h
>
> Unique IDs ensure (best effort) that writes to BigQuery are idempotent; for 
> example, we don't write the same record twice after a VM failure.
>  
> Currently the Python BQ sink inserts BQ IDs here, but they'll be re-generated 
> after a VM failure, resulting in data duplication.
> [https://github.com/apache/beam/blob/master/sdks/python/apache_beam/io/gcp/bigquery.py#L766]
>  
> The correct fix is to do a Reshuffle to checkpoint unique IDs once they are 
> generated, similar to how the Java BQ sink operates.
> [https://github.com/apache/beam/blob/dcf6ad301069e4d2cfaec5db6b178acb7bb67f49/sdks/java/io/google-cloud-platform/src/main/java/org/apache/beam/sdk/io/gcp/bigquery/StreamingWriteTables.java#L225]
>  
> Pablo, can you do an initial assessment here?
> I think this is a relatively small fix, but I might be wrong.





[jira] [Work logged] (BEAM-8372) Allow submission of Flink UberJar directly to flink cluster.

2019-10-15 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/BEAM-8372?focusedWorklogId=328819=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-328819
 ]

ASF GitHub Bot logged work on BEAM-8372:


Author: ASF GitHub Bot
Created on: 15/Oct/19 22:47
Start Date: 15/Oct/19 22:47
Worklog Time Spent: 10m 
  Work Description: robertwb commented on pull request #9775: [BEAM-8372] 
Job server submitting UberJars directly to Flink Runner.
URL: https://github.com/apache/beam/pull/9775#discussion_r335210632
 
 

 ##
 File path: sdks/python/apache_beam/runners/portability/artifact_service.py
 ##
 @@ -77,47 +88,131 @@ def PutArtifact(self, request_iterator, context=None):
 metadata = request.metadata.metadata
 retrieval_token = self.retrieval_token(
 request.metadata.staging_session_token)
-self._mkdir(retrieval_token)
-temp_path = filesystems.FileSystems.join(
-self._root,
-retrieval_token,
-'%x.tmp' % random.getrandbits(128))
-fout = filesystems.FileSystems.create(temp_path)
+artifact_path = self._artifact_path(retrieval_token, metadata.name)
+temp_path = self._temp_path(artifact_path)
+fout = self._open(temp_path, 'w')
 hasher = hashlib.sha256()
   else:
 hasher.update(request.data.data)
 fout.write(request.data.data)
 fout.close()
 data_hash = hasher.hexdigest()
 if metadata.sha256 and metadata.sha256 != data_hash:
-  filesystems.FileSystems.delete([temp_path])
+  self._delete(temp_path)
   raise ValueError('Bad metadata hash: %s vs %s' % (
-  metadata.metadata.sha256, data_hash))
-filesystems.FileSystems.rename(
-[temp_path], [self._artifact_path(retrieval_token, metadata.name)])
+  metadata.sha256, data_hash))
+self._rename(temp_path, artifact_path)
 return beam_artifact_api_pb2.PutArtifactResponse()
 
   def CommitManifest(self, request, context=None):
 retrieval_token = self.retrieval_token(request.staging_session_token)
-with filesystems.FileSystems.create(
-self._manifest_path(retrieval_token)) as fout:
-  fout.write(request.manifest.SerializeToString())
+proxy_manifest = beam_artifact_api_pb2.ProxyManifest(
+manifest=request.manifest,
+location=[
+beam_artifact_api_pb2.ProxyManifest.Location(
+name=metadata.name,
+uri=self._artifact_path(retrieval_token, metadata.name))
+for metadata in request.manifest.artifact])
+with self._open(self._manifest_path(retrieval_token), 'w') as fout:
+  fout.write(json_format.MessageToJson(proxy_manifest).encode('utf-8'))
 return beam_artifact_api_pb2.CommitManifestResponse(
 retrieval_token=retrieval_token)
 
   def GetManifest(self, request, context=None):
-with filesystems.FileSystems.open(
-self._manifest_path(request.retrieval_token)) as fin:
-  return beam_artifact_api_pb2.GetManifestResponse(
-  manifest=beam_artifact_api_pb2.Manifest.FromString(
-  fin.read()))
+return beam_artifact_api_pb2.GetManifestResponse(
+manifest=self._get_manifest_proxy(request.retrieval_token).manifest)
 
   def GetArtifact(self, request, context=None):
-with filesystems.FileSystems.open(
-self._artifact_path(request.retrieval_token, request.name)) as fin:
-  # This value is not emitted, but lets us yield a single empty
-  # chunk on an empty file.
-  chunk = True
-  while chunk:
-chunk = fin.read(self._chunk_size)
-yield beam_artifact_api_pb2.ArtifactChunk(data=chunk)
+for artifact in self._get_manifest_proxy(request.retrieval_token).location:
+  if artifact.name == request.name:
+with self._open(artifact.uri, 'r') as fin:
+  # This value is not emitted, but lets us yield a single empty
+  # chunk on an empty file.
+  chunk = True
+  while chunk:
+chunk = fin.read(self._chunk_size)
+yield beam_artifact_api_pb2.ArtifactChunk(data=chunk)
+break
+else:
+  raise ValueError('Unknown artifact: %s' % request.name)
+
+
+class ZipFileArtifactService(AbstractArtifactService):
+  """Stores artifacts in a zip file.
+
+  This is particularly useful for storing artifacts as part of an UberJar for
+  submitting to an upstream runner's cluster.
+
+  Writing to zip files requires Python 3.6+.
+  """
+
+  def __init__(self, path, chunk_size=None):
+if sys.version_info < (3, 6):
+  raise RuntimeError(
+  'Writing to zip files requires Python 3.6+, '
+  'but current version is %s' % sys.version)
 
 Review comment:
   Yeah. 3.5 is pretty old though, and 2.7 is EOL, so not worth implementing 
our own backport. 
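The `GetArtifact` loop quoted in the diff above can be exercised in isolation. The `True` sentinel is never emitted as data; it only guarantees that at least one chunk is yielded, so an empty file produces exactly one empty chunk (and a non-empty file produces its data chunks followed by one empty chunk). This is a standalone sketch of that loop, not the Beam service itself.

```python
import io

def read_chunks(fileobj, chunk_size):
    # Mirrors the GetArtifact loop: the sentinel value True is never yielded,
    # but ensures the loop body runs at least once.
    chunk = True
    while chunk:
        chunk = fileobj.read(chunk_size)
        yield chunk

assert list(read_chunks(io.BytesIO(b"abcde"), 2)) == [b"ab", b"cd", b"e", b""]
assert list(read_chunks(io.BytesIO(b""), 2)) == [b""]
```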
 


[jira] [Work logged] (BEAM-8372) Allow submission of Flink UberJar directly to flink cluster.

2019-10-15 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/BEAM-8372?focusedWorklogId=328818=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-328818
 ]

ASF GitHub Bot logged work on BEAM-8372:


Author: ASF GitHub Bot
Created on: 15/Oct/19 22:44
Start Date: 15/Oct/19 22:44
Worklog Time Spent: 10m 
  Work Description: robertwb commented on pull request #9775: [BEAM-8372] 
Job server submitting UberJars directly to Flink Runner.
URL: https://github.com/apache/beam/pull/9775#discussion_r335209883
 
 

 ##
 File path: sdks/python/apache_beam/runners/portability/flink_runner.py
 ##
 @@ -29,7 +32,12 @@
 
 class FlinkRunner(portable_runner.PortableRunner):
   def default_job_server(self, options):
-return job_server.StopOnExitJobServer(FlinkJarJobServer(options))
+flink_master_url = options.view_as(FlinkRunnerOptions).flink_master_url
 
 Review comment:
   Sorry, I merged before refreshing and seeing your comments. 
https://github.com/apache/beam/pull/9803
 

This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 328818)
Time Spent: 3.5h  (was: 3h 20m)

> Allow submission of Flink UberJar directly to flink cluster.
> 
>
> Key: BEAM-8372
> URL: https://issues.apache.org/jira/browse/BEAM-8372
> Project: Beam
>  Issue Type: New Feature
>  Components: sdk-py-core
>Reporter: Robert Bradshaw
>Assignee: Robert Bradshaw
>Priority: Major
>  Time Spent: 3.5h
>  Remaining Estimate: 0h
>






[jira] [Work logged] (BEAM-8372) Allow submission of Flink UberJar directly to flink cluster.

2019-10-15 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/BEAM-8372?focusedWorklogId=328817=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-328817
 ]

ASF GitHub Bot logged work on BEAM-8372:


Author: ASF GitHub Bot
Created on: 15/Oct/19 22:44
Start Date: 15/Oct/19 22:44
Worklog Time Spent: 10m 
  Work Description: robertwb commented on pull request #9803: [BEAM-8372] 
Follow-up to Flink UberJar submission.
URL: https://github.com/apache/beam/pull/9803
 
 
   
   
   
[jira] [Commented] (BEAM-8406) TextTable support JSON format

2019-10-15 Thread Rui Wang (Jira)


[ 
https://issues.apache.org/jira/browse/BEAM-8406?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16952349#comment-16952349
 ] 

Rui Wang commented on BEAM-8406:


Note that we have an implementation of JSON format over a Pubsub source: 
https://github.com/apache/beam/blob/master/sdks/java/extensions/sql/src/main/java/org/apache/beam/sdk/extensions/sql/meta/provider/pubsub/PubsubIOJsonTable.java

It could be a similar implementation (or even reuse that code), but reading 
from a file this time.
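A minimal sketch of the reading side, assuming the text table stores one JSON object per line (newline-delimited JSON). The helper name `parse_json_lines` is illustrative, not a Beam API.

```python
import json

def parse_json_lines(lines):
    # One JSON object per line; blank lines are skipped.
    for line in lines:
        line = line.strip()
        if line:
            yield json.loads(line)

rows = list(parse_json_lines(['{"id": 1, "name": "a"}', "", '{"id": 2, "name": "b"}']))
assert rows == [{"id": 1, "name": "a"}, {"id": 2, "name": "b"}]
```

In a real table provider, each parsed object would then be converted to a `Row` against the table's declared schema, as `PubsubIOJsonTable` does.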

> TextTable support JSON format
> -
>
> Key: BEAM-8406
> URL: https://issues.apache.org/jira/browse/BEAM-8406
> Project: Beam
>  Issue Type: New Feature
>  Components: dsl-sql
>Reporter: Rui Wang
>Priority: Major
>
> Have a JSON table implementation similar to [1].
> [1]: 
> https://github.com/apache/beam/blob/master/sdks/java/extensions/sql/src/main/java/org/apache/beam/sdk/extensions/sql/meta/provider/text/TextTable.java





[jira] [Work logged] (BEAM-8403) Race condition in request id generation of GrpcStateRequestHandler

2019-10-15 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/BEAM-8403?focusedWorklogId=328810=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-328810
 ]

ASF GitHub Bot logged work on BEAM-8403:


Author: ASF GitHub Bot
Created on: 15/Oct/19 22:25
Start Date: 15/Oct/19 22:25
Worklog Time Spent: 10m 
  Work Description: tweise commented on issue #9800: [BEAM-8403] Guard 
request id generation to prevent concurrent worker access
URL: https://github.com/apache/beam/pull/9800#issuecomment-542425967
 
 
   @aaltay ignoring the specific dependency, what is the generally recommended 
approach for such situations? Explicit locking or are there other/better 
options?
 



Issue Time Tracking
---

Worklog Id: (was: 328810)
Time Spent: 1h  (was: 50m)

> Race condition in request id generation of GrpcStateRequestHandler
> --
>
> Key: BEAM-8403
> URL: https://issues.apache.org/jira/browse/BEAM-8403
> Project: Beam
>  Issue Type: Bug
>  Components: sdk-py-harness
>Reporter: Maximilian Michels
>Assignee: Maximilian Michels
>Priority: Major
> Fix For: 2.17.0
>
>  Time Spent: 1h
>  Remaining Estimate: 0h
>
> There is a race condition in {{GrpcStateRequestHandler}} which surfaced after 
> the recent changes to process append/clear state requests asynchronously. The 
> race condition can occur if multiple Runner workers process a transform with 
> state requests with the same SDK Harness. For example, this setup occurs with 
> Flink when a TaskManager has multiple task slots and two or more of those 
> slots process the same stateful stage against an SDK Harness.
> CC [~robertwb]
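One generic way to remove such a race is to guard the id generation with a lock, so two workers sharing the handler can never observe the same id. This is a sketch of the technique only, not the actual change in PR #9800; the `RequestIdGenerator` name is hypothetical.

```python
import threading

class RequestIdGenerator:
    """Monotonically increasing request ids, safe under concurrent callers."""

    def __init__(self):
        self._lock = threading.Lock()
        self._last_id = 0

    def next_id(self):
        with self._lock:        # without the lock, two threads could read the
            self._last_id += 1  # same _last_id and hand out duplicate ids
            return self._last_id

gen = RequestIdGenerator()
ids = []
threads = [threading.Thread(target=lambda: ids.append(gen.next_id())) for _ in range(8)]
for t in threads:
    t.start()
for t in threads:
    t.join()
assert sorted(ids) == list(range(1, 9))  # every id handed out exactly once
```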





[jira] [Work logged] (BEAM-8335) Add streaming support to Interactive Beam

2019-10-15 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/BEAM-8335?focusedWorklogId=328807=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-328807
 ]

ASF GitHub Bot logged work on BEAM-8335:


Author: ASF GitHub Bot
Created on: 15/Oct/19 22:11
Start Date: 15/Oct/19 22:11
Worklog Time Spent: 10m 
  Work Description: rohdesamuel commented on pull request #9720: 
[BEAM-8335] Add initial modules for interactive streaming support
URL: https://github.com/apache/beam/pull/9720#discussion_r335200823
 
 

 ##
 File path: model/fn-execution/src/main/proto/beam_interactive_api.proto
 ##
 @@ -0,0 +1,106 @@
+/*
 
 Review comment:
   Moved to model/interactive. Also, this file can work for Batch and 
Streaming. Technically, it will only be used for Batch pipelines because the 
interactive runner will be running streaming pipelines as Batch during re-run.
 



Issue Time Tracking
---

Worklog Id: (was: 328807)
Time Spent: 2h 50m  (was: 2h 40m)

> Add streaming support to Interactive Beam
> -
>
> Key: BEAM-8335
> URL: https://issues.apache.org/jira/browse/BEAM-8335
> Project: Beam
>  Issue Type: Improvement
>  Components: runner-py-interactive
>Reporter: Sam Rohde
>Assignee: Sam Rohde
>Priority: Major
>  Time Spent: 2h 50m
>  Remaining Estimate: 0h
>
> This issue tracks the work items to introduce streaming support to the 
> Interactive Beam experience. This will allow users to:
>  * Write and run a streaming job in IPython
>  * Automatically cache records from unbounded sources
>  * Add a replay experience that replays all cached records to simulate the 
> original pipeline execution
>  * Add controls to play/pause/stop/step individual elements from the cached 
> records
>  * Add ability to inspect/visualize unbounded PCollections





[jira] [Work logged] (BEAM-8403) Race condition in request id generation of GrpcStateRequestHandler

2019-10-15 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/BEAM-8403?focusedWorklogId=328806=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-328806
 ]

ASF GitHub Bot logged work on BEAM-8403:


Author: ASF GitHub Bot
Created on: 15/Oct/19 22:08
Start Date: 15/Oct/19 22:08
Worklog Time Spent: 10m 
  Work Description: tweise commented on issue #9800: [BEAM-8403] Guard 
request id generation to prevent concurrent worker access
URL: https://github.com/apache/beam/pull/9800#issuecomment-542425967
 
 
   @aaltay ignoring the specific dependency, what is the generally recommended 
approach for such situations? Explicit locking or are there other options?
 



Issue Time Tracking
---

Worklog Id: (was: 328806)
Time Spent: 50m  (was: 40m)

> Race condition in request id generation of GrpcStateRequestHandler
> --
>
> Key: BEAM-8403
> URL: https://issues.apache.org/jira/browse/BEAM-8403
> Project: Beam
>  Issue Type: Bug
>  Components: sdk-py-harness
>Reporter: Maximilian Michels
>Assignee: Maximilian Michels
>Priority: Major
> Fix For: 2.17.0
>
>  Time Spent: 50m
>  Remaining Estimate: 0h
>
> There is a race condition in {{GrpcStateRequestHandler}} which surfaced after 
> the recent changes to process append/clear state requests asynchronously. The 
> race condition can occur if multiple Runner workers process a transform with 
> state requests with the same SDK Harness. For example, this setup occurs with 
> Flink when a TaskManager has multiple task slots and two or more of those 
> slots process the same stateful stage against an SDK Harness.
> CC [~robertwb]





[jira] [Commented] (BEAM-8172) Add a label field to FunctionSpec proto

2019-10-15 Thread Chad Dombrova (Jira)


[ 
https://issues.apache.org/jira/browse/BEAM-8172?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16952312#comment-16952312
 ] 

Chad Dombrova commented on BEAM-8172:
-

The goal of clarifying intent is a fine one, but I'm concerned that 
"debug_string" could result in developers using this field for large text 
blocks like stack traces, when the intent is a brief descriptor, like a 
human-readable ID.  

other ideas:  hint, descriptor, debug_label, debug_identifier

> Add a label field to FunctionSpec proto
> ---
>
> Key: BEAM-8172
> URL: https://issues.apache.org/jira/browse/BEAM-8172
> Project: Beam
>  Issue Type: Improvement
>  Components: beam-model
>Reporter: Chad Dombrova
>Assignee: Chad Dombrova
>Priority: Major
>
> The FunctionSpec payload is opaque outside of the native environment, which 
> can make debugging pipeline protos difficult.  It would be very useful if the 
> FunctionSpec had an optional human readable "label", for debugging and 
> presenting in UIs and error messages.
> For example, in python, if the payload is an instance of a class, we could 
> attempt to provide a string that represents the dotted path to that class, 
> "mymodule.MyClass".  In the case of coders, we could use the label to hold 
> the type hint:  "Optional[Iterable[mymodule.MyClass]]".





[jira] [Commented] (BEAM-8210) Python Integration tests: log test name

2019-10-15 Thread Kenneth Knowles (Jira)


[ 
https://issues.apache.org/jira/browse/BEAM-8210?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16952305#comment-16952305
 ] 

Kenneth Knowles commented on BEAM-8210:
---

Since currently it seems phrased as a Dataflow feature request, I added that 
component.

> Python Integration tests: log test name
> ---
>
> Key: BEAM-8210
> URL: https://issues.apache.org/jira/browse/BEAM-8210
> Project: Beam
>  Issue Type: Improvement
>  Components: runner-dataflow, sdk-py-core, testing
>Reporter: Udi Meiri
>Priority: Major
>
> When creating a job (on any runner), log the originating test so it's easier 
> to debug.
> Postcommits frequently run tens of pipelines at a time and it's getting 
> harder to tell them apart.
> By logging I mean putting it somewhere in the job proto (such as in 
> parameter, job name, etc.). Using the worker logger on startup won't work if 
> the worker fails to start.
> Ideally you should be able to see the name in the runner UI (such as Dataflow 
> cloud console).





[jira] [Commented] (BEAM-8210) Python Integration tests: log test name

2019-10-15 Thread Kenneth Knowles (Jira)


[ 
https://issues.apache.org/jira/browse/BEAM-8210?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16952304#comment-16952304
 ] 

Kenneth Knowles commented on BEAM-8210:
---

Seems nice to instrument this in the TestPipeline so any runner that hooks into 
logging will get it.

> Python Integration tests: log test name
> ---
>
> Key: BEAM-8210
> URL: https://issues.apache.org/jira/browse/BEAM-8210
> Project: Beam
>  Issue Type: Improvement
>  Components: testing
>Reporter: Udi Meiri
>Priority: Major
>
> When creating a job (on any runner), log the originating test so it's easier 
> to debug.
> Postcommits frequently run tens of pipelines at a time and it's getting 
> harder to tell them apart.
> By logging I mean putting it somewhere in the job proto (such as in a 
> parameter, the job name, etc.). Using the worker logger on startup won't work if 
> the worker fails to start.
> Ideally you should be able to see the name in the runner UI (such as Dataflow 
> cloud console).





[jira] [Updated] (BEAM-8210) Python Integration tests: log test name

2019-10-15 Thread Kenneth Knowles (Jira)


 [ 
https://issues.apache.org/jira/browse/BEAM-8210?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Kenneth Knowles updated BEAM-8210:
--
Component/s: sdk-py-core
 runner-dataflow

> Python Integration tests: log test name
> ---
>
> Key: BEAM-8210
> URL: https://issues.apache.org/jira/browse/BEAM-8210
> Project: Beam
>  Issue Type: Improvement
>  Components: runner-dataflow, sdk-py-core, testing
>Reporter: Udi Meiri
>Priority: Major
>
> When creating a job (on any runner), log the originating test so it's easier 
> to debug.
> Postcommits frequently run tens of pipelines at a time and it's getting 
> harder to tell them apart.
> By logging I mean putting it somewhere in the job proto (such as in a 
> parameter, the job name, etc.). Using the worker logger on startup won't work if 
> the worker fails to start.
> Ideally you should be able to see the name in the runner UI (such as Dataflow 
> cloud console).





[jira] [Updated] (BEAM-8210) Python Integration tests: log test name

2019-10-15 Thread Kenneth Knowles (Jira)


 [ 
https://issues.apache.org/jira/browse/BEAM-8210?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Kenneth Knowles updated BEAM-8210:
--
Status: Open  (was: Triage Needed)

> Python Integration tests: log test name
> ---
>
> Key: BEAM-8210
> URL: https://issues.apache.org/jira/browse/BEAM-8210
> Project: Beam
>  Issue Type: Improvement
>  Components: testing
>Reporter: Udi Meiri
>Priority: Major
>
> When creating a job (on any runner), log the originating test so it's easier 
> to debug.
> Postcommits frequently run tens of pipelines at a time and it's getting 
> harder to tell them apart.
> By logging I mean putting it somewhere in the job proto (such as in a 
> parameter, the job name, etc.). Using the worker logger on startup won't work if 
> the worker fails to start.
> Ideally you should be able to see the name in the runner UI (such as Dataflow 
> cloud console).





[jira] [Updated] (BEAM-8122) Py2 version of getcallargs_forhints always returns generic hint for varargs

2019-10-15 Thread Kenneth Knowles (Jira)


 [ 
https://issues.apache.org/jira/browse/BEAM-8122?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Kenneth Knowles updated BEAM-8122:
--
Status: Open  (was: Triage Needed)

> Py2 version of getcallargs_forhints always returns generic hint for varargs
> ---
>
> Key: BEAM-8122
> URL: https://issues.apache.org/jira/browse/BEAM-8122
> Project: Beam
>  Issue Type: Bug
>  Components: sdk-py-core
>Reporter: Udi Meiri
>Priority: Minor
>
> The hint for variadic positional arguments (*args) is always Tuple[Any, ...].
> Minor priority since Py2 is being phased out.





[jira] [Updated] (BEAM-8124) Clean tests after Python 2 deprecation

2019-10-15 Thread Kenneth Knowles (Jira)


 [ 
https://issues.apache.org/jira/browse/BEAM-8124?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Kenneth Knowles updated BEAM-8124:
--
Status: Open  (was: Triage Needed)

> Clean tests after Python 2 deprecation
> --
>
> Key: BEAM-8124
> URL: https://issues.apache.org/jira/browse/BEAM-8124
> Project: Beam
>  Issue Type: Task
>  Components: sdk-py-core
>Reporter: David Cavazos
>Priority: Minor
>
> Clean up or simplify tests that have special handling for Python 2 after its 
> deprecation.





[jira] [Updated] (BEAM-8177) BigQueryAvroUtils unable to convert field with record

2019-10-15 Thread Kenneth Knowles (Jira)


 [ 
https://issues.apache.org/jira/browse/BEAM-8177?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Kenneth Knowles updated BEAM-8177:
--
Status: Open  (was: Triage Needed)

> BigQueryAvroUtils unable to convert field with record 
> --
>
> Key: BEAM-8177
> URL: https://issues.apache.org/jira/browse/BEAM-8177
> Project: Beam
>  Issue Type: Bug
>  Components: io-java-gcp
>Affects Versions: 2.15.0
>Reporter: Zaka Zaidan Azminur
>Priority: Major
>
> I'm trying to create a simple test pipeline that exports BigQuery data as 
> Parquet using BigQueryAvroUtils.java from Beam's code.
> When trying to read the BigQuery data as an Avro GenericRecord, the code 
> failed with this exception:
> {code:java}
> org.apache.avro.UnresolvedUnionException: Not in union 
> ["null",{"type":"record","name":"record","namespace":"Translated Avro Schema 
> for 
> record","doc":"org.apache.beam.sdk.io.gcp.bigquery","fields":[{"name":"key_2","type":["null","string"]},{"name":"key_1","type":["null","double"]}]}]:
>  {"key_2": "asdasd", "key_1": 123123.123}
> {code}
> I have checked the Avro schema and it matches its BigQuery schema 
> counterpart.
> Then I tried to export the BigQuery table as Avro using the BigQuery console 
> and compared its schema with the one generated by BigQueryAvroUtils.java. It 
> turns out there is a difference in the Avro namespace between 
> BigQueryAvroUtils.java and the BigQuery export.
> After I patched BigQueryAvroUtils.java to make its schema match the one from 
> the BigQuery export, the exception went away.
> So, I want to confirm whether there's a problem in my implementation or 
> whether BigQuery creates a slightly different Avro schema.
> I've created sample code along with the patch and a data sample here: 
> [https://github.com/zakazai/bq-to-parquet]
>  





[jira] [Commented] (BEAM-8177) BigQueryAvroUtils unable to convert field with record

2019-10-15 Thread Kenneth Knowles (Jira)


[ 
https://issues.apache.org/jira/browse/BEAM-8177?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16952303#comment-16952303
 ] 

Kenneth Knowles commented on BEAM-8177:
---

[~chamikara] any ideas about this?

> BigQueryAvroUtils unable to convert field with record 
> --
>
> Key: BEAM-8177
> URL: https://issues.apache.org/jira/browse/BEAM-8177
> Project: Beam
>  Issue Type: Bug
>  Components: io-java-gcp
>Affects Versions: 2.15.0
>Reporter: Zaka Zaidan Azminur
>Priority: Major





[jira] [Updated] (BEAM-8177) BigQueryAvroUtils unable to convert field with record

2019-10-15 Thread Kenneth Knowles (Jira)


 [ 
https://issues.apache.org/jira/browse/BEAM-8177?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Kenneth Knowles updated BEAM-8177:
--
Priority: Major  (was: Trivial)

> BigQueryAvroUtils unable to convert field with record 
> --
>
> Key: BEAM-8177
> URL: https://issues.apache.org/jira/browse/BEAM-8177
> Project: Beam
>  Issue Type: Bug
>  Components: io-java-gcp
>Affects Versions: 2.15.0
>Reporter: Zaka Zaidan Azminur
>Priority: Major





[jira] [Updated] (BEAM-8179) No tabs in dot output

2019-10-15 Thread Kenneth Knowles (Jira)


 [ 
https://issues.apache.org/jira/browse/BEAM-8179?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Kenneth Knowles updated BEAM-8179:
--
Status: Open  (was: Triage Needed)

> No tabs in dot output
> -
>
> Key: BEAM-8179
> URL: https://issues.apache.org/jira/browse/BEAM-8179
> Project: Beam
>  Issue Type: Sub-task
>  Components: sdk-go
>Reporter: Dominic Mitchell
>Priority: Trivial
>  Time Spent: 0.5h
>  Remaining Estimate: 0h
>
> Presently the output from the dot package has tabs in it. This is 
> inconsistent, which is annoying, and also causes me check-in problems.





[jira] [Commented] (BEAM-8145) Pubsub message size limit not taking size increase from base64 encoding into account

2019-10-15 Thread Kenneth Knowles (Jira)


[ 
https://issues.apache.org/jira/browse/BEAM-8145?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16952302#comment-16952302
 ] 

Kenneth Knowles commented on BEAM-8145:
---

[~Primevenn] [~chamikara] is this fixed?

> Pubsub message size limit not taking size increase from base64 encoding into 
> account
> 
>
> Key: BEAM-8145
> URL: https://issues.apache.org/jira/browse/BEAM-8145
> Project: Beam
>  Issue Type: Bug
>  Components: io-java-gcp
>Reporter: Michael Yzvenn Wolanski
>Priority: Major
>  Time Spent: 0.5h
>  Remaining Estimate: 0h
>
> In the PubSubIO, the default max batch size is set to `10 * 1024 * 1024` 
> bytes. This, however, does not take into account the size increase from 
> base64-encoding the messages after the flush. Base64 encodes each set of three 
> bytes into four bytes.
> Therefore the 'true' size limit placed on the unencoded batch should be
> x = ((10 * 1024 * 1024) / 4) * 3 = 7864320 bytes
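The arithmetic above can be checked with a short sketch (the constant mirrors PubSubIO's `10 * 1024 * 1024` default; the helper name is ours, not a Beam API):

```python
import base64

ENCODED_LIMIT = 10 * 1024 * 1024  # PubSubIO's default max batch size, in bytes


def max_unencoded_batch_size(encoded_limit: int) -> int:
    # Base64 turns every 3 input bytes into 4 output bytes, so invert the ratio.
    return (encoded_limit // 4) * 3


limit = max_unencoded_batch_size(ENCODED_LIMIT)
print(limit)  # 7864320

# Sanity check: a payload of exactly this size encodes to the full 10 MiB limit.
assert len(base64.b64encode(b"x" * limit)) == ENCODED_LIMIT
```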





[jira] [Updated] (BEAM-8145) Pubsub message size limit not taking size increase from base64 encoding into account

2019-10-15 Thread Kenneth Knowles (Jira)


 [ 
https://issues.apache.org/jira/browse/BEAM-8145?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Kenneth Knowles updated BEAM-8145:
--
Status: Open  (was: Triage Needed)

> Pubsub message size limit not taking size increase from base64 encoding into 
> account
> 
>
> Key: BEAM-8145
> URL: https://issues.apache.org/jira/browse/BEAM-8145
> Project: Beam
>  Issue Type: Bug
>  Components: io-java-gcp
>Reporter: Michael Yzvenn Wolanski
>Priority: Major
>  Time Spent: 0.5h
>  Remaining Estimate: 0h
>
> In the PubSubIO, the default max batch size is set to `10 * 1024 * 1024` 
> bytes. This, however, does not take into account the size increase from 
> base64-encoding the messages after the flush. Base64 encodes each set of three 
> bytes into four bytes.
> Therefore the 'true' size limit placed on the unencoded batch should be
> x = ((10 * 1024 * 1024) / 4) * 3 = 7864320 bytes





[jira] [Commented] (BEAM-8172) Add a label field to FunctionSpec proto

2019-10-15 Thread Kenneth Knowles (Jira)


[ 
https://issues.apache.org/jira/browse/BEAM-8172?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16952300#comment-16952300
 ] 

Kenneth Knowles commented on BEAM-8172:
---

I'd suggest "debug_string" to make it clear that there is no semantic content.

> Add a label field to FunctionSpec proto
> ---
>
> Key: BEAM-8172
> URL: https://issues.apache.org/jira/browse/BEAM-8172
> Project: Beam
>  Issue Type: Improvement
>  Components: beam-model
>Reporter: Chad Dombrova
>Assignee: Chad Dombrova
>Priority: Major
>
> The FunctionSpec payload is opaque outside of the native environment, which 
> can make debugging pipeline protos difficult. It would be very useful if the 
> FunctionSpec had an optional human-readable "label" for debugging and 
> presenting in UIs and error messages.
> For example, in python, if the payload is an instance of a class, we could 
> attempt to provide a string that represents the dotted path to that class, 
> "mymodule.MyClass".  In the case of coders, we could use the label to hold 
> the type hint:  "Optional[Iterable[mymodule.MyClass]]".
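For the Python case described above, deriving such a label could be as simple as joining the module and qualified class name (a sketch only; `debug_label` is a hypothetical helper, not an existing Beam API):

```python
def debug_label(obj) -> str:
    """Best-effort dotted path for an object's class, e.g. 'mymodule.MyClass'."""
    cls = obj if isinstance(obj, type) else type(obj)
    return f"{cls.__module__}.{cls.__qualname__}"


class MyClass:
    pass


# For a class defined in module 'mymodule' this yields 'mymodule.MyClass';
# here the class lives in the executing module, so the prefix differs.
print(debug_label(MyClass()))
```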





[jira] [Updated] (BEAM-8153) PubSubIntegrationTest failing in post-commit

2019-10-15 Thread Kenneth Knowles (Jira)


 [ 
https://issues.apache.org/jira/browse/BEAM-8153?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Kenneth Knowles updated BEAM-8153:
--
Status: Open  (was: Triage Needed)

> PubSubIntegrationTest failing in post-commit
> 
>
> Key: BEAM-8153
> URL: https://issues.apache.org/jira/browse/BEAM-8153
> Project: Beam
>  Issue Type: Bug
>  Components: sdk-py-core, test-failures
>Reporter: Udi Meiri
>Assignee: Matthew Darwin
>Priority: Major
>
> Most likely due to: https://github.com/apache/beam/pull/9232
> {code}
> 11:44:31 
> ==
> 11:44:31 ERROR: test_streaming_with_attributes 
> (apache_beam.io.gcp.pubsub_integration_test.PubSubIntegrationTest)
> 11:44:31 
> --
> 11:44:31 Traceback (most recent call last):
> 11:44:31   File 
> "/home/jenkins/jenkins-slave/workspace/beam_PostCommit_Python37_PR/src/sdks/python/apache_beam/io/gcp/pubsub_integration_test.py",
>  line 199, in test_streaming_with_attributes
> 11:44:31 self._test_streaming(with_attributes=True)
> 11:44:31   File 
> "/home/jenkins/jenkins-slave/workspace/beam_PostCommit_Python37_PR/src/sdks/python/apache_beam/io/gcp/pubsub_integration_test.py",
>  line 191, in _test_streaming
> 11:44:31 timestamp_attribute=self.TIMESTAMP_ATTRIBUTE)
> 11:44:31   File 
> "/home/jenkins/jenkins-slave/workspace/beam_PostCommit_Python37_PR/src/sdks/python/apache_beam/io/gcp/pubsub_it_pipeline.py",
>  line 91, in run_pipeline
> 11:44:31 result = p.run()
> 11:44:31   File 
> "/home/jenkins/jenkins-slave/workspace/beam_PostCommit_Python37_PR/src/sdks/python/apache_beam/pipeline.py",
>  line 420, in run
> 11:44:31 return self.runner.run_pipeline(self, self._options)
> 11:44:31   File 
> "/home/jenkins/jenkins-slave/workspace/beam_PostCommit_Python37_PR/src/sdks/python/apache_beam/runners/direct/test_direct_runner.py",
>  line 51, in run_pipeline
> 11:44:31 hc_assert_that(self.result, pickler.loads(on_success_matcher))
> 11:44:31   File 
> "/home/jenkins/jenkins-slave/workspace/beam_PostCommit_Python37_PR/src/build/gradleenv/1398941891/lib/python3.7/site-packages/hamcrest/core/assert_that.py",
>  line 43, in assert_that
> 11:44:31 _assert_match(actual=arg1, matcher=arg2, reason=arg3)
> 11:44:31   File 
> "/home/jenkins/jenkins-slave/workspace/beam_PostCommit_Python37_PR/src/build/gradleenv/1398941891/lib/python3.7/site-packages/hamcrest/core/assert_that.py",
>  line 49, in _assert_match
> 11:44:31 if not matcher.matches(actual):
> 11:44:31   File 
> "/home/jenkins/jenkins-slave/workspace/beam_PostCommit_Python37_PR/src/build/gradleenv/1398941891/lib/python3.7/site-packages/hamcrest/core/core/allof.py",
>  line 16, in matches
> 11:44:31 if not matcher.matches(item):
> 11:44:31   File 
> "/home/jenkins/jenkins-slave/workspace/beam_PostCommit_Python37_PR/src/build/gradleenv/1398941891/lib/python3.7/site-packages/hamcrest/core/base_matcher.py",
>  line 28, in matches
> 11:44:31 match_result = self._matches(item)
> 11:44:31   File 
> "/home/jenkins/jenkins-slave/workspace/beam_PostCommit_Python37_PR/src/sdks/python/apache_beam/io/gcp/tests/pubsub_matcher.py",
>  line 91, in _matches
> 11:44:31 return Counter(self.messages) == Counter(self.expected_msg)
> 11:44:31   File 
> "/home/jenkins/jenkins-slave/workspace/beam_PostCommit_Python37_PR/src/build/gradleenv/1398941891/lib/python3.7/collections/__init__.py",
>  line 566, in __init__
> 11:44:31 self.update(*args, **kwds)
> 11:44:31   File 
> "/home/jenkins/jenkins-slave/workspace/beam_PostCommit_Python37_PR/src/build/gradleenv/1398941891/lib/python3.7/collections/__init__.py",
>  line 653, in update
> 11:44:31 _count_elements(self, iterable)
> 11:44:31   File 
> "/home/jenkins/jenkins-slave/workspace/beam_PostCommit_Python37_PR/src/sdks/python/apache_beam/io/gcp/pubsub.py",
>  line 83, in __hash__
> 11:44:31 self.message_id, self.publish_time.seconds,
> 11:44:31 AttributeError: 'NoneType' object has no attribute 'seconds'
> {code}
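The traceback shows `__hash__` dereferencing `publish_time.seconds` while `publish_time` is None. A guarded version would look roughly like this (a simplified stand-in class, not the real `PubsubMessage` in full):

```python
from collections import Counter


class PubsubMessage:
    """Simplified stand-in for apache_beam.io.gcp.pubsub.PubsubMessage."""

    def __init__(self, data, attributes=None, message_id=None, publish_time=None):
        self.data = data
        self.attributes = attributes or {}
        self.message_id = message_id
        self.publish_time = publish_time  # None until the message is published

    def __hash__(self):
        # Guard the optional timestamp instead of dereferencing .seconds blindly.
        ts = ((self.publish_time.seconds, self.publish_time.nanos)
              if self.publish_time else None)
        return hash((self.data, tuple(sorted(self.attributes.items())),
                     self.message_id, ts))


# Messages without a publish time can now be counted without an AttributeError.
Counter([PubsubMessage(b"payload"), PubsubMessage(b"payload")])
```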





[jira] [Commented] (BEAM-8153) PubSubIntegrationTest failing in post-commit

2019-10-15 Thread Kenneth Knowles (Jira)


[ 
https://issues.apache.org/jira/browse/BEAM-8153?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16952301#comment-16952301
 ] 

Kenneth Knowles commented on BEAM-8153:
---

Do we need to keep the bug open?

> PubSubIntegrationTest failing in post-commit
> 
>
> Key: BEAM-8153
> URL: https://issues.apache.org/jira/browse/BEAM-8153
> Project: Beam
>  Issue Type: Bug
>  Components: sdk-py-core, test-failures
>Reporter: Udi Meiri
>Assignee: Matthew Darwin
>Priority: Major





[jira] [Updated] (BEAM-8172) Add a label field to FunctionSpec proto

2019-10-15 Thread Kenneth Knowles (Jira)


 [ 
https://issues.apache.org/jira/browse/BEAM-8172?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Kenneth Knowles updated BEAM-8172:
--
Status: Open  (was: Triage Needed)

> Add a label field to FunctionSpec proto
> ---
>
> Key: BEAM-8172
> URL: https://issues.apache.org/jira/browse/BEAM-8172
> Project: Beam
>  Issue Type: Improvement
>  Components: beam-model
>Reporter: Chad Dombrova
>Assignee: Chad Dombrova
>Priority: Major
>
> The FunctionSpec payload is opaque outside of the native environment, which 
> can make debugging pipeline protos difficult. It would be very useful if the 
> FunctionSpec had an optional human-readable "label" for debugging and 
> presenting in UIs and error messages.
> For example, in python, if the payload is an instance of a class, we could 
> attempt to provide a string that represents the dotted path to that class, 
> "mymodule.MyClass".  In the case of coders, we could use the label to hold 
> the type hint:  "Optional[Iterable[mymodule.MyClass]]".





[jira] [Commented] (BEAM-8173) Filesystems.matchSingleFileSpec throws away the actual failure exception

2019-10-15 Thread Kenneth Knowles (Jira)


[ 
https://issues.apache.org/jira/browse/BEAM-8173?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16952299#comment-16952299
 ] 

Kenneth Knowles commented on BEAM-8173:
---

[~chamikara] FYI filed this as a starter bug in code you may be interested in

> Filesystems.matchSingleFileSpec throws away the actual failure exception
> 
>
> Key: BEAM-8173
> URL: https://issues.apache.org/jira/browse/BEAM-8173
> Project: Beam
>  Issue Type: Bug
>  Components: sdk-java-core
>Reporter: Kenneth Knowles
>Priority: Major
>  Labels: easyfix, starter
>
> At [1] the result of a non-OK match causes an exception to be thrown. But the 
> exception does not include the actual cause of the failure, so it cannot be 
> efficiently debugged. It appears that the design of MatchResult is that it 
> should call metadata() without bothering to check status, so that the 
> underlying exception can be re-raised and caught and put in the chain of 
> causes, as it should be.
> [1] 
> https://github.com/apache/beam/blob/c2f0d282337f3ae0196a7717712396a5a41fdde1/sdks/java/core/src/main/java/org/apache/beam/sdk/io/FileSystems.java#L190
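The requested fix is standard exception chaining. In Python terms the pattern looks like this (illustrative only; the actual code is Java, where the equivalent is passing the underlying exception as the constructor's cause):

```python
def match_single_file_spec(spec, matcher):
    """Sketch: surface the underlying failure instead of discarding it."""
    try:
        return matcher(spec)  # 'matcher' stands in for the filesystem match call
    except Exception as cause:
        # 'from cause' keeps the original exception in __cause__, so the real
        # reason for the failed match survives into logs and debuggers.
        raise IOError(f"Unable to match file spec {spec!r}") from cause


def denied(spec):
    raise PermissionError("access denied")


try:
    match_single_file_spec("gs://bucket/missing*", denied)
except IOError as err:
    print(type(err.__cause__).__name__)  # PermissionError
```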





[jira] [Updated] (BEAM-8173) Filesystems.matchSingleFileSpec throws away the actual failure exception

2019-10-15 Thread Kenneth Knowles (Jira)


 [ 
https://issues.apache.org/jira/browse/BEAM-8173?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Kenneth Knowles updated BEAM-8173:
--
Status: Open  (was: Triage Needed)

> Filesystems.matchSingleFileSpec throws away the actual failure exception
> 
>
> Key: BEAM-8173
> URL: https://issues.apache.org/jira/browse/BEAM-8173
> Project: Beam
>  Issue Type: Bug
>  Components: sdk-java-core
>Reporter: Kenneth Knowles
>Priority: Major
>  Labels: easyfix, starter
>
> At [1] the result of a non-OK match causes an exception to be thrown. But the 
> exception does not include the actual cause of the failure, so it cannot be 
> efficiently debugged. It appears that the design of MatchResult is that it 
> should call metadata() without bothering to check status, so that the 
> underlying exception can be re-raised and caught and put in the chain of 
> causes, as it should be.
> [1] 
> https://github.com/apache/beam/blob/c2f0d282337f3ae0196a7717712396a5a41fdde1/sdks/java/core/src/main/java/org/apache/beam/sdk/io/FileSystems.java#L190





[jira] [Commented] (BEAM-8211) Set a default wait_until_finish_duration?

2019-10-15 Thread Kenneth Knowles (Jira)


[ 
https://issues.apache.org/jira/browse/BEAM-8211?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16952298#comment-16952298
 ] 

Kenneth Knowles commented on BEAM-8211:
---

Tagged Python here, but sounds like it could be useful more broadly.

> Set a default wait_until_finish_duration?
> -
>
> Key: BEAM-8211
> URL: https://issues.apache.org/jira/browse/BEAM-8211
> Project: Beam
>  Issue Type: Improvement
>  Components: sdk-py-core, testing
>Reporter: Udi Meiri
>Priority: Major
>
> This would benefit python ITs.
> If this value is not set, tests with workers failing to start up might take a 
> full hour, causing post-commits to time out.





[jira] [Updated] (BEAM-8360) Failure in https://builds.apache.org/job/beam_PreCommit_PythonLint_Commit/58/console

2019-10-15 Thread Kenneth Knowles (Jira)


 [ 
https://issues.apache.org/jira/browse/BEAM-8360?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Kenneth Knowles updated BEAM-8360:
--
Status: Open  (was: Triage Needed)

> Failure in 
> https://builds.apache.org/job/beam_PreCommit_PythonLint_Commit/58/console
> 
>
> Key: BEAM-8360
> URL: https://issues.apache.org/jira/browse/BEAM-8360
> Project: Beam
>  Issue Type: Bug
>  Components: build-system, sdk-py-core, test-failures
>Reporter: Kenneth Knowles
>Priority: Critical
>
> https://builds.apache.org/job/beam_PreCommit_PythonLint_Commit/58/console
> {code}
> 10:09:10   ERROR: Could not find a version that satisfies the requirement 
> enum34>=1.0.4; python_version < "3.4" (from 
> grpcio>=1.3.5->grpcio-tools==1.3.5) (from versions: none)
> 10:09:10 ERROR: No matching distribution found for enum34>=1.0.4; 
> python_version < "3.4" (from grpcio>=1.3.5->grpcio-tools==1.3.5)
> {code}





[jira] [Updated] (BEAM-8211) Set a default wait_until_finish_duration?

2019-10-15 Thread Kenneth Knowles (Jira)


 [ 
https://issues.apache.org/jira/browse/BEAM-8211?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Kenneth Knowles updated BEAM-8211:
--
Component/s: sdk-py-core

> Set a default wait_until_finish_duration?
> -
>
> Key: BEAM-8211
> URL: https://issues.apache.org/jira/browse/BEAM-8211
> Project: Beam
>  Issue Type: Improvement
>  Components: sdk-py-core, testing
>Reporter: Udi Meiri
>Priority: Major
>
> This would benefit python ITs.
> If this value is not set, tests with workers failing to start up might take a 
> full hour, causing post-commits to time out.





[jira] [Updated] (BEAM-8216) GCS IO fails with uninformative 'Broken pipe' errors while attempting to write to a GCS bucket without proper permissions.

2019-10-15 Thread Kenneth Knowles (Jira)


 [ 
https://issues.apache.org/jira/browse/BEAM-8216?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Kenneth Knowles updated BEAM-8216:
--
Status: Open  (was: Triage Needed)

> GCS IO fails with uninformative 'Broken pipe' errors while attempting to 
> write to a GCS bucket without proper permissions.
> --
>
> Key: BEAM-8216
> URL: https://issues.apache.org/jira/browse/BEAM-8216
> Project: Beam
>  Issue Type: Bug
>  Components: io-py-gcp
>Reporter: Valentyn Tymofieiev
>Assignee: Chamikara Madhusanka Jayalath
>Priority: Major
>
> Observed while executing a wordcount IT pipeline:
> {noformat}
>  ./gradlew :sdks:python:test-suites:dataflow:py36:integrationTest \
> -Dtests=apache_beam.examples.wordcount_it_test:WordCountIT.test_wordcount_it \
> -Dattr=IT 
> -DpipelineOptions="--project=some_project_different_from_apache_beam_testing \
> --staging_location=gs://some_bucket/ \
> --temp_location=gs://some_bucket/ \
> --input=gs://apache-beam-samples/input_small_files/ascii_sort_1MB_input.* 
> \
> --output=gs://temp-storage-for-end-to-end-tests/py-it-cloud/output  \
> --expect_checksum=ea0ca2e5ee4ea5f218790f28d0b9fe7d09d8d710 \
> --num_workers=10 \
> --autoscaling_algorithm=NONE \
> --runner=TestDataflowRunner \
> --sdk_location=/full/path/to/beam/sdks/python/dist/apache-beam-2.16.0.dev0.tar.gz"
>  \
> --info  
> {noformat}
> gs://temp-storage-for-end-to-end-tests/py-it-cloud/output lives in a 
> different project than the one running the pipeline.
> This caused a number of 'Broken pipe' errors. Console logs:
> {noformat}
> root: INFO: 2019-09-11T19:06:23.055Z: JOB_MESSAGE_BASIC: Finished operation 
> read/Read+split+pair_with_one+group/Reify+group/Write
> root: INFO: 2019-09-11T19:06:23.157Z: JOB_MESSAGE_BASIC: Executing operation 
> group/Close
> root: INFO: 2019-09-11T19:06:23.208Z: JOB_MESSAGE_BASIC: Finished operation 
> group/Close
> root: INFO: 2019-09-11T19:06:23.263Z: JOB_MESSAGE_BASIC: Executing operation 
> group/Read+group/GroupByWindow+count+format+write/Write/WriteImpl/WriteBundles/WriteBundles+write/Write/WriteImpl/Pair+write/Write/WriteImpl/WindowInto(WindowIntoFn)+write/Write/WriteImpl/GroupByKey/Reify+write/Write/WriteImpl/GroupByKey/Write
> root: INFO: 2019-09-11T19:06:25.571Z: JOB_MESSAGE_ERROR: Traceback (most 
> recent call last):
>   File "apache_beam/runners/common.py", line 782, in 
> apache_beam.runners.common.DoFnRunner.process
>   File "apache_beam/runners/common.py", line 594, in 
> apache_beam.runners.common.PerWindowInvoker.invoke_process
>   File "apache_beam/runners/common.py", line 666, in 
> apache_beam.runners.common.PerWindowInvoker._invoke_process_per_window
>   File "/usr/local/lib/python3.6/site-packages/apache_beam/io/iobase.py", 
> line 1042, in process
> self.writer.write(element)
>   File 
> "/usr/local/lib/python3.6/site-packages/apache_beam/io/filebasedsink.py", 
> line 393, in write
> self.sink.write_record(self.temp_handle, value)
>   File 
> "/usr/local/lib/python3.6/site-packages/apache_beam/io/filebasedsink.py", 
> line 137, in write_record
> self.write_encoded_record(file_handle, self.coder.encode(value))
>   File "/usr/local/lib/python3.6/site-packages/apache_beam/io/textio.py", 
> line 407, in write_encoded_record
> file_handle.write(encoded_value)
>   File 
> "/usr/local/lib/python3.6/site-packages/apache_beam/io/filesystemio.py", line 
> 202, in write
> self._uploader.put(b)
>   File "/usr/local/lib/python3.6/site-packages/apache_beam/io/gcp/gcsio.py", 
> line 594, in put
> self._conn.send_bytes(data.tobytes())
>   File "/usr/local/lib/python3.6/multiprocessing/connection.py", line 200, in 
> send_bytes
> self._send_bytes(m[offset:offset + size])
>   File "/usr/local/lib/python3.6/multiprocessing/connection.py", line 397, in 
> _send_bytes
> self._send(header)
>   File "/usr/local/lib/python3.6/multiprocessing/connection.py", line 368, in 
> _send
> n = write(self._handle, buf)
> BrokenPipeError: [Errno 32] Broken pipe
> During handling of the above exception, another exception occurred:
> Traceback (most recent call last):
>   File 
> "/usr/local/lib/python3.6/site-packages/dataflow_worker/batchworker.py", line 
> 649, in do_work
> ...
> root: INFO: 2019-09-11T19:06:33.027Z: JOB_MESSAGE_DEBUG: Executing failure 
> step failure25
> root: INFO: 2019-09-11T19:06:33.066Z: JOB_MESSAGE_ERROR: Workflow failed. 
> Causes: 
> S08:group/Read+group/GroupByWindow+count+format+write/Write/WriteImpl/WriteBundles/WriteBundles+write/Write/WriteImpl/Pair+write/Write/WriteImpl/WindowInto(WindowIntoFn)+write/Write/WriteImpl/GroupByKey/Reify+write/Write/WriteImpl/GroupByKey/Write
>  failed., The job failed because a work item has failed 4 times. Look in 
> previous log 
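The tail of the traceback above is `multiprocessing.Connection._send` hitting a 
pipe whose reader has gone away. A minimal sketch (not Beam or GCS code, just 
the failure mode on a Unix pipe) reproduces the same `[Errno 32]`:

```python
import multiprocessing

# Create a one-way pipe like the one gcsio's uploader uses to feed data to a
# child process, then close the read end to simulate that process dying.
recv_end, send_end = multiprocessing.Pipe(duplex=False)
recv_end.close()

caught_errno = None
try:
    send_end.send_bytes(b"payload")
except BrokenPipeError as err:
    caught_errno = err.errno  # 32 == EPIPE, matching the log above

print("caught errno:", caught_errno)
```

In the pipeline above the reader dies because the upload to the cross-project 
bucket fails, and the write side then surfaces as the BrokenPipeError seen in 
the worker logs.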

[jira] [Updated] (BEAM-8211) Set a default wait_until_finish_duration?

2019-10-15 Thread Kenneth Knowles (Jira)


 [ 
https://issues.apache.org/jira/browse/BEAM-8211?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Kenneth Knowles updated BEAM-8211:
--
Status: Open  (was: Triage Needed)

> Set a default wait_until_finish_duration?
> -
>
> Key: BEAM-8211
> URL: https://issues.apache.org/jira/browse/BEAM-8211
> Project: Beam
>  Issue Type: Improvement
>  Components: sdk-py-core, testing
>Reporter: Udi Meiri
>Priority: Major
>
> This would benefit Python ITs.
> If this value is not set, tests with workers failing to start up might take a 
> full hour, causing post-commits to time out.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)
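A default could be wired in along these lines. This is an illustrative sketch 
only (the helper name and the 0.5-second default are invented for the example, 
not Beam's API), showing the behavior a capped wait gives over an unbounded one:

```python
import concurrent.futures
import time

DEFAULT_WAIT_SECS = 0.5  # illustrative default, not a value chosen by Beam

def wait_until_finish(run_job, duration_secs=DEFAULT_WAIT_SECS):
    """Run run_job() and wait at most duration_secs for its terminal state.

    Returns the job's result, or 'TIMED_OUT' instead of blocking for the
    full post-commit hour when workers never start.
    """
    pool = concurrent.futures.ThreadPoolExecutor(max_workers=1)
    future = pool.submit(run_job)
    try:
        return future.result(timeout=duration_secs)
    except concurrent.futures.TimeoutError:
        return "TIMED_OUT"
    finally:
        pool.shutdown(wait=False)

print(wait_until_finish(lambda: "DONE"))                        # finishes in time
print(wait_until_finish(lambda: time.sleep(1) or "DONE", 0.1))  # capped
```

With no default, `duration_secs` would effectively be infinite, which is the 
hour-long hang the issue describes.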


[jira] [Resolved] (BEAM-8360) Failure in https://builds.apache.org/job/beam_PreCommit_PythonLint_Commit/58/console

2019-10-15 Thread Kenneth Knowles (Jira)


 [ 
https://issues.apache.org/jira/browse/BEAM-8360?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Kenneth Knowles resolved BEAM-8360.
---
Fix Version/s: Not applicable
   Resolution: Fixed

> Failure in 
> https://builds.apache.org/job/beam_PreCommit_PythonLint_Commit/58/console
> 
>
> Key: BEAM-8360
> URL: https://issues.apache.org/jira/browse/BEAM-8360
> Project: Beam
>  Issue Type: Bug
>  Components: build-system, sdk-py-core, test-failures
>Reporter: Kenneth Knowles
>Priority: Critical
> Fix For: Not applicable
>
>
> https://builds.apache.org/job/beam_PreCommit_PythonLint_Commit/58/console
> {code}
> 10:09:10   ERROR: Could not find a version that satisfies the requirement 
> enum34>=1.0.4; python_version < "3.4" (from 
> grpcio>=1.3.5->grpcio-tools==1.3.5) (from versions: none)
> 10:09:10 ERROR: No matching distribution found for enum34>=1.0.4; 
> python_version < "3.4" (from grpcio>=1.3.5->grpcio-tools==1.3.5)
> {code}



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Work logged] (BEAM-8403) Race condition in request id generation of GrpcStateRequestHandler

2019-10-15 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/BEAM-8403?focusedWorklogId=328793&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-328793
 ]

ASF GitHub Bot logged work on BEAM-8403:


Author: ASF GitHub Bot
Created on: 15/Oct/19 21:17
Start Date: 15/Oct/19 21:17
Worklog Time Spent: 10m 
  Work Description: aaltay commented on issue #9800: [BEAM-8403] Guard 
request id generation to prevent concurrent worker access
URL: https://github.com/apache/beam/pull/9800#issuecomment-542409016
 
 
   Change LGTM.
   
   I would suggest against using the atomiclong dependency; it is stale and 
marked by its owner as "do not use".
   
   Precommit failed for an unrelated issue, related to BEAM-8194 and PyPI being 
flaky. That will be addressed separately.
 

This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 328793)
Time Spent: 40m  (was: 0.5h)

> Race condition in request id generation of GrpcStateRequestHandler
> --
>
> Key: BEAM-8403
> URL: https://issues.apache.org/jira/browse/BEAM-8403
> Project: Beam
>  Issue Type: Bug
>  Components: sdk-py-harness
>Reporter: Maximilian Michels
>Assignee: Maximilian Michels
>Priority: Major
> Fix For: 2.17.0
>
>  Time Spent: 40m
>  Remaining Estimate: 0h
>
> There is a race condition in {{GrpcStateRequestHandler}} which surfaced after 
> the recent changes to process append/clear state request asynchronously. The 
> race condition can occur if multiple Runner workers process a transform with 
> state requests with the same SDK Harness. For example, this setup occurs with 
> Flink when a TaskManager has multiple task slots and two or more of those 
> slots process the same stateful stage against an SDK Harness.
> CC [~robertwb]



--
This message was sent by Atlassian Jira
(v8.3.4#803005)
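The race described above is the classic unsynchronized-counter pattern. A 
minimal sketch of the fix (not the actual GrpcStateRequestHandler code) guards 
id generation with a lock so concurrent workers never receive a duplicate id:

```python
import threading

class RequestIdGenerator:
    """Hand out monotonically increasing ids to concurrently running workers."""

    def __init__(self):
        self._lock = threading.Lock()
        self._last_id = 0

    def next_id(self):
        # A bare `self._last_id += 1` is a read-modify-write and can race;
        # the lock makes the increment-and-read atomic.
        with self._lock:
            self._last_id += 1
            return self._last_id

gen = RequestIdGenerator()
ids = []

def worker():
    for _ in range(1000):
        ids.append(gen.next_id())

threads = [threading.Thread(target=worker) for _ in range(4)]
for t in threads:
    t.start()
for t in threads:
    t.join()

print(len(set(ids)))  # 4000: no id was handed out twice
```

Without the lock, two task slots hitting the handler at once can observe the 
same counter value, which is exactly the duplicate-request-id symptom the 
Flink multi-slot setup exposed.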


[jira] [Commented] (BEAM-8166) Support Graceful shutdown of worker harness.

2019-10-15 Thread Kenneth Knowles (Jira)


[ 
https://issues.apache.org/jira/browse/BEAM-8166?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16952296#comment-16952296
 ] 

Kenneth Knowles commented on BEAM-8166:
---

Also, I'm confused about the distinction between the worker harness, the SDK 
worker harness, and the runner side.

> Support Graceful shutdown of worker harness.
> 
>
> Key: BEAM-8166
> URL: https://issues.apache.org/jira/browse/BEAM-8166
> Project: Beam
>  Issue Type: Improvement
>  Components: runner-core, sdk-go
>Reporter: Robert Burke
>Priority: Minor
>  Time Spent: 50m
>  Remaining Estimate: 0h
>
> Ideally there should be a clear Shutdown control RPC that a runner can send 
> to a worker harness to trigger an orderly shutdown.
> Absent that, errors on the runner side shouldn't manifest as SDK worker 
> harness errors. SDKs should log, and gracefully shut down on gRPC errors.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)
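Pending a dedicated Shutdown RPC, the orderly path the issue asks for looks 
roughly like the following. This is a generic sketch (not the Go SDK harness's 
actual code): trap the termination signal, stop taking work, drain, and exit 
cleanly instead of surfacing the dropped connection as a harness error.

```python
import os
import signal
import threading

shutting_down = threading.Event()

def handle_term(signum, frame):
    # Orderly shutdown: stop accepting new bundles, let in-flight work
    # drain, then exit -- rather than crashing on the next RPC error.
    shutting_down.set()

signal.signal(signal.SIGTERM, handle_term)

# Simulate the runner asking the harness to stop.
os.kill(os.getpid(), signal.SIGTERM)
print("shutting down:", shutting_down.is_set())
```

A worker's main loop would poll (or wait on) the event between bundles, so a 
runner-side teardown produces a log line and a clean exit instead of a gRPC 
stack trace.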


[jira] [Updated] (BEAM-8360) Failure in https://builds.apache.org/job/beam_PreCommit_PythonLint_Commit/58/console

2019-10-15 Thread Kenneth Knowles (Jira)


 [ 
https://issues.apache.org/jira/browse/BEAM-8360?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Kenneth Knowles updated BEAM-8360:
--
Labels:   (was: currently-failing)

> Failure in 
> https://builds.apache.org/job/beam_PreCommit_PythonLint_Commit/58/console
> 
>
> Key: BEAM-8360
> URL: https://issues.apache.org/jira/browse/BEAM-8360
> Project: Beam
>  Issue Type: Bug
>  Components: build-system, sdk-py-core, test-failures
>Reporter: Kenneth Knowles
>Priority: Critical
>
> https://builds.apache.org/job/beam_PreCommit_PythonLint_Commit/58/console
> {code}
> 10:09:10   ERROR: Could not find a version that satisfies the requirement 
> enum34>=1.0.4; python_version < "3.4" (from 
> grpcio>=1.3.5->grpcio-tools==1.3.5) (from versions: none)
> 10:09:10 ERROR: No matching distribution found for enum34>=1.0.4; 
> python_version < "3.4" (from grpcio>=1.3.5->grpcio-tools==1.3.5)
> {code}



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Updated] (BEAM-8140) Python API: PTransform should be immutable

2019-10-15 Thread Kenneth Knowles (Jira)


 [ 
https://issues.apache.org/jira/browse/BEAM-8140?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Kenneth Knowles updated BEAM-8140:
--
Issue Type: Bug  (was: Improvement)

> Python API: PTransform should be immutable
> --
>
> Key: BEAM-8140
> URL: https://issues.apache.org/jira/browse/BEAM-8140
> Project: Beam
>  Issue Type: Bug
>  Components: sdk-py-core
>Reporter: Chris Suchandk
>Assignee: Robert Bradshaw
>Priority: Major
>
> While the Java API seems fine, the Python API is (at least) counterintuitive.
> Let's see the following example:
> {code:python}
> p1 = beam.Pipeline()
> p2 = beam.Pipeline()
> node = 'ReadTrainData' >> beam.io.ReadFromText("/tmp/aaa.txt")
> p1 | node 
> p2 | node  # fails here {code}
> The code above will fail because the _node_ somehow remembers that it was 
> already attached to _p1_. In fact, unlike in Java, the | (apply) method is 
> defined on the _PTransform_.
> If anything, only the pipeline object should be mutable here.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Updated] (BEAM-8140) Python API: PTransform should be immutable

2019-10-15 Thread Kenneth Knowles (Jira)


 [ 
https://issues.apache.org/jira/browse/BEAM-8140?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Kenneth Knowles updated BEAM-8140:
--
Status: Open  (was: Triage Needed)

> Python API: PTransform should be immutable
> --
>
> Key: BEAM-8140
> URL: https://issues.apache.org/jira/browse/BEAM-8140
> Project: Beam
>  Issue Type: Improvement
>  Components: sdk-py-core
>Reporter: Chris Suchandk
>Assignee: Robert Bradshaw
>Priority: Major
>
> While the Java API seems fine, the Python API is (at least) counterintuitive.
> Let's see the following example:
> {code:python}
> p1 = beam.Pipeline()
> p2 = beam.Pipeline()
> node = 'ReadTrainData' >> beam.io.ReadFromText("/tmp/aaa.txt")
> p1 | node 
> p2 | node  # fails here {code}
> The code above will fail because the _node_ somehow remembers that it was 
> already attached to _p1_. In fact, unlike in Java, the | (apply) method is 
> defined on the _PTransform_.
> If anything, only the pipeline object should be mutable here.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Updated] (BEAM-8140) Python API: PTransform should be immutable and reusable

2019-10-15 Thread Kenneth Knowles (Jira)


 [ 
https://issues.apache.org/jira/browse/BEAM-8140?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Kenneth Knowles updated BEAM-8140:
--
Summary: Python API: PTransform should be immutable and reusable  (was: 
Python API: PTransform should be immutable)

> Python API: PTransform should be immutable and reusable
> ---
>
> Key: BEAM-8140
> URL: https://issues.apache.org/jira/browse/BEAM-8140
> Project: Beam
>  Issue Type: Bug
>  Components: sdk-py-core
>Reporter: Chris Suchandk
>Assignee: Robert Bradshaw
>Priority: Major
>
> While the Java API seems fine, the Python API is (at least) counterintuitive.
> Let's see the following example:
> {code:python}
> p1 = beam.Pipeline()
> p2 = beam.Pipeline()
> node = 'ReadTrainData' >> beam.io.ReadFromText("/tmp/aaa.txt")
> p1 | node 
> p2 | node  # fails here {code}
> The code above will fail because the _node_ somehow remembers that it was 
> already attached to _p1_. In fact, unlike in Java, the | (apply) method is 
> defined on the _PTransform_.
> If anything, only the pipeline object should be mutable here.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)
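The failure mode can be modeled without Beam at all. The sketch below is a toy 
stand-in (not Beam's PTransform) showing why a transform that records its 
pipeline on first apply cannot be reused, plus the per-pipeline-factory 
workaround:

```python
class Pipeline:
    pass

class Transform:
    """Toy stand-in for a transform whose apply mutates the transform itself."""

    def __init__(self, label):
        self.label = label
        self.pipeline = None  # mutated on first apply -- the bug pattern

    def __ror__(self, pipeline):
        # `p | node` dispatches here because Pipeline defines no __or__.
        if self.pipeline is not None and self.pipeline is not pipeline:
            raise RuntimeError(f"{self.label} already attached to a pipeline")
        self.pipeline = pipeline
        return pipeline

p1, p2 = Pipeline(), Pipeline()
node = Transform("ReadTrainData")
p1 | node  # works

try:
    p2 | node  # fails: node remembers p1
except RuntimeError as err:
    print(err)

# Workaround until transforms are immutable: build a fresh instance per
# pipeline instead of sharing one object.
make_node = lambda: Transform("ReadTrainData")
p1 | make_node()
p2 | make_node()  # fine
```

An immutable PTransform would keep all per-application state on the pipeline 
(or on the application result), which is what the issue proposes.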


[jira] [Updated] (BEAM-8395) JavaBeamZetaSQL PreCommit is flaky

2019-10-15 Thread Kenneth Knowles (Jira)


 [ 
https://issues.apache.org/jira/browse/BEAM-8395?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Kenneth Knowles updated BEAM-8395:
--
Status: Open  (was: Triage Needed)

> JavaBeamZetaSQL PreCommit is flaky
> --
>
> Key: BEAM-8395
> URL: https://issues.apache.org/jira/browse/BEAM-8395
> Project: Beam
>  Issue Type: Bug
>  Components: dsl-sql-zetasql
>Affects Versions: 2.15.0
>Reporter: Kirill Kozlov
>Priority: Major
>  Labels: flake
>
> Occasionally fails on task: *Task :sdks:java:extensions:sql:zetasql:test*
> {code:java}
> 10:28:56 org.apache.beam.sdk.extensions.sql.zetasql.ZetaSQLDialectSpecTest > 
> testExceptAll FAILED
> 10:28:56 java.lang.NoSuchMethodError at ZetaSQLDialectSpecTest.java:2553
> 10:28:56 org.apache.beam.sdk.extensions.sql.zetasql.ZetaSQLDialectSpecTest > 
> testZetaSQLStructFieldAccessInTumble FAILED
> 10:28:56 java.lang.NoClassDefFoundError at ZetaSQLDialectSpecTest.java:1272
> ... More of java.lang.NoClassDefFoundError{code}
> Jenkins that failed: 
> [https://builds.apache.org/job/beam_PreCommit_JavaBeamZetaSQL_Phrase/8/]
> Jenkins that passed: 
> [https://builds.apache.org/job/beam_PreCommit_JavaBeamZetaSQL_Phrase/9/]
>  
> Stack trace:
> {code:java}
> java.lang.NoSuchMethodError: 
> com.google.protobuf.Descriptors$FileDescriptor.internalBuildGeneratedFileFrom([Ljava/lang/String;[Lcom/google/protobuf/Descriptors$FileDescriptor;)Lcom/google/protobuf/Descriptors$FileDescriptor;
> at 
> com.google.zetasql.functions.ZetaSQLDateTime.<clinit>(ZetaSQLDateTime.java:508)
> at 
> com.google.zetasql.functions.ZetaSQLDateTime$DateTimestampPart.getDescriptor(ZetaSQLDateTime.java:460)
> at 
> com.google.zetasql.SimpleCatalog.processGetBuiltinFunctionsResponse(SimpleCatalog.java:377)
> at 
> com.google.zetasql.SimpleCatalog.addZetaSQLFunctions(SimpleCatalog.java:365)
> at 
> org.apache.beam.sdk.extensions.sql.zetasql.SqlAnalyzer.addBuiltinFunctionsToCatalog(SqlAnalyzer.java:152)
> at 
> org.apache.beam.sdk.extensions.sql.zetasql.SqlAnalyzer.createPopulatedCatalog(SqlAnalyzer.java:136)
> at 
> org.apache.beam.sdk.extensions.sql.zetasql.SqlAnalyzer.analyze(SqlAnalyzer.java:90)
> at 
> org.apache.beam.sdk.extensions.sql.zetasql.SqlAnalyzer$Builder.analyze(SqlAnalyzer.java:281)
> at 
> org.apache.beam.sdk.extensions.sql.zetasql.ZetaSQLPlannerImpl.rel(ZetaSQLPlannerImpl.java:136)
> at 
> org.apache.beam.sdk.extensions.sql.zetasql.ZetaSQLQueryPlanner.parseQuery(ZetaSQLQueryPlanner.java:92)
> at 
> org.apache.beam.sdk.extensions.sql.zetasql.ZetaSQLQueryPlanner.parseQuery(ZetaSQLQueryPlanner.java:87)
> at 
> org.apache.beam.sdk.extensions.sql.zetasql.ZetaSQLQueryPlanner.convertToBeamRel(ZetaSQLQueryPlanner.java:66)
> at 
> org.apache.beam.sdk.extensions.sql.zetasql.ZetaSQLDialectSpecTest.testExceptAll(ZetaSQLDialectSpecTest.java:2553){code}



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Updated] (BEAM-8395) JavaBeamZetaSQL PreCommit is flaky

2019-10-15 Thread Kenneth Knowles (Jira)


 [ 
https://issues.apache.org/jira/browse/BEAM-8395?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Kenneth Knowles updated BEAM-8395:
--
Labels: flake  (was: )

> JavaBeamZetaSQL PreCommit is flaky
> --
>
> Key: BEAM-8395
> URL: https://issues.apache.org/jira/browse/BEAM-8395
> Project: Beam
>  Issue Type: Bug
>  Components: dsl-sql-zetasql
>Affects Versions: 2.15.0
>Reporter: Kirill Kozlov
>Priority: Major
>  Labels: flake
>
> Occasionally fails on task: *Task :sdks:java:extensions:sql:zetasql:test*
> {code:java}
> 10:28:56 org.apache.beam.sdk.extensions.sql.zetasql.ZetaSQLDialectSpecTest > 
> testExceptAll FAILED
> 10:28:56 java.lang.NoSuchMethodError at ZetaSQLDialectSpecTest.java:2553
> 10:28:56 org.apache.beam.sdk.extensions.sql.zetasql.ZetaSQLDialectSpecTest > 
> testZetaSQLStructFieldAccessInTumble FAILED
> 10:28:56 java.lang.NoClassDefFoundError at ZetaSQLDialectSpecTest.java:1272
> ... More of java.lang.NoClassDefFoundError{code}
> Jenkins that failed: 
> [https://builds.apache.org/job/beam_PreCommit_JavaBeamZetaSQL_Phrase/8/]
> Jenkins that passed: 
> [https://builds.apache.org/job/beam_PreCommit_JavaBeamZetaSQL_Phrase/9/]
>  
> Stack trace:
> {code:java}
> java.lang.NoSuchMethodError: 
> com.google.protobuf.Descriptors$FileDescriptor.internalBuildGeneratedFileFrom([Ljava/lang/String;[Lcom/google/protobuf/Descriptors$FileDescriptor;)Lcom/google/protobuf/Descriptors$FileDescriptor;
> at 
> com.google.zetasql.functions.ZetaSQLDateTime.<clinit>(ZetaSQLDateTime.java:508)
> at 
> com.google.zetasql.functions.ZetaSQLDateTime$DateTimestampPart.getDescriptor(ZetaSQLDateTime.java:460)
> at 
> com.google.zetasql.SimpleCatalog.processGetBuiltinFunctionsResponse(SimpleCatalog.java:377)
> at 
> com.google.zetasql.SimpleCatalog.addZetaSQLFunctions(SimpleCatalog.java:365)
> at 
> org.apache.beam.sdk.extensions.sql.zetasql.SqlAnalyzer.addBuiltinFunctionsToCatalog(SqlAnalyzer.java:152)
> at 
> org.apache.beam.sdk.extensions.sql.zetasql.SqlAnalyzer.createPopulatedCatalog(SqlAnalyzer.java:136)
> at 
> org.apache.beam.sdk.extensions.sql.zetasql.SqlAnalyzer.analyze(SqlAnalyzer.java:90)
> at 
> org.apache.beam.sdk.extensions.sql.zetasql.SqlAnalyzer$Builder.analyze(SqlAnalyzer.java:281)
> at 
> org.apache.beam.sdk.extensions.sql.zetasql.ZetaSQLPlannerImpl.rel(ZetaSQLPlannerImpl.java:136)
> at 
> org.apache.beam.sdk.extensions.sql.zetasql.ZetaSQLQueryPlanner.parseQuery(ZetaSQLQueryPlanner.java:92)
> at 
> org.apache.beam.sdk.extensions.sql.zetasql.ZetaSQLQueryPlanner.parseQuery(ZetaSQLQueryPlanner.java:87)
> at 
> org.apache.beam.sdk.extensions.sql.zetasql.ZetaSQLQueryPlanner.convertToBeamRel(ZetaSQLQueryPlanner.java:66)
> at 
> org.apache.beam.sdk.extensions.sql.zetasql.ZetaSQLDialectSpecTest.testExceptAll(ZetaSQLDialectSpecTest.java:2553){code}



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Updated] (BEAM-8379) Cache Eviction for Interactive Beam

2019-10-15 Thread Kenneth Knowles (Jira)


 [ 
https://issues.apache.org/jira/browse/BEAM-8379?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Kenneth Knowles updated BEAM-8379:
--
Status: Open  (was: Triage Needed)

> Cache Eviction for Interactive Beam
> ---
>
> Key: BEAM-8379
> URL: https://issues.apache.org/jira/browse/BEAM-8379
> Project: Beam
>  Issue Type: New Feature
>  Components: runner-py-interactive
>Reporter: Ning Kang
>Assignee: Ning Kang
>Priority: Major
>
> Evict the cache created by Interactive Beam when an IPython kernel is 
> restarted or terminated, releasing resources that are no longer needed.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Updated] (BEAM-8393) Java BigQueryIO clustering support breaks on multiple partitions

2019-10-15 Thread Kenneth Knowles (Jira)


 [ 
https://issues.apache.org/jira/browse/BEAM-8393?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Kenneth Knowles updated BEAM-8393:
--
Status: Open  (was: Triage Needed)

> Java BigQueryIO clustering support breaks on multiple partitions
> 
>
> Key: BEAM-8393
> URL: https://issues.apache.org/jira/browse/BEAM-8393
> Project: Beam
>  Issue Type: Bug
>  Components: io-java-gcp
>Affects Versions: 2.15.0, 2.16.0
>Reporter: Jeff Klukas
>Assignee: Jeff Klukas
>Priority: Major
>  Time Spent: 0.5h
>  Remaining Estimate: 0h
>
> Support for writing to clustered tables in BigQuery was added in 2.15, which 
> involved adding a new optional clustering field to TableDestination. 
> Clustering support is working for most cases, but fails with errors about 
> incompatible partitioning specifications for any data that is handled by the 
> MultiplePartitions branch of BigQueryIO logic.
> There is a case in that code path where we provide a modified 
> TableDestination and neglect to copy the clustering definition, so the final 
> load job does not include any clustering columns.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Updated] (BEAM-8396) Default to LOOPBACK mode for local flink (spark, ...) runner.

2019-10-15 Thread Kenneth Knowles (Jira)


 [ 
https://issues.apache.org/jira/browse/BEAM-8396?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Kenneth Knowles updated BEAM-8396:
--
Status: Open  (was: Triage Needed)

> Default to LOOPBACK mode for local flink (spark, ...) runner.
> -
>
> Key: BEAM-8396
> URL: https://issues.apache.org/jira/browse/BEAM-8396
> Project: Beam
>  Issue Type: Improvement
>  Components: sdk-py-core
>Reporter: Robert Bradshaw
>Priority: Major
>
> As well as being lower overhead, this will avoid surprises about workers 
> operating within the docker filesystem, etc. 



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Work logged] (BEAM-8335) Add streaming support to Interactive Beam

2019-10-15 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/BEAM-8335?focusedWorklogId=328790&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-328790
 ]

ASF GitHub Bot logged work on BEAM-8335:


Author: ASF GitHub Bot
Created on: 15/Oct/19 21:05
Start Date: 15/Oct/19 21:05
Worklog Time Spent: 10m 
  Work Description: rohdesamuel commented on pull request #9720: 
[BEAM-8335] Add initial modules for interactive streaming support
URL: https://github.com/apache/beam/pull/9720#discussion_r335177110
 
 

 ##
 File path: model/fn-execution/src/main/proto/beam_interactive_api.proto
 ##
 @@ -0,0 +1,106 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ * http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+/*
+ * Protocol Buffers describing a service that can be used in conjunction with
+ * the TestStream class in order to control a pipeline remotely.
+ */
+
+syntax = "proto3";
+
+package org.apache.beam.model.fn_execution.v1;
+
+option go_package = "fnexecution_v1";
+option java_package = "org.apache.beam.model.fnexecution.v1";
+option java_outer_classname = "BeamInteractiveApi";
+
+import "beam_runner_api.proto";
+import "google/protobuf/timestamp.proto";
+
+service InteractiveService {
+
+  // A TestStream will request for events using this RPC.
+  rpc Events(EventsRequest) returns (stream EventsResponse) {}
+
+  rpc Start (StartRequest) returns (StartResponse) {}
 
 Review comment:
   We need a gRPC service to simultaneously allow streaming of events to the 
TestStream and user interactivity in a portable way. When the TestStream is 
implemented in the FnApiRunner, there will be a hard container boundary to both 
the runner and the worker that needs to be crossed with some IPC. gRPC is 
currently the best way to do that.
 

This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 328790)
Time Spent: 2h 40m  (was: 2.5h)

> Add streaming support to Interactive Beam
> -
>
> Key: BEAM-8335
> URL: https://issues.apache.org/jira/browse/BEAM-8335
> Project: Beam
>  Issue Type: Improvement
>  Components: runner-py-interactive
>Reporter: Sam Rohde
>Assignee: Sam Rohde
>Priority: Major
>  Time Spent: 2h 40m
>  Remaining Estimate: 0h
>
> This issue tracks the work items to introduce streaming support to the 
> Interactive Beam experience. This will allow users to:
>  * Write and run a streaming job in IPython
>  * Automatically cache records from unbounded sources
>  * Add a replay experience that replays all cached records to simulate the 
> original pipeline execution
>  * Add controls to play/pause/stop/step individual elements from the cached 
> records
>  * Add ability to inspect/visualize unbounded PCollections



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Work logged] (BEAM-8335) Add streaming support to Interactive Beam

2019-10-15 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/BEAM-8335?focusedWorklogId=328788&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-328788
 ]

ASF GitHub Bot logged work on BEAM-8335:


Author: ASF GitHub Bot
Created on: 15/Oct/19 21:03
Start Date: 15/Oct/19 21:03
Worklog Time Spent: 10m 
  Work Description: rohdesamuel commented on pull request #9720: 
[BEAM-8335] Add initial modules for interactive streaming support
URL: https://github.com/apache/beam/pull/9720#discussion_r335176456
 
 

 ##
 File path: model/fn-execution/src/main/proto/beam_interactive_api.proto
 ##
 @@ -0,0 +1,106 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ * http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+/*
+ * Protocol Buffers describing a service that can be used in conjunction with
+ * the TestStream class in order to control a pipeline remotely.
+ */
+
+syntax = "proto3";
+
+package org.apache.beam.model.fn_execution.v1;
+
+option go_package = "fnexecution_v1";
+option java_package = "org.apache.beam.model.fnexecution.v1";
+option java_outer_classname = "BeamInteractiveApi";
+
+import "beam_runner_api.proto";
+import "google/protobuf/timestamp.proto";
+
+service InteractiveService {
 
 Review comment:
   The Stop request acts as the Reset. Added comments to make this more clear.
 

This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 328788)
Time Spent: 2.5h  (was: 2h 20m)

> Add streaming support to Interactive Beam
> -
>
> Key: BEAM-8335
> URL: https://issues.apache.org/jira/browse/BEAM-8335
> Project: Beam
>  Issue Type: Improvement
>  Components: runner-py-interactive
>Reporter: Sam Rohde
>Assignee: Sam Rohde
>Priority: Major
>  Time Spent: 2.5h
>  Remaining Estimate: 0h
>
> This issue tracks the work items to introduce streaming support to the 
> Interactive Beam experience. This will allow users to:
>  * Write and run a streaming job in IPython
>  * Automatically cache records from unbounded sources
>  * Add a replay experience that replays all cached records to simulate the 
> original pipeline execution
>  * Add controls to play/pause/stop/step individual elements from the cached 
> records
>  * Add ability to inspect/visualize unbounded PCollections



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Work logged] (BEAM-7981) ParDo function wrapper doesn't support Iterable output types

2019-10-15 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/BEAM-7981?focusedWorklogId=328786&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-328786
 ]

ASF GitHub Bot logged work on BEAM-7981:


Author: ASF GitHub Bot
Created on: 15/Oct/19 21:02
Start Date: 15/Oct/19 21:02
Worklog Time Spent: 10m 
  Work Description: udim commented on issue #9708: [BEAM-7981] Fix double 
iterable stripping
URL: https://github.com/apache/beam/pull/9708#issuecomment-542403371
 
 
   R: @robertwb @aaltay
 

This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 328786)
Time Spent: 0.5h  (was: 20m)

> ParDo function wrapper doesn't support Iterable output types
> 
>
> Key: BEAM-7981
> URL: https://issues.apache.org/jira/browse/BEAM-7981
> Project: Beam
>  Issue Type: Bug
>  Components: sdk-py-core
>Reporter: Udi Meiri
>Assignee: Udi Meiri
>Priority: Major
>  Time Spent: 0.5h
>  Remaining Estimate: 0h
>
> I believe the bug is in CallableWrapperDoFn.default_type_hints, which 
> converts Iterable[str] to str.
> This test will be included (commented out) in 
> https://github.com/apache/beam/pull/9283
> {code}
>   def test_typed_callable_iterable_output(self):
> @typehints.with_input_types(int)
> @typehints.with_output_types(typehints.Iterable[str])
> def do_fn(element):
>   return [[str(element)] * 2]
> result = [1, 2] | beam.ParDo(do_fn)
> self.assertEqual([['1', '1'], ['2', '2']], sorted(result))
> {code}
> Result:
> {code}
> ==
> ERROR: test_typed_callable_iterable_output (apache_beam.typehints.typed_pipeline_test.MainInputTest)
> ----------------------------------------------------------------------
> Traceback (most recent call last):
>   File "/usr/local/google/home/ehudm/src/beam/sdks/python/apache_beam/typehints/typed_pipeline_test.py", line 104, in test_typed_callable_iterable_output
>     result = [1, 2] | beam.ParDo(do_fn)
>   File "/usr/local/google/home/ehudm/src/beam/sdks/python/apache_beam/transforms/ptransform.py", line 519, in __ror__
>     p.run().wait_until_finish()
>   File "/usr/local/google/home/ehudm/src/beam/sdks/python/apache_beam/pipeline.py", line 406, in run
>     self._options).run(False)
>   File "/usr/local/google/home/ehudm/src/beam/sdks/python/apache_beam/pipeline.py", line 419, in run
>     return self.runner.run_pipeline(self, self._options)
>   File "/usr/local/google/home/ehudm/src/beam/sdks/python/apache_beam/runners/direct/direct_runner.py", line 129, in run_pipeline
>     return runner.run_pipeline(pipeline, options)
>   File "/usr/local/google/home/ehudm/src/beam/sdks/python/apache_beam/runners/portability/fn_api_runner.py", line 366, in run_pipeline
>     default_environment=self._default_environment))
>   File "/usr/local/google/home/ehudm/src/beam/sdks/python/apache_beam/runners/portability/fn_api_runner.py", line 373, in run_via_runner_api
>     return self.run_stages(stage_context, stages)
>   File "/usr/local/google/home/ehudm/src/beam/sdks/python/apache_beam/runners/portability/fn_api_runner.py", line 455, in run_stages
>     stage_context.safe_coders)
>   File "/usr/local/google/home/ehudm/src/beam/sdks/python/apache_beam/runners/portability/fn_api_runner.py", line 733, in _run_stage
>     result, splits = bundle_manager.process_bundle(data_input, data_output)
>   File "/usr/local/google/home/ehudm/src/beam/sdks/python/apache_beam/runners/portability/fn_api_runner.py", line 1663, in process_bundle
>     part, expected_outputs), part_inputs):
>   File "/usr/lib/python3.7/concurrent/futures/_base.py", line 586, in result_iterator
>     yield fs.pop().result()
>   File "/usr/lib/python3.7/concurrent/futures/_base.py", line 432, in result
>     return self.__get_result()
>   File "/usr/lib/python3.7/concurrent/futures/_base.py", line 384, in __get_result
>     raise self._exception
>   File "/usr/lib/python3.7/concurrent/futures/thread.py", line 57, in run
>     result = self.fn(*self.args, **self.kwargs)
>   File "/usr/local/google/home/ehudm/src/beam/sdks/python/apache_beam/runners/portability/fn_api_runner.py", line 1663, in 
>     part, expected_outputs), part_inputs):
>   File "/usr/local/google/home/ehudm/src/beam/sdks/python/apache_beam/runners/portability/fn_api_runner.py", line 1601, in process_bundle
> 

[jira] [Work logged] (BEAM-8348) Portable Python job name hard-coded to "job"

2019-10-15 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/BEAM-8348?focusedWorklogId=328755&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-328755
 ]

ASF GitHub Bot logged work on BEAM-8348:


Author: ASF GitHub Bot
Created on: 15/Oct/19 19:10
Start Date: 15/Oct/19 19:10
Worklog Time Spent: 10m 
  Work Description: ibzib commented on pull request #9789: [BEAM-8348] set 
job_name in portable_runner.py job request
URL: https://github.com/apache/beam/pull/9789#discussion_r335128391
 
 

 ##
 File path: sdks/python/apache_beam/runners/portability/portable_runner.py
 ##
 @@ -296,7 +297,8 @@ def add_runner_options(parser):
 
 prepare_response = job_service.Prepare(
 beam_job_api_pb2.PrepareJobRequest(
-job_name='job', pipeline=proto_pipeline,
+job_name=options.view_as(GoogleCloudOptions).job_name or 'job',
 
 Review comment:
   `PipelineOptions` does not allow the same option to belong to multiple 
subclasses.
   
   This PR changes `job_name` in the job request, not the pipeline options. 
e.g. here 
https://github.com/apache/beam/blob/master/runners/java-fn-execution/src/main/java/org/apache/beam/runners/fnexecution/jobsubmission/InMemoryJobService.java#L124
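The constraint ibzib describes can be sketched in plain Python. The following is a minimal mimic of the `view_as` pattern, not Beam's actual `PipelineOptions` implementation; the class bodies and flag storage here are invented for illustration:

```python
# Minimal sketch (NOT Beam's real implementation) of the view_as pattern:
# every options subclass is a "view" over one shared flag namespace, which
# is why the same option name cannot be declared by multiple subclasses.
class PipelineOptions:
    def __init__(self, **flags):
        self._flags = flags  # single namespace shared by all views

    def view_as(self, cls):
        view = cls.__new__(cls)    # build the view without re-parsing flags
        view._flags = self._flags  # alias the same underlying storage
        return view


class GoogleCloudOptions(PipelineOptions):
    @property
    def job_name(self):
        return self._flags.get('job_name')


# Mirrors the change under review: read job_name through the view and
# fall back to the old hard-coded default when it is unset.
name = PipelineOptions(job_name='wordcount').view_as(GoogleCloudOptions).job_name or 'job'
```

With no flags set, the same expression falls back to 'job', preserving the previous behavior.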
 

This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 328755)
Time Spent: 1h 50m  (was: 1h 40m)

> Portable Python job name hard-coded to "job"
> 
>
> Key: BEAM-8348
> URL: https://issues.apache.org/jira/browse/BEAM-8348
> Project: Beam
>  Issue Type: Improvement
>  Components: sdk-py-core
>Reporter: Kyle Weaver
>Assignee: Kyle Weaver
>Priority: Minor
>  Time Spent: 1h 50m
>  Remaining Estimate: 0h
>
> See [1]. `job_name` is already taken by Google Cloud options [2], so I guess 
> we should create a new option (maybe `portable_job_name` to avoid disruption).
> [[1] 
> https://github.com/apache/beam/blob/55588e91ed8e3e25bb661a6202c31e99297e0e79/sdks/python/apache_beam/runners/portability/portable_runner.py#L294|https://github.com/apache/beam/blob/55588e91ed8e3e25bb661a6202c31e99297e0e79/sdks/python/apache_beam/runners/portability/portable_runner.py#L294]
> [2] 
> [https://github.com/apache/beam/blob/c5bbb51014f7506a2651d6070f27fb3c3dc0da8f/sdks/python/apache_beam/options/pipeline_options.py#L438]



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Assigned] (BEAM-4087) Gradle build does not allow to overwrite versions of provided dependencies

2019-10-15 Thread Michael Luckey (Jira)


 [ 
https://issues.apache.org/jira/browse/BEAM-4087?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Michael Luckey reassigned BEAM-4087:


Assignee: (was: Michael Luckey)

> Gradle build does not allow to overwrite versions of provided dependencies
> --
>
> Key: BEAM-4087
> URL: https://issues.apache.org/jira/browse/BEAM-4087
> Project: Beam
>  Issue Type: Sub-task
>  Components: build-system
>Affects Versions: 2.5.0
>Reporter: Ismaël Mejía
>Priority: Major
>  Time Spent: 4h 50m
>  Remaining Estimate: 0h
>
> In order to test modules with provided dependencies in Maven we can execute,
> for example for Kafka, `mvn verify -Prelease -Dkafka.clients.version=0.9.0.1
> -pl 'sdks/java/io/kafka'`. However, we don't have an equivalent way to do
> this with Gradle because the versions of the dependencies are defined
> locally and not in gradle.properties.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Work logged] (BEAM-8365) Add project push-down capability to IO APIs

2019-10-15 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/BEAM-8365?focusedWorklogId=328727&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-328727
 ]

ASF GitHub Bot logged work on BEAM-8365:


Author: ASF GitHub Bot
Created on: 15/Oct/19 18:26
Start Date: 15/Oct/19 18:26
Worklog Time Spent: 10m 
  Work Description: apilloud commented on pull request #9764: [BEAM-8365] 
Project push-down for TestTableProvider
URL: https://github.com/apache/beam/pull/9764#discussion_r335101512
 
 

 ##
 File path: 
sdks/java/extensions/sql/src/main/java/org/apache/beam/sdk/extensions/sql/impl/rule/BeamIOPushDownRule.java
 ##
 @@ -0,0 +1,236 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ * http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+package org.apache.beam.sdk.extensions.sql.impl.rule;
+
+import java.util.ArrayDeque;
+import java.util.ArrayList;
+import java.util.HashSet;
+import java.util.List;
+import java.util.Queue;
+import java.util.Set;
+import java.util.stream.Collectors;
+import org.apache.beam.sdk.extensions.sql.impl.rel.BeamIOSourceRel;
+import org.apache.beam.sdk.extensions.sql.impl.utils.CalciteUtils;
+import org.apache.beam.sdk.extensions.sql.meta.BeamSqlTable;
+import org.apache.beam.sdk.schemas.FieldAccessDescriptor;
+import org.apache.beam.sdk.schemas.FieldAccessDescriptor.FieldDescriptor;
+import org.apache.beam.sdk.schemas.Schema;
+import org.apache.beam.sdk.schemas.utils.SelectHelpers;
+import 
org.apache.beam.vendor.calcite.v1_20_0.com.google.common.collect.ImmutableList;
+import 
org.apache.beam.vendor.calcite.v1_20_0.org.apache.calcite.plan.RelOptRule;
+import 
org.apache.beam.vendor.calcite.v1_20_0.org.apache.calcite.plan.RelOptRuleCall;
+import org.apache.beam.vendor.calcite.v1_20_0.org.apache.calcite.rel.RelNode;
+import org.apache.beam.vendor.calcite.v1_20_0.org.apache.calcite.rel.core.Calc;
+import 
org.apache.beam.vendor.calcite.v1_20_0.org.apache.calcite.rel.core.RelFactories;
+import 
org.apache.beam.vendor.calcite.v1_20_0.org.apache.calcite.rel.type.RelDataType;
+import 
org.apache.beam.vendor.calcite.v1_20_0.org.apache.calcite.rel.type.RelDataTypeField;
+import 
org.apache.beam.vendor.calcite.v1_20_0.org.apache.calcite.rel.type.RelRecordType;
+import org.apache.beam.vendor.calcite.v1_20_0.org.apache.calcite.rex.RexCall;
+import 
org.apache.beam.vendor.calcite.v1_20_0.org.apache.calcite.rex.RexInputRef;
+import 
org.apache.beam.vendor.calcite.v1_20_0.org.apache.calcite.rex.RexLocalRef;
+import org.apache.beam.vendor.calcite.v1_20_0.org.apache.calcite.rex.RexNode;
+import 
org.apache.beam.vendor.calcite.v1_20_0.org.apache.calcite.rex.RexProgram;
+import 
org.apache.beam.vendor.calcite.v1_20_0.org.apache.calcite.tools.RelBuilder;
+import 
org.apache.beam.vendor.calcite.v1_20_0.org.apache.calcite.tools.RelBuilderFactory;
+import org.apache.beam.vendor.calcite.v1_20_0.org.apache.calcite.util.Pair;
+import 
org.apache.beam.vendor.guava.v26_0_jre.com.google.common.annotations.VisibleForTesting;
+
+public class BeamIOPushDownRule extends RelOptRule {
+  // ~ Static fields/initializers -
+
+  public static final BeamIOPushDownRule INSTANCE =
+  new BeamIOPushDownRule(RelFactories.LOGICAL_BUILDER);
+
+  // ~ Constructors ---
+
+  public BeamIOPushDownRule(RelBuilderFactory relBuilderFactory) {
+super(operand(Calc.class, operand(BeamIOSourceRel.class, any())), 
relBuilderFactory, null);
+  }
+
+  // ~ Methods 
+
+  @Override
+  public void onMatch(RelOptRuleCall call) {
+final Calc calc = call.rel(0);
+final BeamIOSourceRel ioSourceRel = call.rel(1);
+final BeamSqlTable beamSqlTable = ioSourceRel.getBeamSqlTable();
+final RexProgram program = calc.getProgram();
+final Pair<ImmutableList<RexNode>, ImmutableList<RexNode>> projectFilter = program.split();
+final RelDataType calcInputRowType = program.getInputRowType();
+RelBuilder relBuilder = call.builder();
+
+if (!beamSqlTable.supportsProjects()) {
+  return;
+}
+
+// Nested rows are not supported at the moment
+for 

[jira] [Work logged] (BEAM-8365) Add project push-down capability to IO APIs

2019-10-15 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/BEAM-8365?focusedWorklogId=328726&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-328726
 ]

ASF GitHub Bot logged work on BEAM-8365:


Author: ASF GitHub Bot
Created on: 15/Oct/19 18:26
Start Date: 15/Oct/19 18:26
Worklog Time Spent: 10m 
  Work Description: apilloud commented on pull request #9764: [BEAM-8365] 
Project push-down for TestTableProvider
URL: https://github.com/apache/beam/pull/9764#discussion_r335099291
 
 

 ##
 File path: 
sdks/java/extensions/sql/src/main/java/org/apache/beam/sdk/extensions/sql/impl/rule/BeamIOPushDownRule.java
 ##
 @@ -0,0 +1,236 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ * http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+package org.apache.beam.sdk.extensions.sql.impl.rule;
+
+import java.util.ArrayDeque;
+import java.util.ArrayList;
+import java.util.HashSet;
+import java.util.List;
+import java.util.Queue;
+import java.util.Set;
+import java.util.stream.Collectors;
+import org.apache.beam.sdk.extensions.sql.impl.rel.BeamIOSourceRel;
+import org.apache.beam.sdk.extensions.sql.impl.utils.CalciteUtils;
+import org.apache.beam.sdk.extensions.sql.meta.BeamSqlTable;
+import org.apache.beam.sdk.schemas.FieldAccessDescriptor;
+import org.apache.beam.sdk.schemas.FieldAccessDescriptor.FieldDescriptor;
+import org.apache.beam.sdk.schemas.Schema;
+import org.apache.beam.sdk.schemas.utils.SelectHelpers;
+import 
org.apache.beam.vendor.calcite.v1_20_0.com.google.common.collect.ImmutableList;
+import 
org.apache.beam.vendor.calcite.v1_20_0.org.apache.calcite.plan.RelOptRule;
+import 
org.apache.beam.vendor.calcite.v1_20_0.org.apache.calcite.plan.RelOptRuleCall;
+import org.apache.beam.vendor.calcite.v1_20_0.org.apache.calcite.rel.RelNode;
+import org.apache.beam.vendor.calcite.v1_20_0.org.apache.calcite.rel.core.Calc;
+import 
org.apache.beam.vendor.calcite.v1_20_0.org.apache.calcite.rel.core.RelFactories;
+import 
org.apache.beam.vendor.calcite.v1_20_0.org.apache.calcite.rel.type.RelDataType;
+import 
org.apache.beam.vendor.calcite.v1_20_0.org.apache.calcite.rel.type.RelDataTypeField;
+import 
org.apache.beam.vendor.calcite.v1_20_0.org.apache.calcite.rel.type.RelRecordType;
+import org.apache.beam.vendor.calcite.v1_20_0.org.apache.calcite.rex.RexCall;
+import 
org.apache.beam.vendor.calcite.v1_20_0.org.apache.calcite.rex.RexInputRef;
+import 
org.apache.beam.vendor.calcite.v1_20_0.org.apache.calcite.rex.RexLocalRef;
+import org.apache.beam.vendor.calcite.v1_20_0.org.apache.calcite.rex.RexNode;
+import 
org.apache.beam.vendor.calcite.v1_20_0.org.apache.calcite.rex.RexProgram;
+import 
org.apache.beam.vendor.calcite.v1_20_0.org.apache.calcite.tools.RelBuilder;
+import 
org.apache.beam.vendor.calcite.v1_20_0.org.apache.calcite.tools.RelBuilderFactory;
+import org.apache.beam.vendor.calcite.v1_20_0.org.apache.calcite.util.Pair;
+import 
org.apache.beam.vendor.guava.v26_0_jre.com.google.common.annotations.VisibleForTesting;
+
+public class BeamIOPushDownRule extends RelOptRule {
+  // ~ Static fields/initializers -
+
+  public static final BeamIOPushDownRule INSTANCE =
+  new BeamIOPushDownRule(RelFactories.LOGICAL_BUILDER);
+
+  // ~ Constructors ---
+
+  public BeamIOPushDownRule(RelBuilderFactory relBuilderFactory) {
+super(operand(Calc.class, operand(BeamIOSourceRel.class, any())), 
relBuilderFactory, null);
+  }
+
+  // ~ Methods 
+
+  @Override
+  public void onMatch(RelOptRuleCall call) {
+final Calc calc = call.rel(0);
+final BeamIOSourceRel ioSourceRel = call.rel(1);
+final BeamSqlTable beamSqlTable = ioSourceRel.getBeamSqlTable();
+final RexProgram program = calc.getProgram();
+final Pair<ImmutableList<RexNode>, ImmutableList<RexNode>> projectFilter = program.split();
 
 Review comment:
   nit: This could be somewhat expensive, I'd move it to just before the first 
use. (Same with `calcInputRowType` and `relBuilder`.)
 

This is an automated message from the 

[jira] [Work logged] (BEAM-8365) Add project push-down capability to IO APIs

2019-10-15 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/BEAM-8365?focusedWorklogId=328729&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-328729
 ]

ASF GitHub Bot logged work on BEAM-8365:


Author: ASF GitHub Bot
Created on: 15/Oct/19 18:26
Start Date: 15/Oct/19 18:26
Worklog Time Spent: 10m 
  Work Description: apilloud commented on pull request #9764: [BEAM-8365] 
Project push-down for TestTableProvider
URL: https://github.com/apache/beam/pull/9764#discussion_r335102598
 
 

 ##
 File path: 
sdks/java/extensions/sql/src/main/java/org/apache/beam/sdk/extensions/sql/impl/rule/BeamIOPushDownRule.java
 ##
 @@ -0,0 +1,236 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ * http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+package org.apache.beam.sdk.extensions.sql.impl.rule;
+
+import java.util.ArrayDeque;
+import java.util.ArrayList;
+import java.util.HashSet;
+import java.util.List;
+import java.util.Queue;
+import java.util.Set;
+import java.util.stream.Collectors;
+import org.apache.beam.sdk.extensions.sql.impl.rel.BeamIOSourceRel;
+import org.apache.beam.sdk.extensions.sql.impl.utils.CalciteUtils;
+import org.apache.beam.sdk.extensions.sql.meta.BeamSqlTable;
+import org.apache.beam.sdk.schemas.FieldAccessDescriptor;
+import org.apache.beam.sdk.schemas.FieldAccessDescriptor.FieldDescriptor;
+import org.apache.beam.sdk.schemas.Schema;
+import org.apache.beam.sdk.schemas.utils.SelectHelpers;
+import 
org.apache.beam.vendor.calcite.v1_20_0.com.google.common.collect.ImmutableList;
+import 
org.apache.beam.vendor.calcite.v1_20_0.org.apache.calcite.plan.RelOptRule;
+import 
org.apache.beam.vendor.calcite.v1_20_0.org.apache.calcite.plan.RelOptRuleCall;
+import org.apache.beam.vendor.calcite.v1_20_0.org.apache.calcite.rel.RelNode;
+import org.apache.beam.vendor.calcite.v1_20_0.org.apache.calcite.rel.core.Calc;
+import 
org.apache.beam.vendor.calcite.v1_20_0.org.apache.calcite.rel.core.RelFactories;
+import 
org.apache.beam.vendor.calcite.v1_20_0.org.apache.calcite.rel.type.RelDataType;
+import 
org.apache.beam.vendor.calcite.v1_20_0.org.apache.calcite.rel.type.RelDataTypeField;
+import 
org.apache.beam.vendor.calcite.v1_20_0.org.apache.calcite.rel.type.RelRecordType;
+import org.apache.beam.vendor.calcite.v1_20_0.org.apache.calcite.rex.RexCall;
+import 
org.apache.beam.vendor.calcite.v1_20_0.org.apache.calcite.rex.RexInputRef;
+import 
org.apache.beam.vendor.calcite.v1_20_0.org.apache.calcite.rex.RexLocalRef;
+import org.apache.beam.vendor.calcite.v1_20_0.org.apache.calcite.rex.RexNode;
+import 
org.apache.beam.vendor.calcite.v1_20_0.org.apache.calcite.rex.RexProgram;
+import 
org.apache.beam.vendor.calcite.v1_20_0.org.apache.calcite.tools.RelBuilder;
+import 
org.apache.beam.vendor.calcite.v1_20_0.org.apache.calcite.tools.RelBuilderFactory;
+import org.apache.beam.vendor.calcite.v1_20_0.org.apache.calcite.util.Pair;
+import 
org.apache.beam.vendor.guava.v26_0_jre.com.google.common.annotations.VisibleForTesting;
+
+public class BeamIOPushDownRule extends RelOptRule {
+  // ~ Static fields/initializers -
+
+  public static final BeamIOPushDownRule INSTANCE =
+  new BeamIOPushDownRule(RelFactories.LOGICAL_BUILDER);
+
+  // ~ Constructors ---
+
+  public BeamIOPushDownRule(RelBuilderFactory relBuilderFactory) {
+super(operand(Calc.class, operand(BeamIOSourceRel.class, any())), 
relBuilderFactory, null);
+  }
+
+  // ~ Methods 
+
+  @Override
+  public void onMatch(RelOptRuleCall call) {
+final Calc calc = call.rel(0);
+final BeamIOSourceRel ioSourceRel = call.rel(1);
+final BeamSqlTable beamSqlTable = ioSourceRel.getBeamSqlTable();
+final RexProgram program = calc.getProgram();
+final Pair<ImmutableList<RexNode>, ImmutableList<RexNode>> projectFilter = program.split();
+final RelDataType calcInputRowType = program.getInputRowType();
+RelBuilder relBuilder = call.builder();
+
+if (!beamSqlTable.supportsProjects()) {
+  return;
+}
+
+// Nested rows are not supported at the moment
+for 

[jira] [Work logged] (BEAM-8365) Add project push-down capability to IO APIs

2019-10-15 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/BEAM-8365?focusedWorklogId=328730&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-328730
 ]

ASF GitHub Bot logged work on BEAM-8365:


Author: ASF GitHub Bot
Created on: 15/Oct/19 18:26
Start Date: 15/Oct/19 18:26
Worklog Time Spent: 10m 
  Work Description: apilloud commented on pull request #9764: [BEAM-8365] 
Project push-down for TestTableProvider
URL: https://github.com/apache/beam/pull/9764#discussion_r335087133
 
 

 ##
 File path: 
sdks/java/extensions/sql/src/main/java/org/apache/beam/sdk/extensions/sql/impl/rel/BeamIOSourceRel.java
 ##
 @@ -94,7 +116,19 @@ public NodeStats estimateNodeStats(RelMetadataQuery mq) {
   "Should not have received input for %s: %s",
   BeamIOSourceRel.class.getSimpleName(),
   input);
-  return beamTable.buildIOReader(input.getPipeline().begin());
+
+  PBegin begin = input.getPipeline().begin();
 
 Review comment:
   This is probably worth making `final`.
 

This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 328730)
Time Spent: 4h 50m  (was: 4h 40m)

> Add project push-down capability to IO APIs
> ---
>
> Key: BEAM-8365
> URL: https://issues.apache.org/jira/browse/BEAM-8365
> Project: Beam
>  Issue Type: New Feature
>  Components: dsl-sql
>Reporter: Kirill Kozlov
>Assignee: Kirill Kozlov
>Priority: Major
>  Time Spent: 4h 50m
>  Remaining Estimate: 0h
>
> * InMemoryTable should implement the following method:
> {code:java}
> public PCollection<Row> buildIOReader(
> PBegin begin, BeamSqlTableFilter filters, List<String> fieldNames);{code}
> which should return a `PCollection` with the fields specified in the
> `fieldNames` list.
>  * Create a rule to push fields used by a Calc (in projects and in a
> condition) down into TestTable IO.
>  * Update that same Calc (from the previous step) to have proper input and
> output schemas, removing unused fields.
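A language-neutral sketch of what project push-down asks of the source: keep only the requested fields instead of emitting full rows and discarding columns later. This is illustrative Python, not the Java `InMemoryTable` API; `rows` and `field_names` are invented stand-ins.

```python
# Sketch of project push-down at the source: rows never carry fields the
# query did not ask for. An empty projection list means "all fields".
def project(rows, field_names):
    if not field_names:
        return [dict(r) for r in rows]
    return [{f: r[f] for f in field_names} for r in rows]

rows = [{'id': 1, 'name': 'a', 'value': 10},
        {'id': 2, 'name': 'b', 'value': 20}]
projected = project(rows, ['id', 'value'])  # 'name' never leaves the source
```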



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Work logged] (BEAM-8365) Add project push-down capability to IO APIs

2019-10-15 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/BEAM-8365?focusedWorklogId=328731&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-328731
 ]

ASF GitHub Bot logged work on BEAM-8365:


Author: ASF GitHub Bot
Created on: 15/Oct/19 18:26
Start Date: 15/Oct/19 18:26
Worklog Time Spent: 10m 
  Work Description: apilloud commented on pull request #9764: [BEAM-8365] 
Project push-down for TestTableProvider
URL: https://github.com/apache/beam/pull/9764#discussion_r335099406
 
 

 ##
 File path: 
sdks/java/extensions/sql/src/main/java/org/apache/beam/sdk/extensions/sql/impl/rule/BeamIOPushDownRule.java
 ##
 @@ -0,0 +1,236 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ * http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+package org.apache.beam.sdk.extensions.sql.impl.rule;
+
+import java.util.ArrayDeque;
+import java.util.ArrayList;
+import java.util.HashSet;
+import java.util.List;
+import java.util.Queue;
+import java.util.Set;
+import java.util.stream.Collectors;
+import org.apache.beam.sdk.extensions.sql.impl.rel.BeamIOSourceRel;
+import org.apache.beam.sdk.extensions.sql.impl.utils.CalciteUtils;
+import org.apache.beam.sdk.extensions.sql.meta.BeamSqlTable;
+import org.apache.beam.sdk.schemas.FieldAccessDescriptor;
+import org.apache.beam.sdk.schemas.FieldAccessDescriptor.FieldDescriptor;
+import org.apache.beam.sdk.schemas.Schema;
+import org.apache.beam.sdk.schemas.utils.SelectHelpers;
+import 
org.apache.beam.vendor.calcite.v1_20_0.com.google.common.collect.ImmutableList;
+import 
org.apache.beam.vendor.calcite.v1_20_0.org.apache.calcite.plan.RelOptRule;
+import 
org.apache.beam.vendor.calcite.v1_20_0.org.apache.calcite.plan.RelOptRuleCall;
+import org.apache.beam.vendor.calcite.v1_20_0.org.apache.calcite.rel.RelNode;
+import org.apache.beam.vendor.calcite.v1_20_0.org.apache.calcite.rel.core.Calc;
+import 
org.apache.beam.vendor.calcite.v1_20_0.org.apache.calcite.rel.core.RelFactories;
+import 
org.apache.beam.vendor.calcite.v1_20_0.org.apache.calcite.rel.type.RelDataType;
+import 
org.apache.beam.vendor.calcite.v1_20_0.org.apache.calcite.rel.type.RelDataTypeField;
+import 
org.apache.beam.vendor.calcite.v1_20_0.org.apache.calcite.rel.type.RelRecordType;
+import org.apache.beam.vendor.calcite.v1_20_0.org.apache.calcite.rex.RexCall;
+import 
org.apache.beam.vendor.calcite.v1_20_0.org.apache.calcite.rex.RexInputRef;
+import 
org.apache.beam.vendor.calcite.v1_20_0.org.apache.calcite.rex.RexLocalRef;
+import org.apache.beam.vendor.calcite.v1_20_0.org.apache.calcite.rex.RexNode;
+import 
org.apache.beam.vendor.calcite.v1_20_0.org.apache.calcite.rex.RexProgram;
+import 
org.apache.beam.vendor.calcite.v1_20_0.org.apache.calcite.tools.RelBuilder;
+import 
org.apache.beam.vendor.calcite.v1_20_0.org.apache.calcite.tools.RelBuilderFactory;
+import org.apache.beam.vendor.calcite.v1_20_0.org.apache.calcite.util.Pair;
+import 
org.apache.beam.vendor.guava.v26_0_jre.com.google.common.annotations.VisibleForTesting;
+
+public class BeamIOPushDownRule extends RelOptRule {
+  // ~ Static fields/initializers -
+
+  public static final BeamIOPushDownRule INSTANCE =
+  new BeamIOPushDownRule(RelFactories.LOGICAL_BUILDER);
+
+  // ~ Constructors ---
+
+  public BeamIOPushDownRule(RelBuilderFactory relBuilderFactory) {
+super(operand(Calc.class, operand(BeamIOSourceRel.class, any())), 
relBuilderFactory, null);
+  }
+
+  // ~ Methods 
+
+  @Override
+  public void onMatch(RelOptRuleCall call) {
+final Calc calc = call.rel(0);
+final BeamIOSourceRel ioSourceRel = call.rel(1);
+final BeamSqlTable beamSqlTable = ioSourceRel.getBeamSqlTable();
+final RexProgram program = calc.getProgram();
+final Pair<ImmutableList<RexNode>, ImmutableList<RexNode>> projectFilter = program.split();
+final RelDataType calcInputRowType = program.getInputRowType();
+RelBuilder relBuilder = call.builder();
+
+if (!beamSqlTable.supportsProjects()) {
+  return;
+}
+
+// Nested rows are not supported at the moment
+for 

[jira] [Work logged] (BEAM-8365) Add project push-down capability to IO APIs

2019-10-15 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/BEAM-8365?focusedWorklogId=328728&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-328728
 ]

ASF GitHub Bot logged work on BEAM-8365:


Author: ASF GitHub Bot
Created on: 15/Oct/19 18:26
Start Date: 15/Oct/19 18:26
Worklog Time Spent: 10m 
  Work Description: apilloud commented on pull request #9764: [BEAM-8365] 
Project push-down for TestTableProvider
URL: https://github.com/apache/beam/pull/9764#discussion_r335108780
 
 

 ##
 File path: 
sdks/java/extensions/sql/src/main/java/org/apache/beam/sdk/extensions/sql/meta/provider/test/TestTableProvider.java
 ##
 @@ -157,16 +169,18 @@ public BeamTableStatistics 
getTableStatistics(PipelineOptions options) {
 public PCollection<Row> buildIOReader(
 PBegin begin, BeamSqlTableFilter filters, List<String> fieldNames) {
   PCollection<Row> withAllFields = buildIOReader(begin);
-  if (fieldNames.isEmpty() && filters instanceof DefaultTableFilter) {
+  if (options == PushDownOptions.NONE) { // needed for testing purposes
 return withAllFields;
   }
 
 
 Review comment:
   In this function you should throw an exception if the filter is set but 
filter push-down isn't enabled. Same thing with project push down.
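The guard the reviewer asks for can be sketched as follows. This is illustrative Python rather than the actual Java TestTableProvider; the names `PushDownOptions` values and `build_io_reader` mirror the discussion but are invented for the example.

```python
# Fail fast when a push-down argument arrives but the table was not
# configured to support that kind of push-down.
from enum import Enum


class PushDownOptions(Enum):
    NONE = 0
    PROJECT = 1
    FILTER = 2
    BOTH = 3


def build_io_reader(options, filters=None, field_names=()):
    if filters is not None and options not in (PushDownOptions.FILTER,
                                               PushDownOptions.BOTH):
        raise ValueError('filter push-down is not enabled for this table')
    if field_names and options not in (PushDownOptions.PROJECT,
                                       PushDownOptions.BOTH):
        raise ValueError('project push-down is not enabled for this table')
    return 'reader'  # stand-in for the PCollection the real method builds
```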
 

This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 328728)
Time Spent: 4h 40m  (was: 4.5h)

> Add project push-down capability to IO APIs
> ---
>
> Key: BEAM-8365
> URL: https://issues.apache.org/jira/browse/BEAM-8365
> Project: Beam
>  Issue Type: New Feature
>  Components: dsl-sql
>Reporter: Kirill Kozlov
>Assignee: Kirill Kozlov
>Priority: Major
>  Time Spent: 4h 40m
>  Remaining Estimate: 0h
>
> * InMemoryTable should implement the following method:
> {code:java}
> public PCollection<Row> buildIOReader(
> PBegin begin, BeamSqlTableFilter filters, List<String> fieldNames);{code}
> which should return a `PCollection` with the fields specified in the
> `fieldNames` list.
>  * Create a rule to push fields used by a Calc (in projects and in a
> condition) down into TestTable IO.
>  * Update that same Calc (from the previous step) to have proper input and
> output schemas, removing unused fields.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Work logged] (BEAM-8365) Add project push-down capability to IO APIs

2019-10-15 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/BEAM-8365?focusedWorklogId=328725&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-328725
 ]

ASF GitHub Bot logged work on BEAM-8365:


Author: ASF GitHub Bot
Created on: 15/Oct/19 18:26
Start Date: 15/Oct/19 18:26
Worklog Time Spent: 10m 
  Work Description: apilloud commented on pull request #9764: [BEAM-8365] 
Project push-down for TestTableProvider
URL: https://github.com/apache/beam/pull/9764#discussion_r335092961
 
 

 ##
 File path: 
sdks/java/extensions/sql/src/main/java/org/apache/beam/sdk/extensions/sql/impl/rel/BeamIOSourceRel.java
 ##
 @@ -109,10 +137,11 @@ public RelOptCost computeSelfCost(RelOptPlanner planner, 
RelMetadataQuery mq) {
   @Override
   public BeamCostModel beamComputeSelfCost(RelOptPlanner planner, 
RelMetadataQuery mq) {
 NodeStats estimates = BeamSqlRelUtils.getNodeStats(this, mq);
-return BeamCostModel.FACTORY.makeCost(estimates.getRowCount(), 
estimates.getRate());
+return BeamCostModel.FACTORY.makeCost(
+estimates.getRowCount() * getRowType().getFieldCount(), 
estimates.getRate());
   }
 
 Review comment:
   This is a good question. This needs to apply to the rate as well. I think 
this should probably be passed into `makeCost` as `dIo` and factored into the 
cost model. See here: 
https://github.com/apache/beam/blob/031b3789c4191bc82d0e97f4cabd0ccbee6c9902/sdks/java/extensions/sql/src/main/java/org/apache/beam/sdk/extensions/sql/impl/planner/BeamCostModel.java#L121
   
   If just multiplying it into the sum in `getCostCombination` passes all the 
tests that is probably good enough for now. I would expect we actually want it 
to be a smaller factor but we can figure that out later.
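The suggestion above can be sketched numerically. The weights and the cost formula here are invented for illustration; the real combination lives in `BeamCostModel`, and the reviewer notes the eventual factor should probably be smaller.

```python
# Toy cost model illustrating the comment: folding the number of fields read
# (a rough proxy for dIo) into the combined cost makes a projected read
# cheaper than a full scan, so the planner prefers the push-down plan.
def make_cost(row_count, rate, field_count):
    # Apply field_count to both the bounded (row_count) and
    # unbounded (rate) components, as the reviewer suggests.
    return (row_count + rate) * field_count

full_scan = make_cost(row_count=1000, rate=10.0, field_count=5)
pushed_down = make_cost(row_count=1000, rate=10.0, field_count=2)
```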
 

This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 328725)
Time Spent: 4h 40m  (was: 4.5h)

> Add project push-down capability to IO APIs
> ---
>
> Key: BEAM-8365
> URL: https://issues.apache.org/jira/browse/BEAM-8365
> Project: Beam
>  Issue Type: New Feature
>  Components: dsl-sql
>Reporter: Kirill Kozlov
>Assignee: Kirill Kozlov
>Priority: Major
>  Time Spent: 4h 40m
>  Remaining Estimate: 0h
>
> * InMemoryTable should implement a following method:
> {code:java}
> public PCollection buildIOReader(
> PBegin begin, BeamSqlTableFilter filters, List fieldNames);{code}
> Which should return a `PCollection` with fields specified in `fieldNames` 
> list.
>  * Create a rule to push fields used by a Calc (in projects and in a 
> condition) down into TestTable IO.
>  * Updating that same Calc  (from previous step) to have a proper input and 
> output schemes, remove unused fields.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Work logged] (BEAM-8350) Upgrade to pylint 2.4

2019-10-15 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/BEAM-8350?focusedWorklogId=328721&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-328721
 ]

ASF GitHub Bot logged work on BEAM-8350:


Author: ASF GitHub Bot
Created on: 15/Oct/19 18:12
Start Date: 15/Oct/19 18:12
Worklog Time Spent: 10m 
  Work Description: tvalentyn commented on issue #9725: [BEAM-8350] Upgrade 
to Pylint 2.4
URL: https://github.com/apache/beam/pull/9725#issuecomment-542337427
 
 
   I went over the warnings in your last message; my suggestions are below.
   
   Fix in this PR (most changes affect one file):
   - expression-not-assigned (recent test, can be fixed via _ = ...)
   - singleton-comparison 
   - comparisons-with-literal 
   
   Exclude inline:
   - using-constant-test - I don't get this warning. Consider excluding in-line 
+ file a linter bug.
   - comparison-with-callable - potentially useful, let's fix inline in 2 
places.
   
   Fix in a separate PR:
   - invalid-overridden-method - I think this is worth fixing; we can file a 
Jira + make a separate PR to be safe. I think the action item here is to replace 
the deprecated decorator @abc.abstractproperty in filesystemio.py. 
   
   Exclude from lint (seem debatable/harmless) code style warnings:
   - no-else-break
   - unnecessary-pass
   - consider-using-in 
   - try-except-raise
   
   Exclude but add a newbie task 1 (may help somebody learn bits of python):
   - consider-using-set-comprehension
   - chained-comparison
   - consider-using-sys-exit
   
   Exclude but add a newbie task 2:
   - unnecessary-comprehension. We should mention that in some places we should 
use list(some_iter), and in others just remove the comprehension. It may be 
better to do this in a separate task to avoid a blind approval during review, 
as some places need to be fixed differently.
   
   Not sure. Let's exclude for now:
   - stop-iteration-return.
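   A few of the warnings above are easiest to see side by side; a minimal 
sketch of the fixes (the surrounding values are invented for illustration):

```python
# Illustrative before/after fixes for some of the pylint warnings discussed.

values = iter([1, 2, 3])

# singleton-comparison: compare to None/True/False with `is`, not `==`.
x = None
ok = x is None            # instead of: x == None

# unnecessary-comprehension: [v for v in it] just copies; use list().
copied = list(values)     # instead of: [v for v in values]

# expression-not-assigned: a bare expression is flagged; bind it to `_`.
_ = ok and copied

# consider-using-set-comprehension: build the set directly.
evens = {n for n in copied if n % 2 == 0}   # instead of: set([... for ...])

print(ok, copied, evens)  # True [1, 2, 3] {2}
```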
   
 

This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 328721)
Time Spent: 7h 10m  (was: 7h)

> Upgrade to pylint 2.4
> -
>
> Key: BEAM-8350
> URL: https://issues.apache.org/jira/browse/BEAM-8350
> Project: Beam
>  Issue Type: Improvement
>  Components: sdk-py-core
>Reporter: Chad Dombrova
>Assignee: Chad Dombrova
>Priority: Major
>  Time Spent: 7h 10m
>  Remaining Estimate: 0h
>
> pylint 2.4 provides a number of new features and fixes, but the most 
> important/pressing one for me is that 2.4 adds support for understanding 
> python type annotations, which fixes a bunch of spurious unused import errors 
> in the PR I'm working on for BEAM-7746.
> As of 2.0, pylint dropped support for running tests in python2, so to make 
> the upgrade we have to move our lint jobs to python3.  Doing so will put 
> pylint into "python3-mode" and there is not an option to run in 
> python2-compatible mode.  That said, the beam code is intended to be python3 
> compatible, so in practice, performing a python3 lint on the Beam code-base 
> is perfectly safe.  The primary risk of doing this is that someone introduces 
> a python-3 only change that breaks python2, but these would largely be syntax 
> errors that would be immediately caught by the unit and integration tests.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Work logged] (BEAM-8350) Upgrade to pylint 2.4

2019-10-15 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/BEAM-8350?focusedWorklogId=328716&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-328716
 ]

ASF GitHub Bot logged work on BEAM-8350:


Author: ASF GitHub Bot
Created on: 15/Oct/19 18:10
Start Date: 15/Oct/19 18:10
Worklog Time Spent: 10m 
  Work Description: tvalentyn commented on issue #9725: [BEAM-8350] Upgrade 
to Pylint 2.4
URL: https://github.com/apache/beam/pull/9725#issuecomment-542337427
 
 
   I went over the warnings, my suggestions below.
   
   Fix in this PR (most changes affect one file):
   - expression-not-assigned (recent test, can be fixed via _ = ...)
   - singleton-comparison 
   - comparisons-with-literal 
   
   Exclude inline:
   - using-constant-test - I don't get this warning. Consider excluding in-line 
+ file a linter bug.
   - comparison-with-callable - potentially useful, let's fix inline in 2 
places.
   
   Fix in a separate PR:
   - invalid-overridden-method - I think this is worth fixing; we can file a 
Jira + make a separate PR to be safe. I think the action item here is to replace 
the deprecated decorator @abc.abstractproperty in filesystemio.py. 
   
   Exclude from lint (seem debatable/harmless) code style warnings:
   - no-else-break
   - unnecessary-pass
   - consider-using-in 
   - try-except-raise
   
   Exclude but add a newbie task 1 (may help somebody learn bits of python):
   - consider-using-set-comprehension
   - chained-comparison
   - consider-using-sys-exit
   
   Exclude but add a newbie task 2:
   - unnecessary-comprehension. We should mention that in some places we should 
use list(some_iter), and in others just remove the comprehension. It may be 
better to do this in a separate task to avoid a blind approval during review, 
as some places need to be fixed differently.
   
   Not sure. Let's exclude for now:
   - stop-iteration-return.
   
 

This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 328716)
Time Spent: 6h 50m  (was: 6h 40m)

> Upgrade to pylint 2.4
> -
>
> Key: BEAM-8350
> URL: https://issues.apache.org/jira/browse/BEAM-8350
> Project: Beam
>  Issue Type: Improvement
>  Components: sdk-py-core
>Reporter: Chad Dombrova
>Assignee: Chad Dombrova
>Priority: Major
>  Time Spent: 6h 50m
>  Remaining Estimate: 0h
>
> pylint 2.4 provides a number of new features and fixes, but the most 
> important/pressing one for me is that 2.4 adds support for understanding 
> python type annotations, which fixes a bunch of spurious unused import errors 
> in the PR I'm working on for BEAM-7746.
> As of 2.0, pylint dropped support for running tests in python2, so to make 
> the upgrade we have to move our lint jobs to python3.  Doing so will put 
> pylint into "python3-mode" and there is not an option to run in 
> python2-compatible mode.  That said, the beam code is intended to be python3 
> compatible, so in practice, performing a python3 lint on the Beam code-base 
> is perfectly safe.  The primary risk of doing this is that someone introduces 
> a python-3 only change that breaks python2, but these would largely be syntax 
> errors that would be immediately caught by the unit and integration tests.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Work logged] (BEAM-8350) Upgrade to pylint 2.4

2019-10-15 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/BEAM-8350?focusedWorklogId=328719&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-328719
 ]

ASF GitHub Bot logged work on BEAM-8350:


Author: ASF GitHub Bot
Created on: 15/Oct/19 18:10
Start Date: 15/Oct/19 18:10
Worklog Time Spent: 10m 
  Work Description: tvalentyn commented on pull request #9725: [BEAM-8350] 
Upgrade to Pylint 2.4
URL: https://github.com/apache/beam/pull/9725#discussion_r334716983
 
 

 ##
 File path: sdks/python/apache_beam/examples/complete/juliaset/setup.py
 ##
 @@ -31,7 +31,8 @@
 import subprocess
 from distutils.command.build import build as _build
 
-import setuptools
+# workaround pylint bug: https://github.com/PyCQA/pylint/issues/3152
 
 Review comment:
   How about we file an issue and add a comment:
   `# TODO(BEAM-...): re-enable lint check. `
   We can tag the issue as newbie/starter/trivial and it can be someone's first 
contribution to beam.
   
 

This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 328719)
Time Spent: 7h  (was: 6h 50m)

> Upgrade to pylint 2.4
> -
>
> Key: BEAM-8350
> URL: https://issues.apache.org/jira/browse/BEAM-8350
> Project: Beam
>  Issue Type: Improvement
>  Components: sdk-py-core
>Reporter: Chad Dombrova
>Assignee: Chad Dombrova
>Priority: Major
>  Time Spent: 7h
>  Remaining Estimate: 0h
>
> pylint 2.4 provides a number of new features and fixes, but the most 
> important/pressing one for me is that 2.4 adds support for understanding 
> python type annotations, which fixes a bunch of spurious unused import errors 
> in the PR I'm working on for BEAM-7746.
> As of 2.0, pylint dropped support for running tests in python2, so to make 
> the upgrade we have to move our lint jobs to python3.  Doing so will put 
> pylint into "python3-mode" and there is not an option to run in 
> python2-compatible mode.  That said, the beam code is intended to be python3 
> compatible, so in practice, performing a python3 lint on the Beam code-base 
> is perfectly safe.  The primary risk of doing this is that someone introduces 
> a python-3 only change that breaks python2, but these would largely be syntax 
> errors that would be immediately caught by the unit and integration tests.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Updated] (BEAM-8406) TextTable support JSON format

2019-10-15 Thread Rui Wang (Jira)


 [ 
https://issues.apache.org/jira/browse/BEAM-8406?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Rui Wang updated BEAM-8406:
---
Description: 
Have a JSON table implementation similar to [1].


[1]: 
https://github.com/apache/beam/blob/master/sdks/java/extensions/sql/src/main/java/org/apache/beam/sdk/extensions/sql/meta/provider/text/TextTable.java

> TextTable support JSON format
> -
>
> Key: BEAM-8406
> URL: https://issues.apache.org/jira/browse/BEAM-8406
> Project: Beam
>  Issue Type: New Feature
>  Components: dsl-sql
>Reporter: Rui Wang
>Priority: Major
>
> Have a JSON table implementation similar to [1].
> [1]: 
> https://github.com/apache/beam/blob/master/sdks/java/extensions/sql/src/main/java/org/apache/beam/sdk/extensions/sql/meta/provider/text/TextTable.java



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Work logged] (BEAM-8365) Add project push-down capability to IO APIs

2019-10-15 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/BEAM-8365?focusedWorklogId=328710&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-328710
 ]

ASF GitHub Bot logged work on BEAM-8365:


Author: ASF GitHub Bot
Created on: 15/Oct/19 18:00
Start Date: 15/Oct/19 18:00
Worklog Time Spent: 10m 
  Work Description: 11moon11 commented on pull request #9764: [BEAM-8365] 
Project push-down for TestTableProvider
URL: https://github.com/apache/beam/pull/9764#discussion_r335097020
 
 

 ##
 File path: 
sdks/java/extensions/sql/src/main/java/org/apache/beam/sdk/extensions/sql/impl/rel/BeamIOSourceRel.java
 ##
 @@ -94,7 +116,19 @@ public NodeStats estimateNodeStats(RelMetadataQuery mq) {
   "Should not have received input for %s: %s",
   BeamIOSourceRel.class.getSimpleName(),
   input);
-  return beamTable.buildIOReader(input.getPipeline().begin());
+
+  PBegin begin = input.getPipeline().begin();
+  BeamSqlTableFilter filters = beamTable.constructFilter(ImmutableList.of());
+
+  if (usedFields.isEmpty() && filters instanceof DefaultTableFilter) {
+return beamTable.buildIOReader(begin);
+  }
+
+  Schema newBeamSchema = CalciteUtils.toSchema(getRowType());
+
+  return beamTable
+  .buildIOReader(input.getPipeline().begin(), filters, usedFields)
 
 Review comment:
   Nice catch! Updated to use local variable `begin`.
 

This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 328710)
Time Spent: 4.5h  (was: 4h 20m)

> Add project push-down capability to IO APIs
> ---
>
> Key: BEAM-8365
> URL: https://issues.apache.org/jira/browse/BEAM-8365
> Project: Beam
>  Issue Type: New Feature
>  Components: dsl-sql
>Reporter: Kirill Kozlov
>Assignee: Kirill Kozlov
>Priority: Major
>  Time Spent: 4.5h
>  Remaining Estimate: 0h
>
> * InMemoryTable should implement the following method:
> {code:java}
> public PCollection<Row> buildIOReader(
>     PBegin begin, BeamSqlTableFilter filters, List<String> fieldNames);{code}
> which should return a `PCollection` containing only the fields specified in 
> the `fieldNames` list.
>  * Create a rule to push fields used by a Calc (in projects and in a 
> condition) down into TestTable IO.
>  * Update that same Calc (from the previous step) to have proper input and 
> output schemas, removing unused fields.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Created] (BEAM-8406) TextTable support JSON format

2019-10-15 Thread Rui Wang (Jira)
Rui Wang created BEAM-8406:
--

 Summary: TextTable support JSON format
 Key: BEAM-8406
 URL: https://issues.apache.org/jira/browse/BEAM-8406
 Project: Beam
  Issue Type: New Feature
  Components: dsl-sql
Reporter: Rui Wang






--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Updated] (BEAM-8406) TextTable support JSON format

2019-10-15 Thread Rui Wang (Jira)


 [ 
https://issues.apache.org/jira/browse/BEAM-8406?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Rui Wang updated BEAM-8406:
---
Status: Open  (was: Triage Needed)

> TextTable support JSON format
> -
>
> Key: BEAM-8406
> URL: https://issues.apache.org/jira/browse/BEAM-8406
> Project: Beam
>  Issue Type: New Feature
>  Components: dsl-sql
>Reporter: Rui Wang
>Priority: Major
>




--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Created] (BEAM-8405) Python: Datastore: add support for embedded entities

2019-10-15 Thread Udi Meiri (Jira)
Udi Meiri created BEAM-8405:
---

 Summary: Python: Datastore: add support for embedded entities 
 Key: BEAM-8405
 URL: https://issues.apache.org/jira/browse/BEAM-8405
 Project: Beam
  Issue Type: Bug
  Components: io-py-gcp
Reporter: Udi Meiri
Assignee: Udi Meiri


The conversion methods to/from the client entity type should be updated to 
support an embedded Entity.
https://github.com/apache/beam/blob/603d68aafe9bdcd124d28ad62ad36af01e7a7403/sdks/python/apache_beam/io/gcp/datastore/v1new/types.py#L216-L240



--
This message was sent by Atlassian Jira
(v8.3.4#803005)
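The recursive nature of the change described in BEAM-8405 can be sketched without any GCP dependency; the `BeamEntity` class and `to_client` helper below are purely hypothetical stand-ins for the real Beam/Datastore types, and only illustrate that conversion must recurse into property values that are themselves entities:

```python
# Hypothetical sketch: converting a Beam-style entity to a client-style
# entity must recurse into embedded entities. Types here are invented;
# they only stand in for the real Beam / google-cloud-datastore classes.

class BeamEntity:
    def __init__(self, properties):
        self.properties = properties

def to_client(entity):
    """Convert a BeamEntity to a plain dict, recursing into embedded entities."""
    out = {}
    for name, value in entity.properties.items():
        if isinstance(value, BeamEntity):
            # The embedded-Entity case the issue says is currently missing.
            out[name] = to_client(value)
        else:
            out[name] = value
    return out

nested = BeamEntity({"a": 1, "child": BeamEntity({"b": 2})})
print(to_client(nested))  # {'a': 1, 'child': {'b': 2}}
```

The symmetric `from_client` direction would recurse the same way when a property value is itself a client entity.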


[jira] [Updated] (BEAM-8405) Python: Datastore: add support for embedded entities

2019-10-15 Thread Udi Meiri (Jira)


 [ 
https://issues.apache.org/jira/browse/BEAM-8405?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Udi Meiri updated BEAM-8405:

Status: Open  (was: Triage Needed)

> Python: Datastore: add support for embedded entities 
> -
>
> Key: BEAM-8405
> URL: https://issues.apache.org/jira/browse/BEAM-8405
> Project: Beam
>  Issue Type: Bug
>  Components: io-py-gcp
>Reporter: Udi Meiri
>Assignee: Udi Meiri
>Priority: Major
>
> The conversion methods to/from the client entity type should be updated to 
> support an embedded Entity.
> https://github.com/apache/beam/blob/603d68aafe9bdcd124d28ad62ad36af01e7a7403/sdks/python/apache_beam/io/gcp/datastore/v1new/types.py#L216-L240



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Work logged] (BEAM-8365) Add project push-down capability to IO APIs

2019-10-15 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/BEAM-8365?focusedWorklogId=328699&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-328699
 ]

ASF GitHub Bot logged work on BEAM-8365:


Author: ASF GitHub Bot
Created on: 15/Oct/19 17:37
Start Date: 15/Oct/19 17:37
Worklog Time Spent: 10m 
  Work Description: apilloud commented on pull request #9764: [BEAM-8365] 
Project push-down for TestTableProvider
URL: https://github.com/apache/beam/pull/9764#discussion_r335086826
 
 

 ##
 File path: 
sdks/java/extensions/sql/src/main/java/org/apache/beam/sdk/extensions/sql/impl/rel/BeamIOSourceRel.java
 ##
 @@ -94,7 +116,19 @@ public NodeStats estimateNodeStats(RelMetadataQuery mq) {
   "Should not have received input for %s: %s",
   BeamIOSourceRel.class.getSimpleName(),
   input);
-  return beamTable.buildIOReader(input.getPipeline().begin());
+
+  PBegin begin = input.getPipeline().begin();
+  BeamSqlTableFilter filters = beamTable.constructFilter(ImmutableList.of());
+
+  if (usedFields.isEmpty() && filters instanceof DefaultTableFilter) {
+return beamTable.buildIOReader(begin);
+  }
+
+  Schema newBeamSchema = CalciteUtils.toSchema(getRowType());
+
+  return beamTable
+  .buildIOReader(input.getPipeline().begin(), filters, usedFields)
 
 Review comment:
   Replace `input.getPipeline().begin()` with `begin` (or drop the variable).
 

This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 328699)
Time Spent: 4h 20m  (was: 4h 10m)

> Add project push-down capability to IO APIs
> ---
>
> Key: BEAM-8365
> URL: https://issues.apache.org/jira/browse/BEAM-8365
> Project: Beam
>  Issue Type: New Feature
>  Components: dsl-sql
>Reporter: Kirill Kozlov
>Assignee: Kirill Kozlov
>Priority: Major
>  Time Spent: 4h 20m
>  Remaining Estimate: 0h
>
> * InMemoryTable should implement the following method:
> {code:java}
> public PCollection<Row> buildIOReader(
>     PBegin begin, BeamSqlTableFilter filters, List<String> fieldNames);{code}
> which should return a `PCollection` containing only the fields specified in 
> the `fieldNames` list.
>  * Create a rule to push fields used by a Calc (in projects and in a 
> condition) down into TestTable IO.
>  * Update that same Calc (from the previous step) to have proper input and 
> output schemas, removing unused fields.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Work logged] (BEAM-8404) [SQL] Update deprecated method calls

2019-10-15 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/BEAM-8404?focusedWorklogId=328695&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-328695
 ]

ASF GitHub Bot logged work on BEAM-8404:


Author: ASF GitHub Bot
Created on: 15/Oct/19 17:32
Start Date: 15/Oct/19 17:32
Worklog Time Spent: 10m 
  Work Description: apilloud commented on pull request #9798: [BEAM-8404] 
Replaced deprecated calls
URL: https://github.com/apache/beam/pull/9798
 
 
   
 

This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 328695)
Remaining Estimate: 0h
Time Spent: 10m

> [SQL] Update deprecated method calls
> 
>
> Key: BEAM-8404
> URL: https://issues.apache.org/jira/browse/BEAM-8404
> Project: Beam
>  Issue Type: Improvement
>  Components: dsl-sql
>Affects Versions: 2.15.0
>Reporter: Kirill Kozlov
>Assignee: Kirill Kozlov
>Priority: Minor
>  Time Spent: 10m
>  Remaining Estimate: 0h
>
> Improve code health by moving away from using deprecated methods/classes.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Commented] (BEAM-8399) Python HDFS implementation should support filenames of the format "hdfs://namenodehost/parent/child"

2019-10-15 Thread Kenneth Knowles (Jira)


[ 
https://issues.apache.org/jira/browse/BEAM-8399?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16952145#comment-16952145
 ] 

Kenneth Knowles commented on BEAM-8399:
---

Just curious - how do you even resolve hdfs://parent/child? I'm not experienced 
with HDFS. Is there a default namenodehost? Is it a pipeline option?

> Python HDFS implementation should support filenames of the format 
> "hdfs://namenodehost/parent/child"
> 
>
> Key: BEAM-8399
> URL: https://issues.apache.org/jira/browse/BEAM-8399
> Project: Beam
>  Issue Type: Improvement
>  Components: sdk-py-core
>Reporter: Chamikara Madhusanka Jayalath
>Priority: Major
>
> "hdfs://namenodehost/parent/child" and "/parent/child" seems to be the 
> correct filename formats for HDFS based on [1] but we currently support 
> format "hdfs://parent/child".
> To not break existing users, we have to either (1) somehow support both 
> versions by default (based on [2] seems like HDFS does not allow colons in 
> file path so this might be possible) (2) make  
> "hdfs://namenodehost/parent/child" optional for now and change it to default 
> after few versions.
> We should also make sure that Beam Java and Python HDFS file-system 
> implementations are consistent in this regard.
>  
> [1][https://hadoop.apache.org/docs/current/hadoop-project-dist/hadoop-common/FileSystemShell.html]
> [2] https://issues.apache.org/jira/browse/HDFS-13
>  
> cc: [~udim]



--
This message was sent by Atlassian Jira
(v8.3.4#803005)
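The ambiguity behind this issue is visible with plain standard-library URL parsing: anything after "//" is treated as the host, so in "hdfs://parent/child" the would-be top-level directory is parsed as the namenode host. A minimal demonstration:

```python
from urllib.parse import urlparse

# Standard URL parsing treats the component after "//" as the host, which
# is exactly why "hdfs://parent/child" is ambiguous: "parent" looks like a
# namenode host rather than a directory.

correct = urlparse("hdfs://namenodehost/parent/child")
print(correct.netloc, correct.path)      # namenodehost /parent/child

ambiguous = urlparse("hdfs://parent/child")
print(ambiguous.netloc, ambiguous.path)  # parent /child  ("parent" became the host)
```

This is why supporting both forms requires either a default namenode host or a heuristic, as discussed in the issue.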


[jira] [Updated] (BEAM-8399) Python HDFS implementation should support filenames of the format "hdfs://namenodehost/parent/child"

2019-10-15 Thread Kenneth Knowles (Jira)


 [ 
https://issues.apache.org/jira/browse/BEAM-8399?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Kenneth Knowles updated BEAM-8399:
--
Status: Open  (was: Triage Needed)

> Python HDFS implementation should support filenames of the format 
> "hdfs://namenodehost/parent/child"
> 
>
> Key: BEAM-8399
> URL: https://issues.apache.org/jira/browse/BEAM-8399
> Project: Beam
>  Issue Type: Improvement
>  Components: sdk-py-core
>Reporter: Chamikara Madhusanka Jayalath
>Priority: Major
>
> "hdfs://namenodehost/parent/child" and "/parent/child" seems to be the 
> correct filename formats for HDFS based on [1] but we currently support 
> format "hdfs://parent/child".
> To not break existing users, we have to either (1) somehow support both 
> versions by default (based on [2] seems like HDFS does not allow colons in 
> file path so this might be possible) (2) make  
> "hdfs://namenodehost/parent/child" optional for now and change it to default 
> after few versions.
> We should also make sure that Beam Java and Python HDFS file-system 
> implementations are consistent in this regard.
>  
> [1][https://hadoop.apache.org/docs/current/hadoop-project-dist/hadoop-common/FileSystemShell.html]
> [2] https://issues.apache.org/jira/browse/HDFS-13
>  
> cc: [~udim]



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

