[jira] [Commented] (BEAM-7019) Reify transform for Python SDK

2019-07-17 Thread Shehzaad Nakhoda (JIRA)


[ 
https://issues.apache.org/jira/browse/BEAM-7019?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16887640#comment-16887640
 ] 

Shehzaad Nakhoda commented on BEAM-7019:


[~reuvenlax][~altay] BEAM-7388 was filed and has been resolved already.

> Reify transform for Python SDK
> --
>
> Key: BEAM-7019
> URL: https://issues.apache.org/jira/browse/BEAM-7019
> Project: Beam
>  Issue Type: New Feature
>  Components: sdk-py-core
>Reporter: Rose Nguyen
>Assignee: Shehzaad Nakhoda
>Priority: Minor
> Fix For: 2.14.0
>
>
> PTransforms for converting between the explicit and implicit forms of various Beam
> values.
> It should offer the same API as its Java counterpart: 
> [https://github.com/apache/beam/blob/11a977b8b26eff2274d706541127c19dc93131a2/sdks/java/core/src/main/java/org/apache/beam/sdk/transforms/Reify.java]
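To make "implicit versus explicit form" concrete, here is a minimal stdlib-only Python sketch. It does not use the Beam SDK. `Element`, `reify_timestamp` and `unreify_timestamp` are hypothetical names that model an element whose event timestamp travels as metadata (implicit form), plus a transform that copies that timestamp into the value itself (explicit form) and its inverse, roughly in the spirit of the Java `Reify` transform linked above.

```python
from dataclasses import dataclass
from typing import Any, List


@dataclass(frozen=True)
class Element:
    """Hypothetical element: `value` is the payload, `timestamp` is implicit metadata."""
    value: Any
    timestamp: float


def reify_timestamp(elements: List[Element]) -> List[Element]:
    """Implicit -> explicit: copy each element's timestamp into its value."""
    return [Element((e.value, e.timestamp), e.timestamp) for e in elements]


def unreify_timestamp(elements: List[Element]) -> List[Element]:
    """Explicit -> implicit: take the timestamp back out of a (value, ts) pair."""
    return [Element(e.value[0], e.value[1]) for e in elements]


pcoll = [Element("a", 10.0), Element("b", 20.0)]
explicit = reify_timestamp(pcoll)
print([e.value for e in explicit])  # [('a', 10.0), ('b', 20.0)]
assert unreify_timestamp(explicit) == pcoll
```

Once the timestamp is part of the value, ordinary element-wise code can read it without any special runner support; the inverse puts it back as metadata.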



--
This message was sent by Atlassian JIRA
(v7.6.14#76016)


[jira] [Resolved] (BEAM-7019) Reify transform for Python SDK

2019-07-17 Thread Shehzaad Nakhoda (JIRA)


 [ 
https://issues.apache.org/jira/browse/BEAM-7019?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Shehzaad Nakhoda resolved BEAM-7019.

   Resolution: Duplicate
Fix Version/s: 2.14.0



[jira] [Work logged] (BEAM-7767) Regexp matching breaks on Windows for fileio test

2019-07-17 Thread ASF GitHub Bot (JIRA)


 [ 
https://issues.apache.org/jira/browse/BEAM-7767?focusedWorklogId=278715=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-278715
 ]

ASF GitHub Bot logged work on BEAM-7767:


Author: ASF GitHub Bot
Created on: 18/Jul/19 05:17
Start Date: 18/Jul/19 05:17
Worklog Time Spent: 10m 
  Work Description: pabloem commented on issue #9097: [BEAM-7767] Improving 
regexp matching for fileio test
URL: https://github.com/apache/beam/pull/9097#issuecomment-512668419
 
 
   Run Python_PVR_Flink PreCommit
 

This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 278715)
Time Spent: 20m  (was: 10m)

> Regexp matching breaks on Windows for fileio test
> -
>
> Key: BEAM-7767
> URL: https://issues.apache.org/jira/browse/BEAM-7767
> Project: Beam
>  Issue Type: Improvement
>  Components: io-python-files
>Reporter: Pablo Estrada
>Assignee: Pablo Estrada
>Priority: Major
>  Time Spent: 20m
>  Remaining Estimate: 0h
>






[jira] [Work logged] (BEAM-7767) Regexp matching breaks on Windows for fileio test

2019-07-17 Thread ASF GitHub Bot (JIRA)


 [ 
https://issues.apache.org/jira/browse/BEAM-7767?focusedWorklogId=278712=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-278712
 ]

ASF GitHub Bot logged work on BEAM-7767:


Author: ASF GitHub Bot
Created on: 18/Jul/19 05:10
Start Date: 18/Jul/19 05:10
Worklog Time Spent: 10m 
  Work Description: pabloem commented on pull request #9097: [BEAM-7767] 
Improving regexp matching for fileio test
URL: https://github.com/apache/beam/pull/9097
 
 
   This matching runs into problems when it receives a `c:/...` file path, so I'm 
just matching on the file name and wildcarding the directory.
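The failure mode can be illustrated with a small stdlib-only Python sketch. The helper names and example paths are hypothetical and are not the code in the PR. A pattern that embeds the full expected POSIX directory breaks when the path comes back as `c:/...`, whereas a pattern that wildcards the directory and anchors only on the file name still matches. As a design choice, the character class also accepts `\` separators.

```python
import re


def strict_match(path: str, expected_dir: str, file_name: str) -> bool:
    """Fragile: requires the exact POSIX-style directory prefix."""
    pattern = re.escape(expected_dir + "/" + file_name) + r"$"
    return re.match(pattern, path) is not None


def file_name_match(path: str, file_name: str) -> bool:
    """Robust: any directory, '/' or '\\' separator, then exactly `file_name`."""
    pattern = r".*[/\\]" + re.escape(file_name) + r"$"
    return re.match(pattern, path) is not None


posix_path = "/tmp/beam/file1.txt"
windows_path = "c:/tmp/beam/file1.txt"

print(strict_match(posix_path, "/tmp/beam", "file1.txt"))    # True
print(strict_match(windows_path, "/tmp/beam", "file1.txt"))  # False
print(file_name_match(windows_path, "file1.txt"))            # True
print(file_name_match("/tmp/beam/file10.txt", "file1.txt"))  # False
```

Escaping the file name keeps the `.` in `file1.txt` literal, and the trailing `$` stops `file1.txt` from matching `file10.txt`.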
   
   r: @chamikaramj 
   
   

[jira] [Created] (BEAM-7767) Regexp matching breaks on Windows for fileio test

2019-07-17 Thread Pablo Estrada (JIRA)
Pablo Estrada created BEAM-7767:
---

 Summary: Regexp matching breaks on Windows for fileio test
 Key: BEAM-7767
 URL: https://issues.apache.org/jira/browse/BEAM-7767
 Project: Beam
  Issue Type: Improvement
  Components: io-python-files
Reporter: Pablo Estrada
Assignee: Pablo Estrada








[jira] [Work logged] (BEAM-6972) LTS backport: CassandraIO is broken because of use of bad relocation of guava

2019-07-17 Thread ASF GitHub Bot (JIRA)


 [ 
https://issues.apache.org/jira/browse/BEAM-6972?focusedWorklogId=278700=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-278700
 ]

ASF GitHub Bot logged work on BEAM-6972:


Author: ASF GitHub Bot
Created on: 18/Jul/19 04:52
Start Date: 18/Jul/19 04:52
Worklog Time Spent: 10m 
  Work Description: kennknowles commented on pull request #9064: 
[BEAM-6972] 2.7.1 LTS cherrypick: fix guava shading for Guava in CassandraIO
URL: https://github.com/apache/beam/pull/9064
 
 
   
 



Issue Time Tracking
---

Worklog Id: (was: 278700)
Time Spent: 1h 40m  (was: 1.5h)

> LTS backport: CassandraIO is broken because of use of bad relocation of guava
> -
>
> Key: BEAM-6972
> URL: https://issues.apache.org/jira/browse/BEAM-6972
> Project: Beam
>  Issue Type: Bug
>  Components: io-java-cassandra
>Affects Versions: 2.5.0, 2.6.0, 2.7.0, 2.8.0, 2.9.0, 2.10.0, 2.11.0
>Reporter: Arun sethia
>Assignee: Kenneth Knowles
>Priority: Major
> Fix For: 2.7.1
>
>  Time Spent: 1h 40m
>  Remaining Estimate: 0h
>
> While using Apache Beam to run a Dataflow job that reads data from BigQuery and 
> writes it to Cassandra with the following libraries:
>  # beam-sdks-java-io-cassandra - 2.6.0
>  # beam-sdks-java-io-jdbc - 2.6.0
>  # beam-sdks-java-io-google-cloud-platform - 2.6.0
>  # beam-sdks-java-core - 2.6.0
>  # google-cloud-dataflow-java-sdk-all - 2.5.0
>  # google-api-client - 1.25.0
>  
> I get the following error when inserting/saving data to Cassandra.
> {code:java}
> [error] (run-main-0) org.apache.beam.sdk.Pipeline$PipelineExecutionException: 
> java.lang.NoSuchMethodError: 
> com.datastax.driver.mapping.Mapper.saveAsync(Ljava/lang/Object;)Lorg/apache/beam/repackaged/beam_sdks_java_io_cassandra/com/google/common/util/concurrent/ListenableFuture;
> org.apache.beam.sdk.Pipeline$PipelineExecutionException: 
> java.lang.NoSuchMethodError: 
> com.datastax.driver.mapping.Mapper.saveAsync(Ljava/lang/Object;)Lorg/apache/beam/repackaged/beam_sdks_java_io_cassandra/com/google/common/util/concurrent/ListenableFuture;
>  at 
> org.apache.beam.runners.direct.DirectRunner$DirectPipelineResult.waitUntilFinish(DirectRunner.java:332)
>  at 
> org.apache.beam.runners.direct.DirectRunner$DirectPipelineResult.waitUntilFinish(DirectRunner.java:302)
>  at org.apache.beam.runners.direct.DirectRunner.run(DirectRunner.java:197)
>  at org.apache.beam.runners.direct.DirectRunner.run(DirectRunner.java:64)
>  at org.apache.beam.sdk.Pipeline.run(Pipeline.java:313)
>  at org.apache.beam.sdk.Pipeline.run(Pipeline.java:299){code}
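The `NoSuchMethodError` above is consistent with the bad relocation named in the issue title. The CassandraIO jar rewrote Guava references to `org.apache.beam.repackaged.beam_sdks_java_io_cassandra.com.google.common...`, so its bytecode looks for a `Mapper.saveAsync` that returns the relocated `ListenableFuture`. The DataStax driver on the classpath is not relocated and still declares the original Guava type. The JVM resolves a method by its name and full descriptor, return type included, so the lookup fails. The stdlib-only Python model below is a simplification with hypothetical helper names, not how the shading plugin works; it reproduces the descriptor from the stack trace.

```python
# Prefix rewritten by the relocation applied to the CassandraIO artifact.
ORIGINAL_PREFIX = "com/google/common/"
RELOCATED_PREFIX = (
    "org/apache/beam/repackaged/beam_sdks_java_io_cassandra/com/google/common/"
)

# saveAsync as declared by the (unrelocated) DataStax driver, simplified.
DRIVER_DESCRIPTOR = (
    "(Ljava/lang/Object;)Lcom/google/common/util/concurrent/ListenableFuture;"
)
driver_methods = {("saveAsync", DRIVER_DESCRIPTOR)}


def relocate(descriptor: str) -> str:
    """Rewrite Guava class references inside a JVM method descriptor."""
    return descriptor.replace("L" + ORIGINAL_PREFIX, "L" + RELOCATED_PREFIX)


def can_link(methods, name: str, descriptor: str) -> bool:
    """Model JVM method resolution: name AND full descriptor must match."""
    return (name, descriptor) in methods


# The shaded CassandraIO bytecode asks for the relocated return type.
requested = ("saveAsync", relocate(DRIVER_DESCRIPTOR))

if not can_link(driver_methods, *requested):
    print("java.lang.NoSuchMethodError: Mapper.%s%s" % requested)
```

The printed message matches the descriptor in the stack trace. The fix direction follows from the model: either relocate Guava consistently for both caller and callee, or do not relocate the Guava types that appear in the driver's public signatures.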





[jira] [Work logged] (BEAM-6972) LTS backport: CassandraIO is broken because of use of bad relocation of guava

2019-07-17 Thread ASF GitHub Bot (JIRA)


 [ 
https://issues.apache.org/jira/browse/BEAM-6972?focusedWorklogId=278698=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-278698
 ]

ASF GitHub Bot logged work on BEAM-6972:


Author: ASF GitHub Bot
Created on: 18/Jul/19 04:52
Start Date: 18/Jul/19 04:52
Worklog Time Spent: 10m 
  Work Description: kennknowles commented on issue #9064: [BEAM-6972] 2.7.1 
LTS cherrypick: fix guava shading for Guava in CassandraIO
URL: https://github.com/apache/beam/pull/9064#issuecomment-512664025
 
 
   A large number of my builds, both on Jenkins and locally, have failed due to 
Maven Central download issues. Here is a build scan of `:javaPreCommit`: 
https://gradle.com/s/p4mabdazm6yjq
 



Issue Time Tracking
---

Worklog Id: (was: 278698)
Time Spent: 1.5h  (was: 1h 20m)



[jira] [Work logged] (BEAM-6972) LTS backport: CassandraIO is broken because of use of bad relocation of guava

2019-07-17 Thread ASF GitHub Bot (JIRA)


 [ 
https://issues.apache.org/jira/browse/BEAM-6972?focusedWorklogId=278682=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-278682
 ]

ASF GitHub Bot logged work on BEAM-6972:


Author: ASF GitHub Bot
Created on: 18/Jul/19 03:18
Start Date: 18/Jul/19 03:18
Worklog Time Spent: 10m 
  Work Description: kennknowles commented on issue #9064: [BEAM-6972] 2.7.1 
LTS cherrypick: fix guava shading for Guava in CassandraIO
URL: https://github.com/apache/beam/pull/9064#issuecomment-512648841
 
 
   Run Java PreCommit
 



Issue Time Tracking
---

Worklog Id: (was: 278682)
Time Spent: 1h 20m  (was: 1h 10m)



[jira] [Work logged] (BEAM-7714) Allow retries of PostCommit test suites per Python version

2019-07-17 Thread ASF GitHub Bot (JIRA)


 [ 
https://issues.apache.org/jira/browse/BEAM-7714?focusedWorklogId=278680=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-278680
 ]

ASF GitHub Bot logged work on BEAM-7714:


Author: ASF GitHub Bot
Created on: 18/Jul/19 03:10
Start Date: 18/Jul/19 03:10
Worklog Time Spent: 10m 
  Work Description: tvalentyn commented on issue #9093: [BEAM-7714] 
[BEAM-7257] Split Python 3 postcommits into several Jenkins jobs.
URL: https://github.com/apache/beam/pull/9093#issuecomment-512647646
 
 
   R: @udim 
 



Issue Time Tracking
---

Worklog Id: (was: 278680)
Time Spent: 0.5h  (was: 20m)

> Allow retries of PostCommit test suites per Python version
> --
>
> Key: BEAM-7714
> URL: https://issues.apache.org/jira/browse/BEAM-7714
> Project: Beam
>  Issue Type: Sub-task
>  Components: testing
>Reporter: Valentyn Tymofieiev
>Assignee: Mark Liu
>Priority: Blocker
>  Time Spent: 0.5h
>  Remaining Estimate: 0h
>
> Currently the Python PostCommit test suite runs the same set of tests four 
> times, under Python 2.7 and 3.5-3.7. When a run fails due to a flake, 
> contributors have to rerun the whole suite. Being able to rerun the suite for 
> only a particular Python version would make it easier to get a green run. 
> Some considerations:
>   - Increasing the number of Jenkins jobs will increase the number of slots 
> required by the postcommit, which will slow down the queue unless we increase 
> the number of slots. We can investigate utilization of Jenkins workers to see 
> whether a slot increase is advisable.
> - We could introduce phrase-only suites, e.g. "Run Python 3.7 PostCommits", as 
> separate Jenkins jobs (1 suite, 1 slot) in addition to the current jobs. 
> Phrase-only suites would not be triggered automatically on a PR, only 
> manually, when users want to rerun tests for a particular version. This may 
> cause confusion on a PR, though, since the PR author would have to explain to 
> reviewers that the Python 3 PostCommit suite failed, but only its 3.6 portion 
> failed, and that they re-ran only the Py3.6 portion in a separate Jenkins job, 
> where it passed, so the PR is safe to merge.
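The phrase-only suites proposed above can be sketched as a small dispatcher. This is stdlib-only Python, not Beam's Jenkins job DSL; the trigger grammar and the job names are hypothetical. Each PR comment maps to at most one per-version job, so rerunning a flaky Python 3.6 portion occupies one slot instead of the whole suite.

```python
import re
from typing import Optional

# Hypothetical per-version job names; not Beam's real Jenkins job names.
PER_VERSION_JOBS = {
    "2": "PostCommit_Python2",
    "3.5": "PostCommit_Python35",
    "3.6": "PostCommit_Python36",
    "3.7": "PostCommit_Python37",
}

# Hypothetical trigger grammar, e.g. "Run Python 3.7 PostCommit".
# Matching is case-insensitive and accepts an optional trailing "s".
PHRASE = re.compile(r"run python (\d(?:\.\d)?) postcommits?", re.IGNORECASE)


def job_for_phrase(comment: str) -> Optional[str]:
    """Return the one per-version job a PR comment should trigger, or None."""
    match = PHRASE.fullmatch(comment.strip())
    if match is None:
        return None
    return PER_VERSION_JOBS.get(match.group(1))


print(job_for_phrase("Run Python 3.6 Postcommit"))  # PostCommit_Python36
print(job_for_phrase("Run Java PreCommit"))         # None
```

Unknown versions and unrelated phrases return `None`, so the dispatcher never falls back to running the full suite by accident.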





[jira] [Work logged] (BEAM-7714) Allow retries of PostCommit test suites per Python version

2019-07-17 Thread ASF GitHub Bot (JIRA)


 [ 
https://issues.apache.org/jira/browse/BEAM-7714?focusedWorklogId=278679=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-278679
 ]

ASF GitHub Bot logged work on BEAM-7714:


Author: ASF GitHub Bot
Created on: 18/Jul/19 03:08
Start Date: 18/Jul/19 03:08
Worklog Time Spent: 10m 
  Work Description: tvalentyn commented on issue #9093: [WIP] [BEAM-7714] 
[BEAM-7257] Split Python 3 postcommits into several Jenkins jobs.
URL: https://github.com/apache/beam/pull/9093#issuecomment-512647182
 
 
   Run Go PreCommit
 



Issue Time Tracking
---

Worklog Id: (was: 278679)
Time Spent: 20m  (was: 10m)



[jira] [Work logged] (BEAM-7714) Allow retries of PostCommit test suites per Python version

2019-07-17 Thread ASF GitHub Bot (JIRA)


 [ 
https://issues.apache.org/jira/browse/BEAM-7714?focusedWorklogId=278678=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-278678
 ]

ASF GitHub Bot logged work on BEAM-7714:


Author: ASF GitHub Bot
Created on: 18/Jul/19 03:08
Start Date: 18/Jul/19 03:08
Worklog Time Spent: 10m 
  Work Description: tvalentyn commented on issue #9093: [WIP] [BEAM-7714] 
[BEAM-7257] Split Python 3 postcommits into several Jenkins jobs.
URL: https://github.com/apache/beam/pull/9093#issuecomment-512647130
 
 
   Run Python_PVR_Flink PreCommit
 



Issue Time Tracking
---

Worklog Id: (was: 278678)
Time Spent: 10m
Remaining Estimate: 0h



[jira] [Commented] (BEAM-7766) Dataflow runner should default to PipelineState.UNKNOWN when job state received via v1beta3 cannot be recognized.

2019-07-17 Thread Kenneth Knowles (JIRA)


[ 
https://issues.apache.org/jira/browse/BEAM-7766?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16887565#comment-16887565
 ] 

Kenneth Knowles commented on BEAM-7766:
---

Is UNKNOWN a value that could be returned by the service, or is it only a 
client-side indication that the client does not understand the state? These two 
cases should be kept separate. If I recall correctly, in the Java SDK both are 
actually possible and distinct.
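The distinction drawn in this comment can be sketched in stdlib-only Python. This is not the Dataflow runner's code: the enum is a reduced, hypothetical version of `PipelineState`, and `UNRECOGNIZED` is an assumed extra member used only to show how a client-side fallback can stay separate from an `UNKNOWN` that the service itself reports. The `JOB_STATE_*` strings are written in the style of Dataflow job states and are only a subset.

```python
from enum import Enum


class PipelineState(Enum):
    # Reduced subset of states, for illustration only.
    RUNNING = "RUNNING"
    DONE = "DONE"
    FAILED = "FAILED"
    UNKNOWN = "UNKNOWN"            # the service itself reported "unknown"
    UNRECOGNIZED = "UNRECOGNIZED"  # hypothetical: the client could not map the value


# Hypothetical subset of service-side state strings this client understands.
KNOWN_SERVICE_STATES = {
    "JOB_STATE_UNKNOWN": PipelineState.UNKNOWN,
    "JOB_STATE_RUNNING": PipelineState.RUNNING,
    "JOB_STATE_DONE": PipelineState.DONE,
    "JOB_STATE_FAILED": PipelineState.FAILED,
}


def to_pipeline_state(service_state: str) -> PipelineState:
    """Map a service state, keeping 'service said unknown' apart from 'not understood'."""
    return KNOWN_SERVICE_STATES.get(service_state, PipelineState.UNRECOGNIZED)


print(to_pipeline_state("JOB_STATE_UNKNOWN").name)        # UNKNOWN
print(to_pipeline_state("JOB_STATE_SOMETHING_NEW").name)  # UNRECOGNIZED
```

Falling back to a dedicated member, rather than raising, lets an older client keep polling a job whose state was introduced by a newer service version.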

> Dataflow runner should default to PipelineState.UNKNOWN when job state 
> received via v1beta3 cannot be recognized.
> --
>
> Key: BEAM-7766
> URL: https://issues.apache.org/jira/browse/BEAM-7766
> Project: Beam
>  Issue Type: Bug
>  Components: runner-dataflow
>Affects Versions: 2.1.0
>Reporter: Valentyn Tymofieiev
>Assignee: Valentyn Tymofieiev
>Priority: Minor
> Fix For: 2.7.1, 2.15.0
>
>  Time Spent: 0.5h
>  Remaining Estimate: 0h
>






[jira] [Commented] (BEAM-7766) Dataflow runner should default to PipelineState.UNKNOWN when job state received via v1beta3 cannot be recognized.

2019-07-17 Thread Kenneth Knowles (JIRA)


[ 
https://issues.apache.org/jira/browse/BEAM-7766?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16887563#comment-16887563
 ] 

Kenneth Knowles commented on BEAM-7766:
---

Thank you for filing this. Please remember not to close this until it has been 
cherry-picked to 2.7.1. Alternatively, you can clone it, close this issue when 
the fix reaches master, and close the clone when it is merged to 2.7.1.



[jira] [Work logged] (BEAM-7257) Add withProducerConfigUpdates to KafkaIO

2019-07-17 Thread ASF GitHub Bot (JIRA)


 [ 
https://issues.apache.org/jira/browse/BEAM-7257?focusedWorklogId=278676=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-278676
 ]

ASF GitHub Bot logged work on BEAM-7257:


Author: ASF GitHub Bot
Created on: 18/Jul/19 02:58
Start Date: 18/Jul/19 02:58
Worklog Time Spent: 10m 
  Work Description: tvalentyn commented on issue #9093: [WIP] [BEAM-7257] 
[BEAM-7714] Split Python 3 postcommits into several Jenkins jobs.
URL: https://github.com/apache/beam/pull/9093#issuecomment-512645464
 
 
   Run Python 3.6 Postcommit
 



Issue Time Tracking
---

Worklog Id: (was: 278676)
Time Spent: 3.5h  (was: 3h 20m)

> Add withProducerConfigUpdates to KafkaIO
> 
>
> Key: BEAM-7257
> URL: https://issues.apache.org/jira/browse/BEAM-7257
> Project: Beam
>  Issue Type: Improvement
>  Components: io-java-kafka
>Reporter: Heejong Lee
>Assignee: Heejong Lee
>Priority: Major
> Fix For: 2.13.0
>
>  Time Spent: 3.5h
>  Remaining Estimate: 0h
>
> Add withProducerConfigUpdates and deprecate updateProducerProperties.
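The idea behind this change can be sketched in stdlib-only Python. `WriteSpec`, its snake_case methods and the default values are hypothetical and are not KafkaIO's Java API. The sketch assumes "config updates" semantics, in which keys passed in override matching defaults and every other key is kept, and it assumes the deprecated method simply warns and forwards to the new one.

```python
import warnings
from dataclasses import dataclass, field, replace
from typing import Any, Dict

# Hypothetical defaults, for illustration only.
DEFAULT_PRODUCER_CONFIG: Dict[str, Any] = {"retries": 3, "acks": "all"}


@dataclass(frozen=True)
class WriteSpec:
    """Hypothetical immutable builder, loosely modeled on a Kafka write transform."""
    producer_config: Dict[str, Any] = field(
        default_factory=lambda: dict(DEFAULT_PRODUCER_CONFIG))

    def with_producer_config_updates(self, updates: Dict[str, Any]) -> "WriteSpec":
        """Return a new spec: keys in `updates` override, all other keys are kept."""
        merged = dict(self.producer_config)
        merged.update(updates)
        return replace(self, producer_config=merged)

    def update_producer_properties(self, updates: Dict[str, Any]) -> "WriteSpec":
        """Deprecated alias that forwards to with_producer_config_updates."""
        warnings.warn(
            "update_producer_properties is deprecated; "
            "use with_producer_config_updates",
            DeprecationWarning,
            stacklevel=2,
        )
        return self.with_producer_config_updates(updates)


spec = WriteSpec().with_producer_config_updates({"acks": "1"})
print(spec.producer_config)  # {'retries': 3, 'acks': '1'}
```

Returning a new spec instead of mutating the old one matches the immutable-builder style that Beam transforms generally use, and the forwarding alias keeps existing callers working while the warning nudges them toward the new name.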





[jira] [Work logged] (BEAM-7257) Add withProducerConfigUpdates to KafkaIO

2019-07-17 Thread ASF GitHub Bot (JIRA)


 [ 
https://issues.apache.org/jira/browse/BEAM-7257?focusedWorklogId=278677=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-278677
 ]

ASF GitHub Bot logged work on BEAM-7257:


Author: ASF GitHub Bot
Created on: 18/Jul/19 02:58
Start Date: 18/Jul/19 02:58
Worklog Time Spent: 10m 
  Work Description: tvalentyn commented on issue #9093: [WIP] [BEAM-7257] 
[BEAM-7714] Split Python 3 postcommits into several Jenkins jobs.
URL: https://github.com/apache/beam/pull/9093#issuecomment-512645505
 
 
   Run PYthon 3.7 Postcommit
 



Issue Time Tracking
---

Worklog Id: (was: 278677)
Time Spent: 3h 40m  (was: 3.5h)



[jira] [Work logged] (BEAM-7257) Add withProducerConfigUpdates to KafkaIO

2019-07-17 Thread ASF GitHub Bot (JIRA)


 [ 
https://issues.apache.org/jira/browse/BEAM-7257?focusedWorklogId=278674=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-278674
 ]

ASF GitHub Bot logged work on BEAM-7257:


Author: ASF GitHub Bot
Created on: 18/Jul/19 02:58
Start Date: 18/Jul/19 02:58
Worklog Time Spent: 10m 
  Work Description: tvalentyn commented on issue #9093: [WIP] [BEAM-7257] 
[BEAM-7714] Split Python 3 postcommits into several Jenkins jobs.
URL: https://github.com/apache/beam/pull/9093#issuecomment-512645359
 
 
   Run Python 2 PostCommit
 



Issue Time Tracking
---

Worklog Id: (was: 278674)
Time Spent: 3h 10m  (was: 3h)

> Add withProducerConfigUpdates to KafkaIO
> 
>
> Key: BEAM-7257
> URL: https://issues.apache.org/jira/browse/BEAM-7257
> Project: Beam
>  Issue Type: Improvement
>  Components: io-java-kafka
>Reporter: Heejong Lee
>Assignee: Heejong Lee
>Priority: Major
> Fix For: 2.13.0
>
>  Time Spent: 3h 10m
>  Remaining Estimate: 0h
>
> adding withProducerConfigUpdates and deprecating updateProducerProperties





[jira] [Work logged] (BEAM-7257) Add withProducerConfigUpdates to KafkaIO

2019-07-17 Thread ASF GitHub Bot (JIRA)


 [ 
https://issues.apache.org/jira/browse/BEAM-7257?focusedWorklogId=278675=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-278675
 ]

ASF GitHub Bot logged work on BEAM-7257:


Author: ASF GitHub Bot
Created on: 18/Jul/19 02:58
Start Date: 18/Jul/19 02:58
Worklog Time Spent: 10m 
  Work Description: tvalentyn commented on issue #9093: [WIP] [BEAM-7257] 
[BEAM-7714] Split Python 3 postcommits into several Jenkins jobs.
URL: https://github.com/apache/beam/pull/9093#issuecomment-512645414
 
 
   Run Python 3.5 Postcommit
 



Issue Time Tracking
---

Worklog Id: (was: 278675)
Time Spent: 3h 20m  (was: 3h 10m)

> Add withProducerConfigUpdates to KafkaIO
> 
>
> Key: BEAM-7257
> URL: https://issues.apache.org/jira/browse/BEAM-7257
> Project: Beam
>  Issue Type: Improvement
>  Components: io-java-kafka
>Reporter: Heejong Lee
>Assignee: Heejong Lee
>Priority: Major
> Fix For: 2.13.0
>
>  Time Spent: 3h 20m
>  Remaining Estimate: 0h
>
> adding withProducerConfigUpdates and deprecating updateProducerProperties





[jira] [Work logged] (BEAM-7600) Spark portable runner: reuse SDK harness

2019-07-17 Thread ASF GitHub Bot (JIRA)


 [ 
https://issues.apache.org/jira/browse/BEAM-7600?focusedWorklogId=278673=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-278673
 ]

ASF GitHub Bot logged work on BEAM-7600:


Author: ASF GitHub Bot
Created on: 18/Jul/19 02:57
Start Date: 18/Jul/19 02:57
Worklog Time Spent: 10m 
  Work Description: ibzib commented on pull request #9095: [BEAM-7600] 
borrow SDK harness management code into Spark runner
URL: https://github.com/apache/beam/pull/9095
 
 
   Now the Spark runner can reuse SDK harnesses, and multiple SDK harnesses can 
be used.
   
   The latter will hopefully enable multicore processing on TFX, for example.
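One way to picture harness reuse is as a reference-counted cache keyed by environment. The sketch below is a hypothetical, simplified model in plain Python (`HarnessPool`, `acquire`, `release` and the factory callable are illustrative names, not the Spark runner's or Beam's actual classes): a harness is started on first use, shared by later callers for the same environment, and handed back for shutdown when its last user releases it.

```python
import threading


class HarnessPool:
    """Hypothetical ref-counted cache of SDK harness handles, keyed by environment."""

    def __init__(self, factory):
        self._factory = factory      # callable: environment -> harness handle
        self._lock = threading.Lock()
        self._entries = {}           # environment -> [harness, ref_count]
        self.created = 0             # number of harnesses actually started

    def acquire(self, environment):
        with self._lock:
            entry = self._entries.get(environment)
            if entry is None:
                # First user of this environment: start a new harness.
                entry = [self._factory(environment), 0]
                self._entries[environment] = entry
                self.created += 1
            entry[1] += 1
            return entry[0]

    def release(self, environment):
        """Return the harness if this was its last user (caller shuts it down), else None."""
        with self._lock:
            entry = self._entries[environment]
            entry[1] -= 1
            if entry[1] == 0:
                # Last user is gone: evict it so the next acquire starts a fresh one.
                del self._entries[environment]
                return entry[0]
            return None
```

Two `acquire` calls for the same environment share one harness, so `created` stays at 1 until the harness is fully released.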
   
   
   
   Post-Commit Tests Status (on master branch)
   

   
   Lang | SDK | Apex | Dataflow | Flink | Gearpump | Samza | Spark
   --- | --- | --- | --- | --- | --- | --- | ---
   Go | [![Build Status](https://builds.apache.org/job/beam_PostCommit_Go/lastCompletedBuild/badge/icon)](https://builds.apache.org/job/beam_PostCommit_Go/lastCompletedBuild/) | --- | --- | [![Build Status](https://builds.apache.org/job/beam_PostCommit_Go_VR_Flink/lastCompletedBuild/badge/icon)](https://builds.apache.org/job/beam_PostCommit_Go_VR_Flink/lastCompletedBuild/) | --- | --- | [![Build Status](https://builds.apache.org/job/beam_PostCommit_Go_VR_Spark/lastCompletedBuild/badge/icon)](https://builds.apache.org/job/beam_PostCommit_Go_VR_Spark/lastCompletedBuild/)
   Java | [![Build Status](https://builds.apache.org/job/beam_PostCommit_Java/lastCompletedBuild/badge/icon)](https://builds.apache.org/job/beam_PostCommit_Java/lastCompletedBuild/) | [![Build Status](https://builds.apache.org/job/beam_PostCommit_Java_ValidatesRunner_Apex/lastCompletedBuild/badge/icon)](https://builds.apache.org/job/beam_PostCommit_Java_ValidatesRunner_Apex/lastCompletedBuild/) | [![Build Status](https://builds.apache.org/job/beam_PostCommit_Java_ValidatesRunner_Dataflow/lastCompletedBuild/badge/icon)](https://builds.apache.org/job/beam_PostCommit_Java_ValidatesRunner_Dataflow/lastCompletedBuild/) | [![Build Status](https://builds.apache.org/job/beam_PostCommit_Java_ValidatesRunner_Flink/lastCompletedBuild/badge/icon)](https://builds.apache.org/job/beam_PostCommit_Java_ValidatesRunner_Flink/lastCompletedBuild/)[![Build Status](https://builds.apache.org/job/beam_PostCommit_Java_PVR_Flink_Batch/lastCompletedBuild/badge/icon)](https://builds.apache.org/job/beam_PostCommit_Java_PVR_Flink_Batch/lastCompletedBuild/)[![Build Status](https://builds.apache.org/job/beam_PostCommit_Java_PVR_Flink_Streaming/lastCompletedBuild/badge/icon)](https://builds.apache.org/job/beam_PostCommit_Java_PVR_Flink_Streaming/lastCompletedBuild/) | [![Build Status](https://builds.apache.org/job/beam_PostCommit_Java_ValidatesRunner_Gearpump/lastCompletedBuild/badge/icon)](https://builds.apache.org/job/beam_PostCommit_Java_ValidatesRunner_Gearpump/lastCompletedBuild/) | [![Build Status](https://builds.apache.org/job/beam_PostCommit_Java_ValidatesRunner_Samza/lastCompletedBuild/badge/icon)](https://builds.apache.org/job/beam_PostCommit_Java_ValidatesRunner_Samza/lastCompletedBuild/) | [![Build Status](https://builds.apache.org/job/beam_PostCommit_Java_ValidatesRunner_Spark/lastCompletedBuild/badge/icon)](https://builds.apache.org/job/beam_PostCommit_Java_ValidatesRunner_Spark/lastCompletedBuild/)[![Build Status](https://builds.apache.org/job/beam_PostCommit_Java_PVR_Spark_Batch/lastCompletedBuild/badge/icon)](https://builds.apache.org/job/beam_PostCommit_Java_PVR_Spark_Batch/lastCompletedBuild/)
   Python | [![Build Status](https://builds.apache.org/job/beam_PostCommit_Python_Verify/lastCompletedBuild/badge/icon)](https://builds.apache.org/job/beam_PostCommit_Python_Verify/lastCompletedBuild/)[![Build Status](https://builds.apache.org/job/beam_PostCommit_Python3_Verify/lastCompletedBuild/badge/icon)](https://builds.apache.org/job/beam_PostCommit_Python3_Verify/lastCompletedBuild/) | --- | [![Build Status](https://builds.apache.org/job/beam_PostCommit_Py_VR_Dataflow/lastCompletedBuild/badge/icon)](https://builds.apache.org/job/beam_PostCommit_Py_VR_Dataflow/lastCompletedBuild/) [![Build Status](https://builds.apache.org/job/beam_PostCommit_Py_ValCont/lastCompletedBuild/badge/icon)](https://builds.apache.org/job/beam_PostCommit_Py_ValCont/lastCompletedBuild/) | [![Build Status](https://builds.apache.org/job/beam_PreCommit_Python_PVR_Flink_Cron/lastCompletedBuild/badge/icon)](https://builds.apache.org/job/beam_PreCommit_Python_PVR_Flink_Cron/lastCompletedBuild/) | --- | --- | [![Build Status](https://builds.apache.org/job/beam_PostCommit_Python_VR_Spark/lastCompletedBuild/badge/icon)](https://builds.apache.org/job/beam_PostCommit_Python_VR_Spark/lastCompletedBuild/)
   
   Pre-Commit Tests 

[jira] [Work logged] (BEAM-7257) Add withProducerConfigUpdates to KafkaIO

2019-07-17 Thread ASF GitHub Bot (JIRA)


 [ 
https://issues.apache.org/jira/browse/BEAM-7257?focusedWorklogId=278671=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-278671
 ]

ASF GitHub Bot logged work on BEAM-7257:


Author: ASF GitHub Bot
Created on: 18/Jul/19 02:47
Start Date: 18/Jul/19 02:47
Worklog Time Spent: 10m 
  Work Description: tvalentyn commented on issue #9093: [WIP] [BEAM-7257] 
[BEAM-7714] Split Python 3 postcommits into several Jenkins jobs.
URL: https://github.com/apache/beam/pull/9093#issuecomment-512643387
 
 
   Run Seed Job
 



Issue Time Tracking
---

Worklog Id: (was: 278671)
Time Spent: 3h  (was: 2h 50m)

> Add withProducerConfigUpdates to KafkaIO
> 
>
> Key: BEAM-7257
> URL: https://issues.apache.org/jira/browse/BEAM-7257
> Project: Beam
>  Issue Type: Improvement
>  Components: io-java-kafka
>Reporter: Heejong Lee
>Assignee: Heejong Lee
>Priority: Major
> Fix For: 2.13.0
>
>  Time Spent: 3h
>  Remaining Estimate: 0h
>
> adding withProducerConfigUpdates and deprecating updateProducerProperties





[jira] [Work logged] (BEAM-6611) A Python Sink for BigQuery with File Loads in Streaming

2019-07-17 Thread ASF GitHub Bot (JIRA)


 [ 
https://issues.apache.org/jira/browse/BEAM-6611?focusedWorklogId=278669=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-278669
 ]

ASF GitHub Bot logged work on BEAM-6611:


Author: ASF GitHub Bot
Created on: 18/Jul/19 02:36
Start Date: 18/Jul/19 02:36
Worklog Time Spent: 10m 
  Work Description: pabloem commented on pull request #8871: [BEAM-6611] 
BigQuery file loads in Streaming for Python SDK
URL: https://github.com/apache/beam/pull/8871#discussion_r304671282
 
 

 ##
 File path: sdks/python/apache_beam/io/gcp/bigquery_file_loads.py
 ##
 @@ -550,6 +562,25 @@ def verify(self):
'loaded into BigQuery. Please provide a GCS bucket, or '
'pass method="STREAMING_INSERTS" to WriteToBigQuery.'
% self._custom_gcs_temp_location.get())
+    if self.is_streaming_pipeline and not self.triggering_frequency:
+      raise ValueError('triggering_frequency must be specified to use file'
+                       'loads in streaming')
+    elif not self.is_streaming_pipeline and self.triggering_frequency:
+      raise ValueError('triggering_frequency can only be used with file'
+                       'loads in streaming')
+
+  def _window_fn(self):
+    if self.is_streaming_pipeline:
+      return beam.WindowInto(beam.window.GlobalWindows(),
+                             trigger=trigger.Repeatedly(
+                                 trigger.AfterAny(
+                                     trigger.AfterProcessingTime(
+                                         self.triggering_frequency),
+                                     trigger.AfterCount(
+                                         _FILE_TRIGGERING_RECORD_COUNT))),
 
 Review comment:
   If we trigger after a certain number of records OR after the triggering frequency, we may end up triggering more often than the quota supports, right? That could happen if the records are coming into the pipeline very quickly. Can you share your 
reasoning around this?
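To make the concern concrete, here is a deliberately simplified model in plain Python (not Beam's trigger machinery; `count_firings` and its parameters are hypothetical) of a repeated `AfterAny(AfterProcessingTime(frequency), AfterCount(n))` trigger in the global window. A pane fires either `frequency` seconds after its first element or as soon as it holds `n` elements, whichever comes first, and then resets. With a fast input stream the count branch fires every `n` records, so the number of firings (and presumably load jobs) grows with the input rate instead of being bounded by the frequency.

```python
def count_firings(arrival_times, frequency_secs, record_count):
    """Simplified model of Repeatedly(AfterAny(AfterProcessingTime(frequency_secs),
    AfterCount(record_count))) in a single global window.

    Returns the processing times at which the trigger fires.
    """
    firings = []
    pane_start = None  # arrival time of the first element in the open pane
    pane_size = 0
    for t in sorted(arrival_times):
        # Timer branch: the pane fires frequency_secs after its first element.
        if pane_start is not None and t - pane_start >= frequency_secs:
            firings.append(pane_start + frequency_secs)
            pane_start, pane_size = None, 0
        if pane_start is None:
            pane_start = t
        pane_size += 1
        # Count branch: the pane fires as soon as it holds record_count elements.
        if pane_size >= record_count:
            firings.append(t)
            pane_start, pane_size = None, 0
    # A pane still open at the end fires when its timer expires.
    if pane_start is not None:
        firings.append(pane_start + frequency_secs)
    return firings
```

With 1,000 records arriving within one second, a 60-second frequency and a count of 100, the model fires 10 times; with one record every 10 seconds it fires only on the timer.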
 



Issue Time Tracking
---

Worklog Id: (was: 278669)
Time Spent: 4h 40m  (was: 4.5h)

> A Python Sink for BigQuery with File Loads in Streaming
> ---
>
> Key: BEAM-6611
> URL: https://issues.apache.org/jira/browse/BEAM-6611
> Project: Beam
>  Issue Type: Improvement
>  Components: sdk-py-core
>Reporter: Pablo Estrada
>Assignee: Tanay Tummalapalli
>Priority: Major
>  Labels: gsoc, gsoc2019, mentor
>  Time Spent: 4h 40m
>  Remaining Estimate: 0h
>
> The Java SDK supports a bunch of methods for writing data into BigQuery, 
> while the Python SDK supports the following:
> - Streaming inserts for streaming pipelines [As seen in [bigquery.py and 
> BigQueryWriteFn|https://github.com/apache/beam/blob/master/sdks/python/apache_beam/io/gcp/bigquery.py#L649-L813]]
> - File loads for batch pipelines [As implemented in [PR 
> 7655|https://github.com/apache/beam/pull/7655]]
> Quick-and-dirty early design doc: https://s.apache.org/beam-bqfl-py-streaming
> The Java SDK also supports File Loads for Streaming pipelines [see BatchLoads 
> application|https://github.com/apache/beam/blob/master/sdks/java/io/google-cloud-platform/src/main/java/org/apache/beam/sdk/io/gcp/bigquery/BigQueryIO.java#L1709-L1776].
> File loads have the advantage of being much cheaper than streaming inserts 
> (although records also take longer to show up in the table).





[jira] [Work logged] (BEAM-6611) A Python Sink for BigQuery with File Loads in Streaming

2019-07-17 Thread ASF GitHub Bot (JIRA)


 [ 
https://issues.apache.org/jira/browse/BEAM-6611?focusedWorklogId=278668=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-278668
 ]

ASF GitHub Bot logged work on BEAM-6611:


Author: ASF GitHub Bot
Created on: 18/Jul/19 02:36
Start Date: 18/Jul/19 02:36
Worklog Time Spent: 10m 
  Work Description: pabloem commented on pull request #8871: [BEAM-6611] 
BigQuery file loads in Streaming for Python SDK
URL: https://github.com/apache/beam/pull/8871#discussion_r304670918
 
 

 ##
 File path: sdks/python/apache_beam/io/gcp/bigquery_file_loads.py
 ##
 @@ -550,6 +562,25 @@ def verify(self):
'loaded into BigQuery. Please provide a GCS bucket, or '
'pass method="STREAMING_INSERTS" to WriteToBigQuery.'
% self._custom_gcs_temp_location.get())
+    if self.is_streaming_pipeline and not self.triggering_frequency:
+      raise ValueError('triggering_frequency must be specified to use file'
+                       'loads in streaming')
+    elif not self.is_streaming_pipeline and self.triggering_frequency:
+      raise ValueError('triggering_frequency can only be used with file'
+                       'loads in streaming')
+
+  def _window_fn(self):
+    if self.is_streaming_pipeline:
+      return beam.WindowInto(beam.window.GlobalWindows(),
+                             trigger=trigger.Repeatedly(
+                                 trigger.AfterAny(
+                                     trigger.AfterProcessingTime(
+                                         self.triggering_frequency),
+                                     trigger.AfterCount(
+                                         _FILE_TRIGGERING_RECORD_COUNT))),
+                             accumulation_mode=trigger.AccumulationMode\
+                                 .DISCARDING)
+    return beam.WindowInto(beam.window.GlobalWindows())
 
 Review comment:
   Nit: Maybe add an explicit `else:` before `return ...GlobalWindows...`? I find it (a tiny bit) 
easier to read.
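Separately from the nit: Python joins adjacent string literals with no separator, so as written in the hunk above the first error message would read "...to use fileloads in streaming". The standalone sketch below repeats the two checks with a trailing space added to each first literal. `validate_triggering` is a hypothetical free function; in the PR the checks sit inside `verify`.

```python
def validate_triggering(is_streaming_pipeline, triggering_frequency):
    """Raise ValueError if triggering_frequency does not match the pipeline mode."""
    if is_streaming_pipeline and not triggering_frequency:
        # Note the trailing space in 'file ': adjacent literals are joined as-is.
        raise ValueError('triggering_frequency must be specified to use file '
                         'loads in streaming')
    elif not is_streaming_pipeline and triggering_frequency:
        raise ValueError('triggering_frequency can only be used with file '
                         'loads in streaming')


# Without the space, the two literals fuse into a single word.
assert ('use file' 'loads') == 'use fileloads'
```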
 



Issue Time Tracking
---

Worklog Id: (was: 278668)

> A Python Sink for BigQuery with File Loads in Streaming
> ---
>
> Key: BEAM-6611
> URL: https://issues.apache.org/jira/browse/BEAM-6611
> Project: Beam
>  Issue Type: Improvement
>  Components: sdk-py-core
>Reporter: Pablo Estrada
>Assignee: Tanay Tummalapalli
>Priority: Major
>  Labels: gsoc, gsoc2019, mentor
>  Time Spent: 4.5h
>  Remaining Estimate: 0h
>
> The Java SDK supports a bunch of methods for writing data into BigQuery, 
> while the Python SDK supports the following:
> - Streaming inserts for streaming pipelines [As seen in [bigquery.py and 
> BigQueryWriteFn|https://github.com/apache/beam/blob/master/sdks/python/apache_beam/io/gcp/bigquery.py#L649-L813]]
> - File loads for batch pipelines [As implemented in [PR 
> 7655|https://github.com/apache/beam/pull/7655]]
> Quick-and-dirty early design doc: https://s.apache.org/beam-bqfl-py-streaming
> The Java SDK also supports File Loads for Streaming pipelines [see BatchLoads 
> application|https://github.com/apache/beam/blob/master/sdks/java/io/google-cloud-platform/src/main/java/org/apache/beam/sdk/io/gcp/bigquery/BigQueryIO.java#L1709-L1776].
> File loads have the advantage of being much cheaper than streaming inserts 
> (although records also take longer to show up in the table).





[jira] [Work logged] (BEAM-6611) A Python Sink for BigQuery with File Loads in Streaming

2019-07-17 Thread ASF GitHub Bot (JIRA)


 [ 
https://issues.apache.org/jira/browse/BEAM-6611?focusedWorklogId=278667=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-278667
 ]

ASF GitHub Bot logged work on BEAM-6611:


Author: ASF GitHub Bot
Created on: 18/Jul/19 02:36
Start Date: 18/Jul/19 02:36
Worklog Time Spent: 10m 
  Work Description: pabloem commented on pull request #8871: [BEAM-6611] 
BigQuery file loads in Streaming for Python SDK
URL: https://github.com/apache/beam/pull/8871#discussion_r304713800
 
 

 ##
 File path: sdks/python/apache_beam/io/gcp/bigquery_file_loads.py
 ##
 @@ -622,8 +653,9 @@ def expand(self, pcoll):
 test_client=self.test_client,
 temporary_tables=self.temp_tables,
 additional_bq_parameters=self.additional_bq_parameters),
-load_job_name_pcv, *self.schema_side_inputs).with_outputs(
-TriggerLoadJobs.TEMP_TABLES, main='main')
+load_job_name_pcv, self.is_streaming_pipeline,
 
 Review comment:
   I wonder if we should pass `is_streaming_pipeline` at pipeline construction 
(i.e. in the constructor) rather than a side input. It would allow us to show 
it as, e.g., display data.
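A minimal sketch of the suggestion in plain Python: the flag is fixed when the object is constructed, so it is known at pipeline-construction time and can be reported alongside other static settings. `FileLoadsConfig` is a hypothetical stand-in that does not use Beam; Beam Python transforms report display data through a `display_data()` method returning a dict, which this class only imitates.

```python
class FileLoadsConfig:
    """Hypothetical holder for write options that are known at construction time."""

    def __init__(self, is_streaming_pipeline, triggering_frequency=None):
        # Fixed when the pipeline is built, rather than delivered as a side input.
        self.is_streaming_pipeline = is_streaming_pipeline
        self.triggering_frequency = triggering_frequency

    def display_data(self):
        # Plain dict of static settings, in the spirit of Beam's display data.
        return {
            'is_streaming_pipeline': self.is_streaming_pipeline,
            'triggering_frequency': self.triggering_frequency,
        }
```

A side input, by contrast, is only available while the pipeline runs, so its value cannot appear in construction-time metadata.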
 



Issue Time Tracking
---

Worklog Id: (was: 278667)
Time Spent: 4.5h  (was: 4h 20m)

> A Python Sink for BigQuery with File Loads in Streaming
> ---
>
> Key: BEAM-6611
> URL: https://issues.apache.org/jira/browse/BEAM-6611
> Project: Beam
>  Issue Type: Improvement
>  Components: sdk-py-core
>Reporter: Pablo Estrada
>Assignee: Tanay Tummalapalli
>Priority: Major
>  Labels: gsoc, gsoc2019, mentor
>  Time Spent: 4.5h
>  Remaining Estimate: 0h
>
> The Java SDK supports a bunch of methods for writing data into BigQuery, 
> while the Python SDK supports the following:
> - Streaming inserts for streaming pipelines [As seen in [bigquery.py and 
> BigQueryWriteFn|https://github.com/apache/beam/blob/master/sdks/python/apache_beam/io/gcp/bigquery.py#L649-L813]]
> - File loads for batch pipelines [As implemented in [PR 
> 7655|https://github.com/apache/beam/pull/7655]]
> Quick-and-dirty early design doc: https://s.apache.org/beam-bqfl-py-streaming
> The Java SDK also supports File Loads for Streaming pipelines [see BatchLoads 
> application|https://github.com/apache/beam/blob/master/sdks/java/io/google-cloud-platform/src/main/java/org/apache/beam/sdk/io/gcp/bigquery/BigQueryIO.java#L1709-L1776].
> File loads have the advantage of being much cheaper than streaming inserts 
> (although records also take longer to show up in the table).





[jira] [Work logged] (BEAM-7257) Add withProducerConfigUpdates to KafkaIO

2019-07-17 Thread ASF GitHub Bot (JIRA)


 [ 
https://issues.apache.org/jira/browse/BEAM-7257?focusedWorklogId=278657=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-278657
 ]

ASF GitHub Bot logged work on BEAM-7257:


Author: ASF GitHub Bot
Created on: 18/Jul/19 02:19
Start Date: 18/Jul/19 02:19
Worklog Time Spent: 10m 
  Work Description: tvalentyn commented on issue #9093: [WIP] [BEAM-7257] 
[BEAM-7714] Split Python 3 postcommits into several Jenkins jobs.
URL: https://github.com/apache/beam/pull/9093#issuecomment-512637641
 
 
   run python 3.5 postcommit
 



Issue Time Tracking
---

Worklog Id: (was: 278657)
Time Spent: 2h 40m  (was: 2.5h)

> Add withProducerConfigUpdates to KafkaIO
> 
>
> Key: BEAM-7257
> URL: https://issues.apache.org/jira/browse/BEAM-7257
> Project: Beam
>  Issue Type: Improvement
>  Components: io-java-kafka
>Reporter: Heejong Lee
>Assignee: Heejong Lee
>Priority: Major
> Fix For: 2.13.0
>
>  Time Spent: 2h 40m
>  Remaining Estimate: 0h
>
> adding withProducerConfigUpdates and deprecating updateProducerProperties





[jira] [Work logged] (BEAM-7257) Add withProducerConfigUpdates to KafkaIO

2019-07-17 Thread ASF GitHub Bot (JIRA)


 [ 
https://issues.apache.org/jira/browse/BEAM-7257?focusedWorklogId=278658=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-278658
 ]

ASF GitHub Bot logged work on BEAM-7257:


Author: ASF GitHub Bot
Created on: 18/Jul/19 02:19
Start Date: 18/Jul/19 02:19
Worklog Time Spent: 10m 
  Work Description: tvalentyn commented on issue #9093: [WIP] [BEAM-7257] 
[BEAM-7714] Split Python 3 postcommits into several Jenkins jobs.
URL: https://github.com/apache/beam/pull/9093#issuecomment-512637710
 
 
   run python 3.6 postcommit
 



Issue Time Tracking
---

Worklog Id: (was: 278658)
Time Spent: 2h 50m  (was: 2h 40m)

> Add withProducerConfigUpdates to KafkaIO
> 
>
> Key: BEAM-7257
> URL: https://issues.apache.org/jira/browse/BEAM-7257
> Project: Beam
>  Issue Type: Improvement
>  Components: io-java-kafka
>Reporter: Heejong Lee
>Assignee: Heejong Lee
>Priority: Major
> Fix For: 2.13.0
>
>  Time Spent: 2h 50m
>  Remaining Estimate: 0h
>
> adding withProducerConfigUpdates and deprecating updateProducerProperties





[jira] [Work logged] (BEAM-7257) Add withProducerConfigUpdates to KafkaIO

2019-07-17 Thread ASF GitHub Bot (JIRA)


 [ 
https://issues.apache.org/jira/browse/BEAM-7257?focusedWorklogId=278656=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-278656
 ]

ASF GitHub Bot logged work on BEAM-7257:


Author: ASF GitHub Bot
Created on: 18/Jul/19 02:15
Start Date: 18/Jul/19 02:15
Worklog Time Spent: 10m 
  Work Description: tvalentyn commented on issue #9093: [WIP] [BEAM-7257] 
[BEAM-7714] Split Python 3 postcommits into several Jenkins jobs.
URL: https://github.com/apache/beam/pull/9093#issuecomment-512636972
 
 
   run python 2 postcommit
 



Issue Time Tracking
---

Worklog Id: (was: 278656)
Time Spent: 2.5h  (was: 2h 20m)

> Add withProducerConfigUpdates to KafkaIO
> 
>
> Key: BEAM-7257
> URL: https://issues.apache.org/jira/browse/BEAM-7257
> Project: Beam
>  Issue Type: Improvement
>  Components: io-java-kafka
>Reporter: Heejong Lee
>Assignee: Heejong Lee
>Priority: Major
> Fix For: 2.13.0
>
>  Time Spent: 2.5h
>  Remaining Estimate: 0h
>
> adding withProducerConfigUpdates and deprecating updateProducerProperties





[jira] [Work started] (BEAM-7246) Create a Spanner IO for Python

2019-07-17 Thread Shehzaad Nakhoda (JIRA)


 [ 
https://issues.apache.org/jira/browse/BEAM-7246?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Work on BEAM-7246 started by Shehzaad Nakhoda.
--
> Create a Spanner IO for Python
> --
>
> Key: BEAM-7246
> URL: https://issues.apache.org/jira/browse/BEAM-7246
> Project: Beam
>  Issue Type: Bug
>  Components: io-python-gcp
>Reporter: Reuven Lax
>Assignee: Shehzaad Nakhoda
>Priority: Major
>
> Add I/O support for Google Cloud Spanner for the Python SDK (Batch Only).
> Testing in this work item will be in the form of DirectRunner tests and 
> manual testing.
> Integration and performance tests are a separate work item (not included 
> here).
> See https://beam.apache.org/documentation/io/built-in/. The goal is to add 
> Google Cloud Spanner to the Database column for the Python/Batch row.





[jira] [Commented] (BEAM-6855) Side inputs are not supported when using the state API

2019-07-17 Thread Kenneth Knowles (JIRA)


[ 
https://issues.apache.org/jira/browse/BEAM-6855?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16887543#comment-16887543
 ] 

Kenneth Knowles commented on BEAM-6855:
---

What I mean is code along these lines:

{code}
DoFnRunner statefulRunner = new StatefulDoFnRunner(...)
PushbackSideInputDoFnRunner dofnRunner = 
SimplePushbackSideInputDoFnRunner.create(statefulRunner, ...)
{code}
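The composition above, where the pushback runner wraps the stateful runner, can be modeled in plain Python as a decorator that buffers ("pushes back") elements until the side input is ready and only then hands them to the wrapped stateful runner. The classes below are hypothetical stand-ins, not Beam's `StatefulDoFnRunner` or `SimplePushbackSideInputDoFnRunner`; they only illustrate why wrapping in this order lets the stateful logic run with the side input available.

```python
class StatefulRunner:
    """Hypothetical stand-in for a stateful runner: keeps a per-key element count."""

    def __init__(self):
        self.state = {}
        self.outputs = []

    def process(self, key, value, side_input):
        self.state[key] = self.state.get(key, 0) + 1
        self.outputs.append((key, value, self.state[key], side_input))


class PushbackSideInputRunner:
    """Hypothetical wrapper: buffers elements until the side input is ready."""

    def __init__(self, delegate):
        self.delegate = delegate
        self.side_input = None
        self.pushed_back = []

    def process(self, key, value):
        if self.side_input is None:
            # Side input not ready yet: push the element back for later.
            self.pushed_back.append((key, value))
            return False
        self.delegate.process(key, value, self.side_input)
        return True

    def side_input_ready(self, side_input):
        # Replay everything that was pushed back, now with the side input.
        self.side_input = side_input
        pending, self.pushed_back = self.pushed_back, []
        for key, value in pending:
            self.delegate.process(key, value, side_input)
```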

> Side inputs are not supported when using the state API
> --
>
> Key: BEAM-6855
> URL: https://issues.apache.org/jira/browse/BEAM-6855
> Project: Beam
>  Issue Type: Bug
>  Components: sdk-java-core
>Reporter: Reuven Lax
>Assignee: Shehzaad Nakhoda
>Priority: Major
>






[jira] [Assigned] (BEAM-6855) Side inputs are not supported when using the state API

2019-07-17 Thread Shehzaad Nakhoda (JIRA)


 [ 
https://issues.apache.org/jira/browse/BEAM-6855?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Shehzaad Nakhoda reassigned BEAM-6855:
--

Assignee: (was: Shehzaad Nakhoda)

> Side inputs are not supported when using the state API
> --
>
> Key: BEAM-6855
> URL: https://issues.apache.org/jira/browse/BEAM-6855
> Project: Beam
>  Issue Type: Bug
>  Components: sdk-java-core
>Reporter: Reuven Lax
>Priority: Major
>






[jira] [Work started] (BEAM-6855) Side inputs are not supported when using the state API

2019-07-17 Thread Shehzaad Nakhoda (JIRA)


 [ 
https://issues.apache.org/jira/browse/BEAM-6855?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Work on BEAM-6855 started by Shehzaad Nakhoda.
--
> Side inputs are not supported when using the state API
> --
>
> Key: BEAM-6855
> URL: https://issues.apache.org/jira/browse/BEAM-6855
> Project: Beam
>  Issue Type: Bug
>  Components: sdk-java-core
>Reporter: Reuven Lax
>Assignee: Shehzaad Nakhoda
>Priority: Major
>






[jira] [Assigned] (BEAM-6855) Side inputs are not supported when using the state API

2019-07-17 Thread Shehzaad Nakhoda (JIRA)


 [ 
https://issues.apache.org/jira/browse/BEAM-6855?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Shehzaad Nakhoda reassigned BEAM-6855:
--

Assignee: Shehzaad Nakhoda

> Side inputs are not supported when using the state API
> --
>
> Key: BEAM-6855
> URL: https://issues.apache.org/jira/browse/BEAM-6855
> Project: Beam
>  Issue Type: Bug
>  Components: sdk-java-core
>Reporter: Reuven Lax
>Assignee: Shehzaad Nakhoda
>Priority: Major
>






[jira] [Work logged] (BEAM-7284) Support Py3 Dataclasses

2019-07-17 Thread ASF GitHub Bot (JIRA)


 [ 
https://issues.apache.org/jira/browse/BEAM-7284?focusedWorklogId=278647=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-278647
 ]

ASF GitHub Bot logged work on BEAM-7284:


Author: ASF GitHub Bot
Created on: 18/Jul/19 01:44
Start Date: 18/Jul/19 01:44
Worklog Time Spent: 10m 
  Work Description: tvalentyn commented on issue #9050: [BEAM-7284] enabled 
to pickle python3 dataclasses
URL: https://github.com/apache/beam/pull/9050#issuecomment-512631458
 
 
   Thanks a lot, @lazylynx!
 



Issue Time Tracking
---

Worklog Id: (was: 278647)
Time Spent: 40m  (was: 0.5h)

> Support Py3 Dataclasses 
> 
>
> Key: BEAM-7284
> URL: https://issues.apache.org/jira/browse/BEAM-7284
> Project: Beam
>  Issue Type: Sub-task
>  Components: sdk-py-core
>Reporter: Valentyn Tymofieiev
>Priority: Major
> Fix For: 2.16.0
>
>  Time Spent: 40m
>  Remaining Estimate: 0h
>
> It looks like dill does not support Dataclasses yet, 
> https://github.com/uqfoundation/dill/issues/312, which very likely means that 
> Beam does not support them either.
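Until dataclasses serialize reliably, one possible workaround (an assumption, not something this issue prescribes) is to cross the serialization boundary with plain data: convert instances with the standard library's `dataclasses.asdict` and rebuild them on the other side. `Point`, `to_wire` and `from_wire` are hypothetical names; the sketch needs Python 3.7+ and only rebuilds flat dataclasses, since `asdict` also turns nested dataclasses into dicts.

```python
import dataclasses
import pickle


@dataclasses.dataclass
class Point:
    x: int
    y: int


def to_wire(obj):
    # Serialize only a plain dict of field values, which any pickler can handle.
    return pickle.dumps(dataclasses.asdict(obj))


def from_wire(data, cls):
    # Rebuild a flat dataclass from its field values.
    return cls(**pickle.loads(data))
```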





[jira] [Work logged] (BEAM-7284) Support Py3 Dataclasses

2019-07-17 Thread ASF GitHub Bot (JIRA)


 [ 
https://issues.apache.org/jira/browse/BEAM-7284?focusedWorklogId=278648=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-278648
 ]

ASF GitHub Bot logged work on BEAM-7284:


Author: ASF GitHub Bot
Created on: 18/Jul/19 01:44
Start Date: 18/Jul/19 01:44
Worklog Time Spent: 10m 
  Work Description: tvalentyn commented on issue #9050: [BEAM-7284] enabled 
to pickle python3 dataclasses
URL: https://github.com/apache/beam/pull/9050#issuecomment-512631569
 
 
   @robertwb Could you please help merge this? Thank you!
 



Issue Time Tracking
---

Worklog Id: (was: 278648)
Time Spent: 50m  (was: 40m)

> Support Py3 Dataclasses 
> 
>
> Key: BEAM-7284
> URL: https://issues.apache.org/jira/browse/BEAM-7284
> Project: Beam
>  Issue Type: Sub-task
>  Components: sdk-py-core
>Reporter: Valentyn Tymofieiev
>Priority: Major
> Fix For: 2.16.0
>
>  Time Spent: 50m
>  Remaining Estimate: 0h
>
> It looks like dill does not support Dataclasses yet, 
> https://github.com/uqfoundation/dill/issues/312, which very likely means that 
> Beam does not support them either.





[jira] [Commented] (BEAM-7246) Create a Spanner IO for Python

2019-07-17 Thread Ahmet Altay (JIRA)


[ 
https://issues.apache.org/jira/browse/BEAM-7246?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16887536#comment-16887536
 ] 

Ahmet Altay commented on BEAM-7246:
---

[~raheelkhan] just checking, are you still blocked on this?

> Create a Spanner IO for Python
> --
>
> Key: BEAM-7246
> URL: https://issues.apache.org/jira/browse/BEAM-7246
> Project: Beam
>  Issue Type: Bug
>  Components: io-python-gcp
>Reporter: Reuven Lax
>Assignee: Shehzaad Nakhoda
>Priority: Major
>
> Add I/O support for Google Cloud Spanner for the Python SDK (Batch Only).
> Testing in this work item will be in the form of DirectRunner tests and 
> manual testing.
> Integration and performance tests are a separate work item (not included 
> here).
> See https://beam.apache.org/documentation/io/built-in/. The goal is to add 
> Google Cloud Spanner to the Database column for the Python/Batch row.





[jira] [Commented] (BEAM-6675) The JdbcIO sink should accept schemas

2019-07-17 Thread Shehzaad Nakhoda (JIRA)


[ 
https://issues.apache.org/jira/browse/BEAM-6675?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16887535#comment-16887535
 ] 

Shehzaad Nakhoda commented on BEAM-6675:


[~reuvenlax] Can this be marked as resolved? Thanks.

> The JdbcIO sink should accept schemas
> -
>
> Key: BEAM-6675
> URL: https://issues.apache.org/jira/browse/BEAM-6675
> Project: Beam
>  Issue Type: Sub-task
>  Components: io-java-jdbc
>Reporter: Reuven Lax
>Assignee: Shehzaad Nakhoda
>Priority: Major
>  Time Spent: 7h 50m
>  Remaining Estimate: 0h
>
> If the input has a schema, there should be a default mapping to a 
> PreparedStatement for writing based on that schema.
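JdbcIO is a Java transform, so the following is only a language-neutral illustration in Python using the standard-library `sqlite3` module, not JdbcIO itself: given the ordered field names of a schema, both the INSERT statement and the positional parameters for each row can be derived from that single field order. `insert_statement` and `row_params` are hypothetical helpers.

```python
import sqlite3


def _quote(identifier):
    # Quote identifiers so field names are not interpreted as SQL keywords.
    return '"' + identifier.replace('"', '""') + '"'


def insert_statement(table, field_names):
    """Build a parameterized INSERT whose placeholders follow the schema field order."""
    columns = ', '.join(_quote(name) for name in field_names)
    placeholders = ', '.join('?' for _ in field_names)
    return f'INSERT INTO {_quote(table)} ({columns}) VALUES ({placeholders})'


def row_params(row, field_names):
    """Order a row's values (here a dict) to match the statement's placeholders."""
    return tuple(row[name] for name in field_names)
```

Because the statement and the parameters come from the same field list, a row whose values arrive in a different order is still bound to the right columns.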





[jira] [Work logged] (BEAM-7728) Support ParquetTable in SQL

2019-07-17 Thread ASF GitHub Bot (JIRA)


 [ 
https://issues.apache.org/jira/browse/BEAM-7728?focusedWorklogId=278645=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-278645
 ]

ASF GitHub Bot logged work on BEAM-7728:


Author: ASF GitHub Bot
Created on: 18/Jul/19 01:33
Start Date: 18/Jul/19 01:33
Worklog Time Spent: 10m 
  Work Description: vectorijk commented on pull request #9054: [BEAM-7728] 
[SQL] Support ParquetTable
URL: https://github.com/apache/beam/pull/9054#discussion_r304703525
 
 

 ##
 File path: 
sdks/java/extensions/sql/src/main/java/org/apache/beam/sdk/extensions/sql/meta/provider/parquet/ParquetTable.java
 ##
 @@ -0,0 +1,60 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ * http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+package org.apache.beam.sdk.extensions.sql.meta.provider.parquet;
+
+import java.io.Serializable;
+import org.apache.avro.generic.GenericRecord;
+import org.apache.beam.sdk.extensions.sql.impl.schema.BaseBeamTable;
+import org.apache.beam.sdk.io.parquet.ParquetIO;
+import org.apache.beam.sdk.schemas.Schema;
+import org.apache.beam.sdk.schemas.utils.AvroUtils;
+import org.apache.beam.sdk.transforms.PTransform;
+import org.apache.beam.sdk.values.PBegin;
+import org.apache.beam.sdk.values.PCollection;
+import org.apache.beam.sdk.values.PDone;
+import org.apache.beam.sdk.values.Row;
+
+/** {@link ParquetTable} is a {@link org.apache.beam.sdk.extensions.sql.BeamSqlTable}. */
+public class ParquetTable extends BaseBeamTable implements Serializable {
+  private final String filePattern;
+
+  public ParquetTable(Schema beamSchema, String filePattern) {
+    super(beamSchema);
+    this.filePattern = filePattern;
+  }
+
+  @Override
+  public PCollection<Row> buildIOReader(PBegin begin) {
+    PTransform<PCollection<GenericRecord>, PCollection<Row>> readConverter =
+        GenericRecordReadConverter.builder().beamSchema(schema).build();
+
+    return begin
+        .apply("ParquetIORead", ParquetIO.read(AvroUtils.toAvroSchema(schema)).from(filePattern))
+        .apply("GenericRecordToRow", readConverter);
+  }
+
+  @Override
+  public PDone buildIOWriter(PCollection<Row> input) {
+    throw new UnsupportedOperationException("Writing to a Parquet file is not supported");
 
 Review comment:
   okay, let me try this. 
 

This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 278645)
Time Spent: 1h 20m  (was: 1h 10m)

> Support ParquetTable in SQL
> ---
>
> Key: BEAM-7728
> URL: https://issues.apache.org/jira/browse/BEAM-7728
> Project: Beam
>  Issue Type: New Feature
>  Components: dsl-sql
>Reporter: Kai Jiang
>Assignee: Kai Jiang
>Priority: Major
>  Time Spent: 1h 20m
>  Remaining Estimate: 0h
>






[jira] [Work logged] (BEAM-7728) Support ParquetTable in SQL

2019-07-17 Thread ASF GitHub Bot (JIRA)


 [ 
https://issues.apache.org/jira/browse/BEAM-7728?focusedWorklogId=278644=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-278644
 ]

ASF GitHub Bot logged work on BEAM-7728:


Author: ASF GitHub Bot
Created on: 18/Jul/19 01:32
Start Date: 18/Jul/19 01:32
Worklog Time Spent: 10m 
  Work Description: vectorijk commented on pull request #9054: [BEAM-7728] 
[SQL] Support ParquetTable
URL: https://github.com/apache/beam/pull/9054#discussion_r304703383
 
 

 ##
 File path: 
sdks/java/extensions/sql/src/test/java/org/apache/beam/sdk/extensions/sql/meta/provider/parquet/GenericRecordToRowTest.java
 ##
 @@ -0,0 +1,79 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ * http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+package org.apache.beam.sdk.extensions.sql.meta.provider.parquet;
+
+import java.io.Serializable;
+import org.apache.avro.Schema;
+import org.apache.avro.generic.GenericData;
+import org.apache.avro.generic.GenericRecord;
+import org.apache.beam.sdk.coders.AvroCoder;
+import org.apache.beam.sdk.testing.PAssert;
+import org.apache.beam.sdk.testing.TestPipeline;
+import org.apache.beam.sdk.transforms.Create;
+import org.apache.beam.sdk.values.PCollection;
+import org.apache.beam.sdk.values.Row;
+import org.junit.Rule;
+import org.junit.Test;
+
+/** Unit tests for {@link GenericRecordReadConverter}. */
+public class GenericRecordToRowTest implements Serializable {
+  @Rule public transient TestPipeline pipeline = TestPipeline.create();
+
+  org.apache.beam.sdk.schemas.Schema payloadSchema =
 
 Review comment:
   i see
 



Issue Time Tracking
---

Worklog Id: (was: 278644)
Time Spent: 1h 10m  (was: 1h)

> Support ParquetTable in SQL
> ---
>
> Key: BEAM-7728
> URL: https://issues.apache.org/jira/browse/BEAM-7728
> Project: Beam
>  Issue Type: New Feature
>  Components: dsl-sql
>Reporter: Kai Jiang
>Assignee: Kai Jiang
>Priority: Major
>  Time Spent: 1h 10m
>  Remaining Estimate: 0h
>






[jira] [Work logged] (BEAM-7257) Add withProducerConfigUpdates to KafkaIO

2019-07-17 Thread ASF GitHub Bot (JIRA)


 [ 
https://issues.apache.org/jira/browse/BEAM-7257?focusedWorklogId=278643=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-278643
 ]

ASF GitHub Bot logged work on BEAM-7257:


Author: ASF GitHub Bot
Created on: 18/Jul/19 01:27
Start Date: 18/Jul/19 01:27
Worklog Time Spent: 10m 
  Work Description: tvalentyn commented on issue #9093: [WIP] [BEAM-7257] 
[BEAM-7714] Split Python 3 postcommits into several Jenkins jobs.
URL: https://github.com/apache/beam/pull/9093#issuecomment-512628523
 
 
   Run Seed Job
 



Issue Time Tracking
---

Worklog Id: (was: 278643)
Time Spent: 2h 20m  (was: 2h 10m)

> Add withProducerConfigUpdates to KafkaIO
> 
>
> Key: BEAM-7257
> URL: https://issues.apache.org/jira/browse/BEAM-7257
> Project: Beam
>  Issue Type: Improvement
>  Components: io-java-kafka
>Reporter: Heejong Lee
>Assignee: Heejong Lee
>Priority: Major
> Fix For: 2.13.0
>
>  Time Spent: 2h 20m
>  Remaining Estimate: 0h
>
> adding withProducerConfigUpdates and deprecating updateProducerProperties
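As the name suggests, the intended semantics are to update individual producer settings rather than replace the whole producer configuration. A minimal plain-Java sketch of that merge rule follows. `ProducerConfigUpdates.applyUpdates` is a hypothetical helper, not KafkaIO code, and the real `KafkaIO.Write` may apply extra validation (for example, on keys the transform manages itself).

```java
import java.util.Collections;
import java.util.HashMap;
import java.util.Map;

/** Hypothetical illustration of "update, don't replace" config semantics; not KafkaIO code. */
final class ProducerConfigUpdates {
  private ProducerConfigUpdates() {}

  /**
   * Returns a new map in which every key present in {@code updates} overrides the value in
   * {@code current}; keys absent from {@code updates} keep their existing values.
   * Neither input map is modified.
   */
  static Map<String, Object> applyUpdates(
      Map<String, Object> current, Map<String, Object> updates) {
    Map<String, Object> merged = new HashMap<>(current);
    merged.putAll(updates);
    return Collections.unmodifiableMap(merged);
  }
}
```

For example, applying the updates `{acks=all}` to `{acks=1, retries=3}` yields `{acks=all, retries=3}`, and the original map is left unchanged.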





[jira] [Work logged] (BEAM-7257) Add withProducerConfigUpdates to KafkaIO

2019-07-17 Thread ASF GitHub Bot (JIRA)


 [ 
https://issues.apache.org/jira/browse/BEAM-7257?focusedWorklogId=278640=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-278640
 ]

ASF GitHub Bot logged work on BEAM-7257:


Author: ASF GitHub Bot
Created on: 18/Jul/19 01:22
Start Date: 18/Jul/19 01:22
Worklog Time Spent: 10m 
  Work Description: tvalentyn commented on issue #9093: [WIP] [BEAM-7257] 
[BEAM-7714] Split Python 3 postcommits into several Jenkins jobs.
URL: https://github.com/apache/beam/pull/9093#issuecomment-512627664
 
 
   Run Seed Job
 



Issue Time Tracking
---

Worklog Id: (was: 278640)
Time Spent: 2h 10m  (was: 2h)

> Add withProducerConfigUpdates to KafkaIO
> 
>
> Key: BEAM-7257
> URL: https://issues.apache.org/jira/browse/BEAM-7257
> Project: Beam
>  Issue Type: Improvement
>  Components: io-java-kafka
>Reporter: Heejong Lee
>Assignee: Heejong Lee
>Priority: Major
> Fix For: 2.13.0
>
>  Time Spent: 2h 10m
>  Remaining Estimate: 0h
>
> adding withProducerConfigUpdates and deprecating updateProducerProperties





[jira] [Work logged] (BEAM-7257) Add withProducerConfigUpdates to KafkaIO

2019-07-17 Thread ASF GitHub Bot (JIRA)


 [ 
https://issues.apache.org/jira/browse/BEAM-7257?focusedWorklogId=278632=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-278632
 ]

ASF GitHub Bot logged work on BEAM-7257:


Author: ASF GitHub Bot
Created on: 18/Jul/19 00:51
Start Date: 18/Jul/19 00:51
Worklog Time Spent: 10m 
  Work Description: tvalentyn commented on issue #9093: [WIP] [BEAM-7257] 
[BEAM-7714] Split Python 3 postcommits into several Jenkins jobs.
URL: https://github.com/apache/beam/pull/9093#issuecomment-512621830
 
 
   Run Seed Job
 



Issue Time Tracking
---

Worklog Id: (was: 278632)
Time Spent: 2h  (was: 1h 50m)

> Add withProducerConfigUpdates to KafkaIO
> 
>
> Key: BEAM-7257
> URL: https://issues.apache.org/jira/browse/BEAM-7257
> Project: Beam
>  Issue Type: Improvement
>  Components: io-java-kafka
>Reporter: Heejong Lee
>Assignee: Heejong Lee
>Priority: Major
> Fix For: 2.13.0
>
>  Time Spent: 2h
>  Remaining Estimate: 0h
>
> adding withProducerConfigUpdates and deprecating updateProducerProperties





[jira] [Work logged] (BEAM-7545) Row Count Estimation for CSV TextTable

2019-07-17 Thread ASF GitHub Bot (JIRA)


 [ 
https://issues.apache.org/jira/browse/BEAM-7545?focusedWorklogId=278631=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-278631
 ]

ASF GitHub Bot logged work on BEAM-7545:


Author: ASF GitHub Bot
Created on: 18/Jul/19 00:48
Start Date: 18/Jul/19 00:48
Worklog Time Spent: 10m 
  Work Description: akedin commented on pull request #9040: [BEAM-7545] 
Reordering Beam Joins
URL: https://github.com/apache/beam/pull/9040
 
 
   
 



Issue Time Tracking
---

Worklog Id: (was: 278631)
Time Spent: 10h 10m  (was: 10h)

> Row Count Estimation for CSV TextTable
> --
>
> Key: BEAM-7545
> URL: https://issues.apache.org/jira/browse/BEAM-7545
> Project: Beam
>  Issue Type: New Feature
>  Components: dsl-sql
>Reporter: Alireza Samadianzakaria
>Assignee: Alireza Samadianzakaria
>Priority: Major
> Fix For: Not applicable
>
>  Time Spent: 10h 10m
>  Remaining Estimate: 0h
>
> Implementing Row Count Estimation for CSV Tables by reading the first few 
> lines of the file and estimating the number of records based on the length of 
> these lines and the total length of the file.
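The estimate described above reduces to the total file size divided by the average line length of the sample. The following plain-Java sketch rests on several stated assumptions: one record per line, UTF-8 text, `\n` line endings (counted as one byte per line), and no header row. `CsvRowCountEstimate` and its methods are hypothetical names, not the Beam SQL implementation.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;

/** Hypothetical sketch of sampling-based row count estimation for a CSV file. */
final class CsvRowCountEstimate {
  private CsvRowCountEstimate() {}

  /** Estimates rows as totalBytes / (average bytes per sampled line). */
  static long estimateFromSample(long sampleBytes, int sampleLines, long totalBytes) {
    if (sampleLines == 0 || sampleBytes == 0) {
      return 0; // Empty file or nothing sampled: no basis for an estimate.
    }
    double avgBytesPerLine = (double) sampleBytes / sampleLines;
    return Math.round(totalBytes / avgBytesPerLine);
  }

  /** Reads at most maxSampleLines lines from the start of the file and extrapolates. */
  static long estimateRowCount(Path csv, int maxSampleLines) throws IOException {
    long totalBytes = Files.size(csv);
    long sampleBytes = 0;
    int sampled = 0;
    try (BufferedReader reader = Files.newBufferedReader(csv, StandardCharsets.UTF_8)) {
      String line;
      while (sampled < maxSampleLines && (line = reader.readLine()) != null) {
        // +1 accounts for the '\n' terminator that readLine() strips (assumes no "\r\n").
        sampleBytes += line.getBytes(StandardCharsets.UTF_8).length + 1;
        sampled++;
      }
    }
    return estimateFromSample(sampleBytes, sampled, totalBytes);
  }
}
```

For example, if the first 10 lines total 60 bytes and the file is 6,000 bytes, the estimate is 1,000 rows. The estimate is only as good as the sample: files whose early lines are unusually short or long will skew it.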





[jira] [Updated] (BEAM-7766) Dataflow runner should default to PiplelineState.UNKNOWN when job state received via v1beta3 cannot be recognized.

2019-07-17 Thread Pablo Estrada (JIRA)


 [ 
https://issues.apache.org/jira/browse/BEAM-7766?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Pablo Estrada updated BEAM-7766:

Status: Open  (was: Triage Needed)

> Dataflow runner should default to PiplelineState.UNKNOWN when job state 
> received via v1beta3 cannot be recognized.
> --
>
> Key: BEAM-7766
> URL: https://issues.apache.org/jira/browse/BEAM-7766
> Project: Beam
>  Issue Type: Bug
>  Components: runner-dataflow
>Affects Versions: 2.1.0
>Reporter: Valentyn Tymofieiev
>Assignee: Valentyn Tymofieiev
>Priority: Minor
> Fix For: 2.7.1, 2.15.0
>
>  Time Spent: 0.5h
>  Remaining Estimate: 0h
>






[jira] [Work logged] (BEAM-7766) Dataflow runner should default to PiplelineState.UNKNOWN when job state received via v1beta3 cannot be recognized.

2019-07-17 Thread ASF GitHub Bot (JIRA)


 [ 
https://issues.apache.org/jira/browse/BEAM-7766?focusedWorklogId=278629=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-278629
 ]

ASF GitHub Bot logged work on BEAM-7766:


Author: ASF GitHub Bot
Created on: 18/Jul/19 00:34
Start Date: 18/Jul/19 00:34
Worklog Time Spent: 10m 
  Work Description: tvalentyn commented on issue #9094: [BEAM-7766] Default 
to PiplelineState.UNKNOWN when job state received via v1beta3 cannot be 
recognized.
URL: https://github.com/apache/beam/pull/9094#issuecomment-512618753
 
 
   @kennknowles I'd like to fix this on 2.7.1 and can prepare a cherry-pick 
once this is merged. I set 2.7.1 as fix version on BEAM-7766.
 



Issue Time Tracking
---

Worklog Id: (was: 278629)
Time Spent: 0.5h  (was: 20m)

> Dataflow runner should default to PiplelineState.UNKNOWN when job state 
> received via v1beta3 cannot be recognized.
> --
>
> Key: BEAM-7766
> URL: https://issues.apache.org/jira/browse/BEAM-7766
> Project: Beam
>  Issue Type: Bug
>  Components: runner-dataflow
>Affects Versions: 2.1.0
>Reporter: Valentyn Tymofieiev
>Priority: Minor
> Fix For: 2.7.1, 2.15.0
>
>  Time Spent: 0.5h
>  Remaining Estimate: 0h
>






[jira] [Assigned] (BEAM-7766) Dataflow runner should default to PiplelineState.UNKNOWN when job state received via v1beta3 cannot be recognized.

2019-07-17 Thread Valentyn Tymofieiev (JIRA)


 [ 
https://issues.apache.org/jira/browse/BEAM-7766?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Valentyn Tymofieiev reassigned BEAM-7766:
-

Assignee: Valentyn Tymofieiev

> Dataflow runner should default to PiplelineState.UNKNOWN when job state 
> received via v1beta3 cannot be recognized.
> --
>
> Key: BEAM-7766
> URL: https://issues.apache.org/jira/browse/BEAM-7766
> Project: Beam
>  Issue Type: Bug
>  Components: runner-dataflow
>Affects Versions: 2.1.0
>Reporter: Valentyn Tymofieiev
>Assignee: Valentyn Tymofieiev
>Priority: Minor
> Fix For: 2.7.1, 2.15.0
>
>  Time Spent: 0.5h
>  Remaining Estimate: 0h
>






[jira] [Work logged] (BEAM-7766) Dataflow runner should default to PiplelineState.UNKNOWN when job state received via v1beta3 cannot be recognized.

2019-07-17 Thread ASF GitHub Bot (JIRA)


 [ 
https://issues.apache.org/jira/browse/BEAM-7766?focusedWorklogId=278628=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-278628
 ]

ASF GitHub Bot logged work on BEAM-7766:


Author: ASF GitHub Bot
Created on: 18/Jul/19 00:31
Start Date: 18/Jul/19 00:31
Worklog Time Spent: 10m 
  Work Description: tvalentyn commented on issue #9094: [BEAM-7766] Default 
to PiplelineState.UNKNOWN when job state received via v1beta3 cannot be 
recognized.
URL: https://github.com/apache/beam/pull/9094#issuecomment-512618315
 
 
   R: @aaltay 
 



Issue Time Tracking
---

Worklog Id: (was: 278628)
Time Spent: 20m  (was: 10m)

> Dataflow runner should default to PiplelineState.UNKNOWN when job state 
> received via v1beta3 cannot be recognized.
> --
>
> Key: BEAM-7766
> URL: https://issues.apache.org/jira/browse/BEAM-7766
> Project: Beam
>  Issue Type: Bug
>  Components: runner-dataflow
>Affects Versions: 2.1.0
>Reporter: Valentyn Tymofieiev
>Priority: Minor
> Fix For: 2.7.1, 2.15.0
>
>  Time Spent: 20m
>  Remaining Estimate: 0h
>






[jira] [Work logged] (BEAM-7766) Dataflow runner should default to PiplelineState.UNKNOWN when job state received via v1beta3 cannot be recognized.

2019-07-17 Thread ASF GitHub Bot (JIRA)


 [ 
https://issues.apache.org/jira/browse/BEAM-7766?focusedWorklogId=278627=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-278627
 ]

ASF GitHub Bot logged work on BEAM-7766:


Author: ASF GitHub Bot
Created on: 18/Jul/19 00:30
Start Date: 18/Jul/19 00:30
Worklog Time Spent: 10m 
  Work Description: tvalentyn commented on pull request #9094: [BEAM-7766] 
Default to PiplelineState.UNKNOWN when job state received via v1beta3 cannot be 
recognized.
URL: https://github.com/apache/beam/pull/9094
 
 
   

[jira] [Created] (BEAM-7766) Dataflow runner should default to PiplelineState.UNKNOWN when job state received via v1beta3 cannot be recognized.

2019-07-17 Thread Valentyn Tymofieiev (JIRA)
Valentyn Tymofieiev created BEAM-7766:
-

 Summary: Dataflow runner should default to PiplelineState.UNKNOWN 
when job state received via v1beta3 cannot be recognized.
 Key: BEAM-7766
 URL: https://issues.apache.org/jira/browse/BEAM-7766
 Project: Beam
  Issue Type: Bug
  Components: runner-dataflow
Affects Versions: 2.1.0
Reporter: Valentyn Tymofieiev
 Fix For: 2.7.1, 2.15.0
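The requested behavior amounts to a lookup with a fallback. A state string the client does not recognize (for example, one added to the v1beta3 API after the SDK was released) maps to `UNKNOWN` instead of raising an error. Below is a minimal Java sketch of that pattern. The enum is a hypothetical stand-in rather than Beam's `PipelineState`, and only a few `JOB_STATE_*` values are mapped, for brevity.

```java
import java.util.HashMap;
import java.util.Map;

/** Hypothetical sketch: map Dataflow API job-state strings to a pipeline state, with a safe default. */
final class JobStateMapping {
  private JobStateMapping() {}

  /** Stand-in enum for illustration; not Beam's actual PipelineState type. */
  enum PipelineState { UNKNOWN, RUNNING, DONE, FAILED, CANCELLED }

  private static final Map<String, PipelineState> KNOWN_STATES = new HashMap<>();

  static {
    KNOWN_STATES.put("JOB_STATE_RUNNING", PipelineState.RUNNING);
    KNOWN_STATES.put("JOB_STATE_DONE", PipelineState.DONE);
    KNOWN_STATES.put("JOB_STATE_FAILED", PipelineState.FAILED);
    KNOWN_STATES.put("JOB_STATE_CANCELLED", PipelineState.CANCELLED);
  }

  /** Returns the mapped state, or UNKNOWN for null or unrecognized API values. */
  static PipelineState fromApiState(String apiState) {
    return KNOWN_STATES.getOrDefault(apiState, PipelineState.UNKNOWN);
  }
}
```

The design choice is to degrade gracefully: a newer service can report states an older client has never seen, and `UNKNOWN` keeps job monitoring working instead of crashing on an unexpected value.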








[jira] [Work logged] (BEAM-7545) Row Count Estimation for CSV TextTable

2019-07-17 Thread ASF GitHub Bot (JIRA)


 [ 
https://issues.apache.org/jira/browse/BEAM-7545?focusedWorklogId=278625=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-278625
 ]

ASF GitHub Bot logged work on BEAM-7545:


Author: ASF GitHub Bot
Created on: 18/Jul/19 00:26
Start Date: 18/Jul/19 00:26
Worklog Time Spent: 10m 
  Work Description: amaliujia commented on pull request #9040: [BEAM-7545] 
Reordering Beam Joins
URL: https://github.com/apache/beam/pull/9040#discussion_r304693458
 
 

 ##
 File path: 
sdks/java/extensions/sql/src/test/java/org/apache/beam/sdk/extensions/sql/impl/rule/JoinReorderingTest.java
 ##
 @@ -0,0 +1,156 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ * http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+package org.apache.beam.sdk.extensions.sql.impl.rule;
+
+import java.math.BigInteger;
+import org.apache.beam.sdk.extensions.sql.impl.BeamSqlEnv;
+import org.apache.beam.sdk.extensions.sql.impl.rel.BeamIOSourceRel;
+import org.apache.beam.sdk.extensions.sql.impl.rel.BeamRelNode;
+import org.apache.beam.sdk.extensions.sql.meta.provider.test.TestTableProvider;
+import org.apache.beam.sdk.values.Row;
+import org.apache.calcite.rel.RelNode;
+import org.apache.calcite.rel.core.Join;
+import org.junit.Assert;
+import org.junit.Test;
+
+/**
+ * This test ensures that we are reordering joins and get a plan similar to Join(large,Join(small,
 
 Review comment:
   Agree with Anton that changes and designs are documented by tests. As we are 
at an early stage of having optimizations, what we have now is definitely not 
perfect and we can refine them as time goes. 
 



Issue Time Tracking
---

Worklog Id: (was: 278625)
Time Spent: 9h 50m  (was: 9h 40m)

> Row Count Estimation for CSV TextTable
> --
>
> Key: BEAM-7545
> URL: https://issues.apache.org/jira/browse/BEAM-7545
> Project: Beam
>  Issue Type: New Feature
>  Components: dsl-sql
>Reporter: Alireza Samadianzakaria
>Assignee: Alireza Samadianzakaria
>Priority: Major
> Fix For: Not applicable
>
>  Time Spent: 9h 50m
>  Remaining Estimate: 0h
>
> Implementing Row Count Estimation for CSV Tables by reading the first few 
> lines of the file and estimating the number of records based on the length of 
> these lines and the total length of the file.





[jira] [Work logged] (BEAM-7545) Row Count Estimation for CSV TextTable

2019-07-17 Thread ASF GitHub Bot (JIRA)


 [ 
https://issues.apache.org/jira/browse/BEAM-7545?focusedWorklogId=278626=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-278626
 ]

ASF GitHub Bot logged work on BEAM-7545:


Author: ASF GitHub Bot
Created on: 18/Jul/19 00:26
Start Date: 18/Jul/19 00:26
Worklog Time Spent: 10m 
  Work Description: amaliujia commented on issue #9040: [BEAM-7545] 
Reordering Beam Joins
URL: https://github.com/apache/beam/pull/9040#issuecomment-512617407
 
 
   LGTM
 



Issue Time Tracking
---

Worklog Id: (was: 278626)
Time Spent: 10h  (was: 9h 50m)

> Row Count Estimation for CSV TextTable
> --
>
> Key: BEAM-7545
> URL: https://issues.apache.org/jira/browse/BEAM-7545
> Project: Beam
>  Issue Type: New Feature
>  Components: dsl-sql
>Reporter: Alireza Samadianzakaria
>Assignee: Alireza Samadianzakaria
>Priority: Major
> Fix For: Not applicable
>
>  Time Spent: 10h
>  Remaining Estimate: 0h
>
> Implementing Row Count Estimation for CSV Tables by reading the first few 
> lines of the file and estimating the number of records based on the length of 
> these lines and the total length of the file.





[jira] [Work logged] (BEAM-6877) TypeHints Py3 Error: Type inference tests fail on Python 3.6 due to bytecode changes

2019-07-17 Thread ASF GitHub Bot (JIRA)


 [ 
https://issues.apache.org/jira/browse/BEAM-6877?focusedWorklogId=278622=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-278622
 ]

ASF GitHub Bot logged work on BEAM-6877:


Author: ASF GitHub Bot
Created on: 18/Jul/19 00:15
Start Date: 18/Jul/19 00:15
Worklog Time Spent: 10m 
  Work Description: udim commented on issue #8893: [BEAM-6877] 
trivial_inference: make remaining tests pass
URL: https://github.com/apache/beam/pull/8893#issuecomment-512615395
 
 
   run python postcommit
 



Issue Time Tracking
---

Worklog Id: (was: 278622)
Time Spent: 8h 50m  (was: 8h 40m)

> TypeHints Py3 Error: Type inference tests fail on Python 3.6 due to bytecode 
> changes
> 
>
> Key: BEAM-6877
> URL: https://issues.apache.org/jira/browse/BEAM-6877
> Project: Beam
>  Issue Type: Sub-task
>  Components: sdk-py-core
>Reporter: Robbe
>Assignee: Udi Meiri
>Priority: Major
>  Time Spent: 8h 50m
>  Remaining Estimate: 0h
>
> Type inference doesn't work on Python 3.6 due to [bytecode to wordcode 
> changes|https://docs.python.org/3/whatsnew/3.6.html#cpython-bytecode-changes].
> Type inference always returns Any on Python 3.6, so this is not critical.
> Affected tests are:
>  *transforms.ptransform_test*:
>  - test_combine_properly_pipeline_type_checks_using_decorator
>  - test_mean_globally_pipeline_checking_satisfied
>  - test_mean_globally_runtime_checking_satisfied
>  - test_count_globally_pipeline_type_checking_satisfied
>  - test_count_globally_runtime_type_checking_satisfied
>  - test_pardo_type_inference
>  - test_pipeline_inference
>  - test_inferred_bad_kv_type
> *typehints.trivial_inference_test*:
>  - all tests in TrivialInferenceTest
> *io.gcp.pubsub_test.TestReadFromPubSubOverride*:
> * test_expand_with_other_options
> * test_expand_with_subscription
> * test_expand_with_topic
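Some background on the linked change. From CPython 3.6 on, every instruction occupies exactly two bytes: an opcode byte and an argument byte. Larger arguments are built from `EXTENDED_ARG` prefixes. Earlier Python 3 versions used one byte for argument-less opcodes and three bytes otherwise. Code that walks raw bytecode offsets, as the type inference described here does, has to account for the new layout. The sketch below is a Java transliteration of the 3.6-style decoding loop (modelled on the standard library's `dis` module). The opcode constants are CPython 3.6 values, and `WordcodeDecoder` is a hypothetical name, not Beam code.

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical Java transliteration of CPython 3.6 wordcode decoding. */
final class WordcodeDecoder {
  private WordcodeDecoder() {}

  static final int HAVE_ARGUMENT = 90; // opcodes >= this take an argument (CPython 3.6)
  static final int EXTENDED_ARG = 144; // prefix supplying higher-order argument bits (CPython 3.6)
  static final int NO_ARG = -1; // marker for opcodes without an argument

  static final class Instruction {
    final int offset;
    final int opcode;
    final int arg;

    Instruction(int offset, int opcode, int arg) {
      this.offset = offset;
      this.opcode = opcode;
      this.arg = arg;
    }
  }

  /** Decodes fixed 2-byte instructions; EXTENDED_ARG shifts its argument into the next one. */
  static List<Instruction> decode(byte[] code) {
    List<Instruction> instructions = new ArrayList<>();
    int extendedArg = 0;
    for (int i = 0; i + 1 < code.length; i += 2) {
      int opcode = code[i] & 0xFF; // Java bytes are signed; mask to 0..255
      int arg = NO_ARG;
      if (opcode >= HAVE_ARGUMENT) {
        arg = (code[i + 1] & 0xFF) | extendedArg;
        extendedArg = (opcode == EXTENDED_ARG) ? arg << 8 : 0;
      }
      instructions.add(new Instruction(i, opcode, arg));
    }
    return instructions;
  }
}
```

For example, the bytes `100, 1, 83, 0` decode to `LOAD_CONST 1` at offset 0 followed by `RETURN_VALUE` at offset 2 (CPython 3.6 opcode numbers). A prefix `144, 1` before `100, 2` turns the argument into `(1 << 8) | 2 = 258`.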





[jira] [Work logged] (BEAM-6877) TypeHints Py3 Error: Type inference tests fail on Python 3.6 due to bytecode changes

2019-07-17 Thread ASF GitHub Bot (JIRA)


 [ 
https://issues.apache.org/jira/browse/BEAM-6877?focusedWorklogId=278621=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-278621
 ]

ASF GitHub Bot logged work on BEAM-6877:


Author: ASF GitHub Bot
Created on: 18/Jul/19 00:15
Start Date: 18/Jul/19 00:15
Worklog Time Spent: 10m 
  Work Description: udim commented on issue #8893: [BEAM-6877] 
trivial_inference: make remaining tests pass
URL: https://github.com/apache/beam/pull/8893#issuecomment-512615347
 
 
   run python precommit
 



Issue Time Tracking
---

Worklog Id: (was: 278621)
Time Spent: 8h 40m  (was: 8.5h)

> TypeHints Py3 Error: Type inference tests fail on Python 3.6 due to bytecode 
> changes
> 
>
> Key: BEAM-6877
> URL: https://issues.apache.org/jira/browse/BEAM-6877
> Project: Beam
>  Issue Type: Sub-task
>  Components: sdk-py-core
>Reporter: Robbe
>Assignee: Udi Meiri
>Priority: Major
>  Time Spent: 8h 40m
>  Remaining Estimate: 0h
>
> Type inference doesn't work on Python 3.6 due to [bytecode to wordcode 
> changes|https://docs.python.org/3/whatsnew/3.6.html#cpython-bytecode-changes].
> Type inference always returns Any on Python 3.6, so this is not critical.
> Affected tests are:
>  *transforms.ptransform_test*:
>  - test_combine_properly_pipeline_type_checks_using_decorator
>  - test_mean_globally_pipeline_checking_satisfied
>  - test_mean_globally_runtime_checking_satisfied
>  - test_count_globally_pipeline_type_checking_satisfied
>  - test_count_globally_runtime_type_checking_satisfied
>  - test_pardo_type_inference
>  - test_pipeline_inference
>  - test_inferred_bad_kv_type
> *typehints.trivial_inference_test*:
>  - all tests in TrivialInferenceTest
> *io.gcp.pubsub_test.TestReadFromPubSubOverride*:
> * test_expand_with_other_options
> * test_expand_with_subscription
> * test_expand_with_topic





[jira] [Work logged] (BEAM-6877) TypeHints Py3 Error: Type inference tests fail on Python 3.6 due to bytecode changes

2019-07-17 Thread ASF GitHub Bot (JIRA)


 [ 
https://issues.apache.org/jira/browse/BEAM-6877?focusedWorklogId=278620=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-278620
 ]

ASF GitHub Bot logged work on BEAM-6877:


Author: ASF GitHub Bot
Created on: 18/Jul/19 00:13
Start Date: 18/Jul/19 00:13
Worklog Time Spent: 10m 
  Work Description: udim commented on issue #8893: [BEAM-6877] 
trivial_inference: make remaining tests pass
URL: https://github.com/apache/beam/pull/8893#issuecomment-512615014
 
 
   R: @robertwb (in case you haven't seen all the emails)
 



Issue Time Tracking
---

Worklog Id: (was: 278620)
Time Spent: 8.5h  (was: 8h 20m)

> TypeHints Py3 Error: Type inference tests fail on Python 3.6 due to bytecode 
> changes
> 
>
> Key: BEAM-6877
> URL: https://issues.apache.org/jira/browse/BEAM-6877
> Project: Beam
>  Issue Type: Sub-task
>  Components: sdk-py-core
>Reporter: Robbe
>Assignee: Udi Meiri
>Priority: Major
>  Time Spent: 8.5h
>  Remaining Estimate: 0h
>
> Type inference doesn't work on Python 3.6 due to [bytecode to wordcode 
> changes|https://docs.python.org/3/whatsnew/3.6.html#cpython-bytecode-changes].
> Type inference always returns Any on Python 3.6, so this is not critical.
> Affected tests are:
>  *transforms.ptransform_test*:
>  - test_combine_properly_pipeline_type_checks_using_decorator
>  - test_mean_globally_pipeline_checking_satisfied
>  - test_mean_globally_runtime_checking_satisfied
>  - test_count_globally_pipeline_type_checking_satisfied
>  - test_count_globally_runtime_type_checking_satisfied
>  - test_pardo_type_inference
>  - test_pipeline_inference
>  - test_inferred_bad_kv_type
> *typehints.trivial_inference_test*:
>  - all tests in TrivialInferenceTest
> *io.gcp.pubsub_test.TestReadFromPubSubOverride*:
> * test_expand_with_other_options
> * test_expand_with_subscription
> * test_expand_with_topic





[jira] [Work logged] (BEAM-7726) [Go SDK] State Backed Iterables

2019-07-17 Thread ASF GitHub Bot (JIRA)


 [ 
https://issues.apache.org/jira/browse/BEAM-7726?focusedWorklogId=278618=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-278618
 ]

ASF GitHub Bot logged work on BEAM-7726:


Author: ASF GitHub Bot
Created on: 18/Jul/19 00:13
Start Date: 18/Jul/19 00:13
Worklog Time Spent: 10m 
  Work Description: youngoli commented on pull request #9080:  [BEAM-7726] 
Implement State Backed Iterables in Go SDK
URL: https://github.com/apache/beam/pull/9080#discussion_r304686804
 
 

 ##
 File path: sdks/go/pkg/beam/core/runtime/exec/datasource.go
 ##
 @@ -262,3 +282,60 @@ func (n *DataSource) Split(splits []int64, frac float32) (int64, error) {
// return an error.
return 0, fmt.Errorf("failed to split at requested splits: {%v}, DataSource at index: %v", splits, c)
 }
+
+type concatReStream struct {
+   first, next ReStream
+}
+
+func (c *concatReStream) Open() (Stream, error) {
+   firstStream, err := c.first.Open()
+   if err != nil {
+   return nil, err
+   }
+   return &concatStream{first: firstStream, nextStream: c.next}, nil
+}
+
+type concatStream struct {
+   first  Stream
+   nextStream ReStream
+}
+
+// Close nils the stream.
+func (s *concatStream) Close() error {
+   if s.first == nil {
+   return nil
+   }
+   defer func() {
+   s.first = nil
+   s.nextStream = nil
+   }()
+   return s.first.Close()
+}
+
+func (s *concatStream) Read() (*FullValue, error) {
+   if s.first == nil { // When the stream is closed.
+   return nil, io.EOF
+   }
+   fv, err := s.first.Read()
+   if err == nil {
+   return fv, nil
+   }
+   if err == io.EOF {
+   if err := s.first.Close(); err != nil {
+   s.nextStream = nil
+   return nil, err
+   }
+   if s.nextStream == nil {
+   s.first = nil
+   return nil, io.EOF
+   }
+   s.first, err = s.nextStream.Open()
 
 Review comment:
   Just checking my understanding here: nextStream here is opening an 
elementStream reading from the state-backed iterable (ScopedStateReader in 
statemgr.go), so that new stream will automatically get continuations from the 
state channel, right? It took me a while to trace how this was working, so I 
want to confirm that I understood it correctly.
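The `concatReStream`/`concatStream` pair under review opens the first stream in `Open`, reads it to EOF, and only then opens the next re-stream. Below is a minimal Python sketch of that control flow. Generators stand in for Go's `Stream`, and a zero-argument callable stands in for `ReStream.Open`. All names are hypothetical, and the sketch does not model the Fn API state channel or continuation tokens.

```python
from typing import Callable, Iterator, List, Optional

# A "re-stream" is modeled as a callable that opens a fresh iterator per call,
# loosely mirroring Go's ReStream.Open(). Elements are ints for simplicity.
ReStream = Callable[[], Iterator[int]]


def fixed_restream(values: List[int]) -> ReStream:
    """Re-stream over an in-memory buffer, like a fully read first chunk."""
    return lambda: iter(list(values))


def concat_restream(first: ReStream, nxt: Optional[ReStream]) -> ReStream:
    """Concatenate two re-streams, opening `nxt` only once `first` runs dry."""

    def open_stream() -> Iterator[int]:
        first_stream = first()  # Opened immediately, like concatReStream.Open.

        def read() -> Iterator[int]:
            yield from first_stream
            # On exhaustion of the first stream, open the continuation lazily,
            # analogous to concatStream.Read calling nextStream.Open() on EOF.
            if nxt is not None:
                yield from nxt()

        return read()

    return open_stream
```

Because `nxt` is a re-stream rather than an already opened stream, each new iteration reopens the continuation. That is what keeps the concatenation re-iterable.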
 



Issue Time Tracking
---

Worklog Id: (was: 278618)
Time Spent: 1h 20m  (was: 1h 10m)

> [Go SDK] State Backed Iterables
> ---
>
> Key: BEAM-7726
> URL: https://issues.apache.org/jira/browse/BEAM-7726
> Project: Beam
>  Issue Type: Improvement
>  Components: sdk-go
>Affects Versions: Not applicable
>Reporter: Robert Burke
>Assignee: Robert Burke
>Priority: Major
> Fix For: Not applicable
>
>  Time Spent: 1h 20m
>  Remaining Estimate: 0h
>
> The Go SDK should support the State backed iterables protocol per the proto.
> [https://github.com/apache/beam/blob/master/model/pipeline/src/main/proto/beam_runner_api.proto#L644]
>  
> Primary case is for iterables after CoGBKs.





[jira] [Work logged] (BEAM-7726) [Go SDK] State Backed Iterables

2019-07-17 Thread ASF GitHub Bot (JIRA)


 [ 
https://issues.apache.org/jira/browse/BEAM-7726?focusedWorklogId=278619=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-278619
 ]

ASF GitHub Bot logged work on BEAM-7726:


Author: ASF GitHub Bot
Created on: 18/Jul/19 00:13
Start Date: 18/Jul/19 00:13
Worklog Time Spent: 10m 
  Work Description: youngoli commented on pull request #9080:  [BEAM-7726] 
Implement State Backed Iterables in Go SDK
URL: https://github.com/apache/beam/pull/9080#discussion_r304680191
 
 

 ##
 File path: sdks/go/pkg/beam/core/runtime/exec/datasource.go
 ##
 @@ -72,117 +79,129 @@ func (n *DataSource) Process(ctx context.Context) error {
c := coder.SkipW(n.Coder)
wc := MakeWindowDecoder(n.Coder.Window)
 
+   var cp ElementDecoder // Decoder for the primary element or the key in CoGBKs.
+   var cvs []ElementDecoder // Decoders for each value stream in CoGBKs.
+
switch {
case coder.IsCoGBK(c):
-   ck := MakeElementDecoder(c.Components[0])
-   cv := MakeElementDecoder(c.Components[1])
+   cp = MakeElementDecoder(c.Components[0])
 
-   for {
-   if n.IncrementCountAndCheckSplit(ctx) {
+   // TODO(BEAM-490): Support multiple value streams (coder components) with
+   // CoGBK.
+   cvs = []ElementDecoder{MakeElementDecoder(c.Components[1])}
+   default:
+   cp = MakeElementDecoder(c)
+   }
+
+   for {
+   if n.IncrementCountAndCheckSplit(ctx) {
+   return nil
+   }
+   ws, t, err := DecodeWindowedValueHeader(wc, r)
+   if err != nil {
+   if err == io.EOF {
return nil
}
-   ws, t, err := DecodeWindowedValueHeader(wc, r)
-   if err != nil {
-   if err == io.EOF {
-   return nil
-   }
-   return errors.Wrap(err, "source failed")
-   }
+   return errors.Wrap(err, "source failed")
+   }
 
-   // Decode key
+   // Decode key or parallel element.
+   pe, err := cp.Decode(r)
+   if err != nil {
+   return errors.Wrap(err, "source decode failed")
+   }
+   pe.Timestamp = t
+   pe.Windows = ws
 
-   key, err := ck.Decode(r)
+   var valReStreams []ReStream
+   for _, cv := range cvs {
+   values, err := n.makeReStream(ctx, pe, cv, r)
if err != nil {
-   return errors.Wrap(err, "source decode failed")
+   return err
}
-   key.Timestamp = t
-   key.Windows = ws
+   valReStreams = append(valReStreams, values)
+   }
 
-   // TODO(herohde) 4/30/2017: the State API will be handle re-iterations
-   // and only "small" value streams would be inline. Presumably, that
-   // would entail buffering the whole stream. We do that for now.
+   if err := n.Out.ProcessElement(ctx, pe, valReStreams...); err != nil {
+   return err
+   }
+   }
+}
 
-   var buf []FullValue
+func (n *DataSource) makeReStream(ctx context.Context, key *FullValue, cv ElementDecoder, r io.ReadCloser) (ReStream, error) {
+   size, err := coder.DecodeInt32(r)
+   if err != nil {
+   return nil, errors.Wrap(err, "stream size decoding failed")
+   }
 
-   size, err := coder.DecodeInt32(r)
+   switch {
+   case size >= 0:
+   // Single chunk streams are fully read in and buffered in memory.
+   var buf []FullValue
+   buf, err = readStreamToBuffer(cv, r, int64(size), buf)
+   if err != nil {
+   return nil, err
+   }
+   return &FixedReStream{Buf: buf}, nil
+   case size == -1: // Shouldn't this be 0?
+   // Multi-chunked stream.
+   var buf []FullValue
+   for {
+   chunk, err := coder.DecodeVarInt(r)
if err != nil {
-   return errors.Wrap(err, "stream size decoding failed")
+   return nil, errors.Wrap(err, "stream chunk size decoding failed")
}
-
-   if size > -1 {
-   // Single chunk stream.
-
-   
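The `makeReStream` hunk above branches on a big-endian int32 size prefix. A non-negative value means a single chunk of that many elements, and `-1` means the elements follow in chunks, each introduced by a varint count. The hunk is truncated before the chunk loop ends. The sketch below assumes the convention of Beam's Java `IterableLikeCoder`, where a zero count terminates the chunk sequence. Under that convention `0` is already a valid single-chunk size (an empty iterable), which would explain why `-1` rather than `0` marks the chunked form. `read_element` is a hypothetical one-byte element decoder, and state-backed continuation tokens are not modeled.

```python
import io
import struct
from typing import BinaryIO, List


def read_varint(stream: BinaryIO) -> int:
    """Decode an unsigned varint: 7 bits per byte, least significant first."""
    result, shift = 0, 0
    while True:
        byte = stream.read(1)
        if not byte:
            raise EOFError("truncated varint")
        b = byte[0]
        result |= (b & 0x7F) << shift
        if not b & 0x80:
            return result
        shift += 7


def read_element(stream: BinaryIO) -> int:
    """Hypothetical element decoder: each element is one unsigned byte."""
    data = stream.read(1)
    if not data:
        raise EOFError("truncated element")
    return data[0]


def decode_iterable(stream: BinaryIO) -> List[int]:
    (size,) = struct.unpack(">i", stream.read(4))  # big-endian int32 prefix
    if size >= 0:
        # Single chunk: exactly `size` elements follow.
        return [read_element(stream) for _ in range(size)]
    if size != -1:
        raise ValueError("unexpected size prefix %d" % size)
    values: List[int] = []
    while True:
        # Chunked form: varint count, then that many elements; 0 ends it.
        count = read_varint(stream)
        if count == 0:
            return values
        values.extend(read_element(stream) for _ in range(count))
```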

[jira] [Work logged] (BEAM-7484) Throughput collection in BigQuery performance tests

2019-07-17 Thread ASF GitHub Bot (JIRA)


 [ 
https://issues.apache.org/jira/browse/BEAM-7484?focusedWorklogId=278599=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-278599
 ]

ASF GitHub Bot logged work on BEAM-7484:


Author: ASF GitHub Bot
Created on: 17/Jul/19 23:48
Start Date: 17/Jul/19 23:48
Worklog Time Spent: 10m 
  Work Description: udim commented on pull request #8766: [BEAM-7484] 
Metrics collection in BigQuery perf tests
URL: https://github.com/apache/beam/pull/8766
 
 
   
 



Issue Time Tracking
---

Worklog Id: (was: 278599)
Time Spent: 5h  (was: 4h 50m)

> Throughput collection in BigQuery performance tests
> ---
>
> Key: BEAM-7484
> URL: https://issues.apache.org/jira/browse/BEAM-7484
> Project: Beam
>  Issue Type: New Feature
>  Components: testing
>Reporter: Kamil Wasilewski
>Assignee: Kamil Wasilewski
>Priority: Major
>  Time Spent: 5h
>  Remaining Estimate: 0h
>
> The goal is to collect bytes/time and messages/time metrics in BQ read and 
> write tests in Python SDK.
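Here is a rough illustration of the two metrics the issue asks for. Each throughput figure is a counter divided by elapsed wall-clock time. The helper below is a hypothetical sketch, not the metrics code from the linked PR, and it assumes the byte and message counts and the start and end timestamps have already been collected.

```python
from dataclasses import dataclass


@dataclass
class Throughput:
    bytes_per_sec: float
    messages_per_sec: float


def compute_throughput(total_bytes: int, total_messages: int,
                       start_secs: float, end_secs: float) -> Throughput:
    """Divide the collected counters by the elapsed time of the read/write."""
    elapsed = end_secs - start_secs
    if elapsed <= 0:
        raise ValueError("end time must be after start time")
    return Throughput(total_bytes / elapsed, total_messages / elapsed)
```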





[jira] [Work logged] (BEAM-7484) Throughput collection in BigQuery performance tests

2019-07-17 Thread ASF GitHub Bot (JIRA)


 [ 
https://issues.apache.org/jira/browse/BEAM-7484?focusedWorklogId=278598=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-278598
 ]

ASF GitHub Bot logged work on BEAM-7484:


Author: ASF GitHub Bot
Created on: 17/Jul/19 23:47
Start Date: 17/Jul/19 23:47
Worklog Time Spent: 10m 
  Work Description: udim commented on issue #8766: [BEAM-7484] Metrics 
collection in BigQuery perf tests
URL: https://github.com/apache/beam/pull/8766#issuecomment-512609860
 
 
   Thanks!
 



Issue Time Tracking
---

Worklog Id: (was: 278598)
Time Spent: 4h 50m  (was: 4h 40m)

> Throughput collection in BigQuery performance tests
> ---
>
> Key: BEAM-7484
> URL: https://issues.apache.org/jira/browse/BEAM-7484
> Project: Beam
>  Issue Type: New Feature
>  Components: testing
>Reporter: Kamil Wasilewski
>Assignee: Kamil Wasilewski
>Priority: Major
>  Time Spent: 4h 50m
>  Remaining Estimate: 0h
>
> The goal is to collect bytes/time and messages/time metrics in BQ read and 
> write tests in Python SDK.





[jira] [Work logged] (BEAM-7257) Add withProducerConfigUpdates to KafkaIO

2019-07-17 Thread ASF GitHub Bot (JIRA)


 [ 
https://issues.apache.org/jira/browse/BEAM-7257?focusedWorklogId=278586=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-278586
 ]

ASF GitHub Bot logged work on BEAM-7257:


Author: ASF GitHub Bot
Created on: 17/Jul/19 23:29
Start Date: 17/Jul/19 23:29
Worklog Time Spent: 10m 
  Work Description: tvalentyn commented on issue #9093: [WIP] [BEAM-7257] 
[BEAM-7714] Split Python 3 postcommits into several Jenkins jobs.
URL: https://github.com/apache/beam/pull/9093#issuecomment-512605907
 
 
   run python 2 postcommit
 



Issue Time Tracking
---

Worklog Id: (was: 278586)
Time Spent: 1h 20m  (was: 1h 10m)

> Add withProducerConfigUpdates to KafkaIO
> 
>
> Key: BEAM-7257
> URL: https://issues.apache.org/jira/browse/BEAM-7257
> Project: Beam
>  Issue Type: Improvement
>  Components: io-java-kafka
>Reporter: Heejong Lee
>Assignee: Heejong Lee
>Priority: Major
> Fix For: 2.13.0
>
>  Time Spent: 1h 20m
>  Remaining Estimate: 0h
>
> adding withProducerConfigUpdates and deprecating updateProducerProperties





[jira] [Work logged] (BEAM-7257) Add withProducerConfigUpdates to KafkaIO

2019-07-17 Thread ASF GitHub Bot (JIRA)


 [ 
https://issues.apache.org/jira/browse/BEAM-7257?focusedWorklogId=278587=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-278587
 ]

ASF GitHub Bot logged work on BEAM-7257:


Author: ASF GitHub Bot
Created on: 17/Jul/19 23:29
Start Date: 17/Jul/19 23:29
Worklog Time Spent: 10m 
  Work Description: tvalentyn commented on issue #9093: [WIP] [BEAM-7257] 
[BEAM-7714] Split Python 3 postcommits into several Jenkins jobs.
URL: https://github.com/apache/beam/pull/9093#issuecomment-512605970
 
 
   run python 3.5 postcommit
 



Issue Time Tracking
---

Worklog Id: (was: 278587)
Time Spent: 1.5h  (was: 1h 20m)

> Add withProducerConfigUpdates to KafkaIO
> 
>
> Key: BEAM-7257
> URL: https://issues.apache.org/jira/browse/BEAM-7257
> Project: Beam
>  Issue Type: Improvement
>  Components: io-java-kafka
>Reporter: Heejong Lee
>Assignee: Heejong Lee
>Priority: Major
> Fix For: 2.13.0
>
>  Time Spent: 1.5h
>  Remaining Estimate: 0h
>
> adding withProducerConfigUpdates and deprecating updateProducerProperties





[jira] [Work logged] (BEAM-7257) Add withProducerConfigUpdates to KafkaIO

2019-07-17 Thread ASF GitHub Bot (JIRA)


 [ 
https://issues.apache.org/jira/browse/BEAM-7257?focusedWorklogId=278590=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-278590
 ]

ASF GitHub Bot logged work on BEAM-7257:


Author: ASF GitHub Bot
Created on: 17/Jul/19 23:29
Start Date: 17/Jul/19 23:29
Worklog Time Spent: 10m 
  Work Description: tvalentyn commented on issue #9093: [WIP] [BEAM-7257] 
[BEAM-7714] Split Python 3 postcommits into several Jenkins jobs.
URL: https://github.com/apache/beam/pull/9093#issuecomment-512606052
 
 
   run python 3.7 postcommit
 



Issue Time Tracking
---

Worklog Id: (was: 278590)
Time Spent: 1h 50m  (was: 1h 40m)

> Add withProducerConfigUpdates to KafkaIO
> 
>
> Key: BEAM-7257
> URL: https://issues.apache.org/jira/browse/BEAM-7257
> Project: Beam
>  Issue Type: Improvement
>  Components: io-java-kafka
>Reporter: Heejong Lee
>Assignee: Heejong Lee
>Priority: Major
> Fix For: 2.13.0
>
>  Time Spent: 1h 50m
>  Remaining Estimate: 0h
>
> adding withProducerConfigUpdates and deprecating updateProducerProperties





[jira] [Work logged] (BEAM-7257) Add withProducerConfigUpdates to KafkaIO

2019-07-17 Thread ASF GitHub Bot (JIRA)


 [ 
https://issues.apache.org/jira/browse/BEAM-7257?focusedWorklogId=278589=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-278589
 ]

ASF GitHub Bot logged work on BEAM-7257:


Author: ASF GitHub Bot
Created on: 17/Jul/19 23:29
Start Date: 17/Jul/19 23:29
Worklog Time Spent: 10m 
  Work Description: tvalentyn commented on issue #9093: [WIP] [BEAM-7257] 
[BEAM-7714] Split Python 3 postcommits into several Jenkins jobs.
URL: https://github.com/apache/beam/pull/9093#issuecomment-512606008
 
 
   run python 3.6 postcommit
 



Issue Time Tracking
---

Worklog Id: (was: 278589)
Time Spent: 1h 40m  (was: 1.5h)

> Add withProducerConfigUpdates to KafkaIO
> 
>
> Key: BEAM-7257
> URL: https://issues.apache.org/jira/browse/BEAM-7257
> Project: Beam
>  Issue Type: Improvement
>  Components: io-java-kafka
>Reporter: Heejong Lee
>Assignee: Heejong Lee
>Priority: Major
> Fix For: 2.13.0
>
>  Time Spent: 1h 40m
>  Remaining Estimate: 0h
>
> adding withProducerConfigUpdates and deprecating updateProducerProperties





[jira] [Work logged] (BEAM-4948) Beam Dependency Update Request: com.google.guava

2019-07-17 Thread ASF GitHub Bot (JIRA)


 [ 
https://issues.apache.org/jira/browse/BEAM-4948?focusedWorklogId=278585=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-278585
 ]

ASF GitHub Bot logged work on BEAM-4948:


Author: ASF GitHub Bot
Created on: 17/Jul/19 23:27
Start Date: 17/Jul/19 23:27
Worklog Time Spent: 10m 
  Work Description: lukecwik commented on issue #8899: [BEAM-4948, 
BEAM-6267, BEAM-5559, BEAM-7289] Update the version of guava to 26.0-jre for 
all our vendored artifacts containing guava
URL: https://github.com/apache/beam/pull/8899#issuecomment-512605643
 
 
   The issue is that Guava migrated to the Checker Framework `@Nullable` instead 
of the javax version, which made SpotBugs perform its nullness checks. For 
example, the Guava Function class marks its parameter as `@Nullable`, which 
means that the function must correctly handle null inputs; some of our 
previous Function implementations did not. So I could either update them to 
handle null inputs or mark them as `@Nonnull`. The issue with the latter is 
that it narrows the definition from a Function that accepts nullable input to 
one that doesn't, which required a different FindBugs (FB) suppression.
 



Issue Time Tracking
---

Worklog Id: (was: 278585)
Time Spent: 4h 10m  (was: 4h)

> Beam Dependency Update Request: com.google.guava
> 
>
> Key: BEAM-4948
> URL: https://issues.apache.org/jira/browse/BEAM-4948
> Project: Beam
>  Issue Type: Bug
>  Components: dependencies
>Reporter: Beam JIRA Bot
>Priority: Major
>  Time Spent: 4h 10m
>  Remaining Estimate: 0h
>
> 2018-07-25 20:28:03.628639
> Please review and upgrade the com.google.guava to the latest version 
> None 
>  
> cc: 





[jira] [Work logged] (BEAM-7679) examples.complete.game ITs might use the same BQ dataset

2019-07-17 Thread ASF GitHub Bot (JIRA)


 [ 
https://issues.apache.org/jira/browse/BEAM-7679?focusedWorklogId=278575=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-278575
 ]

ASF GitHub Bot logged work on BEAM-7679:


Author: ASF GitHub Bot
Created on: 17/Jul/19 23:16
Start Date: 17/Jul/19 23:16
Worklog Time Spent: 10m 
  Work Description: udim commented on issue #8991: [BEAM-7679] Add 
randomness to ITs' BQ dataset name
URL: https://github.com/apache/beam/pull/8991#issuecomment-512602924
 
 
   run python postcommit
 



Issue Time Tracking
---

Worklog Id: (was: 278575)
Time Spent: 2h  (was: 1h 50m)

> examples.complete.game ITs might use the same BQ dataset
> 
>
> Key: BEAM-7679
> URL: https://issues.apache.org/jira/browse/BEAM-7679
> Project: Beam
>  Issue Type: Bug
>  Components: test-failures
>Reporter: Udi Meiri
>Assignee: Udi Meiri
>Priority: Major
>  Time Spent: 2h
>  Remaining Estimate: 0h
>
> Code is:
> {code:java}
> unique_dataset_name = dataset_base_name + str(int(time.time()))
> {code}
> [https://github.com/apache/beam/blob/932e802279a2daa0ff7797a8fc81e952a4e4f252/sdks/python/apache_beam/io/gcp/tests/utils.py#L59]
>  
> Example log: 
> [https://builds.apache.org/job/beam_PostCommit_Python3_Verify_PR/476/consoleFull]
> I suspect this issue because of this error:
> {code:java}
> google.api_core.exceptions.NotFound: 404 Not found: Table 
> apache-beam-testing:leader_board_it_dataset1562016299.leader_board_teams was 
> not found in location US{code}
> and the fact that a lot of such tests started at the same second.
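The suspected collision follows from the quoted code. `str(int(time.time()))` has one-second resolution, so tests that start in the same second get the same dataset name. Below is a minimal sketch of a less collision-prone name, assuming a random suffix is acceptable. The follow-up PR title mentions adding randomness, but this is not its code, and `unique_dataset_name` here is a hypothetical helper. BigQuery dataset IDs may contain only letters, digits and underscores, so the suffix is hex digits joined with `_`.

```python
import secrets
import time


def unique_dataset_name(dataset_base_name: str) -> str:
    """Return base name + Unix timestamp + '_' + 8 random hex digits.

    The timestamp keeps names roughly sortable by creation time; the random
    suffix distinguishes tests that start within the same second.
    """
    return "%s%d_%s" % (dataset_base_name, int(time.time()),
                        secrets.token_hex(4))
```

With 32 random bits per name, a collision among a few hundred tests started in the same second is very unlikely, though not impossible.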





[jira] [Created] (BEAM-7765) Add test for snippet accessing_valueprovider_info_after_run

2019-07-17 Thread Udi Meiri (JIRA)
Udi Meiri created BEAM-7765:
---

 Summary: Add test for snippet 
accessing_valueprovider_info_after_run
 Key: BEAM-7765
 URL: https://issues.apache.org/jira/browse/BEAM-7765
 Project: Beam
  Issue Type: Bug
  Components: sdk-py-core
Reporter: Udi Meiri


This snippet needs a unit test.
It has bugs. For example:
- apache_beam.utils.value_provider doesn't exist
- beam.combiners.Sum doesn't exist
- unused import of: WriteToText

cc: [~pabloem]





[jira] [Work logged] (BEAM-7257) Add withProducerConfigUpdates to KafkaIO

2019-07-17 Thread ASF GitHub Bot (JIRA)


 [ 
https://issues.apache.org/jira/browse/BEAM-7257?focusedWorklogId=278574=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-278574
 ]

ASF GitHub Bot logged work on BEAM-7257:


Author: ASF GitHub Bot
Created on: 17/Jul/19 23:11
Start Date: 17/Jul/19 23:11
Worklog Time Spent: 10m 
  Work Description: tvalentyn commented on issue #9093: [WIP] [BEAM-7257] 
[BEAM-7714] Split Python 3 postcommits into several Jenkins jobs.
URL: https://github.com/apache/beam/pull/9093#issuecomment-512601641
 
 
   Run Seed Job
 



Issue Time Tracking
---

Worklog Id: (was: 278574)
Time Spent: 1h 10m  (was: 1h)

> Add withProducerConfigUpdates to KafkaIO
> 
>
> Key: BEAM-7257
> URL: https://issues.apache.org/jira/browse/BEAM-7257
> Project: Beam
>  Issue Type: Improvement
>  Components: io-java-kafka
>Reporter: Heejong Lee
>Assignee: Heejong Lee
>Priority: Major
> Fix For: 2.13.0
>
>  Time Spent: 1h 10m
>  Remaining Estimate: 0h
>
> adding withProducerConfigUpdates and deprecating updateProducerProperties





[jira] [Work logged] (BEAM-7257) Add withProducerConfigUpdates to KafkaIO

2019-07-17 Thread ASF GitHub Bot (JIRA)


 [ 
https://issues.apache.org/jira/browse/BEAM-7257?focusedWorklogId=278572=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-278572
 ]

ASF GitHub Bot logged work on BEAM-7257:


Author: ASF GitHub Bot
Created on: 17/Jul/19 23:05
Start Date: 17/Jul/19 23:05
Worklog Time Spent: 10m 
  Work Description: tvalentyn commented on pull request #9093: [WIP] 
[BEAM-7257] [BEAM-7714] Split Python 3 postcommits into several Jenkins jobs.
URL: https://github.com/apache/beam/pull/9093
 
 
   Changes Jenkins jobs for Python 3.5, 3.6, 3.7 test suites. 
   
   
   Thank you for your contribution! Follow this checklist to help us 
incorporate your contribution quickly and easily:
   
- [ ] [**Choose 
reviewer(s)**](https://beam.apache.org/contribute/#make-your-change) and 
mention them in a comment (`R: @username`).
- [ ] Format the pull request title like `[BEAM-XXX] Fixes bug in 
ApproximateQuantiles`, where you replace `BEAM-XXX` with the appropriate JIRA 
issue, if applicable. This will automatically link the pull request to the 
issue.
- [ ] If this contribution is large, please file an Apache [Individual 
Contributor License Agreement](https://www.apache.org/licenses/icla.pdf).
   

[jira] [Work logged] (BEAM-7746) Add type hints to python code

2019-07-17 Thread ASF GitHub Bot (JIRA)


 [ 
https://issues.apache.org/jira/browse/BEAM-7746?focusedWorklogId=278570=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-278570
 ]

ASF GitHub Bot logged work on BEAM-7746:


Author: ASF GitHub Bot
Created on: 17/Jul/19 23:00
Start Date: 17/Jul/19 23:00
Worklog Time Spent: 10m 
  Work Description: chadrik commented on pull request #9056: [BEAM-7746] 
Add python type hints
URL: https://github.com/apache/beam/pull/9056#discussion_r304605958
 
 

 ##
 File path: sdks/python/apache_beam/typehints/decorators.py
 ##
 @@ -193,7 +200,7 @@ def __repr__(self):
 self.input_types, self.output_types)
 
 
-class WithTypeHints(object):
+class WithTypeHints(Generic[InT, OutT]):
 
 Review comment:
   True. I think the somewhat unsatisfactory answer is that for the time being 
you need both: one for runtime checking and the other for static checking, 
until they can be unified. I think trying to do all of that in one PR is 
going to be too much.
   
   Off the top of my head, this should probably be done in the following order:
   
   1. support runtime type hints using the `typing` module instead of `typehints`: 
https://issues.apache.org/jira/browse/BEAM-7713
   2. add static type hints to the beam code and begin enforcing them using mypy: 
this PR (https://issues.apache.org/jira/browse/BEAM-7746) and possibly 
https://issues.apache.org/jira/browse/BEAM-7712
   3. support static validation of user pipelines (mypy plugin, etc.)
   4. support runtime validations based on `typing` annotations
   
   There are a lot of "ifs" surrounding step 4. We may need to go Python 3-only 
first to avoid the pitfalls of type comments. We may also find that step 3 
makes step 4 less important.
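This sketch makes the "you need both" point concrete. `typing.Generic` parameters such as `InT`/`OutT` are checked by a static analyzer like mypy, but Python does not enforce them when the code runs. Runtime checking therefore still needs separately stored hints. `WithTypeHints` below is a hypothetical, heavily simplified stand-in, not Beam's class, and `MapFn` and `set_runtime_input_type` are invented for illustration.

```python
from typing import Callable, Generic, Optional, TypeVar

InT = TypeVar("InT")
OutT = TypeVar("OutT")


class WithTypeHints(Generic[InT, OutT]):
    """Hypothetical stand-in: static type parameters plus a runtime hint."""

    def __init__(self) -> None:
        # Stored separately because InT/OutT are not enforced at runtime.
        self.runtime_input_type: Optional[type] = None

    def set_runtime_input_type(self, input_type: type) -> None:
        self.runtime_input_type = input_type


class MapFn(WithTypeHints[InT, OutT]):
    def __init__(self, fn: Callable[[InT], OutT]) -> None:
        super().__init__()
        self.fn = fn

    def process(self, element: InT) -> OutT:
        # mypy checks callers against InT statically; this check runs only
        # when a runtime hint has been supplied.
        expected = self.runtime_input_type
        if expected is not None and not isinstance(element, expected):
            raise TypeError("expected %s, got %r" % (expected.__name__, element))
        return self.fn(element)
```

Without a runtime hint, `MapFn(lambda x: x * 2).process("ab")` quietly returns `"abab"`. This is the gap the comment expects to close once `typing` annotations can drive both kinds of checking.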
   
   
 



Issue Time Tracking
---

Worklog Id: (was: 278570)
Time Spent: 3h 40m  (was: 3.5h)

> Add type hints to python code
> -
>
> Key: BEAM-7746
> URL: https://issues.apache.org/jira/browse/BEAM-7746
> Project: Beam
>  Issue Type: New Feature
>  Components: sdk-py-core
>Reporter: Chad Dombrova
>Priority: Major
>  Time Spent: 3h 40m
>  Remaining Estimate: 0h
>
> As a developer of the beam source code, I would like the code to use pep484 
> type hints so that I can clearly see what types are required, get completion 
> in my IDE, and enforce code correctness via a static analyzer like mypy.
> This may be considered a precursor to BEAM-7060
> Work has been started here:  [https://github.com/apache/beam/pull/9056]
>  
>  





[jira] [Work logged] (BEAM-1580) Typo in bigquery_tornadoes example

2019-07-17 Thread ASF GitHub Bot (JIRA)


 [ 
https://issues.apache.org/jira/browse/BEAM-1580?focusedWorklogId=278565=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-278565
 ]

ASF GitHub Bot logged work on BEAM-1580:


Author: ASF GitHub Bot
Created on: 17/Jul/19 22:55
Start Date: 17/Jul/19 22:55
Worklog Time Spent: 10m 
  Work Description: coveralls commented on issue #2390: [BEAM-1580] Fixed 
typos in the Python SDK examples. ( tornatoes -> tornadoes )
URL: https://github.com/apache/beam/pull/2390#issuecomment-290641745
 
 
   
   [![Coverage 
Status](https://coveralls.io/builds/24635781/badge)](https://coveralls.io/builds/24635781)
   
   Coverage increased (+28.0%) to 98.318% when pulling 
**1b59e33d9aa7cd4c2505b7bbfb581e22fe1bf96d on sungjunyoung:master** into 
**935ecd4e032e18e428ee33cbf5484c5fce726b4f on apache:master**.
   
 



Issue Time Tracking
---

Worklog Id: (was: 278565)
Time Spent: 10m
Remaining Estimate: 0h

> Typo in bigquery_tornadoes example
> --
>
> Key: BEAM-1580
> URL: https://issues.apache.org/jira/browse/BEAM-1580
> Project: Beam
>  Issue Type: Bug
>  Components: sdk-py-core
>Reporter: Ahmet Altay
>Priority: Trivial
> Fix For: 2.0.0
>
>  Time Spent: 10m
>  Remaining Estimate: 0h
>
> There are spelling errors in the example code (e.g. "tornatoes")





[jira] [Work logged] (BEAM-7746) Add type hints to python code

2019-07-17 Thread ASF GitHub Bot (JIRA)


 [ 
https://issues.apache.org/jira/browse/BEAM-7746?focusedWorklogId=278555=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-278555
 ]

ASF GitHub Bot logged work on BEAM-7746:


Author: ASF GitHub Bot
Created on: 17/Jul/19 22:27
Start Date: 17/Jul/19 22:27
Worklog Time Spent: 10m 
  Work Description: chadrik commented on issue #9056: [BEAM-7746] Add 
python type hints
URL: https://github.com/apache/beam/pull/9056#issuecomment-512591918
 
 
   I'm really curious to know what the general consensus is on this PR, 
implementation details aside.  
   
   Do you all like the idea of adding type annotations?  
   
   If I can get some subset of the current package passing, would you be 
willing to merge something like this in?  
   
   What do you see as the major blockers?
   
   
   
   
 



Issue Time Tracking
---

Worklog Id: (was: 278555)
Time Spent: 3.5h  (was: 3h 20m)

> Add type hints to python code
> -
>
> Key: BEAM-7746
> URL: https://issues.apache.org/jira/browse/BEAM-7746
> Project: Beam
>  Issue Type: New Feature
>  Components: sdk-py-core
>Reporter: Chad Dombrova
>Priority: Major
>  Time Spent: 3.5h
>  Remaining Estimate: 0h
>
> As a developer of the Beam source code, I would like the code to use PEP 484 
> type hints so that I can clearly see what types are required, get completion 
> in my IDE, and enforce code correctness via a static analyzer like mypy.
> This may be considered a precursor to BEAM-7060.
> Work has been started here:  [https://github.com/apache/beam/pull/9056]
>  
>  





[jira] [Work logged] (BEAM-6972) LTS backport: CassandraIO is broken because of use of bad relocation of guava

2019-07-17 Thread ASF GitHub Bot (JIRA)


 [ 
https://issues.apache.org/jira/browse/BEAM-6972?focusedWorklogId=278550=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-278550
 ]

ASF GitHub Bot logged work on BEAM-6972:


Author: ASF GitHub Bot
Created on: 17/Jul/19 22:20
Start Date: 17/Jul/19 22:20
Worklog Time Spent: 10m 
  Work Description: kennknowles commented on issue #9064: [BEAM-6972] 2.7.1 
LTS cherrypick: fix guava shading for Guava in CassandraIO
URL: https://github.com/apache/beam/pull/9064#issuecomment-512590061
 
 
   Getting very slow downloads from Maven Central locally, too, which could be 
part of the issue.
 



Issue Time Tracking
---

Worklog Id: (was: 278550)
Time Spent: 1h 10m  (was: 1h)

> LTS backport: CassandraIO is broken because of use of bad relocation of guava
> -
>
> Key: BEAM-6972
> URL: https://issues.apache.org/jira/browse/BEAM-6972
> Project: Beam
>  Issue Type: Bug
>  Components: io-java-cassandra
>Affects Versions: 2.5.0, 2.6.0, 2.7.0, 2.8.0, 2.9.0, 2.10.0, 2.11.0
>Reporter: Arun sethia
>Assignee: Kenneth Knowles
>Priority: Major
> Fix For: 2.7.1
>
>  Time Spent: 1h 10m
>  Remaining Estimate: 0h
>
> While using Apache Beam to run a Dataflow job that reads data from BigQuery and 
> stores/writes it to Cassandra with the following libraries:
>  # beam-sdks-java-io-cassandra - 2.6.0
>  # beam-sdks-java-io-jdbc - 2.6.0
>  # beam-sdks-java-io-google-cloud-platform - 2.6.0
>  # beam-sdks-java-core - 2.6.0
>  # google-cloud-dataflow-java-sdk-all - 2.5.0
>  # google-api-client - 1.25.0
>  
> I get the following error when inserting/saving data to Cassandra:
> {code:java}
> [error] (run-main-0) org.apache.beam.sdk.Pipeline$PipelineExecutionException: 
> java.lang.NoSuchMethodError: 
> com.datastax.driver.mapping.Mapper.saveAsync(Ljava/lang/Object;)Lorg/apache/beam/repackaged/beam_sdks_java_io_cassandra/com/google/common/util/concurrent/ListenableFuture;
> org.apache.beam.sdk.Pipeline$PipelineExecutionException: 
> java.lang.NoSuchMethodError: 
> com.datastax.driver.mapping.Mapper.saveAsync(Ljava/lang/Object;)Lorg/apache/beam/repackaged/beam_sdks_java_io_cassandra/com/google/common/util/concurrent/ListenableFuture;
>  at 
> org.apache.beam.runners.direct.DirectRunner$DirectPipelineResult.waitUntilFinish(DirectRunner.java:332)
>  at 
> org.apache.beam.runners.direct.DirectRunner$DirectPipelineResult.waitUntilFinish(DirectRunner.java:302)
>  at org.apache.beam.runners.direct.DirectRunner.run(DirectRunner.java:197)
>  at org.apache.beam.runners.direct.DirectRunner.run(DirectRunner.java:64)
>  at org.apache.beam.sdk.Pipeline.run(Pipeline.java:313)
>  at org.apache.beam.sdk.Pipeline.run(Pipeline.java:299){code}





[jira] [Work logged] (BEAM-6972) LTS backport: CassandraIO is broken because of use of bad relocation of guava

2019-07-17 Thread ASF GitHub Bot (JIRA)


 [ 
https://issues.apache.org/jira/browse/BEAM-6972?focusedWorklogId=278525=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-278525
 ]

ASF GitHub Bot logged work on BEAM-6972:


Author: ASF GitHub Bot
Created on: 17/Jul/19 22:11
Start Date: 17/Jul/19 22:11
Worklog Time Spent: 10m 
  Work Description: kennknowles commented on issue #9064: [BEAM-6972] 2.7.1 
LTS cherrypick: fix guava shading for Guava in CassandraIO
URL: https://github.com/apache/beam/pull/9064#issuecomment-512587773
 
 
   I've seen many builds failing due to dependency download issues. I will run 
this and publish a scan.
 



Issue Time Tracking
---

Worklog Id: (was: 278525)
Time Spent: 1h  (was: 50m)

> LTS backport: CassandraIO is broken because of use of bad relocation of guava
> -
>
> Key: BEAM-6972
> URL: https://issues.apache.org/jira/browse/BEAM-6972
> Project: Beam
>  Issue Type: Bug
>  Components: io-java-cassandra
>Affects Versions: 2.5.0, 2.6.0, 2.7.0, 2.8.0, 2.9.0, 2.10.0, 2.11.0
>Reporter: Arun sethia
>Assignee: Kenneth Knowles
>Priority: Major
> Fix For: 2.7.1
>
>  Time Spent: 1h
>  Remaining Estimate: 0h
>
> While using Apache Beam to run a Dataflow job that reads data from BigQuery and 
> stores/writes it to Cassandra with the following libraries:
>  # beam-sdks-java-io-cassandra - 2.6.0
>  # beam-sdks-java-io-jdbc - 2.6.0
>  # beam-sdks-java-io-google-cloud-platform - 2.6.0
>  # beam-sdks-java-core - 2.6.0
>  # google-cloud-dataflow-java-sdk-all - 2.5.0
>  # google-api-client - 1.25.0
>  
> I get the following error when inserting/saving data to Cassandra:
> {code:java}
> [error] (run-main-0) org.apache.beam.sdk.Pipeline$PipelineExecutionException: 
> java.lang.NoSuchMethodError: 
> com.datastax.driver.mapping.Mapper.saveAsync(Ljava/lang/Object;)Lorg/apache/beam/repackaged/beam_sdks_java_io_cassandra/com/google/common/util/concurrent/ListenableFuture;
> org.apache.beam.sdk.Pipeline$PipelineExecutionException: 
> java.lang.NoSuchMethodError: 
> com.datastax.driver.mapping.Mapper.saveAsync(Ljava/lang/Object;)Lorg/apache/beam/repackaged/beam_sdks_java_io_cassandra/com/google/common/util/concurrent/ListenableFuture;
>  at 
> org.apache.beam.runners.direct.DirectRunner$DirectPipelineResult.waitUntilFinish(DirectRunner.java:332)
>  at 
> org.apache.beam.runners.direct.DirectRunner$DirectPipelineResult.waitUntilFinish(DirectRunner.java:302)
>  at org.apache.beam.runners.direct.DirectRunner.run(DirectRunner.java:197)
>  at org.apache.beam.runners.direct.DirectRunner.run(DirectRunner.java:64)
>  at org.apache.beam.sdk.Pipeline.run(Pipeline.java:313)
>  at org.apache.beam.sdk.Pipeline.run(Pipeline.java:299){code}





[jira] [Work logged] (BEAM-3342) Create a Cloud Bigtable IO connector for Python

2019-07-17 Thread ASF GitHub Bot (JIRA)


 [ 
https://issues.apache.org/jira/browse/BEAM-3342?focusedWorklogId=278517=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-278517
 ]

ASF GitHub Bot logged work on BEAM-3342:


Author: ASF GitHub Bot
Created on: 17/Jul/19 22:05
Start Date: 17/Jul/19 22:05
Worklog Time Spent: 10m 
  Work Description: eddie-scio commented on issue #8457: [BEAM-3342] Create 
a Cloud Bigtable IO connector for Python
URL: https://github.com/apache/beam/pull/8457#issuecomment-512586110
 
 
   Is there an ETA for landing this?  Thanks for all the work!
 



Issue Time Tracking
---

Worklog Id: (was: 278517)
Time Spent: 28h  (was: 27h 50m)

> Create a Cloud Bigtable IO connector for Python
> ---
>
> Key: BEAM-3342
> URL: https://issues.apache.org/jira/browse/BEAM-3342
> Project: Beam
>  Issue Type: Bug
>  Components: sdk-py-core
>Reporter: Solomon Duskis
>Assignee: Solomon Duskis
>Priority: Major
>  Time Spent: 28h
>  Remaining Estimate: 0h
>
> I would like to create a Cloud Bigtable Python connector.





[jira] [Work logged] (BEAM-7746) Add type hints to python code

2019-07-17 Thread ASF GitHub Bot (JIRA)


 [ 
https://issues.apache.org/jira/browse/BEAM-7746?focusedWorklogId=278512=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-278512
 ]

ASF GitHub Bot logged work on BEAM-7746:


Author: ASF GitHub Bot
Created on: 17/Jul/19 21:56
Start Date: 17/Jul/19 21:56
Worklog Time Spent: 10m 
  Work Description: chadrik commented on pull request #9056: [BEAM-7746] 
Add python type hints
URL: https://github.com/apache/beam/pull/9056#discussion_r304659145
 
 

 ##
 File path: sdks/python/apache_beam/transforms/ptransform.py
 ##
 @@ -465,56 +484,70 @@ def get_windowing(self, inputs):
     return inputs[0].windowing
 
   def __rrshift__(self, label):
+    # type: (str) -> _NamedPTransform[InT, OutT]
     return _NamedPTransform(self, label)
 
   def __or__(self, right):
+    # type: (PTransform[InT, OutT], PTransform[OutT, T]) -> _ChainedPTransform[OutT, T]
     """Used to compose PTransforms, e.g., ptransform1 | ptransform2."""
     if isinstance(right, PTransform):
       return _ChainedPTransform(self, right)
     return NotImplemented
 
-  def __ror__(self, left, label=None):
-    """Used to apply this PTransform to non-PValues, e.g., a tuple."""
-    pvalueish, pvalues = self._extract_input_pvalues(left)
-    pipelines = [v.pipeline for v in pvalues if isinstance(v, pvalue.PValue)]
-    if pvalues and not pipelines:
-      deferred = False
+  if not typing.TYPE_CHECKING:
 
 Review comment:
   Ah, sorry, I missed the context here. Yeah, that would be a nice feature; 
it would be worth bringing up on the mypy GitHub repo.
   
   Note that this change is to accommodate analyzing user pipelines via the 
mypy plugin, which I'm now thinking would be best to separate into another PR. 
With some more hacking, I might ultimately be able to avoid this bit of 
ugliness.
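The `if not typing.TYPE_CHECKING:` guard in the diff works because the constant is `False` at runtime but is treated as `True` by static checkers such as mypy, so anything defined inside the block exists when the code runs yet is invisible to the checker. A minimal sketch, using a hypothetical `Transform` class rather than Beam's `PTransform`:

```python
import typing


class Transform(object):
    """Hypothetical stand-in for a PTransform-like class."""

    def __init__(self, label):
        self.label = label

    if not typing.TYPE_CHECKING:
        # typing.TYPE_CHECKING is False when the module runs, so this method
        # is defined normally. A static checker assumes it is True, treats
        # this block as unreachable, and never sees __ror__, leaving
        # `value | transform` to be typed by other means (e.g. a plugin).
        def __ror__(self, left):
            return '%s applied to %r' % (self.label, left)
```

At runtime `(1, 2, 3) | Transform('Double')` falls through to `Transform.__ror__`, because `tuple` defines no `__or__`.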
   
 



Issue Time Tracking
---

Worklog Id: (was: 278512)
Time Spent: 3h 20m  (was: 3h 10m)

> Add type hints to python code
> -
>
> Key: BEAM-7746
> URL: https://issues.apache.org/jira/browse/BEAM-7746
> Project: Beam
>  Issue Type: New Feature
>  Components: sdk-py-core
>Reporter: Chad Dombrova
>Priority: Major
>  Time Spent: 3h 20m
>  Remaining Estimate: 0h
>
> As a developer of the Beam source code, I would like the code to use PEP 484 
> type hints so that I can clearly see what types are required, get completion 
> in my IDE, and enforce code correctness via a static analyzer like mypy.
> This may be considered a precursor to BEAM-7060.
> Work has been started here:  [https://github.com/apache/beam/pull/9056]
>  
>  





[jira] [Work logged] (BEAM-7764) Dataflow run fails when service account is not set.

2019-07-17 Thread ASF GitHub Bot (JIRA)


 [ 
https://issues.apache.org/jira/browse/BEAM-7764?focusedWorklogId=278509=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-278509
 ]

ASF GitHub Bot logged work on BEAM-7764:


Author: ASF GitHub Bot
Created on: 17/Jul/19 21:50
Start Date: 17/Jul/19 21:50
Worklog Time Spent: 10m 
  Work Description: potatogopher commented on pull request #9092: 
[BEAM-7764] Add the ability to set the service account email for dataflow jobs
URL: https://github.com/apache/beam/pull/9092
 
 
   The Dataflow runner is not setting the service account for the job it 
creates, which causes deployment to fail.
   
   ```
   Workflow failed. Causes: There was a problem refreshing your credentials. Please check:
   1. Dataflow API is enabled for your project.
   2. There is a robot service account for your project:
   service-[project number]@dataflow-service-producer-prod.iam.gserviceaccount.com should have access to your project. If this account does not appear in the permissions tab for your project, contact Dataflow support.
   ```
   
   Adding a flag to set the service account will fix this issue.
   
   
   
   Thank you for your contribution! Follow this checklist to help us 
incorporate your contribution quickly and easily:
   
- [ ] [**Choose 
reviewer(s)**](https://beam.apache.org/contribute/#make-your-change) and 
mention them in a comment (`R: @username`).
- [ ] Format the pull request title like `[BEAM-XXX] Fixes bug in 
ApproximateQuantiles`, where you replace `BEAM-XXX` with the appropriate JIRA 
issue, if applicable. This will automatically link the pull request to the 
issue.
- [ ] If this contribution is large, please file an Apache [Individual 
Contributor License Agreement](https://www.apache.org/licenses/icla.pdf).
   

[jira] [Created] (BEAM-7764) Dataflow run fails when service account is not set.

2019-07-17 Thread Nick Rucci (JIRA)
Nick Rucci created BEAM-7764:


 Summary: Dataflow run fails when service account is not set.
 Key: BEAM-7764
 URL: https://issues.apache.org/jira/browse/BEAM-7764
 Project: Beam
  Issue Type: Bug
  Components: runner-dataflow, sdk-go
Reporter: Nick Rucci


The Dataflow runner is not setting the service account for the job it 
creates, which causes deployment to fail.

```
 Workflow failed. Causes: There was a problem refreshing your credentials. 
Please check:
1. Dataflow API is enabled for your project.
2. There is a robot service account for your project:
service-[project number]@dataflow-service-producer-prod.iam.gserviceaccount.com 
should have access to your project. If this account does not appear in the 
permissions tab for your project, contact Dataflow support.
```

Adding a flag to set the service account will fix this issue.





[jira] [Work logged] (BEAM-7079) Run Chicago Taxi Example on Dataflow

2019-07-17 Thread ASF GitHub Bot (JIRA)


 [ 
https://issues.apache.org/jira/browse/BEAM-7079?focusedWorklogId=278499=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-278499
 ]

ASF GitHub Bot logged work on BEAM-7079:


Author: ASF GitHub Bot
Created on: 17/Jul/19 21:40
Start Date: 17/Jul/19 21:40
Worklog Time Spent: 10m 
  Work Description: pabloem commented on issue #8939: [BEAM-7079] Add 
Chicago Taxi Example running on Dataflow
URL: https://github.com/apache/beam/pull/8939#issuecomment-512579507
 
 
   Run Python PreCommit
 



Issue Time Tracking
---

Worklog Id: (was: 278499)
Time Spent: 22h  (was: 21h 50m)

> Run Chicago Taxi Example on Dataflow
> 
>
> Key: BEAM-7079
> URL: https://issues.apache.org/jira/browse/BEAM-7079
> Project: Beam
>  Issue Type: Test
>  Components: testing
>Reporter: Michal Walenia
>Assignee: Michal Walenia
>Priority: Minor
>  Time Spent: 22h
>  Remaining Estimate: 0h
>






[jira] [Work logged] (BEAM-7726) [Go SDK] State Backed Iterables

2019-07-17 Thread ASF GitHub Bot (JIRA)


 [ 
https://issues.apache.org/jira/browse/BEAM-7726?focusedWorklogId=278490=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-278490
 ]

ASF GitHub Bot logged work on BEAM-7726:


Author: ASF GitHub Bot
Created on: 17/Jul/19 21:22
Start Date: 17/Jul/19 21:22
Worklog Time Spent: 10m 
  Work Description: lostluck commented on issue #9080:  [BEAM-7726] 
Implement State Backed Iterables in Go SDK
URL: https://github.com/apache/beam/pull/9080#issuecomment-512573799
 
 
   Run Go PreCommit
 



Issue Time Tracking
---

Worklog Id: (was: 278490)
Time Spent: 1h 10m  (was: 1h)

> [Go SDK] State Backed Iterables
> ---
>
> Key: BEAM-7726
> URL: https://issues.apache.org/jira/browse/BEAM-7726
> Project: Beam
>  Issue Type: Improvement
>  Components: sdk-go
>Affects Versions: Not applicable
>Reporter: Robert Burke
>Assignee: Robert Burke
>Priority: Major
> Fix For: Not applicable
>
>  Time Spent: 1h 10m
>  Remaining Estimate: 0h
>
> The Go SDK should support the state-backed iterables protocol defined in the proto:
> [https://github.com/apache/beam/blob/master/model/pipeline/src/main/proto/beam_runner_api.proto#L644]
>  
> The primary use case is the iterables produced by CoGBKs.
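Roughly, a state-backed iterable lets the runner send only part of a large iterable inline and keep the rest in state; the SDK then pages through the remainder with State API get requests, each response carrying a continuation token, until the token comes back empty. The Python sketch below models only that lazy paging loop. `fetch_page`, `fake_runner_fetch`, and the in-memory `RUNNER_STATE` are hypothetical stand-ins, not the Go SDK's implementation or the real proto messages.

```python
from typing import Callable, Dict, Iterator, List, Optional, Tuple

# (elements, continuation token); an empty token means no more pages.
Page = Tuple[List[str], bytes]
FetchPage = Callable[[bytes, bytes], Page]


def state_backed_iterable(inline: List[str],
                          state_key: Optional[bytes],
                          fetch_page: FetchPage) -> Iterator[str]:
    """Yields the inline prefix, then lazily pages the rest out of state."""
    for element in inline:
        yield element
    if state_key is None:
        return  # The runner sent the whole iterable inline.
    continuation = b''
    while True:
        # One call models one State API get request for this key and token.
        elements, continuation = fetch_page(state_key, continuation)
        for element in elements:
            yield element
        if not continuation:
            break


# Hypothetical runner-side storage, split into pages, for illustration only.
RUNNER_STATE: Dict[bytes, List[List[str]]] = {
    b'cogbk-values': [['c', 'd'], ['e']],
}


def fake_runner_fetch(state_key: bytes, continuation: bytes) -> Page:
    """Serves one page per call; the token is simply the next page index."""
    pages = RUNNER_STATE[state_key]
    index = int(continuation) if continuation else 0
    next_index = index + 1
    token = str(next_index).encode() if next_index < len(pages) else b''
    return pages[index], token
```

Because the function is a generator, no page is requested until iteration gets past the inline prefix, which is what allows a large CoGBK result to be consumed without materializing it all at once.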





[jira] [Work logged] (BEAM-7746) Add type hints to python code

2019-07-17 Thread ASF GitHub Bot (JIRA)


 [ 
https://issues.apache.org/jira/browse/BEAM-7746?focusedWorklogId=278489=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-278489
 ]

ASF GitHub Bot logged work on BEAM-7746:


Author: ASF GitHub Bot
Created on: 17/Jul/19 21:19
Start Date: 17/Jul/19 21:19
Worklog Time Spent: 10m 
  Work Description: udim commented on pull request #9056: [BEAM-7746] Add 
python type hints
URL: https://github.com/apache/beam/pull/9056#discussion_r304646263
 
 

 ##
 File path: sdks/python/apache_beam/transforms/ptransform.py
 ##
 @@ -465,56 +484,70 @@ def get_windowing(self, inputs):
     return inputs[0].windowing
 
   def __rrshift__(self, label):
+    # type: (str) -> _NamedPTransform[InT, OutT]
     return _NamedPTransform(self, label)
 
   def __or__(self, right):
+    # type: (PTransform[InT, OutT], PTransform[OutT, T]) -> _ChainedPTransform[OutT, T]
     """Used to compose PTransforms, e.g., ptransform1 | ptransform2."""
     if isinstance(right, PTransform):
       return _ChainedPTransform(self, right)
     return NotImplemented
 
-  def __ror__(self, left, label=None):
-    """Used to apply this PTransform to non-PValues, e.g., a tuple."""
-    pvalueish, pvalues = self._extract_input_pvalues(left)
-    pipelines = [v.pipeline for v in pvalues if isinstance(v, pvalue.PValue)]
-    if pvalues and not pipelines:
-      deferred = False
+  if not typing.TYPE_CHECKING:
 
 Review comment:
   I don't think you can decorate an import, but this is a method.
 



Issue Time Tracking
---

Worklog Id: (was: 278489)
Time Spent: 3h 10m  (was: 3h)

> Add type hints to python code
> -
>
> Key: BEAM-7746
> URL: https://issues.apache.org/jira/browse/BEAM-7746
> Project: Beam
>  Issue Type: New Feature
>  Components: sdk-py-core
>Reporter: Chad Dombrova
>Priority: Major
>  Time Spent: 3h 10m
>  Remaining Estimate: 0h
>
> As a developer of the Beam source code, I would like the code to use PEP 484 
> type hints so that I can clearly see what types are required, get completion 
> in my IDE, and enforce code correctness via a static analyzer like mypy.
> This may be considered a precursor to BEAM-7060.
> Work has been started here:  [https://github.com/apache/beam/pull/9056]
>  
>  





[jira] [Work logged] (BEAM-4948) Beam Dependency Update Request: com.google.guava

2019-07-17 Thread ASF GitHub Bot (JIRA)


 [ 
https://issues.apache.org/jira/browse/BEAM-4948?focusedWorklogId=278485=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-278485
 ]

ASF GitHub Bot logged work on BEAM-4948:


Author: ASF GitHub Bot
Created on: 17/Jul/19 21:17
Start Date: 17/Jul/19 21:17
Worklog Time Spent: 10m 
  Work Description: lukecwik commented on pull request #8899: [BEAM-4948, 
BEAM-6267, BEAM-5559, BEAM-7289] Update the version of guava to 26.0-jre for 
all our vendored artifacts containing guava
URL: https://github.com/apache/beam/pull/8899
 
 
   
 



Issue Time Tracking
---

Worklog Id: (was: 278485)
Time Spent: 4h  (was: 3h 50m)

> Beam Dependency Update Request: com.google.guava
> 
>
> Key: BEAM-4948
> URL: https://issues.apache.org/jira/browse/BEAM-4948
> Project: Beam
>  Issue Type: Bug
>  Components: dependencies
>Reporter: Beam JIRA Bot
>Priority: Major
>  Time Spent: 4h
>  Remaining Estimate: 0h
>
> 2018-07-25 20:28:03.628639
> Please review and upgrade the com.google.guava to the latest version 
> None 
>  
> cc: 





[jira] [Work logged] (BEAM-7680) synthetic_pipeline_test.py flaky

2019-07-17 Thread ASF GitHub Bot (JIRA)


 [ 
https://issues.apache.org/jira/browse/BEAM-7680?focusedWorklogId=278477=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-278477
 ]

ASF GitHub Bot logged work on BEAM-7680:


Author: ASF GitHub Bot
Created on: 17/Jul/19 21:10
Start Date: 17/Jul/19 21:10
Worklog Time Spent: 10m 
  Work Description: udim commented on pull request #8993: [BEAM-7680] Skip 
flaky tests
URL: https://github.com/apache/beam/pull/8993
 
 
   
 

This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 278477)
Time Spent: 2h 20m  (was: 2h 10m)

> synthetic_pipeline_test.py flaky
> 
>
> Key: BEAM-7680
> URL: https://issues.apache.org/jira/browse/BEAM-7680
> Project: Beam
>  Issue Type: Bug
>  Components: test-failures
>Reporter: Udi Meiri
>Assignee: Kasia Kucharczyk
>Priority: Major
>  Time Spent: 2h 20m
>  Remaining Estimate: 0h
>
> {code:java}
> 11:51:43 FAIL: testSyntheticSDFStep 
> (apache_beam.testing.synthetic_pipeline_test.SyntheticPipelineTest)
> 11:51:43 
> --
> 11:51:43 Traceback (most recent call last):
> 11:51:43   File 
> "/home/jenkins/jenkins-slave/workspace/beam_PreCommit_Python_Cron/src/sdks/python/test-suites/tox/py35/build/srcs/sdks/python/apache_beam/testing/synthetic_pipeline_test.py",
>  line 82, in testSyntheticSDFStep
> 11:51:43 self.assertTrue(0.5 <= elapsed <= 3, elapsed)
> 11:51:43 AssertionError: False is not true : 3.659700632095337{code}
> [https://builds.apache.org/job/beam_PreCommit_Python_Cron/1502/consoleFull]
>  
> Two flaky TODOs: 
> [https://github.com/apache/beam/blob/b79f24ced1c8519c29443ea7109c59ad18be2ebe/sdks/python/apache_beam/testing/synthetic_pipeline_test.py#L69-L82]
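The failure above is a wall-clock upper bound (`elapsed <= 3`) tripping at about 3.66 s on a CI worker. PR #8993 is titled "Skip flaky tests"; one way to express such a skip with the standard `unittest` module, next to a lower-bound-only variant that would not depend on machine load, is sketched below. The class and test names are hypothetical; only the `0.5 <= elapsed <= 3` assertion comes from the log.

```python
import time
import unittest


class SyntheticStepTimingTest(unittest.TestCase):
    """Hypothetical stand-in for a timing-sensitive pipeline test."""

    @unittest.skip('BEAM-7680: upper time bound is flaky on loaded workers')
    def test_duration_within_bounds(self):
        start = time.perf_counter()
        time.sleep(0.55)  # Stands in for running the synthetic step.
        elapsed = time.perf_counter() - start
        # The upper bound is the part that fails under CI load.
        self.assertTrue(0.5 <= elapsed <= 3, elapsed)

    def test_duration_lower_bound_only(self):
        start = time.perf_counter()
        time.sleep(0.55)
        elapsed = time.perf_counter() - start
        # A lower bound still proves the delay happened, but a slow or
        # busy machine cannot break it.
        self.assertGreaterEqual(elapsed, 0.5)
```

Skipping keeps the test visible in reports as a reminder, whereas dropping the upper bound keeps some coverage at the cost of no longer catching a step that runs far too long.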





[jira] [Work logged] (BEAM-7680) synthetic_pipeline_test.py flaky

2019-07-17 Thread ASF GitHub Bot (JIRA)


 [ 
https://issues.apache.org/jira/browse/BEAM-7680?focusedWorklogId=278478=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-278478
 ]

ASF GitHub Bot logged work on BEAM-7680:


Author: ASF GitHub Bot
Created on: 17/Jul/19 21:10
Start Date: 17/Jul/19 21:10
Worklog Time Spent: 10m 
  Work Description: udim commented on issue #8993: [BEAM-7680] Skip flaky 
tests
URL: https://github.com/apache/beam/pull/8993#issuecomment-512569941
 
 
   Sure @kkucharc, this was supposed to be a quick fix until the flakiness is 
fixed.
 



Issue Time Tracking
---

Worklog Id: (was: 278478)
Time Spent: 2.5h  (was: 2h 20m)

> synthetic_pipeline_test.py flaky
> 
>
> Key: BEAM-7680
> URL: https://issues.apache.org/jira/browse/BEAM-7680
> Project: Beam
>  Issue Type: Bug
>  Components: test-failures
>Reporter: Udi Meiri
>Assignee: Kasia Kucharczyk
>Priority: Major
>  Time Spent: 2.5h
>  Remaining Estimate: 0h
>
> {code:java}
> 11:51:43 FAIL: testSyntheticSDFStep 
> (apache_beam.testing.synthetic_pipeline_test.SyntheticPipelineTest)
> 11:51:43 
> --
> 11:51:43 Traceback (most recent call last):
> 11:51:43   File 
> "/home/jenkins/jenkins-slave/workspace/beam_PreCommit_Python_Cron/src/sdks/python/test-suites/tox/py35/build/srcs/sdks/python/apache_beam/testing/synthetic_pipeline_test.py",
>  line 82, in testSyntheticSDFStep
> 11:51:43 self.assertTrue(0.5 <= elapsed <= 3, elapsed)
> 11:51:43 AssertionError: False is not true : 3.659700632095337{code}
> [https://builds.apache.org/job/beam_PreCommit_Python_Cron/1502/consoleFull]
>  
> Two flaky TODOs: 
> [https://github.com/apache/beam/blob/b79f24ced1c8519c29443ea7109c59ad18be2ebe/sdks/python/apache_beam/testing/synthetic_pipeline_test.py#L69-L82]





[jira] [Resolved] (BEAM-7499) ReifyTest.test_window fails in DirectRunner due to 'assign_context.window should not be None.'

2019-07-17 Thread Pablo Estrada (JIRA)


 [ 
https://issues.apache.org/jira/browse/BEAM-7499?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Pablo Estrada resolved BEAM-7499.
-
   Resolution: Fixed
Fix Version/s: 2.15.0

> ReifyTest.test_window fails in DirectRunner due to 'assign_context.window 
> should not be None.'
> --
>
> Key: BEAM-7499
> URL: https://issues.apache.org/jira/browse/BEAM-7499
> Project: Beam
>  Issue Type: Improvement
>  Components: sdk-py-core, test-failures
>Reporter: Luke Cwik
>Assignee: Pablo Estrada
>Priority: Minor
> Fix For: 2.15.0
>
>  Time Spent: 3h 50m
>  Remaining Estimate: 0h
>
>  
> [PR 8717|https://github.com/apache/beam/pull/8717] added 
> ReifyTest.test_window, which fails on the DirectRunner.
> {code:java}
> ERROR:root:Exception at bundle 
> , 
> due to an exception.
>  Traceback (most recent call last):
>  File "apache_beam/runners/direct/executor.py", line 343, in call
>  finish_state)
>  File "apache_beam/runners/direct/executor.py", line 380, in attempt_call
>  evaluator.process_element(value)
>  File "apache_beam/runners/direct/transform_evaluator.py", line 636, in 
> process_element
>  self.runner.process(element)
>  File "apache_beam/runners/common.py", line 780, in 
> apache_beam.runners.common.DoFnRunner.process
>  def process(self, windowed_value):
>  File "apache_beam/runners/common.py", line 784, in 
> apache_beam.runners.common.DoFnRunner.process
>  self._reraise_augmented(exn)
>  File "apache_beam/runners/common.py", line 851, in 
> apache_beam.runners.common.DoFnRunner._reraise_augmented
>  raise_with_traceback(new_exn)
>  File "apache_beam/runners/common.py", line 782, in 
> apache_beam.runners.common.DoFnRunner.process
>  return self.do_fn_invoker.invoke_process(windowed_value)
>  File "apache_beam/runners/common.py", line 453, in 
> apache_beam.runners.common.SimpleInvoker.invoke_process
>  output_processor.process_outputs(
>  File "apache_beam/runners/common.py", line 915, in 
> apache_beam.runners.common._OutputProcessor.process_outputs
>  self.window_fn.assign(assign_context))
>  File "apache_beam/transforms/util.py", line 557, in assign
>  'assign_context.window should not be None. '
> ValueError: assign_context.window should not be None. This might be due to a 
> DoFn returning a TimestampedValue. [while running 'add_timestamps2']
> Traceback (most recent call last):
>  File "apache_beam/transforms/util_test.py", line 501, in test_window
>  assert_that(reified_pc, equal_to(expected), reify_windows=True)
>  File "apache_beam/pipeline.py", line 426, in __exit__
>  self.run().wait_until_finish()
>  File "apache_beam/testing/test_pipeline.py", line 109, in run
>  state = result.wait_until_finish()
>  File "apache_beam/runners/direct/direct_runner.py", line 430, in 
> wait_until_finish
>  self._executor.await_completion()
>  File "apache_beam/runners/direct/executor.py", line 400, in await_completion
>  self._executor.await_completion()
>  File "apache_beam/runners/direct/executor.py", line 446, in await_completion
>  raise_(t, v, tb)
>  File "apache_beam/runners/direct/executor.py", line 343, in call
>  finish_state)
>  File "apache_beam/runners/direct/executor.py", line 380, in attempt_call
>  evaluator.process_element(value)
>  File "apache_beam/runners/direct/transform_evaluator.py", line 636, in 
> process_element
>  self.runner.process(element)
>  File "apache_beam/runners/common.py", line 780, in 
> apache_beam.runners.common.DoFnRunner.process
>  def process(self, windowed_value):
>  File "apache_beam/runners/common.py", line 784, in 
> apache_beam.runners.common.DoFnRunner.process
>  self._reraise_augmented(exn)
>  File "apache_beam/runners/common.py", line 851, in 
> apache_beam.runners.common.DoFnRunner._reraise_augmented
>  raise_with_traceback(new_exn)
>  File "apache_beam/runners/common.py", line 782, in 
> apache_beam.runners.common.DoFnRunner.process
>  return self.do_fn_invoker.invoke_process(windowed_value)
>  File "apache_beam/runners/common.py", line 454, in 
> apache_beam.runners.common.SimpleInvoker.invoke_process
>  windowed_value, self.process_method(windowed_value.value))
>  File "apache_beam/transforms/core.py", line 1292, in 
>  wrapper = lambda x: [fn(x)]
>  File "apache_beam/testing/util.py", line 129, in _equal
>  'Failed assert: %r == %r' % (sorted_expected, sorted_actual))
> BeamAssertException: Failed assert: [TestWindowedValue(value=('a', 100, 
> GlobalWindow), timestamp=100, windows=[GlobalWindow]), 
> TestWindowedValue(value=('b', 200, GlobalWindow), timestamp=200, 
> windows=[GlobalWindow]), TestWindowedValue(value=('c', 300, GlobalWindow), 
> timestamp=300, windows=[GlobalWindow])] == [TestWindowedValue(value=(('a', 
> 100.0, (GlobalWindow,), PaneInfo(first: True, last: True, 

[jira] [Updated] (BEAM-7763) Python DirectRunner _PubSubReadEvaluator creates new client per bundle

2019-07-17 Thread Udi Meiri (JIRA)


 [ 
https://issues.apache.org/jira/browse/BEAM-7763?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Udi Meiri updated BEAM-7763:

Status: Open  (was: Triage Needed)

> Python DirectRunner _PubSubReadEvaluator creates new client per bundle
> --
>
> Key: BEAM-7763
> URL: https://issues.apache.org/jira/browse/BEAM-7763
> Project: Beam
>  Issue Type: Bug
>  Components: sdk-py-core
>Reporter: Udi Meiri
>Priority: Major
>  Labels: easy
>
> Lots of credential fetches.
> Similar to https://issues.apache.org/jira/browse/BEAM-2264
> but in this case the DirectRunner implementation seems to be creating a new 
> client for each bundle:
> https://github.com/apache/beam/blob/d5d7a7b7d0408d8435031e7bfce1abe2227115f5/sdks/python/apache_beam/runners/direct/transform_evaluator.py#L474
> From: 
> https://stackoverflow.com/questions/57010426/dataflow-access-to-pubsub-access-tokens





[jira] [Work logged] (BEAM-4948) Beam Dependency Update Request: com.google.guava

2019-07-17 Thread ASF GitHub Bot (JIRA)


 [ 
https://issues.apache.org/jira/browse/BEAM-4948?focusedWorklogId=278476=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-278476
 ]

ASF GitHub Bot logged work on BEAM-4948:


Author: ASF GitHub Bot
Created on: 17/Jul/19 21:07
Start Date: 17/Jul/19 21:07
Worklog Time Spent: 10m 
  Work Description: iemejia commented on issue #8899: [BEAM-4948, 
BEAM-6267, BEAM-5559, BEAM-7289] Update the version of guava to 26.0-jre for 
all our vendored artifacts containing guava
URL: https://github.com/apache/beam/pull/8899#issuecomment-512569114
 
 
   Please self merge. I have two minor comments:
   1. We used to suppress spotbugs warnings via a filters exclusion file; it is 
probably worth keeping that for consistency, but we can do that in a 
subsequent PR.
   2. I really did not understand why it now asks for a `@Nonnull` 
annotation. That's a bit of a bummer if we need to do this explicitly, and 
especially I did not get why it does not complain in other parts (luckily, maybe).
   
 

This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 278476)
Time Spent: 3h 50m  (was: 3h 40m)

> Beam Dependency Update Request: com.google.guava
> 
>
> Key: BEAM-4948
> URL: https://issues.apache.org/jira/browse/BEAM-4948
> Project: Beam
>  Issue Type: Bug
>  Components: dependencies
>Reporter: Beam JIRA Bot
>Priority: Major
>  Time Spent: 3h 50m
>  Remaining Estimate: 0h
>
> 2018-07-25 20:28:03.628639
> Please review and upgrade the com.google.guava to the latest version 
> None 
>  
> cc: 





[jira] [Work logged] (BEAM-4948) Beam Dependency Update Request: com.google.guava

2019-07-17 Thread ASF GitHub Bot (JIRA)


 [ 
https://issues.apache.org/jira/browse/BEAM-4948?focusedWorklogId=278474=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-278474
 ]

ASF GitHub Bot logged work on BEAM-4948:


Author: ASF GitHub Bot
Created on: 17/Jul/19 21:06
Start Date: 17/Jul/19 21:06
Worklog Time Spent: 10m 
  Work Description: pabloem commented on issue #8899: [BEAM-4948, 
BEAM-6267, BEAM-5559, BEAM-7289] Update the version of guava to 26.0-jre for 
all our vendored artifacts containing guava
URL: https://github.com/apache/beam/pull/8899#issuecomment-512568693
 
 
   Thanks Luke! LGTM.
 



Issue Time Tracking
---

Worklog Id: (was: 278474)
Time Spent: 3h 40m  (was: 3.5h)

> Beam Dependency Update Request: com.google.guava
> 
>
> Key: BEAM-4948
> URL: https://issues.apache.org/jira/browse/BEAM-4948
> Project: Beam
>  Issue Type: Bug
>  Components: dependencies
>Reporter: Beam JIRA Bot
>Priority: Major
>  Time Spent: 3h 40m
>  Remaining Estimate: 0h
>
> 2018-07-25 20:28:03.628639
> Please review and upgrade the com.google.guava to the latest version 
> None 
>  
> cc: 





[jira] [Resolved] (BEAM-7262) LTS backport: normalize httplib2.Http initialization and usage

2019-07-17 Thread Kenneth Knowles (JIRA)


 [ 
https://issues.apache.org/jira/browse/BEAM-7262?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Kenneth Knowles resolved BEAM-7262.
---
Resolution: Fixed

> LTS backport: normalize httplib2.Http initialization and usage
> --
>
> Key: BEAM-7262
> URL: https://issues.apache.org/jira/browse/BEAM-7262
> Project: Beam
>  Issue Type: Bug
>  Components: sdk-py-core
>Reporter: Udi Meiri
>Assignee: Kenneth Knowles
>Priority: Major
> Fix For: 2.7.1
>
>  Time Spent: 1h 20m
>  Remaining Estimate: 0h
>
> Ideally solve both issues below in one PR, but issue 1 has priority as it can 
> halt a pipeline.
> Issue 1:
> Datastore client (and other httplib2-based clients for GCS, Dataflow, 
> BigQuery, etc.) doesn't set a socket timeout.
> This can cause _flush_batch() in datastoreio.py to block forever waiting for 
> a response.
> This issue is very similar to https://issues.apache.org/jira/browse/BEAM-5915 
> and the solution should be similar.
> Issue 2:
> Standardize use of proxy environment settings, as in gcsio:
> https://github.com/apache/beam/blob/8d3389df78aa2e0a0de06b7c5743ca3530dec4ac/sdks/python/apache_beam/io/gcp/gcsio.py#L136
> Issue for proxy settings: https://issues.apache.org/jira/browse/BEAM-3184
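Issue 1 is a blocking read with no deadline. The standard-library sketch below reproduces that failure mode against a local server that accepts a connection but never replies: with a socket timeout set, the read fails quickly with `socket.timeout` instead of hanging. It only illustrates the mechanism; the helper name `read_with_timeout` is made up, and the actual fix concerns how the `httplib2.Http` objects used by the GCP clients are constructed, which is not shown here.

```python
import socket
import threading


def read_with_timeout(timeout_s: float) -> str:
    """Connect to a silent local server and try to read a response."""
    server = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    server.bind(("127.0.0.1", 0))  # any free port
    server.listen(1)
    port = server.getsockname()[1]

    conns = []

    def accept_and_stay_silent() -> None:
        conn, _ = server.accept()
        conns.append(conn)  # keep the connection open but never write to it

    t = threading.Thread(target=accept_and_stay_silent, daemon=True)
    t.start()

    # create_connection applies the timeout to connect and to later reads.
    client = socket.create_connection(("127.0.0.1", port), timeout=timeout_s)
    try:
        client.recv(1024)  # with no timeout this would block forever
        return "response"
    except socket.timeout:
        return "timed out"
    finally:
        client.close()
        t.join()
        for c in conns:
            c.close()
        server.close()


print(read_with_timeout(0.2))  # timed out
```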





[jira] [Work logged] (BEAM-7722) Simplify running of Beam Python on Flink

2019-07-17 Thread ASF GitHub Bot (JIRA)


 [ 
https://issues.apache.org/jira/browse/BEAM-7722?focusedWorklogId=278472=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-278472
 ]

ASF GitHub Bot logged work on BEAM-7722:


Author: ASF GitHub Bot
Created on: 17/Jul/19 21:03
Start Date: 17/Jul/19 21:03
Worklog Time Spent: 10m 
  Work Description: ibzib commented on issue #9043: [BEAM-7722] Add a 
Python FlinkRunner that fetches and uses released artifacts.
URL: https://github.com/apache/beam/pull/9043#issuecomment-512567781
 
 
   This looks like a great step toward making the portable Flink runner more 
usable.
   
   Is it premature to update the documentation along with this PR? 
https://beam.apache.org/documentation/runners/flink/
 



Issue Time Tracking
---

Worklog Id: (was: 278472)
Time Spent: 4h 20m  (was: 4h 10m)

> Simplify running of Beam Python on Flink
> 
>
> Key: BEAM-7722
> URL: https://issues.apache.org/jira/browse/BEAM-7722
> Project: Beam
>  Issue Type: Test
>  Components: sdk-py-core
>Reporter: Robert Bradshaw
>Assignee: Robert Bradshaw
>Priority: Major
>  Time Spent: 4h 20m
>  Remaining Estimate: 0h
>
> Currently this requires building and running several processes. We should be 
> able to automate most of this away. 





[jira] [Updated] (BEAM-7013) A new count distinct transform based on BigQuery compatible HyperLogLog++ implementation

2019-07-17 Thread Yueyang Qiu (JIRA)


 [ 
https://issues.apache.org/jira/browse/BEAM-7013?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Yueyang Qiu updated BEAM-7013:
--
Fix Version/s: 2.15.0

> A new count distinct transform based on BigQuery compatible HyperLogLog++ 
> implementation
> 
>
> Key: BEAM-7013
> URL: https://issues.apache.org/jira/browse/BEAM-7013
> Project: Beam
>  Issue Type: New Feature
>  Components: extensions-java-sketching, sdk-java-core
>Reporter: Yueyang Qiu
>Assignee: Yueyang Qiu
>Priority: Major
> Fix For: 2.15.0
>
>  Time Spent: 20m
>  Remaining Estimate: 0h
>






[jira] [Work logged] (BEAM-7641) collect statistics about python ITs

2019-07-17 Thread ASF GitHub Bot (JIRA)


 [ 
https://issues.apache.org/jira/browse/BEAM-7641?focusedWorklogId=278469=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-278469
 ]

ASF GitHub Bot logged work on BEAM-7641:


Author: ASF GitHub Bot
Created on: 17/Jul/19 20:52
Start Date: 17/Jul/19 20:52
Worklog Time Spent: 10m 
  Work Description: udim commented on pull request #8952: [BEAM-7641] 
Collect xunit statistics for Py ITs
URL: https://github.com/apache/beam/pull/8952
 
 
   
 



Issue Time Tracking
---

Worklog Id: (was: 278469)
Time Spent: 5h  (was: 4h 50m)

> collect statistics about python ITs
> ---
>
> Key: BEAM-7641
> URL: https://issues.apache.org/jira/browse/BEAM-7641
> Project: Beam
>  Issue Type: Bug
>  Components: testing
>Reporter: Udi Meiri
>Assignee: Udi Meiri
>Priority: Major
>  Time Spent: 5h
>  Remaining Estimate: 0h
>
> Currently ITs don't generate xunit (nosetests.xml) files.
> Having this data will make it easier to see which tests failed in a 
> pre/postcommit run, and to tell if a particular test is flaky.
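As an illustration of what the xunit output makes possible, here is a sketch that lists failed and errored test cases from a `nosetests.xml` file using only the standard library. It assumes the common xunit layout (`<testcase classname=... name=...>` elements with optional `<failure>` or `<error>` children); the sample XML is invented and the function name `failed_tests` is hypothetical.

```python
import xml.etree.ElementTree as ET
from typing import List

# Invented sample in the usual xunit layout.
SAMPLE = """
<testsuite name="nosetests" tests="3" errors="1" failures="1" skip="0">
  <testcase classname="pkg.a_test.ATest" name="test_ok" time="0.1"/>
  <testcase classname="pkg.a_test.ATest" name="test_bad" time="0.2">
    <failure type="AssertionError" message="boom"/>
  </testcase>
  <testcase classname="pkg.b_test.BTest" name="test_err" time="0.3">
    <error type="ValueError" message="oops"/>
  </testcase>
</testsuite>
"""


def failed_tests(xml_text: str) -> List[str]:
    """Return 'classname.name' for every test case that failed or errored."""
    root = ET.fromstring(xml_text.strip())
    failed = []
    for case in root.iter("testcase"):
        if case.find("failure") is not None or case.find("error") is not None:
            failed.append("%s.%s" % (case.get("classname"), case.get("name")))
    return failed


print(failed_tests(SAMPLE))
# ['pkg.a_test.ATest.test_bad', 'pkg.b_test.BTest.test_err']
```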





[jira] [Work logged] (BEAM-5191) Add support for writing to BigQuery clustered tables

2019-07-17 Thread ASF GitHub Bot (JIRA)


 [ 
https://issues.apache.org/jira/browse/BEAM-5191?focusedWorklogId=278468=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-278468
 ]

ASF GitHub Bot logged work on BEAM-5191:


Author: ASF GitHub Bot
Created on: 17/Jul/19 20:50
Start Date: 17/Jul/19 20:50
Worklog Time Spent: 10m 
  Work Description: jklukas commented on issue #8945: [BEAM-5191] Support 
for BigQuery clustering
URL: https://github.com/apache/beam/pull/8945#issuecomment-512563458
 
 
   Run JavaPortabilityApi PreCommit
 



Issue Time Tracking
---

Worklog Id: (was: 278468)
Time Spent: 13h 40m  (was: 13.5h)

> Add support for writing to BigQuery clustered tables
> 
>
> Key: BEAM-5191
> URL: https://issues.apache.org/jira/browse/BEAM-5191
> Project: Beam
>  Issue Type: Improvement
>  Components: io-java-gcp
>Affects Versions: 2.6.0
>Reporter: Robert Sahlin
>Assignee: Wout Scheepers
>Priority: Minor
>  Labels: features, newbie
>  Time Spent: 13h 40m
>  Remaining Estimate: 0h
>
> Google recently added support for clustered tables in BigQuery. It would be 
> useful to set clustering columns the same way as for partitioning. It should 
> support multiple clustering fields (up to 4).
> For example:
> [BigQueryIO.Write|https://beam.apache.org/documentation/sdks/javadoc/2.6.0/org/apache/beam/sdk/io/gcp/bigquery/BigQueryIO.Write.html]<[T|https://beam.apache.org/documentation/sdks/javadoc/2.6.0/org/apache/beam/sdk/io/gcp/bigquery/BigQueryIO.Write.html]>
>  .withClustering(new Clustering().setField("productId").setType("STRING"))
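In the BigQuery REST API, a table resource carries clustering as `clustering.fields`, a list of column names, next to `timePartitioning`; BigQuery allows at most four clustering columns, and their order matters. The sketch below builds that fragment as a plain Python dict and checks the limit client-side. `table_layout` and the `event_date` column are hypothetical, and this is not the `BigQueryIO.Write` option the issue asks for.

```python
from typing import Any, Dict, List, Optional

MAX_CLUSTERING_FIELDS = 4  # BigQuery's limit on clustering columns


def table_layout(clustering_fields: List[str],
                 partition_field: Optional[str] = None) -> Dict[str, Any]:
    """Build the partitioning/clustering part of a BigQuery table resource."""
    if not 1 <= len(clustering_fields) <= MAX_CLUSTERING_FIELDS:
        raise ValueError("expected 1 to %d clustering fields, got %d"
                         % (MAX_CLUSTERING_FIELDS, len(clustering_fields)))
    # Order is preserved: BigQuery sorts data by the columns in this order.
    layout: Dict[str, Any] = {"clustering": {"fields": list(clustering_fields)}}
    if partition_field is not None:
        layout["timePartitioning"] = {"type": "DAY", "field": partition_field}
    return layout


print(table_layout(["productId"], partition_field="event_date"))
# {'clustering': {'fields': ['productId']}, 'timePartitioning': {'type': 'DAY', 'field': 'event_date'}}
```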





[jira] [Work logged] (BEAM-5191) Add support for writing to BigQuery clustered tables

2019-07-17 Thread ASF GitHub Bot (JIRA)


 [ 
https://issues.apache.org/jira/browse/BEAM-5191?focusedWorklogId=278467=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-278467
 ]

ASF GitHub Bot logged work on BEAM-5191:


Author: ASF GitHub Bot
Created on: 17/Jul/19 20:50
Start Date: 17/Jul/19 20:50
Worklog Time Spent: 10m 
  Work Description: jklukas commented on issue #8945: [BEAM-5191] Support 
for BigQuery clustering
URL: https://github.com/apache/beam/pull/8945#issuecomment-512563416
 
 
   Run Java PreCommit
 



Issue Time Tracking
---

Worklog Id: (was: 278467)
Time Spent: 13.5h  (was: 13h 20m)

> Add support for writing to BigQuery clustered tables
> 
>
> Key: BEAM-5191
> URL: https://issues.apache.org/jira/browse/BEAM-5191
> Project: Beam
>  Issue Type: Improvement
>  Components: io-java-gcp
>Affects Versions: 2.6.0
>Reporter: Robert Sahlin
>Assignee: Wout Scheepers
>Priority: Minor
>  Labels: features, newbie
>  Time Spent: 13.5h
>  Remaining Estimate: 0h
>
> Google recently added support for clustered tables in BigQuery. It would be 
> useful to set clustering columns the same way as for partitioning. It should 
> support multiple clustering fields (up to 4).
> For example:
> [BigQueryIO.Write|https://beam.apache.org/documentation/sdks/javadoc/2.6.0/org/apache/beam/sdk/io/gcp/bigquery/BigQueryIO.Write.html]<[T|https://beam.apache.org/documentation/sdks/javadoc/2.6.0/org/apache/beam/sdk/io/gcp/bigquery/BigQueryIO.Write.html]>
>  .withClustering(new Clustering().setField("productId").setType("STRING"))





[jira] [Work logged] (BEAM-7641) collect statistics about python ITs

2019-07-17 Thread ASF GitHub Bot (JIRA)


 [ 
https://issues.apache.org/jira/browse/BEAM-7641?focusedWorklogId=278462=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-278462
 ]

ASF GitHub Bot logged work on BEAM-7641:


Author: ASF GitHub Bot
Created on: 17/Jul/19 20:38
Start Date: 17/Jul/19 20:38
Worklog Time Spent: 10m 
  Work Description: tvalentyn commented on issue #8952: [BEAM-7641] Collect 
xunit statistics for Py ITs
URL: https://github.com/apache/beam/pull/8952#issuecomment-512559185
 
 
   LGTM
 



Issue Time Tracking
---

Worklog Id: (was: 278462)
Time Spent: 4h 50m  (was: 4h 40m)

> collect statistics about python ITs
> ---
>
> Key: BEAM-7641
> URL: https://issues.apache.org/jira/browse/BEAM-7641
> Project: Beam
>  Issue Type: Bug
>  Components: testing
>Reporter: Udi Meiri
>Assignee: Udi Meiri
>Priority: Major
>  Time Spent: 4h 50m
>  Remaining Estimate: 0h
>
> Currently ITs don't generate xunit (nosetests.xml) files.
> Having this data will make it easier to see which tests failed in a 
> pre/postcommit run, and to tell if a particular test is flaky.
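Once each run produces an xunit file, flakiness can be estimated by comparing outcomes of the same test across runs. A sketch, assuming the per-run results have already been reduced to `{test_id: passed}` dictionaries for runs of the same code (the input below is invented and `flaky_tests` is a hypothetical name): a test seen both passing and failing is flagged, while a test that always fails is not.

```python
from collections import defaultdict
from typing import Dict, Iterable, List, Set


def flaky_tests(runs: Iterable[Dict[str, bool]]) -> List[str]:
    """Return tests that both passed and failed across the given runs."""
    outcomes: Dict[str, Set[bool]] = defaultdict(set)
    for run in runs:
        for test_id, passed in run.items():
            outcomes[test_id].add(passed)
    return sorted(t for t, seen in outcomes.items() if seen == {True, False})


runs = [
    {"test_a": True, "test_b": True, "test_c": False},
    {"test_a": True, "test_b": False, "test_c": False},
    {"test_a": True, "test_b": True},
]
print(flaky_tests(runs))  # ['test_b']
```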





[jira] [Work logged] (BEAM-7079) Run Chicago Taxi Example on Dataflow

2019-07-17 Thread ASF GitHub Bot (JIRA)


 [ 
https://issues.apache.org/jira/browse/BEAM-7079?focusedWorklogId=278460=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-278460
 ]

ASF GitHub Bot logged work on BEAM-7079:


Author: ASF GitHub Bot
Created on: 17/Jul/19 20:30
Start Date: 17/Jul/19 20:30
Worklog Time Spent: 10m 
  Work Description: pabloem commented on issue #8939: [BEAM-7079] Add 
Chicago Taxi Example running on Dataflow
URL: https://github.com/apache/beam/pull/8939#issuecomment-512556486
 
 
   Run Python PreCommit
 



Issue Time Tracking
---

Worklog Id: (was: 278460)
Time Spent: 21h 50m  (was: 21h 40m)

> Run Chicago Taxi Example on Dataflow
> 
>
> Key: BEAM-7079
> URL: https://issues.apache.org/jira/browse/BEAM-7079
> Project: Beam
>  Issue Type: Test
>  Components: testing
>Reporter: Michal Walenia
>Assignee: Michal Walenia
>Priority: Minor
>  Time Spent: 21h 50m
>  Remaining Estimate: 0h
>






[jira] [Comment Edited] (BEAM-2264) Re-use credential instead of generating a new one on each GCS call

2019-07-17 Thread Udi Meiri (JIRA)


[ 
https://issues.apache.org/jira/browse/BEAM-2264?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16887387#comment-16887387
 ] 

Udi Meiri edited comment on BEAM-2264 at 7/17/19 8:16 PM:
--

And also affects pubsub (under directrunner): 
https://stackoverflow.com/questions/57010426/dataflow-access-to-pubsub-access-tokens/57083298

Created https://issues.apache.org/jira/browse/BEAM-7763


was (Author: udim):
And also affects pubsub (under directrunner): 
https://stackoverflow.com/questions/57010426/dataflow-access-to-pubsub-access-tokens/57083298

> Re-use credential instead of generating a new one on each GCS call
> ---
>
> Key: BEAM-2264
> URL: https://issues.apache.org/jira/browse/BEAM-2264
> Project: Beam
>  Issue Type: Improvement
>  Components: sdk-py-core
>Reporter: Luke Cwik
>Assignee: Udi Meiri
>Priority: Minor
>  Time Spent: 1h 40m
>  Remaining Estimate: 0h
>
> We should cache the credential used within a Pipeline and re-use it instead 
> of generating a new one on each GCS call. When executing (against 2.0.0 RC2):
> {code}
> python -m apache_beam.examples.wordcount --input 
> "gs://dataflow-samples/shakespeare/*" --output local_counts
> {code}
> Note that we seemingly generate a new access token each time instead of when 
> a refresh is required.
> {code}
>   super(GcsIO, cls).__new__(cls, storage_client))
> INFO:root:Starting the size estimation of the input
> INFO:oauth2client.transport:Attempting refresh to obtain initial access_token
> INFO:oauth2client.client:Refreshing access_token
> INFO:root:Finished the size estimation of the input at 1 files. Estimation 
> took 0.286200046539 seconds
> INFO:root:Running pipeline with DirectRunner.
> INFO:root:Starting the size estimation of the input
> INFO:oauth2client.transport:Attempting refresh to obtain initial access_token
> INFO:oauth2client.client:Refreshing access_token
> INFO:root:Finished the size estimation of the input at 43 files. Estimation 
> took 0.205624818802 seconds
> INFO:oauth2client.transport:Attempting refresh to obtain initial access_token
> INFO:oauth2client.client:Refreshing access_token
> INFO:oauth2client.transport:Attempting refresh to obtain initial access_token
> INFO:oauth2client.client:Refreshing access_token
> INFO:oauth2client.transport:Attempting refresh to obtain initial access_token
> INFO:oauth2client.client:Refreshing access_token
> INFO:oauth2client.transport:Attempting refresh to obtain initial access_token
> INFO:oauth2client.client:Refreshing access_token
> INFO:oauth2client.transport:Attempting refresh to obtain initial access_token
> ... many more times ...
> {code}
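The log shows an `Attempting refresh to obtain initial access_token` line per call. The behaviour the issue asks for, fetching a token once and refreshing it only near expiry, can be sketched with the standard library as below. `fetch_token` is a hypothetical callable returning a token and its lifetime in seconds; this is not the oauth2client API, and the 60-second refresh margin is an arbitrary choice.

```python
import time
from typing import Callable, Optional, Tuple


class CachedToken:
    """Hands out one access token until it is close to expiry."""

    def __init__(self,
                 fetch_token: Callable[[], Tuple[str, float]],
                 clock: Callable[[], float] = time.monotonic,
                 refresh_margin_s: float = 60.0) -> None:
        self._fetch = fetch_token        # hypothetical: returns (token, lifetime_s)
        self._clock = clock
        self._margin = refresh_margin_s  # refresh slightly before real expiry
        self._token: Optional[str] = None
        self._expires_at = 0.0
        self.refreshes = 0               # how often the token endpoint was hit

    def get(self) -> str:
        now = self._clock()
        if self._token is None or now >= self._expires_at - self._margin:
            token, lifetime_s = self._fetch()
            self._token = token
            self._expires_at = now + lifetime_s
            self.refreshes += 1
        return self._token


# Simulated clock and token endpoint for the example.
fake_now = [0.0]


def fake_fetch() -> Tuple[str, float]:
    return "token-%d" % int(fake_now[0]), 3600.0


cred = CachedToken(fake_fetch, clock=lambda: fake_now[0])
for _ in range(43):        # e.g. one GCS call per input file
    cred.get()
print(cred.refreshes)      # 1
fake_now[0] = 3570.0       # inside the 60 s refresh margin
cred.get()
print(cred.refreshes)      # 2
```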





[jira] [Created] (BEAM-7763) Python DirectRunner _PubSubReadEvaluator creates new client per bundle

2019-07-17 Thread Udi Meiri (JIRA)
Udi Meiri created BEAM-7763:
---

 Summary: Python DirectRunner _PubSubReadEvaluator creates new 
client per bundle
 Key: BEAM-7763
 URL: https://issues.apache.org/jira/browse/BEAM-7763
 Project: Beam
  Issue Type: Bug
  Components: sdk-py-core
Reporter: Udi Meiri


Lots of credential fetches.
Similar to https://issues.apache.org/jira/browse/BEAM-2264
but in this case the DirectRunner implementation seems to be creating a new 
client for each bundle:
https://github.com/apache/beam/blob/d5d7a7b7d0408d8435031e7bfce1abe2227115f5/sdks/python/apache_beam/runners/direct/transform_evaluator.py#L474

From: 
https://stackoverflow.com/questions/57010426/dataflow-access-to-pubsub-access-tokens
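One way to avoid a new client (and a new credential fetch) per bundle is to build the client lazily once and reuse it. A minimal sketch, assuming a hypothetical zero-argument `factory` that constructs the Pub/Sub client; `ClientCache` is not an existing Beam class, and where the cache should live (module level, per evaluator, per runner) is left open.

```python
import threading
from typing import Callable, Generic, Optional, TypeVar

T = TypeVar("T")


class ClientCache(Generic[T]):
    """Lazily builds one client and returns the same instance to every caller."""

    def __init__(self, factory: Callable[[], T]) -> None:
        self._factory = factory  # hypothetical client constructor
        self._client: Optional[T] = None
        self._lock = threading.Lock()

    def get(self) -> T:
        # Double-checked locking: only the first caller pays for construction.
        if self._client is None:
            with self._lock:
                if self._client is None:
                    self._client = self._factory()
        return self._client


created = []


def make_client() -> object:
    # Stand-in factory that records each construction so we can count them.
    created.append(object())
    return created[-1]


cache = ClientCache(make_client)
clients = [cache.get() for _bundle in range(5)]  # simulate five bundles
print(len(created))  # 1
```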





[jira] [Work logged] (BEAM-7545) Row Count Estimation for CSV TextTable

2019-07-17 Thread ASF GitHub Bot (JIRA)


 [ 
https://issues.apache.org/jira/browse/BEAM-7545?focusedWorklogId=278451=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-278451
 ]

ASF GitHub Bot logged work on BEAM-7545:


Author: ASF GitHub Bot
Created on: 17/Jul/19 20:05
Start Date: 17/Jul/19 20:05
Worklog Time Spent: 10m 
  Work Description: akedin commented on issue #9040: [BEAM-7545] Reordering 
Beam Joins
URL: https://github.com/apache/beam/pull/9040#issuecomment-512548035
 
 
   Run JavaPortabilityApi PreCommit
 



Issue Time Tracking
---

Worklog Id: (was: 278451)
Time Spent: 9h 40m  (was: 9.5h)

> Row Count Estimation for CSV TextTable
> --
>
> Key: BEAM-7545
> URL: https://issues.apache.org/jira/browse/BEAM-7545
> Project: Beam
>  Issue Type: New Feature
>  Components: dsl-sql
>Reporter: Alireza Samadianzakaria
>Assignee: Alireza Samadianzakaria
>Priority: Major
> Fix For: Not applicable
>
>  Time Spent: 9h 40m
>  Remaining Estimate: 0h
>
> Implementing Row Count Estimation for CSV Tables by reading the first few 
> lines of the file and estimating the number of records based on the length of 
> these lines and the total length of the file.
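The estimate described above is roughly the file size divided by the mean length of the first few lines. A Python sketch of that heuristic, assuming one record per line (no newlines inside quoted CSV fields) and an arbitrary sample of 100 lines; `estimate_row_count` is a hypothetical name and this is not Beam SQL's actual implementation.

```python
import os
import tempfile


def estimate_row_count(path: str, sample_lines: int = 100) -> int:
    """Estimate rows as file size divided by the mean length of the first lines."""
    total_bytes = os.path.getsize(path)
    if total_bytes == 0:
        return 0
    sampled_bytes = 0
    sampled = 0
    with open(path, "rb") as f:
        for line in f:  # bytes lines keep their newline, so it is counted too
            sampled_bytes += len(line)
            sampled += 1
            if sampled >= sample_lines:
                break
    return round(total_bytes / (sampled_bytes / sampled))


with tempfile.NamedTemporaryFile("w", suffix=".csv", delete=False) as tmp:
    for i in range(1000):
        tmp.write("%04d,x\n" % i)  # every row has the same length
    path = tmp.name
print(estimate_row_count(path))  # 1000
os.remove(path)
```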





[jira] [Work logged] (BEAM-7545) Row Count Estimation for CSV TextTable

2019-07-17 Thread ASF GitHub Bot (JIRA)


 [ 
https://issues.apache.org/jira/browse/BEAM-7545?focusedWorklogId=278450=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-278450
 ]

ASF GitHub Bot logged work on BEAM-7545:


Author: ASF GitHub Bot
Created on: 17/Jul/19 20:03
Start Date: 17/Jul/19 20:03
Worklog Time Spent: 10m 
  Work Description: riazela commented on pull request #9040: [BEAM-7545] 
Reordering Beam Joins
URL: https://github.com/apache/beam/pull/9040#discussion_r304617399
 
 

 ##
 File path: 
sdks/java/extensions/sql/src/test/java/org/apache/beam/sdk/extensions/sql/impl/rule/JoinReorderingTest.java
 ##
 @@ -0,0 +1,462 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ * http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+package org.apache.beam.sdk.extensions.sql.impl.rule;
+
+import java.math.BigInteger;
+import java.util.ArrayList;
+import java.util.Arrays;
+import java.util.List;
+import java.util.Map;
+import java.util.function.Function;
+import org.apache.beam.sdk.extensions.sql.impl.BeamSqlEnv;
+import org.apache.beam.sdk.extensions.sql.impl.planner.BeamRuleSets;
+import org.apache.beam.sdk.extensions.sql.impl.rel.BeamRelNode;
+import org.apache.beam.sdk.extensions.sql.meta.provider.test.TestTableProvider;
+import org.apache.beam.sdk.options.PipelineOptions;
+import org.apache.beam.sdk.options.PipelineOptionsFactory;
+import org.apache.beam.sdk.values.Row;
+import org.apache.beam.vendor.guava.v20_0.com.google.common.collect.ImmutableList;
+import org.apache.beam.vendor.guava.v20_0.com.google.common.collect.ImmutableMap;
+import org.apache.calcite.DataContext;
+import org.apache.calcite.adapter.enumerable.EnumerableConvention;
+import org.apache.calcite.adapter.enumerable.EnumerableRules;
+import org.apache.calcite.linq4j.Enumerable;
+import org.apache.calcite.linq4j.Linq4j;
+import org.apache.calcite.plan.ConventionTraitDef;
+import org.apache.calcite.plan.RelOptRule;
+import org.apache.calcite.plan.RelTraitSet;
+import org.apache.calcite.rel.RelCollationTraitDef;
+import org.apache.calcite.rel.RelCollations;
+import org.apache.calcite.rel.RelFieldCollation;
+import org.apache.calcite.rel.RelNode;
+import org.apache.calcite.rel.RelRoot;
+import org.apache.calcite.rel.core.Join;
+import org.apache.calcite.rel.core.TableScan;
+import org.apache.calcite.rel.rules.JoinCommuteRule;
+import org.apache.calcite.rel.rules.SortProjectTransposeRule;
+import org.apache.calcite.rel.type.RelDataType;
+import org.apache.calcite.rel.type.RelDataTypeFactory;
+import org.apache.calcite.schema.ScannableTable;
+import org.apache.calcite.schema.SchemaPlus;
+import org.apache.calcite.schema.Statistic;
+import org.apache.calcite.schema.Statistics;
+import org.apache.calcite.schema.Table;
+import org.apache.calcite.schema.impl.AbstractSchema;
+import org.apache.calcite.schema.impl.AbstractTable;
+import org.apache.calcite.sql.SqlNode;
+import org.apache.calcite.sql.parser.SqlParser;
+import org.apache.calcite.tools.FrameworkConfig;
+import org.apache.calcite.tools.Frameworks;
+import org.apache.calcite.tools.Planner;
+import org.apache.calcite.tools.Programs;
+import org.apache.calcite.tools.RuleSet;
+import org.apache.calcite.tools.RuleSets;
+import org.apache.calcite.util.ImmutableBitSet;
+import org.junit.Assert;
+import org.junit.Test;
+
+/**
+ * This test ensures that we are reordering joins and get a plan similar to Join(large, Join(small,
+ * medium)) instead of Join(small, Join(medium, large)).
+ */
+public class JoinReorderingTest {
+  private final PipelineOptions defaultPipelineOptions = PipelineOptionsFactory.create();
+
+  @Test
+  public void testTableSizes() {
+TestTableProvider tableProvider = new TestTableProvider();
+createThreeTables(tableProvider);
+
+Assert.assertEquals(
+BigInteger.ONE,
+tableProvider
+.buildBeamSqlTable(tableProvider.getTable("small_table"))
+.getRowCount(null)
+.getRowCount());
+
+Assert.assertEquals(
+BigInteger.valueOf(3),
+tableProvider
+.buildBeamSqlTable(tableProvider.getTable("medium_table"))
+.getRowCount(null)
+.getRowCount());
+
+

[jira] [Work logged] (BEAM-7726) [Go SDK] State Backed Iterables

2019-07-17 Thread ASF GitHub Bot (JIRA)


 [ 
https://issues.apache.org/jira/browse/BEAM-7726?focusedWorklogId=278448=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-278448
 ]

ASF GitHub Bot logged work on BEAM-7726:


Author: ASF GitHub Bot
Created on: 17/Jul/19 20:01
Start Date: 17/Jul/19 20:01
Worklog Time Spent: 10m 
  Work Description: lostluck commented on issue #9080:  [BEAM-7726] 
Implement State Backed Iterables in Go SDK
URL: https://github.com/apache/beam/pull/9080#issuecomment-512546674
 
 
   R: @youngoli 
 



Issue Time Tracking
---

Worklog Id: (was: 278448)
Time Spent: 1h  (was: 50m)

> [Go SDK] State Backed Iterables
> ---
>
> Key: BEAM-7726
> URL: https://issues.apache.org/jira/browse/BEAM-7726
> Project: Beam
>  Issue Type: Improvement
>  Components: sdk-go
>Affects Versions: Not applicable
>Reporter: Robert Burke
>Assignee: Robert Burke
>Priority: Major
> Fix For: Not applicable
>
>  Time Spent: 1h
>  Remaining Estimate: 0h
>
> The Go SDK should support the state-backed iterables protocol defined in the proto:
> [https://github.com/apache/beam/blob/master/model/pipeline/src/main/proto/beam_runner_api.proto#L644]
>  
> The primary use case is the iterables produced by CoGBKs.
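Roughly, a state-backed iterable lets the runner send only a prefix of a large grouped value inline and serve the rest on demand. The SDK reads further pages through the State API, passing along the continuation token from each response until none is returned. The sketch below illustrates that lazy paging in Java, for consistency with the rest of this digest, although the issue itself targets the Go SDK. `Page`, `PageFetcher` and `StateBackedIterableSketch` are hypothetical names invented for illustration. They are not Beam's API, and the real wire format is whatever the linked proto defines.

```java
import java.util.ArrayList;
import java.util.Iterator;
import java.util.List;
import java.util.NoSuchElementException;

/** Lazily paged iterable; Page and PageFetcher are hypothetical, not Beam's API. */
public class StateBackedIterableSketch {

  /** One page of values plus the token for the next page (null when there are no more). */
  public static final class Page<T> {
    final List<T> values;
    final String continuationToken;

    public Page(List<T> values, String continuationToken) {
      this.values = values;
      this.continuationToken = continuationToken;
    }
  }

  /** Hypothetical stand-in for a state request sent to the runner. */
  public interface PageFetcher<T> {
    Page<T> fetch(String continuationToken);
  }

  /** Yields the inline prefix first, then fetches further pages only when iteration reaches them. */
  public static <T> Iterable<T> of(List<T> inlinePrefix, String firstToken, PageFetcher<T> fetcher) {
    // Each call to iterator() starts over from the inline prefix.
    return () ->
        new Iterator<T>() {
          private Iterator<T> current = inlinePrefix.iterator();
          private String nextToken = firstToken;

          @Override
          public boolean hasNext() {
            // Keep fetching while the current page is exhausted and a token remains;
            // this also skips over pages that happen to be empty.
            while (!current.hasNext() && nextToken != null) {
              Page<T> page = fetcher.fetch(nextToken);
              current = page.values.iterator();
              nextToken = page.continuationToken;
            }
            return current.hasNext();
          }

          @Override
          public T next() {
            if (!hasNext()) {
              throw new NoSuchElementException();
            }
            return current.next();
          }
        };
  }

  public static void main(String[] args) {
    // Simulated runner state: "p1" -> [3, 4] then "p2"; "p2" -> [5] and no further token.
    PageFetcher<Integer> fetcher =
        token ->
            token.equals("p1")
                ? new Page<Integer>(List.of(3, 4), "p2")
                : new Page<Integer>(List.of(5), null);
    List<Integer> seen = new ArrayList<>();
    for (int v : of(List.of(1, 2), "p1", fetcher)) {
      seen.add(v);
    }
    System.out.println(seen); // prints [1, 2, 3, 4, 5]
  }
}
```

Because every call to `iterator()` restarts from the inline prefix, the iterable can be traversed more than once. That matters for user code that walks a CoGBK result twice. The trade-off is that each traversal re-fetches the remote pages.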



--
This message was sent by Atlassian JIRA
(v7.6.14#76016)


[jira] [Comment Edited] (BEAM-2264) Re-use credential instead of generating a new one one each GCS call

2019-07-17 Thread Udi Meiri (JIRA)


[ 
https://issues.apache.org/jira/browse/BEAM-2264?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16887387#comment-16887387
 ] 

Udi Meiri edited comment on BEAM-2264 at 7/17/19 8:00 PM:
--

This also affects Pub/Sub (under the DirectRunner):
https://stackoverflow.com/questions/57010426/dataflow-access-to-pubsub-access-tokens/57083298


was (Author: udim):
This also affects Pub/Sub:
https://stackoverflow.com/questions/57010426/dataflow-access-to-pubsub-access-tokens/57083298

> Re-use credential instead of generating a new one one each GCS call
> ---
>
> Key: BEAM-2264
> URL: https://issues.apache.org/jira/browse/BEAM-2264
> Project: Beam
>  Issue Type: Improvement
>  Components: sdk-py-core
>Reporter: Luke Cwik
>Assignee: Udi Meiri
>Priority: Minor
>  Time Spent: 1h 40m
>  Remaining Estimate: 0h
>
> We should cache the credential used within a Pipeline and re-use it instead 
> of generating a new one on each GCS call. When executing (against 2.0.0 RC2):
> {code}
> python -m apache_beam.examples.wordcount --input "gs://dataflow-samples/shakespeare/*" --output local_counts
> {code}
> Note that we seemingly generate a new access token each time instead of when 
> a refresh is required.
> {code}
>   super(GcsIO, cls).__new__(cls, storage_client))
> INFO:root:Starting the size estimation of the input
> INFO:oauth2client.transport:Attempting refresh to obtain initial access_token
> INFO:oauth2client.client:Refreshing access_token
> INFO:root:Finished the size estimation of the input at 1 files. Estimation took 0.286200046539 seconds
> INFO:root:Running pipeline with DirectRunner.
> INFO:root:Starting the size estimation of the input
> INFO:oauth2client.transport:Attempting refresh to obtain initial access_token
> INFO:oauth2client.client:Refreshing access_token
> INFO:root:Finished the size estimation of the input at 43 files. Estimation took 0.205624818802 seconds
> INFO:oauth2client.transport:Attempting refresh to obtain initial access_token
> INFO:oauth2client.client:Refreshing access_token
> INFO:oauth2client.transport:Attempting refresh to obtain initial access_token
> INFO:oauth2client.client:Refreshing access_token
> INFO:oauth2client.transport:Attempting refresh to obtain initial access_token
> INFO:oauth2client.client:Refreshing access_token
> INFO:oauth2client.transport:Attempting refresh to obtain initial access_token
> INFO:oauth2client.client:Refreshing access_token
> INFO:oauth2client.transport:Attempting refresh to obtain initial access_token
> ... many more times ...
> {code}
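What the issue asks for is essentially memoization of the credential. Mint an access token once, hand the same token to every GCS (or Pub/Sub) call, and refresh only when the cached token is close to expiry, rather than on every request as the log above suggests. Below is a minimal sketch in Java. It assumes a hypothetical token source (`minter`) that returns a token together with its expiry time. `CachedCredentialSketch` and `AccessToken` are illustrative names only, not the Python SDK's eventual fix or the oauth2client API.

```java
import java.time.Duration;
import java.time.Instant;
import java.util.function.Supplier;

/** Reuses one access token until it nears expiry; the token source is hypothetical. */
public class CachedCredentialSketch {

  /** A token value and the instant at which it stops being valid. */
  public static final class AccessToken {
    final String value;
    final Instant expiresAt;

    public AccessToken(String value, Instant expiresAt) {
      this.value = value;
      this.expiresAt = expiresAt;
    }
  }

  private final Supplier<AccessToken> minter; // hypothetical: performs the real OAuth refresh
  private final Supplier<Instant> clock;
  private final Duration refreshMargin;
  private AccessToken cached; // guarded by "this"

  public CachedCredentialSketch(
      Supplier<AccessToken> minter, Supplier<Instant> clock, Duration refreshMargin) {
    this.minter = minter;
    this.clock = clock;
    this.refreshMargin = refreshMargin;
  }

  /** Returns the cached token, minting a new one only if none exists or it is close to expiry. */
  public synchronized String token() {
    if (cached == null || !clock.get().isBefore(cached.expiresAt.minus(refreshMargin))) {
      cached = minter.get();
    }
    return cached.value;
  }

  public static void main(String[] args) {
    int[] mints = {0};
    Instant[] now = {Instant.parse("2019-07-17T20:00:00Z")};
    CachedCredentialSketch credential =
        new CachedCredentialSketch(
            () -> {
              mints[0]++;
              // Each minted token is valid for one hour from the simulated "now".
              return new AccessToken("token-" + mints[0], now[0].plus(Duration.ofHours(1)));
            },
            () -> now[0],
            Duration.ofMinutes(5));

    for (int i = 0; i < 100; i++) {
      credential.token(); // 100 simulated GCS calls share a single token
    }
    System.out.println(mints[0]); // prints 1

    now[0] = now[0].plus(Duration.ofMinutes(56)); // inside the 5-minute refresh margin
    System.out.println(credential.token()); // prints token-2
  }
}
```

Refreshing a few minutes before the real expiry (the `refreshMargin`) avoids handing out a token that lapses while a request is in flight. The `synchronized` accessor keeps concurrent callers from minting duplicate tokens.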





[jira] [Commented] (BEAM-2264) Re-use credential instead of generating a new one one each GCS call

2019-07-17 Thread Udi Meiri (JIRA)


[ 
https://issues.apache.org/jira/browse/BEAM-2264?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16887387#comment-16887387
 ] 

Udi Meiri commented on BEAM-2264:
-

This also affects Pub/Sub:
https://stackoverflow.com/questions/57010426/dataflow-access-to-pubsub-access-tokens/57083298






[jira] [Work logged] (BEAM-6972) LTS backport: CassandraIO is broken because of use of bad relocation of guava

2019-07-17 Thread ASF GitHub Bot (JIRA)


 [ 
https://issues.apache.org/jira/browse/BEAM-6972?focusedWorklogId=278442=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-278442
 ]

ASF GitHub Bot logged work on BEAM-6972:


Author: ASF GitHub Bot
Created on: 17/Jul/19 19:54
Start Date: 17/Jul/19 19:54
Worklog Time Spent: 10m 
  Work Description: kennknowles commented on issue #9064: [BEAM-6972] 2.7.1 LTS cherrypick: fix guava shading for Guava in CassandraIO
URL: https://github.com/apache/beam/pull/9064#issuecomment-512544455
 
 
   Run Java PreCommit
 



Issue Time Tracking
---

Worklog Id: (was: 278442)
Time Spent: 50m  (was: 40m)

> LTS backport: CassandraIO is broken because of use of bad relocation of guava
> -
>
> Key: BEAM-6972
> URL: https://issues.apache.org/jira/browse/BEAM-6972
> Project: Beam
>  Issue Type: Bug
>  Components: io-java-cassandra
>Affects Versions: 2.5.0, 2.6.0, 2.7.0, 2.8.0, 2.9.0, 2.10.0, 2.11.0
>Reporter: Arun sethia
>Assignee: Kenneth Knowles
>Priority: Major
> Fix For: 2.7.1
>
>  Time Spent: 50m
>  Remaining Estimate: 0h
>
> While using Apache Beam to run a Dataflow job that reads data from BigQuery and
> writes it to Cassandra, with the following libraries:
>  # beam-sdks-java-io-cassandra - 2.6.0
>  # beam-sdks-java-io-jdbc - 2.6.0
>  # beam-sdks-java-io-google-cloud-platform - 2.6.0
>  # beam-sdks-java-core - 2.6.0
>  # google-cloud-dataflow-java-sdk-all - 2.5.0
>  # google-api-client - 1.25.0
>  
> I get the following error when inserting/saving data to Cassandra.
> {code:java}
> [error] (run-main-0) org.apache.beam.sdk.Pipeline$PipelineExecutionException: java.lang.NoSuchMethodError: com.datastax.driver.mapping.Mapper.saveAsync(Ljava/lang/Object;)Lorg/apache/beam/repackaged/beam_sdks_java_io_cassandra/com/google/common/util/concurrent/ListenableFuture;
> org.apache.beam.sdk.Pipeline$PipelineExecutionException: java.lang.NoSuchMethodError: com.datastax.driver.mapping.Mapper.saveAsync(Ljava/lang/Object;)Lorg/apache/beam/repackaged/beam_sdks_java_io_cassandra/com/google/common/util/concurrent/ListenableFuture;
>  at org.apache.beam.runners.direct.DirectRunner$DirectPipelineResult.waitUntilFinish(DirectRunner.java:332)
>  at org.apache.beam.runners.direct.DirectRunner$DirectPipelineResult.waitUntilFinish(DirectRunner.java:302)
>  at org.apache.beam.runners.direct.DirectRunner.run(DirectRunner.java:197)
>  at org.apache.beam.runners.direct.DirectRunner.run(DirectRunner.java:64)
>  at org.apache.beam.sdk.Pipeline.run(Pipeline.java:313)
>  at org.apache.beam.sdk.Pipeline.run(Pipeline.java:299){code}
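The `NoSuchMethodError` above follows from how the JVM links calls: a method is resolved by its name and its full descriptor, and the descriptor includes the return type. The trace shows the caller looking for a `saveAsync` that returns the relocated `org.apache.beam.repackaged.beam_sdks_java_io_cassandra.com.google.common.util.concurrent.ListenableFuture`. The DataStax driver, which is presumably not relocated, declares that method with Guava's original `com.google.common.util.concurrent.ListenableFuture`, so no matching method exists at runtime. The JDK-only sketch below mimics that mismatch. `FakeMapper` is a hypothetical stand-in for the driver's `Mapper`, and `Future` stands in for `ListenableFuture`. `MethodHandles.Lookup.findVirtual` applies the same exact-type matching reflectively, reporting `NoSuchMethodException` rather than the link-time `NoSuchMethodError`.

```java
import java.lang.invoke.MethodHandles;
import java.lang.invoke.MethodType;
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.Future;

/** Shows that a method's return type is part of what the JVM matches when resolving it. */
public class DescriptorSketch {

  /** Hypothetical stand-in for the driver's Mapper; Future plays the role of ListenableFuture. */
  static class FakeMapper {
    public Future<Object> saveAsync(Object entity) {
      return CompletableFuture.completedFuture(entity);
    }
  }

  /** True if FakeMapper declares a saveAsync method with exactly this type. */
  public static boolean resolves(MethodType type) {
    try {
      MethodHandles.lookup().findVirtual(FakeMapper.class, "saveAsync", type);
      return true;
    } catch (NoSuchMethodException e) {
      return false;
    } catch (IllegalAccessException e) {
      throw new IllegalStateException(e);
    }
  }

  public static void main(String[] args) {
    // The signature the class really declares (erased return type Future).
    MethodType declared = MethodType.methodType(Future.class, Object.class);
    // Same name and parameters but a different return type, as a rewritten caller would expect.
    MethodType relocated = MethodType.methodType(CompletableFuture.class, Object.class);

    System.out.println(declared.toMethodDescriptorString()); // prints (Ljava/lang/Object;)Ljava/util/concurrent/Future;
    System.out.println(relocated.toMethodDescriptorString()); // prints (Ljava/lang/Object;)Ljava/util/concurrent/CompletableFuture;
    System.out.println(resolves(declared)); // prints true
    System.out.println(resolves(relocated)); // prints false
  }
}
```

In build terms, a relocation rule must not rewrite types that appear in the method signatures of a dependency that is itself left unrelocated.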





[jira] [Work logged] (BEAM-6972) LTS backport: CassandraIO is broken because of use of bad relocation of guava

2019-07-17 Thread ASF GitHub Bot (JIRA)


 [ 
https://issues.apache.org/jira/browse/BEAM-6972?focusedWorklogId=278441=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-278441
 ]

ASF GitHub Bot logged work on BEAM-6972:


Author: ASF GitHub Bot
Created on: 17/Jul/19 19:54
Start Date: 17/Jul/19 19:54
Worklog Time Spent: 10m 
  Work Description: kennknowles commented on issue #9064: [BEAM-6972] 2.7.1 LTS cherrypick: fix guava shading for Guava in CassandraIO
URL: https://github.com/apache/beam/pull/9064#issuecomment-512544378
 
 
   Failures in the Gradle console log appear to be infrastructural. I have manually confirmed the targets.
 



Issue Time Tracking
---

Worklog Id: (was: 278441)
Time Spent: 40m  (was: 0.5h)





