[jira] [Commented] (BEAM-8207) KafkaIOITs generate different hashes each run, sometimes dropping records
[ https://issues.apache.org/jira/browse/BEAM-8207?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16969909#comment-16969909 ]

Michal Walenia commented on BEAM-8207:
--------------------------------------

[~aromanenko] Yes, the issue can be closed now. Thanks!

> KafkaIOITs generate different hashes each run, sometimes dropping records
> -------------------------------------------------------------------------
>
>                 Key: BEAM-8207
>                 URL: https://issues.apache.org/jira/browse/BEAM-8207
>             Project: Beam
>          Issue Type: Bug
>          Components: io-java-kafka, testing
>            Reporter: Michal Walenia
>            Priority: Major
>
> While working to adapt Java's KafkaIOIT to work with a large dataset generated by a SyntheticSource I encountered a problem. I want to push 100M records through a Kafka topic, verify data correctness and at the same time check the performance of KafkaIO.Write and KafkaIO.Read.
>
> To perform the tests I'm using a Kafka cluster on Kubernetes from the Beam repo ([here|https://github.com/apache/beam/tree/master/.test-infra/kubernetes/kafka-cluster]).
>
> The expected result would be that first the records are generated in a deterministic way (using hashes of list positions as Random seeds), next they are written to Kafka - this concludes the write pipeline.
> As for reading and correctness checking - first, the data is read from the topic and after being decoded into String representations, a hashcode of the whole PCollection is calculated (for details, check KafkaIOIT.java).
>
> During the testing I ran into several problems:
> 1. When all the records are read from the Kafka topic, the hash is different each time.
> 2. Sometimes not all the records are read and the Dataflow task waits for the input indefinitely, occasionally throwing exceptions.
>
> I believe there are two possible causes of this behavior: either there is something wrong with the Kafka cluster configuration, or KafkaIO behaves erratically on high data volumes, duplicating and/or dropping records.
> The second option seems troubling and I would be grateful for help with the first.

--
This message was sent by Atlassian Jira
(v8.3.4#803005)
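A hash that differs on every run is exactly what one would expect if the verification hash depends on element order: Kafka only guarantees ordering within a partition, so a 100M-record read arrives in a different interleaving each time. The sketch below (hypothetical Python, not KafkaIOIT's actual code) contrasts an order-sensitive digest with an order-independent XOR combine of per-record digests, which stays stable under any reordering of the same record set:

```python
import hashlib


def record_for_position(i: int) -> bytes:
    # Deterministic record derived from its list position, mirroring the
    # "hashes of list positions as Random seeds" idea from the report.
    return hashlib.sha256(str(i).encode()).digest()


def order_sensitive_hash(records) -> str:
    # Hashing the concatenation depends on arrival order, so an
    # unordered cross-partition read yields a different value each run.
    h = hashlib.sha256()
    for r in records:
        h.update(r)
    return h.hexdigest()


def order_independent_hash(records) -> int:
    # XOR-combining per-record digests is commutative: any permutation
    # of the same records produces the same value.
    acc = 0
    for r in records:
        acc ^= int.from_bytes(hashlib.sha256(r).digest()[:8], "big")
    return acc


records = [record_for_position(i) for i in range(100)]
shuffled = list(reversed(records))
assert order_sensitive_hash(records) != order_sensitive_hash(shuffled)
assert order_independent_hash(records) == order_independent_hash(shuffled)
```

An order-independent combine also distinguishes the two reported symptoms: a stable combined hash that still mismatches the expected value points at dropped or duplicated records rather than mere reordering.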
[jira] [Work logged] (BEAM-8157) Key encoding for state requests is not consistent across SDKs
[ https://issues.apache.org/jira/browse/BEAM-8157?focusedWorklogId=340376&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-340376 ]

ASF GitHub Bot logged work on BEAM-8157:
----------------------------------------
                Author: ASF GitHub Bot
            Created on: 08/Nov/19 07:09
            Start Date: 08/Nov/19 07:09
    Worklog Time Spent: 10m
      Work Description: mxm commented on issue #9997: [BEAM-8157] Revert key encoding changes for state requests / improve debugging and testing
URL: https://github.com/apache/beam/pull/9997#issuecomment-551412788

Please note the mailing list discussion. Will update the PR today with the desired solution.

This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the URL above to go to the specific comment.
For queries about this service, please contact Infrastructure at: us...@infra.apache.org

Issue Time Tracking
-------------------
    Worklog Id: (was: 340376)
    Time Spent: 7.5h (was: 7h 20m)

> Key encoding for state requests is not consistent across SDKs
> -------------------------------------------------------------
>
>                 Key: BEAM-8157
>                 URL: https://issues.apache.org/jira/browse/BEAM-8157
>             Project: Beam
>          Issue Type: Bug
>          Components: runner-flink
>    Affects Versions: 2.13.0
>            Reporter: Maximilian Michels
>            Assignee: Maximilian Michels
>            Priority: Critical
>             Fix For: 2.17.0
>
>          Time Spent: 7.5h
>  Remaining Estimate: 0h
>
> The Flink runner requires the internal key to be encoded without a length prefix (OUTER context). The user state request handler exposes a serialized version of the key to the Runner. This key is encoded with the NESTED context which may add a length prefix. We need to convert it to OUTER context to match the Flink runner's key encoding.
> So far this has not caused the Flink Runner to behave incorrectly. However, with the upcoming support for Flink 1.9, the state backend will not accept requests for keys not part of any key group/partition of the operator. This is very likely to happen with the encoding not being consistent.
> **NOTE** This is only applicable to the Java SDK, as the Python SDK uses OUTER encoding for the key in state requests.
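The NESTED/OUTER mismatch can be illustrated with a toy coder. OUTER context encodes a value as if it were the whole stream, while NESTED context must delimit the value inside a larger stream, typically with a length prefix. The sketch below is illustrative Python with a simplified one-byte prefix (Beam's actual coders use a varint); it shows why a NESTED-encoded key can never match a key the runner stored in OUTER form until the prefix is stripped:

```python
def encode_outer(key: bytes) -> bytes:
    # OUTER context: the value is the whole stream, so no delimiter.
    return key


def encode_nested(key: bytes) -> bytes:
    # NESTED context: a length prefix delimits the value inside a larger
    # stream (one byte here for simplicity; Beam uses a varint).
    return bytes([len(key)]) + key


def nested_to_outer(nested: bytes) -> bytes:
    # The conversion the Flink runner needs: strip the prefix to recover
    # the OUTER form that its key groups/partitions are computed from.
    n = nested[0]
    return nested[1 : 1 + n]


key = b"user-42"
assert encode_nested(key) != encode_outer(key)   # state lookups would miss
assert nested_to_outer(encode_nested(key)) == encode_outer(key)
```

Under Flink 1.9 the mismatch is no longer silent: a prefixed key hashes into a different key group, so the state backend rejects the request outright.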
[jira] [Updated] (BEAM-8591) Exception is thrown while running Beam Pipeline on Kubernetes Flink Cluster.
[ https://issues.apache.org/jira/browse/BEAM-8591?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Mingliang Gong updated BEAM-8591:
---------------------------------
    Summary: Exception is thrown while running Beam Pipeline on Kubernetes Flink Cluster.  (was: Exception is thrown when running Beam Pipeline on Kubernetes Flink Cluster.)

> Exception is thrown while running Beam Pipeline on Kubernetes Flink Cluster.
> ----------------------------------------------------------------------------
>
>                 Key: BEAM-8591
>                 URL: https://issues.apache.org/jira/browse/BEAM-8591
>             Project: Beam
>          Issue Type: Bug
>          Components: runner-flink
>            Reporter: Mingliang Gong
>            Priority: Major
>
> h2. Setup Clusters
> * Setup Local Flink Cluster: [https://ci.apache.org/projects/flink/flink-docs-release-1.8/tutorials/local_setup.html]
> * Setup Kubernetes Flink Cluster using Minikube: [https://ci.apache.org/projects/flink/flink-docs-release-1.8/ops/deployment/kubernetes.html]
> h2. Verify Clusters
> Execute the command "./bin/flink run examples/streaming/WordCount.jar". Both the local and the K8S Flink cluster work fine.
> h2. Using Apache Beam Flink Runner
> Instructions: [https://beam.apache.org/documentation/runners/flink/]
> Sample Pipeline Code:
> {code:python}
> import apache_beam as beam
> from apache_beam.options.pipeline_options import PipelineOptions
>
> options = PipelineOptions([
>     "--runner=PortableRunner",
>     "--job_endpoint=localhost:8099",
>     "--environment_type=LOOPBACK"
> ])
>
> with beam.Pipeline(options=options) as pipeline:
>     data = ["Sample data",
>             "Sample data - 0",
>             "Sample data - 1"]
>     raw_data = (pipeline
>                 | 'CreateHardCodeData' >> beam.Create(data)
>                 | 'Map' >> beam.Map(lambda line: line + '.')
>                 | 'Print' >> beam.Map(print))
> {code}
> Verify different environment_type values in the Python SDK harness configuration:
> *environment_type=LOOPBACK*
> # Run pipeline on local cluster: works fine
> # Run pipeline on K8S cluster: exceptions are thrown:
> java.lang.Exception: The user defined 'open()' method caused an exception: org.apache.beam.vendor.grpc.v1p21p0.io.grpc.StatusRuntimeException: UNAVAILABLE: io exception
> Caused by: org.apache.beam.vendor.grpc.v1p21p0.io.netty.channel.AbstractChannel$AnnotatedConnectException: Connection refused: localhost/127.0.0.1:51017
> *environment_type=DOCKER*
> # Run pipeline on local cluster: works fine
> # Run pipeline on K8S cluster: an exception is thrown:
> Caused by: java.io.IOException: Cannot run program "docker": error=2, No such file or directory.
[jira] [Updated] (BEAM-8591) Exception is thrown when running Beam Pipeline on Kubernetes Flink Cluster.
[ https://issues.apache.org/jira/browse/BEAM-8591?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Mingliang Gong updated BEAM-8591:
---------------------------------
    Description: 
h2. Setup Clusters
* Setup Local Flink Cluster: [https://ci.apache.org/projects/flink/flink-docs-release-1.8/tutorials/local_setup.html]
* Setup Kubernetes Flink Cluster using Minikube: [https://ci.apache.org/projects/flink/flink-docs-release-1.8/ops/deployment/kubernetes.html]
h2. Verify Clusters
Execute the command "./bin/flink run examples/streaming/WordCount.jar". Both the local and the K8S Flink cluster work fine.
h2. Using Apache Beam Flink Runner
Instructions: [https://beam.apache.org/documentation/runners/flink/]
Sample Pipeline Code:
{code:python}
import apache_beam as beam
from apache_beam.options.pipeline_options import PipelineOptions

options = PipelineOptions([
    "--runner=PortableRunner",
    "--job_endpoint=localhost:8099",
    "--environment_type=LOOPBACK"
])

with beam.Pipeline(options=options) as pipeline:
    data = ["Sample data",
            "Sample data - 0",
            "Sample data - 1"]
    raw_data = (pipeline
                | 'CreateHardCodeData' >> beam.Create(data)
                | 'Map' >> beam.Map(lambda line: line + '.')
                | 'Print' >> beam.Map(print))
{code}
Verify different environment_type values in the Python SDK harness configuration:
*environment_type=LOOPBACK*
# Run pipeline on local cluster: works fine
# Run pipeline on K8S cluster: exceptions are thrown:
java.lang.Exception: The user defined 'open()' method caused an exception: org.apache.beam.vendor.grpc.v1p21p0.io.grpc.StatusRuntimeException: UNAVAILABLE: io exception
Caused by: org.apache.beam.vendor.grpc.v1p21p0.io.netty.channel.AbstractChannel$AnnotatedConnectException: Connection refused: localhost/127.0.0.1:51017
*environment_type=DOCKER*
# Run pipeline on local cluster: works fine
# Run pipeline on K8S cluster: an exception is thrown:
Caused by: java.io.IOException: Cannot run program "docker": error=2, No such file or directory.

  was:
h2. Setup Clusters
* Setup Local Flink Cluster: [https://ci.apache.org/projects/flink/flink-docs-release-1.8/tutorials/local_setup.html]
* Setup Kubernetes Flink Cluster using Minikube: [https://ci.apache.org/projects/flink/flink-docs-release-1.8/ops/deployment/kubernetes.html]
h2. Verify Clusters
Execute command "./bin/flink run examples/streaming/WordCount.jar". Both Local and K8S Flink Cluster work fine.
h2. Using Apache Beam Flink Runner
Instruction: [https://beam.apache.org/documentation/runners/flink/]
Sample Pipeline Code:
{code:java}
import apache_beam as beam
from apache_beam.options.pipeline_options import PipelineOptions

options = PipelineOptions([
    "--runner=PortableRunner",
    "--job_endpoint=localhost:8099",
    "--environment_type=LOOPBACK"
])

with beam.Pipeline(options=options) as pipeline:
    data = ["Sample data",
            "Sample data - 0",
            "Sample data - 1"]
    raw_data = (pipeline
                | 'CreateHardCodeData' >> beam.Create(data)
                | 'Map' >> beam.Map(lambda line : line + '.')
                | 'Print' >> beam.Map(print))
{code}
Verify different environment_type in Python SDK Harness Configuration
*environment_type=LOOKBACK*
# Run pipeline on local cluster: Works Fine
# Run pipeline on K8S cluster, Exceptions are thrown:
java.lang.Exception: The user defined 'open()' method caused an exception: org.apache.beam.vendor.grpc.v1p21p0.io.grpc.StatusRuntimeException: UNAVAILABLE: io exception
Caused by: org.apache.beam.vendor.grpc.v1p21p0.io.netty.channel.AbstractChannel$AnnotatedConnectException: Connection refused: localhost/127.0.0.1:51017
*environment_type=DOCKER*
# Run pipeline on local cluster: Work fine
# Run pipeline on K8S cluster, Exception are thrown:
Caused by: java.io.IOException: Cannot run program "docker": error=2, No such file or directory.

> Exception is thrown when running Beam Pipeline on Kubernetes Flink Cluster.
> ---------------------------------------------------------------------------
>
>                 Key: BEAM-8591
>                 URL: https://issues.apache.org/jira/browse/BEAM-8591
>             Project: Beam
>          Issue Type: Bug
>          Components: runner-flink
>            Reporter: Mingliang Gong
>            Priority: Major
>
> h2. Setup Clusters
> * Setup Local Flink Cluster: [https://ci.apache.org/projects/flink/flink-docs-release-1.8/tutorials/local_setup.html]
> * Setup Kubernetes Flink Cluster using Minikube: [https://ci.apache.org/projects/flink/flink-docs-release-1.8/ops/deployment/kubernetes.html]
> h2. Verify Clusters
> Execute command "./bin/flink run examples/streaming/WordCount.jar". Both Local and K8S Flink Cluster work fine.
> h2. Using Apache Beam Flink Runner
> Instruction: [https://beam.apache.org/documentation/runners/flink/]
> Sample Pipeline Code:
> {code:java}
> import apache_beam as beam
> from
[jira] [Work logged] (BEAM-8157) Key encoding for state requests is not consistent across SDKs
[ https://issues.apache.org/jira/browse/BEAM-8157?focusedWorklogId=340371&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-340371 ]

ASF GitHub Bot logged work on BEAM-8157:
----------------------------------------
                Author: ASF GitHub Bot
            Created on: 08/Nov/19 06:49
            Start Date: 08/Nov/19 06:49
    Worklog Time Spent: 10m
      Work Description: sunjincheng121 commented on issue #9997: [BEAM-8157] Revert key encoding changes for state requests / improve debugging and testing
URL: https://github.com/apache/beam/pull/9997#issuecomment-551407703

Sorry for the late reply @mxm, I would like to check it now.

Issue Time Tracking
-------------------
    Worklog Id: (was: 340371)
    Time Spent: 7h 20m (was: 7h 10m)

> Key encoding for state requests is not consistent across SDKs
> -------------------------------------------------------------
>
>                 Key: BEAM-8157
>                 URL: https://issues.apache.org/jira/browse/BEAM-8157
>             Project: Beam
>          Issue Type: Bug
>          Components: runner-flink
>    Affects Versions: 2.13.0
>            Reporter: Maximilian Michels
>            Assignee: Maximilian Michels
>            Priority: Critical
>             Fix For: 2.17.0
>
>          Time Spent: 7h 20m
>  Remaining Estimate: 0h
>
> The Flink runner requires the internal key to be encoded without a length prefix (OUTER context). The user state request handler exposes a serialized version of the key to the Runner. This key is encoded with the NESTED context which may add a length prefix. We need to convert it to OUTER context to match the Flink runner's key encoding.
> So far this has not caused the Flink Runner to behave incorrectly. However, with the upcoming support for Flink 1.9, the state backend will not accept requests for keys not part of any key group/partition of the operator. This is very likely to happen with the encoding not being consistent.
> **NOTE** This is only applicable to the Java SDK, as the Python SDK uses OUTER encoding for the key in state requests.
[jira] [Work logged] (BEAM-8594) Remove unnecessary error check of the control service accessing in DataFlow Runner
[ https://issues.apache.org/jira/browse/BEAM-8594?focusedWorklogId=340369&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-340369 ]

ASF GitHub Bot logged work on BEAM-8594:
----------------------------------------
                Author: ASF GitHub Bot
            Created on: 08/Nov/19 06:46
            Start Date: 08/Nov/19 06:46
    Worklog Time Spent: 10m
      Work Description: sunjincheng121 commented on pull request #10039: [BEAM-8594] Remove unnecessary error check in DataFlow Runner
URL: https://github.com/apache/beam/pull/10039

Currently there are a few places in the DataFlow Runner which check whether an error was reported when accessing the SDK harness's control service. Actually, the error reported by the SDK harness has already been handled in [FnApiControlClient](https://github.com/apache/beam/blob/c2f0d282337f3ae0196a7717712396a5a41fdde1/runners/java-fn-execution/src/main/java/org/apache/beam/runners/fnexecution/control/FnApiControlClient.java#L152). There is no need to check it anymore.
[jira] [Created] (BEAM-8594) Remove unnecessary error check of the control service accessing in DataFlow Runner
sunjincheng created BEAM-8594:
---------------------------------

             Summary: Remove unnecessary error check of the control service accessing in DataFlow Runner
                 Key: BEAM-8594
                 URL: https://issues.apache.org/jira/browse/BEAM-8594
             Project: Beam
          Issue Type: Improvement
          Components: runner-dataflow
            Reporter: sunjincheng
            Assignee: sunjincheng
             Fix For: 2.18.0

Currently there are a few places in the DataFlow Runner which check whether an error was reported when accessing the SDK harness's control service. Actually, the error reported by the SDK harness has already been handled in [FnApiControlClient|https://github.com/apache/beam/blob/c2f0d282337f3ae0196a7717712396a5a41fdde1/runners/java-fn-execution/src/main/java/org/apache/beam/runners/fnexecution/control/FnApiControlClient.java#L152]. There is no need to check it anymore.
[jira] [Work logged] (BEAM-7948) Add time-based cache threshold support in the Java data service
[ https://issues.apache.org/jira/browse/BEAM-7948?focusedWorklogId=340353&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-340353 ]

ASF GitHub Bot logged work on BEAM-7948:
----------------------------------------
                Author: ASF GitHub Bot
            Created on: 08/Nov/19 05:09
            Start Date: 08/Nov/19 05:09
    Worklog Time Spent: 10m
      Work Description: sunjincheng121 commented on issue #9949: [BEAM-7948] Add time-based cache threshold support in the Java data s…
URL: https://github.com/apache/beam/pull/9949#issuecomment-551386849

Hi @lukecwik, could you please have another look? Any comment is welcome. Thanks! :)

Issue Time Tracking
-------------------
    Worklog Id: (was: 340353)
    Time Spent: 50m (was: 40m)

> Add time-based cache threshold support in the Java data service
> ---------------------------------------------------------------
>
>                 Key: BEAM-7948
>                 URL: https://issues.apache.org/jira/browse/BEAM-7948
>             Project: Beam
>          Issue Type: Sub-task
>          Components: java-fn-execution
>            Reporter: sunjincheng
>            Assignee: sunjincheng
>            Priority: Major
>          Time Spent: 50m
>  Remaining Estimate: 0h
>
> Currently only a size-based cache threshold is supported in the data service. It should also support a time-based cache threshold. This is very important, especially for streaming jobs which are sensitive to latency.
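The behavior the issue asks for can be sketched as a buffer that flushes when either a size limit or an age limit is exceeded. The Python below is a hypothetical model, not the Java data service's API: the `FlushBuffer` name and its parameters are invented for illustration, and a real streaming implementation would also need a timer to flush an idle buffer rather than checking age only on `add`.

```python
import time


class FlushBuffer:
    """Buffers elements and flushes on a size OR a time threshold
    (illustrative sketch; names and parameters are not Beam's)."""

    def __init__(self, flush, max_items=100, max_age_secs=0.5,
                 clock=time.monotonic):
        self._flush = flush          # callback receiving a list of elements
        self._max_items = max_items
        self._max_age = max_age_secs
        self._clock = clock          # injectable for testing
        self._items = []
        self._oldest = None          # timestamp of the oldest buffered item

    def add(self, item):
        if self._oldest is None:
            self._oldest = self._clock()
        self._items.append(item)
        # Flush when either threshold is hit; the time-based check keeps
        # latency bounded even when traffic is too light to fill the buffer.
        if (len(self._items) >= self._max_items
                or self._clock() - self._oldest >= self._max_age):
            self.flush()

    def flush(self):
        if self._items:
            self._flush(list(self._items))
            self._items.clear()
            self._oldest = None
```

With only a size threshold, a trickle of elements could sit buffered indefinitely; the age check is what makes the difference for delay-sensitive streaming jobs.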
[jira] [Work logged] (BEAM-8592) DataCatalogTableProvider should not squash table components together into a string
[ https://issues.apache.org/jira/browse/BEAM-8592?focusedWorklogId=340335&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-340335 ]

ASF GitHub Bot logged work on BEAM-8592:
----------------------------------------
                Author: ASF GitHub Bot
            Created on: 08/Nov/19 04:04
            Start Date: 08/Nov/19 04:04
    Worklog Time Spent: 10m
      Work Description: kennknowles commented on issue #10021: [BEAM-8592] Adjusting ZetaSQL table resolution to standard
URL: https://github.com/apache/beam/pull/10021#issuecomment-551375497

run sql precommit

Issue Time Tracking
-------------------
    Worklog Id: (was: 340335)
    Remaining Estimate: 0h
    Time Spent: 10m

> DataCatalogTableProvider should not squash table components together into a string
> ----------------------------------------------------------------------------------
>
>                 Key: BEAM-8592
>                 URL: https://issues.apache.org/jira/browse/BEAM-8592
>             Project: Beam
>          Issue Type: Bug
>          Components: dsl-sql, dsl-sql-zetasql
>            Reporter: Kenneth Knowles
>            Assignee: Kenneth Knowles
>            Priority: Major
>          Time Spent: 10m
>  Remaining Estimate: 0h
>
> Currently, if a user writes a table name like {{foo.`baz.bar`.bizzle}}, representing the components {{"foo", "baz.bar", "bizzle"}}, the DataCatalogTableProvider will concatenate the components into a string and resolve the identifier as if it represented {{"foo", "baz", "bar", "bizzle"}}.
[jira] [Created] (BEAM-8593) Define expected behavior of running ZetaSQL query on tables with unsupported field types
Yueyang Qiu created BEAM-8593:
---------------------------------

             Summary: Define expected behavior of running ZetaSQL query on tables with unsupported field types
                 Key: BEAM-8593
                 URL: https://issues.apache.org/jira/browse/BEAM-8593
             Project: Beam
          Issue Type: Improvement
          Components: dsl-sql-zetasql
            Reporter: Yueyang Qiu
            Assignee: Yueyang Qiu

What should the expected behavior be if a user runs a ZetaSQL query on a table with a field type (e.g. MAP) that is not supported by ZetaSQL?
More context: [https://github.com/apache/beam/pull/10020#issuecomment-551368105]
[jira] [Work logged] (BEAM-8566) Checkpoint buffer is flushed prematurely when another bundle is started during checkpointing
[ https://issues.apache.org/jira/browse/BEAM-8566?focusedWorklogId=340331&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-340331 ]

ASF GitHub Bot logged work on BEAM-8566:
----------------------------------------
                Author: ASF GitHub Bot
            Created on: 08/Nov/19 03:44
            Start Date: 08/Nov/19 03:44
    Worklog Time Spent: 10m
      Work Description: tweise commented on pull request #10008: [BEAM-8566] Do not swallow execution errors during checkpointing
URL: https://github.com/apache/beam/pull/10008#discussion_r343978810

File path: runners/flink/src/main/java/org/apache/beam/runners/flink/translation/wrappers/streaming/DoFnOperator.java

@@ -763,12 +763,19 @@ public final void snapshotState(StateSnapshotContext context) throws Exception {
     // We can't output here anymore because the checkpoint barrier has already been
     // sent downstream. This is going to change with 1.6/1.7's prepareSnapshotBarrier.
-    outputManager.openBuffer();
-    // Ensure that no new bundle gets started as part of finishing a bundle
-    while (bundleStarted.get()) {
-      invokeFinishBundle();
+    try {
+      outputManager.openBuffer();
+      // Ensure that no new bundle gets started as part of finishing a bundle
+      while (bundleStarted.get()) {
+        invokeFinishBundle();
+      }
+      outputManager.closeBuffer();
+    } catch (Exception e) {
+      // Any regular exception during checkpointing will be tolerated by Flink because those

Review comment:
```suggestion
      // https://jira.apache.org/jira/browse/FLINK-14653
      // Any regular exception during checkpointing will be tolerated by Flink because those
```

Issue Time Tracking
-------------------
    Worklog Id: (was: 340331)
    Time Spent: 1h 50m (was: 1h 40m)

> Checkpoint buffer is flushed prematurely when another bundle is started during checkpointing
> --------------------------------------------------------------------------------------------
>
>                 Key: BEAM-8566
>                 URL: https://issues.apache.org/jira/browse/BEAM-8566
>             Project: Beam
>          Issue Type: Bug
>          Components: runner-flink
>            Reporter: Maximilian Michels
>            Assignee: Maximilian Michels
>            Priority: Major
>             Fix For: 2.18.0
>
>          Time Spent: 1h 50m
>  Remaining Estimate: 0h
>
> As part of a checkpoint, the current bundle is finalized. When the bundle is finalized, the watermark, currently held back, may also be progressed, which can cause the start of another bundle. When a new bundle is started, any to-be-buffered items from the previous bundle for the pending checkpoint may be emitted. This should not happen.
> This only affects portable pipelines where we have to hold back the watermark due to the asynchronous processing of elements.
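The buffering that the issue and the patch revolve around can be modeled as an output manager that holds elements back between opening and closing the buffer, releasing them only when the checkpoint completes. The Python below is an illustrative model, not Beam's actual DoFnOperator/output-manager code; the bug was that the start of a new bundle could trigger the flush path before the checkpoint was done:

```python
class BufferingOutputManager:
    """Toy model of buffering outputs around a checkpoint barrier
    (illustrative; not Beam's implementation)."""

    def __init__(self, emit):
        self._emit = emit        # downstream emission callback
        self._buffering = False
        self._buffer = []

    def open_buffer(self):
        # Called at snapshot time: the barrier is already downstream,
        # so nothing may be emitted until the checkpoint completes.
        self._buffering = True

    def close_buffer(self):
        self._buffering = False

    def output(self, element):
        if self._buffering:
            self._buffer.append(element)  # held for the pending checkpoint
        else:
            self._emit(element)

    def flush_buffer(self):
        # Must run only on checkpoint completion. Flushing because a new
        # bundle happened to start would emit these elements prematurely,
        # which is exactly the bug described above.
        for e in self._buffer:
            self._emit(e)
        self._buffer.clear()
```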
[jira] [Commented] (BEAM-8592) DataCatalogTableProvider should not squash table components together into a string
[ https://issues.apache.org/jira/browse/BEAM-8592?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16969807#comment-16969807 ]

Kenneth Knowles commented on BEAM-8592:
---------------------------------------

CC [~apilloud] [~amaliujia]

> DataCatalogTableProvider should not squash table components together into a string
> ----------------------------------------------------------------------------------
>
>                 Key: BEAM-8592
>                 URL: https://issues.apache.org/jira/browse/BEAM-8592
>             Project: Beam
>          Issue Type: Bug
>          Components: dsl-sql, dsl-sql-zetasql
>            Reporter: Kenneth Knowles
>            Assignee: Kenneth Knowles
>            Priority: Major
>
> Currently, if a user writes a table name like {{foo.`baz.bar`.bizzle}}, representing the components {{"foo", "baz.bar", "bizzle"}}, the DataCatalogTableProvider will concatenate the components into a string and resolve the identifier as if it represented {{"foo", "baz", "bar", "bizzle"}}.
[jira] [Created] (BEAM-8592) DataCatalogTableProvider should not squash table components together into a string
Kenneth Knowles created BEAM-8592:
-------------------------------------

             Summary: DataCatalogTableProvider should not squash table components together into a string
                 Key: BEAM-8592
                 URL: https://issues.apache.org/jira/browse/BEAM-8592
             Project: Beam
          Issue Type: Bug
          Components: dsl-sql, dsl-sql-zetasql
            Reporter: Kenneth Knowles
            Assignee: Kenneth Knowles

Currently, if a user writes a table name like {{foo.`baz.bar`.bizzle}}, representing the components {{"foo", "baz.bar", "bizzle"}}, the DataCatalogTableProvider will concatenate the components into a string and resolve the identifier as if it represented {{"foo", "baz", "bar", "bizzle"}}.
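The information loss described above is easy to reproduce: once the components are joined into a single dotted string, the original grouping cannot be recovered by splitting on dots. A minimal sketch (hypothetical Python, not the DataCatalogTableProvider implementation):

```python
def squash(components):
    # What the bug does: collapse the component list into one dotted string,
    # discarding the quoting that kept "baz.bar" a single component.
    return ".".join(components)


def resolve(squashed):
    # A later split re-parses the string and invents a different grouping.
    return squashed.split(".")


# The user wrote foo.`baz.bar`.bizzle, i.e. three components:
original = ["foo", "baz.bar", "bizzle"]
assert resolve(squash(original)) == ["foo", "baz", "bar", "bizzle"]
assert resolve(squash(original)) != original
```

The fix implied by the issue title is to keep the component list intact end to end instead of round-tripping through a string.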
[jira] [Updated] (BEAM-8592) DataCatalogTableProvider should not squash table components together into a string
[ https://issues.apache.org/jira/browse/BEAM-8592?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Kenneth Knowles updated BEAM-8592:
----------------------------------
    Status: Open  (was: Triage Needed)

> DataCatalogTableProvider should not squash table components together into a string
> ----------------------------------------------------------------------------------
>
>                 Key: BEAM-8592
>                 URL: https://issues.apache.org/jira/browse/BEAM-8592
>             Project: Beam
>          Issue Type: Bug
>          Components: dsl-sql, dsl-sql-zetasql
>            Reporter: Kenneth Knowles
>            Assignee: Kenneth Knowles
>            Priority: Major
>
> Currently, if a user writes a table name like {{foo.`baz.bar`.bizzle}}, representing the components {{"foo", "baz.bar", "bizzle"}}, the DataCatalogTableProvider will concatenate the components into a string and resolve the identifier as if it represented {{"foo", "baz", "bar", "bizzle"}}.
[jira] [Updated] (BEAM-8456) Add pipeline option to control truncate of BigQuery data processed by Beam SQL
[ https://issues.apache.org/jira/browse/BEAM-8456?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Kenneth Knowles updated BEAM-8456:
----------------------------------
    Summary: Add pipeline option to control truncate of BigQuery data processed by Beam SQL  (was: BigQuery to Beam SQL timestamp has the wrong default: truncation makes the most sense)

> Add pipeline option to control truncate of BigQuery data processed by Beam SQL
> ------------------------------------------------------------------------------
>
>                 Key: BEAM-8456
>                 URL: https://issues.apache.org/jira/browse/BEAM-8456
>             Project: Beam
>          Issue Type: Improvement
>          Components: dsl-sql
>            Reporter: Kenneth Knowles
>            Assignee: Kenneth Knowles
>            Priority: Major
>          Time Spent: 3h 20m
>  Remaining Estimate: 0h
>
> Most of the time, a user reading a timestamp from BigQuery with higher-than-millisecond precision timestamps may not even realize that the data source created these high precision timestamps. They are probably timestamps on log entries generated by a system with higher precision.
> If they are using it with Beam SQL, which only supports millisecond precision, it makes sense to "just work" by default.
[jira] [Resolved] (BEAM-8456) Add pipeline option to control truncate of BigQuery data processed by Beam SQL
[ https://issues.apache.org/jira/browse/BEAM-8456?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Kenneth Knowles resolved BEAM-8456.
-----------------------------------
    Fix Version/s: 2.18.0
       Resolution: Fixed

> Add pipeline option to control truncate of BigQuery data processed by Beam SQL
> ------------------------------------------------------------------------------
>
>                 Key: BEAM-8456
>                 URL: https://issues.apache.org/jira/browse/BEAM-8456
>             Project: Beam
>          Issue Type: Improvement
>          Components: dsl-sql
>            Reporter: Kenneth Knowles
>            Assignee: Kenneth Knowles
>            Priority: Major
>             Fix For: 2.18.0
>
>          Time Spent: 3h 20m
>  Remaining Estimate: 0h
>
> Most of the time, a user reading a timestamp from BigQuery with higher-than-millisecond precision timestamps may not even realize that the data source created these high precision timestamps. They are probably timestamps on log entries generated by a system with higher precision.
> If they are using it with Beam SQL, which only supports millisecond precision, it makes sense to "just work" by default.
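The truncation that the option controls amounts to flooring away sub-millisecond precision. A minimal sketch of the idea (a hypothetical Python helper, not Beam's implementation), taking a microsecond-precision epoch timestamp down to Beam SQL's millisecond model:

```python
def truncate_to_millis(micros: int) -> int:
    # Floor a microsecond epoch timestamp to millisecond precision, so a
    # high-precision BigQuery TIMESTAMP "just works" in Beam SQL's
    # millisecond model instead of failing on the extra digits.
    return micros - micros % 1000


# The sub-millisecond digits (here ...456) are dropped:
assert truncate_to_millis(1_572_912_000_123_456) == 1_572_912_000_123_000
```

Flooring (rather than rounding) keeps a truncated timestamp from ever moving forward in time, which matters for watermark reasoning.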
[jira] [Updated] (BEAM-8591) Exception is thrown when running Beam Pipeline on Kubernetes Flink Cluster.
[ https://issues.apache.org/jira/browse/BEAM-8591?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Mingliang Gong updated BEAM-8591:
---------------------------------
    Description: 
h2. Setup Clusters
* Setup Local Flink Cluster: [https://ci.apache.org/projects/flink/flink-docs-release-1.8/tutorials/local_setup.html]
* Setup Kubernetes Flink Cluster using Minikube: [https://ci.apache.org/projects/flink/flink-docs-release-1.8/ops/deployment/kubernetes.html]
h2. Verify Clusters
Execute command "./bin/flink run examples/streaming/WordCount.jar". Both Local and K8S Flink Cluster work fine.
h2. Using Apache Beam Flink Runner
Instruction: [https://beam.apache.org/documentation/runners/flink/]
Sample Pipeline Code:
{code:java}
import apache_beam as beam
from apache_beam.options.pipeline_options import PipelineOptions

options = PipelineOptions([
    "--runner=PortableRunner",
    "--job_endpoint=localhost:8099",
    "--environment_type=LOOPBACK"
])

with beam.Pipeline(options=options) as pipeline:
    data = ["Sample data",
            "Sample data - 0",
            "Sample data - 1"]
    raw_data = (pipeline
                | 'CreateHardCodeData' >> beam.Create(data)
                | 'Map' >> beam.Map(lambda line : line + '.')
                | 'Print' >> beam.Map(print))
{code}
Verify different environment_type in Python SDK Harness Configuration
*environment_type=LOOKBACK*
# Run pipeline on local cluster: Works Fine
# Run pipeline on K8S cluster, Exceptions are thrown:
java.lang.Exception: The user defined 'open()' method caused an exception: org.apache.beam.vendor.grpc.v1p21p0.io.grpc.StatusRuntimeException: UNAVAILABLE: io exception
Caused by: org.apache.beam.vendor.grpc.v1p21p0.io.netty.channel.AbstractChannel$AnnotatedConnectException: Connection refused: localhost/127.0.0.1:51017
*environment_type=DOCKER*
# Run pipeline on local cluster: Work fine
# Run pipeline on K8S cluster, Exception are thrown:
Caused by: java.io.IOException: Cannot run program "docker": error=2, No such file or directory.

  was:
h2. Setup Clusters
* Setup Local Flink Cluster: [https://ci.apache.org/projects/flink/flink-docs-release-1.8/tutorials/local_setup.html]
* Setup Kubernetes Flink Cluster using Minikube: [https://ci.apache.org/projects/flink/flink-docs-release-1.8/ops/deployment/kubernetes.html]
h2. Verify Clusters
Execute command "./bin/flink run examples/streaming/WordCount.jar". Both Local and K8S Flink Cluster work fine.
h2. Using Apache Beam Flink Runner
Instruction: [https://beam.apache.org/documentation/runners/flink/]
Sample Pipeline Code:
{code:java}
import apache_beam as beam
from apache_beam.options.pipeline_options import PipelineOptions

options = PipelineOptions([
    "--runner=PortableRunner",
    "--job_endpoint=localhost:8099",
    "--environment_type=LOOPBACK"
])

with beam.Pipeline(options=options) as pipeline:
    data = ["Sample data",
            "Sample data - 0",
            "Sample data - 1"]
    raw_data = (pipeline
                | 'CreateHardCodeData' >> beam.Create(data)
                | 'Map' >> beam.Map(lambda line : line + '.')
                | 'Print' >> beam.Map(print))
{code}
Verify different environment_type in Python SDK Harness Configuration
*environment_type=LOOKBACK*
# Run pipeline on local cluster: Works Fine
# Run pipeline on K8S cluster, Exceptions are thrown:
java.lang.Exception: The user defined 'open()' method caused an exception: org.apache.beam.vendor.grpc.v1p21p0.io.grpc.StatusRuntimeException: UNAVAILABLE: io exception
Caused by: org.apache.beam.vendor.grpc.v1p21p0.io.netty.channel.AbstractChannel$AnnotatedConnectException: Connection refused: localhost/127.0.0.1:51017
*environment_type=DOCKER*
# Run pipeline on local cluster: Work fine
# Run pipeline on K8S cluster, Exception are thrown:
Caused by: java.io.IOException: Cannot run program "docker": error=2, No such file or directory.

> Exception is thrown when running Beam Pipeline on Kubernetes Flink Cluster.
> ---------------------------------------------------------------------------
>
>                 Key: BEAM-8591
>                 URL: https://issues.apache.org/jira/browse/BEAM-8591
>             Project: Beam
>          Issue Type: Bug
>          Components: runner-flink
>            Reporter: Mingliang Gong
>            Priority: Major
>
> h2. Setup Clusters
> * Setup Local Flink Cluster: [https://ci.apache.org/projects/flink/flink-docs-release-1.8/tutorials/local_setup.html]
> * Setup Kubernetes Flink Cluster using Minikube: [https://ci.apache.org/projects/flink/flink-docs-release-1.8/ops/deployment/kubernetes.html]
> h2. Verify Clusters
> Execute command "./bin/flink run examples/streaming/WordCount.jar". Both Local and K8S Flink Cluster work fine.
> h2. Using Apache Beam Flink Runner
> Instruction: [https://beam.apache.org/documentation/runners/flink/]
> Sample Pipeline Code:
> {code:java}
> import apache_beam as beam
> from
[jira] [Created] (BEAM-8591) Exception is thrown when running Beam Pipeline on Kubernetes Flink Cluster.
Mingliang Gong created BEAM-8591:
Summary: Exception is thrown when running Beam Pipeline on Kubernetes Flink Cluster.
Key: BEAM-8591
URL: https://issues.apache.org/jira/browse/BEAM-8591
Project: Beam
Issue Type: Bug
Components: runner-flink
Reporter: Mingliang Gong

h2. Setup Clusters
# Setup Local Flink Cluster: [https://ci.apache.org/projects/flink/flink-docs-release-1.8/tutorials/local_setup.html]
# Setup Kubernetes Flink Cluster with Minikube: [https://ci.apache.org/projects/flink/flink-docs-release-1.8/ops/deployment/kubernetes.html]
h2. Verify Clusters
Execute the command "./bin/flink run examples/streaming/WordCount.jar". Both the local and the K8S Flink cluster work fine.
h2. Apache Beam Flink Runner
Instructions: [https://beam.apache.org/documentation/runners/flink/]
Sample pipeline code:
{code:python}
import apache_beam as beam
from apache_beam.options.pipeline_options import PipelineOptions

options = PipelineOptions([
    "--runner=PortableRunner",
    "--job_endpoint=localhost:8099",
    "--environment_type=LOOPBACK"
])

with beam.Pipeline(options=options) as pipeline:
    data = ["Sample data", "Sample data - 0", "Sample data - 1"]
    raw_data = (pipeline
                | 'CreateHardCodeData' >> beam.Create(data)
                | 'Map' >> beam.Map(lambda line: line + '.')
                | 'Print' >> beam.Map(print))
{code}
Verify each environment_type in the Python SDK harness configuration:
*environment_type=LOOPBACK*
# Run pipeline on local cluster: works fine.
# Run pipeline on K8S cluster, exceptions are thrown:
java.lang.Exception: The user defined 'open()' method caused an exception: org.apache.beam.vendor.grpc.v1p21p0.io.grpc.StatusRuntimeException: UNAVAILABLE: io exception
Caused by: org.apache.beam.vendor.grpc.v1p21p0.io.netty.channel.AbstractChannel$AnnotatedConnectException: Connection refused: localhost/127.0.0.1:51017
*environment_type=DOCKER*
# Run pipeline on local cluster: works fine.
# Run pipeline on K8S cluster, an exception is thrown:
Caused by: java.io.IOException: Cannot run program "docker": error=2, No such file or directory.
-- This message was sent by Atlassian Jira (v8.3.4#803005)
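With environment_type=LOOPBACK the SDK harness runs on the machine that submitted the job, so Flink task managers inside the K8S cluster try to reach a harness on their own localhost and find nothing listening. The "Connection refused" symptom above can be reproduced without Beam by dialing a localhost port that has no server behind it (the port is picked at runtime here; 51017 in the stack trace was equally arbitrary):

```python
import socket

def free_port() -> int:
    """Reserve and immediately release an ephemeral port, so nothing listens on it."""
    with socket.socket() as s:
        s.bind(("127.0.0.1", 0))
        return s.getsockname()[1]

def dial(host: str, port: int, timeout: float = 1.0) -> str:
    """Attempt a TCP connection; return 'ok' or the exception class name."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return "ok"
    except OSError as exc:
        return type(exc).__name__

# A task manager pod has no SDK harness bound on its own localhost,
# which is exactly this situation: a connect to an unbound local port.
print(dial("127.0.0.1", free_port()))  # ConnectionRefusedError
```

The DOCKER failure is the same kind of environment mismatch: the `docker` binary simply is not present inside the Flink task manager image, so `Cannot run program "docker"` is raised.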
[jira] [Updated] (BEAM-8590) Python typehints: native types: consider bare container types as containing Any
[ https://issues.apache.org/jira/browse/BEAM-8590?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Udi Meiri updated BEAM-8590: Status: Open (was: Triage Needed) > Python typehints: native types: consider bare container types as containing > Any > --- > > Key: BEAM-8590 > URL: https://issues.apache.org/jira/browse/BEAM-8590 > Project: Beam > Issue Type: Bug > Components: sdk-py-core >Reporter: Udi Meiri >Assignee: Udi Meiri >Priority: Major > > This is for convert_to_beam_type: > For example, process(element: List) is the same as process(element: > List[Any]). -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Created] (BEAM-8590) Python typehints: native types: consider bare container types as containing Any
Udi Meiri created BEAM-8590: --- Summary: Python typehints: native types: consider bare container types as containing Any Key: BEAM-8590 URL: https://issues.apache.org/jira/browse/BEAM-8590 Project: Beam Issue Type: Bug Components: sdk-py-core Reporter: Udi Meiri Assignee: Udi Meiri This is for convert_to_beam_type: For example, process(element: List) is the same as process(element: List[Any]). -- This message was sent by Atlassian Jira (v8.3.4#803005)
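The convention proposed above (a bare container hint behaving like the same container parameterized with `Any`) can be sketched with the stdlib `typing` introspection helpers; `pad_with_any` is an illustrative name, not Beam's actual `convert_to_beam_type`:

```python
from typing import Any, Dict, List, get_args

def pad_with_any(hint, nparams: int = 1) -> tuple:
    """Return a hint's type arguments, treating a bare container as all-Any."""
    args = get_args(hint)  # () for bare List, Dict, etc. (Python 3.8+)
    return args if args else (Any,) * nparams

print(pad_with_any(List))             # (typing.Any,)
print(pad_with_any(List[int]))        # (<class 'int'>,)
print(pad_with_any(Dict, nparams=2))  # (typing.Any, typing.Any)
```

Under this rule `process(element: List)` and `process(element: List[Any])` yield the same element type, which is the equivalence the issue asks for.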
[jira] [Work logged] (BEAM-8457) Instrument Dataflow jobs that are launched from Notebooks
[ https://issues.apache.org/jira/browse/BEAM-8457?focusedWorklogId=340292&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-340292 ] ASF GitHub Bot logged work on BEAM-8457: Author: ASF GitHub Bot Created on: 08/Nov/19 01:59 Start Date: 08/Nov/19 01:59 Worklog Time Spent: 10m Work Description: KevinGG commented on issue #9885: [BEAM-8457] Label Dataflow jobs from Notebook URL: https://github.com/apache/beam/pull/9885#issuecomment-551349679 Run Portable_Python PreCommit This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org Issue Time Tracking --- Worklog Id: (was: 340292) Time Spent: 8h 50m (was: 8h 40m) > Instrument Dataflow jobs that are launched from Notebooks > - > > Key: BEAM-8457 > URL: https://issues.apache.org/jira/browse/BEAM-8457 > Project: Beam > Issue Type: Improvement > Components: runner-py-interactive >Reporter: Ning Kang >Assignee: Ning Kang >Priority: Major > Fix For: 2.17.0 > > Time Spent: 8h 50m > Remaining Estimate: 0h > > Dataflow needs the capability to tell how many Dataflow jobs are launched > from the Notebook environment, i.e., the Interactive Runner. > # Change the pipeline.run() API to allow supply a runner and an option > parameter so that a pipeline initially bundled w/ an interactive runner can > be directly run by other runners from notebook. > # Implicitly add the necessary source information through user labels when > the user does p.run(runner=DataflowRunner()). -- This message was sent by Atlassian Jira (v8.3.4#803005)
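The two changes the issue lists, `pipeline.run()` accepting a replacement runner plus options, and provenance labels being added implicitly for notebook-launched jobs, can be sketched with stand-in classes (all names below are illustrative, not Beam's actual interfaces):

```python
class RecordingRunner:
    """Stand-in runner that just reports the labels it received."""
    def run_pipeline(self, pipeline, options):
        return options.get("labels", {})

class InteractivePipeline:
    """Pipeline bundled with a default interactive runner at construction."""
    def __init__(self, default_runner):
        self._default_runner = default_runner

    def run(self, runner=None, options=None):
        # 1. Allow another runner (e.g. Dataflow) to be swapped in at run time.
        runner = runner or self._default_runner
        options = dict(options or {})
        # 2. Implicitly label the job so its notebook origin is traceable,
        #    without clobbering labels the user supplied explicitly.
        labels = dict(options.get("labels", {}))
        labels.setdefault("launched-from", "interactive-notebook")
        options["labels"] = labels
        return runner.run_pipeline(self, options)

p = InteractivePipeline(default_runner=RecordingRunner())
print(p.run(runner=RecordingRunner()))  # {'launched-from': 'interactive-notebook'}
```

Because the label is injected inside `run()`, every launch path (default runner or an explicitly passed one) is counted, which is the instrumentation goal.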
[jira] [Work logged] (BEAM-8442) Unify bundle register in Python SDK harness
[ https://issues.apache.org/jira/browse/BEAM-8442?focusedWorklogId=340277=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-340277 ] ASF GitHub Bot logged work on BEAM-8442: Author: ASF GitHub Bot Created on: 08/Nov/19 01:25 Start Date: 08/Nov/19 01:25 Worklog Time Spent: 10m Work Description: sunjincheng121 commented on issue #10004: [BEAM-8442] Unify bundle register in Python SDK harness URL: https://github.com/apache/beam/pull/10004#issuecomment-551342271 Thanks for the review @mxm ! I would appreciate it if you could have a look at the PR. @aaltay @ibzib @lukecwik :) This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org Issue Time Tracking --- Worklog Id: (was: 340277) Time Spent: 3h (was: 2h 50m) > Unify bundle register in Python SDK harness > --- > > Key: BEAM-8442 > URL: https://issues.apache.org/jira/browse/BEAM-8442 > Project: Beam > Issue Type: Sub-task > Components: sdk-py-harness >Reporter: sunjincheng >Assignee: sunjincheng >Priority: Major > Time Spent: 3h > Remaining Estimate: 0h > > There are two methods for bundle register in Python SDK harness: > `SdkHarness._request_register` and `SdkWorker.register.` They should be unified. -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Assigned] (BEAM-8588) MapTuple(fn) fails if fn has type hints but no default args
[ https://issues.apache.org/jira/browse/BEAM-8588?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Udi Meiri reassigned BEAM-8588: --- Assignee: (was: Udi Meiri) > MapTuple(fn) fails if fn has type hints but no default args > --- > > Key: BEAM-8588 > URL: https://issues.apache.org/jira/browse/BEAM-8588 > Project: Beam > Issue Type: Bug > Components: sdk-py-core >Reporter: Udi Meiri >Priority: Major > > {code} > def test_typed_maptuple(self): > def fn(e1: int, e2: int) -> int: > return e1 * e2 > result = [(1, 1), (2, 2)] | beam.MapTuple(fn) > self.assertEqual([2, 4], sorted(result)) > {code} > Fails in getcallargs_forhints_impl_py3 with: > {code} > > raise TypeCheckError(e) > E apache_beam.typehints.decorators.TypeCheckError: too many positional > arguments > {code} -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Created] (BEAM-8589) Add instrumentation to portable runner to print pipeline proto and options when logging level is set to Debug.
Valentyn Tymofieiev created BEAM-8589: - Summary: Add instrumentation to portable runner to print pipeline proto and options when logging level is set to Debug. Key: BEAM-8589 URL: https://issues.apache.org/jira/browse/BEAM-8589 Project: Beam Issue Type: Bug Components: sdk-py-core Reporter: Valentyn Tymofieiev Similar capability in Dataflow runner: https://github.com/apache/beam/blob/90d587843172143c15ed392513e396b74569a98c/sdks/python/apache_beam/runners/dataflow/internal/apiclient.py#L567. -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Commented] (BEAM-8588) MapTuple(fn) fails if fn has type hints but no default args
[ https://issues.apache.org/jira/browse/BEAM-8588?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16969702#comment-16969702 ] Udi Meiri commented on BEAM-8588: - [~robertwb] I'm leaving this unassigned for now. > MapTuple(fn) fails if fn has type hints but no default args > --- > > Key: BEAM-8588 > URL: https://issues.apache.org/jira/browse/BEAM-8588 > Project: Beam > Issue Type: Bug > Components: sdk-py-core >Reporter: Udi Meiri >Priority: Major > > {code} > def test_typed_maptuple(self): > def fn(e1: int, e2: int) -> int: > return e1 * e2 > result = [(1, 1), (2, 2)] | beam.MapTuple(fn) > self.assertEqual([2, 4], sorted(result)) > {code} > Fails in getcallargs_forhints_impl_py3 with: > {code} > > raise TypeCheckError(e) > E apache_beam.typehints.decorators.TypeCheckError: too many positional > arguments > {code} -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Work logged] (BEAM-8583) [SQL] BigQuery should support predicate push-down in DIRECT_READ mode
[ https://issues.apache.org/jira/browse/BEAM-8583?focusedWorklogId=340268=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-340268 ] ASF GitHub Bot logged work on BEAM-8583: Author: ASF GitHub Bot Created on: 08/Nov/19 01:11 Start Date: 08/Nov/19 01:11 Worklog Time Spent: 10m Work Description: 11moon11 commented on issue #10030: [BEAM-8583] Big query filter push down URL: https://github.com/apache/beam/pull/10030#issuecomment-55133 R: @apilloud cc: @amaliujia This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org Issue Time Tracking --- Worklog Id: (was: 340268) Time Spent: 20m (was: 10m) > [SQL] BigQuery should support predicate push-down in DIRECT_READ mode > - > > Key: BEAM-8583 > URL: https://issues.apache.org/jira/browse/BEAM-8583 > Project: Beam > Issue Type: Improvement > Components: dsl-sql >Reporter: Kirill Kozlov >Assignee: Kirill Kozlov >Priority: Major > Time Spent: 20m > Remaining Estimate: 0h > > * Add BigQuery Dialect with TypeTranslation (since it is not implemented in > Calcite 1.20.0, but is present in unreleased versions). > * Create a BigQueryFilter class. > * BigQueryTable#buildIOReader should translate supported filters into a Sql > string and pass it to BigQueryIO. > > Potential improvements: > * After updating vendor Calcite, class > `BigQuerySqlDialectWithTypeTranslation` can be deleted and Calcite's > `BigQuerySqlDialect` can be utilized instead. > * Once BigQuery adds support for more filters, `BigQueryFilter#isSupported` > should be updated. -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Work logged] (BEAM-8427) [SQL] Add support for MongoDB source
[ https://issues.apache.org/jira/browse/BEAM-8427?focusedWorklogId=340267=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-340267 ] ASF GitHub Bot logged work on BEAM-8427: Author: ASF GitHub Bot Created on: 08/Nov/19 01:10 Start Date: 08/Nov/19 01:10 Worklog Time Spent: 10m Work Description: 11moon11 commented on issue #10031: [BEAM-8427] Create a table and a table provider for MongoDB URL: https://github.com/apache/beam/pull/10031#issuecomment-551338685 Run Java PreCommit This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org Issue Time Tracking --- Worklog Id: (was: 340267) Time Spent: 6h (was: 5h 50m) > [SQL] Add support for MongoDB source > > > Key: BEAM-8427 > URL: https://issues.apache.org/jira/browse/BEAM-8427 > Project: Beam > Issue Type: New Feature > Components: dsl-sql >Reporter: Kirill Kozlov >Assignee: Kirill Kozlov >Priority: Major > Time Spent: 6h > Remaining Estimate: 0h > > In progress: > * Create a MongoDB table and table provider. > * Implement buildIOReader > * Support primitive types > Still needs to be done: > * Implement buildIOWrite > * improve getTableStatistics -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Work logged] (BEAM-8427) [SQL] Add support for MongoDB source
[ https://issues.apache.org/jira/browse/BEAM-8427?focusedWorklogId=340266=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-340266 ] ASF GitHub Bot logged work on BEAM-8427: Author: ASF GitHub Bot Created on: 08/Nov/19 01:10 Start Date: 08/Nov/19 01:10 Worklog Time Spent: 10m Work Description: 11moon11 commented on issue #10031: [BEAM-8427] Create a table and a table provider for MongoDB URL: https://github.com/apache/beam/pull/10031#issuecomment-551338685 Run Java PreCommit This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org Issue Time Tracking --- Worklog Id: (was: 340266) Time Spent: 5h 50m (was: 5h 40m) > [SQL] Add support for MongoDB source > > > Key: BEAM-8427 > URL: https://issues.apache.org/jira/browse/BEAM-8427 > Project: Beam > Issue Type: New Feature > Components: dsl-sql >Reporter: Kirill Kozlov >Assignee: Kirill Kozlov >Priority: Major > Time Spent: 5h 50m > Remaining Estimate: 0h > > In progress: > * Create a MongoDB table and table provider. > * Implement buildIOReader > * Support primitive types > Still needs to be done: > * Implement buildIOWrite > * improve getTableStatistics -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Commented] (BEAM-8588) MapTuple(fn) fails if fn has type hints but no default args
[ https://issues.apache.org/jira/browse/BEAM-8588?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16969696#comment-16969696 ] Udi Meiri commented on BEAM-8588: - For this example: {code} def fn(element1: int, element2: int, side1: str) -> int: ... {code} The input type hints for the wrapper in MapTuple should be: {code} Tuple[int, int], str {code} so there would need to be 2 separate code paths in MapTuple: 1. For type hints originating from typehints.decorators.IOTypeHints.from_callable(fn), which would need to combine the non-side-input args into a Tuple. 2. For type hints originating from decorators, which would pass the type hints as-is. > MapTuple(fn) fails if fn has type hints but no default args > --- > > Key: BEAM-8588 > URL: https://issues.apache.org/jira/browse/BEAM-8588 > Project: Beam > Issue Type: Bug > Components: sdk-py-core >Reporter: Udi Meiri >Assignee: Udi Meiri >Priority: Major > > {code} > def test_typed_maptuple(self): > def fn(e1: int, e2: int) -> int: > return e1 * e2 > result = [(1, 1), (2, 2)] | beam.MapTuple(fn) > self.assertEqual([2, 4], sorted(result)) > {code} > Fails in getcallargs_forhints_impl_py3 with: > {code} > > raise TypeCheckError(e) > E apache_beam.typehints.decorators.TypeCheckError: too many positional > arguments > {code} -- This message was sent by Atlassian Jira (v8.3.4#803005)
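The combination described in code path 1 above — folding the non-side-input argument hints into a single Tuple — can be sketched with stdlib introspection. `combined_input_hints` is a hypothetical helper for illustration, not the actual MapTuple fix:

```python
import inspect
from typing import Tuple, get_type_hints

def combined_input_hints(fn, num_side_inputs=0):
    """Fold the hints of fn's main (non-side-input) parameters into one
    Tuple hint, passing side-input hints through as-is."""
    hints = get_type_hints(fn)
    params = list(inspect.signature(fn).parameters)
    split = len(params) - num_side_inputs
    main, side = params[:split], params[split:]
    element_hint = Tuple[tuple(hints[p] for p in main)]
    return (element_hint,) + tuple(hints[p] for p in side)
```

For the `fn(element1: int, element2: int, side1: str)` example above, this yields `(Tuple[int, int], str)` — the shape the wrapper's input hints should take.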
[jira] [Work logged] (BEAM-8539) Clearly define the valid job state transitions
[ https://issues.apache.org/jira/browse/BEAM-8539?focusedWorklogId=340264=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-340264 ] ASF GitHub Bot logged work on BEAM-8539: Author: ASF GitHub Bot Created on: 08/Nov/19 01:07 Start Date: 08/Nov/19 01:07 Worklog Time Spent: 10m Work Description: chadrik commented on issue #9965: [BEAM-8539] Make job state transitions in python-based runners consistent with java-based runners URL: https://github.com/apache/beam/pull/9965#issuecomment-551338061 > Should STOPPED be PAUSED? Based on the comments describing STOPPED, I'd say yes. But @lukecwik has informed me in the other PR that there are no runners that actually support being paused (RUNNING -> STOPPED -> RUNNING transition), so STOPPED is not currently used this way. This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org Issue Time Tracking --- Worklog Id: (was: 340264) Time Spent: 4h 10m (was: 4h) > Clearly define the valid job state transitions > -- > > Key: BEAM-8539 > URL: https://issues.apache.org/jira/browse/BEAM-8539 > Project: Beam > Issue Type: Improvement > Components: beam-model, runner-core, sdk-java-core, sdk-py-core >Reporter: Chad Dombrova >Priority: Major > Time Spent: 4h 10m > Remaining Estimate: 0h > > The Beam job state transitions are ill-defined, which is a big problem for > anything that relies on the values coming from JobAPI.GetStateStream. > I was hoping to find something like a state transition diagram in the docs so > that I could determine the start state, the terminal states, and the valid > transitions, but I could not find this. 
The code reveals that the SDKs differ > on the fundamentals: > Java InMemoryJobService: > * start state: *STOPPED* > * run - about to submit to executor: STARTING > * run - actually running on executor: RUNNING > * terminal states: DONE, FAILED, CANCELLED, DRAINED > Python AbstractJobServiceServicer / LocalJobServicer: > * start state: STARTING > * terminal states: DONE, FAILED, CANCELLED, *STOPPED* > I think it would be good to make python work like Java, so that there is a > difference in state between a job that has been prepared and one that has > additionally been run. > It's hard to tell how far this problem has spread within the various runners. > I think a simple thing that can be done to help standardize behavior is to > implement the terminal states as an enum in the beam_job_api.proto, or create > a utility function in each language for checking if a state is terminal, so > that it's not left up to each runner to reimplement this logic. > -- This message was sent by Atlassian Jira (v8.3.4#803005)
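The last suggestion in the issue — a shared helper for checking terminal states — could look like the following sketch. The terminal set follows the Java `InMemoryJobService` semantics quoted above (DONE, FAILED, CANCELLED, DRAINED); the enum values mirror `beam_job_api.proto`, but the helper names are assumptions, not Beam's actual API:

```python
from enum import Enum

class JobState(Enum):
    STOPPED = 1
    RUNNING = 2
    DONE = 3
    FAILED = 4
    CANCELLED = 5
    DRAINED = 8
    STARTING = 9

# Terminal states per the Java InMemoryJobService semantics described above;
# note the Python LocalJobServicer additionally treated STOPPED as terminal.
_TERMINAL_STATES = frozenset({
    JobState.DONE, JobState.FAILED, JobState.CANCELLED, JobState.DRAINED,
})

def is_terminal(state: JobState) -> bool:
    """True if no further state transitions are possible from `state`."""
    return state in _TERMINAL_STATES
```

Centralizing this check is exactly what keeps each runner from reimplementing (and diverging on) the terminal-state logic.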
[jira] [Work logged] (BEAM-8539) Clearly define the valid job state transitions
[ https://issues.apache.org/jira/browse/BEAM-8539?focusedWorklogId=340263=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-340263 ] ASF GitHub Bot logged work on BEAM-8539: Author: ASF GitHub Bot Created on: 08/Nov/19 01:05 Start Date: 08/Nov/19 01:05 Worklog Time Spent: 10m Work Description: chadrik commented on pull request #9965: [BEAM-8539] Make job state transitions in python-based runners consistent with java-based runners URL: https://github.com/apache/beam/pull/9965#discussion_r343949978 ## File path: sdks/python/apache_beam/runners/portability/local_job_service.py ## @@ -195,7 +195,7 @@ def __init__(self, self._state = None self._state_queues = [] self._log_queues = [] -self.state = beam_job_api_pb2.JobState.STARTING +self.state = beam_job_api_pb2.JobState.STOPPED Review comment: The initial state in java runners is STOPPED. Then it transitions to STARTING and RUNNING. More info at #9969. This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org Issue Time Tracking --- Worklog Id: (was: 340263) Time Spent: 4h (was: 3h 50m) > Clearly define the valid job state transitions > -- > > Key: BEAM-8539 > URL: https://issues.apache.org/jira/browse/BEAM-8539 > Project: Beam > Issue Type: Improvement > Components: beam-model, runner-core, sdk-java-core, sdk-py-core >Reporter: Chad Dombrova >Priority: Major > Time Spent: 4h > Remaining Estimate: 0h > > The Beam job state transitions are ill-defined, which is big problem for > anything that relies on the values coming from JobAPI.GetStateStream. > I was hoping to find something like a state transition diagram in the docs so > that I could determine the start state, the terminal states, and the valid > transitions, but I could not find this. 
The code reveals that the SDKs differ > on the fundamentals: > Java InMemoryJobService: > * start state: *STOPPED* > * run - about to submit to executor: STARTING > * run - actually running on executor: RUNNING > * terminal states: DONE, FAILED, CANCELLED, DRAINED > Python AbstractJobServiceServicer / LocalJobServicer: > * start state: STARTING > * terminal states: DONE, FAILED, CANCELLED, *STOPPED* > I think it would be good to make python work like Java, so that there is a > difference in state between a job that has been prepared and one that has > additionally been run. > It's hard to tell how far this problem has spread within the various runners. > I think a simple thing that can be done to help standardize behavior is to > implement the terminal states as an enum in the beam_job_api.proto, or create > a utility function in each language for checking if a state is terminal, so > that it's not left up to each runner to reimplement this logic. > -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Work logged] (BEAM-8539) Clearly define the valid job state transitions
[ https://issues.apache.org/jira/browse/BEAM-8539?focusedWorklogId=340262=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-340262 ] ASF GitHub Bot logged work on BEAM-8539: Author: ASF GitHub Bot Created on: 08/Nov/19 01:03 Start Date: 08/Nov/19 01:03 Worklog Time Spent: 10m Work Description: chadrik commented on pull request #9965: [BEAM-8539] Make job state transitions in python-based runners consistent with java-based runners URL: https://github.com/apache/beam/pull/9965#discussion_r343949646 ## File path: model/job-management/src/main/proto/beam_job_api.proto ## @@ -213,16 +213,37 @@ message JobMessagesResponse { // without needing to pass through STARTING. message JobState { enum Enum { +// The job state reported by a runner cannot be interpreted by the SDK. UNSPECIFIED = 0; + +// The job has been paused, or has not yet started. STOPPED = 1; + +// The job is currently running. (terminal) RUNNING = 2; + +// The job has successfully completed. (terminal) DONE = 3; + +// The job has failed. (terminal) FAILED = 4; + +// The job has been explicitly cancelled. (terminal) CANCELLED = 5; + +// The job has been updated. UPDATED = 6; + +// The job is draining its data. DRAINING = 7; + +// The job has completed draining its data. (terminal) DRAINED = 8; + +// The job is starting up. STARTING = 9; + +// The job is cancelling. CANCELLING = 10; UPDATING = 11; Review comment: Ah, sorry. that was added by @lukecwik after I created this. Will do. This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. 
For queries about this service, please contact Infrastructure at: us...@infra.apache.org Issue Time Tracking --- Worklog Id: (was: 340262) Time Spent: 3h 50m (was: 3h 40m) > Clearly define the valid job state transitions > -- > > Key: BEAM-8539 > URL: https://issues.apache.org/jira/browse/BEAM-8539 > Project: Beam > Issue Type: Improvement > Components: beam-model, runner-core, sdk-java-core, sdk-py-core >Reporter: Chad Dombrova >Priority: Major > Time Spent: 3h 50m > Remaining Estimate: 0h > > The Beam job state transitions are ill-defined, which is big problem for > anything that relies on the values coming from JobAPI.GetStateStream. > I was hoping to find something like a state transition diagram in the docs so > that I could determine the start state, the terminal states, and the valid > transitions, but I could not find this. The code reveals that the SDKs differ > on the fundamentals: > Java InMemoryJobService: > * start state: *STOPPED* > * run - about to submit to executor: STARTING > * run - actually running on executor: RUNNING > * terminal states: DONE, FAILED, CANCELLED, DRAINED > Python AbstractJobServiceServicer / LocalJobServicer: > * start state: STARTING > * terminal states: DONE, FAILED, CANCELLED, *STOPPED* > I think it would be good to make python work like Java, so that there is a > difference in state between a job that has been prepared and one that has > additionally been run. > It's hard to tell how far this problem has spread within the various runners. > I think a simple thing that can be done to help standardize behavior is to > implement the terminal states as an enum in the beam_job_api.proto, or create > a utility function in each language for checking if a state is terminal, so > that it's not left up to each runner to reimplement this logic. > -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Work logged] (BEAM-8539) Clearly define the valid job state transitions
[ https://issues.apache.org/jira/browse/BEAM-8539?focusedWorklogId=340261=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-340261 ] ASF GitHub Bot logged work on BEAM-8539: Author: ASF GitHub Bot Created on: 08/Nov/19 01:02 Start Date: 08/Nov/19 01:02 Worklog Time Spent: 10m Work Description: robertwb commented on pull request #9965: [BEAM-8539] Make job state transitions in python-based runners consistent with java-based runners URL: https://github.com/apache/beam/pull/9965#discussion_r343949266 ## File path: model/job-management/src/main/proto/beam_job_api.proto ## @@ -213,16 +213,37 @@ message JobMessagesResponse { // without needing to pass through STARTING. message JobState { enum Enum { +// The job state reported by a runner cannot be interpreted by the SDK. UNSPECIFIED = 0; + +// The job has been paused, or has not yet started. STOPPED = 1; + +// The job is currently running. (terminal) RUNNING = 2; + +// The job has successfully completed. (terminal) DONE = 3; + +// The job has failed. (terminal) FAILED = 4; + +// The job has been explicitly cancelled. (terminal) CANCELLED = 5; + +// The job has been updated. UPDATED = 6; + +// The job is draining its data. DRAINING = 7; + +// The job has completed draining its data. (terminal) DRAINED = 8; + +// The job is starting up. STARTING = 9; + +// The job is cancelling. CANCELLING = 10; UPDATING = 11; Review comment: No comment on UPDATING? This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. 
For queries about this service, please contact Infrastructure at: us...@infra.apache.org Issue Time Tracking --- Worklog Id: (was: 340261) Time Spent: 3h 40m (was: 3.5h) > Clearly define the valid job state transitions > -- > > Key: BEAM-8539 > URL: https://issues.apache.org/jira/browse/BEAM-8539 > Project: Beam > Issue Type: Improvement > Components: beam-model, runner-core, sdk-java-core, sdk-py-core >Reporter: Chad Dombrova >Priority: Major > Time Spent: 3h 40m > Remaining Estimate: 0h > > The Beam job state transitions are ill-defined, which is big problem for > anything that relies on the values coming from JobAPI.GetStateStream. > I was hoping to find something like a state transition diagram in the docs so > that I could determine the start state, the terminal states, and the valid > transitions, but I could not find this. The code reveals that the SDKs differ > on the fundamentals: > Java InMemoryJobService: > * start state: *STOPPED* > * run - about to submit to executor: STARTING > * run - actually running on executor: RUNNING > * terminal states: DONE, FAILED, CANCELLED, DRAINED > Python AbstractJobServiceServicer / LocalJobServicer: > * start state: STARTING > * terminal states: DONE, FAILED, CANCELLED, *STOPPED* > I think it would be good to make python work like Java, so that there is a > difference in state between a job that has been prepared and one that has > additionally been run. > It's hard to tell how far this problem has spread within the various runners. > I think a simple thing that can be done to help standardize behavior is to > implement the terminal states as an enum in the beam_job_api.proto, or create > a utility function in each language for checking if a state is terminal, so > that it's not left up to each runner to reimplement this logic. > -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Work logged] (BEAM-8539) Clearly define the valid job state transitions
[ https://issues.apache.org/jira/browse/BEAM-8539?focusedWorklogId=340260=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-340260 ] ASF GitHub Bot logged work on BEAM-8539: Author: ASF GitHub Bot Created on: 08/Nov/19 01:02 Start Date: 08/Nov/19 01:02 Worklog Time Spent: 10m Work Description: robertwb commented on pull request #9965: [BEAM-8539] Make job state transitions in python-based runners consistent with java-based runners URL: https://github.com/apache/beam/pull/9965#discussion_r343949121 ## File path: sdks/python/apache_beam/runners/portability/local_job_service.py ## @@ -195,7 +195,7 @@ def __init__(self, self._state = None self._state_queues = [] self._log_queues = [] -self.state = beam_job_api_pb2.JobState.STARTING +self.state = beam_job_api_pb2.JobState.STOPPED Review comment: Why this change? This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org Issue Time Tracking --- Worklog Id: (was: 340260) Time Spent: 3.5h (was: 3h 20m) > Clearly define the valid job state transitions > -- > > Key: BEAM-8539 > URL: https://issues.apache.org/jira/browse/BEAM-8539 > Project: Beam > Issue Type: Improvement > Components: beam-model, runner-core, sdk-java-core, sdk-py-core >Reporter: Chad Dombrova >Priority: Major > Time Spent: 3.5h > Remaining Estimate: 0h > > The Beam job state transitions are ill-defined, which is big problem for > anything that relies on the values coming from JobAPI.GetStateStream. > I was hoping to find something like a state transition diagram in the docs so > that I could determine the start state, the terminal states, and the valid > transitions, but I could not find this. 
The code reveals that the SDKs differ > on the fundamentals: > Java InMemoryJobService: > * start state: *STOPPED* > * run - about to submit to executor: STARTING > * run - actually running on executor: RUNNING > * terminal states: DONE, FAILED, CANCELLED, DRAINED > Python AbstractJobServiceServicer / LocalJobServicer: > * start state: STARTING > * terminal states: DONE, FAILED, CANCELLED, *STOPPED* > I think it would be good to make python work like Java, so that there is a > difference in state between a job that has been prepared and one that has > additionally been run. > It's hard to tell how far this problem has spread within the various runners. > I think a simple thing that can be done to help standardize behavior is to > implement the terminal states as an enum in the beam_job_api.proto, or create > a utility function in each language for checking if a state is terminal, so > that it's not left up to each runner to reimplement this logic. > -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Commented] (BEAM-8588) MapTuple(fn) fails if fn has type hints but no default args
[ https://issues.apache.org/jira/browse/BEAM-8588?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16969686#comment-16969686 ] Udi Meiri commented on BEAM-8588: - One subtlety here is that it seems that the correct type hint for the input element is a Tuple (Tuple[int, int] in this case). > MapTuple(fn) fails if fn has type hints but no default args > --- > > Key: BEAM-8588 > URL: https://issues.apache.org/jira/browse/BEAM-8588 > Project: Beam > Issue Type: Bug > Components: sdk-py-core >Reporter: Udi Meiri >Assignee: Udi Meiri >Priority: Major > > {code} > def test_typed_maptuple(self): > def fn(e1: int, e2: int) -> int: > return e1 * e2 > result = [(1, 1), (2, 2)] | beam.MapTuple(fn) > self.assertEqual([2, 4], sorted(result)) > {code} > Fails in getcallargs_forhints_impl_py3 with: > {code} > > raise TypeCheckError(e) > E apache_beam.typehints.decorators.TypeCheckError: too many positional > arguments > {code} -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Created] (BEAM-8588) MapTuple(fn) fails if fn has type hints but no default args
Udi Meiri created BEAM-8588: --- Summary: MapTuple(fn) fails if fn has type hints but no default args Key: BEAM-8588 URL: https://issues.apache.org/jira/browse/BEAM-8588 Project: Beam Issue Type: Bug Components: sdk-py-core Reporter: Udi Meiri Assignee: Udi Meiri {code} def test_typed_maptuple(self): def fn(e1: int, e2: int) -> int: return e1 * e2 result = [(1, 1), (2, 2)] | beam.MapTuple(fn) self.assertEqual([2, 4], sorted(result)) {code} Fails in getcallargs_forhints_impl_py3 with: {code} > raise TypeCheckError(e) E apache_beam.typehints.decorators.TypeCheckError: too many positional arguments {code} -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Work logged] (BEAM-8581) Python SDK labels ontime empty panes as late
[ https://issues.apache.org/jira/browse/BEAM-8581?focusedWorklogId=340251=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-340251 ] ASF GitHub Bot logged work on BEAM-8581: Author: ASF GitHub Bot Created on: 08/Nov/19 00:15 Start Date: 08/Nov/19 00:15 Worklog Time Spent: 10m Work Description: rohdesamuel commented on issue #10035: [BEAM-8581] and [BEAM-8582] watermark and trigger fixes URL: https://github.com/apache/beam/pull/10035#issuecomment-551325756 R: @robertwb R: @dpmills Can you review this please? This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org Issue Time Tracking --- Worklog Id: (was: 340251) Time Spent: 20m (was: 10m) > Python SDK labels ontime empty panes as late > > > Key: BEAM-8581 > URL: https://issues.apache.org/jira/browse/BEAM-8581 > Project: Beam > Issue Type: Bug > Components: sdk-py-core >Reporter: Sam Rohde >Assignee: Sam Rohde >Priority: Major > Time Spent: 20m > Remaining Estimate: 0h > > The GeneralTriggerDriver does not put watermark holds on timers, leading to > the ontime empty pane being considered late data. > Fix: Add a new notion of whether a trigger has an ontime pane. If it does, > then set a watermark hold to end of window - 1. -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Work logged] (BEAM-8581) Python SDK labels ontime empty panes as late
[ https://issues.apache.org/jira/browse/BEAM-8581?focusedWorklogId=340250=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-340250 ] ASF GitHub Bot logged work on BEAM-8581: Author: ASF GitHub Bot Created on: 08/Nov/19 00:14 Start Date: 08/Nov/19 00:14 Worklog Time Spent: 10m Work Description: rohdesamuel commented on pull request #10035: [BEAM-8581] and [BEAM-8582] watermark and trigger fixes URL: https://github.com/apache/beam/pull/10035 The GeneralTriggerDriver does not put watermark holds on timers, leading to the ontime empty pane being considered late data. The DefaultTrigger and AfterWatermark do not clear their timers after the watermark passes the end of the window, leading to duplicate records being emitted. Thank you for your contribution! Follow this checklist to help us incorporate your contribution quickly and easily: - [ ] [**Choose reviewer(s)**](https://beam.apache.org/contribute/#make-your-change) and mention them in a comment (`R: @username`). - [ ] Format the pull request title like `[BEAM-XXX] Fixes bug in ApproximateQuantiles`, where you replace `BEAM-XXX` with the appropriate JIRA issue, if applicable. This will automatically link the pull request to the issue. - [ ] If this contribution is large, please file an Apache [Individual Contributor License Agreement](https://www.apache.org/licenses/icla.pdf). See the [Contributor Guide](https://beam.apache.org/contribute) for more tips on [how to make review process smoother](https://beam.apache.org/contribute/#make-reviewers-job-easier). 
Post-Commit Tests Status (on master branch): table of Jenkins build-status badges for the Go, Java, and Python SDKs across the Apex, Dataflow, Flink, Gearpump, Samza, and Spark runners (badge links truncated in the original).
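The second bug in this PR description lends itself to a toy sketch. The model below is illustrative Python only — not Beam's actual GeneralTriggerDriver — showing why an end-of-window timer that is never cleared fires again on every later watermark advance and duplicates the on-time pane.

```python
# Toy model (not Beam internals) of the timer bug described above: if the
# end-of-window timer is not cleared after it fires, the next watermark
# advance fires it again and the same pane is emitted twice.

class ToyTriggerDriver:
    def __init__(self, window_end, clear_timer_after_fire):
        self.window_end = window_end
        self.clear_timer_after_fire = clear_timer_after_fire
        self.timer = window_end          # end-of-window timer set on first element
        self.panes_emitted = []

    def advance_watermark(self, watermark):
        # Fire the timer once the watermark passes its timestamp.
        if self.timer is not None and watermark >= self.timer:
            self.panes_emitted.append(("ON_TIME", self.window_end))
            if self.clear_timer_after_fire:
                self.timer = None        # fixed behavior: the timer fires once

buggy = ToyTriggerDriver(window_end=10, clear_timer_after_fire=False)
fixed = ToyTriggerDriver(window_end=10, clear_timer_after_fire=True)
for wm in (5, 10, 15):                   # watermark passes the window end twice
    buggy.advance_watermark(wm)
    fixed.advance_watermark(wm)

print(len(buggy.panes_emitted))  # 2 -- duplicate on-time panes
print(len(fixed.panes_emitted))  # 1
```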
[jira] [Updated] (BEAM-8273) Improve worker script for environment_type=PROCESS
[ https://issues.apache.org/jira/browse/BEAM-8273?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Kyle Weaver updated BEAM-8273: -- Description: When environment_type=PROCESS, environment_config specifies the command to run the worker processes. Right now, it defaults to None and errors if not set (`TypeError: expected string or buffer`). It might not be feasible to offer a one-size-fits-all executable for providing as environment_config, but we could at least: a) make it easier to build one (right now I only see the executable being built in a test script that depends on docker: [https://github.com/apache/beam/blob/cbf8a900819c52940a0edd90f59bf6aec55c817a/sdks/python/test-suites/portable/py2/build.gradle#L146-L165]) b) document the process c) link to the documentation when no environment_config is provided was: When environment_type=PROCESS, environment_config specifies the command to run the worker processes. Right now, it defaults to None and errors if not set (`TypeError: expected string or buffer`). It might not be feasible to offer a one-size-fits-all executable for providing as environment_config, but we could at least: a) make it easier to build one (right now I only see the executable being built in a test script that depends on docker: ) b) document the process c) link to the documentation when no environment_config is provided > Improve worker script for environment_type=PROCESS > -- > > Key: BEAM-8273 > URL: https://issues.apache.org/jira/browse/BEAM-8273 > Project: Beam > Issue Type: Improvement > Components: sdk-py-harness >Reporter: Kyle Weaver >Assignee: Kyle Weaver >Priority: Major > > When environment_type=PROCESS, environment_config specifies the command to > run the worker processes. Right now, it defaults to None and errors if not > set (`TypeError: expected string or buffer`). 
> It might not be feasible to offer a one-size-fits-all executable for > providing as environment_config, but we could at least: > a) make it easier to build one (right now I only see the executable being > built in a test script that depends on docker: > [https://github.com/apache/beam/blob/cbf8a900819c52940a0edd90f59bf6aec55c817a/sdks/python/test-suites/portable/py2/build.gradle#L146-L165]) > b) document the process > c) link to the documentation when no environment_config is provided -- This message was sent by Atlassian Jira (v8.3.4#803005)
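Suggestion (c) above can be sketched as a validation helper. The `{"command": ...}` JSON shape follows the documented format for PROCESS environments; the helper itself and its error message are hypothetical, not the SDK's actual code path:

```python
import json

# Hypothetical sketch of point (c): parse environment_config up front and
# fail with a pointer to the docs instead of the opaque
# "TypeError: expected string or buffer".
DOCS_HINT = ("environment_config must be a JSON string such as "
             '\'{"command": "/path/to/worker/boot"}\' when '
             "environment_type=PROCESS is used")

def parse_process_environment_config(environment_config):
    if environment_config is None:
        # The current default (None) is what triggers the TypeError.
        raise ValueError(DOCS_HINT)
    config = json.loads(environment_config)
    if "command" not in config:
        raise ValueError(DOCS_HINT)
    return config

print(parse_process_environment_config('{"command": "/opt/apache/beam/boot"}'))
```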
[jira] [Work logged] (BEAM-8524) Stop using pubsub in fnapi streaming dataflow Impulse
[ https://issues.apache.org/jira/browse/BEAM-8524?focusedWorklogId=340248&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-340248 ] ASF GitHub Bot logged work on BEAM-8524: Author: ASF GitHub Bot Created on: 08/Nov/19 00:09 Start Date: 08/Nov/19 00:09 Worklog Time Spent: 10m Work Description: boyuanzz commented on pull request #10034: [BEAM-8524] Still use fake pubsub signals when using windmill appliance with data… URL: https://github.com/apache/beam/pull/10034 This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org Issue Time Tracking --- Worklog Id: (was: 340248) Time Spent: 0.5h (was: 20m) > Stop using pubsub in fnapi streaming dataflow Impulse > - > > Key: BEAM-8524 > URL: https://issues.apache.org/jira/browse/BEAM-8524 > Project: Beam > Issue Type: Improvement > Components: runner-dataflow > Reporter: Boyuan Zhang > Assignee: Boyuan Zhang > Priority: Major > Fix For: 2.18.0 > > Time Spent: 0.5h > Remaining Estimate: 0h > -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Resolved] (BEAM-8521) beam_PostCommit_XVR_Flink failing
[ https://issues.apache.org/jira/browse/BEAM-8521?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Kyle Weaver resolved BEAM-8521. --- Resolution: Fixed > beam_PostCommit_XVR_Flink failing > - > > Key: BEAM-8521 > URL: https://issues.apache.org/jira/browse/BEAM-8521 > Project: Beam > Issue Type: Bug > Components: test-failures >Reporter: Kyle Weaver >Assignee: Kyle Weaver >Priority: Major > Fix For: 2.18.0 > > Time Spent: 50m > Remaining Estimate: 0h > > https://builds.apache.org/job/beam_PostCommit_XVR_Flink/ > Edit: Made subtasks for what appear to be two separate issues. -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Resolved] (BEAM-8534) XlangParquetIOTest failing
[ https://issues.apache.org/jira/browse/BEAM-8534?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Kyle Weaver resolved BEAM-8534. --- Fix Version/s: Not applicable Resolution: Fixed
> XlangParquetIOTest failing
> --
>
> Key: BEAM-8534
> URL: https://issues.apache.org/jira/browse/BEAM-8534
> Project: Beam
> Issue Type: Sub-task
> Components: test-failures
> Reporter: Kyle Weaver
> Assignee: Heejong Lee
> Priority: Major
> Fix For: Not applicable
>
> 13:43:05 [grpc-default-executor-1] ERROR org.apache.beam.fn.harness.control.BeamFnControlClient - Exception while trying to handle InstructionRequest 10
> java.lang.IllegalArgumentException: unable to deserialize Custom DoFn With Execution Info
>     at org.apache.beam.sdk.util.SerializableUtils.deserializeFromByteArray(SerializableUtils.java:74)
>     at org.apache.beam.runners.core.construction.ParDoTranslation.doFnWithExecutionInformationFromProto(ParDoTranslation.java:609)
>     at org.apache.beam.runners.core.construction.ParDoTranslation.getDoFn(ParDoTranslation.java:285)
>     at org.apache.beam.fn.harness.DoFnPTransformRunnerFactory$Context.<init>(DoFnPTransformRunnerFactory.java:197)
>     at org.apache.beam.fn.harness.DoFnPTransformRunnerFactory.createRunnerForPTransform(DoFnPTransformRunnerFactory.java:96)
>     at org.apache.beam.fn.harness.DoFnPTransformRunnerFactory.createRunnerForPTransform(DoFnPTransformRunnerFactory.java:64)
>     at org.apache.beam.fn.harness.control.ProcessBundleHandler.createRunnerAndConsumersForPTransformRecursively(ProcessBundleHandler.java:194)
>     at org.apache.beam.fn.harness.control.ProcessBundleHandler.createRunnerAndConsumersForPTransformRecursively(ProcessBundleHandler.java:163)
>     at org.apache.beam.fn.harness.control.ProcessBundleHandler.processBundle(ProcessBundleHandler.java:290)
>     at org.apache.beam.fn.harness.control.BeamFnControlClient.delegateOnInstructionRequestType(BeamFnControlClient.java:160)
>     at org.apache.beam.fn.harness.control.BeamFnControlClient.lambda$processInstructionRequests$0(BeamFnControlClient.java:144)
>     at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
>     at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
>     at java.lang.Thread.run(Thread.java:748)
> Caused by: java.io.InvalidClassException: org.apache.beam.sdk.options.ValueProvider$StaticValueProvider; local class incompatible: stream classdesc serialVersionUID = -7089438576249123133, local class serialVersionUID = -7141898054594373712
>     at java.io.ObjectStreamClass.initNonProxy(ObjectStreamClass.java:699)
>     at java.io.ObjectInputStream.readNonProxyDesc(ObjectInputStream.java:1885)
>     at java.io.ObjectInputStream.readClassDesc(ObjectInputStream.java:1751)
>     at java.io.ObjectInputStream.readOrdinaryObject(ObjectInputStream.java:2042)
>     at java.io.ObjectInputStream.readObject0(ObjectInputStream.java:1573)
>     at java.io.ObjectInputStream.readArray(ObjectInputStream.java:1975)
>     at java.io.ObjectInputStream.readObject0(ObjectInputStream.java:1567)
>     at java.io.ObjectInputStream.defaultReadFields(ObjectInputStream.java:2287)
>     at java.io.ObjectInputStream.readSerialData(ObjectInputStream.java:2211)
>     at java.io.ObjectInputStream.readOrdinaryObject(ObjectInputStream.java:2069)
>     at java.io.ObjectInputStream.readObject0(ObjectInputStream.java:1573)
>     at java.io.ObjectInputStream.defaultReadFields(ObjectInputStream.java:2287)
>     at java.io.ObjectInputStream.readSerialData(ObjectInputStream.java:2211)
>     at java.io.ObjectInputStream.readOrdinaryObject(ObjectInputStream.java:2069)
>     at java.io.ObjectInputStream.readObject0(ObjectInputStream.java:1573)
>     at java.io.ObjectInputStream.readArray(ObjectInputStream.java:1975)
>     at java.io.ObjectInputStream.readObject0(ObjectInputStream.java:1567)
>     at java.io.ObjectInputStream.defaultReadFields(ObjectInputStream.java:2287)
>     at java.io.ObjectInputStream.readSerialData(ObjectInputStream.java:2211)
>     at java.io.ObjectInputStream.readOrdinaryObject(ObjectInputStream.java:2069)
>     at java.io.ObjectInputStream.readObject0(ObjectInputStream.java:1573)
>     at ... (remaining frames truncated in the original log)
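The `Caused by` above is Java's built-in serialization version check: the stream was written by a class whose `serialVersionUID` differs from the one loaded in the harness (typically a mismatched Beam SDK version between job submission and the SDK container). A rough Python analogue of that check, purely for illustration:

```python
import pickle

# Illustrative-only Python analogue of Java's serialVersionUID check that
# produced the InvalidClassException above: embed an explicit version in the
# serialized state and reject mismatches on load.

class VersionedOptions:
    SERIAL_VERSION = 1

    def __init__(self, value):
        self.value = value

    def __getstate__(self):
        return {"version": self.SERIAL_VERSION, "value": self.value}

    def __setstate__(self, state):
        if state["version"] != self.SERIAL_VERSION:
            # Mirrors "stream classdesc serialVersionUID = X, local class
            # serialVersionUID = Y" from the Java trace.
            raise ValueError(
                f"incompatible serialized version {state['version']}, "
                f"local version {self.SERIAL_VERSION}")
        self.value = state["value"]

blob = pickle.dumps(VersionedOptions(42))
restored = pickle.loads(blob)
print(restored.value)  # 42
```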
[jira] [Commented] (BEAM-8534) XlangParquetIOTest failing
[ https://issues.apache.org/jira/browse/BEAM-8534?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16969668#comment-16969668 ] Kyle Weaver commented on BEAM-8534: --- Fixed by [https://github.com/apache/beam/pull/10017]. [https://builds.apache.org/job/beam_PostCommit_XVR_Flink/]
> XlangParquetIOTest failing
> --
>
> Key: BEAM-8534
> URL: https://issues.apache.org/jira/browse/BEAM-8534
> Project: Beam
> Issue Type: Sub-task
> Components: test-failures
> Reporter: Kyle Weaver
> Assignee: Heejong Lee
> Priority: Major
>
> 13:43:05 [grpc-default-executor-1] ERROR org.apache.beam.fn.harness.control.BeamFnControlClient - Exception while trying to handle InstructionRequest 10
> java.lang.IllegalArgumentException: unable to deserialize Custom DoFn With Execution Info
>     at org.apache.beam.sdk.util.SerializableUtils.deserializeFromByteArray(SerializableUtils.java:74)
>     at org.apache.beam.runners.core.construction.ParDoTranslation.doFnWithExecutionInformationFromProto(ParDoTranslation.java:609)
>     at org.apache.beam.runners.core.construction.ParDoTranslation.getDoFn(ParDoTranslation.java:285)
>     at org.apache.beam.fn.harness.DoFnPTransformRunnerFactory$Context.<init>(DoFnPTransformRunnerFactory.java:197)
>     at org.apache.beam.fn.harness.DoFnPTransformRunnerFactory.createRunnerForPTransform(DoFnPTransformRunnerFactory.java:96)
>     at org.apache.beam.fn.harness.DoFnPTransformRunnerFactory.createRunnerForPTransform(DoFnPTransformRunnerFactory.java:64)
>     at org.apache.beam.fn.harness.control.ProcessBundleHandler.createRunnerAndConsumersForPTransformRecursively(ProcessBundleHandler.java:194)
>     at org.apache.beam.fn.harness.control.ProcessBundleHandler.createRunnerAndConsumersForPTransformRecursively(ProcessBundleHandler.java:163)
>     at org.apache.beam.fn.harness.control.ProcessBundleHandler.processBundle(ProcessBundleHandler.java:290)
>     at org.apache.beam.fn.harness.control.BeamFnControlClient.delegateOnInstructionRequestType(BeamFnControlClient.java:160)
>     at org.apache.beam.fn.harness.control.BeamFnControlClient.lambda$processInstructionRequests$0(BeamFnControlClient.java:144)
>     at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
>     at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
>     at java.lang.Thread.run(Thread.java:748)
> Caused by: java.io.InvalidClassException: org.apache.beam.sdk.options.ValueProvider$StaticValueProvider; local class incompatible: stream classdesc serialVersionUID = -7089438576249123133, local class serialVersionUID = -7141898054594373712
>     at java.io.ObjectStreamClass.initNonProxy(ObjectStreamClass.java:699)
>     at java.io.ObjectInputStream.readNonProxyDesc(ObjectInputStream.java:1885)
>     at java.io.ObjectInputStream.readClassDesc(ObjectInputStream.java:1751)
>     at java.io.ObjectInputStream.readOrdinaryObject(ObjectInputStream.java:2042)
>     at java.io.ObjectInputStream.readObject0(ObjectInputStream.java:1573)
>     at java.io.ObjectInputStream.readArray(ObjectInputStream.java:1975)
>     at java.io.ObjectInputStream.readObject0(ObjectInputStream.java:1567)
>     at java.io.ObjectInputStream.defaultReadFields(ObjectInputStream.java:2287)
>     at java.io.ObjectInputStream.readSerialData(ObjectInputStream.java:2211)
>     at java.io.ObjectInputStream.readOrdinaryObject(ObjectInputStream.java:2069)
>     at java.io.ObjectInputStream.readObject0(ObjectInputStream.java:1573)
>     at java.io.ObjectInputStream.defaultReadFields(ObjectInputStream.java:2287)
>     at java.io.ObjectInputStream.readSerialData(ObjectInputStream.java:2211)
>     at java.io.ObjectInputStream.readOrdinaryObject(ObjectInputStream.java:2069)
>     at java.io.ObjectInputStream.readObject0(ObjectInputStream.java:1573)
>     at java.io.ObjectInputStream.readArray(ObjectInputStream.java:1975)
>     at java.io.ObjectInputStream.readObject0(ObjectInputStream.java:1567)
>     at java.io.ObjectInputStream.defaultReadFields(ObjectInputStream.java:2287)
>     at java.io.ObjectInputStream.readSerialData(ObjectInputStream.java:2211)
>     at java.io.ObjectInputStream.readOrdinaryObject(ObjectInputStream.java:2069)
>     at java.io.ObjectInputStream.readObject0(ObjectInputStream.java:1573)
>     at ... (remaining frames truncated in the original log)
[jira] [Commented] (BEAM-8298) Implement state caching for side inputs
[ https://issues.apache.org/jira/browse/BEAM-8298?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16969667#comment-16969667 ] Jing Chen commented on BEAM-8298: - [~mxm] Would you mind sharing details on the issue? I am interested in working on it if it is still available. > Implement state caching for side inputs > --- > > Key: BEAM-8298 > URL: https://issues.apache.org/jira/browse/BEAM-8298 > Project: Beam > Issue Type: Improvement > Components: sdk-py-core > Reporter: Maximilian Michels > Priority: Major > > Caching is currently only implemented for user state. -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Resolved] (BEAM-8570) Use SDK version in default Java container tag
[ https://issues.apache.org/jira/browse/BEAM-8570?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Kyle Weaver resolved BEAM-8570. --- Fix Version/s: 2.18.0 Resolution: Fixed > Use SDK version in default Java container tag > - > > Key: BEAM-8570 > URL: https://issues.apache.org/jira/browse/BEAM-8570 > Project: Beam > Issue Type: Improvement > Components: sdk-java-harness >Reporter: Kyle Weaver >Assignee: Kyle Weaver >Priority: Major > Fix For: 2.18.0 > > Time Spent: 40m > Remaining Estimate: 0h > > Currently, the Java SDK uses container `apachebeam/java_sdk:latest` by > default [1]. This causes confusion when using locally built containers [2], > especially since images are automatically pulled, meaning the release image > is used instead of the developer's own image (BEAM-8545). > [[1] > https://github.com/apache/beam/blob/473377ef8f51949983508f70663e75ef0ee24a7f/runners/core-construction-java/src/main/java/org/apache/beam/runners/core/construction/Environments.java#L84-L91|https://github.com/apache/beam/blob/473377ef8f51949983508f70663e75ef0ee24a7f/runners/core-construction-java/src/main/java/org/apache/beam/runners/core/construction/Environments.java#L84-L91] > [[2] > https://lists.apache.org/thread.html/07131e314e229ec60100eaa2c0cf6dfc206bf2b0f78c3cee9ebb0bda@%3Cdev.beam.apache.org%3E|https://lists.apache.org/thread.html/07131e314e229ec60100eaa2c0cf6dfc206bf2b0f78c3cee9ebb0bda@%3Cdev.beam.apache.org%3E] -- This message was sent by Atlassian Jira (v8.3.4#803005)
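The idea of this fix can be sketched as follows. This is hypothetical illustration code — the function name and dev-tag convention are assumptions, not Beam's actual `Environments.java` logic — but it shows how deriving the tag from the SDK version avoids silently pulling the release `latest` image over a locally built one.

```python
# Hypothetical sketch (not Beam's Environments.java): derive the default
# container tag from the SDK version instead of always using "latest".

def default_java_container(sdk_version):
    # Development/snapshot builds keep a non-release tag so a released image
    # is never silently pulled in place of a locally built one.
    tag = sdk_version.replace("-SNAPSHOT", ".dev") if sdk_version else "latest"
    return f"apachebeam/java_sdk:{tag}"

print(default_java_container("2.18.0"))            # apachebeam/java_sdk:2.18.0
print(default_java_container("2.18.0-SNAPSHOT"))   # apachebeam/java_sdk:2.18.0.dev
```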
[jira] [Resolved] (BEAM-8294) Spark portable validates runner tests timing out
[ https://issues.apache.org/jira/browse/BEAM-8294?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Kyle Weaver resolved BEAM-8294. --- Fix Version/s: Not applicable Resolution: Fixed > Spark portable validates runner tests timing out > > > Key: BEAM-8294 > URL: https://issues.apache.org/jira/browse/BEAM-8294 > Project: Beam > Issue Type: Improvement > Components: runner-spark, test-failures, testing > Reporter: Kyle Weaver > Assignee: Kyle Weaver > Priority: Major > Labels: currently-failing, portability-spark > Fix For: Not applicable > > Time Spent: 1h 40m > Remaining Estimate: 0h > > This postcommit has been timing out for 11 days. > [https://github.com/apache/beam/pull/9095] has been merged for about 11 days. > Coincidence? I think NOT! ... Seriously, though, I wonder what it was about the SDK > worker management stack that caused this slowdown. > [https://builds.apache.org/job/beam_PostCommit_Java_PVR_Spark_Batch/] > -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Work logged] (BEAM-8575) Add more Python validates runner tests
[ https://issues.apache.org/jira/browse/BEAM-8575?focusedWorklogId=340238=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-340238 ] ASF GitHub Bot logged work on BEAM-8575: Author: ASF GitHub Bot Created on: 07/Nov/19 23:53 Start Date: 07/Nov/19 23:53 Worklog Time Spent: 10m Work Description: liumomo315 commented on issue #10033: [BEAM-8575] Add a trigger test to test Discarding accumulation mode w… URL: https://github.com/apache/beam/pull/10033#issuecomment-551320229 @robertwb Hi Robert, this PR is a test case to cover early data with Discarding accumulation mode using the test suite you added:) May I get your review? Thanks! This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org Issue Time Tracking --- Worklog Id: (was: 340238) Time Spent: 1h 10m (was: 1h) > Add more Python validates runner tests > -- > > Key: BEAM-8575 > URL: https://issues.apache.org/jira/browse/BEAM-8575 > Project: Beam > Issue Type: Test > Components: sdk-py-core, testing >Reporter: wendy liu >Assignee: wendy liu >Priority: Major > Time Spent: 1h 10m > Remaining Estimate: 0h > > This is the umbrella issue to track the work of adding more Python tests to > improve test coverage. -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Work logged] (BEAM-8575) Add more Python validates runner tests
[ https://issues.apache.org/jira/browse/BEAM-8575?focusedWorklogId=340237&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-340237 ] ASF GitHub Bot logged work on BEAM-8575: Author: ASF GitHub Bot Created on: 07/Nov/19 23:47 Start Date: 07/Nov/19 23:47 Worklog Time Spent: 10m Work Description: liumomo315 commented on pull request #10033: [BEAM-8575] Add a trigger test to test Discarding accumulation mode w… URL: https://github.com/apache/beam/pull/10033 …ith early data (standard pull request template checklist omitted)
[jira] [Commented] (BEAM-8561) Add ThriftIO to Support IO for Thrift Files
[ https://issues.apache.org/jira/browse/BEAM-8561?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16969660#comment-16969660 ] Chris Larsen commented on BEAM-8561: [~jkff] definitely, will do :) > Add ThriftIO to Support IO for Thrift Files > --- > > Key: BEAM-8561 > URL: https://issues.apache.org/jira/browse/BEAM-8561 > Project: Beam > Issue Type: New Feature > Components: io-java-files >Reporter: Chris Larsen >Assignee: Chris Larsen >Priority: Minor > > Similar to AvroIO it would be very useful to support reading and writing > to/from Thrift files with a native connector. > Functionality would include: > # read() - Reading from one or more Thrift files. > # write() - Writing to one or more Thrift files. -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Work logged] (BEAM-8457) Instrument Dataflow jobs that are launched from Notebooks
[ https://issues.apache.org/jira/browse/BEAM-8457?focusedWorklogId=340235=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-340235 ] ASF GitHub Bot logged work on BEAM-8457: Author: ASF GitHub Bot Created on: 07/Nov/19 23:43 Start Date: 07/Nov/19 23:43 Worklog Time Spent: 10m Work Description: KevinGG commented on issue #9885: [BEAM-8457] Label Dataflow jobs from Notebook URL: https://github.com/apache/beam/pull/9885#issuecomment-551317534 Run Portable_Python PreCommit This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org Issue Time Tracking --- Worklog Id: (was: 340235) Time Spent: 8h 40m (was: 8.5h) > Instrument Dataflow jobs that are launched from Notebooks > - > > Key: BEAM-8457 > URL: https://issues.apache.org/jira/browse/BEAM-8457 > Project: Beam > Issue Type: Improvement > Components: runner-py-interactive >Reporter: Ning Kang >Assignee: Ning Kang >Priority: Major > Fix For: 2.17.0 > > Time Spent: 8h 40m > Remaining Estimate: 0h > > Dataflow needs the capability to tell how many Dataflow jobs are launched > from the Notebook environment, i.e., the Interactive Runner. > # Change the pipeline.run() API to allow supply a runner and an option > parameter so that a pipeline initially bundled w/ an interactive runner can > be directly run by other runners from notebook. > # Implicitly add the necessary source information through user labels when > the user does p.run(runner=DataflowRunner()). -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Work logged] (BEAM-8570) Use SDK version in default Java container tag
[ https://issues.apache.org/jira/browse/BEAM-8570?focusedWorklogId=340236=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-340236 ] ASF GitHub Bot logged work on BEAM-8570: Author: ASF GitHub Bot Created on: 07/Nov/19 23:43 Start Date: 07/Nov/19 23:43 Worklog Time Spent: 10m Work Description: ibzib commented on pull request #10017: [BEAM-8570] Use SDK version in default Java container tag URL: https://github.com/apache/beam/pull/10017 This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org Issue Time Tracking --- Worklog Id: (was: 340236) Time Spent: 40m (was: 0.5h) > Use SDK version in default Java container tag > - > > Key: BEAM-8570 > URL: https://issues.apache.org/jira/browse/BEAM-8570 > Project: Beam > Issue Type: Improvement > Components: sdk-java-harness >Reporter: Kyle Weaver >Assignee: Kyle Weaver >Priority: Major > Time Spent: 40m > Remaining Estimate: 0h > > Currently, the Java SDK uses container `apachebeam/java_sdk:latest` by > default [1]. This causes confusion when using locally built containers [2], > especially since images are automatically pulled, meaning the release image > is used instead of the developer's own image (BEAM-8545). 
> [[1] > https://github.com/apache/beam/blob/473377ef8f51949983508f70663e75ef0ee24a7f/runners/core-construction-java/src/main/java/org/apache/beam/runners/core/construction/Environments.java#L84-L91|https://github.com/apache/beam/blob/473377ef8f51949983508f70663e75ef0ee24a7f/runners/core-construction-java/src/main/java/org/apache/beam/runners/core/construction/Environments.java#L84-L91] > [[2] > https://lists.apache.org/thread.html/07131e314e229ec60100eaa2c0cf6dfc206bf2b0f78c3cee9ebb0bda@%3Cdev.beam.apache.org%3E|https://lists.apache.org/thread.html/07131e314e229ec60100eaa2c0cf6dfc206bf2b0f78c3cee9ebb0bda@%3Cdev.beam.apache.org%3E] -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Work logged] (BEAM-876) Support schemaUpdateOption in BigQueryIO
[ https://issues.apache.org/jira/browse/BEAM-876?focusedWorklogId=340234&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-340234 ] ASF GitHub Bot logged work on BEAM-876: --- Author: ASF GitHub Bot Created on: 07/Nov/19 23:41 Start Date: 07/Nov/19 23:41 Worklog Time Spent: 10m Work Description: ziel commented on issue #9524: [BEAM-876] Support schemaUpdateOption in BigQueryIO URL: https://github.com/apache/beam/pull/9524#issuecomment-551317214 @pabloem I found some time to write up an integration test. I'm seeing a failing check (`org.apache.beam.sdk.io.jms.JmsIOTest.testCheckpointMarkSafety`) which I think may be unrelated, though I may be missing the connection. There are a number of lines like this in the output: `java.lang.SecurityException: User name [test_user] or password is invalid.` ...so I suspect this may be some sort of CI setup issue. This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org Issue Time Tracking --- Worklog Id: (was: 340234) Time Spent: 2h 10m (was: 2h) > Support schemaUpdateOption in BigQueryIO > > > Key: BEAM-876 > URL: https://issues.apache.org/jira/browse/BEAM-876 > Project: Beam > Issue Type: Improvement > Components: io-java-gcp > Reporter: Eugene Kirpichov > Assignee: canaan silberberg > Priority: Major > Time Spent: 2h 10m > Remaining Estimate: 0h > > BigQuery recently added support for updating the schema as a side effect of > the load job. > Here is the relevant API method in JobConfigurationLoad: > https://developers.google.com/resources/api-libraries/documentation/bigquery/v2/java/latest/com/google/api/services/bigquery/model/JobConfigurationLoad.html#setSchemaUpdateOptions(java.util.List) > BigQueryIO should support this too.
See user request for this: > http://stackoverflow.com/questions/40333245/is-it-possible-to-update-schema-while-doing-a-load-into-an-existing-bigquery-tab -- This message was sent by Atlassian Jira (v8.3.4#803005)
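At the REST level, the feature requested here corresponds to the `schemaUpdateOptions` field of a BigQuery load job configuration (`ALLOW_FIELD_ADDITION` / `ALLOW_FIELD_RELAXATION`). The helper below is an illustrative sketch of the JSON such a job would carry, not BigQueryIO code:

```python
# Illustrative sketch (not BigQueryIO): build the jobs.insert load
# configuration that allows schema widening on an existing table.

def load_job_config(table, allow_field_addition=False):
    config = {
        "load": {
            "destinationTable": table,
            "writeDisposition": "WRITE_APPEND",
        }
    }
    if allow_field_addition:
        # Lets the load job add new nullable columns as a side effect,
        # which is exactly what this issue asks BigQueryIO to expose.
        config["load"]["schemaUpdateOptions"] = ["ALLOW_FIELD_ADDITION"]
    return config

cfg = load_job_config(
    {"projectId": "my-project", "datasetId": "logs", "tableId": "events"},
    allow_field_addition=True)
print(cfg["load"]["schemaUpdateOptions"])  # ['ALLOW_FIELD_ADDITION']
```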
[jira] [Work logged] (BEAM-8512) Add integration tests for Python "flink_runner.py"
[ https://issues.apache.org/jira/browse/BEAM-8512?focusedWorklogId=340233=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-340233 ] ASF GitHub Bot logged work on BEAM-8512: Author: ASF GitHub Bot Created on: 07/Nov/19 23:41 Start Date: 07/Nov/19 23:41 Worklog Time Spent: 10m Work Description: ibzib commented on issue #9998: [BEAM-8512] Add integration tests for flink_runner.py URL: https://github.com/apache/beam/pull/9998#issuecomment-551317048 Run Python 3.7 PostCommit This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org Issue Time Tracking --- Worklog Id: (was: 340233) Time Spent: 2h (was: 1h 50m) > Add integration tests for Python "flink_runner.py" > -- > > Key: BEAM-8512 > URL: https://issues.apache.org/jira/browse/BEAM-8512 > Project: Beam > Issue Type: Test > Components: runner-flink, sdk-py-core >Reporter: Maximilian Michels >Assignee: Kyle Weaver >Priority: Major > Time Spent: 2h > Remaining Estimate: 0h > > There are currently no integration tests for the Python FlinkRunner. We need > a set of tests similar to {{flink_runner_test.py}} which currently use the > PortableRunner and not the FlinkRunner. > CC [~robertwb] [~ibzib] [~thw] -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Work logged] (BEAM-8512) Add integration tests for Python "flink_runner.py"
[ https://issues.apache.org/jira/browse/BEAM-8512?focusedWorklogId=340232=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-340232 ] ASF GitHub Bot logged work on BEAM-8512: Author: ASF GitHub Bot Created on: 07/Nov/19 23:41 Start Date: 07/Nov/19 23:41 Worklog Time Spent: 10m Work Description: ibzib commented on issue #9998: [BEAM-8512] Add integration tests for flink_runner.py URL: https://github.com/apache/beam/pull/9998#issuecomment-551317000 Run Portable_Python PreCommit This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org Issue Time Tracking --- Worklog Id: (was: 340232) Time Spent: 1h 50m (was: 1h 40m) > Add integration tests for Python "flink_runner.py" > -- > > Key: BEAM-8512 > URL: https://issues.apache.org/jira/browse/BEAM-8512 > Project: Beam > Issue Type: Test > Components: runner-flink, sdk-py-core >Reporter: Maximilian Michels >Assignee: Kyle Weaver >Priority: Major > Time Spent: 1h 50m > Remaining Estimate: 0h > > There are currently no integration tests for the Python FlinkRunner. We need > a set of tests similar to {{flink_runner_test.py}} which currently use the > PortableRunner and not the FlinkRunner. > CC [~robertwb] [~ibzib] [~thw] -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Work logged] (BEAM-8512) Add integration tests for Python "flink_runner.py"
[ https://issues.apache.org/jira/browse/BEAM-8512?focusedWorklogId=340230=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-340230 ] ASF GitHub Bot logged work on BEAM-8512: Author: ASF GitHub Bot Created on: 07/Nov/19 23:31 Start Date: 07/Nov/19 23:31 Worklog Time Spent: 10m Work Description: ibzib commented on pull request #9998: [BEAM-8512] Add integration tests for flink_runner.py URL: https://github.com/apache/beam/pull/9998#discussion_r343858825 ## File path: buildSrc/src/main/groovy/org/apache/beam/gradle/BeamModulePlugin.groovy ## @@ -1953,8 +1953,10 @@ class BeamModulePlugin implements Plugin { } project.ext.addPortableWordCountTasks = { -> -addPortableWordCountTask(false) -addPortableWordCountTask(true) +addPortableWordCountTask(false, "PortableRunner") Review comment: > Theoretically, the PortableRunner is a subset of FlinkRunner, but there are still things that can go wrong +1 As long as PortableRunner remains a publicly documented option (and it will probably have to for at least a little while longer) we should test it. Also, they are testing slightly different things -- PortableRunner starts a containerized job server, while FlinkRunner (when local) starts the job server in a Java subprocess. > Since these tasks are Flink specific, should they be renamed and possibly moved out of BeamModulePlugin? I'm going to add these tasks to Spark ([eventually](https://issues.apache.org/jira/browse/BEAM-7224)). This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. 
For queries about this service, please contact Infrastructure at: us...@infra.apache.org Issue Time Tracking --- Worklog Id: (was: 340230) Time Spent: 1h 40m (was: 1.5h) > Add integration tests for Python "flink_runner.py" > -- > > Key: BEAM-8512 > URL: https://issues.apache.org/jira/browse/BEAM-8512 > Project: Beam > Issue Type: Test > Components: runner-flink, sdk-py-core >Reporter: Maximilian Michels >Assignee: Kyle Weaver >Priority: Major > Time Spent: 1h 40m > Remaining Estimate: 0h > > There are currently no integration tests for the Python FlinkRunner. We need > a set of tests similar to {{flink_runner_test.py}} which currently use the > PortableRunner and not the FlinkRunner. > CC [~robertwb] [~ibzib] [~thw] -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Updated] (BEAM-8583) [SQL] BigQuery should support predicate push-down in DIRECT_READ mode
[ https://issues.apache.org/jira/browse/BEAM-8583?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Kirill Kozlov updated BEAM-8583: Description: * Add BigQuery Dialect with TypeTranslation (since it is not implemented in Calcite 1.20.0, but is present in unreleased versions). * Create a BigQueryFilter class. * BigQueryTable#buildIOReader should translate supported filters into a Sql string and pass it to BigQueryIO. Potential improvements: * After updating vendor Calcite, class `BigQuerySqlDialectWithTypeTranslation` can be deleted and Calcite's `BigQuerySqlDialect` can be utilized instead. * Once BigQuery adds support for more filters, `BigQueryFilter#isSupported` should be updated. > [SQL] BigQuery should support predicate push-down in DIRECT_READ mode > - > > Key: BEAM-8583 > URL: https://issues.apache.org/jira/browse/BEAM-8583 > Project: Beam > Issue Type: Improvement > Components: dsl-sql >Reporter: Kirill Kozlov >Assignee: Kirill Kozlov >Priority: Major > Time Spent: 10m > Remaining Estimate: 0h > > * Add BigQuery Dialect with TypeTranslation (since it is not implemented in > Calcite 1.20.0, but is present in unreleased versions). > * Create a BigQueryFilter class. > * BigQueryTable#buildIOReader should translate supported filters into a Sql > string and pass it to BigQueryIO. > > Potential improvements: > * After updating vendor Calcite, class > `BigQuerySqlDialectWithTypeTranslation` can be deleted and Calcite's > `BigQuerySqlDialect` can be utilized instead. > * Once BigQuery adds support for more filters, `BigQueryFilter#isSupported` > should be updated. -- This message was sent by Atlassian Jira (v8.3.4#803005)
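The core of the BEAM-8583 plan — `BigQueryTable#buildIOReader` translating supported filters into a SQL string — can be sketched as follows. This is an illustrative toy, not Beam's implementation: the function name, the triple representation of filters, and the supported-operator set are all hypothetical stand-ins for what `BigQueryFilter#isSupported` and the dialect translation would decide.

```python
# Hypothetical operator whitelist, standing in for BigQueryFilter#isSupported.
SUPPORTED_OPS = {"=", "<>", "<", ">", "<=", ">="}

def filters_to_sql(filters):
    """Translate (field, op, literal) triples into a SQL predicate string
    that could be pushed down to a storage read (e.g. as a row restriction)."""
    clauses = []
    for field, op, value in filters:
        if op not in SUPPORTED_OPS:
            raise ValueError(f"unsupported operator, cannot push down: {op}")
        literal = f"'{value}'" if isinstance(value, str) else str(value)
        clauses.append(f"{field} {op} {literal}")
    return " AND ".join(clauses)

print(filters_to_sql([("age", ">", 21), ("state", "=", "WA")]))
# age > 21 AND state = 'WA'
```

Unsupported filters must be rejected (and left for a downstream Calc to evaluate) rather than silently dropped, since dropping one would change query results.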
[jira] [Work logged] (BEAM-8508) [SQL] Support predicate push-down without project push-down
[ https://issues.apache.org/jira/browse/BEAM-8508?focusedWorklogId=340222=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-340222 ] ASF GitHub Bot logged work on BEAM-8508: Author: ASF GitHub Bot Created on: 07/Nov/19 23:10 Start Date: 07/Nov/19 23:10 Worklog Time Spent: 10m Work Description: apilloud commented on pull request #9943: [BEAM-8508] [SQL] Standalone filter push down URL: https://github.com/apache/beam/pull/9943#discussion_r343922194 ## File path: sdks/java/extensions/sql/src/main/java/org/apache/beam/sdk/extensions/sql/meta/provider/bigquery/BigQueryTable.java ## @@ -138,12 +138,7 @@ public BeamTableStatistics getTableStatistics(PipelineOptions options) { builder.withSelectedFields(fieldNames); Review comment: Talked about this more, `builder` isn't actually a builder. This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org Issue Time Tracking --- Worklog Id: (was: 340222) Time Spent: 4h 40m (was: 4.5h) > [SQL] Support predicate push-down without project push-down > --- > > Key: BEAM-8508 > URL: https://issues.apache.org/jira/browse/BEAM-8508 > Project: Beam > Issue Type: Improvement > Components: dsl-sql >Reporter: Kirill Kozlov >Assignee: Kirill Kozlov >Priority: Major > Time Spent: 4h 40m > Remaining Estimate: 0h > > In this PR: [https://github.com/apache/beam/pull/9863] > Support for Predicate push-down is added, but only for IOs that support > project push-down. > In order to accomplish that some checks need to be added to not perform > certain Calc and IO manipulations when only filter push-down is needed. -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Work logged] (BEAM-8294) Spark portable validates runner tests timing out
[ https://issues.apache.org/jira/browse/BEAM-8294?focusedWorklogId=340220=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-340220 ] ASF GitHub Bot logged work on BEAM-8294: Author: ASF GitHub Bot Created on: 07/Nov/19 23:07 Start Date: 07/Nov/19 23:07 Worklog Time Spent: 10m Work Description: ibzib commented on pull request #: [BEAM-8294] run Spark portable validates runner tests in parallel URL: https://github.com/apache/beam/pull/ This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org Issue Time Tracking --- Worklog Id: (was: 340220) Time Spent: 1h 40m (was: 1.5h) > Spark portable validates runner tests timing out > > > Key: BEAM-8294 > URL: https://issues.apache.org/jira/browse/BEAM-8294 > Project: Beam > Issue Type: Improvement > Components: runner-spark, test-failures, testing >Reporter: Kyle Weaver >Assignee: Kyle Weaver >Priority: Major > Labels: currently-failing, portability-spark > Time Spent: 1h 40m > Remaining Estimate: 0h > > This postcommit has been timing out for 11 days. > [https://github.com/apache/beam/pull/9095] has been merged for about 11 days. > Coincidence? I think NOT! ... Seriously, though, I wonder what about the SDK > worker management stack caused this to slow down. > [https://builds.apache.org/job/beam_PostCommit_Java_PVR_Spark_Batch/] >
[jira] [Work logged] (BEAM-8508) [SQL] Support predicate push-down without project push-down
[ https://issues.apache.org/jira/browse/BEAM-8508?focusedWorklogId=340219=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-340219 ] ASF GitHub Bot logged work on BEAM-8508: Author: ASF GitHub Bot Created on: 07/Nov/19 23:02 Start Date: 07/Nov/19 23:02 Worklog Time Spent: 10m Work Description: apilloud commented on pull request #9943: [BEAM-8508] [SQL] Standalone filter push down URL: https://github.com/apache/beam/pull/9943 This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org Issue Time Tracking --- Worklog Id: (was: 340219) Time Spent: 4.5h (was: 4h 20m) > [SQL] Support predicate push-down without project push-down > --- > > Key: BEAM-8508 > URL: https://issues.apache.org/jira/browse/BEAM-8508 > Project: Beam > Issue Type: Improvement > Components: dsl-sql >Reporter: Kirill Kozlov >Assignee: Kirill Kozlov >Priority: Major > Time Spent: 4.5h > Remaining Estimate: 0h > > In this PR: [https://github.com/apache/beam/pull/9863] > Support for Predicate push-down is added, but only for IOs that support > project push-down. > In order to accomplish that some checks need to be added to not perform > certain Calc and IO manipulations when only filter push-down is needed. -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Work logged] (BEAM-8508) [SQL] Support predicate push-down without project push-down
[ https://issues.apache.org/jira/browse/BEAM-8508?focusedWorklogId=340218=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-340218 ] ASF GitHub Bot logged work on BEAM-8508: Author: ASF GitHub Bot Created on: 07/Nov/19 23:01 Start Date: 07/Nov/19 23:01 Worklog Time Spent: 10m Work Description: apilloud commented on pull request #9943: [BEAM-8508] [SQL] Standalone filter push down URL: https://github.com/apache/beam/pull/9943#discussion_r343919486 ## File path: sdks/java/extensions/sql/src/main/java/org/apache/beam/sdk/extensions/sql/impl/rel/BeamPushDownIOSourceRel.java ## @@ -0,0 +1,111 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements. See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership. The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License. You may obtain a copy of the License at
+ *
+ * http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+package org.apache.beam.sdk.extensions.sql.impl.rel;
+
+import static org.apache.beam.vendor.calcite.v1_20_0.com.google.common.base.Preconditions.checkArgument;
+
+import java.util.List;
+import java.util.Map;
+import org.apache.beam.sdk.extensions.sql.impl.BeamCalciteTable;
+import org.apache.beam.sdk.extensions.sql.impl.planner.BeamCostModel;
+import org.apache.beam.sdk.extensions.sql.impl.utils.CalciteUtils;
+import org.apache.beam.sdk.extensions.sql.meta.BeamSqlTable;
+import org.apache.beam.sdk.extensions.sql.meta.BeamSqlTableFilter;
+import org.apache.beam.sdk.extensions.sql.meta.DefaultTableFilter;
+import org.apache.beam.sdk.schemas.Schema;
+import org.apache.beam.sdk.transforms.PTransform;
+import org.apache.beam.sdk.values.PBegin;
+import org.apache.beam.sdk.values.PCollection;
+import org.apache.beam.sdk.values.PCollectionList;
+import org.apache.beam.sdk.values.Row;
+import org.apache.beam.vendor.calcite.v1_20_0.org.apache.calcite.plan.RelOptCluster;
+import org.apache.beam.vendor.calcite.v1_20_0.org.apache.calcite.plan.RelOptPlanner;
+import org.apache.beam.vendor.calcite.v1_20_0.org.apache.calcite.plan.RelOptTable;
+import org.apache.beam.vendor.calcite.v1_20_0.org.apache.calcite.plan.RelTraitSet;
+import org.apache.beam.vendor.calcite.v1_20_0.org.apache.calcite.rel.RelWriter;
+import org.apache.beam.vendor.calcite.v1_20_0.org.apache.calcite.rel.metadata.RelMetadataQuery;
+
+public class BeamPushDownIOSourceRel extends BeamIOSourceRel {
+  private final List<String> usedFields;
+  private final BeamSqlTableFilter tableFilters;
+
+  public BeamPushDownIOSourceRel(
+      RelOptCluster cluster,
+      RelTraitSet traitSet,
+      RelOptTable table,
+      BeamSqlTable beamTable,
+      List<String> usedFields,
+      BeamSqlTableFilter tableFilters,
+      Map<String, String> pipelineOptions,
+      BeamCalciteTable calciteTable) {
+    super(cluster, traitSet, table, beamTable, pipelineOptions, calciteTable);
+    this.usedFields = usedFields;
+    this.tableFilters = tableFilters;
+  }
+
+  @Override
+  public RelWriter explainTerms(RelWriter pw) {
+    super.explainTerms(pw);
+
+    // This is done to tell Calcite planner that BeamIOSourceRel cannot be simply substituted by
+    // another BeamIOSourceRel, except for when they carry the same content.
+    if (!usedFields.isEmpty()) {
+      pw.item("usedFields", usedFields.toString());
+    }
+    if (!(tableFilters instanceof DefaultTableFilter)) {
+      pw.item(tableFilters.getClass().getSimpleName(), tableFilters.toString());
+    }
+
+    return pw;
+  }
+
+  @Override
+  public PTransform<PCollectionList<Row>, PCollection<Row>> buildPTransform() {
+    return new Transform();
+  }
+
+  private class Transform extends PTransform<PCollectionList<Row>, PCollection<Row>> {
+
+    @Override
+    public PCollection<Row> expand(PCollectionList<Row> input) {
+      checkArgument(
+          input.size() == 0,
+          "Should not have received input for %s: %s",
+          BeamIOSourceRel.class.getSimpleName(),
+          input);
+
+      final PBegin begin = input.getPipeline().begin();
+      final BeamSqlTable beamSqlTable = BeamPushDownIOSourceRel.this.getBeamSqlTable();
+
+      if (usedFields.isEmpty() && tableFilters instanceof DefaultTableFilter) {
+        return beamSqlTable.buildIOReader(begin);
+      }
+
+      final Schema newBeamSchema = CalciteUtils.toSchema(getRowType());
+      return beamSqlTable
+          .buildIOReader(begin, tableFilters, usedFields)
[jira] [Work logged] (BEAM-8508) [SQL] Support predicate push-down without project push-down
[ https://issues.apache.org/jira/browse/BEAM-8508?focusedWorklogId=340216=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-340216 ] ASF GitHub Bot logged work on BEAM-8508: Author: ASF GitHub Bot Created on: 07/Nov/19 22:58 Start Date: 07/Nov/19 22:58 Worklog Time Spent: 10m Work Description: apilloud commented on pull request #9943: [BEAM-8508] [SQL] Standalone filter push down URL: https://github.com/apache/beam/pull/9943#discussion_r343918593 ## File path: sdks/java/extensions/sql/src/main/java/org/apache/beam/sdk/extensions/sql/meta/provider/bigquery/BigQueryTable.java ## @@ -138,12 +138,7 @@ public BeamTableStatistics getTableStatistics(PipelineOptions options) { builder.withSelectedFields(fieldNames); Review comment: I think this is fine as it is. Builder normally stores the changes and just returns itself as a convenience. This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org Issue Time Tracking --- Worklog Id: (was: 340216) Time Spent: 4h 10m (was: 4h) > [SQL] Support predicate push-down without project push-down > --- > > Key: BEAM-8508 > URL: https://issues.apache.org/jira/browse/BEAM-8508 > Project: Beam > Issue Type: Improvement > Components: dsl-sql >Reporter: Kirill Kozlov >Assignee: Kirill Kozlov >Priority: Major > Time Spent: 4h 10m > Remaining Estimate: 0h > > In this PR: [https://github.com/apache/beam/pull/9863] > Support for Predicate push-down is added, but only for IOs that support > project push-down. > In order to accomplish that some checks need to be added to not perform > certain Calc and IO manipulations when only filter push-down is needed. -- This message was sent by Atlassian Jira (v8.3.4#803005)
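The review exchange above turns on the builder-style convention that an object "stores the changes and just returns itself as a convenience". A minimal Python sketch of that convention (the class and method names are hypothetical, not BigQueryIO's API):

```python
class ReadBuilder:
    """Toy fluent-style object: mutates itself and returns itself so calls chain."""

    def __init__(self):
        self.selected_fields = None

    def with_selected_fields(self, fields):
        # Store the change in place; the return value only enables chaining,
        # so ignoring it (as in the reviewed BigQueryTable code) is still safe.
        self.selected_fields = list(fields)
        return self

b = ReadBuilder()
b.with_selected_fields(["user_id", "state"])  # return value deliberately unused
print(b.selected_fields)  # ['user_id', 'state']
```

This is exactly why the reviewer concluded the code "is fine as it is": whether or not the result of `withSelectedFields` is captured, the stored state is the same. An immutable builder, by contrast, would make discarding the return value a bug.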
[jira] [Commented] (BEAM-8418) Fix handling of Impulse transform in Dataflow runner.
[ https://issues.apache.org/jira/browse/BEAM-8418?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16969621#comment-16969621 ] Valentyn Tymofieiev commented on BEAM-8418: --- For anyone following this, the issue is fixed in 2.17.0. The workaround on earlier releases is to replace _ = p | beam.Impulse() with _ = p | beam.Create([None]) or something similar. > Fix handling of Impulse transform in Dataflow runner. > -- > > Key: BEAM-8418 > URL: https://issues.apache.org/jira/browse/BEAM-8418 > Project: Beam > Issue Type: Bug > Components: sdk-py-core >Reporter: Valentyn Tymofieiev >Assignee: Valentyn Tymofieiev >Priority: Major > Fix For: 2.17.0 > > Time Spent: 2h 50m > Remaining Estimate: 0h > > The following pipeline fails on the Dataflow runner unless we use the beam_fn_api > experiment. > {noformat} > class NoOpDoFn(beam.DoFn): > def process(self, element): > return element > p = beam.Pipeline(options=pipeline_options) > _ = p | beam.Impulse() | beam.ParDo(NoOpDoFn()) > result = p.run() > {noformat} > The reason is that we encode the Impulse payload using url-escaping in [1], while > Dataflow runner expects base64 encoding in non-fnapi mode. In FnApi mode, DF > runner expects URL escaping. > We should fix or reconcile the encoding in non-FnAPI path, and add a > ValidatesRunner test that catches this error. > [1] > https://github.com/apache/beam/blob/12d07745835e1b9c1e824b83beeeadf63ab4b234/sdks/python/apache_beam/runners/dataflow/dataflow_runner.py#L633
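The encoding mismatch described in BEAM-8418 is easy to see with the standard library alone: base64 and URL escaping produce entirely different strings for the same payload bytes, so a consumer expecting one cannot decode the other. The payload bytes below are arbitrary example data, not Beam's actual Impulse payload.

```python
import base64
import urllib.parse

payload = b"\x00impulse\xff"  # arbitrary bytes standing in for a transform payload

b64 = base64.b64encode(payload).decode("ascii")   # what non-FnAPI Dataflow expects
url = urllib.parse.quote_from_bytes(payload)      # what the SDK was sending

print(b64)  # AGltcHVsc2X/
print(url)  # %00impulse%FF
```

Since the two encodings disagree on every non-trivial input, the non-FnAPI path silently misinterprets the payload — hence the fix (and the suggested ValidatesRunner test) rather than a decode-time workaround.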
[jira] [Created] (BEAM-8587) Add TestStream support for Dataflow runner
Andrew Crites created BEAM-8587: --- Summary: Add TestStream support for Dataflow runner Key: BEAM-8587 URL: https://issues.apache.org/jira/browse/BEAM-8587 Project: Beam Issue Type: Improvement Components: runner-dataflow, testing Reporter: Andrew Crites TestStream support is needed to test features like late data and processing-time triggers on the local Dataflow runner.
[jira] [Work logged] (BEAM-8575) Add more Python validates runner tests
[ https://issues.apache.org/jira/browse/BEAM-8575?focusedWorklogId=340193=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-340193 ] ASF GitHub Bot logged work on BEAM-8575: Author: ASF GitHub Bot Created on: 07/Nov/19 22:35 Start Date: 07/Nov/19 22:35 Worklog Time Spent: 10m Work Description: liumomo315 commented on issue #9957: [BEAM-8575] Add validates runner tests for 1. Custom window fn: Test a customized window fn work as expected; 2. Windows idempotency: Applying the same window fn (or window fn + GBK) to the input multiple times will have the same effect as applying it once. URL: https://github.com/apache/beam/pull/9957#issuecomment-551297870 Hi Luke, sorry that I missed to read the "how to make review process smoother" and merged my commits locally and force pushed to origin. Will make sure not do that in the future. @lukecwik This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org Issue Time Tracking --- Worklog Id: (was: 340193) Time Spent: 50m (was: 40m) > Add more Python validates runner tests > -- > > Key: BEAM-8575 > URL: https://issues.apache.org/jira/browse/BEAM-8575 > Project: Beam > Issue Type: Test > Components: sdk-py-core, testing >Reporter: wendy liu >Assignee: wendy liu >Priority: Major > Time Spent: 50m > Remaining Estimate: 0h > > This is the umbrella issue to track the work of adding more Python tests to > improve test coverage. -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Work logged] (BEAM-8427) [SQL] Add support for MongoDB source
[ https://issues.apache.org/jira/browse/BEAM-8427?focusedWorklogId=340183=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-340183 ] ASF GitHub Bot logged work on BEAM-8427: Author: ASF GitHub Bot Created on: 07/Nov/19 22:15 Start Date: 07/Nov/19 22:15 Worklog Time Spent: 10m Work Description: 11moon11 commented on issue #10031: [BEAM-8427] Create a table and a table provider for MongoDB URL: https://github.com/apache/beam/pull/10031#issuecomment-551291329 Run sql postcommit This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org Issue Time Tracking --- Worklog Id: (was: 340183) Time Spent: 5h 40m (was: 5.5h) > [SQL] Add support for MongoDB source > > > Key: BEAM-8427 > URL: https://issues.apache.org/jira/browse/BEAM-8427 > Project: Beam > Issue Type: New Feature > Components: dsl-sql >Reporter: Kirill Kozlov >Assignee: Kirill Kozlov >Priority: Major > Time Spent: 5h 40m > Remaining Estimate: 0h > > In progress: > * Create a MongoDB table and table provider. > * Implement buildIOReader > * Support primitive types > Still needs to be done: > * Implement buildIOWrite > * improve getTableStatistics -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Work logged] (BEAM-8427) [SQL] Add support for MongoDB source
[ https://issues.apache.org/jira/browse/BEAM-8427?focusedWorklogId=340182=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-340182 ] ASF GitHub Bot logged work on BEAM-8427: Author: ASF GitHub Bot Created on: 07/Nov/19 22:15 Start Date: 07/Nov/19 22:15 Worklog Time Spent: 10m Work Description: 11moon11 commented on issue #10031: [BEAM-8427] Create a table and a table provider for MongoDB URL: https://github.com/apache/beam/pull/10031#issuecomment-551291329 Run sql postcommit This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org Issue Time Tracking --- Worklog Id: (was: 340182) Time Spent: 5.5h (was: 5h 20m) > [SQL] Add support for MongoDB source > > > Key: BEAM-8427 > URL: https://issues.apache.org/jira/browse/BEAM-8427 > Project: Beam > Issue Type: New Feature > Components: dsl-sql >Reporter: Kirill Kozlov >Assignee: Kirill Kozlov >Priority: Major > Time Spent: 5.5h > Remaining Estimate: 0h > > In progress: > * Create a MongoDB table and table provider. > * Implement buildIOReader > * Support primitive types > Still needs to be done: > * Implement buildIOWrite > * improve getTableStatistics -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Closed] (BEAM-8574) [SQL] MongoDb PostCommit_SQL fails
[ https://issues.apache.org/jira/browse/BEAM-8574?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Kirill Kozlov closed BEAM-8574. --- Fix Version/s: 2.18.0 Resolution: Resolved > [SQL] MongoDb PostCommit_SQL fails > -- > > Key: BEAM-8574 > URL: https://issues.apache.org/jira/browse/BEAM-8574 > Project: Beam > Issue Type: Bug > Components: dsl-sql >Reporter: Kirill Kozlov >Assignee: Kirill Kozlov >Priority: Critical > Fix For: 2.18.0 > > Time Spent: 50m > Remaining Estimate: 0h > > Integration test for Sql MongoDb table read and write fails. > Jenkins: > [https://builds.apache.org/view/A-D/view/Beam/view/PostCommit/job/beam_PostCommit_SQL/3126/] > Cause: [https://github.com/apache/beam/pull/9806] -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Work logged] (BEAM-8585) Include path in error message in path_to_beam_jar
[ https://issues.apache.org/jira/browse/BEAM-8585?focusedWorklogId=340179=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-340179 ] ASF GitHub Bot logged work on BEAM-8585: Author: ASF GitHub Bot Created on: 07/Nov/19 22:04 Start Date: 07/Nov/19 22:04 Worklog Time Spent: 10m Work Description: ibzib commented on pull request #10032: [BEAM-8585] Include path in error message in path_to_beam_jar URL: https://github.com/apache/beam/pull/10032 Admittedly a pretty trivial change, but maybe it will be useful some day. Post-Commit Tests Status (on master branch): [build-status badge table truncated]
[jira] [Work logged] (BEAM-8427) [SQL] Add support for MongoDB source
[ https://issues.apache.org/jira/browse/BEAM-8427?focusedWorklogId=340178=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-340178 ] ASF GitHub Bot logged work on BEAM-8427: Author: ASF GitHub Bot Created on: 07/Nov/19 22:03 Start Date: 07/Nov/19 22:03 Worklog Time Spent: 10m Work Description: 11moon11 commented on issue #10031: [BEAM-8427] Create a table and a table provider for MongoDB URL: https://github.com/apache/beam/pull/10031#issuecomment-551287100 Run sql postcommit This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org Issue Time Tracking --- Worklog Id: (was: 340178) Time Spent: 5h 20m (was: 5h 10m) > [SQL] Add support for MongoDB source > > > Key: BEAM-8427 > URL: https://issues.apache.org/jira/browse/BEAM-8427 > Project: Beam > Issue Type: New Feature > Components: dsl-sql >Reporter: Kirill Kozlov >Assignee: Kirill Kozlov >Priority: Major > Time Spent: 5h 20m > Remaining Estimate: 0h > > In progress: > * Create a MongoDB table and table provider. > * Implement buildIOReader > * Support primitive types > Still needs to be done: > * Implement buildIOWrite > * improve getTableStatistics -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Work logged] (BEAM-8427) [SQL] Add support for MongoDB source
[ https://issues.apache.org/jira/browse/BEAM-8427?focusedWorklogId=340177=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-340177 ] ASF GitHub Bot logged work on BEAM-8427: Author: ASF GitHub Bot Created on: 07/Nov/19 22:03 Start Date: 07/Nov/19 22:03 Worklog Time Spent: 10m Work Description: 11moon11 commented on issue #10031: [BEAM-8427] Create a table and a table provider for MongoDB URL: https://github.com/apache/beam/pull/10031#issuecomment-551287100 Run sql postcommit This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org Issue Time Tracking --- Worklog Id: (was: 340177) Time Spent: 5h 10m (was: 5h) > [SQL] Add support for MongoDB source > > > Key: BEAM-8427 > URL: https://issues.apache.org/jira/browse/BEAM-8427 > Project: Beam > Issue Type: New Feature > Components: dsl-sql >Reporter: Kirill Kozlov >Assignee: Kirill Kozlov >Priority: Major > Time Spent: 5h 10m > Remaining Estimate: 0h > > In progress: > * Create a MongoDB table and table provider. > * Implement buildIOReader > * Support primitive types > Still needs to be done: > * Implement buildIOWrite > * improve getTableStatistics -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Updated] (BEAM-8585) Include path in error message in path_to_beam_jar
[ https://issues.apache.org/jira/browse/BEAM-8585?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Kyle Weaver updated BEAM-8585: -- Priority: Trivial (was: Minor) > Include path in error message in path_to_beam_jar > - > > Key: BEAM-8585 > URL: https://issues.apache.org/jira/browse/BEAM-8585 > Project: Beam > Issue Type: Improvement > Components: runner-flink >Reporter: Kyle Weaver >Assignee: Kyle Weaver >Priority: Trivial > Labels: portability-flink > > Right now, the error message looks like this when the job server jar can't be > found: > 12:35:50 RuntimeError: Please build the server with > 12:35:50 cd > /home/jenkins/jenkins-slave/workspace/beam_PostCommit_Python37_PR/src; > ./gradlew runners:flink:1.9:job-server:shadowJar > I would like to know the path of the missing jar to help me debug. -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Created] (BEAM-8586) Add a server for MongoDb Integration Test
Kirill Kozlov created BEAM-8586: --- Summary: Add a server for MongoDb Integration Test Key: BEAM-8586 URL: https://issues.apache.org/jira/browse/BEAM-8586 Project: Beam Issue Type: Test Components: dsl-sql Reporter: Kirill Kozlov We need to pass pipeline options with server information to the MongoDbReadWriteIT. For now that test is ignored and excluded from the build.gradle file. -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Updated] (BEAM-8585) Include path in error message in path_to_beam_jar
[ https://issues.apache.org/jira/browse/BEAM-8585?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Kyle Weaver updated BEAM-8585: -- Status: Open (was: Triage Needed) > Include path in error message in path_to_beam_jar > - > > Key: BEAM-8585 > URL: https://issues.apache.org/jira/browse/BEAM-8585 > Project: Beam > Issue Type: Improvement > Components: runner-flink >Reporter: Kyle Weaver >Assignee: Kyle Weaver >Priority: Minor > > Right now, the error message looks like this when the job server jar can't be > found: > 12:35:50 RuntimeError: Please build the server with > 12:35:50 cd > /home/jenkins/jenkins-slave/workspace/beam_PostCommit_Python37_PR/src; > ./gradlew runners:flink:1.9:job-server:shadowJar > I would like to know the path of the missing jar to help me debug. -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Created] (BEAM-8585) Include path in error message in path_to_beam_jar
Kyle Weaver created BEAM-8585: - Summary: Include path in error message in path_to_beam_jar Key: BEAM-8585 URL: https://issues.apache.org/jira/browse/BEAM-8585 Project: Beam Issue Type: Improvement Components: runner-flink Reporter: Kyle Weaver Assignee: Kyle Weaver Right now, the error message looks like this when the job server jar can't be found: 12:35:50 RuntimeError: Please build the server with 12:35:50 cd /home/jenkins/jenkins-slave/workspace/beam_PostCommit_Python37_PR/src; ./gradlew runners:flink:1.9:job-server:shadowJar I would like to know the path of the missing jar to help me debug. -- This message was sent by Atlassian Jira (v8.3.4#803005)
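The requested improvement can be illustrated with a small sketch. This is hypothetical code, not Beam's actual `path_to_beam_jar` implementation, and the artifact-naming scheme is invented for illustration; it only demonstrates the point of the ticket, that the error should carry the path of the jar that was looked for:

```python
# Hypothetical sketch only: the jar_dir default and the target-to-artifact
# mapping below are invented, not Beam's real layout.
import os


def path_to_beam_jar(gradle_target, jar_dir="/tmp/beam-jars"):
    """Return the path of a job server jar, or fail with a path-bearing error."""
    # Invented mapping, e.g. "runners:flink:1.9:job-server:shadowJar"
    # -> "runners-flink-1.9-job-server.jar"
    artifact = "-".join(gradle_target.split(":")[:-1]) + ".jar"
    path = os.path.join(jar_dir, artifact)
    if not os.path.exists(path):
        # Naming the missing path is the improvement the ticket asks for.
        raise RuntimeError(
            "Expected Beam jar at %s. Please build the server with\n"
            "  ./gradlew %s" % (path, gradle_target))
    return path
```

With this change, the RuntimeError quoted in the ticket would also name the exact file that could not be found, which is what makes the failure debuggable.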
[jira] [Work logged] (BEAM-8427) [SQL] Add support for MongoDB source
[ https://issues.apache.org/jira/browse/BEAM-8427?focusedWorklogId=340172&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-340172 ] ASF GitHub Bot logged work on BEAM-8427: Author: ASF GitHub Bot Created on: 07/Nov/19 21:41 Start Date: 07/Nov/19 21:41 Worklog Time Spent: 10m Work Description: 11moon11 commented on issue #10031: [BEAM-8427] Create a table and a table provider for MongoDB URL: https://github.com/apache/beam/pull/10031#issuecomment-551278228 Run sql postcommit This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org Issue Time Tracking --- Worklog Id: (was: 340172) Time Spent: 5h (was: 4h 50m) > [SQL] Add support for MongoDB source > > > Key: BEAM-8427 > URL: https://issues.apache.org/jira/browse/BEAM-8427 > Project: Beam > Issue Type: New Feature > Components: dsl-sql >Reporter: Kirill Kozlov >Assignee: Kirill Kozlov >Priority: Major > Time Spent: 5h > Remaining Estimate: 0h > > In progress: > * Create a MongoDB table and table provider. > * Implement buildIOReader > * Support primitive types > Still needs to be done: > * Implement buildIOWrite > * improve getTableStatistics -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Work logged] (BEAM-8427) [SQL] Add support for MongoDB source
[ https://issues.apache.org/jira/browse/BEAM-8427?focusedWorklogId=340171&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-340171 ] ASF GitHub Bot logged work on BEAM-8427: Author: ASF GitHub Bot Created on: 07/Nov/19 21:41 Start Date: 07/Nov/19 21:41 Worklog Time Spent: 10m Work Description: 11moon11 commented on issue #10031: [BEAM-8427] Create a table and a table provider for MongoDB URL: https://github.com/apache/beam/pull/10031#issuecomment-551278228 Run sql postcommit This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org Issue Time Tracking --- Worklog Id: (was: 340171) Time Spent: 4h 50m (was: 4h 40m) > [SQL] Add support for MongoDB source > > > Key: BEAM-8427 > URL: https://issues.apache.org/jira/browse/BEAM-8427 > Project: Beam > Issue Type: New Feature > Components: dsl-sql >Reporter: Kirill Kozlov >Assignee: Kirill Kozlov >Priority: Major > Time Spent: 4h 50m > Remaining Estimate: 0h > > In progress: > * Create a MongoDB table and table provider. > * Implement buildIOReader > * Support primitive types > Still needs to be done: > * Implement buildIOWrite > * improve getTableStatistics -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Created] (BEAM-8584) Remove TestPipelineOptions
Brian Hulette created BEAM-8584: --- Summary: Remove TestPipelineOptions Key: BEAM-8584 URL: https://issues.apache.org/jira/browse/BEAM-8584 Project: Beam Issue Type: Improvement Components: testing Reporter: Brian Hulette Assignee: Brian Hulette See [ML thread|https://lists.apache.org/thread.html/cc2ac6db764e0d750688f8bae540728e38759365b86ba6f3fabfa6dd@%3Cdev.beam.apache.org%3E] -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Work logged] (BEAM-8427) [SQL] Add support for MongoDB source
[ https://issues.apache.org/jira/browse/BEAM-8427?focusedWorklogId=340167&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-340167 ] ASF GitHub Bot logged work on BEAM-8427: Author: ASF GitHub Bot Created on: 07/Nov/19 21:37 Start Date: 07/Nov/19 21:37 Worklog Time Spent: 10m Work Description: 11moon11 commented on pull request #10031: [BEAM-8427] Create a table and a table provider for MongoDB URL: https://github.com/apache/beam/pull/10031 - Create a Table with read support (for now, write will be added in a separate PR) - Add some tests Thank you for your contribution! Follow this checklist to help us incorporate your contribution quickly and easily: - [ ] [**Choose reviewer(s)**](https://beam.apache.org/contribute/#make-your-change) and mention them in a comment (`R: @username`). - [ ] Format the pull request title like `[BEAM-XXX] Fixes bug in ApproximateQuantiles`, where you replace `BEAM-XXX` with the appropriate JIRA issue, if applicable. This will automatically link the pull request to the issue. - [ ] If this contribution is large, please file an Apache [Individual Contributor License Agreement](https://www.apache.org/licenses/icla.pdf). See the [Contributor Guide](https://beam.apache.org/contribute) for more tips on [how to make review process smoother](https://beam.apache.org/contribute/#make-reviewers-job-easier).
Post-Commit Tests Status (on master branch): table of Jenkins build-status badges for the Go, Java, and Python SDKs across the Apex, Dataflow, Flink, Gearpump, Samza, and Spark runners (badge markup truncated in the original).
[jira] [Closed] (BEAM-6274) Timer Backlog Bug
[ https://issues.apache.org/jira/browse/BEAM-6274?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Sam Rohde closed BEAM-6274. --- Fix Version/s: Not applicable Resolution: Won't Fix > Timer Backlog Bug > - > > Key: BEAM-6274 > URL: https://issues.apache.org/jira/browse/BEAM-6274 > Project: Beam > Issue Type: Improvement > Components: runner-dataflow >Reporter: Sam Rohde >Assignee: Sam Rohde >Priority: Major > Fix For: Not applicable > > > * Move timer receiver into a new class > * Investigate what the getNextFiredTimer(window coder) parameter actually > does > * Add custom payload feature > * Set the correct "IsBounded" on the generated MainInput PCollection for > timers (CreateExecutableStageNodeFunction) -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Assigned] (BEAM-6453) Create a single Jenkins job or Gradle task to serve for release test validation
[ https://issues.apache.org/jira/browse/BEAM-6453?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Sam Rohde reassigned BEAM-6453: --- Assignee: (was: Sam Rohde) > Create a single Jenkins job or Gradle task to serve for release test > validation > --- > > Key: BEAM-6453 > URL: https://issues.apache.org/jira/browse/BEAM-6453 > Project: Beam > Issue Type: Improvement > Components: project-management >Reporter: Sam Rohde >Priority: Major > > As per [https://github.com/apache/beam/pull/7509,] it looks like you can only > run a single jenkins job per phrase per comment. In addition, the list of > precommit and postcommit jobs will easily get stale. By creating a Jenkins > job or Gradle task, we can kill two birds with one stone. -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Assigned] (BEAM-6672) Make bundle execution with ExecutableStage support user states
[ https://issues.apache.org/jira/browse/BEAM-6672?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Sam Rohde reassigned BEAM-6672: --- Assignee: (was: Sam Rohde) > Make bundle execution with ExecutableStage support user states > -- > > Key: BEAM-6672 > URL: https://issues.apache.org/jira/browse/BEAM-6672 > Project: Beam > Issue Type: Improvement > Components: runner-dataflow >Reporter: Sam Rohde >Priority: Major > Time Spent: 3h 10m > Remaining Estimate: 0h > -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Resolved] (BEAM-8491) Add ability for multiple output PCollections from composites
[ https://issues.apache.org/jira/browse/BEAM-8491?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Sam Rohde resolved BEAM-8491. - Fix Version/s: 2.16.0 Resolution: Fixed > Add ability for multiple output PCollections from composites > > > Key: BEAM-8491 > URL: https://issues.apache.org/jira/browse/BEAM-8491 > Project: Beam > Issue Type: Improvement > Components: sdk-py-core >Reporter: Sam Rohde >Assignee: Sam Rohde >Priority: Major > Fix For: 2.16.0 > > Time Spent: 1.5h > Remaining Estimate: 0h > > The Python SDK has DoOutputTuples which allows for a single transform to have > multiple outputs. However, this does not include the ability for a composite > transform to have multiple outputs PCollections from different transforms. -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Work logged] (BEAM-8583) [SQL] BigQuery should support predicate push-down in DIRECT_READ mode
[ https://issues.apache.org/jira/browse/BEAM-8583?focusedWorklogId=340153&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-340153 ] ASF GitHub Bot logged work on BEAM-8583: Author: ASF GitHub Bot Created on: 07/Nov/19 21:07 Start Date: 07/Nov/19 21:07 Worklog Time Spent: 10m Work Description: 11moon11 commented on pull request #10030: [BEAM-8583] Big query filter push down URL: https://github.com/apache/beam/pull/10030 - BigQuery should apply predicate push-down when appropriate. Based on top of #9943. Thank you for your contribution! Follow this checklist to help us incorporate your contribution quickly and easily: - [ ] [**Choose reviewer(s)**](https://beam.apache.org/contribute/#make-your-change) and mention them in a comment (`R: @username`). - [ ] Format the pull request title like `[BEAM-XXX] Fixes bug in ApproximateQuantiles`, where you replace `BEAM-XXX` with the appropriate JIRA issue, if applicable. This will automatically link the pull request to the issue. - [ ] If this contribution is large, please file an Apache [Individual Contributor License Agreement](https://www.apache.org/licenses/icla.pdf). See the [Contributor Guide](https://beam.apache.org/contribute) for more tips on [how to make review process smoother](https://beam.apache.org/contribute/#make-reviewers-job-easier).
[jira] [Created] (BEAM-8583) [SQL] BigQuery should support predicate push-down in DIRECT_READ mode
Kirill Kozlov created BEAM-8583: --- Summary: [SQL] BigQuery should support predicate push-down in DIRECT_READ mode Key: BEAM-8583 URL: https://issues.apache.org/jira/browse/BEAM-8583 Project: Beam Issue Type: Improvement Components: dsl-sql Reporter: Kirill Kozlov Assignee: Kirill Kozlov -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Created] (BEAM-8582) Python SDK emits duplicate records for Default and AfterWatermark triggers
Sam Rohde created BEAM-8582: --- Summary: Python SDK emits duplicate records for Default and AfterWatermark triggers Key: BEAM-8582 URL: https://issues.apache.org/jira/browse/BEAM-8582 Project: Beam Issue Type: Bug Components: sdk-py-core Reporter: Sam Rohde Assignee: Sam Rohde This was found after fixing https://issues.apache.org/jira/browse/BEAM-8581. The fix for 8581 was to pass in the input watermark. Previously, it was using MIN_TIMESTAMP for all of its EOW calculations. By giving it a proper input watermark, this bug started to manifest. The DefaultTrigger and AfterWatermark do not clear their timers after the watermark passes the end of the window, leading to duplicate records being emitted. Fix: Clear the watermark timer when the watermark reaches the end of the window. -- This message was sent by Atlassian Jira (v8.3.4#803005)
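The bug and its fix can be modeled in a few lines. This is a toy sketch with invented names, not code from the Beam trigger driver; it shows why clearing the watermark timer after it fires prevents the duplicate pane:

```python
# Toy model (names are assumptions, not Beam's API): a trigger whose
# watermark timer fires an on-time pane at the end of the window.
class ToyWatermarkTrigger:
    def __init__(self, end_of_window):
        self.end_of_window = end_of_window
        self.timer_set = True  # watermark timer scheduled at end of window
        self.panes_emitted = 0

    def advance_watermark(self, watermark):
        if self.timer_set and watermark >= self.end_of_window:
            self.panes_emitted += 1
            # The fix: clear the timer once it fires, so later watermark
            # advances cannot re-fire it and emit a duplicate record.
            self.timer_set = False
```

Without the `self.timer_set = False` line, every watermark advance past the end of the window would fire the timer again and emit another copy of the pane, which is the duplicate-records symptom described above.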
[jira] [Created] (BEAM-8581) Python SDK labels ontime empty panes as late
Sam Rohde created BEAM-8581: --- Summary: Python SDK labels ontime empty panes as late Key: BEAM-8581 URL: https://issues.apache.org/jira/browse/BEAM-8581 Project: Beam Issue Type: Bug Components: sdk-py-core Reporter: Sam Rohde Assignee: Sam Rohde The GeneralTriggerDriver does not put watermark holds on timers, leading to the ontime empty pane being considered late data. Fix: Add a new notion of whether a trigger has an ontime pane. If it does, then set a watermark hold to end of window - 1. -- This message was sent by Atlassian Jira (v8.3.4#803005)
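The proposed hold can be sketched as a small function. The function and argument names are assumptions for illustration, not Beam's actual trigger-driver API; the point is that the hold keeps the output time at end of window minus one so the on-time empty pane is not behind the watermark when it is emitted:

```python
# Minimal sketch (hypothetical names): compute the held output watermark
# for a window whose trigger may produce an on-time pane.
def held_output_watermark(input_watermark, end_of_window, has_ontime_pane):
    if has_ontime_pane:
        # Hold output time to just before the end of the window, so the
        # on-time pane fires at a timestamp the watermark has not passed.
        return min(input_watermark, end_of_window - 1)
    return input_watermark
</imports>
```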
[jira] [Created] (BEAM-8580) Request Python API to support windows ClosingBehavior
wendy liu created BEAM-8580: --- Summary: Request Python API to support windows ClosingBehavior Key: BEAM-8580 URL: https://issues.apache.org/jira/browse/BEAM-8580 Project: Beam Issue Type: New Feature Components: sdk-py-core Reporter: wendy liu Beam Python should have an API to support windows ClosingBehavior. -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Work logged] (BEAM-8559) Run Dataflow Nexmark suites with Java 11
[ https://issues.apache.org/jira/browse/BEAM-8559?focusedWorklogId=340150&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-340150 ] ASF GitHub Bot logged work on BEAM-8559: Author: ASF GitHub Bot Created on: 07/Nov/19 20:40 Start Date: 07/Nov/19 20:40 Worklog Time Spent: 10m Work Description: lgajowy commented on issue #9994: [BEAM-8559] Run Nexmark Dataflow suites with Java 11 URL: https://github.com/apache/beam/pull/9994#issuecomment-551254214 Run Dataflow Runner Nexmark Tests - Java 11 This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org Issue Time Tracking --- Worklog Id: (was: 340150) Time Spent: 40m (was: 0.5h) > Run Dataflow Nexmark suites with Java 11 > > > Key: BEAM-8559 > URL: https://issues.apache.org/jira/browse/BEAM-8559 > Project: Beam > Issue Type: Sub-task > Components: testing-nexmark >Reporter: Lukasz Gajowy >Assignee: Lukasz Gajowy >Priority: Minor > Fix For: Not applicable > > Time Spent: 40m > Remaining Estimate: 0h > > This task is similar to https://issues.apache.org/jira/browse/BEAM-6936. > The goal is to run Nexmark suites with Java 11 but compile with java 8 to > verify compatibility. -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Work logged] (BEAM-8512) Add integration tests for Python "flink_runner.py"
[ https://issues.apache.org/jira/browse/BEAM-8512?focusedWorklogId=340146&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-340146 ] ASF GitHub Bot logged work on BEAM-8512: Author: ASF GitHub Bot Created on: 07/Nov/19 20:34 Start Date: 07/Nov/19 20:34 Worklog Time Spent: 10m Work Description: ibzib commented on issue #9998: [BEAM-8512] Add integration tests for flink_runner.py URL: https://github.com/apache/beam/pull/9998#issuecomment-551251982 Run Python 3.7 PostCommit This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org Issue Time Tracking --- Worklog Id: (was: 340146) Time Spent: 1.5h (was: 1h 20m) > Add integration tests for Python "flink_runner.py" > -- > > Key: BEAM-8512 > URL: https://issues.apache.org/jira/browse/BEAM-8512 > Project: Beam > Issue Type: Test > Components: runner-flink, sdk-py-core >Reporter: Maximilian Michels >Assignee: Kyle Weaver >Priority: Major > Time Spent: 1.5h > Remaining Estimate: 0h > > There are currently no integration tests for the Python FlinkRunner. We need > a set of tests similar to {{flink_runner_test.py}} which currently use the > PortableRunner and not the FlinkRunner. > CC [~robertwb] [~ibzib] [~thw] -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Work logged] (BEAM-8512) Add integration tests for Python "flink_runner.py"
[ https://issues.apache.org/jira/browse/BEAM-8512?focusedWorklogId=340139&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-340139 ] ASF GitHub Bot logged work on BEAM-8512: Author: ASF GitHub Bot Created on: 07/Nov/19 20:29 Start Date: 07/Nov/19 20:29 Worklog Time Spent: 10m Work Description: ibzib commented on pull request #9998: [BEAM-8512] Add integration tests for flink_runner.py URL: https://github.com/apache/beam/pull/9998#discussion_r343858825 ## File path: buildSrc/src/main/groovy/org/apache/beam/gradle/BeamModulePlugin.groovy ## @@ -1953,8 +1953,10 @@ class BeamModulePlugin implements Plugin { } project.ext.addPortableWordCountTasks = { -> -addPortableWordCountTask(false) -addPortableWordCountTask(true) +addPortableWordCountTask(false, "PortableRunner") Review comment: > Theoretically, the PortableRunner is a subset of FlinkRunner, but there are still things that can go wrong +1 As long as PortableRunner remains a publicly documented option (and it will probably have to for at least a little while longer) we should test it. Also, they are testing slightly different things -- PortableRunner starts a containerized job server, while FlinkRunner (when local) starts the job server in a Java subprocess. > Since these tasks are Flink specific, should they be renamed and possibly moved out of BeamModulePlugin? I'm going to add these tasks to Spark (eventually). This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment.
For queries about this service, please contact Infrastructure at: us...@infra.apache.org Issue Time Tracking --- Worklog Id: (was: 340139) Time Spent: 1h 20m (was: 1h 10m) > Add integration tests for Python "flink_runner.py" > -- > > Key: BEAM-8512 > URL: https://issues.apache.org/jira/browse/BEAM-8512 > Project: Beam > Issue Type: Test > Components: runner-flink, sdk-py-core >Reporter: Maximilian Michels >Assignee: Kyle Weaver >Priority: Major > Time Spent: 1h 20m > Remaining Estimate: 0h > > There are currently no integration tests for the Python FlinkRunner. We need > a set of tests similar to {{flink_runner_test.py}} which currently use the > PortableRunner and not the FlinkRunner. > CC [~robertwb] [~ibzib] [~thw] -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Work logged] (BEAM-8559) Run Dataflow Nexmark suites with Java 11
[ https://issues.apache.org/jira/browse/BEAM-8559?focusedWorklogId=340137&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-340137 ] ASF GitHub Bot logged work on BEAM-8559: Author: ASF GitHub Bot Created on: 07/Nov/19 20:27 Start Date: 07/Nov/19 20:27 Worklog Time Spent: 10m Work Description: lgajowy commented on issue #9994: [BEAM-8559] Run Nexmark Dataflow suites with Java 11 URL: https://github.com/apache/beam/pull/9994#issuecomment-551249539 Run seed job This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org Issue Time Tracking --- Worklog Id: (was: 340137) Time Spent: 0.5h (was: 20m) > Run Dataflow Nexmark suites with Java 11 > > > Key: BEAM-8559 > URL: https://issues.apache.org/jira/browse/BEAM-8559 > Project: Beam > Issue Type: Sub-task > Components: testing-nexmark >Reporter: Lukasz Gajowy >Assignee: Lukasz Gajowy >Priority: Minor > Fix For: Not applicable > > Time Spent: 0.5h > Remaining Estimate: 0h > > This task is similar to https://issues.apache.org/jira/browse/BEAM-6936. > The goal is to run Nexmark suites with Java 11 but compile with java 8 to > verify compatibility. -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Work logged] (BEAM-8559) Run Dataflow Nexmark suites with Java 11
[ https://issues.apache.org/jira/browse/BEAM-8559?focusedWorklogId=340129&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-340129 ] ASF GitHub Bot logged work on BEAM-8559: Author: ASF GitHub Bot Created on: 07/Nov/19 20:13 Start Date: 07/Nov/19 20:13 Worklog Time Spent: 10m Work Description: lgajowy commented on issue #9994: [BEAM-8559] Run Nexmark Dataflow suites with Java 11 URL: https://github.com/apache/beam/pull/9994#issuecomment-551244425 Run Dataflow Runner Nexmark Tests This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org Issue Time Tracking --- Worklog Id: (was: 340129) Time Spent: 20m (was: 10m) > Run Dataflow Nexmark suites with Java 11 > > > Key: BEAM-8559 > URL: https://issues.apache.org/jira/browse/BEAM-8559 > Project: Beam > Issue Type: Sub-task > Components: testing-nexmark >Reporter: Lukasz Gajowy >Assignee: Lukasz Gajowy >Priority: Minor > Fix For: Not applicable > > Time Spent: 20m > Remaining Estimate: 0h > > This task is similar to https://issues.apache.org/jira/browse/BEAM-6936. > The goal is to run Nexmark suites with Java 11 but compile with java 8 to > verify compatibility. -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Work logged] (BEAM-8457) Instrument Dataflow jobs that are launched from Notebooks
[ https://issues.apache.org/jira/browse/BEAM-8457?focusedWorklogId=340125&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-340125 ] ASF GitHub Bot logged work on BEAM-8457: Author: ASF GitHub Bot Created on: 07/Nov/19 20:03 Start Date: 07/Nov/19 20:03 Worklog Time Spent: 10m Work Description: KevinGG commented on pull request #9885: [BEAM-8457] Label Dataflow jobs from Notebook URL: https://github.com/apache/beam/pull/9885#discussion_r343848058 ## File path: sdks/python/apache_beam/runners/dataflow/dataflow_runner.py ## @@ -360,6 +360,16 @@ def visit_transform(self, transform_node): def run_pipeline(self, pipeline, options): """Remotely executes entire pipeline or parts reachable from node.""" +# Label goog-dataflow-notebook if pipeline is initiated from interactive +# runner. +if pipeline.interactive: Review comment: Discussed with David and Sam. Since we also want to track jobs started from notebook even if the user never uses `InteractiveRunner`, checking the environment might just be the only way to do it. By putting the logic into a try-except block as it is, we could avoid introducing `ipython` dependency into `DataflowRunner`. If the `[interactive]` dependency is never installed and current execution_path has never imported `ipython`, the code would just never be executed. I'll move the logic into a standalone utility module and import it in DataflowRunner to do the check. This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment.
For queries about this service, please contact Infrastructure at: us...@infra.apache.org Issue Time Tracking --- Worklog Id: (was: 340125) Time Spent: 8.5h (was: 8h 20m) > Instrument Dataflow jobs that are launched from Notebooks > - > > Key: BEAM-8457 > URL: https://issues.apache.org/jira/browse/BEAM-8457 > Project: Beam > Issue Type: Improvement > Components: runner-py-interactive >Reporter: Ning Kang >Assignee: Ning Kang >Priority: Major > Fix For: 2.17.0 > > Time Spent: 8.5h > Remaining Estimate: 0h > > Dataflow needs the capability to tell how many Dataflow jobs are launched > from the Notebook environment, i.e., the Interactive Runner. > # Change the pipeline.run() API to allow supply a runner and an option > parameter so that a pipeline initially bundled w/ an interactive runner can > be directly run by other runners from notebook. > # Implicitly add the necessary source information through user labels when > the user does p.run(runner=DataflowRunner()). -- This message was sent by Atlassian Jira (v8.3.4#803005)
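The try-except approach discussed in the review comment above can be sketched as a small standalone helper. The function name is an assumption, not the actual Beam utility; it shows how a runner module can detect a notebook environment without taking a hard `ipython` dependency:

```python
# Hedged sketch (hypothetical helper name): detect IPython/notebook execution
# without requiring ipython to be installed.
def running_in_notebook():
    try:
        # Succeeds only when the [interactive] extra (ipython) is installed.
        from IPython import get_ipython
    except ImportError:
        return False
    # get_ipython() returns None when not inside an IPython session, so a
    # plain Python process with ipython installed still reports False.
    return get_ipython() is not None
```

Because the import failure is swallowed, `DataflowRunner` could call such a helper unconditionally: in environments without ipython the check is simply never true, matching the behavior described in the comment.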
[jira] [Work logged] (BEAM-8559) Run Dataflow Nexmark suites with Java 11
[ https://issues.apache.org/jira/browse/BEAM-8559?focusedWorklogId=340123&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-340123 ]

ASF GitHub Bot logged work on BEAM-8559:
----------------------------------------
Author: ASF GitHub Bot
Created on: 07/Nov/19 20:00
Start Date: 07/Nov/19 20:00
Worklog Time Spent: 10m

Work Description: lgajowy commented on issue #9994: [BEAM-8559] Run Nexmark Dataflow suites with Java 11
URL: https://github.com/apache/beam/pull/9994#issuecomment-551239737

Run seed job

This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the URL above to go to the specific comment.
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org

Issue Time Tracking
-------------------
Worklog Id: (was: 340123)
Remaining Estimate: 0h
Time Spent: 10m

> Run Dataflow Nexmark suites with Java 11
> ----------------------------------------
>
> Key: BEAM-8559
> URL: https://issues.apache.org/jira/browse/BEAM-8559
> Project: Beam
> Issue Type: Sub-task
> Components: testing-nexmark
> Reporter: Lukasz Gajowy
> Assignee: Lukasz Gajowy
> Priority: Minor
> Fix For: Not applicable
>
> Time Spent: 10m
> Remaining Estimate: 0h
>
> This task is similar to https://issues.apache.org/jira/browse/BEAM-6936.
> The goal is to run the Nexmark suites with Java 11 but compile with Java 8
> to verify compatibility.

--
This message was sent by Atlassian Jira
(v8.3.4#803005)
[jira] [Work logged] (BEAM-8457) Instrument Dataflow jobs that are launched from Notebooks
[ https://issues.apache.org/jira/browse/BEAM-8457?focusedWorklogId=340120&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-340120 ]

ASF GitHub Bot logged work on BEAM-8457:
----------------------------------------
Author: ASF GitHub Bot
Created on: 07/Nov/19 19:57
Start Date: 07/Nov/19 19:57
Worklog Time Spent: 10m

Work Description: KevinGG commented on pull request #9885: [BEAM-8457] Label Dataflow jobs from Notebook
URL: https://github.com/apache/beam/pull/9885#discussion_r343845181

## File path: sdks/python/apache_beam/pipeline.py

## @@ -396,28 +400,57 @@ def replace_all(self, replacements):
     for override in replacements:
       self._check_replacement(override)

-  def run(self, test_runner_api=True):
-    """Runs the pipeline. Returns whatever our runner returns after running."""
-
+  def run(self, test_runner_api=True, runner=None, options=None,

Review comment: Putting this discussion on next Monday's agenda; I will remove the changes to the API.

This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the URL above to go to the specific comment.
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org

Issue Time Tracking
-------------------
Worklog Id: (was: 340120)
Time Spent: 8h 20m (was: 8h 10m)

> Instrument Dataflow jobs that are launched from Notebooks
> ---------------------------------------------------------
>
> Key: BEAM-8457
> URL: https://issues.apache.org/jira/browse/BEAM-8457
> Project: Beam
> Issue Type: Improvement
> Components: runner-py-interactive
> Reporter: Ning Kang
> Assignee: Ning Kang
> Priority: Major
> Fix For: 2.17.0
>
> Time Spent: 8h 20m
> Remaining Estimate: 0h
>
> Dataflow needs the capability to tell how many Dataflow jobs are launched
> from the Notebook environment, i.e., the Interactive Runner.
> # Change the pipeline.run() API to allow supplying a runner and an options
> parameter, so that a pipeline initially bundled with an interactive runner
> can be directly run by other runners from a notebook.
> # Implicitly add the necessary source information through user labels when
> the user does p.run(runner=DataflowRunner()).

--
This message was sent by Atlassian Jira
(v8.3.4#803005)
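Item 2 of the description, attaching source information through user labels, could look roughly like the sketch below. The helper name and dict shape are assumptions for illustration; only the `goog-dataflow-notebook` label key is taken from the diff quoted in the worklogs above:

```python
def apply_notebook_label(job_labels, in_notebook):
    """Sketch of step 2: tag a Dataflow job launched from a notebook.

    Hypothetical helper: `job_labels` is assumed to be a plain dict of
    user labels attached to the job. When the pipeline was launched
    from a notebook, add the tracking label so such jobs can be
    counted later; otherwise leave the labels untouched.
    """
    if in_notebook:
        # Copy before mutating so the caller's dict is left intact.
        job_labels = dict(job_labels)
        job_labels["goog-dataflow-notebook"] = "true"
    return job_labels
```

Keeping the label purely additive means non-notebook jobs are unaffected, which matches the intent of implicitly instrumenting only notebook-launched runs.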