[jira] [Commented] (BEAM-8207) KafkaIOITs generate different hashes each run, sometimes dropping records
[ https://issues.apache.org/jira/browse/BEAM-8207?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16969909#comment-16969909 ]

Michal Walenia commented on BEAM-8207:
--------------------------------------

[~aromanenko] Yes, the issue can be closed now. Thanks!

> KafkaIOITs generate different hashes each run, sometimes dropping records
> -------------------------------------------------------------------------
>
>                 Key: BEAM-8207
>                 URL: https://issues.apache.org/jira/browse/BEAM-8207
>             Project: Beam
>          Issue Type: Bug
>          Components: io-java-kafka, testing
>            Reporter: Michal Walenia
>            Priority: Major
>
> While working to adapt Java's KafkaIOIT to work with a large dataset generated by a SyntheticSource I encountered a problem. I want to push 100M records through a Kafka topic, verify data correctness and at the same time check the performance of KafkaIO.Write and KafkaIO.Read.
>
> To perform the tests I'm using a Kafka cluster on Kubernetes from the Beam repo ([here|https://github.com/apache/beam/tree/master/.test-infra/kubernetes/kafka-cluster]).
>
> The expected result would be that first the records are generated in a deterministic way (using hashes of list positions as Random seeds), next they are written to Kafka - this concludes the write pipeline.
> As for reading and correctness checking - first, the data is read from the topic and after being decoded into String representations, a hashcode of the whole PCollection is calculated (for details, check KafkaIOIT.java).
>
> During the testing I ran into several problems:
> 1. When all the records are read from the Kafka topic, the hash is different each time.
> 2. Sometimes not all the records are read and the Dataflow task waits for the input indefinitely, occasionally throwing exceptions.
>
> I believe there are two possible causes of this behavior: either there is something wrong with the Kafka cluster configuration, or KafkaIO behaves erratically on high data volumes, duplicating and/or dropping records.
> The second option seems troubling and I would be grateful for help with the first.

--
This message was sent by Atlassian Jira
(v8.3.4#803005)
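A hash that differs on every run is exactly what one would expect if the verification hash depends on element order: Kafka only guarantees ordering within a partition, so a 100M-record read arrives in a different interleaving each time. The sketch below (hypothetical Python, not KafkaIOIT's actual code) contrasts an order-sensitive digest with an order-independent XOR combine of per-record digests, which stays stable under any reordering of the same record set:

```python
import hashlib


def record_for_position(i: int) -> bytes:
    # Deterministic record derived from its list position, mirroring the
    # "hashes of list positions as Random seeds" idea from the report.
    return hashlib.sha256(str(i).encode()).digest()


def order_sensitive_hash(records) -> str:
    # Hashing the concatenation depends on arrival order, so an
    # unordered cross-partition read yields a different value each run.
    h = hashlib.sha256()
    for r in records:
        h.update(r)
    return h.hexdigest()


def order_independent_hash(records) -> int:
    # XOR-combining per-record digests is commutative: any permutation
    # of the same records produces the same value.
    acc = 0
    for r in records:
        acc ^= int.from_bytes(hashlib.sha256(r).digest()[:8], "big")
    return acc


records = [record_for_position(i) for i in range(100)]
shuffled = list(reversed(records))
assert order_sensitive_hash(records) != order_sensitive_hash(shuffled)
assert order_independent_hash(records) == order_independent_hash(shuffled)
```

An order-independent combine also distinguishes the two reported symptoms: a stable combined hash that still mismatches the expected value points at dropped or duplicated records rather than mere reordering.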
[jira] [Work logged] (BEAM-8157) Key encoding for state requests is not consistent across SDKs
[ https://issues.apache.org/jira/browse/BEAM-8157?focusedWorklogId=340376&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-340376 ]

ASF GitHub Bot logged work on BEAM-8157:
----------------------------------------
                Author: ASF GitHub Bot
            Created on: 08/Nov/19 07:09
            Start Date: 08/Nov/19 07:09
    Worklog Time Spent: 10m
      Work Description: mxm commented on issue #9997: [BEAM-8157] Revert key encoding changes for state requests / improve debugging and testing
URL: https://github.com/apache/beam/pull/9997#issuecomment-551412788

Please note the mailing list discussion. Will update the PR today with the desired solution.

This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the URL above to go to the specific comment.
For queries about this service, please contact Infrastructure at: us...@infra.apache.org

Issue Time Tracking
-------------------
    Worklog Id: (was: 340376)
    Time Spent: 7.5h (was: 7h 20m)

> Key encoding for state requests is not consistent across SDKs
> -------------------------------------------------------------
>
>                 Key: BEAM-8157
>                 URL: https://issues.apache.org/jira/browse/BEAM-8157
>             Project: Beam
>          Issue Type: Bug
>          Components: runner-flink
>    Affects Versions: 2.13.0
>            Reporter: Maximilian Michels
>            Assignee: Maximilian Michels
>            Priority: Critical
>             Fix For: 2.17.0
>
>          Time Spent: 7.5h
>  Remaining Estimate: 0h
>
> The Flink runner requires the internal key to be encoded without a length prefix (OUTER context). The user state request handler exposes a serialized version of the key to the Runner. This key is encoded with the NESTED context which may add a length prefix. We need to convert it to OUTER context to match the Flink runner's key encoding.
> So far this has not caused the Flink Runner to behave incorrectly. However, with the upcoming support for Flink 1.9, the state backend will not accept requests for keys not part of any key group/partition of the operator. This is very likely to happen with the encoding not being consistent.
> **NOTE** This is only applicable to the Java SDK, as the Python SDK uses OUTER encoding for the key in state requests.
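The NESTED/OUTER mismatch can be illustrated with a toy coder. OUTER context encodes a value as if it were the whole stream, while NESTED context must delimit the value inside a larger stream, typically with a length prefix. The sketch below is illustrative Python with a simplified one-byte prefix (Beam's actual coders use a varint); it shows why a NESTED-encoded key can never match a key the runner stored in OUTER form until the prefix is stripped:

```python
def encode_outer(key: bytes) -> bytes:
    # OUTER context: the value is the whole stream, so no delimiter.
    return key


def encode_nested(key: bytes) -> bytes:
    # NESTED context: a length prefix delimits the value inside a larger
    # stream (one byte here for simplicity; Beam uses a varint).
    return bytes([len(key)]) + key


def nested_to_outer(nested: bytes) -> bytes:
    # The conversion the Flink runner needs: strip the prefix to recover
    # the OUTER form that its key groups/partitions are computed from.
    n = nested[0]
    return nested[1 : 1 + n]


key = b"user-42"
assert encode_nested(key) != encode_outer(key)   # state lookups would miss
assert nested_to_outer(encode_nested(key)) == encode_outer(key)
```

Under Flink 1.9 the mismatch is no longer silent: a prefixed key hashes into a different key group, so the state backend rejects the request outright.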
[jira] [Updated] (BEAM-8591) Exception is thrown while running Beam Pipeline on Kubernetes Flink Cluster.
[ https://issues.apache.org/jira/browse/BEAM-8591?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Mingliang Gong updated BEAM-8591:
---------------------------------
    Summary: Exception is thrown while running Beam Pipeline on Kubernetes Flink Cluster.  (was: Exception is thrown when running Beam Pipeline on Kubernetes Flink Cluster.)

> Exception is thrown while running Beam Pipeline on Kubernetes Flink Cluster.
> ----------------------------------------------------------------------------
>
>                 Key: BEAM-8591
>                 URL: https://issues.apache.org/jira/browse/BEAM-8591
>             Project: Beam
>          Issue Type: Bug
>          Components: runner-flink
>            Reporter: Mingliang Gong
>            Priority: Major
>
> h2. Setup Clusters
> * Setup Local Flink Cluster: [https://ci.apache.org/projects/flink/flink-docs-release-1.8/tutorials/local_setup.html]
> * Setup Kubernetes Flink Cluster using Minikube: [https://ci.apache.org/projects/flink/flink-docs-release-1.8/ops/deployment/kubernetes.html]
> h2. Verify Clusters
> Execute the command "./bin/flink run examples/streaming/WordCount.jar". Both the local and the K8S Flink cluster work fine.
> h2. Using Apache Beam Flink Runner
> Instructions: [https://beam.apache.org/documentation/runners/flink/]
> Sample Pipeline Code:
> {code:python}
> import apache_beam as beam
> from apache_beam.options.pipeline_options import PipelineOptions
>
> options = PipelineOptions([
>     "--runner=PortableRunner",
>     "--job_endpoint=localhost:8099",
>     "--environment_type=LOOPBACK"
> ])
>
> with beam.Pipeline(options=options) as pipeline:
>     data = ["Sample data",
>             "Sample data - 0",
>             "Sample data - 1"]
>     raw_data = (pipeline
>                 | 'CreateHardCodeData' >> beam.Create(data)
>                 | 'Map' >> beam.Map(lambda line: line + '.')
>                 | 'Print' >> beam.Map(print))
> {code}
> Verify different environment_type values in the Python SDK harness configuration:
> *environment_type=LOOPBACK*
> # Run pipeline on local cluster: works fine
> # Run pipeline on K8S cluster: exceptions are thrown:
> java.lang.Exception: The user defined 'open()' method caused an exception: org.apache.beam.vendor.grpc.v1p21p0.io.grpc.StatusRuntimeException: UNAVAILABLE: io exception
> Caused by: org.apache.beam.vendor.grpc.v1p21p0.io.netty.channel.AbstractChannel$AnnotatedConnectException: Connection refused: localhost/127.0.0.1:51017
> *environment_type=DOCKER*
> # Run pipeline on local cluster: works fine
> # Run pipeline on K8S cluster: an exception is thrown:
> Caused by: java.io.IOException: Cannot run program "docker": error=2, No such file or directory.
[jira] [Updated] (BEAM-8591) Exception is thrown when running Beam Pipeline on Kubernetes Flink Cluster.
[ https://issues.apache.org/jira/browse/BEAM-8591?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Mingliang Gong updated BEAM-8591:
---------------------------------
    Description: 
h2. Setup Clusters
* Setup Local Flink Cluster: [https://ci.apache.org/projects/flink/flink-docs-release-1.8/tutorials/local_setup.html]
* Setup Kubernetes Flink Cluster using Minikube: [https://ci.apache.org/projects/flink/flink-docs-release-1.8/ops/deployment/kubernetes.html]
h2. Verify Clusters
Execute the command "./bin/flink run examples/streaming/WordCount.jar". Both the local and the K8S Flink cluster work fine.
h2. Using Apache Beam Flink Runner
Instructions: [https://beam.apache.org/documentation/runners/flink/]
Sample Pipeline Code:
{code:python}
import apache_beam as beam
from apache_beam.options.pipeline_options import PipelineOptions

options = PipelineOptions([
    "--runner=PortableRunner",
    "--job_endpoint=localhost:8099",
    "--environment_type=LOOPBACK"
])

with beam.Pipeline(options=options) as pipeline:
    data = ["Sample data",
            "Sample data - 0",
            "Sample data - 1"]
    raw_data = (pipeline
                | 'CreateHardCodeData' >> beam.Create(data)
                | 'Map' >> beam.Map(lambda line: line + '.')
                | 'Print' >> beam.Map(print))
{code}
Verify different environment_type values in the Python SDK harness configuration:
*environment_type=LOOPBACK*
# Run pipeline on local cluster: works fine
# Run pipeline on K8S cluster: exceptions are thrown:
java.lang.Exception: The user defined 'open()' method caused an exception: org.apache.beam.vendor.grpc.v1p21p0.io.grpc.StatusRuntimeException: UNAVAILABLE: io exception
Caused by: org.apache.beam.vendor.grpc.v1p21p0.io.netty.channel.AbstractChannel$AnnotatedConnectException: Connection refused: localhost/127.0.0.1:51017
*environment_type=DOCKER*
# Run pipeline on local cluster: works fine
# Run pipeline on K8S cluster: an exception is thrown:
Caused by: java.io.IOException: Cannot run program "docker": error=2, No such file or directory.

  was:
h2. Setup Clusters
* Setup Local Flink Cluster: [https://ci.apache.org/projects/flink/flink-docs-release-1.8/tutorials/local_setup.html]
* Setup Kubernetes Flink Cluster using Minikube: [https://ci.apache.org/projects/flink/flink-docs-release-1.8/ops/deployment/kubernetes.html]
h2. Verify Clusters
Execute command "./bin/flink run examples/streaming/WordCount.jar". Both Local and K8S Flink Cluster work fine.
h2. Using Apache Beam Flink Runner
Instruction: [https://beam.apache.org/documentation/runners/flink/]
Sample Pipeline Code:
{code:java}
import apache_beam as beam
from apache_beam.options.pipeline_options import PipelineOptions

options = PipelineOptions([
    "--runner=PortableRunner",
    "--job_endpoint=localhost:8099",
    "--environment_type=LOOPBACK"
])

with beam.Pipeline(options=options) as pipeline:
    data = ["Sample data",
            "Sample data - 0",
            "Sample data - 1"]
    raw_data = (pipeline
                | 'CreateHardCodeData' >> beam.Create(data)
                | 'Map' >> beam.Map(lambda line : line + '.')
                | 'Print' >> beam.Map(print))
{code}
Verify different environment_type in Python SDK Harness Configuration
*environment_type=LOOKBACK*
# Run pipeline on local cluster: Works Fine
# Run pipeline on K8S cluster, Exceptions are thrown:
java.lang.Exception: The user defined 'open()' method caused an exception: org.apache.beam.vendor.grpc.v1p21p0.io.grpc.StatusRuntimeException: UNAVAILABLE: io exception
Caused by: org.apache.beam.vendor.grpc.v1p21p0.io.netty.channel.AbstractChannel$AnnotatedConnectException: Connection refused: localhost/127.0.0.1:51017
*environment_type=DOCKER*
# Run pipeline on local cluster: Work fine
# Run pipeline on K8S cluster, Exception are thrown:
Caused by: java.io.IOException: Cannot run program "docker": error=2, No such file or directory.

> Exception is thrown when running Beam Pipeline on Kubernetes Flink Cluster.
> ---------------------------------------------------------------------------
>
>                 Key: BEAM-8591
>                 URL: https://issues.apache.org/jira/browse/BEAM-8591
>             Project: Beam
>          Issue Type: Bug
>          Components: runner-flink
>            Reporter: Mingliang Gong
>            Priority: Major
>
> h2. Setup Clusters
> * Setup Local Flink Cluster: [https://ci.apache.org/projects/flink/flink-docs-release-1.8/tutorials/local_setup.html]
> * Setup Kubernetes Flink Cluster using Minikube: [https://ci.apache.org/projects/flink/flink-docs-release-1.8/ops/deployment/kubernetes.html]
> h2. Verify Clusters
> Execute command "./bin/flink run examples/streaming/WordCount.jar". Both Local and K8S Flink Cluster work fine.
> h2. Using Apache Beam Flink Runner
> Instruction: [https://beam.apache.org/documentation/runners/flink/]
> Sample Pipeline Code:
> {code:java}
> import apache_beam as beam
> from
[jira] [Work logged] (BEAM-8157) Key encoding for state requests is not consistent across SDKs
[ https://issues.apache.org/jira/browse/BEAM-8157?focusedWorklogId=340371&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-340371 ]

ASF GitHub Bot logged work on BEAM-8157:
----------------------------------------
                Author: ASF GitHub Bot
            Created on: 08/Nov/19 06:49
            Start Date: 08/Nov/19 06:49
    Worklog Time Spent: 10m
      Work Description: sunjincheng121 commented on issue #9997: [BEAM-8157] Revert key encoding changes for state requests / improve debugging and testing
URL: https://github.com/apache/beam/pull/9997#issuecomment-551407703

Sorry for the late reply @mxm, I would like to check it now.

Issue Time Tracking
-------------------
    Worklog Id: (was: 340371)
    Time Spent: 7h 20m (was: 7h 10m)

> Key encoding for state requests is not consistent across SDKs
> -------------------------------------------------------------
>
>                 Key: BEAM-8157
>                 URL: https://issues.apache.org/jira/browse/BEAM-8157
>             Project: Beam
>          Issue Type: Bug
>          Components: runner-flink
>    Affects Versions: 2.13.0
>            Reporter: Maximilian Michels
>            Assignee: Maximilian Michels
>            Priority: Critical
>             Fix For: 2.17.0
>
>          Time Spent: 7h 20m
>  Remaining Estimate: 0h
>
> The Flink runner requires the internal key to be encoded without a length prefix (OUTER context). The user state request handler exposes a serialized version of the key to the Runner. This key is encoded with the NESTED context which may add a length prefix. We need to convert it to OUTER context to match the Flink runner's key encoding.
> So far this has not caused the Flink Runner to behave incorrectly. However, with the upcoming support for Flink 1.9, the state backend will not accept requests for keys not part of any key group/partition of the operator. This is very likely to happen with the encoding not being consistent.
> **NOTE** This is only applicable to the Java SDK, as the Python SDK uses OUTER encoding for the key in state requests.
[jira] [Work logged] (BEAM-8594) Remove unnecessary error check of the control service accessing in DataFlow Runner
[ https://issues.apache.org/jira/browse/BEAM-8594?focusedWorklogId=340369&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-340369 ]

ASF GitHub Bot logged work on BEAM-8594:
----------------------------------------
                Author: ASF GitHub Bot
            Created on: 08/Nov/19 06:46
            Start Date: 08/Nov/19 06:46
    Worklog Time Spent: 10m
      Work Description: sunjincheng121 commented on pull request #10039: [BEAM-8594] Remove unnecessary error check in DataFlow Runner
URL: https://github.com/apache/beam/pull/10039

Currently there are a few places in the DataFlow Runner which check whether an error was reported when accessing the SDK harness's control service. Actually, the error reported by the SDK harness has already been handled in [FnApiControlClient](https://github.com/apache/beam/blob/c2f0d282337f3ae0196a7717712396a5a41fdde1/runners/java-fn-execution/src/main/java/org/apache/beam/runners/fnexecution/control/FnApiControlClient.java#L152). There is no need to check it anymore.
[jira] [Created] (BEAM-8594) Remove unnecessary error check of the control service accessing in DataFlow Runner
sunjincheng created BEAM-8594:
---------------------------------

             Summary: Remove unnecessary error check of the control service accessing in DataFlow Runner
                 Key: BEAM-8594
                 URL: https://issues.apache.org/jira/browse/BEAM-8594
             Project: Beam
          Issue Type: Improvement
          Components: runner-dataflow
            Reporter: sunjincheng
            Assignee: sunjincheng
             Fix For: 2.18.0

Currently there are a few places in the DataFlow Runner which check whether an error was reported when accessing the SDK harness's control service. Actually, the error reported by the SDK harness has already been handled in [FnApiControlClient|https://github.com/apache/beam/blob/c2f0d282337f3ae0196a7717712396a5a41fdde1/runners/java-fn-execution/src/main/java/org/apache/beam/runners/fnexecution/control/FnApiControlClient.java#L152]. There is no need to check it anymore.
[jira] [Work logged] (BEAM-7948) Add time-based cache threshold support in the Java data service
[ https://issues.apache.org/jira/browse/BEAM-7948?focusedWorklogId=340353&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-340353 ]

ASF GitHub Bot logged work on BEAM-7948:
----------------------------------------
                Author: ASF GitHub Bot
            Created on: 08/Nov/19 05:09
            Start Date: 08/Nov/19 05:09
    Worklog Time Spent: 10m
      Work Description: sunjincheng121 commented on issue #9949: [BEAM-7948] Add time-based cache threshold support in the Java data s…
URL: https://github.com/apache/beam/pull/9949#issuecomment-551386849

Hi @lukecwik, could you please have another look? Any comment is welcome. Thanks! :)

Issue Time Tracking
-------------------
    Worklog Id: (was: 340353)
    Time Spent: 50m (was: 40m)

> Add time-based cache threshold support in the Java data service
> ---------------------------------------------------------------
>
>                 Key: BEAM-7948
>                 URL: https://issues.apache.org/jira/browse/BEAM-7948
>             Project: Beam
>          Issue Type: Sub-task
>          Components: java-fn-execution
>            Reporter: sunjincheng
>            Assignee: sunjincheng
>            Priority: Major
>          Time Spent: 50m
>  Remaining Estimate: 0h
>
> Currently only a size-based cache threshold is supported in the data service. It should also support a time-based cache threshold. This is very important, especially for streaming jobs which are sensitive to latency.
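The behavior the issue asks for can be sketched as a buffer that flushes when either a size limit or an age limit is exceeded. The Python below is a hypothetical model, not the Java data service's API: the `FlushBuffer` name and its parameters are invented for illustration, and a real streaming implementation would also need a timer to flush an idle buffer rather than checking age only on `add`.

```python
import time


class FlushBuffer:
    """Buffers elements and flushes on a size OR a time threshold
    (illustrative sketch; names and parameters are not Beam's)."""

    def __init__(self, flush, max_items=100, max_age_secs=0.5,
                 clock=time.monotonic):
        self._flush = flush          # callback receiving a list of elements
        self._max_items = max_items
        self._max_age = max_age_secs
        self._clock = clock          # injectable for testing
        self._items = []
        self._oldest = None          # timestamp of the oldest buffered item

    def add(self, item):
        if self._oldest is None:
            self._oldest = self._clock()
        self._items.append(item)
        # Flush when either threshold is hit; the time-based check keeps
        # latency bounded even when traffic is too light to fill the buffer.
        if (len(self._items) >= self._max_items
                or self._clock() - self._oldest >= self._max_age):
            self.flush()

    def flush(self):
        if self._items:
            self._flush(list(self._items))
            self._items.clear()
            self._oldest = None
```

With only a size threshold, a trickle of elements could sit buffered indefinitely; the age check is what makes the difference for delay-sensitive streaming jobs.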
[jira] [Work logged] (BEAM-8592) DataCatalogTableProvider should not squash table components together into a string
[ https://issues.apache.org/jira/browse/BEAM-8592?focusedWorklogId=340335&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-340335 ]

ASF GitHub Bot logged work on BEAM-8592:
----------------------------------------
                Author: ASF GitHub Bot
            Created on: 08/Nov/19 04:04
            Start Date: 08/Nov/19 04:04
    Worklog Time Spent: 10m
      Work Description: kennknowles commented on issue #10021: [BEAM-8592] Adjusting ZetaSQL table resolution to standard
URL: https://github.com/apache/beam/pull/10021#issuecomment-551375497

run sql precommit

Issue Time Tracking
-------------------
    Worklog Id: (was: 340335)
    Remaining Estimate: 0h
    Time Spent: 10m

> DataCatalogTableProvider should not squash table components together into a string
> ----------------------------------------------------------------------------------
>
>                 Key: BEAM-8592
>                 URL: https://issues.apache.org/jira/browse/BEAM-8592
>             Project: Beam
>          Issue Type: Bug
>          Components: dsl-sql, dsl-sql-zetasql
>            Reporter: Kenneth Knowles
>            Assignee: Kenneth Knowles
>            Priority: Major
>          Time Spent: 10m
>  Remaining Estimate: 0h
>
> Currently, if a user writes a table name like {{foo.`baz.bar`.bizzle}}, representing the components {{"foo", "baz.bar", "bizzle"}}, the DataCatalogTableProvider will concatenate the components into a string and resolve the identifier as if it represented {{"foo", "baz", "bar", "bizzle"}}.
[jira] [Created] (BEAM-8593) Define expected behavior of running ZetaSQL query on tables with unsupported field types
Yueyang Qiu created BEAM-8593:
---------------------------------

             Summary: Define expected behavior of running ZetaSQL query on tables with unsupported field types
                 Key: BEAM-8593
                 URL: https://issues.apache.org/jira/browse/BEAM-8593
             Project: Beam
          Issue Type: Improvement
          Components: dsl-sql-zetasql
            Reporter: Yueyang Qiu
            Assignee: Yueyang Qiu

What should the expected behavior be if a user runs a ZetaSQL query on a table with a field type (e.g. MAP) that is not supported by ZetaSQL?
More context: [https://github.com/apache/beam/pull/10020#issuecomment-551368105]
[jira] [Work logged] (BEAM-8566) Checkpoint buffer is flushed prematurely when another bundle is started during checkpointing
[ https://issues.apache.org/jira/browse/BEAM-8566?focusedWorklogId=340331&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-340331 ]

ASF GitHub Bot logged work on BEAM-8566:
----------------------------------------
                Author: ASF GitHub Bot
            Created on: 08/Nov/19 03:44
            Start Date: 08/Nov/19 03:44
    Worklog Time Spent: 10m
      Work Description: tweise commented on pull request #10008: [BEAM-8566] Do not swallow execution errors during checkpointing
URL: https://github.com/apache/beam/pull/10008#discussion_r343978810

File path: runners/flink/src/main/java/org/apache/beam/runners/flink/translation/wrappers/streaming/DoFnOperator.java

@@ -763,12 +763,19 @@ public final void snapshotState(StateSnapshotContext context) throws Exception {
     // We can't output here anymore because the checkpoint barrier has already been
     // sent downstream. This is going to change with 1.6/1.7's prepareSnapshotBarrier.
-    outputManager.openBuffer();
-    // Ensure that no new bundle gets started as part of finishing a bundle
-    while (bundleStarted.get()) {
-      invokeFinishBundle();
+    try {
+      outputManager.openBuffer();
+      // Ensure that no new bundle gets started as part of finishing a bundle
+      while (bundleStarted.get()) {
+        invokeFinishBundle();
+      }
+      outputManager.closeBuffer();
+    } catch (Exception e) {
+      // Any regular exception during checkpointing will be tolerated by Flink because those

Review comment:
```suggestion
      // https://jira.apache.org/jira/browse/FLINK-14653
      // Any regular exception during checkpointing will be tolerated by Flink because those
```

Issue Time Tracking
-------------------
    Worklog Id: (was: 340331)
    Time Spent: 1h 50m (was: 1h 40m)

> Checkpoint buffer is flushed prematurely when another bundle is started during checkpointing
> --------------------------------------------------------------------------------------------
>
>                 Key: BEAM-8566
>                 URL: https://issues.apache.org/jira/browse/BEAM-8566
>             Project: Beam
>          Issue Type: Bug
>          Components: runner-flink
>            Reporter: Maximilian Michels
>            Assignee: Maximilian Michels
>            Priority: Major
>             Fix For: 2.18.0
>
>          Time Spent: 1h 50m
>  Remaining Estimate: 0h
>
> As part of a checkpoint, the current bundle is finalized. When the bundle is finalized, the watermark, currently held back, may also be progressed, which can cause the start of another bundle. When a new bundle is started, any to-be-buffered items from the previous bundle for the pending checkpoint may be emitted. This should not happen.
> This only affects portable pipelines where we have to hold back the watermark due to the asynchronous processing of elements.
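The buffering that the issue and the patch revolve around can be modeled as an output manager that holds elements back between opening and closing the buffer, releasing them only when the checkpoint completes. The Python below is an illustrative model, not Beam's actual DoFnOperator/output-manager code; the bug was that the start of a new bundle could trigger the flush path before the checkpoint was done:

```python
class BufferingOutputManager:
    """Toy model of buffering outputs around a checkpoint barrier
    (illustrative; not Beam's implementation)."""

    def __init__(self, emit):
        self._emit = emit        # downstream emission callback
        self._buffering = False
        self._buffer = []

    def open_buffer(self):
        # Called at snapshot time: the barrier is already downstream,
        # so nothing may be emitted until the checkpoint completes.
        self._buffering = True

    def close_buffer(self):
        self._buffering = False

    def output(self, element):
        if self._buffering:
            self._buffer.append(element)  # held for the pending checkpoint
        else:
            self._emit(element)

    def flush_buffer(self):
        # Must run only on checkpoint completion. Flushing because a new
        # bundle happened to start would emit these elements prematurely,
        # which is exactly the bug described above.
        for e in self._buffer:
            self._emit(e)
        self._buffer.clear()
```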
[jira] [Commented] (BEAM-8592) DataCatalogTableProvider should not squash table components together into a string
[ https://issues.apache.org/jira/browse/BEAM-8592?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16969807#comment-16969807 ]

Kenneth Knowles commented on BEAM-8592:
---------------------------------------

CC [~apilloud] [~amaliujia]

> DataCatalogTableProvider should not squash table components together into a string
> ----------------------------------------------------------------------------------
>
>                 Key: BEAM-8592
>                 URL: https://issues.apache.org/jira/browse/BEAM-8592
>             Project: Beam
>          Issue Type: Bug
>          Components: dsl-sql, dsl-sql-zetasql
>            Reporter: Kenneth Knowles
>            Assignee: Kenneth Knowles
>            Priority: Major
>
> Currently, if a user writes a table name like {{foo.`baz.bar`.bizzle}}, representing the components {{"foo", "baz.bar", "bizzle"}}, the DataCatalogTableProvider will concatenate the components into a string and resolve the identifier as if it represented {{"foo", "baz", "bar", "bizzle"}}.
[jira] [Created] (BEAM-8592) DataCatalogTableProvider should not squash table components together into a string
Kenneth Knowles created BEAM-8592:
-------------------------------------

             Summary: DataCatalogTableProvider should not squash table components together into a string
                 Key: BEAM-8592
                 URL: https://issues.apache.org/jira/browse/BEAM-8592
             Project: Beam
          Issue Type: Bug
          Components: dsl-sql, dsl-sql-zetasql
            Reporter: Kenneth Knowles
            Assignee: Kenneth Knowles

Currently, if a user writes a table name like {{foo.`baz.bar`.bizzle}}, representing the components {{"foo", "baz.bar", "bizzle"}}, the DataCatalogTableProvider will concatenate the components into a string and resolve the identifier as if it represented {{"foo", "baz", "bar", "bizzle"}}.
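The information loss described above is easy to reproduce: once the components are joined into a single dotted string, the original grouping cannot be recovered by splitting on dots. A minimal sketch (hypothetical Python, not the DataCatalogTableProvider implementation):

```python
def squash(components):
    # What the bug does: collapse the component list into one dotted string,
    # discarding the quoting that kept "baz.bar" a single component.
    return ".".join(components)


def resolve(squashed):
    # A later split re-parses the string and invents a different grouping.
    return squashed.split(".")


# The user wrote foo.`baz.bar`.bizzle, i.e. three components:
original = ["foo", "baz.bar", "bizzle"]
assert resolve(squash(original)) == ["foo", "baz", "bar", "bizzle"]
assert resolve(squash(original)) != original
```

The fix implied by the issue title is to keep the component list intact end to end instead of round-tripping through a string.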
[jira] [Updated] (BEAM-8592) DataCatalogTableProvider should not squash table components together into a string
[ https://issues.apache.org/jira/browse/BEAM-8592?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Kenneth Knowles updated BEAM-8592:
----------------------------------
    Status: Open  (was: Triage Needed)

> DataCatalogTableProvider should not squash table components together into a string
> ----------------------------------------------------------------------------------
>
>                 Key: BEAM-8592
>                 URL: https://issues.apache.org/jira/browse/BEAM-8592
>             Project: Beam
>          Issue Type: Bug
>          Components: dsl-sql, dsl-sql-zetasql
>            Reporter: Kenneth Knowles
>            Assignee: Kenneth Knowles
>            Priority: Major
>
> Currently, if a user writes a table name like {{foo.`baz.bar`.bizzle}}, representing the components {{"foo", "baz.bar", "bizzle"}}, the DataCatalogTableProvider will concatenate the components into a string and resolve the identifier as if it represented {{"foo", "baz", "bar", "bizzle"}}.
[jira] [Updated] (BEAM-8456) Add pipeline option to control truncate of BigQuery data processed by Beam SQL
[ https://issues.apache.org/jira/browse/BEAM-8456?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Kenneth Knowles updated BEAM-8456:
----------------------------------
    Summary: Add pipeline option to control truncate of BigQuery data processed by Beam SQL  (was: BigQuery to Beam SQL timestamp has the wrong default: truncation makes the most sense)

> Add pipeline option to control truncate of BigQuery data processed by Beam SQL
> ------------------------------------------------------------------------------
>
>                 Key: BEAM-8456
>                 URL: https://issues.apache.org/jira/browse/BEAM-8456
>             Project: Beam
>          Issue Type: Improvement
>          Components: dsl-sql
>            Reporter: Kenneth Knowles
>            Assignee: Kenneth Knowles
>            Priority: Major
>          Time Spent: 3h 20m
>  Remaining Estimate: 0h
>
> Most of the time, a user reading a timestamp from BigQuery with higher-than-millisecond precision timestamps may not even realize that the data source created these high precision timestamps. They are probably timestamps on log entries generated by a system with higher precision.
> If they are using it with Beam SQL, which only supports millisecond precision, it makes sense to "just work" by default.
[jira] [Resolved] (BEAM-8456) Add pipeline option to control truncate of BigQuery data processed by Beam SQL
[ https://issues.apache.org/jira/browse/BEAM-8456?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Kenneth Knowles resolved BEAM-8456.
-----------------------------------
    Fix Version/s: 2.18.0
       Resolution: Fixed

> Add pipeline option to control truncate of BigQuery data processed by Beam SQL
> ------------------------------------------------------------------------------
>
>                 Key: BEAM-8456
>                 URL: https://issues.apache.org/jira/browse/BEAM-8456
>             Project: Beam
>          Issue Type: Improvement
>          Components: dsl-sql
>            Reporter: Kenneth Knowles
>            Assignee: Kenneth Knowles
>            Priority: Major
>             Fix For: 2.18.0
>
>          Time Spent: 3h 20m
>  Remaining Estimate: 0h
>
> Most of the time, a user reading a timestamp from BigQuery with higher-than-millisecond precision timestamps may not even realize that the data source created these high precision timestamps. They are probably timestamps on log entries generated by a system with higher precision.
> If they are using it with Beam SQL, which only supports millisecond precision, it makes sense to "just work" by default.
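The truncation that the option controls amounts to flooring away sub-millisecond precision. A minimal sketch of the idea (a hypothetical Python helper, not Beam's implementation), taking a microsecond-precision epoch timestamp down to Beam SQL's millisecond model:

```python
def truncate_to_millis(micros: int) -> int:
    # Floor a microsecond epoch timestamp to millisecond precision, so a
    # high-precision BigQuery TIMESTAMP "just works" in Beam SQL's
    # millisecond model instead of failing on the extra digits.
    return micros - micros % 1000


# The sub-millisecond digits (here ...456) are dropped:
assert truncate_to_millis(1_572_912_000_123_456) == 1_572_912_000_123_000
```

Flooring (rather than rounding) keeps a truncated timestamp from ever moving forward in time, which matters for watermark reasoning.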
[jira] [Updated] (BEAM-8591) Exception is thrown when running Beam Pipeline on Kubernetes Flink Cluster.
[ https://issues.apache.org/jira/browse/BEAM-8591?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Mingliang Gong updated BEAM-8591:
---------------------------------
    Description: 
h2. Setup Clusters
* Setup Local Flink Cluster: [https://ci.apache.org/projects/flink/flink-docs-release-1.8/tutorials/local_setup.html]
* Setup Kubernetes Flink Cluster using Minikube: [https://ci.apache.org/projects/flink/flink-docs-release-1.8/ops/deployment/kubernetes.html]
h2. Verify Clusters
Execute command "./bin/flink run examples/streaming/WordCount.jar". Both Local and K8S Flink Cluster work fine.
h2. Using Apache Beam Flink Runner
Instruction: [https://beam.apache.org/documentation/runners/flink/]
Sample Pipeline Code:
{code:java}
import apache_beam as beam
from apache_beam.options.pipeline_options import PipelineOptions

options = PipelineOptions([
    "--runner=PortableRunner",
    "--job_endpoint=localhost:8099",
    "--environment_type=LOOPBACK"
])

with beam.Pipeline(options=options) as pipeline:
    data = ["Sample data",
            "Sample data - 0",
            "Sample data - 1"]
    raw_data = (pipeline
                | 'CreateHardCodeData' >> beam.Create(data)
                | 'Map' >> beam.Map(lambda line : line + '.')
                | 'Print' >> beam.Map(print))
{code}
Verify different environment_type in Python SDK Harness Configuration
*environment_type=LOOKBACK*
# Run pipeline on local cluster: Works Fine
# Run pipeline on K8S cluster, Exceptions are thrown:
java.lang.Exception: The user defined 'open()' method caused an exception: org.apache.beam.vendor.grpc.v1p21p0.io.grpc.StatusRuntimeException: UNAVAILABLE: io exception
Caused by: org.apache.beam.vendor.grpc.v1p21p0.io.netty.channel.AbstractChannel$AnnotatedConnectException: Connection refused: localhost/127.0.0.1:51017
*environment_type=DOCKER*
# Run pipeline on local cluster: Work fine
# Run pipeline on K8S cluster, Exception are thrown:
Caused by: java.io.IOException: Cannot run program "docker": error=2, No such file or directory.

  was:
h2. Setup Clusters
* Setup Local Flink Cluster: [https://ci.apache.org/projects/flink/flink-docs-release-1.8/tutorials/local_setup.html]
* Setup Kubernetes Flink Cluster using Minikube: [https://ci.apache.org/projects/flink/flink-docs-release-1.8/ops/deployment/kubernetes.html]
h2. Verify Clusters
Execute command "./bin/flink run examples/streaming/WordCount.jar". Both Local and K8S Flink Cluster work fine.
h2. Using Apache Beam Flink Runner
Instruction: [https://beam.apache.org/documentation/runners/flink/]
Sample Pipeline Code:
{code:java}
import apache_beam as beam
from apache_beam.options.pipeline_options import PipelineOptions

options = PipelineOptions([
    "--runner=PortableRunner",
    "--job_endpoint=localhost:8099",
    "--environment_type=LOOPBACK"
])

with beam.Pipeline(options=options) as pipeline:
    data = ["Sample data",
            "Sample data - 0",
            "Sample data - 1"]
    raw_data = (pipeline
                | 'CreateHardCodeData' >> beam.Create(data)
                | 'Map' >> beam.Map(lambda line : line + '.')
                | 'Print' >> beam.Map(print))
{code}
Verify different environment_type in Python SDK Harness Configuration
*environment_type=LOOKBACK*
# Run pipeline on local cluster: Works Fine
# Run pipeline on K8S cluster, Exceptions are thrown:
java.lang.Exception: The user defined 'open()' method caused an exception: org.apache.beam.vendor.grpc.v1p21p0.io.grpc.StatusRuntimeException: UNAVAILABLE: io exception
Caused by: org.apache.beam.vendor.grpc.v1p21p0.io.netty.channel.AbstractChannel$AnnotatedConnectException: Connection refused: localhost/127.0.0.1:51017
*environment_type=DOCKER*
# Run pipeline on local cluster: Work fine
# Run pipeline on K8S cluster, Exception are thrown:
Caused by: java.io.IOException: Cannot run program "docker": error=2, No such file or directory.

> Exception is thrown when running Beam Pipeline on Kubernetes Flink Cluster.
> ---------------------------------------------------------------------------
>
>                 Key: BEAM-8591
>                 URL: https://issues.apache.org/jira/browse/BEAM-8591
>             Project: Beam
>          Issue Type: Bug
>          Components: runner-flink
>            Reporter: Mingliang Gong
>            Priority: Major
>
> h2. Setup Clusters
> * Setup Local Flink Cluster: [https://ci.apache.org/projects/flink/flink-docs-release-1.8/tutorials/local_setup.html]
> * Setup Kubernetes Flink Cluster using Minikube: [https://ci.apache.org/projects/flink/flink-docs-release-1.8/ops/deployment/kubernetes.html]
> h2. Verify Clusters
> Execute command "./bin/flink run examples/streaming/WordCount.jar". Both Local and K8S Flink Cluster work fine.
> h2. Using Apache Beam Flink Runner
> Instruction: [https://beam.apache.org/documentation/runners/flink/]
> Sample Pipeline Code:
> {code:java}
> import apache_beam as beam
> from
[jira] [Created] (BEAM-8591) Exception is thrown when running Beam Pipeline on Kubernetes Flink Cluster.
Mingliang Gong created BEAM-8591:
Summary: Exception is thrown when running Beam Pipeline on Kubernetes Flink Cluster.
Key: BEAM-8591
URL: https://issues.apache.org/jira/browse/BEAM-8591
Project: Beam
Issue Type: Bug
Components: runner-flink
Reporter: Mingliang Gong

h2. Setup Clusters
# Setup Local Flink Cluster: [https://ci.apache.org/projects/flink/flink-docs-release-1.8/tutorials/local_setup.html]
# Setup Kubernetes Flink Cluster with Minikube: [https://ci.apache.org/projects/flink/flink-docs-release-1.8/ops/deployment/kubernetes.html]
h2. Verify Clusters
Execute the command "./bin/flink run examples/streaming/WordCount.jar". Both the local and the K8S Flink cluster work fine.
h2. Apache Beam Flink Runner
Instructions: [https://beam.apache.org/documentation/runners/flink/]
Sample pipeline code:
{code:python}
import apache_beam as beam
from apache_beam.options.pipeline_options import PipelineOptions

options = PipelineOptions([
    "--runner=PortableRunner",
    "--job_endpoint=localhost:8099",
    "--environment_type=LOOPBACK"
])

with beam.Pipeline(options=options) as pipeline:
    data = ["Sample data", "Sample data - 0", "Sample data - 1"]
    raw_data = (pipeline
                | 'CreateHardCodeData' >> beam.Create(data)
                | 'Map' >> beam.Map(lambda line: line + '.')
                | 'Print' >> beam.Map(print))
{code}
Verify each environment_type in the Python SDK harness configuration:
*environment_type=LOOPBACK*
# Run pipeline on local cluster: works fine.
# Run pipeline on K8S cluster, exceptions are thrown:
java.lang.Exception: The user defined 'open()' method caused an exception: org.apache.beam.vendor.grpc.v1p21p0.io.grpc.StatusRuntimeException: UNAVAILABLE: io exception
Caused by: org.apache.beam.vendor.grpc.v1p21p0.io.netty.channel.AbstractChannel$AnnotatedConnectException: Connection refused: localhost/127.0.0.1:51017
*environment_type=DOCKER*
# Run pipeline on local cluster: works fine.
# Run pipeline on K8S cluster, an exception is thrown:
Caused by: java.io.IOException: Cannot run program "docker": error=2, No such file or directory.
-- This message was sent by Atlassian Jira (v8.3.4#803005)
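With environment_type=LOOPBACK the SDK harness runs on the machine that submitted the job, so Flink task managers inside the K8S cluster try to reach a harness on their own localhost and find nothing listening. The "Connection refused" symptom above can be reproduced without Beam by dialing a localhost port that has no server behind it (the port is picked at runtime here; 51017 in the stack trace was equally arbitrary):

```python
import socket

def free_port() -> int:
    """Reserve and immediately release an ephemeral port, so nothing listens on it."""
    with socket.socket() as s:
        s.bind(("127.0.0.1", 0))
        return s.getsockname()[1]

def dial(host: str, port: int, timeout: float = 1.0) -> str:
    """Attempt a TCP connection; return 'ok' or the exception class name."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return "ok"
    except OSError as exc:
        return type(exc).__name__

# A task manager pod has no SDK harness bound on its own localhost,
# which is exactly this situation: a connect to an unbound local port.
print(dial("127.0.0.1", free_port()))  # ConnectionRefusedError
```

The DOCKER failure is the same kind of environment mismatch: the `docker` binary simply is not present inside the Flink task manager image, so `Cannot run program "docker"` is raised.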
[jira] [Updated] (BEAM-8590) Python typehints: native types: consider bare container types as containing Any
[ https://issues.apache.org/jira/browse/BEAM-8590?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Udi Meiri updated BEAM-8590: Status: Open (was: Triage Needed) > Python typehints: native types: consider bare container types as containing > Any > --- > > Key: BEAM-8590 > URL: https://issues.apache.org/jira/browse/BEAM-8590 > Project: Beam > Issue Type: Bug > Components: sdk-py-core >Reporter: Udi Meiri >Assignee: Udi Meiri >Priority: Major > > This is for convert_to_beam_type: > For example, process(element: List) is the same as process(element: > List[Any]). -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Created] (BEAM-8590) Python typehints: native types: consider bare container types as containing Any
Udi Meiri created BEAM-8590: --- Summary: Python typehints: native types: consider bare container types as containing Any Key: BEAM-8590 URL: https://issues.apache.org/jira/browse/BEAM-8590 Project: Beam Issue Type: Bug Components: sdk-py-core Reporter: Udi Meiri Assignee: Udi Meiri This is for convert_to_beam_type: For example, process(element: List) is the same as process(element: List[Any]). -- This message was sent by Atlassian Jira (v8.3.4#803005)
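The convention proposed above (a bare container hint behaving like the same container parameterized with `Any`) can be sketched with the stdlib `typing` introspection helpers; `pad_with_any` is an illustrative name, not Beam's actual `convert_to_beam_type`:

```python
from typing import Any, Dict, List, get_args

def pad_with_any(hint, nparams: int = 1) -> tuple:
    """Return a hint's type arguments, treating a bare container as all-Any."""
    args = get_args(hint)  # () for bare List, Dict, etc. (Python 3.8+)
    return args if args else (Any,) * nparams

print(pad_with_any(List))             # (typing.Any,)
print(pad_with_any(List[int]))        # (<class 'int'>,)
print(pad_with_any(Dict, nparams=2))  # (typing.Any, typing.Any)
```

Under this rule `process(element: List)` and `process(element: List[Any])` yield the same element type, which is the equivalence the issue asks for.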
[jira] [Work logged] (BEAM-8457) Instrument Dataflow jobs that are launched from Notebooks
[ https://issues.apache.org/jira/browse/BEAM-8457?focusedWorklogId=340292&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-340292 ] ASF GitHub Bot logged work on BEAM-8457: Author: ASF GitHub Bot Created on: 08/Nov/19 01:59 Start Date: 08/Nov/19 01:59 Worklog Time Spent: 10m Work Description: KevinGG commented on issue #9885: [BEAM-8457] Label Dataflow jobs from Notebook URL: https://github.com/apache/beam/pull/9885#issuecomment-551349679 Run Portable_Python PreCommit This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org Issue Time Tracking --- Worklog Id: (was: 340292) Time Spent: 8h 50m (was: 8h 40m) > Instrument Dataflow jobs that are launched from Notebooks > - > > Key: BEAM-8457 > URL: https://issues.apache.org/jira/browse/BEAM-8457 > Project: Beam > Issue Type: Improvement > Components: runner-py-interactive >Reporter: Ning Kang >Assignee: Ning Kang >Priority: Major > Fix For: 2.17.0 > > Time Spent: 8h 50m > Remaining Estimate: 0h > > Dataflow needs the capability to tell how many Dataflow jobs are launched > from the Notebook environment, i.e., the Interactive Runner. > # Change the pipeline.run() API to allow supply a runner and an option > parameter so that a pipeline initially bundled w/ an interactive runner can > be directly run by other runners from notebook. > # Implicitly add the necessary source information through user labels when > the user does p.run(runner=DataflowRunner()). -- This message was sent by Atlassian Jira (v8.3.4#803005)
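The two changes the issue lists, `pipeline.run()` accepting a replacement runner plus options, and provenance labels being added implicitly for notebook-launched jobs, can be sketched with stand-in classes (all names below are illustrative, not Beam's actual interfaces):

```python
class RecordingRunner:
    """Stand-in runner that just reports the labels it received."""
    def run_pipeline(self, pipeline, options):
        return options.get("labels", {})

class InteractivePipeline:
    """Pipeline bundled with a default interactive runner at construction."""
    def __init__(self, default_runner):
        self._default_runner = default_runner

    def run(self, runner=None, options=None):
        # 1. Allow another runner (e.g. Dataflow) to be swapped in at run time.
        runner = runner or self._default_runner
        options = dict(options or {})
        # 2. Implicitly label the job so its notebook origin is traceable,
        #    without clobbering labels the user supplied explicitly.
        labels = dict(options.get("labels", {}))
        labels.setdefault("launched-from", "interactive-notebook")
        options["labels"] = labels
        return runner.run_pipeline(self, options)

p = InteractivePipeline(default_runner=RecordingRunner())
print(p.run(runner=RecordingRunner()))  # {'launched-from': 'interactive-notebook'}
```

Because the label is injected inside `run()`, every launch path (default runner or an explicitly passed one) is counted, which is the instrumentation goal.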
[jira] [Work logged] (BEAM-8442) Unify bundle register in Python SDK harness
[ https://issues.apache.org/jira/browse/BEAM-8442?focusedWorklogId=340277=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-340277 ] ASF GitHub Bot logged work on BEAM-8442: Author: ASF GitHub Bot Created on: 08/Nov/19 01:25 Start Date: 08/Nov/19 01:25 Worklog Time Spent: 10m Work Description: sunjincheng121 commented on issue #10004: [BEAM-8442] Unify bundle register in Python SDK harness URL: https://github.com/apache/beam/pull/10004#issuecomment-551342271 Thanks for the review @mxm ! I would appreciate it if you could have a look at the PR. @aaltay @ibzib @lukecwik :) This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org Issue Time Tracking --- Worklog Id: (was: 340277) Time Spent: 3h (was: 2h 50m) > Unify bundle register in Python SDK harness > --- > > Key: BEAM-8442 > URL: https://issues.apache.org/jira/browse/BEAM-8442 > Project: Beam > Issue Type: Sub-task > Components: sdk-py-harness >Reporter: sunjincheng >Assignee: sunjincheng >Priority: Major > Time Spent: 3h > Remaining Estimate: 0h > > There are two methods for bundle register in Python SDK harness: > `SdkHarness._request_register` and `SdkWorker.register.` They should be unified. -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Assigned] (BEAM-8588) MapTuple(fn) fails if fn has type hints but no default args
[ https://issues.apache.org/jira/browse/BEAM-8588?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Udi Meiri reassigned BEAM-8588: --- Assignee: (was: Udi Meiri) > MapTuple(fn) fails if fn has type hints but no default args > --- > > Key: BEAM-8588 > URL: https://issues.apache.org/jira/browse/BEAM-8588 > Project: Beam > Issue Type: Bug > Components: sdk-py-core >Reporter: Udi Meiri >Priority: Major > > {code} > def test_typed_maptuple(self): > def fn(e1: int, e2: int) -> int: > return e1 * e2 > result = [(1, 1), (2, 2)] | beam.MapTuple(fn) > self.assertEqual([2, 4], sorted(result)) > {code} > Fails in getcallargs_forhints_impl_py3 with: > {code} > > raise TypeCheckError(e) > E apache_beam.typehints.decorators.TypeCheckError: too many positional > arguments > {code} -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Created] (BEAM-8589) Add instrumentation to portable runner to print pipeline proto and options when logging level is set to Debug.
Valentyn Tymofieiev created BEAM-8589: - Summary: Add instrumentation to portable runner to print pipeline proto and options when logging level is set to Debug. Key: BEAM-8589 URL: https://issues.apache.org/jira/browse/BEAM-8589 Project: Beam Issue Type: Bug Components: sdk-py-core Reporter: Valentyn Tymofieiev Similar capability in Dataflow runner: https://github.com/apache/beam/blob/90d587843172143c15ed392513e396b74569a98c/sdks/python/apache_beam/runners/dataflow/internal/apiclient.py#L567. -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Commented] (BEAM-8588) MapTuple(fn) fails if fn has type hints but no default args
[ https://issues.apache.org/jira/browse/BEAM-8588?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16969702#comment-16969702 ] Udi Meiri commented on BEAM-8588: - [~robertwb] I'm leaving this unassigned for now. > MapTuple(fn) fails if fn has type hints but no default args > --- > > Key: BEAM-8588 > URL: https://issues.apache.org/jira/browse/BEAM-8588 > Project: Beam > Issue Type: Bug > Components: sdk-py-core >Reporter: Udi Meiri >Priority: Major > > {code} > def test_typed_maptuple(self): > def fn(e1: int, e2: int) -> int: > return e1 * e2 > result = [(1, 1), (2, 2)] | beam.MapTuple(fn) > self.assertEqual([2, 4], sorted(result)) > {code} > Fails in getcallargs_forhints_impl_py3 with: > {code} > > raise TypeCheckError(e) > E apache_beam.typehints.decorators.TypeCheckError: too many positional > arguments > {code} -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Work logged] (BEAM-8583) [SQL] BigQuery should support predicate push-down in DIRECT_READ mode
[ https://issues.apache.org/jira/browse/BEAM-8583?focusedWorklogId=340268=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-340268 ] ASF GitHub Bot logged work on BEAM-8583: Author: ASF GitHub Bot Created on: 08/Nov/19 01:11 Start Date: 08/Nov/19 01:11 Worklog Time Spent: 10m Work Description: 11moon11 commented on issue #10030: [BEAM-8583] Big query filter push down URL: https://github.com/apache/beam/pull/10030#issuecomment-55133 R: @apilloud cc: @amaliujia This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org Issue Time Tracking --- Worklog Id: (was: 340268) Time Spent: 20m (was: 10m) > [SQL] BigQuery should support predicate push-down in DIRECT_READ mode > - > > Key: BEAM-8583 > URL: https://issues.apache.org/jira/browse/BEAM-8583 > Project: Beam > Issue Type: Improvement > Components: dsl-sql >Reporter: Kirill Kozlov >Assignee: Kirill Kozlov >Priority: Major > Time Spent: 20m > Remaining Estimate: 0h > > * Add BigQuery Dialect with TypeTranslation (since it is not implemented in > Calcite 1.20.0, but is present in unreleased versions). > * Create a BigQueryFilter class. > * BigQueryTable#buildIOReader should translate supported filters into a Sql > string and pass it to BigQueryIO. > > Potential improvements: > * After updating vendor Calcite, class > `BigQuerySqlDialectWithTypeTranslation` can be deleted and Calcite's > `BigQuerySqlDialect` can be utilized instead. > * Once BigQuery adds support for more filters, `BigQueryFilter#isSupported` > should be updated. -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Work logged] (BEAM-8427) [SQL] Add support for MongoDB source
[ https://issues.apache.org/jira/browse/BEAM-8427?focusedWorklogId=340267=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-340267 ] ASF GitHub Bot logged work on BEAM-8427: Author: ASF GitHub Bot Created on: 08/Nov/19 01:10 Start Date: 08/Nov/19 01:10 Worklog Time Spent: 10m Work Description: 11moon11 commented on issue #10031: [BEAM-8427] Create a table and a table provider for MongoDB URL: https://github.com/apache/beam/pull/10031#issuecomment-551338685 Run Java PreCommit This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org Issue Time Tracking --- Worklog Id: (was: 340267) Time Spent: 6h (was: 5h 50m) > [SQL] Add support for MongoDB source > > > Key: BEAM-8427 > URL: https://issues.apache.org/jira/browse/BEAM-8427 > Project: Beam > Issue Type: New Feature > Components: dsl-sql >Reporter: Kirill Kozlov >Assignee: Kirill Kozlov >Priority: Major > Time Spent: 6h > Remaining Estimate: 0h > > In progress: > * Create a MongoDB table and table provider. > * Implement buildIOReader > * Support primitive types > Still needs to be done: > * Implement buildIOWrite > * improve getTableStatistics -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Work logged] (BEAM-8427) [SQL] Add support for MongoDB source
[ https://issues.apache.org/jira/browse/BEAM-8427?focusedWorklogId=340266=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-340266 ] ASF GitHub Bot logged work on BEAM-8427: Author: ASF GitHub Bot Created on: 08/Nov/19 01:10 Start Date: 08/Nov/19 01:10 Worklog Time Spent: 10m Work Description: 11moon11 commented on issue #10031: [BEAM-8427] Create a table and a table provider for MongoDB URL: https://github.com/apache/beam/pull/10031#issuecomment-551338685 Run Java PreCommit This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org Issue Time Tracking --- Worklog Id: (was: 340266) Time Spent: 5h 50m (was: 5h 40m) > [SQL] Add support for MongoDB source > > > Key: BEAM-8427 > URL: https://issues.apache.org/jira/browse/BEAM-8427 > Project: Beam > Issue Type: New Feature > Components: dsl-sql >Reporter: Kirill Kozlov >Assignee: Kirill Kozlov >Priority: Major > Time Spent: 5h 50m > Remaining Estimate: 0h > > In progress: > * Create a MongoDB table and table provider. > * Implement buildIOReader > * Support primitive types > Still needs to be done: > * Implement buildIOWrite > * improve getTableStatistics -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Commented] (BEAM-8588) MapTuple(fn) fails if fn has type hints but no default args
[ https://issues.apache.org/jira/browse/BEAM-8588?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16969696#comment-16969696 ] Udi Meiri commented on BEAM-8588: - For this example: {code} def fn(element1: int, element2: int, side1: str) -> int: ... {code} The input type hints for the wrapper in MapTuple should be: {code} Tuple[int, int], str {code} so there would need to be 2 separate code paths in MapTuple: 1. For type hints originating from typehints.decorators.IOTypeHints.from_callable(fn), which would need to combine the non-side-input args into a Tuple. 2. For type hints originating from decorators, which would pass the type hints as-is. > MapTuple(fn) fails if fn has type hints but no default args > --- > > Key: BEAM-8588 > URL: https://issues.apache.org/jira/browse/BEAM-8588 > Project: Beam > Issue Type: Bug > Components: sdk-py-core >Reporter: Udi Meiri >Assignee: Udi Meiri >Priority: Major > > {code} > def test_typed_maptuple(self): > def fn(e1: int, e2: int) -> int: > return e1 * e2 > result = [(1, 1), (2, 2)] | beam.MapTuple(fn) > self.assertEqual([2, 4], sorted(result)) > {code} > Fails in getcallargs_forhints_impl_py3 with: > {code} > > raise TypeCheckError(e) > E apache_beam.typehints.decorators.TypeCheckError: too many positional > arguments > {code} -- This message was sent by Atlassian Jira (v8.3.4#803005)
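The combination described in code path 1 above — folding the non-side-input argument hints into a single Tuple — can be sketched with stdlib introspection. `combined_input_hints` is a hypothetical helper for illustration, not the actual MapTuple fix:

```python
import inspect
from typing import Tuple, get_type_hints

def combined_input_hints(fn, num_side_inputs=0):
    """Fold the hints of fn's main (non-side-input) parameters into one
    Tuple hint, passing side-input hints through as-is."""
    hints = get_type_hints(fn)
    params = list(inspect.signature(fn).parameters)
    split = len(params) - num_side_inputs
    main, side = params[:split], params[split:]
    element_hint = Tuple[tuple(hints[p] for p in main)]
    return (element_hint,) + tuple(hints[p] for p in side)
```

For the `fn(element1: int, element2: int, side1: str)` example above, this yields `(Tuple[int, int], str)` — the shape the wrapper's input hints should take.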
[jira] [Work logged] (BEAM-8539) Clearly define the valid job state transitions
[ https://issues.apache.org/jira/browse/BEAM-8539?focusedWorklogId=340264=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-340264 ] ASF GitHub Bot logged work on BEAM-8539: Author: ASF GitHub Bot Created on: 08/Nov/19 01:07 Start Date: 08/Nov/19 01:07 Worklog Time Spent: 10m Work Description: chadrik commented on issue #9965: [BEAM-8539] Make job state transitions in python-based runners consistent with java-based runners URL: https://github.com/apache/beam/pull/9965#issuecomment-551338061 > Should STOPPED be PAUSED? Based on the comments describing STOPPED, I'd say yes. But @lukecwik has informed me in the other PR that there are no runners that actually support being paused (RUNNING -> STOPPED -> RUNNING transition), so STOPPED is not currently used this way. This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org Issue Time Tracking --- Worklog Id: (was: 340264) Time Spent: 4h 10m (was: 4h) > Clearly define the valid job state transitions > -- > > Key: BEAM-8539 > URL: https://issues.apache.org/jira/browse/BEAM-8539 > Project: Beam > Issue Type: Improvement > Components: beam-model, runner-core, sdk-java-core, sdk-py-core >Reporter: Chad Dombrova >Priority: Major > Time Spent: 4h 10m > Remaining Estimate: 0h > > The Beam job state transitions are ill-defined, which is a big problem for > anything that relies on the values coming from JobAPI.GetStateStream. > I was hoping to find something like a state transition diagram in the docs so > that I could determine the start state, the terminal states, and the valid > transitions, but I could not find this. 
The code reveals that the SDKs differ > on the fundamentals: > Java InMemoryJobService: > * start state: *STOPPED* > * run - about to submit to executor: STARTING > * run - actually running on executor: RUNNING > * terminal states: DONE, FAILED, CANCELLED, DRAINED > Python AbstractJobServiceServicer / LocalJobServicer: > * start state: STARTING > * terminal states: DONE, FAILED, CANCELLED, *STOPPED* > I think it would be good to make python work like Java, so that there is a > difference in state between a job that has been prepared and one that has > additionally been run. > It's hard to tell how far this problem has spread within the various runners. > I think a simple thing that can be done to help standardize behavior is to > implement the terminal states as an enum in the beam_job_api.proto, or create > a utility function in each language for checking if a state is terminal, so > that it's not left up to each runner to reimplement this logic. > -- This message was sent by Atlassian Jira (v8.3.4#803005)
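The last suggestion in the issue — a shared helper for checking terminal states — could look like the following sketch. The terminal set follows the Java `InMemoryJobService` semantics quoted above (DONE, FAILED, CANCELLED, DRAINED); the enum values mirror `beam_job_api.proto`, but the helper names are assumptions, not Beam's actual API:

```python
from enum import Enum

class JobState(Enum):
    STOPPED = 1
    RUNNING = 2
    DONE = 3
    FAILED = 4
    CANCELLED = 5
    DRAINED = 8
    STARTING = 9

# Terminal states per the Java InMemoryJobService semantics described above;
# note the Python LocalJobServicer additionally treated STOPPED as terminal.
_TERMINAL_STATES = frozenset({
    JobState.DONE, JobState.FAILED, JobState.CANCELLED, JobState.DRAINED,
})

def is_terminal(state: JobState) -> bool:
    """True if no further state transitions are possible from `state`."""
    return state in _TERMINAL_STATES
```

Centralizing this check is exactly what keeps each runner from reimplementing (and diverging on) the terminal-state logic.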
[jira] [Work logged] (BEAM-8539) Clearly define the valid job state transitions
[ https://issues.apache.org/jira/browse/BEAM-8539?focusedWorklogId=340263=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-340263 ] ASF GitHub Bot logged work on BEAM-8539: Author: ASF GitHub Bot Created on: 08/Nov/19 01:05 Start Date: 08/Nov/19 01:05 Worklog Time Spent: 10m Work Description: chadrik commented on pull request #9965: [BEAM-8539] Make job state transitions in python-based runners consistent with java-based runners URL: https://github.com/apache/beam/pull/9965#discussion_r343949978 ## File path: sdks/python/apache_beam/runners/portability/local_job_service.py ## @@ -195,7 +195,7 @@ def __init__(self, self._state = None self._state_queues = [] self._log_queues = [] -self.state = beam_job_api_pb2.JobState.STARTING +self.state = beam_job_api_pb2.JobState.STOPPED Review comment: The initial state in java runners is STOPPED. Then it transitions to STARTING and RUNNING. More info at #9969. This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org Issue Time Tracking --- Worklog Id: (was: 340263) Time Spent: 4h (was: 3h 50m) > Clearly define the valid job state transitions > -- > > Key: BEAM-8539 > URL: https://issues.apache.org/jira/browse/BEAM-8539 > Project: Beam > Issue Type: Improvement > Components: beam-model, runner-core, sdk-java-core, sdk-py-core >Reporter: Chad Dombrova >Priority: Major > Time Spent: 4h > Remaining Estimate: 0h > > The Beam job state transitions are ill-defined, which is big problem for > anything that relies on the values coming from JobAPI.GetStateStream. > I was hoping to find something like a state transition diagram in the docs so > that I could determine the start state, the terminal states, and the valid > transitions, but I could not find this. 
The code reveals that the SDKs differ > on the fundamentals: > Java InMemoryJobService: > * start state: *STOPPED* > * run - about to submit to executor: STARTING > * run - actually running on executor: RUNNING > * terminal states: DONE, FAILED, CANCELLED, DRAINED > Python AbstractJobServiceServicer / LocalJobServicer: > * start state: STARTING > * terminal states: DONE, FAILED, CANCELLED, *STOPPED* > I think it would be good to make python work like Java, so that there is a > difference in state between a job that has been prepared and one that has > additionally been run. > It's hard to tell how far this problem has spread within the various runners. > I think a simple thing that can be done to help standardize behavior is to > implement the terminal states as an enum in the beam_job_api.proto, or create > a utility function in each language for checking if a state is terminal, so > that it's not left up to each runner to reimplement this logic. > -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Work logged] (BEAM-8539) Clearly define the valid job state transitions
[ https://issues.apache.org/jira/browse/BEAM-8539?focusedWorklogId=340262=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-340262 ] ASF GitHub Bot logged work on BEAM-8539: Author: ASF GitHub Bot Created on: 08/Nov/19 01:03 Start Date: 08/Nov/19 01:03 Worklog Time Spent: 10m Work Description: chadrik commented on pull request #9965: [BEAM-8539] Make job state transitions in python-based runners consistent with java-based runners URL: https://github.com/apache/beam/pull/9965#discussion_r343949646 ## File path: model/job-management/src/main/proto/beam_job_api.proto ## @@ -213,16 +213,37 @@ message JobMessagesResponse { // without needing to pass through STARTING. message JobState { enum Enum { +// The job state reported by a runner cannot be interpreted by the SDK. UNSPECIFIED = 0; + +// The job has been paused, or has not yet started. STOPPED = 1; + +// The job is currently running. (terminal) RUNNING = 2; + +// The job has successfully completed. (terminal) DONE = 3; + +// The job has failed. (terminal) FAILED = 4; + +// The job has been explicitly cancelled. (terminal) CANCELLED = 5; + +// The job has been updated. UPDATED = 6; + +// The job is draining its data. DRAINING = 7; + +// The job has completed draining its data. (terminal) DRAINED = 8; + +// The job is starting up. STARTING = 9; + +// The job is cancelling. CANCELLING = 10; UPDATING = 11; Review comment: Ah, sorry. that was added by @lukecwik after I created this. Will do. This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. 
For queries about this service, please contact Infrastructure at: us...@infra.apache.org Issue Time Tracking --- Worklog Id: (was: 340262) Time Spent: 3h 50m (was: 3h 40m) > Clearly define the valid job state transitions > -- > > Key: BEAM-8539 > URL: https://issues.apache.org/jira/browse/BEAM-8539 > Project: Beam > Issue Type: Improvement > Components: beam-model, runner-core, sdk-java-core, sdk-py-core >Reporter: Chad Dombrova >Priority: Major > Time Spent: 3h 50m > Remaining Estimate: 0h > > The Beam job state transitions are ill-defined, which is big problem for > anything that relies on the values coming from JobAPI.GetStateStream. > I was hoping to find something like a state transition diagram in the docs so > that I could determine the start state, the terminal states, and the valid > transitions, but I could not find this. The code reveals that the SDKs differ > on the fundamentals: > Java InMemoryJobService: > * start state: *STOPPED* > * run - about to submit to executor: STARTING > * run - actually running on executor: RUNNING > * terminal states: DONE, FAILED, CANCELLED, DRAINED > Python AbstractJobServiceServicer / LocalJobServicer: > * start state: STARTING > * terminal states: DONE, FAILED, CANCELLED, *STOPPED* > I think it would be good to make python work like Java, so that there is a > difference in state between a job that has been prepared and one that has > additionally been run. > It's hard to tell how far this problem has spread within the various runners. > I think a simple thing that can be done to help standardize behavior is to > implement the terminal states as an enum in the beam_job_api.proto, or create > a utility function in each language for checking if a state is terminal, so > that it's not left up to each runner to reimplement this logic. > -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Work logged] (BEAM-8539) Clearly define the valid job state transitions
[ https://issues.apache.org/jira/browse/BEAM-8539?focusedWorklogId=340261=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-340261 ] ASF GitHub Bot logged work on BEAM-8539: Author: ASF GitHub Bot Created on: 08/Nov/19 01:02 Start Date: 08/Nov/19 01:02 Worklog Time Spent: 10m Work Description: robertwb commented on pull request #9965: [BEAM-8539] Make job state transitions in python-based runners consistent with java-based runners URL: https://github.com/apache/beam/pull/9965#discussion_r343949266 ## File path: model/job-management/src/main/proto/beam_job_api.proto ## @@ -213,16 +213,37 @@ message JobMessagesResponse { // without needing to pass through STARTING. message JobState { enum Enum { +// The job state reported by a runner cannot be interpreted by the SDK. UNSPECIFIED = 0; + +// The job has been paused, or has not yet started. STOPPED = 1; + +// The job is currently running. (terminal) RUNNING = 2; + +// The job has successfully completed. (terminal) DONE = 3; + +// The job has failed. (terminal) FAILED = 4; + +// The job has been explicitly cancelled. (terminal) CANCELLED = 5; + +// The job has been updated. UPDATED = 6; + +// The job is draining its data. DRAINING = 7; + +// The job has completed draining its data. (terminal) DRAINED = 8; + +// The job is starting up. STARTING = 9; + +// The job is cancelling. CANCELLING = 10; UPDATING = 11; Review comment: No comment on UPDATING? This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. 
For queries about this service, please contact Infrastructure at: us...@infra.apache.org Issue Time Tracking --- Worklog Id: (was: 340261) Time Spent: 3h 40m (was: 3.5h) > Clearly define the valid job state transitions > -- > > Key: BEAM-8539 > URL: https://issues.apache.org/jira/browse/BEAM-8539 > Project: Beam > Issue Type: Improvement > Components: beam-model, runner-core, sdk-java-core, sdk-py-core >Reporter: Chad Dombrova >Priority: Major > Time Spent: 3h 40m > Remaining Estimate: 0h > > The Beam job state transitions are ill-defined, which is big problem for > anything that relies on the values coming from JobAPI.GetStateStream. > I was hoping to find something like a state transition diagram in the docs so > that I could determine the start state, the terminal states, and the valid > transitions, but I could not find this. The code reveals that the SDKs differ > on the fundamentals: > Java InMemoryJobService: > * start state: *STOPPED* > * run - about to submit to executor: STARTING > * run - actually running on executor: RUNNING > * terminal states: DONE, FAILED, CANCELLED, DRAINED > Python AbstractJobServiceServicer / LocalJobServicer: > * start state: STARTING > * terminal states: DONE, FAILED, CANCELLED, *STOPPED* > I think it would be good to make python work like Java, so that there is a > difference in state between a job that has been prepared and one that has > additionally been run. > It's hard to tell how far this problem has spread within the various runners. > I think a simple thing that can be done to help standardize behavior is to > implement the terminal states as an enum in the beam_job_api.proto, or create > a utility function in each language for checking if a state is terminal, so > that it's not left up to each runner to reimplement this logic. > -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Work logged] (BEAM-8539) Clearly define the valid job state transitions
[ https://issues.apache.org/jira/browse/BEAM-8539?focusedWorklogId=340260=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-340260 ] ASF GitHub Bot logged work on BEAM-8539: Author: ASF GitHub Bot Created on: 08/Nov/19 01:02 Start Date: 08/Nov/19 01:02 Worklog Time Spent: 10m Work Description: robertwb commented on pull request #9965: [BEAM-8539] Make job state transitions in python-based runners consistent with java-based runners URL: https://github.com/apache/beam/pull/9965#discussion_r343949121 ## File path: sdks/python/apache_beam/runners/portability/local_job_service.py ## @@ -195,7 +195,7 @@ def __init__(self, self._state = None self._state_queues = [] self._log_queues = [] -self.state = beam_job_api_pb2.JobState.STARTING +self.state = beam_job_api_pb2.JobState.STOPPED Review comment: Why this change? This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org Issue Time Tracking --- Worklog Id: (was: 340260) Time Spent: 3.5h (was: 3h 20m) > Clearly define the valid job state transitions > -- > > Key: BEAM-8539 > URL: https://issues.apache.org/jira/browse/BEAM-8539 > Project: Beam > Issue Type: Improvement > Components: beam-model, runner-core, sdk-java-core, sdk-py-core >Reporter: Chad Dombrova >Priority: Major > Time Spent: 3.5h > Remaining Estimate: 0h > > The Beam job state transitions are ill-defined, which is big problem for > anything that relies on the values coming from JobAPI.GetStateStream. > I was hoping to find something like a state transition diagram in the docs so > that I could determine the start state, the terminal states, and the valid > transitions, but I could not find this. 
The code reveals that the SDKs differ > on the fundamentals: > Java InMemoryJobService: > * start state: *STOPPED* > * run - about to submit to executor: STARTING > * run - actually running on executor: RUNNING > * terminal states: DONE, FAILED, CANCELLED, DRAINED > Python AbstractJobServiceServicer / LocalJobServicer: > * start state: STARTING > * terminal states: DONE, FAILED, CANCELLED, *STOPPED* > I think it would be good to make python work like Java, so that there is a > difference in state between a job that has been prepared and one that has > additionally been run. > It's hard to tell how far this problem has spread within the various runners. > I think a simple thing that can be done to help standardize behavior is to > implement the terminal states as an enum in the beam_job_api.proto, or create > a utility function in each language for checking if a state is terminal, so > that it's not left up to each runner to reimplement this logic. > -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Commented] (BEAM-8588) MapTuple(fn) fails if fn has type hints but no default args
[ https://issues.apache.org/jira/browse/BEAM-8588?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16969686#comment-16969686 ] Udi Meiri commented on BEAM-8588: - One subtlety here is that it seems that the correct type hint for the input element is a Tuple (Tuple[int, int] in this case). > MapTuple(fn) fails if fn has type hints but no default args > --- > > Key: BEAM-8588 > URL: https://issues.apache.org/jira/browse/BEAM-8588 > Project: Beam > Issue Type: Bug > Components: sdk-py-core >Reporter: Udi Meiri >Assignee: Udi Meiri >Priority: Major > > {code} > def test_typed_maptuple(self): > def fn(e1: int, e2: int) -> int: > return e1 * e2 > result = [(1, 1), (2, 2)] | beam.MapTuple(fn) > self.assertEqual([2, 4], sorted(result)) > {code} > Fails in getcallargs_forhints_impl_py3 with: > {code} > > raise TypeCheckError(e) > E apache_beam.typehints.decorators.TypeCheckError: too many positional > arguments > {code} -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Created] (BEAM-8588) MapTuple(fn) fails if fn has type hints but no default args
Udi Meiri created BEAM-8588: --- Summary: MapTuple(fn) fails if fn has type hints but no default args Key: BEAM-8588 URL: https://issues.apache.org/jira/browse/BEAM-8588 Project: Beam Issue Type: Bug Components: sdk-py-core Reporter: Udi Meiri Assignee: Udi Meiri {code} def test_typed_maptuple(self): def fn(e1: int, e2: int) -> int: return e1 * e2 result = [(1, 1), (2, 2)] | beam.MapTuple(fn) self.assertEqual([2, 4], sorted(result)) {code} Fails in getcallargs_forhints_impl_py3 with: {code} > raise TypeCheckError(e) E apache_beam.typehints.decorators.TypeCheckError: too many positional arguments {code} -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Work logged] (BEAM-8581) Python SDK labels ontime empty panes as late
[ https://issues.apache.org/jira/browse/BEAM-8581?focusedWorklogId=340251=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-340251 ] ASF GitHub Bot logged work on BEAM-8581: Author: ASF GitHub Bot Created on: 08/Nov/19 00:15 Start Date: 08/Nov/19 00:15 Worklog Time Spent: 10m Work Description: rohdesamuel commented on issue #10035: [BEAM-8581] and [BEAM-8582] watermark and trigger fixes URL: https://github.com/apache/beam/pull/10035#issuecomment-551325756 R: @robertwb R: @dpmills Can you review this please? This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org Issue Time Tracking --- Worklog Id: (was: 340251) Time Spent: 20m (was: 10m) > Python SDK labels ontime empty panes as late > > > Key: BEAM-8581 > URL: https://issues.apache.org/jira/browse/BEAM-8581 > Project: Beam > Issue Type: Bug > Components: sdk-py-core >Reporter: Sam Rohde >Assignee: Sam Rohde >Priority: Major > Time Spent: 20m > Remaining Estimate: 0h > > The GeneralTriggerDriver does not put watermark holds on timers, leading to > the ontime empty pane being considered late data. > Fix: Add a new notion of whether a trigger has an ontime pane. If it does, > then set a watermark hold to end of window - 1. -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Work logged] (BEAM-8581) Python SDK labels ontime empty panes as late
[ https://issues.apache.org/jira/browse/BEAM-8581?focusedWorklogId=340250=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-340250 ] ASF GitHub Bot logged work on BEAM-8581: Author: ASF GitHub Bot Created on: 08/Nov/19 00:14 Start Date: 08/Nov/19 00:14 Worklog Time Spent: 10m Work Description: rohdesamuel commented on pull request #10035: [BEAM-8581] and [BEAM-8582] watermark and trigger fixes URL: https://github.com/apache/beam/pull/10035 The GeneralTriggerDriver does not put watermark holds on timers, leading to the ontime empty pane being considered late data. The DefaultTrigger and AfterWatermark do not clear their timers after the watermark passes the end of the window, leading to duplicate records being emitted. Thank you for your contribution! Follow this checklist to help us incorporate your contribution quickly and easily: - [ ] [**Choose reviewer(s)**](https://beam.apache.org/contribute/#make-your-change) and mention them in a comment (`R: @username`). - [ ] Format the pull request title like `[BEAM-XXX] Fixes bug in ApproximateQuantiles`, where you replace `BEAM-XXX` with the appropriate JIRA issue, if applicable. This will automatically link the pull request to the issue. - [ ] If this contribution is large, please file an Apache [Individual Contributor License Agreement](https://www.apache.org/licenses/icla.pdf). See the [Contributor Guide](https://beam.apache.org/contribute) for more tips on [how to make review process smoother](https://beam.apache.org/contribute/#make-reviewers-job-easier). 
Post-Commit Tests Status (on master branch): table of Jenkins build-status badges for the Go, Java, and Python SDKs across the Apex, Dataflow, Flink, Gearpump, Samza, and Spark runners (badge links truncated in the original).
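The second bug in this PR description lends itself to a toy sketch. The model below is illustrative Python only — not Beam's actual GeneralTriggerDriver — showing why an end-of-window timer that is never cleared fires again on every later watermark advance and duplicates the on-time pane.

```python
# Toy model (not Beam internals) of the timer bug described above: if the
# end-of-window timer is not cleared after it fires, the next watermark
# advance fires it again and the same pane is emitted twice.

class ToyTriggerDriver:
    def __init__(self, window_end, clear_timer_after_fire):
        self.window_end = window_end
        self.clear_timer_after_fire = clear_timer_after_fire
        self.timer = window_end          # end-of-window timer set on first element
        self.panes_emitted = []

    def advance_watermark(self, watermark):
        # Fire the timer once the watermark passes its timestamp.
        if self.timer is not None and watermark >= self.timer:
            self.panes_emitted.append(("ON_TIME", self.window_end))
            if self.clear_timer_after_fire:
                self.timer = None        # fixed behavior: the timer fires once

buggy = ToyTriggerDriver(window_end=10, clear_timer_after_fire=False)
fixed = ToyTriggerDriver(window_end=10, clear_timer_after_fire=True)
for wm in (5, 10, 15):                   # watermark passes the window end twice
    buggy.advance_watermark(wm)
    fixed.advance_watermark(wm)

print(len(buggy.panes_emitted))  # 2 -- duplicate on-time panes
print(len(fixed.panes_emitted))  # 1
```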
[jira] [Updated] (BEAM-8273) Improve worker script for environment_type=PROCESS
[ https://issues.apache.org/jira/browse/BEAM-8273?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Kyle Weaver updated BEAM-8273: -- Description: When environment_type=PROCESS, environment_config specifies the command to run the worker processes. Right now, it defaults to None and errors if not set (`TypeError: expected string or buffer`). It might not be feasible to offer a one-size-fits-all executable for providing as environment_config, but we could at least: a) make it easier to build one (right now I only see the executable being built in a test script that depends on docker: [https://github.com/apache/beam/blob/cbf8a900819c52940a0edd90f59bf6aec55c817a/sdks/python/test-suites/portable/py2/build.gradle#L146-L165]) b) document the process c) link to the documentation when no environment_config is provided was: When environment_type=PROCESS, environment_config specifies the command to run the worker processes. Right now, it defaults to None and errors if not set (`TypeError: expected string or buffer`). It might not be feasible to offer a one-size-fits-all executable for providing as environment_config, but we could at least: a) make it easier to build one (right now I only see the executable being built in a test script that depends on docker: ) b) document the process c) link to the documentation when no environment_config is provided > Improve worker script for environment_type=PROCESS > -- > > Key: BEAM-8273 > URL: https://issues.apache.org/jira/browse/BEAM-8273 > Project: Beam > Issue Type: Improvement > Components: sdk-py-harness >Reporter: Kyle Weaver >Assignee: Kyle Weaver >Priority: Major > > When environment_type=PROCESS, environment_config specifies the command to > run the worker processes. Right now, it defaults to None and errors if not > set (`TypeError: expected string or buffer`). 
> It might not be feasible to offer a one-size-fits-all executable for > providing as environment_config, but we could at least: > a) make it easier to build one (right now I only see the executable being > built in a test script that depends on docker: > [https://github.com/apache/beam/blob/cbf8a900819c52940a0edd90f59bf6aec55c817a/sdks/python/test-suites/portable/py2/build.gradle#L146-L165]) > b) document the process > c) link to the documentation when no environment_config is provided -- This message was sent by Atlassian Jira (v8.3.4#803005)
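Suggestion (c) above can be sketched as a validation helper. The `{"command": ...}` JSON shape follows the documented format for PROCESS environments; the helper itself and its error message are hypothetical, not the SDK's actual code path:

```python
import json

# Hypothetical sketch of point (c): parse environment_config up front and
# fail with a pointer to the docs instead of the opaque
# "TypeError: expected string or buffer".
DOCS_HINT = ("environment_config must be a JSON string such as "
             '\'{"command": "/path/to/worker/boot"}\' when '
             "environment_type=PROCESS is used")

def parse_process_environment_config(environment_config):
    if environment_config is None:
        # The current default (None) is what triggers the TypeError.
        raise ValueError(DOCS_HINT)
    config = json.loads(environment_config)
    if "command" not in config:
        raise ValueError(DOCS_HINT)
    return config

print(parse_process_environment_config('{"command": "/opt/apache/beam/boot"}'))
```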
[jira] [Work logged] (BEAM-8524) Stop using pubsub in fnapi streaming dataflow Impulse
[ https://issues.apache.org/jira/browse/BEAM-8524?focusedWorklogId=340248&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-340248 ] ASF GitHub Bot logged work on BEAM-8524: Author: ASF GitHub Bot Created on: 08/Nov/19 00:09 Start Date: 08/Nov/19 00:09 Worklog Time Spent: 10m Work Description: boyuanzz commented on pull request #10034: [BEAM-8524] Still use fake pubsub signals when using windmill appliance with data… URL: https://github.com/apache/beam/pull/10034 This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org Issue Time Tracking --- Worklog Id: (was: 340248) Time Spent: 0.5h (was: 20m) > Stop using pubsub in fnapi streaming dataflow Impulse > - > > Key: BEAM-8524 > URL: https://issues.apache.org/jira/browse/BEAM-8524 > Project: Beam > Issue Type: Improvement > Components: runner-dataflow > Reporter: Boyuan Zhang > Assignee: Boyuan Zhang > Priority: Major > Fix For: 2.18.0 > > Time Spent: 0.5h > Remaining Estimate: 0h > -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Resolved] (BEAM-8521) beam_PostCommit_XVR_Flink failing
[ https://issues.apache.org/jira/browse/BEAM-8521?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Kyle Weaver resolved BEAM-8521. --- Resolution: Fixed > beam_PostCommit_XVR_Flink failing > - > > Key: BEAM-8521 > URL: https://issues.apache.org/jira/browse/BEAM-8521 > Project: Beam > Issue Type: Bug > Components: test-failures >Reporter: Kyle Weaver >Assignee: Kyle Weaver >Priority: Major > Fix For: 2.18.0 > > Time Spent: 50m > Remaining Estimate: 0h > > https://builds.apache.org/job/beam_PostCommit_XVR_Flink/ > Edit: Made subtasks for what appear to be two separate issues. -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Resolved] (BEAM-8534) XlangParquetIOTest failing
[ https://issues.apache.org/jira/browse/BEAM-8534?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Kyle Weaver resolved BEAM-8534. --- Fix Version/s: Not applicable Resolution: Fixed
> XlangParquetIOTest failing
> --
>
> Key: BEAM-8534
> URL: https://issues.apache.org/jira/browse/BEAM-8534
> Project: Beam
> Issue Type: Sub-task
> Components: test-failures
> Reporter: Kyle Weaver
> Assignee: Heejong Lee
> Priority: Major
> Fix For: Not applicable
>
> 13:43:05 [grpc-default-executor-1] ERROR org.apache.beam.fn.harness.control.BeamFnControlClient - Exception while trying to handle InstructionRequest 10
> java.lang.IllegalArgumentException: unable to deserialize Custom DoFn With Execution Info
>     at org.apache.beam.sdk.util.SerializableUtils.deserializeFromByteArray(SerializableUtils.java:74)
>     at org.apache.beam.runners.core.construction.ParDoTranslation.doFnWithExecutionInformationFromProto(ParDoTranslation.java:609)
>     at org.apache.beam.runners.core.construction.ParDoTranslation.getDoFn(ParDoTranslation.java:285)
>     at org.apache.beam.fn.harness.DoFnPTransformRunnerFactory$Context.<init>(DoFnPTransformRunnerFactory.java:197)
>     at org.apache.beam.fn.harness.DoFnPTransformRunnerFactory.createRunnerForPTransform(DoFnPTransformRunnerFactory.java:96)
>     at org.apache.beam.fn.harness.DoFnPTransformRunnerFactory.createRunnerForPTransform(DoFnPTransformRunnerFactory.java:64)
>     at org.apache.beam.fn.harness.control.ProcessBundleHandler.createRunnerAndConsumersForPTransformRecursively(ProcessBundleHandler.java:194)
>     at org.apache.beam.fn.harness.control.ProcessBundleHandler.createRunnerAndConsumersForPTransformRecursively(ProcessBundleHandler.java:163)
>     at org.apache.beam.fn.harness.control.ProcessBundleHandler.processBundle(ProcessBundleHandler.java:290)
>     at org.apache.beam.fn.harness.control.BeamFnControlClient.delegateOnInstructionRequestType(BeamFnControlClient.java:160)
>     at org.apache.beam.fn.harness.control.BeamFnControlClient.lambda$processInstructionRequests$0(BeamFnControlClient.java:144)
>     at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
>     at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
>     at java.lang.Thread.run(Thread.java:748)
> Caused by: java.io.InvalidClassException: org.apache.beam.sdk.options.ValueProvider$StaticValueProvider; local class incompatible: stream classdesc serialVersionUID = -7089438576249123133, local class serialVersionUID = -7141898054594373712
>     at java.io.ObjectStreamClass.initNonProxy(ObjectStreamClass.java:699)
>     at java.io.ObjectInputStream.readNonProxyDesc(ObjectInputStream.java:1885)
>     at java.io.ObjectInputStream.readClassDesc(ObjectInputStream.java:1751)
>     at java.io.ObjectInputStream.readOrdinaryObject(ObjectInputStream.java:2042)
>     at java.io.ObjectInputStream.readObject0(ObjectInputStream.java:1573)
>     at java.io.ObjectInputStream.readArray(ObjectInputStream.java:1975)
>     at java.io.ObjectInputStream.readObject0(ObjectInputStream.java:1567)
>     at java.io.ObjectInputStream.defaultReadFields(ObjectInputStream.java:2287)
>     at java.io.ObjectInputStream.readSerialData(ObjectInputStream.java:2211)
>     at java.io.ObjectInputStream.readOrdinaryObject(ObjectInputStream.java:2069)
>     at java.io.ObjectInputStream.readObject0(ObjectInputStream.java:1573)
>     at java.io.ObjectInputStream.defaultReadFields(ObjectInputStream.java:2287)
>     at java.io.ObjectInputStream.readSerialData(ObjectInputStream.java:2211)
>     at java.io.ObjectInputStream.readOrdinaryObject(ObjectInputStream.java:2069)
>     at java.io.ObjectInputStream.readObject0(ObjectInputStream.java:1573)
>     at java.io.ObjectInputStream.readArray(ObjectInputStream.java:1975)
>     at java.io.ObjectInputStream.readObject0(ObjectInputStream.java:1567)
>     at java.io.ObjectInputStream.defaultReadFields(ObjectInputStream.java:2287)
>     at java.io.ObjectInputStream.readSerialData(ObjectInputStream.java:2211)
>     at java.io.ObjectInputStream.readOrdinaryObject(ObjectInputStream.java:2069)
>     at java.io.ObjectInputStream.readObject0(ObjectInputStream.java:1573)
>     at ... (remaining frames truncated in the original log)
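The `Caused by` above is Java's built-in serialization version check: the stream was written by a class whose `serialVersionUID` differs from the one loaded in the harness (typically a mismatched Beam SDK version between job submission and the SDK container). A rough Python analogue of that check, purely for illustration:

```python
import pickle

# Illustrative-only Python analogue of Java's serialVersionUID check that
# produced the InvalidClassException above: embed an explicit version in the
# serialized state and reject mismatches on load.

class VersionedOptions:
    SERIAL_VERSION = 1

    def __init__(self, value):
        self.value = value

    def __getstate__(self):
        return {"version": self.SERIAL_VERSION, "value": self.value}

    def __setstate__(self, state):
        if state["version"] != self.SERIAL_VERSION:
            # Mirrors "stream classdesc serialVersionUID = X, local class
            # serialVersionUID = Y" from the Java trace.
            raise ValueError(
                f"incompatible serialized version {state['version']}, "
                f"local version {self.SERIAL_VERSION}")
        self.value = state["value"]

blob = pickle.dumps(VersionedOptions(42))
restored = pickle.loads(blob)
print(restored.value)  # 42
```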
[jira] [Commented] (BEAM-8534) XlangParquetIOTest failing
[ https://issues.apache.org/jira/browse/BEAM-8534?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16969668#comment-16969668 ] Kyle Weaver commented on BEAM-8534: --- Fixed by [https://github.com/apache/beam/pull/10017]. [https://builds.apache.org/job/beam_PostCommit_XVR_Flink/]
> XlangParquetIOTest failing
> --
>
> Key: BEAM-8534
> URL: https://issues.apache.org/jira/browse/BEAM-8534
> Project: Beam
> Issue Type: Sub-task
> Components: test-failures
> Reporter: Kyle Weaver
> Assignee: Heejong Lee
> Priority: Major
>
> 13:43:05 [grpc-default-executor-1] ERROR org.apache.beam.fn.harness.control.BeamFnControlClient - Exception while trying to handle InstructionRequest 10
> java.lang.IllegalArgumentException: unable to deserialize Custom DoFn With Execution Info
>     at org.apache.beam.sdk.util.SerializableUtils.deserializeFromByteArray(SerializableUtils.java:74)
>     at org.apache.beam.runners.core.construction.ParDoTranslation.doFnWithExecutionInformationFromProto(ParDoTranslation.java:609)
>     at org.apache.beam.runners.core.construction.ParDoTranslation.getDoFn(ParDoTranslation.java:285)
>     at org.apache.beam.fn.harness.DoFnPTransformRunnerFactory$Context.<init>(DoFnPTransformRunnerFactory.java:197)
>     at org.apache.beam.fn.harness.DoFnPTransformRunnerFactory.createRunnerForPTransform(DoFnPTransformRunnerFactory.java:96)
>     at org.apache.beam.fn.harness.DoFnPTransformRunnerFactory.createRunnerForPTransform(DoFnPTransformRunnerFactory.java:64)
>     at org.apache.beam.fn.harness.control.ProcessBundleHandler.createRunnerAndConsumersForPTransformRecursively(ProcessBundleHandler.java:194)
>     at org.apache.beam.fn.harness.control.ProcessBundleHandler.createRunnerAndConsumersForPTransformRecursively(ProcessBundleHandler.java:163)
>     at org.apache.beam.fn.harness.control.ProcessBundleHandler.processBundle(ProcessBundleHandler.java:290)
>     at org.apache.beam.fn.harness.control.BeamFnControlClient.delegateOnInstructionRequestType(BeamFnControlClient.java:160)
>     at org.apache.beam.fn.harness.control.BeamFnControlClient.lambda$processInstructionRequests$0(BeamFnControlClient.java:144)
>     at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
>     at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
>     at java.lang.Thread.run(Thread.java:748)
> Caused by: java.io.InvalidClassException: org.apache.beam.sdk.options.ValueProvider$StaticValueProvider; local class incompatible: stream classdesc serialVersionUID = -7089438576249123133, local class serialVersionUID = -7141898054594373712
>     at java.io.ObjectStreamClass.initNonProxy(ObjectStreamClass.java:699)
>     at java.io.ObjectInputStream.readNonProxyDesc(ObjectInputStream.java:1885)
>     at java.io.ObjectInputStream.readClassDesc(ObjectInputStream.java:1751)
>     at java.io.ObjectInputStream.readOrdinaryObject(ObjectInputStream.java:2042)
>     at java.io.ObjectInputStream.readObject0(ObjectInputStream.java:1573)
>     at java.io.ObjectInputStream.readArray(ObjectInputStream.java:1975)
>     at java.io.ObjectInputStream.readObject0(ObjectInputStream.java:1567)
>     at java.io.ObjectInputStream.defaultReadFields(ObjectInputStream.java:2287)
>     at java.io.ObjectInputStream.readSerialData(ObjectInputStream.java:2211)
>     at java.io.ObjectInputStream.readOrdinaryObject(ObjectInputStream.java:2069)
>     at java.io.ObjectInputStream.readObject0(ObjectInputStream.java:1573)
>     at java.io.ObjectInputStream.defaultReadFields(ObjectInputStream.java:2287)
>     at java.io.ObjectInputStream.readSerialData(ObjectInputStream.java:2211)
>     at java.io.ObjectInputStream.readOrdinaryObject(ObjectInputStream.java:2069)
>     at java.io.ObjectInputStream.readObject0(ObjectInputStream.java:1573)
>     at java.io.ObjectInputStream.readArray(ObjectInputStream.java:1975)
>     at java.io.ObjectInputStream.readObject0(ObjectInputStream.java:1567)
>     at java.io.ObjectInputStream.defaultReadFields(ObjectInputStream.java:2287)
>     at java.io.ObjectInputStream.readSerialData(ObjectInputStream.java:2211)
>     at java.io.ObjectInputStream.readOrdinaryObject(ObjectInputStream.java:2069)
>     at java.io.ObjectInputStream.readObject0(ObjectInputStream.java:1573)
>     at ... (remaining frames truncated in the original log)
[jira] [Commented] (BEAM-8298) Implement state caching for side inputs
[ https://issues.apache.org/jira/browse/BEAM-8298?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16969667#comment-16969667 ] Jing Chen commented on BEAM-8298: - [~mxm] Would you mind sharing details on the issue? I am interested in working on it if it is still available. > Implement state caching for side inputs > --- > > Key: BEAM-8298 > URL: https://issues.apache.org/jira/browse/BEAM-8298 > Project: Beam > Issue Type: Improvement > Components: sdk-py-core > Reporter: Maximilian Michels > Priority: Major > > Caching is currently only implemented for user state. -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Resolved] (BEAM-8570) Use SDK version in default Java container tag
[ https://issues.apache.org/jira/browse/BEAM-8570?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Kyle Weaver resolved BEAM-8570. --- Fix Version/s: 2.18.0 Resolution: Fixed > Use SDK version in default Java container tag > - > > Key: BEAM-8570 > URL: https://issues.apache.org/jira/browse/BEAM-8570 > Project: Beam > Issue Type: Improvement > Components: sdk-java-harness >Reporter: Kyle Weaver >Assignee: Kyle Weaver >Priority: Major > Fix For: 2.18.0 > > Time Spent: 40m > Remaining Estimate: 0h > > Currently, the Java SDK uses container `apachebeam/java_sdk:latest` by > default [1]. This causes confusion when using locally built containers [2], > especially since images are automatically pulled, meaning the release image > is used instead of the developer's own image (BEAM-8545). > [[1] > https://github.com/apache/beam/blob/473377ef8f51949983508f70663e75ef0ee24a7f/runners/core-construction-java/src/main/java/org/apache/beam/runners/core/construction/Environments.java#L84-L91|https://github.com/apache/beam/blob/473377ef8f51949983508f70663e75ef0ee24a7f/runners/core-construction-java/src/main/java/org/apache/beam/runners/core/construction/Environments.java#L84-L91] > [[2] > https://lists.apache.org/thread.html/07131e314e229ec60100eaa2c0cf6dfc206bf2b0f78c3cee9ebb0bda@%3Cdev.beam.apache.org%3E|https://lists.apache.org/thread.html/07131e314e229ec60100eaa2c0cf6dfc206bf2b0f78c3cee9ebb0bda@%3Cdev.beam.apache.org%3E] -- This message was sent by Atlassian Jira (v8.3.4#803005)
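The idea of this fix can be sketched as follows. This is hypothetical illustration code — the function name and dev-tag convention are assumptions, not Beam's actual `Environments.java` logic — but it shows how deriving the tag from the SDK version avoids silently pulling the release `latest` image over a locally built one.

```python
# Hypothetical sketch (not Beam's Environments.java): derive the default
# container tag from the SDK version instead of always using "latest".

def default_java_container(sdk_version):
    # Development/snapshot builds keep a non-release tag so a released image
    # is never silently pulled in place of a locally built one.
    tag = sdk_version.replace("-SNAPSHOT", ".dev") if sdk_version else "latest"
    return f"apachebeam/java_sdk:{tag}"

print(default_java_container("2.18.0"))            # apachebeam/java_sdk:2.18.0
print(default_java_container("2.18.0-SNAPSHOT"))   # apachebeam/java_sdk:2.18.0.dev
```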
[jira] [Resolved] (BEAM-8294) Spark portable validates runner tests timing out
[ https://issues.apache.org/jira/browse/BEAM-8294?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Kyle Weaver resolved BEAM-8294. --- Fix Version/s: Not applicable Resolution: Fixed > Spark portable validates runner tests timing out > > > Key: BEAM-8294 > URL: https://issues.apache.org/jira/browse/BEAM-8294 > Project: Beam > Issue Type: Improvement > Components: runner-spark, test-failures, testing > Reporter: Kyle Weaver > Assignee: Kyle Weaver > Priority: Major > Labels: currently-failing, portability-spark > Fix For: Not applicable > > Time Spent: 1h 40m > Remaining Estimate: 0h > > This postcommit has been timing out for 11 days. > [https://github.com/apache/beam/pull/9095] has been merged for about 11 days. > Coincidence? I think NOT! ... Seriously, though, I wonder what it was about the SDK > worker management stack that caused this slowdown. > [https://builds.apache.org/job/beam_PostCommit_Java_PVR_Spark_Batch/] > -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Work logged] (BEAM-8575) Add more Python validates runner tests
[ https://issues.apache.org/jira/browse/BEAM-8575?focusedWorklogId=340238=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-340238 ] ASF GitHub Bot logged work on BEAM-8575: Author: ASF GitHub Bot Created on: 07/Nov/19 23:53 Start Date: 07/Nov/19 23:53 Worklog Time Spent: 10m Work Description: liumomo315 commented on issue #10033: [BEAM-8575] Add a trigger test to test Discarding accumulation mode w… URL: https://github.com/apache/beam/pull/10033#issuecomment-551320229 @robertwb Hi Robert, this PR is a test case to cover early data with Discarding accumulation mode using the test suite you added:) May I get your review? Thanks! This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org Issue Time Tracking --- Worklog Id: (was: 340238) Time Spent: 1h 10m (was: 1h) > Add more Python validates runner tests > -- > > Key: BEAM-8575 > URL: https://issues.apache.org/jira/browse/BEAM-8575 > Project: Beam > Issue Type: Test > Components: sdk-py-core, testing >Reporter: wendy liu >Assignee: wendy liu >Priority: Major > Time Spent: 1h 10m > Remaining Estimate: 0h > > This is the umbrella issue to track the work of adding more Python tests to > improve test coverage. -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Work logged] (BEAM-8575) Add more Python validates runner tests
[ https://issues.apache.org/jira/browse/BEAM-8575?focusedWorklogId=340237&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-340237 ] ASF GitHub Bot logged work on BEAM-8575: Author: ASF GitHub Bot Created on: 07/Nov/19 23:47 Start Date: 07/Nov/19 23:47 Worklog Time Spent: 10m Work Description: liumomo315 commented on pull request #10033: [BEAM-8575] Add a trigger test to test Discarding accumulation mode w… URL: https://github.com/apache/beam/pull/10033 …ith early data (standard pull request template checklist omitted)
[jira] [Commented] (BEAM-8561) Add ThriftIO to Support IO for Thrift Files
[ https://issues.apache.org/jira/browse/BEAM-8561?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16969660#comment-16969660 ] Chris Larsen commented on BEAM-8561: [~jkff] definitely, will do :) > Add ThriftIO to Support IO for Thrift Files > --- > > Key: BEAM-8561 > URL: https://issues.apache.org/jira/browse/BEAM-8561 > Project: Beam > Issue Type: New Feature > Components: io-java-files >Reporter: Chris Larsen >Assignee: Chris Larsen >Priority: Minor > > Similar to AvroIO it would be very useful to support reading and writing > to/from Thrift files with a native connector. > Functionality would include: > # read() - Reading from one or more Thrift files. > # write() - Writing to one or more Thrift files. -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Work logged] (BEAM-8457) Instrument Dataflow jobs that are launched from Notebooks
[ https://issues.apache.org/jira/browse/BEAM-8457?focusedWorklogId=340235=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-340235 ] ASF GitHub Bot logged work on BEAM-8457: Author: ASF GitHub Bot Created on: 07/Nov/19 23:43 Start Date: 07/Nov/19 23:43 Worklog Time Spent: 10m Work Description: KevinGG commented on issue #9885: [BEAM-8457] Label Dataflow jobs from Notebook URL: https://github.com/apache/beam/pull/9885#issuecomment-551317534 Run Portable_Python PreCommit This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org Issue Time Tracking --- Worklog Id: (was: 340235) Time Spent: 8h 40m (was: 8.5h) > Instrument Dataflow jobs that are launched from Notebooks > - > > Key: BEAM-8457 > URL: https://issues.apache.org/jira/browse/BEAM-8457 > Project: Beam > Issue Type: Improvement > Components: runner-py-interactive >Reporter: Ning Kang >Assignee: Ning Kang >Priority: Major > Fix For: 2.17.0 > > Time Spent: 8h 40m > Remaining Estimate: 0h > > Dataflow needs the capability to tell how many Dataflow jobs are launched > from the Notebook environment, i.e., the Interactive Runner. > # Change the pipeline.run() API to allow supply a runner and an option > parameter so that a pipeline initially bundled w/ an interactive runner can > be directly run by other runners from notebook. > # Implicitly add the necessary source information through user labels when > the user does p.run(runner=DataflowRunner()). -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Work logged] (BEAM-8570) Use SDK version in default Java container tag
[ https://issues.apache.org/jira/browse/BEAM-8570?focusedWorklogId=340236=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-340236 ] ASF GitHub Bot logged work on BEAM-8570: Author: ASF GitHub Bot Created on: 07/Nov/19 23:43 Start Date: 07/Nov/19 23:43 Worklog Time Spent: 10m Work Description: ibzib commented on pull request #10017: [BEAM-8570] Use SDK version in default Java container tag URL: https://github.com/apache/beam/pull/10017 This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org Issue Time Tracking --- Worklog Id: (was: 340236) Time Spent: 40m (was: 0.5h) > Use SDK version in default Java container tag > - > > Key: BEAM-8570 > URL: https://issues.apache.org/jira/browse/BEAM-8570 > Project: Beam > Issue Type: Improvement > Components: sdk-java-harness >Reporter: Kyle Weaver >Assignee: Kyle Weaver >Priority: Major > Time Spent: 40m > Remaining Estimate: 0h > > Currently, the Java SDK uses container `apachebeam/java_sdk:latest` by > default [1]. This causes confusion when using locally built containers [2], > especially since images are automatically pulled, meaning the release image > is used instead of the developer's own image (BEAM-8545). 
> [[1] > https://github.com/apache/beam/blob/473377ef8f51949983508f70663e75ef0ee24a7f/runners/core-construction-java/src/main/java/org/apache/beam/runners/core/construction/Environments.java#L84-L91|https://github.com/apache/beam/blob/473377ef8f51949983508f70663e75ef0ee24a7f/runners/core-construction-java/src/main/java/org/apache/beam/runners/core/construction/Environments.java#L84-L91] > [[2] > https://lists.apache.org/thread.html/07131e314e229ec60100eaa2c0cf6dfc206bf2b0f78c3cee9ebb0bda@%3Cdev.beam.apache.org%3E|https://lists.apache.org/thread.html/07131e314e229ec60100eaa2c0cf6dfc206bf2b0f78c3cee9ebb0bda@%3Cdev.beam.apache.org%3E] -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Work logged] (BEAM-876) Support schemaUpdateOption in BigQueryIO
[ https://issues.apache.org/jira/browse/BEAM-876?focusedWorklogId=340234&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-340234 ] ASF GitHub Bot logged work on BEAM-876: --- Author: ASF GitHub Bot Created on: 07/Nov/19 23:41 Start Date: 07/Nov/19 23:41 Worklog Time Spent: 10m Work Description: ziel commented on issue #9524: [BEAM-876] Support schemaUpdateOption in BigQueryIO URL: https://github.com/apache/beam/pull/9524#issuecomment-551317214 @pabloem I found some time to write up an integration test. I'm seeing a failing check (`org.apache.beam.sdk.io.jms.JmsIOTest.testCheckpointMarkSafety`) which I think may be unrelated, though I may be missing the connection. There are a number of lines like this in the output: `java.lang.SecurityException: User name [test_user] or password is invalid.` ...so I suspect this may be some sort of CI setup issue. This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org Issue Time Tracking --- Worklog Id: (was: 340234) Time Spent: 2h 10m (was: 2h) > Support schemaUpdateOption in BigQueryIO > > > Key: BEAM-876 > URL: https://issues.apache.org/jira/browse/BEAM-876 > Project: Beam > Issue Type: Improvement > Components: io-java-gcp > Reporter: Eugene Kirpichov > Assignee: canaan silberberg > Priority: Major > Time Spent: 2h 10m > Remaining Estimate: 0h > > BigQuery recently added support for updating the schema as a side effect of > the load job. > Here is the relevant API method in JobConfigurationLoad: > https://developers.google.com/resources/api-libraries/documentation/bigquery/v2/java/latest/com/google/api/services/bigquery/model/JobConfigurationLoad.html#setSchemaUpdateOptions(java.util.List) > BigQueryIO should support this too.
See user request for this: > http://stackoverflow.com/questions/40333245/is-it-possible-to-update-schema-while-doing-a-load-into-an-existing-bigquery-tab -- This message was sent by Atlassian Jira (v8.3.4#803005)
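At the REST level, the feature requested here corresponds to the `schemaUpdateOptions` field of a BigQuery load job configuration (`ALLOW_FIELD_ADDITION` / `ALLOW_FIELD_RELAXATION`). The helper below is an illustrative sketch of the JSON such a job would carry, not BigQueryIO code:

```python
# Illustrative sketch (not BigQueryIO): build the jobs.insert load
# configuration that allows schema widening on an existing table.

def load_job_config(table, allow_field_addition=False):
    config = {
        "load": {
            "destinationTable": table,
            "writeDisposition": "WRITE_APPEND",
        }
    }
    if allow_field_addition:
        # Lets the load job add new nullable columns as a side effect,
        # which is exactly what this issue asks BigQueryIO to expose.
        config["load"]["schemaUpdateOptions"] = ["ALLOW_FIELD_ADDITION"]
    return config

cfg = load_job_config(
    {"projectId": "my-project", "datasetId": "logs", "tableId": "events"},
    allow_field_addition=True)
print(cfg["load"]["schemaUpdateOptions"])  # ['ALLOW_FIELD_ADDITION']
```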
[jira] [Work logged] (BEAM-8512) Add integration tests for Python "flink_runner.py"
[ https://issues.apache.org/jira/browse/BEAM-8512?focusedWorklogId=340233=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-340233 ] ASF GitHub Bot logged work on BEAM-8512: Author: ASF GitHub Bot Created on: 07/Nov/19 23:41 Start Date: 07/Nov/19 23:41 Worklog Time Spent: 10m Work Description: ibzib commented on issue #9998: [BEAM-8512] Add integration tests for flink_runner.py URL: https://github.com/apache/beam/pull/9998#issuecomment-551317048 Run Python 3.7 PostCommit This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org Issue Time Tracking --- Worklog Id: (was: 340233) Time Spent: 2h (was: 1h 50m) > Add integration tests for Python "flink_runner.py" > -- > > Key: BEAM-8512 > URL: https://issues.apache.org/jira/browse/BEAM-8512 > Project: Beam > Issue Type: Test > Components: runner-flink, sdk-py-core >Reporter: Maximilian Michels >Assignee: Kyle Weaver >Priority: Major > Time Spent: 2h > Remaining Estimate: 0h > > There are currently no integration tests for the Python FlinkRunner. We need > a set of tests similar to {{flink_runner_test.py}} which currently use the > PortableRunner and not the FlinkRunner. > CC [~robertwb] [~ibzib] [~thw] -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Work logged] (BEAM-8512) Add integration tests for Python "flink_runner.py"
[ https://issues.apache.org/jira/browse/BEAM-8512?focusedWorklogId=340232=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-340232 ] ASF GitHub Bot logged work on BEAM-8512: Author: ASF GitHub Bot Created on: 07/Nov/19 23:41 Start Date: 07/Nov/19 23:41 Worklog Time Spent: 10m Work Description: ibzib commented on issue #9998: [BEAM-8512] Add integration tests for flink_runner.py URL: https://github.com/apache/beam/pull/9998#issuecomment-551317000 Run Portable_Python PreCommit This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org Issue Time Tracking --- Worklog Id: (was: 340232) Time Spent: 1h 50m (was: 1h 40m) > Add integration tests for Python "flink_runner.py" > -- > > Key: BEAM-8512 > URL: https://issues.apache.org/jira/browse/BEAM-8512 > Project: Beam > Issue Type: Test > Components: runner-flink, sdk-py-core >Reporter: Maximilian Michels >Assignee: Kyle Weaver >Priority: Major > Time Spent: 1h 50m > Remaining Estimate: 0h > > There are currently no integration tests for the Python FlinkRunner. We need > a set of tests similar to {{flink_runner_test.py}} which currently use the > PortableRunner and not the FlinkRunner. > CC [~robertwb] [~ibzib] [~thw] -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Work logged] (BEAM-8512) Add integration tests for Python "flink_runner.py"
[ https://issues.apache.org/jira/browse/BEAM-8512?focusedWorklogId=340230=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-340230 ] ASF GitHub Bot logged work on BEAM-8512: Author: ASF GitHub Bot Created on: 07/Nov/19 23:31 Start Date: 07/Nov/19 23:31 Worklog Time Spent: 10m Work Description: ibzib commented on pull request #9998: [BEAM-8512] Add integration tests for flink_runner.py URL: https://github.com/apache/beam/pull/9998#discussion_r343858825 ## File path: buildSrc/src/main/groovy/org/apache/beam/gradle/BeamModulePlugin.groovy ## @@ -1953,8 +1953,10 @@ class BeamModulePlugin implements Plugin { } project.ext.addPortableWordCountTasks = { -> -addPortableWordCountTask(false) -addPortableWordCountTask(true) +addPortableWordCountTask(false, "PortableRunner") Review comment: > Theoretically, the PortableRunner is a subset of FlinkRunner, but there are still things that can go wrong +1 As long as PortableRunner remains a publicly documented option (and it will probably have to for at least a little while longer) we should test it. Also, they are testing slightly different things -- PortableRunner starts a containerized job server, while FlinkRunner (when local) starts the job server in a Java subprocess. > Since these tasks are Flink specific, should they be renamed and possibly moved out of BeamModulePlugin? I'm going to add these tasks to Spark ([eventually](https://issues.apache.org/jira/browse/BEAM-7224)). This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. 
For queries about this service, please contact Infrastructure at: us...@infra.apache.org Issue Time Tracking --- Worklog Id: (was: 340230) Time Spent: 1h 40m (was: 1.5h) > Add integration tests for Python "flink_runner.py" > -- > > Key: BEAM-8512 > URL: https://issues.apache.org/jira/browse/BEAM-8512 > Project: Beam > Issue Type: Test > Components: runner-flink, sdk-py-core >Reporter: Maximilian Michels >Assignee: Kyle Weaver >Priority: Major > Time Spent: 1h 40m > Remaining Estimate: 0h > > There are currently no integration tests for the Python FlinkRunner. We need > a set of tests similar to {{flink_runner_test.py}} which currently use the > PortableRunner and not the FlinkRunner. > CC [~robertwb] [~ibzib] [~thw] -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Updated] (BEAM-8583) [SQL] BigQuery should support predicate push-down in DIRECT_READ mode
[ https://issues.apache.org/jira/browse/BEAM-8583?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Kirill Kozlov updated BEAM-8583: Description: * Add BigQuery Dialect with TypeTranslation (since it is not implemented in Calcite 1.20.0, but is present in unreleased versions). * Create a BigQueryFilter class. * BigQueryTable#buildIOReader should translate supported filters into a Sql string and pass it to BigQueryIO. Potential improvements: * After updating vendor Calcite, class `BigQuerySqlDialectWithTypeTranslation` can be deleted and Calcite's `BigQuerySqlDialect` can be utilized instead. * Once BigQuery adds support for more filters, `BigQueryFilter#isSupported` should be updated. > [SQL] BigQuery should support predicate push-down in DIRECT_READ mode > - > > Key: BEAM-8583 > URL: https://issues.apache.org/jira/browse/BEAM-8583 > Project: Beam > Issue Type: Improvement > Components: dsl-sql >Reporter: Kirill Kozlov >Assignee: Kirill Kozlov >Priority: Major > Time Spent: 10m > Remaining Estimate: 0h > > * Add BigQuery Dialect with TypeTranslation (since it is not implemented in > Calcite 1.20.0, but is present in unreleased versions). > * Create a BigQueryFilter class. > * BigQueryTable#buildIOReader should translate supported filters into a Sql > string and pass it to BigQueryIO. > > Potential improvements: > * After updating vendor Calcite, class > `BigQuerySqlDialectWithTypeTranslation` can be deleted and Calcite's > `BigQuerySqlDialect` can be utilized instead. > * Once BigQuery adds support for more filters, `BigQueryFilter#isSupported` > should be updated. -- This message was sent by Atlassian Jira (v8.3.4#803005)
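The core of the BEAM-8583 plan — `BigQueryTable#buildIOReader` translating supported filters into a SQL string — can be sketched as follows. This is an illustrative toy, not Beam's implementation: the function name, the triple representation of filters, and the supported-operator set are all hypothetical stand-ins for what `BigQueryFilter#isSupported` and the dialect translation would decide.

```python
# Hypothetical operator whitelist, standing in for BigQueryFilter#isSupported.
SUPPORTED_OPS = {"=", "<>", "<", ">", "<=", ">="}

def filters_to_sql(filters):
    """Translate (field, op, literal) triples into a SQL predicate string
    that could be pushed down to a storage read (e.g. as a row restriction)."""
    clauses = []
    for field, op, value in filters:
        if op not in SUPPORTED_OPS:
            raise ValueError(f"unsupported operator, cannot push down: {op}")
        literal = f"'{value}'" if isinstance(value, str) else str(value)
        clauses.append(f"{field} {op} {literal}")
    return " AND ".join(clauses)

print(filters_to_sql([("age", ">", 21), ("state", "=", "WA")]))
# age > 21 AND state = 'WA'
```

Unsupported filters must be rejected (and left for a downstream Calc to evaluate) rather than silently dropped, since dropping one would change query results.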
[jira] [Work logged] (BEAM-8508) [SQL] Support predicate push-down without project push-down
[ https://issues.apache.org/jira/browse/BEAM-8508?focusedWorklogId=340222=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-340222 ] ASF GitHub Bot logged work on BEAM-8508: Author: ASF GitHub Bot Created on: 07/Nov/19 23:10 Start Date: 07/Nov/19 23:10 Worklog Time Spent: 10m Work Description: apilloud commented on pull request #9943: [BEAM-8508] [SQL] Standalone filter push down URL: https://github.com/apache/beam/pull/9943#discussion_r343922194 ## File path: sdks/java/extensions/sql/src/main/java/org/apache/beam/sdk/extensions/sql/meta/provider/bigquery/BigQueryTable.java ## @@ -138,12 +138,7 @@ public BeamTableStatistics getTableStatistics(PipelineOptions options) { builder.withSelectedFields(fieldNames); Review comment: Talked about this more, `builder` isn't actually a builder. This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org Issue Time Tracking --- Worklog Id: (was: 340222) Time Spent: 4h 40m (was: 4.5h) > [SQL] Support predicate push-down without project push-down > --- > > Key: BEAM-8508 > URL: https://issues.apache.org/jira/browse/BEAM-8508 > Project: Beam > Issue Type: Improvement > Components: dsl-sql >Reporter: Kirill Kozlov >Assignee: Kirill Kozlov >Priority: Major > Time Spent: 4h 40m > Remaining Estimate: 0h > > In this PR: [https://github.com/apache/beam/pull/9863] > Support for Predicate push-down is added, but only for IOs that support > project push-down. > In order to accomplish that some checks need to be added to not perform > certain Calc and IO manipulations when only filter push-down is needed. -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Work logged] (BEAM-8294) Spark portable validates runner tests timing out
[ https://issues.apache.org/jira/browse/BEAM-8294?focusedWorklogId=340220=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-340220 ] ASF GitHub Bot logged work on BEAM-8294: Author: ASF GitHub Bot Created on: 07/Nov/19 23:07 Start Date: 07/Nov/19 23:07 Worklog Time Spent: 10m Work Description: ibzib commented on pull request #: [BEAM-8294] run Spark portable validates runner tests in parallel URL: https://github.com/apache/beam/pull/ This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org Issue Time Tracking --- Worklog Id: (was: 340220) Time Spent: 1h 40m (was: 1.5h) > Spark portable validates runner tests timing out > > > Key: BEAM-8294 > URL: https://issues.apache.org/jira/browse/BEAM-8294 > Project: Beam > Issue Type: Improvement > Components: runner-spark, test-failures, testing >Reporter: Kyle Weaver >Assignee: Kyle Weaver >Priority: Major > Labels: currently-failing, portability-spark > Time Spent: 1h 40m > Remaining Estimate: 0h > > This postcommit has been timing out for 11 days. > [https://github.com/apache/beam/pull/9095] has been merged for about 11 days. > Coincidence? I think NOT! ... Seriously, though, I wonder what about the SDK > worker management stack caused this to slow down. > [https://builds.apache.org/job/beam_PostCommit_Java_PVR_Spark_Batch/] >
[jira] [Work logged] (BEAM-8508) [SQL] Support predicate push-down without project push-down
[ https://issues.apache.org/jira/browse/BEAM-8508?focusedWorklogId=340219=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-340219 ] ASF GitHub Bot logged work on BEAM-8508: Author: ASF GitHub Bot Created on: 07/Nov/19 23:02 Start Date: 07/Nov/19 23:02 Worklog Time Spent: 10m Work Description: apilloud commented on pull request #9943: [BEAM-8508] [SQL] Standalone filter push down URL: https://github.com/apache/beam/pull/9943 This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org Issue Time Tracking --- Worklog Id: (was: 340219) Time Spent: 4.5h (was: 4h 20m) > [SQL] Support predicate push-down without project push-down > --- > > Key: BEAM-8508 > URL: https://issues.apache.org/jira/browse/BEAM-8508 > Project: Beam > Issue Type: Improvement > Components: dsl-sql >Reporter: Kirill Kozlov >Assignee: Kirill Kozlov >Priority: Major > Time Spent: 4.5h > Remaining Estimate: 0h > > In this PR: [https://github.com/apache/beam/pull/9863] > Support for Predicate push-down is added, but only for IOs that support > project push-down. > In order to accomplish that some checks need to be added to not perform > certain Calc and IO manipulations when only filter push-down is needed. -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Work logged] (BEAM-8508) [SQL] Support predicate push-down without project push-down
[ https://issues.apache.org/jira/browse/BEAM-8508?focusedWorklogId=340218=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-340218 ] ASF GitHub Bot logged work on BEAM-8508: Author: ASF GitHub Bot Created on: 07/Nov/19 23:01 Start Date: 07/Nov/19 23:01 Worklog Time Spent: 10m Work Description: apilloud commented on pull request #9943: [BEAM-8508] [SQL] Standalone filter push down URL: https://github.com/apache/beam/pull/9943#discussion_r343919486 ## File path: sdks/java/extensions/sql/src/main/java/org/apache/beam/sdk/extensions/sql/impl/rel/BeamPushDownIOSourceRel.java ## @@ -0,0 +1,111 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements. See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership. The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License. You may obtain a copy of the License at
+ *
+ * http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+package org.apache.beam.sdk.extensions.sql.impl.rel;
+
+import static org.apache.beam.vendor.calcite.v1_20_0.com.google.common.base.Preconditions.checkArgument;
+
+import java.util.List;
+import java.util.Map;
+import org.apache.beam.sdk.extensions.sql.impl.BeamCalciteTable;
+import org.apache.beam.sdk.extensions.sql.impl.planner.BeamCostModel;
+import org.apache.beam.sdk.extensions.sql.impl.utils.CalciteUtils;
+import org.apache.beam.sdk.extensions.sql.meta.BeamSqlTable;
+import org.apache.beam.sdk.extensions.sql.meta.BeamSqlTableFilter;
+import org.apache.beam.sdk.extensions.sql.meta.DefaultTableFilter;
+import org.apache.beam.sdk.schemas.Schema;
+import org.apache.beam.sdk.transforms.PTransform;
+import org.apache.beam.sdk.values.PBegin;
+import org.apache.beam.sdk.values.PCollection;
+import org.apache.beam.sdk.values.PCollectionList;
+import org.apache.beam.sdk.values.Row;
+import org.apache.beam.vendor.calcite.v1_20_0.org.apache.calcite.plan.RelOptCluster;
+import org.apache.beam.vendor.calcite.v1_20_0.org.apache.calcite.plan.RelOptPlanner;
+import org.apache.beam.vendor.calcite.v1_20_0.org.apache.calcite.plan.RelOptTable;
+import org.apache.beam.vendor.calcite.v1_20_0.org.apache.calcite.plan.RelTraitSet;
+import org.apache.beam.vendor.calcite.v1_20_0.org.apache.calcite.rel.RelWriter;
+import org.apache.beam.vendor.calcite.v1_20_0.org.apache.calcite.rel.metadata.RelMetadataQuery;
+
+public class BeamPushDownIOSourceRel extends BeamIOSourceRel {
+  private final List<String> usedFields;
+  private final BeamSqlTableFilter tableFilters;
+
+  public BeamPushDownIOSourceRel(
+      RelOptCluster cluster,
+      RelTraitSet traitSet,
+      RelOptTable table,
+      BeamSqlTable beamTable,
+      List<String> usedFields,
+      BeamSqlTableFilter tableFilters,
+      Map<String, String> pipelineOptions,
+      BeamCalciteTable calciteTable) {
+    super(cluster, traitSet, table, beamTable, pipelineOptions, calciteTable);
+    this.usedFields = usedFields;
+    this.tableFilters = tableFilters;
+  }
+
+  @Override
+  public RelWriter explainTerms(RelWriter pw) {
+    super.explainTerms(pw);
+
+    // This is done to tell Calcite planner that BeamIOSourceRel cannot be simply substituted by
+    // another BeamIOSourceRel, except for when they carry the same content.
+    if (!usedFields.isEmpty()) {
+      pw.item("usedFields", usedFields.toString());
+    }
+    if (!(tableFilters instanceof DefaultTableFilter)) {
+      pw.item(tableFilters.getClass().getSimpleName(), tableFilters.toString());
+    }
+
+    return pw;
+  }
+
+  @Override
+  public PTransform<PCollectionList<Row>, PCollection<Row>> buildPTransform() {
+    return new Transform();
+  }
+
+  private class Transform extends PTransform<PCollectionList<Row>, PCollection<Row>> {
+
+    @Override
+    public PCollection<Row> expand(PCollectionList<Row> input) {
+      checkArgument(
+          input.size() == 0,
+          "Should not have received input for %s: %s",
+          BeamIOSourceRel.class.getSimpleName(),
+          input);
+
+      final PBegin begin = input.getPipeline().begin();
+      final BeamSqlTable beamSqlTable = BeamPushDownIOSourceRel.this.getBeamSqlTable();
+
+      if (usedFields.isEmpty() && tableFilters instanceof DefaultTableFilter) {
+        return beamSqlTable.buildIOReader(begin);
+      }
+
+      final Schema newBeamSchema = CalciteUtils.toSchema(getRowType());
+      return beamSqlTable
+          .buildIOReader(begin, tableFilters, usedFields)
[jira] [Work logged] (BEAM-8508) [SQL] Support predicate push-down without project push-down
[ https://issues.apache.org/jira/browse/BEAM-8508?focusedWorklogId=340216=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-340216 ] ASF GitHub Bot logged work on BEAM-8508: Author: ASF GitHub Bot Created on: 07/Nov/19 22:58 Start Date: 07/Nov/19 22:58 Worklog Time Spent: 10m Work Description: apilloud commented on pull request #9943: [BEAM-8508] [SQL] Standalone filter push down URL: https://github.com/apache/beam/pull/9943#discussion_r343918593 ## File path: sdks/java/extensions/sql/src/main/java/org/apache/beam/sdk/extensions/sql/meta/provider/bigquery/BigQueryTable.java ## @@ -138,12 +138,7 @@ public BeamTableStatistics getTableStatistics(PipelineOptions options) { builder.withSelectedFields(fieldNames); Review comment: I think this is fine as it is. Builder normally stores the changes and just returns itself as a convenience. This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org Issue Time Tracking --- Worklog Id: (was: 340216) Time Spent: 4h 10m (was: 4h) > [SQL] Support predicate push-down without project push-down > --- > > Key: BEAM-8508 > URL: https://issues.apache.org/jira/browse/BEAM-8508 > Project: Beam > Issue Type: Improvement > Components: dsl-sql >Reporter: Kirill Kozlov >Assignee: Kirill Kozlov >Priority: Major > Time Spent: 4h 10m > Remaining Estimate: 0h > > In this PR: [https://github.com/apache/beam/pull/9863] > Support for Predicate push-down is added, but only for IOs that support > project push-down. > In order to accomplish that some checks need to be added to not perform > certain Calc and IO manipulations when only filter push-down is needed. -- This message was sent by Atlassian Jira (v8.3.4#803005)
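The review exchange above turns on the builder-style convention that an object "stores the changes and just returns itself as a convenience". A minimal Python sketch of that convention (the class and method names are hypothetical, not BigQueryIO's API):

```python
class ReadBuilder:
    """Toy fluent-style object: mutates itself and returns itself so calls chain."""

    def __init__(self):
        self.selected_fields = None

    def with_selected_fields(self, fields):
        # Store the change in place; the return value only enables chaining,
        # so ignoring it (as in the reviewed BigQueryTable code) is still safe.
        self.selected_fields = list(fields)
        return self

b = ReadBuilder()
b.with_selected_fields(["user_id", "state"])  # return value deliberately unused
print(b.selected_fields)  # ['user_id', 'state']
```

This is exactly why the reviewer concluded the code "is fine as it is": whether or not the result of `withSelectedFields` is captured, the stored state is the same. An immutable builder, by contrast, would make discarding the return value a bug.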
[jira] [Commented] (BEAM-8418) Fix handling of Impulse transform in Dataflow runner.
[ https://issues.apache.org/jira/browse/BEAM-8418?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16969621#comment-16969621 ] Valentyn Tymofieiev commented on BEAM-8418: --- For anyone following this, the issue is fixed in 2.17.0. The workaround on earlier releases is to replace _ = p | beam.Impulse() with _ = p | beam.Create([None]) or something similar. > Fix handling of Impulse transform in Dataflow runner. > -- > > Key: BEAM-8418 > URL: https://issues.apache.org/jira/browse/BEAM-8418 > Project: Beam > Issue Type: Bug > Components: sdk-py-core >Reporter: Valentyn Tymofieiev >Assignee: Valentyn Tymofieiev >Priority: Major > Fix For: 2.17.0 > > Time Spent: 2h 50m > Remaining Estimate: 0h > > The following pipeline fails on the Dataflow runner unless we use the beam_fn_api > experiment. > {noformat} > class NoOpDoFn(beam.DoFn): > def process(self, element): > return element > p = beam.Pipeline(options=pipeline_options) > _ = p | beam.Impulse() | beam.ParDo(NoOpDoFn()) > result = p.run() > {noformat} > The reason is that we encode the Impulse payload using url-escaping in [1], while > Dataflow runner expects base64 encoding in non-fnapi mode. In FnApi mode, DF > runner expects URL escaping. > We should fix or reconcile the encoding in non-FnAPI path, and add a > ValidatesRunner test that catches this error. > [1] > https://github.com/apache/beam/blob/12d07745835e1b9c1e824b83beeeadf63ab4b234/sdks/python/apache_beam/runners/dataflow/dataflow_runner.py#L633
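The encoding mismatch described in BEAM-8418 is easy to see with the standard library alone: base64 and URL escaping produce entirely different strings for the same payload bytes, so a consumer expecting one cannot decode the other. The payload bytes below are arbitrary example data, not Beam's actual Impulse payload.

```python
import base64
import urllib.parse

payload = b"\x00impulse\xff"  # arbitrary bytes standing in for a transform payload

b64 = base64.b64encode(payload).decode("ascii")   # what non-FnAPI Dataflow expects
url = urllib.parse.quote_from_bytes(payload)      # what the SDK was sending

print(b64)  # AGltcHVsc2X/
print(url)  # %00impulse%FF
```

Since the two encodings disagree on every non-trivial input, the non-FnAPI path silently misinterprets the payload — hence the fix (and the suggested ValidatesRunner test) rather than a decode-time workaround.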
[jira] [Created] (BEAM-8587) Add TestStream support for Dataflow runner
Andrew Crites created BEAM-8587: --- Summary: Add TestStream support for Dataflow runner Key: BEAM-8587 URL: https://issues.apache.org/jira/browse/BEAM-8587 Project: Beam Issue Type: Improvement Components: runner-dataflow, testing Reporter: Andrew Crites TestStream support is needed to test features like late data and processing-time triggers on the local Dataflow runner.
[jira] [Work logged] (BEAM-8575) Add more Python validates runner tests
[ https://issues.apache.org/jira/browse/BEAM-8575?focusedWorklogId=340193=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-340193 ] ASF GitHub Bot logged work on BEAM-8575: Author: ASF GitHub Bot Created on: 07/Nov/19 22:35 Start Date: 07/Nov/19 22:35 Worklog Time Spent: 10m Work Description: liumomo315 commented on issue #9957: [BEAM-8575] Add validates runner tests for 1. Custom window fn: Test a customized window fn work as expected; 2. Windows idempotency: Applying the same window fn (or window fn + GBK) to the input multiple times will have the same effect as applying it once. URL: https://github.com/apache/beam/pull/9957#issuecomment-551297870 Hi Luke, sorry that I missed to read the "how to make review process smoother" and merged my commits locally and force pushed to origin. Will make sure not do that in the future. @lukecwik This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org Issue Time Tracking --- Worklog Id: (was: 340193) Time Spent: 50m (was: 40m) > Add more Python validates runner tests > -- > > Key: BEAM-8575 > URL: https://issues.apache.org/jira/browse/BEAM-8575 > Project: Beam > Issue Type: Test > Components: sdk-py-core, testing >Reporter: wendy liu >Assignee: wendy liu >Priority: Major > Time Spent: 50m > Remaining Estimate: 0h > > This is the umbrella issue to track the work of adding more Python tests to > improve test coverage. -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Work logged] (BEAM-8427) [SQL] Add support for MongoDB source
[ https://issues.apache.org/jira/browse/BEAM-8427?focusedWorklogId=340183=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-340183 ] ASF GitHub Bot logged work on BEAM-8427: Author: ASF GitHub Bot Created on: 07/Nov/19 22:15 Start Date: 07/Nov/19 22:15 Worklog Time Spent: 10m Work Description: 11moon11 commented on issue #10031: [BEAM-8427] Create a table and a table provider for MongoDB URL: https://github.com/apache/beam/pull/10031#issuecomment-551291329 Run sql postcommit This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org Issue Time Tracking --- Worklog Id: (was: 340183) Time Spent: 5h 40m (was: 5.5h) > [SQL] Add support for MongoDB source > > > Key: BEAM-8427 > URL: https://issues.apache.org/jira/browse/BEAM-8427 > Project: Beam > Issue Type: New Feature > Components: dsl-sql >Reporter: Kirill Kozlov >Assignee: Kirill Kozlov >Priority: Major > Time Spent: 5h 40m > Remaining Estimate: 0h > > In progress: > * Create a MongoDB table and table provider. > * Implement buildIOReader > * Support primitive types > Still needs to be done: > * Implement buildIOWrite > * improve getTableStatistics -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Work logged] (BEAM-8427) [SQL] Add support for MongoDB source
[ https://issues.apache.org/jira/browse/BEAM-8427?focusedWorklogId=340182=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-340182 ] ASF GitHub Bot logged work on BEAM-8427: Author: ASF GitHub Bot Created on: 07/Nov/19 22:15 Start Date: 07/Nov/19 22:15 Worklog Time Spent: 10m Work Description: 11moon11 commented on issue #10031: [BEAM-8427] Create a table and a table provider for MongoDB URL: https://github.com/apache/beam/pull/10031#issuecomment-551291329 Run sql postcommit This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org Issue Time Tracking --- Worklog Id: (was: 340182) Time Spent: 5.5h (was: 5h 20m) > [SQL] Add support for MongoDB source > > > Key: BEAM-8427 > URL: https://issues.apache.org/jira/browse/BEAM-8427 > Project: Beam > Issue Type: New Feature > Components: dsl-sql >Reporter: Kirill Kozlov >Assignee: Kirill Kozlov >Priority: Major > Time Spent: 5.5h > Remaining Estimate: 0h > > In progress: > * Create a MongoDB table and table provider. > * Implement buildIOReader > * Support primitive types > Still needs to be done: > * Implement buildIOWrite > * improve getTableStatistics -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Closed] (BEAM-8574) [SQL] MongoDb PostCommit_SQL fails
[ https://issues.apache.org/jira/browse/BEAM-8574?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Kirill Kozlov closed BEAM-8574. --- Fix Version/s: 2.18.0 Resolution: Resolved > [SQL] MongoDb PostCommit_SQL fails > -- > > Key: BEAM-8574 > URL: https://issues.apache.org/jira/browse/BEAM-8574 > Project: Beam > Issue Type: Bug > Components: dsl-sql >Reporter: Kirill Kozlov >Assignee: Kirill Kozlov >Priority: Critical > Fix For: 2.18.0 > > Time Spent: 50m > Remaining Estimate: 0h > > Integration test for Sql MongoDb table read and write fails. > Jenkins: > [https://builds.apache.org/view/A-D/view/Beam/view/PostCommit/job/beam_PostCommit_SQL/3126/] > Cause: [https://github.com/apache/beam/pull/9806] -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Work logged] (BEAM-8585) Include path in error message in path_to_beam_jar
[ https://issues.apache.org/jira/browse/BEAM-8585?focusedWorklogId=340179=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-340179 ] ASF GitHub Bot logged work on BEAM-8585: Author: ASF GitHub Bot Created on: 07/Nov/19 22:04 Start Date: 07/Nov/19 22:04 Worklog Time Spent: 10m Work Description: ibzib commented on pull request #10032: [BEAM-8585] Include path in error message in path_to_beam_jar URL: https://github.com/apache/beam/pull/10032 Admittedly a pretty trivial change, but maybe it will be useful some day. Post-Commit Tests Status (on master branch): [build-status badge table truncated]
[jira] [Work logged] (BEAM-8427) [SQL] Add support for MongoDB source
[ https://issues.apache.org/jira/browse/BEAM-8427?focusedWorklogId=340178=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-340178 ] ASF GitHub Bot logged work on BEAM-8427: Author: ASF GitHub Bot Created on: 07/Nov/19 22:03 Start Date: 07/Nov/19 22:03 Worklog Time Spent: 10m Work Description: 11moon11 commented on issue #10031: [BEAM-8427] Create a table and a table provider for MongoDB URL: https://github.com/apache/beam/pull/10031#issuecomment-551287100 Run sql postcommit This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org Issue Time Tracking --- Worklog Id: (was: 340178) Time Spent: 5h 20m (was: 5h 10m) > [SQL] Add support for MongoDB source > > > Key: BEAM-8427 > URL: https://issues.apache.org/jira/browse/BEAM-8427 > Project: Beam > Issue Type: New Feature > Components: dsl-sql >Reporter: Kirill Kozlov >Assignee: Kirill Kozlov >Priority: Major > Time Spent: 5h 20m > Remaining Estimate: 0h > > In progress: > * Create a MongoDB table and table provider. > * Implement buildIOReader > * Support primitive types > Still needs to be done: > * Implement buildIOWrite > * improve getTableStatistics -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Work logged] (BEAM-8427) [SQL] Add support for MongoDB source
[ https://issues.apache.org/jira/browse/BEAM-8427?focusedWorklogId=340177=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-340177 ] ASF GitHub Bot logged work on BEAM-8427: Author: ASF GitHub Bot Created on: 07/Nov/19 22:03 Start Date: 07/Nov/19 22:03 Worklog Time Spent: 10m Work Description: 11moon11 commented on issue #10031: [BEAM-8427] Create a table and a table provider for MongoDB URL: https://github.com/apache/beam/pull/10031#issuecomment-551287100 Run sql postcommit This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org Issue Time Tracking --- Worklog Id: (was: 340177) Time Spent: 5h 10m (was: 5h) > [SQL] Add support for MongoDB source > > > Key: BEAM-8427 > URL: https://issues.apache.org/jira/browse/BEAM-8427 > Project: Beam > Issue Type: New Feature > Components: dsl-sql >Reporter: Kirill Kozlov >Assignee: Kirill Kozlov >Priority: Major > Time Spent: 5h 10m > Remaining Estimate: 0h > > In progress: > * Create a MongoDB table and table provider. > * Implement buildIOReader > * Support primitive types > Still needs to be done: > * Implement buildIOWrite > * improve getTableStatistics -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Updated] (BEAM-8585) Include path in error message in path_to_beam_jar
[ https://issues.apache.org/jira/browse/BEAM-8585?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Kyle Weaver updated BEAM-8585: -- Priority: Trivial (was: Minor) > Include path in error message in path_to_beam_jar > - > > Key: BEAM-8585 > URL: https://issues.apache.org/jira/browse/BEAM-8585 > Project: Beam > Issue Type: Improvement > Components: runner-flink >Reporter: Kyle Weaver >Assignee: Kyle Weaver >Priority: Trivial > Labels: portability-flink > > Right now, the error message looks like this when the job server jar can't be > found: > 12:35:50 RuntimeError: Please build the server with > 12:35:50 cd > /home/jenkins/jenkins-slave/workspace/beam_PostCommit_Python37_PR/src; > ./gradlew runners:flink:1.9:job-server:shadowJar > I would like to know the path of the missing jar to help me debug. -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Created] (BEAM-8586) Add a server for MongoDb Integration Test
Kirill Kozlov created BEAM-8586: --- Summary: Add a server for MongoDb Integration Test Key: BEAM-8586 URL: https://issues.apache.org/jira/browse/BEAM-8586 Project: Beam Issue Type: Test Components: dsl-sql Reporter: Kirill Kozlov We need to pass pipeline options with server information to the MongoDbReadWriteIT. For now that test is ignored and excluded from the build.gradle file. -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Updated] (BEAM-8585) Include path in error message in path_to_beam_jar
[ https://issues.apache.org/jira/browse/BEAM-8585?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Kyle Weaver updated BEAM-8585: -- Status: Open (was: Triage Needed) > Include path in error message in path_to_beam_jar > - > > Key: BEAM-8585 > URL: https://issues.apache.org/jira/browse/BEAM-8585 > Project: Beam > Issue Type: Improvement > Components: runner-flink >Reporter: Kyle Weaver >Assignee: Kyle Weaver >Priority: Minor > > Right now, the error message looks like this when the job server jar can't be > found: > 12:35:50 RuntimeError: Please build the server with > 12:35:50 cd > /home/jenkins/jenkins-slave/workspace/beam_PostCommit_Python37_PR/src; > ./gradlew runners:flink:1.9:job-server:shadowJar > I would like to know the path of the missing jar to help me debug. -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Created] (BEAM-8585) Include path in error message in path_to_beam_jar
Kyle Weaver created BEAM-8585: - Summary: Include path in error message in path_to_beam_jar Key: BEAM-8585 URL: https://issues.apache.org/jira/browse/BEAM-8585 Project: Beam Issue Type: Improvement Components: runner-flink Reporter: Kyle Weaver Assignee: Kyle Weaver Right now, the error message looks like this when the job server jar can't be found: 12:35:50 RuntimeError: Please build the server with 12:35:50 cd /home/jenkins/jenkins-slave/workspace/beam_PostCommit_Python37_PR/src; ./gradlew runners:flink:1.9:job-server:shadowJar I would like to know the path of the missing jar to help me debug. -- This message was sent by Atlassian Jira (v8.3.4#803005)
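The requested improvement can be illustrated with a small sketch. This is hypothetical code, not Beam's actual `path_to_beam_jar` implementation, and the artifact-naming scheme is invented for illustration; it only demonstrates the point of the ticket, that the error should carry the path of the jar that was looked for:

```python
# Hypothetical sketch only: the jar_dir default and the target-to-artifact
# mapping below are invented, not Beam's real layout.
import os


def path_to_beam_jar(gradle_target, jar_dir="/tmp/beam-jars"):
    """Return the path of a job server jar, or fail with a path-bearing error."""
    # Invented mapping, e.g. "runners:flink:1.9:job-server:shadowJar"
    # -> "runners-flink-1.9-job-server.jar"
    artifact = "-".join(gradle_target.split(":")[:-1]) + ".jar"
    path = os.path.join(jar_dir, artifact)
    if not os.path.exists(path):
        # Naming the missing path is the improvement the ticket asks for.
        raise RuntimeError(
            "Expected Beam jar at %s. Please build the server with\n"
            "  ./gradlew %s" % (path, gradle_target))
    return path
```

With this change, the RuntimeError quoted in the ticket would also name the exact file that could not be found, which is what makes the failure debuggable.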
[jira] [Work logged] (BEAM-8427) [SQL] Add support for MongoDB source
[ https://issues.apache.org/jira/browse/BEAM-8427?focusedWorklogId=340172&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-340172 ] ASF GitHub Bot logged work on BEAM-8427: Author: ASF GitHub Bot Created on: 07/Nov/19 21:41 Start Date: 07/Nov/19 21:41 Worklog Time Spent: 10m Work Description: 11moon11 commented on issue #10031: [BEAM-8427] Create a table and a table provider for MongoDB URL: https://github.com/apache/beam/pull/10031#issuecomment-551278228 Run sql postcommit This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org Issue Time Tracking --- Worklog Id: (was: 340172) Time Spent: 5h (was: 4h 50m) > [SQL] Add support for MongoDB source > > > Key: BEAM-8427 > URL: https://issues.apache.org/jira/browse/BEAM-8427 > Project: Beam > Issue Type: New Feature > Components: dsl-sql >Reporter: Kirill Kozlov >Assignee: Kirill Kozlov >Priority: Major > Time Spent: 5h > Remaining Estimate: 0h > > In progress: > * Create a MongoDB table and table provider. > * Implement buildIOReader > * Support primitive types > Still needs to be done: > * Implement buildIOWrite > * improve getTableStatistics -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Work logged] (BEAM-8427) [SQL] Add support for MongoDB source
[ https://issues.apache.org/jira/browse/BEAM-8427?focusedWorklogId=340171&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-340171 ] ASF GitHub Bot logged work on BEAM-8427: Author: ASF GitHub Bot Created on: 07/Nov/19 21:41 Start Date: 07/Nov/19 21:41 Worklog Time Spent: 10m Work Description: 11moon11 commented on issue #10031: [BEAM-8427] Create a table and a table provider for MongoDB URL: https://github.com/apache/beam/pull/10031#issuecomment-551278228 Run sql postcommit This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org Issue Time Tracking --- Worklog Id: (was: 340171) Time Spent: 4h 50m (was: 4h 40m) > [SQL] Add support for MongoDB source > > > Key: BEAM-8427 > URL: https://issues.apache.org/jira/browse/BEAM-8427 > Project: Beam > Issue Type: New Feature > Components: dsl-sql >Reporter: Kirill Kozlov >Assignee: Kirill Kozlov >Priority: Major > Time Spent: 4h 50m > Remaining Estimate: 0h > > In progress: > * Create a MongoDB table and table provider. > * Implement buildIOReader > * Support primitive types > Still needs to be done: > * Implement buildIOWrite > * improve getTableStatistics -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Created] (BEAM-8584) Remove TestPipelineOptions
Brian Hulette created BEAM-8584: --- Summary: Remove TestPipelineOptions Key: BEAM-8584 URL: https://issues.apache.org/jira/browse/BEAM-8584 Project: Beam Issue Type: Improvement Components: testing Reporter: Brian Hulette Assignee: Brian Hulette See [ML thread|https://lists.apache.org/thread.html/cc2ac6db764e0d750688f8bae540728e38759365b86ba6f3fabfa6dd@%3Cdev.beam.apache.org%3E] -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Work logged] (BEAM-8427) [SQL] Add support for MongoDB source
[ https://issues.apache.org/jira/browse/BEAM-8427?focusedWorklogId=340167&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-340167 ] ASF GitHub Bot logged work on BEAM-8427: Author: ASF GitHub Bot Created on: 07/Nov/19 21:37 Start Date: 07/Nov/19 21:37 Worklog Time Spent: 10m Work Description: 11moon11 commented on pull request #10031: [BEAM-8427] Create a table and a table provider for MongoDB URL: https://github.com/apache/beam/pull/10031 - Create a Table with read support (for now, write will be added in a separate PR) - Add some tests Thank you for your contribution! Follow this checklist to help us incorporate your contribution quickly and easily: - [ ] [**Choose reviewer(s)**](https://beam.apache.org/contribute/#make-your-change) and mention them in a comment (`R: @username`). - [ ] Format the pull request title like `[BEAM-XXX] Fixes bug in ApproximateQuantiles`, where you replace `BEAM-XXX` with the appropriate JIRA issue, if applicable. This will automatically link the pull request to the issue. - [ ] If this contribution is large, please file an Apache [Individual Contributor License Agreement](https://www.apache.org/licenses/icla.pdf). See the [Contributor Guide](https://beam.apache.org/contribute) for more tips on [how to make review process smoother](https://beam.apache.org/contribute/#make-reviewers-job-easier).
Post-Commit Tests Status (on master branch): table of Jenkins build-status badges for the Go, Java, and Python SDKs across the Apex, Dataflow, Flink, Gearpump, Samza, and Spark runners (badge markup truncated in the original).
[jira] [Closed] (BEAM-6274) Timer Backlog Bug
[ https://issues.apache.org/jira/browse/BEAM-6274?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Sam Rohde closed BEAM-6274. --- Fix Version/s: Not applicable Resolution: Won't Fix > Timer Backlog Bug > - > > Key: BEAM-6274 > URL: https://issues.apache.org/jira/browse/BEAM-6274 > Project: Beam > Issue Type: Improvement > Components: runner-dataflow >Reporter: Sam Rohde >Assignee: Sam Rohde >Priority: Major > Fix For: Not applicable > > > * Move timer receiver into a new class > * Investigate what the getNextFiredTimer(window coder) parameter actually > does > * Add custom payload feature > * Set the correct "IsBounded" on the generated MainInput PCollection for > timers (CreateExecutableStageNodeFunction) -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Assigned] (BEAM-6453) Create a single Jenkins job or Gradle task to serve for release test validation
[ https://issues.apache.org/jira/browse/BEAM-6453?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Sam Rohde reassigned BEAM-6453: --- Assignee: (was: Sam Rohde) > Create a single Jenkins job or Gradle task to serve for release test > validation > --- > > Key: BEAM-6453 > URL: https://issues.apache.org/jira/browse/BEAM-6453 > Project: Beam > Issue Type: Improvement > Components: project-management >Reporter: Sam Rohde >Priority: Major > > As per [https://github.com/apache/beam/pull/7509,] it looks like you can only > run a single jenkins job per phrase per comment. In addition, the list of > precommit and postcommit jobs will easily get stale. By creating a Jenkins > job or Gradle task, we can kill two birds with one stone. -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Assigned] (BEAM-6672) Make bundle execution with ExecutableStage support user states
[ https://issues.apache.org/jira/browse/BEAM-6672?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Sam Rohde reassigned BEAM-6672: --- Assignee: (was: Sam Rohde) > Make bundle execution with ExecutableStage support user states > -- > > Key: BEAM-6672 > URL: https://issues.apache.org/jira/browse/BEAM-6672 > Project: Beam > Issue Type: Improvement > Components: runner-dataflow >Reporter: Sam Rohde >Priority: Major > Time Spent: 3h 10m > Remaining Estimate: 0h > -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Resolved] (BEAM-8491) Add ability for multiple output PCollections from composites
[ https://issues.apache.org/jira/browse/BEAM-8491?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Sam Rohde resolved BEAM-8491. - Fix Version/s: 2.16.0 Resolution: Fixed > Add ability for multiple output PCollections from composites > > > Key: BEAM-8491 > URL: https://issues.apache.org/jira/browse/BEAM-8491 > Project: Beam > Issue Type: Improvement > Components: sdk-py-core >Reporter: Sam Rohde >Assignee: Sam Rohde >Priority: Major > Fix For: 2.16.0 > > Time Spent: 1.5h > Remaining Estimate: 0h > > The Python SDK has DoOutputTuples which allows for a single transform to have > multiple outputs. However, this does not include the ability for a composite > transform to have multiple outputs PCollections from different transforms. -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Work logged] (BEAM-8583) [SQL] BigQuery should support predicate push-down in DIRECT_READ mode
[ https://issues.apache.org/jira/browse/BEAM-8583?focusedWorklogId=340153&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-340153 ] ASF GitHub Bot logged work on BEAM-8583: Author: ASF GitHub Bot Created on: 07/Nov/19 21:07 Start Date: 07/Nov/19 21:07 Worklog Time Spent: 10m Work Description: 11moon11 commented on pull request #10030: [BEAM-8583] Big query filter push down URL: https://github.com/apache/beam/pull/10030 - BigQuery should apply predicate push-down when appropriate. Based on top of #9943. Thank you for your contribution! Follow this checklist to help us incorporate your contribution quickly and easily: - [ ] [**Choose reviewer(s)**](https://beam.apache.org/contribute/#make-your-change) and mention them in a comment (`R: @username`). - [ ] Format the pull request title like `[BEAM-XXX] Fixes bug in ApproximateQuantiles`, where you replace `BEAM-XXX` with the appropriate JIRA issue, if applicable. This will automatically link the pull request to the issue. - [ ] If this contribution is large, please file an Apache [Individual Contributor License Agreement](https://www.apache.org/licenses/icla.pdf). See the [Contributor Guide](https://beam.apache.org/contribute) for more tips on [how to make review process smoother](https://beam.apache.org/contribute/#make-reviewers-job-easier).
[jira] [Created] (BEAM-8583) [SQL] BigQuery should support predicate push-down in DIRECT_READ mode
Kirill Kozlov created BEAM-8583: --- Summary: [SQL] BigQuery should support predicate push-down in DIRECT_READ mode Key: BEAM-8583 URL: https://issues.apache.org/jira/browse/BEAM-8583 Project: Beam Issue Type: Improvement Components: dsl-sql Reporter: Kirill Kozlov Assignee: Kirill Kozlov -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Created] (BEAM-8582) Python SDK emits duplicate records for Default and AfterWatermark triggers
Sam Rohde created BEAM-8582: --- Summary: Python SDK emits duplicate records for Default and AfterWatermark triggers Key: BEAM-8582 URL: https://issues.apache.org/jira/browse/BEAM-8582 Project: Beam Issue Type: Bug Components: sdk-py-core Reporter: Sam Rohde Assignee: Sam Rohde This was found after fixing https://issues.apache.org/jira/browse/BEAM-8581. The fix for 8581 was to pass in the input watermark. Previously, it was using MIN_TIMESTAMP for all of its EOW calculations. By giving it a proper input watermark, this bug started to manifest. The DefaultTrigger and AfterWatermark do not clear their timers after the watermark passes the end of the window, leading to duplicate records being emitted. Fix: Clear the watermark timer when the watermark reaches the end of the window. -- This message was sent by Atlassian Jira (v8.3.4#803005)
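The bug and its fix can be modeled in a few lines. This is a toy sketch with invented names, not code from the Beam trigger driver; it shows why clearing the watermark timer after it fires prevents the duplicate pane:

```python
# Toy model (names are assumptions, not Beam's API): a trigger whose
# watermark timer fires an on-time pane at the end of the window.
class ToyWatermarkTrigger:
    def __init__(self, end_of_window):
        self.end_of_window = end_of_window
        self.timer_set = True  # watermark timer scheduled at end of window
        self.panes_emitted = 0

    def advance_watermark(self, watermark):
        if self.timer_set and watermark >= self.end_of_window:
            self.panes_emitted += 1
            # The fix: clear the timer once it fires, so later watermark
            # advances cannot re-fire it and emit a duplicate record.
            self.timer_set = False
```

Without the `self.timer_set = False` line, every watermark advance past the end of the window would fire the timer again and emit another copy of the pane, which is the duplicate-records symptom described above.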
[jira] [Created] (BEAM-8581) Python SDK labels ontime empty panes as late
Sam Rohde created BEAM-8581: --- Summary: Python SDK labels ontime empty panes as late Key: BEAM-8581 URL: https://issues.apache.org/jira/browse/BEAM-8581 Project: Beam Issue Type: Bug Components: sdk-py-core Reporter: Sam Rohde Assignee: Sam Rohde The GeneralTriggerDriver does not put watermark holds on timers, leading to the ontime empty pane being considered late data. Fix: Add a new notion of whether a trigger has an ontime pane. If it does, then set a watermark hold to end of window - 1. -- This message was sent by Atlassian Jira (v8.3.4#803005)
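The proposed hold can be sketched as a small function. The function and argument names are assumptions for illustration, not Beam's actual trigger-driver API; the point is that the hold keeps the output time at end of window minus one so the on-time empty pane is not behind the watermark when it is emitted:

```python
# Minimal sketch (hypothetical names): compute the held output watermark
# for a window whose trigger may produce an on-time pane.
def held_output_watermark(input_watermark, end_of_window, has_ontime_pane):
    if has_ontime_pane:
        # Hold output time to just before the end of the window, so the
        # on-time pane fires at a timestamp the watermark has not passed.
        return min(input_watermark, end_of_window - 1)
    return input_watermark
</imports>
```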
[jira] [Created] (BEAM-8580) Request Python API to support windows ClosingBehavior
wendy liu created BEAM-8580: --- Summary: Request Python API to support windows ClosingBehavior Key: BEAM-8580 URL: https://issues.apache.org/jira/browse/BEAM-8580 Project: Beam Issue Type: New Feature Components: sdk-py-core Reporter: wendy liu Beam Python should have an API to support windows ClosingBehavior. -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Work logged] (BEAM-8559) Run Dataflow Nexmark suites with Java 11
[ https://issues.apache.org/jira/browse/BEAM-8559?focusedWorklogId=340150&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-340150 ] ASF GitHub Bot logged work on BEAM-8559: Author: ASF GitHub Bot Created on: 07/Nov/19 20:40 Start Date: 07/Nov/19 20:40 Worklog Time Spent: 10m Work Description: lgajowy commented on issue #9994: [BEAM-8559] Run Nexmark Dataflow suites with Java 11 URL: https://github.com/apache/beam/pull/9994#issuecomment-551254214 Run Dataflow Runner Nexmark Tests - Java 11 This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org Issue Time Tracking --- Worklog Id: (was: 340150) Time Spent: 40m (was: 0.5h) > Run Dataflow Nexmark suites with Java 11 > > > Key: BEAM-8559 > URL: https://issues.apache.org/jira/browse/BEAM-8559 > Project: Beam > Issue Type: Sub-task > Components: testing-nexmark >Reporter: Lukasz Gajowy >Assignee: Lukasz Gajowy >Priority: Minor > Fix For: Not applicable > > Time Spent: 40m > Remaining Estimate: 0h > > This task is similar to https://issues.apache.org/jira/browse/BEAM-6936. > The goal is to run Nexmark suites with Java 11 but compile with java 8 to > verify compatibility. -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Work logged] (BEAM-8512) Add integration tests for Python "flink_runner.py"
[ https://issues.apache.org/jira/browse/BEAM-8512?focusedWorklogId=340146&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-340146 ] ASF GitHub Bot logged work on BEAM-8512: Author: ASF GitHub Bot Created on: 07/Nov/19 20:34 Start Date: 07/Nov/19 20:34 Worklog Time Spent: 10m Work Description: ibzib commented on issue #9998: [BEAM-8512] Add integration tests for flink_runner.py URL: https://github.com/apache/beam/pull/9998#issuecomment-551251982 Run Python 3.7 PostCommit This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org Issue Time Tracking --- Worklog Id: (was: 340146) Time Spent: 1.5h (was: 1h 20m) > Add integration tests for Python "flink_runner.py" > -- > > Key: BEAM-8512 > URL: https://issues.apache.org/jira/browse/BEAM-8512 > Project: Beam > Issue Type: Test > Components: runner-flink, sdk-py-core >Reporter: Maximilian Michels >Assignee: Kyle Weaver >Priority: Major > Time Spent: 1.5h > Remaining Estimate: 0h > > There are currently no integration tests for the Python FlinkRunner. We need > a set of tests similar to {{flink_runner_test.py}} which currently use the > PortableRunner and not the FlinkRunner. > CC [~robertwb] [~ibzib] [~thw] -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Work logged] (BEAM-8512) Add integration tests for Python "flink_runner.py"
[ https://issues.apache.org/jira/browse/BEAM-8512?focusedWorklogId=340139&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-340139 ] ASF GitHub Bot logged work on BEAM-8512: Author: ASF GitHub Bot Created on: 07/Nov/19 20:29 Start Date: 07/Nov/19 20:29 Worklog Time Spent: 10m Work Description: ibzib commented on pull request #9998: [BEAM-8512] Add integration tests for flink_runner.py URL: https://github.com/apache/beam/pull/9998#discussion_r343858825 ## File path: buildSrc/src/main/groovy/org/apache/beam/gradle/BeamModulePlugin.groovy ## @@ -1953,8 +1953,10 @@ class BeamModulePlugin implements Plugin { } project.ext.addPortableWordCountTasks = { -> -addPortableWordCountTask(false) -addPortableWordCountTask(true) +addPortableWordCountTask(false, "PortableRunner") Review comment: > Theoretically, the PortableRunner is a subset of FlinkRunner, but there are still things that can go wrong +1 As long as PortableRunner remains a publicly documented option (and it will probably have to for at least a little while longer) we should test it. Also, they are testing slightly different things -- PortableRunner starts a containerized job server, while FlinkRunner (when local) starts the job server in a Java subprocess. > Since these tasks are Flink specific, should they be renamed and possibly moved out of BeamModulePlugin? I'm going to add these tasks to Spark (eventually). This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment.
For queries about this service, please contact Infrastructure at: us...@infra.apache.org Issue Time Tracking --- Worklog Id: (was: 340139) Time Spent: 1h 20m (was: 1h 10m) > Add integration tests for Python "flink_runner.py" > -- > > Key: BEAM-8512 > URL: https://issues.apache.org/jira/browse/BEAM-8512 > Project: Beam > Issue Type: Test > Components: runner-flink, sdk-py-core >Reporter: Maximilian Michels >Assignee: Kyle Weaver >Priority: Major > Time Spent: 1h 20m > Remaining Estimate: 0h > > There are currently no integration tests for the Python FlinkRunner. We need > a set of tests similar to {{flink_runner_test.py}} which currently use the > PortableRunner and not the FlinkRunner. > CC [~robertwb] [~ibzib] [~thw] -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Work logged] (BEAM-8559) Run Dataflow Nexmark suites with Java 11
[ https://issues.apache.org/jira/browse/BEAM-8559?focusedWorklogId=340137&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-340137 ] ASF GitHub Bot logged work on BEAM-8559: Author: ASF GitHub Bot Created on: 07/Nov/19 20:27 Start Date: 07/Nov/19 20:27 Worklog Time Spent: 10m Work Description: lgajowy commented on issue #9994: [BEAM-8559] Run Nexmark Dataflow suites with Java 11 URL: https://github.com/apache/beam/pull/9994#issuecomment-551249539 Run seed job This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org Issue Time Tracking --- Worklog Id: (was: 340137) Time Spent: 0.5h (was: 20m) > Run Dataflow Nexmark suites with Java 11 > > > Key: BEAM-8559 > URL: https://issues.apache.org/jira/browse/BEAM-8559 > Project: Beam > Issue Type: Sub-task > Components: testing-nexmark >Reporter: Lukasz Gajowy >Assignee: Lukasz Gajowy >Priority: Minor > Fix For: Not applicable > > Time Spent: 0.5h > Remaining Estimate: 0h > > This task is similar to https://issues.apache.org/jira/browse/BEAM-6936. > The goal is to run Nexmark suites with Java 11 but compile with java 8 to > verify compatibility. -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Work logged] (BEAM-8559) Run Dataflow Nexmark suites with Java 11
[ https://issues.apache.org/jira/browse/BEAM-8559?focusedWorklogId=340129&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-340129 ] ASF GitHub Bot logged work on BEAM-8559: Author: ASF GitHub Bot Created on: 07/Nov/19 20:13 Start Date: 07/Nov/19 20:13 Worklog Time Spent: 10m Work Description: lgajowy commented on issue #9994: [BEAM-8559] Run Nexmark Dataflow suites with Java 11 URL: https://github.com/apache/beam/pull/9994#issuecomment-551244425 Run Dataflow Runner Nexmark Tests This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org Issue Time Tracking --- Worklog Id: (was: 340129) Time Spent: 20m (was: 10m) > Run Dataflow Nexmark suites with Java 11 > > > Key: BEAM-8559 > URL: https://issues.apache.org/jira/browse/BEAM-8559 > Project: Beam > Issue Type: Sub-task > Components: testing-nexmark >Reporter: Lukasz Gajowy >Assignee: Lukasz Gajowy >Priority: Minor > Fix For: Not applicable > > Time Spent: 20m > Remaining Estimate: 0h > > This task is similar to https://issues.apache.org/jira/browse/BEAM-6936. > The goal is to run Nexmark suites with Java 11 but compile with java 8 to > verify compatibility. -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Work logged] (BEAM-8457) Instrument Dataflow jobs that are launched from Notebooks
[ https://issues.apache.org/jira/browse/BEAM-8457?focusedWorklogId=340125&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-340125 ] ASF GitHub Bot logged work on BEAM-8457: Author: ASF GitHub Bot Created on: 07/Nov/19 20:03 Start Date: 07/Nov/19 20:03 Worklog Time Spent: 10m Work Description: KevinGG commented on pull request #9885: [BEAM-8457] Label Dataflow jobs from Notebook URL: https://github.com/apache/beam/pull/9885#discussion_r343848058 ## File path: sdks/python/apache_beam/runners/dataflow/dataflow_runner.py ## @@ -360,6 +360,16 @@ def visit_transform(self, transform_node): def run_pipeline(self, pipeline, options): """Remotely executes entire pipeline or parts reachable from node.""" +# Label goog-dataflow-notebook if pipeline is initiated from interactive +# runner. +if pipeline.interactive: Review comment: Discussed with David and Sam. Since we also want to track jobs started from notebook even if the user never uses `InteractiveRunner`, checking the environment might just be the only way to do it. By putting the logic into a try-except block as it is, we could avoid introducing `ipython` dependency into `DataflowRunner`. If the `[interactive]` dependency is never installed and current execution_path has never imported `ipython`, the code would just never be executed. I'll move the logic into a standalone utility module and import it in DataflowRunner to do the check. This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment.
For queries about this service, please contact Infrastructure at: us...@infra.apache.org Issue Time Tracking --- Worklog Id: (was: 340125) Time Spent: 8.5h (was: 8h 20m) > Instrument Dataflow jobs that are launched from Notebooks > - > > Key: BEAM-8457 > URL: https://issues.apache.org/jira/browse/BEAM-8457 > Project: Beam > Issue Type: Improvement > Components: runner-py-interactive >Reporter: Ning Kang >Assignee: Ning Kang >Priority: Major > Fix For: 2.17.0 > > Time Spent: 8.5h > Remaining Estimate: 0h > > Dataflow needs the capability to tell how many Dataflow jobs are launched > from the Notebook environment, i.e., the Interactive Runner. > # Change the pipeline.run() API to allow supply a runner and an option > parameter so that a pipeline initially bundled w/ an interactive runner can > be directly run by other runners from notebook. > # Implicitly add the necessary source information through user labels when > the user does p.run(runner=DataflowRunner()). -- This message was sent by Atlassian Jira (v8.3.4#803005)
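The try-except approach discussed in the review comment above can be sketched as a small standalone helper. The function name is an assumption, not the actual Beam utility; it shows how a runner module can detect a notebook environment without taking a hard `ipython` dependency:

```python
# Hedged sketch (hypothetical helper name): detect IPython/notebook execution
# without requiring ipython to be installed.
def running_in_notebook():
    try:
        # Succeeds only when the [interactive] extra (ipython) is installed.
        from IPython import get_ipython
    except ImportError:
        return False
    # get_ipython() returns None when not inside an IPython session, so a
    # plain Python process with ipython installed still reports False.
    return get_ipython() is not None
```

Because the import failure is swallowed, `DataflowRunner` could call such a helper unconditionally: in environments without ipython the check is simply never true, matching the behavior described in the comment.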
[jira] [Work logged] (BEAM-8559) Run Dataflow Nexmark suites with Java 11
[ https://issues.apache.org/jira/browse/BEAM-8559?focusedWorklogId=340123&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-340123 ]

ASF GitHub Bot logged work on BEAM-8559:
----------------------------------------
Author: ASF GitHub Bot
Created on: 07/Nov/19 20:00
Start Date: 07/Nov/19 20:00
Worklog Time Spent: 10m

Work Description: lgajowy commented on issue #9994: [BEAM-8559] Run Nexmark Dataflow suites with Java 11
URL: https://github.com/apache/beam/pull/9994#issuecomment-551239737

Run seed job

This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the URL above to go to the specific comment.
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org

Issue Time Tracking
-------------------
Worklog Id: (was: 340123)
Remaining Estimate: 0h
Time Spent: 10m

> Run Dataflow Nexmark suites with Java 11
> ----------------------------------------
>
> Key: BEAM-8559
> URL: https://issues.apache.org/jira/browse/BEAM-8559
> Project: Beam
> Issue Type: Sub-task
> Components: testing-nexmark
> Reporter: Lukasz Gajowy
> Assignee: Lukasz Gajowy
> Priority: Minor
> Fix For: Not applicable
>
> Time Spent: 10m
> Remaining Estimate: 0h
>
> This task is similar to https://issues.apache.org/jira/browse/BEAM-6936.
> The goal is to run the Nexmark suites with Java 11 but compile with Java 8
> to verify compatibility.

--
This message was sent by Atlassian Jira
(v8.3.4#803005)
[jira] [Work logged] (BEAM-8457) Instrument Dataflow jobs that are launched from Notebooks
[ https://issues.apache.org/jira/browse/BEAM-8457?focusedWorklogId=340120&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-340120 ]

ASF GitHub Bot logged work on BEAM-8457:
----------------------------------------
Author: ASF GitHub Bot
Created on: 07/Nov/19 19:57
Start Date: 07/Nov/19 19:57
Worklog Time Spent: 10m

Work Description: KevinGG commented on pull request #9885: [BEAM-8457] Label Dataflow jobs from Notebook
URL: https://github.com/apache/beam/pull/9885#discussion_r343845181

## File path: sdks/python/apache_beam/pipeline.py

## @@ -396,28 +400,57 @@ def replace_all(self, replacements):
     for override in replacements:
       self._check_replacement(override)

-  def run(self, test_runner_api=True):
-    """Runs the pipeline. Returns whatever our runner returns after running."""
-
+  def run(self, test_runner_api=True, runner=None, options=None,

Review comment: Putting this discussion on next Monday's agenda; I will remove the changes to the API.

This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the URL above to go to the specific comment.
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org

Issue Time Tracking
-------------------
Worklog Id: (was: 340120)
Time Spent: 8h 20m (was: 8h 10m)

> Instrument Dataflow jobs that are launched from Notebooks
> ---------------------------------------------------------
>
> Key: BEAM-8457
> URL: https://issues.apache.org/jira/browse/BEAM-8457
> Project: Beam
> Issue Type: Improvement
> Components: runner-py-interactive
> Reporter: Ning Kang
> Assignee: Ning Kang
> Priority: Major
> Fix For: 2.17.0
>
> Time Spent: 8h 20m
> Remaining Estimate: 0h
>
> Dataflow needs the capability to tell how many Dataflow jobs are launched
> from the Notebook environment, i.e., the Interactive Runner.
> # Change the pipeline.run() API to allow supplying a runner and an options
> parameter, so that a pipeline initially bundled with an interactive runner
> can be directly run by other runners from a notebook.
> # Implicitly add the necessary source information through user labels when
> the user does p.run(runner=DataflowRunner()).

--
This message was sent by Atlassian Jira
(v8.3.4#803005)
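Item 2 of the description, attaching source information through user labels, could look roughly like the sketch below. The helper name and dict shape are assumptions for illustration; only the `goog-dataflow-notebook` label key is taken from the diff quoted in the worklogs above:

```python
def apply_notebook_label(job_labels, in_notebook):
    """Sketch of step 2: tag a Dataflow job launched from a notebook.

    Hypothetical helper: `job_labels` is assumed to be a plain dict of
    user labels attached to the job. When the pipeline was launched
    from a notebook, add the tracking label so such jobs can be
    counted later; otherwise leave the labels untouched.
    """
    if in_notebook:
        # Copy before mutating so the caller's dict is left intact.
        job_labels = dict(job_labels)
        job_labels["goog-dataflow-notebook"] = "true"
    return job_labels
```

Keeping the label purely additive means non-notebook jobs are unaffected, which matches the intent of implicitly instrumenting only notebook-launched runs.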