[jira] [Updated] (BEAM-8980) Running GroupByKeyLoadTest on Portable Flink fails
[ https://issues.apache.org/jira/browse/BEAM-8980?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Michal Walenia updated BEAM-8980:

Description:
When running a GBK Load test using Java harness image and JobServer image generated from master, the load test fails with a cryptic exception:
{code:java}
Exception in thread "main" java.lang.RuntimeException: Invalid job state: FAILED.
11:45:31 at org.apache.beam.sdk.loadtests.JobFailure.handleFailure(JobFailure.java:55)
11:45:31 at org.apache.beam.sdk.loadtests.LoadTest.run(LoadTest.java:106)
11:45:31 at org.apache.beam.sdk.loadtests.CombineLoadTest.run(CombineLoadTest.java:66)
11:45:31 at org.apache.beam.sdk.loadtests.CombineLoadTest.main(CombineLoadTest.java:169)
{code}
After some investigation, I found a stacktrace of the error:
{code:java}
org.apache.beam.vendor.grpc.v1p21p0.io.grpc.StatusRuntimeException: CANCELLED: call already cancelled
    at org.apache.beam.vendor.grpc.v1p21p0.io.grpc.Status.asRuntimeException(Status.java:524)
    at org.apache.beam.vendor.grpc.v1p21p0.io.grpc.stub.ServerCalls$ServerCallStreamObserverImpl.onNext(ServerCalls.java:339)
    at org.apache.beam.sdk.fn.stream.DirectStreamObserver.onNext(DirectStreamObserver.java:98)
    at org.apache.beam.sdk.fn.data.BeamFnDataSizeBasedBufferingOutboundObserver.flush(BeamFnDataSizeBasedBufferingOutboundObserver.java:90)
    at org.apache.beam.sdk.fn.data.BeamFnDataSizeBasedBufferingOutboundObserver.accept(BeamFnDataSizeBasedBufferingOutboundObserver.java:102)
    at org.apache.beam.runners.flink.translation.functions.FlinkExecutableStageFunction.processElements(FlinkExecutableStageFunction.java:278)
    at org.apache.beam.runners.flink.translation.functions.FlinkExecutableStageFunction.mapPartition(FlinkExecutableStageFunction.java:201)
    at org.apache.flink.runtime.operators.MapPartitionDriver.run(MapPartitionDriver.java:103)
    at org.apache.flink.runtime.operators.BatchTask.run(BatchTask.java:504)
    at org.apache.flink.runtime.operators.BatchTask.invoke(BatchTask.java:369)
    at org.apache.flink.runtime.taskmanager.Task.doRun(Task.java:705)
    at org.apache.flink.runtime.taskmanager.Task.run(Task.java:530)
    at java.lang.Thread.run(Thread.java:748)
    Suppressed: org.apache.beam.vendor.grpc.v1p21p0.io.grpc.StatusRuntimeException: CANCELLED: call already cancelled
        at org.apache.beam.vendor.grpc.v1p21p0.io.grpc.Status.asRuntimeException(Status.java:524)
        at org.apache.beam.vendor.grpc.v1p21p0.io.grpc.stub.ServerCalls$ServerCallStreamObserverImpl.onNext(ServerCalls.java:339)
        at org.apache.beam.sdk.fn.stream.DirectStreamObserver.onNext(DirectStreamObserver.java:98)
        at org.apache.beam.sdk.fn.data.BeamFnDataSizeBasedBufferingOutboundObserver.close(BeamFnDataSizeBasedBufferingOutboundObserver.java:84)
        at org.apache.beam.runners.fnexecution.control.SdkHarnessClient$ActiveBundle.close(SdkHarnessClient.java:298)
        at org.apache.beam.runners.flink.translation.functions.FlinkExecutableStageFunction.$closeResource(FlinkExecutableStageFunction.java:202)
        at org.apache.beam.runners.flink.translation.functions.FlinkExecutableStageFunction.mapPartition(FlinkExecutableStageFunction.java:202)
        ... 6 more
    Suppressed: java.lang.IllegalStateException: Processing bundle failed, TODO: [BEAM-3962] abort bundle.
        at org.apache.beam.runners.fnexecution.control.SdkHarnessClient$ActiveBundle.close(SdkHarnessClient.java:320)
        ... 8 more
{code}
It seems that the core issue is an IllegalStateException thrown from SdkHarnessClient.java:320, related to BEAM-3962.

It is important to note that the stacktrace above comes from the Flink cluster, not from the Gradle job that was executed.

The link to Jenkins job is here: [https://builds.apache.org/job/beam_LoadTests_Java_GBK_Flink_Batch_PR/28/console]

was:
When running a GBK Load test using Java harness image and JobServer image generated from master, the load test fails with a cryptic exception:
{code:java}
Exception in thread "main" java.lang.RuntimeException: Invalid job state: FAILED.
11:45:31 at org.apache.beam.sdk.loadtests.JobFailure.handleFailure(JobFailure.java:55)
11:45:31 at org.apache.beam.sdk.loadtests.LoadTest.run(LoadTest.java:106)
11:45:31 at org.apache.beam.sdk.loadtests.CombineLoadTest.run(CombineLoadTest.java:66)
11:45:31 at org.apache.beam.sdk.loadtests.CombineLoadTest.main(CombineLoadTest.java:169)
{code}
After some investigation, I found a stacktrace of the error:
{code:java}
org.apache.beam.vendor.grpc.v1p21p0.io.grpc.StatusRuntimeException: CANCELLED: call already cancelled
    at org.apache.beam.vendor.grpc.v1p21p0.io.grpc.Status.asRuntimeException(Status.java:524)
    at
[jira] [Comment Edited] (BEAM-8980) Running GroupByKeyLoadTest on Portable Flink fails
[ https://issues.apache.org/jira/browse/BEAM-8980?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16998839#comment-16998839 ]

Michal Walenia edited comment on BEAM-8980 at 12/18/19 6:52 AM:

[~angoenka] thanks for reminding me, I added the link, though there's not much more in the console - the execution took place on the Flink cluster which was wiped during the final phase of the job.
The second stacktrace is from a Flink cluster I used for investigation of the error

was (Author: mwalenia):
[~angoenka] thanks for reminding me, I added the link, though there's not much more in the console - the execution took place on the Flink cluster which was wiped during the final phase of the job.

> Running GroupByKeyLoadTest on Portable Flink fails
>
> Key: BEAM-8980
> URL: https://issues.apache.org/jira/browse/BEAM-8980
> Project: Beam
> Issue Type: Bug
> Components: testing
> Reporter: Michal Walenia
> Priority: Major
[jira] [Commented] (BEAM-8980) Running GroupByKeyLoadTest on Portable Flink fails
[ https://issues.apache.org/jira/browse/BEAM-8980?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16998839#comment-16998839 ]

Michal Walenia commented on BEAM-8980:

[~angoenka] thanks for reminding me, I added the link, though there's not much more in the console - the execution took place on the Flink cluster which was wiped during the final phase of the job.

> Running GroupByKeyLoadTest on Portable Flink fails
>
> Key: BEAM-8980
> URL: https://issues.apache.org/jira/browse/BEAM-8980
> Project: Beam
> Issue Type: Bug
> Components: testing
> Reporter: Michal Walenia
> Priority: Major

--
This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Updated] (BEAM-8980) Running GroupByKeyLoadTest on Portable Flink fails
[ https://issues.apache.org/jira/browse/BEAM-8980?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Michal Walenia updated BEAM-8980:

Description:
When running a GBK Load test using Java harness image and JobServer image generated from master, the load test fails with a cryptic exception:
{code:java}
Exception in thread "main" java.lang.RuntimeException: Invalid job state: FAILED.
11:45:31 at org.apache.beam.sdk.loadtests.JobFailure.handleFailure(JobFailure.java:55)
11:45:31 at org.apache.beam.sdk.loadtests.LoadTest.run(LoadTest.java:106)
11:45:31 at org.apache.beam.sdk.loadtests.CombineLoadTest.run(CombineLoadTest.java:66)
11:45:31 at org.apache.beam.sdk.loadtests.CombineLoadTest.main(CombineLoadTest.java:169)
{code}
After some investigation, I found a stacktrace of the error:
{code:java}
org.apache.beam.vendor.grpc.v1p21p0.io.grpc.StatusRuntimeException: CANCELLED: call already cancelled
    at org.apache.beam.vendor.grpc.v1p21p0.io.grpc.Status.asRuntimeException(Status.java:524)
    at org.apache.beam.vendor.grpc.v1p21p0.io.grpc.stub.ServerCalls$ServerCallStreamObserverImpl.onNext(ServerCalls.java:339)
    at org.apache.beam.sdk.fn.stream.DirectStreamObserver.onNext(DirectStreamObserver.java:98)
    at org.apache.beam.sdk.fn.data.BeamFnDataSizeBasedBufferingOutboundObserver.flush(BeamFnDataSizeBasedBufferingOutboundObserver.java:90)
    at org.apache.beam.sdk.fn.data.BeamFnDataSizeBasedBufferingOutboundObserver.accept(BeamFnDataSizeBasedBufferingOutboundObserver.java:102)
    at org.apache.beam.runners.flink.translation.functions.FlinkExecutableStageFunction.processElements(FlinkExecutableStageFunction.java:278)
    at org.apache.beam.runners.flink.translation.functions.FlinkExecutableStageFunction.mapPartition(FlinkExecutableStageFunction.java:201)
    at org.apache.flink.runtime.operators.MapPartitionDriver.run(MapPartitionDriver.java:103)
    at org.apache.flink.runtime.operators.BatchTask.run(BatchTask.java:504)
    at org.apache.flink.runtime.operators.BatchTask.invoke(BatchTask.java:369)
    at org.apache.flink.runtime.taskmanager.Task.doRun(Task.java:705)
    at org.apache.flink.runtime.taskmanager.Task.run(Task.java:530)
    at java.lang.Thread.run(Thread.java:748)
    Suppressed: org.apache.beam.vendor.grpc.v1p21p0.io.grpc.StatusRuntimeException: CANCELLED: call already cancelled
        at org.apache.beam.vendor.grpc.v1p21p0.io.grpc.Status.asRuntimeException(Status.java:524)
        at org.apache.beam.vendor.grpc.v1p21p0.io.grpc.stub.ServerCalls$ServerCallStreamObserverImpl.onNext(ServerCalls.java:339)
        at org.apache.beam.sdk.fn.stream.DirectStreamObserver.onNext(DirectStreamObserver.java:98)
        at org.apache.beam.sdk.fn.data.BeamFnDataSizeBasedBufferingOutboundObserver.close(BeamFnDataSizeBasedBufferingOutboundObserver.java:84)
        at org.apache.beam.runners.fnexecution.control.SdkHarnessClient$ActiveBundle.close(SdkHarnessClient.java:298)
        at org.apache.beam.runners.flink.translation.functions.FlinkExecutableStageFunction.$closeResource(FlinkExecutableStageFunction.java:202)
        at org.apache.beam.runners.flink.translation.functions.FlinkExecutableStageFunction.mapPartition(FlinkExecutableStageFunction.java:202)
        ... 6 more
    Suppressed: java.lang.IllegalStateException: Processing bundle failed, TODO: [BEAM-3962] abort bundle.
        at org.apache.beam.runners.fnexecution.control.SdkHarnessClient$ActiveBundle.close(SdkHarnessClient.java:320)
        ... 8 more
{code}
It seems that the core issue is an IllegalStateException thrown from SdkHarnessClient.java:320, related to BEAM-3962.

The link to Jenkins job is here: [https://builds.apache.org/job/beam_LoadTests_Java_GBK_Flink_Batch_PR/28/console]

was:
When running a GBK Load test using Java harness image and JobServer image generated from master, the load test fails with a cryptic exception:
{code:java}
Exception in thread "main" java.lang.RuntimeException: Invalid job state: FAILED.
11:45:31 at org.apache.beam.sdk.loadtests.JobFailure.handleFailure(JobFailure.java:55)
11:45:31 at org.apache.beam.sdk.loadtests.LoadTest.run(LoadTest.java:106)
11:45:31 at org.apache.beam.sdk.loadtests.CombineLoadTest.run(CombineLoadTest.java:66)
11:45:31 at org.apache.beam.sdk.loadtests.CombineLoadTest.main(CombineLoadTest.java:169)
{code}
After some investigation, I found a stacktrace of the error:
{code:java}
org.apache.beam.vendor.grpc.v1p21p0.io.grpc.StatusRuntimeException: CANCELLED: call already cancelled
    at org.apache.beam.vendor.grpc.v1p21p0.io.grpc.Status.asRuntimeException(Status.java:524)
    at org.apache.beam.vendor.grpc.v1p21p0.io.grpc.stub.ServerCalls$ServerCallStreamObserverImpl.onNext(ServerCalls.java:339)
    at
[jira] [Work logged] (BEAM-8575) Add more Python validates runner tests
[ https://issues.apache.org/jira/browse/BEAM-8575?focusedWorklogId=361291=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-361291 ]

ASF GitHub Bot logged work on BEAM-8575:

Author: ASF GitHub Bot
Created on: 18/Dec/19 04:42
Start Date: 18/Dec/19 04:42
Worklog Time Spent: 10m
Work Description: bumblebee-coming commented on issue #10159: [BEAM-8575] Added a unit test to CombineTest class to test that Combi…
URL: https://github.com/apache/beam/pull/10159#issuecomment-566864658

Run Python PreCommit

This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the URL above to go to the specific comment.
For queries about this service, please contact Infrastructure at: us...@infra.apache.org

Issue Time Tracking
---
Worklog Id: (was: 361291)
Time Spent: 38h 10m (was: 38h)

> Add more Python validates runner tests
>
> Key: BEAM-8575
> URL: https://issues.apache.org/jira/browse/BEAM-8575
> Project: Beam
> Issue Type: Test
> Components: sdk-py-core, testing
> Reporter: wendy liu
> Assignee: wendy liu
> Priority: Major
> Time Spent: 38h 10m
> Remaining Estimate: 0h
>
> This is the umbrella issue to track the work of adding more Python tests to
> improve test coverage.
[jira] [Work logged] (BEAM-8575) Add more Python validates runner tests
[ https://issues.apache.org/jira/browse/BEAM-8575?focusedWorklogId=361289=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-361289 ]

ASF GitHub Bot logged work on BEAM-8575:

Author: ASF GitHub Bot
Created on: 18/Dec/19 04:38
Start Date: 18/Dec/19 04:38
Worklog Time Spent: 10m
Work Description: bumblebee-coming commented on pull request #10190: [BEAM-8575] Added two unit tests to CombineTest class to test that Co…
URL: https://github.com/apache/beam/pull/10190#discussion_r357487258

##
File path: sdks/python/apache_beam/transforms/combiners_test.py
##
@@ -399,6 +432,108 @@ def test_global_fanout(self):
           | beam.CombineGlobally(combine.MeanCombineFn()).with_fanout(11))
       assert_that(result, equal_to([49.5]))
+  # Test that three different kinds of metrics work with a customized
+  # CounterIncrememtingCombineFn.
+  def test_simple_combine(self):
+    p = TestPipeline()
+    input = (p
+             | beam.Create([('c', 'b'),
+                            ('c', 'be'),
+                            ('c', 'bea'),
+                            ('d', 'beam'),
+                            ('d', 'apache')]))
+
+    # The result of concatenating all values regardless of key.
+    global_concat = (input
+                     | beam.Values()
+                     | beam.CombineGlobally(CounterIncrememtingCombineFn()))
+
+    # The (key, concatenated_string) pairs for all keys.
+    concat_per_key = (input | beam.CombinePerKey(
+        CounterIncrememtingCombineFn()))
+
+    result = p.run()
+    result.wait_until_finish()
+
+    # Verify the concatenated strings are correct.
+    expected_concat_per_key = [('c', 'bbebea'), ('d', 'beamapache')]
+    assert_that(global_concat, equal_to(['bbebeabeamapache']),
+                label='global concat')
+    assert_that(concat_per_key, equal_to(expected_concat_per_key),
+                label='concat per key')
+
+    # Verify the values of metrics are correct.
+    word_counter_filter = MetricsFilter().with_name('word_counter')
+    query_result = result.metrics().query(word_counter_filter)
+    if query_result['counters']:
+      word_counter = query_result['counters'][0]
+      self.assertEqual(word_counter.result, 5)
+
+    word_lengths_filter = MetricsFilter().with_name('word_lengths')
+    query_result = result.metrics().query(word_lengths_filter)
+    if query_result['counters']:
+      word_lengths = query_result['counters'][0]
+      self.assertEqual(word_lengths.result, 16)
+
+    word_len_dist_filter = MetricsFilter().with_name('word_len_dist')
+    query_result = result.metrics().query(word_len_dist_filter)
+    if query_result['distributions']:
+      word_len_dist = query_result['distributions'][0]
+      self.assertEqual(word_len_dist.result.mean, 3.2)

Review comment:
Done. After using different words as input, the mean is now 3.
self.assertEqual(word_len_dist.result.mean, 3)

Issue Time Tracking
---
Worklog Id: (was: 361289)
Time Spent: 38h (was: 37h 50m)

> Add more Python validates runner tests
>
> Key: BEAM-8575
> URL: https://issues.apache.org/jira/browse/BEAM-8575
> Project: Beam
> Issue Type: Test
> Components: sdk-py-core, testing
> Reporter: wendy liu
> Assignee: wendy liu
> Priority: Major
> Time Spent: 38h
> Remaining Estimate: 0h
>
> This is the umbrella issue to track the work of adding more Python tests to
> improve test coverage.
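The three values asserted in the quoted diff (5, 16 and 3.2) can be checked by hand from the five input words. A minimal plain-Python sketch, with no Beam dependency; the variable names are illustrative, and it assumes each metric sees every input word exactly once:

```python
# Plain-Python check of the metric values asserted in the quoted test.
# Assumption: each metric observes the five input words exactly once.
words = ['b', 'be', 'bea', 'beam', 'apache']

word_count = len(words)                     # compared with 'word_counter'
total_length = sum(len(w) for w in words)   # compared with 'word_lengths'
mean_length = total_length / word_count     # compared with the 'word_len_dist' mean

print(word_count, total_length, mean_length)
```

The word lengths are 1, 2, 3, 4 and 6, which is where the total of 16 and the non-integer mean come from.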
[jira] [Work logged] (BEAM-8575) Add more Python validates runner tests
[ https://issues.apache.org/jira/browse/BEAM-8575?focusedWorklogId=361288=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-361288 ]

ASF GitHub Bot logged work on BEAM-8575:

Author: ASF GitHub Bot
Created on: 18/Dec/19 04:36
Start Date: 18/Dec/19 04:36
Worklog Time Spent: 10m
Work Description: bumblebee-coming commented on pull request #10190: [BEAM-8575] Added two unit tests to CombineTest class to test that Co…
URL: https://github.com/apache/beam/pull/10190#discussion_r357473023

##
File path: sdks/python/apache_beam/transforms/combiners_test.py
##
@@ -48,6 +50,37 @@
 from apache_beam.utils.timestamp import Timestamp
+class CounterIncrememtingCombineFn(beam.CombineFn):
+  """CombineFn for incrementing three different counters:
+  counter, distribution, gauge,
+  at the same time concatenating words."""
+
+  def __init__(self):
+    beam.CombineFn.__init__(self)
+    self.word_counter = Metrics.counter(self.__class__, 'word_counter')
+    self.word_lengths_counter = Metrics.counter(
+        self.__class__, 'word_lengths')
+    self.word_lengths_dist = Metrics.distribution(
+        self.__class__, 'word_len_dist')
+    self.last_word_len = Metrics.gauge(self.__class__, 'last_word_len')
+
+  def create_accumulator(self):
+    return ''
+
+  def add_input(self, acc, element):
+    self.word_counter.inc(1)
+    self.word_lengths_counter.inc(len(element))
+    self.word_lengths_dist.update(len(element))
+    self.last_word_len.set(len(element))
+    return acc + element

Review comment:
Done. Uses ''.join() to convert the list to a string.
return ''.join(sorted(acc + element))

Issue Time Tracking
---
Worklog Id: (was: 361288)
Time Spent: 37h 50m (was: 37h 40m)

> Add more Python validates runner tests
>
> Key: BEAM-8575
> URL: https://issues.apache.org/jira/browse/BEAM-8575
> Project: Beam
> Issue Type: Test
> Components: sdk-py-core, testing
> Reporter: wendy liu
> Assignee: wendy liu
> Priority: Major
> Time Spent: 37h 50m
> Remaining Estimate: 0h
>
> This is the umbrella issue to track the work of adding more Python tests to
> improve test coverage.
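The one-line change in the review comment is easier to follow in isolation. A minimal sketch of what `''.join(sorted(acc + element))` does, without Beam or the metrics calls; `fold` is a hypothetical helper added only for the demonstration, and the reading that the sort is meant to make the result independent of input order is an assumption, not something stated in the thread:

```python
from functools import reduce


def add_input(acc, element):
    """Fold one word into the accumulator, keeping its characters sorted."""
    # sorted() on a str returns a list of single characters;
    # ''.join() turns that list back into a str.
    return ''.join(sorted(acc + element))


def fold(words):
    """Hypothetical helper: apply add_input over words, starting from ''."""
    return reduce(add_input, words, '')


# The same words in two different arrival orders give the same accumulator.
print(fold(['b', 'be', 'bea']))
print(fold(['bea', 'b', 'be']))
```

With plain `acc + element` the two orders would produce different strings, which matters when a runner does not guarantee the order in which inputs reach the CombineFn.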
[jira] [Work logged] (BEAM-8575) Add more Python validates runner tests
[ https://issues.apache.org/jira/browse/BEAM-8575?focusedWorklogId=361287=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-361287 ]

ASF GitHub Bot logged work on BEAM-8575:

Author: ASF GitHub Bot
Created on: 18/Dec/19 04:35
Start Date: 18/Dec/19 04:35
Worklog Time Spent: 10m
Work Description: bumblebee-coming commented on pull request #10190: [BEAM-8575] Added two unit tests to CombineTest class to test that Co…
URL: https://github.com/apache/beam/pull/10190#discussion_r357473023

##
File path: sdks/python/apache_beam/transforms/combiners_test.py
##
@@ -48,6 +50,37 @@
 from apache_beam.utils.timestamp import Timestamp
+class CounterIncrememtingCombineFn(beam.CombineFn):
+  """CombineFn for incrementing three different counters:
+  counter, distribution, gauge,
+  at the same time concatenating words."""
+
+  def __init__(self):
+    beam.CombineFn.__init__(self)
+    self.word_counter = Metrics.counter(self.__class__, 'word_counter')
+    self.word_lengths_counter = Metrics.counter(
+        self.__class__, 'word_lengths')
+    self.word_lengths_dist = Metrics.distribution(
+        self.__class__, 'word_len_dist')
+    self.last_word_len = Metrics.gauge(self.__class__, 'last_word_len')
+
+  def create_accumulator(self):
+    return ''
+
+  def add_input(self, acc, element):
+    self.word_counter.inc(1)
+    self.word_lengths_counter.inc(len(element))
+    self.word_lengths_dist.update(len(element))
+    self.last_word_len.set(len(element))
+    return acc + element

Review comment:
Done.
# ''.join() converts the list to a string.
return ''.join(sorted(acc + element))

Issue Time Tracking
---
Worklog Id: (was: 361287)
Time Spent: 37h 40m (was: 37.5h)

> Add more Python validates runner tests
>
> Key: BEAM-8575
> URL: https://issues.apache.org/jira/browse/BEAM-8575
> Project: Beam
> Issue Type: Test
> Components: sdk-py-core, testing
> Reporter: wendy liu
> Assignee: wendy liu
> Priority: Major
> Time Spent: 37h 40m
> Remaining Estimate: 0h
>
> This is the umbrella issue to track the work of adding more Python tests to
> improve test coverage.
[jira] [Work logged] (BEAM-8269) IOTypehints.from_callable doesn't convert native type hints to Beam
[ https://issues.apache.org/jira/browse/BEAM-8269?focusedWorklogId=361265=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-361265 ]

ASF GitHub Bot logged work on BEAM-8269:

Author: ASF GitHub Bot
Created on: 18/Dec/19 02:16
Start Date: 18/Dec/19 02:16
Worklog Time Spent: 10m
Work Description: udim commented on pull request #9602: [BEAM-8269] Convert Py3 type hints to Beam types
URL: https://github.com/apache/beam/pull/9602#discussion_r359123181

##
File path: sdks/python/apache_beam/typehints/typehints.py
##
@@ -1190,9 +1190,14 @@ def get_yielded_type(type_hint):
   if is_consistent_with(type_hint, Iterator[Any]):
     return type_hint.yielded_type
   if is_consistent_with(type_hint, Tuple[Any, ...]):
-    return Union[type_hint.tuple_types]
+    if isinstance(type_hint, TupleConstraint):
+      return Union[type_hint.tuple_types]
+    else:  # TupleSequenceConstraint
+      return type_hint.inner_type
   if is_consistent_with(type_hint, Iterable[Any]):
     return type_hint.inner_type
+  if is_consistent_with(type_hint, Dict[Any, Any]):

Review comment:
You're right, this branch has been removed.

Issue Time Tracking
---
Worklog Id: (was: 361265)
Time Spent: 2h 20m (was: 2h 10m)

> IOTypehints.from_callable doesn't convert native type hints to Beam
>
> Key: BEAM-8269
> URL: https://issues.apache.org/jira/browse/BEAM-8269
> Project: Beam
> Issue Type: Bug
> Components: sdk-py-core
> Reporter: Udi Meiri
> Assignee: Udi Meiri
> Priority: Major
> Time Spent: 2h 20m
> Remaining Estimate: 0h
>
> Users typically write type hints using typing module types. We should allow
> that, but internally convert these types to Beam module types for now.
> In the future, Beam should stop using these internal types (BEAM-8156).
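The distinction the diff draws between a fixed-shape tuple and a homogeneous `Tuple[X, ...]` can be illustrated with the standard `typing` module alone. This is a sketch of the same idea, not Beam's implementation: `yielded_type` is a hypothetical function, it works on `typing` hints rather than Beam's `TupleConstraint` / `TupleSequenceConstraint`, and it needs Python 3.8+ for `typing.get_origin` and `typing.get_args`:

```python
import collections.abc
import typing
from typing import Any, Iterable, Iterator, List, Tuple, Union


def yielded_type(hint):
    """Return the element type produced by iterating a value of type `hint`.

    Hypothetical stdlib-only analogue of the logic in the quoted diff.
    """
    origin = typing.get_origin(hint)
    args = typing.get_args(hint)
    if not args:
        # Unsubscripted hints carry no element information.
        return Any
    if origin is tuple:
        if len(args) == 2 and args[1] is Ellipsis:
            # Tuple[X, ...]: every element has type X.
            return args[0]
        # Tuple[A, B, ...]: an element may be any of the listed types.
        return Union[args]
    if origin in (list, collections.abc.Iterable, collections.abc.Iterator):
        return args[0]
    raise ValueError('%r is not iterable' % (hint,))


print(yielded_type(Tuple[int, str]))   # fixed-shape tuple
print(yielded_type(Tuple[int, ...]))   # homogeneous tuple
print(yielded_type(Iterable[str]))
```

The fixed-shape case returns a union of the member types, while the homogeneous case returns the single inner type, which mirrors the two branches added in the diff.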
[jira] [Work logged] (BEAM-8933) BigQuery IO should support read/write in Arrow format
[ https://issues.apache.org/jira/browse/BEAM-8933?focusedWorklogId=361264=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-361264 ]

ASF GitHub Bot logged work on BEAM-8933:

Author: ASF GitHub Bot
Created on: 18/Dec/19 02:06
Start Date: 18/Dec/19 02:06
Worklog Time Spent: 10m
Work Description: TheNeuralBit commented on issue #10384: [WIP] [BEAM-8933] Utilities for converting Arrow schemas and reading Arrow batches as Rows
URL: https://github.com/apache/beam/pull/10384#issuecomment-566833607

Pushed a commit that moved the arrow code into `:sdks:java:extension:arrow`, thanks for the suggestion @iemejia. I want to do a little more clean up tomorrow before sending this out for review but it should be enough for @11moon11 to work from for the gcp io changes.

Issue Time Tracking
---
Worklog Id: (was: 361264)
Time Spent: 3h 40m (was: 3.5h)

> BigQuery IO should support read/write in Arrow format
>
> Key: BEAM-8933
> URL: https://issues.apache.org/jira/browse/BEAM-8933
> Project: Beam
> Issue Type: Improvement
> Components: io-java-gcp
> Reporter: Kirill Kozlov
> Assignee: Kirill Kozlov
> Priority: Major
> Time Spent: 3h 40m
> Remaining Estimate: 0h
>
> As of right now BigQuery uses Avro format for reading and writing.
> We should add a config to BigQueryIO to specify which format to use: Arrow or
> Avro (with Avro as default).
[jira] [Work logged] (BEAM-8269) IOTypehints.from_callable doesn't convert native type hints to Beam
[ https://issues.apache.org/jira/browse/BEAM-8269?focusedWorklogId=361263=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-361263 ] ASF GitHub Bot logged work on BEAM-8269: Author: ASF GitHub Bot Created on: 18/Dec/19 02:02 Start Date: 18/Dec/19 02:02 Worklog Time Spent: 10m Work Description: udim commented on pull request #9602: [BEAM-8269] Convert Py3 type hints to Beam types URL: https://github.com/apache/beam/pull/9602#discussion_r359120339 ## File path: sdks/python/apache_beam/typehints/decorators_test_py3.py ## @@ -33,18 +34,20 @@ decorators._enable_from_callable = True T = TypeVariable('T') +# Name is 'T' so it converts to a beam type with the same name. +T_typing = typing.TypeVar('T') class IOTypeHintsTest(unittest.TestCase): def test_from_callable(self): def fn(a: int, b: str = None, *args: Tuple[T], foo: List[int], - **kwargs: Dict[str, str]) -> Tuple: + **kwargs: Dict[str, str]) -> Tuple[Any, ...]: Review comment: Yes if these were typing module types, but in this case this is `typehints.Tuple[typehints.Any, ...]`, which requires subscripting to turn it into a constraint. I will remove such usage in the future and make the `typehints.` prefix explicit. FYI, the conversion from typing.Tuple to a typehints.Tuple constraint is taken care of here: https://github.com/apache/beam/pull/10042/files#diff-9b2eac354738047a44814d4b429cR245 This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. 
For queries about this service, please contact Infrastructure at: us...@infra.apache.org Issue Time Tracking --- Worklog Id: (was: 361263) Time Spent: 2h 10m (was: 2h) > IOTypehints.from_callable doesn't convert native type hints to Beam > --- > > Key: BEAM-8269 > URL: https://issues.apache.org/jira/browse/BEAM-8269 > Project: Beam > Issue Type: Bug > Components: sdk-py-core >Reporter: Udi Meiri >Assignee: Udi Meiri >Priority: Major > Time Spent: 2h 10m > Remaining Estimate: 0h > > Users typically write type hints using typing module types. We should allow > that, but internally convert these types to Beam module types for now. > In the future, Beam should stop using these internal types (BEAM-8156). -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Work logged] (BEAM-8269) IOTypehints.from_callable doesn't convert native type hints to Beam
[ https://issues.apache.org/jira/browse/BEAM-8269?focusedWorklogId=361260=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-361260 ] ASF GitHub Bot logged work on BEAM-8269: Author: ASF GitHub Bot Created on: 18/Dec/19 01:55 Start Date: 18/Dec/19 01:55 Worklog Time Spent: 10m Work Description: udim commented on issue #9602: [BEAM-8269] Convert Py3 type hints to Beam types URL: https://github.com/apache/beam/pull/9602#issuecomment-566831210 Run Python PreCommit This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org Issue Time Tracking --- Worklog Id: (was: 361260) Time Spent: 2h (was: 1h 50m) > IOTypehints.from_callable doesn't convert native type hints to Beam > --- > > Key: BEAM-8269 > URL: https://issues.apache.org/jira/browse/BEAM-8269 > Project: Beam > Issue Type: Bug > Components: sdk-py-core >Reporter: Udi Meiri >Assignee: Udi Meiri >Priority: Major > Time Spent: 2h > Remaining Estimate: 0h > > Users typically write type hints using typing module types. We should allow > that, but internally convert these types to Beam module types for now. > In the future, Beam should stop using these internal types (BEAM-8156). -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Work logged] (BEAM-8481) Python 3.7 Postcommit test -- frequent timeouts
[ https://issues.apache.org/jira/browse/BEAM-8481?focusedWorklogId=361259=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-361259 ] ASF GitHub Bot logged work on BEAM-8481: Author: ASF GitHub Bot Created on: 18/Dec/19 01:53 Start Date: 18/Dec/19 01:53 Worklog Time Spent: 10m Work Description: tvalentyn commented on issue #10378: [BEAM-8481] Fix a race condition in proto stubs generation. URL: https://github.com/apache/beam/pull/10378#issuecomment-566830870 New errors: https://issues.apache.org/jira/browse/BEAM-8988. Overall, postcommits are running and no longer timing out. I suggest we merge this PR since it will help reduce flakiness, which in turn will help verify fixes for BEAM-8988. @udim could you please review and/or merge if that sounds good to you? This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org Issue Time Tracking --- Worklog Id: (was: 361259) Time Spent: 4h 10m (was: 4h) > Python 3.7 Postcommit test -- frequent timeouts > --- > > Key: BEAM-8481 > URL: https://issues.apache.org/jira/browse/BEAM-8481 > Project: Beam > Issue Type: Bug > Components: test-failures >Reporter: Ahmet Altay >Assignee: Valentyn Tymofieiev >Priority: Critical > Time Spent: 4h 10m > Remaining Estimate: 0h > > [https://builds.apache.org/job/beam_PostCommit_Python37/] – this suite > seems to time out frequently. Other suites are not affected by these > timeouts. From the history, the issues started before Oct 10, but we cannot > pinpoint the date because the history is lost. -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Work logged] (BEAM-1440) Create a BigQuery source (that implements iobase.BoundedSource) for Python SDK
[ https://issues.apache.org/jira/browse/BEAM-1440?focusedWorklogId=361256=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-361256 ] ASF GitHub Bot logged work on BEAM-1440: Author: ASF GitHub Bot Created on: 18/Dec/19 01:51 Start Date: 18/Dec/19 01:51 Worklog Time Spent: 10m Work Description: tvalentyn commented on issue #9772: [BEAM-1440] Create a BigQuery source that implements iobase.BoundedSource for Python URL: https://github.com/apache/beam/pull/9772#issuecomment-566830300 Postcommit tests are failing with this change: https://issues.apache.org/jira/browse/BEAM-8988. This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org Issue Time Tracking --- Worklog Id: (was: 361256) Time Spent: 19.5h (was: 19h 20m) > Create a BigQuery source (that implements iobase.BoundedSource) for Python SDK > -- > > Key: BEAM-1440 > URL: https://issues.apache.org/jira/browse/BEAM-1440 > Project: Beam > Issue Type: New Feature > Components: sdk-py-core >Reporter: Chamikara Madhusanka Jayalath >Assignee: Kamil Wasilewski >Priority: Major > Time Spent: 19.5h > Remaining Estimate: 0h > > Currently we have a BigQuery native source for Python SDK [1]. > This can only be used by Dataflow runner. > We should implement a Beam BigQuery source that implements > iobase.BoundedSource [2] interface so that other runners that try to use > Python SDK can read from BigQuery as well. Java SDK already has a Beam > BigQuery source [3]. 
> [1] > https://github.com/apache/beam/blob/master/sdks/python/apache_beam/io/gcp/bigquery.py > [2] > https://github.com/apache/beam/blob/master/sdks/python/apache_beam/io/iobase.py#L70 > [3] > https://github.com/apache/beam/blob/master/sdks/java/io/google-cloud-platform/src/main/java/org/apache/beam/sdk/io/gcp/bigquery/BigQueryIO.java#L1189 -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Work logged] (BEAM-8335) Add streaming support to Interactive Beam
[ https://issues.apache.org/jira/browse/BEAM-8335?focusedWorklogId=361258=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-361258 ] ASF GitHub Bot logged work on BEAM-8335: Author: ASF GitHub Bot Created on: 18/Dec/19 01:51 Start Date: 18/Dec/19 01:51 Worklog Time Spent: 10m Work Description: davidyan74 commented on pull request #10405: [BEAM-8335] Background caching job URL: https://github.com/apache/beam/pull/10405#discussion_r359116365 ## File path: sdks/python/apache_beam/runners/interactive/background_caching_job.py ## @@ -0,0 +1,101 @@ +# +# Licensed to the Apache Software Foundation (ASF) under one or more +# contributor license agreements. See the NOTICE file distributed with +# this work for additional information regarding copyright ownership. +# The ASF licenses this file to You under the Apache License, Version 2.0 +# (the "License"); you may not use this file except in compliance with +# the License. You may obtain a copy of the License at +# +#http://www.apache.org/licenses/LICENSE-2.0 +# +# Unless required by applicable law or agreed to in writing, software +# distributed under the License is distributed on an "AS IS" BASIS, +# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. +# See the License for the specific language governing permissions and +# limitations under the License. +# + +"""Module to build and run background caching job. + +For internal use only; no backwards-compatibility guarantees. + +A background caching job is a job that caches events for all unbounded sources +of a given pipeline. With Interactive Beam, one such job is started when a +pipeline run happens (which produces a main job in contrast to the background +caching job) and meets the following conditions: + + #. The pipeline contains unbounded sources. + #. No such background job is running. + #. 
No such background job has completed successfully and the cached events are + still valid (invalidated when unbounded sources change in the pipeline). + +Once started, the background caching job runs asynchronously until it hits some +cache size limit. Meanwhile, the main job and future main jobs from the pipeline +will run using the deterministic replay-able cached events until they are +invalidated. +""" + +from __future__ import absolute_import + +import apache_beam as beam +from apache_beam import runners +from apache_beam.runners.interactive import interactive_environment as ie + + +def attempt_to_run_background_caching_job(runner, user_pipeline, options=None): + """Attempts to run a background caching job for a user-defined pipeline. + + If a background caching job is started, return the pipeline result. Otherwise, + return None. + The pipeline result is automatically tracked by Interactive Beam in case + future cancellation/cleanup is needed. + """ + if is_background_caching_job_needed(user_pipeline): +# Cancel non-terminal jobs if there is any before starting a new one. +attempt_to_cancel_background_caching_job(user_pipeline) +# TODO(BEAM-8335): refactor background caching job logic from +# pipeline_instrument module to this module and aggregate tests. 
+from apache_beam.runners.interactive import pipeline_instrument as instr +runner_pipeline = beam.pipeline.Pipeline.from_runner_api( +user_pipeline.to_runner_api(use_fake_coders=True), +runner, +options) +background_caching_job_result = beam.pipeline.Pipeline.from_runner_api( +instr.pin(runner_pipeline).background_caching_pipeline_proto(), +runner, +options).run() +ie.current_env().set_pipeline_result(user_pipeline, + background_caching_job_result, + is_main_job=False) + +def is_background_caching_job_needed(user_pipeline): + """Determines if a background caching job needs to be started.""" + background_caching_job_result = ie.current_env().pipeline_result( + user_pipeline, is_main_job=False) + from apache_beam.runners.interactive import pipeline_instrument as instr + # Checks if the pipeline contains unbounded sources. + return (instr.has_unbounded_sources(user_pipeline) and Review comment: This may not be in the scope of this PR, but we also need to let the user specify their own sources to be cached, because the user may have a large bounded source that they want to capture a small segment of. This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org Issue Time Tracking --- Worklog Id: (was: 361258) Time
[jira] [Work logged] (BEAM-8335) Add streaming support to Interactive Beam
[ https://issues.apache.org/jira/browse/BEAM-8335?focusedWorklogId=361257=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-361257 ] ASF GitHub Bot logged work on BEAM-8335: Author: ASF GitHub Bot Created on: 18/Dec/19 01:51 Start Date: 18/Dec/19 01:51 Worklog Time Spent: 10m Work Description: davidyan74 commented on pull request #10405: [BEAM-8335] Background caching job URL: https://github.com/apache/beam/pull/10405#discussion_r359117001 ## File path: sdks/python/apache_beam/runners/interactive/background_caching_job.py ## @@ -0,0 +1,101 @@ +# +# Licensed to the Apache Software Foundation (ASF) under one or more +# contributor license agreements. See the NOTICE file distributed with +# this work for additional information regarding copyright ownership. +# The ASF licenses this file to You under the Apache License, Version 2.0 +# (the "License"); you may not use this file except in compliance with +# the License. You may obtain a copy of the License at +# +#http://www.apache.org/licenses/LICENSE-2.0 +# +# Unless required by applicable law or agreed to in writing, software +# distributed under the License is distributed on an "AS IS" BASIS, +# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. +# See the License for the specific language governing permissions and +# limitations under the License. +# + +"""Module to build and run background caching job. + +For internal use only; no backwards-compatibility guarantees. + +A background caching job is a job that caches events for all unbounded sources +of a given pipeline. With Interactive Beam, one such job is started when a +pipeline run happens (which produces a main job in contrast to the background +caching job) and meets the following conditions: + + #. The pipeline contains unbounded sources. + #. No such background job is running. + #. 
No such background job has completed successfully and the cached events are + still valid (invalidated when unbounded sources change in the pipeline). + +Once started, the background caching job runs asynchronously until it hits some +cache size limit. Meanwhile, the main job and future main jobs from the pipeline +will run using the deterministic replay-able cached events until they are +invalidated. +""" + +from __future__ import absolute_import + +import apache_beam as beam +from apache_beam import runners +from apache_beam.runners.interactive import interactive_environment as ie + + +def attempt_to_run_background_caching_job(runner, user_pipeline, options=None): + """Attempts to run a background caching job for a user-defined pipeline. + + If a background caching job is started, return the pipeline result. Otherwise, + return None. Review comment: From reading the code, this function doesn't return anything. This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org Issue Time Tracking --- Worklog Id: (was: 361257) Time Spent: 49h 10m (was: 49h) > Add streaming support to Interactive Beam > - > > Key: BEAM-8335 > URL: https://issues.apache.org/jira/browse/BEAM-8335 > Project: Beam > Issue Type: Improvement > Components: runner-py-interactive >Reporter: Sam Rohde >Assignee: Sam Rohde >Priority: Major > Time Spent: 49h 10m > Remaining Estimate: 0h > > This issue tracks the work items to introduce streaming support to the > Interactive Beam experience. 
This will allow users to: > * Write and run a streaming job in IPython > * Automatically cache records from unbounded sources > * Add a replay experience that replays all cached records to simulate the > original pipeline execution > * Add controls to play/pause/stop/step individual elements from the cached > records > * Add ability to inspect/visualize unbounded PCollections -- This message was sent by Atlassian Jira (v8.3.4#803005)
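The three start conditions listed in the module docstring, together with the reviewer's point that the function should actually return the result it documents, can be sketched as follows. This is a simplified illustration and not the code under review: `JobResult`, `is_caching_job_needed`, `attempt_to_run_caching_job`, and the `start_job` callback are hypothetical names, and the state strings are assumptions.

```python
from dataclasses import dataclass
from typing import Callable, Optional


@dataclass
class JobResult:
  """Hypothetical stand-in for a pipeline result; not a Beam class."""
  state: str  # Assumed values: 'RUNNING', 'DONE', 'FAILED'.


def is_caching_job_needed(has_unbounded_sources: bool,
                          previous: Optional[JobResult],
                          cache_valid: bool) -> bool:
  """Mirrors the three conditions listed in the module docstring."""
  if not has_unbounded_sources:
    return False
  if previous is None:
    return True
  if previous.state == 'RUNNING':
    return False
  # A finished job is reusable only if it succeeded and its cache is valid.
  return not (previous.state == 'DONE' and cache_valid)


def attempt_to_run_caching_job(
    has_unbounded_sources: bool,
    previous: Optional[JobResult],
    cache_valid: bool,
    start_job: Callable[[], JobResult]) -> Optional[JobResult]:
  """Returns the new job's result if one was started, otherwise None."""
  if is_caching_job_needed(has_unbounded_sources, previous, cache_valid):
    return start_job()
  return None
```

Returning the result explicitly keeps the behaviour in line with the docstring, so callers can tell "started" from "not needed" without inspecting shared state.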
[jira] [Work logged] (BEAM-8902) parameterize input type of Java external transform
[ https://issues.apache.org/jira/browse/BEAM-8902?focusedWorklogId=361255=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-361255 ] ASF GitHub Bot logged work on BEAM-8902: Author: ASF GitHub Bot Created on: 18/Dec/19 01:51 Start Date: 18/Dec/19 01:51 Worklog Time Spent: 10m Work Description: ihji commented on pull request #10305: [BEAM-8902] parameterize input type of Java external transform URL: https://github.com/apache/beam/pull/10305#discussion_r359117711 ## File path: runners/core-construction-java/src/test/java/org/apache/beam/runners/core/construction/ExternalTest.java ## @@ -139,7 +139,7 @@ public void expandPythonTest() { testPipeline .apply(Create.of("1", "2", "2", "3", "3", "3")) .apply( - External.>of( + External., KV>of( Review comment: Yeah, we can create a variant of `of()` for the common case. Do you have any good naming suggestion? :) This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org Issue Time Tracking --- Worklog Id: (was: 361255) Time Spent: 1h 40m (was: 1.5h) > parameterize input type of Java external transform > -- > > Key: BEAM-8902 > URL: https://issues.apache.org/jira/browse/BEAM-8902 > Project: Beam > Issue Type: Improvement > Components: java-fn-execution >Reporter: Heejong Lee >Assignee: Heejong Lee >Priority: Major > Time Spent: 1h 40m > Remaining Estimate: 0h > > parameterize input type of Java external transform to allow multiple > pcollections like PCollectionTuple or PCollectionList. -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Commented] (BEAM-8988) apache_beam.io.gcp.bigquery_read_it_test failing with: NotImplementedError: BigQuery source must be split before being read
[ https://issues.apache.org/jira/browse/BEAM-8988?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16998734#comment-16998734 ] Valentyn Tymofieiev commented on BEAM-8988: --- cc: [~pabloem] [~chamikara] > apache_beam.io.gcp.bigquery_read_it_test failing with: NotImplementedError: > BigQuery source must be split before being read > --- > > Key: BEAM-8988 > URL: https://issues.apache.org/jira/browse/BEAM-8988 > Project: Beam > Issue Type: Bug > Components: io-py-gcp >Reporter: Valentyn Tymofieiev >Assignee: Kamil Wasilewski >Priority: Critical > > Sample failure: https://builds.apache.org/job/beam_PostCommit_Python37_PR/58/ > Triggered by https://github.com/apache/beam/pull/9772. > Stacktrace: > {noformat} > Pipeline > BeamApp-jenkins-1217231928-2108ede4_7476773b-6b06-4536-a0d5-c5fafb6c0935 > failed in state FAILED: java.lang.RuntimeException: Error received from SDK > harness for instruction 96: Traceback (most recent call last): > File > "/home/jenkins/jenkins-slave/workspace/beam_PostCommit_Python37_PR/src/sdks/python/apache_beam/runners/common.py", > line 879, in process > return self.do_fn_invoker.invoke_process(windowed_value) > File > "/home/jenkins/jenkins-slave/workspace/beam_PostCommit_Python37_PR/src/sdks/python/apache_beam/runners/common.py", > line 669, in invoke_process > windowed_value, additional_args, additional_kwargs, output_processor) > File > "/home/jenkins/jenkins-slave/workspace/beam_PostCommit_Python37_PR/src/sdks/python/apache_beam/runners/common.py", > line 747, in _invoke_process_per_window > windowed_value, self.process_method(*args_for_process)) > File > "/home/jenkins/jenkins-slave/workspace/beam_PostCommit_Python37_PR/src/sdks/python/apache_beam/runners/common.py", > line 998, in process_outputs > for result in results: > File > "/home/jenkins/jenkins-slave/workspace/beam_PostCommit_Python37_PR/src/sdks/python/apache_beam/runners/worker/bundle_processor.py", > line 1256, in process > yield element, 
self.restriction_provider.initial_restriction(element) > File > "/home/jenkins/jenkins-slave/workspace/beam_PostCommit_Python37_PR/src/sdks/python/apache_beam/io/iobase.py", > line 1518, in initial_restriction > range_tracker = self._source.get_range_tracker(None, None) > File > "/home/jenkins/jenkins-slave/workspace/beam_PostCommit_Python37_PR/src/sdks/python/apache_beam/io/gcp/bigquery.py", > line 652, in get_range_tracker > raise NotImplementedError('BigQuery source must be split before being > read') > NotImplementedError: BigQuery source must be split before being read > {noformat} -- This message was sent by Atlassian Jira (v8.3.4#803005)
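The traceback above shows `initial_restriction` calling `get_range_tracker(None, None)` on a source that has not been split yet. The failure mode can be reproduced in miniature with plain Python. The class below is a hypothetical illustration, not Beam's BigQuery source or its `BoundedSource` interface, and the tuple it returns merely stands in for a range tracker.

```python
class SplitOnlySource:
  """Hypothetical source that can only be read after being split."""

  def __init__(self, shards=None):
    # None until split() has produced sub-sources.
    self._shards = shards

  def split(self, num_shards):
    # Each sub-source covers one shard and can hand out a range tracker.
    return [SplitOnlySource(shards=[i]) for i in range(num_shards)]

  def get_range_tracker(self, start, stop):
    if self._shards is None:
      raise NotImplementedError('source must be split before being read')
    # Stand-in for a real range tracker object.
    return (start, stop, self._shards)
```

A caller that asks the unsplit source for a range tracker hits the `NotImplementedError`, whereas the sub-sources returned by `split` answer normally.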
[jira] [Created] (BEAM-8988) apache_beam.io.gcp.bigquery_read_it_test failing with: NotImplementedError: BigQuery source must be split before being read
Valentyn Tymofieiev created BEAM-8988: - Summary: apache_beam.io.gcp.bigquery_read_it_test failing with: NotImplementedError: BigQuery source must be split before being read Key: BEAM-8988 URL: https://issues.apache.org/jira/browse/BEAM-8988 Project: Beam Issue Type: Bug Components: io-py-gcp Reporter: Valentyn Tymofieiev Assignee: Kamil Wasilewski Sample failure: https://builds.apache.org/job/beam_PostCommit_Python37_PR/58/ Triggered by https://github.com/apache/beam/pull/9772. Stacktrace: {noformat} Pipeline BeamApp-jenkins-1217231928-2108ede4_7476773b-6b06-4536-a0d5-c5fafb6c0935 failed in state FAILED: java.lang.RuntimeException: Error received from SDK harness for instruction 96: Traceback (most recent call last): File "/home/jenkins/jenkins-slave/workspace/beam_PostCommit_Python37_PR/src/sdks/python/apache_beam/runners/common.py", line 879, in process return self.do_fn_invoker.invoke_process(windowed_value) File "/home/jenkins/jenkins-slave/workspace/beam_PostCommit_Python37_PR/src/sdks/python/apache_beam/runners/common.py", line 669, in invoke_process windowed_value, additional_args, additional_kwargs, output_processor) File "/home/jenkins/jenkins-slave/workspace/beam_PostCommit_Python37_PR/src/sdks/python/apache_beam/runners/common.py", line 747, in _invoke_process_per_window windowed_value, self.process_method(*args_for_process)) File "/home/jenkins/jenkins-slave/workspace/beam_PostCommit_Python37_PR/src/sdks/python/apache_beam/runners/common.py", line 998, in process_outputs for result in results: File "/home/jenkins/jenkins-slave/workspace/beam_PostCommit_Python37_PR/src/sdks/python/apache_beam/runners/worker/bundle_processor.py", line 1256, in process yield element, self.restriction_provider.initial_restriction(element) File "/home/jenkins/jenkins-slave/workspace/beam_PostCommit_Python37_PR/src/sdks/python/apache_beam/io/iobase.py", line 1518, in initial_restriction range_tracker = self._source.get_range_tracker(None, None) File 
"/home/jenkins/jenkins-slave/workspace/beam_PostCommit_Python37_PR/src/sdks/python/apache_beam/io/gcp/bigquery.py", line 652, in get_range_tracker raise NotImplementedError('BigQuery source must be split before being read') NotImplementedError: BigQuery source must be split before being read {noformat} -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Work logged] (BEAM-7991) gradle cleanPython race
[ https://issues.apache.org/jira/browse/BEAM-7991?focusedWorklogId=361253=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-361253 ] ASF GitHub Bot logged work on BEAM-7991: Author: ASF GitHub Bot Created on: 18/Dec/19 01:42 Start Date: 18/Dec/19 01:42 Worklog Time Spent: 10m Work Description: udim commented on issue #10388: [BEAM-7991] Fix cleanPython race with :clean URL: https://github.com/apache/beam/pull/10388#issuecomment-566828423 Run Java PreCommit This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org Issue Time Tracking --- Worklog Id: (was: 361253) Time Spent: 0.5h (was: 20m) > gradle cleanPython race > --- > > Key: BEAM-7991 > URL: https://issues.apache.org/jira/browse/BEAM-7991 > Project: Beam > Issue Type: Bug > Components: build-system >Reporter: Udi Meiri >Assignee: Udi Meiri >Priority: Minor > Time Spent: 0.5h > Remaining Estimate: 0h > > Under sdks/python run: > {code} > $ ../../gradlew setupVirtualenv > $ ../../gradlew clean > {code} > And you should get with high probability errors about missing modules. > Running this gives no errors: > {code} > $ ../../gradlew setupVirtualenv > $ ../../gradlew clean --no-parallel > {code} > But notice that setup.py is not called in the second example, meaning that > some other task has already wiped out the build/ directory and the > virtualenvs in it. -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Work logged] (BEAM-8335) Add streaming support to Interactive Beam
[ https://issues.apache.org/jira/browse/BEAM-8335?focusedWorklogId=361251=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-361251 ] ASF GitHub Bot logged work on BEAM-8335: Author: ASF GitHub Bot Created on: 18/Dec/19 01:33 Start Date: 18/Dec/19 01:33 Worklog Time Spent: 10m Work Description: KevinGG commented on pull request #10405: [BEAM-8335] Background caching job URL: https://github.com/apache/beam/pull/10405#discussion_r359114098 ## File path: sdks/python/apache_beam/runners/interactive/interactive_runner.py ## @@ -125,6 +125,30 @@ def apply(self, transform, pvalueish, options): def run_pipeline(self, pipeline, options): pipeline_instrument = inst.pin(pipeline, options) +# The user_pipeline analyzed might be None if the pipeline given has nothing +# to be cached and tracing back to the user defined pipeline is impossible. +# When it's None, there is no need to cache including the background +# caching job and no result to track since no background caching job is +# started at all. +user_pipeline = pipeline_instrument.user_pipeline + +background_caching_job_result = ie.current_env().pipeline_result( +user_pipeline, is_main_job=False) +if (not background_caching_job_result or Review comment: Refactored this logic into a separate module that can be invoked here and in outer scopes such as `show`. This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. 
For queries about this service, please contact Infrastructure at: us...@infra.apache.org Issue Time Tracking --- Worklog Id: (was: 361251) Time Spent: 49h (was: 48h 50m) > Add streaming support to Interactive Beam > - > > Key: BEAM-8335 > URL: https://issues.apache.org/jira/browse/BEAM-8335 > Project: Beam > Issue Type: Improvement > Components: runner-py-interactive >Reporter: Sam Rohde >Assignee: Sam Rohde >Priority: Major > Time Spent: 49h > Remaining Estimate: 0h > > This issue tracks the work items to introduce streaming support to the > Interactive Beam experience. This will allow users to: > * Write and run a streaming job in IPython > * Automatically cache records from unbounded sources > * Add a replay experience that replays all cached records to simulate the > original pipeline execution > * Add controls to play/pause/stop/step individual elements from the cached > records > * Add ability to inspect/visualize unbounded PCollections -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Work logged] (BEAM-7961) Add tests for all runner native transforms and some widely used composite transforms to cross-language validates runner test suite
[ https://issues.apache.org/jira/browse/BEAM-7961?focusedWorklogId=361244=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-361244 ] ASF GitHub Bot logged work on BEAM-7961: Author: ASF GitHub Bot Created on: 18/Dec/19 01:20 Start Date: 18/Dec/19 01:20 Worklog Time Spent: 10m Work Description: ihji commented on issue #10051: [BEAM-7961] Add tests for all runner native transforms for XLang URL: https://github.com/apache/beam/pull/10051#issuecomment-566823427 Run XVR_Flink PostCommit This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org Issue Time Tracking --- Worklog Id: (was: 361244) Time Spent: 10h 20m (was: 10h 10m) > Add tests for all runner native transforms and some widely used composite > transforms to cross-language validates runner test suite > -- > > Key: BEAM-7961 > URL: https://issues.apache.org/jira/browse/BEAM-7961 > Project: Beam > Issue Type: Improvement > Components: testing >Reporter: Heejong Lee >Assignee: Heejong Lee >Priority: Major > Time Spent: 10h 20m > Remaining Estimate: 0h > > Add tests for all runner native transforms and some widely used composite > transforms to cross-language validates runner test suite -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Work logged] (BEAM-7961) Add tests for all runner native transforms and some widely used composite transforms to cross-language validates runner test suite
[ https://issues.apache.org/jira/browse/BEAM-7961?focusedWorklogId=361245=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-361245 ] ASF GitHub Bot logged work on BEAM-7961: Author: ASF GitHub Bot Created on: 18/Dec/19 01:20 Start Date: 18/Dec/19 01:20 Worklog Time Spent: 10m Work Description: ihji commented on issue #10051: [BEAM-7961] Add tests for all runner native transforms for XLang URL: https://github.com/apache/beam/pull/10051#issuecomment-566699073 Run XVR_Flink PostCommit This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org Issue Time Tracking --- Worklog Id: (was: 361245) Time Spent: 10.5h (was: 10h 20m) > Add tests for all runner native transforms and some widely used composite > transforms to cross-language validates runner test suite > -- > > Key: BEAM-7961 > URL: https://issues.apache.org/jira/browse/BEAM-7961 > Project: Beam > Issue Type: Improvement > Components: testing >Reporter: Heejong Lee >Assignee: Heejong Lee >Priority: Major > Time Spent: 10.5h > Remaining Estimate: 0h > > Add tests for all runner native transforms and some widely used composite > transforms to cross-language validates runner test suite -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Work logged] (BEAM-8269) IOTypehints.from_callable doesn't convert native type hints to Beam
[ https://issues.apache.org/jira/browse/BEAM-8269?focusedWorklogId=361238=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-361238 ] ASF GitHub Bot logged work on BEAM-8269: Author: ASF GitHub Bot Created on: 18/Dec/19 01:11 Start Date: 18/Dec/19 01:11 Worklog Time Spent: 10m Work Description: robertwb commented on issue #9602: [BEAM-8269] Convert Py3 type hints to Beam types URL: https://github.com/apache/beam/pull/9602#issuecomment-566821483 BTW, I'm not able to reproduce the error, in 2.7 or 3.6, with or without cython. This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org Issue Time Tracking --- Worklog Id: (was: 361238) Time Spent: 1h 50m (was: 1h 40m) > IOTypehints.from_callable doesn't convert native type hints to Beam > --- > > Key: BEAM-8269 > URL: https://issues.apache.org/jira/browse/BEAM-8269 > Project: Beam > Issue Type: Bug > Components: sdk-py-core >Reporter: Udi Meiri >Assignee: Udi Meiri >Priority: Major > Time Spent: 1h 50m > Remaining Estimate: 0h > > Users typically write type hints using typing module types. We should allow > that, but internally convert these types to Beam module types for now. > In the future, Beam should stop using these internal types (BEAM-8156). -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Work logged] (BEAM-8944) Python SDK harness performance degradation with UnboundedThreadPoolExecutor
[ https://issues.apache.org/jira/browse/BEAM-8944?focusedWorklogId=361236=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-361236 ] ASF GitHub Bot logged work on BEAM-8944: Author: ASF GitHub Bot Created on: 18/Dec/19 01:00 Start Date: 18/Dec/19 01:00 Worklog Time Spent: 10m Work Description: y1chi commented on issue #10387: [BEAM-8944] Change to use single thread in py sdk bundle progress report URL: https://github.com/apache/beam/pull/10387#issuecomment-566819037 > > > Would it be better to have the runner request progress less frequently? > > > > > > I think that helps too. I believe right now JRH requests every 0.1 sec. Not exactly sure how the frequency is picked. > > 0.1 secs is a lot and doesn't seem right. I think this is where it is set, we can try to tune that down. https://github.com/apache/beam/blob/master/runners/google-cloud-dataflow-java/worker/src/main/java/org/apache/beam/runners/dataflow/worker/fn/control/BeamFnMapTaskExecutor.java#L296 Single thread should be good enough for progress report in python sdk harness and shouldn't incur stuckness issues, and it could also help limiting its impact on the bundle processing critical path. This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. 
For queries about this service, please contact Infrastructure at: us...@infra.apache.org

Issue Time Tracking
---
Worklog Id: (was: 361236)
Time Spent: 2h 40m (was: 2.5h)

> Python SDK harness performance degradation with UnboundedThreadPoolExecutor
> ---
>
> Key: BEAM-8944
> URL: https://issues.apache.org/jira/browse/BEAM-8944
> Project: Beam
> Issue Type: Bug
> Components: sdk-py-harness
> Affects Versions: 2.18.0
> Reporter: Yichi Zhang
> Assignee: Yichi Zhang
> Priority: Blocker
> Attachments: profiling.png, profiling_one_thread.png, profiling_twelve_threads.png
> Time Spent: 2h 40m
> Remaining Estimate: 0h
>
> We are seeing a performance degradation for python streaming word count load tests.
>
> After some investigation, it appears to be caused by swapping the original ThreadPoolExecutor to UnboundedThreadPoolExecutor in sdk worker. Suspicion is that python performance is worse with more threads on cpu-bounded tasks.
>
> A simple test for comparing the multiple thread pool executor performance:
>
> {code:python}
> def test_performance(self):
>   def run_perf(executor):
>     total_number = 100
>     q = queue.Queue()
>
>     def task(number):
>       hash(number)
>       q.put(number + 200)
>       return number
>
>     t = time.time()
>     count = 0
>     for i in range(200):
>       q.put(i)
>
>     while count < total_number:
>       executor.submit(task, q.get(block=True))
>       count += 1
>     print('%s uses %s' % (executor, time.time() - t))
>
>   with UnboundedThreadPoolExecutor() as executor:
>     run_perf(executor)
>   with futures.ThreadPoolExecutor(max_workers=1) as executor:
>     run_perf(executor)
>   with futures.ThreadPoolExecutor(max_workers=12) as executor:
>     run_perf(executor)
> {code}
> Results:
> 0x7fab400dbe50> uses 268.160675049
> uses 79.904583931
> uses 191.179054976
> Profiling:
> UnboundedThreadPoolExecutor:
> !profiling.png!
> 1 Thread ThreadPoolExecutor:
> !profiling_one_thread.png!
> 12 Threads ThreadPoolExecutor:
> !profiling_twelve_threads.png!
-- This message was sent by Atlassian Jira (v8.3.4#803005)
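The test quoted in the BEAM-8944 description depends on Beam's `UnboundedThreadPoolExecutor` and a test class. As a rough, standalone sketch of the same measurement, the following uses only the standard library and compares `ThreadPoolExecutor` with 1 and 12 workers. The function names, the `compare` helper, and the default `total_number=2000` are assumptions chosen for this illustration; timings depend on the machine, so none are shown.

```python
import queue
import time
from concurrent import futures


def run_perf(executor, total_number=2000):
    """Time how long it takes to submit total_number tiny CPU-bound tasks.

    Mirrors the structure of the test in the ticket: each task does a
    trivial computation and feeds one more item back into the queue.
    """
    q = queue.Queue()

    def task(number):
        hash(number)
        q.put(number + 200)
        return number

    start = time.time()
    # Seed the queue so the submit loop always has work available.
    for i in range(200):
        q.put(i)
    submitted = 0
    while submitted < total_number:
        executor.submit(task, q.get(block=True))
        submitted += 1
    return time.time() - start


def compare(worker_counts=(1, 12), total_number=2000):
    """Return {max_workers: elapsed_seconds} for each pool size."""
    timings = {}
    for workers in worker_counts:
        with futures.ThreadPoolExecutor(max_workers=workers) as executor:
            timings[workers] = run_perf(executor, total_number)
    return timings


if __name__ == '__main__':
    for workers, seconds in compare().items():
        print('max_workers=%d uses %.4f s' % (workers, seconds))
```

To include the Beam executor in the comparison, one would pass an instance of it to `run_perf` in the same way; that is omitted here to keep the sketch free of non-standard dependencies.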
[jira] [Work logged] (BEAM-8561) Add ThriftIO to Support IO for Thrift Files
[ https://issues.apache.org/jira/browse/BEAM-8561?focusedWorklogId=361235=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-361235 ] ASF GitHub Bot logged work on BEAM-8561: Author: ASF GitHub Bot Created on: 18/Dec/19 00:50 Start Date: 18/Dec/19 00:50 Worklog Time Spent: 10m Work Description: chrlarsen commented on pull request #10290: [BEAM-8561] Add ThriftIO to support IO for Thrift files URL: https://github.com/apache/beam/pull/10290#discussion_r359103959 ## File path: sdks/java/io/thrift/src/main/java/org/apache/beam/sdk/io/thrift/parser/ThriftIdlParser.java ## @@ -0,0 +1,71 @@ +/* + * Licensed to the Apache Software Foundation (ASF) under one + * or more contributor license agreements. See the NOTICE file + * distributed with this work for additional information + * regarding copyright ownership. The ASF licenses this file + * to you under the Apache License, Version 2.0 (the + * "License"); you may not use this file except in compliance + * with the License. You may obtain a copy of the License at + * + * http://www.apache.org/licenses/LICENSE-2.0 + * + * Unless required by applicable law or agreed to in writing, software + * distributed under the License is distributed on an "AS IS" BASIS, + * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. + * See the License for the specific language governing permissions and + * limitations under the License. 
+ */ +package org.apache.beam.sdk.io.thrift.parser; + +import java.io.IOException; +import java.io.Reader; +import org.antlr.runtime.ANTLRReaderStream; +import org.antlr.runtime.CommonTokenStream; +import org.antlr.runtime.RecognitionException; +import org.antlr.runtime.tree.BufferedTreeNodeStream; +import org.antlr.runtime.tree.Tree; +import org.antlr.runtime.tree.TreeNodeStream; +import org.apache.beam.sdk.io.thrift.parser.antlr.DocumentGenerator; +import org.apache.beam.sdk.io.thrift.parser.antlr.ThriftLexer; +import org.apache.beam.sdk.io.thrift.parser.antlr.ThriftParser; +import org.apache.beam.sdk.io.thrift.parser.model.Document; +import org.apache.beam.vendor.guava.v26_0_jre.com.google.common.io.CharSource; +import org.slf4j.Logger; +import org.slf4j.LoggerFactory; + +public class ThriftIdlParser { + + private static final Logger LOG = LoggerFactory.getLogger(ThriftIdlParser.class); + + /** Generates {@link Document} from {@link org.antlr.runtime.tree.Tree}. */ + public static Document parseThriftIdl(CharSource input) throws IOException { +Tree tree = parseTree(input); +TreeNodeStream stream = new BufferedTreeNodeStream(tree); +DocumentGenerator generator = new DocumentGenerator(stream); +try { + return generator.document().value; +} catch (RecognitionException e) { + LOG.error("Failed to generate document: " + e.getMessage()); + throw new RuntimeException(e); +} + } + + /** Generates {@link org.antlr.runtime.tree.Tree} from input. */ + static Tree parseTree(CharSource input) throws IOException { +try (Reader reader = input.openStream()) { + ThriftLexer lexer = new ThriftLexer(new ANTLRReaderStream(reader)); + ThriftParser parser = new ThriftParser(new CommonTokenStream(lexer)); + try { +Tree tree = (Tree) parser.document().getTree(); +if (parser.getNumberOfSyntaxErrors() > 0) { + LOG.error("Parsing generated " + parser.getNumberOfSyntaxErrors() + "errors."); + throw new RuntimeException("syntax error"); Review comment: That is covered in the `catch` below. 
This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org Issue Time Tracking --- Worklog Id: (was: 361235) Time Spent: 8h 10m (was: 8h) > Add ThriftIO to Support IO for Thrift Files > --- > > Key: BEAM-8561 > URL: https://issues.apache.org/jira/browse/BEAM-8561 > Project: Beam > Issue Type: New Feature > Components: io-java-files >Reporter: Chris Larsen >Assignee: Chris Larsen >Priority: Major > Time Spent: 8h 10m > Remaining Estimate: 0h > > Similar to AvroIO it would be very useful to support reading and writing > to/from Thrift files with a native connector. > Functionality would include: > # read() - Reading from one or more Thrift files. > # write() - Writing to one or more Thrift files. -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Work logged] (BEAM-8561) Add ThriftIO to Support IO for Thrift Files
[ https://issues.apache.org/jira/browse/BEAM-8561?focusedWorklogId=361233=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-361233 ] ASF GitHub Bot logged work on BEAM-8561: Author: ASF GitHub Bot Created on: 18/Dec/19 00:42 Start Date: 18/Dec/19 00:42 Worklog Time Spent: 10m Work Description: chrlarsen commented on issue #10290: [BEAM-8561] Add ThriftIO to support IO for Thrift files URL: https://github.com/apache/beam/pull/10290#issuecomment-566811364 @gsteelman @chamikaramj [PR](https://github.com/apache/beam/pull/10395) for parser has been opened. This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org Issue Time Tracking --- Worklog Id: (was: 361233) Time Spent: 8h (was: 7h 50m) > Add ThriftIO to Support IO for Thrift Files > --- > > Key: BEAM-8561 > URL: https://issues.apache.org/jira/browse/BEAM-8561 > Project: Beam > Issue Type: New Feature > Components: io-java-files >Reporter: Chris Larsen >Assignee: Chris Larsen >Priority: Major > Time Spent: 8h > Remaining Estimate: 0h > > Similar to AvroIO it would be very useful to support reading and writing > to/from Thrift files with a native connector. > Functionality would include: > # read() - Reading from one or more Thrift files. > # write() - Writing to one or more Thrift files. -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Work logged] (BEAM-8561) Add ThriftIO to Support IO for Thrift Files
[ https://issues.apache.org/jira/browse/BEAM-8561?focusedWorklogId=361232=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-361232 ] ASF GitHub Bot logged work on BEAM-8561: Author: ASF GitHub Bot Created on: 18/Dec/19 00:40 Start Date: 18/Dec/19 00:40 Worklog Time Spent: 10m Work Description: chrlarsen commented on pull request #10290: [BEAM-8561] Add ThriftIO to support IO for Thrift files URL: https://github.com/apache/beam/pull/10290#discussion_r359101649 ## File path: sdks/java/io/thrift/src/main/java/org/apache/beam/sdk/io/thrift/ThriftIO.java ## @@ -0,0 +1,708 @@ +/* + * Licensed to the Apache Software Foundation (ASF) under one + * or more contributor license agreements. See the NOTICE file + * distributed with this work for additional information + * regarding copyright ownership. The ASF licenses this file + * to you under the Apache License, Version 2.0 (the + * "License"); you may not use this file except in compliance + * with the License. You may obtain a copy of the License at + * + * http://www.apache.org/licenses/LICENSE-2.0 + * + * Unless required by applicable law or agreed to in writing, software + * distributed under the License is distributed on an "AS IS" BASIS, + * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. + * See the License for the specific language governing permissions and + * limitations under the License. 
+ */ +package org.apache.beam.sdk.io.thrift; + +import static java.lang.String.format; +import static java.util.stream.Collectors.joining; +import static org.apache.beam.vendor.guava.v26_0_jre.com.google.common.base.Preconditions.checkNotNull; + +import com.google.auto.value.AutoValue; +import java.io.Closeable; +import java.io.IOException; +import java.io.OutputStream; +import java.nio.channels.Channels; +import java.nio.channels.WritableByteChannel; +import java.nio.charset.Charset; +import java.util.ArrayList; +import java.util.List; +import java.util.Map; +import javax.annotation.Nullable; +import org.apache.beam.sdk.coders.StringUtf8Coder; +import org.apache.beam.sdk.io.Compression; +import org.apache.beam.sdk.io.FileIO; +import org.apache.beam.sdk.io.thrift.parser.ThriftIdlParser; +import org.apache.beam.sdk.io.thrift.parser.model.BaseType; +import org.apache.beam.sdk.io.thrift.parser.model.Const; +import org.apache.beam.sdk.io.thrift.parser.model.Definition; +import org.apache.beam.sdk.io.thrift.parser.model.Document; +import org.apache.beam.sdk.io.thrift.parser.model.Header; +import org.apache.beam.sdk.io.thrift.parser.model.IdentifierType; +import org.apache.beam.sdk.io.thrift.parser.model.IntegerEnum; +import org.apache.beam.sdk.io.thrift.parser.model.IntegerEnumField; +import org.apache.beam.sdk.io.thrift.parser.model.ListType; +import org.apache.beam.sdk.io.thrift.parser.model.MapType; +import org.apache.beam.sdk.io.thrift.parser.model.Service; +import org.apache.beam.sdk.io.thrift.parser.model.StringEnum; +import org.apache.beam.sdk.io.thrift.parser.model.Struct; +import org.apache.beam.sdk.io.thrift.parser.model.ThriftException; +import org.apache.beam.sdk.io.thrift.parser.model.ThriftField; +import org.apache.beam.sdk.io.thrift.parser.model.ThriftMethod; +import org.apache.beam.sdk.io.thrift.parser.model.ThriftType; +import org.apache.beam.sdk.io.thrift.parser.model.TypeAnnotation; +import org.apache.beam.sdk.io.thrift.parser.model.Typedef; +import 
org.apache.beam.sdk.io.thrift.parser.model.VoidType; +import org.apache.beam.sdk.options.ValueProvider; +import org.apache.beam.sdk.options.ValueProvider.StaticValueProvider; +import org.apache.beam.sdk.transforms.Create; +import org.apache.beam.sdk.transforms.DoFn; +import org.apache.beam.sdk.transforms.PTransform; +import org.apache.beam.sdk.transforms.ParDo; +import org.apache.beam.sdk.transforms.display.DisplayData; +import org.apache.beam.sdk.values.PBegin; +import org.apache.beam.sdk.values.PCollection; +import org.apache.beam.sdk.values.PDone; +import org.apache.beam.vendor.guava.v26_0_jre.com.google.common.base.Charsets; +import org.apache.beam.vendor.guava.v26_0_jre.com.google.common.io.ByteSource; +import org.slf4j.Logger; +import org.slf4j.LoggerFactory; + +/** + * {@link PTransform}s for reading and writing Thrift files. + * + * Reading Thrift Files + * + * For simple reading, use {@link ThriftIO#read} with the desired file pattern to read from. + * + * For example: + * + * {@code + * PCollection documents = pipeline.apply(ThriftIO.read().from("/foo/bar/*")); + * ... + * } + * + * For more advanced use cases, like reading each file in a {@link PCollection} of {@link + * FileIO.ReadableFile}, use the {@link ReadFiles} transform. + * + * For example: + * + * {@code + * PCollection files = pipeline + * .apply(FileIO.match().filepattern(options.getInputFilepattern()) + * .apply(FileIO.readMatches()); + * + *
[jira] [Work logged] (BEAM-8944) Python SDK harness performance degradation with UnboundedThreadPoolExecutor
[ https://issues.apache.org/jira/browse/BEAM-8944?focusedWorklogId=361231=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-361231 ] ASF GitHub Bot logged work on BEAM-8944: Author: ASF GitHub Bot Created on: 18/Dec/19 00:34 Start Date: 18/Dec/19 00:34 Worklog Time Spent: 10m Work Description: lukecwik commented on issue #10387: [BEAM-8944] Change to use single thread in py sdk bundle progress report URL: https://github.com/apache/beam/pull/10387#issuecomment-566812644 > > Would it be better to have the runner request progress less frequently? > > I think that helps too. I believe right now JRH requests every 0.1 sec. Not exactly sure how the frequency is picked. 0.1 secs is a lot and doesn't seem right. This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org Issue Time Tracking --- Worklog Id: (was: 361231) Time Spent: 2.5h (was: 2h 20m) > Python SDK harness performance degradation with UnboundedThreadPoolExecutor > --- > > Key: BEAM-8944 > URL: https://issues.apache.org/jira/browse/BEAM-8944 > Project: Beam > Issue Type: Bug > Components: sdk-py-harness >Affects Versions: 2.18.0 >Reporter: Yichi Zhang >Assignee: Yichi Zhang >Priority: Blocker > Attachments: profiling.png, profiling_one_thread.png, > profiling_twelve_threads.png > > Time Spent: 2.5h > Remaining Estimate: 0h > > We are seeing a performance degradation for python streaming word count load > tests. > > After some investigation, it appears to be caused by swapping the original > ThreadPoolExecutor to UnboundedThreadPoolExecutor in sdk worker. Suspicion is > that python performance is worse with more threads on cpu-bounded tasks. 
>
> A simple test for comparing the multiple thread pool executor performance:
>
> {code:python}
> def test_performance(self):
>   def run_perf(executor):
>     total_number = 100
>     q = queue.Queue()
>
>     def task(number):
>       hash(number)
>       q.put(number + 200)
>       return number
>
>     t = time.time()
>     count = 0
>     for i in range(200):
>       q.put(i)
>
>     while count < total_number:
>       executor.submit(task, q.get(block=True))
>       count += 1
>     print('%s uses %s' % (executor, time.time() - t))
>
>   with UnboundedThreadPoolExecutor() as executor:
>     run_perf(executor)
>   with futures.ThreadPoolExecutor(max_workers=1) as executor:
>     run_perf(executor)
>   with futures.ThreadPoolExecutor(max_workers=12) as executor:
>     run_perf(executor)
> {code}
> Results:
> 0x7fab400dbe50> uses 268.160675049
> uses 79.904583931
> uses 191.179054976
> Profiling:
> UnboundedThreadPoolExecutor:
> !profiling.png!
> 1 Thread ThreadPoolExecutor:
> !profiling_one_thread.png!
> 12 Threads ThreadPoolExecutor:
> !profiling_twelve_threads.png!

-- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Work logged] (BEAM-8561) Add ThriftIO to Support IO for Thrift Files
[ https://issues.apache.org/jira/browse/BEAM-8561?focusedWorklogId=361230=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-361230 ] ASF GitHub Bot logged work on BEAM-8561: Author: ASF GitHub Bot Created on: 18/Dec/19 00:29 Start Date: 18/Dec/19 00:29 Worklog Time Spent: 10m Work Description: chrlarsen commented on issue #10290: [BEAM-8561] Add ThriftIO to support IO for Thrift files URL: https://github.com/apache/beam/pull/10290#issuecomment-566811364 @gsteelman @chamikaramj [PR](https://github.com/apache/beam/pull/10395) for parser has been opened This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org Issue Time Tracking --- Worklog Id: (was: 361230) Time Spent: 7h 40m (was: 7.5h) > Add ThriftIO to Support IO for Thrift Files > --- > > Key: BEAM-8561 > URL: https://issues.apache.org/jira/browse/BEAM-8561 > Project: Beam > Issue Type: New Feature > Components: io-java-files >Reporter: Chris Larsen >Assignee: Chris Larsen >Priority: Major > Time Spent: 7h 40m > Remaining Estimate: 0h > > Similar to AvroIO it would be very useful to support reading and writing > to/from Thrift files with a native connector. > Functionality would include: > # read() - Reading from one or more Thrift files. > # write() - Writing to one or more Thrift files. -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Assigned] (BEAM-6860) WriteToText crash with "GlobalWindow -> ._IntervalWindowBase"
[ https://issues.apache.org/jira/browse/BEAM-6860?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Pablo Estrada reassigned BEAM-6860: --- Assignee: Chamikara Madhusanka Jayalath (was: Pawel Kordek) > WriteToText crash with "GlobalWindow -> ._IntervalWindowBase" > - > > Key: BEAM-6860 > URL: https://issues.apache.org/jira/browse/BEAM-6860 > Project: Beam > Issue Type: Bug > Components: sdk-py-core >Affects Versions: 2.11.0 > Environment: macOS, DirectRunner, python 2.7.15 via > pyenv/pyenv-virtualenv >Reporter: Henrik >Assignee: Chamikara Madhusanka Jayalath >Priority: Critical > Labels: newbie > Time Spent: 40m > Remaining Estimate: 0h > > Main error: > > Cannot convert GlobalWindow to > > apache_beam.utils.windowed_value._IntervalWindowBase > This is very hard for me to debug. Doing a ParDo call before, printing the > input, gives me just what I want; so the lines of data to serialise are > "alright"; just JSON strings, in fact. > Stacktrace: > {code:java} > Traceback (most recent call last): > File "./okr_end_ride.py", line 254, in > run() > File "./okr_end_ride.py", line 250, in run > run_pipeline(pipeline_options, known_args) > File "./okr_end_ride.py", line 198, in run_pipeline > | 'write_all' >> WriteToText(known_args.output, > file_name_suffix=".txt") > File > "/Users/h/.pyenv/versions/2.7.15/envs/log-analytics/lib/python2.7/site-packages/apache_beam/pipeline.py", > line 426, in __exit__ > self.run().wait_until_finish() > File > "/Users/h/.pyenv/versions/2.7.15/envs/log-analytics/lib/python2.7/site-packages/apache_beam/pipeline.py", > line 406, in run > self._options).run(False) > File > "/Users/h/.pyenv/versions/2.7.15/envs/log-analytics/lib/python2.7/site-packages/apache_beam/pipeline.py", > line 419, in run > return self.runner.run_pipeline(self, self._options) > File > "/Users/h/.pyenv/versions/2.7.15/envs/log-analytics/lib/python2.7/site-packages/apache_beam/runners/direct/direct_runner.py", > line 132, in run_pipeline > return 
runner.run_pipeline(pipeline, options) > File > "/Users/h/.pyenv/versions/2.7.15/envs/log-analytics/lib/python2.7/site-packages/apache_beam/runners/portability/fn_api_runner.py", > line 275, in run_pipeline > default_environment=self._default_environment)) > File > "/Users/h/.pyenv/versions/2.7.15/envs/log-analytics/lib/python2.7/site-packages/apache_beam/runners/portability/fn_api_runner.py", > line 278, in run_via_runner_api > return self.run_stages(*self.create_stages(pipeline_proto)) > File > "/Users/h/.pyenv/versions/2.7.15/envs/log-analytics/lib/python2.7/site-packages/apache_beam/runners/portability/fn_api_runner.py", > line 354, in run_stages > stage_context.safe_coders) > File > "/Users/h/.pyenv/versions/2.7.15/envs/log-analytics/lib/python2.7/site-packages/apache_beam/runners/portability/fn_api_runner.py", > line 509, in run_stage > data_input, data_output) > File > "/Users/h/.pyenv/versions/2.7.15/envs/log-analytics/lib/python2.7/site-packages/apache_beam/runners/portability/fn_api_runner.py", > line 1206, in process_bundle > result_future = self._controller.control_handler.push(process_bundle) > File > "/Users/h/.pyenv/versions/2.7.15/envs/log-analytics/lib/python2.7/site-packages/apache_beam/runners/portability/fn_api_runner.py", > line 821, in push > response = self.worker.do_instruction(request) > File > "/Users/h/.pyenv/versions/2.7.15/envs/log-analytics/lib/python2.7/site-packages/apache_beam/runners/worker/sdk_worker.py", > line 265, in do_instruction > request.instruction_id) > File > "/Users/h/.pyenv/versions/2.7.15/envs/log-analytics/lib/python2.7/site-packages/apache_beam/runners/worker/sdk_worker.py", > line 281, in process_bundle > delayed_applications = bundle_processor.process_bundle(instruction_id) > File > "/Users/h/.pyenv/versions/2.7.15/envs/log-analytics/lib/python2.7/site-packages/apache_beam/runners/worker/bundle_processor.py", > line 552, in process_bundle > op.finish() > File "apache_beam/runners/worker/operations.py", line 549, 
in > apache_beam.runners.worker.operations.DoOperation.finish > File "apache_beam/runners/worker/operations.py", line 550, in > apache_beam.runners.worker.operations.DoOperation.finish > File "apache_beam/runners/worker/operations.py", line 551, in > apache_beam.runners.worker.operations.DoOperation.finish > File "apache_beam/runners/common.py", line 758, in > apache_beam.runners.common.DoFnRunner.finish > File "apache_beam/runners/common.py", line 752, in > apache_beam.runners.common.DoFnRunner._invoke_bundle_method > File "apache_beam/runners/common.py", line 777, in >
[jira] [Work logged] (BEAM-7390) Colab examples for aggregation transforms (Python)
[ https://issues.apache.org/jira/browse/BEAM-7390?focusedWorklogId=361227=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-361227 ] ASF GitHub Bot logged work on BEAM-7390: Author: ASF GitHub Bot Created on: 18/Dec/19 00:25 Start Date: 18/Dec/19 00:25 Worklog Time Spent: 10m Work Description: pabloem commented on pull request #10175: [BEAM-7390] Add code snippet for Max URL: https://github.com/apache/beam/pull/10175 This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org Issue Time Tracking --- Worklog Id: (was: 361227) Time Spent: 8h 10m (was: 8h) > Colab examples for aggregation transforms (Python) > -- > > Key: BEAM-7390 > URL: https://issues.apache.org/jira/browse/BEAM-7390 > Project: Beam > Issue Type: Improvement > Components: website >Reporter: Rose Nguyen >Assignee: David Cavazos >Priority: Minor > Time Spent: 8h 10m > Remaining Estimate: 0h > > Merge aggregation Colabs into the transform catalog -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Work logged] (BEAM-8561) Add ThriftIO to Support IO for Thrift Files
[ https://issues.apache.org/jira/browse/BEAM-8561?focusedWorklogId=361210=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-361210 ] ASF GitHub Bot logged work on BEAM-8561: Author: ASF GitHub Bot Created on: 17/Dec/19 23:34 Start Date: 17/Dec/19 23:34 Worklog Time Spent: 10m Work Description: chrlarsen commented on pull request #10290: [BEAM-8561] Add ThriftIO to support IO for Thrift files URL: https://github.com/apache/beam/pull/10290#discussion_r359085087 ## File path: sdks/java/io/thrift/src/main/java/org/apache/beam/sdk/io/thrift/ThriftIO.java ## @@ -0,0 +1,708 @@ +/* + * Licensed to the Apache Software Foundation (ASF) under one + * or more contributor license agreements. See the NOTICE file + * distributed with this work for additional information + * regarding copyright ownership. The ASF licenses this file + * to you under the Apache License, Version 2.0 (the + * "License"); you may not use this file except in compliance + * with the License. You may obtain a copy of the License at + * + * http://www.apache.org/licenses/LICENSE-2.0 + * + * Unless required by applicable law or agreed to in writing, software + * distributed under the License is distributed on an "AS IS" BASIS, + * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. + * See the License for the specific language governing permissions and + * limitations under the License. 
+ */ +package org.apache.beam.sdk.io.thrift; + +import static java.lang.String.format; +import static java.util.stream.Collectors.joining; +import static org.apache.beam.vendor.guava.v26_0_jre.com.google.common.base.Preconditions.checkNotNull; + +import com.google.auto.value.AutoValue; +import java.io.Closeable; +import java.io.IOException; +import java.io.OutputStream; +import java.nio.channels.Channels; +import java.nio.channels.WritableByteChannel; +import java.nio.charset.Charset; +import java.util.ArrayList; +import java.util.List; +import java.util.Map; +import javax.annotation.Nullable; +import org.apache.beam.sdk.coders.StringUtf8Coder; +import org.apache.beam.sdk.io.Compression; +import org.apache.beam.sdk.io.FileIO; +import org.apache.beam.sdk.io.thrift.parser.ThriftIdlParser; +import org.apache.beam.sdk.io.thrift.parser.model.BaseType; +import org.apache.beam.sdk.io.thrift.parser.model.Const; +import org.apache.beam.sdk.io.thrift.parser.model.Definition; +import org.apache.beam.sdk.io.thrift.parser.model.Document; +import org.apache.beam.sdk.io.thrift.parser.model.Header; +import org.apache.beam.sdk.io.thrift.parser.model.IdentifierType; +import org.apache.beam.sdk.io.thrift.parser.model.IntegerEnum; +import org.apache.beam.sdk.io.thrift.parser.model.IntegerEnumField; +import org.apache.beam.sdk.io.thrift.parser.model.ListType; +import org.apache.beam.sdk.io.thrift.parser.model.MapType; +import org.apache.beam.sdk.io.thrift.parser.model.Service; +import org.apache.beam.sdk.io.thrift.parser.model.StringEnum; +import org.apache.beam.sdk.io.thrift.parser.model.Struct; +import org.apache.beam.sdk.io.thrift.parser.model.ThriftException; +import org.apache.beam.sdk.io.thrift.parser.model.ThriftField; +import org.apache.beam.sdk.io.thrift.parser.model.ThriftMethod; +import org.apache.beam.sdk.io.thrift.parser.model.ThriftType; +import org.apache.beam.sdk.io.thrift.parser.model.TypeAnnotation; +import org.apache.beam.sdk.io.thrift.parser.model.Typedef; +import 
org.apache.beam.sdk.io.thrift.parser.model.VoidType; +import org.apache.beam.sdk.options.ValueProvider; +import org.apache.beam.sdk.options.ValueProvider.StaticValueProvider; +import org.apache.beam.sdk.transforms.Create; +import org.apache.beam.sdk.transforms.DoFn; +import org.apache.beam.sdk.transforms.PTransform; +import org.apache.beam.sdk.transforms.ParDo; +import org.apache.beam.sdk.transforms.display.DisplayData; +import org.apache.beam.sdk.values.PBegin; +import org.apache.beam.sdk.values.PCollection; +import org.apache.beam.sdk.values.PDone; +import org.apache.beam.vendor.guava.v26_0_jre.com.google.common.base.Charsets; +import org.apache.beam.vendor.guava.v26_0_jre.com.google.common.io.ByteSource; +import org.slf4j.Logger; +import org.slf4j.LoggerFactory; + +/** + * {@link PTransform}s for reading and writing Thrift files. + * + * Reading Thrift Files + * + * For simple reading, use {@link ThriftIO#read} with the desired file pattern to read from. + * + * For example: + * + * {@code + * PCollection documents = pipeline.apply(ThriftIO.read().from("/foo/bar/*")); + * ... + * } + * + * For more advanced use cases, like reading each file in a {@link PCollection} of {@link + * FileIO.ReadableFile}, use the {@link ReadFiles} transform. + * + * For example: + * + * {@code + * PCollection files = pipeline + * .apply(FileIO.match().filepattern(options.getInputFilepattern()) + * .apply(FileIO.readMatches()); + * + *
[jira] [Work logged] (BEAM-8561) Add ThriftIO to Support IO for Thrift Files
[ https://issues.apache.org/jira/browse/BEAM-8561?focusedWorklogId=361203=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-361203 ] ASF GitHub Bot logged work on BEAM-8561: Author: ASF GitHub Bot Created on: 17/Dec/19 23:26 Start Date: 17/Dec/19 23:26 Worklog Time Spent: 10m Work Description: chrlarsen commented on pull request #10290: [BEAM-8561] Add ThriftIO to support IO for Thrift files URL: https://github.com/apache/beam/pull/10290#discussion_r359082635 ## File path: sdks/java/io/thrift/src/main/java/org/apache/beam/sdk/io/thrift/parser/model/Document.java ## @@ -0,0 +1,424 @@ +/* + * Licensed to the Apache Software Foundation (ASF) under one + * or more contributor license agreements. See the NOTICE file + * distributed with this work for additional information + * regarding copyright ownership. The ASF licenses this file + * to you under the Apache License, Version 2.0 (the + * "License"); you may not use this file except in compliance + * with the License. You may obtain a copy of the License at + * + * http://www.apache.org/licenses/LICENSE-2.0 + * + * Unless required by applicable law or agreed to in writing, software + * distributed under the License is distributed on an "AS IS" BASIS, + * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. + * See the License for the specific language governing permissions and + * limitations under the License. 
+ */ +package org.apache.beam.sdk.io.thrift.parser.model; + +import static java.util.Collections.emptyList; +import static org.apache.beam.vendor.guava.v26_0_jre.com.google.common.base.Preconditions.checkNotNull; + +import java.io.IOException; +import java.io.Serializable; +import java.util.ArrayList; +import java.util.Collections; +import java.util.HashMap; +import java.util.List; +import java.util.Map; +import java.util.Objects; +import org.apache.avro.Schema; +import org.apache.avro.generic.GenericRecord; +import org.apache.avro.generic.GenericRecordBuilder; +import org.apache.avro.reflect.ReflectData; +import org.apache.beam.sdk.io.thrift.parser.visitor.DocumentVisitor; +import org.apache.beam.vendor.guava.v26_0_jre.com.google.common.base.MoreObjects; +import org.apache.beam.vendor.guava.v26_0_jre.com.google.common.base.Preconditions; +import org.apache.beam.vendor.guava.v26_0_jre.com.google.common.collect.ImmutableList; + +/** + * The {@link Document} class holds the elements of a Thrift file. + * + * A {@link Document} is made up of: + * + * + * {@link Header} - Contains: includes, cppIncludes, namespaces, and defaultNamespace. + * {@link Document#definitions} - Contains list of Thrift {@link Definition}. + * + */ +public class Document implements Serializable { + private Header header; + private List definitions; + + public Document(Header header, List definitions) { +this.header = checkNotNull(header, "header"); +this.definitions = ImmutableList.copyOf(checkNotNull(definitions, "definitions")); + } + + /** Returns an empty {@link Document}. 
*/ + public static Document emptyDocument() { +List includes = emptyList(); +List cppIncludes = emptyList(); +String defaultNamespace = null; +Map namespaces = Collections.emptyMap(); +Header header = new Header(includes, cppIncludes, defaultNamespace, namespaces); +List definitions = emptyList(); +return new Document(header, definitions); + } + + public Document getDocument() { +return this; + } + + public Header getHeader() { +return this.header; + } + + public void setHeader(Header header) { +this.header = header; + } + + public List getDefinitions() { +return definitions; + } + + public void setDefinitions(List definitions) { +this.definitions = definitions; + } + + public void visit(final DocumentVisitor visitor) throws IOException { +Preconditions.checkNotNull(visitor, "the visitor must not be null!"); + +for (Definition definition : definitions) { + if (visitor.accept(definition)) { +definition.visit(visitor); + } +} + } + + /** Gets Avro {@link Schema} for the object. */ + public Schema getSchema() { +return ReflectData.get().getSchema(Document.class); + } + + /** Gets {@link Document} as a {@link GenericRecord}. */ + public GenericRecord getAsGenericRecord() { +GenericRecordBuilder genericRecordBuilder = new GenericRecordBuilder(this.getSchema()); +genericRecordBuilder.set("header", this.getHeader()).set("definitions", this.getDefinitions()); + +return genericRecordBuilder.build(); + } + + /** Adds list of includes to {@link Document#header}. */ + public void addIncludes(List includes) { +checkNotNull(includes, "includes"); +List currentIncludes = new ArrayList<>(this.getHeader().getIncludes()); +currentIncludes.addAll(includes); +
[jira] [Work logged] (BEAM-8975) Add Thrift Parser for ThriftIO
[ https://issues.apache.org/jira/browse/BEAM-8975?focusedWorklogId=361199=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-361199 ] ASF GitHub Bot logged work on BEAM-8975: Author: ASF GitHub Bot Created on: 17/Dec/19 23:18 Start Date: 17/Dec/19 23:18 Worklog Time Spent: 10m Work Description: chrlarsen commented on issue #10395: [BEAM-8975] Add Thrift parser for ThriftIO URL: https://github.com/apache/beam/pull/10395#issuecomment-566793062 Run Java PreCommit This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org Issue Time Tracking --- Worklog Id: (was: 361199) Time Spent: 1h 20m (was: 1h 10m) > Add Thrift Parser for ThriftIO > -- > > Key: BEAM-8975 > URL: https://issues.apache.org/jira/browse/BEAM-8975 > Project: Beam > Issue Type: New Feature > Components: io-java-files >Reporter: Chris Larsen >Assignee: Chris Larsen >Priority: Minor > Time Spent: 1h 20m > Remaining Estimate: 0h > > This ticket is related to > [BEAM-8561|https://issues.apache.org/jira/projects/BEAM/issues/BEAM-8561?filter=allissues]. > As there are a large number of files to review for the > [PR|https://github.com/apache/beam/pull/10290] for ThriftIO this ticket will > serve as the tracker for the submission of a PR relating to the parser and > document model that will be used by ThriftIO. The aim is to reduce the number > of files submitted with each PR. -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Work logged] (BEAM-8575) Add more Python validates runner tests
[ https://issues.apache.org/jira/browse/BEAM-8575?focusedWorklogId=361198=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-361198 ] ASF GitHub Bot logged work on BEAM-8575: Author: ASF GitHub Bot Created on: 17/Dec/19 23:15 Start Date: 17/Dec/19 23:15 Worklog Time Spent: 10m Work Description: bumblebee-coming commented on pull request #10159: [BEAM-8575] Added a unit test to CombineTest class to test that Combi… URL: https://github.com/apache/beam/pull/10159#discussion_r359079322 ## File path: sdks/python/apache_beam/transforms/combiners_test.py ## @@ -393,6 +398,18 @@ def test_global_fanout(self): | beam.CombineGlobally(combine.MeanCombineFn()).with_fanout(11)) assert_that(result, equal_to([49.5])) + @attr('ValidatesRunner') + def test_hot_key_combining_with_accumulation_mode(self): +with TestPipeline() as p: + result = (p +| beam.Create([1, 2, 3, 4, 5]) +| beam.WindowInto(GlobalWindows(), + trigger=Repeatedly(AfterCount(1)), + accumulation_mode= + AccumulationMode.ACCUMULATING) +| beam.CombineGlobally(sum).without_defaults().with_fanout(2)) Review comment: If delete without_defaults(), we get this error: ValueError: PCollection of size 2 with more than one element accessed as a singleton view. First two elements encountered are "1", "3". [while running 'CombineGlobally(sum, fanout=2)/InjectDefault'] This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. 
For queries about this service, please contact Infrastructure at: us...@infra.apache.org Issue Time Tracking --- Worklog Id: (was: 361198) Time Spent: 37.5h (was: 37h 20m) > Add more Python validates runner tests > -- > > Key: BEAM-8575 > URL: https://issues.apache.org/jira/browse/BEAM-8575 > Project: Beam > Issue Type: Test > Components: sdk-py-core, testing >Reporter: wendy liu >Assignee: wendy liu >Priority: Major > Time Spent: 37.5h > Remaining Estimate: 0h > > This is the umbrella issue to track the work of adding more Python tests to > improve test coverage. -- This message was sent by Atlassian Jira (v8.3.4#803005)
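The test quoted in the review comment above exercises `with_fanout(2)`. As a rough illustration of the idea behind fanout combining (partial results per shard, then a final merge), here is a plain-Python sketch. It is not Beam's implementation: `combine_with_fanout` and the round-robin sharding are assumptions made for illustration, and the result only matches a direct combine when the function is associative and commutative.

```python
def combine_with_fanout(values, combine_fn, fanout):
    """Combines `values` in two stages: per-shard partials, then a final merge.

    Hypothetical helper for illustration only; Beam's runner decides the real
    sharding. Assumes `combine_fn` is associative and commutative.
    """
    if fanout < 1:
        raise ValueError("fanout must be at least 1")
    # Stage 1: spread the elements across `fanout` shards (round-robin here).
    shards = [[] for _ in range(fanout)]
    for index, value in enumerate(values):
        shards[index % fanout].append(value)
    # Stage 2: combine each non-empty shard into a partial result.
    partials = [combine_fn(shard) for shard in shards if shard]
    # Stage 3: merge the partial results into the final value.
    return combine_fn(partials)


total = combine_with_fanout([1, 2, 3, 4, 5], sum, 2)
```

With `sum`, the two-stage result equals the sum of all inputs regardless of how the elements are sharded.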
[jira] [Work logged] (BEAM-8932) Expose complete Cloud Pub/Sub messages through PubsubIO API
[ https://issues.apache.org/jira/browse/BEAM-8932?focusedWorklogId=361197=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-361197 ] ASF GitHub Bot logged work on BEAM-8932: Author: ASF GitHub Bot Created on: 17/Dec/19 23:14 Start Date: 17/Dec/19 23:14 Worklog Time Spent: 10m Work Description: boyuanzz commented on issue #10331: [BEAM-8932] Modify PubsubClient to use the proto message throughout. URL: https://github.com/apache/beam/pull/10331#issuecomment-566791993 Run Python PreCommit This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org Issue Time Tracking --- Worklog Id: (was: 361197) Time Spent: 1h 40m (was: 1.5h) > Expose complete Cloud Pub/Sub messages through PubsubIO API > --- > > Key: BEAM-8932 > URL: https://issues.apache.org/jira/browse/BEAM-8932 > Project: Beam > Issue Type: Bug > Components: beam-model >Reporter: Daniel Collins >Assignee: Daniel Collins >Priority: Major > Time Spent: 1h 40m > Remaining Estimate: 0h > > The PubsubIO API only exposes a subset of the fields in the underlying > PubsubMessage protocol buffer. To accommodate future feature changes as well > as for greater compatibility with code using the Cloud Pub/Sub APIs, a method > to read and write these protocol messages should be exposed. -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Work logged] (BEAM-8977) apache_beam.runners.interactive.display.pcoll_visualization_test.PCollectionVisualizationTest.test_dynamic_plotting_update_same_display is flaky
[ https://issues.apache.org/jira/browse/BEAM-8977?focusedWorklogId=361193=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-361193 ] ASF GitHub Bot logged work on BEAM-8977: Author: ASF GitHub Bot Created on: 17/Dec/19 23:05 Start Date: 17/Dec/19 23:05 Worklog Time Spent: 10m Work Description: KevinGG commented on pull request #10404: [BEAM-8977] Resolve test flakiness URL: https://github.com/apache/beam/pull/10404#discussion_r359076289 ## File path: sdks/python/apache_beam/runners/interactive/display/pcoll_visualization_test.py ## @@ -88,44 +87,40 @@ def test_dynamic_plotting_return_handle(self): h.stop() @patch('apache_beam.runners.interactive.display.pcoll_visualization' - '.PCollectionVisualization.display_facets') + '.PCollectionVisualization._display_dive') + @patch('apache_beam.runners.interactive.display.pcoll_visualization' + '.PCollectionVisualization._display_overview') + @patch('apache_beam.runners.interactive.display.pcoll_visualization' + '.PCollectionVisualization._display_dataframe') def test_dynamic_plotting_update_same_display(self, -mocked_display_facets): +mocked_display_dataframe, Review comment: Thanks! This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. 
For queries about this service, please contact Infrastructure at: us...@infra.apache.org Issue Time Tracking --- Worklog Id: (was: 361193) Time Spent: 1h 10m (was: 1h) > apache_beam.runners.interactive.display.pcoll_visualization_test.PCollectionVisualizationTest.test_dynamic_plotting_update_same_display > is flaky > > > Key: BEAM-8977 > URL: https://issues.apache.org/jira/browse/BEAM-8977 > Project: Beam > Issue Type: Bug > Components: test-failures >Reporter: Valentyn Tymofieiev >Assignee: Ning Kang >Priority: Major > Time Spent: 1h 10m > Remaining Estimate: 0h > > Sample failure: > > [https://builds.apache.org/job/beam_PreCommit_Python_Phrase/1273/testReport/apache_beam.runners.interactive.display.pcoll_visualization_test/PCollectionVisualizationTest/test_dynamic_plotting_update_same_display/] > Error Message > IndexError: list index out of range > Stacktrace > self = > testMethod=test_dynamic_plotting_update_same_display> > mocked_display_facets = id='139889868386376'> > @patch('apache_beam.runners.interactive.display.pcoll_visualization' > '.PCollectionVisualization.display_facets') > def test_dynamic_plotting_update_same_display(self, > mocked_display_facets): > fake_pipeline_result = runner.PipelineResult(runner.PipelineState.RUNNING) > ie.current_env().set_pipeline_result(self._p, fake_pipeline_result) > # Starts async dynamic plotting that never ends in this test. > h = pv.visualize(self._pcoll, dynamic_plotting_interval=0.001) > # Blocking so the above async task can execute some iterations. > time.sleep(1) > # The first iteration doesn't provide updating_pv to display_facets. > _, first_kwargs = mocked_display_facets.call_args_list[0] > self.assertEqual(first_kwargs, {}) > # The following iterations use the same updating_pv to display_facets and so > # on. 
> > _, second_kwargs = mocked_display_facets.call_args_list[1] > E IndexError: list index out of range > apache_beam/runners/interactive/display/pcoll_visualization_test.py:105: > IndexError > Standard Output > > Standard Error > WARNING:apache_beam.runners.interactive.interactive_environment:You cannot > use Interactive Beam features when you are not in an interactive environment > such as a Jupyter notebook or ipython terminal. -- This message was sent by Atlassian Jira (v8.3.4#803005)
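The `IndexError` in the stack trace above is the usual symptom of a test that sleeps for a fixed time and then assumes a background task has already run twice. A minimal standard-library sketch of one alternative: poll the mock's call count against a deadline, and index `call_args_list` only after the count is confirmed. This is not the fix applied in PR #10404; `wait_for_calls` and `start_periodic` are hypothetical helpers, and the `updating=True` keyword is a stand-in for whatever the real display method receives.

```python
import threading
import time
from unittest import mock


def wait_for_calls(mocked, minimum, timeout=5.0, poll=0.01):
    """Waits until `mocked` has been called `minimum` times or `timeout` elapses.

    Returns True if the call count was reached, False on timeout.
    """
    deadline = time.monotonic() + timeout
    while time.monotonic() < deadline:
        if mocked.call_count >= minimum:
            return True
        time.sleep(poll)
    return mocked.call_count >= minimum


def start_periodic(func, interval, stop_event):
    """Hypothetical stand-in for an async plotting loop.

    Calls `func()` once, then `func(updating=True)` until `stop_event` is set.
    """
    def loop():
        first = True
        while not stop_event.is_set():
            if first:
                func()
                first = False
            else:
                func(updating=True)
            stop_event.wait(interval)

    thread = threading.Thread(target=loop, daemon=True)
    thread.start()
    return thread


display = mock.Mock()
stop = threading.Event()
worker = start_periodic(display, 0.001, stop)
try:
    # Wait for two iterations instead of sleeping for a fixed second.
    reached = wait_for_calls(display, 2)
finally:
    stop.set()
    worker.join()

# Indexing call_args_list is only safe once the count has been confirmed.
if reached:
    _, first_kwargs = display.call_args_list[0]
    _, second_kwargs = display.call_args_list[1]
```

If the deadline passes first, `reached` is False and the test can fail with an explicit message rather than an `IndexError`.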
[jira] [Work logged] (BEAM-8977) apache_beam.runners.interactive.display.pcoll_visualization_test.PCollectionVisualizationTest.test_dynamic_plotting_update_same_display is flaky
[ https://issues.apache.org/jira/browse/BEAM-8977?focusedWorklogId=361191=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-361191 ] ASF GitHub Bot logged work on BEAM-8977: Author: ASF GitHub Bot Created on: 17/Dec/19 23:03 Start Date: 17/Dec/19 23:03 Worklog Time Spent: 10m Work Description: tvalentyn commented on pull request #10404: [BEAM-8977] Resolve test flakiness URL: https://github.com/apache/beam/pull/10404#discussion_r359075735 ## File path: sdks/python/apache_beam/runners/interactive/display/pcoll_visualization_test.py ## @@ -88,44 +87,40 @@ def test_dynamic_plotting_return_handle(self): h.stop() @patch('apache_beam.runners.interactive.display.pcoll_visualization' - '.PCollectionVisualization.display_facets') + '.PCollectionVisualization._display_dive') + @patch('apache_beam.runners.interactive.display.pcoll_visualization' + '.PCollectionVisualization._display_overview') + @patch('apache_beam.runners.interactive.display.pcoll_visualization' + '.PCollectionVisualization._display_dataframe') def test_dynamic_plotting_update_same_display(self, -mocked_display_facets): +mocked_display_dataframe, Review comment: SG - once you push the commit and tests pass we can merge this. This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. 
For queries about this service, please contact Infrastructure at: us...@infra.apache.org Issue Time Tracking --- Worklog Id: (was: 361191) Time Spent: 1h (was: 50m) > apache_beam.runners.interactive.display.pcoll_visualization_test.PCollectionVisualizationTest.test_dynamic_plotting_update_same_display > is flaky > > > Key: BEAM-8977 > URL: https://issues.apache.org/jira/browse/BEAM-8977 > Project: Beam > Issue Type: Bug > Components: test-failures >Reporter: Valentyn Tymofieiev >Assignee: Ning Kang >Priority: Major > Time Spent: 1h > Remaining Estimate: 0h > > Sample failure: > > [https://builds.apache.org/job/beam_PreCommit_Python_Phrase/1273/testReport/apache_beam.runners.interactive.display.pcoll_visualization_test/PCollectionVisualizationTest/test_dynamic_plotting_update_same_display/] > Error Message > IndexError: list index out of range > Stacktrace > self = > testMethod=test_dynamic_plotting_update_same_display> > mocked_display_facets = id='139889868386376'> > @patch('apache_beam.runners.interactive.display.pcoll_visualization' > '.PCollectionVisualization.display_facets') > def test_dynamic_plotting_update_same_display(self, > mocked_display_facets): > fake_pipeline_result = runner.PipelineResult(runner.PipelineState.RUNNING) > ie.current_env().set_pipeline_result(self._p, fake_pipeline_result) > # Starts async dynamic plotting that never ends in this test. > h = pv.visualize(self._pcoll, dynamic_plotting_interval=0.001) > # Blocking so the above async task can execute some iterations. > time.sleep(1) > # The first iteration doesn't provide updating_pv to display_facets. > _, first_kwargs = mocked_display_facets.call_args_list[0] > self.assertEqual(first_kwargs, {}) > # The following iterations use the same updating_pv to display_facets and so > # on. 
> > _, second_kwargs = mocked_display_facets.call_args_list[1] > E IndexError: list index out of range > apache_beam/runners/interactive/display/pcoll_visualization_test.py:105: > IndexError > Standard Output > > Standard Error > WARNING:apache_beam.runners.interactive.interactive_environment:You cannot > use Interactive Beam features when you are not in an interactive environment > such as a Jupyter notebook or ipython terminal. -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Work logged] (BEAM-8481) Python 3.7 Postcommit test -- frequent timeouts
[ https://issues.apache.org/jira/browse/BEAM-8481?focusedWorklogId=361189=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-361189 ] ASF GitHub Bot logged work on BEAM-8481: Author: ASF GitHub Bot Created on: 17/Dec/19 23:01 Start Date: 17/Dec/19 23:01 Worklog Time Spent: 10m Work Description: tvalentyn commented on issue #10378: [BEAM-8481] Fix a race condition in proto stubs generation. URL: https://github.com/apache/beam/pull/10378#issuecomment-566788097 [BEAM-8979] Remove mypy-protobuf dependency This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org Issue Time Tracking --- Worklog Id: (was: 361189) Time Spent: 3h 50m (was: 3h 40m) > Python 3.7 Postcommit test -- frequent timeouts > --- > > Key: BEAM-8481 > URL: https://issues.apache.org/jira/browse/BEAM-8481 > Project: Beam > Issue Type: Bug > Components: test-failures >Reporter: Ahmet Altay >Assignee: Valentyn Tymofieiev >Priority: Critical > Time Spent: 3h 50m > Remaining Estimate: 0h > > [https://builds.apache.org/job/beam_PostCommit_Python37/] – this suite > seemingly frequently timing out. Other suites are not affected by these > timeouts. From the history, the issues started before Oct 10 and we cannot > pinpoint because history is lost. -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Work logged] (BEAM-7850) Make Environment a top level attribute of PTransform
[ https://issues.apache.org/jira/browse/BEAM-7850?focusedWorklogId=361188=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-361188 ] ASF GitHub Bot logged work on BEAM-7850: Author: ASF GitHub Bot Created on: 17/Dec/19 23:01 Start Date: 17/Dec/19 23:01 Worklog Time Spent: 10m Work Description: youngoli commented on issue #10183: [BEAM-7850] Makes environment ID a top level attribute of PTransform. URL: https://github.com/apache/beam/pull/10183#issuecomment-566788122 Heads up, in #10393 I updated the Go proto compiler version and rebuilt the generated code. You'll probably need to update the generated Go code before merging this (and make sure you update the protoc-gen-go version to the most recent one this time). If it says the generated code is using ProtoVersion3 instead of 2, then your change is good to go. This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org Issue Time Tracking --- Worklog Id: (was: 361188) Time Spent: 2h 40m (was: 2.5h) > Make Environment a top level attribute of PTransform > > > Key: BEAM-7850 > URL: https://issues.apache.org/jira/browse/BEAM-7850 > Project: Beam > Issue Type: Sub-task > Components: beam-model >Reporter: Chamikara Madhusanka Jayalath >Assignee: Chamikara Madhusanka Jayalath >Priority: Major > Time Spent: 2h 40m > Remaining Estimate: 0h > > Currently Environment is not a top level attribute of the PTransform (of > runner API proto). > [https://github.com/apache/beam/blob/master/model/pipeline/src/main/proto/beam_runner_api.proto#L99] > Instead it is hidden inside various payload objects. For example, for ParDo, > environment will be inside SdkFunctionSpec of ParDoPayload. 
> [https://github.com/apache/beam/blob/master/model/pipeline/src/main/proto/beam_runner_api.proto#L99] > > This makes tracking environment of different types of PTransforms harder and > we have to fork code (on the type of PTransform) to extract the Environment > where the PTransform should be executed. It will probably be simpler to just > make Environment a top level attribute of PTransform. -- This message was sent by Atlassian Jira (v8.3.4#803005)
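The issue description above argues that looking up the environment per payload type forces callers to fork on the transform type. A small sketch of that contrast, using plain dataclasses as simplified stand-ins: these classes are hypothetical and do not mirror the real `beam_runner_api.proto` messages beyond the nesting the ticket describes (environment inside `SdkFunctionSpec` inside `ParDoPayload`). `OtherPayload` and both lookup functions are invented for illustration.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class SdkFunctionSpec:
    environment_id: str  # stand-in for the nested environment reference


@dataclass
class ParDoPayload:
    do_fn: SdkFunctionSpec


@dataclass
class OtherPayload:
    # Hypothetical second payload type that nests the environment differently.
    environment_id: str


@dataclass
class PTransform:
    payload: object
    # The ticket's proposal, sketched: a top-level attribute on the transform.
    environment_id: Optional[str] = None


def environment_by_payload_type(transform):
    """Nested layout: one branch per payload type."""
    payload = transform.payload
    if isinstance(payload, ParDoPayload):
        return payload.do_fn.environment_id
    if isinstance(payload, OtherPayload):
        return payload.environment_id
    raise TypeError("unknown payload type: %s" % type(payload).__name__)


def environment_top_level(transform):
    """Proposed layout: the same lookup for every transform type."""
    return transform.environment_id


par_do = PTransform(
    payload=ParDoPayload(do_fn=SdkFunctionSpec(environment_id="env_java")),
    environment_id="env_java")
```

In the nested layout, every new payload type needs another branch; with the top-level attribute the lookup does not change.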
[jira] [Work logged] (BEAM-8481) Python 3.7 Postcommit test -- frequent timeouts
[ https://issues.apache.org/jira/browse/BEAM-8481?focusedWorklogId=361187=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-361187 ] ASF GitHub Bot logged work on BEAM-8481: Author: ASF GitHub Bot Created on: 17/Dec/19 23:01 Start Date: 17/Dec/19 23:01 Worklog Time Spent: 10m Work Description: tvalentyn commented on issue #10378: [BEAM-8481] Fix a race condition in proto stubs generation. URL: https://github.com/apache/beam/pull/10378#issuecomment-566788097 [BEAM-8979] Remove mypy-protobuf dependency This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org Issue Time Tracking --- Worklog Id: (was: 361187) Time Spent: 3h 40m (was: 3.5h) > Python 3.7 Postcommit test -- frequent timeouts > --- > > Key: BEAM-8481 > URL: https://issues.apache.org/jira/browse/BEAM-8481 > Project: Beam > Issue Type: Bug > Components: test-failures >Reporter: Ahmet Altay >Assignee: Valentyn Tymofieiev >Priority: Critical > Time Spent: 3h 40m > Remaining Estimate: 0h > > [https://builds.apache.org/job/beam_PostCommit_Python37/] – this suite > seemingly frequently timing out. Other suites are not affected by these > timeouts. From the history, the issues started before Oct 10 and we cannot > pinpoint because history is lost. -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Work logged] (BEAM-8481) Python 3.7 Postcommit test -- frequent timeouts
[ https://issues.apache.org/jira/browse/BEAM-8481?focusedWorklogId=361190=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-361190 ] ASF GitHub Bot logged work on BEAM-8481: Author: ASF GitHub Bot Created on: 17/Dec/19 23:01 Start Date: 17/Dec/19 23:01 Worklog Time Spent: 10m Work Description: tvalentyn commented on issue #10378: [BEAM-8481] Fix a race condition in proto stubs generation. URL: https://github.com/apache/beam/pull/10378#issuecomment-566788153 Run Python 3.7 PostCommit This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org Issue Time Tracking --- Worklog Id: (was: 361190) Time Spent: 4h (was: 3h 50m) > Python 3.7 Postcommit test -- frequent timeouts > --- > > Key: BEAM-8481 > URL: https://issues.apache.org/jira/browse/BEAM-8481 > Project: Beam > Issue Type: Bug > Components: test-failures >Reporter: Ahmet Altay >Assignee: Valentyn Tymofieiev >Priority: Critical > Time Spent: 4h > Remaining Estimate: 0h > > [https://builds.apache.org/job/beam_PostCommit_Python37/] – this suite > seemingly frequently timing out. Other suites are not affected by these > timeouts. From the history, the issues started before Oct 10 and we cannot > pinpoint because history is lost. -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Work logged] (BEAM-8979) protoc-gen-mypy: program not found or is not executable
[ https://issues.apache.org/jira/browse/BEAM-8979?focusedWorklogId=361185=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-361185 ] ASF GitHub Bot logged work on BEAM-8979: Author: ASF GitHub Bot Created on: 17/Dec/19 23:00 Start Date: 17/Dec/19 23:00 Worklog Time Spent: 10m Work Description: tvalentyn commented on issue #10400: [BEAM-8979] Remove mypy-protobuf dependency URL: https://github.com/apache/beam/pull/10400#issuecomment-566787938 Thanks, @kamilwu and @chadrik . This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org Issue Time Tracking --- Worklog Id: (was: 361185) Time Spent: 1h 40m (was: 1.5h) > protoc-gen-mypy: program not found or is not executable > --- > > Key: BEAM-8979 > URL: https://issues.apache.org/jira/browse/BEAM-8979 > Project: Beam > Issue Type: Bug > Components: sdk-py-core, test-failures >Reporter: Kamil Wasilewski >Assignee: Chad Dombrova >Priority: Major > Time Spent: 1h 40m > Remaining Estimate: 0h > > In some tests, `:sdks:python:sdist:` task fails due to problems in finding > protoc-gen-mypy. The following tests are affected (there might be more): > * > [https://builds.apache.org/job/beam_LoadTests_Python_37_ParDo_Dataflow_Batch_PR/] > * > [https://builds.apache.org/job/beam_BiqQueryIO_Write_Performance_Test_Python_Batch/ > > |https://builds.apache.org/job/beam_BiqQueryIO_Write_Performance_Test_Python_Batch/] > Relevant logs: > {code:java} > 10:46:32 > Task :sdks:python:sdist FAILED > 10:46:32 Requirement already satisfied: mypy-protobuf==1.12 in > /home/jenkins/jenkins-slave/workspace/beam_LoadTests_Python_37_ParDo_Dataflow_Batch_PR/src/build/gradleenv/192237/lib/python3.7/site-packages > (1.12) > 10:46:32 beam_fn_api.proto: warning: Import google/protobuf/descriptor.proto > but not used. 
> 10:46:32 beam_fn_api.proto: warning: Import google/protobuf/wrappers.proto > but not used. > 10:46:32 protoc-gen-mypy: program not found or is not executable > 10:46:32 --mypy_out: protoc-gen-mypy: Plugin failed with status code 1. > 10:46:32 > /home/jenkins/jenkins-slave/workspace/beam_LoadTests_Python_37_ParDo_Dataflow_Batch_PR/src/build/gradleenv/192237/lib/python3.7/site-packages/setuptools/dist.py:476: > UserWarning: Normalizing '2.19.0.dev' to '2.19.0.dev0' > 10:46:32 normalized_version, > 10:46:32 Traceback (most recent call last): > 10:46:32 File "setup.py", line 295, in > 10:46:32 'mypy': generate_protos_first(mypy), > 10:46:32 File > "/home/jenkins/jenkins-slave/workspace/beam_LoadTests_Python_37_ParDo_Dataflow_Batch_PR/src/build/gradleenv/192237/lib/python3.7/site-packages/setuptools/__init__.py", > line 145, in setup > 10:46:32 return distutils.core.setup(**attrs) > 10:46:32 File "/usr/lib/python3.7/distutils/core.py", line 148, in setup > 10:46:32 dist.run_commands() > 10:46:32 File "/usr/lib/python3.7/distutils/dist.py", line 966, in > run_commands > 10:46:32 self.run_command(cmd) > 10:46:32 File "/usr/lib/python3.7/distutils/dist.py", line 985, in > run_command > 10:46:32 cmd_obj.run() > 10:46:32 File > "/home/jenkins/jenkins-slave/workspace/beam_LoadTests_Python_37_ParDo_Dataflow_Batch_PR/src/build/gradleenv/192237/lib/python3.7/site-packages/setuptools/command/sdist.py", > line 44, in run > 10:46:32 self.run_command('egg_info') > 10:46:32 File "/usr/lib/python3.7/distutils/cmd.py", line 313, in > run_command > 10:46:32 self.distribution.run_command(command) > 10:46:32 File "/usr/lib/python3.7/distutils/dist.py", line 985, in > run_command > 10:46:32 cmd_obj.run() > 10:46:32 File "setup.py", line 220, in run > 10:46:32 gen_protos.generate_proto_files(log=log) > 10:46:32 File > "/home/jenkins/jenkins-slave/workspace/beam_LoadTests_Python_37_ParDo_Dataflow_Batch_PR/src/sdks/python/gen_protos.py", > line 144, in generate_proto_files > 10:46:32 '%s' % 
ret_code) > 10:46:32 RuntimeError: Protoc returned non-zero status (see logs for > details): 1 > {code} > > This is what I have tried so far to resolve this (without being successful): > * Including _--plugin=protoc-gen-mypy=\{abs_path_to_executable}_ parameter > to the _protoc_ call in gen_protos.py:131 > * Appending protoc-gen-mypy's directory to the PATH variable > I wasn't able to reproduce this error locally. > -- This message was sent by Atlassian Jira (v8.3.4#803005)
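The log above shows `protoc` failing because it could not find or execute `protoc-gen-mypy`. A minimal sketch of a pre-flight check that resolves the plugin before `protoc` is invoked and fails early with a clearer message. It does not explain the CI failure, and the reporter notes that passing `--plugin` did not help in their case; `find_protoc_plugin` and `plugin_args` are hypothetical helpers, not part of `gen_protos.py`.

```python
import os
import shutil


def find_protoc_plugin(name="protoc-gen-mypy", extra_dirs=()):
    """Returns the absolute path of an executable named `name`, or None.

    Searches `extra_dirs` first, then the directories on PATH.
    """
    search_path = os.pathsep.join(
        list(extra_dirs) + [os.environ.get("PATH", "")])
    found = shutil.which(name, path=search_path)
    return os.path.abspath(found) if found else None


def plugin_args(name="protoc-gen-mypy", extra_dirs=()):
    """Builds a --plugin=NAME=PATH argument for protoc, or raises early."""
    path = find_protoc_plugin(name, extra_dirs)
    if path is None:
        raise RuntimeError(
            "%s not found on PATH or in %r" % (name, list(extra_dirs)))
    return ["--plugin=%s=%s" % (name, path)]
```

A caller could append the result of `plugin_args()` to the `protoc` command line, so that a missing plugin is reported with the directories that were searched.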
[jira] [Work logged] (BEAM-8979) protoc-gen-mypy: program not found or is not executable
[ https://issues.apache.org/jira/browse/BEAM-8979?focusedWorklogId=361184=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-361184 ] ASF GitHub Bot logged work on BEAM-8979: Author: ASF GitHub Bot Created on: 17/Dec/19 23:00 Start Date: 17/Dec/19 23:00 Worklog Time Spent: 10m Work Description: tvalentyn commented on pull request #10400: [BEAM-8979] Remove mypy-protobuf dependency URL: https://github.com/apache/beam/pull/10400 This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org Issue Time Tracking --- Worklog Id: (was: 361184) Time Spent: 1.5h (was: 1h 20m) > protoc-gen-mypy: program not found or is not executable > --- > > Key: BEAM-8979 > URL: https://issues.apache.org/jira/browse/BEAM-8979 > Project: Beam > Issue Type: Bug > Components: sdk-py-core, test-failures >Reporter: Kamil Wasilewski >Assignee: Chad Dombrova >Priority: Major > Time Spent: 1.5h > Remaining Estimate: 0h > > In some tests, `:sdks:python:sdist:` task fails due to problems in finding > protoc-gen-mypy. The following tests are affected (there might be more): > * > [https://builds.apache.org/job/beam_LoadTests_Python_37_ParDo_Dataflow_Batch_PR/] > * > [https://builds.apache.org/job/beam_BiqQueryIO_Write_Performance_Test_Python_Batch/ > > |https://builds.apache.org/job/beam_BiqQueryIO_Write_Performance_Test_Python_Batch/] > Relevant logs: > {code:java} > 10:46:32 > Task :sdks:python:sdist FAILED > 10:46:32 Requirement already satisfied: mypy-protobuf==1.12 in > /home/jenkins/jenkins-slave/workspace/beam_LoadTests_Python_37_ParDo_Dataflow_Batch_PR/src/build/gradleenv/192237/lib/python3.7/site-packages > (1.12) > 10:46:32 beam_fn_api.proto: warning: Import google/protobuf/descriptor.proto > but not used. 
> 10:46:32 beam_fn_api.proto: warning: Import google/protobuf/wrappers.proto > but not used. > 10:46:32 protoc-gen-mypy: program not found or is not executable > 10:46:32 --mypy_out: protoc-gen-mypy: Plugin failed with status code 1. > 10:46:32 > /home/jenkins/jenkins-slave/workspace/beam_LoadTests_Python_37_ParDo_Dataflow_Batch_PR/src/build/gradleenv/192237/lib/python3.7/site-packages/setuptools/dist.py:476: > UserWarning: Normalizing '2.19.0.dev' to '2.19.0.dev0' > 10:46:32 normalized_version, > 10:46:32 Traceback (most recent call last): > 10:46:32 File "setup.py", line 295, in > 10:46:32 'mypy': generate_protos_first(mypy), > 10:46:32 File > "/home/jenkins/jenkins-slave/workspace/beam_LoadTests_Python_37_ParDo_Dataflow_Batch_PR/src/build/gradleenv/192237/lib/python3.7/site-packages/setuptools/__init__.py", > line 145, in setup > 10:46:32 return distutils.core.setup(**attrs) > 10:46:32 File "/usr/lib/python3.7/distutils/core.py", line 148, in setup > 10:46:32 dist.run_commands() > 10:46:32 File "/usr/lib/python3.7/distutils/dist.py", line 966, in > run_commands > 10:46:32 self.run_command(cmd) > 10:46:32 File "/usr/lib/python3.7/distutils/dist.py", line 985, in > run_command > 10:46:32 cmd_obj.run() > 10:46:32 File > "/home/jenkins/jenkins-slave/workspace/beam_LoadTests_Python_37_ParDo_Dataflow_Batch_PR/src/build/gradleenv/192237/lib/python3.7/site-packages/setuptools/command/sdist.py", > line 44, in run > 10:46:32 self.run_command('egg_info') > 10:46:32 File "/usr/lib/python3.7/distutils/cmd.py", line 313, in > run_command > 10:46:32 self.distribution.run_command(command) > 10:46:32 File "/usr/lib/python3.7/distutils/dist.py", line 985, in > run_command > 10:46:32 cmd_obj.run() > 10:46:32 File "setup.py", line 220, in run > 10:46:32 gen_protos.generate_proto_files(log=log) > 10:46:32 File > "/home/jenkins/jenkins-slave/workspace/beam_LoadTests_Python_37_ParDo_Dataflow_Batch_PR/src/sdks/python/gen_protos.py", > line 144, in generate_proto_files > 10:46:32 '%s' % 
ret_code) > 10:46:32 RuntimeError: Protoc returned non-zero status (see logs for > details): 1 > {code} > > This is what I have tried so far to resolve this (without being successful): > * Including _--plugin=protoc-gen-mypy=\{abs_path_to_executable}_ parameter > to the _protoc_ call in gen_protos.py:131 > * Appending protoc-gen-mypy's directory to the PATH variable > I wasn't able to reproduce this error locally. > -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Work logged] (BEAM-8481) Python 3.7 Postcommit test -- frequent timeouts
[ https://issues.apache.org/jira/browse/BEAM-8481?focusedWorklogId=361183=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-361183 ] ASF GitHub Bot logged work on BEAM-8481: Author: ASF GitHub Bot Created on: 17/Dec/19 22:56 Start Date: 17/Dec/19 22:56 Worklog Time Spent: 10m Work Description: tvalentyn commented on issue #10378: [BEAM-8481] Fix a race condition in proto stubs generation. URL: https://github.com/apache/beam/pull/10378#issuecomment-566786851 Run Python 3.6 PostCommit This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org Issue Time Tracking --- Worklog Id: (was: 361183) Time Spent: 3.5h (was: 3h 20m) > Python 3.7 Postcommit test -- frequent timeouts > --- > > Key: BEAM-8481 > URL: https://issues.apache.org/jira/browse/BEAM-8481 > Project: Beam > Issue Type: Bug > Components: test-failures >Reporter: Ahmet Altay >Assignee: Valentyn Tymofieiev >Priority: Critical > Time Spent: 3.5h > Remaining Estimate: 0h > > [https://builds.apache.org/job/beam_PostCommit_Python37/] – this suite > seemingly frequently timing out. Other suites are not affected by these > timeouts. From the history, the issues started before Oct 10 and we cannot > pinpoint because history is lost. -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Work logged] (BEAM-7274) Protobuf Beam Schema support
[ https://issues.apache.org/jira/browse/BEAM-7274?focusedWorklogId=361178=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-361178 ] ASF GitHub Bot logged work on BEAM-7274: Author: ASF GitHub Bot Created on: 17/Dec/19 22:49 Start Date: 17/Dec/19 22:49 Worklog Time Spent: 10m Work Description: reuvenlax commented on issue #10356: [BEAM-7274] Infer a Beam Schema from a protocol buffer class. URL: https://github.com/apache/beam/pull/10356#issuecomment-566784391 Friendly ping. @alexvanboxel do you have any thoughts on this PR? This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org Issue Time Tracking --- Worklog Id: (was: 361178) Time Spent: 17h 20m (was: 17h 10m) > Protobuf Beam Schema support > > > Key: BEAM-7274 > URL: https://issues.apache.org/jira/browse/BEAM-7274 > Project: Beam > Issue Type: Improvement > Components: sdk-java-core >Reporter: Alex Van Boxel >Assignee: Alex Van Boxel >Priority: Minor > Time Spent: 17h 20m > Remaining Estimate: 0h > > Add support for the new Beam Schema to the Protobuf extension. -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Work logged] (BEAM-8335) Add streaming support to Interactive Beam
[ https://issues.apache.org/jira/browse/BEAM-8335?focusedWorklogId=361175=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-361175 ] ASF GitHub Bot logged work on BEAM-8335: Author: ASF GitHub Bot Created on: 17/Dec/19 22:45 Start Date: 17/Dec/19 22:45 Worklog Time Spent: 10m Work Description: davidyan74 commented on pull request #10405: [BEAM-8335] Background caching job URL: https://github.com/apache/beam/pull/10405#discussion_r359069339 ## File path: sdks/python/apache_beam/runners/interactive/interactive_runner.py ## @@ -125,6 +125,30 @@ def apply(self, transform, pvalueish, options): def run_pipeline(self, pipeline, options): pipeline_instrument = inst.pin(pipeline, options) +# The user_pipeline analyzed might be None if the pipeline given has nothing +# to be cached and tracing back to the user defined pipeline is impossible. +# When it's None, there is no need to cache including the background +# caching job and no result to track since no background caching job is +# started at all. +user_pipeline = pipeline_instrument.user_pipeline + +background_caching_job_result = ie.current_env().pipeline_result( +user_pipeline, is_main_job=False) +if (not background_caching_job_result or Review comment: Discussed offline. This block of code is a bit hard to read. Ning will do a minor refactor to make this clearer. This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. 
For queries about this service, please contact Infrastructure at: us...@infra.apache.org Issue Time Tracking --- Worklog Id: (was: 361175) Time Spent: 48h 50m (was: 48h 40m) > Add streaming support to Interactive Beam > - > > Key: BEAM-8335 > URL: https://issues.apache.org/jira/browse/BEAM-8335 > Project: Beam > Issue Type: Improvement > Components: runner-py-interactive >Reporter: Sam Rohde >Assignee: Sam Rohde >Priority: Major > Time Spent: 48h 50m > Remaining Estimate: 0h > > This issue tracks the work items to introduce streaming support to the > Interactive Beam experience. This will allow users to: > * Write and run a streaming job in IPython > * Automatically cache records from unbounded sources > * Add a replay experience that replays all cached records to simulate the > original pipeline execution > * Add controls to play/pause/stop/step individual elements from the cached > records > * Add ability to inspect/visualize unbounded PCollections -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Work logged] (BEAM-8933) BigQuery IO should support read/write in Arrow format
[ https://issues.apache.org/jira/browse/BEAM-8933?focusedWorklogId=361176=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-361176 ] ASF GitHub Bot logged work on BEAM-8933: Author: ASF GitHub Bot Created on: 17/Dec/19 22:45 Start Date: 17/Dec/19 22:45 Worklog Time Spent: 10m Work Description: iemejia commented on issue #10384: [WIP] [BEAM-8933] Utilities for converting Arrow schemas and reading Arrow batches as Rows URL: https://github.com/apache/beam/pull/10384#issuecomment-566783361 Yes, exactly as in your last proposal. We already did that for ZetaSQL in the past too (making it an extension that is a dep of GCP IO). Thanks for understanding my point. As an extra point, a module for Arrow-related code may end up benefiting other IOs without the tradeoff of leaking too many things into core. This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org Issue Time Tracking --- Worklog Id: (was: 361176) Time Spent: 3.5h (was: 3h 20m) > BigQuery IO should support read/write in Arrow format > - > > Key: BEAM-8933 > URL: https://issues.apache.org/jira/browse/BEAM-8933 > Project: Beam > Issue Type: Improvement > Components: io-java-gcp >Reporter: Kirill Kozlov >Assignee: Kirill Kozlov >Priority: Major > Time Spent: 3.5h > Remaining Estimate: 0h > > As of right now BigQuery uses Avro format for reading and writing. > We should add a config to BigQueryIO to specify which format to use: Arrow or > Avro (with Avro as default). -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Work logged] (BEAM-8960) Add an option for user to be able to opt out of using insert id for BigQuery streaming insert.
[ https://issues.apache.org/jira/browse/BEAM-8960?focusedWorklogId=361174=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-361174 ] ASF GitHub Bot logged work on BEAM-8960: Author: ASF GitHub Bot Created on: 17/Dec/19 22:44 Start Date: 17/Dec/19 22:44 Worklog Time Spent: 10m Work Description: yirutang commented on issue #10408: [BEAM-8960]: Add an option for user to opt out of using insert id for… URL: https://github.com/apache/beam/pull/10408#issuecomment-566782511 R: @reuvenlax This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org Issue Time Tracking --- Worklog Id: (was: 361174) Remaining Estimate: 23.5h (was: 23h 40m) Time Spent: 0.5h (was: 20m) > Add an option for user to be able to opt out of using insert id for BigQuery > streaming insert. > -- > > Key: BEAM-8960 > URL: https://issues.apache.org/jira/browse/BEAM-8960 > Project: Beam > Issue Type: New Feature > Components: io-java-gcp >Reporter: Yiru Tang >Priority: Minor > Original Estimate: 24h > Time Spent: 0.5h > Remaining Estimate: 23.5h > > BigQuery streaming insert ids offer best-effort insert deduplication. If users > choose to opt out of using insert ids, they could potentially opt into > using our new streaming backend, which gives higher speed and more > quota. Insert id deduplication is best effort and does not provide ultimate > just-once guarantees. -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Work logged] (BEAM-8960) Add an option for user to be able to opt out of using insert id for BigQuery streaming insert.
[ https://issues.apache.org/jira/browse/BEAM-8960?focusedWorklogId=361172=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-361172 ] ASF GitHub Bot logged work on BEAM-8960: Author: ASF GitHub Bot Created on: 17/Dec/19 22:43 Start Date: 17/Dec/19 22:43 Worklog Time Spent: 10m Work Description: yirutang commented on issue #10408: [BEAM-8960]: Add an option for user to opt out of using insert id for… URL: https://github.com/apache/beam/pull/10408#issuecomment-566782511 R:reuvenlax This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org Issue Time Tracking --- Worklog Id: (was: 361172) Remaining Estimate: 23h 40m (was: 23h 50m) Time Spent: 20m (was: 10m) > Add an option for user to be able to opt out of using insert id for BigQuery > streaming insert. > -- > > Key: BEAM-8960 > URL: https://issues.apache.org/jira/browse/BEAM-8960 > Project: Beam > Issue Type: New Feature > Components: io-java-gcp >Reporter: Yiru Tang >Priority: Minor > Original Estimate: 24h > Time Spent: 20m > Remaining Estimate: 23h 40m > > BigQuery streaming insert ids offer best-effort insert deduplication. If users > choose to opt out of using insert ids, they could potentially opt into > using our new streaming backend, which gives higher speed and more > quota. Insert id deduplication is best effort and does not provide ultimate > just-once guarantees. -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Work logged] (BEAM-8630) Prototype of BeamSQL Calc using ZetaSQL Expression Evaluator
[ https://issues.apache.org/jira/browse/BEAM-8630?focusedWorklogId=361171=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-361171 ] ASF GitHub Bot logged work on BEAM-8630: Author: ASF GitHub Bot Created on: 17/Dec/19 22:43 Start Date: 17/Dec/19 22:43 Worklog Time Spent: 10m Work Description: robinyqiu commented on pull request #9913: [BEAM-8630] Prototype of BeamZetaSqlCalcRel URL: https://github.com/apache/beam/pull/9913#discussion_r359068383 ## File path: sdks/java/extensions/sql/src/main/java/org/apache/beam/sdk/extensions/sql/impl/planner/BeamRuleSets.java ## @@ -159,14 +158,42 @@ BeamCoGBKJoinRule.INSTANCE, BeamSideInputLookupJoinRule.INSTANCE); + private static final List BEAM_CONVERTERS_CALCITE_SQL_ONLY = + ImmutableList.of(BeamCalcRule.INSTANCE); + private static final List BEAM_TO_ENUMERABLE = ImmutableList.of(BeamEnumerableConverterRule.INSTANCE); public static RuleSet[] getRuleSets() { return new RuleSet[] { RuleSets.ofList( ImmutableList.builder() - .addAll(BEAM_CONVERTERS) + .addAll(BEAM_CONVERTERS_COMMON) + .addAll(BEAM_CONVERTERS_CALCITE_SQL_ONLY) + .addAll(BEAM_TO_ENUMERABLE) + .addAll(LOGICAL_OPTIMIZATIONS) + .build()) +}; + } + + /** Returns the rule sets that allow using ZetaSQL evaluator for Calc. */ Review comment: Done. Thanks for the suggestion! That makes the code cleaner. This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. 
For queries about this service, please contact Infrastructure at: us...@infra.apache.org Issue Time Tracking --- Worklog Id: (was: 361171) Time Spent: 1h (was: 50m) > Prototype of BeamSQL Calc using ZetaSQL Expression Evaluator > > > Key: BEAM-8630 > URL: https://issues.apache.org/jira/browse/BEAM-8630 > Project: Beam > Issue Type: New Feature > Components: dsl-sql >Reporter: Yueyang Qiu >Assignee: Yueyang Qiu >Priority: Major > Time Spent: 1h > Remaining Estimate: 0h > -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Work logged] (BEAM-8960) Add an option for user to be able to opt out of using insert id for BigQuery streaming insert.
[ https://issues.apache.org/jira/browse/BEAM-8960?focusedWorklogId=361165=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-361165 ] ASF GitHub Bot logged work on BEAM-8960: Author: ASF GitHub Bot Created on: 17/Dec/19 22:42 Start Date: 17/Dec/19 22:42 Worklog Time Spent: 10m Work Description: yirutang commented on pull request #10408: [BEAM-8960]: Add an option for user to opt out of using insert id for… URL: https://github.com/apache/beam/pull/10408 Expose an option so that users can opt out of using insert ids while streaming into BigQuery. Insert ids only guarantee best-effort row deduplication; without them, users will be able to opt into the new streaming backend, which offers higher quotas and reliability. This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org Issue Time Tracking --- Worklog Id: (was: 361165) Remaining Estimate: 23h 50m (was: 24h) Time Spent: 10m > Add an option for user to be able to opt out of using insert id for BigQuery > streaming insert. > -- > > Key: BEAM-8960 > URL: https://issues.apache.org/jira/browse/BEAM-8960 > Project: Beam > Issue Type: New Feature > Components: io-java-gcp >Reporter: Yiru Tang >Priority: Minor > Original Estimate: 24h > Time Spent: 10m > Remaining Estimate: 23h 50m > > BigQuery streaming insert ids offer best-effort insert deduplication. If users > choose to opt out of using insert ids, they could potentially opt into > using our new streaming backend, which gives higher speed and more > quota. Insert id deduplication is best effort and does not provide ultimate > just-once guarantees. -- This message was sent by Atlassian Jira (v8.3.4#803005)
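To illustrate what "best effort" deduplication means in the BEAM-8960 description above, here is a toy sketch. It is not how BigQuery or BigQueryIO implements insert ids; the class `BestEffortDeduplicator`, its `capacity` parameter, and the bounded-memory model are assumptions made purely for illustration. The point it shows is that a deduplicator remembering only a limited window of ids can let a late duplicate through, so it cannot offer a hard just-once guarantee.

```python
from collections import OrderedDict


class BestEffortDeduplicator:
    """Toy model (hypothetical): remembers only the newest `capacity` insert ids."""

    def __init__(self, capacity):
        self._capacity = capacity
        self._seen = OrderedDict()

    def accept(self, insert_id):
        """Return True if the row is written, False if dropped as a duplicate."""
        if insert_id is None:
            # Models opting out of insert ids: no deduplication is attempted.
            return True
        if insert_id in self._seen:
            self._seen.move_to_end(insert_id)
            return False
        self._seen[insert_id] = True
        if len(self._seen) > self._capacity:
            # Forget the oldest id; a later retry with that id is not caught.
            self._seen.popitem(last=False)
        return True


dedup = BestEffortDeduplicator(capacity=2)
results = [dedup.accept(i) for i in ["a", "a", "b", "c", "a"]]
print(results)
```

In this model the immediate retry of "a" is dropped, but the final "a" arrives after its id has been forgotten and is written a second time.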
[jira] [Work logged] (BEAM-8630) Prototype of BeamSQL Calc using ZetaSQL Expression Evaluator
[ https://issues.apache.org/jira/browse/BEAM-8630?focusedWorklogId=361166=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-361166 ] ASF GitHub Bot logged work on BEAM-8630: Author: ASF GitHub Bot Created on: 17/Dec/19 22:42 Start Date: 17/Dec/19 22:42 Worklog Time Spent: 10m Work Description: robinyqiu commented on pull request #9913: [BEAM-8630] Prototype of BeamZetaSqlCalcRel URL: https://github.com/apache/beam/pull/9913#discussion_r359068139 ## File path: sdks/java/extensions/sql/zetasql/src/test/java/org/apache/beam/sdk/extensions/sql/zetasql/ZetaSQLDialectSpecTest.java ## @@ -3755,6 +3755,8 @@ private void initializeCalciteEnvironmentWithContext(Context... extraContext) { .defaultSchema(defaultSchemaPlus) .traitDefs(traitDefs) .context(Contexts.of(contexts)) +// TODO[BEAM-8630]: change to BeamRuleSets.getZetaSqlRuleSets() Review comment: Done. This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org Issue Time Tracking --- Worklog Id: (was: 361166) Time Spent: 40m (was: 0.5h) > Prototype of BeamSQL Calc using ZetaSQL Expression Evaluator > > > Key: BEAM-8630 > URL: https://issues.apache.org/jira/browse/BEAM-8630 > Project: Beam > Issue Type: New Feature > Components: dsl-sql >Reporter: Yueyang Qiu >Assignee: Yueyang Qiu >Priority: Major > Time Spent: 40m > Remaining Estimate: 0h > -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Work logged] (BEAM-8630) Prototype of BeamSQL Calc using ZetaSQL Expression Evaluator
[ https://issues.apache.org/jira/browse/BEAM-8630?focusedWorklogId=361167=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-361167 ] ASF GitHub Bot logged work on BEAM-8630: Author: ASF GitHub Bot Created on: 17/Dec/19 22:42 Start Date: 17/Dec/19 22:42 Worklog Time Spent: 10m Work Description: robinyqiu commented on pull request #9913: [BEAM-8630] Prototype of BeamZetaSqlCalcRel URL: https://github.com/apache/beam/pull/9913#discussion_r359068178 ## File path: sdks/java/extensions/sql/zetasql/src/test/java/org/apache/beam/sdk/extensions/sql/zetasql/ZetaSQLPushDownTest.java ## @@ -187,6 +187,8 @@ private static void initializeCalciteEnvironmentWithContext(Context... extraCont .defaultSchema(defaultSchemaPlus) .traitDefs(traitDefs) .context(Contexts.of(contexts)) +// TODO[BEAM-8630]: change to BeamRuleSets.getZetaSqlRuleSets() Review comment: Done. This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org Issue Time Tracking --- Worklog Id: (was: 361167) Time Spent: 50m (was: 40m) > Prototype of BeamSQL Calc using ZetaSQL Expression Evaluator > > > Key: BEAM-8630 > URL: https://issues.apache.org/jira/browse/BEAM-8630 > Project: Beam > Issue Type: New Feature > Components: dsl-sql >Reporter: Yueyang Qiu >Assignee: Yueyang Qiu >Priority: Major > Time Spent: 50m > Remaining Estimate: 0h > -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Work logged] (BEAM-8977) apache_beam.runners.interactive.display.pcoll_visualization_test.PCollectionVisualizationTest.test_dynamic_plotting_update_same_display is flaky
[ https://issues.apache.org/jira/browse/BEAM-8977?focusedWorklogId=361160=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-361160 ] ASF GitHub Bot logged work on BEAM-8977: Author: ASF GitHub Bot Created on: 17/Dec/19 22:26 Start Date: 17/Dec/19 22:26 Worklog Time Spent: 10m Work Description: KevinGG commented on pull request #10404: [BEAM-8977] Resolve test flakiness URL: https://github.com/apache/beam/pull/10404#discussion_r359061791 ## File path: sdks/python/apache_beam/runners/interactive/display/pcoll_visualization_test.py ## @@ -88,44 +87,40 @@ def test_dynamic_plotting_return_handle(self): h.stop() @patch('apache_beam.runners.interactive.display.pcoll_visualization' - '.PCollectionVisualization.display_facets') + '.PCollectionVisualization._display_dive') + @patch('apache_beam.runners.interactive.display.pcoll_visualization' + '.PCollectionVisualization._display_overview') + @patch('apache_beam.runners.interactive.display.pcoll_visualization' + '.PCollectionVisualization._display_dataframe') def test_dynamic_plotting_update_same_display(self, -mocked_display_facets): +mocked_display_dataframe, Review comment: Yes, we are verifying that the same display is being updated. Renaming the test to `test_dynamic_plotting_updates_same_display`. This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. 
For queries about this service, please contact Infrastructure at: us...@infra.apache.org Issue Time Tracking --- Worklog Id: (was: 361160) Time Spent: 50m (was: 40m) > apache_beam.runners.interactive.display.pcoll_visualization_test.PCollectionVisualizationTest.test_dynamic_plotting_update_same_display > is flaky > > > Key: BEAM-8977 > URL: https://issues.apache.org/jira/browse/BEAM-8977 > Project: Beam > Issue Type: Bug > Components: test-failures >Reporter: Valentyn Tymofieiev >Assignee: Ning Kang >Priority: Major > Time Spent: 50m > Remaining Estimate: 0h > > Sample failure: > > [https://builds.apache.org/job/beam_PreCommit_Python_Phrase/1273/testReport/apache_beam.runners.interactive.display.pcoll_visualization_test/PCollectionVisualizationTest/test_dynamic_plotting_update_same_display/] > Error Message > IndexError: list index out of range > Stacktrace > self = > testMethod=test_dynamic_plotting_update_same_display> > mocked_display_facets = id='139889868386376'> > @patch('apache_beam.runners.interactive.display.pcoll_visualization' > '.PCollectionVisualization.display_facets') > def test_dynamic_plotting_update_same_display(self, > mocked_display_facets): > fake_pipeline_result = runner.PipelineResult(runner.PipelineState.RUNNING) > ie.current_env().set_pipeline_result(self._p, fake_pipeline_result) > # Starts async dynamic plotting that never ends in this test. > h = pv.visualize(self._pcoll, dynamic_plotting_interval=0.001) > # Blocking so the above async task can execute some iterations. > time.sleep(1) > # The first iteration doesn't provide updating_pv to display_facets. > _, first_kwargs = mocked_display_facets.call_args_list[0] > self.assertEqual(first_kwargs, {}) > # The following iterations use the same updating_pv to display_facets and so > # on. 
> > _, second_kwargs = mocked_display_facets.call_args_list[1] > E IndexError: list index out of range > apache_beam/runners/interactive/display/pcoll_visualization_test.py:105: > IndexError > Standard Output > > Standard Error > WARNING:apache_beam.runners.interactive.interactive_environment:You cannot > use Interactive Beam features when you are not in an interactive environment > such as a Jupyter notebook or ipython terminal. -- This message was sent by Atlassian Jira (v8.3.4#803005)
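The sample failure quoted above reads `call_args_list[1]` after a fixed `time.sleep(1)`, which raises `IndexError` whenever the background task has made fewer than two calls by then. The sketch below shows one general way to avoid that kind of race: poll the mock's call count with a deadline instead of sleeping for a fixed time. It is not the fix applied in the pull request; `wait_for_calls` and `run_periodically` are hypothetical helpers that stand in for the real plotting loop, and only the standard library is used.

```python
import threading
import time
from unittest.mock import MagicMock


def wait_for_calls(mock, expected_calls, timeout=5.0, poll_interval=0.01):
    """Poll until `mock` has at least `expected_calls` calls or `timeout` passes."""
    deadline = time.monotonic() + timeout
    while time.monotonic() < deadline:
        if mock.call_count >= expected_calls:
            return True
        time.sleep(poll_interval)
    return mock.call_count >= expected_calls


def run_periodically(display, stop_event, interval):
    """Hypothetical stand-in for an async loop that redraws the same display."""
    updating_pv = None
    while not stop_event.is_set():
        if updating_pv is None:
            display()  # The first iteration passes no updating_pv.
            updating_pv = "pv"
        else:
            display(updating_pv=updating_pv)
        time.sleep(interval)


mocked_display = MagicMock()
stop_event = threading.Event()
worker = threading.Thread(
    target=run_periodically, args=(mocked_display, stop_event, 0.001), daemon=True)
worker.start()
try:
    # Wait for the two calls the assertions need, instead of a fixed sleep.
    reached = wait_for_calls(mocked_display, expected_calls=2)
finally:
    stop_event.set()
    worker.join()

_, first_kwargs = mocked_display.call_args_list[0]
_, second_kwargs = mocked_display.call_args_list[1]
```

The test still fails if the calls never happen, but it fails on the timeout rather than on an index error, and it returns as soon as the calls arrive.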
[jira] [Work logged] (BEAM-8975) Add Thrift Parser for ThriftIO
[ https://issues.apache.org/jira/browse/BEAM-8975?focusedWorklogId=361151=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-361151 ] ASF GitHub Bot logged work on BEAM-8975: Author: ASF GitHub Bot Created on: 17/Dec/19 22:00 Start Date: 17/Dec/19 22:00 Worklog Time Spent: 10m Work Description: chrlarsen commented on issue #10395: [BEAM-8975] Add Thrift parser for ThriftIO URL: https://github.com/apache/beam/pull/10395#issuecomment-566768401 Run Python PreCommit This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org Issue Time Tracking --- Worklog Id: (was: 361151) Time Spent: 1h 10m (was: 1h) > Add Thrift Parser for ThriftIO > -- > > Key: BEAM-8975 > URL: https://issues.apache.org/jira/browse/BEAM-8975 > Project: Beam > Issue Type: New Feature > Components: io-java-files >Reporter: Chris Larsen >Assignee: Chris Larsen >Priority: Minor > Time Spent: 1h 10m > Remaining Estimate: 0h > > This ticket is related to > [BEAM-8561|https://issues.apache.org/jira/projects/BEAM/issues/BEAM-8561?filter=allissues]. > As there are a large number of files to review for the > [PR|https://github.com/apache/beam/pull/10290] for ThriftIO this ticket will > serve as the tracker for the submission of a PR relating to the parser and > document model that will be used by ThriftIO. The aim is to reduce the number > of files submitted with each PR. -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Work logged] (BEAM-8296) Containerize the Spark job server
[ https://issues.apache.org/jira/browse/BEAM-8296?focusedWorklogId=361149=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-361149 ] ASF GitHub Bot logged work on BEAM-8296: Author: ASF GitHub Bot Created on: 17/Dec/19 21:58 Start Date: 17/Dec/19 21:58 Worklog Time Spent: 10m Work Description: ibzib commented on pull request #10407: [BEAM-8296] containerize spark job server URL: https://github.com/apache/beam/pull/10407 **Please** add a meaningful description for your change here Thank you for your contribution! Follow this checklist to help us incorporate your contribution quickly and easily: - [ ] [**Choose reviewer(s)**](https://beam.apache.org/contribute/#make-your-change) and mention them in a comment (`R: @username`). - [ ] Format the pull request title like `[BEAM-XXX] Fixes bug in ApproximateQuantiles`, where you replace `BEAM-XXX` with the appropriate JIRA issue, if applicable. This will automatically link the pull request to the issue. - [ ] If this contribution is large, please file an Apache [Individual Contributor License Agreement](https://www.apache.org/licenses/icla.pdf). See the [Contributor Guide](https://beam.apache.org/contribute) for more tips on [how to make review process smoother](https://beam.apache.org/contribute/#make-reviewers-job-easier). 
[jira] [Work logged] (BEAM-8561) Add ThriftIO to Support IO for Thrift Files
[ https://issues.apache.org/jira/browse/BEAM-8561?focusedWorklogId=361137=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-361137 ] ASF GitHub Bot logged work on BEAM-8561: Author: ASF GitHub Bot Created on: 17/Dec/19 21:45 Start Date: 17/Dec/19 21:45 Worklog Time Spent: 10m Work Description: chrlarsen commented on pull request #10290: [BEAM-8561] Add ThriftIO to support IO for Thrift files URL: https://github.com/apache/beam/pull/10290#discussion_r359045633 ## File path: sdks/java/io/thrift/src/main/java/org/apache/beam/sdk/io/thrift/parser/model/IntegerEnum.java ## @@ -0,0 +1,78 @@ +/* + * Licensed to the Apache Software Foundation (ASF) under one + * or more contributor license agreements. See the NOTICE file + * distributed with this work for additional information + * regarding copyright ownership. The ASF licenses this file + * to you under the Apache License, Version 2.0 (the + * "License"); you may not use this file except in compliance + * with the License. You may obtain a copy of the License at + * + * http://www.apache.org/licenses/LICENSE-2.0 + * + * Unless required by applicable law or agreed to in writing, software + * distributed under the License is distributed on an "AS IS" BASIS, + * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. + * See the License for the specific language governing permissions and + * limitations under the License. + */ +package org.apache.beam.sdk.io.thrift.parser.model; + +import static org.apache.beam.vendor.guava.v26_0_jre.com.google.common.base.Preconditions.checkNotNull; + +import java.util.List; +import java.util.Objects; +import org.apache.beam.vendor.guava.v26_0_jre.com.google.common.base.MoreObjects; Review comment: For the Beam modules the dependencies are defined in the build.gradle for the respective module. We import the vendored Guava version there and then import that into the code. This is an automated message from the Apache Git Service. 
To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org Issue Time Tracking --- Worklog Id: (was: 361137) Time Spent: 7h 10m (was: 7h) > Add ThriftIO to Support IO for Thrift Files > --- > > Key: BEAM-8561 > URL: https://issues.apache.org/jira/browse/BEAM-8561 > Project: Beam > Issue Type: New Feature > Components: io-java-files >Reporter: Chris Larsen >Assignee: Chris Larsen >Priority: Major > Time Spent: 7h 10m > Remaining Estimate: 0h > > Similar to AvroIO it would be very useful to support reading and writing > to/from Thrift files with a native connector. > Functionality would include: > # read() - Reading from one or more Thrift files. > # write() - Writing to one or more Thrift files. -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Work logged] (BEAM-8977) apache_beam.runners.interactive.display.pcoll_visualization_test.PCollectionVisualizationTest.test_dynamic_plotting_update_same_display is flaky
[ https://issues.apache.org/jira/browse/BEAM-8977?focusedWorklogId=361136=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-361136 ] ASF GitHub Bot logged work on BEAM-8977: Author: ASF GitHub Bot Created on: 17/Dec/19 21:44 Start Date: 17/Dec/19 21:44 Worklog Time Spent: 10m Work Description: tvalentyn commented on pull request #10404: [BEAM-8977] Resolve test flakiness URL: https://github.com/apache/beam/pull/10404#discussion_r359044800 ## File path: sdks/python/apache_beam/runners/interactive/display/pcoll_visualization_test.py ## @@ -88,44 +87,40 @@ def test_dynamic_plotting_return_handle(self): h.stop() @patch('apache_beam.runners.interactive.display.pcoll_visualization' - '.PCollectionVisualization.display_facets') + '.PCollectionVisualization._display_dive') + @patch('apache_beam.runners.interactive.display.pcoll_visualization' + '.PCollectionVisualization._display_overview') + @patch('apache_beam.runners.interactive.display.pcoll_visualization' + '.PCollectionVisualization._display_dataframe') def test_dynamic_plotting_update_same_display(self, -mocked_display_facets): +mocked_display_dataframe, Review comment: Are we verifying that the same display is being updated? If so consider `s/update/updates` in `test_dynamic_plotting_update_same_display`, if not consider a name that communicates the expected behavior. This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. 
For queries about this service, please contact Infrastructure at: us...@infra.apache.org Issue Time Tracking --- Worklog Id: (was: 361136) Time Spent: 40m (was: 0.5h) > apache_beam.runners.interactive.display.pcoll_visualization_test.PCollectionVisualizationTest.test_dynamic_plotting_update_same_display > is flaky > > > Key: BEAM-8977 > URL: https://issues.apache.org/jira/browse/BEAM-8977 > Project: Beam > Issue Type: Bug > Components: test-failures >Reporter: Valentyn Tymofieiev >Assignee: Ning Kang >Priority: Major > Time Spent: 40m > Remaining Estimate: 0h > > Sample failure: > > [https://builds.apache.org/job/beam_PreCommit_Python_Phrase/1273/testReport/apache_beam.runners.interactive.display.pcoll_visualization_test/PCollectionVisualizationTest/test_dynamic_plotting_update_same_display/] > Error Message > IndexError: list index out of range > Stacktrace > self = > testMethod=test_dynamic_plotting_update_same_display> > mocked_display_facets = id='139889868386376'> > @patch('apache_beam.runners.interactive.display.pcoll_visualization' > '.PCollectionVisualization.display_facets') > def test_dynamic_plotting_update_same_display(self, > mocked_display_facets): > fake_pipeline_result = runner.PipelineResult(runner.PipelineState.RUNNING) > ie.current_env().set_pipeline_result(self._p, fake_pipeline_result) > # Starts async dynamic plotting that never ends in this test. > h = pv.visualize(self._pcoll, dynamic_plotting_interval=0.001) > # Blocking so the above async task can execute some iterations. > time.sleep(1) > # The first iteration doesn't provide updating_pv to display_facets. > _, first_kwargs = mocked_display_facets.call_args_list[0] > self.assertEqual(first_kwargs, {}) > # The following iterations use the same updating_pv to display_facets and so > # on. 
> > _, second_kwargs = mocked_display_facets.call_args_list[1] > E IndexError: list index out of range > apache_beam/runners/interactive/display/pcoll_visualization_test.py:105: > IndexError > Standard Output > > Standard Error > WARNING:apache_beam.runners.interactive.interactive_environment:You cannot > use Interactive Beam features when you are not in an interactive environment > such as a Jupyter notebook or ipython terminal. -- This message was sent by Atlassian Jira (v8.3.4#803005)
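A note on the failure mode quoted above: the test sleeps for a fixed second and then indexes `call_args_list[1]`, so it raises IndexError whenever the background task has not yet made a second call. A minimal sketch of one possible alternative, polling the mock until enough calls have been recorded, is given below. It is not the change made in PR #10404; `wait_for_calls`, `plot_loop` and the stand-in `display` mock are hypothetical names used only for illustration, and only the standard library is assumed.

```python
import threading
import time
from unittest import mock


def wait_for_calls(mocked, min_calls, timeout=5.0, poll_interval=0.01):
    """Polls until `mocked` has at least `min_calls` calls or `timeout` elapses.

    Hypothetical helper, not part of Beam. Returns True if the call count
    was reached, False otherwise.
    """
    deadline = time.monotonic() + timeout
    while time.monotonic() < deadline:
        if mocked.call_count >= min_calls:
            return True
        time.sleep(poll_interval)
    return mocked.call_count >= min_calls


# Hypothetical stand-in for the mocked display method in the real test.
display = mock.Mock()


def plot_loop(stop_event):
    """Imitates an async plotting task: the first call has no kwargs,
    later calls pass `updating_pv`."""
    first = True
    while not stop_event.is_set():
        if first:
            display()
            first = False
        else:
            display(updating_pv='pv')
        time.sleep(0.001)


stop = threading.Event()
worker = threading.Thread(target=plot_loop, args=(stop,), daemon=True)
worker.start()
try:
    # Wait for the condition the assertions need, not for a fixed duration.
    ok = wait_for_calls(display, 2)
finally:
    stop.set()
    worker.join()

# Each entry of call_args_list unpacks to (args, kwargs).
_, first_kwargs = display.call_args_list[0]
_, second_kwargs = display.call_args_list[1]
```

Polling with a deadline keeps the test fast when the background task is quick and still fails cleanly (via the returned flag) when it is not, instead of failing on an index lookup.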
[jira] [Work logged] (BEAM-8977) apache_beam.runners.interactive.display.pcoll_visualization_test.PCollectionVisualizationTest.test_dynamic_plotting_update_same_display is flaky
[ https://issues.apache.org/jira/browse/BEAM-8977?focusedWorklogId=361135=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-361135 ] ASF GitHub Bot logged work on BEAM-8977: Author: ASF GitHub Bot Created on: 17/Dec/19 21:43 Start Date: 17/Dec/19 21:43 Worklog Time Spent: 10m Work Description: tvalentyn commented on pull request #10404: [BEAM-8977] Resolve test flakiness URL: https://github.com/apache/beam/pull/10404#discussion_r359044800 ## File path: sdks/python/apache_beam/runners/interactive/display/pcoll_visualization_test.py ## @@ -88,44 +87,40 @@ def test_dynamic_plotting_return_handle(self): h.stop() @patch('apache_beam.runners.interactive.display.pcoll_visualization' - '.PCollectionVisualization.display_facets') + '.PCollectionVisualization._display_dive') + @patch('apache_beam.runners.interactive.display.pcoll_visualization' + '.PCollectionVisualization._display_overview') + @patch('apache_beam.runners.interactive.display.pcoll_visualization' + '.PCollectionVisualization._display_dataframe') def test_dynamic_plotting_update_same_display(self, -mocked_display_facets): +mocked_display_dataframe, Review comment: Are we verifying that the same display is being updated? If so consider s/update/updates, if not consider a name that communicates the expected behavior. This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. 
For queries about this service, please contact Infrastructure at: us...@infra.apache.org Issue Time Tracking --- Worklog Id: (was: 361135) Time Spent: 0.5h (was: 20m) > apache_beam.runners.interactive.display.pcoll_visualization_test.PCollectionVisualizationTest.test_dynamic_plotting_update_same_display > is flaky > > > Key: BEAM-8977 > URL: https://issues.apache.org/jira/browse/BEAM-8977 > Project: Beam > Issue Type: Bug > Components: test-failures >Reporter: Valentyn Tymofieiev >Assignee: Ning Kang >Priority: Major > Time Spent: 0.5h > Remaining Estimate: 0h > > Sample failure: > > [https://builds.apache.org/job/beam_PreCommit_Python_Phrase/1273/testReport/apache_beam.runners.interactive.display.pcoll_visualization_test/PCollectionVisualizationTest/test_dynamic_plotting_update_same_display/] > Error Message > IndexError: list index out of range > Stacktrace > self = > testMethod=test_dynamic_plotting_update_same_display> > mocked_display_facets = id='139889868386376'> > @patch('apache_beam.runners.interactive.display.pcoll_visualization' > '.PCollectionVisualization.display_facets') > def test_dynamic_plotting_update_same_display(self, > mocked_display_facets): > fake_pipeline_result = runner.PipelineResult(runner.PipelineState.RUNNING) > ie.current_env().set_pipeline_result(self._p, fake_pipeline_result) > # Starts async dynamic plotting that never ends in this test. > h = pv.visualize(self._pcoll, dynamic_plotting_interval=0.001) > # Blocking so the above async task can execute some iterations. > time.sleep(1) > # The first iteration doesn't provide updating_pv to display_facets. > _, first_kwargs = mocked_display_facets.call_args_list[0] > self.assertEqual(first_kwargs, {}) > # The following iterations use the same updating_pv to display_facets and so > # on. 
> > _, second_kwargs = mocked_display_facets.call_args_list[1] > E IndexError: list index out of range > apache_beam/runners/interactive/display/pcoll_visualization_test.py:105: > IndexError > Standard Output > > Standard Error > WARNING:apache_beam.runners.interactive.interactive_environment:You cannot > use Interactive Beam features when you are not in an interactive environment > such as a Jupyter notebook or ipython terminal. -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Work logged] (BEAM-8561) Add ThriftIO to Support IO for Thrift Files
[ https://issues.apache.org/jira/browse/BEAM-8561?focusedWorklogId=361132=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-361132 ] ASF GitHub Bot logged work on BEAM-8561: Author: ASF GitHub Bot Created on: 17/Dec/19 21:40 Start Date: 17/Dec/19 21:40 Worklog Time Spent: 10m Work Description: chrlarsen commented on pull request #10290: [BEAM-8561] Add ThriftIO to support IO for Thrift files URL: https://github.com/apache/beam/pull/10290#discussion_r359043453 ## File path: sdks/java/io/thrift/src/main/java/org/apache/beam/sdk/io/thrift/parser/model/Document.java ## @@ -0,0 +1,424 @@ +/* + * Licensed to the Apache Software Foundation (ASF) under one + * or more contributor license agreements. See the NOTICE file + * distributed with this work for additional information + * regarding copyright ownership. The ASF licenses this file + * to you under the Apache License, Version 2.0 (the + * "License"); you may not use this file except in compliance + * with the License. You may obtain a copy of the License at + * + * http://www.apache.org/licenses/LICENSE-2.0 + * + * Unless required by applicable law or agreed to in writing, software + * distributed under the License is distributed on an "AS IS" BASIS, + * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. + * See the License for the specific language governing permissions and + * limitations under the License. 
+ */ +package org.apache.beam.sdk.io.thrift.parser.model; + +import static java.util.Collections.emptyList; +import static org.apache.beam.vendor.guava.v26_0_jre.com.google.common.base.Preconditions.checkNotNull; + +import java.io.IOException; +import java.io.Serializable; +import java.util.ArrayList; +import java.util.Collections; +import java.util.HashMap; +import java.util.List; +import java.util.Map; +import java.util.Objects; +import org.apache.avro.Schema; +import org.apache.avro.generic.GenericRecord; +import org.apache.avro.generic.GenericRecordBuilder; +import org.apache.avro.reflect.ReflectData; +import org.apache.beam.sdk.io.thrift.parser.visitor.DocumentVisitor; +import org.apache.beam.vendor.guava.v26_0_jre.com.google.common.base.MoreObjects; +import org.apache.beam.vendor.guava.v26_0_jre.com.google.common.base.Preconditions; +import org.apache.beam.vendor.guava.v26_0_jre.com.google.common.collect.ImmutableList; + +/** + * The {@link Document} class holds the elements of a Thrift file. + * + * A {@link Document} is made up of: + * + * + * {@link Header} - Contains: includes, cppIncludes, namespaces, and defaultNamespace. + * {@link Document#definitions} - Contains list of Thrift {@link Definition}. + * + */ +public class Document implements Serializable { + private Header header; + private List definitions; + + public Document(Header header, List definitions) { +this.header = checkNotNull(header, "header"); +this.definitions = ImmutableList.copyOf(checkNotNull(definitions, "definitions")); + } + + /** Returns an empty {@link Document}. 
*/ + public static Document emptyDocument() { +List includes = emptyList(); +List cppIncludes = emptyList(); +String defaultNamespace = null; +Map namespaces = Collections.emptyMap(); +Header header = new Header(includes, cppIncludes, defaultNamespace, namespaces); +List definitions = emptyList(); +return new Document(header, definitions); + } + + public Document getDocument() { +return this; + } + + public Header getHeader() { +return this.header; + } + + public void setHeader(Header header) { +this.header = header; + } + + public List getDefinitions() { +return definitions; + } + + public void setDefinitions(List definitions) { +this.definitions = definitions; + } + + public void visit(final DocumentVisitor visitor) throws IOException { +Preconditions.checkNotNull(visitor, "the visitor must not be null!"); + +for (Definition definition : definitions) { + if (visitor.accept(definition)) { +definition.visit(visitor); + } +} + } + + /** Gets Avro {@link Schema} for the object. */ + public Schema getSchema() { +return ReflectData.get().getSchema(Document.class); + } + + /** Gets {@link Document} as a {@link GenericRecord}. */ + public GenericRecord getAsGenericRecord() { +GenericRecordBuilder genericRecordBuilder = new GenericRecordBuilder(this.getSchema()); +genericRecordBuilder.set("header", this.getHeader()).set("definitions", this.getDefinitions()); + +return genericRecordBuilder.build(); + } + + /** Adds list of includes to {@link Document#header}. */ + public void addIncludes(List includes) { +checkNotNull(includes, "includes"); +List currentIncludes = new ArrayList<>(this.getHeader().getIncludes()); +currentIncludes.addAll(includes); +
[jira] [Work logged] (BEAM-1440) Create a BigQuery source (that implements iobase.BoundedSource) for Python SDK
[ https://issues.apache.org/jira/browse/BEAM-1440?focusedWorklogId=361126=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-361126 ] ASF GitHub Bot logged work on BEAM-1440: Author: ASF GitHub Bot Created on: 17/Dec/19 21:30 Start Date: 17/Dec/19 21:30 Worklog Time Spent: 10m Work Description: pabloem commented on pull request #9772: [BEAM-1440] Create a BigQuery source that implements iobase.BoundedSource for Python URL: https://github.com/apache/beam/pull/9772 This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org Issue Time Tracking --- Worklog Id: (was: 361126) Time Spent: 19h 20m (was: 19h 10m) > Create a BigQuery source (that implements iobase.BoundedSource) for Python SDK > -- > > Key: BEAM-1440 > URL: https://issues.apache.org/jira/browse/BEAM-1440 > Project: Beam > Issue Type: New Feature > Components: sdk-py-core >Reporter: Chamikara Madhusanka Jayalath >Assignee: Kamil Wasilewski >Priority: Major > Time Spent: 19h 20m > Remaining Estimate: 0h > > Currently we have a BigQuery native source for Python SDK [1]. > This can only be used by Dataflow runner. > We should implement a Beam BigQuery source that implements > iobase.BoundedSource [2] interface so that other runners that try to use > Python SDK can read from BigQuery as well. Java SDK already has a Beam > BigQuery source [3]. > [1] > https://github.com/apache/beam/blob/master/sdks/python/apache_beam/io/gcp/bigquery.py > [2] > https://github.com/apache/beam/blob/master/sdks/python/apache_beam/io/iobase.py#L70 > [3] > https://github.com/apache/beam/blob/master/sdks/java/io/google-cloud-platform/src/main/java/org/apache/beam/sdk/io/gcp/bigquery/BigQueryIO.java#L1189 -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Work logged] (BEAM-8975) Add Thrift Parser for ThriftIO
[ https://issues.apache.org/jira/browse/BEAM-8975?focusedWorklogId=361124=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-361124 ] ASF GitHub Bot logged work on BEAM-8975: Author: ASF GitHub Bot Created on: 17/Dec/19 21:25 Start Date: 17/Dec/19 21:25 Worklog Time Spent: 10m Work Description: chrlarsen commented on issue #10395: [BEAM-8975] Add Thrift parser for ThriftIO URL: https://github.com/apache/beam/pull/10395#issuecomment-566755651 Run Portable_Python PreCommit This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org Issue Time Tracking --- Worklog Id: (was: 361124) Time Spent: 1h (was: 50m) > Add Thrift Parser for ThriftIO > -- > > Key: BEAM-8975 > URL: https://issues.apache.org/jira/browse/BEAM-8975 > Project: Beam > Issue Type: New Feature > Components: io-java-files >Reporter: Chris Larsen >Assignee: Chris Larsen >Priority: Minor > Time Spent: 1h > Remaining Estimate: 0h > > This ticket is related to > [BEAM-8561|https://issues.apache.org/jira/projects/BEAM/issues/BEAM-8561?filter=allissues]. > As there are a large number of files to review for the > [PR|https://github.com/apache/beam/pull/10290] for ThriftIO this ticket will > serve as the tracker for the submission of a PR relating to the parser and > document model that will be used by ThriftIO. The aim is to reduce the number > of files submitted with each PR. -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Work logged] (BEAM-8975) Add Thrift Parser for ThriftIO
[ https://issues.apache.org/jira/browse/BEAM-8975?focusedWorklogId=361123=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-361123 ] ASF GitHub Bot logged work on BEAM-8975: Author: ASF GitHub Bot Created on: 17/Dec/19 21:24 Start Date: 17/Dec/19 21:24 Worklog Time Spent: 10m Work Description: chrlarsen commented on issue #10395: [BEAM-8975] Add Thrift parser for ThriftIO URL: https://github.com/apache/beam/pull/10395#issuecomment-566755250 Run Python PreCommit This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org Issue Time Tracking --- Worklog Id: (was: 361123) Time Spent: 50m (was: 40m) > Add Thrift Parser for ThriftIO > -- > > Key: BEAM-8975 > URL: https://issues.apache.org/jira/browse/BEAM-8975 > Project: Beam > Issue Type: New Feature > Components: io-java-files >Reporter: Chris Larsen >Assignee: Chris Larsen >Priority: Minor > Time Spent: 50m > Remaining Estimate: 0h > > This ticket is related to > [BEAM-8561|https://issues.apache.org/jira/projects/BEAM/issues/BEAM-8561?filter=allissues]. > As there are a large number of files to review for the > [PR|https://github.com/apache/beam/pull/10290] for ThriftIO this ticket will > serve as the tracker for the submission of a PR relating to the parser and > document model that will be used by ThriftIO. The aim is to reduce the number > of files submitted with each PR. -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Work logged] (BEAM-8975) Add Thrift Parser for ThriftIO
[ https://issues.apache.org/jira/browse/BEAM-8975?focusedWorklogId=361122=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-361122 ] ASF GitHub Bot logged work on BEAM-8975: Author: ASF GitHub Bot Created on: 17/Dec/19 21:24 Start Date: 17/Dec/19 21:24 Worklog Time Spent: 10m Work Description: chrlarsen commented on issue #10395: [BEAM-8975] Add Thrift parser for ThriftIO URL: https://github.com/apache/beam/pull/10395#issuecomment-566755250 Run Python PreCommit This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org Issue Time Tracking --- Worklog Id: (was: 361122) Time Spent: 40m (was: 0.5h) > Add Thrift Parser for ThriftIO > -- > > Key: BEAM-8975 > URL: https://issues.apache.org/jira/browse/BEAM-8975 > Project: Beam > Issue Type: New Feature > Components: io-java-files >Reporter: Chris Larsen >Assignee: Chris Larsen >Priority: Minor > Time Spent: 40m > Remaining Estimate: 0h > > This ticket is related to > [BEAM-8561|https://issues.apache.org/jira/projects/BEAM/issues/BEAM-8561?filter=allissues]. > As there are a large number of files to review for the > [PR|https://github.com/apache/beam/pull/10290] for ThriftIO this ticket will > serve as the tracker for the submission of a PR relating to the parser and > document model that will be used by ThriftIO. The aim is to reduce the number > of files submitted with each PR. -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Work logged] (BEAM-8975) Add Thrift Parser for ThriftIO
[ https://issues.apache.org/jira/browse/BEAM-8975?focusedWorklogId=361120=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-361120 ] ASF GitHub Bot logged work on BEAM-8975: Author: ASF GitHub Bot Created on: 17/Dec/19 21:23 Start Date: 17/Dec/19 21:23 Worklog Time Spent: 10m Work Description: chrlarsen commented on issue #10395: [BEAM-8975] Add Thrift parser for ThriftIO URL: https://github.com/apache/beam/pull/10395#issuecomment-566755250 Run Python PreCommit Run Portable_Python PreCommit Run Java PreCommit This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org Issue Time Tracking --- Worklog Id: (was: 361120) Time Spent: 0.5h (was: 20m) > Add Thrift Parser for ThriftIO > -- > > Key: BEAM-8975 > URL: https://issues.apache.org/jira/browse/BEAM-8975 > Project: Beam > Issue Type: New Feature > Components: io-java-files >Reporter: Chris Larsen >Assignee: Chris Larsen >Priority: Minor > Time Spent: 0.5h > Remaining Estimate: 0h > > This ticket is related to > [BEAM-8561|https://issues.apache.org/jira/projects/BEAM/issues/BEAM-8561?filter=allissues]. > As there are a large number of files to review for the > [PR|https://github.com/apache/beam/pull/10290] for ThriftIO this ticket will > serve as the tracker for the submission of a PR relating to the parser and > document model that will be used by ThriftIO. The aim is to reduce the number > of files submitted with each PR. -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Closed] (BEAM-7065) Unable to use unlifted combine functions
[ https://issues.apache.org/jira/browse/BEAM-7065?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Robert Burke closed BEAM-7065. -- Fix Version/s: Not applicable Resolution: Information Provided Reading through the history, this just never got closed. > Unable to use unlifted combine functions > > > Key: BEAM-7065 > URL: https://issues.apache.org/jira/browse/BEAM-7065 > Project: Beam > Issue Type: Bug > Components: sdk-go >Affects Versions: Not applicable > Environment: Google Cloud Dataflow >Reporter: Alexandre Thenorio >Priority: Major > Fix For: Not applicable > > > I have tried running a simple example to calculate a running average or sum > using the `stats` package however it does not seems to work. > > Here's a reproducer > > {code:java} > package main > import ( > "context" > "encoding/json" > "flag" > "fmt" > "time" > "cloud.google.com/go/pubsub" > "github.com/apache/beam/sdks/go/pkg/beam" > "github.com/apache/beam/sdks/go/pkg/beam/io/pubsubio" > "github.com/apache/beam/sdks/go/pkg/beam/log" > "github.com/apache/beam/sdks/go/pkg/beam/options/gcpopts" > "github.com/apache/beam/sdks/go/pkg/beam/transforms/stats" > "github.com/apache/beam/sdks/go/pkg/beam/util/pubsubx" > "github.com/apache/beam/sdks/go/pkg/beam/x/beamx" > "github.com/apache/beam/sdks/go/pkg/beam/x/debug" > ) > var ( > input = flag.String("input", "iot-data", "Pubsub input topic.") > ) > type sensor struct { > name string > value int > } > var ( > data = []sensor{ > {name: "temperature", value: 24}, > {name: "humidity", value: 10}, > {name: "temperature", value: 20}, > {name: "temperature", value: 22}, > {name: "humidity", value: 14}, > {name: "humidity", value: 18}, > } > ) > func main() { > flag.Parse() > beam.Init() > ctx := context.Background() > project := gcpopts.GetProject(ctx) > log.Infof(ctx, "Publishing %v messages to: %v", len(data), *input) > defer pubsubx.CleanupTopic(ctx, project, *input) > sub, err := Publish(ctx, project, *input, data...) 
> if err != nil { > log.Fatal(ctx, err) > } > log.Infof(ctx, "Running streaming sensor data with subscription: %v", > sub.ID()) > p := beam.NewPipeline() > s := p.Root() > // Reads sensor data from pubsub > // Returns PCollection<[]byte> > col := pubsubio.Read(s, project, *input, > {Subscription: sub.ID()}) > // Transforms incoming bytes from pubsub to a string,int key value > // where the key is the sensor name and the value is the sensor reading > // Accepts PCollection<[]byte> > // Returns PCollection> > data := beam.ParDo(s, extractSensorData, col) > // Calculate running average per sensor > // > // Accpets PCollection> > // Returns PCollection> > sum := stats.MeanPerKey(s, data) > debug.Print(s, sum) > if err := beamx.Run(context.Background(), p); err != nil { > log.Exitf(ctx, "Failed to execute job: %v", err) > } > } > func extractSensorData(msg []byte) (string, int) { > ctx := context.Background() > data := {} > if err := json.Unmarshal(msg, data); err != nil { > log.Fatal(ctx, err) > } > return data.name, data.value > } > func Publish(ctx context.Context, project, topic string, messages ...sensor) > (*pubsub.Subscription, error) { > client, err := pubsub.NewClient(ctx, project) > if err != nil { > return nil, err > } > t, err := pubsubx.EnsureTopic(ctx, client, topic) > if err != nil { > return nil, err > } > sub, err := pubsubx.EnsureSubscription(ctx, client, topic, > fmt.Sprintf("%v.sub.%v", topic, time.Now().Unix())) > if err != nil { > return nil, err > } > for _, msg := range messages { > s := {} > bytes, err := json.Marshal(s) > if err != nil { > return nil, fmt.Errorf("failed to unmarshal '%v': %v", msg, err) > } > m := { > Data: ([]byte)(bytes), > // Attributes: ?? > } > id, err := t.Publish(ctx, m).Get(ctx) > if err != nil { > return nil, fmt.Errorf("failed to publish '%v': %v", msg, err) > } > log.Infof(ctx, "Published %v with id: %v", msg, id) > } > return sub, nil > } > {code} > > I ran this code in the following way > > {noformat} > go run . 
--project="" --runner dataflow --staging_location > gs:///binaries/ --temp_location gs:///tmp/ > --region "europe-west1" > --worker_harness_container_image=alethenorio/beam-go:v2.11.0{noformat} > > > > The code publishes to Pub/Sub, then reads the messages and attempts to call > `stats.MeanPerKey` to create a
[jira] [Work logged] (BEAM-8335) Add streaming support to Interactive Beam
[ https://issues.apache.org/jira/browse/BEAM-8335?focusedWorklogId=361116=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-361116 ] ASF GitHub Bot logged work on BEAM-8335: Author: ASF GitHub Bot Created on: 17/Dec/19 21:06 Start Date: 17/Dec/19 21:06 Worklog Time Spent: 10m Work Description: KevinGG commented on issue #10405: [BEAM-8335] Background caching job URL: https://github.com/apache/beam/pull/10405#issuecomment-566749001 R: @davidyan74 R: @rohdesamuel PTAL. Adding Ahmet as the committer, R: @aaltay Thank you! This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org Issue Time Tracking --- Worklog Id: (was: 361116) Time Spent: 48h 40m (was: 48.5h) > Add streaming support to Interactive Beam > - > > Key: BEAM-8335 > URL: https://issues.apache.org/jira/browse/BEAM-8335 > Project: Beam > Issue Type: Improvement > Components: runner-py-interactive >Reporter: Sam Rohde >Assignee: Sam Rohde >Priority: Major > Time Spent: 48h 40m > Remaining Estimate: 0h > > This issue tracks the work items to introduce streaming support to the > Interactive Beam experience. This will allow users to: > * Write and run a streaming job in IPython > * Automatically cache records from unbounded sources > * Add a replay experience that replays all cached records to simulate the > original pipeline execution > * Add controls to play/pause/stop/step individual elements from the cached > records > * Add ability to inspect/visualize unbounded PCollections -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Work logged] (BEAM-5495) PipelineResources algorithm is not working in most environments
[ https://issues.apache.org/jira/browse/BEAM-5495?focusedWorklogId=361108=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-361108 ] ASF GitHub Bot logged work on BEAM-5495: Author: ASF GitHub Bot Created on: 17/Dec/19 21:02 Start Date: 17/Dec/19 21:02 Worklog Time Spent: 10m Work Description: lukecwik commented on pull request #10268: [BEAM-5495] PipelineResources algorithm is not working in most environments URL: https://github.com/apache/beam/pull/10268#discussion_r359019634 ## File path: runners/core-construction-java/src/main/java/org/apache/beam/runners/core/construction/resources/PipelineResourcesOptions.java ## @@ -0,0 +1,77 @@ +/* + * Licensed to the Apache Software Foundation (ASF) under one + * or more contributor license agreements. See the NOTICE file + * distributed with this work for additional information + * regarding copyright ownership. The ASF licenses this file + * to you under the Apache License, Version 2.0 (the + * "License"); you may not use this file except in compliance + * with the License. You may obtain a copy of the License at + * + * http://www.apache.org/licenses/LICENSE-2.0 + * + * Unless required by applicable law or agreed to in writing, software + * distributed under the License is distributed on an "AS IS" BASIS, + * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. + * See the License for the specific language governing permissions and + * limitations under the License. + */ +package org.apache.beam.runners.core.construction.resources; + +import com.fasterxml.jackson.annotation.JsonIgnore; +import io.github.classgraph.ClassGraph; +import org.apache.beam.sdk.options.Default; +import org.apache.beam.sdk.options.DefaultValueFactory; +import org.apache.beam.sdk.options.Description; +import org.apache.beam.sdk.options.PipelineOptions; +import org.apache.beam.sdk.util.InstanceBuilder; + +/** Pipeline options dedicated to detecting classpath resources. 
*/ +public interface PipelineResourcesOptions extends PipelineOptions { + + @Description( + "The class of the pipeline resources detector factory that should be created and used to create " + + "the detector. If not set explicitly, a default class will be used to instantiate the factory.") + @Default.Class(ClasspathScanningResourcesDetectorFactory.class) + Class + getPipelineResourcesDetectorFactoryClass(); + + void setPipelineResourcesDetectorFactoryClass( + Class factoryClass); + + @JsonIgnore + @Description( + "Instance of a pipeline resources detection algorithm. If not set explicitly, a default implementation will be used") + @Default.InstanceFactory(PipelineResourcesDetectorFactory.class) Review comment: It's a minor hassle, but please duplicate the description as a Javadoc comment on the getter so that the Javadoc has this information as well. This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org Issue Time Tracking --- Worklog Id: (was: 361108) Time Spent: 15h (was: 14h 50m) > PipelineResources algorithm is not working in most environments > --- > > Key: BEAM-5495 > URL: https://issues.apache.org/jira/browse/BEAM-5495 > Project: Beam > Issue Type: Bug > Components: runner-flink, runner-spark, sdk-java-core >Reporter: Romain Manni-Bucau >Assignee: Lukasz Gajowy >Priority: Major > Fix For: 2.19.0 > > Time Spent: 15h > Remaining Estimate: 0h > > Issues are: > 1. It assumes the classloader is a URLClassLoader (not always true, and Java > >= 9 breaks that as well for the app loader). > 2. It uses loader.getURLs(), which leads to including the JRE itself in the > staged files. > It looks like this resource detection algorithm can't work and should be replaced > by an SPI rather than a built-in, non-extensible algorithm. 
Another valid > alternative is to just drop that "guess" logic and force the user to set > staged files. -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Work logged] (BEAM-5495) PipelineResources algorithm is not working in most environments
[ https://issues.apache.org/jira/browse/BEAM-5495?focusedWorklogId=361114=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-361114 ] ASF GitHub Bot logged work on BEAM-5495: Author: ASF GitHub Bot Created on: 17/Dec/19 21:02 Start Date: 17/Dec/19 21:02 Worklog Time Spent: 10m Work Description: lukecwik commented on pull request #10268: [BEAM-5495] PipelineResources algorithm is not working in most environments URL: https://github.com/apache/beam/pull/10268#discussion_r359026175 ## File path: runners/core-construction-java/src/main/java/org/apache/beam/runners/core/construction/resources/PipelineResourcesDetectorAbstractFactory.java ## @@ -0,0 +1,23 @@ +/* + * Licensed to the Apache Software Foundation (ASF) under one + * or more contributor license agreements. See the NOTICE file + * distributed with this work for additional information + * regarding copyright ownership. The ASF licenses this file + * to you under the Apache License, Version 2.0 (the + * "License"); you may not use this file except in compliance + * with the License. You may obtain a copy of the License at + * + * http://www.apache.org/licenses/LICENSE-2.0 + * + * Unless required by applicable law or agreed to in writing, software + * distributed under the License is distributed on an "AS IS" BASIS, + * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. + * See the License for the specific language governing permissions and + * limitations under the License. + */ +package org.apache.beam.runners.core.construction.resources; + +/** Provides pipeline resources detection algorithm. */ +public interface PipelineResourcesDetectorAbstractFactory { Review comment: This isn't an abstract class but an interface; please rename it to `PipelineResourcesDetectorFactory` and/or make it an inner interface of `PipelineResourcesDetector` and just call it `Factory` there. This is an automated message from the Apache Git Service. 
To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org Issue Time Tracking --- Worklog Id: (was: 361114) Time Spent: 15h 40m (was: 15.5h) > PipelineResources algorithm is not working in most environments > --- > > Key: BEAM-5495 > URL: https://issues.apache.org/jira/browse/BEAM-5495 > Project: Beam > Issue Type: Bug > Components: runner-flink, runner-spark, sdk-java-core >Reporter: Romain Manni-Bucau >Assignee: Lukasz Gajowy >Priority: Major > Fix For: 2.19.0 > > Time Spent: 15h 40m > Remaining Estimate: 0h > > Issues are: > 1. It assumes the classloader is a URLClassLoader (not always true, and Java > >= 9 breaks that as well for the app loader). > 2. It uses loader.getURLs(), which leads to including the JRE itself in the > staged files. > It looks like this resource detection algorithm can't work and should be replaced > by an SPI rather than a built-in, non-extensible algorithm. Another valid > alternative is to just drop that "guess" logic and force the user to set > staged files. -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Work logged] (BEAM-5495) PipelineResources algorithm is not working in most environments
[ https://issues.apache.org/jira/browse/BEAM-5495?focusedWorklogId=36=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-36 ] ASF GitHub Bot logged work on BEAM-5495: Author: ASF GitHub Bot Created on: 17/Dec/19 21:02 Start Date: 17/Dec/19 21:02 Worklog Time Spent: 10m Work Description: lukecwik commented on pull request #10268: [BEAM-5495] PipelineResources algorithm is not working in most environments URL: https://github.com/apache/beam/pull/10268#discussion_r359023901 ## File path: runners/core-construction-java/src/test/java/org/apache/beam/runners/core/construction/resources/PipelineResourcesTest.java ## @@ -28,51 +32,44 @@ import java.util.ArrayList; import java.util.Arrays; import java.util.List; -import org.apache.beam.vendor.guava.v26_0_jre.com.google.common.collect.ImmutableList; +import org.apache.beam.sdk.options.PipelineOptionsFactory; import org.junit.Rule; import org.junit.Test; -import org.junit.rules.ExpectedException; import org.junit.rules.TemporaryFolder; import org.junit.runner.RunWith; import org.junit.runners.JUnit4; -import org.mockito.Mockito; /** Tests for PipelineResources. */ @RunWith(JUnit4.class) public class PipelineResourcesTest { @Rule public transient TemporaryFolder tmpFolder = new TemporaryFolder(); - @Rule public transient ExpectedException thrown = ExpectedException.none(); @Test - public void detectClassPathResourceWithFileResources() throws Exception { + public void testDetectsResourcesToStage() throws IOException { Review comment: Please re-add that the resources are detected in the order they appear as part of the URLClassLoader. This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. 
For queries about this service, please contact Infrastructure at: us...@infra.apache.org

Issue Time Tracking
---
Worklog Id: (was: 36)
Time Spent: 15h 20m (was: 15h 10m)
[jira] [Work logged] (BEAM-5495) PipelineResources algorithm is not working in most environments
[ https://issues.apache.org/jira/browse/BEAM-5495?focusedWorklogId=361106=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-361106 ] ASF GitHub Bot logged work on BEAM-5495: Author: ASF GitHub Bot Created on: 17/Dec/19 21:02 Start Date: 17/Dec/19 21:02 Worklog Time Spent: 10m Work Description: lukecwik commented on pull request #10268: [BEAM-5495] PipelineResources algorithm is not working in most environments URL: https://github.com/apache/beam/pull/10268#discussion_r359018486 ## File path: runners/core-construction-java/src/main/java/org/apache/beam/runners/core/construction/resources/PipelineResourcesOptions.java ## @@ -0,0 +1,77 @@ +/* + * Licensed to the Apache Software Foundation (ASF) under one + * or more contributor license agreements. See the NOTICE file + * distributed with this work for additional information + * regarding copyright ownership. The ASF licenses this file + * to you under the Apache License, Version 2.0 (the + * "License"); you may not use this file except in compliance + * with the License. You may obtain a copy of the License at + * + * http://www.apache.org/licenses/LICENSE-2.0 + * + * Unless required by applicable law or agreed to in writing, software + * distributed under the License is distributed on an "AS IS" BASIS, + * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. + * See the License for the specific language governing permissions and + * limitations under the License. + */ +package org.apache.beam.runners.core.construction.resources; + +import com.fasterxml.jackson.annotation.JsonIgnore; +import io.github.classgraph.ClassGraph; +import org.apache.beam.sdk.options.Default; +import org.apache.beam.sdk.options.DefaultValueFactory; +import org.apache.beam.sdk.options.Description; +import org.apache.beam.sdk.options.PipelineOptions; +import org.apache.beam.sdk.util.InstanceBuilder; + +/** Pipeline options dedicated to detecting classpath resources. 
*/
+public interface PipelineResourcesOptions extends PipelineOptions {
+
+  @Description(
+      "The class of the pipeline resources detector factory that should be created and used to create "
+          + "the detector. If not set explicitly, a default class will be used to instantiate the factory.")
+  @Default.Class(ClasspathScanningResourcesDetectorFactory.class)
+  Class
+      getPipelineResourcesDetectorFactoryClass();
+
+  void setPipelineResourcesDetectorFactoryClass(
+      Class factoryClass);
+
+  @JsonIgnore
+  @Description(
+      "Instance of a pipeline resources detection algorithm. If not set explicitly, a default implementation will be used")
+  @Default.InstanceFactory(PipelineResourcesDetectorFactory.class)
+  PipelineResourcesDetector getPipelineResourcesDetector();
+
+  void setPipelineResourcesDetector(PipelineResourcesDetector pipelineResourcesDetector);
+
+  class PipelineResourcesDetectorFactory implements DefaultValueFactory {

Review comment:
class comment

This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at: us...@infra.apache.org

Issue Time Tracking
---
Worklog Id: (was: 361106)
Time Spent: 14h 50m (was: 14h 40m)

> PipelineResources algorithm is not working in most environments
> ---
>
> Key: BEAM-5495
> URL: https://issues.apache.org/jira/browse/BEAM-5495
> Project: Beam
> Issue Type: Bug
> Components: runner-flink, runner-spark, sdk-java-core
> Reporter: Romain Manni-Bucau
> Assignee: Lukasz Gajowy
> Priority: Major
> Fix For: 2.19.0
>
> Time Spent: 14h 50m
> Remaining Estimate: 0h
>
> Issue are:
> 1. it assumes the classloader is an URLClassLoader (not always true and java >= 9 breaks that as well for the app loader)
> 2. it uses loader.getURLs() which leads to including the JRE itself in the staged file
> Looks like this detect resource algorithm can't work and should be replaced by a SPI rather than a built-in and not extensible algorithm. Another valid alternative is to just drop that "guess" logic and force the user to set staged files.

--
This message was sent by Atlassian Jira (v8.3.4#803005)
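The two options described above form a two-step lookup: a factory class is named, the factory is created through a static `create` method, and the factory then supplies the detector. The sketch below imitates that flow with plain Java reflection. It does not use Beam's `InstanceBuilder`, and every type in it (`Detector`, `DetectorFactory`, `DefaultFactory`, `detectorFrom`) is a hypothetical stand-in for the Beam types named in the diff.

```java
import java.lang.reflect.Method;

public class FactoryLookupSketch {

  /** Hypothetical stand-in for PipelineResourcesDetector. */
  interface Detector {
    String name();
  }

  /** Hypothetical stand-in for the abstract factory named in the diff. */
  interface DetectorFactory {
    Detector getDetector();
  }

  /** Default factory, exposing the static create() method the lookup expects. */
  public static class DefaultFactory implements DetectorFactory {
    public static DefaultFactory create() {
      return new DefaultFactory();
    }

    @Override
    public Detector getDetector() {
      return () -> "classpath-scanning";
    }
  }

  /** Step 1: build the factory via its static create(); step 2: ask it for the detector. */
  static Detector detectorFrom(Class<? extends DetectorFactory> factoryClass) {
    try {
      Method create = factoryClass.getMethod("create");
      DetectorFactory factory = (DetectorFactory) create.invoke(null);
      return factory.getDetector();
    } catch (ReflectiveOperationException e) {
      throw new IllegalStateException("Cannot instantiate " + factoryClass.getName(), e);
    }
  }

  public static void main(String[] args) {
    System.out.println(detectorFrom(DefaultFactory.class).name()); // prints "classpath-scanning"
  }
}
```

A user-supplied factory only needs a public static `create` method and an implementation of the factory interface, which is what makes the detection algorithm replaceable.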
[jira] [Work logged] (BEAM-5495) PipelineResources algorithm is not working in most environments
[ https://issues.apache.org/jira/browse/BEAM-5495?focusedWorklogId=361110=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-361110 ] ASF GitHub Bot logged work on BEAM-5495: Author: ASF GitHub Bot Created on: 17/Dec/19 21:02 Start Date: 17/Dec/19 21:02 Worklog Time Spent: 10m Work Description: lukecwik commented on pull request #10268: [BEAM-5495] PipelineResources algorithm is not working in most environments URL: https://github.com/apache/beam/pull/10268#discussion_r359019828 ## File path: runners/core-construction-java/src/main/java/org/apache/beam/runners/core/construction/resources/PipelineResourcesOptions.java ## @@ -0,0 +1,77 @@ +/* + * Licensed to the Apache Software Foundation (ASF) under one + * or more contributor license agreements. See the NOTICE file + * distributed with this work for additional information + * regarding copyright ownership. The ASF licenses this file + * to you under the Apache License, Version 2.0 (the + * "License"); you may not use this file except in compliance + * with the License. You may obtain a copy of the License at + * + * http://www.apache.org/licenses/LICENSE-2.0 + * + * Unless required by applicable law or agreed to in writing, software + * distributed under the License is distributed on an "AS IS" BASIS, + * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. + * See the License for the specific language governing permissions and + * limitations under the License. + */ +package org.apache.beam.runners.core.construction.resources; + +import com.fasterxml.jackson.annotation.JsonIgnore; +import io.github.classgraph.ClassGraph; +import org.apache.beam.sdk.options.Default; +import org.apache.beam.sdk.options.DefaultValueFactory; +import org.apache.beam.sdk.options.Description; +import org.apache.beam.sdk.options.PipelineOptions; +import org.apache.beam.sdk.util.InstanceBuilder; + +/** Pipeline options dedicated to detecting classpath resources. 
*/
+public interface PipelineResourcesOptions extends PipelineOptions {
+
+  @Description(
+      "The class of the pipeline resources detector factory that should be created and used to create "
+          + "the detector. If not set explicitly, a default class will be used to instantiate the factory.")
+  @Default.Class(ClasspathScanningResourcesDetectorFactory.class)

Review comment:
Add `@JsonIgnore`

This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at: us...@infra.apache.org

Issue Time Tracking
---
Worklog Id: (was: 361110)
Time Spent: 15h 10m (was: 15h)

> PipelineResources algorithm is not working in most environments
> ---
>
> Key: BEAM-5495
> URL: https://issues.apache.org/jira/browse/BEAM-5495
> Project: Beam
> Issue Type: Bug
> Components: runner-flink, runner-spark, sdk-java-core
> Reporter: Romain Manni-Bucau
> Assignee: Lukasz Gajowy
> Priority: Major
> Fix For: 2.19.0
>
> Time Spent: 15h 10m
> Remaining Estimate: 0h
>
> Issue are:
> 1. it assumes the classloader is an URLClassLoader (not always true and java >= 9 breaks that as well for the app loader)
> 2. it uses loader.getURLs() which leads to including the JRE itself in the staged file
> Looks like this detect resource algorithm can't work and should be replaced by a SPI rather than a built-in and not extensible algorithm. Another valid alternative is to just drop that "guess" logic and force the user to set staged files.

--
This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Work logged] (BEAM-5495) PipelineResources algorithm is not working in most environments
[ https://issues.apache.org/jira/browse/BEAM-5495?focusedWorklogId=361109=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-361109 ] ASF GitHub Bot logged work on BEAM-5495: Author: ASF GitHub Bot Created on: 17/Dec/19 21:02 Start Date: 17/Dec/19 21:02 Worklog Time Spent: 10m Work Description: lukecwik commented on pull request #10268: [BEAM-5495] PipelineResources algorithm is not working in most environments URL: https://github.com/apache/beam/pull/10268#discussion_r359021220 ## File path: runners/core-construction-java/src/test/java/org/apache/beam/runners/core/construction/resources/ClasspathScanningResourcesDetectorTest.java ## @@ -0,0 +1,129 @@ +/* + * Licensed to the Apache Software Foundation (ASF) under one + * or more contributor license agreements. See the NOTICE file + * distributed with this work for additional information + * regarding copyright ownership. The ASF licenses this file + * to you under the Apache License, Version 2.0 (the + * "License"); you may not use this file except in compliance + * with the License. You may obtain a copy of the License at + * + * http://www.apache.org/licenses/LICENSE-2.0 + * + * Unless required by applicable law or agreed to in writing, software + * distributed under the License is distributed on an "AS IS" BASIS, + * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. + * See the License for the specific language governing permissions and + * limitations under the License. 
+ */
+package org.apache.beam.runners.core.construction.resources;
+
+import static org.hamcrest.CoreMatchers.containsString;
+import static org.hamcrest.CoreMatchers.hasItem;
+import static org.hamcrest.CoreMatchers.hasItems;
+import static org.hamcrest.MatcherAssert.assertThat;
+import static org.hamcrest.Matchers.not;
+import static org.junit.Assert.assertFalse;
+
+import io.github.classgraph.ClassGraph;
+import java.io.File;
+import java.io.FileOutputStream;
+import java.io.IOException;
+import java.net.URL;
+import java.net.URLClassLoader;
+import java.util.List;
+import java.util.jar.JarOutputStream;
+import java.util.jar.Manifest;
+import java.util.stream.Collectors;
+import org.apache.beam.sdk.testing.RestoreSystemProperties;
+import org.junit.Before;
+import org.junit.Rule;
+import org.junit.Test;
+import org.junit.rules.TemporaryFolder;
+import org.mockito.Mockito;
+
+public class ClasspathScanningResourcesDetectorTest {
+
+  @Rule public transient TemporaryFolder tmpFolder = new TemporaryFolder();
+
+  @Rule public transient RestoreSystemProperties systemProperties = new RestoreSystemProperties();
+
+  private ClasspathScanningResourcesDetector detector;
+
+  private ClassLoader classLoader;

Review comment:
classLoader and detector are assigned but are not shared outside of the method body so please create them within the test instead as local variables. This would help with test readability.

This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the URL above to go to the specific comment.
For queries about this service, please contact Infrastructure at: us...@infra.apache.org

Issue Time Tracking
---
Worklog Id: (was: 361109)
Time Spent: 15h
[jira] [Work logged] (BEAM-5495) PipelineResources algorithm is not working in most environments
[ https://issues.apache.org/jira/browse/BEAM-5495?focusedWorklogId=361105=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-361105 ]

ASF GitHub Bot logged work on BEAM-5495:

Author: ASF GitHub Bot
Created on: 17/Dec/19 21:02
Start Date: 17/Dec/19 21:02
Worklog Time Spent: 10m
Work Description: lukecwik commented on pull request #10268: [BEAM-5495] PipelineResources algorithm is not working in most environments
URL: https://github.com/apache/beam/pull/10268#discussion_r359017865

## File path: runners/core-construction-java/src/main/java/org/apache/beam/runners/core/construction/resources/PipelineResourcesDetector.java
## @@ -18,10 +18,10 @@
 package org.apache.beam.runners.core.construction.resources;

 import java.io.Serializable;
-import java.util.List;
+import java.util.stream.Stream;

 /** Interface for an algorithm detecting classpath resources for pipelines. */
 public interface PipelineResourcesDetector extends Serializable {

-  List detect(ClassLoader classLoader);
+  Stream detect(ClassLoader classLoader);

Review comment:
I would have also preferred list since stream can imply that it is infinitely long since there could be a generator function the person implements. Lists are ordered and have finite size. Also, set is the wrong abstraction since we want to maintain the classpath order and we also want to maintain duplicates.

This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the URL above to go to the specific comment.
For queries about this service, please contact Infrastructure at: us...@infra.apache.org

Issue Time Tracking
---
Worklog Id: (was: 361105)
Time Spent: 14h 40m (was: 14.5h)
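The reviewer's argument for `List` over `Set` can be checked with a few lines of plain JDK code. This is only an illustration with made-up paths, not code from the pull request, and the class and method names are hypothetical: a `List` keeps both the classpath order and duplicate entries, whereas a `LinkedHashSet` keeps the order but drops the duplicate.

```java
import java.util.Arrays;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Set;
import java.util.stream.Collectors;
import java.util.stream.Stream;

public class ListVersusSetSketch {

  // Hypothetical classpath entries; "/libs/a.jar" appears twice on purpose.
  static final List<String> CLASSPATH =
      Arrays.asList("/libs/b.jar", "/libs/a.jar", "/libs/a.jar");

  /** Collecting into a List preserves encounter order and duplicates. */
  static List<String> asList(Stream<String> detected) {
    return detected.collect(Collectors.toList());
  }

  /** Collecting into a LinkedHashSet preserves order but removes duplicates. */
  static Set<String> asSet(Stream<String> detected) {
    return detected.collect(Collectors.toCollection(LinkedHashSet::new));
  }

  public static void main(String[] args) {
    System.out.println(asList(CLASSPATH.stream())); // [/libs/b.jar, /libs/a.jar, /libs/a.jar]
    System.out.println(asSet(CLASSPATH.stream())); // [/libs/b.jar, /libs/a.jar]
  }
}
```

A `Stream` return type gives neither guarantee on its face, since a caller cannot tell from the signature whether it is finite.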
[jira] [Work logged] (BEAM-5495) PipelineResources algorithm is not working in most environments
[ https://issues.apache.org/jira/browse/BEAM-5495?focusedWorklogId=361112=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-361112 ] ASF GitHub Bot logged work on BEAM-5495: Author: ASF GitHub Bot Created on: 17/Dec/19 21:02 Start Date: 17/Dec/19 21:02 Worklog Time Spent: 10m Work Description: lukecwik commented on pull request #10268: [BEAM-5495] PipelineResources algorithm is not working in most environments URL: https://github.com/apache/beam/pull/10268#discussion_r359019575 ## File path: runners/core-construction-java/src/main/java/org/apache/beam/runners/core/construction/resources/PipelineResourcesOptions.java ## @@ -0,0 +1,77 @@ +/* + * Licensed to the Apache Software Foundation (ASF) under one + * or more contributor license agreements. See the NOTICE file + * distributed with this work for additional information + * regarding copyright ownership. The ASF licenses this file + * to you under the Apache License, Version 2.0 (the + * "License"); you may not use this file except in compliance + * with the License. You may obtain a copy of the License at + * + * http://www.apache.org/licenses/LICENSE-2.0 + * + * Unless required by applicable law or agreed to in writing, software + * distributed under the License is distributed on an "AS IS" BASIS, + * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. + * See the License for the specific language governing permissions and + * limitations under the License. + */ +package org.apache.beam.runners.core.construction.resources; + +import com.fasterxml.jackson.annotation.JsonIgnore; +import io.github.classgraph.ClassGraph; +import org.apache.beam.sdk.options.Default; +import org.apache.beam.sdk.options.DefaultValueFactory; +import org.apache.beam.sdk.options.Description; +import org.apache.beam.sdk.options.PipelineOptions; +import org.apache.beam.sdk.util.InstanceBuilder; + +/** Pipeline options dedicated to detecting classpath resources. 
*/
+public interface PipelineResourcesOptions extends PipelineOptions {
+
+  @Description(
+      "The class of the pipeline resources detector factory that should be created and used to create "

Review comment:
It's a minor hassle, but please duplicate the description as a javadoc comment on the getter so that the javadoc has this information as well.

This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at: us...@infra.apache.org

Issue Time Tracking
---
Worklog Id: (was: 361112)
Time Spent: 15.5h (was: 15h 20m)

> PipelineResources algorithm is not working in most environments
> ---
>
> Key: BEAM-5495
> URL: https://issues.apache.org/jira/browse/BEAM-5495
> Project: Beam
> Issue Type: Bug
> Components: runner-flink, runner-spark, sdk-java-core
> Reporter: Romain Manni-Bucau
> Assignee: Lukasz Gajowy
> Priority: Major
> Fix For: 2.19.0
>
> Time Spent: 15.5h
> Remaining Estimate: 0h
>
> Issue are:
> 1. it assumes the classloader is an URLClassLoader (not always true and java >= 9 breaks that as well for the app loader)
> 2. it uses loader.getURLs() which leads to including the JRE itself in the staged file
> Looks like this detect resource algorithm can't work and should be replaced by a SPI rather than a built-in and not extensible algorithm. Another valid alternative is to just drop that "guess" logic and force the user to set staged files.

--
This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Work logged] (BEAM-5495) PipelineResources algorithm is not working in most environments
[ https://issues.apache.org/jira/browse/BEAM-5495?focusedWorklogId=361113=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-361113 ]

ASF GitHub Bot logged work on BEAM-5495:

Author: ASF GitHub Bot
Created on: 17/Dec/19 21:02
Start Date: 17/Dec/19 21:02
Worklog Time Spent: 10m
Work Description: lukecwik commented on pull request #10268: [BEAM-5495] PipelineResources algorithm is not working in most environments
URL: https://github.com/apache/beam/pull/10268#discussion_r359024833

## File path: runners/core-construction-java/src/main/java/org/apache/beam/runners/core/construction/resources/PipelineResources.java
## @@ -49,7 +50,19 @@
       ClassLoader classLoader, PipelineOptions options) {

     PipelineResourcesOptions artifactsRelatedOptions = options.as(PipelineResourcesOptions.class);
-    return artifactsRelatedOptions.getPipelineResourcesDetector().detect(classLoader);
+    return artifactsRelatedOptions
+        .getPipelineResourcesDetector()
+        .detect(classLoader)
+        .filter(isStageable())
+        .collect(Collectors.toList());
+  }
+
+  /**
+   * Returns a predicate for filtering all resources that are impossible to stage (like gradle
+   * wrapper jars).
+   */
+  private static Predicate isStageable() {
+    return resourcePath -> !resourcePath.contains("gradle/wrapper");

Review comment:
Instead of blacklisting, why does gradle appear on the classpath in the first place?

This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the URL above to go to the specific comment.
For queries about this service, please contact Infrastructure at: us...@infra.apache.org

Issue Time Tracking
---
Worklog Id: (was: 361113)
Time Spent: 15h 40m (was: 15.5h)
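The filtering step under discussion in this review comment can be tried in isolation. The sketch below copies the predicate from the quoted diff, with the `String` type parameter assumed because generics were lost in this archive, and applies it to made-up paths. `StageableFilterSketch`, `filterStageable` and the sample paths are hypothetical.

```java
import java.util.Arrays;
import java.util.List;
import java.util.function.Predicate;
import java.util.stream.Collectors;

public class StageableFilterSketch {

  /** Predicate from the diff: rejects any path containing "gradle/wrapper". */
  static Predicate<String> isStageable() {
    return resourcePath -> !resourcePath.contains("gradle/wrapper");
  }

  /** Hypothetical wrapper: keeps order and duplicates, drops only the rejected paths. */
  static List<String> filterStageable(List<String> detected) {
    return detected.stream().filter(isStageable()).collect(Collectors.toList());
  }

  public static void main(String[] args) {
    List<String> detected =
        Arrays.asList(
            "/home/user/project/build/libs/app.jar",
            "/home/user/project/gradle/wrapper/gradle-wrapper.jar");
    System.out.println(filterStageable(detected)); // [/home/user/project/build/libs/app.jar]
  }
}
```

Because the check is a substring match on the path, it would also reject an unrelated resource that happens to live under a directory named `gradle/wrapper`, which is one reason a reviewer might prefer fixing the classpath instead.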
[jira] [Work logged] (BEAM-5495) PipelineResources algorithm is not working in most environments
[ https://issues.apache.org/jira/browse/BEAM-5495?focusedWorklogId=361107=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-361107 ] ASF GitHub Bot logged work on BEAM-5495: Author: ASF GitHub Bot Created on: 17/Dec/19 21:02 Start Date: 17/Dec/19 21:02 Worklog Time Spent: 10m Work Description: lukecwik commented on pull request #10268: [BEAM-5495] PipelineResources algorithm is not working in most environments URL: https://github.com/apache/beam/pull/10268#discussion_r359018417 ## File path: runners/core-construction-java/src/main/java/org/apache/beam/runners/core/construction/resources/PipelineResourcesOptions.java ## @@ -0,0 +1,77 @@ +/* + * Licensed to the Apache Software Foundation (ASF) under one + * or more contributor license agreements. See the NOTICE file + * distributed with this work for additional information + * regarding copyright ownership. The ASF licenses this file + * to you under the Apache License, Version 2.0 (the + * "License"); you may not use this file except in compliance + * with the License. You may obtain a copy of the License at + * + * http://www.apache.org/licenses/LICENSE-2.0 + * + * Unless required by applicable law or agreed to in writing, software + * distributed under the License is distributed on an "AS IS" BASIS, + * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. + * See the License for the specific language governing permissions and + * limitations under the License. + */ +package org.apache.beam.runners.core.construction.resources; + +import com.fasterxml.jackson.annotation.JsonIgnore; +import io.github.classgraph.ClassGraph; +import org.apache.beam.sdk.options.Default; +import org.apache.beam.sdk.options.DefaultValueFactory; +import org.apache.beam.sdk.options.Description; +import org.apache.beam.sdk.options.PipelineOptions; +import org.apache.beam.sdk.util.InstanceBuilder; + +/** Pipeline options dedicated to detecting classpath resources. 
*/
+public interface PipelineResourcesOptions extends PipelineOptions {
+
+  @Description(
+      "The class of the pipeline resources detector factory that should be created and used to create "
+          + "the detector. If not set explicitly, a default class will be used to instantiate the factory.")
+  @Default.Class(ClasspathScanningResourcesDetectorFactory.class)
+  Class
+      getPipelineResourcesDetectorFactoryClass();
+
+  void setPipelineResourcesDetectorFactoryClass(
+      Class factoryClass);
+
+  @JsonIgnore
+  @Description(
+      "Instance of a pipeline resources detection algorithm. If not set explicitly, a default implementation will be used")
+  @Default.InstanceFactory(PipelineResourcesDetectorFactory.class)
+  PipelineResourcesDetector getPipelineResourcesDetector();
+
+  void setPipelineResourcesDetector(PipelineResourcesDetector pipelineResourcesDetector);
+
+  class PipelineResourcesDetectorFactory implements DefaultValueFactory {
+
+    @Override
+    public PipelineResourcesDetector create(PipelineOptions options) {
+      PipelineResourcesOptions resourcesOptions = options.as(PipelineResourcesOptions.class);
+
+      PipelineResourcesDetectorAbstractFactory resourcesToStage =
+          InstanceBuilder.ofType(PipelineResourcesDetectorAbstractFactory.class)
+              .fromClass(resourcesOptions.getPipelineResourcesDetectorFactoryClass())
+              .fromFactoryMethod("create")
+              .build();
+
+      return resourcesToStage.getPipelineResourcesDetector();
+    }
+  }
+
+  class ClasspathScanningResourcesDetectorFactory

Review comment:
class comment

This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the URL above to go to the specific comment.
For queries about this service, please contact Infrastructure at: us...@infra.apache.org

Issue Time Tracking
---
Worklog Id: (was: 361107)
Time Spent: 14h 50m (was: 14h 40m)
[jira] [Work logged] (BEAM-8335) Add streaming support to Interactive Beam
[ https://issues.apache.org/jira/browse/BEAM-8335?focusedWorklogId=361104=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-361104 ]

ASF GitHub Bot logged work on BEAM-8335:

Author: ASF GitHub Bot
Created on: 17/Dec/19 20:57
Start Date: 17/Dec/19 20:57
Worklog Time Spent: 10m
Work Description: KevinGG commented on pull request #10405: [BEAM-8335] Background caching job
URL: https://github.com/apache/beam/pull/10405

1. Added background caching job startup/cancel logic.
2. Updated tests.
[jira] [Updated] (BEAM-8963) Apache Beam Java GCP dependencies to catch up with GCP libraries-bom
[ https://issues.apache.org/jira/browse/BEAM-8963?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Tomo Suzuki updated BEAM-8963: -- Description:
|| ||org.apache.beam:beam-sdks-java-io-google-cloud-platform:2.15.0||org.apache.beam:beam-sdks-java-io-google-cloud-platform:2.16.0||com.google.cloud:libraries-bom:1.0.0||com.google.cloud:libraries-bom:2.0.0||com.google.cloud:libraries-bom:2.5.0||com.google.cloud:libraries-bom:3.1.0||
|avalon-framework:avalon-framework|4.1.5|4.1.5| | | | |
|com.fasterxml.jackson.core:jackson-annotations|2.9.9|2.9.10| | | | |
|com.fasterxml.jackson.core:jackson-core|2.9.9|2.9.10| | | | |
|com.fasterxml.jackson.core:jackson-databind|2.9.9.3|2.9.10| | | | |
|com.github.jponge:lzma-java|1.3|1.3| | | | |
|com.google.android:android|1.5_r4|1.5_r4| | | | |
|com.google.api-client:google-api-client|1.27.0|1.27.0| | | | |
|com.google.api-client:google-api-client-jackson2|1.27.0|1.27.0| | | | |
|com.google.api-client:google-api-client-java6|1.27.0|1.27.0| | | | |
|com.google.api.grpc:grpc-google-cloud-asset-v1| | | |0.56.0|0.67.0|0.80.0|
|com.google.api.grpc:grpc-google-cloud-asset-v1beta1| | |0.49.0|0.56.0|0.67.0|0.80.0|
|com.google.api.grpc:grpc-google-cloud-asset-v1p2beta1| | | | | |0.80.0|
|com.google.api.grpc:grpc-google-cloud-automl-v1| | | | | |0.79.1|
|com.google.api.grpc:grpc-google-cloud-automl-v1beta1| | |0.49.0|0.56.0|0.67.0|0.79.1|
|com.google.api.grpc:grpc-google-cloud-bigquerydatatransfer-v1| | |0.49.0|0.56.0|0.67.0|0.84.0|
|com.google.api.grpc:grpc-google-cloud-bigquerystorage-v1beta1|0.44.0|0.44.0|0.49.0|0.56.0|0.67.0|0.84.0|
|com.google.api.grpc:grpc-google-cloud-bigtable-admin-v2|0.38.0|0.38.0|0.49.0|0.56.0|0.67.0|1.7.1|
|com.google.api.grpc:grpc-google-cloud-bigtable-v2|0.44.0|0.38.0|0.49.0|0.56.0|0.67.0|1.7.1|
|com.google.api.grpc:grpc-google-cloud-billingbudgets-v1beta1| | | | | |0.1.1|
|com.google.api.grpc:grpc-google-cloud-build-v1| | | | | |0.1.0|
|com.google.api.grpc:grpc-google-cloud-container-v1| | |0.49.0|0.56.0|0.67.0|0.83.0|
|com.google.api.grpc:grpc-google-cloud-containeranalysis-v1| | | | |0.67.0|0.83.0|
|com.google.api.grpc:grpc-google-cloud-containeranalysis-v1beta1| | |0.49.0|0.56.0|0.67.0|0.83.0|
|com.google.api.grpc:grpc-google-cloud-datacatalog-v1beta1| | | |0.4.0-alpha|0.15.0-alpha|0.29.0-alpha|
|com.google.api.grpc:grpc-google-cloud-datalabeling-v1beta1| | | |0.56.0|0.67.0|0.81.1|
|com.google.api.grpc:grpc-google-cloud-dataproc-v1| | |0.49.0|0.56.0|0.67.0|0.83.0|
|com.google.api.grpc:grpc-google-cloud-dataproc-v1beta2| | |0.49.0|0.56.0|0.67.0|0.83.0|
|com.google.api.grpc:grpc-google-cloud-dialogflow-v2| | |0.49.0|0.56.0|0.67.0|0.83.0|
|com.google.api.grpc:grpc-google-cloud-dialogflow-v2beta1| | |0.49.0|0.56.0|0.67.0|0.83.0|
|com.google.api.grpc:grpc-google-cloud-dlp-v2| | |0.49.0|0.56.0|0.67.0|0.81.0|
|com.google.api.grpc:grpc-google-cloud-error-reporting-v1beta1| | |0.49.0|0.56.0|0.67.0|0.84.1|
|com.google.api.grpc:grpc-google-cloud-firestore-admin-v1| | | |1.3.0|1.14.0|1.32.0|
|com.google.api.grpc:grpc-google-cloud-firestore-v1| | |0.49.0|1.3.0|1.14.0|1.32.0|
|com.google.api.grpc:grpc-google-cloud-firestore-v1beta1| | |0.49.0|0.56.0|0.67.0|0.85.0|
|com.google.api.grpc:grpc-google-cloud-gameservices-v1alpha| | | | |0.3.0|0.18.0|
|com.google.api.grpc:grpc-google-cloud-iamcredentials-v1| | |0.11.0-alpha|0.18.0-alpha|0.29.0-alpha|0.43.0-alpha|
|com.google.api.grpc:grpc-google-cloud-iot-v1| | |0.49.0|0.56.0|0.67.0|0.81.0|
|com.google.api.grpc:grpc-google-cloud-kms-v1| | |0.49.0|0.56.0|0.67.0|0.82.1|
|com.google.api.grpc:grpc-google-cloud-language-v1| | |1.48.0|1.55.0|1.66.0|1.81.0|
|com.google.api.grpc:grpc-google-cloud-language-v1beta2| | |0.49.0|0.56.0|0.67.0|0.82.0|
|com.google.api.grpc:grpc-google-cloud-logging-v2| | |0.49.0|0.56.0|0.67.0|0.82.0|
|com.google.api.grpc:grpc-google-cloud-monitoring-v3| | |1.48.0|1.55.0|1.66.0|1.81.0|
|com.google.api.grpc:grpc-google-cloud-os-login-v1| | |0.49.0|0.56.0|0.67.0|0.82.0|
|com.google.api.grpc:grpc-google-cloud-phishingprotection-v1beta1| | | |0.2.0|0.13.0|0.28.0|
|com.google.api.grpc:grpc-google-cloud-pubsub-v1|1.43.0|1.43.0|1.48.0|1.55.0|1.66.0|1.84.0|
|com.google.api.grpc:grpc-google-cloud-recaptchaenterprise-v1beta1| | | |0.2.0|0.13.0|0.28.0|
|com.google.api.grpc:grpc-google-cloud-recommender-v1beta1| | | | | |0.2.0|
|com.google.api.grpc:grpc-google-cloud-redis-v1| | |0.49.0|0.56.0|0.67.0|0.82.0|
|com.google.api.grpc:grpc-google-cloud-redis-v1beta1| | |0.49.0|0.56.0|0.67.0|0.82.0|
|com.google.api.grpc:grpc-google-cloud-scheduler-v1| | |0.49.0|0.56.0|1.7.0|1.22.0|
|com.google.api.grpc:grpc-google-cloud-scheduler-v1beta1| | |0.49.0|0.56.0|0.67.0|0.82.0|
|com.google.api.grpc:grpc-google-cloud-securitycenter-v1| | |0.49.0|0.56.0|0.67.0|0.82.0|
|com.google.api.grpc:grpc-google-cloud-securitycenter-v1beta1| | |0.49.0|0.56.0|0.67.0|0.82.0|
[jira] [Created] (BEAM-8987) Support reverse/descending order in SortValues
Neville Li created BEAM-8987: Summary: Support reverse/descending order in SortValues Key: BEAM-8987 URL: https://issues.apache.org/jira/browse/BEAM-8987 Project: Beam Issue Type: New Feature Components: extensions-java-sorter Affects Versions: 2.16.0 Reporter: Neville Li It would be nice to support descending order in {{SortValues}}; note that this is related to BEAM-8986. -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Updated] (BEAM-8986) SortValues may not work correct for numerical types
[ https://issues.apache.org/jira/browse/BEAM-8986?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Neville Li updated BEAM-8986: - Description: {{SortValues}} transform uses lexicographical sorting on encoded binaries and may not work correctly if the encoding is inconsistent with native comparison, e.g. for negative integers. (was: `SortValues` transform uses lexicographical sorting on encoded binaries and may not work correctly if the encoding is inconsistent with native comparison, e.g. for negative integers.) > SortValues may not work correct for numerical types > --- > > Key: BEAM-8986 > URL: https://issues.apache.org/jira/browse/BEAM-8986 > Project: Beam > Issue Type: Bug > Components: extensions-java-sorter >Affects Versions: 2.16.0 >Reporter: Neville Li >Priority: Minor > > {{SortValues}} transform uses lexicographical sorting on encoded binaries and may not > work correctly if the encoding is inconsistent with native comparison, e.g. for > negative integers. -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Created] (BEAM-8986) SortValues may not work correct for numerical types
Neville Li created BEAM-8986: Summary: SortValues may not work correct for numerical types Key: BEAM-8986 URL: https://issues.apache.org/jira/browse/BEAM-8986 Project: Beam Issue Type: Bug Components: extensions-java-sorter Affects Versions: 2.16.0 Reporter: Neville Li `SortValues` transform uses lexicographical sorting on encoded binaries and may not work correctly if the encoding is inconsistent with native comparison, e.g. for negative integers. -- This message was sent by Atlassian Jira (v8.3.4#803005)
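The mismatch reported in BEAM-8986 can be illustrated with a minimal sketch. It assumes a fixed-width, big-endian two's-complement encoding, chosen only for illustration; it is not claimed to be the encoding any Beam coder uses. The helper names `encode_int32` and `encode_int32_sortable` are hypothetical. Sorting by the raw encoded bytes places negative values after positive ones, while an order-preserving encoding (sign bit flipped) agrees with native comparison.

```python
import struct


def encode_int32(value: int) -> bytes:
    """Hypothetical encoder: 4-byte big-endian two's complement."""
    return struct.pack(">i", value)


def encode_int32_sortable(value: int) -> bytes:
    """Order-preserving variant: shift by 2**31, which flips the sign bit."""
    return struct.pack(">I", (value + 2**31) & 0xFFFFFFFF)


values = [3, -1, 0, -7, 12]

# Python compares bytes objects lexicographically, byte by unsigned byte.
by_bytes = sorted(values, key=encode_int32)
by_sortable_bytes = sorted(values, key=encode_int32_sortable)

print(by_bytes)           # negatives end up after the positives
print(by_sortable_bytes)  # matches native integer order
```

With the plain encoding, negative numbers start with the byte 0xff and therefore sort last; the shifted encoding maps the smallest integer to all-zero bytes, so byte order and numeric order coincide.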
[jira] [Created] (BEAM-8985) SortValues should fail if SecondaryKey coder is not deterministic
Neville Li created BEAM-8985: Summary: SortValues should fail if SecondaryKey coder is not deterministic Key: BEAM-8985 URL: https://issues.apache.org/jira/browse/BEAM-8985 Project: Beam Issue Type: Bug Components: extensions-java-sorter Affects Versions: 2.16.0 Reporter: Neville Li {{SortValues}} transform uses lexicographical sorting on encoded binaries and might not work correctly if the coder is not deterministic. -- This message was sent by Atlassian Jira (v8.3.4#803005)
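Why a non-deterministic coder breaks byte-wise sorting, as raised in BEAM-8985, can be sketched as follows. This is an illustration only: the JSON-based `encode` and `encode_deterministic` functions are hypothetical stand-ins and not Beam coders. Two equal values yield different bytes when the encoding depends on incidental state (here, dictionary insertion order), so any comparison on the encoded form treats them as different keys.

```python
import json


def encode(record: dict) -> bytes:
    """Hypothetical non-deterministic encoder: key order leaks into the bytes."""
    return json.dumps(record).encode("utf-8")


def encode_deterministic(record: dict) -> bytes:
    """Deterministic variant: keys are always written in sorted order."""
    return json.dumps(record, sort_keys=True).encode("utf-8")


a = {"user": "x", "rank": 1}
b = {"rank": 1, "user": "x"}

print(a == b)                                              # equal as values
print(encode(a) == encode(b))                              # but encoded differently
print(encode_deterministic(a) == encode_deterministic(b))  # stable encoding
```

A check of this kind at pipeline construction time is what the issue asks for; how Beam would expose it is not shown here.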
[jira] [Work logged] (BEAM-8977) apache_beam.runners.interactive.display.pcoll_visualization_test.PCollectionVisualizationTest.test_dynamic_plotting_update_same_display is flaky
[ https://issues.apache.org/jira/browse/BEAM-8977?focusedWorklogId=361089=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-361089 ] ASF GitHub Bot logged work on BEAM-8977: Author: ASF GitHub Bot Created on: 17/Dec/19 19:54 Start Date: 17/Dec/19 19:54 Worklog Time Spent: 10m Work Description: KevinGG commented on issue #10404: [BEAM-8977] Resolve test flakiness URL: https://github.com/apache/beam/pull/10404#issuecomment-566723183 R: @tvalentyn PTAL, thanks! This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org Issue Time Tracking --- Worklog Id: (was: 361089) Time Spent: 20m (was: 10m) > apache_beam.runners.interactive.display.pcoll_visualization_test.PCollectionVisualizationTest.test_dynamic_plotting_update_same_display > is flaky > > > Key: BEAM-8977 > URL: https://issues.apache.org/jira/browse/BEAM-8977 > Project: Beam > Issue Type: Bug > Components: test-failures >Reporter: Valentyn Tymofieiev >Assignee: Ning Kang >Priority: Major > Time Spent: 20m > Remaining Estimate: 0h > > Sample failure: > > [https://builds.apache.org/job/beam_PreCommit_Python_Phrase/1273/testReport/apache_beam.runners.interactive.display.pcoll_visualization_test/PCollectionVisualizationTest/test_dynamic_plotting_update_same_display/] > Error Message > IndexError: list index out of range > Stacktrace > self = > testMethod=test_dynamic_plotting_update_same_display> > mocked_display_facets = id='139889868386376'> > @patch('apache_beam.runners.interactive.display.pcoll_visualization' > '.PCollectionVisualization.display_facets') > def test_dynamic_plotting_update_same_display(self, > mocked_display_facets): > fake_pipeline_result = runner.PipelineResult(runner.PipelineState.RUNNING) > ie.current_env().set_pipeline_result(self._p, fake_pipeline_result) > # Starts async dynamic 
plotting that never ends in this test. > h = pv.visualize(self._pcoll, dynamic_plotting_interval=0.001) > # Blocking so the above async task can execute some iterations. > time.sleep(1) > # The first iteration doesn't provide updating_pv to display_facets. > _, first_kwargs = mocked_display_facets.call_args_list[0] > self.assertEqual(first_kwargs, {}) > # The following iterations use the same updating_pv to display_facets and so > # on. > > _, second_kwargs = mocked_display_facets.call_args_list[1] > E IndexError: list index out of range > apache_beam/runners/interactive/display/pcoll_visualization_test.py:105: > IndexError > Standard Output > > Standard Error > WARNING:apache_beam.runners.interactive.interactive_environment:You cannot > use Interactive Beam features when you are not in an interactive environment > such as a Jupyter notebook or ipython terminal. -- This message was sent by Atlassian Jira (v8.3.4#803005)
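The failure above comes from asserting on how many iterations a background task completed within a fixed sleep. A minimal sketch of a deterministic alternative follows; `PeriodicPlotter` and its `tick` method are hypothetical stand-ins, not the Interactive Beam API. The idea is to call the body of the periodic task directly, so the test does not depend on thread scheduling.

```python
import unittest
from unittest.mock import MagicMock


class PeriodicPlotter:
    """Hypothetical stand-in for an object that redraws on a timer."""

    def __init__(self, display):
        self._display = display
        self._handle = None

    def tick(self):
        """The body a timer would invoke; callable without any thread."""
        if self._handle is None:
            # First iteration: nothing to update yet.
            self._display()
            self._handle = object()
        else:
            # Later iterations reuse the same handle.
            self._display(updating=self._handle)


class PeriodicPlotterTest(unittest.TestCase):
    def test_second_tick_reuses_display(self):
        display = MagicMock()
        plotter = PeriodicPlotter(display)
        # Drive two iterations explicitly instead of sleeping and hoping.
        plotter.tick()
        plotter.tick()
        _, first_kwargs = display.call_args_list[0]
        _, second_kwargs = display.call_args_list[1]
        self.assertEqual(first_kwargs, {})
        self.assertIn("updating", second_kwargs)
```

Saved as a module, the test can be run with `python -m unittest`; because no timer is involved, `call_args_list` always has exactly two entries.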
[jira] [Work logged] (BEAM-8977) apache_beam.runners.interactive.display.pcoll_visualization_test.PCollectionVisualizationTest.test_dynamic_plotting_update_same_display is flaky
[ https://issues.apache.org/jira/browse/BEAM-8977?focusedWorklogId=361088=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-361088 ] ASF GitHub Bot logged work on BEAM-8977: Author: ASF GitHub Bot Created on: 17/Dec/19 19:52 Start Date: 17/Dec/19 19:52 Worklog Time Spent: 10m Work Description: KevinGG commented on pull request #10404: [BEAM-8977] Resolve test flakiness URL: https://github.com/apache/beam/pull/10404 1. Removed test logic depending on execution of asynchronous tasks since there is no control of them in a testing environment. 2. Replaced the dynamic plotting tests with tests directly/indirectly invoking underlying logic of the asynchronous task.
[jira] [Work logged] (BEAM-8561) Add ThriftIO to Support IO for Thrift Files
[ https://issues.apache.org/jira/browse/BEAM-8561?focusedWorklogId=361081=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-361081 ] ASF GitHub Bot logged work on BEAM-8561: Author: ASF GitHub Bot Created on: 17/Dec/19 19:39 Start Date: 17/Dec/19 19:39 Worklog Time Spent: 10m Work Description: gsteelman commented on issue #10290: [BEAM-8561] Add ThriftIO to support IO for Thrift files URL: https://github.com/apache/beam/pull/10290#issuecomment-566717207 > Thanks @gsteelman. We would love to have the sample Thrift schema and anything else we can utilize for testing! Who should I have it shared with? Also, https://thrift.apache.org/test/ and this schema https://raw.githubusercontent.com/apache/thrift/master/test/ThriftTest.thrift which might provide additional test cases. This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org Issue Time Tracking --- Worklog Id: (was: 361081) Time Spent: 6h 50m (was: 6h 40m) > Add ThriftIO to Support IO for Thrift Files > --- > > Key: BEAM-8561 > URL: https://issues.apache.org/jira/browse/BEAM-8561 > Project: Beam > Issue Type: New Feature > Components: io-java-files >Reporter: Chris Larsen >Assignee: Chris Larsen >Priority: Major > Time Spent: 6h 50m > Remaining Estimate: 0h > > Similar to AvroIO it would be very useful to support reading and writing > to/from Thrift files with a native connector. > Functionality would include: > # read() - Reading from one or more Thrift files. > # write() - Writing to one or more Thrift files. -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Updated] (BEAM-8984) Spark uber jar job server: e2e test
[ https://issues.apache.org/jira/browse/BEAM-8984?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Kyle Weaver updated BEAM-8984: -- Summary: Spark uber jar job server: e2e test (was: Spark uber jar job server: integration test) > Spark uber jar job server: e2e test > --- > > Key: BEAM-8984 > URL: https://issues.apache.org/jira/browse/BEAM-8984 > Project: Beam > Issue Type: Improvement > Components: runner-spark >Reporter: Kyle Weaver >Assignee: Kyle Weaver >Priority: Major > Labels: portability-spark > > There should be an integration test that tests submitting a simple pipeline > (with artifacts) to a cluster via the Spark uber jar job server. -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Updated] (BEAM-8984) Spark uber jar job server: e2e test
[ https://issues.apache.org/jira/browse/BEAM-8984?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Kyle Weaver updated BEAM-8984: -- Description: There should be an e2e test that tests submitting a simple pipeline (with artifacts) to a cluster via the Spark uber jar job server. (was: There should be an integration test that tests submitting a simple pipeline (with artifacts) to a cluster via the Spark uber jar job server.) > Spark uber jar job server: e2e test > --- > > Key: BEAM-8984 > URL: https://issues.apache.org/jira/browse/BEAM-8984 > Project: Beam > Issue Type: Improvement > Components: runner-spark >Reporter: Kyle Weaver >Assignee: Kyle Weaver >Priority: Major > Labels: portability-spark > > There should be an e2e test that tests submitting a simple pipeline (with > artifacts) to a cluster via the Spark uber jar job server. -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Created] (BEAM-8984) Spark uber jar job server: integration test
Kyle Weaver created BEAM-8984: - Summary: Spark uber jar job server: integration test Key: BEAM-8984 URL: https://issues.apache.org/jira/browse/BEAM-8984 Project: Beam Issue Type: Improvement Components: runner-spark Reporter: Kyle Weaver Assignee: Kyle Weaver There should be an integration test that tests submitting a simple pipeline (with artifacts) to a cluster via the Spark uber jar job server. -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Updated] (BEAM-8984) Spark uber jar job server: integration test
[ https://issues.apache.org/jira/browse/BEAM-8984?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Kyle Weaver updated BEAM-8984: -- Status: Open (was: Triage Needed) > Spark uber jar job server: integration test > --- > > Key: BEAM-8984 > URL: https://issues.apache.org/jira/browse/BEAM-8984 > Project: Beam > Issue Type: Improvement > Components: runner-spark >Reporter: Kyle Weaver >Assignee: Kyle Weaver >Priority: Major > Labels: portability-spark > > There should be an integration test that tests submitting a simple pipeline > (with artifacts) to a cluster via the Spark uber jar job server. -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Updated] (BEAM-8983) Spark uber jar job server: query exceptions from master
[ https://issues.apache.org/jira/browse/BEAM-8983?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Kyle Weaver updated BEAM-8983: -- Status: Open (was: Triage Needed) > Spark uber jar job server: query exceptions from master > --- > > Key: BEAM-8983 > URL: https://issues.apache.org/jira/browse/BEAM-8983 > Project: Beam > Issue Type: Improvement > Components: runner-spark >Reporter: Kyle Weaver >Assignee: Kyle Weaver >Priority: Major > Labels: portability-spark > > As far as I know, the Spark REST API does not return exceptions from the > cluster after a jar is actually run. While these exceptions can be viewed in > Spark's web UI, ideally they would also be visible in Beam's output. > To do this, we will need to find a REST endpoint that does return those > exceptions, and then map the submissionId to its corresponding job id. -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Created] (BEAM-8983) Spark uber jar job server: query exceptions from master
Kyle Weaver created BEAM-8983: - Summary: Spark uber jar job server: query exceptions from master Key: BEAM-8983 URL: https://issues.apache.org/jira/browse/BEAM-8983 Project: Beam Issue Type: Improvement Components: runner-spark Reporter: Kyle Weaver Assignee: Kyle Weaver As far as I know, the Spark REST API does not return exceptions from the cluster after a jar is actually run. While these exceptions can be viewed in Spark's web UI, ideally they would also be visible in Beam's output. To do this, we will need to find a REST endpoint that does return those exceptions, and then map the submissionId to its corresponding job id. -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Work logged] (BEAM-8671) Migrate Python version to 3.7
[ https://issues.apache.org/jira/browse/BEAM-8671?focusedWorklogId=361076=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-361076 ] ASF GitHub Bot logged work on BEAM-8671: Author: ASF GitHub Bot Created on: 17/Dec/19 19:29 Start Date: 17/Dec/19 19:29 Worklog Time Spent: 10m Work Description: tvalentyn commented on issue #10125: [BEAM-8671] Added ParDo test running on Python 3.7 URL: https://github.com/apache/beam/pull/10125#issuecomment-566713165 > The test fails due to https://issues.apache.org/jira/browse/BEAM-8979. I'll push forward with the PR as soon as the bug is resolved. Is the Job id included in the logs now? Thank you. This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org Issue Time Tracking --- Worklog Id: (was: 361076) Time Spent: 11h 20m (was: 11h 10m) > Migrate Python version to 3.7 > - > > Key: BEAM-8671 > URL: https://issues.apache.org/jira/browse/BEAM-8671 > Project: Beam > Issue Type: Sub-task > Components: testing >Reporter: Kamil Wasilewski >Assignee: Kamil Wasilewski >Priority: Major > Time Spent: 11h 20m > Remaining Estimate: 0h > > Currently, load tests run on Python 2.7. We should migrate to 3.7 -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Work logged] (BEAM-8933) BigQuery IO should support read/write in Arrow format
[ https://issues.apache.org/jira/browse/BEAM-8933?focusedWorklogId=361075=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-361075 ] ASF GitHub Bot logged work on BEAM-8933: Author: ASF GitHub Bot Created on: 17/Dec/19 19:21 Start Date: 17/Dec/19 19:21 Worklog Time Spent: 10m Work Description: TheNeuralBit commented on issue #10384: [WIP] [BEAM-8933] Utilities for converting Arrow schemas and reading Arrow batches as Rows URL: https://github.com/apache/beam/pull/10384#issuecomment-566709615 > I suppose we would need to have a separate BQ IO with Arrow support in the extension package then? Oh a better option would be for `:sdks:java:io:google-cloud-platform` to just include a dependency on `:sdks:java:extensions:arrow`, that doesn't seem too onerous This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org Issue Time Tracking --- Worklog Id: (was: 361075) Time Spent: 3h 20m (was: 3h 10m) > BigQuery IO should support read/write in Arrow format > - > > Key: BEAM-8933 > URL: https://issues.apache.org/jira/browse/BEAM-8933 > Project: Beam > Issue Type: Improvement > Components: io-java-gcp >Reporter: Kirill Kozlov >Assignee: Kirill Kozlov >Priority: Major > Time Spent: 3h 20m > Remaining Estimate: 0h > > As of right now BigQuery uses Avro format for reading and writing. > We should add a config to BigQueryIO to specify which format to use: Arrow or > Avro (with Avro as default). -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Work logged] (BEAM-8810) Dataflow runner - Work stuck in state COMMITTING with streaming commit rpcs
[ https://issues.apache.org/jira/browse/BEAM-8810?focusedWorklogId=361067=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-361067 ] ASF GitHub Bot logged work on BEAM-8810: Author: ASF GitHub Bot Created on: 17/Dec/19 19:06 Start Date: 17/Dec/19 19:06 Worklog Time Spent: 10m Work Description: reuvenlax commented on pull request #10311: [BEAM-8810] Detect stuck commits in StreamingDataflowWorker URL: https://github.com/apache/beam/pull/10311 This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org Issue Time Tracking --- Worklog Id: (was: 361067) Time Spent: 2h (was: 1h 50m) > Dataflow runner - Work stuck in state COMMITTING with streaming commit rpcs > --- > > Key: BEAM-8810 > URL: https://issues.apache.org/jira/browse/BEAM-8810 > Project: Beam > Issue Type: Bug > Components: runner-dataflow >Reporter: Sam Whittle >Assignee: Sam Whittle >Priority: Minor > Time Spent: 2h > Remaining Estimate: 0h > > In several pipelines using streaming engine and thus the streaming commit > rpcs, work became stuck in state COMMITTING indefinitely. Such stuckness > coincided with repeated streaming rpc failures. > The status page shows that the key has work in state COMMITTING, and has 1 > queued work item. > There is a single active commit stream, with 0 pending requests. > The stream could exist past the stream deadline because the StreamCache only > closes stream due to the deadline when a stream is retrieved, which only > occurs if there are other commits. Since the pipeline is stuck due to this > event, there are no other commits. 
> It seems therefore there is some race on the commitStream between onNewStream > and commitWork that either prevents work from being retried, an exception > that triggers between when the pending request is removed and the callback is > called, or some potential corruption of the activeWork data structure. -- This message was sent by Atlassian Jira (v8.3.4#803005)
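One way to surface work that stays in COMMITTING indefinitely is to record when each commit starts and periodically report those older than a threshold. The sketch below is a hedged illustration of that idea only; `CommitTracker`, its methods, and the threshold are hypothetical and are not taken from StreamingDataflowWorker or from PR #10311.

```python
import time
from typing import Dict, List, Optional


class CommitTracker:
    """Hypothetical tracker that records when each commit started."""

    def __init__(self, stuck_after_seconds: float):
        self._stuck_after = stuck_after_seconds
        self._started: Dict[str, float] = {}

    def start(self, key: str, now: Optional[float] = None) -> None:
        # A monotonic clock avoids false positives from wall-clock jumps.
        self._started[key] = time.monotonic() if now is None else now

    def finish(self, key: str) -> None:
        self._started.pop(key, None)

    def stuck(self, now: Optional[float] = None) -> List[str]:
        """Return the keys whose commit has been pending longer than the threshold."""
        now = time.monotonic() if now is None else now
        return sorted(
            key for key, started in self._started.items()
            if now - started > self._stuck_after
        )


tracker = CommitTracker(stuck_after_seconds=10)
tracker.start("key-a", now=0.0)
tracker.start("key-b", now=5.0)
tracker.finish("key-b")           # key-b completed normally
print(tracker.stuck(now=9.0))     # nothing has exceeded the threshold yet
print(tracker.stuck(now=11.0))    # key-a has been pending for 11 seconds
```

The `now` parameter exists only so the check can be exercised without real waiting. The important property is that `stuck` is called from an independent periodic task, so detection does not rely on further commits arriving.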
[jira] [Work logged] (BEAM-8944) Python SDK harness performance degradation with UnboundedThreadPoolExecutor
[ https://issues.apache.org/jira/browse/BEAM-8944?focusedWorklogId=361066=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-361066 ] ASF GitHub Bot logged work on BEAM-8944: Author: ASF GitHub Bot Created on: 17/Dec/19 19:03 Start Date: 17/Dec/19 19:03 Worklog Time Spent: 10m Work Description: y1chi commented on issue #10387: [BEAM-8944] Change to use single thread in py sdk bundle progress report URL: https://github.com/apache/beam/pull/10387#issuecomment-566321500 Run Python PreCommit This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org Issue Time Tracking --- Worklog Id: (was: 361066) Time Spent: 2h 20m (was: 2h 10m) > Python SDK harness performance degradation with UnboundedThreadPoolExecutor > --- > > Key: BEAM-8944 > URL: https://issues.apache.org/jira/browse/BEAM-8944 > Project: Beam > Issue Type: Bug > Components: sdk-py-harness >Affects Versions: 2.18.0 >Reporter: Yichi Zhang >Assignee: Yichi Zhang >Priority: Blocker > Attachments: profiling.png, profiling_one_thread.png, > profiling_twelve_threads.png > > Time Spent: 2h 20m > Remaining Estimate: 0h > > We are seeing a performance degradation for python streaming word count load > tests. > > After some investigation, it appears to be caused by swapping the original > ThreadPoolExecutor to UnboundedThreadPoolExecutor in sdk worker. Suspicion is > that python performance is worse with more threads on cpu-bounded tasks. 
>
> A simple test for comparing the multiple thread pool executor performance:
>
> {code:python}
> def test_performance(self):
>   def run_perf(executor):
>     total_number = 100
>     q = queue.Queue()
>
>     def task(number):
>       hash(number)
>       q.put(number + 200)
>       return number
>
>     t = time.time()
>     count = 0
>     for i in range(200):
>       q.put(i)
>
>     while count < total_number:
>       executor.submit(task, q.get(block=True))
>       count += 1
>     print('%s uses %s' % (executor, time.time() - t))
>
>   with UnboundedThreadPoolExecutor() as executor:
>     run_perf(executor)
>   with futures.ThreadPoolExecutor(max_workers=1) as executor:
>     run_perf(executor)
>   with futures.ThreadPoolExecutor(max_workers=12) as executor:
>     run_perf(executor)
> {code}
> Results:
> 0x7fab400dbe50> uses 268.160675049
> uses 79.904583931
> uses 191.179054976
> Profiling:
> UnboundedThreadPoolExecutor: !profiling.png!
> 1 Thread ThreadPoolExecutor: !profiling_one_thread.png!
> 12 Threads ThreadPoolExecutor: !profiling_twelve_threads.png!
-- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Work logged] (BEAM-8944) Python SDK harness performance degradation with UnboundedThreadPoolExecutor
[ https://issues.apache.org/jira/browse/BEAM-8944?focusedWorklogId=361065=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-361065 ] ASF GitHub Bot logged work on BEAM-8944: Author: ASF GitHub Bot Created on: 17/Dec/19 19:03 Start Date: 17/Dec/19 19:03 Worklog Time Spent: 10m Work Description: y1chi commented on issue #10387: [BEAM-8944] Change to use single thread in py sdk bundle progress report URL: https://github.com/apache/beam/pull/10387#issuecomment-566702609 Run Python PreCommit This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org Issue Time Tracking --- Worklog Id: (was: 361065) Time Spent: 2h 10m (was: 2h) > Python SDK harness performance degradation with UnboundedThreadPoolExecutor > --- > > Key: BEAM-8944 > URL: https://issues.apache.org/jira/browse/BEAM-8944 > Project: Beam > Issue Type: Bug > Components: sdk-py-harness >Affects Versions: 2.18.0 >Reporter: Yichi Zhang >Assignee: Yichi Zhang >Priority: Blocker > Attachments: profiling.png, profiling_one_thread.png, > profiling_twelve_threads.png > > Time Spent: 2h 10m > Remaining Estimate: 0h > > We are seeing a performance degradation for python streaming word count load > tests. > > After some investigation, it appears to be caused by swapping the original > ThreadPoolExecutor to UnboundedThreadPoolExecutor in sdk worker. Suspicion is > that python performance is worse with more threads on cpu-bounded tasks. 
>
> A simple test for comparing the multiple thread pool executor performance:
>
> {code:python}
> def test_performance(self):
>   def run_perf(executor):
>     total_number = 100
>     q = queue.Queue()
>
>     def task(number):
>       hash(number)
>       q.put(number + 200)
>       return number
>
>     t = time.time()
>     count = 0
>     for i in range(200):
>       q.put(i)
>
>     while count < total_number:
>       executor.submit(task, q.get(block=True))
>       count += 1
>     print('%s uses %s' % (executor, time.time() - t))
>
>   with UnboundedThreadPoolExecutor() as executor:
>     run_perf(executor)
>   with futures.ThreadPoolExecutor(max_workers=1) as executor:
>     run_perf(executor)
>   with futures.ThreadPoolExecutor(max_workers=12) as executor:
>     run_perf(executor)
> {code}
> Results:
> 0x7fab400dbe50> uses 268.160675049
> uses 79.904583931
> uses 191.179054976
> Profiling:
> UnboundedThreadPoolExecutor: !profiling.png!
> 1 Thread ThreadPoolExecutor: !profiling_one_thread.png!
> 12 Threads ThreadPoolExecutor: !profiling_twelve_threads.png!
-- This message was sent by Atlassian Jira (v8.3.4#803005)
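The comparison in BEAM-8944 can be approximated with the standard library alone. The following is a simplified sketch, not the reporter's benchmark: it omits Beam's UnboundedThreadPoolExecutor, and `cpu_task`, the task size, and the task count are assumptions chosen to keep the run short. It times the same CPU-bound workload on thread pools of one and twelve workers; the absolute numbers depend on the machine and interpreter, so none are quoted here.

```python
import time
from concurrent import futures


def cpu_task(number: int) -> int:
    """Small CPU-bound unit of work (pure Python, no I/O)."""
    return sum(i * i for i in range(number))


def run_perf(executor, total_tasks: int = 2000):
    """Submit total_tasks identical tasks and return (completed, seconds)."""
    start = time.perf_counter()
    results = list(executor.map(cpu_task, [200] * total_tasks))
    return len(results), time.perf_counter() - start


timings = {}
for workers in (1, 12):
    with futures.ThreadPoolExecutor(max_workers=workers) as executor:
        completed, elapsed = run_perf(executor)
    timings[workers] = (completed, elapsed)
    print('max_workers=%d finished %d tasks in %.3fs' % (workers, completed, elapsed))
```

On interpreters with a global interpreter lock, adding threads does not parallelize pure-Python computation, which is consistent with the suspicion stated in the issue that more threads hurt CPU-bound tasks.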
[jira] [Commented] (BEAM-8980) Running GroupByKeyLoadTest on Portable Flink fails
[ https://issues.apache.org/jira/browse/BEAM-8980?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16998487#comment-16998487 ]

Ankur Goenka commented on BEAM-8980:

The stack trace is generally not the root cause, as it is printed on gRPC termination. Please link the Jenkins/Gradle job so that the complete logs are available.

> Running GroupByKeyLoadTest on Portable Flink fails
> --
>
> Key: BEAM-8980
> URL: https://issues.apache.org/jira/browse/BEAM-8980
> Project: Beam
> Issue Type: Bug
> Components: testing
> Reporter: Michal Walenia
> Priority: Major
>
> When running a GBK Load test using Java harness image and JobServer image
> generated from master, the load test fails with a cryptic exception:
> {code:java}
> Exception in thread "main" java.lang.RuntimeException: Invalid job state: FAILED.
> 11:45:31 at org.apache.beam.sdk.loadtests.JobFailure.handleFailure(JobFailure.java:55)
> 11:45:31 at org.apache.beam.sdk.loadtests.LoadTest.run(LoadTest.java:106)
> 11:45:31 at org.apache.beam.sdk.loadtests.CombineLoadTest.run(CombineLoadTest.java:66)
> 11:45:31 at org.apache.beam.sdk.loadtests.CombineLoadTest.main(CombineLoadTest.java:169)
> {code}
>
> After some investigation, I found a stacktrace of the error:
> {code:java}
> org.apache.beam.vendor.grpc.v1p21p0.io.grpc.StatusRuntimeException: CANCELLED: call already cancelled
>   at org.apache.beam.vendor.grpc.v1p21p0.io.grpc.Status.asRuntimeException(Status.java:524)
>   at org.apache.beam.vendor.grpc.v1p21p0.io.grpc.stub.ServerCalls$ServerCallStreamObserverImpl.onNext(ServerCalls.java:339)
>   at org.apache.beam.sdk.fn.stream.DirectStreamObserver.onNext(DirectStreamObserver.java:98)
>   at org.apache.beam.sdk.fn.data.BeamFnDataSizeBasedBufferingOutboundObserver.flush(BeamFnDataSizeBasedBufferingOutboundObserver.java:90)
>   at org.apache.beam.sdk.fn.data.BeamFnDataSizeBasedBufferingOutboundObserver.accept(BeamFnDataSizeBasedBufferingOutboundObserver.java:102)
>   at org.apache.beam.runners.flink.translation.functions.FlinkExecutableStageFunction.processElements(FlinkExecutableStageFunction.java:278)
>   at org.apache.beam.runners.flink.translation.functions.FlinkExecutableStageFunction.mapPartition(FlinkExecutableStageFunction.java:201)
>   at org.apache.flink.runtime.operators.MapPartitionDriver.run(MapPartitionDriver.java:103)
>   at org.apache.flink.runtime.operators.BatchTask.run(BatchTask.java:504)
>   at org.apache.flink.runtime.operators.BatchTask.invoke(BatchTask.java:369)
>   at org.apache.flink.runtime.taskmanager.Task.doRun(Task.java:705)
>   at org.apache.flink.runtime.taskmanager.Task.run(Task.java:530)
>   at java.lang.Thread.run(Thread.java:748)
>   Suppressed: org.apache.beam.vendor.grpc.v1p21p0.io.grpc.StatusRuntimeException: CANCELLED: call already cancelled
>     at org.apache.beam.vendor.grpc.v1p21p0.io.grpc.Status.asRuntimeException(Status.java:524)
>     at org.apache.beam.vendor.grpc.v1p21p0.io.grpc.stub.ServerCalls$ServerCallStreamObserverImpl.onNext(ServerCalls.java:339)
>     at org.apache.beam.sdk.fn.stream.DirectStreamObserver.onNext(DirectStreamObserver.java:98)
>     at org.apache.beam.sdk.fn.data.BeamFnDataSizeBasedBufferingOutboundObserver.close(BeamFnDataSizeBasedBufferingOutboundObserver.java:84)
>     at org.apache.beam.runners.fnexecution.control.SdkHarnessClient$ActiveBundle.close(SdkHarnessClient.java:298)
>     at org.apache.beam.runners.flink.translation.functions.FlinkExecutableStageFunction.$closeResource(FlinkExecutableStageFunction.java:202)
>     at org.apache.beam.runners.flink.translation.functions.FlinkExecutableStageFunction.mapPartition(FlinkExecutableStageFunction.java:202)
>     ... 6 more
>     Suppressed: java.lang.IllegalStateException: Processing bundle failed, TODO: [BEAM-3962] abort bundle.
>       at org.apache.beam.runners.fnexecution.control.SdkHarnessClient$ActiveBundle.close(SdkHarnessClient.java:320)
>       ... 8 more
> {code}
> It seems that the core issue is an IllegalStateException thrown from
> SdkHarnessClient.java:320, related to BEAM-3962.

--
This message was sent by Atlassian Jira (v8.3.4#803005)
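
The "Suppressed:" entries in the quoted trace are what Java attaches when closing a resource fails while another exception is already propagating; the `$closeResource` frame suggests that the bundle is closed by a try-with-resources statement. The following is a minimal sketch of that mechanism only, under the assumption that this reading of the trace is right. `SuppressedDemo` and `FakeBundle` are hypothetical names for illustration, not Beam classes, and the messages are copied from the trace rather than produced by Beam code.

```java
import java.util.ArrayList;
import java.util.List;

class SuppressedDemo {
    // Hypothetical stand-in for a bundle; this is NOT Beam's ActiveBundle.
    static class FakeBundle implements AutoCloseable {
        // Fails while processing, like the primary exception in the trace.
        void process() {
            throw new RuntimeException("CANCELLED: call already cancelled");
        }

        // Fails again on close and carries its own suppressed exception.
        @Override
        public void close() {
            RuntimeException closeFailure =
                new RuntimeException("CANCELLED: call already cancelled (during close)");
            closeFailure.addSuppressed(new IllegalStateException("Processing bundle failed"));
            throw closeFailure;
        }
    }

    // Runs the failing body in try-with-resources and returns what escapes.
    static Throwable runAndCapture() {
        try (FakeBundle bundle = new FakeBundle()) {
            bundle.process();
            return null;
        } catch (RuntimeException e) {
            // The exception from process() wins; the one from close() is attached to it.
            return e;
        }
    }

    // Flattens an exception and its suppressed exceptions into indented lines.
    static List<String> describe(Throwable t) {
        List<String> out = new ArrayList<>();
        collect(t, "", false, out);
        return out;
    }

    private static void collect(Throwable t, String indent, boolean suppressed, List<String> out) {
        String label = suppressed ? "Suppressed: " : "";
        out.add(indent + label + t.getClass().getSimpleName() + ": " + t.getMessage());
        for (Throwable s : t.getSuppressed()) {
            collect(s, indent + "  ", true, out);
        }
    }

    public static void main(String[] args) {
        describe(runAndCapture()).forEach(System.out::println);
    }
}
```

In this sketch the exception thrown by the body is the one that propagates, and the failure from `close()` hangs off it as a suppressed entry with its own nested suppressed exception, which is the same shape as the quoted trace. It does not say which of the three is the root cause; that still needs the complete job logs requested in the comment.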