[jira] [Updated] (BEAM-8980) Running GroupByKeyLoadTest on Portable Flink fails

2019-12-17 Thread Michal Walenia (Jira)


 [ 
https://issues.apache.org/jira/browse/BEAM-8980?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Michal Walenia updated BEAM-8980:
-
Description: 
When running a GBK Load test using Java harness image and JobServer image 
generated from master, the load test fails with a cryptic exception:
{code:java}
Exception in thread "main" java.lang.RuntimeException: Invalid job state: 
FAILED.
11:45:31at 
org.apache.beam.sdk.loadtests.JobFailure.handleFailure(JobFailure.java:55)
11:45:31at org.apache.beam.sdk.loadtests.LoadTest.run(LoadTest.java:106)
11:45:31at 
org.apache.beam.sdk.loadtests.CombineLoadTest.run(CombineLoadTest.java:66)
11:45:31at 
org.apache.beam.sdk.loadtests.CombineLoadTest.main(CombineLoadTest.java:169)
{code}
 

After some investigation, I found a stacktrace of the error:
{code:java}
org.apache.beam.vendor.grpc.v1p21p0.io.grpc.StatusRuntimeException: CANCELLED: 
call already 
cancelledorg.apache.beam.vendor.grpc.v1p21p0.io.grpc.StatusRuntimeException: 
CANCELLED: call already cancelled at 
org.apache.beam.vendor.grpc.v1p21p0.io.grpc.Status.asRuntimeException(Status.java:524)
 at 
org.apache.beam.vendor.grpc.v1p21p0.io.grpc.stub.ServerCalls$ServerCallStreamObserverImpl.onNext(ServerCalls.java:339)
 at 
org.apache.beam.sdk.fn.stream.DirectStreamObserver.onNext(DirectStreamObserver.java:98)
 at 
org.apache.beam.sdk.fn.data.BeamFnDataSizeBasedBufferingOutboundObserver.flush(BeamFnDataSizeBasedBufferingOutboundObserver.java:90)
 at 
org.apache.beam.sdk.fn.data.BeamFnDataSizeBasedBufferingOutboundObserver.accept(BeamFnDataSizeBasedBufferingOutboundObserver.java:102)
 at 
org.apache.beam.runners.flink.translation.functions.FlinkExecutableStageFunction.processElements(FlinkExecutableStageFunction.java:278)
 at 
org.apache.beam.runners.flink.translation.functions.FlinkExecutableStageFunction.mapPartition(FlinkExecutableStageFunction.java:201)
 at 
org.apache.flink.runtime.operators.MapPartitionDriver.run(MapPartitionDriver.java:103)
 at org.apache.flink.runtime.operators.BatchTask.run(BatchTask.java:504) at 
org.apache.flink.runtime.operators.BatchTask.invoke(BatchTask.java:369) at 
org.apache.flink.runtime.taskmanager.Task.doRun(Task.java:705) at 
org.apache.flink.runtime.taskmanager.Task.run(Task.java:530) at 
java.lang.Thread.run(Thread.java:748) Suppressed: 
org.apache.beam.vendor.grpc.v1p21p0.io.grpc.StatusRuntimeException: CANCELLED: 
call already cancelled at 
org.apache.beam.vendor.grpc.v1p21p0.io.grpc.Status.asRuntimeException(Status.java:524)
 at 
org.apache.beam.vendor.grpc.v1p21p0.io.grpc.stub.ServerCalls$ServerCallStreamObserverImpl.onNext(ServerCalls.java:339)
 at 
org.apache.beam.sdk.fn.stream.DirectStreamObserver.onNext(DirectStreamObserver.java:98)
 at 
org.apache.beam.sdk.fn.data.BeamFnDataSizeBasedBufferingOutboundObserver.close(BeamFnDataSizeBasedBufferingOutboundObserver.java:84)
 at 
org.apache.beam.runners.fnexecution.control.SdkHarnessClient$ActiveBundle.close(SdkHarnessClient.java:298)
 at 
org.apache.beam.runners.flink.translation.functions.FlinkExecutableStageFunction.$closeResource(FlinkExecutableStageFunction.java:202)
 at 
org.apache.beam.runners.flink.translation.functions.FlinkExecutableStageFunction.mapPartition(FlinkExecutableStageFunction.java:202)
 ... 6 more Suppressed: java.lang.IllegalStateException: Processing bundle 
failed, TODO: [BEAM-3962] abort bundle. at 
org.apache.beam.runners.fnexecution.control.SdkHarnessClient$ActiveBundle.close(SdkHarnessClient.java:320)
 ... 8 more
{code}
It seems that the core issue is an IllegalStateException thrown from 
SdkHarnessClient.java:320, related to BEAM-3962.

 It is important to note that the stacktrace above comes from the Flink 
cluster, not from the Gradle job that was executed.

The link to Jenkins job is here: 
[https://builds.apache.org/job/beam_LoadTests_Java_GBK_Flink_Batch_PR/28/console]

  was:
When running a GBK Load test using Java harness image and JobServer image 
generated from master, the load test fails with a cryptic exception:
{code:java}
Exception in thread "main" java.lang.RuntimeException: Invalid job state: 
FAILED.
11:45:31at 
org.apache.beam.sdk.loadtests.JobFailure.handleFailure(JobFailure.java:55)
11:45:31at org.apache.beam.sdk.loadtests.LoadTest.run(LoadTest.java:106)
11:45:31at 
org.apache.beam.sdk.loadtests.CombineLoadTest.run(CombineLoadTest.java:66)
11:45:31at 
org.apache.beam.sdk.loadtests.CombineLoadTest.main(CombineLoadTest.java:169)
{code}
 

After some investigation, I found a stacktrace of the error:
{code:java}
org.apache.beam.vendor.grpc.v1p21p0.io.grpc.StatusRuntimeException: CANCELLED: 
call already 
cancelledorg.apache.beam.vendor.grpc.v1p21p0.io.grpc.StatusRuntimeException: 
CANCELLED: call already cancelled at 
org.apache.beam.vendor.grpc.v1p21p0.io.grpc.Status.asRuntimeException(Status.java:524)
 at 

[jira] [Comment Edited] (BEAM-8980) Running GroupByKeyLoadTest on Portable Flink fails

2019-12-17 Thread Michal Walenia (Jira)


[ 
https://issues.apache.org/jira/browse/BEAM-8980?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16998839#comment-16998839
 ] 

Michal Walenia edited comment on BEAM-8980 at 12/18/19 6:52 AM:


[~angoenka] thanks for reminding me, I added the link. There isn't much more in 
the console, though; the execution took place on the Flink cluster, which was 
wiped during the final phase of the job.

 

The second stacktrace is from a Flink cluster I used to investigate the error.


was (Author: mwalenia):
[~angoenka] thanks for reminding me, I added the link. There isn't much more in 
the console, though; the execution took place on the Flink cluster, which was 
wiped during the final phase of the job.

> Running GroupByKeyLoadTest on Portable Flink fails
> --
>
> Key: BEAM-8980
> URL: https://issues.apache.org/jira/browse/BEAM-8980
> Project: Beam
>  Issue Type: Bug
>  Components: testing
>Reporter: Michal Walenia
>Priority: Major
>
> When running a GBK Load test using Java harness image and JobServer image 
> generated from master, the load test fails with a cryptic exception:
> {code:java}
> Exception in thread "main" java.lang.RuntimeException: Invalid job state: 
> FAILED.
> 11:45:31  at 
> org.apache.beam.sdk.loadtests.JobFailure.handleFailure(JobFailure.java:55)
> 11:45:31  at org.apache.beam.sdk.loadtests.LoadTest.run(LoadTest.java:106)
> 11:45:31  at 
> org.apache.beam.sdk.loadtests.CombineLoadTest.run(CombineLoadTest.java:66)
> 11:45:31  at 
> org.apache.beam.sdk.loadtests.CombineLoadTest.main(CombineLoadTest.java:169)
> {code}
>  
> After some investigation, I found a stacktrace of the error:
> {code:java}
> org.apache.beam.vendor.grpc.v1p21p0.io.grpc.StatusRuntimeException: 
> CANCELLED: call already 
> cancelledorg.apache.beam.vendor.grpc.v1p21p0.io.grpc.StatusRuntimeException: 
> CANCELLED: call already cancelled at 
> org.apache.beam.vendor.grpc.v1p21p0.io.grpc.Status.asRuntimeException(Status.java:524)
>  at 
> org.apache.beam.vendor.grpc.v1p21p0.io.grpc.stub.ServerCalls$ServerCallStreamObserverImpl.onNext(ServerCalls.java:339)
>  at 
> org.apache.beam.sdk.fn.stream.DirectStreamObserver.onNext(DirectStreamObserver.java:98)
>  at 
> org.apache.beam.sdk.fn.data.BeamFnDataSizeBasedBufferingOutboundObserver.flush(BeamFnDataSizeBasedBufferingOutboundObserver.java:90)
>  at 
> org.apache.beam.sdk.fn.data.BeamFnDataSizeBasedBufferingOutboundObserver.accept(BeamFnDataSizeBasedBufferingOutboundObserver.java:102)
>  at 
> org.apache.beam.runners.flink.translation.functions.FlinkExecutableStageFunction.processElements(FlinkExecutableStageFunction.java:278)
>  at 
> org.apache.beam.runners.flink.translation.functions.FlinkExecutableStageFunction.mapPartition(FlinkExecutableStageFunction.java:201)
>  at 
> org.apache.flink.runtime.operators.MapPartitionDriver.run(MapPartitionDriver.java:103)
>  at org.apache.flink.runtime.operators.BatchTask.run(BatchTask.java:504) at 
> org.apache.flink.runtime.operators.BatchTask.invoke(BatchTask.java:369) at 
> org.apache.flink.runtime.taskmanager.Task.doRun(Task.java:705) at 
> org.apache.flink.runtime.taskmanager.Task.run(Task.java:530) at 
> java.lang.Thread.run(Thread.java:748) Suppressed: 
> org.apache.beam.vendor.grpc.v1p21p0.io.grpc.StatusRuntimeException: 
> CANCELLED: call already cancelled at 
> org.apache.beam.vendor.grpc.v1p21p0.io.grpc.Status.asRuntimeException(Status.java:524)
>  at 
> org.apache.beam.vendor.grpc.v1p21p0.io.grpc.stub.ServerCalls$ServerCallStreamObserverImpl.onNext(ServerCalls.java:339)
>  at 
> org.apache.beam.sdk.fn.stream.DirectStreamObserver.onNext(DirectStreamObserver.java:98)
>  at 
> org.apache.beam.sdk.fn.data.BeamFnDataSizeBasedBufferingOutboundObserver.close(BeamFnDataSizeBasedBufferingOutboundObserver.java:84)
>  at 
> org.apache.beam.runners.fnexecution.control.SdkHarnessClient$ActiveBundle.close(SdkHarnessClient.java:298)
>  at 
> org.apache.beam.runners.flink.translation.functions.FlinkExecutableStageFunction.$closeResource(FlinkExecutableStageFunction.java:202)
>  at 
> org.apache.beam.runners.flink.translation.functions.FlinkExecutableStageFunction.mapPartition(FlinkExecutableStageFunction.java:202)
>  ... 6 more Suppressed: java.lang.IllegalStateException: Processing bundle 
> failed, TODO: [BEAM-3962] abort bundle. at 
> org.apache.beam.runners.fnexecution.control.SdkHarnessClient$ActiveBundle.close(SdkHarnessClient.java:320)
>  ... 8 more
> {code}
> It seems that the core issue is an IllegalStateException thrown from 
> SdkHarnessClient.java:320, related to BEAM-3962.
>  It is important to note that the stacktrace above comes from the Flink 
> cluster, not from the Gradle job that was executed.
> The link to Jenkins job is here: 
> 

[jira] [Commented] (BEAM-8980) Running GroupByKeyLoadTest on Portable Flink fails

2019-12-17 Thread Michal Walenia (Jira)


[ 
https://issues.apache.org/jira/browse/BEAM-8980?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16998839#comment-16998839
 ] 

Michal Walenia commented on BEAM-8980:
--

[~angoenka] thanks for reminding me, I added the link. There isn't much more in 
the console, though; the execution took place on the Flink cluster, which was 
wiped during the final phase of the job.

> Running GroupByKeyLoadTest on Portable Flink fails
> --
>
> Key: BEAM-8980
> URL: https://issues.apache.org/jira/browse/BEAM-8980
> Project: Beam
>  Issue Type: Bug
>  Components: testing
>Reporter: Michal Walenia
>Priority: Major
>
> When running a GBK Load test using Java harness image and JobServer image 
> generated from master, the load test fails with a cryptic exception:
> {code:java}
> Exception in thread "main" java.lang.RuntimeException: Invalid job state: 
> FAILED.
> 11:45:31  at 
> org.apache.beam.sdk.loadtests.JobFailure.handleFailure(JobFailure.java:55)
> 11:45:31  at org.apache.beam.sdk.loadtests.LoadTest.run(LoadTest.java:106)
> 11:45:31  at 
> org.apache.beam.sdk.loadtests.CombineLoadTest.run(CombineLoadTest.java:66)
> 11:45:31  at 
> org.apache.beam.sdk.loadtests.CombineLoadTest.main(CombineLoadTest.java:169)
> {code}
>  
> After some investigation, I found a stacktrace of the error:
> {code:java}
> org.apache.beam.vendor.grpc.v1p21p0.io.grpc.StatusRuntimeException: 
> CANCELLED: call already 
> cancelledorg.apache.beam.vendor.grpc.v1p21p0.io.grpc.StatusRuntimeException: 
> CANCELLED: call already cancelled at 
> org.apache.beam.vendor.grpc.v1p21p0.io.grpc.Status.asRuntimeException(Status.java:524)
>  at 
> org.apache.beam.vendor.grpc.v1p21p0.io.grpc.stub.ServerCalls$ServerCallStreamObserverImpl.onNext(ServerCalls.java:339)
>  at 
> org.apache.beam.sdk.fn.stream.DirectStreamObserver.onNext(DirectStreamObserver.java:98)
>  at 
> org.apache.beam.sdk.fn.data.BeamFnDataSizeBasedBufferingOutboundObserver.flush(BeamFnDataSizeBasedBufferingOutboundObserver.java:90)
>  at 
> org.apache.beam.sdk.fn.data.BeamFnDataSizeBasedBufferingOutboundObserver.accept(BeamFnDataSizeBasedBufferingOutboundObserver.java:102)
>  at 
> org.apache.beam.runners.flink.translation.functions.FlinkExecutableStageFunction.processElements(FlinkExecutableStageFunction.java:278)
>  at 
> org.apache.beam.runners.flink.translation.functions.FlinkExecutableStageFunction.mapPartition(FlinkExecutableStageFunction.java:201)
>  at 
> org.apache.flink.runtime.operators.MapPartitionDriver.run(MapPartitionDriver.java:103)
>  at org.apache.flink.runtime.operators.BatchTask.run(BatchTask.java:504) at 
> org.apache.flink.runtime.operators.BatchTask.invoke(BatchTask.java:369) at 
> org.apache.flink.runtime.taskmanager.Task.doRun(Task.java:705) at 
> org.apache.flink.runtime.taskmanager.Task.run(Task.java:530) at 
> java.lang.Thread.run(Thread.java:748) Suppressed: 
> org.apache.beam.vendor.grpc.v1p21p0.io.grpc.StatusRuntimeException: 
> CANCELLED: call already cancelled at 
> org.apache.beam.vendor.grpc.v1p21p0.io.grpc.Status.asRuntimeException(Status.java:524)
>  at 
> org.apache.beam.vendor.grpc.v1p21p0.io.grpc.stub.ServerCalls$ServerCallStreamObserverImpl.onNext(ServerCalls.java:339)
>  at 
> org.apache.beam.sdk.fn.stream.DirectStreamObserver.onNext(DirectStreamObserver.java:98)
>  at 
> org.apache.beam.sdk.fn.data.BeamFnDataSizeBasedBufferingOutboundObserver.close(BeamFnDataSizeBasedBufferingOutboundObserver.java:84)
>  at 
> org.apache.beam.runners.fnexecution.control.SdkHarnessClient$ActiveBundle.close(SdkHarnessClient.java:298)
>  at 
> org.apache.beam.runners.flink.translation.functions.FlinkExecutableStageFunction.$closeResource(FlinkExecutableStageFunction.java:202)
>  at 
> org.apache.beam.runners.flink.translation.functions.FlinkExecutableStageFunction.mapPartition(FlinkExecutableStageFunction.java:202)
>  ... 6 more Suppressed: java.lang.IllegalStateException: Processing bundle 
> failed, TODO: [BEAM-3962] abort bundle. at 
> org.apache.beam.runners.fnexecution.control.SdkHarnessClient$ActiveBundle.close(SdkHarnessClient.java:320)
>  ... 8 more
> {code}
> It seems that the core issue is an IllegalStateException thrown from 
> SdkHarnessClient.java:320, related to BEAM-3962.
>  
> The link to Jenkins job is here: 
> [https://builds.apache.org/job/beam_LoadTests_Java_GBK_Flink_Batch_PR/28/console]



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Updated] (BEAM-8980) Running GroupByKeyLoadTest on Portable Flink fails

2019-12-17 Thread Michal Walenia (Jira)


 [ 
https://issues.apache.org/jira/browse/BEAM-8980?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Michal Walenia updated BEAM-8980:
-
Description: 
When running a GBK Load test using Java harness image and JobServer image 
generated from master, the load test fails with a cryptic exception:
{code:java}
Exception in thread "main" java.lang.RuntimeException: Invalid job state: 
FAILED.
11:45:31at 
org.apache.beam.sdk.loadtests.JobFailure.handleFailure(JobFailure.java:55)
11:45:31at org.apache.beam.sdk.loadtests.LoadTest.run(LoadTest.java:106)
11:45:31at 
org.apache.beam.sdk.loadtests.CombineLoadTest.run(CombineLoadTest.java:66)
11:45:31at 
org.apache.beam.sdk.loadtests.CombineLoadTest.main(CombineLoadTest.java:169)
{code}
 

After some investigation, I found a stacktrace of the error:
{code:java}
org.apache.beam.vendor.grpc.v1p21p0.io.grpc.StatusRuntimeException: CANCELLED: 
call already 
cancelledorg.apache.beam.vendor.grpc.v1p21p0.io.grpc.StatusRuntimeException: 
CANCELLED: call already cancelled at 
org.apache.beam.vendor.grpc.v1p21p0.io.grpc.Status.asRuntimeException(Status.java:524)
 at 
org.apache.beam.vendor.grpc.v1p21p0.io.grpc.stub.ServerCalls$ServerCallStreamObserverImpl.onNext(ServerCalls.java:339)
 at 
org.apache.beam.sdk.fn.stream.DirectStreamObserver.onNext(DirectStreamObserver.java:98)
 at 
org.apache.beam.sdk.fn.data.BeamFnDataSizeBasedBufferingOutboundObserver.flush(BeamFnDataSizeBasedBufferingOutboundObserver.java:90)
 at 
org.apache.beam.sdk.fn.data.BeamFnDataSizeBasedBufferingOutboundObserver.accept(BeamFnDataSizeBasedBufferingOutboundObserver.java:102)
 at 
org.apache.beam.runners.flink.translation.functions.FlinkExecutableStageFunction.processElements(FlinkExecutableStageFunction.java:278)
 at 
org.apache.beam.runners.flink.translation.functions.FlinkExecutableStageFunction.mapPartition(FlinkExecutableStageFunction.java:201)
 at 
org.apache.flink.runtime.operators.MapPartitionDriver.run(MapPartitionDriver.java:103)
 at org.apache.flink.runtime.operators.BatchTask.run(BatchTask.java:504) at 
org.apache.flink.runtime.operators.BatchTask.invoke(BatchTask.java:369) at 
org.apache.flink.runtime.taskmanager.Task.doRun(Task.java:705) at 
org.apache.flink.runtime.taskmanager.Task.run(Task.java:530) at 
java.lang.Thread.run(Thread.java:748) Suppressed: 
org.apache.beam.vendor.grpc.v1p21p0.io.grpc.StatusRuntimeException: CANCELLED: 
call already cancelled at 
org.apache.beam.vendor.grpc.v1p21p0.io.grpc.Status.asRuntimeException(Status.java:524)
 at 
org.apache.beam.vendor.grpc.v1p21p0.io.grpc.stub.ServerCalls$ServerCallStreamObserverImpl.onNext(ServerCalls.java:339)
 at 
org.apache.beam.sdk.fn.stream.DirectStreamObserver.onNext(DirectStreamObserver.java:98)
 at 
org.apache.beam.sdk.fn.data.BeamFnDataSizeBasedBufferingOutboundObserver.close(BeamFnDataSizeBasedBufferingOutboundObserver.java:84)
 at 
org.apache.beam.runners.fnexecution.control.SdkHarnessClient$ActiveBundle.close(SdkHarnessClient.java:298)
 at 
org.apache.beam.runners.flink.translation.functions.FlinkExecutableStageFunction.$closeResource(FlinkExecutableStageFunction.java:202)
 at 
org.apache.beam.runners.flink.translation.functions.FlinkExecutableStageFunction.mapPartition(FlinkExecutableStageFunction.java:202)
 ... 6 more Suppressed: java.lang.IllegalStateException: Processing bundle 
failed, TODO: [BEAM-3962] abort bundle. at 
org.apache.beam.runners.fnexecution.control.SdkHarnessClient$ActiveBundle.close(SdkHarnessClient.java:320)
 ... 8 more
{code}
It seems that the core issue is an IllegalStateException thrown from 
SdkHarnessClient.java:320, related to BEAM-3962.

 

The link to Jenkins job is here: 
[https://builds.apache.org/job/beam_LoadTests_Java_GBK_Flink_Batch_PR/28/console]

  was:
When running a GBK Load test using Java harness image and JobServer image 
generated from master, the load test fails with a cryptic exception:
{code:java}
Exception in thread "main" java.lang.RuntimeException: Invalid job state: 
FAILED.
11:45:31at 
org.apache.beam.sdk.loadtests.JobFailure.handleFailure(JobFailure.java:55)
11:45:31at org.apache.beam.sdk.loadtests.LoadTest.run(LoadTest.java:106)
11:45:31at 
org.apache.beam.sdk.loadtests.CombineLoadTest.run(CombineLoadTest.java:66)
11:45:31at 
org.apache.beam.sdk.loadtests.CombineLoadTest.main(CombineLoadTest.java:169)
{code}
 

After some investigation, I found a stacktrace of the error:
{code:java}
org.apache.beam.vendor.grpc.v1p21p0.io.grpc.StatusRuntimeException: CANCELLED: 
call already 
cancelledorg.apache.beam.vendor.grpc.v1p21p0.io.grpc.StatusRuntimeException: 
CANCELLED: call already cancelled at 
org.apache.beam.vendor.grpc.v1p21p0.io.grpc.Status.asRuntimeException(Status.java:524)
 at 
org.apache.beam.vendor.grpc.v1p21p0.io.grpc.stub.ServerCalls$ServerCallStreamObserverImpl.onNext(ServerCalls.java:339)
 at 

[jira] [Work logged] (BEAM-8575) Add more Python validates runner tests

2019-12-17 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/BEAM-8575?focusedWorklogId=361291&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-361291
 ]

ASF GitHub Bot logged work on BEAM-8575:


Author: ASF GitHub Bot
Created on: 18/Dec/19 04:42
Start Date: 18/Dec/19 04:42
Worklog Time Spent: 10m 
  Work Description: bumblebee-coming commented on issue #10159: [BEAM-8575] 
Added a unit test to CombineTest class to test that Combi…
URL: https://github.com/apache/beam/pull/10159#issuecomment-566864658
 
 
   Run Python PreCommit
 

This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 361291)
Time Spent: 38h 10m  (was: 38h)

> Add more Python validates runner tests
> --
>
> Key: BEAM-8575
> URL: https://issues.apache.org/jira/browse/BEAM-8575
> Project: Beam
>  Issue Type: Test
>  Components: sdk-py-core, testing
>Reporter: wendy liu
>Assignee: wendy liu
>Priority: Major
>  Time Spent: 38h 10m
>  Remaining Estimate: 0h
>
> This is the umbrella issue to track the work of adding more Python tests to 
> improve test coverage.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Work logged] (BEAM-8575) Add more Python validates runner tests

2019-12-17 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/BEAM-8575?focusedWorklogId=361289&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-361289
 ]

ASF GitHub Bot logged work on BEAM-8575:


Author: ASF GitHub Bot
Created on: 18/Dec/19 04:38
Start Date: 18/Dec/19 04:38
Worklog Time Spent: 10m 
  Work Description: bumblebee-coming commented on pull request #10190: 
[BEAM-8575] Added two unit tests to CombineTest class to test that Co…
URL: https://github.com/apache/beam/pull/10190#discussion_r357487258
 
 

 ##
 File path: sdks/python/apache_beam/transforms/combiners_test.py
 ##
 @@ -399,6 +432,108 @@ def test_global_fanout(self):
   | beam.CombineGlobally(combine.MeanCombineFn()).with_fanout(11))
   assert_that(result, equal_to([49.5]))
 
+  # Test that three different kinds of metrics work with a customized
+  # CounterIncrememtingCombineFn.
+  def test_simple_combine(self):
+    p = TestPipeline()
+    input = (p
+             | beam.Create([('c', 'b'),
+                            ('c', 'be'),
+                            ('c', 'bea'),
+                            ('d', 'beam'),
+                            ('d', 'apache')]))
+
+    # The result of concatenating all values regardless of key.
+    global_concat = (input
+                     | beam.Values()
+                     | beam.CombineGlobally(CounterIncrememtingCombineFn()))
+
+    # The (key, concatenated_string) pairs for all keys.
+    concat_per_key = (input | beam.CombinePerKey(
+        CounterIncrememtingCombineFn()))
+
+    result = p.run()
+    result.wait_until_finish()
+
+    # Verify the concatenated strings are correct.
+    expected_concat_per_key = [('c', 'bbebea'), ('d', 'beamapache')]
+    assert_that(global_concat, equal_to(['bbebeabeamapache']),
+                label='global concat')
+    assert_that(concat_per_key, equal_to(expected_concat_per_key),
+                label='concat per key')
+
+    # Verify the values of metrics are correct.
+    word_counter_filter = MetricsFilter().with_name('word_counter')
+    query_result = result.metrics().query(word_counter_filter)
+    if query_result['counters']:
+      word_counter = query_result['counters'][0]
+      self.assertEqual(word_counter.result, 5)
+
+    word_lengths_filter = MetricsFilter().with_name('word_lengths')
+    query_result = result.metrics().query(word_lengths_filter)
+    if query_result['counters']:
+      word_lengths = query_result['counters'][0]
+      self.assertEqual(word_lengths.result, 16)
+
+    word_len_dist_filter = MetricsFilter().with_name('word_len_dist')
+    query_result = result.metrics().query(word_len_dist_filter)
+    if query_result['distributions']:
+      word_len_dist = query_result['distributions'][0]
+      self.assertEqual(word_len_dist.result.mean, 3.2)
 
 Review comment:
   Done. After using different words as input, the mean is now 3.
   
   self.assertEqual(word_len_dist.result.mean, 3)
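   For reference, the arithmetic behind these assertions (a quick check, not 
part of the patch): the inputs in the test above, 'b', 'be', 'bea', 'beam' and 
'apache', have lengths 1, 2, 3, 4 and 6, so the word_len_dist mean was 
16 / 5 = 3.2; the updated inputs evidently make the lengths average to a whole 3.
{code:python}
# Quick check of the original expected values (illustrative only).
lengths = [len(w) for w in ['b', 'be', 'bea', 'beam', 'apache']]
assert sum(lengths) == 16                  # matches the word_lengths counter
assert sum(lengths) / len(lengths) == 3.2  # matches the original mean assertion
{code}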
 

This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 361289)
Time Spent: 38h  (was: 37h 50m)

> Add more Python validates runner tests
> --
>
> Key: BEAM-8575
> URL: https://issues.apache.org/jira/browse/BEAM-8575
> Project: Beam
>  Issue Type: Test
>  Components: sdk-py-core, testing
>Reporter: wendy liu
>Assignee: wendy liu
>Priority: Major
>  Time Spent: 38h
>  Remaining Estimate: 0h
>
> This is the umbrella issue to track the work of adding more Python tests to 
> improve test coverage.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Work logged] (BEAM-8575) Add more Python validates runner tests

2019-12-17 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/BEAM-8575?focusedWorklogId=361288&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-361288
 ]

ASF GitHub Bot logged work on BEAM-8575:


Author: ASF GitHub Bot
Created on: 18/Dec/19 04:36
Start Date: 18/Dec/19 04:36
Worklog Time Spent: 10m 
  Work Description: bumblebee-coming commented on pull request #10190: 
[BEAM-8575] Added two unit tests to CombineTest class to test that Co…
URL: https://github.com/apache/beam/pull/10190#discussion_r357473023
 
 

 ##
 File path: sdks/python/apache_beam/transforms/combiners_test.py
 ##
 @@ -48,6 +50,37 @@
 from apache_beam.utils.timestamp import Timestamp
 
 
+class CounterIncrememtingCombineFn(beam.CombineFn):
+  """CombineFn for incrementing three different counters:
+     counter, distribution, gauge,
+     at the same time concatenating words."""
+
+  def __init__(self):
+    beam.CombineFn.__init__(self)
+    self.word_counter = Metrics.counter(self.__class__, 'word_counter')
+    self.word_lengths_counter = Metrics.counter(
+        self.__class__, 'word_lengths')
+    self.word_lengths_dist = Metrics.distribution(
+        self.__class__, 'word_len_dist')
+    self.last_word_len = Metrics.gauge(self.__class__, 'last_word_len')
+
+  def create_accumulator(self):
+    return ''
+
+  def add_input(self, acc, element):
+    self.word_counter.inc(1)
+    self.word_lengths_counter.inc(len(element))
+    self.word_lengths_dist.update(len(element))
+    self.last_word_len.set(len(element))
+    return acc + element
 
 Review comment:
   Done. Uses ''.join() to convert the list to a string.
   
return ''.join(sorted(acc + element))
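   As an aside, a minimal self-contained sketch of the idea behind this change 
(illustrative class name, not the code under review): sorting the accumulated 
characters makes the concatenation result independent of the order in which 
elements and partial accumulators are combined.
{code:python}
import apache_beam as beam


class SortedConcatCombineFn(beam.CombineFn):
  """Concatenates strings; sorting keeps the output independent of bundling order."""

  def create_accumulator(self):
    return ''

  def add_input(self, acc, element):
    # Re-sort after every addition so element order cannot affect the result.
    return ''.join(sorted(acc + element))

  def merge_accumulators(self, accumulators):
    # Merging partial results from different bundles is order independent too.
    return ''.join(sorted(''.join(accumulators)))

  def extract_output(self, acc):
    return acc
{code}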
 

This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 361288)
Time Spent: 37h 50m  (was: 37h 40m)

> Add more Python validates runner tests
> --
>
> Key: BEAM-8575
> URL: https://issues.apache.org/jira/browse/BEAM-8575
> Project: Beam
>  Issue Type: Test
>  Components: sdk-py-core, testing
>Reporter: wendy liu
>Assignee: wendy liu
>Priority: Major
>  Time Spent: 37h 50m
>  Remaining Estimate: 0h
>
> This is the umbrella issue to track the work of adding more Python tests to 
> improve test coverage.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Work logged] (BEAM-8575) Add more Python validates runner tests

2019-12-17 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/BEAM-8575?focusedWorklogId=361287&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-361287
 ]

ASF GitHub Bot logged work on BEAM-8575:


Author: ASF GitHub Bot
Created on: 18/Dec/19 04:35
Start Date: 18/Dec/19 04:35
Worklog Time Spent: 10m 
  Work Description: bumblebee-coming commented on pull request #10190: 
[BEAM-8575] Added two unit tests to CombineTest class to test that Co…
URL: https://github.com/apache/beam/pull/10190#discussion_r357473023
 
 

 ##
 File path: sdks/python/apache_beam/transforms/combiners_test.py
 ##
 @@ -48,6 +50,37 @@
 from apache_beam.utils.timestamp import Timestamp
 
 
+class CounterIncrememtingCombineFn(beam.CombineFn):
+  """CombineFn for incrementing three different counters:
+     counter, distribution, gauge,
+     at the same time concatenating words."""
+
+  def __init__(self):
+    beam.CombineFn.__init__(self)
+    self.word_counter = Metrics.counter(self.__class__, 'word_counter')
+    self.word_lengths_counter = Metrics.counter(
+        self.__class__, 'word_lengths')
+    self.word_lengths_dist = Metrics.distribution(
+        self.__class__, 'word_len_dist')
+    self.last_word_len = Metrics.gauge(self.__class__, 'last_word_len')
+
+  def create_accumulator(self):
+    return ''
+
+  def add_input(self, acc, element):
+    self.word_counter.inc(1)
+    self.word_lengths_counter.inc(len(element))
+    self.word_lengths_dist.update(len(element))
+    self.last_word_len.set(len(element))
+    return acc + element
 
 Review comment:
   Done.
   
# ''.join() converts the list to a string.
return ''.join(sorted(acc + element))
 

This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 361287)
Time Spent: 37h 40m  (was: 37.5h)

> Add more Python validates runner tests
> --
>
> Key: BEAM-8575
> URL: https://issues.apache.org/jira/browse/BEAM-8575
> Project: Beam
>  Issue Type: Test
>  Components: sdk-py-core, testing
>Reporter: wendy liu
>Assignee: wendy liu
>Priority: Major
>  Time Spent: 37h 40m
>  Remaining Estimate: 0h
>
> This is the umbrella issue to track the work of adding more Python tests to 
> improve test coverage.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Work logged] (BEAM-8269) IOTypehints.from_callable doesn't convert native type hints to Beam

2019-12-17 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/BEAM-8269?focusedWorklogId=361265&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-361265
 ]

ASF GitHub Bot logged work on BEAM-8269:


Author: ASF GitHub Bot
Created on: 18/Dec/19 02:16
Start Date: 18/Dec/19 02:16
Worklog Time Spent: 10m 
  Work Description: udim commented on pull request #9602: [BEAM-8269] 
Convert Py3 type hints to Beam types
URL: https://github.com/apache/beam/pull/9602#discussion_r359123181
 
 

 ##
 File path: sdks/python/apache_beam/typehints/typehints.py
 ##
 @@ -1190,9 +1190,14 @@ def get_yielded_type(type_hint):
   if is_consistent_with(type_hint, Iterator[Any]):
     return type_hint.yielded_type
   if is_consistent_with(type_hint, Tuple[Any, ...]):
-    return Union[type_hint.tuple_types]
+    if isinstance(type_hint, TupleConstraint):
+      return Union[type_hint.tuple_types]
+    else:  # TupleSequenceConstraint
+      return type_hint.inner_type
   if is_consistent_with(type_hint, Iterable[Any]):
     return type_hint.inner_type
+  if is_consistent_with(type_hint, Dict[Any, Any]):
 
 Review comment:
   You're right, this branch has been removed.
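   For readers following the diff, a small usage sketch of the helper being 
modified (my reading of the branches above; not part of the patch):
{code:python}
from apache_beam.typehints import typehints

# get_yielded_type answers: iterating a value of this hinted type yields what?
assert typehints.get_yielded_type(typehints.Iterator[int]) == int
assert typehints.get_yielded_type(typehints.List[str]) == str
# For a fixed-length tuple hint, the TupleConstraint branch yields a Union.
assert typehints.get_yielded_type(typehints.Tuple[int, str]) == typehints.Union[int, str]
{code}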
 

This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 361265)
Time Spent: 2h 20m  (was: 2h 10m)

> IOTypehints.from_callable doesn't convert native type hints to Beam
> ---
>
> Key: BEAM-8269
> URL: https://issues.apache.org/jira/browse/BEAM-8269
> Project: Beam
>  Issue Type: Bug
>  Components: sdk-py-core
>Reporter: Udi Meiri
>Assignee: Udi Meiri
>Priority: Major
>  Time Spent: 2h 20m
>  Remaining Estimate: 0h
>
> Users typically write type hints using typing module types. We should allow 
> that, but internally convert these types to Beam module types for now.
> In the future, Beam should stop using these internal types (BEAM-8156).



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Work logged] (BEAM-8933) BigQuery IO should support read/write in Arrow format

2019-12-17 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/BEAM-8933?focusedWorklogId=361264&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-361264
 ]

ASF GitHub Bot logged work on BEAM-8933:


Author: ASF GitHub Bot
Created on: 18/Dec/19 02:06
Start Date: 18/Dec/19 02:06
Worklog Time Spent: 10m 
  Work Description: TheNeuralBit commented on issue #10384: [WIP] 
[BEAM-8933] Utilities for converting Arrow schemas and reading Arrow batches as 
Rows
URL: https://github.com/apache/beam/pull/10384#issuecomment-566833607
 
 
   Pushed a commit that moved the arrow code into `:sdks:java:extension:arrow`, 
thanks for the suggestion @iemejia. I want to do a little more clean up 
tomorrow before sending this out for review but it should be enough for 
@11moon11 to work from for the gcp io changes.
 

This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 361264)
Time Spent: 3h 40m  (was: 3.5h)

> BigQuery IO should support read/write in Arrow format
> -
>
> Key: BEAM-8933
> URL: https://issues.apache.org/jira/browse/BEAM-8933
> Project: Beam
>  Issue Type: Improvement
>  Components: io-java-gcp
>Reporter: Kirill Kozlov
>Assignee: Kirill Kozlov
>Priority: Major
>  Time Spent: 3h 40m
>  Remaining Estimate: 0h
>
> As of right now BigQuery uses Avro format for reading and writing.
> We should add a config to BigQueryIO to specify which format to use: Arrow or 
> Avro (with Avro as default).



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Work logged] (BEAM-8269) IOTypehints.from_callable doesn't convert native type hints to Beam

2019-12-17 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/BEAM-8269?focusedWorklogId=361263&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-361263
 ]

ASF GitHub Bot logged work on BEAM-8269:


Author: ASF GitHub Bot
Created on: 18/Dec/19 02:02
Start Date: 18/Dec/19 02:02
Worklog Time Spent: 10m 
  Work Description: udim commented on pull request #9602: [BEAM-8269] 
Convert Py3 type hints to Beam types
URL: https://github.com/apache/beam/pull/9602#discussion_r359120339
 
 

 ##
 File path: sdks/python/apache_beam/typehints/decorators_test_py3.py
 ##
 @@ -33,18 +34,20 @@
 
 decorators._enable_from_callable = True
 T = TypeVariable('T')
+# Name is 'T' so it converts to a beam type with the same name.
+T_typing = typing.TypeVar('T')
 
 
 class IOTypeHintsTest(unittest.TestCase):
 
   def test_from_callable(self):
 def fn(a: int, b: str = None, *args: Tuple[T], foo: List[int],
-   **kwargs: Dict[str, str]) -> Tuple:
+   **kwargs: Dict[str, str]) -> Tuple[Any, ...]:
 
 Review comment:
   Yes if these were typing module types, but in this case this is 
`typehints.Tuple[typehints.Any, ...]`, which requires subscripting to turn it 
into a constraint. I will remove such usage in the future and make the 
`typehints.` prefix explicit.
   
   FYI, the conversion from typing.Tuple to a typehints.Tuple constraint is 
taken care of here: 
https://github.com/apache/beam/pull/10042/files#diff-9b2eac354738047a44814d4b429cR245
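   For background on the typing-to-Beam conversion discussed here, a rough 
sketch using the converter in apache_beam.typehints.native_type_compatibility 
(an illustration of the behavior, not code from this PR):
{code:python}
import typing

from apache_beam.typehints import typehints
from apache_beam.typehints.native_type_compatibility import convert_to_beam_type

# typing module hints are converted into Beam's internal typehints constraints.
assert convert_to_beam_type(typing.Tuple[str, int]) == typehints.Tuple[str, int]
assert convert_to_beam_type(typing.List[int]) == typehints.List[int]
{code}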
 

This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 361263)
Time Spent: 2h 10m  (was: 2h)

> IOTypehints.from_callable doesn't convert native type hints to Beam
> ---
>
> Key: BEAM-8269
> URL: https://issues.apache.org/jira/browse/BEAM-8269
> Project: Beam
>  Issue Type: Bug
>  Components: sdk-py-core
>Reporter: Udi Meiri
>Assignee: Udi Meiri
>Priority: Major
>  Time Spent: 2h 10m
>  Remaining Estimate: 0h
>
> Users typically write type hints using typing module types. We should allow 
> that, but internally convert these types to Beam module types for now.
> In the future, Beam should stop using these internal types (BEAM-8156).



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Work logged] (BEAM-8269) IOTypehints.from_callable doesn't convert native type hints to Beam

2019-12-17 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/BEAM-8269?focusedWorklogId=361260&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-361260
 ]

ASF GitHub Bot logged work on BEAM-8269:


Author: ASF GitHub Bot
Created on: 18/Dec/19 01:55
Start Date: 18/Dec/19 01:55
Worklog Time Spent: 10m 
  Work Description: udim commented on issue #9602: [BEAM-8269] Convert Py3 
type hints to Beam types
URL: https://github.com/apache/beam/pull/9602#issuecomment-566831210
 
 
   Run Python PreCommit
 

This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 361260)
Time Spent: 2h  (was: 1h 50m)

> IOTypehints.from_callable doesn't convert native type hints to Beam
> ---
>
> Key: BEAM-8269
> URL: https://issues.apache.org/jira/browse/BEAM-8269
> Project: Beam
>  Issue Type: Bug
>  Components: sdk-py-core
>Reporter: Udi Meiri
>Assignee: Udi Meiri
>Priority: Major
>  Time Spent: 2h
>  Remaining Estimate: 0h
>
> Users typically write type hints using typing module types. We should allow 
> that, but internally convert these types to Beam module types for now.
> In the future, Beam should stop using these internal types (BEAM-8156).



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Work logged] (BEAM-8481) Python 3.7 Postcommit test -- frequent timeouts

2019-12-17 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/BEAM-8481?focusedWorklogId=361259&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-361259
 ]

ASF GitHub Bot logged work on BEAM-8481:


Author: ASF GitHub Bot
Created on: 18/Dec/19 01:53
Start Date: 18/Dec/19 01:53
Worklog Time Spent: 10m 
  Work Description: tvalentyn commented on issue #10378: [BEAM-8481] Fix a 
race condition in proto stubs generation.
URL: https://github.com/apache/beam/pull/10378#issuecomment-566830870
 
 
   New errors: https://issues.apache.org/jira/browse/BEAM-8988. 
   Overall, postcommits are running and no longer timing out. I suggest we 
merge this PR since it will help reduce flakiness, which in turn will help 
verify fixes for BEAM-8988. @udim could you please review and/or merge if that 
sounds good to you?
 

This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 361259)
Time Spent: 4h 10m  (was: 4h)

> Python 3.7 Postcommit test -- frequent timeouts
> ---
>
> Key: BEAM-8481
> URL: https://issues.apache.org/jira/browse/BEAM-8481
> Project: Beam
>  Issue Type: Bug
>  Components: test-failures
>Reporter: Ahmet Altay
>Assignee: Valentyn Tymofieiev
>Priority: Critical
>  Time Spent: 4h 10m
>  Remaining Estimate: 0h
>
> [https://builds.apache.org/job/beam_PostCommit_Python37/] – this suite 
> seemingly frequently timing out. Other suites are not affected by these 
> timeouts. From the history, the issues started before Oct 10 and we cannot 
> pinpoint because history is lost.  



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Work logged] (BEAM-1440) Create a BigQuery source (that implements iobase.BoundedSource) for Python SDK

2019-12-17 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/BEAM-1440?focusedWorklogId=361256&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-361256
 ]

ASF GitHub Bot logged work on BEAM-1440:


Author: ASF GitHub Bot
Created on: 18/Dec/19 01:51
Start Date: 18/Dec/19 01:51
Worklog Time Spent: 10m 
  Work Description: tvalentyn commented on issue #9772: [BEAM-1440] Create 
a BigQuery source that implements iobase.BoundedSource for Python
URL: https://github.com/apache/beam/pull/9772#issuecomment-566830300
 
 
   Postcommits tests are failing with this change: 
https://issues.apache.org/jira/browse/BEAM-8988.
 

This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 361256)
Time Spent: 19.5h  (was: 19h 20m)

> Create a BigQuery source (that implements iobase.BoundedSource) for Python SDK
> --
>
> Key: BEAM-1440
> URL: https://issues.apache.org/jira/browse/BEAM-1440
> Project: Beam
>  Issue Type: New Feature
>  Components: sdk-py-core
>Reporter: Chamikara Madhusanka Jayalath
>Assignee: Kamil Wasilewski
>Priority: Major
>  Time Spent: 19.5h
>  Remaining Estimate: 0h
>
> Currently we have a BigQuery native source for Python SDK [1].
> This can only be used by Dataflow runner.
> We should  implement a Beam BigQuery source that implements 
> iobase.BoundedSource [2] interface so that other runners that try to use 
> Python SDK can read from BigQuery as well. Java SDK already has a Beam 
> BigQuery source [3].
> [1] 
> https://github.com/apache/beam/blob/master/sdks/python/apache_beam/io/gcp/bigquery.py
> [2] 
> https://github.com/apache/beam/blob/master/sdks/python/apache_beam/io/iobase.py#L70
> [3] 
> https://github.com/apache/beam/blob/master/sdks/java/io/google-cloud-platform/src/main/java/org/apache/beam/sdk/io/gcp/bigquery/BigQueryIO.java#L1189
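For context on what "implements iobase.BoundedSource [2]" involves, a minimal 
illustrative source (a generic counting source, not the BigQuery implementation 
this issue asks for) showing the split/get_range_tracker/read contract that the 
BEAM-8988 failure referenced above exercises:
{code:python}
import apache_beam as beam
from apache_beam.io import iobase
from apache_beam.io.range_trackers import OffsetRangeTracker


class CountingSource(iobase.BoundedSource):
  """Emits the integers [0, count); illustrative only."""

  def __init__(self, count):
    self._count = count

  def estimate_size(self):
    return self._count

  def split(self, desired_bundle_size, start_position=None, stop_position=None):
    start = 0 if start_position is None else start_position
    stop = self._count if stop_position is None else stop_position
    while start < stop:
      end = min(start + desired_bundle_size, stop)
      yield iobase.SourceBundle(end - start, self, start, end)
      start = end

  def get_range_tracker(self, start_position, stop_position):
    # Unlike the BigQuery source in BEAM-8988, this source can build a tracker
    # for the full range when no positions are supplied.
    start = 0 if start_position is None else start_position
    stop = self._count if stop_position is None else stop_position
    return OffsetRangeTracker(start, stop)

  def read(self, range_tracker):
    for i in range(range_tracker.start_position(), range_tracker.stop_position()):
      if not range_tracker.try_claim(i):
        return
      yield i


# Usage sketch: counts = pipeline | beam.io.Read(CountingSource(100))
{code}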



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Work logged] (BEAM-8335) Add streaming support to Interactive Beam

2019-12-17 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/BEAM-8335?focusedWorklogId=361258&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-361258
 ]

ASF GitHub Bot logged work on BEAM-8335:


Author: ASF GitHub Bot
Created on: 18/Dec/19 01:51
Start Date: 18/Dec/19 01:51
Worklog Time Spent: 10m 
  Work Description: davidyan74 commented on pull request #10405: 
[BEAM-8335] Background caching job
URL: https://github.com/apache/beam/pull/10405#discussion_r359116365
 
 

 ##
 File path: 
sdks/python/apache_beam/runners/interactive/background_caching_job.py
 ##
 @@ -0,0 +1,101 @@
+#
+# Licensed to the Apache Software Foundation (ASF) under one or more
+# contributor license agreements.  See the NOTICE file distributed with
+# this work for additional information regarding copyright ownership.
+# The ASF licenses this file to You under the Apache License, Version 2.0
+# (the "License"); you may not use this file except in compliance with
+# the License.  You may obtain a copy of the License at
+#
+#http://www.apache.org/licenses/LICENSE-2.0
+#
+# Unless required by applicable law or agreed to in writing, software
+# distributed under the License is distributed on an "AS IS" BASIS,
+# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+# See the License for the specific language governing permissions and
+# limitations under the License.
+#
+
+"""Module to build and run background caching job.
+
+For internal use only; no backwards-compatibility guarantees.
+
+A background caching job is a job that caches events for all unbounded sources
+of a given pipeline. With Interactive Beam, one such job is started when a
+pipeline run happens (which produces a main job in contrast to the background
+caching job) and meets the following conditions:
+
+  #. The pipeline contains unbounded sources.
+  #. No such background job is running.
+  #. No such background job has completed successfully and the cached events are
+     still valid (invalidated when unbounded sources change in the pipeline).
+
+Once started, the background caching job runs asynchronously until it hits some
+cache size limit. Meanwhile, the main job and future main jobs from the pipeline
+will run using the deterministic replay-able cached events until they are
+invalidated.
+"""
+
+from __future__ import absolute_import
+
+import apache_beam as beam
+from apache_beam import runners
+from apache_beam.runners.interactive import interactive_environment as ie
+
+
+def attempt_to_run_background_caching_job(runner, user_pipeline, options=None):
+  """Attempts to run a background caching job for a user-defined pipeline.
+
+  If a background caching job is started, return the pipeline result. Otherwise,
+  return None.
+  The pipeline result is automatically tracked by Interactive Beam in case
+  future cancellation/cleanup is needed.
+  """
+  if is_background_caching_job_needed(user_pipeline):
+    # Cancel non-terminal jobs if there is any before starting a new one.
+    attempt_to_cancel_background_caching_job(user_pipeline)
+    # TODO(BEAM-8335): refactor background caching job logic from
+    # pipeline_instrument module to this module and aggregate tests.
+    from apache_beam.runners.interactive import pipeline_instrument as instr
+    runner_pipeline = beam.pipeline.Pipeline.from_runner_api(
+        user_pipeline.to_runner_api(use_fake_coders=True),
+        runner,
+        options)
+    background_caching_job_result = beam.pipeline.Pipeline.from_runner_api(
+        instr.pin(runner_pipeline).background_caching_pipeline_proto(),
+        runner,
+        options).run()
+    ie.current_env().set_pipeline_result(user_pipeline,
+                                         background_caching_job_result,
+                                         is_main_job=False)
+
+def is_background_caching_job_needed(user_pipeline):
+  """Determines if a background caching job needs to be started."""
+  background_caching_job_result = ie.current_env().pipeline_result(
+      user_pipeline, is_main_job=False)
+  from apache_beam.runners.interactive import pipeline_instrument as instr
+  # Checks if the pipeline contains unbounded sources.
+  return (instr.has_unbounded_sources(user_pipeline) and
 
 Review comment:
   This may not be in the scope of this PR, but we also need to let the user 
specify their own sources to be cached, because the user may have a large 
bounded source that they want to capture a small segment of.
 

This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 361258)
Time 

[jira] [Work logged] (BEAM-8335) Add streaming support to Interactive Beam

2019-12-17 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/BEAM-8335?focusedWorklogId=361257&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-361257
 ]

ASF GitHub Bot logged work on BEAM-8335:


Author: ASF GitHub Bot
Created on: 18/Dec/19 01:51
Start Date: 18/Dec/19 01:51
Worklog Time Spent: 10m 
  Work Description: davidyan74 commented on pull request #10405: 
[BEAM-8335] Background caching job
URL: https://github.com/apache/beam/pull/10405#discussion_r359117001
 
 

 ##
 File path: 
sdks/python/apache_beam/runners/interactive/background_caching_job.py
 ##
 @@ -0,0 +1,101 @@
+#
+# Licensed to the Apache Software Foundation (ASF) under one or more
+# contributor license agreements.  See the NOTICE file distributed with
+# this work for additional information regarding copyright ownership.
+# The ASF licenses this file to You under the Apache License, Version 2.0
+# (the "License"); you may not use this file except in compliance with
+# the License.  You may obtain a copy of the License at
+#
+#http://www.apache.org/licenses/LICENSE-2.0
+#
+# Unless required by applicable law or agreed to in writing, software
+# distributed under the License is distributed on an "AS IS" BASIS,
+# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+# See the License for the specific language governing permissions and
+# limitations under the License.
+#
+
+"""Module to build and run background caching job.
+
+For internal use only; no backwards-compatibility guarantees.
+
+A background caching job is a job that caches events for all unbounded sources
+of a given pipeline. With Interactive Beam, one such job is started when a
+pipeline run happens (which produces a main job in contrast to the background
+caching job) and meets the following conditions:
+
+  #. The pipeline contains unbounded sources.
+  #. No such background job is running.
+  #. No such background job has completed successfully and the cached events are
+     still valid (invalidated when unbounded sources change in the pipeline).
+
+Once started, the background caching job runs asynchronously until it hits some
+cache size limit. Meanwhile, the main job and future main jobs from the pipeline
+will run using the deterministic replay-able cached events until they are
+invalidated.
+"""
+
+from __future__ import absolute_import
+
+import apache_beam as beam
+from apache_beam import runners
+from apache_beam.runners.interactive import interactive_environment as ie
+
+
+def attempt_to_run_background_caching_job(runner, user_pipeline, options=None):
+  """Attempts to run a background caching job for a user-defined pipeline.
+
+  If a background caching job is started, return the pipeline result. Otherwise,
+  return None.
 
 Review comment:
   From reading the code, this function doesn't return anything.
 

This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 361257)
Time Spent: 49h 10m  (was: 49h)

> Add streaming support to Interactive Beam
> -
>
> Key: BEAM-8335
> URL: https://issues.apache.org/jira/browse/BEAM-8335
> Project: Beam
>  Issue Type: Improvement
>  Components: runner-py-interactive
>Reporter: Sam Rohde
>Assignee: Sam Rohde
>Priority: Major
>  Time Spent: 49h 10m
>  Remaining Estimate: 0h
>
> This issue tracks the work items to introduce streaming support to the 
> Interactive Beam experience. This will allow users to:
>  * Write and run a streaming job in IPython
>  * Automatically cache records from unbounded sources
>  * Add a replay experience that replays all cached records to simulate the 
> original pipeline execution
>  * Add controls to play/pause/stop/step individual elements from the cached 
> records
>  * Add ability to inspect/visualize unbounded PCollections



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Work logged] (BEAM-8902) parameterize input type of Java external transform

2019-12-17 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/BEAM-8902?focusedWorklogId=361255&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-361255
 ]

ASF GitHub Bot logged work on BEAM-8902:


Author: ASF GitHub Bot
Created on: 18/Dec/19 01:51
Start Date: 18/Dec/19 01:51
Worklog Time Spent: 10m 
  Work Description: ihji commented on pull request #10305: [BEAM-8902] 
parameterize input type of Java external transform
URL: https://github.com/apache/beam/pull/10305#discussion_r359117711
 
 

 ##
 File path: 
runners/core-construction-java/src/test/java/org/apache/beam/runners/core/construction/ExternalTest.java
 ##
 @@ -139,7 +139,7 @@ public void expandPythonTest() {
   testPipeline
   .apply(Create.of("1", "2", "2", "3", "3", "3"))
   .apply(
-  External.>of(
+  External., KV>of(
 
 Review comment:
   Yeah, we can create a variant of `of()` for the common case. Do you have any 
good naming suggestions? :)
 

This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 361255)
Time Spent: 1h 40m  (was: 1.5h)

> parameterize input type of Java external transform
> --
>
> Key: BEAM-8902
> URL: https://issues.apache.org/jira/browse/BEAM-8902
> Project: Beam
>  Issue Type: Improvement
>  Components: java-fn-execution
>Reporter: Heejong Lee
>Assignee: Heejong Lee
>Priority: Major
>  Time Spent: 1h 40m
>  Remaining Estimate: 0h
>
> parameterize input type of Java external transform to allow multiple 
> pcollections like PCollectionTuple or PCollectionList.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Commented] (BEAM-8988) apache_beam.io.gcp.bigquery_read_it_test failing with: NotImplementedError: BigQuery source must be split before being read

2019-12-17 Thread Valentyn Tymofieiev (Jira)


[ 
https://issues.apache.org/jira/browse/BEAM-8988?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16998734#comment-16998734
 ] 

Valentyn Tymofieiev commented on BEAM-8988:
---

cc: [~pabloem] [~chamikara]

> apache_beam.io.gcp.bigquery_read_it_test failing with: NotImplementedError: 
> BigQuery source must be split before being read
> ---
>
> Key: BEAM-8988
> URL: https://issues.apache.org/jira/browse/BEAM-8988
> Project: Beam
>  Issue Type: Bug
>  Components: io-py-gcp
>Reporter: Valentyn Tymofieiev
>Assignee: Kamil Wasilewski
>Priority: Critical
>
> Sample failure: https://builds.apache.org/job/beam_PostCommit_Python37_PR/58/
> Triggered by https://github.com/apache/beam/pull/9772.
> Stacktrace:
> {noformat}
> Pipeline 
> BeamApp-jenkins-1217231928-2108ede4_7476773b-6b06-4536-a0d5-c5fafb6c0935 
> failed in state FAILED: java.lang.RuntimeException: Error received from SDK 
> harness for instruction 96: Traceback (most recent call last):
>   File 
> "/home/jenkins/jenkins-slave/workspace/beam_PostCommit_Python37_PR/src/sdks/python/apache_beam/runners/common.py",
>  line 879, in process
> return self.do_fn_invoker.invoke_process(windowed_value)
>   File 
> "/home/jenkins/jenkins-slave/workspace/beam_PostCommit_Python37_PR/src/sdks/python/apache_beam/runners/common.py",
>  line 669, in invoke_process
> windowed_value, additional_args, additional_kwargs, output_processor)
>   File 
> "/home/jenkins/jenkins-slave/workspace/beam_PostCommit_Python37_PR/src/sdks/python/apache_beam/runners/common.py",
>  line 747, in _invoke_process_per_window
> windowed_value, self.process_method(*args_for_process))
>   File 
> "/home/jenkins/jenkins-slave/workspace/beam_PostCommit_Python37_PR/src/sdks/python/apache_beam/runners/common.py",
>  line 998, in process_outputs
> for result in results:
>   File 
> "/home/jenkins/jenkins-slave/workspace/beam_PostCommit_Python37_PR/src/sdks/python/apache_beam/runners/worker/bundle_processor.py",
>  line 1256, in process
> yield element, self.restriction_provider.initial_restriction(element)
>   File 
> "/home/jenkins/jenkins-slave/workspace/beam_PostCommit_Python37_PR/src/sdks/python/apache_beam/io/iobase.py",
>  line 1518, in initial_restriction
> range_tracker = self._source.get_range_tracker(None, None)
>   File 
> "/home/jenkins/jenkins-slave/workspace/beam_PostCommit_Python37_PR/src/sdks/python/apache_beam/io/gcp/bigquery.py",
>  line 652, in get_range_tracker
> raise NotImplementedError('BigQuery source must be split before being 
> read')
> NotImplementedError: BigQuery source must be split before being read
> {noformat}



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Created] (BEAM-8988) apache_beam.io.gcp.bigquery_read_it_test failing with: NotImplementedError: BigQuery source must be split before being read

2019-12-17 Thread Valentyn Tymofieiev (Jira)
Valentyn Tymofieiev created BEAM-8988:
-

 Summary: apache_beam.io.gcp.bigquery_read_it_test failing with: 
NotImplementedError: BigQuery source must be split before being read
 Key: BEAM-8988
 URL: https://issues.apache.org/jira/browse/BEAM-8988
 Project: Beam
  Issue Type: Bug
  Components: io-py-gcp
Reporter: Valentyn Tymofieiev
Assignee: Kamil Wasilewski


Sample failure: https://builds.apache.org/job/beam_PostCommit_Python37_PR/58/

Triggered by https://github.com/apache/beam/pull/9772.

Stacktrace:

{noformat}
Pipeline 
BeamApp-jenkins-1217231928-2108ede4_7476773b-6b06-4536-a0d5-c5fafb6c0935 failed 
in state FAILED: java.lang.RuntimeException: Error received from SDK harness 
for instruction 96: Traceback (most recent call last):
  File 
"/home/jenkins/jenkins-slave/workspace/beam_PostCommit_Python37_PR/src/sdks/python/apache_beam/runners/common.py",
 line 879, in process
return self.do_fn_invoker.invoke_process(windowed_value)
  File 
"/home/jenkins/jenkins-slave/workspace/beam_PostCommit_Python37_PR/src/sdks/python/apache_beam/runners/common.py",
 line 669, in invoke_process
windowed_value, additional_args, additional_kwargs, output_processor)
  File 
"/home/jenkins/jenkins-slave/workspace/beam_PostCommit_Python37_PR/src/sdks/python/apache_beam/runners/common.py",
 line 747, in _invoke_process_per_window
windowed_value, self.process_method(*args_for_process))
  File 
"/home/jenkins/jenkins-slave/workspace/beam_PostCommit_Python37_PR/src/sdks/python/apache_beam/runners/common.py",
 line 998, in process_outputs
for result in results:
  File 
"/home/jenkins/jenkins-slave/workspace/beam_PostCommit_Python37_PR/src/sdks/python/apache_beam/runners/worker/bundle_processor.py",
 line 1256, in process
yield element, self.restriction_provider.initial_restriction(element)
  File 
"/home/jenkins/jenkins-slave/workspace/beam_PostCommit_Python37_PR/src/sdks/python/apache_beam/io/iobase.py",
 line 1518, in initial_restriction
range_tracker = self._source.get_range_tracker(None, None)
  File 
"/home/jenkins/jenkins-slave/workspace/beam_PostCommit_Python37_PR/src/sdks/python/apache_beam/io/gcp/bigquery.py",
 line 652, in get_range_tracker
raise NotImplementedError('BigQuery source must be split before being read')
NotImplementedError: BigQuery source must be split before being read
{noformat}
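For context on the failure mode: the traceback shows iobase.py calling get_range_tracker(None, None) on the top-level BigQuery source inside initial_restriction, i.e. before the source has been split. The sketch below is a minimal, hypothetical illustration of that contract (it is not the bigquery.py implementation): a composite BoundedSource whose split() produces readable sub-sources while the unsplit parent deliberately raises.

{code:python}
from apache_beam.io import iobase
from apache_beam.io.range_trackers import OffsetRangeTracker


class _RangeSource(iobase.BoundedSource):
  """Sub-source produced by split(); this one is actually readable."""

  def __init__(self, start, stop):
    self._start, self._stop = start, stop

  def get_range_tracker(self, start_position, stop_position):
    return OffsetRangeTracker(self._start, self._stop)

  def read(self, range_tracker):
    for i in range(self._start, self._stop):
      if not range_tracker.try_claim(i):
        return
      yield i


class SplitOnlySource(iobase.BoundedSource):
  """Hypothetical parent source: must be split before it can be read."""

  def __init__(self, num_records=100):
    self._num_records = num_records

  def estimate_size(self):
    return self._num_records

  def split(self, desired_bundle_size, start_position=None, stop_position=None):
    # Each bundle wraps a sub-source that does support get_range_tracker.
    for start in range(0, self._num_records, desired_bundle_size):
      stop = min(start + desired_bundle_size, self._num_records)
      yield iobase.SourceBundle(
          weight=stop - start,
          source=_RangeSource(start, stop),
          start_position=None,
          stop_position=None)

  def get_range_tracker(self, start_position, stop_position):
    # Reading the unsplit parent is not supported, mirroring the guard that
    # raises in the traceback above.
    raise NotImplementedError('source must be split before being read')
{code}

Under this contract, any code path that asks the unsplit parent for a range tracker (as initial_restriction does in the traceback) hits the NotImplementedError.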




--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Work logged] (BEAM-7991) gradle cleanPython race

2019-12-17 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/BEAM-7991?focusedWorklogId=361253&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-361253

 ]

ASF GitHub Bot logged work on BEAM-7991:


Author: ASF GitHub Bot
Created on: 18/Dec/19 01:42
Start Date: 18/Dec/19 01:42
Worklog Time Spent: 10m 
  Work Description: udim commented on issue #10388: [BEAM-7991] Fix 
cleanPython race with :clean
URL: https://github.com/apache/beam/pull/10388#issuecomment-566828423
 
 
   Run Java PreCommit
 

This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 361253)
Time Spent: 0.5h  (was: 20m)

> gradle cleanPython race
> ---
>
> Key: BEAM-7991
> URL: https://issues.apache.org/jira/browse/BEAM-7991
> Project: Beam
>  Issue Type: Bug
>  Components: build-system
>Reporter: Udi Meiri
>Assignee: Udi Meiri
>Priority: Minor
>  Time Spent: 0.5h
>  Remaining Estimate: 0h
>
> Under sdks/python run:
> {code}
> $ ../../gradlew setupVirtualenv
> $ ../../gradlew clean
> {code}
> And you should get with high probability errors about missing modules.
> Running this gives no errors:
> {code}
> $ ../../gradlew setupVirtualenv
> $ ../../gradlew clean --no-parallel
> {code}
> But notice that setup.py is not called in the second example, meaning that 
> some other task has already wiped out the build/ directory and the 
> virtualenvs in it.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Work logged] (BEAM-8335) Add streaming support to Interactive Beam

2019-12-17 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/BEAM-8335?focusedWorklogId=361251&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-361251
 ]

ASF GitHub Bot logged work on BEAM-8335:


Author: ASF GitHub Bot
Created on: 18/Dec/19 01:33
Start Date: 18/Dec/19 01:33
Worklog Time Spent: 10m 
  Work Description: KevinGG commented on pull request #10405: [BEAM-8335] 
Background caching job
URL: https://github.com/apache/beam/pull/10405#discussion_r359114098
 
 

 ##
 File path: sdks/python/apache_beam/runners/interactive/interactive_runner.py
 ##
 @@ -125,6 +125,30 @@ def apply(self, transform, pvalueish, options):
 
   def run_pipeline(self, pipeline, options):
 pipeline_instrument = inst.pin(pipeline, options)
+# The user_pipeline analyzed might be None if the pipeline given has 
nothing
+# to be cached and tracing back to the user defined pipeline is impossible.
+# When it's None, there is no need to cache including the background
+# caching job and no result to track since no background caching job is
+# started at all.
+user_pipeline = pipeline_instrument.user_pipeline
+
+background_caching_job_result = ie.current_env().pipeline_result(
+user_pipeline, is_main_job=False)
+if (not background_caching_job_result or
 
 Review comment:
   Refactored this logic into a separate module that can be invoked here and in 
outer scopes such as `show`.
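Purely to illustrate the refactor being described (the module and function names here are hypothetical, not the ones in the PR), the check could live in a small helper that both run_pipeline and show import:

{code:python}
# Hypothetical helper module; names are illustrative only.
from apache_beam.runners.interactive import interactive_environment as ie


def needs_background_caching_job(user_pipeline):
  """Returns True if a background caching job should be started."""
  if user_pipeline is None:
    # Nothing traces back to a user-defined pipeline, so there is nothing to cache.
    return False
  result = ie.current_env().pipeline_result(user_pipeline, is_main_job=False)
  # Start a job only when no background caching job is tracked yet.
  return result is None
{code}

This keeps the None-pipeline short-circuit in one place instead of repeating it at every call site.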
 

This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 361251)
Time Spent: 49h  (was: 48h 50m)

> Add streaming support to Interactive Beam
> -
>
> Key: BEAM-8335
> URL: https://issues.apache.org/jira/browse/BEAM-8335
> Project: Beam
>  Issue Type: Improvement
>  Components: runner-py-interactive
>Reporter: Sam Rohde
>Assignee: Sam Rohde
>Priority: Major
>  Time Spent: 49h
>  Remaining Estimate: 0h
>
> This issue tracks the work items to introduce streaming support to the 
> Interactive Beam experience. This will allow users to:
>  * Write and run a streaming job in IPython
>  * Automatically cache records from unbounded sources
>  * Add a replay experience that replays all cached records to simulate the 
> original pipeline execution
>  * Add controls to play/pause/stop/step individual elements from the cached 
> records
>  * Add ability to inspect/visualize unbounded PCollections



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Work logged] (BEAM-7961) Add tests for all runner native transforms and some widely used composite transforms to cross-language validates runner test suite

2019-12-17 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/BEAM-7961?focusedWorklogId=361244&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-361244
 ]

ASF GitHub Bot logged work on BEAM-7961:


Author: ASF GitHub Bot
Created on: 18/Dec/19 01:20
Start Date: 18/Dec/19 01:20
Worklog Time Spent: 10m 
  Work Description: ihji commented on issue #10051: [BEAM-7961] Add tests 
for all runner native transforms for XLang
URL: https://github.com/apache/beam/pull/10051#issuecomment-566823427
 
 
   Run XVR_Flink PostCommit
 

This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 361244)
Time Spent: 10h 20m  (was: 10h 10m)

> Add tests for all runner native transforms and some widely used composite 
> transforms to cross-language validates runner test suite
> --
>
> Key: BEAM-7961
> URL: https://issues.apache.org/jira/browse/BEAM-7961
> Project: Beam
>  Issue Type: Improvement
>  Components: testing
>Reporter: Heejong Lee
>Assignee: Heejong Lee
>Priority: Major
>  Time Spent: 10h 20m
>  Remaining Estimate: 0h
>
> Add tests for all runner native transforms and some widely used composite 
> transforms to cross-language validates runner test suite



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Work logged] (BEAM-7961) Add tests for all runner native transforms and some widely used composite transforms to cross-language validates runner test suite

2019-12-17 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/BEAM-7961?focusedWorklogId=361245&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-361245
 ]

ASF GitHub Bot logged work on BEAM-7961:


Author: ASF GitHub Bot
Created on: 18/Dec/19 01:20
Start Date: 18/Dec/19 01:20
Worklog Time Spent: 10m 
  Work Description: ihji commented on issue #10051: [BEAM-7961] Add tests 
for all runner native transforms for XLang
URL: https://github.com/apache/beam/pull/10051#issuecomment-566699073
 
 
   Run XVR_Flink PostCommit
 

This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 361245)
Time Spent: 10.5h  (was: 10h 20m)

> Add tests for all runner native transforms and some widely used composite 
> transforms to cross-language validates runner test suite
> --
>
> Key: BEAM-7961
> URL: https://issues.apache.org/jira/browse/BEAM-7961
> Project: Beam
>  Issue Type: Improvement
>  Components: testing
>Reporter: Heejong Lee
>Assignee: Heejong Lee
>Priority: Major
>  Time Spent: 10.5h
>  Remaining Estimate: 0h
>
> Add tests for all runner native transforms and some widely used composite 
> transforms to cross-language validates runner test suite



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Work logged] (BEAM-8269) IOTypehints.from_callable doesn't convert native type hints to Beam

2019-12-17 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/BEAM-8269?focusedWorklogId=361238&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-361238
 ]

ASF GitHub Bot logged work on BEAM-8269:


Author: ASF GitHub Bot
Created on: 18/Dec/19 01:11
Start Date: 18/Dec/19 01:11
Worklog Time Spent: 10m 
  Work Description: robertwb commented on issue #9602: [BEAM-8269] Convert 
Py3 type hints to Beam types
URL: https://github.com/apache/beam/pull/9602#issuecomment-566821483
 
 
   BTW, I'm not able to reproduce the error, in 2.7 or 3.6, with or without 
cython. 
 

This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 361238)
Time Spent: 1h 50m  (was: 1h 40m)

> IOTypehints.from_callable doesn't convert native type hints to Beam
> ---
>
> Key: BEAM-8269
> URL: https://issues.apache.org/jira/browse/BEAM-8269
> Project: Beam
>  Issue Type: Bug
>  Components: sdk-py-core
>Reporter: Udi Meiri
>Assignee: Udi Meiri
>Priority: Major
>  Time Spent: 1h 50m
>  Remaining Estimate: 0h
>
> Users typically write type hints using typing module types. We should allow 
> that, but internally convert these types to Beam module types for now.
> In the future, Beam should stop using these internal types (BEAM-8156).
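As a rough illustration of the intended behavior (assuming the standard Beam typehints decorators; the exact conversion entry point may differ from the PR), native typing hints on a DoFn should end up equivalent to explicit Beam typehints:

{code:python}
from typing import Iterable, Tuple

import apache_beam as beam
from apache_beam import typehints


# Native typing-module hints, as users typically write them.
class SplitWords(beam.DoFn):
  def process(self, line: str) -> Iterable[Tuple[str, int]]:
    return [(word, 1) for word in line.split()]


# What Beam should treat them as internally, expressed with Beam typehints.
@typehints.with_input_types(str)
@typehints.with_output_types(typehints.Tuple[str, int])
class SplitWordsExplicit(beam.DoFn):
  def process(self, line):
    return [(word, 1) for word in line.split()]
{code}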



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Work logged] (BEAM-8944) Python SDK harness performance degradation with UnboundedThreadPoolExecutor

2019-12-17 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/BEAM-8944?focusedWorklogId=361236&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-361236
 ]

ASF GitHub Bot logged work on BEAM-8944:


Author: ASF GitHub Bot
Created on: 18/Dec/19 01:00
Start Date: 18/Dec/19 01:00
Worklog Time Spent: 10m 
  Work Description: y1chi commented on issue #10387: [BEAM-8944] Change to 
use single thread in py sdk bundle progress report
URL: https://github.com/apache/beam/pull/10387#issuecomment-566819037
 
 
   > > > Would it be better to have the runner request progress less frequently?
   > > 
   > > 
   > > I think that helps too. I believe right now JRH requests every 0.1 sec. 
Not exactly sure how the frequency is picked.
   > 
   > 0.1 secs is a lot and doesn't seem right.
   
   I think this is where it is set; we can try to tune that down.
   
https://github.com/apache/beam/blob/master/runners/google-cloud-dataflow-java/worker/src/main/java/org/apache/beam/runners/dataflow/worker/fn/control/BeamFnMapTaskExecutor.java#L296
   
   A single thread should be good enough for progress reporting in the Python SDK 
harness and shouldn't cause stuckness issues, and it could also help limit its 
impact on the bundle-processing critical path.
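To make the proposal concrete, here is a minimal, hypothetical sketch (not the actual sdk_worker code) of driving progress reports from one dedicated thread rather than submitting each report to the shared unbounded pool:

{code:python}
import threading


class SingleThreadProgressReporter:
  """Hypothetical sketch: one long-lived thread periodically sends progress."""

  def __init__(self, report_fn, interval_secs=1.0):
    self._report_fn = report_fn          # callable that sends one progress update
    self._interval_secs = interval_secs  # reporting period; tune per runner
    self._stop = threading.Event()
    self._thread = threading.Thread(target=self._loop, daemon=True)

  def start(self):
    self._thread.start()

  def _loop(self):
    # wait() returns False on timeout, so the loop ticks once per interval
    # until stop() is called.
    while not self._stop.wait(self._interval_secs):
      self._report_fn()

  def stop(self):
    self._stop.set()
    self._thread.join()
{code}

With a single reporter thread, progress requests no longer fan out across many Python threads that compete with bundle processing for the GIL.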
 

This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 361236)
Time Spent: 2h 40m  (was: 2.5h)

> Python SDK harness performance degradation with UnboundedThreadPoolExecutor
> ---
>
> Key: BEAM-8944
> URL: https://issues.apache.org/jira/browse/BEAM-8944
> Project: Beam
>  Issue Type: Bug
>  Components: sdk-py-harness
>Affects Versions: 2.18.0
>Reporter: Yichi Zhang
>Assignee: Yichi Zhang
>Priority: Blocker
> Attachments: profiling.png, profiling_one_thread.png, 
> profiling_twelve_threads.png
>
>  Time Spent: 2h 40m
>  Remaining Estimate: 0h
>
> We are seeing a performance degradation for python streaming word count load 
> tests.
>  
> After some investigation, it appears to be caused by swapping the original 
> ThreadPoolExecutor for UnboundedThreadPoolExecutor in the SDK worker. The suspicion is 
> that Python performance is worse with more threads on CPU-bound tasks.
>  
> A simple test for comparing the multiple thread pool executor performance:
>  
> {code:python}
> def test_performance(self):
>   def run_perf(executor):
>     total_number = 100
>     q = queue.Queue()
>     def task(number):
>       hash(number)
>       q.put(number + 200)
>       return number
>     t = time.time()
>     count = 0
>     for i in range(200):
>       q.put(i)
>     while count < total_number:
>       executor.submit(task, q.get(block=True))
>       count += 1
>     print('%s uses %s' % (executor, time.time() - t))
>   with UnboundedThreadPoolExecutor() as executor:
>     run_perf(executor)
>   with futures.ThreadPoolExecutor(max_workers=1) as executor:
>     run_perf(executor)
>   with futures.ThreadPoolExecutor(max_workers=12) as executor:
>     run_perf(executor)
> {code}
> Results:
>  UnboundedThreadPoolExecutor uses 268.160675049
>  ThreadPoolExecutor(max_workers=1) uses 79.904583931
>  ThreadPoolExecutor(max_workers=12) uses 191.179054976
> Profiling:
> UnboundedThreadPoolExecutor:
>  !profiling.png! 
> 1 Thread ThreadPoolExecutor:
>  !profiling_one_thread.png! 
> 12 Threads ThreadPoolExecutor:
>  !profiling_twelve_threads.png! 



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Work logged] (BEAM-8561) Add ThriftIO to Support IO for Thrift Files

2019-12-17 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/BEAM-8561?focusedWorklogId=361235&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-361235
 ]

ASF GitHub Bot logged work on BEAM-8561:


Author: ASF GitHub Bot
Created on: 18/Dec/19 00:50
Start Date: 18/Dec/19 00:50
Worklog Time Spent: 10m 
  Work Description: chrlarsen commented on pull request #10290: [BEAM-8561] 
Add ThriftIO to support IO for Thrift files
URL: https://github.com/apache/beam/pull/10290#discussion_r359103959
 
 

 ##
 File path: 
sdks/java/io/thrift/src/main/java/org/apache/beam/sdk/io/thrift/parser/ThriftIdlParser.java
 ##
 @@ -0,0 +1,71 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ * http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+package org.apache.beam.sdk.io.thrift.parser;
+
+import java.io.IOException;
+import java.io.Reader;
+import org.antlr.runtime.ANTLRReaderStream;
+import org.antlr.runtime.CommonTokenStream;
+import org.antlr.runtime.RecognitionException;
+import org.antlr.runtime.tree.BufferedTreeNodeStream;
+import org.antlr.runtime.tree.Tree;
+import org.antlr.runtime.tree.TreeNodeStream;
+import org.apache.beam.sdk.io.thrift.parser.antlr.DocumentGenerator;
+import org.apache.beam.sdk.io.thrift.parser.antlr.ThriftLexer;
+import org.apache.beam.sdk.io.thrift.parser.antlr.ThriftParser;
+import org.apache.beam.sdk.io.thrift.parser.model.Document;
+import org.apache.beam.vendor.guava.v26_0_jre.com.google.common.io.CharSource;
+import org.slf4j.Logger;
+import org.slf4j.LoggerFactory;
+
+public class ThriftIdlParser {
+
+  private static final Logger LOG = 
LoggerFactory.getLogger(ThriftIdlParser.class);
+
+  /** Generates {@link Document} from {@link org.antlr.runtime.tree.Tree}. */
+  public static Document parseThriftIdl(CharSource input) throws IOException {
+Tree tree = parseTree(input);
+TreeNodeStream stream = new BufferedTreeNodeStream(tree);
+DocumentGenerator generator = new DocumentGenerator(stream);
+try {
+  return generator.document().value;
+} catch (RecognitionException e) {
+  LOG.error("Failed to generate document: " + e.getMessage());
+  throw new RuntimeException(e);
+}
+  }
+
+  /** Generates {@link org.antlr.runtime.tree.Tree} from input. */
+  static Tree parseTree(CharSource input) throws IOException {
+try (Reader reader = input.openStream()) {
+  ThriftLexer lexer = new ThriftLexer(new ANTLRReaderStream(reader));
+  ThriftParser parser = new ThriftParser(new CommonTokenStream(lexer));
+  try {
+Tree tree = (Tree) parser.document().getTree();
+if (parser.getNumberOfSyntaxErrors() > 0) {
+  LOG.error("Parsing generated " + parser.getNumberOfSyntaxErrors() + 
"errors.");
+  throw new RuntimeException("syntax error");
 
 Review comment:
   That is covered in the `catch` below.
 

This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 361235)
Time Spent: 8h 10m  (was: 8h)

> Add ThriftIO to Support IO for Thrift Files
> ---
>
> Key: BEAM-8561
> URL: https://issues.apache.org/jira/browse/BEAM-8561
> Project: Beam
>  Issue Type: New Feature
>  Components: io-java-files
>Reporter: Chris Larsen
>Assignee: Chris Larsen
>Priority: Major
>  Time Spent: 8h 10m
>  Remaining Estimate: 0h
>
> Similar to AvroIO it would be very useful to support reading and writing 
> to/from Thrift files with a native connector. 
> Functionality would include:
>  # read() - Reading from one or more Thrift files.
>  # write() - Writing to one or more Thrift files.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Work logged] (BEAM-8561) Add ThriftIO to Support IO for Thrift Files

2019-12-17 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/BEAM-8561?focusedWorklogId=361233&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-361233
 ]

ASF GitHub Bot logged work on BEAM-8561:


Author: ASF GitHub Bot
Created on: 18/Dec/19 00:42
Start Date: 18/Dec/19 00:42
Worklog Time Spent: 10m 
  Work Description: chrlarsen commented on issue #10290: [BEAM-8561] Add 
ThriftIO to support IO for Thrift files
URL: https://github.com/apache/beam/pull/10290#issuecomment-566811364
 
 
   @gsteelman @chamikaramj [PR](https://github.com/apache/beam/pull/10395) for 
parser has been opened.
 

This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 361233)
Time Spent: 8h  (was: 7h 50m)

> Add ThriftIO to Support IO for Thrift Files
> ---
>
> Key: BEAM-8561
> URL: https://issues.apache.org/jira/browse/BEAM-8561
> Project: Beam
>  Issue Type: New Feature
>  Components: io-java-files
>Reporter: Chris Larsen
>Assignee: Chris Larsen
>Priority: Major
>  Time Spent: 8h
>  Remaining Estimate: 0h
>
> Similar to AvroIO it would be very useful to support reading and writing 
> to/from Thrift files with a native connector. 
> Functionality would include:
>  # read() - Reading from one or more Thrift files.
>  # write() - Writing to one or more Thrift files.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Work logged] (BEAM-8561) Add ThriftIO to Support IO for Thrift Files

2019-12-17 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/BEAM-8561?focusedWorklogId=361232&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-361232
 ]

ASF GitHub Bot logged work on BEAM-8561:


Author: ASF GitHub Bot
Created on: 18/Dec/19 00:40
Start Date: 18/Dec/19 00:40
Worklog Time Spent: 10m 
  Work Description: chrlarsen commented on pull request #10290: [BEAM-8561] 
Add ThriftIO to support IO for Thrift files
URL: https://github.com/apache/beam/pull/10290#discussion_r359101649
 
 

 ##
 File path: 
sdks/java/io/thrift/src/main/java/org/apache/beam/sdk/io/thrift/ThriftIO.java
 ##
 @@ -0,0 +1,708 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ * http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+package org.apache.beam.sdk.io.thrift;
+
+import static java.lang.String.format;
+import static java.util.stream.Collectors.joining;
+import static 
org.apache.beam.vendor.guava.v26_0_jre.com.google.common.base.Preconditions.checkNotNull;
+
+import com.google.auto.value.AutoValue;
+import java.io.Closeable;
+import java.io.IOException;
+import java.io.OutputStream;
+import java.nio.channels.Channels;
+import java.nio.channels.WritableByteChannel;
+import java.nio.charset.Charset;
+import java.util.ArrayList;
+import java.util.List;
+import java.util.Map;
+import javax.annotation.Nullable;
+import org.apache.beam.sdk.coders.StringUtf8Coder;
+import org.apache.beam.sdk.io.Compression;
+import org.apache.beam.sdk.io.FileIO;
+import org.apache.beam.sdk.io.thrift.parser.ThriftIdlParser;
+import org.apache.beam.sdk.io.thrift.parser.model.BaseType;
+import org.apache.beam.sdk.io.thrift.parser.model.Const;
+import org.apache.beam.sdk.io.thrift.parser.model.Definition;
+import org.apache.beam.sdk.io.thrift.parser.model.Document;
+import org.apache.beam.sdk.io.thrift.parser.model.Header;
+import org.apache.beam.sdk.io.thrift.parser.model.IdentifierType;
+import org.apache.beam.sdk.io.thrift.parser.model.IntegerEnum;
+import org.apache.beam.sdk.io.thrift.parser.model.IntegerEnumField;
+import org.apache.beam.sdk.io.thrift.parser.model.ListType;
+import org.apache.beam.sdk.io.thrift.parser.model.MapType;
+import org.apache.beam.sdk.io.thrift.parser.model.Service;
+import org.apache.beam.sdk.io.thrift.parser.model.StringEnum;
+import org.apache.beam.sdk.io.thrift.parser.model.Struct;
+import org.apache.beam.sdk.io.thrift.parser.model.ThriftException;
+import org.apache.beam.sdk.io.thrift.parser.model.ThriftField;
+import org.apache.beam.sdk.io.thrift.parser.model.ThriftMethod;
+import org.apache.beam.sdk.io.thrift.parser.model.ThriftType;
+import org.apache.beam.sdk.io.thrift.parser.model.TypeAnnotation;
+import org.apache.beam.sdk.io.thrift.parser.model.Typedef;
+import org.apache.beam.sdk.io.thrift.parser.model.VoidType;
+import org.apache.beam.sdk.options.ValueProvider;
+import org.apache.beam.sdk.options.ValueProvider.StaticValueProvider;
+import org.apache.beam.sdk.transforms.Create;
+import org.apache.beam.sdk.transforms.DoFn;
+import org.apache.beam.sdk.transforms.PTransform;
+import org.apache.beam.sdk.transforms.ParDo;
+import org.apache.beam.sdk.transforms.display.DisplayData;
+import org.apache.beam.sdk.values.PBegin;
+import org.apache.beam.sdk.values.PCollection;
+import org.apache.beam.sdk.values.PDone;
+import org.apache.beam.vendor.guava.v26_0_jre.com.google.common.base.Charsets;
+import org.apache.beam.vendor.guava.v26_0_jre.com.google.common.io.ByteSource;
+import org.slf4j.Logger;
+import org.slf4j.LoggerFactory;
+
+/**
+ * {@link PTransform}s for reading and writing Thrift files.
+ *
+ * Reading Thrift Files
+ *
+ * For simple reading, use {@link ThriftIO#read} with the desired file 
pattern to read from.
+ *
+ * For example:
+ *
+ * {@code
+ * PCollection documents = 
pipeline.apply(ThriftIO.read().from("/foo/bar/*"));
+ * ...
+ * }
+ *
+ * For more advanced use cases, like reading each file in a {@link 
PCollection} of {@link
+ * FileIO.ReadableFile}, use the {@link ReadFiles} transform.
+ *
+ * For example:
+ *
+ * {@code
+ * PCollection files = pipeline
+ *   .apply(FileIO.match().filepattern(options.getInputFilepattern())
+ *   .apply(FileIO.readMatches());
+ *
+ * 

[jira] [Work logged] (BEAM-8944) Python SDK harness performance degradation with UnboundedThreadPoolExecutor

2019-12-17 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/BEAM-8944?focusedWorklogId=361231&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-361231
 ]

ASF GitHub Bot logged work on BEAM-8944:


Author: ASF GitHub Bot
Created on: 18/Dec/19 00:34
Start Date: 18/Dec/19 00:34
Worklog Time Spent: 10m 
  Work Description: lukecwik commented on issue #10387: [BEAM-8944] Change 
to use single thread in py sdk bundle progress report
URL: https://github.com/apache/beam/pull/10387#issuecomment-566812644
 
 
   > > Would it be better to have the runner request progress less frequently?
   > 
   > I think that helps too. I believe right now JRH requests every 0.1 sec. 
Not exactly sure how the frequency is picked.
   
   0.1 secs is a lot and doesn't seem right.
 

This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 361231)
Time Spent: 2.5h  (was: 2h 20m)

> Python SDK harness performance degradation with UnboundedThreadPoolExecutor
> ---
>
> Key: BEAM-8944
> URL: https://issues.apache.org/jira/browse/BEAM-8944
> Project: Beam
>  Issue Type: Bug
>  Components: sdk-py-harness
>Affects Versions: 2.18.0
>Reporter: Yichi Zhang
>Assignee: Yichi Zhang
>Priority: Blocker
> Attachments: profiling.png, profiling_one_thread.png, 
> profiling_twelve_threads.png
>
>  Time Spent: 2.5h
>  Remaining Estimate: 0h
>
> We are seeing a performance degradation for python streaming word count load 
> tests.
>  
> After some investigation, it appears to be caused by swapping the original 
> ThreadPoolExecutor for UnboundedThreadPoolExecutor in the SDK worker. The suspicion is 
> that Python performance is worse with more threads on CPU-bound tasks.
>  
> A simple test for comparing the multiple thread pool executor performance:
>  
> {code:python}
> def test_performance(self):
>   def run_perf(executor):
>     total_number = 100
>     q = queue.Queue()
>     def task(number):
>       hash(number)
>       q.put(number + 200)
>       return number
>     t = time.time()
>     count = 0
>     for i in range(200):
>       q.put(i)
>     while count < total_number:
>       executor.submit(task, q.get(block=True))
>       count += 1
>     print('%s uses %s' % (executor, time.time() - t))
>   with UnboundedThreadPoolExecutor() as executor:
>     run_perf(executor)
>   with futures.ThreadPoolExecutor(max_workers=1) as executor:
>     run_perf(executor)
>   with futures.ThreadPoolExecutor(max_workers=12) as executor:
>     run_perf(executor)
> {code}
> Results:
>  UnboundedThreadPoolExecutor uses 268.160675049
>  ThreadPoolExecutor(max_workers=1) uses 79.904583931
>  ThreadPoolExecutor(max_workers=12) uses 191.179054976
> Profiling:
> UnboundedThreadPoolExecutor:
>  !profiling.png! 
> 1 Thread ThreadPoolExecutor:
>  !profiling_one_thread.png! 
> 12 Threads ThreadPoolExecutor:
>  !profiling_twelve_threads.png! 



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Work logged] (BEAM-8561) Add ThriftIO to Support IO for Thrift Files

2019-12-17 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/BEAM-8561?focusedWorklogId=361230&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-361230
 ]

ASF GitHub Bot logged work on BEAM-8561:


Author: ASF GitHub Bot
Created on: 18/Dec/19 00:29
Start Date: 18/Dec/19 00:29
Worklog Time Spent: 10m 
  Work Description: chrlarsen commented on issue #10290: [BEAM-8561] Add 
ThriftIO to support IO for Thrift files
URL: https://github.com/apache/beam/pull/10290#issuecomment-566811364
 
 
   @gsteelman @chamikaramj [PR](https://github.com/apache/beam/pull/10395) for 
parser has been opened 
 

This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 361230)
Time Spent: 7h 40m  (was: 7.5h)

> Add ThriftIO to Support IO for Thrift Files
> ---
>
> Key: BEAM-8561
> URL: https://issues.apache.org/jira/browse/BEAM-8561
> Project: Beam
>  Issue Type: New Feature
>  Components: io-java-files
>Reporter: Chris Larsen
>Assignee: Chris Larsen
>Priority: Major
>  Time Spent: 7h 40m
>  Remaining Estimate: 0h
>
> Similar to AvroIO it would be very useful to support reading and writing 
> to/from Thrift files with a native connector. 
> Functionality would include:
>  # read() - Reading from one or more Thrift files.
>  # write() - Writing to one or more Thrift files.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Assigned] (BEAM-6860) WriteToText crash with "GlobalWindow -> ._IntervalWindowBase"

2019-12-17 Thread Pablo Estrada (Jira)


 [ 
https://issues.apache.org/jira/browse/BEAM-6860?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Pablo Estrada reassigned BEAM-6860:
---

Assignee: Chamikara Madhusanka Jayalath  (was: Pawel Kordek)

> WriteToText crash with "GlobalWindow -> ._IntervalWindowBase"
> -
>
> Key: BEAM-6860
> URL: https://issues.apache.org/jira/browse/BEAM-6860
> Project: Beam
>  Issue Type: Bug
>  Components: sdk-py-core
>Affects Versions: 2.11.0
> Environment: macOS, DirectRunner, python 2.7.15 via 
> pyenv/pyenv-virtualenv
>Reporter: Henrik
>Assignee: Chamikara Madhusanka Jayalath
>Priority: Critical
>  Labels: newbie
>  Time Spent: 40m
>  Remaining Estimate: 0h
>
> Main error:
> > Cannot convert GlobalWindow to 
> > apache_beam.utils.windowed_value._IntervalWindowBase
> This is very hard for me to debug. Doing a ParDo call before, printing the 
> input, gives me just what I want; so the lines of data to serialise are 
> "alright"; just JSON strings, in fact.
> Stacktrace:
> {code:java}
> Traceback (most recent call last):
>   File "./okr_end_ride.py", line 254, in 
>     run()
>   File "./okr_end_ride.py", line 250, in run
>     run_pipeline(pipeline_options, known_args)
>   File "./okr_end_ride.py", line 198, in run_pipeline
>     | 'write_all' >> WriteToText(known_args.output, 
> file_name_suffix=".txt")
>   File 
> "/Users/h/.pyenv/versions/2.7.15/envs/log-analytics/lib/python2.7/site-packages/apache_beam/pipeline.py",
>  line 426, in __exit__
>     self.run().wait_until_finish()
>   File 
> "/Users/h/.pyenv/versions/2.7.15/envs/log-analytics/lib/python2.7/site-packages/apache_beam/pipeline.py",
>  line 406, in run
>     self._options).run(False)
>   File 
> "/Users/h/.pyenv/versions/2.7.15/envs/log-analytics/lib/python2.7/site-packages/apache_beam/pipeline.py",
>  line 419, in run
>     return self.runner.run_pipeline(self, self._options)
>   File 
> "/Users/h/.pyenv/versions/2.7.15/envs/log-analytics/lib/python2.7/site-packages/apache_beam/runners/direct/direct_runner.py",
>  line 132, in run_pipeline
>     return runner.run_pipeline(pipeline, options)
>   File 
> "/Users/h/.pyenv/versions/2.7.15/envs/log-analytics/lib/python2.7/site-packages/apache_beam/runners/portability/fn_api_runner.py",
>  line 275, in run_pipeline
>     default_environment=self._default_environment))
>   File 
> "/Users/h/.pyenv/versions/2.7.15/envs/log-analytics/lib/python2.7/site-packages/apache_beam/runners/portability/fn_api_runner.py",
>  line 278, in run_via_runner_api
>     return self.run_stages(*self.create_stages(pipeline_proto))
>   File 
> "/Users/h/.pyenv/versions/2.7.15/envs/log-analytics/lib/python2.7/site-packages/apache_beam/runners/portability/fn_api_runner.py",
>  line 354, in run_stages
>     stage_context.safe_coders)
>   File 
> "/Users/h/.pyenv/versions/2.7.15/envs/log-analytics/lib/python2.7/site-packages/apache_beam/runners/portability/fn_api_runner.py",
>  line 509, in run_stage
>     data_input, data_output)
>   File 
> "/Users/h/.pyenv/versions/2.7.15/envs/log-analytics/lib/python2.7/site-packages/apache_beam/runners/portability/fn_api_runner.py",
>  line 1206, in process_bundle
>     result_future = self._controller.control_handler.push(process_bundle)
>   File 
> "/Users/h/.pyenv/versions/2.7.15/envs/log-analytics/lib/python2.7/site-packages/apache_beam/runners/portability/fn_api_runner.py",
>  line 821, in push
>     response = self.worker.do_instruction(request)
>   File 
> "/Users/h/.pyenv/versions/2.7.15/envs/log-analytics/lib/python2.7/site-packages/apache_beam/runners/worker/sdk_worker.py",
>  line 265, in do_instruction
>     request.instruction_id)
>   File 
> "/Users/h/.pyenv/versions/2.7.15/envs/log-analytics/lib/python2.7/site-packages/apache_beam/runners/worker/sdk_worker.py",
>  line 281, in process_bundle
>     delayed_applications = bundle_processor.process_bundle(instruction_id)
>   File 
> "/Users/h/.pyenv/versions/2.7.15/envs/log-analytics/lib/python2.7/site-packages/apache_beam/runners/worker/bundle_processor.py",
>  line 552, in process_bundle
>     op.finish()
>   File "apache_beam/runners/worker/operations.py", line 549, in 
> apache_beam.runners.worker.operations.DoOperation.finish
>   File "apache_beam/runners/worker/operations.py", line 550, in 
> apache_beam.runners.worker.operations.DoOperation.finish
>   File "apache_beam/runners/worker/operations.py", line 551, in 
> apache_beam.runners.worker.operations.DoOperation.finish
>   File "apache_beam/runners/common.py", line 758, in 
> apache_beam.runners.common.DoFnRunner.finish
>   File "apache_beam/runners/common.py", line 752, in 
> apache_beam.runners.common.DoFnRunner._invoke_bundle_method
>   File "apache_beam/runners/common.py", line 777, in 
> 

[jira] [Work logged] (BEAM-7390) Colab examples for aggregation transforms (Python)

2019-12-17 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/BEAM-7390?focusedWorklogId=361227&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-361227
 ]

ASF GitHub Bot logged work on BEAM-7390:


Author: ASF GitHub Bot
Created on: 18/Dec/19 00:25
Start Date: 18/Dec/19 00:25
Worklog Time Spent: 10m 
  Work Description: pabloem commented on pull request #10175: [BEAM-7390] 
Add code snippet for Max
URL: https://github.com/apache/beam/pull/10175
 
 
   
 

This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 361227)
Time Spent: 8h 10m  (was: 8h)

> Colab examples for aggregation transforms (Python)
> --
>
> Key: BEAM-7390
> URL: https://issues.apache.org/jira/browse/BEAM-7390
> Project: Beam
>  Issue Type: Improvement
>  Components: website
>Reporter: Rose Nguyen
>Assignee: David Cavazos
>Priority: Minor
>  Time Spent: 8h 10m
>  Remaining Estimate: 0h
>
> Merge aggregation Colabs into the transform catalog
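For reference, a minimal sketch of the kind of Max snippet being merged into the catalog (the exact example in the Colab/PR may differ):

{code:python}
import apache_beam as beam

with beam.Pipeline() as pipeline:
  max_element = (
      pipeline
      | 'Create numbers' >> beam.Create([3, 4, 1, 2])
      | 'Get max value' >> beam.CombineGlobally(
          lambda values: max(values, default=None))
      | beam.Map(print))
{code}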



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Work logged] (BEAM-8561) Add ThriftIO to Support IO for Thrift Files

2019-12-17 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/BEAM-8561?focusedWorklogId=361210&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-361210
 ]

ASF GitHub Bot logged work on BEAM-8561:


Author: ASF GitHub Bot
Created on: 17/Dec/19 23:34
Start Date: 17/Dec/19 23:34
Worklog Time Spent: 10m 
  Work Description: chrlarsen commented on pull request #10290: [BEAM-8561] 
Add ThriftIO to support IO for Thrift files
URL: https://github.com/apache/beam/pull/10290#discussion_r359085087
 
 

 ##
 File path: 
sdks/java/io/thrift/src/main/java/org/apache/beam/sdk/io/thrift/ThriftIO.java
 ##
 @@ -0,0 +1,708 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ * http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+package org.apache.beam.sdk.io.thrift;
+
+import static java.lang.String.format;
+import static java.util.stream.Collectors.joining;
+import static 
org.apache.beam.vendor.guava.v26_0_jre.com.google.common.base.Preconditions.checkNotNull;
+
+import com.google.auto.value.AutoValue;
+import java.io.Closeable;
+import java.io.IOException;
+import java.io.OutputStream;
+import java.nio.channels.Channels;
+import java.nio.channels.WritableByteChannel;
+import java.nio.charset.Charset;
+import java.util.ArrayList;
+import java.util.List;
+import java.util.Map;
+import javax.annotation.Nullable;
+import org.apache.beam.sdk.coders.StringUtf8Coder;
+import org.apache.beam.sdk.io.Compression;
+import org.apache.beam.sdk.io.FileIO;
+import org.apache.beam.sdk.io.thrift.parser.ThriftIdlParser;
+import org.apache.beam.sdk.io.thrift.parser.model.BaseType;
+import org.apache.beam.sdk.io.thrift.parser.model.Const;
+import org.apache.beam.sdk.io.thrift.parser.model.Definition;
+import org.apache.beam.sdk.io.thrift.parser.model.Document;
+import org.apache.beam.sdk.io.thrift.parser.model.Header;
+import org.apache.beam.sdk.io.thrift.parser.model.IdentifierType;
+import org.apache.beam.sdk.io.thrift.parser.model.IntegerEnum;
+import org.apache.beam.sdk.io.thrift.parser.model.IntegerEnumField;
+import org.apache.beam.sdk.io.thrift.parser.model.ListType;
+import org.apache.beam.sdk.io.thrift.parser.model.MapType;
+import org.apache.beam.sdk.io.thrift.parser.model.Service;
+import org.apache.beam.sdk.io.thrift.parser.model.StringEnum;
+import org.apache.beam.sdk.io.thrift.parser.model.Struct;
+import org.apache.beam.sdk.io.thrift.parser.model.ThriftException;
+import org.apache.beam.sdk.io.thrift.parser.model.ThriftField;
+import org.apache.beam.sdk.io.thrift.parser.model.ThriftMethod;
+import org.apache.beam.sdk.io.thrift.parser.model.ThriftType;
+import org.apache.beam.sdk.io.thrift.parser.model.TypeAnnotation;
+import org.apache.beam.sdk.io.thrift.parser.model.Typedef;
+import org.apache.beam.sdk.io.thrift.parser.model.VoidType;
+import org.apache.beam.sdk.options.ValueProvider;
+import org.apache.beam.sdk.options.ValueProvider.StaticValueProvider;
+import org.apache.beam.sdk.transforms.Create;
+import org.apache.beam.sdk.transforms.DoFn;
+import org.apache.beam.sdk.transforms.PTransform;
+import org.apache.beam.sdk.transforms.ParDo;
+import org.apache.beam.sdk.transforms.display.DisplayData;
+import org.apache.beam.sdk.values.PBegin;
+import org.apache.beam.sdk.values.PCollection;
+import org.apache.beam.sdk.values.PDone;
+import org.apache.beam.vendor.guava.v26_0_jre.com.google.common.base.Charsets;
+import org.apache.beam.vendor.guava.v26_0_jre.com.google.common.io.ByteSource;
+import org.slf4j.Logger;
+import org.slf4j.LoggerFactory;
+
+/**
+ * {@link PTransform}s for reading and writing Thrift files.
+ *
+ * Reading Thrift Files
+ *
+ * For simple reading, use {@link ThriftIO#read} with the desired file 
pattern to read from.
+ *
+ * For example:
+ *
+ * {@code
+ * PCollection documents = 
pipeline.apply(ThriftIO.read().from("/foo/bar/*"));
+ * ...
+ * }
+ *
+ * For more advanced use cases, like reading each file in a {@link 
PCollection} of {@link
+ * FileIO.ReadableFile}, use the {@link ReadFiles} transform.
+ *
+ * For example:
+ *
+ * {@code
+ * PCollection files = pipeline
+ *   .apply(FileIO.match().filepattern(options.getInputFilepattern())
+ *   .apply(FileIO.readMatches());
+ *
+ * 

[jira] [Work logged] (BEAM-8561) Add ThriftIO to Support IO for Thrift Files

2019-12-17 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/BEAM-8561?focusedWorklogId=361203&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-361203
 ]

ASF GitHub Bot logged work on BEAM-8561:


Author: ASF GitHub Bot
Created on: 17/Dec/19 23:26
Start Date: 17/Dec/19 23:26
Worklog Time Spent: 10m 
  Work Description: chrlarsen commented on pull request #10290: [BEAM-8561] 
Add ThriftIO to support IO for Thrift files
URL: https://github.com/apache/beam/pull/10290#discussion_r359082635
 
 

 ##
 File path: 
sdks/java/io/thrift/src/main/java/org/apache/beam/sdk/io/thrift/parser/model/Document.java
 ##
 @@ -0,0 +1,424 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ * http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+package org.apache.beam.sdk.io.thrift.parser.model;
+
+import static java.util.Collections.emptyList;
+import static 
org.apache.beam.vendor.guava.v26_0_jre.com.google.common.base.Preconditions.checkNotNull;
+
+import java.io.IOException;
+import java.io.Serializable;
+import java.util.ArrayList;
+import java.util.Collections;
+import java.util.HashMap;
+import java.util.List;
+import java.util.Map;
+import java.util.Objects;
+import org.apache.avro.Schema;
+import org.apache.avro.generic.GenericRecord;
+import org.apache.avro.generic.GenericRecordBuilder;
+import org.apache.avro.reflect.ReflectData;
+import org.apache.beam.sdk.io.thrift.parser.visitor.DocumentVisitor;
+import 
org.apache.beam.vendor.guava.v26_0_jre.com.google.common.base.MoreObjects;
+import 
org.apache.beam.vendor.guava.v26_0_jre.com.google.common.base.Preconditions;
+import 
org.apache.beam.vendor.guava.v26_0_jre.com.google.common.collect.ImmutableList;
+
+/**
+ * The {@link Document} class holds the elements of a Thrift file.
+ *
+ * A {@link Document} is made up of:
+ *
+ * 
+ *   {@link Header} - Contains: includes, cppIncludes, namespaces, and 
defaultNamespace.
+ *   {@link Document#definitions} - Contains list of Thrift {@link 
Definition}.
+ * 
+ */
+public class Document implements Serializable {
+  private Header header;
+  private List definitions;
+
+  public Document(Header header, List definitions) {
+this.header = checkNotNull(header, "header");
+this.definitions = ImmutableList.copyOf(checkNotNull(definitions, 
"definitions"));
+  }
+
+  /** Returns an empty {@link Document}. */
+  public static Document emptyDocument() {
+List includes = emptyList();
+List cppIncludes = emptyList();
+String defaultNamespace = null;
+Map namespaces = Collections.emptyMap();
+Header header = new Header(includes, cppIncludes, defaultNamespace, 
namespaces);
+List definitions = emptyList();
+return new Document(header, definitions);
+  }
+
+  public Document getDocument() {
+return this;
+  }
+
+  public Header getHeader() {
+return this.header;
+  }
+
+  public void setHeader(Header header) {
+this.header = header;
+  }
+
+  public List getDefinitions() {
+return definitions;
+  }
+
+  public void setDefinitions(List definitions) {
+this.definitions = definitions;
+  }
+
+  public void visit(final DocumentVisitor visitor) throws IOException {
+Preconditions.checkNotNull(visitor, "the visitor must not be null!");
+
+for (Definition definition : definitions) {
+  if (visitor.accept(definition)) {
+definition.visit(visitor);
+  }
+}
+  }
+
+  /** Gets Avro {@link Schema} for the object. */
+  public Schema getSchema() {
+return ReflectData.get().getSchema(Document.class);
+  }
+
+  /** Gets {@link Document} as a {@link GenericRecord}. */
+  public GenericRecord getAsGenericRecord() {
+GenericRecordBuilder genericRecordBuilder = new 
GenericRecordBuilder(this.getSchema());
+genericRecordBuilder.set("header", this.getHeader()).set("definitions", 
this.getDefinitions());
+
+return genericRecordBuilder.build();
+  }
+
+  /** Adds list of includes to {@link Document#header}. */
+  public void addIncludes(List includes) {
+checkNotNull(includes, "includes");
+List currentIncludes = new 
ArrayList<>(this.getHeader().getIncludes());
+currentIncludes.addAll(includes);
+

[jira] [Work logged] (BEAM-8975) Add Thrift Parser for ThriftIO

2019-12-17 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/BEAM-8975?focusedWorklogId=361199&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-361199
 ]

ASF GitHub Bot logged work on BEAM-8975:


Author: ASF GitHub Bot
Created on: 17/Dec/19 23:18
Start Date: 17/Dec/19 23:18
Worklog Time Spent: 10m 
  Work Description: chrlarsen commented on issue #10395: [BEAM-8975] Add 
Thrift parser for ThriftIO
URL: https://github.com/apache/beam/pull/10395#issuecomment-566793062
 
 
   Run Java PreCommit
 

This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 361199)
Time Spent: 1h 20m  (was: 1h 10m)

> Add Thrift Parser for ThriftIO
> --
>
> Key: BEAM-8975
> URL: https://issues.apache.org/jira/browse/BEAM-8975
> Project: Beam
>  Issue Type: New Feature
>  Components: io-java-files
>Reporter: Chris Larsen
>Assignee: Chris Larsen
>Priority: Minor
>  Time Spent: 1h 20m
>  Remaining Estimate: 0h
>
> This ticket is related to 
> [BEAM-8561|https://issues.apache.org/jira/projects/BEAM/issues/BEAM-8561?filter=allissues].
>  As there are a large number of files to review for the 
> [PR|https://github.com/apache/beam/pull/10290] for ThriftIO this ticket will 
> serve as the tracker for the submission of a PR relating to the parser and 
> document model that will be used by ThriftIO. The aim is to reduce the number 
> of files submitted with each PR.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Work logged] (BEAM-8575) Add more Python validates runner tests

2019-12-17 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/BEAM-8575?focusedWorklogId=361198&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-361198
 ]

ASF GitHub Bot logged work on BEAM-8575:


Author: ASF GitHub Bot
Created on: 17/Dec/19 23:15
Start Date: 17/Dec/19 23:15
Worklog Time Spent: 10m 
  Work Description: bumblebee-coming commented on pull request #10159: 
[BEAM-8575] Added a unit test to CombineTest class to test that Combi…
URL: https://github.com/apache/beam/pull/10159#discussion_r359079322
 
 

 ##
 File path: sdks/python/apache_beam/transforms/combiners_test.py
 ##
 @@ -393,6 +398,18 @@ def test_global_fanout(self):
   | beam.CombineGlobally(combine.MeanCombineFn()).with_fanout(11))
   assert_that(result, equal_to([49.5]))
 
+  @attr('ValidatesRunner')
+  def test_hot_key_combining_with_accumulation_mode(self):
+with TestPipeline() as p:
+  result = (p
+| beam.Create([1, 2, 3, 4, 5])
+| beam.WindowInto(GlobalWindows(),
+  trigger=Repeatedly(AfterCount(1)),
+  accumulation_mode=
+  AccumulationMode.ACCUMULATING)
+| beam.CombineGlobally(sum).without_defaults().with_fanout(2))
 
 Review comment:
   If we delete without_defaults(), we get this error:
   ValueError: PCollection of size 2 with more than one element accessed as a 
singleton view. First two elements encountered are "1", "3". [while running 
'CombineGlobally(sum, fanout=2)/InjectDefault']
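For anyone reading along, a sketch of why that error appears (reasoning added here, not part of the PR): without without_defaults(), CombineGlobally injects a default value by reading the combined result as a singleton side input, and with an accumulating Repeatedly(AfterCount(1)) trigger the global combine fires more than one pane, so that view is no longer a singleton. Dropping the default avoids that path:

{code:python}
import apache_beam as beam
from apache_beam.transforms.trigger import AccumulationMode, AfterCount, Repeatedly
from apache_beam.transforms.window import GlobalWindows

with beam.Pipeline() as p:
  totals = (
      p
      | beam.Create([1, 2, 3, 4, 5])
      | beam.WindowInto(
          GlobalWindows(),
          trigger=Repeatedly(AfterCount(1)),
          accumulation_mode=AccumulationMode.ACCUMULATING)
      # without_defaults() skips the singleton-view default injection that
      # breaks once the trigger has fired more than one pane.
      | beam.CombineGlobally(sum).without_defaults().with_fanout(2))
{code}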
 

This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 361198)
Time Spent: 37.5h  (was: 37h 20m)

> Add more Python validates runner tests
> --
>
> Key: BEAM-8575
> URL: https://issues.apache.org/jira/browse/BEAM-8575
> Project: Beam
>  Issue Type: Test
>  Components: sdk-py-core, testing
>Reporter: wendy liu
>Assignee: wendy liu
>Priority: Major
>  Time Spent: 37.5h
>  Remaining Estimate: 0h
>
> This is the umbrella issue to track the work of adding more Python tests to 
> improve test coverage.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Work logged] (BEAM-8932) Expose complete Cloud Pub/Sub messages through PubsubIO API

2019-12-17 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/BEAM-8932?focusedWorklogId=361197&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-361197
 ]

ASF GitHub Bot logged work on BEAM-8932:


Author: ASF GitHub Bot
Created on: 17/Dec/19 23:14
Start Date: 17/Dec/19 23:14
Worklog Time Spent: 10m 
  Work Description: boyuanzz commented on issue #10331: [BEAM-8932]  Modify 
PubsubClient to use the proto message throughout.
URL: https://github.com/apache/beam/pull/10331#issuecomment-566791993
 
 
   Run Python PreCommit
 

This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 361197)
Time Spent: 1h 40m  (was: 1.5h)

> Expose complete Cloud Pub/Sub messages through PubsubIO API
> ---
>
> Key: BEAM-8932
> URL: https://issues.apache.org/jira/browse/BEAM-8932
> Project: Beam
>  Issue Type: Bug
>  Components: beam-model
>Reporter: Daniel Collins
>Assignee: Daniel Collins
>Priority: Major
>  Time Spent: 1h 40m
>  Remaining Estimate: 0h
>
> The PubsubIO API only exposes a subset of the fields in the underlying 
> PubsubMessage protocol buffer. To accommodate future feature changes as well 
> as for greater compatibility with code using the Cloud Pub/Sub APIs, a method 
> to read and write these protocol messages should be exposed.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Work logged] (BEAM-8977) apache_beam.runners.interactive.display.pcoll_visualization_test.PCollectionVisualizationTest.test_dynamic_plotting_update_same_display is flaky

2019-12-17 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/BEAM-8977?focusedWorklogId=361193&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-361193
 ]

ASF GitHub Bot logged work on BEAM-8977:


Author: ASF GitHub Bot
Created on: 17/Dec/19 23:05
Start Date: 17/Dec/19 23:05
Worklog Time Spent: 10m 
  Work Description: KevinGG commented on pull request #10404: [BEAM-8977] 
Resolve test flakiness
URL: https://github.com/apache/beam/pull/10404#discussion_r359076289
 
 

 ##
 File path: 
sdks/python/apache_beam/runners/interactive/display/pcoll_visualization_test.py
 ##
 @@ -88,44 +87,40 @@ def test_dynamic_plotting_return_handle(self):
 h.stop()
 
   @patch('apache_beam.runners.interactive.display.pcoll_visualization'
- '.PCollectionVisualization.display_facets')
+ '.PCollectionVisualization._display_dive')
+  @patch('apache_beam.runners.interactive.display.pcoll_visualization'
+ '.PCollectionVisualization._display_overview')
+  @patch('apache_beam.runners.interactive.display.pcoll_visualization'
+ '.PCollectionVisualization._display_dataframe')
   def test_dynamic_plotting_update_same_display(self,
-mocked_display_facets):
+mocked_display_dataframe,
 
 Review comment:
   Thanks!
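A small, generic illustration of the decorator stacking at play above (example only, not project code): unittest.mock applies stacked @patch decorators bottom-up, so the bottom-most patch is handed to the test as the first argument, which is why mocked_display_dataframe leads the parameter list:

{code:python}
from unittest.mock import patch


class Widget:
  def a(self): return 'real a'
  def b(self): return 'real b'
  def c(self): return 'real c'


@patch.object(Widget, 'a')   # outermost decorator -> last argument
@patch.object(Widget, 'b')
@patch.object(Widget, 'c')   # innermost decorator -> first argument
def check_order(mock_c, mock_b, mock_a):
  # Each parameter is the mock installed by the matching decorator.
  assert mock_c is Widget.c
  assert mock_b is Widget.b
  assert mock_a is Widget.a


check_order()
{code}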
 

This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 361193)
Time Spent: 1h 10m  (was: 1h)

> apache_beam.runners.interactive.display.pcoll_visualization_test.PCollectionVisualizationTest.test_dynamic_plotting_update_same_display
>  is flaky
> 
>
> Key: BEAM-8977
> URL: https://issues.apache.org/jira/browse/BEAM-8977
> Project: Beam
>  Issue Type: Bug
>  Components: test-failures
>Reporter: Valentyn Tymofieiev
>Assignee: Ning Kang
>Priority: Major
>  Time Spent: 1h 10m
>  Remaining Estimate: 0h
>
> Sample failure: 
>  
> [https://builds.apache.org/job/beam_PreCommit_Python_Phrase/1273/testReport/apache_beam.runners.interactive.display.pcoll_visualization_test/PCollectionVisualizationTest/test_dynamic_plotting_update_same_display/]
> Error Message
>  IndexError: list index out of range
> Stacktrace
>  self = 
>   testMethod=test_dynamic_plotting_update_same_display>
>  mocked_display_facets =  id='139889868386376'>
> @patch('apache_beam.runners.interactive.display.pcoll_visualization'
>  '.PCollectionVisualization.display_facets')
>  def test_dynamic_plotting_update_same_display(self,
>  mocked_display_facets):
>  fake_pipeline_result = runner.PipelineResult(runner.PipelineState.RUNNING)
>  ie.current_env().set_pipeline_result(self._p, fake_pipeline_result)
>  # Starts async dynamic plotting that never ends in this test.
>  h = pv.visualize(self._pcoll, dynamic_plotting_interval=0.001)
>  # Blocking so the above async task can execute some iterations.
>  time.sleep(1)
>  # The first iteration doesn't provide updating_pv to display_facets.
>  _, first_kwargs = mocked_display_facets.call_args_list[0]
>  self.assertEqual(first_kwargs, {})
>  # The following iterations use the same updating_pv to display_facets and so
>  # on.
>  > _, second_kwargs = mocked_display_facets.call_args_list[1]
>  E IndexError: list index out of range
> apache_beam/runners/interactive/display/pcoll_visualization_test.py:105: 
> IndexError
> Standard Output
> 
>  Standard Error
>  WARNING:apache_beam.runners.interactive.interactive_environment:You cannot 
> use Interactive Beam features when you are not in an interactive environment 
> such as a Jupyter notebook or ipython terminal.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Work logged] (BEAM-8977) apache_beam.runners.interactive.display.pcoll_visualization_test.PCollectionVisualizationTest.test_dynamic_plotting_update_same_display is flaky

2019-12-17 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/BEAM-8977?focusedWorklogId=361191&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-361191
 ]

ASF GitHub Bot logged work on BEAM-8977:


Author: ASF GitHub Bot
Created on: 17/Dec/19 23:03
Start Date: 17/Dec/19 23:03
Worklog Time Spent: 10m 
  Work Description: tvalentyn commented on pull request #10404: [BEAM-8977] 
Resolve test flakiness
URL: https://github.com/apache/beam/pull/10404#discussion_r359075735
 
 

 ##
 File path: 
sdks/python/apache_beam/runners/interactive/display/pcoll_visualization_test.py
 ##
 @@ -88,44 +87,40 @@ def test_dynamic_plotting_return_handle(self):
 h.stop()
 
   @patch('apache_beam.runners.interactive.display.pcoll_visualization'
- '.PCollectionVisualization.display_facets')
+ '.PCollectionVisualization._display_dive')
+  @patch('apache_beam.runners.interactive.display.pcoll_visualization'
+ '.PCollectionVisualization._display_overview')
+  @patch('apache_beam.runners.interactive.display.pcoll_visualization'
+ '.PCollectionVisualization._display_dataframe')
   def test_dynamic_plotting_update_same_display(self,
-mocked_display_facets):
+mocked_display_dataframe,
 
 Review comment:
   SG - once you push the commit and tests pass we can merge this.
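As a reminder of how the stacked decorators map to the test arguments (a minimal, self-contained illustration; `Widget` and its method names are made up, this is not the Beam test itself): `@patch` decorators are applied bottom-up, so the mock for the bottom-most patch is the first argument after `self`.
{code:python}
# Illustration of @patch stacking order.
from unittest import TestCase, main
from unittest.mock import patch


class Widget(object):
  def _display_dive(self):
    pass

  def _display_overview(self):
    pass

  def _display_dataframe(self):
    pass


class PatchOrderTest(TestCase):
  @patch.object(Widget, '_display_dive')
  @patch.object(Widget, '_display_overview')
  @patch.object(Widget, '_display_dataframe')
  def test_order(self, mocked_dataframe, mocked_overview, mocked_dive):
    # The bottom-most decorator (_display_dataframe) supplies the first mock.
    Widget()._display_dataframe()
    mocked_dataframe.assert_called_once()
    mocked_overview.assert_not_called()
    mocked_dive.assert_not_called()


if __name__ == '__main__':
  main()
{code}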
 

This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 361191)
Time Spent: 1h  (was: 50m)

> apache_beam.runners.interactive.display.pcoll_visualization_test.PCollectionVisualizationTest.test_dynamic_plotting_update_same_display
>  is flaky
> 
>
> Key: BEAM-8977
> URL: https://issues.apache.org/jira/browse/BEAM-8977
> Project: Beam
>  Issue Type: Bug
>  Components: test-failures
>Reporter: Valentyn Tymofieiev
>Assignee: Ning Kang
>Priority: Major
>  Time Spent: 1h
>  Remaining Estimate: 0h
>
> Sample failure: 
>  
> [https://builds.apache.org/job/beam_PreCommit_Python_Phrase/1273/testReport/apache_beam.runners.interactive.display.pcoll_visualization_test/PCollectionVisualizationTest/test_dynamic_plotting_update_same_display/]
> Error Message
>  IndexError: list index out of range
> Stacktrace
>  self = <PCollectionVisualizationTest testMethod=test_dynamic_plotting_update_same_display>
>  mocked_display_facets = <MagicMock id='139889868386376'>
> @patch('apache_beam.runners.interactive.display.pcoll_visualization'
>  '.PCollectionVisualization.display_facets')
>  def test_dynamic_plotting_update_same_display(self,
>  mocked_display_facets):
>  fake_pipeline_result = runner.PipelineResult(runner.PipelineState.RUNNING)
>  ie.current_env().set_pipeline_result(self._p, fake_pipeline_result)
>  # Starts async dynamic plotting that never ends in this test.
>  h = pv.visualize(self._pcoll, dynamic_plotting_interval=0.001)
>  # Blocking so the above async task can execute some iterations.
>  time.sleep(1)
>  # The first iteration doesn't provide updating_pv to display_facets.
>  _, first_kwargs = mocked_display_facets.call_args_list[0]
>  self.assertEqual(first_kwargs, {})
>  # The following iterations use the same updating_pv to display_facets and so
>  # on.
>  > _, second_kwargs = mocked_display_facets.call_args_list[1]
>  E IndexError: list index out of range
> apache_beam/runners/interactive/display/pcoll_visualization_test.py:105: 
> IndexError
> Standard Output
> 
>  Standard Error
>  WARNING:apache_beam.runners.interactive.interactive_environment:You cannot 
> use Interactive Beam features when you are not in an interactive environment 
> such as a Jupyter notebook or ipython terminal.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Work logged] (BEAM-8481) Python 3.7 Postcommit test -- frequent timeouts

2019-12-17 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/BEAM-8481?focusedWorklogId=361189&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-361189
 ]

ASF GitHub Bot logged work on BEAM-8481:


Author: ASF GitHub Bot
Created on: 17/Dec/19 23:01
Start Date: 17/Dec/19 23:01
Worklog Time Spent: 10m 
  Work Description: tvalentyn commented on issue #10378: [BEAM-8481] Fix a 
race condition in proto stubs generation.
URL: https://github.com/apache/beam/pull/10378#issuecomment-566788097
 
 
   [BEAM-8979] Remove mypy-protobuf dependency
 

This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 361189)
Time Spent: 3h 50m  (was: 3h 40m)

> Python 3.7 Postcommit test -- frequent timeouts
> ---
>
> Key: BEAM-8481
> URL: https://issues.apache.org/jira/browse/BEAM-8481
> Project: Beam
>  Issue Type: Bug
>  Components: test-failures
>Reporter: Ahmet Altay
>Assignee: Valentyn Tymofieiev
>Priority: Critical
>  Time Spent: 3h 50m
>  Remaining Estimate: 0h
>
> [https://builds.apache.org/job/beam_PostCommit_Python37/] – this suite 
> seems to be timing out frequently. Other suites are not affected by these 
> timeouts. From the history, the issues started before Oct 10, and we cannot 
> pinpoint the cause because the history is lost.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Work logged] (BEAM-7850) Make Environment a top level attribute of PTransform

2019-12-17 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/BEAM-7850?focusedWorklogId=361188&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-361188
 ]

ASF GitHub Bot logged work on BEAM-7850:


Author: ASF GitHub Bot
Created on: 17/Dec/19 23:01
Start Date: 17/Dec/19 23:01
Worklog Time Spent: 10m 
  Work Description: youngoli commented on issue #10183: [BEAM-7850] Makes 
environment ID a top level attribute of PTransform.
URL: https://github.com/apache/beam/pull/10183#issuecomment-566788122
 
 
  Heads up, in #10393 I updated the Go proto compiler version and rebuilt the 
generated code. You'll probably need to update the generated Go code before 
merging this (and make sure you update the protoc-gen-go version to the most 
recent one this time). If it says the generated code is using ProtoVersion3 
instead of 2, then your change is good to go.
 

This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 361188)
Time Spent: 2h 40m  (was: 2.5h)

> Make Environment a top level attribute of PTransform
> 
>
> Key: BEAM-7850
> URL: https://issues.apache.org/jira/browse/BEAM-7850
> Project: Beam
>  Issue Type: Sub-task
>  Components: beam-model
>Reporter: Chamikara Madhusanka Jayalath
>Assignee: Chamikara Madhusanka Jayalath
>Priority: Major
>  Time Spent: 2h 40m
>  Remaining Estimate: 0h
>
> Currently Environment is not a top level attribute of the PTransform (of 
> runner API proto).
> [https://github.com/apache/beam/blob/master/model/pipeline/src/main/proto/beam_runner_api.proto#L99]
> Instead it is hidden inside various payload objects. For example, for ParDo, 
> environment will be inside SdkFunctionSpec of ParDoPayload.
> [https://github.com/apache/beam/blob/master/model/pipeline/src/main/proto/beam_runner_api.proto#L99]
>  
> This makes tracking environment of different types of PTransforms harder and 
> we have to fork code (on the type of PTransform) to extract the Environment 
> where the PTransform should be executed. It will probably be simpler to just 
> make Environment a top level attribute of PTransform.
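To make the fork-on-transform-type problem concrete, here is an illustrative sketch using plain dicts as a stand-in for the runner API protos (the field names are approximations for illustration, not the actual proto definitions):
{code:python}
# Illustrative only: why a top-level environment id is easier to consume than
# payload-specific fields. Dicts stand in for the runner API protos.
def environment_id_today(transform):
  # Current model: the environment sits inside a payload whose shape depends
  # on the transform type, so callers must branch on the URN.
  urn = transform['spec']['urn']
  if urn == 'beam:transform:pardo:v1':
    return transform['spec']['payload']['do_fn']['environment_id']
  if urn == 'beam:transform:combine_per_key:v1':
    return transform['spec']['payload']['combine_fn']['environment_id']
  raise NotImplementedError('No environment lookup implemented for %s' % urn)


def environment_id_proposed(transform):
  # Proposed model: every PTransform carries the field directly.
  return transform['environment_id']
{code}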



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Work logged] (BEAM-8481) Python 3.7 Postcommit test -- frequent timeouts

2019-12-17 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/BEAM-8481?focusedWorklogId=361187&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-361187
 ]

ASF GitHub Bot logged work on BEAM-8481:


Author: ASF GitHub Bot
Created on: 17/Dec/19 23:01
Start Date: 17/Dec/19 23:01
Worklog Time Spent: 10m 
  Work Description: tvalentyn commented on issue #10378: [BEAM-8481] Fix a 
race condition in proto stubs generation.
URL: https://github.com/apache/beam/pull/10378#issuecomment-566788097
 
 
   [BEAM-8979] Remove mypy-protobuf dependency
 

This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 361187)
Time Spent: 3h 40m  (was: 3.5h)

> Python 3.7 Postcommit test -- frequent timeouts
> ---
>
> Key: BEAM-8481
> URL: https://issues.apache.org/jira/browse/BEAM-8481
> Project: Beam
>  Issue Type: Bug
>  Components: test-failures
>Reporter: Ahmet Altay
>Assignee: Valentyn Tymofieiev
>Priority: Critical
>  Time Spent: 3h 40m
>  Remaining Estimate: 0h
>
> [https://builds.apache.org/job/beam_PostCommit_Python37/] – this suite 
> seems to be timing out frequently. Other suites are not affected by these 
> timeouts. From the history, the issues started before Oct 10, and we cannot 
> pinpoint the cause because the history is lost.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Work logged] (BEAM-8481) Python 3.7 Postcommit test -- frequent timeouts

2019-12-17 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/BEAM-8481?focusedWorklogId=361190&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-361190
 ]

ASF GitHub Bot logged work on BEAM-8481:


Author: ASF GitHub Bot
Created on: 17/Dec/19 23:01
Start Date: 17/Dec/19 23:01
Worklog Time Spent: 10m 
  Work Description: tvalentyn commented on issue #10378: [BEAM-8481] Fix a 
race condition in proto stubs generation.
URL: https://github.com/apache/beam/pull/10378#issuecomment-566788153
 
 
   Run Python 3.7 PostCommit
 

This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 361190)
Time Spent: 4h  (was: 3h 50m)

> Python 3.7 Postcommit test -- frequent timeouts
> ---
>
> Key: BEAM-8481
> URL: https://issues.apache.org/jira/browse/BEAM-8481
> Project: Beam
>  Issue Type: Bug
>  Components: test-failures
>Reporter: Ahmet Altay
>Assignee: Valentyn Tymofieiev
>Priority: Critical
>  Time Spent: 4h
>  Remaining Estimate: 0h
>
> [https://builds.apache.org/job/beam_PostCommit_Python37/] – this suite 
> seems to be timing out frequently. Other suites are not affected by these 
> timeouts. From the history, the issues started before Oct 10, and we cannot 
> pinpoint the cause because the history is lost.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Work logged] (BEAM-8979) protoc-gen-mypy: program not found or is not executable

2019-12-17 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/BEAM-8979?focusedWorklogId=361185&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-361185
 ]

ASF GitHub Bot logged work on BEAM-8979:


Author: ASF GitHub Bot
Created on: 17/Dec/19 23:00
Start Date: 17/Dec/19 23:00
Worklog Time Spent: 10m 
  Work Description: tvalentyn commented on issue #10400: [BEAM-8979] Remove 
mypy-protobuf dependency
URL: https://github.com/apache/beam/pull/10400#issuecomment-566787938
 
 
   Thanks, @kamilwu and @chadrik .
 

This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 361185)
Time Spent: 1h 40m  (was: 1.5h)

> protoc-gen-mypy: program not found or is not executable
> ---
>
> Key: BEAM-8979
> URL: https://issues.apache.org/jira/browse/BEAM-8979
> Project: Beam
>  Issue Type: Bug
>  Components: sdk-py-core, test-failures
>Reporter: Kamil Wasilewski
>Assignee: Chad Dombrova
>Priority: Major
>  Time Spent: 1h 40m
>  Remaining Estimate: 0h
>
> In some tests, `:sdks:python:sdist:` task fails due to problems in finding 
> protoc-gen-mypy. The following tests are affected (there might be more):
>  * 
> [https://builds.apache.org/job/beam_LoadTests_Python_37_ParDo_Dataflow_Batch_PR/]
>  * 
> [https://builds.apache.org/job/beam_BiqQueryIO_Write_Performance_Test_Python_Batch/
>  
> |https://builds.apache.org/job/beam_BiqQueryIO_Write_Performance_Test_Python_Batch/]
> Relevant logs:
> {code:java}
> 10:46:32 > Task :sdks:python:sdist FAILED
> 10:46:32 Requirement already satisfied: mypy-protobuf==1.12 in 
> /home/jenkins/jenkins-slave/workspace/beam_LoadTests_Python_37_ParDo_Dataflow_Batch_PR/src/build/gradleenv/192237/lib/python3.7/site-packages
>  (1.12)
> 10:46:32 beam_fn_api.proto: warning: Import google/protobuf/descriptor.proto 
> but not used.
> 10:46:32 beam_fn_api.proto: warning: Import google/protobuf/wrappers.proto 
> but not used.
> 10:46:32 protoc-gen-mypy: program not found or is not executable
> 10:46:32 --mypy_out: protoc-gen-mypy: Plugin failed with status code 1.
> 10:46:32 
> /home/jenkins/jenkins-slave/workspace/beam_LoadTests_Python_37_ParDo_Dataflow_Batch_PR/src/build/gradleenv/192237/lib/python3.7/site-packages/setuptools/dist.py:476:
>  UserWarning: Normalizing '2.19.0.dev' to '2.19.0.dev0'
> 10:46:32   normalized_version,
> 10:46:32 Traceback (most recent call last):
> 10:46:32   File "setup.py", line 295, in <module>
> 10:46:32 'mypy': generate_protos_first(mypy),
> 10:46:32   File 
> "/home/jenkins/jenkins-slave/workspace/beam_LoadTests_Python_37_ParDo_Dataflow_Batch_PR/src/build/gradleenv/192237/lib/python3.7/site-packages/setuptools/__init__.py",
>  line 145, in setup
> 10:46:32 return distutils.core.setup(**attrs)
> 10:46:32   File "/usr/lib/python3.7/distutils/core.py", line 148, in setup
> 10:46:32 dist.run_commands()
> 10:46:32   File "/usr/lib/python3.7/distutils/dist.py", line 966, in 
> run_commands
> 10:46:32 self.run_command(cmd)
> 10:46:32   File "/usr/lib/python3.7/distutils/dist.py", line 985, in 
> run_command
> 10:46:32 cmd_obj.run()
> 10:46:32   File 
> "/home/jenkins/jenkins-slave/workspace/beam_LoadTests_Python_37_ParDo_Dataflow_Batch_PR/src/build/gradleenv/192237/lib/python3.7/site-packages/setuptools/command/sdist.py",
>  line 44, in run
> 10:46:32 self.run_command('egg_info')
> 10:46:32   File "/usr/lib/python3.7/distutils/cmd.py", line 313, in 
> run_command
> 10:46:32 self.distribution.run_command(command)
> 10:46:32   File "/usr/lib/python3.7/distutils/dist.py", line 985, in 
> run_command
> 10:46:32 cmd_obj.run()
> 10:46:32   File "setup.py", line 220, in run
> 10:46:32 gen_protos.generate_proto_files(log=log)
> 10:46:32   File 
> "/home/jenkins/jenkins-slave/workspace/beam_LoadTests_Python_37_ParDo_Dataflow_Batch_PR/src/sdks/python/gen_protos.py",
>  line 144, in generate_proto_files
> 10:46:32 '%s' % ret_code)
> 10:46:32 RuntimeError: Protoc returned non-zero status (see logs for 
> details): 1
> {code}
>  
> This is what I have tried so far to resolve this (without being successful):
>  * Including the _--plugin=protoc-gen-mypy=\{abs_path_to_executable}_ parameter 
> to the _protoc_ call in gen_protos.py:131 (see the sketch below)
>  * Appending protoc-gen-mypy's directory to the PATH variable
> I wasn't able to reproduce this error locally.
>  
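For reference, the first bullet above corresponds to resolving the plugin executable explicitly before invoking protoc. A standalone sketch of that idea (this is not Beam's gen_protos.py, and the helper name is made up):
{code:python}
# Sketch: locate protoc-gen-mypy explicitly and pass it to protoc, so the
# plugin lookup does not depend on the PATH of the Gradle-spawned process.
import shutil
import subprocess


def run_protoc_with_mypy(proto_files, out_dir):
  plugin = shutil.which('protoc-gen-mypy')
  if plugin is None:
    raise RuntimeError('protoc-gen-mypy not found on PATH')
  cmd = [
      'protoc',
      '--plugin=protoc-gen-mypy=%s' % plugin,
      '--python_out=%s' % out_dir,
      '--mypy_out=%s' % out_dir,
  ] + list(proto_files)
  subprocess.check_call(cmd)
{code}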



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Work logged] (BEAM-8979) protoc-gen-mypy: program not found or is not executable

2019-12-17 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/BEAM-8979?focusedWorklogId=361184&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-361184
 ]

ASF GitHub Bot logged work on BEAM-8979:


Author: ASF GitHub Bot
Created on: 17/Dec/19 23:00
Start Date: 17/Dec/19 23:00
Worklog Time Spent: 10m 
  Work Description: tvalentyn commented on pull request #10400: [BEAM-8979] 
Remove mypy-protobuf dependency
URL: https://github.com/apache/beam/pull/10400
 
 
   
 

This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 361184)
Time Spent: 1.5h  (was: 1h 20m)

> protoc-gen-mypy: program not found or is not executable
> ---
>
> Key: BEAM-8979
> URL: https://issues.apache.org/jira/browse/BEAM-8979
> Project: Beam
>  Issue Type: Bug
>  Components: sdk-py-core, test-failures
>Reporter: Kamil Wasilewski
>Assignee: Chad Dombrova
>Priority: Major
>  Time Spent: 1.5h
>  Remaining Estimate: 0h
>
> In some tests, `:sdks:python:sdist:` task fails due to problems in finding 
> protoc-gen-mypy. The following tests are affected (there might be more):
>  * 
> [https://builds.apache.org/job/beam_LoadTests_Python_37_ParDo_Dataflow_Batch_PR/]
>  * 
> [https://builds.apache.org/job/beam_BiqQueryIO_Write_Performance_Test_Python_Batch/
>  
> |https://builds.apache.org/job/beam_BiqQueryIO_Write_Performance_Test_Python_Batch/]
> Relevant logs:
> {code:java}
> 10:46:32 > Task :sdks:python:sdist FAILED
> 10:46:32 Requirement already satisfied: mypy-protobuf==1.12 in 
> /home/jenkins/jenkins-slave/workspace/beam_LoadTests_Python_37_ParDo_Dataflow_Batch_PR/src/build/gradleenv/192237/lib/python3.7/site-packages
>  (1.12)
> 10:46:32 beam_fn_api.proto: warning: Import google/protobuf/descriptor.proto 
> but not used.
> 10:46:32 beam_fn_api.proto: warning: Import google/protobuf/wrappers.proto 
> but not used.
> 10:46:32 protoc-gen-mypy: program not found or is not executable
> 10:46:32 --mypy_out: protoc-gen-mypy: Plugin failed with status code 1.
> 10:46:32 
> /home/jenkins/jenkins-slave/workspace/beam_LoadTests_Python_37_ParDo_Dataflow_Batch_PR/src/build/gradleenv/192237/lib/python3.7/site-packages/setuptools/dist.py:476:
>  UserWarning: Normalizing '2.19.0.dev' to '2.19.0.dev0'
> 10:46:32   normalized_version,
> 10:46:32 Traceback (most recent call last):
> 10:46:32   File "setup.py", line 295, in <module>
> 10:46:32 'mypy': generate_protos_first(mypy),
> 10:46:32   File 
> "/home/jenkins/jenkins-slave/workspace/beam_LoadTests_Python_37_ParDo_Dataflow_Batch_PR/src/build/gradleenv/192237/lib/python3.7/site-packages/setuptools/__init__.py",
>  line 145, in setup
> 10:46:32 return distutils.core.setup(**attrs)
> 10:46:32   File "/usr/lib/python3.7/distutils/core.py", line 148, in setup
> 10:46:32 dist.run_commands()
> 10:46:32   File "/usr/lib/python3.7/distutils/dist.py", line 966, in 
> run_commands
> 10:46:32 self.run_command(cmd)
> 10:46:32   File "/usr/lib/python3.7/distutils/dist.py", line 985, in 
> run_command
> 10:46:32 cmd_obj.run()
> 10:46:32   File 
> "/home/jenkins/jenkins-slave/workspace/beam_LoadTests_Python_37_ParDo_Dataflow_Batch_PR/src/build/gradleenv/192237/lib/python3.7/site-packages/setuptools/command/sdist.py",
>  line 44, in run
> 10:46:32 self.run_command('egg_info')
> 10:46:32   File "/usr/lib/python3.7/distutils/cmd.py", line 313, in 
> run_command
> 10:46:32 self.distribution.run_command(command)
> 10:46:32   File "/usr/lib/python3.7/distutils/dist.py", line 985, in 
> run_command
> 10:46:32 cmd_obj.run()
> 10:46:32   File "setup.py", line 220, in run
> 10:46:32 gen_protos.generate_proto_files(log=log)
> 10:46:32   File 
> "/home/jenkins/jenkins-slave/workspace/beam_LoadTests_Python_37_ParDo_Dataflow_Batch_PR/src/sdks/python/gen_protos.py",
>  line 144, in generate_proto_files
> 10:46:32 '%s' % ret_code)
> 10:46:32 RuntimeError: Protoc returned non-zero status (see logs for 
> details): 1
> {code}
>  
> This is what I have tried so far to resolve this (without being successful):
>  * Including the _--plugin=protoc-gen-mypy=\{abs_path_to_executable}_ parameter 
> to the _protoc_ call in gen_protos.py:131
>  * Appending protoc-gen-mypy's directory to the PATH variable
> I wasn't able to reproduce this error locally.
>  



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Work logged] (BEAM-8481) Python 3.7 Postcommit test -- frequent timeouts

2019-12-17 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/BEAM-8481?focusedWorklogId=361183&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-361183
 ]

ASF GitHub Bot logged work on BEAM-8481:


Author: ASF GitHub Bot
Created on: 17/Dec/19 22:56
Start Date: 17/Dec/19 22:56
Worklog Time Spent: 10m 
  Work Description: tvalentyn commented on issue #10378: [BEAM-8481] Fix a 
race condition in proto stubs generation.
URL: https://github.com/apache/beam/pull/10378#issuecomment-566786851
 
 
   Run Python 3.6 PostCommit
 

This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 361183)
Time Spent: 3.5h  (was: 3h 20m)

> Python 3.7 Postcommit test -- frequent timeouts
> ---
>
> Key: BEAM-8481
> URL: https://issues.apache.org/jira/browse/BEAM-8481
> Project: Beam
>  Issue Type: Bug
>  Components: test-failures
>Reporter: Ahmet Altay
>Assignee: Valentyn Tymofieiev
>Priority: Critical
>  Time Spent: 3.5h
>  Remaining Estimate: 0h
>
> [https://builds.apache.org/job/beam_PostCommit_Python37/] – this suite 
> seems to be timing out frequently. Other suites are not affected by these 
> timeouts. From the history, the issues started before Oct 10, and we cannot 
> pinpoint the cause because the history is lost.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Work logged] (BEAM-7274) Protobuf Beam Schema support

2019-12-17 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/BEAM-7274?focusedWorklogId=361178&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-361178
 ]

ASF GitHub Bot logged work on BEAM-7274:


Author: ASF GitHub Bot
Created on: 17/Dec/19 22:49
Start Date: 17/Dec/19 22:49
Worklog Time Spent: 10m 
  Work Description: reuvenlax commented on issue #10356: [BEAM-7274] Infer 
a Beam Schema from a protocol buffer class.
URL: https://github.com/apache/beam/pull/10356#issuecomment-566784391
 
 
   Friendly ping. @alexvanboxel do you have any thoughts on this PR?
 

This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 361178)
Time Spent: 17h 20m  (was: 17h 10m)

> Protobuf Beam Schema support
> 
>
> Key: BEAM-7274
> URL: https://issues.apache.org/jira/browse/BEAM-7274
> Project: Beam
>  Issue Type: Improvement
>  Components: sdk-java-core
>Reporter: Alex Van Boxel
>Assignee: Alex Van Boxel
>Priority: Minor
>  Time Spent: 17h 20m
>  Remaining Estimate: 0h
>
> Add support for the new Beam Schema to the Protobuf extension.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Work logged] (BEAM-8335) Add streaming support to Interactive Beam

2019-12-17 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/BEAM-8335?focusedWorklogId=361175&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-361175
 ]

ASF GitHub Bot logged work on BEAM-8335:


Author: ASF GitHub Bot
Created on: 17/Dec/19 22:45
Start Date: 17/Dec/19 22:45
Worklog Time Spent: 10m 
  Work Description: davidyan74 commented on pull request #10405: 
[BEAM-8335] Background caching job
URL: https://github.com/apache/beam/pull/10405#discussion_r359069339
 
 

 ##
 File path: sdks/python/apache_beam/runners/interactive/interactive_runner.py
 ##
 @@ -125,6 +125,30 @@ def apply(self, transform, pvalueish, options):
 
   def run_pipeline(self, pipeline, options):
 pipeline_instrument = inst.pin(pipeline, options)
+# The user_pipeline analyzed might be None if the pipeline given has 
nothing
+# to be cached and tracing back to the user defined pipeline is impossible.
+# When it's None, there is no need to cache including the background
+# caching job and no result to track since no background caching job is
+# started at all.
+user_pipeline = pipeline_instrument.user_pipeline
+
+background_caching_job_result = ie.current_env().pipeline_result(
+user_pipeline, is_main_job=False)
+if (not background_caching_job_result or
 
 Review comment:
   Discussed offline. This block of code is a bit hard to read. Ning will do a 
minor refactor to make this clearer.
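A possible shape for that refactor (purely a sketch; the helper name and the environment accessors are assumptions, not the actual Interactive Beam API):
{code:python}
# Sketch of an early-return helper that makes the caching decision explicit.
# `env` stands in for ie.current_env(); its methods are assumed here.
def should_start_background_caching_job(env, user_pipeline):
  if user_pipeline is None:
    # Nothing traceable back to a user-defined pipeline, so nothing to cache.
    return False
  previous_result = env.pipeline_result(user_pipeline, is_main_job=False)
  if previous_result is None:
    # No background caching job has been started for this pipeline yet.
    return True
  # Only start a new caching job if the previous one is no longer active.
  return previous_result.state not in ('RUNNING', 'PENDING')
{code}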
 

This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 361175)
Time Spent: 48h 50m  (was: 48h 40m)

> Add streaming support to Interactive Beam
> -
>
> Key: BEAM-8335
> URL: https://issues.apache.org/jira/browse/BEAM-8335
> Project: Beam
>  Issue Type: Improvement
>  Components: runner-py-interactive
>Reporter: Sam Rohde
>Assignee: Sam Rohde
>Priority: Major
>  Time Spent: 48h 50m
>  Remaining Estimate: 0h
>
> This issue tracks the work items to introduce streaming support to the 
> Interactive Beam experience. This will allow users to:
>  * Write and run a streaming job in IPython
>  * Automatically cache records from unbounded sources
>  * Add a replay experience that replays all cached records to simulate the 
> original pipeline execution
>  * Add controls to play/pause/stop/step individual elements from the cached 
> records
>  * Add ability to inspect/visualize unbounded PCollections



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Work logged] (BEAM-8933) BigQuery IO should support read/write in Arrow format

2019-12-17 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/BEAM-8933?focusedWorklogId=361176&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-361176
 ]

ASF GitHub Bot logged work on BEAM-8933:


Author: ASF GitHub Bot
Created on: 17/Dec/19 22:45
Start Date: 17/Dec/19 22:45
Worklog Time Spent: 10m 
  Work Description: iemejia commented on issue #10384: [WIP] [BEAM-8933] 
Utilities for converting Arrow schemas and reading Arrow batches as Rows
URL: https://github.com/apache/beam/pull/10384#issuecomment-566783361
 
 
  Yes, exactly as in your last proposal; we already did that for ZetaSQL in the 
past too (making it an extension that is a dep of GCP IO). Thanks for 
understanding my point. As an extra point, a module for Arrow-related code may 
end up benefiting other IOs without the tradeoff of leaking too many things 
into core.
 

This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 361176)
Time Spent: 3.5h  (was: 3h 20m)

> BigQuery IO should support read/write in Arrow format
> -
>
> Key: BEAM-8933
> URL: https://issues.apache.org/jira/browse/BEAM-8933
> Project: Beam
>  Issue Type: Improvement
>  Components: io-java-gcp
>Reporter: Kirill Kozlov
>Assignee: Kirill Kozlov
>Priority: Major
>  Time Spent: 3.5h
>  Remaining Estimate: 0h
>
> As of right now BigQuery uses Avro format for reading and writing.
> We should add a config to BigQueryIO to specify which format to use: Arrow or 
> Avro (with Avro as default).
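Purely as an illustration of the proposed knob (this is not the BigQueryIO API; the names are made up), the configuration could look like an enum-valued option with Avro as the default:
{code:python}
# Illustrative only: an enum-valued format switch with Avro as the default.
import enum


class DataFormat(enum.Enum):
  AVRO = 'AVRO'
  ARROW = 'ARROW'


class BigQueryReadConfig(object):
  def __init__(self, table, data_format=DataFormat.AVRO):
    # Avro remains the default; Arrow is opt-in.
    self.table = table
    self.data_format = data_format


config = BigQueryReadConfig('project:dataset.table', data_format=DataFormat.ARROW)
{code}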



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Work logged] (BEAM-8960) Add an option for user to be able to opt out of using insert id for BigQuery streaming insert.

2019-12-17 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/BEAM-8960?focusedWorklogId=361174&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-361174
 ]

ASF GitHub Bot logged work on BEAM-8960:


Author: ASF GitHub Bot
Created on: 17/Dec/19 22:44
Start Date: 17/Dec/19 22:44
Worklog Time Spent: 10m 
  Work Description: yirutang commented on issue #10408: [BEAM-8960]: Add an 
option for user to opt out of using insert id for…
URL: https://github.com/apache/beam/pull/10408#issuecomment-566782511
 
 
   R: @reuvenlax
 

This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 361174)
Remaining Estimate: 23.5h  (was: 23h 40m)
Time Spent: 0.5h  (was: 20m)

> Add an option for user to be able to opt out of using insert id for BigQuery 
> streaming insert.
> --
>
> Key: BEAM-8960
> URL: https://issues.apache.org/jira/browse/BEAM-8960
> Project: Beam
>  Issue Type: New Feature
>  Components: io-java-gcp
>Reporter: Yiru Tang
>Priority: Minor
>   Original Estimate: 24h
>  Time Spent: 0.5h
>  Remaining Estimate: 23.5h
>
> BigQuery streaming insert ids offer best-effort insert deduplication. If users 
> choose to opt out of using insert ids, they could potentially be opted into 
> our new streaming backend, which gives higher speed and more quota. Insert id 
> deduplication is best effort and does not provide exactly-once guarantees.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Work logged] (BEAM-8960) Add an option for user to be able to opt out of using insert id for BigQuery streaming insert.

2019-12-17 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/BEAM-8960?focusedWorklogId=361172&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-361172
 ]

ASF GitHub Bot logged work on BEAM-8960:


Author: ASF GitHub Bot
Created on: 17/Dec/19 22:43
Start Date: 17/Dec/19 22:43
Worklog Time Spent: 10m 
  Work Description: yirutang commented on issue #10408: [BEAM-8960]: Add an 
option for user to opt out of using insert id for…
URL: https://github.com/apache/beam/pull/10408#issuecomment-566782511
 
 
   R:reuvenlax
 

This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 361172)
Remaining Estimate: 23h 40m  (was: 23h 50m)
Time Spent: 20m  (was: 10m)

> Add an option for user to be able to opt out of using insert id for BigQuery 
> streaming insert.
> --
>
> Key: BEAM-8960
> URL: https://issues.apache.org/jira/browse/BEAM-8960
> Project: Beam
>  Issue Type: New Feature
>  Components: io-java-gcp
>Reporter: Yiru Tang
>Priority: Minor
>   Original Estimate: 24h
>  Time Spent: 20m
>  Remaining Estimate: 23h 40m
>
> BigQuery streaming insert ids offer best-effort insert deduplication. If users 
> choose to opt out of using insert ids, they could potentially be opted into 
> our new streaming backend, which gives higher speed and more quota. Insert id 
> deduplication is best effort and does not provide exactly-once guarantees.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Work logged] (BEAM-8630) Prototype of BeamSQL Calc using ZetaSQL Expression Evaluator

2019-12-17 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/BEAM-8630?focusedWorklogId=361171&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-361171
 ]

ASF GitHub Bot logged work on BEAM-8630:


Author: ASF GitHub Bot
Created on: 17/Dec/19 22:43
Start Date: 17/Dec/19 22:43
Worklog Time Spent: 10m 
  Work Description: robinyqiu commented on pull request #9913: [BEAM-8630] 
Prototype of BeamZetaSqlCalcRel
URL: https://github.com/apache/beam/pull/9913#discussion_r359068383
 
 

 ##
 File path: 
sdks/java/extensions/sql/src/main/java/org/apache/beam/sdk/extensions/sql/impl/planner/BeamRuleSets.java
 ##
 @@ -159,14 +158,42 @@
   BeamCoGBKJoinRule.INSTANCE,
   BeamSideInputLookupJoinRule.INSTANCE);
 
+  private static final List BEAM_CONVERTERS_CALCITE_SQL_ONLY =
+  ImmutableList.of(BeamCalcRule.INSTANCE);
+
   private static final List BEAM_TO_ENUMERABLE =
   ImmutableList.of(BeamEnumerableConverterRule.INSTANCE);
 
   public static RuleSet[] getRuleSets() {
 return new RuleSet[] {
   RuleSets.ofList(
   ImmutableList.builder()
-  .addAll(BEAM_CONVERTERS)
+  .addAll(BEAM_CONVERTERS_COMMON)
+  .addAll(BEAM_CONVERTERS_CALCITE_SQL_ONLY)
+  .addAll(BEAM_TO_ENUMERABLE)
+  .addAll(LOGICAL_OPTIMIZATIONS)
+  .build())
+};
+  }
+
+  /** Returns the rule sets that allow using ZetaSQL evaluator for Calc. */
 
 Review comment:
  Done. Thanks for the suggestion! That makes the code cleaner.
 

This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 361171)
Time Spent: 1h  (was: 50m)

> Prototype of BeamSQL Calc using ZetaSQL Expression Evaluator
> 
>
> Key: BEAM-8630
> URL: https://issues.apache.org/jira/browse/BEAM-8630
> Project: Beam
>  Issue Type: New Feature
>  Components: dsl-sql
>Reporter: Yueyang Qiu
>Assignee: Yueyang Qiu
>Priority: Major
>  Time Spent: 1h
>  Remaining Estimate: 0h
>




--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Work logged] (BEAM-8960) Add an option for user to be able to opt out of using insert id for BigQuery streaming insert.

2019-12-17 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/BEAM-8960?focusedWorklogId=361165&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-361165
 ]

ASF GitHub Bot logged work on BEAM-8960:


Author: ASF GitHub Bot
Created on: 17/Dec/19 22:42
Start Date: 17/Dec/19 22:42
Worklog Time Spent: 10m 
  Work Description: yirutang commented on pull request #10408: [BEAM-8960]: 
Add an option for user to opt out of using insert id for…
URL: https://github.com/apache/beam/pull/10408
 
 
  Expose an option so that users can opt out of using insert ids while streaming 
into BigQuery. Insert ids only provide best-effort row deduplication; without 
them, users can opt into the new streaming backend, which offers higher quotas 
and better reliability.
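To make the trade-off concrete, here is a hedged sketch of the REST-level difference (this shows the tabledata.insertAll request shape, not the BigQueryIO implementation):
{code:python}
# Sketch: rows with an insertId get best-effort deduplication on the BigQuery
# side; omitting insertId opts out of that deduplication.
import uuid


def build_insert_all_request(records, use_insert_ids=True):
  rows = []
  for record in records:
    row = {'json': record}
    if use_insert_ids:
      row['insertId'] = str(uuid.uuid4())
    rows.append(row)
  return {'kind': 'bigquery#tableDataInsertAllRequest', 'rows': rows}
{code}
Dropping the insertId is what would allow such writes to target the higher-quota backend described in the issue.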
 

This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 361165)
Remaining Estimate: 23h 50m  (was: 24h)
Time Spent: 10m

> Add an option for user to be able to opt out of using insert id for BigQuery 
> streaming insert.
> --
>
> Key: BEAM-8960
> URL: https://issues.apache.org/jira/browse/BEAM-8960
> Project: Beam
>  Issue Type: New Feature
>  Components: io-java-gcp
>Reporter: Yiru Tang
>Priority: Minor
>   Original Estimate: 24h
>  Time Spent: 10m
>  Remaining Estimate: 23h 50m
>
> BigQuery streaming insert ids offer best-effort insert deduplication. If users 
> choose to opt out of using insert ids, they could potentially be opted into 
> our new streaming backend, which gives higher speed and more quota. Insert id 
> deduplication is best effort and does not provide exactly-once guarantees.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Work logged] (BEAM-8630) Prototype of BeamSQL Calc using ZetaSQL Expression Evaluator

2019-12-17 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/BEAM-8630?focusedWorklogId=361166&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-361166
 ]

ASF GitHub Bot logged work on BEAM-8630:


Author: ASF GitHub Bot
Created on: 17/Dec/19 22:42
Start Date: 17/Dec/19 22:42
Worklog Time Spent: 10m 
  Work Description: robinyqiu commented on pull request #9913: [BEAM-8630] 
Prototype of BeamZetaSqlCalcRel
URL: https://github.com/apache/beam/pull/9913#discussion_r359068139
 
 

 ##
 File path: 
sdks/java/extensions/sql/zetasql/src/test/java/org/apache/beam/sdk/extensions/sql/zetasql/ZetaSQLDialectSpecTest.java
 ##
 @@ -3755,6 +3755,8 @@ private void 
initializeCalciteEnvironmentWithContext(Context... extraContext) {
 .defaultSchema(defaultSchemaPlus)
 .traitDefs(traitDefs)
 .context(Contexts.of(contexts))
+// TODO[BEAM-8630]: change to BeamRuleSets.getZetaSqlRuleSets()
 
 Review comment:
   Done.
 

This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 361166)
Time Spent: 40m  (was: 0.5h)

> Prototype of BeamSQL Calc using ZetaSQL Expression Evaluator
> 
>
> Key: BEAM-8630
> URL: https://issues.apache.org/jira/browse/BEAM-8630
> Project: Beam
>  Issue Type: New Feature
>  Components: dsl-sql
>Reporter: Yueyang Qiu
>Assignee: Yueyang Qiu
>Priority: Major
>  Time Spent: 40m
>  Remaining Estimate: 0h
>




--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Work logged] (BEAM-8630) Prototype of BeamSQL Calc using ZetaSQL Expression Evaluator

2019-12-17 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/BEAM-8630?focusedWorklogId=361167&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-361167
 ]

ASF GitHub Bot logged work on BEAM-8630:


Author: ASF GitHub Bot
Created on: 17/Dec/19 22:42
Start Date: 17/Dec/19 22:42
Worklog Time Spent: 10m 
  Work Description: robinyqiu commented on pull request #9913: [BEAM-8630] 
Prototype of BeamZetaSqlCalcRel
URL: https://github.com/apache/beam/pull/9913#discussion_r359068178
 
 

 ##
 File path: 
sdks/java/extensions/sql/zetasql/src/test/java/org/apache/beam/sdk/extensions/sql/zetasql/ZetaSQLPushDownTest.java
 ##
 @@ -187,6 +187,8 @@ private static void 
initializeCalciteEnvironmentWithContext(Context... extraCont
 .defaultSchema(defaultSchemaPlus)
 .traitDefs(traitDefs)
 .context(Contexts.of(contexts))
+// TODO[BEAM-8630]: change to BeamRuleSets.getZetaSqlRuleSets()
 
 Review comment:
   Done.
 

This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 361167)
Time Spent: 50m  (was: 40m)

> Prototype of BeamSQL Calc using ZetaSQL Expression Evaluator
> 
>
> Key: BEAM-8630
> URL: https://issues.apache.org/jira/browse/BEAM-8630
> Project: Beam
>  Issue Type: New Feature
>  Components: dsl-sql
>Reporter: Yueyang Qiu
>Assignee: Yueyang Qiu
>Priority: Major
>  Time Spent: 50m
>  Remaining Estimate: 0h
>




--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Work logged] (BEAM-8977) apache_beam.runners.interactive.display.pcoll_visualization_test.PCollectionVisualizationTest.test_dynamic_plotting_update_same_display is flaky

2019-12-17 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/BEAM-8977?focusedWorklogId=361160&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-361160
 ]

ASF GitHub Bot logged work on BEAM-8977:


Author: ASF GitHub Bot
Created on: 17/Dec/19 22:26
Start Date: 17/Dec/19 22:26
Worklog Time Spent: 10m 
  Work Description: KevinGG commented on pull request #10404: [BEAM-8977] 
Resolve test flakiness
URL: https://github.com/apache/beam/pull/10404#discussion_r359061791
 
 

 ##
 File path: 
sdks/python/apache_beam/runners/interactive/display/pcoll_visualization_test.py
 ##
 @@ -88,44 +87,40 @@ def test_dynamic_plotting_return_handle(self):
 h.stop()
 
   @patch('apache_beam.runners.interactive.display.pcoll_visualization'
- '.PCollectionVisualization.display_facets')
+ '.PCollectionVisualization._display_dive')
+  @patch('apache_beam.runners.interactive.display.pcoll_visualization'
+ '.PCollectionVisualization._display_overview')
+  @patch('apache_beam.runners.interactive.display.pcoll_visualization'
+ '.PCollectionVisualization._display_dataframe')
   def test_dynamic_plotting_update_same_display(self,
-mocked_display_facets):
+mocked_display_dataframe,
 
 Review comment:
   Yes, we are verifying that the same display is being updated.
   Renaming the test to `test_dynamic_plotting_updates_same_display`.
   
 

This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 361160)
Time Spent: 50m  (was: 40m)

> apache_beam.runners.interactive.display.pcoll_visualization_test.PCollectionVisualizationTest.test_dynamic_plotting_update_same_display
>  is flaky
> 
>
> Key: BEAM-8977
> URL: https://issues.apache.org/jira/browse/BEAM-8977
> Project: Beam
>  Issue Type: Bug
>  Components: test-failures
>Reporter: Valentyn Tymofieiev
>Assignee: Ning Kang
>Priority: Major
>  Time Spent: 50m
>  Remaining Estimate: 0h
>
> Sample failure: 
>  
> [https://builds.apache.org/job/beam_PreCommit_Python_Phrase/1273/testReport/apache_beam.runners.interactive.display.pcoll_visualization_test/PCollectionVisualizationTest/test_dynamic_plotting_update_same_display/]
> Error Message
>  IndexError: list index out of range
> Stacktrace
>  self = <PCollectionVisualizationTest testMethod=test_dynamic_plotting_update_same_display>
>  mocked_display_facets = <MagicMock id='139889868386376'>
> @patch('apache_beam.runners.interactive.display.pcoll_visualization'
>  '.PCollectionVisualization.display_facets')
>  def test_dynamic_plotting_update_same_display(self,
>  mocked_display_facets):
>  fake_pipeline_result = runner.PipelineResult(runner.PipelineState.RUNNING)
>  ie.current_env().set_pipeline_result(self._p, fake_pipeline_result)
>  # Starts async dynamic plotting that never ends in this test.
>  h = pv.visualize(self._pcoll, dynamic_plotting_interval=0.001)
>  # Blocking so the above async task can execute some iterations.
>  time.sleep(1)
>  # The first iteration doesn't provide updating_pv to display_facets.
>  _, first_kwargs = mocked_display_facets.call_args_list[0]
>  self.assertEqual(first_kwargs, {})
>  # The following iterations use the same updating_pv to display_facets and so
>  # on.
>  > _, second_kwargs = mocked_display_facets.call_args_list[1]
>  E IndexError: list index out of range
> apache_beam/runners/interactive/display/pcoll_visualization_test.py:105: 
> IndexError
> Standard Output
> 
>  Standard Error
>  WARNING:apache_beam.runners.interactive.interactive_environment:You cannot 
> use Interactive Beam features when you are not in an interactive environment 
> such as a Jupyter notebook or ipython terminal.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Work logged] (BEAM-8975) Add Thrift Parser for ThriftIO

2019-12-17 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/BEAM-8975?focusedWorklogId=361151&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-361151
 ]

ASF GitHub Bot logged work on BEAM-8975:


Author: ASF GitHub Bot
Created on: 17/Dec/19 22:00
Start Date: 17/Dec/19 22:00
Worklog Time Spent: 10m 
  Work Description: chrlarsen commented on issue #10395: [BEAM-8975] Add 
Thrift parser for ThriftIO
URL: https://github.com/apache/beam/pull/10395#issuecomment-566768401
 
 
   Run Python PreCommit
 

This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 361151)
Time Spent: 1h 10m  (was: 1h)

> Add Thrift Parser for ThriftIO
> --
>
> Key: BEAM-8975
> URL: https://issues.apache.org/jira/browse/BEAM-8975
> Project: Beam
>  Issue Type: New Feature
>  Components: io-java-files
>Reporter: Chris Larsen
>Assignee: Chris Larsen
>Priority: Minor
>  Time Spent: 1h 10m
>  Remaining Estimate: 0h
>
> This ticket is related to 
> [BEAM-8561|https://issues.apache.org/jira/projects/BEAM/issues/BEAM-8561?filter=allissues].
>  As there are a large number of files to review for the 
> [PR|https://github.com/apache/beam/pull/10290] for ThriftIO this ticket will 
> serve as the tracker for the submission of a PR relating to the parser and 
> document model that will be used by ThriftIO. The aim is to reduce the number 
> of files submitted with each PR.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Work logged] (BEAM-8296) Containerize the Spark job server

2019-12-17 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/BEAM-8296?focusedWorklogId=361149&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-361149
 ]

ASF GitHub Bot logged work on BEAM-8296:


Author: ASF GitHub Bot
Created on: 17/Dec/19 21:58
Start Date: 17/Dec/19 21:58
Worklog Time Spent: 10m 
  Work Description: ibzib commented on pull request #10407: [BEAM-8296] 
containerize spark job server
URL: https://github.com/apache/beam/pull/10407
 
 
   **Please** add a meaningful description for your change here
   
   
   
   Thank you for your contribution! Follow this checklist to help us 
incorporate your contribution quickly and easily:
   
- [ ] [**Choose 
reviewer(s)**](https://beam.apache.org/contribute/#make-your-change) and 
mention them in a comment (`R: @username`).
- [ ] Format the pull request title like `[BEAM-XXX] Fixes bug in 
ApproximateQuantiles`, where you replace `BEAM-XXX` with the appropriate JIRA 
issue, if applicable. This will automatically link the pull request to the 
issue.
- [ ] If this contribution is large, please file an Apache [Individual 
Contributor License Agreement](https://www.apache.org/licenses/icla.pdf).
   
   See the [Contributor Guide](https://beam.apache.org/contribute) for more 
tips on [how to make review process 
smoother](https://beam.apache.org/contribute/#make-reviewers-job-easier).
   
   Post-Commit Tests Status (on master branch)
   

   
   Lang | SDK | Apex | Dataflow | Flink | Gearpump | Samza | Spark
   --- | --- | --- | --- | --- | --- | --- | ---
   Go | [![Build 
Status](https://builds.apache.org/job/beam_PostCommit_Go/lastCompletedBuild/badge/icon)](https://builds.apache.org/job/beam_PostCommit_Go/lastCompletedBuild/)
 | --- | --- | [![Build 
Status](https://builds.apache.org/job/beam_PostCommit_Go_VR_Flink/lastCompletedBuild/badge/icon)](https://builds.apache.org/job/beam_PostCommit_Go_VR_Flink/lastCompletedBuild/)
 | --- | --- | [![Build 
Status](https://builds.apache.org/job/beam_PostCommit_Go_VR_Spark/lastCompletedBuild/badge/icon)](https://builds.apache.org/job/beam_PostCommit_Go_VR_Spark/lastCompletedBuild/)
   Java | [![Build 
Status](https://builds.apache.org/job/beam_PostCommit_Java/lastCompletedBuild/badge/icon)](https://builds.apache.org/job/beam_PostCommit_Java/lastCompletedBuild/)
 | [![Build 
Status](https://builds.apache.org/job/beam_PostCommit_Java_ValidatesRunner_Apex/lastCompletedBuild/badge/icon)](https://builds.apache.org/job/beam_PostCommit_Java_ValidatesRunner_Apex/lastCompletedBuild/)
 | [![Build 
Status](https://builds.apache.org/job/beam_PostCommit_Java_ValidatesRunner_Dataflow/lastCompletedBuild/badge/icon)](https://builds.apache.org/job/beam_PostCommit_Java_ValidatesRunner_Dataflow/lastCompletedBuild/)
 | [![Build 
Status](https://builds.apache.org/job/beam_PostCommit_Java_ValidatesRunner_Flink/lastCompletedBuild/badge/icon)](https://builds.apache.org/job/beam_PostCommit_Java_ValidatesRunner_Flink/lastCompletedBuild/)[![Build
 
Status](https://builds.apache.org/job/beam_PostCommit_Java_PVR_Flink_Batch/lastCompletedBuild/badge/icon)](https://builds.apache.org/job/beam_PostCommit_Java_PVR_Flink_Batch/lastCompletedBuild/)[![Build
 
Status](https://builds.apache.org/job/beam_PostCommit_Java_PVR_Flink_Streaming/lastCompletedBuild/badge/icon)](https://builds.apache.org/job/beam_PostCommit_Java_PVR_Flink_Streaming/lastCompletedBuild/)
 | [![Build 
Status](https://builds.apache.org/job/beam_PostCommit_Java_ValidatesRunner_Gearpump/lastCompletedBuild/badge/icon)](https://builds.apache.org/job/beam_PostCommit_Java_ValidatesRunner_Gearpump/lastCompletedBuild/)
 | [![Build 
Status](https://builds.apache.org/job/beam_PostCommit_Java_ValidatesRunner_Samza/lastCompletedBuild/badge/icon)](https://builds.apache.org/job/beam_PostCommit_Java_ValidatesRunner_Samza/lastCompletedBuild/)
 | [![Build 
Status](https://builds.apache.org/job/beam_PostCommit_Java_ValidatesRunner_Spark/lastCompletedBuild/badge/icon)](https://builds.apache.org/job/beam_PostCommit_Java_ValidatesRunner_Spark/lastCompletedBuild/)[![Build
 
Status](https://builds.apache.org/job/beam_PostCommit_Java_PVR_Spark_Batch/lastCompletedBuild/badge/icon)](https://builds.apache.org/job/beam_PostCommit_Java_PVR_Spark_Batch/lastCompletedBuild/)[![Build
 
Status](https://builds.apache.org/job/beam_PostCommit_Java_ValidatesRunner_SparkStructuredStreaming/lastCompletedBuild/badge/icon)](https://builds.apache.org/job/beam_PostCommit_Java_ValidatesRunner_SparkStructuredStreaming/lastCompletedBuild/)
   Python | [![Build 
Status](https://builds.apache.org/job/beam_PostCommit_Python2/lastCompletedBuild/badge/icon)](https://builds.apache.org/job/beam_PostCommit_Python2/lastCompletedBuild/)[![Build
 

[jira] [Work logged] (BEAM-8561) Add ThriftIO to Support IO for Thrift Files

2019-12-17 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/BEAM-8561?focusedWorklogId=361137&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-361137
 ]

ASF GitHub Bot logged work on BEAM-8561:


Author: ASF GitHub Bot
Created on: 17/Dec/19 21:45
Start Date: 17/Dec/19 21:45
Worklog Time Spent: 10m 
  Work Description: chrlarsen commented on pull request #10290: [BEAM-8561] 
Add ThriftIO to support IO for Thrift files
URL: https://github.com/apache/beam/pull/10290#discussion_r359045633
 
 

 ##
 File path: 
sdks/java/io/thrift/src/main/java/org/apache/beam/sdk/io/thrift/parser/model/IntegerEnum.java
 ##
 @@ -0,0 +1,78 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ * http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+package org.apache.beam.sdk.io.thrift.parser.model;
+
+import static 
org.apache.beam.vendor.guava.v26_0_jre.com.google.common.base.Preconditions.checkNotNull;
+
+import java.util.List;
+import java.util.Objects;
+import 
org.apache.beam.vendor.guava.v26_0_jre.com.google.common.base.MoreObjects;
 
 Review comment:
   For the Beam modules the dependencies are defined in the build.gradle for 
the respective module. We import the vendored Guava version there and then 
import that into the code.
 

This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 361137)
Time Spent: 7h 10m  (was: 7h)

> Add ThriftIO to Support IO for Thrift Files
> ---
>
> Key: BEAM-8561
> URL: https://issues.apache.org/jira/browse/BEAM-8561
> Project: Beam
>  Issue Type: New Feature
>  Components: io-java-files
>Reporter: Chris Larsen
>Assignee: Chris Larsen
>Priority: Major
>  Time Spent: 7h 10m
>  Remaining Estimate: 0h
>
> Similar to AvroIO it would be very useful to support reading and writing 
> to/from Thrift files with a native connector. 
> Functionality would include:
>  # read() - Reading from one or more Thrift files.
>  # write() - Writing to one or more Thrift files.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Work logged] (BEAM-8977) apache_beam.runners.interactive.display.pcoll_visualization_test.PCollectionVisualizationTest.test_dynamic_plotting_update_same_display is flaky

2019-12-17 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/BEAM-8977?focusedWorklogId=361136&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-361136
 ]

ASF GitHub Bot logged work on BEAM-8977:


Author: ASF GitHub Bot
Created on: 17/Dec/19 21:44
Start Date: 17/Dec/19 21:44
Worklog Time Spent: 10m 
  Work Description: tvalentyn commented on pull request #10404: [BEAM-8977] 
Resolve test flakiness
URL: https://github.com/apache/beam/pull/10404#discussion_r359044800
 
 

 ##
 File path: 
sdks/python/apache_beam/runners/interactive/display/pcoll_visualization_test.py
 ##
 @@ -88,44 +87,40 @@ def test_dynamic_plotting_return_handle(self):
 h.stop()
 
   @patch('apache_beam.runners.interactive.display.pcoll_visualization'
- '.PCollectionVisualization.display_facets')
+ '.PCollectionVisualization._display_dive')
+  @patch('apache_beam.runners.interactive.display.pcoll_visualization'
+ '.PCollectionVisualization._display_overview')
+  @patch('apache_beam.runners.interactive.display.pcoll_visualization'
+ '.PCollectionVisualization._display_dataframe')
   def test_dynamic_plotting_update_same_display(self,
-mocked_display_facets):
+mocked_display_dataframe,
 
 Review comment:
   Are we verifying that the same display is being updated? If so consider 
`s/update/updates` in `test_dynamic_plotting_update_same_display`, if not 
consider a name that communicates the expected behavior.
 

This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 361136)
Time Spent: 40m  (was: 0.5h)

> apache_beam.runners.interactive.display.pcoll_visualization_test.PCollectionVisualizationTest.test_dynamic_plotting_update_same_display
>  is flaky
> 
>
> Key: BEAM-8977
> URL: https://issues.apache.org/jira/browse/BEAM-8977
> Project: Beam
>  Issue Type: Bug
>  Components: test-failures
>Reporter: Valentyn Tymofieiev
>Assignee: Ning Kang
>Priority: Major
>  Time Spent: 40m
>  Remaining Estimate: 0h
>
> Sample failure: 
>  
> [https://builds.apache.org/job/beam_PreCommit_Python_Phrase/1273/testReport/apache_beam.runners.interactive.display.pcoll_visualization_test/PCollectionVisualizationTest/test_dynamic_plotting_update_same_display/]
> Error Message
>  IndexError: list index out of range
> Stacktrace
>  self = <PCollectionVisualizationTest testMethod=test_dynamic_plotting_update_same_display>
>  mocked_display_facets = <MagicMock id='139889868386376'>
> @patch('apache_beam.runners.interactive.display.pcoll_visualization'
>  '.PCollectionVisualization.display_facets')
>  def test_dynamic_plotting_update_same_display(self,
>  mocked_display_facets):
>  fake_pipeline_result = runner.PipelineResult(runner.PipelineState.RUNNING)
>  ie.current_env().set_pipeline_result(self._p, fake_pipeline_result)
>  # Starts async dynamic plotting that never ends in this test.
>  h = pv.visualize(self._pcoll, dynamic_plotting_interval=0.001)
>  # Blocking so the above async task can execute some iterations.
>  time.sleep(1)
>  # The first iteration doesn't provide updating_pv to display_facets.
>  _, first_kwargs = mocked_display_facets.call_args_list[0]
>  self.assertEqual(first_kwargs, {})
>  # The following iterations use the same updating_pv to display_facets and so
>  # on.
>  > _, second_kwargs = mocked_display_facets.call_args_list[1]
>  E IndexError: list index out of range
> apache_beam/runners/interactive/display/pcoll_visualization_test.py:105: 
> IndexError
> Standard Output
> 
>  Standard Error
>  WARNING:apache_beam.runners.interactive.interactive_environment:You cannot 
> use Interactive Beam features when you are not in an interactive environment 
> such as a Jupyter notebook or ipython terminal.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Work logged] (BEAM-8977) apache_beam.runners.interactive.display.pcoll_visualization_test.PCollectionVisualizationTest.test_dynamic_plotting_update_same_display is flaky

2019-12-17 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/BEAM-8977?focusedWorklogId=361135&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-361135
 ]

ASF GitHub Bot logged work on BEAM-8977:


Author: ASF GitHub Bot
Created on: 17/Dec/19 21:43
Start Date: 17/Dec/19 21:43
Worklog Time Spent: 10m 
  Work Description: tvalentyn commented on pull request #10404: [BEAM-8977] 
Resolve test flakiness
URL: https://github.com/apache/beam/pull/10404#discussion_r359044800
 
 

 ##
 File path: 
sdks/python/apache_beam/runners/interactive/display/pcoll_visualization_test.py
 ##
 @@ -88,44 +87,40 @@ def test_dynamic_plotting_return_handle(self):
 h.stop()
 
   @patch('apache_beam.runners.interactive.display.pcoll_visualization'
- '.PCollectionVisualization.display_facets')
+ '.PCollectionVisualization._display_dive')
+  @patch('apache_beam.runners.interactive.display.pcoll_visualization'
+ '.PCollectionVisualization._display_overview')
+  @patch('apache_beam.runners.interactive.display.pcoll_visualization'
+ '.PCollectionVisualization._display_dataframe')
   def test_dynamic_plotting_update_same_display(self,
-mocked_display_facets):
+mocked_display_dataframe,
 
 Review comment:
   Are we verifying that the same display is being updated? If so, consider 
s/update/updates; if not, consider a name that communicates the expected 
behavior.
 

This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 361135)
Time Spent: 0.5h  (was: 20m)

> apache_beam.runners.interactive.display.pcoll_visualization_test.PCollectionVisualizationTest.test_dynamic_plotting_update_same_display
>  is flaky
> 
>
> Key: BEAM-8977
> URL: https://issues.apache.org/jira/browse/BEAM-8977
> Project: Beam
>  Issue Type: Bug
>  Components: test-failures
>Reporter: Valentyn Tymofieiev
>Assignee: Ning Kang
>Priority: Major
>  Time Spent: 0.5h
>  Remaining Estimate: 0h
>
> Sample failure: 
>  
> [https://builds.apache.org/job/beam_PreCommit_Python_Phrase/1273/testReport/apache_beam.runners.interactive.display.pcoll_visualization_test/PCollectionVisualizationTest/test_dynamic_plotting_update_same_display/]
> Error Message
>  IndexError: list index out of range
> Stacktrace
>  self = <apache_beam.runners.interactive.display.pcoll_visualization_test.PCollectionVisualizationTest testMethod=test_dynamic_plotting_update_same_display>
>  mocked_display_facets = <MagicMock name='display_facets' id='139889868386376'>
> @patch('apache_beam.runners.interactive.display.pcoll_visualization'
>  '.PCollectionVisualization.display_facets')
>  def test_dynamic_plotting_update_same_display(self,
>  mocked_display_facets):
>  fake_pipeline_result = runner.PipelineResult(runner.PipelineState.RUNNING)
>  ie.current_env().set_pipeline_result(self._p, fake_pipeline_result)
>  # Starts async dynamic plotting that never ends in this test.
>  h = pv.visualize(self._pcoll, dynamic_plotting_interval=0.001)
>  # Blocking so the above async task can execute some iterations.
>  time.sleep(1)
>  # The first iteration doesn't provide updating_pv to display_facets.
>  _, first_kwargs = mocked_display_facets.call_args_list[0]
>  self.assertEqual(first_kwargs, {})
>  # The following iterations use the same updating_pv to display_facets and so
>  # on.
>  > _, second_kwargs = mocked_display_facets.call_args_list[1]
>  E IndexError: list index out of range
> apache_beam/runners/interactive/display/pcoll_visualization_test.py:105: 
> IndexError
> Standard Output
> 
>  Standard Error
>  WARNING:apache_beam.runners.interactive.interactive_environment:You cannot 
> use Interactive Beam features when you are not in an interactive environment 
> such as a Jupyter notebook or ipython terminal.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Work logged] (BEAM-8561) Add ThriftIO to Support IO for Thrift Files

2019-12-17 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/BEAM-8561?focusedWorklogId=361132&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-361132
 ]

ASF GitHub Bot logged work on BEAM-8561:


Author: ASF GitHub Bot
Created on: 17/Dec/19 21:40
Start Date: 17/Dec/19 21:40
Worklog Time Spent: 10m 
  Work Description: chrlarsen commented on pull request #10290: [BEAM-8561] 
Add ThriftIO to support IO for Thrift files
URL: https://github.com/apache/beam/pull/10290#discussion_r359043453
 
 

 ##
 File path: 
sdks/java/io/thrift/src/main/java/org/apache/beam/sdk/io/thrift/parser/model/Document.java
 ##
 @@ -0,0 +1,424 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ * http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+package org.apache.beam.sdk.io.thrift.parser.model;
+
+import static java.util.Collections.emptyList;
+import static org.apache.beam.vendor.guava.v26_0_jre.com.google.common.base.Preconditions.checkNotNull;
+
+import java.io.IOException;
+import java.io.Serializable;
+import java.util.ArrayList;
+import java.util.Collections;
+import java.util.HashMap;
+import java.util.List;
+import java.util.Map;
+import java.util.Objects;
+import org.apache.avro.Schema;
+import org.apache.avro.generic.GenericRecord;
+import org.apache.avro.generic.GenericRecordBuilder;
+import org.apache.avro.reflect.ReflectData;
+import org.apache.beam.sdk.io.thrift.parser.visitor.DocumentVisitor;
+import org.apache.beam.vendor.guava.v26_0_jre.com.google.common.base.MoreObjects;
+import org.apache.beam.vendor.guava.v26_0_jre.com.google.common.base.Preconditions;
+import org.apache.beam.vendor.guava.v26_0_jre.com.google.common.collect.ImmutableList;
+
+/**
+ * The {@link Document} class holds the elements of a Thrift file.
+ *
+ * <p>A {@link Document} is made up of:
+ *
+ * <ul>
+ *   <li>{@link Header} - Contains: includes, cppIncludes, namespaces, and defaultNamespace.
+ *   <li>{@link Document#definitions} - Contains list of Thrift {@link Definition}.
+ * </ul>
+ */
+public class Document implements Serializable {
+  private Header header;
+  private List<Definition> definitions;
+
+  public Document(Header header, List<Definition> definitions) {
+    this.header = checkNotNull(header, "header");
+    this.definitions = ImmutableList.copyOf(checkNotNull(definitions, "definitions"));
+  }
+
+  /** Returns an empty {@link Document}. */
+  public static Document emptyDocument() {
+    List<String> includes = emptyList();
+    List<String> cppIncludes = emptyList();
+    String defaultNamespace = null;
+    Map<String, String> namespaces = Collections.emptyMap();
+    Header header = new Header(includes, cppIncludes, defaultNamespace, namespaces);
+    List<Definition> definitions = emptyList();
+    return new Document(header, definitions);
+  }
+
+  public Document getDocument() {
+    return this;
+  }
+
+  public Header getHeader() {
+    return this.header;
+  }
+
+  public void setHeader(Header header) {
+    this.header = header;
+  }
+
+  public List<Definition> getDefinitions() {
+    return definitions;
+  }
+
+  public void setDefinitions(List<Definition> definitions) {
+    this.definitions = definitions;
+  }
+
+  public void visit(final DocumentVisitor visitor) throws IOException {
+    Preconditions.checkNotNull(visitor, "the visitor must not be null!");
+
+    for (Definition definition : definitions) {
+      if (visitor.accept(definition)) {
+        definition.visit(visitor);
+      }
+    }
+  }
+
+  /** Gets Avro {@link Schema} for the object. */
+  public Schema getSchema() {
+    return ReflectData.get().getSchema(Document.class);
+  }
+
+  /** Gets {@link Document} as a {@link GenericRecord}. */
+  public GenericRecord getAsGenericRecord() {
+    GenericRecordBuilder genericRecordBuilder = new GenericRecordBuilder(this.getSchema());
+    genericRecordBuilder.set("header", this.getHeader()).set("definitions", this.getDefinitions());
+
+    return genericRecordBuilder.build();
+  }
+
+  /** Adds list of includes to {@link Document#header}. */
+  public void addIncludes(List<String> includes) {
+    checkNotNull(includes, "includes");
+    List<String> currentIncludes = new ArrayList<>(this.getHeader().getIncludes());
+    currentIncludes.addAll(includes);
+

[jira] [Work logged] (BEAM-1440) Create a BigQuery source (that implements iobase.BoundedSource) for Python SDK

2019-12-17 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/BEAM-1440?focusedWorklogId=361126&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-361126
 ]

ASF GitHub Bot logged work on BEAM-1440:


Author: ASF GitHub Bot
Created on: 17/Dec/19 21:30
Start Date: 17/Dec/19 21:30
Worklog Time Spent: 10m 
  Work Description: pabloem commented on pull request #9772: [BEAM-1440] 
Create a BigQuery source that implements iobase.BoundedSource for Python
URL: https://github.com/apache/beam/pull/9772
 
 
   
 

This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 361126)
Time Spent: 19h 20m  (was: 19h 10m)

> Create a BigQuery source (that implements iobase.BoundedSource) for Python SDK
> --
>
> Key: BEAM-1440
> URL: https://issues.apache.org/jira/browse/BEAM-1440
> Project: Beam
>  Issue Type: New Feature
>  Components: sdk-py-core
>Reporter: Chamikara Madhusanka Jayalath
>Assignee: Kamil Wasilewski
>Priority: Major
>  Time Spent: 19h 20m
>  Remaining Estimate: 0h
>
> Currently we have a BigQuery native source for Python SDK [1].
> This can only be used by Dataflow runner.
> We should  implement a Beam BigQuery source that implements 
> iobase.BoundedSource [2] interface so that other runners that try to use 
> Python SDK can read from BigQuery as well. Java SDK already has a Beam 
> BigQuery source [3].
> [1] 
> https://github.com/apache/beam/blob/master/sdks/python/apache_beam/io/gcp/bigquery.py
> [2] 
> https://github.com/apache/beam/blob/master/sdks/python/apache_beam/io/iobase.py#L70
> [3] 
> https://github.com/apache/beam/blob/master/sdks/java/io/google-cloud-platform/src/main/java/org/apache/beam/sdk/io/gcp/bigquery/BigQueryIO.java#L1189



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Work logged] (BEAM-8975) Add Thrift Parser for ThriftIO

2019-12-17 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/BEAM-8975?focusedWorklogId=361124&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-361124
 ]

ASF GitHub Bot logged work on BEAM-8975:


Author: ASF GitHub Bot
Created on: 17/Dec/19 21:25
Start Date: 17/Dec/19 21:25
Worklog Time Spent: 10m 
  Work Description: chrlarsen commented on issue #10395: [BEAM-8975] Add 
Thrift parser for ThriftIO
URL: https://github.com/apache/beam/pull/10395#issuecomment-566755651
 
 
   Run Portable_Python PreCommit
 

This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 361124)
Time Spent: 1h  (was: 50m)

> Add Thrift Parser for ThriftIO
> --
>
> Key: BEAM-8975
> URL: https://issues.apache.org/jira/browse/BEAM-8975
> Project: Beam
>  Issue Type: New Feature
>  Components: io-java-files
>Reporter: Chris Larsen
>Assignee: Chris Larsen
>Priority: Minor
>  Time Spent: 1h
>  Remaining Estimate: 0h
>
> This ticket is related to 
> [BEAM-8561|https://issues.apache.org/jira/projects/BEAM/issues/BEAM-8561?filter=allissues].
>  As there are a large number of files to review for the 
> [PR|https://github.com/apache/beam/pull/10290] for ThriftIO this ticket will 
> serve as the tracker for the submission of a PR relating to the parser and 
> document model that will be used by ThriftIO. The aim is to reduce the number 
> of files submitted with each PR.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Work logged] (BEAM-8975) Add Thrift Parser for ThriftIO

2019-12-17 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/BEAM-8975?focusedWorklogId=361123&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-361123
 ]

ASF GitHub Bot logged work on BEAM-8975:


Author: ASF GitHub Bot
Created on: 17/Dec/19 21:24
Start Date: 17/Dec/19 21:24
Worklog Time Spent: 10m 
  Work Description: chrlarsen commented on issue #10395: [BEAM-8975] Add 
Thrift parser for ThriftIO
URL: https://github.com/apache/beam/pull/10395#issuecomment-566755250
 
 
   Run Python PreCommit
   
 

This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 361123)
Time Spent: 50m  (was: 40m)

> Add Thrift Parser for ThriftIO
> --
>
> Key: BEAM-8975
> URL: https://issues.apache.org/jira/browse/BEAM-8975
> Project: Beam
>  Issue Type: New Feature
>  Components: io-java-files
>Reporter: Chris Larsen
>Assignee: Chris Larsen
>Priority: Minor
>  Time Spent: 50m
>  Remaining Estimate: 0h
>
> This ticket is related to 
> [BEAM-8561|https://issues.apache.org/jira/projects/BEAM/issues/BEAM-8561?filter=allissues].
>  As there are a large number of files to review for the 
> [PR|https://github.com/apache/beam/pull/10290] for ThriftIO this ticket will 
> serve as the tracker for the submission of a PR relating to the parser and 
> document model that will be used by ThriftIO. The aim is to reduce the number 
> of files submitted with each PR.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Work logged] (BEAM-8975) Add Thrift Parser for ThriftIO

2019-12-17 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/BEAM-8975?focusedWorklogId=361122&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-361122
 ]

ASF GitHub Bot logged work on BEAM-8975:


Author: ASF GitHub Bot
Created on: 17/Dec/19 21:24
Start Date: 17/Dec/19 21:24
Worklog Time Spent: 10m 
  Work Description: chrlarsen commented on issue #10395: [BEAM-8975] Add 
Thrift parser for ThriftIO
URL: https://github.com/apache/beam/pull/10395#issuecomment-566755250
 
 
   Run Python PreCommit
   
 

This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 361122)
Time Spent: 40m  (was: 0.5h)

> Add Thrift Parser for ThriftIO
> --
>
> Key: BEAM-8975
> URL: https://issues.apache.org/jira/browse/BEAM-8975
> Project: Beam
>  Issue Type: New Feature
>  Components: io-java-files
>Reporter: Chris Larsen
>Assignee: Chris Larsen
>Priority: Minor
>  Time Spent: 40m
>  Remaining Estimate: 0h
>
> This ticket is related to 
> [BEAM-8561|https://issues.apache.org/jira/projects/BEAM/issues/BEAM-8561?filter=allissues].
>  As there are a large number of files to review for the 
> [PR|https://github.com/apache/beam/pull/10290] for ThriftIO this ticket will 
> serve as the tracker for the submission of a PR relating to the parser and 
> document model that will be used by ThriftIO. The aim is to reduce the number 
> of files submitted with each PR.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Work logged] (BEAM-8975) Add Thrift Parser for ThriftIO

2019-12-17 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/BEAM-8975?focusedWorklogId=361120&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-361120
 ]

ASF GitHub Bot logged work on BEAM-8975:


Author: ASF GitHub Bot
Created on: 17/Dec/19 21:23
Start Date: 17/Dec/19 21:23
Worklog Time Spent: 10m 
  Work Description: chrlarsen commented on issue #10395: [BEAM-8975] Add 
Thrift parser for ThriftIO
URL: https://github.com/apache/beam/pull/10395#issuecomment-566755250
 
 
   Run Python PreCommit
   Run Portable_Python PreCommit
   Run Java PreCommit
   
 

This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 361120)
Time Spent: 0.5h  (was: 20m)

> Add Thrift Parser for ThriftIO
> --
>
> Key: BEAM-8975
> URL: https://issues.apache.org/jira/browse/BEAM-8975
> Project: Beam
>  Issue Type: New Feature
>  Components: io-java-files
>Reporter: Chris Larsen
>Assignee: Chris Larsen
>Priority: Minor
>  Time Spent: 0.5h
>  Remaining Estimate: 0h
>
> This ticket is related to 
> [BEAM-8561|https://issues.apache.org/jira/projects/BEAM/issues/BEAM-8561?filter=allissues].
>  As there are a large number of files to review for the 
> [PR|https://github.com/apache/beam/pull/10290] for ThriftIO this ticket will 
> serve as the tracker for the submission of a PR relating to the parser and 
> document model that will be used by ThriftIO. The aim is to reduce the number 
> of files submitted with each PR.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Closed] (BEAM-7065) Unable to use unlifted combine functions

2019-12-17 Thread Robert Burke (Jira)


 [ 
https://issues.apache.org/jira/browse/BEAM-7065?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Robert Burke closed BEAM-7065.
--
Fix Version/s: Not applicable
   Resolution: Information Provided

Reading through the history, this just never got closed.

> Unable to use unlifted combine functions
> 
>
> Key: BEAM-7065
> URL: https://issues.apache.org/jira/browse/BEAM-7065
> Project: Beam
>  Issue Type: Bug
>  Components: sdk-go
>Affects Versions: Not applicable
> Environment: Google Cloud Dataflow
>Reporter: Alexandre Thenorio
>Priority: Major
> Fix For: Not applicable
>
>
> I have tried running a simple example to calculate a running average or sum 
> using the `stats` package, however it does not seem to work.
>  
> Here's a reproducer
>  
> {code:java}
> package main
> import (
> "context"
> "encoding/json"
> "flag"
> "fmt"
> "time"
> "cloud.google.com/go/pubsub"
> "github.com/apache/beam/sdks/go/pkg/beam"
> "github.com/apache/beam/sdks/go/pkg/beam/io/pubsubio"
> "github.com/apache/beam/sdks/go/pkg/beam/log"
> "github.com/apache/beam/sdks/go/pkg/beam/options/gcpopts"
> "github.com/apache/beam/sdks/go/pkg/beam/transforms/stats"
> "github.com/apache/beam/sdks/go/pkg/beam/util/pubsubx"
> "github.com/apache/beam/sdks/go/pkg/beam/x/beamx"
> "github.com/apache/beam/sdks/go/pkg/beam/x/debug"
> )
> var (
> input = flag.String("input", "iot-data", "Pubsub input topic.")
> )
> type sensor struct {
> name  string
> value int
> }
> var (
> data = []sensor{
> {name: "temperature", value: 24},
> {name: "humidity", value: 10},
> {name: "temperature", value: 20},
> {name: "temperature", value: 22},
> {name: "humidity", value: 14},
> {name: "humidity", value: 18},
> }
> )
> func main() {
> flag.Parse()
> beam.Init()
> ctx := context.Background()
> project := gcpopts.GetProject(ctx)
> log.Infof(ctx, "Publishing %v messages to: %v", len(data), *input)
> defer pubsubx.CleanupTopic(ctx, project, *input)
> sub, err := Publish(ctx, project, *input, data...)
> if err != nil {
> log.Fatal(ctx, err)
> }
> log.Infof(ctx, "Running streaming sensor data with subscription: %v", 
> sub.ID())
> p := beam.NewPipeline()
> s := p.Root()
> // Reads sensor data from pubsub
> // Returns PCollection<[]byte>
> col := pubsubio.Read(s, project, *input, &pubsubio.ReadOptions{Subscription: sub.ID()})
> // Transforms incoming bytes from pubsub to a string,int key value
> // where the key is the sensor name and the value is the sensor reading
> // Accepts PCollection<[]byte>
> // Returns PCollection<KV<string,int>>
> data := beam.ParDo(s, extractSensorData, col)
> // Calculate running average per sensor
> //
> // Accepts PCollection<KV<string,int>>
> // Returns PCollection<KV<string,float64>>
> sum := stats.MeanPerKey(s, data)
> debug.Print(s, sum)
> if err := beamx.Run(context.Background(), p); err != nil {
> log.Exitf(ctx, "Failed to execute job: %v", err)
> }
> }
> func extractSensorData(msg []byte) (string, int) {
> ctx := context.Background()
> data := &sensor{}
> if err := json.Unmarshal(msg, data); err != nil {
> log.Fatal(ctx, err)
> }
> return data.name, data.value
> }
> func Publish(ctx context.Context, project, topic string, messages ...sensor) 
> (*pubsub.Subscription, error) {
> client, err := pubsub.NewClient(ctx, project)
> if err != nil {
> return nil, err
> }
> t, err := pubsubx.EnsureTopic(ctx, client, topic)
> if err != nil {
> return nil, err
> }
> sub, err := pubsubx.EnsureSubscription(ctx, client, topic, 
> fmt.Sprintf("%v.sub.%v", topic, time.Now().Unix()))
> if err != nil {
> return nil, err
> }
> for _, msg := range messages {
> s := {}
> bytes, err := json.Marshal(s)
> if err != nil {
> return nil, fmt.Errorf("failed to unmarshal '%v': %v", msg, err)
> }
> m := &pubsub.Message{
> Data: ([]byte)(bytes),
> // Attributes: ??
> }
> id, err := t.Publish(ctx, m).Get(ctx)
> if err != nil {
> return nil, fmt.Errorf("failed to publish '%v': %v", msg, err)
> }
> log.Infof(ctx, "Published %v with id: %v", msg, id)
> }
> return sub, nil
> }
> {code}
>  
> I ran this code in the following way
>  
> {noformat}
> go run . --project="" --runner dataflow  --staging_location 
> gs:///binaries/ --temp_location gs:///tmp/ 
> --region "europe-west1" 
> --worker_harness_container_image=alethenorio/beam-go:v2.11.0{noformat}
>  
>  
>  
> The code publishes to pubsub and then reads the messages and attempts to call 
> `stats.MeanPerKey` to create a 

[jira] [Work logged] (BEAM-8335) Add streaming support to Interactive Beam

2019-12-17 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/BEAM-8335?focusedWorklogId=361116&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-361116
 ]

ASF GitHub Bot logged work on BEAM-8335:


Author: ASF GitHub Bot
Created on: 17/Dec/19 21:06
Start Date: 17/Dec/19 21:06
Worklog Time Spent: 10m 
  Work Description: KevinGG commented on issue #10405: [BEAM-8335] 
Background caching job
URL: https://github.com/apache/beam/pull/10405#issuecomment-566749001
 
 
   R: @davidyan74 
   R: @rohdesamuel 
   PTAL.
   
   Adding Ahmet as the committer,
   R: @aaltay 
   
   Thank you!
 

This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 361116)
Time Spent: 48h 40m  (was: 48.5h)

> Add streaming support to Interactive Beam
> -
>
> Key: BEAM-8335
> URL: https://issues.apache.org/jira/browse/BEAM-8335
> Project: Beam
>  Issue Type: Improvement
>  Components: runner-py-interactive
>Reporter: Sam Rohde
>Assignee: Sam Rohde
>Priority: Major
>  Time Spent: 48h 40m
>  Remaining Estimate: 0h
>
> This issue tracks the work items to introduce streaming support to the 
> Interactive Beam experience. This will allow users to:
>  * Write and run a streaming job in IPython
>  * Automatically cache records from unbounded sources
>  * Add a replay experience that replays all cached records to simulate the 
> original pipeline execution
>  * Add controls to play/pause/stop/step individual elements from the cached 
> records
>  * Add ability to inspect/visualize unbounded PCollections



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Work logged] (BEAM-5495) PipelineResources algorithm is not working in most environments

2019-12-17 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/BEAM-5495?focusedWorklogId=361108&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-361108
 ]

ASF GitHub Bot logged work on BEAM-5495:


Author: ASF GitHub Bot
Created on: 17/Dec/19 21:02
Start Date: 17/Dec/19 21:02
Worklog Time Spent: 10m 
  Work Description: lukecwik commented on pull request #10268: [BEAM-5495] 
PipelineResources algorithm is not working in most environments
URL: https://github.com/apache/beam/pull/10268#discussion_r359019634
 
 

 ##
 File path: 
runners/core-construction-java/src/main/java/org/apache/beam/runners/core/construction/resources/PipelineResourcesOptions.java
 ##
 @@ -0,0 +1,77 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ * http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+package org.apache.beam.runners.core.construction.resources;
+
+import com.fasterxml.jackson.annotation.JsonIgnore;
+import io.github.classgraph.ClassGraph;
+import org.apache.beam.sdk.options.Default;
+import org.apache.beam.sdk.options.DefaultValueFactory;
+import org.apache.beam.sdk.options.Description;
+import org.apache.beam.sdk.options.PipelineOptions;
+import org.apache.beam.sdk.util.InstanceBuilder;
+
+/** Pipeline options dedicated to detecting classpath resources. */
+public interface PipelineResourcesOptions extends PipelineOptions {
+
+  @Description(
+  "The class of the pipeline resources detector factory that should be 
created and used to create "
+  + "the detector. If not set explicitly, a default class will be used 
to instantiate the factory.")
+  @Default.Class(ClasspathScanningResourcesDetectorFactory.class)
+  Class
+  getPipelineResourcesDetectorFactoryClass();
+
+  void setPipelineResourcesDetectorFactoryClass(
+  Class factoryClass);
+
+  @JsonIgnore
+  @Description(
+  "Instance of a pipeline resources detection algorithm. If not set 
explicitly, a default implementation will be used")
+  @Default.InstanceFactory(PipelineResourcesDetectorFactory.class)
 
 Review comment:
   It's a minor hassle, but please duplicate the description as a Javadoc comment 
on the getter so that the Javadoc has this information as well.
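
For illustration, a sketch of what duplicating the description as Javadoc could look like on this getter; the annotations are taken from the quoted diff, while the Javadoc wording simply mirrors the `@Description` text (imports as in the PR file):

{code:java}
public interface PipelineResourcesOptions extends PipelineOptions {

  /**
   * Instance of a pipeline resources detection algorithm. If not set explicitly, a default
   * implementation will be used.
   */
  @JsonIgnore
  @Description(
      "Instance of a pipeline resources detection algorithm. If not set explicitly, a default implementation will be used")
  @Default.InstanceFactory(PipelineResourcesDetectorFactory.class)
  PipelineResourcesDetector getPipelineResourcesDetector();

  void setPipelineResourcesDetector(PipelineResourcesDetector pipelineResourcesDetector);
}
{code}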
 

This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 361108)
Time Spent: 15h  (was: 14h 50m)

> PipelineResources algorithm is not working in most environments
> ---
>
> Key: BEAM-5495
> URL: https://issues.apache.org/jira/browse/BEAM-5495
> Project: Beam
>  Issue Type: Bug
>  Components: runner-flink, runner-spark, sdk-java-core
>Reporter: Romain Manni-Bucau
>Assignee: Lukasz Gajowy
>Priority: Major
> Fix For: 2.19.0
>
>  Time Spent: 15h
>  Remaining Estimate: 0h
>
> Issues are:
> 1. it assumes the classloader is a URLClassLoader (not always true, and Java 
> >= 9 breaks that as well for the app loader)
> 2. it uses loader.getURLs(), which leads to including the JRE itself in the 
> staged files
> Looks like this resource detection algorithm can't work and should be replaced 
> by an SPI rather than a built-in, non-extensible algorithm. Another valid 
> alternative is to just drop that "guess" logic and force the user to set 
> staged files.
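
For context, a minimal sketch of the kind of cast-based detection being criticized here (illustrative only, not the exact Beam code):

{code:java}
// Fails outright on Java 9+, where the application class loader is no longer a URLClassLoader,
// and may also stage JRE or IDE jars that happen to appear among the loader's URLs.
static List<String> detectClassPathResourcesToStage(ClassLoader classLoader) {
  if (!(classLoader instanceof URLClassLoader)) {
    throw new IllegalArgumentException("Unable to use ClassLoader to detect classpath elements.");
  }
  return Arrays.stream(((URLClassLoader) classLoader).getURLs())
      .map(URL::getFile)
      .collect(Collectors.toList());
}
{code}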



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Work logged] (BEAM-5495) PipelineResources algorithm is not working in most environments

2019-12-17 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/BEAM-5495?focusedWorklogId=361114&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-361114
 ]

ASF GitHub Bot logged work on BEAM-5495:


Author: ASF GitHub Bot
Created on: 17/Dec/19 21:02
Start Date: 17/Dec/19 21:02
Worklog Time Spent: 10m 
  Work Description: lukecwik commented on pull request #10268: [BEAM-5495] 
PipelineResources algorithm is not working in most environments
URL: https://github.com/apache/beam/pull/10268#discussion_r359026175
 
 

 ##
 File path: 
runners/core-construction-java/src/main/java/org/apache/beam/runners/core/construction/resources/PipelineResourcesDetectorAbstractFactory.java
 ##
 @@ -0,0 +1,23 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ * http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+package org.apache.beam.runners.core.construction.resources;
+
+/** Provides pipeline resources detection algorithm. */
+public interface PipelineResourcesDetectorAbstractFactory {
 
 Review comment:
   This isn't an abstract class but an interface; please rename it to 
`PipelineResourcesDetectorFactory` and/or make it an inner interface of 
`PipelineResourcesDetector` and just call it `Factory` there.
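
For concreteness, a sketch of the inner-interface option being suggested here; the `create()` method name and signature are assumed for illustration, since the quoted diff only shows the interface declaration:

{code:java}
/** Interface for an algorithm detecting classpath resources for pipelines. */
public interface PipelineResourcesDetector extends Serializable {

  Stream<String> detect(ClassLoader classLoader);

  /** Creates {@link PipelineResourcesDetector} instances. */
  interface Factory extends Serializable {
    // Hypothetical factory method; the real signature is not shown in the quoted diff.
    PipelineResourcesDetector create();
  }
}
{code}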
 

This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 361114)
Time Spent: 15h 40m  (was: 15.5h)

> PipelineResources algorithm is not working in most environments
> ---
>
> Key: BEAM-5495
> URL: https://issues.apache.org/jira/browse/BEAM-5495
> Project: Beam
>  Issue Type: Bug
>  Components: runner-flink, runner-spark, sdk-java-core
>Reporter: Romain Manni-Bucau
>Assignee: Lukasz Gajowy
>Priority: Major
> Fix For: 2.19.0
>
>  Time Spent: 15h 40m
>  Remaining Estimate: 0h
>
> Issues are:
> 1. it assumes the classloader is a URLClassLoader (not always true, and Java 
> >= 9 breaks that as well for the app loader)
> 2. it uses loader.getURLs(), which leads to including the JRE itself in the 
> staged files
> Looks like this resource detection algorithm can't work and should be replaced 
> by an SPI rather than a built-in, non-extensible algorithm. Another valid 
> alternative is to just drop that "guess" logic and force the user to set 
> staged files.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Work logged] (BEAM-5495) PipelineResources algorithm is not working in most environments

2019-12-17 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/BEAM-5495?focusedWorklogId=36&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-36
 ]

ASF GitHub Bot logged work on BEAM-5495:


Author: ASF GitHub Bot
Created on: 17/Dec/19 21:02
Start Date: 17/Dec/19 21:02
Worklog Time Spent: 10m 
  Work Description: lukecwik commented on pull request #10268: [BEAM-5495] 
PipelineResources algorithm is not working in most environments
URL: https://github.com/apache/beam/pull/10268#discussion_r359023901
 
 

 ##
 File path: 
runners/core-construction-java/src/test/java/org/apache/beam/runners/core/construction/resources/PipelineResourcesTest.java
 ##
 @@ -28,51 +32,44 @@
 import java.util.ArrayList;
 import java.util.Arrays;
 import java.util.List;
-import 
org.apache.beam.vendor.guava.v26_0_jre.com.google.common.collect.ImmutableList;
+import org.apache.beam.sdk.options.PipelineOptionsFactory;
 import org.junit.Rule;
 import org.junit.Test;
-import org.junit.rules.ExpectedException;
 import org.junit.rules.TemporaryFolder;
 import org.junit.runner.RunWith;
 import org.junit.runners.JUnit4;
-import org.mockito.Mockito;
 
 /** Tests for PipelineResources. */
 @RunWith(JUnit4.class)
 public class PipelineResourcesTest {
 
   @Rule public transient TemporaryFolder tmpFolder = new TemporaryFolder();
-  @Rule public transient ExpectedException thrown = ExpectedException.none();
 
   @Test
-  public void detectClassPathResourceWithFileResources() throws Exception {
+  public void testDetectsResourcesToStage() throws IOException {
 
 Review comment:
   Please re-add that the resources are detected in the order they appear as 
part of the URLClassLoader.
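
If the ordering guarantee should stay visible in the test, a rough sketch of how it could be asserted; `detectClassPathResourcesToStage` and the `tmpFolder` rule come from the quoted files, while the test name and exact matchers are illustrative:

{code:java}
@Test
public void testDetectsResourcesToStageInClassLoaderOrder() throws IOException {
  File firstJar = tmpFolder.newFile("first.jar");
  File secondJar = tmpFolder.newFile("second.jar");
  URLClassLoader classLoader =
      new URLClassLoader(new URL[] {firstJar.toURI().toURL(), secondJar.toURI().toURL()});

  List<String> resources =
      PipelineResources.detectClassPathResourcesToStage(classLoader, PipelineOptionsFactory.create());

  // Resources should come back in the same order as the URLs of the URLClassLoader.
  assertThat(resources, contains(firstJar.getAbsolutePath(), secondJar.getAbsolutePath()));
}
{code}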
 

This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 36)
Time Spent: 15h 20m  (was: 15h 10m)

> PipelineResources algorithm is not working in most environments
> ---
>
> Key: BEAM-5495
> URL: https://issues.apache.org/jira/browse/BEAM-5495
> Project: Beam
>  Issue Type: Bug
>  Components: runner-flink, runner-spark, sdk-java-core
>Reporter: Romain Manni-Bucau
>Assignee: Lukasz Gajowy
>Priority: Major
> Fix For: 2.19.0
>
>  Time Spent: 15h 20m
>  Remaining Estimate: 0h
>
> Issues are:
> 1. it assumes the classloader is a URLClassLoader (not always true, and Java 
> >= 9 breaks that as well for the app loader)
> 2. it uses loader.getURLs(), which leads to including the JRE itself in the 
> staged files
> Looks like this resource detection algorithm can't work and should be replaced 
> by an SPI rather than a built-in, non-extensible algorithm. Another valid 
> alternative is to just drop that "guess" logic and force the user to set 
> staged files.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Work logged] (BEAM-5495) PipelineResources algorithm is not working in most environments

2019-12-17 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/BEAM-5495?focusedWorklogId=361106&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-361106
 ]

ASF GitHub Bot logged work on BEAM-5495:


Author: ASF GitHub Bot
Created on: 17/Dec/19 21:02
Start Date: 17/Dec/19 21:02
Worklog Time Spent: 10m 
  Work Description: lukecwik commented on pull request #10268: [BEAM-5495] 
PipelineResources algorithm is not working in most environments
URL: https://github.com/apache/beam/pull/10268#discussion_r359018486
 
 

 ##
 File path: 
runners/core-construction-java/src/main/java/org/apache/beam/runners/core/construction/resources/PipelineResourcesOptions.java
 ##
 @@ -0,0 +1,77 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ * http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+package org.apache.beam.runners.core.construction.resources;
+
+import com.fasterxml.jackson.annotation.JsonIgnore;
+import io.github.classgraph.ClassGraph;
+import org.apache.beam.sdk.options.Default;
+import org.apache.beam.sdk.options.DefaultValueFactory;
+import org.apache.beam.sdk.options.Description;
+import org.apache.beam.sdk.options.PipelineOptions;
+import org.apache.beam.sdk.util.InstanceBuilder;
+
+/** Pipeline options dedicated to detecting classpath resources. */
+public interface PipelineResourcesOptions extends PipelineOptions {
+
+  @Description(
+  "The class of the pipeline resources detector factory that should be 
created and used to create "
+  + "the detector. If not set explicitly, a default class will be used 
to instantiate the factory.")
+  @Default.Class(ClasspathScanningResourcesDetectorFactory.class)
+  Class
+  getPipelineResourcesDetectorFactoryClass();
+
+  void setPipelineResourcesDetectorFactoryClass(
+  Class factoryClass);
+
+  @JsonIgnore
+  @Description(
+  "Instance of a pipeline resources detection algorithm. If not set 
explicitly, a default implementation will be used")
+  @Default.InstanceFactory(PipelineResourcesDetectorFactory.class)
+  PipelineResourcesDetector getPipelineResourcesDetector();
+
+  void setPipelineResourcesDetector(PipelineResourcesDetector 
pipelineResourcesDetector);
+
+  class PipelineResourcesDetectorFactory implements DefaultValueFactory<PipelineResourcesDetector> {
 
 Review comment:
   Please add a class comment.
 

This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 361106)
Time Spent: 14h 50m  (was: 14h 40m)

> PipelineResources algorithm is not working in most environments
> ---
>
> Key: BEAM-5495
> URL: https://issues.apache.org/jira/browse/BEAM-5495
> Project: Beam
>  Issue Type: Bug
>  Components: runner-flink, runner-spark, sdk-java-core
>Reporter: Romain Manni-Bucau
>Assignee: Lukasz Gajowy
>Priority: Major
> Fix For: 2.19.0
>
>  Time Spent: 14h 50m
>  Remaining Estimate: 0h
>
> Issues are:
> 1. it assumes the classloader is a URLClassLoader (not always true, and Java 
> >= 9 breaks that as well for the app loader)
> 2. it uses loader.getURLs(), which leads to including the JRE itself in the 
> staged files
> Looks like this resource detection algorithm can't work and should be replaced 
> by an SPI rather than a built-in, non-extensible algorithm. Another valid 
> alternative is to just drop that "guess" logic and force the user to set 
> staged files.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Work logged] (BEAM-5495) PipelineResources algorithm is not working in most environments

2019-12-17 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/BEAM-5495?focusedWorklogId=361110&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-361110
 ]

ASF GitHub Bot logged work on BEAM-5495:


Author: ASF GitHub Bot
Created on: 17/Dec/19 21:02
Start Date: 17/Dec/19 21:02
Worklog Time Spent: 10m 
  Work Description: lukecwik commented on pull request #10268: [BEAM-5495] 
PipelineResources algorithm is not working in most environments
URL: https://github.com/apache/beam/pull/10268#discussion_r359019828
 
 

 ##
 File path: 
runners/core-construction-java/src/main/java/org/apache/beam/runners/core/construction/resources/PipelineResourcesOptions.java
 ##
 @@ -0,0 +1,77 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ * http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+package org.apache.beam.runners.core.construction.resources;
+
+import com.fasterxml.jackson.annotation.JsonIgnore;
+import io.github.classgraph.ClassGraph;
+import org.apache.beam.sdk.options.Default;
+import org.apache.beam.sdk.options.DefaultValueFactory;
+import org.apache.beam.sdk.options.Description;
+import org.apache.beam.sdk.options.PipelineOptions;
+import org.apache.beam.sdk.util.InstanceBuilder;
+
+/** Pipeline options dedicated to detecting classpath resources. */
+public interface PipelineResourcesOptions extends PipelineOptions {
+
+  @Description(
+  "The class of the pipeline resources detector factory that should be 
created and used to create "
+  + "the detector. If not set explicitly, a default class will be used 
to instantiate the factory.")
+  @Default.Class(ClasspathScanningResourcesDetectorFactory.class)
 
 Review comment:
   Add `@JsonIgnore`
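
That is, something like the following on this property as well; the generic bound on the return type is assumed here, since the quoted diff does not show it:

{code:java}
@JsonIgnore
@Description(
    "The class of the pipeline resources detector factory that should be created and used to create "
        + "the detector. If not set explicitly, a default class will be used to instantiate the factory.")
@Default.Class(ClasspathScanningResourcesDetectorFactory.class)
Class<? extends PipelineResourcesDetectorAbstractFactory> getPipelineResourcesDetectorFactoryClass();
{code}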
 

This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 361110)
Time Spent: 15h 10m  (was: 15h)

> PipelineResources algorithm is not working in most environments
> ---
>
> Key: BEAM-5495
> URL: https://issues.apache.org/jira/browse/BEAM-5495
> Project: Beam
>  Issue Type: Bug
>  Components: runner-flink, runner-spark, sdk-java-core
>Reporter: Romain Manni-Bucau
>Assignee: Lukasz Gajowy
>Priority: Major
> Fix For: 2.19.0
>
>  Time Spent: 15h 10m
>  Remaining Estimate: 0h
>
> Issues are:
> 1. it assumes the classloader is a URLClassLoader (not always true, and Java 
> >= 9 breaks that as well for the app loader)
> 2. it uses loader.getURLs(), which leads to including the JRE itself in the 
> staged files
> Looks like this resource detection algorithm can't work and should be replaced 
> by an SPI rather than a built-in, non-extensible algorithm. Another valid 
> alternative is to just drop that "guess" logic and force the user to set 
> staged files.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Work logged] (BEAM-5495) PipelineResources algorithm is not working in most environments

2019-12-17 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/BEAM-5495?focusedWorklogId=361109&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-361109
 ]

ASF GitHub Bot logged work on BEAM-5495:


Author: ASF GitHub Bot
Created on: 17/Dec/19 21:02
Start Date: 17/Dec/19 21:02
Worklog Time Spent: 10m 
  Work Description: lukecwik commented on pull request #10268: [BEAM-5495] 
PipelineResources algorithm is not working in most environments
URL: https://github.com/apache/beam/pull/10268#discussion_r359021220
 
 

 ##
 File path: 
runners/core-construction-java/src/test/java/org/apache/beam/runners/core/construction/resources/ClasspathScanningResourcesDetectorTest.java
 ##
 @@ -0,0 +1,129 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ * http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+package org.apache.beam.runners.core.construction.resources;
+
+import static org.hamcrest.CoreMatchers.containsString;
+import static org.hamcrest.CoreMatchers.hasItem;
+import static org.hamcrest.CoreMatchers.hasItems;
+import static org.hamcrest.MatcherAssert.assertThat;
+import static org.hamcrest.Matchers.not;
+import static org.junit.Assert.assertFalse;
+
+import io.github.classgraph.ClassGraph;
+import java.io.File;
+import java.io.FileOutputStream;
+import java.io.IOException;
+import java.net.URL;
+import java.net.URLClassLoader;
+import java.util.List;
+import java.util.jar.JarOutputStream;
+import java.util.jar.Manifest;
+import java.util.stream.Collectors;
+import org.apache.beam.sdk.testing.RestoreSystemProperties;
+import org.junit.Before;
+import org.junit.Rule;
+import org.junit.Test;
+import org.junit.rules.TemporaryFolder;
+import org.mockito.Mockito;
+
+public class ClasspathScanningResourcesDetectorTest {
+
+  @Rule public transient TemporaryFolder tmpFolder = new TemporaryFolder();
+
+  @Rule public transient RestoreSystemProperties systemProperties = new 
RestoreSystemProperties();
+
+  private ClasspathScanningResourcesDetector detector;
+
+  private ClassLoader classLoader;
 
 Review comment:
   classLoader and detector are assigned but are not shared outside of the 
method body so please create them within the test instead as local variables. 
This would help with test readability.
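
In other words, something along these lines in each test body (a sketch only: the constructor argument and the scenario are assumptions based on the imports and rules visible in the quoted diff):

{code:java}
@Test
public void detectsResourcesFromJavaClassPathProperty() throws IOException {
  // Local variables instead of fields assigned in a @Before method:
  ClasspathScanningResourcesDetector detector =
      new ClasspathScanningResourcesDetector(new ClassGraph()); // constructor argument assumed
  ClassLoader classLoader = getClass().getClassLoader();

  File folder = tmpFolder.newFolder("resource-folder");
  System.setProperty("java.class.path", folder.getAbsolutePath()); // restored by RestoreSystemProperties

  List<String> detected = detector.detect(classLoader).collect(Collectors.toList());

  assertThat(detected, hasItem(containsString(folder.getAbsolutePath())));
}
{code}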
 

This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 361109)

> PipelineResources algorithm is not working in most environments
> ---
>
> Key: BEAM-5495
> URL: https://issues.apache.org/jira/browse/BEAM-5495
> Project: Beam
>  Issue Type: Bug
>  Components: runner-flink, runner-spark, sdk-java-core
>Reporter: Romain Manni-Bucau
>Assignee: Lukasz Gajowy
>Priority: Major
> Fix For: 2.19.0
>
>  Time Spent: 15h
>  Remaining Estimate: 0h
>
> Issues are:
> 1. it assumes the classloader is a URLClassLoader (not always true, and Java 
> >= 9 breaks that as well for the app loader)
> 2. it uses loader.getURLs(), which leads to including the JRE itself in the 
> staged files
> Looks like this resource detection algorithm can't work and should be replaced 
> by an SPI rather than a built-in, non-extensible algorithm. Another valid 
> alternative is to just drop that "guess" logic and force the user to set 
> staged files.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Work logged] (BEAM-5495) PipelineResources algorithm is not working in most environments

2019-12-17 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/BEAM-5495?focusedWorklogId=361105&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-361105
 ]

ASF GitHub Bot logged work on BEAM-5495:


Author: ASF GitHub Bot
Created on: 17/Dec/19 21:02
Start Date: 17/Dec/19 21:02
Worklog Time Spent: 10m 
  Work Description: lukecwik commented on pull request #10268: [BEAM-5495] 
PipelineResources algorithm is not working in most environments
URL: https://github.com/apache/beam/pull/10268#discussion_r359017865
 
 

 ##
 File path: 
runners/core-construction-java/src/main/java/org/apache/beam/runners/core/construction/resources/PipelineResourcesDetector.java
 ##
 @@ -18,10 +18,10 @@
 package org.apache.beam.runners.core.construction.resources;
 
 import java.io.Serializable;
-import java.util.List;
+import java.util.stream.Stream;
 
 /** Interface for an algorithm detecting classpath resources for pipelines. */
 public interface PipelineResourcesDetector extends Serializable {
 
-  List<String> detect(ClassLoader classLoader);
+  Stream<String> detect(ClassLoader classLoader);
 
 Review comment:
   I would also have preferred a list, since a stream can imply that it is 
infinitely long (there could be a generator function behind it). Lists are 
ordered and have finite size.
   
   Also, a set is the wrong abstraction, since we want to maintain the classpath 
order and we also want to maintain duplicates.
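
That is, roughly the following shape for the interface (a sketch of the reviewer's preference, not the code currently in the PR):

{code:java}
/** Interface for an algorithm detecting classpath resources for pipelines. */
public interface PipelineResourcesDetector extends Serializable {

  // A List keeps the classpath order, keeps duplicates, and is always finite,
  // whereas a Stream can suggest a lazy or even unbounded generator.
  List<String> detect(ClassLoader classLoader);
}
{code}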
 

This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 361105)
Time Spent: 14h 40m  (was: 14.5h)

> PipelineResources algorithm is not working in most environments
> ---
>
> Key: BEAM-5495
> URL: https://issues.apache.org/jira/browse/BEAM-5495
> Project: Beam
>  Issue Type: Bug
>  Components: runner-flink, runner-spark, sdk-java-core
>Reporter: Romain Manni-Bucau
>Assignee: Lukasz Gajowy
>Priority: Major
> Fix For: 2.19.0
>
>  Time Spent: 14h 40m
>  Remaining Estimate: 0h
>
> Issues are:
> 1. it assumes the classloader is a URLClassLoader (not always true, and Java 
> >= 9 breaks that as well for the app loader)
> 2. it uses loader.getURLs(), which leads to including the JRE itself in the 
> staged files
> Looks like this resource detection algorithm can't work and should be replaced 
> by an SPI rather than a built-in, non-extensible algorithm. Another valid 
> alternative is to just drop that "guess" logic and force the user to set 
> staged files.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Work logged] (BEAM-5495) PipelineResources algorithm is not working in most environments

2019-12-17 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/BEAM-5495?focusedWorklogId=361112&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-361112
 ]

ASF GitHub Bot logged work on BEAM-5495:


Author: ASF GitHub Bot
Created on: 17/Dec/19 21:02
Start Date: 17/Dec/19 21:02
Worklog Time Spent: 10m 
  Work Description: lukecwik commented on pull request #10268: [BEAM-5495] 
PipelineResources algorithm is not working in most environments
URL: https://github.com/apache/beam/pull/10268#discussion_r359019575
 
 

 ##
 File path: 
runners/core-construction-java/src/main/java/org/apache/beam/runners/core/construction/resources/PipelineResourcesOptions.java
 ##
 @@ -0,0 +1,77 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ * http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+package org.apache.beam.runners.core.construction.resources;
+
+import com.fasterxml.jackson.annotation.JsonIgnore;
+import io.github.classgraph.ClassGraph;
+import org.apache.beam.sdk.options.Default;
+import org.apache.beam.sdk.options.DefaultValueFactory;
+import org.apache.beam.sdk.options.Description;
+import org.apache.beam.sdk.options.PipelineOptions;
+import org.apache.beam.sdk.util.InstanceBuilder;
+
+/** Pipeline options dedicated to detecting classpath resources. */
+public interface PipelineResourcesOptions extends PipelineOptions {
+
+  @Description(
+  "The class of the pipeline resources detector factory that should be 
created and used to create "
 
 Review comment:
   It's a minor hassle, but please duplicate the description as a Javadoc comment 
on the getter so that the Javadoc has this information as well.
 

This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 361112)
Time Spent: 15.5h  (was: 15h 20m)

> PipelineResources algorithm is not working in most environments
> ---
>
> Key: BEAM-5495
> URL: https://issues.apache.org/jira/browse/BEAM-5495
> Project: Beam
>  Issue Type: Bug
>  Components: runner-flink, runner-spark, sdk-java-core
>Reporter: Romain Manni-Bucau
>Assignee: Lukasz Gajowy
>Priority: Major
> Fix For: 2.19.0
>
>  Time Spent: 15.5h
>  Remaining Estimate: 0h
>
> Issues are:
> 1. it assumes the classloader is a URLClassLoader (not always true, and Java 
> >= 9 breaks that as well for the app loader)
> 2. it uses loader.getURLs(), which leads to including the JRE itself in the 
> staged files
> Looks like this resource detection algorithm can't work and should be replaced 
> by an SPI rather than a built-in, non-extensible algorithm. Another valid 
> alternative is to just drop that "guess" logic and force the user to set 
> staged files.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Work logged] (BEAM-5495) PipelineResources algorithm is not working in most environments

2019-12-17 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/BEAM-5495?focusedWorklogId=361113&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-361113
 ]

ASF GitHub Bot logged work on BEAM-5495:


Author: ASF GitHub Bot
Created on: 17/Dec/19 21:02
Start Date: 17/Dec/19 21:02
Worklog Time Spent: 10m 
  Work Description: lukecwik commented on pull request #10268: [BEAM-5495] 
PipelineResources algorithm is not working in most environments
URL: https://github.com/apache/beam/pull/10268#discussion_r359024833
 
 

 ##
 File path: 
runners/core-construction-java/src/main/java/org/apache/beam/runners/core/construction/resources/PipelineResources.java
 ##
 @@ -49,7 +50,19 @@
   ClassLoader classLoader, PipelineOptions options) {
 
     PipelineResourcesOptions artifactsRelatedOptions = options.as(PipelineResourcesOptions.class);
-    return artifactsRelatedOptions.getPipelineResourcesDetector().detect(classLoader);
+    return artifactsRelatedOptions
+        .getPipelineResourcesDetector()
+        .detect(classLoader)
+        .filter(isStageable())
+        .collect(Collectors.toList());
+  }
+
+  /**
+   * Returns a predicate for filtering all resources that are impossible to stage (like gradle
+   * wrapper jars).
+   */
+  private static Predicate<String> isStageable() {
+    return resourcePath -> !resourcePath.contains("gradle/wrapper");
 
 Review comment:
   Instead of blacklisting, why does gradle appear on the classpath in the 
first place?
 

This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 361113)
Time Spent: 15h 40m  (was: 15.5h)

> PipelineResources algorithm is not working in most environments
> ---
>
> Key: BEAM-5495
> URL: https://issues.apache.org/jira/browse/BEAM-5495
> Project: Beam
>  Issue Type: Bug
>  Components: runner-flink, runner-spark, sdk-java-core
>Reporter: Romain Manni-Bucau
>Assignee: Lukasz Gajowy
>Priority: Major
> Fix For: 2.19.0
>
>  Time Spent: 15h 40m
>  Remaining Estimate: 0h
>
> Issues are:
> 1. it assumes the classloader is a URLClassLoader (not always true, and Java 
> >= 9 breaks that as well for the application loader)
> 2. it uses loader.getURLs(), which leads to including the JRE itself in the 
> staged files
> It looks like this resource-detection algorithm can't work reliably and should 
> be replaced by an SPI rather than a built-in, non-extensible algorithm. Another 
> valid alternative is to simply drop that "guess" logic and force the user to 
> set the staged files explicitly.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Work logged] (BEAM-5495) PipelineResources algorithm is not working in most environments

2019-12-17 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/BEAM-5495?focusedWorklogId=361107=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-361107
 ]

ASF GitHub Bot logged work on BEAM-5495:


Author: ASF GitHub Bot
Created on: 17/Dec/19 21:02
Start Date: 17/Dec/19 21:02
Worklog Time Spent: 10m 
  Work Description: lukecwik commented on pull request #10268: [BEAM-5495] 
PipelineResources algorithm is not working in most environments
URL: https://github.com/apache/beam/pull/10268#discussion_r359018417
 
 

 ##
 File path: runners/core-construction-java/src/main/java/org/apache/beam/runners/core/construction/resources/PipelineResourcesOptions.java
 ##
 @@ -0,0 +1,77 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ *     http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+package org.apache.beam.runners.core.construction.resources;
+
+import com.fasterxml.jackson.annotation.JsonIgnore;
+import io.github.classgraph.ClassGraph;
+import org.apache.beam.sdk.options.Default;
+import org.apache.beam.sdk.options.DefaultValueFactory;
+import org.apache.beam.sdk.options.Description;
+import org.apache.beam.sdk.options.PipelineOptions;
+import org.apache.beam.sdk.util.InstanceBuilder;
+
+/** Pipeline options dedicated to detecting classpath resources. */
+public interface PipelineResourcesOptions extends PipelineOptions {
+
+  @Description(
+      "The class of the pipeline resources detector factory that should be created and used to create "
+          + "the detector. If not set explicitly, a default class will be used to instantiate the factory.")
+  @Default.Class(ClasspathScanningResourcesDetectorFactory.class)
+  Class<? extends PipelineResourcesDetectorAbstractFactory>
+      getPipelineResourcesDetectorFactoryClass();
+
+  void setPipelineResourcesDetectorFactoryClass(
+      Class<? extends PipelineResourcesDetectorAbstractFactory> factoryClass);
+
+  @JsonIgnore
+  @Description(
+      "Instance of a pipeline resources detection algorithm. If not set explicitly, a default implementation will be used")
+  @Default.InstanceFactory(PipelineResourcesDetectorFactory.class)
+  PipelineResourcesDetector getPipelineResourcesDetector();
+
+  void setPipelineResourcesDetector(PipelineResourcesDetector pipelineResourcesDetector);
+
+  class PipelineResourcesDetectorFactory implements DefaultValueFactory<PipelineResourcesDetector> {
+
+    @Override
+    public PipelineResourcesDetector create(PipelineOptions options) {
+      PipelineResourcesOptions resourcesOptions = options.as(PipelineResourcesOptions.class);
+
+      PipelineResourcesDetectorAbstractFactory resourcesToStage =
+          InstanceBuilder.ofType(PipelineResourcesDetectorAbstractFactory.class)
+              .fromClass(resourcesOptions.getPipelineResourcesDetectorFactoryClass())
+              .fromFactoryMethod("create")
+              .build();
+
+      return resourcesToStage.getPipelineResourcesDetector();
+    }
+  }
+
+  class ClasspathScanningResourcesDetectorFactory
 
 Review comment:
   class comment
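
For context, a rough usage sketch of the options interface reviewed in this hunk, assuming it lands as shown; the main method and the println are illustrative only:

{code:java}
import org.apache.beam.runners.core.construction.resources.PipelineResourcesOptions;
import org.apache.beam.sdk.options.PipelineOptionsFactory;

public class ResourcesOptionsUsageSketch {
  public static void main(String[] args) {
    // Parse standard pipeline options and view them as PipelineResourcesOptions.
    PipelineResourcesOptions options =
        PipelineOptionsFactory.fromArgs(args).as(PipelineResourcesOptions.class);

    // The @Default.InstanceFactory above kicks in here: PipelineResourcesDetectorFactory
    // builds the configured factory class (the classpath-scanning one by default)
    // and asks it for a detector instance.
    System.out.println(options.getPipelineResourcesDetector().getClass().getName());
  }
}
{code}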
 

This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 361107)
Time Spent: 14h 50m  (was: 14h 40m)

> PipelineResources algorithm is not working in most environments
> ---
>
> Key: BEAM-5495
> URL: https://issues.apache.org/jira/browse/BEAM-5495
> Project: Beam
>  Issue Type: Bug
>  Components: runner-flink, runner-spark, sdk-java-core
>Reporter: Romain Manni-Bucau
>Assignee: Lukasz Gajowy
>Priority: Major
> Fix For: 2.19.0
>
>  Time Spent: 14h 50m
>  Remaining Estimate: 0h
>
> Issues are:
> 1. it assumes the classloader is a URLClassLoader (not always true, and Java 
> >= 9 breaks that as well for the application loader)
> 2. it uses loader.getURLs(), which leads to including the JRE itself in the 
> staged files
> It looks like this resource-detection 

[jira] [Work logged] (BEAM-8335) Add streaming support to Interactive Beam

2019-12-17 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/BEAM-8335?focusedWorklogId=361104=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-361104
 ]

ASF GitHub Bot logged work on BEAM-8335:


Author: ASF GitHub Bot
Created on: 17/Dec/19 20:57
Start Date: 17/Dec/19 20:57
Worklog Time Spent: 10m 
  Work Description: KevinGG commented on pull request #10405: [BEAM-8335] 
Background caching job
URL: https://github.com/apache/beam/pull/10405
 
 
   1. Added background caching job startup/cancel logic.
   2. Updated tests.
   
   
   Thank you for your contribution! Follow this checklist to help us 
incorporate your contribution quickly and easily:
   
- [ ] [**Choose 
reviewer(s)**](https://beam.apache.org/contribute/#make-your-change) and 
mention them in a comment (`R: @username`).
- [ ] Format the pull request title like `[BEAM-XXX] Fixes bug in 
ApproximateQuantiles`, where you replace `BEAM-XXX` with the appropriate JIRA 
issue, if applicable. This will automatically link the pull request to the 
issue.
- [ ] If this contribution is large, please file an Apache [Individual 
Contributor License Agreement](https://www.apache.org/licenses/icla.pdf).
   
   See the [Contributor Guide](https://beam.apache.org/contribute) for more 
tips on [how to make review process 
smoother](https://beam.apache.org/contribute/#make-reviewers-job-easier).
   
   Post-Commit Tests Status (on master branch)
   

   
 

[jira] [Updated] (BEAM-8963) Apache Beam Java GCP dependencies to catch up with GCP libraries-bom

2019-12-17 Thread Tomo Suzuki (Jira)


 [ 
https://issues.apache.org/jira/browse/BEAM-8963?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Tomo Suzuki updated BEAM-8963:
--
Description: 
|| ||org.apache.beam:beam-sdks-java-io-google-cloud-platform:2.15.0||org.apache.beam:beam-sdks-java-io-google-cloud-platform:2.16.0||com.google.cloud:libraries-bom:1.0.0||com.google.cloud:libraries-bom:2.0.0||com.google.cloud:libraries-bom:2.5.0||com.google.cloud:libraries-bom:3.1.0||
|avalon-framework:avalon-framework|4.1.5|4.1.5| | | | |
|com.fasterxml.jackson.core:jackson-annotations|2.9.9|2.9.10| | | | |
|com.fasterxml.jackson.core:jackson-core|2.9.9|2.9.10| | | | |
|com.fasterxml.jackson.core:jackson-databind|2.9.9.3|2.9.10| | | | |
|com.github.jponge:lzma-java|1.3|1.3| | | | |
|com.google.android:android|1.5_r4|1.5_r4| | | | |
|com.google.api-client:google-api-client|1.27.0|1.27.0| | | | |
|com.google.api-client:google-api-client-jackson2|1.27.0|1.27.0| | | | |
|com.google.api-client:google-api-client-java6|1.27.0|1.27.0| | | | |
|com.google.api.grpc:grpc-google-cloud-asset-v1| | | |0.56.0|0.67.0|0.80.0|
|com.google.api.grpc:grpc-google-cloud-asset-v1beta1| | |0.49.0|0.56.0|0.67.0|0.80.0|
|com.google.api.grpc:grpc-google-cloud-asset-v1p2beta1| | | | | |0.80.0|
|com.google.api.grpc:grpc-google-cloud-automl-v1| | | | | |0.79.1|
|com.google.api.grpc:grpc-google-cloud-automl-v1beta1| | |0.49.0|0.56.0|0.67.0|0.79.1|
|com.google.api.grpc:grpc-google-cloud-bigquerydatatransfer-v1| | |0.49.0|0.56.0|0.67.0|0.84.0|
|com.google.api.grpc:grpc-google-cloud-bigquerystorage-v1beta1|0.44.0|0.44.0|0.49.0|0.56.0|0.67.0|0.84.0|
|com.google.api.grpc:grpc-google-cloud-bigtable-admin-v2|0.38.0|0.38.0|0.49.0|0.56.0|0.67.0|1.7.1|
|com.google.api.grpc:grpc-google-cloud-bigtable-v2|0.44.0|0.38.0|0.49.0|0.56.0|0.67.0|1.7.1|
|com.google.api.grpc:grpc-google-cloud-billingbudgets-v1beta1| | | | | |0.1.1|
|com.google.api.grpc:grpc-google-cloud-build-v1| | | | | |0.1.0|
|com.google.api.grpc:grpc-google-cloud-container-v1| | |0.49.0|0.56.0|0.67.0|0.83.0|
|com.google.api.grpc:grpc-google-cloud-containeranalysis-v1| | | | |0.67.0|0.83.0|
|com.google.api.grpc:grpc-google-cloud-containeranalysis-v1beta1| | |0.49.0|0.56.0|0.67.0|0.83.0|
|com.google.api.grpc:grpc-google-cloud-datacatalog-v1beta1| | | |0.4.0-alpha|0.15.0-alpha|0.29.0-alpha|
|com.google.api.grpc:grpc-google-cloud-datalabeling-v1beta1| | | |0.56.0|0.67.0|0.81.1|
|com.google.api.grpc:grpc-google-cloud-dataproc-v1| | |0.49.0|0.56.0|0.67.0|0.83.0|
|com.google.api.grpc:grpc-google-cloud-dataproc-v1beta2| | |0.49.0|0.56.0|0.67.0|0.83.0|
|com.google.api.grpc:grpc-google-cloud-dialogflow-v2| | |0.49.0|0.56.0|0.67.0|0.83.0|
|com.google.api.grpc:grpc-google-cloud-dialogflow-v2beta1| | |0.49.0|0.56.0|0.67.0|0.83.0|
|com.google.api.grpc:grpc-google-cloud-dlp-v2| | |0.49.0|0.56.0|0.67.0|0.81.0|
|com.google.api.grpc:grpc-google-cloud-error-reporting-v1beta1| | |0.49.0|0.56.0|0.67.0|0.84.1|
|com.google.api.grpc:grpc-google-cloud-firestore-admin-v1| | | |1.3.0|1.14.0|1.32.0|
|com.google.api.grpc:grpc-google-cloud-firestore-v1| | |0.49.0|1.3.0|1.14.0|1.32.0|
|com.google.api.grpc:grpc-google-cloud-firestore-v1beta1| | |0.49.0|0.56.0|0.67.0|0.85.0|
|com.google.api.grpc:grpc-google-cloud-gameservices-v1alpha| | | | |0.3.0|0.18.0|
|com.google.api.grpc:grpc-google-cloud-iamcredentials-v1| | |0.11.0-alpha|0.18.0-alpha|0.29.0-alpha|0.43.0-alpha|
|com.google.api.grpc:grpc-google-cloud-iot-v1| | |0.49.0|0.56.0|0.67.0|0.81.0|
|com.google.api.grpc:grpc-google-cloud-kms-v1| | |0.49.0|0.56.0|0.67.0|0.82.1|
|com.google.api.grpc:grpc-google-cloud-language-v1| | |1.48.0|1.55.0|1.66.0|1.81.0|
|com.google.api.grpc:grpc-google-cloud-language-v1beta2| | |0.49.0|0.56.0|0.67.0|0.82.0|
|com.google.api.grpc:grpc-google-cloud-logging-v2| | |0.49.0|0.56.0|0.67.0|0.82.0|
|com.google.api.grpc:grpc-google-cloud-monitoring-v3| | |1.48.0|1.55.0|1.66.0|1.81.0|
|com.google.api.grpc:grpc-google-cloud-os-login-v1| | |0.49.0|0.56.0|0.67.0|0.82.0|
|com.google.api.grpc:grpc-google-cloud-phishingprotection-v1beta1| | | |0.2.0|0.13.0|0.28.0|
|com.google.api.grpc:grpc-google-cloud-pubsub-v1|1.43.0|1.43.0|1.48.0|1.55.0|1.66.0|1.84.0|
|com.google.api.grpc:grpc-google-cloud-recaptchaenterprise-v1beta1| | | |0.2.0|0.13.0|0.28.0|
|com.google.api.grpc:grpc-google-cloud-recommender-v1beta1| | | | | |0.2.0|
|com.google.api.grpc:grpc-google-cloud-redis-v1| | |0.49.0|0.56.0|0.67.0|0.82.0|
|com.google.api.grpc:grpc-google-cloud-redis-v1beta1| | |0.49.0|0.56.0|0.67.0|0.82.0|
|com.google.api.grpc:grpc-google-cloud-scheduler-v1| | |0.49.0|0.56.0|1.7.0|1.22.0|
|com.google.api.grpc:grpc-google-cloud-scheduler-v1beta1| | |0.49.0|0.56.0|0.67.0|0.82.0|
|com.google.api.grpc:grpc-google-cloud-securitycenter-v1| | |0.49.0|0.56.0|0.67.0|0.82.0|
|com.google.api.grpc:grpc-google-cloud-securitycenter-v1beta1| | |0.49.0|0.56.0|0.67.0|0.82.0|

[jira] [Created] (BEAM-8987) Support reverse/descending order in SortValues

2019-12-17 Thread Neville Li (Jira)
Neville Li created BEAM-8987:


 Summary: Support reverse/descending order in SortValues
 Key: BEAM-8987
 URL: https://issues.apache.org/jira/browse/BEAM-8987
 Project: Beam
  Issue Type: New Feature
  Components: extensions-java-sorter
Affects Versions: 2.16.0
Reporter: Neville Li


It'll be nice to support descending order in {{SortValues}} but this is related 
to BEAM-8986.
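
Until native support lands, one possible workaround is a sketch only (not part of the Sorter extension, and it assumes fixed-width key encodings so prefix issues do not arise): flip every bit of the encoded secondary key, which reverses byte-wise lexicographic order:

{code:java}
import java.util.Arrays;

public class DescendingKeySketch {
  // Flipping each byte maps unsigned value b to 255 - b, so for equal-length
  // keys the lexicographic comparison result is reversed.
  static byte[] invert(byte[] encodedKey) {
    byte[] out = new byte[encodedKey.length];
    for (int i = 0; i < encodedKey.length; i++) {
      out[i] = (byte) ~encodedKey[i];
    }
    return out;
  }

  public static void main(String[] args) {
    byte[] a = {0x00, 0x01};
    byte[] b = {0x00, 0x02};
    // a < b in ascending byte order, but invert(a) > invert(b), i.e. descending.
    System.out.println(Arrays.compareUnsigned(invert(a), invert(b)) > 0); // true
  }
}
{code}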



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Updated] (BEAM-8986) SortValues may not work correctly for numerical types

2019-12-17 Thread Neville Li (Jira)


 [ 
https://issues.apache.org/jira/browse/BEAM-8986?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Neville Li updated BEAM-8986:
-
Description: {{SortValues}} transform uses lexicographical ordering on encoded 
binaries and may not work correctly if the encoding is inconsistent with native 
comparison, e.g. negative integers.  (was: `SortValues` transform uses 
lexicographical ordering on encoded binaries and may not work correctly if the 
encoding is inconsistent with native comparison, e.g. negative integers.)

> SortValues may not work correctly for numerical types
> ---
>
> Key: BEAM-8986
> URL: https://issues.apache.org/jira/browse/BEAM-8986
> Project: Beam
>  Issue Type: Bug
>  Components: extensions-java-sorter
>Affects Versions: 2.16.0
>Reporter: Neville Li
>Priority: Minor
>
> {{SortValues}} transform uses lexicographical ordering on encoded binaries and 
> may not work correctly if the encoding is inconsistent with native comparison, 
> e.g. negative integers.
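
To make the description concrete, a small standalone sketch (the encode helper is a hypothetical fixed-width big-endian encoding, similar in spirit to BigEndianIntegerCoder) showing where byte-wise lexicographic order and numeric order disagree:

{code:java}
import java.nio.ByteBuffer;
import java.util.Arrays;

public class NegativeIntOrderSketch {
  // Hypothetical fixed-width big-endian encoding of an int.
  static byte[] encode(int value) {
    return ByteBuffer.allocate(4).putInt(value).array();
  }

  public static void main(String[] args) {
    // -1 encodes as FF FF FF FF and 1 as 00 00 00 01, so under unsigned
    // lexicographic comparison -1 sorts *after* 1 even though -1 < 1.
    int cmp = Arrays.compareUnsigned(encode(-1), encode(1));
    System.out.println(cmp > 0); // true
  }
}
{code}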



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Created] (BEAM-8986) SortValues may not work correctly for numerical types

2019-12-17 Thread Neville Li (Jira)
Neville Li created BEAM-8986:


 Summary: SortValues may not work correctly for numerical types
 Key: BEAM-8986
 URL: https://issues.apache.org/jira/browse/BEAM-8986
 Project: Beam
  Issue Type: Bug
  Components: extensions-java-sorter
Affects Versions: 2.16.0
Reporter: Neville Li


`SortValues` transform uses lexicographical ordering on encoded binaries and may 
not work correctly if the encoding is inconsistent with native comparison, e.g. 
negative integers.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Created] (BEAM-8985) SortValues should fail if SecondaryKey coder is not deterministic

2019-12-17 Thread Neville Li (Jira)
Neville Li created BEAM-8985:


 Summary: SortValues should fail if SecondaryKey coder is not 
deterministic
 Key: BEAM-8985
 URL: https://issues.apache.org/jira/browse/BEAM-8985
 Project: Beam
  Issue Type: Bug
  Components: extensions-java-sorter
Affects Versions: 2.16.0
Reporter: Neville Li


{{SortValues}} transform uses lexicographical sorting on encoded binaries and 
might not work correctly if the coder is not deterministic.
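
A sketch of the kind of up-front check the summary asks for (illustrative only, not the Sorter's actual code), using the Coder.verifyDeterministic() hook that coders already expose:

{code:java}
import org.apache.beam.sdk.coders.Coder;
import org.apache.beam.sdk.coders.DoubleCoder;
import org.apache.beam.sdk.coders.StringUtf8Coder;

public class SecondaryKeyCoderCheckSketch {
  // Fail fast when the secondary-key coder cannot guarantee deterministic encodings.
  static void requireDeterministic(Coder<?> secondaryKeyCoder) {
    try {
      secondaryKeyCoder.verifyDeterministic();
    } catch (Coder.NonDeterministicException e) {
      throw new IllegalArgumentException(
          "SortValues requires a deterministic SecondaryKey coder", e);
    }
  }

  public static void main(String[] args) {
    requireDeterministic(StringUtf8Coder.of()); // passes
    requireDeterministic(DoubleCoder.of());     // throws: floating point encodings are non-deterministic
  }
}
{code}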



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Work logged] (BEAM-8977) apache_beam.runners.interactive.display.pcoll_visualization_test.PCollectionVisualizationTest.test_dynamic_plotting_update_same_display is flaky

2019-12-17 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/BEAM-8977?focusedWorklogId=361089=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-361089
 ]

ASF GitHub Bot logged work on BEAM-8977:


Author: ASF GitHub Bot
Created on: 17/Dec/19 19:54
Start Date: 17/Dec/19 19:54
Worklog Time Spent: 10m 
  Work Description: KevinGG commented on issue #10404: [BEAM-8977] Resolve 
test flakiness
URL: https://github.com/apache/beam/pull/10404#issuecomment-566723183
 
 
   R: @tvalentyn 
   PTAL, thanks!
 

This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 361089)
Time Spent: 20m  (was: 10m)

> apache_beam.runners.interactive.display.pcoll_visualization_test.PCollectionVisualizationTest.test_dynamic_plotting_update_same_display
>  is flaky
> 
>
> Key: BEAM-8977
> URL: https://issues.apache.org/jira/browse/BEAM-8977
> Project: Beam
>  Issue Type: Bug
>  Components: test-failures
>Reporter: Valentyn Tymofieiev
>Assignee: Ning Kang
>Priority: Major
>  Time Spent: 20m
>  Remaining Estimate: 0h
>
> Sample failure: 
>  
> [https://builds.apache.org/job/beam_PreCommit_Python_Phrase/1273/testReport/apache_beam.runners.interactive.display.pcoll_visualization_test/PCollectionVisualizationTest/test_dynamic_plotting_update_same_display/]
> Error Message
>  IndexError: list index out of range
> Stacktrace
>  self = <apache_beam.runners.interactive.display.pcoll_visualization_test.PCollectionVisualizationTest 
>  testMethod=test_dynamic_plotting_update_same_display>
>  mocked_display_facets = <MagicMock name='display_facets' id='139889868386376'>
> @patch('apache_beam.runners.interactive.display.pcoll_visualization'
>  '.PCollectionVisualization.display_facets')
>  def test_dynamic_plotting_update_same_display(self,
>  mocked_display_facets):
>  fake_pipeline_result = runner.PipelineResult(runner.PipelineState.RUNNING)
>  ie.current_env().set_pipeline_result(self._p, fake_pipeline_result)
>  # Starts async dynamic plotting that never ends in this test.
>  h = pv.visualize(self._pcoll, dynamic_plotting_interval=0.001)
>  # Blocking so the above async task can execute some iterations.
>  time.sleep(1)
>  # The first iteration doesn't provide updating_pv to display_facets.
>  _, first_kwargs = mocked_display_facets.call_args_list[0]
>  self.assertEqual(first_kwargs, {})
>  # The following iterations use the same updating_pv to display_facets and so
>  # on.
>  > _, second_kwargs = mocked_display_facets.call_args_list[1]
>  E IndexError: list index out of range
> apache_beam/runners/interactive/display/pcoll_visualization_test.py:105: 
> IndexError
> Standard Output
> 
>  Standard Error
>  WARNING:apache_beam.runners.interactive.interactive_environment:You cannot 
> use Interactive Beam features when you are not in an interactive environment 
> such as a Jupyter notebook or ipython terminal.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Work logged] (BEAM-8977) apache_beam.runners.interactive.display.pcoll_visualization_test.PCollectionVisualizationTest.test_dynamic_plotting_update_same_display is flaky

2019-12-17 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/BEAM-8977?focusedWorklogId=361088=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-361088
 ]

ASF GitHub Bot logged work on BEAM-8977:


Author: ASF GitHub Bot
Created on: 17/Dec/19 19:52
Start Date: 17/Dec/19 19:52
Worklog Time Spent: 10m 
  Work Description: KevinGG commented on pull request #10404: [BEAM-8977] 
Resolve test flakiness
URL: https://github.com/apache/beam/pull/10404
 
 
   1. Removed test logic depending on execution of asynchronous tasks since
   there is no control of them in a testing environment.
   2. Replaced the dynamic plotting tests with tests directly/indirectly
   invoking underlying logic of the asynchronous task.
   
   
   
   Thank you for your contribution! Follow this checklist to help us 
incorporate your contribution quickly and easily:
   
- [ ] [**Choose 
reviewer(s)**](https://beam.apache.org/contribute/#make-your-change) and 
mention them in a comment (`R: @username`).
- [ ] Format the pull request title like `[BEAM-XXX] Fixes bug in 
ApproximateQuantiles`, where you replace `BEAM-XXX` with the appropriate JIRA 
issue, if applicable. This will automatically link the pull request to the 
issue.
- [ ] If this contribution is large, please file an Apache [Individual 
Contributor License Agreement](https://www.apache.org/licenses/icla.pdf).
   
   See the [Contributor Guide](https://beam.apache.org/contribute) for more 
tips on [how to make review process 
smoother](https://beam.apache.org/contribute/#make-reviewers-job-easier).
   
   Post-Commit Tests Status (on master branch)
   

   

[jira] [Work logged] (BEAM-8561) Add ThriftIO to Support IO for Thrift Files

2019-12-17 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/BEAM-8561?focusedWorklogId=361081=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-361081
 ]

ASF GitHub Bot logged work on BEAM-8561:


Author: ASF GitHub Bot
Created on: 17/Dec/19 19:39
Start Date: 17/Dec/19 19:39
Worklog Time Spent: 10m 
  Work Description: gsteelman commented on issue #10290: [BEAM-8561] Add 
ThriftIO to support IO for Thrift files
URL: https://github.com/apache/beam/pull/10290#issuecomment-566717207
 
 
   > Thanks @gsteelman. We would love to have the sample Thrift schema and 
anything else we can utilize for testing!
   
   Who should I have it shared with? Also, https://thrift.apache.org/test/ and 
this schema 
https://raw.githubusercontent.com/apache/thrift/master/test/ThriftTest.thrift 
which might provide additional test cases. 
 

This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 361081)
Time Spent: 6h 50m  (was: 6h 40m)

> Add ThriftIO to Support IO for Thrift Files
> ---
>
> Key: BEAM-8561
> URL: https://issues.apache.org/jira/browse/BEAM-8561
> Project: Beam
>  Issue Type: New Feature
>  Components: io-java-files
>Reporter: Chris Larsen
>Assignee: Chris Larsen
>Priority: Major
>  Time Spent: 6h 50m
>  Remaining Estimate: 0h
>
> Similar to AvroIO it would be very useful to support reading and writing 
> to/from Thrift files with a native connector. 
> Functionality would include:
>  # read() - Reading from one or more Thrift files.
>  # write() - Writing to one or more Thrift files.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Updated] (BEAM-8984) Spark uber jar job server: e2e test

2019-12-17 Thread Kyle Weaver (Jira)


 [ 
https://issues.apache.org/jira/browse/BEAM-8984?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Kyle Weaver updated BEAM-8984:
--
Summary: Spark uber jar job server: e2e test  (was: Spark uber jar job 
server: integration test)

> Spark uber jar job server: e2e test
> ---
>
> Key: BEAM-8984
> URL: https://issues.apache.org/jira/browse/BEAM-8984
> Project: Beam
>  Issue Type: Improvement
>  Components: runner-spark
>Reporter: Kyle Weaver
>Assignee: Kyle Weaver
>Priority: Major
>  Labels: portability-spark
>
> There should be an integration test that tests submitting a simple pipeline 
> (with artifacts) to a cluster via the Spark uber jar job server.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Updated] (BEAM-8984) Spark uber jar job server: e2e test

2019-12-17 Thread Kyle Weaver (Jira)


 [ 
https://issues.apache.org/jira/browse/BEAM-8984?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Kyle Weaver updated BEAM-8984:
--
Description: There should be an e2e test that tests submitting a simple 
pipeline (with artifacts) to a cluster via the Spark uber jar job server.  
(was: There should be an integration test that tests submitting a simple 
pipeline (with artifacts) to a cluster via the Spark uber jar job server.)

> Spark uber jar job server: e2e test
> ---
>
> Key: BEAM-8984
> URL: https://issues.apache.org/jira/browse/BEAM-8984
> Project: Beam
>  Issue Type: Improvement
>  Components: runner-spark
>Reporter: Kyle Weaver
>Assignee: Kyle Weaver
>Priority: Major
>  Labels: portability-spark
>
> There should be an e2e test that tests submitting a simple pipeline (with 
> artifacts) to a cluster via the Spark uber jar job server.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Created] (BEAM-8984) Spark uber jar job server: integration test

2019-12-17 Thread Kyle Weaver (Jira)
Kyle Weaver created BEAM-8984:
-

 Summary: Spark uber jar job server: integration test
 Key: BEAM-8984
 URL: https://issues.apache.org/jira/browse/BEAM-8984
 Project: Beam
  Issue Type: Improvement
  Components: runner-spark
Reporter: Kyle Weaver
Assignee: Kyle Weaver


There should be an integration test that tests submitting a simple pipeline 
(with artifacts) to a cluster via the Spark uber jar job server.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Updated] (BEAM-8984) Spark uber jar job server: integration test

2019-12-17 Thread Kyle Weaver (Jira)


 [ 
https://issues.apache.org/jira/browse/BEAM-8984?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Kyle Weaver updated BEAM-8984:
--
Status: Open  (was: Triage Needed)

> Spark uber jar job server: integration test
> ---
>
> Key: BEAM-8984
> URL: https://issues.apache.org/jira/browse/BEAM-8984
> Project: Beam
>  Issue Type: Improvement
>  Components: runner-spark
>Reporter: Kyle Weaver
>Assignee: Kyle Weaver
>Priority: Major
>  Labels: portability-spark
>
> There should be an integration test that tests submitting a simple pipeline 
> (with artifacts) to a cluster via the Spark uber jar job server.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Updated] (BEAM-8983) Spark uber jar job server: query exceptions from master

2019-12-17 Thread Kyle Weaver (Jira)


 [ 
https://issues.apache.org/jira/browse/BEAM-8983?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Kyle Weaver updated BEAM-8983:
--
Status: Open  (was: Triage Needed)

> Spark uber jar job server: query exceptions from master
> ---
>
> Key: BEAM-8983
> URL: https://issues.apache.org/jira/browse/BEAM-8983
> Project: Beam
>  Issue Type: Improvement
>  Components: runner-spark
>Reporter: Kyle Weaver
>Assignee: Kyle Weaver
>Priority: Major
>  Labels: portability-spark
>
> As far as I know, the Spark REST API does not return exceptions from the 
> cluster after a jar is actually run. While these exceptions can be viewed in 
> Spark's web UI, ideally they would also be visible in Beam's output.
> To do this, we will need to find a REST endpoint that does return those 
> exceptions, and then map the submissionId to its corresponding job id.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Created] (BEAM-8983) Spark uber jar job server: query exceptions from master

2019-12-17 Thread Kyle Weaver (Jira)
Kyle Weaver created BEAM-8983:
-

 Summary: Spark uber jar job server: query exceptions from master
 Key: BEAM-8983
 URL: https://issues.apache.org/jira/browse/BEAM-8983
 Project: Beam
  Issue Type: Improvement
  Components: runner-spark
Reporter: Kyle Weaver
Assignee: Kyle Weaver


As far as I know, the Spark REST API does not return exceptions from the 
cluster after a jar is actually run. While these exceptions can be viewed in 
Spark's web UI, ideally they would also be visible in Beam's output.

To do this, we will need to find a REST endpoint that does return those 
exceptions, and then map the submissionId to its corresponding job id.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Work logged] (BEAM-8671) Migrate Python version to 3.7

2019-12-17 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/BEAM-8671?focusedWorklogId=361076=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-361076
 ]

ASF GitHub Bot logged work on BEAM-8671:


Author: ASF GitHub Bot
Created on: 17/Dec/19 19:29
Start Date: 17/Dec/19 19:29
Worklog Time Spent: 10m 
  Work Description: tvalentyn commented on issue #10125: [BEAM-8671] Added 
ParDo test running on Python 3.7
URL: https://github.com/apache/beam/pull/10125#issuecomment-566713165
 
 
   > The test fails due to https://issues.apache.org/jira/browse/BEAM-8979. 
I'll push forward with the PR as soon as the bug is resolved.
   
   Is the Job id included in the logs now? Thank you.
 

This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 361076)
Time Spent: 11h 20m  (was: 11h 10m)

> Migrate Python version to 3.7
> -
>
> Key: BEAM-8671
> URL: https://issues.apache.org/jira/browse/BEAM-8671
> Project: Beam
>  Issue Type: Sub-task
>  Components: testing
>Reporter: Kamil Wasilewski
>Assignee: Kamil Wasilewski
>Priority: Major
>  Time Spent: 11h 20m
>  Remaining Estimate: 0h
>
> Currently, load tests run on Python 2.7. We should migrate to 3.7



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Work logged] (BEAM-8933) BigQuery IO should support read/write in Arrow format

2019-12-17 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/BEAM-8933?focusedWorklogId=361075=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-361075
 ]

ASF GitHub Bot logged work on BEAM-8933:


Author: ASF GitHub Bot
Created on: 17/Dec/19 19:21
Start Date: 17/Dec/19 19:21
Worklog Time Spent: 10m 
  Work Description: TheNeuralBit commented on issue #10384: [WIP] 
[BEAM-8933] Utilities for converting Arrow schemas and reading Arrow batches as 
Rows
URL: https://github.com/apache/beam/pull/10384#issuecomment-566709615
 
 
   > I suppose we would need to have a separate BQ IO with Arrow support in the 
extension package then?
   
   Oh, a better option would be for `:sdks:java:io:google-cloud-platform` to 
just include a dependency on `:sdks:java:extensions:arrow`; that doesn't seem 
too onerous.
 

This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 361075)
Time Spent: 3h 20m  (was: 3h 10m)

> BigQuery IO should support read/write in Arrow format
> -
>
> Key: BEAM-8933
> URL: https://issues.apache.org/jira/browse/BEAM-8933
> Project: Beam
>  Issue Type: Improvement
>  Components: io-java-gcp
>Reporter: Kirill Kozlov
>Assignee: Kirill Kozlov
>Priority: Major
>  Time Spent: 3h 20m
>  Remaining Estimate: 0h
>
> As of right now BigQuery uses Avro format for reading and writing.
> We should add a config to BigQueryIO to specify which format to use: Arrow or 
> Avro (with Avro as default).



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Work logged] (BEAM-8810) Dataflow runner - Work stuck in state COMMITTING with streaming commit rpcs

2019-12-17 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/BEAM-8810?focusedWorklogId=361067=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-361067
 ]

ASF GitHub Bot logged work on BEAM-8810:


Author: ASF GitHub Bot
Created on: 17/Dec/19 19:06
Start Date: 17/Dec/19 19:06
Worklog Time Spent: 10m 
  Work Description: reuvenlax commented on pull request #10311: [BEAM-8810] 
Detect stuck commits in StreamingDataflowWorker
URL: https://github.com/apache/beam/pull/10311
 
 
   
 

This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 361067)
Time Spent: 2h  (was: 1h 50m)

> Dataflow runner - Work stuck in state COMMITTING with streaming commit rpcs
> ---
>
> Key: BEAM-8810
> URL: https://issues.apache.org/jira/browse/BEAM-8810
> Project: Beam
>  Issue Type: Bug
>  Components: runner-dataflow
>Reporter: Sam Whittle
>Assignee: Sam Whittle
>Priority: Minor
>  Time Spent: 2h
>  Remaining Estimate: 0h
>
> In several pipelines using streaming engine and thus the streaming commit 
> rpcs, work became stuck in state COMMITTING indefinitely.  Such stuckness 
> coincided with repeated streaming rpc failures.
> The status page shows that the key has work in state COMMITTING, and has 1 
> queued work item.
> There is a single active commit stream, with 0 pending requests.
> The stream could exist past the stream deadline because the StreamCache only 
> closes a stream due to the deadline when a stream is retrieved, which only 
> happens if there are other commits. Since the pipeline is stuck due to this 
> event, there are no other commits.
> It therefore seems there is either a race on the commitStream between 
> onNewStream and commitWork that prevents work from being retried, an exception 
> triggered between when the pending request is removed and when the callback is 
> called, or some corruption of the activeWork data structure. 



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Work logged] (BEAM-8944) Python SDK harness performance degradation with UnboundedThreadPoolExecutor

2019-12-17 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/BEAM-8944?focusedWorklogId=361066=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-361066
 ]

ASF GitHub Bot logged work on BEAM-8944:


Author: ASF GitHub Bot
Created on: 17/Dec/19 19:03
Start Date: 17/Dec/19 19:03
Worklog Time Spent: 10m 
  Work Description: y1chi commented on issue #10387: [BEAM-8944] Change to 
use single thread in py sdk bundle progress report
URL: https://github.com/apache/beam/pull/10387#issuecomment-566321500
 
 
   Run Python PreCommit
 

This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 361066)
Time Spent: 2h 20m  (was: 2h 10m)

> Python SDK harness performance degradation with UnboundedThreadPoolExecutor
> ---
>
> Key: BEAM-8944
> URL: https://issues.apache.org/jira/browse/BEAM-8944
> Project: Beam
>  Issue Type: Bug
>  Components: sdk-py-harness
>Affects Versions: 2.18.0
>Reporter: Yichi Zhang
>Assignee: Yichi Zhang
>Priority: Blocker
> Attachments: profiling.png, profiling_one_thread.png, 
> profiling_twelve_threads.png
>
>  Time Spent: 2h 20m
>  Remaining Estimate: 0h
>
> We are seeing a performance degradation for python streaming word count load 
> tests.
>  
> After some investigation, it appears to be caused by swapping the original 
> ThreadPoolExecutor to UnboundedThreadPoolExecutor in sdk worker. Suspicion is 
> that python performance is worse with more threads on cpu-bounded tasks.
>  
> A simple test for comparing the multiple thread pool executor performance:
>  
> {code:python}
> def test_performance(self):
>   def run_perf(executor):
>     total_number = 100
>     q = queue.Queue()
>     def task(number):
>       hash(number)
>       q.put(number + 200)
>       return number
>     t = time.time()
>     count = 0
>     for i in range(200):
>       q.put(i)
>     while count < total_number:
>       executor.submit(task, q.get(block=True))
>       count += 1
>     print('%s uses %s' % (executor, time.time() - t))
>   with UnboundedThreadPoolExecutor() as executor:
>     run_perf(executor)
>   with futures.ThreadPoolExecutor(max_workers=1) as executor:
>     run_perf(executor)
>   with futures.ThreadPoolExecutor(max_workers=12) as executor:
>     run_perf(executor)
> {code}
> Results:
>  UnboundedThreadPoolExecutor (object at 0x7fab400dbe50) uses 268.160675049
>  ThreadPoolExecutor(max_workers=1) uses 79.904583931
>  ThreadPoolExecutor(max_workers=12) uses 191.179054976
> Profiling:
> UnboundedThreadPoolExecutor:
>  !profiling.png! 
> 1 Thread ThreadPoolExecutor:
>  !profiling_one_thread.png! 
> 12 Threads ThreadPoolExecutor:
>  !profiling_twelve_threads.png! 



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Work logged] (BEAM-8944) Python SDK harness performance degradation with UnboundedThreadPoolExecutor

2019-12-17 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/BEAM-8944?focusedWorklogId=361065=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-361065
 ]

ASF GitHub Bot logged work on BEAM-8944:


Author: ASF GitHub Bot
Created on: 17/Dec/19 19:03
Start Date: 17/Dec/19 19:03
Worklog Time Spent: 10m 
  Work Description: y1chi commented on issue #10387: [BEAM-8944] Change to 
use single thread in py sdk bundle progress report
URL: https://github.com/apache/beam/pull/10387#issuecomment-566702609
 
 
   Run Python PreCommit
 

This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 361065)
Time Spent: 2h 10m  (was: 2h)

> Python SDK harness performance degradation with UnboundedThreadPoolExecutor
> ---
>
> Key: BEAM-8944
> URL: https://issues.apache.org/jira/browse/BEAM-8944
> Project: Beam
>  Issue Type: Bug
>  Components: sdk-py-harness
>Affects Versions: 2.18.0
>Reporter: Yichi Zhang
>Assignee: Yichi Zhang
>Priority: Blocker
> Attachments: profiling.png, profiling_one_thread.png, 
> profiling_twelve_threads.png
>
>  Time Spent: 2h 10m
>  Remaining Estimate: 0h
>
> We are seeing a performance degradation for python streaming word count load 
> tests.
>  
> After some investigation, it appears to be caused by swapping the original 
> ThreadPoolExecutor to UnboundedThreadPoolExecutor in sdk worker. Suspicion is 
> that python performance is worse with more threads on cpu-bounded tasks.
>  
> A simple test for comparing the multiple thread pool executor performance:
>  
> {code:python}
> def test_performance(self):
>   def run_perf(executor):
>     total_number = 100
>     q = queue.Queue()
>     def task(number):
>       hash(number)
>       q.put(number + 200)
>       return number
>     t = time.time()
>     count = 0
>     for i in range(200):
>       q.put(i)
>     while count < total_number:
>       executor.submit(task, q.get(block=True))
>       count += 1
>     print('%s uses %s' % (executor, time.time() - t))
>   with UnboundedThreadPoolExecutor() as executor:
>     run_perf(executor)
>   with futures.ThreadPoolExecutor(max_workers=1) as executor:
>     run_perf(executor)
>   with futures.ThreadPoolExecutor(max_workers=12) as executor:
>     run_perf(executor)
> {code}
> Results:
>  UnboundedThreadPoolExecutor (object at 0x7fab400dbe50) uses 268.160675049
>  ThreadPoolExecutor(max_workers=1) uses 79.904583931
>  ThreadPoolExecutor(max_workers=12) uses 191.179054976
> Profiling:
> UnboundedThreadPoolExecutor:
>  !profiling.png! 
> 1 Thread ThreadPoolExecutor:
>  !profiling_one_thread.png! 
> 12 Threads ThreadPoolExecutor:
>  !profiling_twelve_threads.png! 



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Commented] (BEAM-8980) Running GroupByKeyLoadTest on Portable Flink fails

2019-12-17 Thread Ankur Goenka (Jira)


[ 
https://issues.apache.org/jira/browse/BEAM-8980?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16998487#comment-16998487
 ] 

Ankur Goenka commented on BEAM-8980:


The stack trace is generally not the root cause, as it is printed on grpc 
termination.
Please link the Jenkins/Gradle build for the job so we have the complete logs.

> Running GroupByKeyLoadTest on Portable Flink fails
> --
>
> Key: BEAM-8980
> URL: https://issues.apache.org/jira/browse/BEAM-8980
> Project: Beam
>  Issue Type: Bug
>  Components: testing
>Reporter: Michal Walenia
>Priority: Major
>
> When running a GBK Load test using Java harness image and JobServer image 
> generated from master, the load test fails with a cryptic exception:
> {code:java}
> Exception in thread "main" java.lang.RuntimeException: Invalid job state: 
> FAILED.
> 11:45:31  at 
> org.apache.beam.sdk.loadtests.JobFailure.handleFailure(JobFailure.java:55)
> 11:45:31  at org.apache.beam.sdk.loadtests.LoadTest.run(LoadTest.java:106)
> 11:45:31  at 
> org.apache.beam.sdk.loadtests.CombineLoadTest.run(CombineLoadTest.java:66)
> 11:45:31  at 
> org.apache.beam.sdk.loadtests.CombineLoadTest.main(CombineLoadTest.java:169)
> {code}
>  
> After some investigation, I found a stacktrace of the error:
> {code:java}
> org.apache.beam.vendor.grpc.v1p21p0.io.grpc.StatusRuntimeException: 
> CANCELLED: call already 
> cancelledorg.apache.beam.vendor.grpc.v1p21p0.io.grpc.StatusRuntimeException: 
> CANCELLED: call already cancelled at 
> org.apache.beam.vendor.grpc.v1p21p0.io.grpc.Status.asRuntimeException(Status.java:524)
>  at 
> org.apache.beam.vendor.grpc.v1p21p0.io.grpc.stub.ServerCalls$ServerCallStreamObserverImpl.onNext(ServerCalls.java:339)
>  at 
> org.apache.beam.sdk.fn.stream.DirectStreamObserver.onNext(DirectStreamObserver.java:98)
>  at 
> org.apache.beam.sdk.fn.data.BeamFnDataSizeBasedBufferingOutboundObserver.flush(BeamFnDataSizeBasedBufferingOutboundObserver.java:90)
>  at 
> org.apache.beam.sdk.fn.data.BeamFnDataSizeBasedBufferingOutboundObserver.accept(BeamFnDataSizeBasedBufferingOutboundObserver.java:102)
>  at 
> org.apache.beam.runners.flink.translation.functions.FlinkExecutableStageFunction.processElements(FlinkExecutableStageFunction.java:278)
>  at 
> org.apache.beam.runners.flink.translation.functions.FlinkExecutableStageFunction.mapPartition(FlinkExecutableStageFunction.java:201)
>  at 
> org.apache.flink.runtime.operators.MapPartitionDriver.run(MapPartitionDriver.java:103)
>  at org.apache.flink.runtime.operators.BatchTask.run(BatchTask.java:504) at 
> org.apache.flink.runtime.operators.BatchTask.invoke(BatchTask.java:369) at 
> org.apache.flink.runtime.taskmanager.Task.doRun(Task.java:705) at 
> org.apache.flink.runtime.taskmanager.Task.run(Task.java:530) at 
> java.lang.Thread.run(Thread.java:748) Suppressed: 
> org.apache.beam.vendor.grpc.v1p21p0.io.grpc.StatusRuntimeException: 
> CANCELLED: call already cancelled at 
> org.apache.beam.vendor.grpc.v1p21p0.io.grpc.Status.asRuntimeException(Status.java:524)
>  at 
> org.apache.beam.vendor.grpc.v1p21p0.io.grpc.stub.ServerCalls$ServerCallStreamObserverImpl.onNext(ServerCalls.java:339)
>  at 
> org.apache.beam.sdk.fn.stream.DirectStreamObserver.onNext(DirectStreamObserver.java:98)
>  at 
> org.apache.beam.sdk.fn.data.BeamFnDataSizeBasedBufferingOutboundObserver.close(BeamFnDataSizeBasedBufferingOutboundObserver.java:84)
>  at 
> org.apache.beam.runners.fnexecution.control.SdkHarnessClient$ActiveBundle.close(SdkHarnessClient.java:298)
>  at 
> org.apache.beam.runners.flink.translation.functions.FlinkExecutableStageFunction.$closeResource(FlinkExecutableStageFunction.java:202)
>  at 
> org.apache.beam.runners.flink.translation.functions.FlinkExecutableStageFunction.mapPartition(FlinkExecutableStageFunction.java:202)
>  ... 6 more Suppressed: java.lang.IllegalStateException: Processing bundle 
> failed, TODO: [BEAM-3962] abort bundle. at 
> org.apache.beam.runners.fnexecution.control.SdkHarnessClient$ActiveBundle.close(SdkHarnessClient.java:320)
>  ... 8 more
> {code}
> It seems that the core issue is an IllegalStateException thrown from 
> SdkHarnessClient.java:320, related to BEAM-3962.
>  



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

