[jira] [Work logged] (BEAM-7303) Move PortableRunner and other classes out of the reference runner package

2019-11-04 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/BEAM-7303?focusedWorklogId=338579&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-338579
 ]

ASF GitHub Bot logged work on BEAM-7303:


Author: ASF GitHub Bot
Created on: 05/Nov/19 07:36
Start Date: 05/Nov/19 07:36
Worklog Time Spent: 10m 
  Work Description: mwalenia commented on issue #9936: [BEAM-7303] Move 
PortableRunner from runners.reference to java.sdks.portability package
URL: https://github.com/apache/beam/pull/9936#issuecomment-549699876
 
 
   Run Java_Examples_Dataflow PreCommit
 

This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 338579)
Time Spent: 1h 50m  (was: 1h 40m)

> Move PortableRunner and other classes out of the reference runner package
> ---
>
> Key: BEAM-7303
> URL: https://issues.apache.org/jira/browse/BEAM-7303
> Project: Beam
>  Issue Type: Bug
>  Components: sdk-java-core
>Reporter: Ankur Goenka
>Assignee: Ankur Goenka
>Priority: Major
>  Time Spent: 1h 50m
>  Remaining Estimate: 0h
>
> PortableRunner is used by all portable runners (Flink, Spark, etc.).
> We should move it out of the Reference Runner package to streamline the 
> dependencies.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Work logged] (BEAM-7951) Allow runner to configure a customized WindowedValue coder such as ValueOnlyWindowedValueCoder

2019-11-04 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/BEAM-7951?focusedWorklogId=338574&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-338574
 ]

ASF GitHub Bot logged work on BEAM-7951:


Author: ASF GitHub Bot
Created on: 05/Nov/19 07:24
Start Date: 05/Nov/19 07:24
Worklog Time Spent: 10m 
  Work Description: sunjincheng121 commented on issue #9979: [BEAM-7951] 
Allow runner to configure a customized WindowedValue coder.
URL: https://github.com/apache/beam/pull/9979#issuecomment-549696672
 
 
   R: @lukecwik @mxm 
 



Issue Time Tracking
---

Worklog Id: (was: 338574)
Time Spent: 40m  (was: 0.5h)

> Allow runner to configure a customized WindowedValue coder such as 
> ValueOnlyWindowedValueCoder
> ---
>
> Key: BEAM-7951
> URL: https://issues.apache.org/jira/browse/BEAM-7951
> Project: Beam
>  Issue Type: Sub-task
>  Components: java-fn-execution
>Reporter: sunjincheng
>Assignee: sunjincheng
>Priority: Major
>  Time Spent: 40m
>  Remaining Estimate: 0h
>
> The coder of WindowedValue cannot be configured and it’s always 
> FullWindowedValueCoder. We don't need to serialize the timestamp, window and 
> pane properties in Flink, so it would be better to make the coder 
> configurable (i.e. to allow using ValueOnlyWindowedValueCoder).
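The trade-off described above can be sketched outside of Beam. This is an illustration only, not the Beam coder API: a full coder serializes the value together with its timestamp, window, and pane, while a value-only coder ships just the value and restores fixed defaults on decode (the default values below are made up for the sketch).

```python
import pickle

# Fixed defaults that a value-only decoder would restore (illustrative
# values, not Beam's actual constants).
GLOBAL_WINDOW = "GlobalWindow"
MIN_TIMESTAMP = -9223372036854775808

def full_encode(value, timestamp, window, pane):
    # FullWindowedValueCoder-style: all metadata travels with every element.
    return pickle.dumps((value, timestamp, window, pane))

def value_only_encode(value):
    # ValueOnlyWindowedValueCoder-style: only the value is serialized.
    return pickle.dumps(value)

def value_only_decode(data):
    # The runner reattaches fixed metadata on the other side.
    return (pickle.loads(data), MIN_TIMESTAMP, GLOBAL_WINDOW, "NO_FIRING")

full = full_encode([1, 2, 3], 1572912000000, GLOBAL_WINDOW, "ON_TIME")
slim = value_only_encode([1, 2, 3])
assert len(slim) < len(full)                    # fewer bytes per element
assert value_only_decode(slim)[0] == [1, 2, 3]  # value round-trips intact
```

When the runner already knows every element lives in the global window with a default pane, skipping the metadata saves bytes on every serialized element.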





[jira] [Work logged] (BEAM-8504) BigQueryIO DIRECT_READ is broken

2019-11-04 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/BEAM-8504?focusedWorklogId=338569&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-338569
 ]

ASF GitHub Bot logged work on BEAM-8504:


Author: ASF GitHub Bot
Created on: 05/Nov/19 07:21
Start Date: 05/Nov/19 07:21
Worklog Time Spent: 10m 
  Work Description: chamikaramj commented on pull request #9987: 
[BEAM-8504] Fix a bug related to zero-row responses
URL: https://github.com/apache/beam/pull/9987
 
 
   
 



Issue Time Tracking
---

Worklog Id: (was: 338569)
Time Spent: 1.5h  (was: 1h 20m)

> BigQueryIO DIRECT_READ is broken
> 
>
> Key: BEAM-8504
> URL: https://issues.apache.org/jira/browse/BEAM-8504
> Project: Beam
>  Issue Type: Bug
>  Components: io-java-gcp
>Affects Versions: 2.16.0, 2.17.0
>Reporter: Gleb Kanterov
>Assignee: Aryan Naraghi
>Priority: Major
> Fix For: 2.17.0
>
>  Time Spent: 1.5h
>  Remaining Estimate: 0h
>
> The issue is reproducible with 2.16.0, 2.17.0 candidate and 2.18.0-SNAPSHOT 
> (as of d96c6b21a8a95b01944016584bc8e4ad1ab5f6a6), and not reproducible with 
> 2.15.0.
> {code}
> java.io.IOException: Failed to start reading from source: name: 
> "projects//locations/eu/streams/"
>   at 
> org.apache.beam.runners.dataflow.worker.WorkerCustomSources$BoundedReaderIterator.start(WorkerCustomSources.java:604)
>   at 
> org.apache.beam.runners.dataflow.worker.util.common.worker.ReadOperation$SynchronizedReaderIterator.start(ReadOperation.java:361)
>   at 
> org.apache.beam.runners.dataflow.worker.util.common.worker.ReadOperation.runReadLoop(ReadOperation.java:194)
>   at 
> org.apache.beam.runners.dataflow.worker.util.common.worker.ReadOperation.start(ReadOperation.java:159)
>   at 
> org.apache.beam.runners.dataflow.worker.util.common.worker.MapTaskExecutor.execute(MapTaskExecutor.java:77)
>   at 
> org.apache.beam.runners.dataflow.worker.BatchDataflowWorker.executeWork(BatchDataflowWorker.java:411)
>   at 
> org.apache.beam.runners.dataflow.worker.BatchDataflowWorker.doWork(BatchDataflowWorker.java:380)
>   at 
> org.apache.beam.runners.dataflow.worker.BatchDataflowWorker.getAndPerformWork(BatchDataflowWorker.java:305)
>   at 
> org.apache.beam.runners.dataflow.worker.DataflowBatchWorkerHarness$WorkerThread.doWork(DataflowBatchWorkerHarness.java:140)
>   at 
> org.apache.beam.runners.dataflow.worker.DataflowBatchWorkerHarness$WorkerThread.call(DataflowBatchWorkerHarness.java:120)
>   at 
> org.apache.beam.runners.dataflow.worker.DataflowBatchWorkerHarness$WorkerThread.call(DataflowBatchWorkerHarness.java:107)
>   at java.util.concurrent.FutureTask.run(FutureTask.java:266)
>   at 
> java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
>   at 
> java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
>   at java.lang.Thread.run(Thread.java:745)
> Caused by: java.lang.IllegalArgumentException: Fraction consumed from 
> previous response (0.0) is not less than fraction consumed from current 
> response (0.0).
>   at 
> org.apache.beam.vendor.guava.v26_0_jre.com.google.common.base.Preconditions.checkArgument(Preconditions.java:440)
>   at 
> org.apache.beam.sdk.io.gcp.bigquery.BigQueryStorageStreamSource$BigQueryStorageStreamReader.readNextRecord(BigQueryStorageStreamSource.java:243)
>   at 
> org.apache.beam.sdk.io.gcp.bigquery.BigQueryStorageStreamSource$BigQueryStorageStreamReader.start(BigQueryStorageStreamSource.java:206)
>   at 
> org.apache.beam.runners.dataflow.worker.WorkerCustomSources$BoundedReaderIterator.start(WorkerCustomSources.java:601)
>   ... 14 more
> {code}
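The `Preconditions.checkArgument` failure above suggests the reader rejects any response whose consumed fraction is not strictly greater than the previous one, which back-to-back zero-row responses (fraction stuck at 0.0) would violate. Below is a minimal sketch of that check and a hypothetical relaxation; it is an illustration of the failure mode, not the actual patch in #9987:

```python
def check_strict(prev_fraction, cur_fraction):
    # Mirrors the strict precondition in the stack trace: every response
    # must report strictly more progress than the one before it.
    if not prev_fraction < cur_fraction:
        raise ValueError(
            "Fraction consumed from previous response (%s) is not less than "
            "fraction consumed from current response (%s)."
            % (prev_fraction, cur_fraction))

def check_relaxed(prev_fraction, cur_fraction):
    # Hypothetical relaxation: a zero-row response legitimately leaves the
    # fraction unchanged, so only a *decrease* should be rejected.
    if cur_fraction < prev_fraction:
        raise ValueError("Fraction consumed went backwards: %s -> %s"
                         % (prev_fraction, cur_fraction))

# Two consecutive empty responses both report 0.0 consumed:
try:
    check_strict(0.0, 0.0)
    strict_failed = False
except ValueError:
    strict_failed = True
assert strict_failed          # reproduces the exception in the stack trace
check_relaxed(0.0, 0.0)       # relaxed check accepts the empty response
```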





[jira] [Work logged] (BEAM-3288) Guard against unsafe triggers at construction time

2019-11-04 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/BEAM-3288?focusedWorklogId=338493&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-338493
 ]

ASF GitHub Bot logged work on BEAM-3288:


Author: ASF GitHub Bot
Created on: 05/Nov/19 05:08
Start Date: 05/Nov/19 05:08
Worklog Time Spent: 10m 
  Work Description: kennknowles commented on issue #9960: [BEAM-3288] Guard 
against unsafe triggers at construction time
URL: https://github.com/apache/beam/pull/9960#issuecomment-549664815
 
 
   run java precommit
 



Issue Time Tracking
---

Worklog Id: (was: 338493)
Time Spent: 1.5h  (was: 1h 20m)

> Guard against unsafe triggers at construction time 
> ---
>
> Key: BEAM-3288
> URL: https://issues.apache.org/jira/browse/BEAM-3288
> Project: Beam
>  Issue Type: Task
>  Components: sdk-java-core, sdk-py-core
>Reporter: Eugene Kirpichov
>Assignee: Kenneth Knowles
>Priority: Major
>  Time Spent: 1.5h
>  Remaining Estimate: 0h
>
> Current Beam trigger semantics are rather confusing and in some cases 
> extremely unsafe, especially if the pipeline includes multiple chained GBKs. 
> One example of that is https://issues.apache.org/jira/browse/BEAM-3169 .
> There are multiple issues:
> The API allows users to specify terminating top-level triggers (e.g. "trigger 
> a pane after receiving 1 element in the window, and that's it"), but 
> experience from user support shows that this is nearly always a mistake and 
> the user did not intend to drop all further data.
> In general, triggers are the only place in Beam where data is being dropped 
> without making a lot of very loud noise about it - a practice for which the 
> PTransform style guide uses the language: "never, ever, ever do this".
> Continuation triggers are still worse. For context: continuation trigger is 
> the trigger that's set on the output of a GBK and controls further 
> aggregation of the results of this aggregation by downstream GBKs. The output 
> shouldn't just use the same trigger as the input, because e.g. if the input 
> trigger said "wait for an hour before emitting a pane", that doesn't mean 
> that we should wait for another hour before emitting a result of aggregating 
> the result of the input trigger. Continuation triggers try to simulate the 
> behavior "as if a pane of the input propagated through the entire pipeline", 
> but the implementation of individual continuation triggers doesn't do that. 
> E.g. the continuation of "first N elements in pane" trigger is "first 1 
> element in pane", and if the results of a first GBK are further grouped by a 
> second GBK onto more coarse key (e.g. if everything is grouped onto the same 
> key), that effectively means that, of the keys of the first GBK, only one 
> survives and all others are dropped (what happened in the data loss bug).
> The ultimate fix to all of these things is 
> https://s.apache.org/beam-sink-triggers . However, it is a huge model change, 
> and meanwhile we have to do something. The options are, in order of 
> increasing backward incompatibility (but incompatibility in a "rejecting 
> something that previously was accepted but extremely dangerous" kind of way):
> - Make the continuation trigger of most triggers be the "always-fire" 
> trigger. Seems that this should be the case for all triggers except the 
> watermark trigger. This will definitely increase safety, but lead to more 
> eager firing of downstream aggregations. It also will violate a user's 
> expectation that a fire-once trigger fires everything downstream only once, 
> but that expectation appears impossible to satisfy safely.
> - Make the continuation trigger of some triggers be the "invalid" trigger, 
> i.e. require the user to set it explicitly: there's in general no good and 
> safe way to infer what a trigger on a second GBK "truly" should be, based on 
> the trigger of the PCollection input into a first GBK. This is especially 
> true for terminating triggers.
> - Prohibit top-level terminating triggers entirely. This will ensure that the 
> only data that ever gets dropped is "droppably late" data.
> CC: [~bchambers] [~kenn] [~tgroh]
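The chained-GBK data-loss scenario described above can be illustrated with a toy simulation (plain Python, not Beam code): a first GBK fires one pane per key, everything is re-keyed onto a single coarse key, and a terminating "first 1 element in pane" continuation trigger keeps one pane and silently drops the rest.

```python
from collections import defaultdict

def gbk(elements):
    """Toy GroupByKey: collects values per key into one pane each."""
    grouped = defaultdict(list)
    for key, value in elements:
        grouped[key].append(value)
    return sorted(grouped.items())

# First GBK: one pane per key.
first = gbk([("a", 1), ("b", 2), ("c", 3)])

# Second GBK input: every pane re-keyed onto a single coarse key.
rekeyed = [("all", pane) for pane in first]

# Terminating "first 1 element in pane" continuation trigger.
panes_seen = []
for key, pane in rekeyed:
    if len(panes_seen) < 1:    # trigger finishes after one element...
        panes_seen.append(pane)
    # ...so the remaining panes are silently dropped.

assert len(panes_seen) == 1    # only one of the three upstream keys survives
```

An "always-fire" continuation trigger would instead emit a pane for each incoming element, which is the first mitigation option listed above.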





[jira] [Work logged] (BEAM-3288) Guard against unsafe triggers at construction time

2019-11-04 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/BEAM-3288?focusedWorklogId=338492&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-338492
 ]

ASF GitHub Bot logged work on BEAM-3288:


Author: ASF GitHub Bot
Created on: 05/Nov/19 05:04
Start Date: 05/Nov/19 05:04
Worklog Time Spent: 10m 
  Work Description: kennknowles commented on issue #9960: [BEAM-3288] Guard 
against unsafe triggers at construction time
URL: https://github.com/apache/beam/pull/9960#issuecomment-549664113
 
 
   Trying to repro the Jenkins failure locally so I can iterate quickly. So far 
I cannot reproduce it. I suspect the Gradle configuration may have some 
inadvertent dependency on context.
 



Issue Time Tracking
---

Worklog Id: (was: 338492)
Time Spent: 1h 20m  (was: 1h 10m)

> Guard against unsafe triggers at construction time 
> ---
>
> Key: BEAM-3288
> URL: https://issues.apache.org/jira/browse/BEAM-3288
> Project: Beam
>  Issue Type: Task
>  Components: sdk-java-core, sdk-py-core
>Reporter: Eugene Kirpichov
>Assignee: Kenneth Knowles
>Priority: Major
>  Time Spent: 1h 20m
>  Remaining Estimate: 0h
>





[jira] [Work logged] (BEAM-7951) Allow runner to configure a customized WindowedValue coder such as ValueOnlyWindowedValueCoder

2019-11-04 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/BEAM-7951?focusedWorklogId=338466&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-338466
 ]

ASF GitHub Bot logged work on BEAM-7951:


Author: ASF GitHub Bot
Created on: 05/Nov/19 02:53
Start Date: 05/Nov/19 02:53
Worklog Time Spent: 10m 
  Work Description: sunjincheng121 commented on issue #9979: [BEAM-7951] 
Allow runner to configure a customized WindowedValue coder.
URL: https://github.com/apache/beam/pull/9979#issuecomment-549334516
 
 
   Run Python PreCommit
 



Issue Time Tracking
---

Worklog Id: (was: 338466)
Time Spent: 0.5h  (was: 20m)

> Allow runner to configure a customized WindowedValue coder such as 
> ValueOnlyWindowedValueCoder
> ---
>
> Key: BEAM-7951
> URL: https://issues.apache.org/jira/browse/BEAM-7951
> Project: Beam
>  Issue Type: Sub-task
>  Components: java-fn-execution
>Reporter: sunjincheng
>Assignee: sunjincheng
>Priority: Major
>  Time Spent: 0.5h
>  Remaining Estimate: 0h
>
> The coder of WindowedValue cannot be configured and it’s always 
> FullWindowedValueCoder. We don't need to serialize the timestamp, window and 
> pane properties in Flink, so it would be better to make the coder 
> configurable (i.e. to allow using ValueOnlyWindowedValueCoder).





[jira] [Work logged] (BEAM-8294) Spark portable validates runner tests timing out

2019-11-04 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/BEAM-8294?focusedWorklogId=338453&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-338453
 ]

ASF GitHub Bot logged work on BEAM-8294:


Author: ASF GitHub Bot
Created on: 05/Nov/19 01:59
Start Date: 05/Nov/19 01:59
Worklog Time Spent: 10m 
  Work Description: ibzib commented on pull request #9988: [BEAM-8294] 
reuse Spark context in portableValidatesRunner test
URL: https://github.com/apache/beam/pull/9988
 
 
   
 



Issue Time Tracking
---

Worklog Id: (was: 338453)
Time Spent: 40m  (was: 0.5h)

> Spark portable validates runner tests timing out
> 
>
> Key: BEAM-8294
> URL: https://issues.apache.org/jira/browse/BEAM-8294
> Project: Beam
>  Issue Type: Improvement
>  Components: runner-spark, test-failures, testing
>Reporter: Kyle Weaver
>Assignee: Kyle Weaver
>Priority: Major
>  Labels: currently-failing, portability-spark
>  Time Spent: 40m
>  Remaining Estimate: 0h
>
> This postcommit has been timing out for 11 days. 
> [https://github.com/apache/beam/pull/9095] has been merged for about 11 days. 
> Coincidence? I think NOT! ... Seriously, though, I wonder what about the SDK 
> worker management stack caused this slowdown.
> [https://builds.apache.org/job/beam_PostCommit_Java_PVR_Spark_Batch/]
>  
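The fix in PR #9988 is titled "reuse Spark context in portableValidatesRunner test". The general idea can be sketched as a process-wide singleton; all names below are hypothetical stand-ins, since the real change lives in the test harness configuration rather than code like this:

```python
class FakeSparkContext:
    """Stand-in for a real SparkContext; counts how many were created."""
    instances = 0

    def __init__(self):
        FakeSparkContext.instances += 1

_shared = None

def get_or_create():
    # One context per test process instead of one per test case, so the
    # expensive startup cost is paid once.
    global _shared
    if _shared is None:
        _shared = FakeSparkContext()
    return _shared

for _ in range(5):                    # five "tests" share a single context
    ctx = get_or_create()

assert FakeSparkContext.instances == 1
```

As the follow-up comment below notes, reuse alone did not eliminate the memory errors, so the singleton is a mitigation rather than a full fix.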





[jira] [Work logged] (BEAM-8294) Spark portable validates runner tests timing out

2019-11-04 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/BEAM-8294?focusedWorklogId=338452&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-338452
 ]

ASF GitHub Bot logged work on BEAM-8294:


Author: ASF GitHub Bot
Created on: 05/Nov/19 01:59
Start Date: 05/Nov/19 01:59
Worklog Time Spent: 10m 
  Work Description: ibzib commented on issue #9988: [BEAM-8294] reuse Spark 
context in portableValidatesRunner test
URL: https://github.com/apache/beam/pull/9988#issuecomment-549629395
 
 
   Looks like this still causes memory errors.
 



Issue Time Tracking
---

Worklog Id: (was: 338452)
Time Spent: 0.5h  (was: 20m)

> Spark portable validates runner tests timing out
> 
>
> Key: BEAM-8294
> URL: https://issues.apache.org/jira/browse/BEAM-8294
> Project: Beam
>  Issue Type: Improvement
>  Components: runner-spark, test-failures, testing
>Reporter: Kyle Weaver
>Assignee: Kyle Weaver
>Priority: Major
>  Labels: currently-failing, portability-spark
>  Time Spent: 0.5h
>  Remaining Estimate: 0h
>
> This postcommit has been timing out for 11 days. 
> [https://github.com/apache/beam/pull/9095] has been merged for about 11 days. 
> Coincidence? I think NOT! ... Seriously, though, I wonder what about the SDK 
> worker management stack caused this slowdown.
> [https://builds.apache.org/job/beam_PostCommit_Java_PVR_Spark_Batch/]
>  





[jira] [Work logged] (BEAM-2572) Implement an S3 filesystem for Python SDK

2019-11-04 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/BEAM-2572?focusedWorklogId=338450&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-338450
 ]

ASF GitHub Bot logged work on BEAM-2572:


Author: ASF GitHub Bot
Created on: 05/Nov/19 01:54
Start Date: 05/Nov/19 01:54
Worklog Time Spent: 10m 
  Work Description: chamikaramj commented on issue #9955: [BEAM-2572] 
Python SDK S3 Filesystem
URL: https://github.com/apache/beam/pull/9955#issuecomment-549628347
 
 
   R: @pabloem will you be able to review ?
   
   Also cc: @lukecwik 
 



Issue Time Tracking
---

Worklog Id: (was: 338450)
Time Spent: 50m  (was: 40m)

> Implement an S3 filesystem for Python SDK
> -
>
> Key: BEAM-2572
> URL: https://issues.apache.org/jira/browse/BEAM-2572
> Project: Beam
>  Issue Type: Task
>  Components: sdk-py-core
>Reporter: Dmitry Demeshchuk
>Priority: Minor
>  Labels: GSoC2019, gsoc, gsoc2019, mentor, outreachy19dec
>  Time Spent: 50m
>  Remaining Estimate: 0h
>
> There are two paths worth exploring, to my understanding:
> 1. Sticking to the HDFS-based approach (like it's done in Java).
> 2. Using boto/boto3 for accessing S3 through its common API endpoints.
> I personally prefer the second approach, for a few reasons:
> 1. In real life, HDFS and S3 have different consistency guarantees, therefore 
> their behaviors may contradict each other in some edge cases (say, we write 
> something to S3, but it's not immediately accessible for reading from another 
> end).
> 2. There are other AWS-based sources and sinks we may want to create in the 
> future: DynamoDB, Kinesis, SQS, etc.
> 3. boto3 already provides reasonably good logic for basic things like 
> retries.
> Whatever path we choose, there's another problem related to this: we 
> currently cannot pass any global settings (say, pipeline options, or just an 
> arbitrary kwarg) to a filesystem. Because of that, we'd have to setup the 
> runner nodes to have AWS keys set up in the environment, which is not trivial 
> to achieve and doesn't look too clean either (I'd rather see one single place 
> for configuring the runner options).
> Also, it's worth mentioning that I already have a janky S3 filesystem 
> implementation that only supports DirectRunner at the moment (because of the 
> previous paragraph). I'm perfectly fine finishing it myself, with some 
> guidance from the maintainers.
> Where should I move on from here, and whose input should I be looking for?
> Thanks!
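The configuration gap described above (no way to hand credentials to a filesystem, so AWS keys must live in each worker's environment) can be sketched as follows. Every class and parameter name here is hypothetical, not the eventual Beam API:

```python
class S3Options:
    """Hypothetical options bundle: credentials travel with the pipeline
    configuration instead of each worker's environment."""
    def __init__(self, access_key, secret_key, region="us-east-1"):
        self.access_key = access_key
        self.secret_key = secret_key
        self.region = region

class SketchS3FileSystem:
    """Illustrative only; the real Beam FileSystem interface differs."""
    SCHEME = "s3"

    def __init__(self, options):
        # A boto3-backed implementation would pass these options to
        # boto3.client("s3", ...) here.
        self.options = options

    def supports(self, path):
        # Dispatch on the URI scheme, like Beam's filesystem registry does.
        return path.startswith(self.SCHEME + "://")

fs = SketchS3FileSystem(S3Options("EXAMPLE_KEY", "EXAMPLE_SECRET"))
assert fs.supports("s3://bucket/key")
assert not fs.supports("gs://bucket/key")
```

Threading an options object through construction is what makes the second (boto3) approach workable on runners where environment variables are awkward to set.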





[jira] [Updated] (BEAM-8556) Portable Spark timer tests failing

2019-11-04 Thread Kyle Weaver (Jira)


 [ 
https://issues.apache.org/jira/browse/BEAM-8556?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Kyle Weaver updated BEAM-8556:
--
Status: Open  (was: Triage Needed)

> Portable Spark timer tests failing
> --
>
> Key: BEAM-8556
> URL: https://issues.apache.org/jira/browse/BEAM-8556
> Project: Beam
>  Issue Type: Bug
>  Components: runner-spark
>Reporter: Kyle Weaver
>Assignee: Kyle Weaver
>Priority: Major
>  Labels: portability-spark
>
> Being covered up by BEAM-8294
> *15:31:29* org.apache.beam.sdk.transforms.ParDoTest$TimerTests >
> testEventTimeTimerOrderingWithCreate FAILED
> *15:31:29* java.lang.AssertionError at ParDoTest.java:3641
> *15:32:12* org.apache.beam.sdk.transforms.ParDoTest$TimerTests >
> testTwoTimersSettingEachOtherWithCreateAsInput FAILED
> *15:32:12* java.lang.AssertionError at ParDoTest.java:3729





[jira] [Work logged] (BEAM-8555) Document Dataflow template support in Python SDK

2019-11-04 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/BEAM-8555?focusedWorklogId=338449&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-338449
 ]

ASF GitHub Bot logged work on BEAM-8555:


Author: ASF GitHub Bot
Created on: 05/Nov/19 01:48
Start Date: 05/Nov/19 01:48
Worklog Time Spent: 10m 
  Work Description: tvalentyn commented on pull request #9989: [BEAM-8555] 
Update Python streaming limitations.
URL: https://github.com/apache/beam/pull/9989
 
 
   
 



Issue Time Tracking
---

Worklog Id: (was: 338449)
Time Spent: 0.5h  (was: 20m)

> Document Dataflow template support in Python SDK
> 
>
> Key: BEAM-8555
> URL: https://issues.apache.org/jira/browse/BEAM-8555
> Project: Beam
>  Issue Type: Bug
>  Components: website
>Reporter: Valentyn Tymofieiev
>Assignee: Rose Nguyen
>Priority: Major
>  Time Spent: 0.5h
>  Remaining Estimate: 0h
>
> [https://beam.apache.org/documentation/sdks/python-streaming/#unsupported-features]
>  mentions that the Dataflow runner does not support Python templates. Since 
> templates are supported, I'll remove these lines, but am also opening this 
> issue in case we need more documentation for templates on the website.
> cc: [~dcavazos]





[jira] [Created] (BEAM-8556) Portable Spark timer tests failing

2019-11-04 Thread Kyle Weaver (Jira)
Kyle Weaver created BEAM-8556:
-

 Summary: Portable Spark timer tests failing
 Key: BEAM-8556
 URL: https://issues.apache.org/jira/browse/BEAM-8556
 Project: Beam
  Issue Type: Bug
  Components: runner-spark
Reporter: Kyle Weaver
Assignee: Kyle Weaver


Being covered up by BEAM-8294

*15:31:29* org.apache.beam.sdk.transforms.ParDoTest$TimerTests >
testEventTimeTimerOrderingWithCreate FAILED
*15:31:29* java.lang.AssertionError at ParDoTest.java:3641
*15:32:12* org.apache.beam.sdk.transforms.ParDoTest$TimerTests >
testTwoTimersSettingEachOtherWithCreateAsInput FAILED
*15:32:12* java.lang.AssertionError at ParDoTest.java:3729





[jira] [Work logged] (BEAM-3288) Guard against unsafe triggers at construction time

2019-11-04 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/BEAM-3288?focusedWorklogId=338447&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-338447
 ]

ASF GitHub Bot logged work on BEAM-3288:


Author: ASF GitHub Bot
Created on: 05/Nov/19 01:41
Start Date: 05/Nov/19 01:41
Worklog Time Spent: 10m 
  Work Description: kennknowles commented on issue #9960: [BEAM-3288] Guard 
against unsafe triggers at construction time
URL: https://github.com/apache/beam/pull/9960#issuecomment-549625617
 
 
   Doc is https://s.apache.org/finishing-triggers-drop-data. I will edit the 
error message to link to it.
 



Issue Time Tracking
---

Worklog Id: (was: 338447)
Time Spent: 1h 10m  (was: 1h)

> Guard against unsafe triggers at construction time 
> ---
>
> Key: BEAM-3288
> URL: https://issues.apache.org/jira/browse/BEAM-3288
> Project: Beam
>  Issue Type: Task
>  Components: sdk-java-core, sdk-py-core
>Reporter: Eugene Kirpichov
>Assignee: Kenneth Knowles
>Priority: Major
>  Time Spent: 1h 10m
>  Remaining Estimate: 0h
>
> Current Beam trigger semantics are rather confusing and in some cases 
> extremely unsafe, especially if the pipeline includes multiple chained GBKs. 
> One example of that is https://issues.apache.org/jira/browse/BEAM-3169 .
> There's multiple issues:
> The API allows users to specify terminating top-level triggers (e.g. "trigger 
> a pane after receiving 1 elements in the window, and that's it"), but 
> experience from user support shows that this is nearly always a mistake and 
> the user did not intend to drop all further data.
> In general, triggers are the only place in Beam where data is being dropped 
> without making a lot of very loud noise about it - a practice for which the 
> PTransform style guide uses the language: "never, ever, ever do this".
> Continuation triggers are still worse. For context: continuation trigger is 
> the trigger that's set on the output of a GBK and controls further 
> aggregation of the results of this aggregation by downstream GBKs. The output 
> shouldn't just use the same trigger as the input, because e.g. if the input 
> trigger said "wait for an hour before emitting a pane", that doesn't mean 
> that we should wait for another hour before emitting a result of aggregating 
> the result of the input trigger. Continuation triggers try to simulate the 
> behavior "as if a pane of the input propagated through the entire pipeline", 
> but the implementation of individual continuation triggers doesn't do that. 
> E.g. the continuation of "first N elements in pane" trigger is "first 1 
> element in pane", and if the results of a first GBK are further grouped by a 
> second GBK onto more coarse key (e.g. if everything is grouped onto the same 
> key), that effectively means that, of the keys of the first GBK, only one 
> survives and all others are dropped (what happened in the data loss bug).
> The ultimate fix to all of these things is 
> https://s.apache.org/beam-sink-triggers . However, it is a huge model change, 
> and meanwhile we have to do something. The options are, in order of 
> increasing backward incompatibility (but incompatibility in a "rejecting 
> something that previously was accepted but extremely dangerous" kind of way):
> - Make the continuation trigger of most triggers be the "always-fire" 
> trigger. Seems that this should be the case for all triggers except the 
> watermark trigger. This will definitely increase safety, but lead to more 
> eager firing of downstream aggregations. It also will violate a user's 
> expectation that a fire-once trigger fires everything downstream only once, 
> but that expectation appears impossible to satisfy safely.
> - Make the continuation trigger of some triggers be the "invalid" trigger, 
> i.e. require the user to set it explicitly: there's in general no good and 
> safe way to infer what a trigger on a second GBK "truly" should be, based on 
> the trigger of the PCollection input into a first GBK. This is especially 
> true for terminating triggers.
> - Prohibit top-level terminating triggers entirely. This will ensure that the 
> only data that ever gets dropped is "droppably late" data.
> CC: [~bchambers] [~kenn] [~tgroh]
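The terminating-trigger and continuation-trigger behaviors described above can be sketched as a toy model (illustrative Python only; `Trigger`, `continuation_current`, and `continuation_proposed` are hypothetical names, not the Beam API):

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class Trigger:
    kind: str          # "watermark", "after_count", or "always"
    count: int = 0

    @property
    def may_finish(self) -> bool:
        # Terminating triggers ("first N elements, and that's it") can
        # silently drop all later data once they fire.
        return self.kind == "after_count"

def continuation_current(t: Trigger) -> Trigger:
    # Today's behavior: "first N elements in pane" continues as
    # "first 1 element in pane" -- still a terminating trigger, which is
    # how only one key of the first GBK can survive a coarser second GBK.
    if t.kind == "after_count":
        return Trigger("after_count", 1)
    return t

def continuation_proposed(t: Trigger) -> Trigger:
    # First option above: everything except the watermark trigger
    # continues as the non-terminating "always-fire" trigger.
    if t.kind == "watermark":
        return t
    return Trigger("always")

first_gbk = Trigger("after_count", 3)
print(continuation_current(first_gbk).may_finish)   # True
print(continuation_proposed(first_gbk).may_finish)  # False
```

The sketch only models the safety property (whether a downstream trigger can finish and drop data), not pane timing.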



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Work logged] (BEAM-2572) Implement an S3 filesystem for Python SDK

2019-11-04 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/BEAM-2572?focusedWorklogId=338445&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-338445
 ]

ASF GitHub Bot logged work on BEAM-2572:


Author: ASF GitHub Bot
Created on: 05/Nov/19 01:36
Start Date: 05/Nov/19 01:36
Worklog Time Spent: 10m 
  Work Description: aaltay commented on issue #9955: [BEAM-2572] Python SDK 
S3 Filesystem
URL: https://github.com/apache/beam/pull/9955#issuecomment-549624651
 
 
   @MattMorgis thank you for the contribution. The general path of adding [aws] 
as an extra package sounds reasonable.
   
   @chamikaramj could help with reviews or find a person to review.
   @yifanzou could help with CI related questions.
 

This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 338445)
Time Spent: 40m  (was: 0.5h)

> Implement an S3 filesystem for Python SDK
> -
>
> Key: BEAM-2572
> URL: https://issues.apache.org/jira/browse/BEAM-2572
> Project: Beam
>  Issue Type: Task
>  Components: sdk-py-core
>Reporter: Dmitry Demeshchuk
>Priority: Minor
>  Labels: GSoC2019, gsoc, gsoc2019, mentor, outreachy19dec
>  Time Spent: 40m
>  Remaining Estimate: 0h
>
> There are two paths worth exploring, to my understanding:
> 1. Sticking to the HDFS-based approach (like it's done in Java).
> 2. Using boto/boto3 for accessing S3 through its common API endpoints.
> I personally prefer the second approach, for a few reasons:
> 1. In real life, HDFS and S3 have different consistency guarantees, therefore 
> their behaviors may contradict each other in some edge cases (say, we write 
> something to S3, but it's not immediately accessible for reading from another 
> end).
> 2. There are other AWS-based sources and sinks we may want to create in the 
> future: DynamoDB, Kinesis, SQS, etc.
> 3. boto3 already provides reasonably good logic for basic things like 
> retrying.
> Whatever path we choose, there's another problem related to this: we 
> currently cannot pass any global settings (say, pipeline options, or just an 
> arbitrary kwarg) to a filesystem. Because of that, we'd have to set up the 
> runner nodes to have AWS keys set up in the environment, which is not trivial 
> to achieve and doesn't look too clean either (I'd rather see one single place 
> for configuring the runner options).
> Also, it's worth mentioning that I already have a janky S3 filesystem 
> implementation that only supports DirectRunner at the moment (because of the 
> previous paragraph). I'm perfectly fine finishing it myself, with some 
> guidance from the maintainers.
> Where should I move on from here, and whose input should I be looking for?
> Thanks!
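The configuration problem raised above (no way to pass pipeline options into a filesystem) can be sketched as a constructor-injection pattern (illustrative Python only; the class names and option keys such as `s3_access_key_id` are hypothetical, not Beam's actual API):

```python
class FileSystem:
    """Minimal filesystem interface (illustrative; not Beam's actual API)."""
    scheme = None

    def __init__(self, pipeline_options=None):
        self.options = pipeline_options or {}

class S3FileSystem(FileSystem):
    scheme = "s3"

    def __init__(self, pipeline_options=None):
        super().__init__(pipeline_options)
        # Credentials come from options passed at construction time,
        # not from environment variables baked into every runner node.
        self.key = self.options.get("s3_access_key_id")
        self.secret = self.options.get("s3_secret_access_key")

_REGISTRY = {cls.scheme: cls for cls in (S3FileSystem,)}

def filesystem_for(path, pipeline_options=None):
    """Pick a filesystem by URL scheme and hand it the pipeline options."""
    scheme = path.split("://", 1)[0]
    try:
        return _REGISTRY[scheme](pipeline_options)
    except KeyError:
        raise ValueError("No filesystem registered for scheme %r" % scheme)

fs = filesystem_for("s3://bucket/key", {"s3_access_key_id": "EXAMPLE"})
print(fs.key)  # EXAMPLE
```

With this shape, a runner node needs no ambient AWS configuration; everything flows from one place (the pipeline options).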





[jira] [Work logged] (BEAM-8555) Document Dataflow template support in Python SDK

2019-11-04 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/BEAM-8555?focusedWorklogId=338443&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-338443
 ]

ASF GitHub Bot logged work on BEAM-8555:


Author: ASF GitHub Bot
Created on: 05/Nov/19 01:31
Start Date: 05/Nov/19 01:31
Worklog Time Spent: 10m 
  Work Description: tvalentyn commented on issue #9989: [BEAM-8555] Update 
Python streaming limitations.
URL: https://github.com/apache/beam/pull/9989#issuecomment-549623524
 
 
   r: @aaltay cc: @rosetn 
 



Issue Time Tracking
---

Worklog Id: (was: 338443)
Time Spent: 20m  (was: 10m)

> Document Dataflow template support in Python SDK
> 
>
> Key: BEAM-8555
> URL: https://issues.apache.org/jira/browse/BEAM-8555
> Project: Beam
>  Issue Type: Bug
>  Components: website
>Reporter: Valentyn Tymofieiev
>Assignee: Rose Nguyen
>Priority: Major
>  Time Spent: 20m
>  Remaining Estimate: 0h
>
> [https://beam.apache.org/documentation/sdks/python-streaming/#unsupported-features]
>  mentions that the Dataflow runner does not support Python templates. Since 
> templates are supported, I'll remove these lines, but I'm also opening this issue 
> in case we need more documentation for templates on the website.
> cc: [~dcavazos]





[jira] [Work logged] (BEAM-8555) Document Dataflow template support in Python SDK

2019-11-04 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/BEAM-8555?focusedWorklogId=338440&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-338440
 ]

ASF GitHub Bot logged work on BEAM-8555:


Author: ASF GitHub Bot
Created on: 05/Nov/19 01:30
Start Date: 05/Nov/19 01:30
Worklog Time Spent: 10m 
  Work Description: tvalentyn commented on pull request #9989: [BEAM-8555] 
Update Python streaming limitations.
URL: https://github.com/apache/beam/pull/9989
 
 
   
   
   
   Thank you for your contribution! Follow this checklist to help us 
incorporate your contribution quickly and easily:
   
- [ ] [**Choose 
reviewer(s)**](https://beam.apache.org/contribute/#make-your-change) and 
mention them in a comment (`R: @username`).
- [ ] Format the pull request title like `[BEAM-XXX] Fixes bug in 
ApproximateQuantiles`, where you replace `BEAM-XXX` with the appropriate JIRA 
issue, if applicable. This will automatically link the pull request to the 
issue.
- [ ] If this contribution is large, please file an Apache [Individual 
Contributor License Agreement](https://www.apache.org/licenses/icla.pdf).
   
   See the [Contributor Guide](https://beam.apache.org/contribute) for more 
tips on [how to make review process 
smoother](https://beam.apache.org/contribute/#make-reviewers-job-easier).
   
   Post-Commit Tests Status (on master branch)
   

   
   Lang | SDK | Apex | Dataflow | Flink | Gearpump | Samza | Spark
   --- | --- | --- | --- | --- | --- | --- | ---
   Go | [![Build 
Status](https://builds.apache.org/job/beam_PostCommit_Go/lastCompletedBuild/badge/icon)](https://builds.apache.org/job/beam_PostCommit_Go/lastCompletedBuild/)
 | --- | --- | [![Build 
Status](https://builds.apache.org/job/beam_PostCommit_Go_VR_Flink/lastCompletedBuild/badge/icon)](https://builds.apache.org/job/beam_PostCommit_Go_VR_Flink/lastCompletedBuild/)
 | --- | --- | [![Build 
Status](https://builds.apache.org/job/beam_PostCommit_Go_VR_Spark/lastCompletedBuild/badge/icon)](https://builds.apache.org/job/beam_PostCommit_Go_VR_Spark/lastCompletedBuild/)
   Java | [![Build 
Status](https://builds.apache.org/job/beam_PostCommit_Java/lastCompletedBuild/badge/icon)](https://builds.apache.org/job/beam_PostCommit_Java/lastCompletedBuild/)
 | [![Build 
Status](https://builds.apache.org/job/beam_PostCommit_Java_ValidatesRunner_Apex/lastCompletedBuild/badge/icon)](https://builds.apache.org/job/beam_PostCommit_Java_ValidatesRunner_Apex/lastCompletedBuild/)
 | [![Build 
Status](https://builds.apache.org/job/beam_PostCommit_Java_ValidatesRunner_Dataflow/lastCompletedBuild/badge/icon)](https://builds.apache.org/job/beam_PostCommit_Java_ValidatesRunner_Dataflow/lastCompletedBuild/)
 | [![Build 
Status](https://builds.apache.org/job/beam_PostCommit_Java_ValidatesRunner_Flink/lastCompletedBuild/badge/icon)](https://builds.apache.org/job/beam_PostCommit_Java_ValidatesRunner_Flink/lastCompletedBuild/)[![Build
 
Status](https://builds.apache.org/job/beam_PostCommit_Java_PVR_Flink_Batch/lastCompletedBuild/badge/icon)](https://builds.apache.org/job/beam_PostCommit_Java_PVR_Flink_Batch/lastCompletedBuild/)[![Build
 
Status](https://builds.apache.org/job/beam_PostCommit_Java_PVR_Flink_Streaming/lastCompletedBuild/badge/icon)](https://builds.apache.org/job/beam_PostCommit_Java_PVR_Flink_Streaming/lastCompletedBuild/)
 | [![Build 
Status](https://builds.apache.org/job/beam_PostCommit_Java_ValidatesRunner_Gearpump/lastCompletedBuild/badge/icon)](https://builds.apache.org/job/beam_PostCommit_Java_ValidatesRunner_Gearpump/lastCompletedBuild/)
 | [![Build 
Status](https://builds.apache.org/job/beam_PostCommit_Java_ValidatesRunner_Samza/lastCompletedBuild/badge/icon)](https://builds.apache.org/job/beam_PostCommit_Java_ValidatesRunner_Samza/lastCompletedBuild/)
 | [![Build 
Status](https://builds.apache.org/job/beam_PostCommit_Java_ValidatesRunner_Spark/lastCompletedBuild/badge/icon)](https://builds.apache.org/job/beam_PostCommit_Java_ValidatesRunner_Spark/lastCompletedBuild/)[![Build
 
Status](https://builds.apache.org/job/beam_PostCommit_Java_PVR_Spark_Batch/lastCompletedBuild/badge/icon)](https://builds.apache.org/job/beam_PostCommit_Java_PVR_Spark_Batch/lastCompletedBuild/)
   Python | [![Build Status](https://builds.apache.org/job/beam_PostCommit_Python2/lastCompletedBuild/badge/icon)](https://builds.apache.org/job/beam_PostCommit_Python2/lastCompletedBuild/)[![Build Status](https://builds.apache.org/job/beam_PostCommit_Python35/lastCompletedBuild/badge/icon)](https://builds.apache.org/job/beam_PostCommit_Python35/lastCompletedBuild/)[![Build Status](https://builds.apache.org/job/beam_PostCommit_Python36/lastCompletedBuild/badge/icon)](https://builds.apache.org/job/beam_PostCommit_Python36/lastCompletedBuild/)

[jira] [Work logged] (BEAM-8294) Spark portable validates runner tests timing out

2019-11-04 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/BEAM-8294?focusedWorklogId=338438&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-338438
 ]

ASF GitHub Bot logged work on BEAM-8294:


Author: ASF GitHub Bot
Created on: 05/Nov/19 01:30
Start Date: 05/Nov/19 01:30
Worklog Time Spent: 10m 
  Work Description: ibzib commented on pull request #9988: [BEAM-8294] 
reuse Spark context
URL: https://github.com/apache/beam/pull/9988
 
 
   In #8444, I set reuse to false because it seemed to partially alleviate 
memory constraints. However, I later discovered that #8722 was the main culprit 
behind the memory issues. I want to see if reusing the Spark context is okay 
now, and if it helps the tests run faster.
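The reuse question above boils down to whether a context is cached and shared across jobs or created fresh each time; a minimal sketch of that toggle (illustrative Python only; `FakeSparkContext` is a stand-in, not Spark's API):

```python
import threading

class FakeSparkContext:
    """Stand-in for a real Spark context (illustrative only)."""
    instances = 0

    def __init__(self):
        FakeSparkContext.instances += 1

_lock = threading.Lock()
_cached = None

def get_context(reuse=True):
    """Return a shared context when reuse=True, a fresh one otherwise.

    Reuse avoids repeated startup cost (faster tests) at the price of
    shared state -- the memory-pressure concern mentioned above.
    """
    global _cached
    with _lock:
        if reuse:
            if _cached is None:
                _cached = FakeSparkContext()
            return _cached
        return FakeSparkContext()

a = get_context(reuse=True)
b = get_context(reuse=True)
c = get_context(reuse=False)
print(a is b, a is c)  # True False
```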
   
   
   

[jira] [Work logged] (BEAM-8294) Spark portable validates runner tests timing out

2019-11-04 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/BEAM-8294?focusedWorklogId=338439&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-338439
 ]

ASF GitHub Bot logged work on BEAM-8294:


Author: ASF GitHub Bot
Created on: 05/Nov/19 01:30
Start Date: 05/Nov/19 01:30
Worklog Time Spent: 10m 
  Work Description: ibzib commented on issue #9988: [BEAM-8294] reuse Spark 
context
URL: https://github.com/apache/beam/pull/9988#issuecomment-549623341
 
 
   Run Java Spark PortableValidatesRunner Batch
 



Issue Time Tracking
---

Worklog Id: (was: 338439)
Time Spent: 20m  (was: 10m)

> Spark portable validates runner tests timing out
> 
>
> Key: BEAM-8294
> URL: https://issues.apache.org/jira/browse/BEAM-8294
> Project: Beam
>  Issue Type: Improvement
>  Components: runner-spark, test-failures, testing
>Reporter: Kyle Weaver
>Assignee: Kyle Weaver
>Priority: Major
>  Labels: currently-failing, portability-spark
>  Time Spent: 20m
>  Remaining Estimate: 0h
>
> This postcommit has been timing out for 11 days. 
> [https://github.com/apache/beam/pull/9095] has been merged for about 11 days. 
> Coincidence? I think NOT! ... Seriously, though, I wonder what it was about the 
> SDK worker management stack that caused this to slow down.
> [https://builds.apache.org/job/beam_PostCommit_Java_PVR_Spark_Batch/]
>  





[jira] [Work logged] (BEAM-8368) [Python] libprotobuf-generated exception when importing apache_beam

2019-11-04 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/BEAM-8368?focusedWorklogId=338437&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-338437
 ]

ASF GitHub Bot logged work on BEAM-8368:


Author: ASF GitHub Bot
Created on: 05/Nov/19 01:29
Start Date: 05/Nov/19 01:29
Worklog Time Spent: 10m 
  Work Description: aaltay commented on issue #9986: Merge pull request 
#9970: [BEAM-8368] [BEAM-8392] Update pyarrow to the latest version 0.15.1
URL: https://github.com/apache/beam/pull/9986#issuecomment-549623146
 
 
   Run Python PreCommit
 



Issue Time Tracking
---

Worklog Id: (was: 338437)
Time Spent: 3h 50m  (was: 3h 40m)

> [Python] libprotobuf-generated exception when importing apache_beam
> ---
>
> Key: BEAM-8368
> URL: https://issues.apache.org/jira/browse/BEAM-8368
> Project: Beam
>  Issue Type: Bug
>  Components: sdk-py-core
>Affects Versions: 2.15.0, 2.17.0
>Reporter: Ubaier Bhat
>Assignee: Brian Hulette
>Priority: Blocker
> Fix For: 2.17.0
>
> Attachments: error_log.txt
>
>  Time Spent: 3h 50m
>  Remaining Estimate: 0h
>
> Unable to import apache_beam after upgrading to macOS 10.15 (Catalina). 
> Cleared all the pipenvs, but I can't get it working again.
> {code}
> import apache_beam as beam
> /Users/***/.local/share/virtualenvs/beam-etl-ims6DitU/lib/python3.7/site-packages/apache_beam/__init__.py:84:
>  UserWarning: Some syntactic constructs of Python 3 are not yet fully 
> supported by Apache Beam.
>   'Some syntactic constructs of Python 3 are not yet fully supported by '
> [libprotobuf ERROR google/protobuf/descriptor_database.cc:58] File already 
> exists in database: 
> [libprotobuf FATAL google/protobuf/descriptor.cc:1370] CHECK failed: 
> GeneratedDatabase()->Add(encoded_file_descriptor, size): 
> libc++abi.dylib: terminating with uncaught exception of type 
> google::protobuf::FatalException: CHECK failed: 
> GeneratedDatabase()->Add(encoded_file_descriptor, size): 
> {code}





[jira] [Work logged] (BEAM-8545) don't docker pull before docker run

2019-11-04 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/BEAM-8545?focusedWorklogId=338436&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-338436
 ]

ASF GitHub Bot logged work on BEAM-8545:


Author: ASF GitHub Bot
Created on: 05/Nov/19 01:29
Start Date: 05/Nov/19 01:29
Worklog Time Spent: 10m 
  Work Description: ihji commented on issue #9972: [BEAM-8545] don't docker 
pull before docker run
URL: https://github.com/apache/beam/pull/9972#issuecomment-549623033
 
 
   Run Python PreCommit
 



Issue Time Tracking
---

Worklog Id: (was: 338436)
Time Spent: 1h 20m  (was: 1h 10m)

> don't docker pull  before docker run
> 
>
> Key: BEAM-8545
> URL: https://issues.apache.org/jira/browse/BEAM-8545
> Project: Beam
>  Issue Type: Bug
>  Components: java-fn-execution
>Reporter: Heejong Lee
>Assignee: Heejong Lee
>Priority: Major
>  Time Spent: 1h 20m
>  Remaining Estimate: 0h
>
> Since 'docker run' automatically pulls when the image doesn't exist locally, 
> I think it's safe to remove explicit 'docker pull' before 'docker run'. 
> Without 'docker pull', we won't update the local image with the remote image 
> (for the same tag), but that shouldn't be a problem in production, since a 
> unique tag is assumed for each released version.
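The reasoning above can be sketched as command composition (illustrative Python only; the image name is a placeholder, and the real change lives in Beam's Java docker environment code, not a helper like this):

```python
def docker_commands(image, args=(), explicit_pull=False):
    """Compose the docker invocations for running `image`.

    `docker run` pulls the image itself when it is absent locally, so an
    explicit pull is only needed to refresh a local image that shares a
    tag with a newer remote one -- which released, uniquely tagged
    images never do.
    """
    cmds = []
    if explicit_pull:
        cmds.append(["docker", "pull", image])
    cmds.append(["docker", "run", image, *args])
    return cmds

print(docker_commands("example/image:1.0"))
# [['docker', 'run', 'example/image:1.0']]
```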





[jira] [Created] (BEAM-8555) Document Dataflow template support in Python SDK

2019-11-04 Thread Valentyn Tymofieiev (Jira)
Valentyn Tymofieiev created BEAM-8555:
-

 Summary: Document Dataflow template support in Python SDK
 Key: BEAM-8555
 URL: https://issues.apache.org/jira/browse/BEAM-8555
 Project: Beam
  Issue Type: Bug
  Components: website
Reporter: Valentyn Tymofieiev
Assignee: Rose Nguyen


[https://beam.apache.org/documentation/sdks/python-streaming/#unsupported-features]
 mentions that the Dataflow runner does not support Python templates. Since 
templates are supported, I'll remove these lines, but I'm also opening this issue 
in case we need more documentation for templates on the website.

cc: [~dcavazos]





[jira] [Resolved] (BEAM-8220) Use released docker images by default

2019-11-04 Thread Kyle Weaver (Jira)


 [ 
https://issues.apache.org/jira/browse/BEAM-8220?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Kyle Weaver resolved BEAM-8220.
---
Fix Version/s: Not applicable
   Resolution: Fixed

This was resolved at some point.

> Use released docker images by default
> -
>
> Key: BEAM-8220
> URL: https://issues.apache.org/jira/browse/BEAM-8220
> Project: Beam
>  Issue Type: Bug
>  Components: sdk-py-harness
>Reporter: Kyle Weaver
>Assignee: Kyle Weaver
>Priority: Major
> Fix For: Not applicable
>
>
> Now that we have official released docker images [1], we should consider 
> pulling them as the default instead of requiring the user to first build 
> their own docker images, which is more cumbersome and error-prone. Also, all 
> documentation would need to be updated accordingly.
>  [1] [https://hub.docker.com/u/apachebeam]





[jira] [Resolved] (BEAM-8516) sdist build fails when artifacts from different versions are present

2019-11-04 Thread Kyle Weaver (Jira)


 [ 
https://issues.apache.org/jira/browse/BEAM-8516?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Kyle Weaver resolved BEAM-8516.
---
Fix Version/s: 2.18.0
   Resolution: Fixed

> sdist build fails when artifacts from different versions are present
> 
>
> Key: BEAM-8516
> URL: https://issues.apache.org/jira/browse/BEAM-8516
> Project: Beam
>  Issue Type: Improvement
>  Components: sdk-py-core
>Reporter: Kyle Weaver
>Assignee: Kyle Weaver
>Priority: Minor
> Fix For: 2.18.0
>
>  Time Spent: 20m
>  Remaining Estimate: 0h
>
> I was developing on some Beam branch, then I switched git branches to a 
> different branch. These two branches had different python_sdk_version 
> properties set [1], say version A and version B. When I run the Gradle task 
> :sdks:python:sdist, it fails with the error:
> Expected directory ... to contain exactly one file, however, it contains more 
> than one file.
> This happens because the tarball labeled version A is still present, along 
> with the new tarball for version B. I can get around this by cleaning the 
> build directory, but this is an extra step and is not immediately obvious. It 
> would be better if it just worked.
> [1] [https://github.com/ibzib/beam/blob/default-region/gradle.properties#L27]
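The failure mode above (a stale tarball from another branch sitting next to the new one) could be detected with a friendlier message than Gradle's "exactly one file" error; a hypothetical helper sketching that check (not Beam's actual build code):

```python
import pathlib
import tempfile

def single_tarball(dist_dir):
    """Return the unique sdist tarball in dist_dir, or raise an error
    that lists the leftovers, so the fix (cleaning the build directory)
    is immediately obvious."""
    tarballs = sorted(pathlib.Path(dist_dir).glob("*.tar.gz"))
    if len(tarballs) != 1:
        raise RuntimeError(
            "Expected exactly one sdist in %s, found %d: %s "
            "(stale artifacts from another branch? clean the build dir)"
            % (dist_dir, len(tarballs), [t.name for t in tarballs]))
    return tarballs[0]

with tempfile.TemporaryDirectory() as d:
    (pathlib.Path(d) / "beam-2.17.0.dev.tar.gz").touch()
    print(single_tarball(d).name)  # beam-2.17.0.dev.tar.gz
```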





[jira] [Resolved] (BEAM-8254) (Java SDK) Add workerRegion and workerZone options

2019-11-04 Thread Kyle Weaver (Jira)


 [ 
https://issues.apache.org/jira/browse/BEAM-8254?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Kyle Weaver resolved BEAM-8254.
---
Fix Version/s: 2.18.0
   Resolution: Fixed

> (Java SDK) Add workerRegion and workerZone options
> --
>
> Key: BEAM-8254
> URL: https://issues.apache.org/jira/browse/BEAM-8254
> Project: Beam
>  Issue Type: Sub-task
>  Components: runner-dataflow, sdk-java-core
>Reporter: Kyle Weaver
>Assignee: Kyle Weaver
>Priority: Major
> Fix For: 2.18.0
>
>  Time Spent: 50m
>  Remaining Estimate: 0h
>






[jira] [Work logged] (BEAM-8472) Get default GCP region from gcloud

2019-11-04 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/BEAM-8472?focusedWorklogId=338431&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-338431
 ]

ASF GitHub Bot logged work on BEAM-8472:


Author: ASF GitHub Bot
Created on: 05/Nov/19 01:06
Start Date: 05/Nov/19 01:06
Worklog Time Spent: 10m 
  Work Description: ibzib commented on issue #9974: [BEAM-8472] Get default 
GCP region from gcloud (Java)
URL: https://github.com/apache/beam/pull/9974#issuecomment-549617602
 
 
   Run Java PreCommit
 



Issue Time Tracking
---

Worklog Id: (was: 338431)
Time Spent: 2h 10m  (was: 2h)

> Get default GCP region from gcloud
> --
>
> Key: BEAM-8472
> URL: https://issues.apache.org/jira/browse/BEAM-8472
> Project: Beam
>  Issue Type: Improvement
>  Components: runner-dataflow
>Reporter: Kyle Weaver
>Assignee: Kyle Weaver
>Priority: Major
>  Time Spent: 2h 10m
>  Remaining Estimate: 0h
>
> Currently, we default to us-central1 if --region flag is not set. The Google 
> Cloud SDK generally tries to get a default value in this case for 
> convenience, which we should follow. 
> [https://cloud.google.com/compute/docs/gcloud-compute/#order_of_precedence_for_default_properties]
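The order of precedence linked above (explicit setting, then environment, then gcloud config, then a hard-coded fallback) can be sketched as follows. This is an illustrative sketch, not Beam's implementation; `CLOUDSDK_COMPUTE_REGION` is gcloud's standard environment-variable override for the `compute/region` property, and the final fallback mirrors Beam's current hard-coded default:

```python
import os
import shutil
import subprocess

def default_gcp_region():
    """Resolve a default region following gcloud's order of precedence."""
    # 1. Environment variable override recognized by the Cloud SDK.
    env = os.environ.get("CLOUDSDK_COMPUTE_REGION")
    if env:
        return env
    # 2. Ask gcloud for its configured default, if gcloud is installed.
    if shutil.which("gcloud"):
        out = subprocess.run(
            ["gcloud", "config", "get-value", "compute/region"],
            capture_output=True, text=True)
        region = out.stdout.strip()
        if out.returncode == 0 and region and region != "(unset)":
            return region
    # 3. Fall back to the current hard-coded default.
    return "us-central1"
```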





[jira] [Work logged] (BEAM-3288) Guard against unsafe triggers at construction time

2019-11-04 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/BEAM-3288?focusedWorklogId=338428&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-338428
 ]

ASF GitHub Bot logged work on BEAM-3288:


Author: ASF GitHub Bot
Created on: 05/Nov/19 00:58
Start Date: 05/Nov/19 00:58
Worklog Time Spent: 10m 
  Work Description: kennknowles commented on pull request #9960: 
[BEAM-3288] Guard against unsafe triggers at construction time
URL: https://github.com/apache/beam/pull/9960#discussion_r342342429
 
 

 ##
 File path: 
sdks/java/core/src/main/java/org/apache/beam/sdk/transforms/GroupByKey.java
 ##
 @@ -162,6 +164,44 @@ public static void applicableTo(PCollection input) {
   throw new IllegalStateException(
   "GroupByKey must have a valid Window merge function.  " + "Invalid 
because: " + cause);
 }
+
+// Validate that the trigger does not finish before garbage collection time
+if (!triggerIsSafe(windowingStrategy)) {
+  throw new IllegalArgumentException(
+  String.format(
+  "Unsafe trigger may finish and drop remaining data: %s",
 
 Review comment:
   Interesting idea. Yes, I can make a world-readable doc that is a short 
summary of the issue and why this is disabled and how to contact us for those 
rare cases where someone wants it back. We could give them a pipeline option. 
But I would wait until someone asks for it, to avoid implementing something 
dangerous.
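The Java check in the diff above, plus the pipeline-option escape hatch the reviewers discuss, could be sketched roughly like this (illustrative Python, not Beam's actual code; `allow_unsafe_triggers` is a hypothetical option name):

```python
class UnsafeTriggerError(ValueError):
    pass

def validate_trigger(trigger_may_finish, trigger_description,
                     allow_unsafe_triggers=False):
    """Construction-time guard: reject triggers that can finish and
    silently drop remaining data, unless the user explicitly opts in."""
    if trigger_may_finish and not allow_unsafe_triggers:
        raise UnsafeTriggerError(
            "Unsafe trigger may finish and drop remaining data: %s"
            % trigger_description)

# A non-finishing trigger passes the guard.
validate_trigger(False, "Repeatedly.forever(AfterWatermark.pastEndOfWindow())")
# A terminating trigger is rejected at construction time.
try:
    validate_trigger(True, "AfterPane.elementCountAtLeast(1)")
except UnsafeTriggerError as e:
    print(e)
```

Failing at pipeline construction, rather than dropping data at runtime, matches the "never, ever, ever do this" guidance quoted in the issue.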
 



Issue Time Tracking
---

Worklog Id: (was: 338428)
Time Spent: 1h  (was: 50m)

> Guard against unsafe triggers at construction time 
> ---
>
> Key: BEAM-3288
> URL: https://issues.apache.org/jira/browse/BEAM-3288
> Project: Beam
>  Issue Type: Task
>  Components: sdk-java-core, sdk-py-core
>Reporter: Eugene Kirpichov
>Assignee: Kenneth Knowles
>Priority: Major
>  Time Spent: 1h
>  Remaining Estimate: 0h
>
> Current Beam trigger semantics are rather confusing and in some cases 
> extremely unsafe, especially if the pipeline includes multiple chained GBKs. 
> One example of that is https://issues.apache.org/jira/browse/BEAM-3169 .
> There's multiple issues:
> The API allows users to specify terminating top-level triggers (e.g. "trigger 
> a pane after receiving 1 element in the window, and that's it"), but 
> experience from user support shows that this is nearly always a mistake and 
> the user did not intend to drop all further data.
> In general, triggers are the only place in Beam where data is being dropped 
> without making a lot of very loud noise about it - a practice for which the 
> PTransform style guide uses the language: "never, ever, ever do this".
> Continuation triggers are still worse. For context: continuation trigger is 
> the trigger that's set on the output of a GBK and controls further 
> aggregation of the results of this aggregation by downstream GBKs. The output 
> shouldn't just use the same trigger as the input, because e.g. if the input 
> trigger said "wait for an hour before emitting a pane", that doesn't mean 
> that we should wait for another hour before emitting a result of aggregating 
> the result of the input trigger. Continuation triggers try to simulate the 
> behavior "as if a pane of the input propagated through the entire pipeline", 
> but the implementation of individual continuation triggers doesn't do that. 
> E.g. the continuation of "first N elements in pane" trigger is "first 1 
> element in pane", and if the results of a first GBK are further grouped by a 
> second GBK onto a coarser key (e.g. if everything is grouped onto the same 
> key), that effectively means that, of the keys of the first GBK, only one 
> survives and all others are dropped (what happened in the data loss bug).
> The ultimate fix to all of these things is 
> https://s.apache.org/beam-sink-triggers . However, it is a huge model change, 
> and meanwhile we have to do something. The options are, in order of 
> increasing backward incompatibility (but incompatibility in a "rejecting 
> something that previously was accepted but extremely dangerous" kind of way):
> - Make the continuation trigger of most triggers be the "always-fire" 
> trigger. Seems that this should be the case for all triggers except the 
> watermark trigger. This will definitely increase safety, but lead to more 
> eager firing of downstream aggregations. It also will violate a user's 
> expectation that a fire-once trigger fires everything downstream only once, 
> but that expectation appears impossible to satisfy safely.

[jira] [Work logged] (BEAM-8508) [SQL] Support predicate push-down without project push-down

2019-11-04 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/BEAM-8508?focusedWorklogId=338429&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-338429
 ]

ASF GitHub Bot logged work on BEAM-8508:


Author: ASF GitHub Bot
Created on: 05/Nov/19 00:58
Start Date: 05/Nov/19 00:58
Worklog Time Spent: 10m 
  Work Description: 11moon11 commented on pull request #9943: [BEAM-8508] 
[SQL] Standalone filter push down
URL: https://github.com/apache/beam/pull/9943#discussion_r342342507
 
 

 ##
 File path: 
sdks/java/extensions/sql/src/main/java/org/apache/beam/sdk/extensions/sql/impl/rule/BeamIOPushDownRule.java
 ##
 @@ -124,11 +126,20 @@ public void onMatch(RelOptRuleCall call) {
 RelDataType calcInputType =
 CalciteUtils.toCalciteRowType(newSchema, 
ioSourceRel.getCluster().getTypeFactory());
 
+// TODO: Check if an IO supports field reordering and drop a Calc when it 
does (1).
 // Check if the calc can be dropped:
-// 1. Calc only does projects and renames.
+// 1. Calc only does projects and renames of fields in the same order.
 //And
 // 2. Predicate can be completely pushed-down to IO level.
-if (isProjectRenameOnlyProgram(program) && 
tableFilter.getNotSupported().isEmpty()) {
+//And
+// 3. And IO supports project push-down OR all fields are projected by a 
Calc.
+if (isProjectRenameOnlyProgram(program, beamSqlTable.supportsProjects())
 
 Review comment:
   Another case where `Calc` should not be dropped is when an IO does not 
support field reordering and the fields are not projected in the same order 
they are listed in the IO schema.
   This was not caught by the tests, because the TestTable provider uses 
Select, which does the field reordering for us.
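The ordering condition discussed above can be sketched as a simple subsequence check. This is a hedged, hypothetical helper (not the actual BeamIOPushDownRule code): if the IO does not support field reordering, the Calc may only be dropped when the projected fields appear in the same relative order as in the IO schema.

```python
def projects_in_schema_order(projected_fields, schema_fields):
    """True iff projected_fields preserves the IO schema's field order,
    i.e. the projection indices are non-decreasing."""
    positions = {name: i for i, name in enumerate(schema_fields)}
    indices = [positions[field] for field in projected_fields]
    return indices == sorted(indices)

schema = ["id", "name", "price", "quantity"]
in_order = projects_in_schema_order(["id", "price"], schema)
reordered = projects_in_schema_order(["price", "id"], schema)
```

Here `in_order` is true (the Calc could be dropped) while `reordered` is false (the Calc must be kept to perform the reordering).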
 

This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 338429)
Time Spent: 2h 10m  (was: 2h)

> [SQL] Support predicate push-down without project push-down
> ---
>
> Key: BEAM-8508
> URL: https://issues.apache.org/jira/browse/BEAM-8508
> Project: Beam
>  Issue Type: Improvement
>  Components: dsl-sql
>Reporter: Kirill Kozlov
>Assignee: Kirill Kozlov
>Priority: Major
>  Time Spent: 2h 10m
>  Remaining Estimate: 0h
>
> In this PR: [https://github.com/apache/beam/pull/9863]
> Support for Predicate push-down is added, but only for IOs that support 
> project push-down.
> In order to accomplish that some checks need to be added to not perform 
> certain Calc and IO manipulations when only filter push-down is needed.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Work logged] (BEAM-8254) (Java SDK) Add workerRegion and workerZone options

2019-11-04 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/BEAM-8254?focusedWorklogId=338427&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-338427
 ]

ASF GitHub Bot logged work on BEAM-8254:


Author: ASF GitHub Bot
Created on: 05/Nov/19 00:52
Start Date: 05/Nov/19 00:52
Worklog Time Spent: 10m 
  Work Description: ibzib commented on pull request #9961: [BEAM-8254] add 
workerRegion and workerZone options to Java SDK
URL: https://github.com/apache/beam/pull/9961
 
 
   
 

This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 338427)
Time Spent: 50m  (was: 40m)

> (Java SDK) Add workerRegion and workerZone options
> --
>
> Key: BEAM-8254
> URL: https://issues.apache.org/jira/browse/BEAM-8254
> Project: Beam
>  Issue Type: Sub-task
>  Components: runner-dataflow, sdk-java-core
>Reporter: Kyle Weaver
>Assignee: Kyle Weaver
>Priority: Major
>  Time Spent: 50m
>  Remaining Estimate: 0h
>




--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Commented] (BEAM-8547) Portable Wordcount fails on standalone Flink cluster

2019-11-04 Thread Kyle Weaver (Jira)


[ 
https://issues.apache.org/jira/browse/BEAM-8547?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16967121#comment-16967121
 ] 

Kyle Weaver commented on BEAM-8547:
---

Because of the basically transient/fungible nature of Dockerized SDK workers 
(as well as the fact that in "real" use cases, processing would usually be 
distributed), we generally recommend against directly using their local 
filesystems (e.g. BEAM-8396).

In this particular case, it looks like we're starting up 3 Docker worker 
containers, so naturally they would not share the same filesystem. What I'm not 
entirely sure of is why there are multiple workers given these pipeline 
options. (Specifically, there are 3 worker factories [1], and we expect only 
one to exist per classloader per job.)

[1] 
[https://github.com/apache/beam/blob/6c27e3dd76c24453c94f789aa96610d58f4ca6de/runners/flink/src/main/java/org/apache/beam/runners/flink/translation/functions/FlinkExecutableStageContextFactory.java#L39]

> Portable Wordcount fails on standalone Flink cluster 
> -
>
> Key: BEAM-8547
> URL: https://issues.apache.org/jira/browse/BEAM-8547
> Project: Beam
>  Issue Type: Bug
>  Components: runner-flink, sdk-py-harness
>Reporter: Valentyn Tymofieiev
>Priority: Major
>
> Repro:
>  # git checkout origin/release-2.16.0
>  # ./flink-1.8.2/bin/start-cluster.sh
>  # gradlew :runners:flink:1.8:job-server:runShadow 
> -PflinkMasterUrl=localhost:8081
>  # python -m apache_beam.examples.wordcount --input=/etc/profile 
> --output=/tmp/py-wordcount-direct --runner=PortableRunner 
> --experiments=worker_threads=100 --parallelism=1 
> --shutdown_sources_on_final_watermark --sdk_worker_parallelism=1 
> --environment_cache_millis=6 --job_endpoint=localhost:8099
> This causes the runner to crash with:
> {noformat}
> Traceback (most recent call last):
>   File 
> "/usr/local/lib/python3.7/site-packages/apache_beam/runners/worker/sdk_worker.py",
>  line 158, in _execute
> response = task()
>   File 
> "/usr/local/lib/python3.7/site-packages/apache_beam/runners/worker/sdk_worker.py",
>  line 191, in 
> self._execute(lambda: worker.do_instruction(work), work)
>   File 
> "/usr/local/lib/python3.7/site-packages/apache_beam/runners/worker/sdk_worker.py",
>  line 343, in do_instruction
> request.instruction_id)
>   File 
> "/usr/local/lib/python3.7/site-packages/apache_beam/runners/worker/sdk_worker.py",
>  line 369, in process_bundle
> bundle_processor.process_bundle(instruction_id))
>   File 
> "/usr/local/lib/python3.7/site-packages/apache_beam/runners/worker/bundle_processor.py",
>  line 663, in process_bundle
> data.ptransform_id].process_encoded(data.data)
>   File 
> "/usr/local/lib/python3.7/site-packages/apache_beam/runners/worker/bundle_processor.py",
>  line 143, in process_encoded
> self.output(decoded_value)
>   File "apache_beam/runners/worker/operations.py", line 255, in 
> apache_beam.runners.worker.operations.Operation.output
>   File "apache_beam/runners/worker/operations.py", line 256, in 
> apache_beam.runners.worker.operations.Operation.output
>   File "apache_beam/runners/worker/operations.py", line 143, in 
> apache_beam.runners.worker.operations.SingletonConsumerSet.receive
>   File "apache_beam/runners/worker/operations.py", line 593, in 
> apache_beam.runners.worker.operations.DoOperation.process
>   File "apache_beam/runners/worker/operations.py", line 594, in 
> apache_beam.runners.worker.operations.DoOperation.process
>   File "apache_beam/runners/common.py", line 776, in 
> apache_beam.runners.common.DoFnRunner.receive
>   File "apache_beam/runners/common.py", line 782, in 
> apache_beam.runners.common.DoFnRunner.process
>   File "apache_beam/runners/common.py", line 849, in 
> apache_beam.runners.common.DoFnRunner._reraise_augmented
>   File "/usr/local/lib/python3.7/site-packages/future/utils/__init__.py", 
> line 421, in raise_with_traceback
> raise exc.with_traceback(traceback)
>   File "apache_beam/runners/common.py", line 780, in 
> apache_beam.runners.common.DoFnRunner.process
>   File "apache_beam/runners/common.py", line 587, in 
> apache_beam.runners.common.PerWindowInvoker.invoke_process
>   File "apache_beam/runners/common.py", line 660, in 
> apache_beam.runners.common.PerWindowInvoker._invoke_process_per_window
>   File "/usr/local/lib/python3.7/site-packages/apache_beam/io/iobase.py", 
> line 1042, in process
> self.writer = self.sink.open_writer(init_result, str(uuid.uuid4()))
>   File 
> "/usr/local/lib/python3.7/site-packages/apache_beam/options/value_provider.py",
>  line 137, in _f
> return fnc(self, *args, **kwargs)
>   File 
> "/usr/local/lib/python3.7/site-packages/apache_beam/io/filebasedsink.py", 
> line 186, in open_writer
> return FileBasedSinkWriter(self, writer_path)
>   

[jira] [Work logged] (BEAM-8508) [SQL] Support predicate push-down without project push-down

2019-11-04 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/BEAM-8508?focusedWorklogId=338421&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-338421
 ]

ASF GitHub Bot logged work on BEAM-8508:


Author: ASF GitHub Bot
Created on: 05/Nov/19 00:10
Start Date: 05/Nov/19 00:10
Worklog Time Spent: 10m 
  Work Description: 11moon11 commented on pull request #9943: [BEAM-8508] 
[SQL] Standalone filter push down
URL: https://github.com/apache/beam/pull/9943#discussion_r342331288
 
 

 ##
 File path: 
sdks/java/extensions/sql/src/main/java/org/apache/beam/sdk/extensions/sql/impl/rule/BeamIOPushDownRule.java
 ##
 @@ -123,7 +129,12 @@ public void onMatch(RelOptRuleCall call) {
 // 1. Calc only does projects and renames.
 //And
 // 2. Predicate can be completely pushed-down to IO level.
-if (isProjectRenameOnlyProgram(program) && 
tableFilter.getNotSupported().isEmpty()) {
+//And
+// 3. And IO supports project push-down OR all fields are projected by a 
Calc.
+if (isProjectRenameOnlyProgram(program)
+&& tableFilter.getNotSupported().isEmpty()
+&& (beamSqlTable.supportsProjects()
+|| calc.getRowType().getFieldCount() == 
calcInputRowType.getFieldCount())) {
 
 Review comment:
   Still working on checking the order, do not merge yet.
 

This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 338421)
Time Spent: 2h  (was: 1h 50m)

> [SQL] Support predicate push-down without project push-down
> ---
>
> Key: BEAM-8508
> URL: https://issues.apache.org/jira/browse/BEAM-8508
> Project: Beam
>  Issue Type: Improvement
>  Components: dsl-sql
>Reporter: Kirill Kozlov
>Assignee: Kirill Kozlov
>Priority: Major
>  Time Spent: 2h
>  Remaining Estimate: 0h
>
> In this PR: [https://github.com/apache/beam/pull/9863]
> Support for Predicate push-down is added, but only for IOs that support 
> project push-down.
> In order to accomplish that some checks need to be added to not perform 
> certain Calc and IO manipulations when only filter push-down is needed.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Work logged] (BEAM-8504) BigQueryIO DIRECT_READ is broken

2019-11-04 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/BEAM-8504?focusedWorklogId=338413&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-338413
 ]

ASF GitHub Bot logged work on BEAM-8504:


Author: ASF GitHub Bot
Created on: 04/Nov/19 23:45
Start Date: 04/Nov/19 23:45
Worklog Time Spent: 10m 
  Work Description: chamikaramj commented on issue #9987: [BEAM-8504] Fix a 
bug related to zero-row responses
URL: https://github.com/apache/beam/pull/9987#issuecomment-549598378
 
 
   LGTM. Thanks.
 

This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 338413)
Time Spent: 1h 20m  (was: 1h 10m)

> BigQueryIO DIRECT_READ is broken
> 
>
> Key: BEAM-8504
> URL: https://issues.apache.org/jira/browse/BEAM-8504
> Project: Beam
>  Issue Type: Bug
>  Components: io-java-gcp
>Affects Versions: 2.16.0, 2.17.0
>Reporter: Gleb Kanterov
>Assignee: Aryan Naraghi
>Priority: Major
> Fix For: 2.17.0
>
>  Time Spent: 1h 20m
>  Remaining Estimate: 0h
>
> The issue is reproducible with 2.16.0, 2.17.0 candidate and 2.18.0-SNAPSHOT 
> (as of d96c6b21a8a95b01944016584bc8e4ad1ab5f6a6), and not reproducible with 
> 2.15.0.
> {code}
> java.io.IOException: Failed to start reading from source: name: 
> "projects//locations/eu/streams/"
>   at 
> org.apache.beam.runners.dataflow.worker.WorkerCustomSources$BoundedReaderIterator.start(WorkerCustomSources.java:604)
>   at 
> org.apache.beam.runners.dataflow.worker.util.common.worker.ReadOperation$SynchronizedReaderIterator.start(ReadOperation.java:361)
>   at 
> org.apache.beam.runners.dataflow.worker.util.common.worker.ReadOperation.runReadLoop(ReadOperation.java:194)
>   at 
> org.apache.beam.runners.dataflow.worker.util.common.worker.ReadOperation.start(ReadOperation.java:159)
>   at 
> org.apache.beam.runners.dataflow.worker.util.common.worker.MapTaskExecutor.execute(MapTaskExecutor.java:77)
>   at 
> org.apache.beam.runners.dataflow.worker.BatchDataflowWorker.executeWork(BatchDataflowWorker.java:411)
>   at 
> org.apache.beam.runners.dataflow.worker.BatchDataflowWorker.doWork(BatchDataflowWorker.java:380)
>   at 
> org.apache.beam.runners.dataflow.worker.BatchDataflowWorker.getAndPerformWork(BatchDataflowWorker.java:305)
>   at 
> org.apache.beam.runners.dataflow.worker.DataflowBatchWorkerHarness$WorkerThread.doWork(DataflowBatchWorkerHarness.java:140)
>   at 
> org.apache.beam.runners.dataflow.worker.DataflowBatchWorkerHarness$WorkerThread.call(DataflowBatchWorkerHarness.java:120)
>   at 
> org.apache.beam.runners.dataflow.worker.DataflowBatchWorkerHarness$WorkerThread.call(DataflowBatchWorkerHarness.java:107)
>   at java.util.concurrent.FutureTask.run(FutureTask.java:266)
>   at 
> java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
>   at 
> java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
>   at java.lang.Thread.run(Thread.java:745)
> Caused by: java.lang.IllegalArgumentException: Fraction consumed from 
> previous response (0.0) is not less than fraction consumed from current 
> response (0.0).
>   at 
> org.apache.beam.vendor.guava.v26_0_jre.com.google.common.base.Preconditions.checkArgument(Preconditions.java:440)
>   at 
> org.apache.beam.sdk.io.gcp.bigquery.BigQueryStorageStreamSource$BigQueryStorageStreamReader.readNextRecord(BigQueryStorageStreamSource.java:243)
>   at 
> org.apache.beam.sdk.io.gcp.bigquery.BigQueryStorageStreamSource$BigQueryStorageStreamReader.start(BigQueryStorageStreamSource.java:206)
>   at 
> org.apache.beam.runners.dataflow.worker.WorkerCustomSources$BoundedReaderIterator.start(WorkerCustomSources.java:601)
>   ... 14 more
> {code}



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Work logged] (BEAM-8504) BigQueryIO DIRECT_READ is broken

2019-11-04 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/BEAM-8504?focusedWorklogId=338412&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-338412
 ]

ASF GitHub Bot logged work on BEAM-8504:


Author: ASF GitHub Bot
Created on: 04/Nov/19 23:43
Start Date: 04/Nov/19 23:43
Worklog Time Spent: 10m 
  Work Description: aryann commented on pull request #9987: [BEAM-8504] Fix 
a bug related to zero-row responses
URL: https://github.com/apache/beam/pull/9987#discussion_r342324713
 
 

 ##
 File path: 
sdks/java/io/google-cloud-platform/src/main/java/org/apache/beam/sdk/io/gcp/bigquery/BigQueryStorageStreamSource.java
 ##
 @@ -233,17 +233,17 @@ private synchronized boolean readNextRecord() throws 
IOException {
 fractionConsumedFromCurrentResponse = 
getFractionConsumed(currentResponse);
 
 Preconditions.checkArgument(
-totalRowCountFromCurrentResponse > 0L,
-"Row count from current response (%s) must be greater than one.",
+totalRowCountFromCurrentResponse >= 0L,
+"Row count from current response (%s) must be greater than or 
equal to one.",
 totalRowCountFromCurrentResponse);
 Preconditions.checkArgument(
 0f <= fractionConsumedFromCurrentResponse && 
fractionConsumedFromCurrentResponse <= 1f,
 "Fraction consumed from current response (%s) is not in the range 
[0.0, 1.0].",
 fractionConsumedFromCurrentResponse);
 Preconditions.checkArgument(
-fractionConsumedFromPreviousResponse < 
fractionConsumedFromCurrentResponse,
-"Fraction consumed from previous response (%s) is not less than 
fraction consumed "
-+ "from current response (%s).",
+fractionConsumedFromPreviousResponse <= 
fractionConsumedFromCurrentResponse,
+"Fraction consumed from previous response (%s) is not less than or 
equal to the "
 
 Review comment:
   Done.
 

This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 338412)
Time Spent: 1h 10m  (was: 1h)

> BigQueryIO DIRECT_READ is broken
> 
>
> Key: BEAM-8504
> URL: https://issues.apache.org/jira/browse/BEAM-8504
> Project: Beam
>  Issue Type: Bug
>  Components: io-java-gcp
>Affects Versions: 2.16.0, 2.17.0
>Reporter: Gleb Kanterov
>Assignee: Aryan Naraghi
>Priority: Major
> Fix For: 2.17.0
>
>  Time Spent: 1h 10m
>  Remaining Estimate: 0h
>
> The issue is reproducible with 2.16.0, 2.17.0 candidate and 2.18.0-SNAPSHOT 
> (as of d96c6b21a8a95b01944016584bc8e4ad1ab5f6a6), and not reproducible with 
> 2.15.0.
> {code}
> java.io.IOException: Failed to start reading from source: name: 
> "projects//locations/eu/streams/"
>   at 
> org.apache.beam.runners.dataflow.worker.WorkerCustomSources$BoundedReaderIterator.start(WorkerCustomSources.java:604)
>   at 
> org.apache.beam.runners.dataflow.worker.util.common.worker.ReadOperation$SynchronizedReaderIterator.start(ReadOperation.java:361)
>   at 
> org.apache.beam.runners.dataflow.worker.util.common.worker.ReadOperation.runReadLoop(ReadOperation.java:194)
>   at 
> org.apache.beam.runners.dataflow.worker.util.common.worker.ReadOperation.start(ReadOperation.java:159)
>   at 
> org.apache.beam.runners.dataflow.worker.util.common.worker.MapTaskExecutor.execute(MapTaskExecutor.java:77)
>   at 
> org.apache.beam.runners.dataflow.worker.BatchDataflowWorker.executeWork(BatchDataflowWorker.java:411)
>   at 
> org.apache.beam.runners.dataflow.worker.BatchDataflowWorker.doWork(BatchDataflowWorker.java:380)
>   at 
> org.apache.beam.runners.dataflow.worker.BatchDataflowWorker.getAndPerformWork(BatchDataflowWorker.java:305)
>   at 
> org.apache.beam.runners.dataflow.worker.DataflowBatchWorkerHarness$WorkerThread.doWork(DataflowBatchWorkerHarness.java:140)
>   at 
> org.apache.beam.runners.dataflow.worker.DataflowBatchWorkerHarness$WorkerThread.call(DataflowBatchWorkerHarness.java:120)
>   at 
> org.apache.beam.runners.dataflow.worker.DataflowBatchWorkerHarness$WorkerThread.call(DataflowBatchWorkerHarness.java:107)
>   at java.util.concurrent.FutureTask.run(FutureTask.java:266)
>   at 
> java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
>   at 
> java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
>   at java.lang.Thread.run(Thread.java:745)
> Caused by: 

[jira] [Work logged] (BEAM-8504) BigQueryIO DIRECT_READ is broken

2019-11-04 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/BEAM-8504?focusedWorklogId=338411&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-338411
 ]

ASF GitHub Bot logged work on BEAM-8504:


Author: ASF GitHub Bot
Created on: 04/Nov/19 23:42
Start Date: 04/Nov/19 23:42
Worklog Time Spent: 10m 
  Work Description: aryann commented on pull request #9987: [BEAM-8504] Fix 
a bug related to zero-row responses
URL: https://github.com/apache/beam/pull/9987#discussion_r342324695
 
 

 ##
 File path: 
sdks/java/io/google-cloud-platform/src/main/java/org/apache/beam/sdk/io/gcp/bigquery/BigQueryStorageStreamSource.java
 ##
 @@ -233,17 +233,17 @@ private synchronized boolean readNextRecord() throws 
IOException {
 fractionConsumedFromCurrentResponse = 
getFractionConsumed(currentResponse);
 
 Preconditions.checkArgument(
-totalRowCountFromCurrentResponse > 0L,
-"Row count from current response (%s) must be greater than one.",
+totalRowCountFromCurrentResponse >= 0L,
+"Row count from current response (%s) must be greater than or 
equal to one.",
 
 Review comment:
   And this is why we do code reviews! Done.
 

This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 338411)
Time Spent: 1h  (was: 50m)

> BigQueryIO DIRECT_READ is broken
> 
>
> Key: BEAM-8504
> URL: https://issues.apache.org/jira/browse/BEAM-8504
> Project: Beam
>  Issue Type: Bug
>  Components: io-java-gcp
>Affects Versions: 2.16.0, 2.17.0
>Reporter: Gleb Kanterov
>Assignee: Aryan Naraghi
>Priority: Major
> Fix For: 2.17.0
>
>  Time Spent: 1h
>  Remaining Estimate: 0h
>
> The issue is reproducible with 2.16.0, 2.17.0 candidate and 2.18.0-SNAPSHOT 
> (as of d96c6b21a8a95b01944016584bc8e4ad1ab5f6a6), and not reproducible with 
> 2.15.0.
> {code}
> java.io.IOException: Failed to start reading from source: name: 
> "projects//locations/eu/streams/"
>   at 
> org.apache.beam.runners.dataflow.worker.WorkerCustomSources$BoundedReaderIterator.start(WorkerCustomSources.java:604)
>   at 
> org.apache.beam.runners.dataflow.worker.util.common.worker.ReadOperation$SynchronizedReaderIterator.start(ReadOperation.java:361)
>   at 
> org.apache.beam.runners.dataflow.worker.util.common.worker.ReadOperation.runReadLoop(ReadOperation.java:194)
>   at 
> org.apache.beam.runners.dataflow.worker.util.common.worker.ReadOperation.start(ReadOperation.java:159)
>   at 
> org.apache.beam.runners.dataflow.worker.util.common.worker.MapTaskExecutor.execute(MapTaskExecutor.java:77)
>   at 
> org.apache.beam.runners.dataflow.worker.BatchDataflowWorker.executeWork(BatchDataflowWorker.java:411)
>   at 
> org.apache.beam.runners.dataflow.worker.BatchDataflowWorker.doWork(BatchDataflowWorker.java:380)
>   at 
> org.apache.beam.runners.dataflow.worker.BatchDataflowWorker.getAndPerformWork(BatchDataflowWorker.java:305)
>   at 
> org.apache.beam.runners.dataflow.worker.DataflowBatchWorkerHarness$WorkerThread.doWork(DataflowBatchWorkerHarness.java:140)
>   at 
> org.apache.beam.runners.dataflow.worker.DataflowBatchWorkerHarness$WorkerThread.call(DataflowBatchWorkerHarness.java:120)
>   at 
> org.apache.beam.runners.dataflow.worker.DataflowBatchWorkerHarness$WorkerThread.call(DataflowBatchWorkerHarness.java:107)
>   at java.util.concurrent.FutureTask.run(FutureTask.java:266)
>   at 
> java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
>   at 
> java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
>   at java.lang.Thread.run(Thread.java:745)
> Caused by: java.lang.IllegalArgumentException: Fraction consumed from 
> previous response (0.0) is not less than fraction consumed from current 
> response (0.0).
>   at 
> org.apache.beam.vendor.guava.v26_0_jre.com.google.common.base.Preconditions.checkArgument(Preconditions.java:440)
>   at 
> org.apache.beam.sdk.io.gcp.bigquery.BigQueryStorageStreamSource$BigQueryStorageStreamReader.readNextRecord(BigQueryStorageStreamSource.java:243)
>   at 
> org.apache.beam.sdk.io.gcp.bigquery.BigQueryStorageStreamSource$BigQueryStorageStreamReader.start(BigQueryStorageStreamSource.java:206)
>   at 
> org.apache.beam.runners.dataflow.worker.WorkerCustomSources$BoundedReaderIterator.start(WorkerCustomSources.java:601)
>   ... 14 more
> {code}



--
This message was sent by 

[jira] [Work logged] (BEAM-8472) Get default GCP region from gcloud

2019-11-04 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/BEAM-8472?focusedWorklogId=338410&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-338410
 ]

ASF GitHub Bot logged work on BEAM-8472:


Author: ASF GitHub Bot
Created on: 04/Nov/19 23:42
Start Date: 04/Nov/19 23:42
Worklog Time Spent: 10m 
  Work Description: ibzib commented on pull request #9974: [BEAM-8472] Get 
default GCP region from gcloud (Java)
URL: https://github.com/apache/beam/pull/9974#discussion_r342324597
 
 

 ##
 File path: 
runners/google-cloud-dataflow-java/src/main/java/org/apache/beam/runners/dataflow/DataflowRunner.java
 ##
 @@ -357,6 +357,50 @@ public static DataflowRunner fromOptions(PipelineOptions 
options) {
 return new DataflowRunner(dataflowOptions);
   }
 
+  /**
+   * Get a default value for Google Cloud region according to
+   * https://cloud.google.com/compute/docs/gcloud-compute/#default-properties. 
If no other default
+   * can be found, returns "us-central1".
+   */
+  static String getDefaultGcpRegion() {
+String environmentRegion = System.getenv("CLOUDSDK_COMPUTE_REGION");
+if (environmentRegion != null && !environmentRegion.isEmpty()) {
+  LOG.info("Using default GCP region {} from $CLOUDSDK_COMPUTE_REGION", 
environmentRegion);
+  return environmentRegion;
+}
+try {
+  ProcessBuilder pb =
+  new ProcessBuilder(Arrays.asList("gcloud", "config", "get-value", 
"compute/region"));
+  Process process = pb.start();
+  BufferedReader reader =
+  new BufferedReader(
+  new InputStreamReader(process.getInputStream(), 
StandardCharsets.UTF_8));
+  BufferedReader errorReader =
+  new BufferedReader(
+  new InputStreamReader(process.getErrorStream(), 
StandardCharsets.UTF_8));
+  process.waitFor(1, TimeUnit.SECONDS);
 
 Review comment:
   > Is 1 second enough?
   
   Tests on my machine (in debug mode) indicated it took on average around 0.25 
seconds, including reading the output stream after. I have bumped the timeout 
to 2s just to make double sure.
   
   > Should we check the return value of this call?
   
   Not sure it matters, since I think it should interrupt the process after the 
time limit has passed, but it couldn't hurt to check anyway.
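The approach under discussion (the actual change is in Java's DataflowRunner) can be sketched in Python: prefer `$CLOUDSDK_COMPUTE_REGION`, otherwise shell out to gcloud with a timeout and check the exit code, falling back to "us-central1". This is a hedged sketch; the `cmd` parameter is an assumption added so the example can be exercised without gcloud installed.

```python
import os
import subprocess

def get_default_gcp_region(cmd=("gcloud", "config", "get-value", "compute/region"),
                           timeout_s=2.0):
    """Return a default GCP region, mirroring the precedence discussed:
    environment variable, then gcloud config, then a hard-coded default."""
    env_region = os.environ.get("CLOUDSDK_COMPUTE_REGION")
    if env_region:
        return env_region
    try:
        result = subprocess.run(cmd, capture_output=True, text=True,
                                timeout=timeout_s)
        # Check the exit code as well as the output, per the review comment.
        if result.returncode == 0 and result.stdout.strip():
            return result.stdout.strip()
    except (OSError, subprocess.TimeoutExpired):
        pass  # gcloud missing or too slow: fall through to the default
    return "us-central1"
```

Note that Python's `subprocess.run` raises `TimeoutExpired` on timeout, whereas Java's `Process.waitFor(timeout, unit)` simply returns `false`, which is why checking the return value matters there.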
 

This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 338410)
Time Spent: 2h  (was: 1h 50m)

> Get default GCP region from gcloud
> --
>
> Key: BEAM-8472
> URL: https://issues.apache.org/jira/browse/BEAM-8472
> Project: Beam
>  Issue Type: Improvement
>  Components: runner-dataflow
>Reporter: Kyle Weaver
>Assignee: Kyle Weaver
>Priority: Major
>  Time Spent: 2h
>  Remaining Estimate: 0h
>
> Currently, we default to us-central1 if --region flag is not set. The Google 
> Cloud SDK generally tries to get a default value in this case for 
> convenience, which we should follow. 
> [https://cloud.google.com/compute/docs/gcloud-compute/#order_of_precedence_for_default_properties]



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Work logged] (BEAM-8504) BigQueryIO DIRECT_READ is broken

2019-11-04 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/BEAM-8504?focusedWorklogId=338406&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-338406
 ]

ASF GitHub Bot logged work on BEAM-8504:


Author: ASF GitHub Bot
Created on: 04/Nov/19 23:23
Start Date: 04/Nov/19 23:23
Worklog Time Spent: 10m 
  Work Description: chamikaramj commented on pull request #9987: 
[BEAM-8504] Fix a bug related to zero-row responses
URL: https://github.com/apache/beam/pull/9987#discussion_r342317700
 
 

 ##
 File path: 
sdks/java/io/google-cloud-platform/src/main/java/org/apache/beam/sdk/io/gcp/bigquery/BigQueryStorageStreamSource.java
 ##
 @@ -233,17 +233,17 @@ private synchronized boolean readNextRecord() throws 
IOException {
 fractionConsumedFromCurrentResponse = 
getFractionConsumed(currentResponse);
 
 Preconditions.checkArgument(
-totalRowCountFromCurrentResponse > 0L,
-"Row count from current response (%s) must be greater than one.",
+totalRowCountFromCurrentResponse >= 0L,
+"Row count from current response (%s) must be greater than or 
equal to one.",
 
 Review comment:
   Seems like message does not match with the condition ? Do you just want to 
make sure that the row count is not negative for some reason ?
 

This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 338406)
Time Spent: 40m  (was: 0.5h)

> BigQueryIO DIRECT_READ is broken
> 
>
> Key: BEAM-8504
> URL: https://issues.apache.org/jira/browse/BEAM-8504
> Project: Beam
>  Issue Type: Bug
>  Components: io-java-gcp
>Affects Versions: 2.16.0, 2.17.0
>Reporter: Gleb Kanterov
>Assignee: Aryan Naraghi
>Priority: Major
> Fix For: 2.17.0
>
>  Time Spent: 40m
>  Remaining Estimate: 0h
>
> The issue is reproducible with 2.16.0, 2.17.0 candidate and 2.18.0-SNAPSHOT 
> (as of d96c6b21a8a95b01944016584bc8e4ad1ab5f6a6), and not reproducible with 
> 2.15.0.
> {code}
> java.io.IOException: Failed to start reading from source: name: 
> "projects//locations/eu/streams/"
>   at 
> org.apache.beam.runners.dataflow.worker.WorkerCustomSources$BoundedReaderIterator.start(WorkerCustomSources.java:604)
>   at 
> org.apache.beam.runners.dataflow.worker.util.common.worker.ReadOperation$SynchronizedReaderIterator.start(ReadOperation.java:361)
>   at 
> org.apache.beam.runners.dataflow.worker.util.common.worker.ReadOperation.runReadLoop(ReadOperation.java:194)
>   at 
> org.apache.beam.runners.dataflow.worker.util.common.worker.ReadOperation.start(ReadOperation.java:159)
>   at 
> org.apache.beam.runners.dataflow.worker.util.common.worker.MapTaskExecutor.execute(MapTaskExecutor.java:77)
>   at 
> org.apache.beam.runners.dataflow.worker.BatchDataflowWorker.executeWork(BatchDataflowWorker.java:411)
>   at 
> org.apache.beam.runners.dataflow.worker.BatchDataflowWorker.doWork(BatchDataflowWorker.java:380)
>   at 
> org.apache.beam.runners.dataflow.worker.BatchDataflowWorker.getAndPerformWork(BatchDataflowWorker.java:305)
>   at 
> org.apache.beam.runners.dataflow.worker.DataflowBatchWorkerHarness$WorkerThread.doWork(DataflowBatchWorkerHarness.java:140)
>   at 
> org.apache.beam.runners.dataflow.worker.DataflowBatchWorkerHarness$WorkerThread.call(DataflowBatchWorkerHarness.java:120)
>   at 
> org.apache.beam.runners.dataflow.worker.DataflowBatchWorkerHarness$WorkerThread.call(DataflowBatchWorkerHarness.java:107)
>   at java.util.concurrent.FutureTask.run(FutureTask.java:266)
>   at 
> java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
>   at 
> java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
>   at java.lang.Thread.run(Thread.java:745)
> Caused by: java.lang.IllegalArgumentException: Fraction consumed from 
> previous response (0.0) is not less than fraction consumed from current 
> response (0.0).
>   at 
> org.apache.beam.vendor.guava.v26_0_jre.com.google.common.base.Preconditions.checkArgument(Preconditions.java:440)
>   at 
> org.apache.beam.sdk.io.gcp.bigquery.BigQueryStorageStreamSource$BigQueryStorageStreamReader.readNextRecord(BigQueryStorageStreamSource.java:243)
>   at 
> org.apache.beam.sdk.io.gcp.bigquery.BigQueryStorageStreamSource$BigQueryStorageStreamReader.start(BigQueryStorageStreamSource.java:206)
>   at 
> 

[jira] [Work logged] (BEAM-8504) BigQueryIO DIRECT_READ is broken

2019-11-04 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/BEAM-8504?focusedWorklogId=338407&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-338407
 ]

ASF GitHub Bot logged work on BEAM-8504:


Author: ASF GitHub Bot
Created on: 04/Nov/19 23:23
Start Date: 04/Nov/19 23:23
Worklog Time Spent: 10m 
  Work Description: chamikaramj commented on pull request #9987: 
[BEAM-8504] Fix a bug related to zero-row responses
URL: https://github.com/apache/beam/pull/9987#discussion_r342319543
 
 

 ##
 File path: 
sdks/java/io/google-cloud-platform/src/main/java/org/apache/beam/sdk/io/gcp/bigquery/BigQueryStorageStreamSource.java
 ##
 @@ -233,17 +233,17 @@ private synchronized boolean readNextRecord() throws 
IOException {
 fractionConsumedFromCurrentResponse = 
getFractionConsumed(currentResponse);
 
 Preconditions.checkArgument(
-totalRowCountFromCurrentResponse > 0L,
-"Row count from current response (%s) must be greater than one.",
+totalRowCountFromCurrentResponse >= 0L,
+"Row count from current response (%s) must be greater than or 
equal to zero.",
 totalRowCountFromCurrentResponse);
 Preconditions.checkArgument(
 0f <= fractionConsumedFromCurrentResponse && 
fractionConsumedFromCurrentResponse <= 1f,
 "Fraction consumed from current response (%s) is not in the range 
[0.0, 1.0].",
 fractionConsumedFromCurrentResponse);
 Preconditions.checkArgument(
-fractionConsumedFromPreviousResponse < 
fractionConsumedFromCurrentResponse,
-"Fraction consumed from previous response (%s) is not less than 
fraction consumed "
-+ "from current response (%s).",
+fractionConsumedFromPreviousResponse <= 
fractionConsumedFromCurrentResponse,
+"Fraction consumed from previous response (%s) is not less than or 
equal to the "
 
 Review comment:
   Probably re-word to "Fraction consumed from the current response has to be 
larger than or equal to the fraction consumed from the previous response. But 
received %s and %s respectively."
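   The relaxed precondition under review can be sketched as a tiny standalone 
class. This is an illustration only, not Beam code: the class and method 
names are invented, and the real logic lives in 
`BigQueryStorageStreamReader`. The point is that with zero-row responses the 
consumed fraction may stay flat between responses, so the comparison must be 
`<=` rather than `<`, while a decrease is still rejected.

```java
// Hypothetical sketch of the relaxed progress check (names invented;
// not the actual Beam implementation).
public class FractionProgress {
    private double previousFraction = 0.0;

    // Accepts the fraction consumed from the latest response. Equal
    // fractions (a zero-row response) are allowed; a decrease is not.
    public boolean advance(double currentFraction) {
        if (currentFraction < 0.0 || currentFraction > 1.0) {
            throw new IllegalArgumentException(
                "Fraction " + currentFraction + " is not in [0.0, 1.0].");
        }
        if (previousFraction > currentFraction) {
            throw new IllegalArgumentException(
                "Fraction consumed must not decrease: previous="
                    + previousFraction + ", current=" + currentFraction);
        }
        previousFraction = currentFraction;
        return true;
    }

    public static void main(String[] args) {
        FractionProgress p = new FractionProgress();
        p.advance(0.0); // first response reports no progress
        p.advance(0.0); // zero-row response: would have failed a strict "<" check
        p.advance(0.5); // normal forward progress
    }
}
```

   Under the original strict check, the second `advance(0.0)` call is exactly 
the case that produced the `IllegalArgumentException` in the reported stack 
traces.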
 

This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 338407)
Time Spent: 50m  (was: 40m)

> BigQueryIO DIRECT_READ is broken
> 
>
> Key: BEAM-8504
> URL: https://issues.apache.org/jira/browse/BEAM-8504
> Project: Beam
>  Issue Type: Bug
>  Components: io-java-gcp
>Affects Versions: 2.16.0, 2.17.0
>Reporter: Gleb Kanterov
>Assignee: Aryan Naraghi
>Priority: Major
> Fix For: 2.17.0
>
>  Time Spent: 50m
>  Remaining Estimate: 0h
>
> The issue is reproducible with 2.16.0, 2.17.0 candidate and 2.18.0-SNAPSHOT 
> (as of d96c6b21a8a95b01944016584bc8e4ad1ab5f6a6), and not reproducible with 
> 2.15.0.
> {code}
> java.io.IOException: Failed to start reading from source: name: 
> "projects//locations/eu/streams/"
>   at 
> org.apache.beam.runners.dataflow.worker.WorkerCustomSources$BoundedReaderIterator.start(WorkerCustomSources.java:604)
>   at 
> org.apache.beam.runners.dataflow.worker.util.common.worker.ReadOperation$SynchronizedReaderIterator.start(ReadOperation.java:361)
>   at 
> org.apache.beam.runners.dataflow.worker.util.common.worker.ReadOperation.runReadLoop(ReadOperation.java:194)
>   at 
> org.apache.beam.runners.dataflow.worker.util.common.worker.ReadOperation.start(ReadOperation.java:159)
>   at 
> org.apache.beam.runners.dataflow.worker.util.common.worker.MapTaskExecutor.execute(MapTaskExecutor.java:77)
>   at 
> org.apache.beam.runners.dataflow.worker.BatchDataflowWorker.executeWork(BatchDataflowWorker.java:411)
>   at 
> org.apache.beam.runners.dataflow.worker.BatchDataflowWorker.doWork(BatchDataflowWorker.java:380)
>   at 
> org.apache.beam.runners.dataflow.worker.BatchDataflowWorker.getAndPerformWork(BatchDataflowWorker.java:305)
>   at 
> org.apache.beam.runners.dataflow.worker.DataflowBatchWorkerHarness$WorkerThread.doWork(DataflowBatchWorkerHarness.java:140)
>   at 
> org.apache.beam.runners.dataflow.worker.DataflowBatchWorkerHarness$WorkerThread.call(DataflowBatchWorkerHarness.java:120)
>   at 
> org.apache.beam.runners.dataflow.worker.DataflowBatchWorkerHarness$WorkerThread.call(DataflowBatchWorkerHarness.java:107)
>   at java.util.concurrent.FutureTask.run(FutureTask.java:266)
>   at 
> 

[jira] [Work logged] (BEAM-8254) (Java SDK) Add workerRegion and workerZone options

2019-11-04 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/BEAM-8254?focusedWorklogId=338405&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-338405
 ]

ASF GitHub Bot logged work on BEAM-8254:


Author: ASF GitHub Bot
Created on: 04/Nov/19 23:18
Start Date: 04/Nov/19 23:18
Worklog Time Spent: 10m 
  Work Description: ibzib commented on issue #9961: [BEAM-8254] add 
workerRegion and workerZone options to Java SDK
URL: https://github.com/apache/beam/pull/9961#issuecomment-549591834
 
 
   Run Java PreCommit
 

This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 338405)
Time Spent: 40m  (was: 0.5h)

> (Java SDK) Add workerRegion and workerZone options
> --
>
> Key: BEAM-8254
> URL: https://issues.apache.org/jira/browse/BEAM-8254
> Project: Beam
>  Issue Type: Sub-task
>  Components: runner-dataflow, sdk-java-core
>Reporter: Kyle Weaver
>Assignee: Kyle Weaver
>Priority: Major
>  Time Spent: 40m
>  Remaining Estimate: 0h
>




--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Commented] (BEAM-8504) BigQueryIO DIRECT_READ is broken

2019-11-04 Thread Aryan Naraghi (Jira)


[ 
https://issues.apache.org/jira/browse/BEAM-8504?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16967082#comment-16967082
 ] 

Aryan Naraghi commented on BEAM-8504:
-

Fix: [https://github.com/apache/beam/pull/9987]

> BigQueryIO DIRECT_READ is broken
> 
>
> Key: BEAM-8504
> URL: https://issues.apache.org/jira/browse/BEAM-8504
> Project: Beam
>  Issue Type: Bug
>  Components: io-java-gcp
>Affects Versions: 2.16.0, 2.17.0
>Reporter: Gleb Kanterov
>Assignee: Aryan Naraghi
>Priority: Major
> Fix For: 2.17.0
>
>  Time Spent: 0.5h
>  Remaining Estimate: 0h
>
> The issue is reproducible with 2.16.0, 2.17.0 candidate and 2.18.0-SNAPSHOT 
> (as of d96c6b21a8a95b01944016584bc8e4ad1ab5f6a6), and not reproducible with 
> 2.15.0.
> {code}
> java.io.IOException: Failed to start reading from source: name: 
> "projects//locations/eu/streams/"
>   at 
> org.apache.beam.runners.dataflow.worker.WorkerCustomSources$BoundedReaderIterator.start(WorkerCustomSources.java:604)
>   at 
> org.apache.beam.runners.dataflow.worker.util.common.worker.ReadOperation$SynchronizedReaderIterator.start(ReadOperation.java:361)
>   at 
> org.apache.beam.runners.dataflow.worker.util.common.worker.ReadOperation.runReadLoop(ReadOperation.java:194)
>   at 
> org.apache.beam.runners.dataflow.worker.util.common.worker.ReadOperation.start(ReadOperation.java:159)
>   at 
> org.apache.beam.runners.dataflow.worker.util.common.worker.MapTaskExecutor.execute(MapTaskExecutor.java:77)
>   at 
> org.apache.beam.runners.dataflow.worker.BatchDataflowWorker.executeWork(BatchDataflowWorker.java:411)
>   at 
> org.apache.beam.runners.dataflow.worker.BatchDataflowWorker.doWork(BatchDataflowWorker.java:380)
>   at 
> org.apache.beam.runners.dataflow.worker.BatchDataflowWorker.getAndPerformWork(BatchDataflowWorker.java:305)
>   at 
> org.apache.beam.runners.dataflow.worker.DataflowBatchWorkerHarness$WorkerThread.doWork(DataflowBatchWorkerHarness.java:140)
>   at 
> org.apache.beam.runners.dataflow.worker.DataflowBatchWorkerHarness$WorkerThread.call(DataflowBatchWorkerHarness.java:120)
>   at 
> org.apache.beam.runners.dataflow.worker.DataflowBatchWorkerHarness$WorkerThread.call(DataflowBatchWorkerHarness.java:107)
>   at java.util.concurrent.FutureTask.run(FutureTask.java:266)
>   at 
> java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
>   at 
> java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
>   at java.lang.Thread.run(Thread.java:745)
> Caused by: java.lang.IllegalArgumentException: Fraction consumed from 
> previous response (0.0) is not less than fraction consumed from current 
> response (0.0).
>   at 
> org.apache.beam.vendor.guava.v26_0_jre.com.google.common.base.Preconditions.checkArgument(Preconditions.java:440)
>   at 
> org.apache.beam.sdk.io.gcp.bigquery.BigQueryStorageStreamSource$BigQueryStorageStreamReader.readNextRecord(BigQueryStorageStreamSource.java:243)
>   at 
> org.apache.beam.sdk.io.gcp.bigquery.BigQueryStorageStreamSource$BigQueryStorageStreamReader.start(BigQueryStorageStreamSource.java:206)
>   at 
> org.apache.beam.runners.dataflow.worker.WorkerCustomSources$BoundedReaderIterator.start(WorkerCustomSources.java:601)
>   ... 14 more
> {code}



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Work logged] (BEAM-8504) BigQueryIO DIRECT_READ is broken

2019-11-04 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/BEAM-8504?focusedWorklogId=338403&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-338403
 ]

ASF GitHub Bot logged work on BEAM-8504:


Author: ASF GitHub Bot
Created on: 04/Nov/19 23:06
Start Date: 04/Nov/19 23:06
Worklog Time Spent: 10m 
  Work Description: aryann commented on issue #9987: [BEAM-8504] Fix a bug 
related to zero-row responses
URL: https://github.com/apache/beam/pull/9987#issuecomment-549588520
 
 
   @chamikaramj @kanterov 
 

This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 338403)
Time Spent: 20m  (was: 10m)

> BigQueryIO DIRECT_READ is broken
> 
>
> Key: BEAM-8504
> URL: https://issues.apache.org/jira/browse/BEAM-8504
> Project: Beam
>  Issue Type: Bug
>  Components: io-java-gcp
>Affects Versions: 2.16.0, 2.17.0
>Reporter: Gleb Kanterov
>Assignee: Aryan Naraghi
>Priority: Major
> Fix For: 2.17.0
>
>  Time Spent: 20m
>  Remaining Estimate: 0h
>
> The issue is reproducible with 2.16.0, 2.17.0 candidate and 2.18.0-SNAPSHOT 
> (as of d96c6b21a8a95b01944016584bc8e4ad1ab5f6a6), and not reproducible with 
> 2.15.0.
> {code}
> java.io.IOException: Failed to start reading from source: name: 
> "projects//locations/eu/streams/"
>   at 
> org.apache.beam.runners.dataflow.worker.WorkerCustomSources$BoundedReaderIterator.start(WorkerCustomSources.java:604)
>   at 
> org.apache.beam.runners.dataflow.worker.util.common.worker.ReadOperation$SynchronizedReaderIterator.start(ReadOperation.java:361)
>   at 
> org.apache.beam.runners.dataflow.worker.util.common.worker.ReadOperation.runReadLoop(ReadOperation.java:194)
>   at 
> org.apache.beam.runners.dataflow.worker.util.common.worker.ReadOperation.start(ReadOperation.java:159)
>   at 
> org.apache.beam.runners.dataflow.worker.util.common.worker.MapTaskExecutor.execute(MapTaskExecutor.java:77)
>   at 
> org.apache.beam.runners.dataflow.worker.BatchDataflowWorker.executeWork(BatchDataflowWorker.java:411)
>   at 
> org.apache.beam.runners.dataflow.worker.BatchDataflowWorker.doWork(BatchDataflowWorker.java:380)
>   at 
> org.apache.beam.runners.dataflow.worker.BatchDataflowWorker.getAndPerformWork(BatchDataflowWorker.java:305)
>   at 
> org.apache.beam.runners.dataflow.worker.DataflowBatchWorkerHarness$WorkerThread.doWork(DataflowBatchWorkerHarness.java:140)
>   at 
> org.apache.beam.runners.dataflow.worker.DataflowBatchWorkerHarness$WorkerThread.call(DataflowBatchWorkerHarness.java:120)
>   at 
> org.apache.beam.runners.dataflow.worker.DataflowBatchWorkerHarness$WorkerThread.call(DataflowBatchWorkerHarness.java:107)
>   at java.util.concurrent.FutureTask.run(FutureTask.java:266)
>   at 
> java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
>   at 
> java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
>   at java.lang.Thread.run(Thread.java:745)
> Caused by: java.lang.IllegalArgumentException: Fraction consumed from 
> previous response (0.0) is not less than fraction consumed from current 
> response (0.0).
>   at 
> org.apache.beam.vendor.guava.v26_0_jre.com.google.common.base.Preconditions.checkArgument(Preconditions.java:440)
>   at 
> org.apache.beam.sdk.io.gcp.bigquery.BigQueryStorageStreamSource$BigQueryStorageStreamReader.readNextRecord(BigQueryStorageStreamSource.java:243)
>   at 
> org.apache.beam.sdk.io.gcp.bigquery.BigQueryStorageStreamSource$BigQueryStorageStreamReader.start(BigQueryStorageStreamSource.java:206)
>   at 
> org.apache.beam.runners.dataflow.worker.WorkerCustomSources$BoundedReaderIterator.start(WorkerCustomSources.java:601)
>   ... 14 more
> {code}



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Work logged] (BEAM-8504) BigQueryIO DIRECT_READ is broken

2019-11-04 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/BEAM-8504?focusedWorklogId=338401&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-338401
 ]

ASF GitHub Bot logged work on BEAM-8504:


Author: ASF GitHub Bot
Created on: 04/Nov/19 23:05
Start Date: 04/Nov/19 23:05
Worklog Time Spent: 10m 
  Work Description: aryann commented on pull request #9987: [BEAM-8504] Fix 
a bug related to zero-row responses
URL: https://github.com/apache/beam/pull/9987
 
 
   In some rare cases, the server can include zero rows in the response 
messages,
   leading to a zero row count and a progress report that does not
   increase. This change relaxes some of our preconditions.
   
   ---
   Thank you for your contribution! Follow this checklist to help us 
incorporate your contribution quickly and easily:
   
- [ ] [**Choose 
reviewer(s)**](https://beam.apache.org/contribute/#make-your-change) and 
mention them in a comment (`R: @username`).
- [ ] Format the pull request title like `[BEAM-XXX] Fixes bug in 
ApproximateQuantiles`, where you replace `BEAM-XXX` with the appropriate JIRA 
issue, if applicable. This will automatically link the pull request to the 
issue.
- [ ] If this contribution is large, please file an Apache [Individual 
Contributor License Agreement](https://www.apache.org/licenses/icla.pdf).
   
   See the [Contributor Guide](https://beam.apache.org/contribute) for more 
tips on [how to make review process 
smoother](https://beam.apache.org/contribute/#make-reviewers-job-easier).
   
   Post-Commit Tests Status (on master branch)
   

   
   (Build-status badge table for the Go, Java, and Python post-commit jobs 
omitted; the table is truncated in the archive.)

[jira] [Work logged] (BEAM-8504) BigQueryIO DIRECT_READ is broken

2019-11-04 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/BEAM-8504?focusedWorklogId=338404&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-338404
 ]

ASF GitHub Bot logged work on BEAM-8504:


Author: ASF GitHub Bot
Created on: 04/Nov/19 23:06
Start Date: 04/Nov/19 23:06
Worklog Time Spent: 10m 
  Work Description: aryann commented on issue #9987: [BEAM-8504] Fix a bug 
related to zero-row responses
URL: https://github.com/apache/beam/pull/9987#issuecomment-549588520
 
 
   R: @chamikaramj @kanterov 
 

This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 338404)
Time Spent: 0.5h  (was: 20m)

> BigQueryIO DIRECT_READ is broken
> 
>
> Key: BEAM-8504
> URL: https://issues.apache.org/jira/browse/BEAM-8504
> Project: Beam
>  Issue Type: Bug
>  Components: io-java-gcp
>Affects Versions: 2.16.0, 2.17.0
>Reporter: Gleb Kanterov
>Assignee: Aryan Naraghi
>Priority: Major
> Fix For: 2.17.0
>
>  Time Spent: 0.5h
>  Remaining Estimate: 0h
>
> The issue is reproducible with 2.16.0, 2.17.0 candidate and 2.18.0-SNAPSHOT 
> (as of d96c6b21a8a95b01944016584bc8e4ad1ab5f6a6), and not reproducible with 
> 2.15.0.
> {code}
> java.io.IOException: Failed to start reading from source: name: 
> "projects//locations/eu/streams/"
>   at 
> org.apache.beam.runners.dataflow.worker.WorkerCustomSources$BoundedReaderIterator.start(WorkerCustomSources.java:604)
>   at 
> org.apache.beam.runners.dataflow.worker.util.common.worker.ReadOperation$SynchronizedReaderIterator.start(ReadOperation.java:361)
>   at 
> org.apache.beam.runners.dataflow.worker.util.common.worker.ReadOperation.runReadLoop(ReadOperation.java:194)
>   at 
> org.apache.beam.runners.dataflow.worker.util.common.worker.ReadOperation.start(ReadOperation.java:159)
>   at 
> org.apache.beam.runners.dataflow.worker.util.common.worker.MapTaskExecutor.execute(MapTaskExecutor.java:77)
>   at 
> org.apache.beam.runners.dataflow.worker.BatchDataflowWorker.executeWork(BatchDataflowWorker.java:411)
>   at 
> org.apache.beam.runners.dataflow.worker.BatchDataflowWorker.doWork(BatchDataflowWorker.java:380)
>   at 
> org.apache.beam.runners.dataflow.worker.BatchDataflowWorker.getAndPerformWork(BatchDataflowWorker.java:305)
>   at 
> org.apache.beam.runners.dataflow.worker.DataflowBatchWorkerHarness$WorkerThread.doWork(DataflowBatchWorkerHarness.java:140)
>   at 
> org.apache.beam.runners.dataflow.worker.DataflowBatchWorkerHarness$WorkerThread.call(DataflowBatchWorkerHarness.java:120)
>   at 
> org.apache.beam.runners.dataflow.worker.DataflowBatchWorkerHarness$WorkerThread.call(DataflowBatchWorkerHarness.java:107)
>   at java.util.concurrent.FutureTask.run(FutureTask.java:266)
>   at 
> java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
>   at 
> java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
>   at java.lang.Thread.run(Thread.java:745)
> Caused by: java.lang.IllegalArgumentException: Fraction consumed from 
> previous response (0.0) is not less than fraction consumed from current 
> response (0.0).
>   at 
> org.apache.beam.vendor.guava.v26_0_jre.com.google.common.base.Preconditions.checkArgument(Preconditions.java:440)
>   at 
> org.apache.beam.sdk.io.gcp.bigquery.BigQueryStorageStreamSource$BigQueryStorageStreamReader.readNextRecord(BigQueryStorageStreamSource.java:243)
>   at 
> org.apache.beam.sdk.io.gcp.bigquery.BigQueryStorageStreamSource$BigQueryStorageStreamReader.start(BigQueryStorageStreamSource.java:206)
>   at 
> org.apache.beam.runners.dataflow.worker.WorkerCustomSources$BoundedReaderIterator.start(WorkerCustomSources.java:601)
>   ... 14 more
> {code}



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Work logged] (BEAM-8335) Add streaming support to Interactive Beam

2019-11-04 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/BEAM-8335?focusedWorklogId=338402&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-338402
 ]

ASF GitHub Bot logged work on BEAM-8335:


Author: ASF GitHub Bot
Created on: 04/Nov/19 23:05
Start Date: 04/Nov/19 23:05
Worklog Time Spent: 10m 
  Work Description: davidyan74 commented on issue #9953: [BEAM-8335] Adds 
support for multi-output TestStream
URL: https://github.com/apache/beam/pull/9953#issuecomment-549588473
 
 
   Thanks @rohdesamuel ! Sam and I went over this offline and here is the 
summary.
   
   - Add another unit test with timestamped event.
   - Seek feedback from other Beam committers on verification of watermark 
propagation.
   - Add comments to explain parts of the code that are not immediately obvious.
   - Possibly abstract the different special-purpose watermark 
WATERMARK_CONTROL_TAG events.
   
 

This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 338402)
Time Spent: 18h 10m  (was: 18h)

> Add streaming support to Interactive Beam
> -
>
> Key: BEAM-8335
> URL: https://issues.apache.org/jira/browse/BEAM-8335
> Project: Beam
>  Issue Type: Improvement
>  Components: runner-py-interactive
>Reporter: Sam Rohde
>Assignee: Sam Rohde
>Priority: Major
>  Time Spent: 18h 10m
>  Remaining Estimate: 0h
>
> This issue tracks the work items to introduce streaming support to the 
> Interactive Beam experience. This will allow users to:
>  * Write and run a streaming job in IPython
>  * Automatically cache records from unbounded sources
>  * Add a replay experience that replays all cached records to simulate the 
> original pipeline execution
>  * Add controls to play/pause/stop/step individual elements from the cached 
> records
>  * Add ability to inspect/visualize unbounded PCollections



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Created] (BEAM-8554) Use WorkItemCommitRequest protobuf fields to signal that a WorkItem needs to be broken up

2019-11-04 Thread Steve Koonce (Jira)
Steve Koonce created BEAM-8554:
--

 Summary: Use WorkItemCommitRequest protobuf fields to signal that 
a WorkItem needs to be broken up
 Key: BEAM-8554
 URL: https://issues.apache.org/jira/browse/BEAM-8554
 Project: Beam
  Issue Type: Improvement
  Components: runner-dataflow
Reporter: Steve Koonce


+Background:+

When a WorkItemCommitRequest is generated that's bigger than the permitted size 
(> ~180 MB), a KeyCommitTooLargeException is logged (_not thrown_) and the 
request is still sent to the service.  The service rejects the commit, but 
breaks up input messages that were bundled together and adds them to new, 
smaller work items that will later be pulled and re-tried - likely without 
generating another commit that is too large.

When a WorkItemCommitRequest is generated that's too large to be sent back to 
the service (> 2 GB), a KeyCommitTooLargeException is thrown and nothing is 
sent back to the service.

 

+Proposed Improvement+

In both cases, prevent the doomed, oversized commit from being sent back to 
the service.  Instead, send flags in the commit request signaling that the 
current work item produced a commit that is too large and should be 
broken up.
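The proposal above can be sketched as a small decision helper. This is a 
hypothetical illustration of the idea only: the field names 
(`exceedsMaxCommitBytes`, `estimatedCommitBytes`) and the helper class are 
invented, and the real Windmill `WorkItemCommitRequest` protobuf fields and 
limits may differ. When the serialized commit exceeds the service limit, the 
payload is dropped and only a flagged stub is sent, so the service can split 
the work item and retry.

```java
// Hypothetical sketch of the proposed flag-based signaling (names invented;
// the real Windmill protos and thresholds may differ).
public class CommitPlanner {
    // ~180 MB service-side commit limit mentioned in the issue description.
    static final long MAX_COMMIT_BYTES = 180L * 1024 * 1024;

    public static class CommitRequest {
        long serializedSizeBytes;
        boolean exceedsMaxCommitBytes; // hypothetical "please split me" flag
        long estimatedCommitBytes;     // hypothetical size hint for the service
    }

    // If the commit is too large, replace it with a flagged stub instead of
    // sending (or failing to send) the oversized payload.
    public static CommitRequest plan(long commitBytes) {
        CommitRequest request = new CommitRequest();
        if (commitBytes > MAX_COMMIT_BYTES) {
            request.exceedsMaxCommitBytes = true;
            request.estimatedCommitBytes = commitBytes;
            request.serializedSizeBytes = 0; // payload dropped; only the flag goes out
        } else {
            request.serializedSizeBytes = commitBytes;
        }
        return request;
    }
}
```

In this sketch both failure modes from the background section (the ~180 MB 
rejected commit and the > 2 GB unsendable commit) collapse into the same 
flagged stub, which is the stated goal of the improvement.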



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Work logged] (BEAM-8368) [Python] libprotobuf-generated exception when importing apache_beam

2019-11-04 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/BEAM-8368?focusedWorklogId=338397&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-338397
 ]

ASF GitHub Bot logged work on BEAM-8368:


Author: ASF GitHub Bot
Created on: 04/Nov/19 22:54
Start Date: 04/Nov/19 22:54
Worklog Time Spent: 10m 
  Work Description: aaltay commented on pull request #9986: Merge pull 
request #9970: [BEAM-8368] [BEAM-8392] Update pyarrow to the latest version 
0.15.1
URL: https://github.com/apache/beam/pull/9986
 
 
   [BEAM-8368] [BEAM-8392] Update pyarrow to the latest version 0.15.1 (#9970)
   
   
   
   Thank you for your contribution! Follow this checklist to help us 
incorporate your contribution quickly and easily:
   
- [ ] [**Choose 
reviewer(s)**](https://beam.apache.org/contribute/#make-your-change) and 
mention them in a comment (`R: @username`).
- [ ] Format the pull request title like `[BEAM-XXX] Fixes bug in 
ApproximateQuantiles`, where you replace `BEAM-XXX` with the appropriate JIRA 
issue, if applicable. This will automatically link the pull request to the 
issue.
- [ ] If this contribution is large, please file an Apache [Individual 
Contributor License Agreement](https://www.apache.org/licenses/icla.pdf).
   
   See the [Contributor Guide](https://beam.apache.org/contribute) for more 
tips on [how to make review process 
smoother](https://beam.apache.org/contribute/#make-reviewers-job-easier).
   
   Post-Commit Tests Status (on master branch)
   

   
   (Build-status badge table for the Go, Java, and Python post-commit jobs 
omitted; the table is truncated in the archive.)

[jira] [Work logged] (BEAM-8368) [Python] libprotobuf-generated exception when importing apache_beam

2019-11-04 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/BEAM-8368?focusedWorklogId=338393&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-338393
 ]

ASF GitHub Bot logged work on BEAM-8368:


Author: ASF GitHub Bot
Created on: 04/Nov/19 22:41
Start Date: 04/Nov/19 22:41
Worklog Time Spent: 10m 
  Work Description: aaltay commented on pull request #9970: [BEAM-8368] 
[BEAM-8392] Update pyarrow to the latest version
URL: https://github.com/apache/beam/pull/9970
 
 
   
 

This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 338393)
Time Spent: 3.5h  (was: 3h 20m)

> [Python] libprotobuf-generated exception when importing apache_beam
> ---
>
> Key: BEAM-8368
> URL: https://issues.apache.org/jira/browse/BEAM-8368
> Project: Beam
>  Issue Type: Bug
>  Components: sdk-py-core
>Affects Versions: 2.15.0, 2.17.0
>Reporter: Ubaier Bhat
>Assignee: Brian Hulette
>Priority: Blocker
> Fix For: 2.17.0
>
> Attachments: error_log.txt
>
>  Time Spent: 3.5h
>  Remaining Estimate: 0h
>
> Unable to import apache_beam after upgrading to macOS 10.15 (Catalina). 
> Cleared all the pipenvs, but I can't get it working again.
> {code}
> import apache_beam as beam
> /Users/***/.local/share/virtualenvs/beam-etl-ims6DitU/lib/python3.7/site-packages/apache_beam/__init__.py:84:
>  UserWarning: Some syntactic constructs of Python 3 are not yet fully 
> supported by Apache Beam.
>   'Some syntactic constructs of Python 3 are not yet fully supported by '
> [libprotobuf ERROR google/protobuf/descriptor_database.cc:58] File already 
> exists in database: 
> [libprotobuf FATAL google/protobuf/descriptor.cc:1370] CHECK failed: 
> GeneratedDatabase()->Add(encoded_file_descriptor, size): 
> libc++abi.dylib: terminating with uncaught exception of type 
> google::protobuf::FatalException: CHECK failed: 
> GeneratedDatabase()->Add(encoded_file_descriptor, size): 
> {code}



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Work logged] (BEAM-8511) Support for enhanced fan-out in KinesisIO

2019-11-04 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/BEAM-8511?focusedWorklogId=338388=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-338388
 ]

ASF GitHub Bot logged work on BEAM-8511:


Author: ASF GitHub Bot
Created on: 04/Nov/19 22:28
Start Date: 04/Nov/19 22:28
Worklog Time Spent: 10m 
  Work Description: cmachgodaddy commented on pull request #9899: 
[BEAM-8511] [WIP] KinesisIO enhanced fanout
URL: https://github.com/apache/beam/pull/9899#discussion_r342302772
 
 

 ##
 File path: 
sdks/java/io/kinesis2/src/main/java/org/apache/beam/sdk/io/kinesis2/BasicKinesisProvider.java
 ##
 @@ -0,0 +1,92 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ * http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+package org.apache.beam.sdk.io.kinesis2;
+
+import static org.apache.beam.vendor.guava.v26_0_jre.com.google.common.base.Preconditions.checkArgument;
+
+import java.net.URI;
+import javax.annotation.Nullable;
+import software.amazon.awssdk.auth.credentials.AwsBasicCredentials;
+import software.amazon.awssdk.auth.credentials.AwsCredentialsProvider;
+import software.amazon.awssdk.auth.credentials.StaticCredentialsProvider;
+import software.amazon.awssdk.regions.Region;
+import software.amazon.awssdk.services.cloudwatch.CloudWatchClient;
+import software.amazon.awssdk.services.cloudwatch.CloudWatchClientBuilder;
+import software.amazon.awssdk.services.kinesis.KinesisAsyncClient;
+import software.amazon.awssdk.services.kinesis.KinesisAsyncClientBuilder;
+import software.amazon.awssdk.services.kinesis.KinesisClient;
+import software.amazon.awssdk.services.kinesis.KinesisClientBuilder;
+
+/** Basic implementation of {@link AWSClientsProvider} used by default in {@link KinesisIO}. */
+class BasicKinesisProvider implements AWSClientsProvider {
+  private final String accessKey;
+  private final String secretKey;
+  private final String region;
+  @Nullable private final String serviceEndpoint;
+
+  BasicKinesisProvider(
+      String accessKey, String secretKey, Region region, @Nullable String serviceEndpoint) {
+    checkArgument(accessKey != null, "accessKey can not be null");
+    checkArgument(secretKey != null, "secretKey can not be null");
+    checkArgument(region != null, "region can not be null");
+    this.accessKey = accessKey;
+    this.secretKey = secretKey;
+    this.region = region.toString();
+    this.serviceEndpoint = serviceEndpoint;
+  }
+
+  private AwsCredentialsProvider getCredentialsProvider() {
+    return StaticCredentialsProvider.create(AwsBasicCredentials.create(accessKey, secretKey));
+  }
+
+  @Override
+  public KinesisClient getKinesisClient() {
+    KinesisClientBuilder clientBuilder =
+        KinesisClient.builder()
+            .credentialsProvider(getCredentialsProvider())
+            .region(Region.of(region));
+    if (serviceEndpoint != null) {
+      clientBuilder.endpointOverride(URI.create(serviceEndpoint));
+    }
+    return clientBuilder.build();
+  }
+
+  @Override
+  public KinesisAsyncClient getKinesisAsyncClient() {
+    KinesisAsyncClientBuilder clientBuilder =
+        KinesisAsyncClient.builder()
+            .credentialsProvider(getCredentialsProvider())
+            .region(Region.of(region));
+    if (serviceEndpoint != null) {
+      clientBuilder.endpointOverride(URI.create(serviceEndpoint));
+    }
+    return clientBuilder.build();
+  }
+
+  @Override
+  public CloudWatchClient getCloudWatchClient() {
+    CloudWatchClientBuilder clientBuilder =
+        CloudWatchClient.builder()
+            .credentialsProvider(getCredentialsProvider())
+            .region(Region.of(region));
+    if (serviceEndpoint != null) {
+      clientBuilder.endpointOverride(URI.create(serviceEndpoint));
+    }
+    return clientBuilder.build();
+  }
+}
 
 Review comment:
   .
 

This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog 

[jira] [Comment Edited] (BEAM-8504) BigQueryIO DIRECT_READ is broken

2019-11-04 Thread Aryan Naraghi (Jira)


[ 
https://issues.apache.org/jira/browse/BEAM-8504?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16967032#comment-16967032
 ] 

Aryan Naraghi edited comment on BEAM-8504 at 11/4/19 10:00 PM:
---

Okay, I just confirmed with a colleague that there is a case where this might 
happen. It's rare, but it's possible, so the precondition check should be: 
{code:java}
fractionConsumedFromPreviousResponse <= fractionConsumedFromCurrentResponse 
{code}
I'll try to get a PR out shortly.


was (Author: aryann):
Okay, I just confirmed with a colleague that there is a case where this might 
happen. It's rare, but it's possible, so the precondition check should be:

 
{code:java}
fractionConsumedFromPreviousResponse <= fractionConsumedFromCurrentResponse 
{code}

 I'll try to get a PR out shortly.
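
The relaxed precondition can be sketched in Python (an illustrative analogue of the Guava `checkArgument` call in `BigQueryStorageStreamReader`; the function name and message wording are hypothetical):

```python
def check_fraction_progress(previous: float, current: float) -> None:
    """Validate the fraction consumed reported by consecutive read responses.

    The original check required strict progress (previous < current), which
    rejects the rare but legal case of two responses reporting the same
    fraction consumed. The relaxed check only rejects regressions.
    """
    if not (previous <= current):
        raise ValueError(
            "Fraction consumed from previous response (%s) is greater than "
            "fraction consumed from current response (%s)." % (previous, current))

# Two responses at the same fraction (e.g. 0.0 and 0.0) are now accepted:
check_fraction_progress(0.0, 0.0)
# A genuine regression is still rejected:
try:
    check_fraction_progress(0.5, 0.4)
except ValueError:
    pass
```

With `<` this sketch would raise on the `(0.0, 0.0)` case, which matches the stack trace reported in this issue.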

> BigQueryIO DIRECT_READ is broken
> 
>
> Key: BEAM-8504
> URL: https://issues.apache.org/jira/browse/BEAM-8504
> Project: Beam
>  Issue Type: Bug
>  Components: io-java-gcp
>Affects Versions: 2.16.0, 2.17.0
>Reporter: Gleb Kanterov
>Assignee: Aryan Naraghi
>Priority: Major
> Fix For: 2.17.0
>
>
> The issue is reproducible with 2.16.0, 2.17.0 candidate and 2.18.0-SNAPSHOT 
> (as of d96c6b21a8a95b01944016584bc8e4ad1ab5f6a6), and not reproducible with 
> 2.15.0.
> {code}
> java.io.IOException: Failed to start reading from source: name: 
> "projects//locations/eu/streams/"
>   at 
> org.apache.beam.runners.dataflow.worker.WorkerCustomSources$BoundedReaderIterator.start(WorkerCustomSources.java:604)
>   at 
> org.apache.beam.runners.dataflow.worker.util.common.worker.ReadOperation$SynchronizedReaderIterator.start(ReadOperation.java:361)
>   at 
> org.apache.beam.runners.dataflow.worker.util.common.worker.ReadOperation.runReadLoop(ReadOperation.java:194)
>   at 
> org.apache.beam.runners.dataflow.worker.util.common.worker.ReadOperation.start(ReadOperation.java:159)
>   at 
> org.apache.beam.runners.dataflow.worker.util.common.worker.MapTaskExecutor.execute(MapTaskExecutor.java:77)
>   at 
> org.apache.beam.runners.dataflow.worker.BatchDataflowWorker.executeWork(BatchDataflowWorker.java:411)
>   at 
> org.apache.beam.runners.dataflow.worker.BatchDataflowWorker.doWork(BatchDataflowWorker.java:380)
>   at 
> org.apache.beam.runners.dataflow.worker.BatchDataflowWorker.getAndPerformWork(BatchDataflowWorker.java:305)
>   at 
> org.apache.beam.runners.dataflow.worker.DataflowBatchWorkerHarness$WorkerThread.doWork(DataflowBatchWorkerHarness.java:140)
>   at 
> org.apache.beam.runners.dataflow.worker.DataflowBatchWorkerHarness$WorkerThread.call(DataflowBatchWorkerHarness.java:120)
>   at 
> org.apache.beam.runners.dataflow.worker.DataflowBatchWorkerHarness$WorkerThread.call(DataflowBatchWorkerHarness.java:107)
>   at java.util.concurrent.FutureTask.run(FutureTask.java:266)
>   at 
> java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
>   at 
> java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
>   at java.lang.Thread.run(Thread.java:745)
> Caused by: java.lang.IllegalArgumentException: Fraction consumed from 
> previous response (0.0) is not less than fraction consumed from current 
> response (0.0).
>   at 
> org.apache.beam.vendor.guava.v26_0_jre.com.google.common.base.Preconditions.checkArgument(Preconditions.java:440)
>   at 
> org.apache.beam.sdk.io.gcp.bigquery.BigQueryStorageStreamSource$BigQueryStorageStreamReader.readNextRecord(BigQueryStorageStreamSource.java:243)
>   at 
> org.apache.beam.sdk.io.gcp.bigquery.BigQueryStorageStreamSource$BigQueryStorageStreamReader.start(BigQueryStorageStreamSource.java:206)
>   at 
> org.apache.beam.runners.dataflow.worker.WorkerCustomSources$BoundedReaderIterator.start(WorkerCustomSources.java:601)
>   ... 14 more
> {code}



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Comment Edited] (BEAM-8504) BigQueryIO DIRECT_READ is broken

2019-11-04 Thread Aryan Naraghi (Jira)


[ 
https://issues.apache.org/jira/browse/BEAM-8504?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16967032#comment-16967032
 ] 

Aryan Naraghi edited comment on BEAM-8504 at 11/4/19 10:00 PM:
---

Okay, I just confirmed with a colleague that there is a case where this might 
happen. It's rare, but it's possible, so the precondition check should be:

 
{code:java}
fractionConsumedFromPreviousResponse <= fractionConsumedFromCurrentResponse 
{code}

 I'll try to get a PR out shortly.


was (Author: aryann):
Okay, I just confirmed with a colleague that there is a case where this might 
happen. It's rare, but it's possible, so the precondition check should be:
fractionConsumedFromPreviousResponse <= fractionConsumedFromCurrentResponse
I'll try to get a PR out shortly.

> BigQueryIO DIRECT_READ is broken
> 
>
> Key: BEAM-8504
> URL: https://issues.apache.org/jira/browse/BEAM-8504
> Project: Beam
>  Issue Type: Bug
>  Components: io-java-gcp
>Affects Versions: 2.16.0, 2.17.0
>Reporter: Gleb Kanterov
>Assignee: Aryan Naraghi
>Priority: Major
> Fix For: 2.17.0
>
>
> The issue is reproducible with 2.16.0, 2.17.0 candidate and 2.18.0-SNAPSHOT 
> (as of d96c6b21a8a95b01944016584bc8e4ad1ab5f6a6), and not reproducible with 
> 2.15.0.
> {code}
> java.io.IOException: Failed to start reading from source: name: 
> "projects//locations/eu/streams/"
>   at 
> org.apache.beam.runners.dataflow.worker.WorkerCustomSources$BoundedReaderIterator.start(WorkerCustomSources.java:604)
>   at 
> org.apache.beam.runners.dataflow.worker.util.common.worker.ReadOperation$SynchronizedReaderIterator.start(ReadOperation.java:361)
>   at 
> org.apache.beam.runners.dataflow.worker.util.common.worker.ReadOperation.runReadLoop(ReadOperation.java:194)
>   at 
> org.apache.beam.runners.dataflow.worker.util.common.worker.ReadOperation.start(ReadOperation.java:159)
>   at 
> org.apache.beam.runners.dataflow.worker.util.common.worker.MapTaskExecutor.execute(MapTaskExecutor.java:77)
>   at 
> org.apache.beam.runners.dataflow.worker.BatchDataflowWorker.executeWork(BatchDataflowWorker.java:411)
>   at 
> org.apache.beam.runners.dataflow.worker.BatchDataflowWorker.doWork(BatchDataflowWorker.java:380)
>   at 
> org.apache.beam.runners.dataflow.worker.BatchDataflowWorker.getAndPerformWork(BatchDataflowWorker.java:305)
>   at 
> org.apache.beam.runners.dataflow.worker.DataflowBatchWorkerHarness$WorkerThread.doWork(DataflowBatchWorkerHarness.java:140)
>   at 
> org.apache.beam.runners.dataflow.worker.DataflowBatchWorkerHarness$WorkerThread.call(DataflowBatchWorkerHarness.java:120)
>   at 
> org.apache.beam.runners.dataflow.worker.DataflowBatchWorkerHarness$WorkerThread.call(DataflowBatchWorkerHarness.java:107)
>   at java.util.concurrent.FutureTask.run(FutureTask.java:266)
>   at 
> java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
>   at 
> java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
>   at java.lang.Thread.run(Thread.java:745)
> Caused by: java.lang.IllegalArgumentException: Fraction consumed from 
> previous response (0.0) is not less than fraction consumed from current 
> response (0.0).
>   at 
> org.apache.beam.vendor.guava.v26_0_jre.com.google.common.base.Preconditions.checkArgument(Preconditions.java:440)
>   at 
> org.apache.beam.sdk.io.gcp.bigquery.BigQueryStorageStreamSource$BigQueryStorageStreamReader.readNextRecord(BigQueryStorageStreamSource.java:243)
>   at 
> org.apache.beam.sdk.io.gcp.bigquery.BigQueryStorageStreamSource$BigQueryStorageStreamReader.start(BigQueryStorageStreamSource.java:206)
>   at 
> org.apache.beam.runners.dataflow.worker.WorkerCustomSources$BoundedReaderIterator.start(WorkerCustomSources.java:601)
>   ... 14 more
> {code}



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Commented] (BEAM-8504) BigQueryIO DIRECT_READ is broken

2019-11-04 Thread Aryan Naraghi (Jira)


[ 
https://issues.apache.org/jira/browse/BEAM-8504?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16967032#comment-16967032
 ] 

Aryan Naraghi commented on BEAM-8504:
-

Okay, I just confirmed with a colleague that there is a case where this might 
happen. It's rare, but it's possible, so the precondition check should be:
fractionConsumedFromPreviousResponse <= fractionConsumedFromCurrentResponse
I'll try to get a PR out shortly.

> BigQueryIO DIRECT_READ is broken
> 
>
> Key: BEAM-8504
> URL: https://issues.apache.org/jira/browse/BEAM-8504
> Project: Beam
>  Issue Type: Bug
>  Components: io-java-gcp
>Affects Versions: 2.16.0, 2.17.0
>Reporter: Gleb Kanterov
>Assignee: Aryan Naraghi
>Priority: Major
> Fix For: 2.17.0
>
>
> The issue is reproducible with 2.16.0, 2.17.0 candidate and 2.18.0-SNAPSHOT 
> (as of d96c6b21a8a95b01944016584bc8e4ad1ab5f6a6), and not reproducible with 
> 2.15.0.
> {code}
> java.io.IOException: Failed to start reading from source: name: 
> "projects//locations/eu/streams/"
>   at 
> org.apache.beam.runners.dataflow.worker.WorkerCustomSources$BoundedReaderIterator.start(WorkerCustomSources.java:604)
>   at 
> org.apache.beam.runners.dataflow.worker.util.common.worker.ReadOperation$SynchronizedReaderIterator.start(ReadOperation.java:361)
>   at 
> org.apache.beam.runners.dataflow.worker.util.common.worker.ReadOperation.runReadLoop(ReadOperation.java:194)
>   at 
> org.apache.beam.runners.dataflow.worker.util.common.worker.ReadOperation.start(ReadOperation.java:159)
>   at 
> org.apache.beam.runners.dataflow.worker.util.common.worker.MapTaskExecutor.execute(MapTaskExecutor.java:77)
>   at 
> org.apache.beam.runners.dataflow.worker.BatchDataflowWorker.executeWork(BatchDataflowWorker.java:411)
>   at 
> org.apache.beam.runners.dataflow.worker.BatchDataflowWorker.doWork(BatchDataflowWorker.java:380)
>   at 
> org.apache.beam.runners.dataflow.worker.BatchDataflowWorker.getAndPerformWork(BatchDataflowWorker.java:305)
>   at 
> org.apache.beam.runners.dataflow.worker.DataflowBatchWorkerHarness$WorkerThread.doWork(DataflowBatchWorkerHarness.java:140)
>   at 
> org.apache.beam.runners.dataflow.worker.DataflowBatchWorkerHarness$WorkerThread.call(DataflowBatchWorkerHarness.java:120)
>   at 
> org.apache.beam.runners.dataflow.worker.DataflowBatchWorkerHarness$WorkerThread.call(DataflowBatchWorkerHarness.java:107)
>   at java.util.concurrent.FutureTask.run(FutureTask.java:266)
>   at 
> java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
>   at 
> java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
>   at java.lang.Thread.run(Thread.java:745)
> Caused by: java.lang.IllegalArgumentException: Fraction consumed from 
> previous response (0.0) is not less than fraction consumed from current 
> response (0.0).
>   at 
> org.apache.beam.vendor.guava.v26_0_jre.com.google.common.base.Preconditions.checkArgument(Preconditions.java:440)
>   at 
> org.apache.beam.sdk.io.gcp.bigquery.BigQueryStorageStreamSource$BigQueryStorageStreamReader.readNextRecord(BigQueryStorageStreamSource.java:243)
>   at 
> org.apache.beam.sdk.io.gcp.bigquery.BigQueryStorageStreamSource$BigQueryStorageStreamReader.start(BigQueryStorageStreamSource.java:206)
>   at 
> org.apache.beam.runners.dataflow.worker.WorkerCustomSources$BoundedReaderIterator.start(WorkerCustomSources.java:601)
>   ... 14 more
> {code}



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Work logged] (BEAM-8432) Parametrize source & target compatibility for beam Java modules

2019-11-04 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/BEAM-8432?focusedWorklogId=338369=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-338369
 ]

ASF GitHub Bot logged work on BEAM-8432:


Author: ASF GitHub Bot
Created on: 04/Nov/19 21:58
Start Date: 04/Nov/19 21:58
Worklog Time Spent: 10m 
  Work Description: lukecwik commented on issue #9984: [BEAM-8432] Enable 
building kotlin examples with different java versions
URL: https://github.com/apache/beam/pull/9984#issuecomment-549566191
 
 
   Run Java PreCommit
 

This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 338369)
Time Spent: 1h 20m  (was: 1h 10m)

> Parametrize source & target compatibility for beam Java modules
> ---
>
> Key: BEAM-8432
> URL: https://issues.apache.org/jira/browse/BEAM-8432
> Project: Beam
>  Issue Type: Improvement
>  Components: build-system
>Reporter: Lukasz Gajowy
>Assignee: Lukasz Gajowy
>Priority: Major
> Fix For: Not applicable
>
>  Time Spent: 1h 20m
>  Remaining Estimate: 0h
>
> Currently, "javaVersion" property is hardcoded in BeamModulePlugin in 
> [JavaNatureConfiguration|https://github.com/apache/beam/blob/master/buildSrc/src/main/groovy/org/apache/beam/gradle/BeamModulePlugin.groovy#L82].
> For the sake of migrating the project to Java 11 we could use a mechanism 
> that will allow parametrizing the version from the command line, e.g:
> {code:java}
> // this could set source and target compatibility to 11:
> ./gradlew clean build -PjavaVersion=11{code}
>  



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Work logged] (BEAM-8432) Parametrize source & target compatibility for beam Java modules

2019-11-04 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/BEAM-8432?focusedWorklogId=338370=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-338370
 ]

ASF GitHub Bot logged work on BEAM-8432:


Author: ASF GitHub Bot
Created on: 04/Nov/19 21:58
Start Date: 04/Nov/19 21:58
Worklog Time Spent: 10m 
  Work Description: lukecwik commented on issue #9984: [BEAM-8432] Enable 
building kotlin examples with different java versions
URL: https://github.com/apache/beam/pull/9984#issuecomment-549566225
 
 
   Run Java_Examples_Dataflow PreCommit
 

This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 338370)
Time Spent: 1.5h  (was: 1h 20m)

> Parametrize source & target compatibility for beam Java modules
> ---
>
> Key: BEAM-8432
> URL: https://issues.apache.org/jira/browse/BEAM-8432
> Project: Beam
>  Issue Type: Improvement
>  Components: build-system
>Reporter: Lukasz Gajowy
>Assignee: Lukasz Gajowy
>Priority: Major
> Fix For: Not applicable
>
>  Time Spent: 1.5h
>  Remaining Estimate: 0h
>
> Currently, "javaVersion" property is hardcoded in BeamModulePlugin in 
> [JavaNatureConfiguration|https://github.com/apache/beam/blob/master/buildSrc/src/main/groovy/org/apache/beam/gradle/BeamModulePlugin.groovy#L82].
> For the sake of migrating the project to Java 11 we could use a mechanism 
> that will allow parametrizing the version from the command line, e.g:
> {code:java}
> // this could set source and target compatibility to 11:
> ./gradlew clean build -PjavaVersion=11{code}
>  



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Work logged] (BEAM-8532) Beam Python trigger driver sets incorrect timestamp for output windows.

2019-11-04 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/BEAM-8532?focusedWorklogId=338365=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-338365
 ]

ASF GitHub Bot logged work on BEAM-8532:


Author: ASF GitHub Bot
Created on: 04/Nov/19 21:49
Start Date: 04/Nov/19 21:49
Worklog Time Spent: 10m 
  Work Description: robertwb commented on issue #9958: [BEAM-8532] Output 
timestamp should be the inclusive, not exclusive, end of window.
URL: https://github.com/apache/beam/pull/9958#issuecomment-549562947
 
 
   Yes, I agree it's severe enough to consider cherry-picking into 2.17. 
@Ardagan  
 

This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 338365)
Time Spent: 0.5h  (was: 20m)

> Beam Python trigger driver sets incorrect timestamp for output windows.
> ---
>
> Key: BEAM-8532
> URL: https://issues.apache.org/jira/browse/BEAM-8532
> Project: Beam
>  Issue Type: Bug
>  Components: sdk-py-core
>Reporter: Robert Bradshaw
>Assignee: Robert Bradshaw
>Priority: Critical
>  Time Spent: 0.5h
>  Remaining Estimate: 0h
>
> The timestamp should lie in the window, otherwise re-windowing will not be 
> idempotent. 
> https://github.com/apache/beam/blob/release-2.16.0/sdks/python/apache_beam/transforms/trigger.py#L1183
>  should be using {{window.max_timestamp()}} rather than {{.end}}
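
The distinction matters because Beam windows are half-open intervals: `end` is exclusive, while `max_timestamp()` is the last timestamp that still lies inside the window. A standalone sketch using plain microsecond integers rather than the real `apache_beam` window classes (the class here is a minimal stand-in, not Beam's implementation):

```python
TIMESTAMP_GRANULARITY_MICROS = 1  # Beam Python timestamps have microsecond precision.

class IntervalWindow:
    """Minimal stand-in for Beam's IntervalWindow: [start, end), in micros."""

    def __init__(self, start: int, end: int):
        self.start = start
        self.end = end  # exclusive upper bound

    def contains(self, ts: int) -> bool:
        return self.start <= ts < self.end

    def max_timestamp(self) -> int:
        # Last timestamp inside the window; emitting this (rather than .end)
        # means re-windowing assigns the output back to the same window.
        return self.end - TIMESTAMP_GRANULARITY_MICROS

w = IntervalWindow(0, 60_000_000)  # a 60-second window
assert w.contains(w.max_timestamp())  # inside the window: re-windowing is idempotent
assert not w.contains(w.end)          # .end falls into the *next* window
```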



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Work logged] (BEAM-8532) Beam Python trigger driver sets incorrect timestamp for output windows.

2019-11-04 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/BEAM-8532?focusedWorklogId=338366=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-338366
 ]

ASF GitHub Bot logged work on BEAM-8532:


Author: ASF GitHub Bot
Created on: 04/Nov/19 21:49
Start Date: 04/Nov/19 21:49
Worklog Time Spent: 10m 
  Work Description: robertwb commented on pull request #9958: [BEAM-8532] 
Output timestamp should be the inclusive, not exclusive, end of window.
URL: https://github.com/apache/beam/pull/9958
 
 
   
 

This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 338366)
Time Spent: 40m  (was: 0.5h)

> Beam Python trigger driver sets incorrect timestamp for output windows.
> ---
>
> Key: BEAM-8532
> URL: https://issues.apache.org/jira/browse/BEAM-8532
> Project: Beam
>  Issue Type: Bug
>  Components: sdk-py-core
>Reporter: Robert Bradshaw
>Assignee: Robert Bradshaw
>Priority: Critical
>  Time Spent: 40m
>  Remaining Estimate: 0h
>
> The timestamp should lie in the window, otherwise re-windowing will not be 
> idempotent. 
> https://github.com/apache/beam/blob/release-2.16.0/sdks/python/apache_beam/transforms/trigger.py#L1183
>  should be using {{window.max_timestamp()}} rather than {{.end}}



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Work logged] (BEAM-8544) Install Beam SDK with ccache for faster re-install.

2019-11-04 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/BEAM-8544?focusedWorklogId=338363=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-338363
 ]

ASF GitHub Bot logged work on BEAM-8544:


Author: ASF GitHub Bot
Created on: 04/Nov/19 21:44
Start Date: 04/Nov/19 21:44
Worklog Time Spent: 10m 
  Work Description: robertwb commented on pull request #9966: [BEAM-8544] 
Use ccache for compiling the Beam Python SDK.
URL: https://github.com/apache/beam/pull/9966
 
 
   
 

This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 338363)
Time Spent: 1.5h  (was: 1h 20m)

> Install Beam SDK with ccache for faster re-install.
> ---
>
> Key: BEAM-8544
> URL: https://issues.apache.org/jira/browse/BEAM-8544
> Project: Beam
>  Issue Type: Improvement
>  Components: sdk-py-harness
>Reporter: Robert Bradshaw
>Assignee: Robert Bradshaw
>Priority: Major
>  Time Spent: 1.5h
>  Remaining Estimate: 0h
>
> Re-compiling the C modules of the SDK takes 2-3 minutes. This adds to worker 
> startup time whenever a custom SDK is being used (in particular, during 
> development and testing). We can use ccache to reuse the old compile results 
> when the Cython files have not changed. 
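
One way to get ccache into the build without modifying it is to wrap the C/C++ compilers via environment variables before invoking the install. This is a sketch under the assumption that ccache is installed; `install_sdk` and its pip flags are illustrative, not the actual change in the PR:

```python
import os
import subprocess

def ccache_env(base_env=None):
    """Return a copy of the environment with the compilers wrapped by ccache.

    When the Cython-generated C sources are unchanged, ccache serves the
    object files from its cache, cutting the 2-3 minute compile to seconds.
    """
    env = dict(os.environ if base_env is None else base_env)
    env["CC"] = "ccache " + env.get("CC", "cc")
    env["CXX"] = "ccache " + env.get("CXX", "c++")
    return env

def install_sdk(sdk_dir):
    # Hypothetical invocation: reinstall the SDK from source with ccache active.
    subprocess.run(["pip", "install", "--no-deps", "-e", sdk_dir],
                   env=ccache_env(), check=True)
```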



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Work logged] (BEAM-8545) don't docker pull before docker run

2019-11-04 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/BEAM-8545?focusedWorklogId=338359=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-338359
 ]

ASF GitHub Bot logged work on BEAM-8545:


Author: ASF GitHub Bot
Created on: 04/Nov/19 21:28
Start Date: 04/Nov/19 21:28
Worklog Time Spent: 10m 
  Work Description: ihji commented on issue #9972: [BEAM-8545] don't docker 
pull before docker run
URL: https://github.com/apache/beam/pull/9972#issuecomment-548991708
 
 
   Run XVR_Flink PostCommit
 

This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 338359)
Time Spent: 1h  (was: 50m)

> don't docker pull  before docker run
> 
>
> Key: BEAM-8545
> URL: https://issues.apache.org/jira/browse/BEAM-8545
> Project: Beam
>  Issue Type: Bug
>  Components: java-fn-execution
>Reporter: Heejong Lee
>Assignee: Heejong Lee
>Priority: Major
>  Time Spent: 1h
>  Remaining Estimate: 0h
>
> Since 'docker run' automatically pulls when the image doesn't exist locally, 
> I think it's safe to remove the explicit 'docker pull' before 'docker run'. 
> Without 'docker pull', we won't update the local image with the remote image 
> (for the same tag), but that shouldn't be a problem in prod, where a unique 
> tag is assumed for each released version.
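
The simplification can be sketched as follows (illustrative Python, not Beam's actual Java container launcher): only the `docker run` command is built and executed, relying on Docker's pull-on-demand behavior instead of a preceding `docker pull`.

```python
import subprocess

def docker_run_command(image, args=()):
    # No preceding `docker pull`: `docker run` fetches the image on demand if
    # it is missing locally, and an existing local image with the same tag is
    # left untouched (fine in prod, where release tags are assumed unique).
    return ["docker", "run", image, *args]

def run_sdk_harness(image, args=()):
    # Hypothetical launcher for an SDK harness container.
    subprocess.run(docker_run_command(image, args), check=True)
```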



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Work logged] (BEAM-8545) don't docker pull before docker run

2019-11-04 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/BEAM-8545?focusedWorklogId=338360=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-338360
 ]

ASF GitHub Bot logged work on BEAM-8545:


Author: ASF GitHub Bot
Created on: 04/Nov/19 21:28
Start Date: 04/Nov/19 21:28
Worklog Time Spent: 10m 
  Work Description: ihji commented on issue #9972: [BEAM-8545] don't docker 
pull before docker run
URL: https://github.com/apache/beam/pull/9972#issuecomment-549555135
 
 
   Run XVR_Flink PostCommit
 

This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 338360)
Time Spent: 1h 10m  (was: 1h)

> don't docker pull  before docker run
> 
>
> Key: BEAM-8545
> URL: https://issues.apache.org/jira/browse/BEAM-8545
> Project: Beam
>  Issue Type: Bug
>  Components: java-fn-execution
>Reporter: Heejong Lee
>Assignee: Heejong Lee
>Priority: Major
>  Time Spent: 1h 10m
>  Remaining Estimate: 0h
>
> Since 'docker run' automatically pulls when the image doesn't exist locally, 
> I think it's safe to remove the explicit 'docker pull' before 'docker run'. 
> Without 'docker pull', we won't update the local image with the remote image 
> (for the same tag), but that shouldn't be a problem in prod, where a unique 
> tag is assumed for each released version.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Work logged] (BEAM-8511) Support for enhanced fan-out in KinesisIO

2019-11-04 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/BEAM-8511?focusedWorklogId=338357=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-338357
 ]

ASF GitHub Bot logged work on BEAM-8511:


Author: ASF GitHub Bot
Created on: 04/Nov/19 21:22
Start Date: 04/Nov/19 21:22
Worklog Time Spent: 10m 
  Work Description: cmachgodaddy commented on issue #9899: [BEAM-8511] 
[WIP] KinesisIO enhanced fanout
URL: https://github.com/apache/beam/pull/9899#issuecomment-549552814
 
 
   @aromanenko-dev and @iemejia , as we have discussed regarding this IO 
migration, we may need to consider:
   1. A separate subscribe API/function for the IO, so existing users don’t 
“have to be migrated” to enhanced fan-out. Using enhanced fan-out is quite 
expensive, so not everyone will want it until they need the speed and latency.
   2. Keeping the existing IO’s read and just converting it to V2, to support 
backward compatibility.
   3. Moving KinesisIO V2 to the amazon-web-services2 submodule. 
   
   However, this PR seems to support a "fall back" strategy: if users don't 
provide a consumer ARN, it falls back to the standard way of pulling records. 
I don't have a preference for which one would work better or be cleaner, as 
long as we support backward compatibility. What do you think?
 

This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 338357)
Time Spent: 1h 10m  (was: 1h)

> Support for enhanced fan-out in KinesisIO
> -
>
> Key: BEAM-8511
> URL: https://issues.apache.org/jira/browse/BEAM-8511
> Project: Beam
>  Issue Type: New Feature
>  Components: io-java-kinesis
>Reporter: Jonothan Farr
>Assignee: Jonothan Farr
>Priority: Major
>  Time Spent: 1h 10m
>  Remaining Estimate: 0h
>
> Add support for reading from an enhanced fan-out consumer using KinesisIO.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Commented] (BEAM-8504) BigQueryIO DIRECT_READ is broken

2019-11-04 Thread Gleb Kanterov (Jira)


[ 
https://issues.apache.org/jira/browse/BEAM-8504?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16966996#comment-16966996
 ] 

Gleb Kanterov commented on BEAM-8504:
-

[~aryann] yes, it looks like a backend or floating-point precision issue; it's 
hard for me to tell because I haven't read the proto and am not familiar with 
the backend code. The table is over 1B rows, and I can consistently reproduce 
it. I can do a PR with a revert, but I don't feel confident changing the 
precondition without understanding the codebase. It would be great if you 
could do it, because I think you know the code better.

> BigQueryIO DIRECT_READ is broken
> 
>
> Key: BEAM-8504
> URL: https://issues.apache.org/jira/browse/BEAM-8504
> Project: Beam
>  Issue Type: Bug
>  Components: io-java-gcp
>Affects Versions: 2.16.0, 2.17.0
>Reporter: Gleb Kanterov
>Assignee: Aryan Naraghi
>Priority: Major
> Fix For: 2.17.0
>
>
> The issue is reproducible with 2.16.0, 2.17.0 candidate and 2.18.0-SNAPSHOT 
> (as of d96c6b21a8a95b01944016584bc8e4ad1ab5f6a6), and not reproducible with 
> 2.15.0.
> {code}
> java.io.IOException: Failed to start reading from source: name: 
> "projects//locations/eu/streams/"
>   at 
> org.apache.beam.runners.dataflow.worker.WorkerCustomSources$BoundedReaderIterator.start(WorkerCustomSources.java:604)
>   at 
> org.apache.beam.runners.dataflow.worker.util.common.worker.ReadOperation$SynchronizedReaderIterator.start(ReadOperation.java:361)
>   at 
> org.apache.beam.runners.dataflow.worker.util.common.worker.ReadOperation.runReadLoop(ReadOperation.java:194)
>   at 
> org.apache.beam.runners.dataflow.worker.util.common.worker.ReadOperation.start(ReadOperation.java:159)
>   at 
> org.apache.beam.runners.dataflow.worker.util.common.worker.MapTaskExecutor.execute(MapTaskExecutor.java:77)
>   at 
> org.apache.beam.runners.dataflow.worker.BatchDataflowWorker.executeWork(BatchDataflowWorker.java:411)
>   at 
> org.apache.beam.runners.dataflow.worker.BatchDataflowWorker.doWork(BatchDataflowWorker.java:380)
>   at 
> org.apache.beam.runners.dataflow.worker.BatchDataflowWorker.getAndPerformWork(BatchDataflowWorker.java:305)
>   at 
> org.apache.beam.runners.dataflow.worker.DataflowBatchWorkerHarness$WorkerThread.doWork(DataflowBatchWorkerHarness.java:140)
>   at 
> org.apache.beam.runners.dataflow.worker.DataflowBatchWorkerHarness$WorkerThread.call(DataflowBatchWorkerHarness.java:120)
>   at 
> org.apache.beam.runners.dataflow.worker.DataflowBatchWorkerHarness$WorkerThread.call(DataflowBatchWorkerHarness.java:107)
>   at java.util.concurrent.FutureTask.run(FutureTask.java:266)
>   at 
> java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
>   at 
> java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
>   at java.lang.Thread.run(Thread.java:745)
> Caused by: java.lang.IllegalArgumentException: Fraction consumed from 
> previous response (0.0) is not less than fraction consumed from current 
> response (0.0).
>   at 
> org.apache.beam.vendor.guava.v26_0_jre.com.google.common.base.Preconditions.checkArgument(Preconditions.java:440)
>   at 
> org.apache.beam.sdk.io.gcp.bigquery.BigQueryStorageStreamSource$BigQueryStorageStreamReader.readNextRecord(BigQueryStorageStreamSource.java:243)
>   at 
> org.apache.beam.sdk.io.gcp.bigquery.BigQueryStorageStreamSource$BigQueryStorageStreamReader.start(BigQueryStorageStreamSource.java:206)
>   at 
> org.apache.beam.runners.dataflow.worker.WorkerCustomSources$BoundedReaderIterator.start(WorkerCustomSources.java:601)
>   ... 14 more
> {code}
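
The precondition that fails in the trace above requires strictly increasing 
progress between responses. The following Python sketch shows the strict check 
and a relaxed variant that tolerates repeated fractions; the relaxation is an 
assumption about a possible fix, not Beam's actual code:

```python
def check_fraction_progress(previous, current, strict=True):
    """Validate that the fraction consumed does not move backwards.

    With strict=True this mirrors the failing precondition: two consecutive
    responses both reporting 0.0 raise an error. With strict=False, equal
    fractions are tolerated and only actual regressions are rejected.
    """
    if strict and not previous < current:
        raise ValueError(
            "Fraction consumed from previous response (%s) is not less than "
            "fraction consumed from current response (%s)." % (previous, current))
    if not strict and previous > current:
        raise ValueError(
            "Fraction consumed went backwards: %s -> %s" % (previous, current))
    return current
```

On a table over 1B rows, early responses can plausibly report identical 
fractions (e.g. 0.0) before measurable progress accrues, which is what the 
strict check rejects.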



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Work logged] (BEAM-7434) RabbitMqIO uses a deprecated API

2019-11-04 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/BEAM-7434?focusedWorklogId=338346&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-338346
 ]

ASF GitHub Bot logged work on BEAM-7434:


Author: ASF GitHub Bot
Created on: 04/Nov/19 20:30
Start Date: 04/Nov/19 20:30
Worklog Time Spent: 10m 
  Work Description: drobert commented on pull request #9977: [BEAM-7434] 
[BEAM-5895] and [BEAM-5894] Fix upgrade to rabbit amqp-client 5.x
URL: https://github.com/apache/beam/pull/9977#discussion_r342255231
 
 

 ##
 File path: sdks/java/io/rabbitmq/src/test/java/org/apache/beam/sdk/io/rabbitmq/RabbitMqIOTest.java
 ##
 @@ -203,25 +204,45 @@ private void doExchangeTest(ExchangeTestPlan testPlan, boolean simulateIncompati
     connectionFactory.setUri(uri);
     Connection connection = null;
     Channel channel = null;
+    Thread publisher = null;
+
+    // for debugging log messages
+    Map<String, String> testNameComponents = new HashMap<>();
+    testNameComponents.put("exchange", exchange);
+    testNameComponents.put("exchangeType", exchangeType);
+    testNameComponents.put("queue", Optional.ofNullable(read.queue()).orElse(""));
+    final String testName =
+        Joiner.on(", ").withKeyValueSeparator("=").join(testNameComponents);
 
     try {
       connection = connectionFactory.newConnection();
       channel = connection.createChannel();
       channel.exchangeDeclare(exchange, exchangeType);
       final Channel finalChannel = channel;
-      Thread publisher =
+      final String finalExchangeType = exchangeType;
+
+      publisher =
           new Thread(
               () -> {
                 try {
                   Thread.sleep(5000);
+                } catch (InterruptedException e) {
+                  // allow the test construct to cancel the thread
+                  LOG.info("Test {} interrupted before beginning publishing", testName);
 
 Review comment:
   There was some exception noise in test output when tests fail-fast but the 
publisher thread continues attempting to publish on a closed channel. The 
Thread now checks for interrupt status and aborts if interrupted.
 

This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 338346)
Time Spent: 1h 50m  (was: 1h 40m)

> RabbitMqIO uses a deprecated API
> 
>
> Key: BEAM-7434
> URL: https://issues.apache.org/jira/browse/BEAM-7434
> Project: Beam
>  Issue Type: Bug
>  Components: io-java-rabbitmq
>Reporter: Nicolas Delsaux
>Priority: Major
>  Time Spent: 1h 50m
>  Remaining Estimate: 0h
>
> The RabbitMqIo class reader (UnboundedRabbitMqReader) uses the 
> QueueingConsumer, which is denoted as deprecated on RabbitMq side. RabbitMqIo 
> should replace this consumer with the DefaultConsumer provided by RabbitMq.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Commented] (BEAM-8504) BigQueryIO DIRECT_READ is broken

2019-11-04 Thread Aryan Naraghi (Jira)


[ 
https://issues.apache.org/jira/browse/BEAM-8504?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16966982#comment-16966982
 ] 

Aryan Naraghi commented on BEAM-8504:
-

Gleb, do you want to send me a PR with your change that relaxes the 
precondition check? I'm aryann on GitHub.

 

I believe this is a server-side issue we'll have to investigate.

 

How big is the table you're reading?

> BigQueryIO DIRECT_READ is broken
> 
>
> Key: BEAM-8504
> URL: https://issues.apache.org/jira/browse/BEAM-8504
> Project: Beam
>  Issue Type: Bug
>  Components: io-java-gcp
>Affects Versions: 2.16.0, 2.17.0
>Reporter: Gleb Kanterov
>Assignee: Aryan Naraghi
>Priority: Major
> Fix For: 2.17.0
>
>
> The issue is reproducible with 2.16.0, 2.17.0 candidate and 2.18.0-SNAPSHOT 
> (as of d96c6b21a8a95b01944016584bc8e4ad1ab5f6a6), and not reproducible with 
> 2.15.0.
> {code}
> java.io.IOException: Failed to start reading from source: name: 
> "projects//locations/eu/streams/"
>   at 
> org.apache.beam.runners.dataflow.worker.WorkerCustomSources$BoundedReaderIterator.start(WorkerCustomSources.java:604)
>   at 
> org.apache.beam.runners.dataflow.worker.util.common.worker.ReadOperation$SynchronizedReaderIterator.start(ReadOperation.java:361)
>   at 
> org.apache.beam.runners.dataflow.worker.util.common.worker.ReadOperation.runReadLoop(ReadOperation.java:194)
>   at 
> org.apache.beam.runners.dataflow.worker.util.common.worker.ReadOperation.start(ReadOperation.java:159)
>   at 
> org.apache.beam.runners.dataflow.worker.util.common.worker.MapTaskExecutor.execute(MapTaskExecutor.java:77)
>   at 
> org.apache.beam.runners.dataflow.worker.BatchDataflowWorker.executeWork(BatchDataflowWorker.java:411)
>   at 
> org.apache.beam.runners.dataflow.worker.BatchDataflowWorker.doWork(BatchDataflowWorker.java:380)
>   at 
> org.apache.beam.runners.dataflow.worker.BatchDataflowWorker.getAndPerformWork(BatchDataflowWorker.java:305)
>   at 
> org.apache.beam.runners.dataflow.worker.DataflowBatchWorkerHarness$WorkerThread.doWork(DataflowBatchWorkerHarness.java:140)
>   at 
> org.apache.beam.runners.dataflow.worker.DataflowBatchWorkerHarness$WorkerThread.call(DataflowBatchWorkerHarness.java:120)
>   at 
> org.apache.beam.runners.dataflow.worker.DataflowBatchWorkerHarness$WorkerThread.call(DataflowBatchWorkerHarness.java:107)
>   at java.util.concurrent.FutureTask.run(FutureTask.java:266)
>   at 
> java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
>   at 
> java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
>   at java.lang.Thread.run(Thread.java:745)
> Caused by: java.lang.IllegalArgumentException: Fraction consumed from 
> previous response (0.0) is not less than fraction consumed from current 
> response (0.0).
>   at 
> org.apache.beam.vendor.guava.v26_0_jre.com.google.common.base.Preconditions.checkArgument(Preconditions.java:440)
>   at 
> org.apache.beam.sdk.io.gcp.bigquery.BigQueryStorageStreamSource$BigQueryStorageStreamReader.readNextRecord(BigQueryStorageStreamSource.java:243)
>   at 
> org.apache.beam.sdk.io.gcp.bigquery.BigQueryStorageStreamSource$BigQueryStorageStreamReader.start(BigQueryStorageStreamSource.java:206)
>   at 
> org.apache.beam.runners.dataflow.worker.WorkerCustomSources$BoundedReaderIterator.start(WorkerCustomSources.java:601)
>   ... 14 more
> {code}



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Work logged] (BEAM-2572) Implement an S3 filesystem for Python SDK

2019-11-04 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/BEAM-2572?focusedWorklogId=338337&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-338337
 ]

ASF GitHub Bot logged work on BEAM-2572:


Author: ASF GitHub Bot
Created on: 04/Nov/19 20:02
Start Date: 04/Nov/19 20:02
Worklog Time Spent: 10m 
  Work Description: MattMorgis commented on issue #9955: [BEAM-2572] Python 
SDK S3 Filesystem
URL: https://github.com/apache/beam/pull/9955#issuecomment-549523153
 
 
   R: @pabloem @robertwb @aaltay @charlesccychen 
 

This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 338337)
Time Spent: 0.5h  (was: 20m)

> Implement an S3 filesystem for Python SDK
> -
>
> Key: BEAM-2572
> URL: https://issues.apache.org/jira/browse/BEAM-2572
> Project: Beam
>  Issue Type: Task
>  Components: sdk-py-core
>Reporter: Dmitry Demeshchuk
>Priority: Minor
>  Labels: GSoC2019, gsoc, gsoc2019, mentor, outreachy19dec
>  Time Spent: 0.5h
>  Remaining Estimate: 0h
>
> There are two paths worth exploring, to my understanding:
> 1. Sticking to the HDFS-based approach (like it's done in Java).
> 2. Using boto/boto3 for accessing S3 through its common API endpoints.
> I personally prefer the second approach, for a few reasons:
> 1. In real life, HDFS and S3 have different consistency guarantees, therefore 
> their behaviors may contradict each other in some edge cases (say, we write 
> something to S3, but it's not immediately accessible for reading from another 
> end).
> 2. There are other AWS-based sources and sinks we may want to create in the 
> future: DynamoDB, Kinesis, SQS, etc.
> 3. boto3 already provides somewhat good logic for basic things like 
> reattempting.
> Whatever path we choose, there's another problem related to this: we 
> currently cannot pass any global settings (say, pipeline options, or just an 
> arbitrary kwarg) to a filesystem. Because of that, we'd have to set up the 
> runner nodes to have AWS keys set up in the environment, which is not trivial 
> to achieve and doesn't look too clean either (I'd rather see one single place 
> for configuring the runner options).
> Also, it's worth mentioning that I already have a janky S3 filesystem 
> implementation that only supports DirectRunner at the moment (because of the 
> previous paragraph). I'm perfectly fine finishing it myself, with some 
> guidance from the maintainers.
> Where should I move on from here, and whose input should I be looking for?
> Thanks!



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Work logged] (BEAM-2572) Implement an S3 filesystem for Python SDK

2019-11-04 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/BEAM-2572?focusedWorklogId=338336&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-338336
 ]

ASF GitHub Bot logged work on BEAM-2572:


Author: ASF GitHub Bot
Created on: 04/Nov/19 20:01
Start Date: 04/Nov/19 20:01
Worklog Time Spent: 10m 
  Work Description: MattMorgis commented on issue #9955: [BEAM-2572] Python 
SDK S3 Filesystem
URL: https://github.com/apache/beam/pull/9955#issuecomment-549522747
 
 
   Hi,
   
   We are running into trouble getting the unit tests to pass in the CI 
environment, and I think we can use help from a core team member. 
   
   We added a new set of extra dependencies when using this new S3 filesystem - 
we followed the same pattern that GCP did: 
https://github.com/apache/beam/pull/9955/files#diff-e9d0ab71f74dc10309a29b697ee99330R239
   
   This allows the user to install with `pip install beam[gcp]` or `pip install 
beam[aws]` in our case. 
   
   Our unit tests are completely mocked out and do not require any of the AWS 
extra packages; however, we set them up behind a flag so you can bypass the 
mock and talk to a real S3 bucket over the wire. Because of this, the extra 
dependencies *do* need to be installed when running these new unit tests. 
   
   Again, following the lead of how GCP implemented this, they also skip the 
unit tests if their extra dependencies are not installed: 
https://github.com/apache/beam/blob/master/sdks/python/apache_beam/io/gcp/gcsio_test.py#L240
   
   Our question: _**How do we configure CI to install the AWS deps to run the 
tests?**_ 
   
   I have poked around a bit and found one setting in `tox.ini` that appears to 
install both the test and gcp deps 
(https://github.com/apache/beam/blob/master/sdks/python/tox.ini#L200). 
Additionally, at the root level of the project 
(https://github.com/apache/beam/blob/master/buildSrc/src/main/groovy/org/apache/beam/gradle/BeamModulePlugin.groovy#L1799),
 I found an `installGcpTest` Gradle task that seems to also install both. This 
task only seems to be referenced inside `test-suites/dataflow`, not `direct` or 
`portable`. 
   
   Any guidance here would be greatly appreciated! 
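
   For reference, the skip-if-missing pattern described above (mirroring what 
the GCP tests do) looks roughly like this. The `HAS_AWS` flag, the class name, 
and the `aws` extra name are assumptions for illustration:

```python
import unittest

# Guard the optional dependency: these packages are only present when the
# user installed the extra, e.g. `pip install apache-beam[aws]` (assumed name).
try:
    import boto3  # noqa: F401
    HAS_AWS = True
except ImportError:
    HAS_AWS = False


@unittest.skipIf(not HAS_AWS, 'AWS dependencies are not installed')
class S3FileSystemTest(unittest.TestCase):
    def test_placeholder(self):
        # Real tests would exercise the (mocked or live) S3 client here.
        self.assertTrue(HAS_AWS)
```

   With this guard, the suite passes whether or not the extras are installed; 
CI still needs the extras installed somewhere for the tests to actually run 
rather than skip.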
 

This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 338336)
Time Spent: 20m  (was: 10m)

> Implement an S3 filesystem for Python SDK
> -
>
> Key: BEAM-2572
> URL: https://issues.apache.org/jira/browse/BEAM-2572
> Project: Beam
>  Issue Type: Task
>  Components: sdk-py-core
>Reporter: Dmitry Demeshchuk
>Priority: Minor
>  Labels: GSoC2019, gsoc, gsoc2019, mentor, outreachy19dec
>  Time Spent: 20m
>  Remaining Estimate: 0h
>
> There are two paths worth exploring, to my understanding:
> 1. Sticking to the HDFS-based approach (like it's done in Java).
> 2. Using boto/boto3 for accessing S3 through its common API endpoints.
> I personally prefer the second approach, for a few reasons:
> 1. In real life, HDFS and S3 have different consistency guarantees, therefore 
> their behaviors may contradict each other in some edge cases (say, we write 
> something to S3, but it's not immediately accessible for reading from another 
> end).
> 2. There are other AWS-based sources and sinks we may want to create in the 
> future: DynamoDB, Kinesis, SQS, etc.
> 3. boto3 already provides somewhat good logic for basic things like 
> reattempting.
> Whatever path we choose, there's another problem related to this: we 
> currently cannot pass any global settings (say, pipeline options, or just an 
> arbitrary kwarg) to a filesystem. Because of that, we'd have to setup the 
> runner nodes to have AWS keys set up in the environment, which is not trivial 
> to achieve and doesn't look too clean either (I'd rather see one single place 
> for configuring the runner options).
> Also, it's worth mentioning that I already have a janky S3 filesystem 
> implementation that only supports DirectRunner at the moment (because of the 
> previous paragraph). I'm perfectly fine finishing it myself, with some 
> guidance from the maintainers.
> Where should I move on from here, and whose input should I be looking for?
> Thanks!



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Work logged] (BEAM-8368) [Python] libprotobuf-generated exception when importing apache_beam

2019-11-04 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/BEAM-8368?focusedWorklogId=338333&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-338333
 ]

ASF GitHub Bot logged work on BEAM-8368:


Author: ASF GitHub Bot
Created on: 04/Nov/19 19:54
Start Date: 04/Nov/19 19:54
Worklog Time Spent: 10m 
  Work Description: aaltay commented on issue #9970: [BEAM-8368] 
[BEAM-8392] Update pyarrow to the latest version
URL: https://github.com/apache/beam/pull/9970#issuecomment-549519877
 
 
   Run Python PreCommit
 

This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 338333)
Time Spent: 3h 20m  (was: 3h 10m)

> [Python] libprotobuf-generated exception when importing apache_beam
> ---
>
> Key: BEAM-8368
> URL: https://issues.apache.org/jira/browse/BEAM-8368
> Project: Beam
>  Issue Type: Bug
>  Components: sdk-py-core
>Affects Versions: 2.15.0, 2.17.0
>Reporter: Ubaier Bhat
>Assignee: Brian Hulette
>Priority: Blocker
> Fix For: 2.17.0
>
> Attachments: error_log.txt
>
>  Time Spent: 3h 20m
>  Remaining Estimate: 0h
>
> Unable to import apache_beam after upgrading to macos 10.15 (Catalina). 
> Cleared all the pipenvs and but can't get it working again.
> {code}
> import apache_beam as beam
> /Users/***/.local/share/virtualenvs/beam-etl-ims6DitU/lib/python3.7/site-packages/apache_beam/__init__.py:84:
>  UserWarning: Some syntactic constructs of Python 3 are not yet fully 
> supported by Apache Beam.
>   'Some syntactic constructs of Python 3 are not yet fully supported by '
> [libprotobuf ERROR google/protobuf/descriptor_database.cc:58] File already 
> exists in database: 
> [libprotobuf FATAL google/protobuf/descriptor.cc:1370] CHECK failed: 
> GeneratedDatabase()->Add(encoded_file_descriptor, size): 
> libc++abi.dylib: terminating with uncaught exception of type 
> google::protobuf::FatalException: CHECK failed: 
> GeneratedDatabase()->Add(encoded_file_descriptor, size): 
> {code}



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Assigned] (BEAM-8504) BigQueryIO DIRECT_READ is broken

2019-11-04 Thread Chamikara Madhusanka Jayalath (Jira)


 [ 
https://issues.apache.org/jira/browse/BEAM-8504?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Chamikara Madhusanka Jayalath reassigned BEAM-8504:
---

Assignee: Aryan Naraghi

> BigQueryIO DIRECT_READ is broken
> 
>
> Key: BEAM-8504
> URL: https://issues.apache.org/jira/browse/BEAM-8504
> Project: Beam
>  Issue Type: Bug
>  Components: io-java-gcp
>Affects Versions: 2.16.0, 2.17.0
>Reporter: Gleb Kanterov
>Assignee: Aryan Naraghi
>Priority: Major
> Fix For: 2.17.0
>
>
> The issue is reproducible with 2.16.0, 2.17.0 candidate and 2.18.0-SNAPSHOT 
> (as of d96c6b21a8a95b01944016584bc8e4ad1ab5f6a6), and not reproducible with 
> 2.15.0.
> {code}
> java.io.IOException: Failed to start reading from source: name: 
> "projects//locations/eu/streams/"
>   at 
> org.apache.beam.runners.dataflow.worker.WorkerCustomSources$BoundedReaderIterator.start(WorkerCustomSources.java:604)
>   at 
> org.apache.beam.runners.dataflow.worker.util.common.worker.ReadOperation$SynchronizedReaderIterator.start(ReadOperation.java:361)
>   at 
> org.apache.beam.runners.dataflow.worker.util.common.worker.ReadOperation.runReadLoop(ReadOperation.java:194)
>   at 
> org.apache.beam.runners.dataflow.worker.util.common.worker.ReadOperation.start(ReadOperation.java:159)
>   at 
> org.apache.beam.runners.dataflow.worker.util.common.worker.MapTaskExecutor.execute(MapTaskExecutor.java:77)
>   at 
> org.apache.beam.runners.dataflow.worker.BatchDataflowWorker.executeWork(BatchDataflowWorker.java:411)
>   at 
> org.apache.beam.runners.dataflow.worker.BatchDataflowWorker.doWork(BatchDataflowWorker.java:380)
>   at 
> org.apache.beam.runners.dataflow.worker.BatchDataflowWorker.getAndPerformWork(BatchDataflowWorker.java:305)
>   at 
> org.apache.beam.runners.dataflow.worker.DataflowBatchWorkerHarness$WorkerThread.doWork(DataflowBatchWorkerHarness.java:140)
>   at 
> org.apache.beam.runners.dataflow.worker.DataflowBatchWorkerHarness$WorkerThread.call(DataflowBatchWorkerHarness.java:120)
>   at 
> org.apache.beam.runners.dataflow.worker.DataflowBatchWorkerHarness$WorkerThread.call(DataflowBatchWorkerHarness.java:107)
>   at java.util.concurrent.FutureTask.run(FutureTask.java:266)
>   at 
> java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
>   at 
> java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
>   at java.lang.Thread.run(Thread.java:745)
> Caused by: java.lang.IllegalArgumentException: Fraction consumed from 
> previous response (0.0) is not less than fraction consumed from current 
> response (0.0).
>   at 
> org.apache.beam.vendor.guava.v26_0_jre.com.google.common.base.Preconditions.checkArgument(Preconditions.java:440)
>   at 
> org.apache.beam.sdk.io.gcp.bigquery.BigQueryStorageStreamSource$BigQueryStorageStreamReader.readNextRecord(BigQueryStorageStreamSource.java:243)
>   at 
> org.apache.beam.sdk.io.gcp.bigquery.BigQueryStorageStreamSource$BigQueryStorageStreamReader.start(BigQueryStorageStreamSource.java:206)
>   at 
> org.apache.beam.runners.dataflow.worker.WorkerCustomSources$BoundedReaderIterator.start(WorkerCustomSources.java:601)
>   ... 14 more
> {code}



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Commented] (BEAM-8536) Migrate usage of DelayedBundleApplication.requested_execution_time to time duration

2019-11-04 Thread Boyuan Zhang (Jira)


[ 
https://issues.apache.org/jira/browse/BEAM-8536?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16966931#comment-16966931
 ] 

Boyuan Zhang commented on BEAM-8536:


No, this only affects Java. Currently only the Java and Python SDKs work with 
DelayedBundleApplication, and Python support is still in development, so it can 
use a time duration directly.

> Migrate usage of DelayedBundleApplication.requested_execution_time to time 
> duration 
> 
>
> Key: BEAM-8536
> URL: https://issues.apache.org/jira/browse/BEAM-8536
> Project: Beam
>  Issue Type: Improvement
>  Components: runner-dataflow, sdk-java-harness
>Reporter: Boyuan Zhang
>Assignee: Boyuan Zhang
>Priority: Major
>
> In DelayedBundleApplication, we used to use an absolute time to represent the 
> rescheduling time. We want to switch to a relative time duration, which 
> requires a migration in the Java SDK and the Dataflow Java runner harness.
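
The migration described above amounts to converting an absolute resume 
timestamp into a delay relative to now. A hedged Python sketch of that 
conversion; the function and field names here are illustrative, not the proto's 
actual API:

```python
import time

def to_relative_delay(requested_execution_time, now=None):
    """Convert an absolute resume timestamp (seconds since epoch) into the
    relative delay a duration-typed field would carry.

    Clamped at zero so a timestamp already in the past means "resume
    immediately" rather than a negative duration.
    """
    if now is None:
        now = time.time()
    return max(0.0, requested_execution_time - now)
```

A relative duration avoids clock-skew problems between the SDK harness and the 
runner, since only the producer's clock is consulted.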



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Updated] (BEAM-8537) Provide WatermarkEstimatorProvider for different types of WatermarkEstimator

2019-11-04 Thread Boyuan Zhang (Jira)


 [ 
https://issues.apache.org/jira/browse/BEAM-8537?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Boyuan Zhang updated BEAM-8537:
---
Description: 
This is a follow up for in-progress PR:  
https://github.com/apache/beam/pull/9794.
The current implementation in PR 9794 provides a default implementation of 
WatermarkEstimator. As a follow-up, we want WatermarkEstimator to be a pure 
interface. We'll provide a WatermarkEstimatorProvider that can create a custom 
WatermarkEstimator per windowed value. It should be similar to how we track 
restrictions for SDF: WatermarkEstimator <---> RestrictionTracker, 
WatermarkEstimatorProvider <---> RestrictionTrackerProvider, 
_WatermarkEstimatorParam <---> _RestrictionDoFnParam
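
The RestrictionTracker analogy above can be sketched as a pure interface plus a 
provider. This is an illustrative Python sketch; the class and method names are 
assumptions, not the SDK's final API:

```python
import abc

class WatermarkEstimator(abc.ABC):
    """Pure interface: the counterpart of RestrictionTracker in the analogy."""

    @abc.abstractmethod
    def observe_timestamp(self, timestamp):
        """Update internal state from an output element's timestamp."""

    @abc.abstractmethod
    def current_watermark(self):
        """Return the current watermark estimate."""


class MonotonicWatermarkEstimator(WatermarkEstimator):
    """One possible implementation: the watermark tracks the max timestamp seen."""

    def __init__(self, initial_watermark):
        self._watermark = initial_watermark

    def observe_timestamp(self, timestamp):
        self._watermark = max(self._watermark, timestamp)

    def current_watermark(self):
        return self._watermark


class WatermarkEstimatorProvider(abc.ABC):
    """Counterpart of RestrictionTrackerProvider: creates one estimator per
    windowed value, restoring any checkpointed estimator state."""

    @abc.abstractmethod
    def create_watermark_estimator(self, estimator_state):
        ...
```

Keeping the base class free of state is what makes per-windowed-value creation 
via the provider possible, mirroring how restriction trackers are created.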

> Provide WatermarkEstimatorProvider for different types of WatermarkEstimator
> 
>
> Key: BEAM-8537
> URL: https://issues.apache.org/jira/browse/BEAM-8537
> Project: Beam
>  Issue Type: Improvement
>  Components: sdk-py-core, sdk-py-harness
>Reporter: Boyuan Zhang
>Assignee: Boyuan Zhang
>Priority: Major
>
> This is a follow up for in-progress PR:  
> https://github.com/apache/beam/pull/9794.
> The current implementation in PR 9794 provides a default implementation of 
> WatermarkEstimator. As a follow-up, we want WatermarkEstimator to be a pure 
> interface. We'll provide a WatermarkEstimatorProvider that can create a 
> custom WatermarkEstimator per windowed value. It should be similar to how we 
> track restrictions for SDF: WatermarkEstimator <---> 
> RestrictionTracker, WatermarkEstimatorProvider <---> 
> RestrictionTrackerProvider, _WatermarkEstimatorParam <---> 
> _RestrictionDoFnParam



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Updated] (BEAM-8537) Provide WatermarkEstimatorProvider for different types of WatermarkEstimator

2019-11-04 Thread Boyuan Zhang (Jira)


 [ 
https://issues.apache.org/jira/browse/BEAM-8537?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Boyuan Zhang updated BEAM-8537:
---
Description: 
This is a follow up for in-progress PR:  
https://github.com/apache/beam/pull/9794.
Current implementation in PR9794 provides a default implementation of 
WatermarkEstimator. For further work, we want to let WatermarkEstimator to be a 
pure Interface. We'll provide a WatermarkEstimatorProvider to be able to create 
a custom WatermarkEstimator per windowed value. It should be similar to how we 
track restriction for SDF: 
WatermarkEstimator <---> RestrictionTracker 
WatermarkEstimatorProvider <---> RestrictionTrackerProvider
WatermarkEstimatorParam <---> RestrictionDoFnParam

  was:
This is a follow up for in-progress PR:  
https://github.com/apache/beam/pull/9794.
Current implementation in PR9794 provides a default implementation of 
WatermarkEstimator. For further work, we want to let WatermarkEstimator to be a 
pure Interface. We'll provide a WatermarkEstimatorProvider to be able to create 
a custom WatermarkEstimator per windowed value. It should be similar to how we 
track restriction for SDF: WatermarkEstimator <---> RestrictionTracker, 
WatermarkEstimatorProvider <---> RestrictionTrackerProvider, 
_WatermarkEstimatorParam <---> _RestrictionDoFnParam


> Provide WatermarkEstimatorProvider for different types of WatermarkEstimator
> 
>
> Key: BEAM-8537
> URL: https://issues.apache.org/jira/browse/BEAM-8537
> Project: Beam
>  Issue Type: Improvement
>  Components: sdk-py-core, sdk-py-harness
>Reporter: Boyuan Zhang
>Assignee: Boyuan Zhang
>Priority: Major
>
> This is a follow up for in-progress PR:  
> https://github.com/apache/beam/pull/9794.
> The current implementation in PR 9794 provides a default implementation of 
> WatermarkEstimator. As a follow-up, we want WatermarkEstimator to be a pure 
> interface. We'll provide a WatermarkEstimatorProvider that can create a 
> custom WatermarkEstimator per windowed value. It should be similar to how we 
> track restrictions for SDF: 
> WatermarkEstimator <---> RestrictionTracker 
> WatermarkEstimatorProvider <---> RestrictionTrackerProvider
> WatermarkEstimatorParam <---> RestrictionDoFnParam



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Work logged] (BEAM-8335) Add streaming support to Interactive Beam

2019-11-04 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/BEAM-8335?focusedWorklogId=338304&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-338304
 ]

ASF GitHub Bot logged work on BEAM-8335:


Author: ASF GitHub Bot
Created on: 04/Nov/19 18:56
Start Date: 04/Nov/19 18:56
Worklog Time Spent: 10m 
  Work Description: rohdesamuel commented on issue #9953: [BEAM-8335] Adds 
support for multi-output TestStream
URL: https://github.com/apache/beam/pull/9953#issuecomment-549496668
 
 
   R: @davidyan74 can you review this please?
 

This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 338304)
Time Spent: 18h  (was: 17h 50m)

> Add streaming support to Interactive Beam
> -
>
> Key: BEAM-8335
> URL: https://issues.apache.org/jira/browse/BEAM-8335
> Project: Beam
>  Issue Type: Improvement
>  Components: runner-py-interactive
>Reporter: Sam Rohde
>Assignee: Sam Rohde
>Priority: Major
>  Time Spent: 18h
>  Remaining Estimate: 0h
>
> This issue tracks the work items to introduce streaming support to the 
> Interactive Beam experience. This will allow users to:
>  * Write and run a streaming job in IPython
>  * Automatically cache records from unbounded sources
>  * Add a replay experience that replays all cached records to simulate the 
> original pipeline execution
>  * Add controls to play/pause/stop/step individual elements from the cached 
> records
>  * Add ability to inspect/visualize unbounded PCollections



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Work logged] (BEAM-8335) Add streaming support to Interactive Beam

2019-11-04 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/BEAM-8335?focusedWorklogId=338303&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-338303
 ]

ASF GitHub Bot logged work on BEAM-8335:


Author: ASF GitHub Bot
Created on: 04/Nov/19 18:55
Start Date: 04/Nov/19 18:55
Worklog Time Spent: 10m 
  Work Description: rohdesamuel commented on issue #9953: [BEAM-8335] Adds 
support for multi-output TestStream
URL: https://github.com/apache/beam/pull/9953#issuecomment-549496493
 
 
   R: @KevinGG can you review this please?
 

This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 338303)
Time Spent: 17h 50m  (was: 17h 40m)

> Add streaming support to Interactive Beam
> -
>
> Key: BEAM-8335
> URL: https://issues.apache.org/jira/browse/BEAM-8335
> Project: Beam
>  Issue Type: Improvement
>  Components: runner-py-interactive
>Reporter: Sam Rohde
>Assignee: Sam Rohde
>Priority: Major
>  Time Spent: 17h 50m
>  Remaining Estimate: 0h
>
> This issue tracks the work items to introduce streaming support to the 
> Interactive Beam experience. This will allow users to:
>  * Write and run a streaming job in IPython
>  * Automatically cache records from unbounded sources
>  * Add a replay experience that replays all cached records to simulate the 
> original pipeline execution
>  * Add controls to play/pause/stop/step individual elements from the cached 
> records
>  * Add ability to inspect/visualize unbounded PCollections





[jira] [Work logged] (BEAM-7013) A new count distinct transform based on BigQuery compatible HyperLogLog++ implementation

2019-11-04 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/BEAM-7013?focusedWorklogId=338301&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-338301
 ]

ASF GitHub Bot logged work on BEAM-7013:


Author: ASF GitHub Bot
Created on: 04/Nov/19 18:45
Start Date: 04/Nov/19 18:45
Worklog Time Spent: 10m 
  Work Description: robinyqiu commented on pull request #9778: [BEAM-7013] 
Update BigQueryHllSketchCompatibilityIT to cover empty sketch cases
URL: https://github.com/apache/beam/pull/9778#discussion_r342208431
 
 

 ##
 File path: 
sdks/java/extensions/zetasketch/src/main/java/org/apache/beam/sdk/extensions/zetasketch/HllCount.java
 ##
 @@ -107,6 +109,20 @@
   // Cannot be instantiated. This class is intended to be a namespace only.
   private HllCount() {}
 
+  /**
+   * Returns the sketch stored as bytes in the input {@code ByteBuffer}. If 
the input {@code
 
 Review comment:
   Done. Agree that this does sound clearer!
 



Issue Time Tracking
---

Worklog Id: (was: 338301)
Time Spent: 36h 40m  (was: 36.5h)

> A new count distinct transform based on BigQuery compatible HyperLogLog++ 
> implementation
> 
>
> Key: BEAM-7013
> URL: https://issues.apache.org/jira/browse/BEAM-7013
> Project: Beam
>  Issue Type: New Feature
>  Components: extensions-java-sketching, sdk-java-core
>Reporter: Yueyang Qiu
>Assignee: Yueyang Qiu
>Priority: Major
> Fix For: 2.16.0
>
>  Time Spent: 36h 40m
>  Remaining Estimate: 0h
>






[jira] [Work logged] (BEAM-8544) Install Beam SDK with ccache for faster re-install.

2019-11-04 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/BEAM-8544?focusedWorklogId=338296&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-338296
 ]

ASF GitHub Bot logged work on BEAM-8544:


Author: ASF GitHub Bot
Created on: 04/Nov/19 18:42
Start Date: 04/Nov/19 18:42
Worklog Time Spent: 10m 
  Work Description: robertwb commented on issue #9966: [BEAM-8544] Use 
ccache for compiling the Beam Python SDK.
URL: https://github.com/apache/beam/pull/9966#issuecomment-549491314
 
 
   Run Python PreCommit
 



Issue Time Tracking
---

Worklog Id: (was: 338296)
Time Spent: 1h 20m  (was: 1h 10m)

> Install Beam SDK with ccache for faster re-install.
> ---
>
> Key: BEAM-8544
> URL: https://issues.apache.org/jira/browse/BEAM-8544
> Project: Beam
>  Issue Type: Improvement
>  Components: sdk-py-harness
>Reporter: Robert Bradshaw
>Assignee: Robert Bradshaw
>Priority: Major
>  Time Spent: 1h 20m
>  Remaining Estimate: 0h
>
> Re-compiling the C modules of the SDK takes 2-3 minutes. This adds to worker 
> startup time whenever a custom SDK is being used (in particular, during 
> development and testing). We can use ccache to re-use the old compile results 
> when the Cython files have not changed. 
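One way this could look in practice (a sketch under the assumption that ccache is installed and that the compiler is picked up from the CC/CXX environment variables; the actual Gradle/setup.py wiring from the PR is not shown here):

```shell
# Hypothetical setup: point the Cython/distutils build at ccache-wrapped
# compilers. Assumes ccache is installed and on PATH.
export CC="ccache gcc"
export CXX="ccache g++"

# Optional: give the SDK builds a dedicated cache with a size cap.
export CCACHE_DIR="$HOME/.ccache-beam"
ccache --max-size=2G

# Re-install the SDK (illustrative path); unchanged Cython-generated C
# files are now served from the cache instead of being recompiled.
pip install -e ./sdks/python
```

The first install populates the cache; subsequent installs of an unchanged SDK skip the C compilation step almost entirely.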





[jira] [Work logged] (BEAM-8544) Install Beam SDK with ccache for faster re-install.

2019-11-04 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/BEAM-8544?focusedWorklogId=338295&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-338295
 ]

ASF GitHub Bot logged work on BEAM-8544:


Author: ASF GitHub Bot
Created on: 04/Nov/19 18:42
Start Date: 04/Nov/19 18:42
Worklog Time Spent: 10m 
  Work Description: robertwb commented on issue #9966: [BEAM-8544] Use 
ccache for compiling the Beam Python SDK.
URL: https://github.com/apache/beam/pull/9966#issuecomment-549491287
 
 
   Timed out, but on inspection doesn't look relevant to this change. 
 



Issue Time Tracking
---

Worklog Id: (was: 338295)
Time Spent: 1h 10m  (was: 1h)

> Install Beam SDK with ccache for faster re-install.
> ---
>
> Key: BEAM-8544
> URL: https://issues.apache.org/jira/browse/BEAM-8544
> Project: Beam
>  Issue Type: Improvement
>  Components: sdk-py-harness
>Reporter: Robert Bradshaw
>Assignee: Robert Bradshaw
>Priority: Major
>  Time Spent: 1h 10m
>  Remaining Estimate: 0h
>
> Re-compiling the C modules of the SDK takes 2-3 minutes. This adds to worker 
> startup time whenever a custom SDK is being used (in particular, during 
> development and testing). We can use ccache to re-use the old compile results 
> when the Cython files have not changed. 





[jira] [Work logged] (BEAM-8532) Beam Python trigger driver sets incorrect timestamp for output windows.

2019-11-04 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/BEAM-8532?focusedWorklogId=338291&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-338291
 ]

ASF GitHub Bot logged work on BEAM-8532:


Author: ASF GitHub Bot
Created on: 04/Nov/19 18:39
Start Date: 04/Nov/19 18:39
Worklog Time Spent: 10m 
  Work Description: robertwb commented on issue #9958: [BEAM-8532] Output 
timestamp should be the inclusive, not exclusive, end of window.
URL: https://github.com/apache/beam/pull/9958#issuecomment-549490174
 
 
   Yeah, amazing (and scary) what you can find when you add some testing. 
 



Issue Time Tracking
---

Worklog Id: (was: 338291)
Time Spent: 20m  (was: 10m)

> Beam Python trigger driver sets incorrect timestamp for output windows.
> ---
>
> Key: BEAM-8532
> URL: https://issues.apache.org/jira/browse/BEAM-8532
> Project: Beam
>  Issue Type: Bug
>  Components: sdk-py-core
>Reporter: Robert Bradshaw
>Assignee: Robert Bradshaw
>Priority: Critical
>  Time Spent: 20m
>  Remaining Estimate: 0h
>
> The timestamp should lie in the window, otherwise re-windowing will not be 
> idempotent. 
> https://github.com/apache/beam/blob/release-2.16.0/sdks/python/apache_beam/transforms/trigger.py#L1183
>  should be using {{window.max_timestamp()}} rather than {{.end}}
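The idempotence claim is easy to demonstrate with a toy model of interval windows. The classes below are illustrative stand-ins, not Beam's actual implementation; timestamps are integer microseconds and a window's end is exclusive, so max_timestamp() is end - 1:

```python
# Toy model of the bug: a window [start, end) whose max_timestamp() is the
# inclusive end (end minus one microsecond), mirroring Beam's IntervalWindow.
class IntervalWindow(object):
    def __init__(self, start, end):  # microsecond timestamps, end exclusive
        self.start = start
        self.end = end

    def max_timestamp(self):
        return self.end - 1  # last timestamp that still lies in the window

    def contains(self, ts):
        return self.start <= ts < self.end


def assign_window(ts, size=60 * 10**6):
    """Assign a timestamp to a fixed 60-second window."""
    start = ts - ts % size
    return IntervalWindow(start, start + size)


w = assign_window(5 * 10**6)  # element at t=5s falls in [0s, 60s)

# Using .end as the output timestamp escapes the window...
assert not w.contains(w.end)
# ...so re-windowing is not idempotent: the output lands in the next window.
assert assign_window(w.end).start == w.end

# Using max_timestamp() keeps the output inside the same window.
assert w.contains(w.max_timestamp())
assert assign_window(w.max_timestamp()).start == w.start
```

With .end as the output timestamp the element escapes its window and re-windowing assigns it to the next one; with max_timestamp() the assignment is stable.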





[jira] [Commented] (BEAM-8434) Allow trigger transcript tests to be run as ValidatesRunner tests.

2019-11-04 Thread Robert Bradshaw (Jira)


[ 
https://issues.apache.org/jira/browse/BEAM-8434?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16966897#comment-16966897
 ] 

Robert Bradshaw commented on BEAM-8434:
---

The original intent was to allow these to be executed in every SDK. However, 
now that triggering is the responsibility of the runner rather than the SDK, 
that's less necessary; these can be run as ValidatesRunner tests. 

> Allow trigger transcript tests to be run as ValidatesRunner tests. 
> ---
>
> Key: BEAM-8434
> URL: https://issues.apache.org/jira/browse/BEAM-8434
> Project: Beam
>  Issue Type: Improvement
>  Components: sdk-java-core, sdk-py-core, testing
>Reporter: Robert Bradshaw
>Assignee: Robert Bradshaw
>Priority: Major
>  Time Spent: 3h 20m
>  Remaining Estimate: 0h
>






[jira] [Work logged] (BEAM-8335) Add streaming support to Interactive Beam

2019-11-04 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/BEAM-8335?focusedWorklogId=338285&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-338285
 ]

ASF GitHub Bot logged work on BEAM-8335:


Author: ASF GitHub Bot
Created on: 04/Nov/19 18:25
Start Date: 04/Nov/19 18:25
Worklog Time Spent: 10m 
  Work Description: rohdesamuel commented on issue #9720: [BEAM-8335] Add 
initial modules for interactive streaming support
URL: https://github.com/apache/beam/pull/9720#issuecomment-549484349
 
 
   Run Python PreCommit
 



Issue Time Tracking
---

Worklog Id: (was: 338285)
Time Spent: 17h 40m  (was: 17.5h)

> Add streaming support to Interactive Beam
> -
>
> Key: BEAM-8335
> URL: https://issues.apache.org/jira/browse/BEAM-8335
> Project: Beam
>  Issue Type: Improvement
>  Components: runner-py-interactive
>Reporter: Sam Rohde
>Assignee: Sam Rohde
>Priority: Major
>  Time Spent: 17h 40m
>  Remaining Estimate: 0h
>
> This issue tracks the work items to introduce streaming support to the 
> Interactive Beam experience. This will allow users to:
>  * Write and run a streaming job in IPython
>  * Automatically cache records from unbounded sources
>  * Add a replay experience that replays all cached records to simulate the 
> original pipeline execution
>  * Add controls to play/pause/stop/step individual elements from the cached 
> records
>  * Add ability to inspect/visualize unbounded PCollections





[jira] [Work logged] (BEAM-8516) sdist build fails when artifacts from different versions are present

2019-11-04 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/BEAM-8516?focusedWorklogId=338283&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-338283
 ]

ASF GitHub Bot logged work on BEAM-8516:


Author: ASF GitHub Bot
Created on: 04/Nov/19 18:24
Start Date: 04/Nov/19 18:24
Worklog Time Spent: 10m 
  Work Description: markflyhigh commented on pull request #9930: 
[BEAM-8516] only consider current version in sdist build
URL: https://github.com/apache/beam/pull/9930
 
 
   
 



Issue Time Tracking
---

Worklog Id: (was: 338283)
Time Spent: 20m  (was: 10m)

> sdist build fails when artifacts from different versions are present
> 
>
> Key: BEAM-8516
> URL: https://issues.apache.org/jira/browse/BEAM-8516
> Project: Beam
>  Issue Type: Improvement
>  Components: sdk-py-core
>Reporter: Kyle Weaver
>Assignee: Kyle Weaver
>Priority: Minor
>  Time Spent: 20m
>  Remaining Estimate: 0h
>
> I was developing on some Beam branch, then I switched git branches to a 
> different branch. These two branches had different python_sdk_version 
> properties set [1], say version A and version B. When I run Gradle task 
> :sdks:python:sdist, it will fail with error:
> Expected directory ... to contain exactly one file, however, it contains more 
> than one file.
> This happens because the tarball labeled version A is still present, along 
> with the new tarball for version B. I can get around this by cleaning the 
> build directory, but this is an extra step and is not immediately obvious. It 
> would be better if it just worked.
> [1] [https://github.com/ibzib/beam/blob/default-region/gradle.properties#L27]
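The idea behind the fix can be sketched in a few lines: instead of asserting that the build directory contains exactly one file, select the tarball whose name matches the current SDK version and ignore stale ones. Names and paths below are illustrative, not the actual Gradle logic:

```python
import os
import tempfile


def find_sdist(build_dir, version):
    """Pick the sdist tarball for the current version, ignoring stale ones."""
    expected = 'apache-beam-{}.tar.gz'.format(version)
    matches = [f for f in os.listdir(build_dir) if f == expected]
    if len(matches) != 1:
        raise RuntimeError(
            'Expected exactly one sdist for version %s, found %r'
            % (version, matches))
    return os.path.join(build_dir, matches[0])


# Simulate a build dir holding tarballs left over from two branches
# (version A and version B, as in the bug report).
d = tempfile.mkdtemp()
for name in ('apache-beam-2.17.0.dev.tar.gz', 'apache-beam-2.18.0.dev.tar.gz'):
    open(os.path.join(d, name), 'w').close()

# The naive "exactly one file" check fails here; version-aware selection works.
print(os.path.basename(find_sdist(d, '2.18.0.dev')))
```

This makes the task robust to switching branches without a clean step.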





[jira] [Work logged] (BEAM-8335) Add streaming support to Interactive Beam

2019-11-04 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/BEAM-8335?focusedWorklogId=338284&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-338284
 ]

ASF GitHub Bot logged work on BEAM-8335:


Author: ASF GitHub Bot
Created on: 04/Nov/19 18:25
Start Date: 04/Nov/19 18:25
Worklog Time Spent: 10m 
  Work Description: rohdesamuel commented on pull request #9720: 
[BEAM-8335] Add initial modules for interactive streaming support
URL: https://github.com/apache/beam/pull/9720#discussion_r342199448
 
 

 ##
 File path: 
sdks/python/apache_beam/runners/interactive/caching/streaming_cache.py
 ##
 @@ -0,0 +1,124 @@
+#
+# Licensed to the Apache Software Foundation (ASF) under one or more
+# contributor license agreements.  See the NOTICE file distributed with
+# this work for additional information regarding copyright ownership.
+# The ASF licenses this file to You under the Apache License, Version 2.0
+# (the "License"); you may not use this file except in compliance with
+# the License.  You may obtain a copy of the License at
+#
+#http://www.apache.org/licenses/LICENSE-2.0
+#
+# Unless required by applicable law or agreed to in writing, software
+# distributed under the License is distributed on an "AS IS" BASIS,
+# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+# See the License for the specific language governing permissions and
+# limitations under the License.
+#
+
+from __future__ import absolute_import
+
+from apache_beam.portability.api.beam_interactive_api_pb2 import InteractiveStreamHeader
+from apache_beam.portability.api.beam_interactive_api_pb2 import InteractiveStreamRecord
+from apache_beam.portability.api.beam_runner_api_pb2 import TestStreamPayload
+from apache_beam.utils import timestamp
+from apache_beam.utils.timestamp import Timestamp
+
+
+class StreamingCache(object):
+  """Abstraction that holds the logic for reading and writing to cache.
+  """
+  def __init__(self, readers):
+    self._readers = readers
+
+  class Reader(object):
+    """Abstraction that reads from PCollection readers.
+
+    This class is an Abstraction layer over multiple PCollection readers to be
+    used for supplying the Interactive Service with TestStream events.
+
+    This class is also responsible for holding the state of the clock, injecting
+    clock advancement events, and watermark advancement events.
+    """
+    def __init__(self, readers):
+      self._timestamp = Timestamp.of(0)
+      self._readers = {}
+      self._headers = {}
+      readers = [r.read() for r in readers]
+
+      # The header allows for metadata about an entire stream, so that the data
+      # isn't copied per record.
+      for r in readers:
+        header = InteractiveStreamHeader()
 
 Review comment:
   In a different comment, I wrote that I don't know yet whether I want to 
subclass the PCollection Cache. So I assume here that it will be a simple 
file-based cache of strings that I will read row by row without any assumption 
about the underlying format. I may change that assumption in the future when 
the PCollection Cache code is merged.
 



Issue Time Tracking
---

Worklog Id: (was: 338284)
Time Spent: 17.5h  (was: 17h 20m)

> Add streaming support to Interactive Beam
> -
>
> Key: BEAM-8335
> URL: https://issues.apache.org/jira/browse/BEAM-8335
> Project: Beam
>  Issue Type: Improvement
>  Components: runner-py-interactive
>Reporter: Sam Rohde
>Assignee: Sam Rohde
>Priority: Major
>  Time Spent: 17.5h
>  Remaining Estimate: 0h
>
> This issue tracks the work items to introduce streaming support to the 
> Interactive Beam experience. This will allow users to:
>  * Write and run a streaming job in IPython
>  * Automatically cache records from unbounded sources
>  * Add a replay experience that replays all cached records to simulate the 
> original pipeline execution
>  * Add controls to play/pause/stop/step individual elements from the cached 
> records
>  * Add ability to inspect/visualize unbounded PCollections





[jira] [Work logged] (BEAM-8335) Add streaming support to Interactive Beam

2019-11-04 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/BEAM-8335?focusedWorklogId=338282&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-338282
 ]

ASF GitHub Bot logged work on BEAM-8335:


Author: ASF GitHub Bot
Created on: 04/Nov/19 18:23
Start Date: 04/Nov/19 18:23
Worklog Time Spent: 10m 
  Work Description: rohdesamuel commented on pull request #9720: 
[BEAM-8335] Add initial modules for interactive streaming support
URL: https://github.com/apache/beam/pull/9720#discussion_r342198794
 
 

 ##
 File path: model/interactive/OWNERS
 ##
 @@ -0,0 +1,7 @@
+# See the OWNERS docs at https://s.apache.org/beam-owners
+
+reviewers:
 
 Review comment:
   Done
 



Issue Time Tracking
---

Worklog Id: (was: 338282)
Time Spent: 17h 20m  (was: 17h 10m)

> Add streaming support to Interactive Beam
> -
>
> Key: BEAM-8335
> URL: https://issues.apache.org/jira/browse/BEAM-8335
> Project: Beam
>  Issue Type: Improvement
>  Components: runner-py-interactive
>Reporter: Sam Rohde
>Assignee: Sam Rohde
>Priority: Major
>  Time Spent: 17h 20m
>  Remaining Estimate: 0h
>
> This issue tracks the work items to introduce streaming support to the 
> Interactive Beam experience. This will allow users to:
>  * Write and run a streaming job in IPython
>  * Automatically cache records from unbounded sources
>  * Add a replay experience that replays all cached records to simulate the 
> original pipeline execution
>  * Add controls to play/pause/stop/step individual elements from the cached 
> records
>  * Add ability to inspect/visualize unbounded PCollections





[jira] [Work logged] (BEAM-8335) Add streaming support to Interactive Beam

2019-11-04 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/BEAM-8335?focusedWorklogId=338281&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-338281
 ]

ASF GitHub Bot logged work on BEAM-8335:


Author: ASF GitHub Bot
Created on: 04/Nov/19 18:23
Start Date: 04/Nov/19 18:23
Worklog Time Spent: 10m 
  Work Description: rohdesamuel commented on pull request #9720: 
[BEAM-8335] Add initial modules for interactive streaming support
URL: https://github.com/apache/beam/pull/9720#discussion_r342198600
 
 

 ##
 File path: sdks/python/apache_beam/testing/interactive_stream.py
 ##
 @@ -0,0 +1,72 @@
+#
+# Licensed to the Apache Software Foundation (ASF) under one or more
+# contributor license agreements.  See the NOTICE file distributed with
+# this work for additional information regarding copyright ownership.
+# The ASF licenses this file to You under the Apache License, Version 2.0
+# (the "License"); you may not use this file except in compliance with
+# the License.  You may obtain a copy of the License at
+#
+#http://www.apache.org/licenses/LICENSE-2.0
+#
+# Unless required by applicable law or agreed to in writing, software
+# distributed under the License is distributed on an "AS IS" BASIS,
+# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+# See the License for the specific language governing permissions and
+# limitations under the License.
+#
+
+from __future__ import absolute_import
+
+import time
+from concurrent.futures import ThreadPoolExecutor
+
+import grpc
+
+from apache_beam.portability.api import beam_interactive_api_pb2
+from apache_beam.portability.api import beam_interactive_api_pb2_grpc
+from apache_beam.portability.api.beam_interactive_api_pb2_grpc import InteractiveServiceServicer
+
+
+class InteractiveStreamController(InteractiveServiceServicer):
+  def __init__(self, streaming_cache, endpoint=None):
+    self._server = grpc.server(ThreadPoolExecutor(max_workers=10))
+
+    if endpoint:
+      self.endpoint = endpoint
+      self._server.add_insecure_port(self.endpoint)
+    else:
+      port = self._server.add_insecure_port('[::]:0')
+      self.endpoint = '[::]:{}'.format(port)
+
+    beam_interactive_api_pb2_grpc.add_InteractiveServiceServicer_to_server(
+        self, self._server)
+    self._streaming_cache = streaming_cache
+    self._playback_speed = 1 / 100.0
+
+  def start(self):
+    self._server.start()
+    self._reader = self._streaming_cache.reader()
+
+  def stop(self):
+    self._server.stop(0)
+    self._server.wait_for_termination()
+
+  def Events(self, request, context):
+    events = self._reader.read()
+    if events:
+      for e in events:
+        # Here we assume that the first event is the processing_time_event so
+        # that we can sleep and then emit the element. Thereby, trying to
+        # emulate the original stream.
+        if e.HasField('processing_time_event'):
+          sleep_duration = (
 
 Review comment:
   Ack, PTAL at the StreamingCache. I moved the logic that adds 
AdvanceProcessingTime events in there.
 



Issue Time Tracking
---

Worklog Id: (was: 338281)
Time Spent: 17h 10m  (was: 17h)

> Add streaming support to Interactive Beam
> -
>
> Key: BEAM-8335
> URL: https://issues.apache.org/jira/browse/BEAM-8335
> Project: Beam
>  Issue Type: Improvement
>  Components: runner-py-interactive
>Reporter: Sam Rohde
>Assignee: Sam Rohde
>Priority: Major
>  Time Spent: 17h 10m
>  Remaining Estimate: 0h
>
> This issue tracks the work items to introduce streaming support to the 
> Interactive Beam experience. This will allow users to:
>  * Write and run a streaming job in IPython
>  * Automatically cache records from unbounded sources
>  * Add a replay experience that replays all cached records to simulate the 
> original pipeline execution
>  * Add controls to play/pause/stop/step individual elements from the cached 
> records
>  * Add ability to inspect/visualize unbounded PCollections





[jira] [Work logged] (BEAM-5878) Support DoFns with Keyword-only arguments in Python 3.

2019-11-04 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/BEAM-5878?focusedWorklogId=338280&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-338280
 ]

ASF GitHub Bot logged work on BEAM-5878:


Author: ASF GitHub Bot
Created on: 04/Nov/19 18:22
Start Date: 04/Nov/19 18:22
Worklog Time Spent: 10m 
  Work Description: tvalentyn commented on pull request #9686: 
[WIP][BEAM-5878] update dill min version to 0.3.1.1 and add test for functions 
with Keyword-only arguments
URL: https://github.com/apache/beam/pull/9686#discussion_r342198241
 
 

 ##
 File path: sdks/python/setup.py
 ##
 @@ -106,8 +106,7 @@ def get_version():
 'avro>=1.8.1,<2.0.0; python_version < "3.0"',
 'avro-python3>=1.8.1,<2.0.0; python_version >= "3.0"',
 'crcmod>=1.7,<2.0',
-# Dill doesn't guarantee comatibility between releases within minor 
version.
-'dill>=0.3.0,<0.3.1',
+'dill>=0.3.1.1,<0.4.0',
 
 Review comment:
   Also, please keep the comment, and link 
https://github.com/uqfoundation/dill/issues/347 in the comment. Thank you.
 



Issue Time Tracking
---

Worklog Id: (was: 338280)
Time Spent: 15h 50m  (was: 15h 40m)

> Support DoFns with Keyword-only arguments in Python 3.
> --
>
> Key: BEAM-5878
> URL: https://issues.apache.org/jira/browse/BEAM-5878
> Project: Beam
>  Issue Type: Sub-task
>  Components: sdk-py-core
>Reporter: Valentyn Tymofieiev
>Assignee: yoshiki obata
>Priority: Minor
>  Time Spent: 15h 50m
>  Remaining Estimate: 0h
>
> Python 3.0 [adds a possibility|https://www.python.org/dev/peps/pep-3102/] to 
> define functions with keyword-only arguments. 
> Currently Beam does not handle them correctly. [~ruoyu] pointed out [one 
> place|https://github.com/apache/beam/blob/a56ce43109c97c739fa08adca45528c41e3c925c/sdks/python/apache_beam/typehints/decorators.py#L118]
>  in our codebase that we should fix: in Python in 3.0 inspect.getargspec() 
> will fail on functions with keyword-only arguments, but a new method 
> [inspect.getfullargspec()|https://docs.python.org/3/library/inspect.html#inspect.getfullargspec]
>  supports them.
> There may be implications for our (best-effort) type-hints machinery.
> We should also add a Py3-only unit tests that covers DoFn's with keyword-only 
> arguments once Beam Python 3 tests are in a good shape.
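The difference between the two inspect APIs can be shown with a minimal keyword-only function (the function name here is just illustrative):

```python
import inspect


# A Python 3 function with a keyword-only argument (everything after *).
def do_fn(element, *, side=None):
    return (element, side)


spec = inspect.getfullargspec(do_fn)
assert spec.args == ['element']             # positional args only
assert spec.kwonlyargs == ['side']          # keyword-only args reported here
assert spec.kwonlydefaults == {'side': None}

# The legacy inspect.getargspec() cannot describe keyword-only arguments and
# fails on such functions in Python 3 (it was later removed entirely), which
# is why the type-hints code linked above needs inspect.getfullargspec().
```

Any type-hints machinery built on the old args/defaults fields must also consult kwonlyargs and kwonlydefaults to cover such DoFns.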





[jira] [Work logged] (BEAM-8335) Add streaming support to Interactive Beam

2019-11-04 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/BEAM-8335?focusedWorklogId=338279&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-338279
 ]

ASF GitHub Bot logged work on BEAM-8335:


Author: ASF GitHub Bot
Created on: 04/Nov/19 18:22
Start Date: 04/Nov/19 18:22
Worklog Time Spent: 10m 
  Work Description: rohdesamuel commented on pull request #9720: 
[BEAM-8335] Add initial modules for interactive streaming support
URL: https://github.com/apache/beam/pull/9720#discussion_r342198148
 
 

 ##
 File path: sdks/python/apache_beam/testing/interactive_stream.py
 ##
 @@ -0,0 +1,72 @@
+#
+# Licensed to the Apache Software Foundation (ASF) under one or more
+# contributor license agreements.  See the NOTICE file distributed with
+# this work for additional information regarding copyright ownership.
+# The ASF licenses this file to You under the Apache License, Version 2.0
+# (the "License"); you may not use this file except in compliance with
+# the License.  You may obtain a copy of the License at
+#
+#http://www.apache.org/licenses/LICENSE-2.0
+#
+# Unless required by applicable law or agreed to in writing, software
+# distributed under the License is distributed on an "AS IS" BASIS,
+# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+# See the License for the specific language governing permissions and
+# limitations under the License.
+#
+
+from __future__ import absolute_import
+
+import time
+from concurrent.futures import ThreadPoolExecutor
+
+import grpc
+
+from apache_beam.portability.api import beam_interactive_api_pb2
+from apache_beam.portability.api import beam_interactive_api_pb2_grpc
+from apache_beam.portability.api.beam_interactive_api_pb2_grpc import InteractiveServiceServicer
+
+
+class InteractiveStreamController(InteractiveServiceServicer):
+  def __init__(self, streaming_cache, endpoint=None):
+    self._server = grpc.server(ThreadPoolExecutor(max_workers=10))
+
+    if endpoint:
+      self.endpoint = endpoint
+      self._server.add_insecure_port(self.endpoint)
+    else:
+      port = self._server.add_insecure_port('[::]:0')
+      self.endpoint = '[::]:{}'.format(port)
+
+    beam_interactive_api_pb2_grpc.add_InteractiveServiceServicer_to_server(
+        self, self._server)
+    self._streaming_cache = streaming_cache
+    self._playback_speed = 1 / 100.0
+
+  def start(self):
+    self._server.start()
+    self._reader = self._streaming_cache.reader()
+
+  def stop(self):
+    self._server.stop(0)
+    self._server.wait_for_termination()
+
+  def Events(self, request, context):
+    events = self._reader.read()
+    if events:
+      for e in events:
+        # Here we assume that the first event is the processing_time_event so
+        # that we can sleep and then emit the element. Thereby, trying to
+        # emulate the original stream.
+        if e.HasField('processing_time_event'):
+          sleep_duration = (
+              e.processing_time_event.advance_duration * self._playback_speed
+          ) * 10**-6
+          if sleep_duration > 0.001:
 
 Review comment:
   Removed
 



Issue Time Tracking
---

Worklog Id: (was: 338279)
Time Spent: 17h  (was: 16h 50m)

> Add streaming support to Interactive Beam
> -
>
> Key: BEAM-8335
> URL: https://issues.apache.org/jira/browse/BEAM-8335
> Project: Beam
>  Issue Type: Improvement
>  Components: runner-py-interactive
>Reporter: Sam Rohde
>Assignee: Sam Rohde
>Priority: Major
>  Time Spent: 17h
>  Remaining Estimate: 0h
>
> This issue tracks the work items to introduce streaming support to the 
> Interactive Beam experience. This will allow users to:
>  * Write and run a streaming job in IPython
>  * Automatically cache records from unbounded sources
>  * Add a replay experience that replays all cached records to simulate the 
> original pipeline execution
>  * Add controls to play/pause/stop/step individual elements from the cached 
> records
>  * Add ability to inspect/visualize unbounded PCollections





[jira] [Work logged] (BEAM-8335) Add streaming support to Interactive Beam

2019-11-04 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/BEAM-8335?focusedWorklogId=338275&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-338275
 ]

ASF GitHub Bot logged work on BEAM-8335:


Author: ASF GitHub Bot
Created on: 04/Nov/19 18:21
Start Date: 04/Nov/19 18:21
Worklog Time Spent: 10m 
  Work Description: rohdesamuel commented on pull request #9720: 
[BEAM-8335] Add initial modules for interactive streaming support
URL: https://github.com/apache/beam/pull/9720#discussion_r342197767
 
 

 ##
 File path: sdks/python/apache_beam/testing/interactive_stream.py
 ##
 @@ -0,0 +1,72 @@
+#
+# Licensed to the Apache Software Foundation (ASF) under one or more
+# contributor license agreements.  See the NOTICE file distributed with
+# this work for additional information regarding copyright ownership.
+# The ASF licenses this file to You under the Apache License, Version 2.0
+# (the "License"); you may not use this file except in compliance with
+# the License.  You may obtain a copy of the License at
+#
+#http://www.apache.org/licenses/LICENSE-2.0
+#
+# Unless required by applicable law or agreed to in writing, software
+# distributed under the License is distributed on an "AS IS" BASIS,
+# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+# See the License for the specific language governing permissions and
+# limitations under the License.
+#
+
+from __future__ import absolute_import
+
+import time
+from concurrent.futures import ThreadPoolExecutor
+
+import grpc
+
+from apache_beam.portability.api import beam_interactive_api_pb2
+from apache_beam.portability.api import beam_interactive_api_pb2_grpc
+from apache_beam.portability.api.beam_interactive_api_pb2_grpc import InteractiveServiceServicer
+
+
+class InteractiveStreamController(InteractiveServiceServicer):
+  def __init__(self, streaming_cache, endpoint=None):
+    self._server = grpc.server(ThreadPoolExecutor(max_workers=10))
+
+    if endpoint:
+      self.endpoint = endpoint
+      self._server.add_insecure_port(self.endpoint)
+    else:
+      port = self._server.add_insecure_port('[::]:0')
+      self.endpoint = '[::]:{}'.format(port)
+
+    beam_interactive_api_pb2_grpc.add_InteractiveServiceServicer_to_server(
+        self, self._server)
+    self._streaming_cache = streaming_cache
+    self._playback_speed = 1 / 100.0
+
+  def start(self):
+    self._server.start()
+    self._reader = self._streaming_cache.reader()
+
+  def stop(self):
+    self._server.stop(0)
+    self._server.wait_for_termination()
+
+  def Events(self, request, context):
+    events = self._reader.read()
+    if events:
+      for e in events:
+        # Here we assume that the first event is the processing_time_event so
+        # that we can sleep and then emit the element. Thereby, trying to
+        # emulate the original stream.
+        if e.HasField('processing_time_event'):
+          sleep_duration = (
+              e.processing_time_event.advance_duration * self._playback_speed
+          ) * 10**-6
 
 Review comment:
   To convert from us to secs
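For context, a minimal sketch of the scaling the comment refers to (illustrative names, not the Beam API): `advance_duration` is in microseconds, so the `10**-6` factor in the quoted diff converts it to seconds, and the playback-speed factor compresses replay time.

```python
# Illustrative helper (not Beam API): convert a microsecond duration into
# the number of wall-clock seconds to sleep, applying a playback speed of
# 1/100 (i.e. replay 100x faster than the original stream).
def sleep_seconds(advance_duration_us, playback_speed=1 / 100.0):
    # us -> s is the 10**-6 factor from the quoted diff.
    return (advance_duration_us * playback_speed) * 10**-6

# 5 seconds of original stream time replayed at 100x speed is ~0.05s of sleep.
print(sleep_seconds(5 * 10**6))
```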
 



Issue Time Tracking
---

Worklog Id: (was: 338275)
Time Spent: 16h 40m  (was: 16.5h)

> Add streaming support to Interactive Beam
> -
>
> Key: BEAM-8335
> URL: https://issues.apache.org/jira/browse/BEAM-8335
> Project: Beam
>  Issue Type: Improvement
>  Components: runner-py-interactive
>Reporter: Sam Rohde
>Assignee: Sam Rohde
>Priority: Major
>  Time Spent: 16h 40m
>  Remaining Estimate: 0h
>
> This issue tracks the work items to introduce streaming support to the 
> Interactive Beam experience. This will allow users to:
>  * Write and run a streaming job in IPython
>  * Automatically cache records from unbounded sources
>  * Add a replay experience that replays all cached records to simulate the 
> original pipeline execution
>  * Add controls to play/pause/stop/step individual elements from the cached 
> records
>  * Add ability to inspect/visualize unbounded PCollections





[jira] [Work logged] (BEAM-8335) Add streaming support to Interactive Beam

2019-11-04 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/BEAM-8335?focusedWorklogId=338277&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-338277
 ]

ASF GitHub Bot logged work on BEAM-8335:


Author: ASF GitHub Bot
Created on: 04/Nov/19 18:21
Start Date: 04/Nov/19 18:21
Worklog Time Spent: 10m 
  Work Description: rohdesamuel commented on pull request #9720: 
[BEAM-8335] Add initial modules for interactive streaming support
URL: https://github.com/apache/beam/pull/9720#discussion_r342198087
 
 

 ##
 File path: sdks/python/apache_beam/testing/interactive_stream.py
 ##
 @@ -0,0 +1,72 @@
+#
+# Licensed to the Apache Software Foundation (ASF) under one or more
+# contributor license agreements.  See the NOTICE file distributed with
+# this work for additional information regarding copyright ownership.
+# The ASF licenses this file to You under the Apache License, Version 2.0
+# (the "License"); you may not use this file except in compliance with
+# the License.  You may obtain a copy of the License at
+#
+#http://www.apache.org/licenses/LICENSE-2.0
+#
+# Unless required by applicable law or agreed to in writing, software
+# distributed under the License is distributed on an "AS IS" BASIS,
+# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+# See the License for the specific language governing permissions and
+# limitations under the License.
+#
+
+from __future__ import absolute_import
+
+import time
+from concurrent.futures import ThreadPoolExecutor
+
+import grpc
+
+from apache_beam.portability.api import beam_interactive_api_pb2
+from apache_beam.portability.api import beam_interactive_api_pb2_grpc
+from apache_beam.portability.api.beam_interactive_api_pb2_grpc import InteractiveServiceServicer
+
+
+class InteractiveStreamController(InteractiveServiceServicer):
+  def __init__(self, streaming_cache, endpoint=None):
+    self._server = grpc.server(ThreadPoolExecutor(max_workers=10))
+
+    if endpoint:
+      self.endpoint = endpoint
+      self._server.add_insecure_port(self.endpoint)
+    else:
+      port = self._server.add_insecure_port('[::]:0')
+      self.endpoint = '[::]:{}'.format(port)
+
+    beam_interactive_api_pb2_grpc.add_InteractiveServiceServicer_to_server(
+        self, self._server)
+    self._streaming_cache = streaming_cache
+    self._playback_speed = 1 / 100.0
+
+  def start(self):
+    self._server.start()
+    self._reader = self._streaming_cache.reader()
+
+  def stop(self):
+    self._server.stop(0)
+    self._server.wait_for_termination()
+
+  def Events(self, request, context):
+    events = self._reader.read()
+    if events:
+      for e in events:
+        # Here we assume that the first event is the processing_time_event so
 
 Review comment:
   Simplified the code quite a bit and moved the logic into the StreamingCache. 
Removed this assumption.
 



Issue Time Tracking
---

Worklog Id: (was: 338277)
Time Spent: 16h 50m  (was: 16h 40m)

> Add streaming support to Interactive Beam
> -
>
> Key: BEAM-8335
> URL: https://issues.apache.org/jira/browse/BEAM-8335
> Project: Beam
>  Issue Type: Improvement
>  Components: runner-py-interactive
>Reporter: Sam Rohde
>Assignee: Sam Rohde
>Priority: Major
>  Time Spent: 16h 50m
>  Remaining Estimate: 0h
>
> This issue tracks the work items to introduce streaming support to the 
> Interactive Beam experience. This will allow users to:
>  * Write and run a streaming job in IPython
>  * Automatically cache records from unbounded sources
>  * Add a replay experience that replays all cached records to simulate the 
> original pipeline execution
>  * Add controls to play/pause/stop/step individual elements from the cached 
> records
>  * Add ability to inspect/visualize unbounded PCollections





[jira] [Work logged] (BEAM-8335) Add streaming support to Interactive Beam

2019-11-04 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/BEAM-8335?focusedWorklogId=338274&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-338274
 ]

ASF GitHub Bot logged work on BEAM-8335:


Author: ASF GitHub Bot
Created on: 04/Nov/19 18:21
Start Date: 04/Nov/19 18:21
Worklog Time Spent: 10m 
  Work Description: rohdesamuel commented on pull request #9720: 
[BEAM-8335] Add initial modules for interactive streaming support
URL: https://github.com/apache/beam/pull/9720#discussion_r342197661
 
 

 ##
 File path: sdks/python/apache_beam/testing/interactive_stream.py
 ##
 @@ -0,0 +1,72 @@
+#
+# Licensed to the Apache Software Foundation (ASF) under one or more
+# contributor license agreements.  See the NOTICE file distributed with
+# this work for additional information regarding copyright ownership.
+# The ASF licenses this file to You under the Apache License, Version 2.0
+# (the "License"); you may not use this file except in compliance with
+# the License.  You may obtain a copy of the License at
+#
+#http://www.apache.org/licenses/LICENSE-2.0
+#
+# Unless required by applicable law or agreed to in writing, software
+# distributed under the License is distributed on an "AS IS" BASIS,
+# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+# See the License for the specific language governing permissions and
+# limitations under the License.
+#
+
+from __future__ import absolute_import
+
+import time
+from concurrent.futures import ThreadPoolExecutor
+
+import grpc
+
+from apache_beam.portability.api import beam_interactive_api_pb2
+from apache_beam.portability.api import beam_interactive_api_pb2_grpc
+from apache_beam.portability.api.beam_interactive_api_pb2_grpc import InteractiveServiceServicer
+
+
+class InteractiveStreamController(InteractiveServiceServicer):
+  def __init__(self, streaming_cache, endpoint=None):
+    self._server = grpc.server(ThreadPoolExecutor(max_workers=10))
+
+    if endpoint:
+      self.endpoint = endpoint
+      self._server.add_insecure_port(self.endpoint)
+    else:
+      port = self._server.add_insecure_port('[::]:0')
+      self.endpoint = '[::]:{}'.format(port)
+
+    beam_interactive_api_pb2_grpc.add_InteractiveServiceServicer_to_server(
+        self, self._server)
+    self._streaming_cache = streaming_cache
+    self._playback_speed = 1 / 100.0
+
+  def start(self):
+    self._server.start()
+    self._reader = self._streaming_cache.reader()
 
 Review comment:
   Yeah, that's a good point. I changed it from an implicit session (which only 
supported a single job) to an explicit session, which can support an arbitrary 
number of jobs.
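As background on the quoted constructor: passing port 0 to `add_insecure_port` asks the OS to pick a free port, and the call returns the port it chose. The same pattern can be sketched with a plain socket (illustrative only, not the gRPC API):

```python
import socket

# Bind to port 0 so the OS assigns a free port -- the same idea as
# add_insecure_port('[::]:0') returning the chosen port in the quoted diff.
sock = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
sock.bind(('127.0.0.1', 0))
port = sock.getsockname()[1]
endpoint = '127.0.0.1:{}'.format(port)
sock.close()
print(endpoint)
```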
 



Issue Time Tracking
---

Worklog Id: (was: 338274)
Time Spent: 16.5h  (was: 16h 20m)

> Add streaming support to Interactive Beam
> -
>
> Key: BEAM-8335
> URL: https://issues.apache.org/jira/browse/BEAM-8335
> Project: Beam
>  Issue Type: Improvement
>  Components: runner-py-interactive
>Reporter: Sam Rohde
>Assignee: Sam Rohde
>Priority: Major
>  Time Spent: 16.5h
>  Remaining Estimate: 0h
>
> This issue tracks the work items to introduce streaming support to the 
> Interactive Beam experience. This will allow users to:
>  * Write and run a streaming job in IPython
>  * Automatically cache records from unbounded sources
>  * Add a replay experience that replays all cached records to simulate the 
> original pipeline execution
>  * Add controls to play/pause/stop/step individual elements from the cached 
> records
>  * Add ability to inspect/visualize unbounded PCollections





[jira] [Work logged] (BEAM-8500) PR Trigger Phrase of beam_PreCommit_Website_Stage_GCS listed in README is incorrect

2019-11-04 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/BEAM-8500?focusedWorklogId=338273&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-338273
 ]

ASF GitHub Bot logged work on BEAM-8500:


Author: ASF GitHub Bot
Created on: 04/Nov/19 18:19
Start Date: 04/Nov/19 18:19
Worklog Time Spent: 10m 
  Work Description: markflyhigh commented on pull request #9983: 
[BEAM-8500] update trigger phrase for beam_PreCommit_Website_Stage_GCS in README
URL: https://github.com/apache/beam/pull/9983
 
 
   
 



Issue Time Tracking
---

Worklog Id: (was: 338273)
Time Spent: 0.5h  (was: 20m)

> PR Trigger Phrase of beam_PreCommit_Website_Stage_GCS listed in README is 
> incorrect
> ---
>
> Key: BEAM-8500
> URL: https://issues.apache.org/jira/browse/BEAM-8500
> Project: Beam
>  Issue Type: Bug
>  Components: testing
>Reporter: yoshiki obata
>Assignee: yoshiki obata
>Priority: Trivial
>  Labels: easyfix, starter
>  Time Spent: 0.5h
>  Remaining Estimate: 0h
>
> PR Trigger Phrase of beam_PreCommit_Website_Stage_GCS listed in 
> ./test-infra/jenkins/README.md is *Run Website PreCommit* [1]
>  But correct phrase is *Run Website_Stage_GCS PreCommit*
> [1] https://github.com/apache/beam/blob/master/.test-infra/jenkins/README.md





[jira] [Work logged] (BEAM-8335) Add streaming support to Interactive Beam

2019-11-04 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/BEAM-8335?focusedWorklogId=338270&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-338270
 ]

ASF GitHub Bot logged work on BEAM-8335:


Author: ASF GitHub Bot
Created on: 04/Nov/19 18:19
Start Date: 04/Nov/19 18:19
Worklog Time Spent: 10m 
  Work Description: rohdesamuel commented on pull request #9720: 
[BEAM-8335] Add initial modules for interactive streaming support
URL: https://github.com/apache/beam/pull/9720#discussion_r342196995
 
 

 ##
 File path: sdks/python/apache_beam/runners/interactive/caching/streaming_cache_test.py
 ##
 @@ -0,0 +1,167 @@
+#
+# Licensed to the Apache Software Foundation (ASF) under one or more
+# contributor license agreements.  See the NOTICE file distributed with
+# this work for additional information regarding copyright ownership.
+# The ASF licenses this file to You under the Apache License, Version 2.0
+# (the "License"); you may not use this file except in compliance with
+# the License.  You may obtain a copy of the License at
+#
+#http://www.apache.org/licenses/LICENSE-2.0
+#
+# Unless required by applicable law or agreed to in writing, software
+# distributed under the License is distributed on an "AS IS" BASIS,
+# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+# See the License for the specific language governing permissions and
+# limitations under the License.
+#
+
+from __future__ import absolute_import
+
+import unittest
+
+from google.protobuf import timestamp_pb2
+
+from apache_beam import coders
+from apache_beam.portability.api.beam_interactive_api_pb2 import InteractiveStreamHeader
+from apache_beam.portability.api.beam_interactive_api_pb2 import InteractiveStreamRecord
+from apache_beam.portability.api.beam_runner_api_pb2 import TestStreamPayload
+from apache_beam.runners.interactive.caching.streaming_cache import StreamingCache
+from apache_beam.utils import timestamp
+
+
+def to_timestamp_proto(timestamp_secs):
+  """Converts seconds since epoch to a google.protobuf.Timestamp.
+  """
+  seconds = int(timestamp_secs)
+  nanos = int((timestamp_secs - seconds) * 10**9)
+  return timestamp_pb2.Timestamp(seconds=seconds, nanos=nanos)
+
+
+class InMemoryReader(object):
+  def __init__(self, tag=None):
+    self._records = [InteractiveStreamHeader(tag=tag).SerializeToString()]
 
 Review comment:
   The code for the underlying PCollection Cache isn't merged yet, and I 
haven't decided if I want to create a new subclass or use it directly. This 
code assumes that the PCollection cache will just write strings to a file. I 
can change it later once the PCollection Cache code is merged.
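The `to_timestamp_proto` helper quoted in this diff splits a float of seconds-since-epoch into the whole-seconds and nanoseconds components that `google.protobuf.Timestamp` expects. A dependency-free sketch of that split (returning a plain tuple instead of the proto message):

```python
# Sketch of the seconds/nanos split performed by the quoted
# to_timestamp_proto helper; returns a (seconds, nanos) tuple instead of
# a google.protobuf.Timestamp message.
def split_timestamp(timestamp_secs):
    seconds = int(timestamp_secs)
    nanos = int((timestamp_secs - seconds) * 10**9)
    return seconds, nanos

print(split_timestamp(1.5))  # 1.5s -> 1 second and 500000000 nanos
```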
 



Issue Time Tracking
---

Worklog Id: (was: 338270)
Time Spent: 16h  (was: 15h 50m)

> Add streaming support to Interactive Beam
> -
>
> Key: BEAM-8335
> URL: https://issues.apache.org/jira/browse/BEAM-8335
> Project: Beam
>  Issue Type: Improvement
>  Components: runner-py-interactive
>Reporter: Sam Rohde
>Assignee: Sam Rohde
>Priority: Major
>  Time Spent: 16h
>  Remaining Estimate: 0h
>
> This issue tracks the work items to introduce streaming support to the 
> Interactive Beam experience. This will allow users to:
>  * Write and run a streaming job in IPython
>  * Automatically cache records from unbounded sources
>  * Add a replay experience that replays all cached records to simulate the 
> original pipeline execution
>  * Add controls to play/pause/stop/step individual elements from the cached 
> records
>  * Add ability to inspect/visualize unbounded PCollections





[jira] [Work logged] (BEAM-8335) Add streaming support to Interactive Beam

2019-11-04 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/BEAM-8335?focusedWorklogId=338271&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-338271
 ]

ASF GitHub Bot logged work on BEAM-8335:


Author: ASF GitHub Bot
Created on: 04/Nov/19 18:19
Start Date: 04/Nov/19 18:19
Worklog Time Spent: 10m 
  Work Description: rohdesamuel commented on pull request #9720: 
[BEAM-8335] Add initial modules for interactive streaming support
URL: https://github.com/apache/beam/pull/9720#discussion_r342196995
 
 

 ##
 File path: sdks/python/apache_beam/runners/interactive/caching/streaming_cache_test.py
 ##
 @@ -0,0 +1,167 @@
+#
+# Licensed to the Apache Software Foundation (ASF) under one or more
+# contributor license agreements.  See the NOTICE file distributed with
+# this work for additional information regarding copyright ownership.
+# The ASF licenses this file to You under the Apache License, Version 2.0
+# (the "License"); you may not use this file except in compliance with
+# the License.  You may obtain a copy of the License at
+#
+#http://www.apache.org/licenses/LICENSE-2.0
+#
+# Unless required by applicable law or agreed to in writing, software
+# distributed under the License is distributed on an "AS IS" BASIS,
+# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+# See the License for the specific language governing permissions and
+# limitations under the License.
+#
+
+from __future__ import absolute_import
+
+import unittest
+
+from google.protobuf import timestamp_pb2
+
+from apache_beam import coders
+from apache_beam.portability.api.beam_interactive_api_pb2 import InteractiveStreamHeader
+from apache_beam.portability.api.beam_interactive_api_pb2 import InteractiveStreamRecord
+from apache_beam.portability.api.beam_runner_api_pb2 import TestStreamPayload
+from apache_beam.runners.interactive.caching.streaming_cache import StreamingCache
+from apache_beam.utils import timestamp
+
+
+def to_timestamp_proto(timestamp_secs):
+  """Converts seconds since epoch to a google.protobuf.Timestamp.
+  """
+  seconds = int(timestamp_secs)
+  nanos = int((timestamp_secs - seconds) * 10**9)
+  return timestamp_pb2.Timestamp(seconds=seconds, nanos=nanos)
+
+
+class InMemoryReader(object):
+  def __init__(self, tag=None):
+    self._records = [InteractiveStreamHeader(tag=tag).SerializeToString()]
 
 Review comment:
   I agree it would be cleaner, but the code for the underlying PCollection 
Cache isn't merged yet, and I haven't decided if I want to create a new 
subclass or use it directly. This code assumes that the PCollection cache will 
just write strings to a file. I can change it later once the PCollection Cache 
code is merged.
 



Issue Time Tracking
---

Worklog Id: (was: 338271)
Time Spent: 16h 10m  (was: 16h)

> Add streaming support to Interactive Beam
> -
>
> Key: BEAM-8335
> URL: https://issues.apache.org/jira/browse/BEAM-8335
> Project: Beam
>  Issue Type: Improvement
>  Components: runner-py-interactive
>Reporter: Sam Rohde
>Assignee: Sam Rohde
>Priority: Major
>  Time Spent: 16h 10m
>  Remaining Estimate: 0h
>
> This issue tracks the work items to introduce streaming support to the 
> Interactive Beam experience. This will allow users to:
>  * Write and run a streaming job in IPython
>  * Automatically cache records from unbounded sources
>  * Add a replay experience that replays all cached records to simulate the 
> original pipeline execution
>  * Add controls to play/pause/stop/step individual elements from the cached 
> records
>  * Add ability to inspect/visualize unbounded PCollections





[jira] [Work logged] (BEAM-8335) Add streaming support to Interactive Beam

2019-11-04 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/BEAM-8335?focusedWorklogId=338272&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-338272
 ]

ASF GitHub Bot logged work on BEAM-8335:


Author: ASF GitHub Bot
Created on: 04/Nov/19 18:19
Start Date: 04/Nov/19 18:19
Worklog Time Spent: 10m 
  Work Description: rohdesamuel commented on pull request #9720: 
[BEAM-8335] Add initial modules for interactive streaming support
URL: https://github.com/apache/beam/pull/9720#discussion_r342197137
 
 

 ##
 File path: sdks/python/apache_beam/testing/interactive_stream.py
 ##
 @@ -0,0 +1,72 @@
+#
+# Licensed to the Apache Software Foundation (ASF) under one or more
+# contributor license agreements.  See the NOTICE file distributed with
+# this work for additional information regarding copyright ownership.
+# The ASF licenses this file to You under the Apache License, Version 2.0
+# (the "License"); you may not use this file except in compliance with
+# the License.  You may obtain a copy of the License at
+#
+#http://www.apache.org/licenses/LICENSE-2.0
+#
+# Unless required by applicable law or agreed to in writing, software
+# distributed under the License is distributed on an "AS IS" BASIS,
+# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+# See the License for the specific language governing permissions and
+# limitations under the License.
+#
+
+from __future__ import absolute_import
+
+import time
+from concurrent.futures import ThreadPoolExecutor
+
+import grpc
+
+from apache_beam.portability.api import beam_interactive_api_pb2
+from apache_beam.portability.api import beam_interactive_api_pb2_grpc
+from apache_beam.portability.api.beam_interactive_api_pb2_grpc import InteractiveServiceServicer
+
+
+class InteractiveStreamController(InteractiveServiceServicer):
+  def __init__(self, streaming_cache, endpoint=None):
+    self._server = grpc.server(ThreadPoolExecutor(max_workers=10))
+
+    if endpoint:
+      self.endpoint = endpoint
+      self._server.add_insecure_port(self.endpoint)
+    else:
+      port = self._server.add_insecure_port('[::]:0')
+      self.endpoint = '[::]:{}'.format(port)
+
+    beam_interactive_api_pb2_grpc.add_InteractiveServiceServicer_to_server(
+        self, self._server)
+    self._streaming_cache = streaming_cache
+    self._playback_speed = 1 / 100.0
 
 Review comment:
   Removed
 



Issue Time Tracking
---

Worklog Id: (was: 338272)
Time Spent: 16h 20m  (was: 16h 10m)

> Add streaming support to Interactive Beam
> -
>
> Key: BEAM-8335
> URL: https://issues.apache.org/jira/browse/BEAM-8335
> Project: Beam
>  Issue Type: Improvement
>  Components: runner-py-interactive
>Reporter: Sam Rohde
>Assignee: Sam Rohde
>Priority: Major
>  Time Spent: 16h 20m
>  Remaining Estimate: 0h
>
> This issue tracks the work items to introduce streaming support to the 
> Interactive Beam experience. This will allow users to:
>  * Write and run a streaming job in IPython
>  * Automatically cache records from unbounded sources
>  * Add a replay experience that replays all cached records to simulate the 
> original pipeline execution
>  * Add controls to play/pause/stop/step individual elements from the cached 
> records
>  * Add ability to inspect/visualize unbounded PCollections





[jira] [Work logged] (BEAM-8335) Add streaming support to Interactive Beam

2019-11-04 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/BEAM-8335?focusedWorklogId=338267&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-338267
 ]

ASF GitHub Bot logged work on BEAM-8335:


Author: ASF GitHub Bot
Created on: 04/Nov/19 18:17
Start Date: 04/Nov/19 18:17
Worklog Time Spent: 10m 
  Work Description: rohdesamuel commented on pull request #9720: 
[BEAM-8335] Add initial modules for interactive streaming support
URL: https://github.com/apache/beam/pull/9720#discussion_r342195990
 
 

 ##
 File path: sdks/python/apache_beam/runners/interactive/caching/streaming_cache.py
 ##
 @@ -0,0 +1,124 @@
+#
+# Licensed to the Apache Software Foundation (ASF) under one or more
+# contributor license agreements.  See the NOTICE file distributed with
+# this work for additional information regarding copyright ownership.
+# The ASF licenses this file to You under the Apache License, Version 2.0
+# (the "License"); you may not use this file except in compliance with
+# the License.  You may obtain a copy of the License at
+#
+#http://www.apache.org/licenses/LICENSE-2.0
+#
+# Unless required by applicable law or agreed to in writing, software
+# distributed under the License is distributed on an "AS IS" BASIS,
+# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+# See the License for the specific language governing permissions and
+# limitations under the License.
+#
+
+from __future__ import absolute_import
+
+from apache_beam.portability.api.beam_interactive_api_pb2 import InteractiveStreamHeader
+from apache_beam.portability.api.beam_interactive_api_pb2 import InteractiveStreamRecord
+from apache_beam.portability.api.beam_runner_api_pb2 import TestStreamPayload
+from apache_beam.utils import timestamp
+from apache_beam.utils.timestamp import Timestamp
+
+
+class StreamingCache(object):
+  """Abstraction that holds the logic for reading and writing to cache.
+  """
+  def __init__(self, readers):
+    self._readers = readers
+
+  class Reader(object):
+    """Abstraction that reads from PCollection readers.
+
+    This class is an Abstraction layer over multiple PCollection readers to be
+    used for supplying the Interactive Service with TestStream events.
+
+    This class is also responsible for holding the state of the clock, injecting
+    clock advancement events, and watermark advancement events.
+    """
+    def __init__(self, readers):
+      self._timestamp = Timestamp.of(0)
+      self._readers = {}
+      self._headers = {}
+      readers = [r.read() for r in readers]
+
+      # The header allows for metadata about an entire stream, so that the data
+      # isn't copied per record.
+      for r in readers:
+        header = InteractiveStreamHeader()
+        header.ParseFromString(next(r))
+
+        # Main PCollections in Beam have a tag as None. Deserializing a Proto
+        # with an empty tag becomes an empty string. Here we normalize to what
+        # Beam expects.
+        self._headers[header.tag if header.tag else None] = header
+        self._readers[header.tag if header.tag else None] = r
+
+      self._watermarks = {tag: timestamp.MIN_TIMESTAMP for tag in self._headers}
+
+    def read(self):
+      """Reads records from PCollection readers.
+      """
+      records = {}
+      for tag, r in self._readers.items():
 
 Review comment:
   Good catch, I changed it to be able to read from multiple underlying caches. 
PTAL.
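A minimal sketch of what "reading from multiple underlying caches" means here (illustrative only, not the Beam API): the quoted `read()` collects records per tag from each reader, with `None` as the tag of the main PCollection.

```python
# Illustrative only: gather records from several tagged readers into one
# dict, mirroring the `for tag, r in self._readers.items()` loop in the
# quoted diff. None is the tag Beam uses for the main PCollection.
def read_all(readers):
    records = {}
    for tag, reader in readers.items():
        records[tag] = list(reader)
    return records

print(read_all({None: iter(['a', 'b']), 'side': iter(['c'])}))
```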
 



Issue Time Tracking
---

Worklog Id: (was: 338267)
Time Spent: 15h 40m  (was: 15.5h)

> Add streaming support to Interactive Beam
> -
>
> Key: BEAM-8335
> URL: https://issues.apache.org/jira/browse/BEAM-8335
> Project: Beam
>  Issue Type: Improvement
>  Components: runner-py-interactive
>Reporter: Sam Rohde
>Assignee: Sam Rohde
>Priority: Major
>  Time Spent: 15h 40m
>  Remaining Estimate: 0h
>
> This issue tracks the work items to introduce streaming support to the 
> Interactive Beam experience. This will allow users to:
>  * Write and run a streaming job in IPython
>  * Automatically cache records from unbounded sources
>  * Add a replay experience that replays all cached records to simulate the 
> original pipeline execution
>  * Add controls to play/pause/stop/step individual elements from the cached 
> records
>  * Add ability to inspect/visualize unbounded PCollections




[jira] [Work logged] (BEAM-5878) Support DoFns with Keyword-only arguments in Python 3.

2019-11-04 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/BEAM-5878?focusedWorklogId=338269&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-338269
 ]

ASF GitHub Bot logged work on BEAM-5878:


Author: ASF GitHub Bot
Created on: 04/Nov/19 18:18
Start Date: 04/Nov/19 18:18
Worklog Time Spent: 10m 
  Work Description: tvalentyn commented on pull request #9686: 
[WIP][BEAM-5878] update dill min version to 0.3.1.1 and add test for functions 
with Keyword-only arguments
URL: https://github.com/apache/beam/pull/9686#discussion_r342196700
 
 

 ##
 File path: sdks/python/setup.py
 ##
 @@ -106,8 +106,7 @@ def get_version():
     'avro>=1.8.1,<2.0.0; python_version < "3.0"',
     'avro-python3>=1.8.1,<2.0.0; python_version >= "3.0"',
     'crcmod>=1.7,<2.0',
-    # Dill doesn't guarantee comatibility between releases within minor version.
-    'dill>=0.3.0,<0.3.1',
+    'dill>=0.3.1.1,<0.4.0',
 
 Review comment:
   Let's restrict this to < 0.3.2 due to possible backwards-incompatibility 
issues. 
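If the reviewer's suggestion is applied, the `setup.py` entry would look like this (a sketch of the suggested pin, not the merged change):

```python
# Sketch of the reviewer's suggested constraint: keep dill within the
# 0.3.1.x series to avoid backwards-incompatibility between dill releases.
install_requires = [
    'dill>=0.3.1.1,<0.3.2',
]
print(install_requires)
```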
 



Issue Time Tracking
---

Worklog Id: (was: 338269)
Time Spent: 15h 40m  (was: 15.5h)

> Support DoFns with Keyword-only arguments in Python 3.
> --
>
> Key: BEAM-5878
> URL: https://issues.apache.org/jira/browse/BEAM-5878
> Project: Beam
>  Issue Type: Sub-task
>  Components: sdk-py-core
>Reporter: Valentyn Tymofieiev
>Assignee: yoshiki obata
>Priority: Minor
>  Time Spent: 15h 40m
>  Remaining Estimate: 0h
>
> Python 3.0 [adds a possibility|https://www.python.org/dev/peps/pep-3102/] to 
> define functions with keyword-only arguments. 
> Currently Beam does not handle them correctly. [~ruoyu] pointed out [one 
> place|https://github.com/apache/beam/blob/a56ce43109c97c739fa08adca45528c41e3c925c/sdks/python/apache_beam/typehints/decorators.py#L118]
>  in our codebase that we should fix: in Python 3.0, inspect.getargspec() 
> will fail on functions with keyword-only arguments, but a new method 
> [inspect.getfullargspec()|https://docs.python.org/3/library/inspect.html#inspect.getfullargspec]
>  supports them.
> There may be implications for our (best-effort) type-hints machinery.
> We should also add a Py3-only unit tests that covers DoFn's with keyword-only 
> arguments once Beam Python 3 tests are in a good shape.
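The behavior described above is easy to demonstrate with a short sketch (only `inspect.getfullargspec` is shown, since `inspect.getargspec` was deprecated and later removed from the standard library):

```python
import inspect

# A DoFn-style function with a keyword-only argument (PEP 3102): anything
# after the bare * must be passed by keyword.
def process(element, *, timestamp=None):
    return element, timestamp

spec = inspect.getfullargspec(process)
print(spec.args)        # positional arguments
print(spec.kwonlyargs)  # keyword-only arguments
```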





[jira] [Work logged] (BEAM-8335) Add streaming support to Interactive Beam

2019-11-04 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/BEAM-8335?focusedWorklogId=338268&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-338268
 ]

ASF GitHub Bot logged work on BEAM-8335:


Author: ASF GitHub Bot
Created on: 04/Nov/19 18:17
Start Date: 04/Nov/19 18:17
Worklog Time Spent: 10m 
  Work Description: rohdesamuel commented on pull request #9720: 
[BEAM-8335] Add initial modules for interactive streaming support
URL: https://github.com/apache/beam/pull/9720#discussion_r342196163
 
 

 ##
 File path: sdks/python/apache_beam/runners/interactive/caching/streaming_cache.py
 ##
 @@ -0,0 +1,124 @@
+#
+# Licensed to the Apache Software Foundation (ASF) under one or more
+# contributor license agreements.  See the NOTICE file distributed with
+# this work for additional information regarding copyright ownership.
+# The ASF licenses this file to You under the Apache License, Version 2.0
+# (the "License"); you may not use this file except in compliance with
+# the License.  You may obtain a copy of the License at
+#
+#http://www.apache.org/licenses/LICENSE-2.0
+#
+# Unless required by applicable law or agreed to in writing, software
+# distributed under the License is distributed on an "AS IS" BASIS,
+# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+# See the License for the specific language governing permissions and
+# limitations under the License.
+#
+
+from __future__ import absolute_import
+
+from apache_beam.portability.api.beam_interactive_api_pb2 import InteractiveStreamHeader
+from apache_beam.portability.api.beam_interactive_api_pb2 import InteractiveStreamRecord
+from apache_beam.portability.api.beam_runner_api_pb2 import TestStreamPayload
+from apache_beam.utils import timestamp
+from apache_beam.utils.timestamp import Timestamp
+
+
+class StreamingCache(object):
+  """Abstraction that holds the logic for reading and writing to cache.
+  """
+  def __init__(self, readers):
+    self._readers = readers
+
+  class Reader(object):
+    """Abstraction that reads from PCollection readers.
+
+    This class is an Abstraction layer over multiple PCollection readers to be
+    used for supplying the Interactive Service with TestStream events.
+
+    This class is also responsible for holding the state of the clock,
+    injecting clock advancement events, and watermark advancement events.
+    """
+    def __init__(self, readers):
+      self._timestamp = Timestamp.of(0)
+      self._readers = {}
+      self._headers = {}
+      readers = [r.read() for r in readers]
+
+      # The header allows for metadata about an entire stream, so that the data
+      # isn't copied per record.
+      for r in readers:
+        header = InteractiveStreamHeader()
+        header.ParseFromString(next(r))
+
+        # Main PCollections in Beam have a tag as None. Deserializing a Proto
+        # with an empty tag becomes an empty string. Here we normalize to what
+        # Beam expects.
+        self._headers[header.tag if header.tag else None] = header
+        self._readers[header.tag if header.tag else None] = r
+
+      self._watermarks = {tag: timestamp.MIN_TIMESTAMP for tag in self._headers}
+
+    def read(self):
+      """Reads records from PCollection readers.
+      """
+      records = {}
+      for tag, r in self._readers.items():
+        try:
+          record = InteractiveStreamRecord()
+          record.ParseFromString(next(r))
+          records[tag] = record
+        except StopIteration:
+          pass
+
+      events = []
+      if not records:
+        self.advance_watermark(timestamp.MAX_TIMESTAMP, events)
+
+      records = sorted(records.items(), key=lambda x: x[1].processing_time)
+      for tag, r in records:
+        # We always send the processing time event first so that the TestStream
+        # can sleep so as to emulate the original stream.
+        self.advance_processing_time(
+            Timestamp.from_proto(r.processing_time), events)
+        self.advance_watermark(Timestamp.from_proto(r.watermark), events,
+                               tag=tag)
+
+        events.append(TestStreamPayload.Event(
+            element_event=TestStreamPayload.Event.AddElements(
+                elements=[r.element], tag=tag)))
+      return events
+
+    def advance_processing_time(self, processing_time, events):
+      """Advances the internal clock state and injects an
+      AdvanceProcessingTime event.
+      """
+      if self._timestamp != processing_time:
+        duration = timestamp.Duration(
+            micros=processing_time.micros - self._timestamp.micros)
+
+        self._timestamp = processing_time
+        processing_time_event = TestStreamPayload.Event.AdvanceProcessingTime(
+            advance_duration=duration.micros)
+        events.append(TestStreamPayload.Event(
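The ordering step in `read()` above amounts to merging several time-sorted per-tag streams into one processing-time-ordered sequence. A minimal standalone sketch of that idea, with made-up tags and integer timestamps (not the actual Beam types):

```python
import heapq

# Hypothetical per-tag record streams, each already sorted by processing time.
streams = {
    'main': [(1, 'a'), (4, 'd')],
    'side': [(2, 'b'), (3, 'c')],
}

# Tag every record with its stream name, then merge all streams into a single
# processing-time-ordered sequence, analogous to what Reader.read() produces
# round by round. heapq.merge requires each input to be pre-sorted.
merged = heapq.merge(
    *[[(ts, tag, val) for ts, val in recs] for tag, recs in streams.items()])

print([val for _, _, val in merged])  # ['a', 'b', 'c', 'd']
```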
+

[jira] [Work logged] (BEAM-8526) Turn on metrics publishing in load tests on Flink

2019-11-04 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/BEAM-8526?focusedWorklogId=338265&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-338265
 ]

ASF GitHub Bot logged work on BEAM-8526:


Author: ASF GitHub Bot
Created on: 04/Nov/19 18:16
Start Date: 04/Nov/19 18:16
Worklog Time Spent: 10m 
  Work Description: lgajowy commented on pull request #9948: [BEAM-8526] 
Turn on metrics publishing in load tests on Flink
URL: https://github.com/apache/beam/pull/9948
 
 
   
 

This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 338265)
Time Spent: 2h 20m  (was: 2h 10m)

> Turn on metrics publishing in load tests on Flink
> -
>
> Key: BEAM-8526
> URL: https://issues.apache.org/jira/browse/BEAM-8526
> Project: Beam
>  Issue Type: Sub-task
>  Components: testing
>Reporter: Kamil Wasilewski
>Assignee: Kamil Wasilewski
>Priority: Major
> Fix For: Not applicable
>
>  Time Spent: 2h 20m
>  Remaining Estimate: 0h
>
> After [https://github.com/apache/beam/pull/9843] being merged, we are now 
> able to gather metrics on portable runners as well.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Work logged] (BEAM-8526) Turn on metrics publishing in load tests on Flink

2019-11-04 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/BEAM-8526?focusedWorklogId=338264&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-338264
 ]

ASF GitHub Bot logged work on BEAM-8526:


Author: ASF GitHub Bot
Created on: 04/Nov/19 18:16
Start Date: 04/Nov/19 18:16
Worklog Time Spent: 10m 
  Work Description: lgajowy commented on issue #9948: [BEAM-8526] Turn on 
metrics publishing in load tests on Flink
URL: https://github.com/apache/beam/pull/9948#issuecomment-549480877
 
 
   This is great news!  Thank you @kamilwu for finishing this. 
   
   From what I understand, the last step to have metrics from the Python tests 
on Flink visualized is to create the dashboards in PerfkitExplorer, right? 
Could you do that? Let me know if you need help.
 

This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 338264)
Time Spent: 2h 10m  (was: 2h)

> Turn on metrics publishing in load tests on Flink
> -
>
> Key: BEAM-8526
> URL: https://issues.apache.org/jira/browse/BEAM-8526
> Project: Beam
>  Issue Type: Sub-task
>  Components: testing
>Reporter: Kamil Wasilewski
>Assignee: Kamil Wasilewski
>Priority: Major
> Fix For: Not applicable
>
>  Time Spent: 2h 10m
>  Remaining Estimate: 0h
>
> After [https://github.com/apache/beam/pull/9843] being merged, we are now 
> able to gather metrics on portable runners as well.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Work logged] (BEAM-8335) Add streaming support to Interactive Beam

2019-11-04 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/BEAM-8335?focusedWorklogId=338266&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-338266
 ]

ASF GitHub Bot logged work on BEAM-8335:


Author: ASF GitHub Bot
Created on: 04/Nov/19 18:16
Start Date: 04/Nov/19 18:16
Worklog Time Spent: 10m 
  Work Description: rohdesamuel commented on pull request #9720: 
[BEAM-8335] Add initial modules for interactive streaming support
URL: https://github.com/apache/beam/pull/9720#discussion_r342195722
 
 

 ##
 File path: 
sdks/python/apache_beam/runners/interactive/caching/streaming_cache.py
 ##
 @@ -0,0 +1,124 @@
+#
+# Licensed to the Apache Software Foundation (ASF) under one or more
+# contributor license agreements.  See the NOTICE file distributed with
+# this work for additional information regarding copyright ownership.
+# The ASF licenses this file to You under the Apache License, Version 2.0
+# (the "License"); you may not use this file except in compliance with
+# the License.  You may obtain a copy of the License at
+#
+#http://www.apache.org/licenses/LICENSE-2.0
+#
+# Unless required by applicable law or agreed to in writing, software
+# distributed under the License is distributed on an "AS IS" BASIS,
+# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+# See the License for the specific language governing permissions and
+# limitations under the License.
+#
+
+from __future__ import absolute_import
+
+from apache_beam.portability.api.beam_interactive_api_pb2 import InteractiveStreamHeader
+from apache_beam.portability.api.beam_interactive_api_pb2 import InteractiveStreamRecord
+from apache_beam.portability.api.beam_runner_api_pb2 import TestStreamPayload
+from apache_beam.utils import timestamp
+from apache_beam.utils.timestamp import Timestamp
+
+
+class StreamingCache(object):
+  """Abstraction that holds the logic for reading and writing to cache.
+  """
+  def __init__(self, readers):
+    self._readers = readers
+
+  class Reader(object):
+    """Abstraction that reads from PCollection readers.
+
+    This class is an Abstraction layer over multiple PCollection readers to be
+    used for supplying the Interactive Service with TestStream events.
+
+    This class is also responsible for holding the state of the clock,
+    injecting clock advancement events, and watermark advancement events.
+    """
+    def __init__(self, readers):
+      self._timestamp = Timestamp.of(0)
+      self._readers = {}
+      self._headers = {}
+      readers = [r.read() for r in readers]
+
+      # The header allows for metadata about an entire stream, so that the data
+      # isn't copied per record.
+      for r in readers:
+        header = InteractiveStreamHeader()
+        header.ParseFromString(next(r))
+
+        # Main PCollections in Beam have a tag as None. Deserializing a Proto
+        # with an empty tag becomes an empty string. Here we normalize to what
+        # Beam expects.
+        self._headers[header.tag if header.tag else None] = header
+        self._readers[header.tag if header.tag else None] = r
+
+      self._watermarks = {tag: timestamp.MIN_TIMESTAMP for tag in self._headers}
+
+    def read(self):
+      """Reads records from PCollection readers.
+      """
+      records = {}
+      for tag, r in self._readers.items():
+        try:
+          record = InteractiveStreamRecord()
+          record.ParseFromString(next(r))
+          records[tag] = record
+        except StopIteration:
+          pass
+
+      events = []
+      if not records:
+        self.advance_watermark(timestamp.MAX_TIMESTAMP, events)
+
+      records = sorted(records.items(), key=lambda x: x[1].processing_time)
+      for tag, r in records:
+        # We always send the processing time event first so that the TestStream
+        # can sleep so as to emulate the original stream.
+        self.advance_processing_time(
 
 Review comment:
   ack, changed to return the event.
 

This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 338266)
Time Spent: 15.5h  (was: 15h 20m)

> Add streaming support to Interactive Beam
> -
>
> Key: BEAM-8335
> URL: https://issues.apache.org/jira/browse/BEAM-8335
> Project: Beam
>  Issue Type: Improvement
>  Components: runner-py-interactive
>Reporter: Sam Rohde
>Assignee: Sam Rohde
>Priority: Major
>  Time Spent: 15.5h
>  Remaining Estimate: 0h
>
> This issue tracks 

[jira] [Work logged] (BEAM-8335) Add streaming support to Interactive Beam

2019-11-04 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/BEAM-8335?focusedWorklogId=338263&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-338263
 ]

ASF GitHub Bot logged work on BEAM-8335:


Author: ASF GitHub Bot
Created on: 04/Nov/19 18:15
Start Date: 04/Nov/19 18:15
Worklog Time Spent: 10m 
  Work Description: rohdesamuel commented on pull request #9720: 
[BEAM-8335] Add initial modules for interactive streaming support
URL: https://github.com/apache/beam/pull/9720#discussion_r342195237
 
 

 ##
 File path: model/interactive/src/main/proto/beam_interactive_api.proto
 ##
 @@ -0,0 +1,69 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ * http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+/*
+ * Protocol Buffers describing a service that can be used in conjunction with
+ * the TestStream class in order to control a pipeline remotely.
+ */
+
+syntax = "proto3";
+
+package org.apache.beam.model.interactive.v1;
+
+option go_package = "interactive_v1";
+option java_package = "org.apache.beam.model.interactive.v1";
+option java_outer_classname = "BeamInteractiveApi";
+
+import "beam_runner_api.proto";
+import "google/protobuf/timestamp.proto";
+
+
+service InteractiveService {
 
 Review comment:
   I would rather not, this is pretty specific to the InteractiveRunner.
 

This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 338263)
Time Spent: 15h 20m  (was: 15h 10m)

> Add streaming support to Interactive Beam
> -
>
> Key: BEAM-8335
> URL: https://issues.apache.org/jira/browse/BEAM-8335
> Project: Beam
>  Issue Type: Improvement
>  Components: runner-py-interactive
>Reporter: Sam Rohde
>Assignee: Sam Rohde
>Priority: Major
>  Time Spent: 15h 20m
>  Remaining Estimate: 0h
>
> This issue tracks the work items to introduce streaming support to the 
> Interactive Beam experience. This will allow users to:
>  * Write and run a streaming job in IPython
>  * Automatically cache records from unbounded sources
>  * Add a replay experience that replays all cached records to simulate the 
> original pipeline execution
>  * Add controls to play/pause/stop/step individual elements from the cached 
> records
>  * Add ability to inspect/visualize unbounded PCollections



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Work logged] (BEAM-8335) Add streaming support to Interactive Beam

2019-11-04 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/BEAM-8335?focusedWorklogId=338262&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-338262
 ]

ASF GitHub Bot logged work on BEAM-8335:


Author: ASF GitHub Bot
Created on: 04/Nov/19 18:14
Start Date: 04/Nov/19 18:14
Worklog Time Spent: 10m 
  Work Description: rohdesamuel commented on pull request #9720: 
[BEAM-8335] Add initial modules for interactive streaming support
URL: https://github.com/apache/beam/pull/9720#discussion_r342194547
 
 

 ##
 File path: model/interactive/src/main/proto/beam_interactive_api.proto
 ##
 @@ -0,0 +1,69 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ * http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+/*
+ * Protocol Buffers describing a service that can be used in conjunction with
+ * the TestStream class in order to control a pipeline remotely.
+ */
+
+syntax = "proto3";
+
+package org.apache.beam.model.interactive.v1;
+
+option go_package = "interactive_v1";
+option java_package = "org.apache.beam.model.interactive.v1";
+option java_outer_classname = "BeamInteractiveApi";
+
+import "beam_runner_api.proto";
+import "google/protobuf/timestamp.proto";
+
+
+service InteractiveService {
+  // A TestStream will request for events using this RPC.
+  rpc Events(EventsRequest) returns (stream EventsResponse) {}
 
 Review comment:
   In order to allow for quiescence, I changed this from a stream to a single 
response per request. 
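With one `EventsResponse` per `EventsRequest`, a client can simply poll until `end_of_stream`. This is only a sketch of that loop; the stub and message classes below are minimal hypothetical stand-ins, not the actual generated gRPC client:

```python
class FakeResponse(object):
  """Stand-in for EventsResponse: a batch of events plus an end marker."""
  def __init__(self, events, end_of_stream):
    self.events = events
    self.end_of_stream = end_of_stream


class FakeStub(object):
  """Stand-in for the generated InteractiveService client stub."""
  def __init__(self, batches):
    self._batches = iter(batches)

  def Events(self, request):
    batch = next(self._batches)
    # An empty batch signals that the service has no more events.
    return FakeResponse(batch, end_of_stream=(batch == []))


def fetch_all_events(stub):
  """Polls the service, one request per response, until end_of_stream."""
  events = []
  while True:
    response = stub.Events(None)  # a real client would pass EventsRequest()
    events.extend(response.events)
    if response.end_of_stream:
      return events


stub = FakeStub([['e1', 'e2'], ['e3'], []])
print(fetch_all_events(stub))  # ['e1', 'e2', 'e3']
```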
 

This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 338262)
Time Spent: 15h 10m  (was: 15h)

> Add streaming support to Interactive Beam
> -
>
> Key: BEAM-8335
> URL: https://issues.apache.org/jira/browse/BEAM-8335
> Project: Beam
>  Issue Type: Improvement
>  Components: runner-py-interactive
>Reporter: Sam Rohde
>Assignee: Sam Rohde
>Priority: Major
>  Time Spent: 15h 10m
>  Remaining Estimate: 0h
>
> This issue tracks the work items to introduce streaming support to the 
> Interactive Beam experience. This will allow users to:
>  * Write and run a streaming job in IPython
>  * Automatically cache records from unbounded sources
>  * Add a replay experience that replays all cached records to simulate the 
> original pipeline execution
>  * Add controls to play/pause/stop/step individual elements from the cached 
> records
>  * Add ability to inspect/visualize unbounded PCollections



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Work logged] (BEAM-7886) Make row coder a standard coder and implement in python

2019-11-04 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/BEAM-7886?focusedWorklogId=338261&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-338261
 ]

ASF GitHub Bot logged work on BEAM-7886:


Author: ASF GitHub Bot
Created on: 04/Nov/19 18:12
Start Date: 04/Nov/19 18:12
Worklog Time Spent: 10m 
  Work Description: robertwb commented on issue #9188: [BEAM-7886] Make row 
coder a standard coder and implement in Python
URL: https://github.com/apache/beam/pull/9188#issuecomment-549479136
 
 
   Yes, go ahead and squash into meaningful commits and merge. 
 

This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 338261)
Time Spent: 15h  (was: 14h 50m)

> Make row coder a standard coder and implement in python
> ---
>
> Key: BEAM-7886
> URL: https://issues.apache.org/jira/browse/BEAM-7886
> Project: Beam
>  Issue Type: Improvement
>  Components: beam-model, sdk-java-core, sdk-py-core
>Reporter: Brian Hulette
>Assignee: Brian Hulette
>Priority: Major
>  Time Spent: 15h
>  Remaining Estimate: 0h
>




--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Work logged] (BEAM-8335) Add streaming support to Interactive Beam

2019-11-04 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/BEAM-8335?focusedWorklogId=338260&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-338260
 ]

ASF GitHub Bot logged work on BEAM-8335:


Author: ASF GitHub Bot
Created on: 04/Nov/19 18:12
Start Date: 04/Nov/19 18:12
Worklog Time Spent: 10m 
  Work Description: rohdesamuel commented on pull request #9720: 
[BEAM-8335] Add initial modules for interactive streaming support
URL: https://github.com/apache/beam/pull/9720#discussion_r342193604
 
 

 ##
 File path: model/interactive/src/main/proto/beam_interactive_api.proto
 ##
 @@ -0,0 +1,69 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ * http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+/*
+ * Protocol Buffers describing a service that can be used in conjunction with
+ * the TestStream class in order to control a pipeline remotely.
+ */
+
+syntax = "proto3";
+
+package org.apache.beam.model.interactive.v1;
+
+option go_package = "interactive_v1";
+option java_package = "org.apache.beam.model.interactive.v1";
+option java_outer_classname = "BeamInteractiveApi";
+
+import "beam_runner_api.proto";
+import "google/protobuf/timestamp.proto";
+
+
+service InteractiveService {
+  // A TestStream will request for events using this RPC.
+  rpc Events(EventsRequest) returns (stream EventsResponse) {}
+}
+
+message EventsRequest { }
+message EventsResponse {
+  // The TestStreamPayloads that will be sent to the TestStream.
+  repeated org.apache.beam.model.pipeline.v1.TestStreamPayload.Event events = 
1;
+
+  // Is true when there are no more events to read.
+  bool end_of_stream = 2;
+}
+
+// The first record to be read in an interactive stream. This contains metadata
+// about the stream and how to properly process it.
+message InteractiveStreamHeader {
+  // The PCollection tag this stream is associated with.
+  string tag = 1;
+}
+
+// A record is a recorded element that some source produced. Its function is
+// to give enough information to the InteractiveService to create a faithful
+// recreation of the original source of data.
+message InteractiveStreamRecord {
 
 Review comment:
   I don't fully understand what you are saying, but I'm guessing you want a 
single proto with a repeated field for events and watermark/clock advances? In 
that case, it wouldn't work. This is going to be the file format for storing 
the events of an unbounded source. The expectation is that the file may not fit 
into memory, so it will be streamed in one-by-one. This necessitates having 
individual records instead of a single proto for the file.
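One common way to read records one-by-one from a file too large for memory is length-prefixed framing. This is only an illustrative sketch of that idea; the 4-byte big-endian prefix is an assumption for the example, not Beam's actual cache file format:

```python
import io
import struct


def write_records(f, records):
  # Frame each serialized record with a 4-byte big-endian length prefix so a
  # reader can stream records individually without loading the whole file.
  for payload in records:
    f.write(struct.pack('>I', len(payload)))
    f.write(payload)


def read_records(f):
  # Yield one record at a time; only one record is ever held in memory.
  while True:
    prefix = f.read(4)
    if len(prefix) < 4:
      return
    (size,) = struct.unpack('>I', prefix)
    yield f.read(size)


buf = io.BytesIO()
write_records(buf, [b'header', b'record-1', b'record-2'])
buf.seek(0)
print(list(read_records(buf)))  # [b'header', b'record-1', b'record-2']
```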
 

This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 338260)
Time Spent: 15h  (was: 14h 50m)

> Add streaming support to Interactive Beam
> -
>
> Key: BEAM-8335
> URL: https://issues.apache.org/jira/browse/BEAM-8335
> Project: Beam
>  Issue Type: Improvement
>  Components: runner-py-interactive
>Reporter: Sam Rohde
>Assignee: Sam Rohde
>Priority: Major
>  Time Spent: 15h
>  Remaining Estimate: 0h
>
> This issue tracks the work items to introduce streaming support to the 
> Interactive Beam experience. This will allow users to:
>  * Write and run a streaming job in IPython
>  * Automatically cache records from unbounded sources
>  * Add a replay experience that replays all cached records to simulate the 
> original pipeline execution
>  * Add controls to play/pause/stop/step individual elements from the cached 
> records
>  * Add ability to inspect/visualize unbounded PCollections



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Commented] (BEAM-8432) Parametrize source & target compatibility for beam Java modules

2019-11-04 Thread Lukasz Gajowy (Jira)


[ 
https://issues.apache.org/jira/browse/BEAM-8432?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16966883#comment-16966883
 ] 

Lukasz Gajowy commented on BEAM-8432:
-

I think you're right - I missed this one. Submitting PR...

> Parametrize source & target compatibility for beam Java modules
> ---
>
> Key: BEAM-8432
> URL: https://issues.apache.org/jira/browse/BEAM-8432
> Project: Beam
>  Issue Type: Improvement
>  Components: build-system
>Reporter: Lukasz Gajowy
>Assignee: Lukasz Gajowy
>Priority: Major
> Fix For: Not applicable
>
>  Time Spent: 1h
>  Remaining Estimate: 0h
>
> Currently, "javaVersion" property is hardcoded in BeamModulePlugin in 
> [JavaNatureConfiguration|https://github.com/apache/beam/blob/master/buildSrc/src/main/groovy/org/apache/beam/gradle/BeamModulePlugin.groovy#L82].
> For the sake of migrating the project to Java 11 we could use a mechanism 
> that will allow parametrizing the version from the command line, e.g:
> {code:java}
> // this could set source and target compatibility to 11:
> ./gradlew clean build -PjavaVersion=11{code}
>  



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Work logged] (BEAM-4420) Add KafkaIO Integration Tests

2019-11-04 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/BEAM-4420?focusedWorklogId=338250&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-338250
 ]

ASF GitHub Bot logged work on BEAM-4420:


Author: ASF GitHub Bot
Created on: 04/Nov/19 18:02
Start Date: 04/Nov/19 18:02
Worklog Time Spent: 10m 
  Work Description: lgajowy commented on issue #9871: [BEAM-4420] Kafka 
ioit jenkins job
URL: https://github.com/apache/beam/pull/9871#issuecomment-549474881
 
 
   R: @mwalenia 
   
   Could you take a look? Thanks!
 

This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 338250)
Time Spent: 7.5h  (was: 7h 20m)

> Add KafkaIO Integration Tests
> -
>
> Key: BEAM-4420
> URL: https://issues.apache.org/jira/browse/BEAM-4420
> Project: Beam
>  Issue Type: Test
>  Components: io-java-kafka, testing
>Reporter: Ismaël Mejía
>Assignee: Lukasz Gajowy
>Priority: Minor
>  Time Spent: 7.5h
>  Remaining Estimate: 0h
>
> It is a good idea to have ITs for KafkaIO.
> There are two possible issues:
> 1. The tests should probably invert the pattern to be readThenWrite given 
> that Unbounded IOs block on Read and ...
> 2. Until we have a way to do PAsserts on Unbounded sources we can rely on 
> withMaxNumRecords to ensure this test ends.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

