[jira] [Commented] (BEAM-8174) BigQueryIO clustering documentation is incorrect and lacking

2019-11-19 Thread Alex Van Boxel (Jira)


[ 
https://issues.apache.org/jira/browse/BEAM-8174?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16978167#comment-16978167
 ] 

Alex Van Boxel commented on BEAM-8174:
--

I've removed the fixed version on this

> BigQueryIO clustering documentation is incorrect and lacking
> 
>
> Key: BEAM-8174
> URL: https://issues.apache.org/jira/browse/BEAM-8174
> Project: Beam
>  Issue Type: Bug
>  Components: io-java-gcp
>Affects Versions: 2.15.0
>Reporter: Alex Van Boxel
>Assignee: Alex Van Boxel
>Priority: Trivial
>  Labels: documentation
>   Original Estimate: 2h
>  Remaining Estimate: 2h
>
> I noticed that the Java doc of the clustering feature in BigQueryIO is more a 
> copy/paste from the timestamp method. This needs to be corrected.
> The Clustering option should also be added to the BigQueryIO page.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Updated] (BEAM-8174) BigQueryIO clustering documentation is incorrect and lacking

2019-11-19 Thread Alex Van Boxel (Jira)


 [ 
https://issues.apache.org/jira/browse/BEAM-8174?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Alex Van Boxel updated BEAM-8174:
-
Fix Version/s: (was: 2.17.0)

> BigQueryIO clustering documentation is incorrect and lacking
> 
>
> Key: BEAM-8174
> URL: https://issues.apache.org/jira/browse/BEAM-8174
> Project: Beam
>  Issue Type: Bug
>  Components: io-java-gcp
>Affects Versions: 2.15.0
>Reporter: Alex Van Boxel
>Assignee: Alex Van Boxel
>Priority: Trivial
>  Labels: documentation
>   Original Estimate: 2h
>  Remaining Estimate: 2h
>
> I noticed that the Java doc of the clustering feature in BigQueryIO is more a 
> copy/paste from the timestamp method. This needs to be corrected.
> The Clustering option should also be added to the BigQueryIO page.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Comment Edited] (BEAM-8651) Python 3 portable pipelines sometimes fail with errors in StockUnpickler.find_class()

2019-11-19 Thread Valentyn Tymofieiev (Jira)


[ 
https://issues.apache.org/jira/browse/BEAM-8651?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16978155#comment-16978155
 ] 

Valentyn Tymofieiev edited comment on BEAM-8651 at 11/20/19 7:33 AM:
-

[https://github.com/apache/beam/pull/10167] may be a way to address this issue 
in Beam plane, and unblock users. I cannot say at the moment whether broken 
module imports during concurrent unpickling  is a known/expected behavior from 
Dill/CPython perspective.


was (Author: tvalentyn):
[https://github.com/apache/beam/pull/10167] may be a way to address this issue 
in Beam plane, and unblock users. I cannot say at the moment whether the race 
is a known/expected behavior from Dill/CPython perspective.

> Python 3 portable pipelines sometimes fail with errors in 
> StockUnpickler.find_class()
> -
>
> Key: BEAM-8651
> URL: https://issues.apache.org/jira/browse/BEAM-8651
> Project: Beam
>  Issue Type: Sub-task
>  Components: sdk-py-core
>Reporter: Valentyn Tymofieiev
>Assignee: Valentyn Tymofieiev
>Priority: Blocker
> Attachments: beam8651.py
>
>  Time Spent: 0.5h
>  Remaining Estimate: 0h
>
> Several Beam users [1,2] reported an error which happens on Python 3 in 
> StockUnpickler.find_class.
> So far I've seen reports of the error on Python 3.5, 3.6, and 3.7.1, on Flink 
> and Dataflow runners. On Dataflow runner so far I have seen this in streaming 
> pipelines only, which use portable SDK worker.
> Typical stack trace:
> {noformat}
> File 
> "python3.5/site-packages/apache_beam/runners/worker/bundle_processor.py", 
> line 1148, in _create_pardo_operation
>     dofn_data = pickler.loads(serialized_fn)  
>  
>   File "python3.5/site-packages/apache_beam/internal/pickler.py", line 265, 
> in loads
>     return dill.loads(s)  
>  
>   File "python3.5/site-packages/dill/_dill.py", line 317, in loads
>  
>     return load(file, ignore) 
>  
>   File "python3.5/site-packages/dill/_dill.py", line 305, in load 
>  
>     obj = pik.load()  
>  
>   File "python3.5/site-packages/dill/_dill.py", line 474, in find_class   
>  
>     return StockUnpickler.find_class(self, module, name)  
>  
> AttributeError: Can't get attribute 'ClassName' on  'python3.5/site-packages/filename.py'>
> {noformat}
> According to Guenther from [1]:
> {quote}
> This looks exactly like a race condition that we've encountered on Python
> 3.7.1: There's a bug in some older 3.7.x releases that breaks the
> thread-safety of the unpickler, as concurrent unpickle threads can access a
> module before it has been fully imported. See
> https://bugs.python.org/issue34572 for more information.
> The traceback shows a Python 3.6 venv so this could be a different issue
> (the unpickle bug was introduced in version 3.7). If it's the same bug then
> upgrading to Python 3.7.3 or higher should fix that issue. One potential
> workaround is to ensure that all of the modules get imported during the
> initialization of the sdk_worker, as this bug only affects imports done by
> the unpickler.
> {quote}
> Opening this for visibility. Current open questions are:
> 1. Find a minimal example to reproduce this issue.
> 2. Figure out whether users are still affected by this issue on Python 3.7.3.
> 3. Communicate a workarounds for 3.5, 3.6 users affected by this.
> [1] 
> https://lists.apache.org/thread.html/5581ddfcf6d2ae10d25b834b8a61ebee265ffbcf650c6ec8d1e69408@%3Cdev.beam.apache.org%3E



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Commented] (BEAM-8651) Python 3 portable pipelines sometimes fail with errors in StockUnpickler.find_class()

2019-11-19 Thread Valentyn Tymofieiev (Jira)


[ 
https://issues.apache.org/jira/browse/BEAM-8651?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16978155#comment-16978155
 ] 

Valentyn Tymofieiev commented on BEAM-8651:
---

[https://github.com/apache/beam/pull/10167] may be a way to address this issue 
in Beam plane, and unblock users. I cannot say at the moment whether the race 
is a known/expected behavior from Dill/CPython perspective.

> Python 3 portable pipelines sometimes fail with errors in 
> StockUnpickler.find_class()
> -
>
> Key: BEAM-8651
> URL: https://issues.apache.org/jira/browse/BEAM-8651
> Project: Beam
>  Issue Type: Sub-task
>  Components: sdk-py-core
>Reporter: Valentyn Tymofieiev
>Assignee: Valentyn Tymofieiev
>Priority: Blocker
> Attachments: beam8651.py
>
>  Time Spent: 0.5h
>  Remaining Estimate: 0h
>
> Several Beam users [1,2] reported an error which happens on Python 3 in 
> StockUnpickler.find_class.
> So far I've seen reports of the error on Python 3.5, 3.6, and 3.7.1, on Flink 
> and Dataflow runners. On Dataflow runner so far I have seen this in streaming 
> pipelines only, which use portable SDK worker.
> Typical stack trace:
> {noformat}
> File 
> "python3.5/site-packages/apache_beam/runners/worker/bundle_processor.py", 
> line 1148, in _create_pardo_operation
>     dofn_data = pickler.loads(serialized_fn)  
>  
>   File "python3.5/site-packages/apache_beam/internal/pickler.py", line 265, 
> in loads
>     return dill.loads(s)  
>  
>   File "python3.5/site-packages/dill/_dill.py", line 317, in loads
>  
>     return load(file, ignore) 
>  
>   File "python3.5/site-packages/dill/_dill.py", line 305, in load 
>  
>     obj = pik.load()  
>  
>   File "python3.5/site-packages/dill/_dill.py", line 474, in find_class   
>  
>     return StockUnpickler.find_class(self, module, name)  
>  
> AttributeError: Can't get attribute 'ClassName' on  'python3.5/site-packages/filename.py'>
> {noformat}
> According to Guenther from [1]:
> {quote}
> This looks exactly like a race condition that we've encountered on Python
> 3.7.1: There's a bug in some older 3.7.x releases that breaks the
> thread-safety of the unpickler, as concurrent unpickle threads can access a
> module before it has been fully imported. See
> https://bugs.python.org/issue34572 for more information.
> The traceback shows a Python 3.6 venv so this could be a different issue
> (the unpickle bug was introduced in version 3.7). If it's the same bug then
> upgrading to Python 3.7.3 or higher should fix that issue. One potential
> workaround is to ensure that all of the modules get imported during the
> initialization of the sdk_worker, as this bug only affects imports done by
> the unpickler.
> {quote}
> Opening this for visibility. Current open questions are:
> 1. Find a minimal example to reproduce this issue.
> 2. Figure out whether users are still affected by this issue on Python 3.7.3.
> 3. Communicate a workarounds for 3.5, 3.6 users affected by this.
> [1] 
> https://lists.apache.org/thread.html/5581ddfcf6d2ae10d25b834b8a61ebee265ffbcf650c6ec8d1e69408@%3Cdev.beam.apache.org%3E



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Work logged] (BEAM-8511) Support for enhanced fan-out in KinesisIO.Read

2019-11-19 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/BEAM-8511?focusedWorklogId=346520=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-346520
 ]

ASF GitHub Bot logged work on BEAM-8511:


Author: ASF GitHub Bot
Created on: 20/Nov/19 07:26
Start Date: 20/Nov/19 07:26
Worklog Time Spent: 10m 
  Work Description: jfarr commented on issue #9899: [BEAM-8511] [WIP] 
KinesisIO.Read enhanced fanout
URL: https://github.com/apache/beam/pull/9899#issuecomment-555874874
 
 
   @aromanenko-dev thank you for the feedback. I’ll work on getting those 
changes in. I’m sorry, the last week or so has been very busy for me. I will be 
on vacation from my day job all of next week so I hope to wrap up this one and 
#9765 as much as I can. I have been testing this against our production data 
stream for a while now and functionally it’s been working very well but I am 
still seeing some unexpected latencies at higher throughputs and working to get 
to the bottom of that. I’ll try to post some info about that as well.
 

This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 346520)
Time Spent: 2h 40m  (was: 2.5h)

> Support for enhanced fan-out in KinesisIO.Read
> --
>
> Key: BEAM-8511
> URL: https://issues.apache.org/jira/browse/BEAM-8511
> Project: Beam
>  Issue Type: New Feature
>  Components: io-java-kinesis
>Reporter: Jonothan Farr
>Assignee: Jonothan Farr
>Priority: Major
>  Time Spent: 2h 40m
>  Remaining Estimate: 0h
>
> Add support for reading from an enhanced fan-out consumer using KinesisIO.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Work logged] (BEAM-8651) Python 3 portable pipelines sometimes fail with errors in StockUnpickler.find_class()

2019-11-19 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/BEAM-8651?focusedWorklogId=346519=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-346519
 ]

ASF GitHub Bot logged work on BEAM-8651:


Author: ASF GitHub Bot
Created on: 20/Nov/19 07:25
Start Date: 20/Nov/19 07:25
Worklog Time Spent: 10m 
  Work Description: tvalentyn commented on issue #10167: [BEAM-8651] Guard 
pickling operations with a lock to prevent race condition in module imports.
URL: https://github.com/apache/beam/pull/10167#issuecomment-555874590
 
 
   Run Python 3.5 PostCommit
 

This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 346519)
Time Spent: 0.5h  (was: 20m)

> Python 3 portable pipelines sometimes fail with errors in 
> StockUnpickler.find_class()
> -
>
> Key: BEAM-8651
> URL: https://issues.apache.org/jira/browse/BEAM-8651
> Project: Beam
>  Issue Type: Sub-task
>  Components: sdk-py-core
>Reporter: Valentyn Tymofieiev
>Assignee: Valentyn Tymofieiev
>Priority: Blocker
> Attachments: beam8651.py
>
>  Time Spent: 0.5h
>  Remaining Estimate: 0h
>
> Several Beam users [1,2] reported an error which happens on Python 3 in 
> StockUnpickler.find_class.
> So far I've seen reports of the error on Python 3.5, 3.6, and 3.7.1, on Flink 
> and Dataflow runners. On Dataflow runner so far I have seen this in streaming 
> pipelines only, which use portable SDK worker.
> Typical stack trace:
> {noformat}
> File 
> "python3.5/site-packages/apache_beam/runners/worker/bundle_processor.py", 
> line 1148, in _create_pardo_operation
>     dofn_data = pickler.loads(serialized_fn)  
>  
>   File "python3.5/site-packages/apache_beam/internal/pickler.py", line 265, 
> in loads
>     return dill.loads(s)  
>  
>   File "python3.5/site-packages/dill/_dill.py", line 317, in loads
>  
>     return load(file, ignore) 
>  
>   File "python3.5/site-packages/dill/_dill.py", line 305, in load 
>  
>     obj = pik.load()  
>  
>   File "python3.5/site-packages/dill/_dill.py", line 474, in find_class   
>  
>     return StockUnpickler.find_class(self, module, name)  
>  
> AttributeError: Can't get attribute 'ClassName' on  'python3.5/site-packages/filename.py'>
> {noformat}
> According to Guenther from [1]:
> {quote}
> This looks exactly like a race condition that we've encountered on Python
> 3.7.1: There's a bug in some older 3.7.x releases that breaks the
> thread-safety of the unpickler, as concurrent unpickle threads can access a
> module before it has been fully imported. See
> https://bugs.python.org/issue34572 for more information.
> The traceback shows a Python 3.6 venv so this could be a different issue
> (the unpickle bug was introduced in version 3.7). If it's the same bug then
> upgrading to Python 3.7.3 or higher should fix that issue. One potential
> workaround is to ensure that all of the modules get imported during the
> initialization of the sdk_worker, as this bug only affects imports done by
> the unpickler.
> {quote}
> Opening this for visibility. Current open questions are:
> 1. Find a minimal example to reproduce this issue.
> 2. Figure out whether users are still affected by this issue on Python 3.7.3.
> 3. Communicate a workarounds for 3.5, 3.6 users affected by this.
> [1] 
> https://lists.apache.org/thread.html/5581ddfcf6d2ae10d25b834b8a61ebee265ffbcf650c6ec8d1e69408@%3Cdev.beam.apache.org%3E



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Work logged] (BEAM-8651) Python 3 portable pipelines sometimes fail with errors in StockUnpickler.find_class()

2019-11-19 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/BEAM-8651?focusedWorklogId=346518=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-346518
 ]

ASF GitHub Bot logged work on BEAM-8651:


Author: ASF GitHub Bot
Created on: 20/Nov/19 07:25
Start Date: 20/Nov/19 07:25
Worklog Time Spent: 10m 
  Work Description: tvalentyn commented on issue #10167: [BEAM-8651] Guard 
pickling operations with a lock to prevent race condition in module imports.
URL: https://github.com/apache/beam/pull/10167#issuecomment-555874543
 
 
   Run Python 2.7 PostCommit
 

This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 346518)
Time Spent: 20m  (was: 10m)

> Python 3 portable pipelines sometimes fail with errors in 
> StockUnpickler.find_class()
> -
>
> Key: BEAM-8651
> URL: https://issues.apache.org/jira/browse/BEAM-8651
> Project: Beam
>  Issue Type: Sub-task
>  Components: sdk-py-core
>Reporter: Valentyn Tymofieiev
>Assignee: Valentyn Tymofieiev
>Priority: Blocker
> Attachments: beam8651.py
>
>  Time Spent: 20m
>  Remaining Estimate: 0h
>
> Several Beam users [1,2] reported an error which happens on Python 3 in 
> StockUnpickler.find_class.
> So far I've seen reports of the error on Python 3.5, 3.6, and 3.7.1, on Flink 
> and Dataflow runners. On Dataflow runner so far I have seen this in streaming 
> pipelines only, which use portable SDK worker.
> Typical stack trace:
> {noformat}
> File 
> "python3.5/site-packages/apache_beam/runners/worker/bundle_processor.py", 
> line 1148, in _create_pardo_operation
>     dofn_data = pickler.loads(serialized_fn)  
>  
>   File "python3.5/site-packages/apache_beam/internal/pickler.py", line 265, 
> in loads
>     return dill.loads(s)  
>  
>   File "python3.5/site-packages/dill/_dill.py", line 317, in loads
>  
>     return load(file, ignore) 
>  
>   File "python3.5/site-packages/dill/_dill.py", line 305, in load 
>  
>     obj = pik.load()  
>  
>   File "python3.5/site-packages/dill/_dill.py", line 474, in find_class   
>  
>     return StockUnpickler.find_class(self, module, name)  
>  
> AttributeError: Can't get attribute 'ClassName' on  'python3.5/site-packages/filename.py'>
> {noformat}
> According to Guenther from [1]:
> {quote}
> This looks exactly like a race condition that we've encountered on Python
> 3.7.1: There's a bug in some older 3.7.x releases that breaks the
> thread-safety of the unpickler, as concurrent unpickle threads can access a
> module before it has been fully imported. See
> https://bugs.python.org/issue34572 for more information.
> The traceback shows a Python 3.6 venv so this could be a different issue
> (the unpickle bug was introduced in version 3.7). If it's the same bug then
> upgrading to Python 3.7.3 or higher should fix that issue. One potential
> workaround is to ensure that all of the modules get imported during the
> initialization of the sdk_worker, as this bug only affects imports done by
> the unpickler.
> {quote}
> Opening this for visibility. Current open questions are:
> 1. Find a minimal example to reproduce this issue.
> 2. Figure out whether users are still affected by this issue on Python 3.7.3.
> 3. Communicate a workarounds for 3.5, 3.6 users affected by this.
> [1] 
> https://lists.apache.org/thread.html/5581ddfcf6d2ae10d25b834b8a61ebee265ffbcf650c6ec8d1e69408@%3Cdev.beam.apache.org%3E



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Work logged] (BEAM-8651) Python 3 portable pipelines sometimes fail with errors in StockUnpickler.find_class()

2019-11-19 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/BEAM-8651?focusedWorklogId=346516=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-346516
 ]

ASF GitHub Bot logged work on BEAM-8651:


Author: ASF GitHub Bot
Created on: 20/Nov/19 07:21
Start Date: 20/Nov/19 07:21
Worklog Time Spent: 10m 
  Work Description: tvalentyn commented on pull request #10167: [BEAM-8651] 
Guard pickling operations with a lock to prevent race condition in module 
imports.
URL: https://github.com/apache/beam/pull/10167
 
 
   
   Post-Commit Tests Status (on master branch)
   

   
   Lang | SDK | Apex | Dataflow | Flink | Gearpump | Samza | Spark
   --- | --- | --- | --- | --- | --- | --- | ---
   Go | [![Build 
Status](https://builds.apache.org/job/beam_PostCommit_Go/lastCompletedBuild/badge/icon)](https://builds.apache.org/job/beam_PostCommit_Go/lastCompletedBuild/)
 | --- | --- | [![Build 
Status](https://builds.apache.org/job/beam_PostCommit_Go_VR_Flink/lastCompletedBuild/badge/icon)](https://builds.apache.org/job/beam_PostCommit_Go_VR_Flink/lastCompletedBuild/)
 | --- | --- | [![Build 
Status](https://builds.apache.org/job/beam_PostCommit_Go_VR_Spark/lastCompletedBuild/badge/icon)](https://builds.apache.org/job/beam_PostCommit_Go_VR_Spark/lastCompletedBuild/)
   Java | [![Build 
Status](https://builds.apache.org/job/beam_PostCommit_Java/lastCompletedBuild/badge/icon)](https://builds.apache.org/job/beam_PostCommit_Java/lastCompletedBuild/)
 | [![Build 
Status](https://builds.apache.org/job/beam_PostCommit_Java_ValidatesRunner_Apex/lastCompletedBuild/badge/icon)](https://builds.apache.org/job/beam_PostCommit_Java_ValidatesRunner_Apex/lastCompletedBuild/)
 | [![Build 
Status](https://builds.apache.org/job/beam_PostCommit_Java_ValidatesRunner_Dataflow/lastCompletedBuild/badge/icon)](https://builds.apache.org/job/beam_PostCommit_Java_ValidatesRunner_Dataflow/lastCompletedBuild/)
 | [![Build 
Status](https://builds.apache.org/job/beam_PostCommit_Java_ValidatesRunner_Flink/lastCompletedBuild/badge/icon)](https://builds.apache.org/job/beam_PostCommit_Java_ValidatesRunner_Flink/lastCompletedBuild/)[![Build
 
Status](https://builds.apache.org/job/beam_PostCommit_Java_PVR_Flink_Batch/lastCompletedBuild/badge/icon)](https://builds.apache.org/job/beam_PostCommit_Java_PVR_Flink_Batch/lastCompletedBuild/)[![Build
 
Status](https://builds.apache.org/job/beam_PostCommit_Java_PVR_Flink_Streaming/lastCompletedBuild/badge/icon)](https://builds.apache.org/job/beam_PostCommit_Java_PVR_Flink_Streaming/lastCompletedBuild/)
 | [![Build 
Status](https://builds.apache.org/job/beam_PostCommit_Java_ValidatesRunner_Gearpump/lastCompletedBuild/badge/icon)](https://builds.apache.org/job/beam_PostCommit_Java_ValidatesRunner_Gearpump/lastCompletedBuild/)
 | [![Build 
Status](https://builds.apache.org/job/beam_PostCommit_Java_ValidatesRunner_Samza/lastCompletedBuild/badge/icon)](https://builds.apache.org/job/beam_PostCommit_Java_ValidatesRunner_Samza/lastCompletedBuild/)
 | [![Build 
Status](https://builds.apache.org/job/beam_PostCommit_Java_ValidatesRunner_Spark/lastCompletedBuild/badge/icon)](https://builds.apache.org/job/beam_PostCommit_Java_ValidatesRunner_Spark/lastCompletedBuild/)[![Build
 
Status](https://builds.apache.org/job/beam_PostCommit_Java_PVR_Spark_Batch/lastCompletedBuild/badge/icon)](https://builds.apache.org/job/beam_PostCommit_Java_PVR_Spark_Batch/lastCompletedBuild/)
   Python | [![Build 
Status](https://builds.apache.org/job/beam_PostCommit_Python2/lastCompletedBuild/badge/icon)](https://builds.apache.org/job/beam_PostCommit_Python2/lastCompletedBuild/)[![Build
 
Status](https://builds.apache.org/job/beam_PostCommit_Python35/lastCompletedBuild/badge/icon)](https://builds.apache.org/job/beam_PostCommit_Python35/lastCompletedBuild/)[![Build
 
Status](https://builds.apache.org/job/beam_PostCommit_Python36/lastCompletedBuild/badge/icon)](https://builds.apache.org/job/beam_PostCommit_Python36/lastCompletedBuild/)[![Build
 
Status](https://builds.apache.org/job/beam_PostCommit_Python37/lastCompletedBuild/badge/icon)](https://builds.apache.org/job/beam_PostCommit_Python37/lastCompletedBuild/)
 | --- | [![Build 
Status](https://builds.apache.org/job/beam_PostCommit_Py_VR_Dataflow/lastCompletedBuild/badge/icon)](https://builds.apache.org/job/beam_PostCommit_Py_VR_Dataflow/lastCompletedBuild/)[![Build
 
Status](https://builds.apache.org/job/beam_PostCommit_Py_ValCont/lastCompletedBuild/badge/icon)](https://builds.apache.org/job/beam_PostCommit_Py_ValCont/lastCompletedBuild/)
 | [![Build 
Status](https://builds.apache.org/job/beam_PreCommit_Python2_PVR_Flink_Cron/lastCompletedBuild/badge/icon)](https://builds.apache.org/job/beam_PreCommit_Python2_PVR_Flink_Cron/lastCompletedBuild/)[![Build
 

[jira] [Work logged] (BEAM-8568) Local file system does not match relative path with wildcards

2019-11-19 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/BEAM-8568?focusedWorklogId=346509=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-346509
 ]

ASF GitHub Bot logged work on BEAM-8568:


Author: ASF GitHub Bot
Created on: 20/Nov/19 07:15
Start Date: 20/Nov/19 07:15
Worklog Time Spent: 10m 
  Work Description: dmvk commented on issue #10028: [BEAM-8568] Fixed 
problem that LocalFileSystem no longer supports wil…
URL: https://github.com/apache/beam/pull/10028#issuecomment-555871767
 
 
   Run Java_Examples_Dataflow PreCommit
 

This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 346509)
Time Spent: 3h 40m  (was: 3.5h)

> Local file system does not match relative path with wildcards
> -
>
> Key: BEAM-8568
> URL: https://issues.apache.org/jira/browse/BEAM-8568
> Project: Beam
>  Issue Type: Bug
>  Components: sdk-java-core
>Affects Versions: 2.16.0
>Reporter: Ondrej Cerny
>Assignee: David Moravek
>Priority: Major
> Fix For: 2.17.0
>
>  Time Spent: 3h 40m
>  Remaining Estimate: 0h
>
> CWD structure:
> {code}
> src/test/resources/input/sometestfile.txt
> {code}
>  
> Code:
> {code:java}
> input 
> .apply(Create.of("src/test/resources/input/*)) 
> .apply(FileIO.matchAll()) 
> .apply(FileIO.readMatches())
> {code}
> The code above doesn't match any file starting Beam 2.16.0. The regression 
> has been introduced in BEAM-7854.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Work logged] (BEAM-8568) Local file system does not match relative path with wildcards

2019-11-19 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/BEAM-8568?focusedWorklogId=346508=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-346508
 ]

ASF GitHub Bot logged work on BEAM-8568:


Author: ASF GitHub Bot
Created on: 20/Nov/19 07:14
Start Date: 20/Nov/19 07:14
Worklog Time Spent: 10m 
  Work Description: dmvk commented on issue #10028: [BEAM-8568] Fixed 
problem that LocalFileSystem no longer supports wil…
URL: https://github.com/apache/beam/pull/10028#issuecomment-555871445
 
 
   @kennknowles The tests always end up with the same state on this branch.
   
   eg.
   ```
   16:37:51 org.apache.beam.examples.WindowedWordCountIT > 
testWindowedWordCountInStreamingStaticSharding FAILED
   16:37:51 java.lang.RuntimeException at WindowedWordCountIT.java:188
   17:20:15 
   17:20:15 org.apache.beam.examples.WordCountIT > testE2EWordCount FAILED
   17:20:15 java.lang.RuntimeException at WordCountIT.java:69
   17:40:50 
   17:40:50 org.apache.beam.examples.WindowedWordCountIT > 
testWindowedWordCountInBatchStaticSharding FAILED
   17:40:50 java.lang.RuntimeException at WindowedWordCountIT.java:188
   18:15:28 Build timed out (after 120 minutes). Marking the build as aborted.
   18:15:28 Build was aborted
   18:15:28 Recording test results
   ```
   
   When I checked the source code of these tests, they seem configured have 
input and output configured `gcs://` paths. When I try to run them locally I 
end up with...
   
   ```
   com.google.api.client.http.HttpResponseException: 400 Bad Request
   {
 "error": "invalid_grant",
 "error_description": "Bad Request"
   }
at com.google.api.client.http.HttpRequest.execute(HttpRequest.java:1102)
at 
com.google.auth.oauth2.UserCredentials.refreshAccessToken(UserCredentials.java:227)
at 
com.google.auth.oauth2.OAuth2Credentials.refresh(OAuth2Credentials.java:181)
at 
com.google.auth.oauth2.OAuth2Credentials.getRequestMetadata(OAuth2Credentials.java:167)
at 
com.google.auth.http.HttpCredentialsAdapter.initialize(HttpCredentialsAdapter.java:96)
at 
com.google.cloud.hadoop.util.ChainingHttpRequestInitializer.initialize(ChainingHttpRequestInitializer.java:52)
at 
com.google.api.client.http.HttpRequestFactory.buildRequest(HttpRequestFactory.java:93)
at 
com.google.api.client.googleapis.services.AbstractGoogleClientRequest.buildHttpRequest(AbstractGoogleClientRequest.java:381)
at 
com.google.api.client.googleapis.services.AbstractGoogleClientRequest.buildHttpRequest(AbstractGoogleClientRequest.java:357)
at 
com.google.api.client.googleapis.services.AbstractGoogleClientRequest.queue(AbstractGoogleClientRequest.java:662)
at 
com.google.api.client.googleapis.services.json.AbstractGoogleJsonClientRequest.queue(AbstractGoogleJsonClientRequest.java:108)
at 
org.apache.beam.sdk.extensions.gcp.util.GcsUtil.enqueueGetFileSize(GcsUtil.java:755)
at 
org.apache.beam.sdk.extensions.gcp.util.GcsUtil.makeGetBatches(GcsUtil.java:608)
   ```
   
   which also implies that they are using `gcs` instead of local fs.
   
   Btw, this doesn't fail on Master branch: 
[jenkins](https://builds.apache.org/job/beam_PreCommit_Java_Examples_Dataflow_Commit/5443/)
 [PR](https://github.com/apache/beam/pull/10024)
   
   What should I do next? I'm don't think this is ok to merge as the failure 
seem to be deterministic.
 

This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 346508)
Time Spent: 3.5h  (was: 3h 20m)

> Local file system does not match relative path with wildcards
> -
>
> Key: BEAM-8568
> URL: https://issues.apache.org/jira/browse/BEAM-8568
> Project: Beam
>  Issue Type: Bug
>  Components: sdk-java-core
>Affects Versions: 2.16.0
>Reporter: Ondrej Cerny
>Assignee: David Moravek
>Priority: Major
> Fix For: 2.17.0
>
>  Time Spent: 3.5h
>  Remaining Estimate: 0h
>
> CWD structure:
> {code}
> src/test/resources/input/sometestfile.txt
> {code}
>  
> Code:
> {code:java}
> input 
> .apply(Create.of("src/test/resources/input/*)) 
> .apply(FileIO.matchAll()) 
> .apply(FileIO.readMatches())
> {code}
> The code above doesn't match any file starting Beam 2.16.0. The regression 
> has been introduced in BEAM-7854.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Work logged] (BEAM-8568) Local file system does not match relative path with wildcards

2019-11-19 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/BEAM-8568?focusedWorklogId=346506=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-346506
 ]

ASF GitHub Bot logged work on BEAM-8568:


Author: ASF GitHub Bot
Created on: 20/Nov/19 07:13
Start Date: 20/Nov/19 07:13
Worklog Time Spent: 10m 
  Work Description: dmvk commented on issue #10028: [BEAM-8568] Fixed 
problem that LocalFileSystem no longer supports wil…
URL: https://github.com/apache/beam/pull/10028#issuecomment-555871445
 
 
   @kennknowles The tests always end up with the same state on this branch.
   
   eg.
   ```
   16:37:51 org.apache.beam.examples.WindowedWordCountIT > 
testWindowedWordCountInStreamingStaticSharding FAILED
   16:37:51 java.lang.RuntimeException at WindowedWordCountIT.java:188
   17:20:15 
   17:20:15 org.apache.beam.examples.WordCountIT > testE2EWordCount FAILED
   17:20:15 java.lang.RuntimeException at WordCountIT.java:69
   17:40:50 
   17:40:50 org.apache.beam.examples.WindowedWordCountIT > 
testWindowedWordCountInBatchStaticSharding FAILED
   17:40:50 java.lang.RuntimeException at WindowedWordCountIT.java:188
   18:15:28 Build timed out (after 120 minutes). Marking the build as aborted.
   18:15:28 Build was aborted
   18:15:28 Recording test results
   ```
   
   When I checked the source code of these tests, they seem configured have 
input and output configured `gcs://` paths. When I try to run them locally I 
end up with...
   
   ```
   com.google.api.client.http.HttpResponseException: 400 Bad Request
   {
 "error": "invalid_grant",
 "error_description": "Bad Request"
   }
at com.google.api.client.http.HttpRequest.execute(HttpRequest.java:1102)
at 
com.google.auth.oauth2.UserCredentials.refreshAccessToken(UserCredentials.java:227)
at 
com.google.auth.oauth2.OAuth2Credentials.refresh(OAuth2Credentials.java:181)
at 
com.google.auth.oauth2.OAuth2Credentials.getRequestMetadata(OAuth2Credentials.java:167)
at 
com.google.auth.http.HttpCredentialsAdapter.initialize(HttpCredentialsAdapter.java:96)
at 
com.google.cloud.hadoop.util.ChainingHttpRequestInitializer.initialize(ChainingHttpRequestInitializer.java:52)
at 
com.google.api.client.http.HttpRequestFactory.buildRequest(HttpRequestFactory.java:93)
at 
com.google.api.client.googleapis.services.AbstractGoogleClientRequest.buildHttpRequest(AbstractGoogleClientRequest.java:381)
at 
com.google.api.client.googleapis.services.AbstractGoogleClientRequest.buildHttpRequest(AbstractGoogleClientRequest.java:357)
at 
com.google.api.client.googleapis.services.AbstractGoogleClientRequest.queue(AbstractGoogleClientRequest.java:662)
at 
com.google.api.client.googleapis.services.json.AbstractGoogleJsonClientRequest.queue(AbstractGoogleJsonClientRequest.java:108)
at 
org.apache.beam.sdk.extensions.gcp.util.GcsUtil.enqueueGetFileSize(GcsUtil.java:755)
at 
org.apache.beam.sdk.extensions.gcp.util.GcsUtil.makeGetBatches(GcsUtil.java:608)
   ```
   
   which also implies that they are using `gcs` instead of local fs.
   
   Btw, this doesn't fail on Master branch: 
(jenkins)[https://builds.apache.org/job/beam_PreCommit_Java_Examples_Dataflow_Commit/5443/],
 (PR)[https://github.com/apache/beam/pull/10024]
   
   What should I do next? I'm don't think this is ok to merge as the failure 
seem to be deterministic.
 

This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 346506)
Time Spent: 3h 20m  (was: 3h 10m)

> Local file system does not match relative path with wildcards
> -
>
> Key: BEAM-8568
> URL: https://issues.apache.org/jira/browse/BEAM-8568
> Project: Beam
>  Issue Type: Bug
>  Components: sdk-java-core
>Affects Versions: 2.16.0
>Reporter: Ondrej Cerny
>Assignee: David Moravek
>Priority: Major
> Fix For: 2.17.0
>
>  Time Spent: 3h 20m
>  Remaining Estimate: 0h
>
> CWD structure:
> {code}
> src/test/resources/input/sometestfile.txt
> {code}
>  
> Code:
> {code:java}
> input 
> .apply(Create.of("src/test/resources/input/*)) 
> .apply(FileIO.matchAll()) 
> .apply(FileIO.readMatches())
> {code}
> The code above doesn't match any file starting Beam 2.16.0. The regression 
> has been introduced in BEAM-7854.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Commented] (BEAM-8384) Spark runner is not respecting spark.default.parallelism user defined configuration

2019-11-19 Thread Jira


[ 
https://issues.apache.org/jira/browse/BEAM-8384?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16978142#comment-16978142
 ] 

Ismaël Mejía commented on BEAM-8384:


I have forgotten about this one. Looks like a regression but not convinced it 
is a blocker. I will move the Fix version tag to unblock the release and 
eventually send a cherry pick if still in time.

> Spark runner is not respecting spark.default.parallelism user defined 
> configuration
> ---
>
> Key: BEAM-8384
> URL: https://issues.apache.org/jira/browse/BEAM-8384
> Project: Beam
>  Issue Type: Bug
>  Components: runner-spark
>Affects Versions: 2.16.0
>Reporter: Ismaël Mejía
>Assignee: Ismaël Mejía
>Priority: Major
>
> It was reported in [the mailing 
> list|https://lists.apache.org/thread.html/792fb7fc2a5113837fbcdafce6a5d9100309881b366c1a7163d2c898@%3Cdev.beam.apache.org%3E]
>  that the Spark runner is not respecting the user defined Spark default 
> parallelism configuration. We should investigate and if it is the case ensure 
> that a user defined configuration is always respected. Runner optimizations 
> should apply only for default (unconfigured) values otherwise we will confuse 
> users and limit them from parametrizing Spark for their best convenience.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Updated] (BEAM-8384) Spark runner is not respecting spark.default.parallelism user defined configuration

2019-11-19 Thread Jira


 [ 
https://issues.apache.org/jira/browse/BEAM-8384?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ismaël Mejía updated BEAM-8384:
---
Fix Version/s: (was: 2.17.0)

> Spark runner is not respecting spark.default.parallelism user defined 
> configuration
> ---
>
> Key: BEAM-8384
> URL: https://issues.apache.org/jira/browse/BEAM-8384
> Project: Beam
>  Issue Type: Bug
>  Components: runner-spark
>Affects Versions: 2.16.0
>Reporter: Ismaël Mejía
>Assignee: Ismaël Mejía
>Priority: Major
>
> It was reported in [the mailing 
> list|https://lists.apache.org/thread.html/792fb7fc2a5113837fbcdafce6a5d9100309881b366c1a7163d2c898@%3Cdev.beam.apache.org%3E]
>  that the Spark runner is not respecting the user defined Spark default 
> parallelism configuration. We should investigate and if it is the case ensure 
> that a user defined configuration is always respected. Runner optimizations 
> should apply only for default (unconfigured) values otherwise we will confuse 
> users and limit them from parametrizing Spark for their best convenience.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Work logged] (BEAM-8568) Local file system does not match relative path with wildcards

2019-11-19 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/BEAM-8568?focusedWorklogId=346493=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-346493
 ]

ASF GitHub Bot logged work on BEAM-8568:


Author: ASF GitHub Bot
Created on: 20/Nov/19 06:34
Start Date: 20/Nov/19 06:34
Worklog Time Spent: 10m 
  Work Description: kennknowles commented on issue #10028: [BEAM-8568] 
Fixed problem that LocalFileSystem no longer supports wil…
URL: https://github.com/apache/beam/pull/10028#issuecomment-555860949
 
 
   Looks good to me: https://gradle.com/s/c47e35q47qzpq
 

This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 346493)
Time Spent: 3h 10m  (was: 3h)

> Local file system does not match relative path with wildcards
> -
>
> Key: BEAM-8568
> URL: https://issues.apache.org/jira/browse/BEAM-8568
> Project: Beam
>  Issue Type: Bug
>  Components: sdk-java-core
>Affects Versions: 2.16.0
>Reporter: Ondrej Cerny
>Assignee: David Moravek
>Priority: Major
> Fix For: 2.17.0
>
>  Time Spent: 3h 10m
>  Remaining Estimate: 0h
>
> CWD structure:
> {code}
> src/test/resources/input/sometestfile.txt
> {code}
>  
> Code:
> {code:java}
> input 
> .apply(Create.of("src/test/resources/input/*)) 
> .apply(FileIO.matchAll()) 
> .apply(FileIO.readMatches())
> {code}
> The code above doesn't match any file starting Beam 2.16.0. The regression 
> has been introduced in BEAM-7854.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Work logged] (BEAM-8645) TimestampCombiner incorrect in beam python

2019-11-19 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/BEAM-8645?focusedWorklogId=346492=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-346492
 ]

ASF GitHub Bot logged work on BEAM-8645:


Author: ASF GitHub Bot
Created on: 20/Nov/19 06:33
Start Date: 20/Nov/19 06:33
Worklog Time Spent: 10m 
  Work Description: HuangLED commented on pull request #10143: [BEAM-8645] 
To test state backed iterable coder in py sdk.
URL: https://github.com/apache/beam/pull/10143#discussion_r348309394
 
 

 ##
 File path: sdks/python/apache_beam/runners/portability/fn_api_runner_test.py
 ##
 @@ -1579,6 +1580,27 @@ def test_lull_logging(self):
 '.*There has been a processing lull of over.*',
 'Unable to find a lull logged for this job.')
 
+@attr('ValidatesRunner')
+class FnApiBasedStateBackedCoderTest(unittest.TestCase):
+  def create_pipeline(self):
+return beam.Pipeline(
+runner=fn_api_runner.FnApiRunner(use_state_iterables=True))
+
+  def test_state_backed_coder(self):
+class MyDoFn(beam.DoFn):
+  def process(self, gbk_result):
+value_list = gbk_result[1]
+return (gbk_result[0], sum(value_list))
+
+with self.create_pipeline() as p:
+  # The number of integers could be a knob to test against
+  # different runners' default settings on page size.
+  main = (p | 'main' >> beam.Create([('a', 1) for _ in range(0, 2)])
+  | 'GBK' >> beam.GroupByKey()
+  | 'Sum' >> beam.ParDo(MyDoFn()))
 
 Review comment:
   Maybe we can have both. 
   
   Would you please explain how to control some values 'alive' while others in 
the same iterable not? 
 

This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 346492)
Time Spent: 6h 40m  (was: 6.5h)

> TimestampCombiner incorrect in beam python
> --
>
> Key: BEAM-8645
> URL: https://issues.apache.org/jira/browse/BEAM-8645
> Project: Beam
>  Issue Type: Bug
>  Components: sdk-py-core
>Reporter: Ruoyun Huang
>Priority: Major
>  Time Spent: 6h 40m
>  Remaining Estimate: 0h
>
> When we have a TimestampValue on combine: 
> {code:java}
> main_stream = (p   
> | 'main TestStream' >> TestStream()   
> .add_elements([window.TimestampedValue(('k', 100), 0)])   
> .add_elements([window.TimestampedValue(('k', 400), 9)])   
> .advance_watermark_to_infinity()   
> | 'main windowInto' >> beam.WindowInto( 
> window.FixedWindows(10),  
> timestamp_combiner=TimestampCombiner.OUTPUT_AT_LATEST)   | 
> 'Combine' >> beam.CombinePerKey(sum))
> The expect timestamp should be:
> LATEST:    (('k', 500), Timestamp(9)),
> EARLIEST:    (('k', 500), Timestamp(0)),
> END_OF_WINDOW: (('k', 500), Timestamp(10)),
> But current py streaming gives following results: 
> LATEST:    (('k', 500), Timestamp(10)),
> EARLIEST:    (('k', 500), Timestamp(10)),
> END_OF_WINDOW: (('k', 500), Timestamp(9.)),
> More details and discussions:
> https://lists.apache.org/thread.html/d3af1f2f84a2e59a747196039eae77812b78a991f0f293c717e5f4e1@%3Cdev.beam.apache.org%3E
> {code}



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Work logged] (BEAM-8645) TimestampCombiner incorrect in beam python

2019-11-19 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/BEAM-8645?focusedWorklogId=346490=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-346490
 ]

ASF GitHub Bot logged work on BEAM-8645:


Author: ASF GitHub Bot
Created on: 20/Nov/19 06:28
Start Date: 20/Nov/19 06:28
Worklog Time Spent: 10m 
  Work Description: HuangLED commented on pull request #10143: [BEAM-8645] 
To test state backed iterable coder in py sdk.
URL: https://github.com/apache/beam/pull/10143#discussion_r348309394
 
 

 ##
 File path: sdks/python/apache_beam/runners/portability/fn_api_runner_test.py
 ##
 @@ -1579,6 +1580,27 @@ def test_lull_logging(self):
 '.*There has been a processing lull of over.*',
 'Unable to find a lull logged for this job.')
 
+@attr('ValidatesRunner')
+class FnApiBasedStateBackedCoderTest(unittest.TestCase):
+  def create_pipeline(self):
+return beam.Pipeline(
+runner=fn_api_runner.FnApiRunner(use_state_iterables=True))
+
+  def test_state_backed_coder(self):
+class MyDoFn(beam.DoFn):
+  def process(self, gbk_result):
+value_list = gbk_result[1]
+return (gbk_result[0], sum(value_list))
+
+with self.create_pipeline() as p:
+  # The number of integers could be a knob to test against
+  # different runners' default settings on page size.
+  main = (p | 'main' >> beam.Create([('a', 1) for _ in range(0, 2)])
+  | 'GBK' >> beam.GroupByKey()
+  | 'Sum' >> beam.ParDo(MyDoFn()))
 
 Review comment:
   Maybe we can keep both. 
   
   Would you please explain how to control some values 'alive' while others in 
the same iterable not? 
 

This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 346490)
Time Spent: 6.5h  (was: 6h 20m)

> TimestampCombiner incorrect in beam python
> --
>
> Key: BEAM-8645
> URL: https://issues.apache.org/jira/browse/BEAM-8645
> Project: Beam
>  Issue Type: Bug
>  Components: sdk-py-core
>Reporter: Ruoyun Huang
>Priority: Major
>  Time Spent: 6.5h
>  Remaining Estimate: 0h
>
> When we have a TimestampValue on combine: 
> {code:java}
> main_stream = (p   
> | 'main TestStream' >> TestStream()   
> .add_elements([window.TimestampedValue(('k', 100), 0)])   
> .add_elements([window.TimestampedValue(('k', 400), 9)])   
> .advance_watermark_to_infinity()   
> | 'main windowInto' >> beam.WindowInto( 
> window.FixedWindows(10),  
> timestamp_combiner=TimestampCombiner.OUTPUT_AT_LATEST)   | 
> 'Combine' >> beam.CombinePerKey(sum))
> The expect timestamp should be:
> LATEST:    (('k', 500), Timestamp(9)),
> EARLIEST:    (('k', 500), Timestamp(0)),
> END_OF_WINDOW: (('k', 500), Timestamp(10)),
> But current py streaming gives following results: 
> LATEST:    (('k', 500), Timestamp(10)),
> EARLIEST:    (('k', 500), Timestamp(10)),
> END_OF_WINDOW: (('k', 500), Timestamp(9.)),
> More details and discussions:
> https://lists.apache.org/thread.html/d3af1f2f84a2e59a747196039eae77812b78a991f0f293c717e5f4e1@%3Cdev.beam.apache.org%3E
> {code}



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Work logged] (BEAM-8645) TimestampCombiner incorrect in beam python

2019-11-19 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/BEAM-8645?focusedWorklogId=346489=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-346489
 ]

ASF GitHub Bot logged work on BEAM-8645:


Author: ASF GitHub Bot
Created on: 20/Nov/19 06:28
Start Date: 20/Nov/19 06:28
Worklog Time Spent: 10m 
  Work Description: HuangLED commented on pull request #10143: [BEAM-8645] 
To test state backed iterable coder in py sdk.
URL: https://github.com/apache/beam/pull/10143#discussion_r348309394
 
 

 ##
 File path: sdks/python/apache_beam/runners/portability/fn_api_runner_test.py
 ##
 @@ -1579,6 +1580,27 @@ def test_lull_logging(self):
 '.*There has been a processing lull of over.*',
 'Unable to find a lull logged for this job.')
 
+@attr('ValidatesRunner')
+class FnApiBasedStateBackedCoderTest(unittest.TestCase):
+  def create_pipeline(self):
+return beam.Pipeline(
+runner=fn_api_runner.FnApiRunner(use_state_iterables=True))
+
+  def test_state_backed_coder(self):
+class MyDoFn(beam.DoFn):
+  def process(self, gbk_result):
+value_list = gbk_result[1]
+return (gbk_result[0], sum(value_list))
+
+with self.create_pipeline() as p:
+  # The number of integers could be a knob to test against
+  # different runners' default settings on page size.
+  main = (p | 'main' >> beam.Create([('a', 1) for _ in range(0, 2)])
+  | 'GBK' >> beam.GroupByKey()
+  | 'Sum' >> beam.ParDo(MyDoFn()))
 
 Review comment:
   Maybe let us keep both. 
   
   Would you please explain how to control some values 'alive' while others in 
the same iterable not? 
 

This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 346489)
Time Spent: 6h 20m  (was: 6h 10m)

> TimestampCombiner incorrect in beam python
> --
>
> Key: BEAM-8645
> URL: https://issues.apache.org/jira/browse/BEAM-8645
> Project: Beam
>  Issue Type: Bug
>  Components: sdk-py-core
>Reporter: Ruoyun Huang
>Priority: Major
>  Time Spent: 6h 20m
>  Remaining Estimate: 0h
>
> When we have a TimestampValue on combine: 
> {code:java}
> main_stream = (p   
> | 'main TestStream' >> TestStream()   
> .add_elements([window.TimestampedValue(('k', 100), 0)])   
> .add_elements([window.TimestampedValue(('k', 400), 9)])   
> .advance_watermark_to_infinity()   
> | 'main windowInto' >> beam.WindowInto( 
> window.FixedWindows(10),  
> timestamp_combiner=TimestampCombiner.OUTPUT_AT_LATEST)   | 
> 'Combine' >> beam.CombinePerKey(sum))
> The expect timestamp should be:
> LATEST:    (('k', 500), Timestamp(9)),
> EARLIEST:    (('k', 500), Timestamp(0)),
> END_OF_WINDOW: (('k', 500), Timestamp(10)),
> But current py streaming gives following results: 
> LATEST:    (('k', 500), Timestamp(10)),
> EARLIEST:    (('k', 500), Timestamp(10)),
> END_OF_WINDOW: (('k', 500), Timestamp(9.)),
> More details and discussions:
> https://lists.apache.org/thread.html/d3af1f2f84a2e59a747196039eae77812b78a991f0f293c717e5f4e1@%3Cdev.beam.apache.org%3E
> {code}



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Work logged] (BEAM-7390) Colab examples for aggregation transforms (Python)

2019-11-19 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/BEAM-7390?focusedWorklogId=346481=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-346481
 ]

ASF GitHub Bot logged work on BEAM-7390:


Author: ASF GitHub Bot
Created on: 20/Nov/19 06:19
Start Date: 20/Nov/19 06:19
Worklog Time Spent: 10m 
  Work Description: davidcavazos commented on pull request #9922: 
[BEAM-7390] Add code snippets for CombineValues
URL: https://github.com/apache/beam/pull/9922#discussion_r348307381
 
 

 ##
 File path: 
sdks/python/apache_beam/examples/snippets/transforms/aggregation/combinevalues.py
 ##
 @@ -0,0 +1,246 @@
+# coding=utf-8
 
 Review comment:
   Changed the examples for more unique use cases
 

This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 346481)
Time Spent: 40m  (was: 0.5h)

> Colab examples for aggregation transforms (Python)
> --
>
> Key: BEAM-7390
> URL: https://issues.apache.org/jira/browse/BEAM-7390
> Project: Beam
>  Issue Type: Improvement
>  Components: website
>Reporter: Rose Nguyen
>Assignee: David Cavazos
>Priority: Minor
>  Time Spent: 40m
>  Remaining Estimate: 0h
>
> Merge aggregation Colabs into the transform catalog



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Work logged] (BEAM-7390) Colab examples for aggregation transforms (Python)

2019-11-19 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/BEAM-7390?focusedWorklogId=346479=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-346479
 ]

ASF GitHub Bot logged work on BEAM-7390:


Author: ASF GitHub Bot
Created on: 20/Nov/19 06:19
Start Date: 20/Nov/19 06:19
Worklog Time Spent: 10m 
  Work Description: davidcavazos commented on pull request #9920: 
[BEAM-7390] Add code snippets for CombineGlobally
URL: https://github.com/apache/beam/pull/9920#discussion_r348307273
 
 

 ##
 File path: 
sdks/python/apache_beam/examples/snippets/transforms/aggregation/combineglobally.py
 ##
 @@ -0,0 +1,214 @@
+# coding=utf-8
 
 Review comment:
   I changed the examples for the Combine* transforms and GBK. Hopefully, they 
make more sense now.
 

This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 346479)
Time Spent: 0.5h  (was: 20m)

> Colab examples for aggregation transforms (Python)
> --
>
> Key: BEAM-7390
> URL: https://issues.apache.org/jira/browse/BEAM-7390
> Project: Beam
>  Issue Type: Improvement
>  Components: website
>Reporter: Rose Nguyen
>Assignee: David Cavazos
>Priority: Minor
>  Time Spent: 0.5h
>  Remaining Estimate: 0h
>
> Merge aggregation Colabs into the transform catalog



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Work logged] (BEAM-8568) Local file system does not match relative path with wildcards

2019-11-19 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/BEAM-8568?focusedWorklogId=346477=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-346477
 ]

ASF GitHub Bot logged work on BEAM-8568:


Author: ASF GitHub Bot
Created on: 20/Nov/19 06:04
Start Date: 20/Nov/19 06:04
Worklog Time Spent: 10m 
  Work Description: kennknowles commented on issue #10028: [BEAM-8568] 
Fixed problem that LocalFileSystem no longer supports wil…
URL: https://github.com/apache/beam/pull/10028#issuecomment-555853543
 
 
   Where are you seeing the logs indicating GCS? What I see in the logs is 
entirely not useful. I can certainly try to reproduce this.
 

This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 346477)
Time Spent: 3h  (was: 2h 50m)

> Local file system does not match relative path with wildcards
> -
>
> Key: BEAM-8568
> URL: https://issues.apache.org/jira/browse/BEAM-8568
> Project: Beam
>  Issue Type: Bug
>  Components: sdk-java-core
>Affects Versions: 2.16.0
>Reporter: Ondrej Cerny
>Assignee: David Moravek
>Priority: Major
> Fix For: 2.17.0
>
>  Time Spent: 3h
>  Remaining Estimate: 0h
>
> CWD structure:
> {code}
> src/test/resources/input/sometestfile.txt
> {code}
>  
> Code:
> {code:java}
> input 
> .apply(Create.of("src/test/resources/input/*)) 
> .apply(FileIO.matchAll()) 
> .apply(FileIO.readMatches())
> {code}
> The code above doesn't match any file starting Beam 2.16.0. The regression 
> has been introduced in BEAM-7854.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Work logged] (BEAM-8740) TestPubsub ignores timeout

2019-11-19 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/BEAM-8740?focusedWorklogId=346474=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-346474
 ]

ASF GitHub Bot logged work on BEAM-8740:


Author: ASF GitHub Bot
Created on: 20/Nov/19 05:46
Start Date: 20/Nov/19 05:46
Worklog Time Spent: 10m 
  Work Description: kennknowles commented on pull request #10153: 
[BEAM-8740] TestPubsub ignores timeout
URL: https://github.com/apache/beam/pull/10153#discussion_r348300443
 
 

 ##
 File path: 
sdks/java/io/google-cloud-platform/src/main/java/org/apache/beam/sdk/io/gcp/pubsub/TestPubsub.java
 ##
 @@ -225,7 +227,7 @@ public void publish(List messages) throws 
IOException {
 receivedMessages.addAll(pull(n - receivedMessages.size()));
 
 while (receivedMessages.size() < n
-&& Seconds.secondsBetween(new DateTime(), startTime).getSeconds() < 
timeoutSeconds) {
+&& Seconds.secondsBetween(startTime, new DateTime()).getSeconds() < 
timeoutSeconds) {
 
 Review comment:
   ouch
 

This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 346474)
Remaining Estimate: 0h
Time Spent: 10m

> TestPubsub ignores timeout
> --
>
> Key: BEAM-8740
> URL: https://issues.apache.org/jira/browse/BEAM-8740
> Project: Beam
>  Issue Type: Bug
>  Components: testing
>Reporter: Brian Hulette
>Assignee: Brian Hulette
>Priority: Major
> Fix For: 2.18.0
>
>  Time Spent: 10m
>  Remaining Estimate: 0h
>




--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Work logged] (BEAM-8740) TestPubsub ignores timeout

2019-11-19 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/BEAM-8740?focusedWorklogId=346475=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-346475
 ]

ASF GitHub Bot logged work on BEAM-8740:


Author: ASF GitHub Bot
Created on: 20/Nov/19 05:46
Start Date: 20/Nov/19 05:46
Worklog Time Spent: 10m 
  Work Description: kennknowles commented on pull request #10153: 
[BEAM-8740] TestPubsub ignores timeout
URL: https://github.com/apache/beam/pull/10153#discussion_r348300547
 
 

 ##
 File path: 
sdks/java/io/google-cloud-platform/src/main/java/org/apache/beam/sdk/io/gcp/pubsub/TestPubsub.java
 ##
 @@ -202,9 +202,11 @@ public void publish(List messages) throws 
IOException {
   public List pull(int maxBatchSize) throws IOException {
 List messages =
 pubsub.pull(0, subscriptionPath, maxBatchSize, true);
-pubsub.acknowledge(
-subscriptionPath,
-messages.stream().map(msg -> 
msg.ackId).collect(ImmutableList.toImmutableList()));
+if (!messages.isEmpty()) {
 
 Review comment:
   out of curiosity, was this a crasher?
 

This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 346475)
Time Spent: 20m  (was: 10m)

> TestPubsub ignores timeout
> --
>
> Key: BEAM-8740
> URL: https://issues.apache.org/jira/browse/BEAM-8740
> Project: Beam
>  Issue Type: Bug
>  Components: testing
>Reporter: Brian Hulette
>Assignee: Brian Hulette
>Priority: Major
> Fix For: 2.18.0
>
>  Time Spent: 20m
>  Remaining Estimate: 0h
>




--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Commented] (BEAM-8198) Investigate possible performance regression of Wordcount 1GB batch benchmark on Py3.

2019-11-19 Thread Kenneth Knowles (Jira)


[ 
https://issues.apache.org/jira/browse/BEAM-8198?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16978073#comment-16978073
 ] 

Kenneth Knowles commented on BEAM-8198:
---

Ping. This seems to be sitting for a while?

> Investigate possible performance regression of Wordcount 1GB batch benchmark 
> on Py3.
> 
>
> Key: BEAM-8198
> URL: https://issues.apache.org/jira/browse/BEAM-8198
> Project: Beam
>  Issue Type: Bug
>  Components: sdk-py-core, testing
>Reporter: Valentyn Tymofieiev
>Assignee: Valentyn Tymofieiev
>Priority: Major
> Fix For: 2.17.0
>
>
> context: 
> https://lists.apache.org/thread.html/51e000f16481451c207c00ac5e881aa4a46fa020922eddffd00ad527@%3Cdev.beam.apache.org%3E
> Setting fix version to 2.16.0 to understand the cause, hopefully before the 
> vote.
> cc: [~altay] [~thw] [~markflyhigh]



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Commented] (BEAM-8174) BigQueryIO clustering documentation is incorrect and lacking

2019-11-19 Thread Kenneth Knowles (Jira)


[ 
https://issues.apache.org/jira/browse/BEAM-8174?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16978075#comment-16978075
 ] 

Kenneth Knowles commented on BEAM-8174:
---

What is the status of this? It is blocking 2.17.0 and I agree it would be nice 
to have good javadoc for this.

> BigQueryIO clustering documentation is incorrect and lacking
> 
>
> Key: BEAM-8174
> URL: https://issues.apache.org/jira/browse/BEAM-8174
> Project: Beam
>  Issue Type: Bug
>  Components: io-java-gcp
>Affects Versions: 2.15.0
>Reporter: Alex Van Boxel
>Assignee: Alex Van Boxel
>Priority: Trivial
>  Labels: documentation
> Fix For: 2.17.0
>
>   Original Estimate: 2h
>  Remaining Estimate: 2h
>
> I noticed that the Java doc of the clustering feature in BigQueryIO is more a 
> copy/paste from the timestamp method. This needs to be corrected.
> The Clustering option should also be added to the BigQueryIO page.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Commented] (BEAM-8363) Nexmark regression in direct runner in streaming mode

2019-11-19 Thread Kenneth Knowles (Jira)


[ 
https://issues.apache.org/jira/browse/BEAM-8363?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16978072#comment-16978072
 ] 

Kenneth Knowles commented on BEAM-8363:
---

I inspected the history of sdks/java/core and did not find anything obvious. I 
believe this will require a git bisect.

> Nexmark regression in direct runner in streaming mode
> -
>
> Key: BEAM-8363
> URL: https://issues.apache.org/jira/browse/BEAM-8363
> Project: Beam
>  Issue Type: Bug
>  Components: runner-direct
>Affects Versions: 2.16.0
>Reporter: Mark Liu
>Assignee: Shehzaad Nakhoda
>Priority: Critical
> Fix For: 2.17.0
>
> Attachments: regression_screenshot.png
>
>
> From Nexmark performance dashboard for direct runner: 
> https://apache-beam-testing.appspot.com/explore?dashboard=5084698770407424, 
> there is a regression for streaming mode happened on August 25.
> The runtime increased about 25% in two days.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Commented] (BEAM-8363) Nexmark regression in direct runner in streaming mode

2019-11-19 Thread Kenneth Knowles (Jira)


[ 
https://issues.apache.org/jira/browse/BEAM-8363?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16978070#comment-16978070
 ] 

Kenneth Knowles commented on BEAM-8363:
---

The main difference is the use of a custom UnboundedSource if isStreaming() is 
set.

 

> Nexmark regression in direct runner in streaming mode
> -
>
> Key: BEAM-8363
> URL: https://issues.apache.org/jira/browse/BEAM-8363
> Project: Beam
>  Issue Type: Bug
>  Components: runner-direct
>Affects Versions: 2.16.0
>Reporter: Mark Liu
>Assignee: Shehzaad Nakhoda
>Priority: Critical
> Fix For: 2.17.0
>
> Attachments: regression_screenshot.png
>
>
> From Nexmark performance dashboard for direct runner: 
> https://apache-beam-testing.appspot.com/explore?dashboard=5084698770407424, 
> there is a regression for streaming mode happened on August 25.
> The runtime increased about 25% in two days.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Commented] (BEAM-8368) [Python] libprotobuf-generated exception when importing apache_beam

2019-11-19 Thread Kenneth Knowles (Jira)


[ 
https://issues.apache.org/jira/browse/BEAM-8368?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16978069#comment-16978069
 ] 

Kenneth Knowles commented on BEAM-8368:
---

I see a linked PR merged into 2.17.0 branch. Is this done now?

> [Python] libprotobuf-generated exception when importing apache_beam
> ---
>
> Key: BEAM-8368
> URL: https://issues.apache.org/jira/browse/BEAM-8368
> Project: Beam
>  Issue Type: Bug
>  Components: sdk-py-core
>Affects Versions: 2.15.0, 2.17.0
>Reporter: Ubaier Bhat
>Assignee: Brian Hulette
>Priority: Blocker
> Fix For: 2.17.0
>
> Attachments: error_log.txt
>
>  Time Spent: 4h 40m
>  Remaining Estimate: 0h
>
> Unable to import apache_beam after upgrading to macos 10.15 (Catalina). 
> Cleared all the pipenvs and but can't get it working again.
> {code}
> import apache_beam as beam
> /Users/***/.local/share/virtualenvs/beam-etl-ims6DitU/lib/python3.7/site-packages/apache_beam/__init__.py:84:
>  UserWarning: Some syntactic constructs of Python 3 are not yet fully 
> supported by Apache Beam.
>   'Some syntactic constructs of Python 3 are not yet fully supported by '
> [libprotobuf ERROR google/protobuf/descriptor_database.cc:58] File already 
> exists in database: 
> [libprotobuf FATAL google/protobuf/descriptor.cc:1370] CHECK failed: 
> GeneratedDatabase()->Add(encoded_file_descriptor, size): 
> libc++abi.dylib: terminating with uncaught exception of type 
> google::protobuf::FatalException: CHECK failed: 
> GeneratedDatabase()->Add(encoded_file_descriptor, size): 
> {code}



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Commented] (BEAM-8384) Spark runner is not respecting spark.default.parallelism user defined configuration

2019-11-19 Thread Kenneth Knowles (Jira)


[ 
https://issues.apache.org/jira/browse/BEAM-8384?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16978068#comment-16978068
 ] 

Kenneth Knowles commented on BEAM-8384:
---

[~iemejia] is this in progress? Is there a chance for it to be done for 2.17.0? 
Is it a regression?

> Spark runner is not respecting spark.default.parallelism user defined 
> configuration
> ---
>
> Key: BEAM-8384
> URL: https://issues.apache.org/jira/browse/BEAM-8384
> Project: Beam
>  Issue Type: Bug
>  Components: runner-spark
>Affects Versions: 2.16.0
>Reporter: Ismaël Mejía
>Assignee: Ismaël Mejía
>Priority: Major
> Fix For: 2.17.0
>
>
> It was reported in [the mailing 
> list|https://lists.apache.org/thread.html/792fb7fc2a5113837fbcdafce6a5d9100309881b366c1a7163d2c898@%3Cdev.beam.apache.org%3E]
>  that the Spark runner is not respecting the user defined Spark default 
> parallelism configuration. We should investigate and if it is the case ensure 
> that a user defined configuration is always respected. Runner optimizations 
> should apply only for default (unconfigured) values otherwise we will confuse 
> users and limit them from parametrizing Spark for their best convenience.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Commented] (BEAM-8504) BigQueryIO DIRECT_READ is broken

2019-11-19 Thread Kenneth Knowles (Jira)


[ 
https://issues.apache.org/jira/browse/BEAM-8504?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16978067#comment-16978067
 ] 

Kenneth Knowles commented on BEAM-8504:
---

[~kanterov] would you drive getting a cherrypick in since you requested?

> BigQueryIO DIRECT_READ is broken
> 
>
> Key: BEAM-8504
> URL: https://issues.apache.org/jira/browse/BEAM-8504
> Project: Beam
>  Issue Type: Bug
>  Components: io-java-gcp
>Affects Versions: 2.16.0, 2.17.0
>Reporter: Gleb Kanterov
>Assignee: Aryan Naraghi
>Priority: Major
> Fix For: 2.17.0
>
>  Time Spent: 1h 40m
>  Remaining Estimate: 0h
>
> The issue is reproducible with 2.16.0, 2.17.0 candidate and 2.18.0-SNAPSHOT 
> (as of d96c6b21a8a95b01944016584bc8e4ad1ab5f6a6), and not reproducible with 
> 2.15.0.
> {code}
> java.io.IOException: Failed to start reading from source: name: 
> "projects//locations/eu/streams/"
>   at 
> org.apache.beam.runners.dataflow.worker.WorkerCustomSources$BoundedReaderIterator.start(WorkerCustomSources.java:604)
>   at 
> org.apache.beam.runners.dataflow.worker.util.common.worker.ReadOperation$SynchronizedReaderIterator.start(ReadOperation.java:361)
>   at 
> org.apache.beam.runners.dataflow.worker.util.common.worker.ReadOperation.runReadLoop(ReadOperation.java:194)
>   at 
> org.apache.beam.runners.dataflow.worker.util.common.worker.ReadOperation.start(ReadOperation.java:159)
>   at 
> org.apache.beam.runners.dataflow.worker.util.common.worker.MapTaskExecutor.execute(MapTaskExecutor.java:77)
>   at 
> org.apache.beam.runners.dataflow.worker.BatchDataflowWorker.executeWork(BatchDataflowWorker.java:411)
>   at 
> org.apache.beam.runners.dataflow.worker.BatchDataflowWorker.doWork(BatchDataflowWorker.java:380)
>   at 
> org.apache.beam.runners.dataflow.worker.BatchDataflowWorker.getAndPerformWork(BatchDataflowWorker.java:305)
>   at 
> org.apache.beam.runners.dataflow.worker.DataflowBatchWorkerHarness$WorkerThread.doWork(DataflowBatchWorkerHarness.java:140)
>   at 
> org.apache.beam.runners.dataflow.worker.DataflowBatchWorkerHarness$WorkerThread.call(DataflowBatchWorkerHarness.java:120)
>   at 
> org.apache.beam.runners.dataflow.worker.DataflowBatchWorkerHarness$WorkerThread.call(DataflowBatchWorkerHarness.java:107)
>   at java.util.concurrent.FutureTask.run(FutureTask.java:266)
>   at 
> java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
>   at 
> java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
>   at java.lang.Thread.run(Thread.java:745)
> Caused by: java.lang.IllegalArgumentException: Fraction consumed from 
> previous response (0.0) is not less than fraction consumed from current 
> response (0.0).
>   at 
> org.apache.beam.vendor.guava.v26_0_jre.com.google.common.base.Preconditions.checkArgument(Preconditions.java:440)
>   at 
> org.apache.beam.sdk.io.gcp.bigquery.BigQueryStorageStreamSource$BigQueryStorageStreamReader.readNextRecord(BigQueryStorageStreamSource.java:243)
>   at 
> org.apache.beam.sdk.io.gcp.bigquery.BigQueryStorageStreamSource$BigQueryStorageStreamReader.start(BigQueryStorageStreamSource.java:206)
>   at 
> org.apache.beam.runners.dataflow.worker.WorkerCustomSources$BoundedReaderIterator.start(WorkerCustomSources.java:601)
>   ... 14 more
> {code}



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Work logged] (BEAM-8016) Render Beam Pipeline as DOT with Interactive Beam

2019-11-19 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/BEAM-8016?focusedWorklogId=346446=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-346446
 ]

ASF GitHub Bot logged work on BEAM-8016:


Author: ASF GitHub Bot
Created on: 20/Nov/19 03:44
Start Date: 20/Nov/19 03:44
Worklog Time Spent: 10m 
  Work Description: davidyan74 commented on pull request #10132: 
[BEAM-8016] Pipeline Graph
URL: https://github.com/apache/beam/pull/10132#discussion_r348279216
 
 

 ##
 File path: 
sdks/python/apache_beam/runners/interactive/display/pipeline_graph.py
 ##
 @@ -107,6 +135,17 @@ def _top_level_transforms(self):
 top_level_transform_proto = transforms[top_level_transform_id]
 yield top_level_transform_id, top_level_transform_proto
 
+  def _decorate(self, value):
+"""Decorates label-ish values used for rendering in dot language.
+
+Escapes special characters in the given str value for dot language. Please
+escape all PTransform unique names when building dot representation.
+Otherwise, special characters will break the graph rendered.
 
 Review comment:
   Is this statement still valid? Does 'value' need to be escaped any more?
 

This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 346446)
Time Spent: 5h 20m  (was: 5h 10m)

> Render Beam Pipeline as DOT with Interactive Beam  
> ---
>
> Key: BEAM-8016
> URL: https://issues.apache.org/jira/browse/BEAM-8016
> Project: Beam
>  Issue Type: Improvement
>  Components: runner-py-interactive
>Reporter: Ning Kang
>Assignee: Ning Kang
>Priority: Major
>  Time Spent: 5h 20m
>  Remaining Estimate: 0h
>
> With work in https://issues.apache.org/jira/browse/BEAM-7760, Beam pipeline 
> converted to DOT then rendered should mark user defined variables on edges.
> With work in https://issues.apache.org/jira/browse/BEAM-7926, it might be 
> redundant or confusing to render arbitrary random sample PCollection data on 
> edges.
> We'll also make sure edges in the graph corresponds to output -> input 
> relationship in the user defined pipeline. Each edge is one output. If 
> multiple down stream inputs take the same output, it should be rendered as 
> one edge diverging into two instead of two edges.
> For advanced interactivity highlight where each execution highlights the part 
> of the pipeline really executed from the original pipeline, we'll also 
> provide the support in beta.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Work logged] (BEAM-8016) Render Beam Pipeline as DOT with Interactive Beam

2019-11-19 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/BEAM-8016?focusedWorklogId=346447=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-346447
 ]

ASF GitHub Bot logged work on BEAM-8016:


Author: ASF GitHub Bot
Created on: 20/Nov/19 03:44
Start Date: 20/Nov/19 03:44
Worklog Time Spent: 10m 
  Work Description: davidyan74 commented on pull request #10132: 
[BEAM-8016] Pipeline Graph
URL: https://github.com/apache/beam/pull/10132#discussion_r348279347
 
 

 ##
 File path: 
sdks/python/apache_beam/runners/interactive/display/pipeline_graph.py
 ##
 @@ -55,9 +63,17 @@ def __init__(self,
   pipeline: (Pipeline proto) or (Pipeline) pipeline to be rendered.
   default_vertex_attrs: (Dict[str, str]) a dict of default vertex 
attributes
   default_edge_attrs: (Dict[str, str]) a dict of default edge attributes
+  render_option: (str) this parameter decides how the pipeline graph is
+  rendered. See display.pipeline_graph_renderer for available options.
 """
 self._lock = threading.Lock()
 self._graph = None
+self._pin = None
 
 Review comment:
   Can we also spell out this one?
 

This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 346447)
Time Spent: 5h 20m  (was: 5h 10m)

> Render Beam Pipeline as DOT with Interactive Beam  
> ---
>
> Key: BEAM-8016
> URL: https://issues.apache.org/jira/browse/BEAM-8016
> Project: Beam
>  Issue Type: Improvement
>  Components: runner-py-interactive
>Reporter: Ning Kang
>Assignee: Ning Kang
>Priority: Major
>  Time Spent: 5h 20m
>  Remaining Estimate: 0h
>
> With work in https://issues.apache.org/jira/browse/BEAM-7760, Beam pipeline 
> converted to DOT then rendered should mark user defined variables on edges.
> With work in https://issues.apache.org/jira/browse/BEAM-7926, it might be 
> redundant or confusing to render arbitrary random sample PCollection data on 
> edges.
> We'll also make sure edges in the graph corresponds to output -> input 
> relationship in the user defined pipeline. Each edge is one output. If 
> multiple down stream inputs take the same output, it should be rendered as 
> one edge diverging into two instead of two edges.
> For advanced interactivity highlight where each execution highlights the part 
> of the pipeline really executed from the original pipeline, we'll also 
> provide the support in beta.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Work logged] (BEAM-7390) Colab examples for aggregation transforms (Python)

2019-11-19 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/BEAM-7390?focusedWorklogId=346439=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-346439
 ]

ASF GitHub Bot logged work on BEAM-7390:


Author: ASF GitHub Bot
Created on: 20/Nov/19 03:13
Start Date: 20/Nov/19 03:13
Worklog Time Spent: 10m 
  Work Description: davidcavazos commented on pull request #10166: 
[BEAM-7390] Add code snippet for Latest
URL: https://github.com/apache/beam/pull/10166
 
 
   Adding the code snippets for `Latest`.
   
   
   
   Thank you for your contribution! Follow this checklist to help us 
incorporate your contribution quickly and easily:
   
- [x] [**Choose 
reviewer(s)**](https://beam.apache.org/contribute/#make-your-change) and 
mention them in a comment (`R: @username`).
- [x] Format the pull request title like `[BEAM-XXX] Fixes bug in 
ApproximateQuantiles`, where you replace `BEAM-XXX` with the appropriate JIRA 
issue, if applicable. This will automatically link the pull request to the 
issue.
- [x] If this contribution is large, please file an Apache [Individual 
Contributor License Agreement](https://www.apache.org/licenses/icla.pdf).
   
   See the [Contributor Guide](https://beam.apache.org/contribute) for more 
tips on [how to make review process 
smoother](https://beam.apache.org/contribute/#make-reviewers-job-easier).
   
   Post-Commit Tests Status (on master branch)
   

   
   Lang | SDK | Apex | Dataflow | Flink | Gearpump | Samza | Spark
   --- | --- | --- | --- | --- | --- | --- | ---
   Go | [![Build 
Status](https://builds.apache.org/job/beam_PostCommit_Go/lastCompletedBuild/badge/icon)](https://builds.apache.org/job/beam_PostCommit_Go/lastCompletedBuild/)
 | --- | --- | [![Build 
Status](https://builds.apache.org/job/beam_PostCommit_Go_VR_Flink/lastCompletedBuild/badge/icon)](https://builds.apache.org/job/beam_PostCommit_Go_VR_Flink/lastCompletedBuild/)
 | --- | --- | [![Build 
Status](https://builds.apache.org/job/beam_PostCommit_Go_VR_Spark/lastCompletedBuild/badge/icon)](https://builds.apache.org/job/beam_PostCommit_Go_VR_Spark/lastCompletedBuild/)
   Java | [![Build 
Status](https://builds.apache.org/job/beam_PostCommit_Java/lastCompletedBuild/badge/icon)](https://builds.apache.org/job/beam_PostCommit_Java/lastCompletedBuild/)
 | [![Build 
Status](https://builds.apache.org/job/beam_PostCommit_Java_ValidatesRunner_Apex/lastCompletedBuild/badge/icon)](https://builds.apache.org/job/beam_PostCommit_Java_ValidatesRunner_Apex/lastCompletedBuild/)
 | [![Build 
Status](https://builds.apache.org/job/beam_PostCommit_Java_ValidatesRunner_Dataflow/lastCompletedBuild/badge/icon)](https://builds.apache.org/job/beam_PostCommit_Java_ValidatesRunner_Dataflow/lastCompletedBuild/)
 | [![Build 
Status](https://builds.apache.org/job/beam_PostCommit_Java_ValidatesRunner_Flink/lastCompletedBuild/badge/icon)](https://builds.apache.org/job/beam_PostCommit_Java_ValidatesRunner_Flink/lastCompletedBuild/)[![Build
 
Status](https://builds.apache.org/job/beam_PostCommit_Java_PVR_Flink_Batch/lastCompletedBuild/badge/icon)](https://builds.apache.org/job/beam_PostCommit_Java_PVR_Flink_Batch/lastCompletedBuild/)[![Build
 
Status](https://builds.apache.org/job/beam_PostCommit_Java_PVR_Flink_Streaming/lastCompletedBuild/badge/icon)](https://builds.apache.org/job/beam_PostCommit_Java_PVR_Flink_Streaming/lastCompletedBuild/)
 | [![Build 
Status](https://builds.apache.org/job/beam_PostCommit_Java_ValidatesRunner_Gearpump/lastCompletedBuild/badge/icon)](https://builds.apache.org/job/beam_PostCommit_Java_ValidatesRunner_Gearpump/lastCompletedBuild/)
 | [![Build 
Status](https://builds.apache.org/job/beam_PostCommit_Java_ValidatesRunner_Samza/lastCompletedBuild/badge/icon)](https://builds.apache.org/job/beam_PostCommit_Java_ValidatesRunner_Samza/lastCompletedBuild/)
 | [![Build 
Status](https://builds.apache.org/job/beam_PostCommit_Java_ValidatesRunner_Spark/lastCompletedBuild/badge/icon)](https://builds.apache.org/job/beam_PostCommit_Java_ValidatesRunner_Spark/lastCompletedBuild/)[![Build
 
Status](https://builds.apache.org/job/beam_PostCommit_Java_PVR_Spark_Batch/lastCompletedBuild/badge/icon)](https://builds.apache.org/job/beam_PostCommit_Java_PVR_Spark_Batch/lastCompletedBuild/)
   Python | [![Build 
Status](https://builds.apache.org/job/beam_PostCommit_Python2/lastCompletedBuild/badge/icon)](https://builds.apache.org/job/beam_PostCommit_Python2/lastCompletedBuild/)[![Build
 
Status](https://builds.apache.org/job/beam_PostCommit_Python35/lastCompletedBuild/badge/icon)](https://builds.apache.org/job/beam_PostCommit_Python35/lastCompletedBuild/)[![Build
 

[jira] [Work logged] (BEAM-7390) Colab examples for aggregation transforms (Python)

2019-11-19 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/BEAM-7390?focusedWorklogId=346438=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-346438
 ]

ASF GitHub Bot logged work on BEAM-7390:


Author: ASF GitHub Bot
Created on: 20/Nov/19 03:11
Start Date: 20/Nov/19 03:11
Worklog Time Spent: 10m 
  Work Description: davidcavazos commented on pull request #10165: 
[BEAM-7390] Add code snippet for GroupIntoBatches
URL: https://github.com/apache/beam/pull/10165
 
 
   Adding the code snippets for `GroupIntoBatches`.
   
   > Questions:
   > * What is the difference between `GroupIntoBatches` and `BatchElements`?
   > * Which is the recommended approach?
   > * I found that `BatchElements` was both more intuitive and flexible since 
it doesn't require a key, should we get rid of the `GroupIntoBatches` example 
in favor of `BatchElements` or is there a scenario where `GroupIntoBatches` 
make more sense?
   
   
   
   Thank you for your contribution! Follow this checklist to help us 
incorporate your contribution quickly and easily:
   
- [x] [**Choose 
reviewer(s)**](https://beam.apache.org/contribute/#make-your-change) and 
mention them in a comment (`R: @username`).
- [x] Format the pull request title like `[BEAM-XXX] Fixes bug in 
ApproximateQuantiles`, where you replace `BEAM-XXX` with the appropriate JIRA 
issue, if applicable. This will automatically link the pull request to the 
issue.
- [x] If this contribution is large, please file an Apache [Individual 
Contributor License Agreement](https://www.apache.org/licenses/icla.pdf).
   
   See the [Contributor Guide](https://beam.apache.org/contribute) for more 
tips on [how to make review process 
smoother](https://beam.apache.org/contribute/#make-reviewers-job-easier).
   
   Post-Commit Tests Status (on master branch)
   

   
   Lang | SDK | Apex | Dataflow | Flink | Gearpump | Samza | Spark
   --- | --- | --- | --- | --- | --- | --- | ---
   Go | [![Build 
Status](https://builds.apache.org/job/beam_PostCommit_Go/lastCompletedBuild/badge/icon)](https://builds.apache.org/job/beam_PostCommit_Go/lastCompletedBuild/)
 | --- | --- | [![Build 
Status](https://builds.apache.org/job/beam_PostCommit_Go_VR_Flink/lastCompletedBuild/badge/icon)](https://builds.apache.org/job/beam_PostCommit_Go_VR_Flink/lastCompletedBuild/)
 | --- | --- | [![Build 
Status](https://builds.apache.org/job/beam_PostCommit_Go_VR_Spark/lastCompletedBuild/badge/icon)](https://builds.apache.org/job/beam_PostCommit_Go_VR_Spark/lastCompletedBuild/)
   Java | [![Build 
Status](https://builds.apache.org/job/beam_PostCommit_Java/lastCompletedBuild/badge/icon)](https://builds.apache.org/job/beam_PostCommit_Java/lastCompletedBuild/)
 | [![Build 
Status](https://builds.apache.org/job/beam_PostCommit_Java_ValidatesRunner_Apex/lastCompletedBuild/badge/icon)](https://builds.apache.org/job/beam_PostCommit_Java_ValidatesRunner_Apex/lastCompletedBuild/)
 | [![Build 
Status](https://builds.apache.org/job/beam_PostCommit_Java_ValidatesRunner_Dataflow/lastCompletedBuild/badge/icon)](https://builds.apache.org/job/beam_PostCommit_Java_ValidatesRunner_Dataflow/lastCompletedBuild/)
 | [![Build 
Status](https://builds.apache.org/job/beam_PostCommit_Java_ValidatesRunner_Flink/lastCompletedBuild/badge/icon)](https://builds.apache.org/job/beam_PostCommit_Java_ValidatesRunner_Flink/lastCompletedBuild/)[![Build
 
Status](https://builds.apache.org/job/beam_PostCommit_Java_PVR_Flink_Batch/lastCompletedBuild/badge/icon)](https://builds.apache.org/job/beam_PostCommit_Java_PVR_Flink_Batch/lastCompletedBuild/)[![Build
 
Status](https://builds.apache.org/job/beam_PostCommit_Java_PVR_Flink_Streaming/lastCompletedBuild/badge/icon)](https://builds.apache.org/job/beam_PostCommit_Java_PVR_Flink_Streaming/lastCompletedBuild/)
 | [![Build 
Status](https://builds.apache.org/job/beam_PostCommit_Java_ValidatesRunner_Gearpump/lastCompletedBuild/badge/icon)](https://builds.apache.org/job/beam_PostCommit_Java_ValidatesRunner_Gearpump/lastCompletedBuild/)
 | [![Build 
Status](https://builds.apache.org/job/beam_PostCommit_Java_ValidatesRunner_Samza/lastCompletedBuild/badge/icon)](https://builds.apache.org/job/beam_PostCommit_Java_ValidatesRunner_Samza/lastCompletedBuild/)
 | [![Build 
Status](https://builds.apache.org/job/beam_PostCommit_Java_ValidatesRunner_Spark/lastCompletedBuild/badge/icon)](https://builds.apache.org/job/beam_PostCommit_Java_ValidatesRunner_Spark/lastCompletedBuild/)[![Build
 
Status](https://builds.apache.org/job/beam_PostCommit_Java_PVR_Spark_Batch/lastCompletedBuild/badge/icon)](https://builds.apache.org/job/beam_PostCommit_Java_PVR_Spark_Batch/lastCompletedBuild/)
   Python | [![Build 

[jira] [Updated] (BEAM-8784) Remove deprecated 'compare' argument from combiners.Top in PyDocs

2019-11-19 Thread David Cavazos (Jira)


 [ 
https://issues.apache.org/jira/browse/BEAM-8784?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

David Cavazos updated BEAM-8784:

Description: 
PyDoc: 
[https://beam.apache.org/releases/pydoc/2.12.0/apache_beam.transforms.combiners.html?#apache_beam.transforms.combiners.Top]

The combiners.Top PyDoc still shows the `compare` argument as usable. It is 
deprecated and results in an error in Python 3, but the PyDoc does not reflect 
that.

It should say that the argument is deprecated in favor of using the `key` and 
`reverse` arguments.

  was:
The [combiners.Top 
PyDoc|[https://beam.apache.org/releases/pydoc/2.12.0/apache_beam.transforms.combiners.html?#apache_beam.transforms.combiners.Top]]
 still shows the `compare` argument as usable. It is deprecated and results in 
an error in Python 3, but the PyDoc does not reflect that.

It should say that the argument is deprecated in favor of using the `key` and 
`reverse` arguments.


> Remove deprecated 'compare' argument from combiners.Top in PyDocs
> -
>
> Key: BEAM-8784
> URL: https://issues.apache.org/jira/browse/BEAM-8784
> Project: Beam
>  Issue Type: Improvement
>  Components: sdk-py-core
>Reporter: David Cavazos
>Priority: Trivial
>
> PyDoc: 
> [https://beam.apache.org/releases/pydoc/2.12.0/apache_beam.transforms.combiners.html?#apache_beam.transforms.combiners.Top]
> The combiners.Top PyDoc still shows the `compare` argument as usable. It is 
> deprecated and results in an error in Python 3, but the PyDoc does not 
> reflect that.
> It should say that the argument is deprecated in favor of using the `key` and 
> `reverse` arguments.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Updated] (BEAM-8784) Remove deprecated 'compare' argument from combiners.Top in PyDocs

2019-11-19 Thread David Cavazos (Jira)


 [ 
https://issues.apache.org/jira/browse/BEAM-8784?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

David Cavazos updated BEAM-8784:

Description: 
The [[combiners.Top PyDoc||#apache_beam.transforms.combiners.Top]] 
[https://beam.apache.org/releases/pydoc/2.12.0/apache_beam.transforms.combiners.html?#apache_beam.transforms.combiners.Top]
 []|#apache_beam.transforms.combiners.Top]] still shows the `compare` argument 
as usable. It is deprecated and results in an error in Python 3, but the PyDoc 
does not reflect that.

It should say that the argument is deprecated in favor of using the `key` and 
`reverse` arguments.

  was:
The [combiners.Top 
PyDoc|[https://beam.apache.org/releases/pydoc/2.12.0/apache_beam.transforms.combiners.html?#apache_beam.transforms.combiners.Top]]
 still shows the `compare` argument as usable. It is deprecated and results in 
an error in Python 3, but the PyDoc does not reflect that.

It should say that the argument is deprecated in favor of using the `key` and 
`reverse` arguments.


> Remove deprecated 'compare' argument from combiners.Top in PyDocs
> -
>
> Key: BEAM-8784
> URL: https://issues.apache.org/jira/browse/BEAM-8784
> Project: Beam
>  Issue Type: Improvement
>  Components: sdk-py-core
>Reporter: David Cavazos
>Priority: Trivial
>
> The [[combiners.Top PyDoc||#apache_beam.transforms.combiners.Top]] 
> [https://beam.apache.org/releases/pydoc/2.12.0/apache_beam.transforms.combiners.html?#apache_beam.transforms.combiners.Top]
>  []|#apache_beam.transforms.combiners.Top]] still shows the `compare` 
> argument as usable. It is deprecated and results in an error in Python 3, but 
> the PyDoc does not reflect that.
> It should say that the argument is deprecated in favor of using the `key` and 
> `reverse` arguments.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Updated] (BEAM-8784) Remove deprecated 'compare' argument from combiners.Top in PyDocs

2019-11-19 Thread David Cavazos (Jira)


 [ 
https://issues.apache.org/jira/browse/BEAM-8784?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

David Cavazos updated BEAM-8784:

Description: 
The [combiners.Top 
PyDoc|[https://beam.apache.org/releases/pydoc/2.12.0/apache_beam.transforms.combiners.html?#apache_beam.transforms.combiners.Top]]
 still shows the `compare` argument as usable. It is deprecated and results in 
an error in Python 3, but the PyDoc does not reflect that.

It should say that the argument is deprecated in favor of using the `key` and 
`reverse` arguments.

  was:
The [[combiners.Top PyDoc||#apache_beam.transforms.combiners.Top]] 
[https://beam.apache.org/releases/pydoc/2.12.0/apache_beam.transforms.combiners.html?#apache_beam.transforms.combiners.Top]
 []|#apache_beam.transforms.combiners.Top]] still shows the `compare` argument 
as usable. It is deprecated and results in an error in Python 3, but the PyDoc 
does not reflect that.

It should say that the argument is deprecated in favor of using the `key` and 
`reverse` arguments.


> Remove deprecated 'compare' argument from combiners.Top in PyDocs
> -
>
> Key: BEAM-8784
> URL: https://issues.apache.org/jira/browse/BEAM-8784
> Project: Beam
>  Issue Type: Improvement
>  Components: sdk-py-core
>Reporter: David Cavazos
>Priority: Trivial
>
> The [combiners.Top 
> PyDoc|[https://beam.apache.org/releases/pydoc/2.12.0/apache_beam.transforms.combiners.html?#apache_beam.transforms.combiners.Top]]
>  still shows the `compare` argument as usable. It is deprecated and results 
> in an error in Python 3, but the PyDoc does not reflect that.
> It should say that the argument is deprecated in favor of using the `key` and 
> `reverse` arguments.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Created] (BEAM-8784) Remove deprecated 'compare' argument from combiners.Top in PyDocs

2019-11-19 Thread David Cavazos (Jira)
David Cavazos created BEAM-8784:
---

 Summary: Remove deprecated 'compare' argument from combiners.Top 
in PyDocs
 Key: BEAM-8784
 URL: https://issues.apache.org/jira/browse/BEAM-8784
 Project: Beam
  Issue Type: Improvement
  Components: sdk-py-core
Reporter: David Cavazos


The [combiners.Top 
PyDoc|[https://beam.apache.org/releases/pydoc/2.12.0/apache_beam.transforms.combiners.html?#apache_beam.transforms.combiners.Top]]
 still shows the `compare` argument as usable. It is deprecated and results in 
an error in Python 3, but the PyDoc does not reflect that.

It should say that the argument is deprecated in favor of using the `key` and 
`reverse` arguments.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Work logged] (BEAM-8016) Render Beam Pipeline as DOT with Interactive Beam

2019-11-19 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/BEAM-8016?focusedWorklogId=346430=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-346430
 ]

ASF GitHub Bot logged work on BEAM-8016:


Author: ASF GitHub Bot
Created on: 20/Nov/19 02:44
Start Date: 20/Nov/19 02:44
Worklog Time Spent: 10m 
  Work Description: KevinGG commented on issue #10132: [BEAM-8016] Pipeline 
Graph
URL: https://github.com/apache/beam/pull/10132#issuecomment-555810399
 
 
   > I would say always avoid abbreviations that can be confusing. Perhaps just 
spell it out? >_pipeline_instrument
   
   Spelled out all variable names of this in the module.
 

This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 346430)
Time Spent: 5h 10m  (was: 5h)

> Render Beam Pipeline as DOT with Interactive Beam  
> ---
>
> Key: BEAM-8016
> URL: https://issues.apache.org/jira/browse/BEAM-8016
> Project: Beam
>  Issue Type: Improvement
>  Components: runner-py-interactive
>Reporter: Ning Kang
>Assignee: Ning Kang
>Priority: Major
>  Time Spent: 5h 10m
>  Remaining Estimate: 0h
>
> With work in https://issues.apache.org/jira/browse/BEAM-7760, Beam pipeline 
> converted to DOT then rendered should mark user defined variables on edges.
> With work in https://issues.apache.org/jira/browse/BEAM-7926, it might be 
> redundant or confusing to render arbitrary random sample PCollection data on 
> edges.
> We'll also make sure edges in the graph corresponds to output -> input 
> relationship in the user defined pipeline. Each edge is one output. If 
> multiple down stream inputs take the same output, it should be rendered as 
> one edge diverging into two instead of two edges.
> For advanced interactivity highlight where each execution highlights the part 
> of the pipeline really executed from the original pipeline, we'll also 
> provide the support in beta.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Closed] (BEAM-8737) beam_Dependency_Check is missing bigtable-client-core as high priority items

2019-11-19 Thread Tomo Suzuki (Jira)


 [ 
https://issues.apache.org/jira/browse/BEAM-8737?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Tomo Suzuki closed BEAM-8737.
-
Fix Version/s: Not applicable
   Resolution: Fixed

> beam_Dependency_Check is missing bigtable-client-core as high priority items
> 
>
> Key: BEAM-8737
> URL: https://issues.apache.org/jira/browse/BEAM-8737
> Project: Beam
>  Issue Type: Bug
>  Components: build-system
>Reporter: Tomo Suzuki
>Assignee: Tomo Suzuki
>Priority: Major
> Fix For: Not applicable
>
>  Time Spent: 50m
>  Remaining Estimate: 0h
>
> beam_Dependency_Check is missing high priority items such as 
> bigtable-client-core.
> For example, 
> https://builds.apache.org/job/beam_Dependency_Check/235/consoleFull sent 
> email but the email did not contain bigtable-client-core. The current version 
> 1.8.0 is 4-minor-version older than the latest 1.12.1.
> Initially I was suspecting the line with {{high_priority_deps.append}}:
> {noformat}
>   if (version_comparer.compare_dependency_versions(curr_ver, latest_ver) 
> or
>   compare_dependency_release_dates(curr_release_date, 
> latest_release_date)):
> # Create a new issue or update on the existing issue
> jira_issue = jira_manager.run(dep_name, curr_ver, latest_ver, 
> sdk_type, group_id = group_id)
> if (jira_issue.fields.status.name == 'Open' or
> jira_issue.fields.status.name == 'Reopened'):
>   dep_info += "{1}".format(
> ReportGeneratorConfig.BEAM_JIRA_HOST+"browse/"+ jira_issue.key,
> jira_issue.key)
>   high_priority_deps.append(dep_info)
> {noformat}
> and 2nd run would include the artifact in the email. But it did not: 
> https://builds.apache.org/job/beam_Dependency_Check/237/
> {noformat}
> 15:43:02 Start processing:  - com.google.cloud.bigtable:bigtable-client-core 
> [1.8.0 -> 1.12.1]
> 15:43:02 
> 15:43:02 INFO:root:Finding release date of 
> com.google.cloud.bigtable:bigtable-client-core 1.8.0 from the Maven Central
> 15:43:02 INFO:root:Finding release date of 
> com.google.cloud.bigtable:bigtable-client-core 1.12.1 from the Maven Central
> 15:43:03 INFO:root:Start handling the JIRA issues for Java dependency: 
> com.google.cloud.bigtable:bigtable-client-core 1.12.1
> 15:43:04 INFO:root:The parent issue BEAM-8690 is not opening. Attempt 
> reopening the issue
> 15:43:04 Traceback (most recent call last):
> 15:43:04   File 
> "/home/jenkins/jenkins-slave/workspace/beam_Dependency_Check/src/.test-infra/jenkins/jira_utils/jira_manager.py",
>  line 90, in run
> 15:43:04 self.jira.reopen_issue(parent_issue)
> 15:43:04   File 
> "/home/jenkins/jenkins-slave/workspace/beam_Dependency_Check/src/.test-infra/jenkins/jira_utils/jira_client.py",
>  line 130, in reopen_issue
> 15:43:04 self.jira.transition_issue(issue.key, 3)
> 15:43:04   File 
> "/home/jenkins/jenkins-slave/workspace/beam_Dependency_Check/src/dependency/check/lib/python3.5/site-packages/jira/client.py",
>  line 126, in wrapper
> 15:43:04 result = func(*arg_list, **kwargs)
> 15:43:04   File 
> "/home/jenkins/jenkins-slave/workspace/beam_Dependency_Check/src/dependency/check/lib/python3.5/site-packages/jira/client.py",
>  line 1578, in transition_issue
> 15:43:04 url, data=json.dumps(data))
> 15:43:04   File 
> "/home/jenkins/jenkins-slave/workspace/beam_Dependency_Check/src/dependency/check/lib/python3.5/site-packages/jira/resilientsession.py",
>  line 154, in post
> 15:43:04 return self.__verb('POST', url, **kwargs)
> 15:43:04   File 
> "/home/jenkins/jenkins-slave/workspace/beam_Dependency_Check/src/dependency/check/lib/python3.5/site-packages/jira/resilientsession.py",
>  line 147, in __verb
> 15:43:04 raise_on_error(response, verb=verb, **kwargs)
> 15:43:04   File 
> "/home/jenkins/jenkins-slave/workspace/beam_Dependency_Check/src/dependency/check/lib/python3.5/site-packages/jira/resilientsession.py",
>  line 57, in raise_on_error
> 15:43:04 r.status_code, error, r.url, request=request, response=r, 
> **kwargs)
> 15:43:04 jira.exceptions.JIRAError: JiraError HTTP 400 url: 
> https://issues.apache.org/jira/rest/api/2/issue/BEAM-8690/transitions
> 15:43:04  text: It seems that you have tried to perform a workflow 
> operation (Reopen Issue) that is not valid for the current state of this 
> issue (BEAM-8690). The likely cause is that somebody has changed the issue 
> recently, please look at the issue history for details.
> 15:43:04  
> 15:43:04  response headers = {'X-AREQUESTID': '1243x60347388x3', 'Date': 
> 'Mon, 18 Nov 2019 20:43:04 GMT', 'X-AUSERNAME': , 'Cache-Control': 
> 'no-cache, no-store, no-transform', 'Server': 'Apache', 'Connection': 
> 'close', 'X-XSS-Protection': '1; mode=block', 'Content-Type': 

[jira] [Closed] (BEAM-8744) Fix error in Beam Dependency Check Report

2019-11-19 Thread Tomo Suzuki (Jira)


 [ 
https://issues.apache.org/jira/browse/BEAM-8744?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Tomo Suzuki closed BEAM-8744.
-
Fix Version/s: Not applicable
   Resolution: Fixed

> Fix error in Beam Dependency Check Report
> -
>
> Key: BEAM-8744
> URL: https://issues.apache.org/jira/browse/BEAM-8744
> Project: Beam
>  Issue Type: Bug
>  Components: build-system
>Reporter: Tomo Suzuki
>Assignee: Tomo Suzuki
>Priority: Major
> Fix For: Not applicable
>
>  Time Spent: 40m
>  Remaining Estimate: 0h
>
> The latest changes had a bug.
> https://builds.apache.org/job/beam_Dependency_Check/238/console
> {noformat}
> 12:41:11 python -m dependency_check.dependency_check_report_generator_test
> 12:41:11 Traceback (most recent call last):
> 12:41:11   File "/usr/lib/python3.5/runpy.py", line 184, in 
> _run_module_as_main
> 12:41:11 "__main__", mod_spec)
> 12:41:11   File "/usr/lib/python3.5/runpy.py", line 85, in _run_code
> 12:41:11 exec(code, run_globals)
> 12:41:11   File 
> "/home/jenkins/jenkins-slave/workspace/beam_Dependency_Check/src/.test-infra/jenkins/dependency_check/dependency_check_report_generator_test.py",
>  line 26, in 
> 12:41:11 from .dependency_check_report_generator import 
> prioritize_dependencies
> 12:41:11   File 
> "/home/jenkins/jenkins-slave/workspace/beam_Dependency_Check/src/.test-infra/jenkins/dependency_check/dependency_check_report_generator.py",
>  line 146
> 12:41:11 if (jira_issue and jira_issue.fields.status.name in ['Open', 
> 'Reopened', 'TRIAGE NEEDED'):
> 12:41:11  
>^
> 12:41:11 SyntaxError: invalid syntax
> {noformat}



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Work logged] (BEAM-8016) Render Beam Pipeline as DOT with Interactive Beam

2019-11-19 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/BEAM-8016?focusedWorklogId=346410=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-346410
 ]

ASF GitHub Bot logged work on BEAM-8016:


Author: ASF GitHub Bot
Created on: 20/Nov/19 01:49
Start Date: 20/Nov/19 01:49
Worklog Time Spent: 10m 
  Work Description: KevinGG commented on pull request #10132: [BEAM-8016] 
Pipeline Graph
URL: https://github.com/apache/beam/pull/10132#discussion_r348257200
 
 

 ##
 File path: 
sdks/python/apache_beam/runners/interactive/display/pipeline_graph.py
 ##
 @@ -107,6 +135,16 @@ def _top_level_transforms(self):
 top_level_transform_proto = transforms[top_level_transform_id]
 yield top_level_transform_id, top_level_transform_proto
 
+  def _decorate(self, value):
+"""Decorates label-ish values used for rendering in dot language.
+
+Escapes special characters in the given str value for dot language. '"' is
 
 Review comment:
   Thanks, David! Yes, `'\\"'` works for escaping `"` and I'll add a test for 
this class.
 

This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 346410)
Time Spent: 5h  (was: 4h 50m)

> Render Beam Pipeline as DOT with Interactive Beam  
> ---
>
> Key: BEAM-8016
> URL: https://issues.apache.org/jira/browse/BEAM-8016
> Project: Beam
>  Issue Type: Improvement
>  Components: runner-py-interactive
>Reporter: Ning Kang
>Assignee: Ning Kang
>Priority: Major
>  Time Spent: 5h
>  Remaining Estimate: 0h
>
> With work in https://issues.apache.org/jira/browse/BEAM-7760, Beam pipeline 
> converted to DOT then rendered should mark user defined variables on edges.
> With work in https://issues.apache.org/jira/browse/BEAM-7926, it might be 
> redundant or confusing to render arbitrary random sample PCollection data on 
> edges.
> We'll also make sure edges in the graph corresponds to output -> input 
> relationship in the user defined pipeline. Each edge is one output. If 
> multiple down stream inputs take the same output, it should be rendered as 
> one edge diverging into two instead of two edges.
> For advanced interactivity highlight where each execution highlights the part 
> of the pipeline really executed from the original pipeline, we'll also 
> provide the support in beta.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Work logged] (BEAM-8645) TimestampCombiner incorrect in beam python

2019-11-19 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/BEAM-8645?focusedWorklogId=346399=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-346399
 ]

ASF GitHub Bot logged work on BEAM-8645:


Author: ASF GitHub Bot
Created on: 20/Nov/19 01:18
Start Date: 20/Nov/19 01:18
Worklog Time Spent: 10m 
  Work Description: robertwb commented on pull request #10143: [BEAM-8645] 
To test state backed iterable coder in py sdk.
URL: https://github.com/apache/beam/pull/10143#discussion_r348250056
 
 

 ##
 File path: sdks/python/apache_beam/runners/portability/fn_api_runner_test.py
 ##
 @@ -1579,6 +1580,27 @@ def test_lull_logging(self):
 '.*There has been a processing lull of over.*',
 'Unable to find a lull logged for this job.')
 
+@attr('ValidatesRunner')
+class FnApiBasedStateBackedCoderTest(unittest.TestCase):
+  def create_pipeline(self):
+return beam.Pipeline(
+runner=fn_api_runner.FnApiRunner(use_state_iterables=True))
+
+  def test_state_backed_coder(self):
+class MyDoFn(beam.DoFn):
+  def process(self, gbk_result):
+value_list = gbk_result[1]
+return (gbk_result[0], sum(value_list))
+
+with self.create_pipeline() as p:
+  # The number of integers could be a knob to test against
+  # different runners' default settings on page size.
+  main = (p | 'main' >> beam.Create([('a', 1) for _ in range(0, 2)])
+  | 'GBK' >> beam.GroupByKey()
+  | 'Sum' >> beam.ParDo(MyDoFn()))
+
+assert_that(main, equal_to(['a', 2]))
+p.run()
 
 Review comment:
   No p.run needed when using the `with` statement. 
 

This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 346399)

> TimestampCombiner incorrect in beam python
> --
>
> Key: BEAM-8645
> URL: https://issues.apache.org/jira/browse/BEAM-8645
> Project: Beam
>  Issue Type: Bug
>  Components: sdk-py-core
>Reporter: Ruoyun Huang
>Priority: Major
>  Time Spent: 5h 50m
>  Remaining Estimate: 0h
>
> When we have a TimestampValue on combine: 
> {code:java}
> main_stream = (p   
> | 'main TestStream' >> TestStream()   
> .add_elements([window.TimestampedValue(('k', 100), 0)])   
> .add_elements([window.TimestampedValue(('k', 400), 9)])   
> .advance_watermark_to_infinity()   
> | 'main windowInto' >> beam.WindowInto( 
> window.FixedWindows(10),  
> timestamp_combiner=TimestampCombiner.OUTPUT_AT_LATEST)   | 
> 'Combine' >> beam.CombinePerKey(sum))
> The expect timestamp should be:
> LATEST:    (('k', 500), Timestamp(9)),
> EARLIEST:    (('k', 500), Timestamp(0)),
> END_OF_WINDOW: (('k', 500), Timestamp(10)),
> But current py streaming gives following results: 
> LATEST:    (('k', 500), Timestamp(10)),
> EARLIEST:    (('k', 500), Timestamp(10)),
> END_OF_WINDOW: (('k', 500), Timestamp(9.)),
> More details and discussions:
> https://lists.apache.org/thread.html/d3af1f2f84a2e59a747196039eae77812b78a991f0f293c717e5f4e1@%3Cdev.beam.apache.org%3E
> {code}



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Work logged] (BEAM-8645) TimestampCombiner incorrect in beam python

2019-11-19 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/BEAM-8645?focusedWorklogId=346402=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-346402
 ]

ASF GitHub Bot logged work on BEAM-8645:


Author: ASF GitHub Bot
Created on: 20/Nov/19 01:18
Start Date: 20/Nov/19 01:18
Worklog Time Spent: 10m 
  Work Description: robertwb commented on pull request #10143: [BEAM-8645] 
To test state backed iterable coder in py sdk.
URL: https://github.com/apache/beam/pull/10143#discussion_r348249550
 
 

 ##
 File path: sdks/python/apache_beam/runners/portability/fn_api_runner_test.py
 ##
 @@ -1579,6 +1580,27 @@ def test_lull_logging(self):
 '.*There has been a processing lull of over.*',
 'Unable to find a lull logged for this job.')
 
+@attr('ValidatesRunner')
+class FnApiBasedStateBackedCoderTest(unittest.TestCase):
+  def create_pipeline(self):
+return beam.Pipeline(
+runner=fn_api_runner.FnApiRunner(use_state_iterables=True))
+
+  def test_state_backed_coder(self):
 
 Review comment:
   I would call this something like test_gbk_many_values or similar. 
 

This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 346402)

> TimestampCombiner incorrect in beam python
> --
>
> Key: BEAM-8645
> URL: https://issues.apache.org/jira/browse/BEAM-8645
> Project: Beam
>  Issue Type: Bug
>  Components: sdk-py-core
>Reporter: Ruoyun Huang
>Priority: Major
>  Time Spent: 6h 10m
>  Remaining Estimate: 0h
>
> When we have a TimestampValue on combine: 
> {code:java}
> main_stream = (p   
> | 'main TestStream' >> TestStream()   
> .add_elements([window.TimestampedValue(('k', 100), 0)])   
> .add_elements([window.TimestampedValue(('k', 400), 9)])   
> .advance_watermark_to_infinity()   
> | 'main windowInto' >> beam.WindowInto( 
> window.FixedWindows(10),  
> timestamp_combiner=TimestampCombiner.OUTPUT_AT_LATEST)   | 
> 'Combine' >> beam.CombinePerKey(sum))
> The expect timestamp should be:
> LATEST:    (('k', 500), Timestamp(9)),
> EARLIEST:    (('k', 500), Timestamp(0)),
> END_OF_WINDOW: (('k', 500), Timestamp(10)),
> But current py streaming gives following results: 
> LATEST:    (('k', 500), Timestamp(10)),
> EARLIEST:    (('k', 500), Timestamp(10)),
> END_OF_WINDOW: (('k', 500), Timestamp(9.)),
> More details and discussions:
> https://lists.apache.org/thread.html/d3af1f2f84a2e59a747196039eae77812b78a991f0f293c717e5f4e1@%3Cdev.beam.apache.org%3E
> {code}



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Work logged] (BEAM-8645) TimestampCombiner incorrect in beam python

2019-11-19 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/BEAM-8645?focusedWorklogId=346401=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-346401
 ]

ASF GitHub Bot logged work on BEAM-8645:


Author: ASF GitHub Bot
Created on: 20/Nov/19 01:18
Start Date: 20/Nov/19 01:18
Worklog Time Spent: 10m 
  Work Description: robertwb commented on pull request #10143: [BEAM-8645] 
To test state backed iterable coder in py sdk.
URL: https://github.com/apache/beam/pull/10143#discussion_r348250260
 
 

 ##
 File path: sdks/python/apache_beam/runners/portability/fn_api_runner_test.py
 ##
 @@ -1579,6 +1580,27 @@ def test_lull_logging(self):
 '.*There has been a processing lull of over.*',
 'Unable to find a lull logged for this job.')
 
+@attr('ValidatesRunner')
+class FnApiBasedStateBackedCoderTest(unittest.TestCase):
+  def create_pipeline(self):
+return beam.Pipeline(
+runner=fn_api_runner.FnApiRunner(use_state_iterables=True))
+
+  def test_state_backed_coder(self):
+class MyDoFn(beam.DoFn):
+  def process(self, gbk_result):
+value_list = gbk_result[1]
+return (gbk_result[0], sum(value_list))
+
+with self.create_pipeline() as p:
+  # The number of integers could be a knob to test against
+  # different runners' default settings on page size.
+  main = (p | 'main' >> beam.Create([('a', 1) for _ in range(0, 2)])
+  | 'GBK' >> beam.GroupByKey()
+  | 'Sum' >> beam.ParDo(MyDoFn()))
 
 Review comment:
   Actually, a better test would be to ensure no more than N (for some value of 
N < number of elements) instances of the value type are alive at any given 
moment. 
 

This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 346401)
Time Spent: 6h 10m  (was: 6h)

> TimestampCombiner incorrect in beam python
> --
>
> Key: BEAM-8645
> URL: https://issues.apache.org/jira/browse/BEAM-8645
> Project: Beam
>  Issue Type: Bug
>  Components: sdk-py-core
>Reporter: Ruoyun Huang
>Priority: Major
>  Time Spent: 6h 10m
>  Remaining Estimate: 0h
>
> When we have a TimestampValue on combine: 
> {code:java}
> main_stream = (p   
> | 'main TestStream' >> TestStream()   
> .add_elements([window.TimestampedValue(('k', 100), 0)])   
> .add_elements([window.TimestampedValue(('k', 400), 9)])   
> .advance_watermark_to_infinity()   
> | 'main windowInto' >> beam.WindowInto( 
> window.FixedWindows(10),  
> timestamp_combiner=TimestampCombiner.OUTPUT_AT_LATEST)   | 
> 'Combine' >> beam.CombinePerKey(sum))
> The expect timestamp should be:
> LATEST:    (('k', 500), Timestamp(9)),
> EARLIEST:    (('k', 500), Timestamp(0)),
> END_OF_WINDOW: (('k', 500), Timestamp(10)),
> But current py streaming gives following results: 
> LATEST:    (('k', 500), Timestamp(10)),
> EARLIEST:    (('k', 500), Timestamp(10)),
> END_OF_WINDOW: (('k', 500), Timestamp(9.)),
> More details and discussions:
> https://lists.apache.org/thread.html/d3af1f2f84a2e59a747196039eae77812b78a991f0f293c717e5f4e1@%3Cdev.beam.apache.org%3E
> {code}



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Work logged] (BEAM-8645) TimestampCombiner incorrect in beam python

2019-11-19 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/BEAM-8645?focusedWorklogId=346400=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-346400
 ]

ASF GitHub Bot logged work on BEAM-8645:


Author: ASF GitHub Bot
Created on: 20/Nov/19 01:18
Start Date: 20/Nov/19 01:18
Worklog Time Spent: 10m 
  Work Description: robertwb commented on pull request #10143: [BEAM-8645] 
To test state backed iterable coder in py sdk.
URL: https://github.com/apache/beam/pull/10143#discussion_r348249978
 
 

 ##
 File path: sdks/python/apache_beam/runners/portability/fn_api_runner_test.py
 ##
 @@ -1579,6 +1580,27 @@ def test_lull_logging(self):
 '.*There has been a processing lull of over.*',
 'Unable to find a lull logged for this job.')
 
+@attr('ValidatesRunner')
+class FnApiBasedStateBackedCoderTest(unittest.TestCase):
+  def create_pipeline(self):
+return beam.Pipeline(
+runner=fn_api_runner.FnApiRunner(use_state_iterables=True))
+
+  def test_state_backed_coder(self):
+class MyDoFn(beam.DoFn):
+  def process(self, gbk_result):
+value_list = gbk_result[1]
+return (gbk_result[0], sum(value_list))
+
+with self.create_pipeline() as p:
+  # The number of integers could be a knob to test against
+  # different runners' default settings on page size.
+  main = (p | 'main' >> beam.Create([('a', 1) for _ in range(0, 2)])
 
 Review comment:
   Rather than create all the values in memory, I'd create these with a DoFn. 
E.g.
   
   beam.Create([None]) | beam.FlatMap(lambda x: ((x, 1) for _ in range(2))) 
| ...
 

This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 346400)
Time Spent: 6h  (was: 5h 50m)

> TimestampCombiner incorrect in beam python
> --
>
> Key: BEAM-8645
> URL: https://issues.apache.org/jira/browse/BEAM-8645
> Project: Beam
>  Issue Type: Bug
>  Components: sdk-py-core
>Reporter: Ruoyun Huang
>Priority: Major
>  Time Spent: 6h
>  Remaining Estimate: 0h
>
> When we have a TimestampValue on combine: 
> {code:java}
> main_stream = (p   
> | 'main TestStream' >> TestStream()   
> .add_elements([window.TimestampedValue(('k', 100), 0)])   
> .add_elements([window.TimestampedValue(('k', 400), 9)])   
> .advance_watermark_to_infinity()   
> | 'main windowInto' >> beam.WindowInto( 
> window.FixedWindows(10),  
> timestamp_combiner=TimestampCombiner.OUTPUT_AT_LATEST)   | 
> 'Combine' >> beam.CombinePerKey(sum))
> The expect timestamp should be:
> LATEST:    (('k', 500), Timestamp(9)),
> EARLIEST:    (('k', 500), Timestamp(0)),
> END_OF_WINDOW: (('k', 500), Timestamp(10)),
> But current py streaming gives following results: 
> LATEST:    (('k', 500), Timestamp(10)),
> EARLIEST:    (('k', 500), Timestamp(10)),
> END_OF_WINDOW: (('k', 500), Timestamp(9.)),
> More details and discussions:
> https://lists.apache.org/thread.html/d3af1f2f84a2e59a747196039eae77812b78a991f0f293c717e5f4e1@%3Cdev.beam.apache.org%3E
> {code}



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Work logged] (BEAM-8645) TimestampCombiner incorrect in beam python

2019-11-19 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/BEAM-8645?focusedWorklogId=346398=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-346398
 ]

ASF GitHub Bot logged work on BEAM-8645:


Author: ASF GitHub Bot
Created on: 20/Nov/19 01:18
Start Date: 20/Nov/19 01:18
Worklog Time Spent: 10m 
  Work Description: robertwb commented on pull request #10143: [BEAM-8645] 
To test state backed iterable coder in py sdk.
URL: https://github.com/apache/beam/pull/10143#discussion_r348249763
 
 

 ##
 File path: sdks/python/apache_beam/runners/portability/fn_api_runner_test.py
 ##
 @@ -1579,6 +1580,27 @@ def test_lull_logging(self):
 '.*There has been a processing lull of over.*',
 'Unable to find a lull logged for this job.')
 
+@attr('ValidatesRunner')
+class FnApiBasedStateBackedCoderTest(unittest.TestCase):
+  def create_pipeline(self):
+return beam.Pipeline(
+runner=fn_api_runner.FnApiRunner(use_state_iterables=True))
+
+  def test_state_backed_coder(self):
+class MyDoFn(beam.DoFn):
+  def process(self, gbk_result):
+value_list = gbk_result[1]
+return (gbk_result[0], sum(value_list))
+
+with self.create_pipeline() as p:
+  # The number of integers could be a knob to test against
+  # different runners' default settings on page size.
+  main = (p | 'main' >> beam.Create([('a', 1) for _ in range(0, 2)])
+  | 'GBK' >> beam.GroupByKey()
 
 Review comment:
   As before, no need to name the GBK (or others). 
 

This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 346398)
Time Spent: 5h 50m  (was: 5h 40m)

> TimestampCombiner incorrect in beam python
> --
>
> Key: BEAM-8645
> URL: https://issues.apache.org/jira/browse/BEAM-8645
> Project: Beam
>  Issue Type: Bug
>  Components: sdk-py-core
>Reporter: Ruoyun Huang
>Priority: Major
>  Time Spent: 5h 50m
>  Remaining Estimate: 0h
>
> When we have a TimestampValue on combine: 
> {code:java}
> main_stream = (p   
> | 'main TestStream' >> TestStream()   
> .add_elements([window.TimestampedValue(('k', 100), 0)])   
> .add_elements([window.TimestampedValue(('k', 400), 9)])   
> .advance_watermark_to_infinity()   
> | 'main windowInto' >> beam.WindowInto( 
> window.FixedWindows(10),  
> timestamp_combiner=TimestampCombiner.OUTPUT_AT_LATEST)   | 
> 'Combine' >> beam.CombinePerKey(sum))
> The expect timestamp should be:
> LATEST:    (('k', 500), Timestamp(9)),
> EARLIEST:    (('k', 500), Timestamp(0)),
> END_OF_WINDOW: (('k', 500), Timestamp(10)),
> But current py streaming gives following results: 
> LATEST:    (('k', 500), Timestamp(10)),
> EARLIEST:    (('k', 500), Timestamp(10)),
> END_OF_WINDOW: (('k', 500), Timestamp(9.)),
> More details and discussions:
> https://lists.apache.org/thread.html/d3af1f2f84a2e59a747196039eae77812b78a991f0f293c717e5f4e1@%3Cdev.beam.apache.org%3E
> {code}



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Work logged] (BEAM-8645) TimestampCombiner incorrect in beam python

2019-11-19 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/BEAM-8645?focusedWorklogId=346397=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-346397
 ]

ASF GitHub Bot logged work on BEAM-8645:


Author: ASF GitHub Bot
Created on: 20/Nov/19 01:18
Start Date: 20/Nov/19 01:18
Worklog Time Spent: 10m 
  Work Description: robertwb commented on pull request #10143: [BEAM-8645] 
To test state backed iterable coder in py sdk.
URL: https://github.com/apache/beam/pull/10143#discussion_r348249348
 
 

 ##
 File path: sdks/python/apache_beam/runners/portability/fn_api_runner_test.py
 ##
 @@ -1579,6 +1580,27 @@ def test_lull_logging(self):
 '.*There has been a processing lull of over.*',
 'Unable to find a lull logged for this job.')
 
+@attr('ValidatesRunner')
+class FnApiBasedStateBackedCoderTest(unittest.TestCase):
+  def create_pipeline(self):
+return beam.Pipeline(
+runner=fn_api_runner.FnApiRunner(use_state_iterables=True))
+
+  def test_state_backed_coder(self):
+class MyDoFn(beam.DoFn):
+  def process(self, gbk_result):
+value_list = gbk_result[1]
+return (gbk_result[0], sum(value_list))
+
+with self.create_pipeline() as p:
+  # The number of integers could be a knob to test against
+  # different runners' default settings on page size.
+  main = (p | 'main' >> beam.Create([('a', 1) for _ in range(0, 2)])
+  | 'GBK' >> beam.GroupByKey()
+  | 'Sum' >> beam.ParDo(MyDoFn()))
 
 Review comment:
   This could just be beam.MapTuple(lambda key, values: sum(values))
 

This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 346397)
Time Spent: 5h 40m  (was: 5.5h)

> TimestampCombiner incorrect in beam python
> --
>
> Key: BEAM-8645
> URL: https://issues.apache.org/jira/browse/BEAM-8645
> Project: Beam
>  Issue Type: Bug
>  Components: sdk-py-core
>Reporter: Ruoyun Huang
>Priority: Major
>  Time Spent: 5h 40m
>  Remaining Estimate: 0h
>
> When we have a TimestampValue on combine: 
> {code:java}
> main_stream = (p   
> | 'main TestStream' >> TestStream()   
> .add_elements([window.TimestampedValue(('k', 100), 0)])   
> .add_elements([window.TimestampedValue(('k', 400), 9)])   
> .advance_watermark_to_infinity()   
> | 'main windowInto' >> beam.WindowInto( 
> window.FixedWindows(10),  
> timestamp_combiner=TimestampCombiner.OUTPUT_AT_LATEST)   | 
> 'Combine' >> beam.CombinePerKey(sum))
> The expect timestamp should be:
> LATEST:    (('k', 500), Timestamp(9)),
> EARLIEST:    (('k', 500), Timestamp(0)),
> END_OF_WINDOW: (('k', 500), Timestamp(10)),
> But current py streaming gives following results: 
> LATEST:    (('k', 500), Timestamp(10)),
> EARLIEST:    (('k', 500), Timestamp(10)),
> END_OF_WINDOW: (('k', 500), Timestamp(9.)),
> More details and discussions:
> https://lists.apache.org/thread.html/d3af1f2f84a2e59a747196039eae77812b78a991f0f293c717e5f4e1@%3Cdev.beam.apache.org%3E
> {code}



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Work logged] (BEAM-8658) Optionally set artifact staging port in FlinkUberJarJobServer

2019-11-19 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/BEAM-8658?focusedWorklogId=346395=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-346395
 ]

ASF GitHub Bot logged work on BEAM-8658:


Author: ASF GitHub Bot
Created on: 20/Nov/19 01:11
Start Date: 20/Nov/19 01:11
Worklog Time Spent: 10m 
  Work Description: ibzib commented on pull request #10163: [BEAM-8658] 
[BEAM-8781] Optionally set jar and artifact staging port …
URL: https://github.com/apache/beam/pull/10163
 
 
   …in FlinkUberJarJobServer
   
   Pass around options to fix:
   
   - BEAM-8658: add an `artifact_port` option. We need this because we need to 
expose a static port to use for artifact staging on Kubernetes (AFAIK gRPC does 
not support in-process mode on Python).
   - BEAM-8781: respect the existing `flink_job_server_jar` option. I wanted 
this so I could download the released job server jar from Maven instead of 
having to build it locally. This allows me to override the default behavior for 
.dev, which is to fail: 
https://github.com/apache/beam/blob/0b415fd7e9dd5c80034ca237e08c3959ec78ffe3/sdks/python/apache_beam/utils/subprocess_server.py#L176-L181
   We could consider changing that behavior (as indicated in the TODO), but it 
matters less if you can get around it if you really want to.
   
   
   
   Thank you for your contribution! Follow this checklist to help us 
incorporate your contribution quickly and easily:
   
- [ ] [**Choose 
reviewer(s)**](https://beam.apache.org/contribute/#make-your-change) and 
mention them in a comment (`R: @username`).
- [ ] Format the pull request title like `[BEAM-XXX] Fixes bug in 
ApproximateQuantiles`, where you replace `BEAM-XXX` with the appropriate JIRA 
issue, if applicable. This will automatically link the pull request to the 
issue.
- [ ] If this contribution is large, please file an Apache [Individual 
Contributor License Agreement](https://www.apache.org/licenses/icla.pdf).
   
   See the [Contributor Guide](https://beam.apache.org/contribute) for more 
tips on [how to make review process 
smoother](https://beam.apache.org/contribute/#make-reviewers-job-easier).
   
   Post-Commit Tests Status (on master branch)
   

   
   Lang | SDK | Apex | Dataflow | Flink | Gearpump | Samza | Spark
   --- | --- | --- | --- | --- | --- | --- | ---
   Go | [![Build 
Status](https://builds.apache.org/job/beam_PostCommit_Go/lastCompletedBuild/badge/icon)](https://builds.apache.org/job/beam_PostCommit_Go/lastCompletedBuild/)
 | --- | --- | [![Build 
Status](https://builds.apache.org/job/beam_PostCommit_Go_VR_Flink/lastCompletedBuild/badge/icon)](https://builds.apache.org/job/beam_PostCommit_Go_VR_Flink/lastCompletedBuild/)
 | --- | --- | [![Build 
Status](https://builds.apache.org/job/beam_PostCommit_Go_VR_Spark/lastCompletedBuild/badge/icon)](https://builds.apache.org/job/beam_PostCommit_Go_VR_Spark/lastCompletedBuild/)
   Java | [![Build 
Status](https://builds.apache.org/job/beam_PostCommit_Java/lastCompletedBuild/badge/icon)](https://builds.apache.org/job/beam_PostCommit_Java/lastCompletedBuild/)
 | [![Build 
Status](https://builds.apache.org/job/beam_PostCommit_Java_ValidatesRunner_Apex/lastCompletedBuild/badge/icon)](https://builds.apache.org/job/beam_PostCommit_Java_ValidatesRunner_Apex/lastCompletedBuild/)
 | [![Build 
Status](https://builds.apache.org/job/beam_PostCommit_Java_ValidatesRunner_Dataflow/lastCompletedBuild/badge/icon)](https://builds.apache.org/job/beam_PostCommit_Java_ValidatesRunner_Dataflow/lastCompletedBuild/)
 | [![Build 
Status](https://builds.apache.org/job/beam_PostCommit_Java_ValidatesRunner_Flink/lastCompletedBuild/badge/icon)](https://builds.apache.org/job/beam_PostCommit_Java_ValidatesRunner_Flink/lastCompletedBuild/)[![Build
 
Status](https://builds.apache.org/job/beam_PostCommit_Java_PVR_Flink_Batch/lastCompletedBuild/badge/icon)](https://builds.apache.org/job/beam_PostCommit_Java_PVR_Flink_Batch/lastCompletedBuild/)[![Build
 
Status](https://builds.apache.org/job/beam_PostCommit_Java_PVR_Flink_Streaming/lastCompletedBuild/badge/icon)](https://builds.apache.org/job/beam_PostCommit_Java_PVR_Flink_Streaming/lastCompletedBuild/)
 | [![Build 
Status](https://builds.apache.org/job/beam_PostCommit_Java_ValidatesRunner_Gearpump/lastCompletedBuild/badge/icon)](https://builds.apache.org/job/beam_PostCommit_Java_ValidatesRunner_Gearpump/lastCompletedBuild/)
 | [![Build 
Status](https://builds.apache.org/job/beam_PostCommit_Java_ValidatesRunner_Samza/lastCompletedBuild/badge/icon)](https://builds.apache.org/job/beam_PostCommit_Java_ValidatesRunner_Samza/lastCompletedBuild/)
 | [![Build 

[jira] [Created] (BEAM-8783) Document Python SDK pickling

2019-11-19 Thread Udi Meiri (Jira)
Udi Meiri created BEAM-8783:
---

 Summary: Document Python SDK pickling
 Key: BEAM-8783
 URL: https://issues.apache.org/jira/browse/BEAM-8783
 Project: Beam
  Issue Type: Improvement
  Components: sdk-py-core
Reporter: Udi Meiri
Assignee: Udi Meiri






--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Updated] (BEAM-8782) Python typehints: with_output_types breaks multi-output dofns

2019-11-19 Thread Udi Meiri (Jira)


 [ 
https://issues.apache.org/jira/browse/BEAM-8782?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Udi Meiri updated BEAM-8782:

Status: Open  (was: Triage Needed)

> Python typehints: with_output_types breaks multi-output dofns
> -
>
> Key: BEAM-8782
> URL: https://issues.apache.org/jira/browse/BEAM-8782
> Project: Beam
>  Issue Type: Bug
>  Components: sdk-py-core
>Reporter: Udi Meiri
>Assignee: Udi Meiri
>Priority: Major
>
> {code}
>   def test_typed_multi_pardo(self):
> p = TestPipeline()
> res = (p
>| beam.Create([1, 2, 3])
>| beam.Map(lambda e: e).with_outputs().with_output_types(int))
> self.assertIsNotNone(res[None].element_type)
> res_main = (res[None]
> | 'id_none' >> beam.ParDo(lambda e: 
> [e]).with_input_types(int))
> assert_that(res_main, equal_to([1, 2, 3]), label='none_check')
> p.run()
> {code}
> Fails with:
> {code}
> typed_pipeline_test.py:212: 
> _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ 
> _ 
> ../pvalue.py:113: in __or__
> return self.pipeline.apply(ptransform, self)
> ../pipeline.py:528: in apply
> transform.type_check_outputs(pvalueish_result)
> ../transforms/ptransform.py:386: in type_check_outputs
> self.type_check_inputs_or_outputs(pvalueish, 'output')
> ../transforms/ptransform.py:401: in type_check_inputs_or_outputs
> if pvalue_.element_type is None:
> ../pvalue.py:241: in __getattr__
> return self[tag]
> _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ 
> _ 
> self =  label=[Map()]> at 0x7fa9513f3048>
> tag = 'element_type'
> def __getitem__(self, tag):
>   # Accept int tags so that we can look at Partition tags with the
>   # same ints that we used in the partition function.
>   # TODO(gildea): Consider requiring string-based tags everywhere.
>   # This will require a partition function that does not return ints.
>   if isinstance(tag, int):
> tag = str(tag)
>   if tag == self._main_tag:
> tag = None
>   elif self._tags and tag not in self._tags:
> raise ValueError(
> "Tag '%s' is neither the main tag '%s' "
> "nor any of the tags %s" % (
> tag, self._main_tag, self._tags))
>   # Check if we accessed this tag before.
>   if tag in self._pcolls:
> return self._pcolls[tag]
> 
>   if tag is not None:
> self._transform.output_tags.add(tag)
> pcoll = PCollection(self._pipeline, tag=tag, 
> element_type=typehints.Any)
> # Transfer the producer from the DoOutputsTuple to the resulting
> # PCollection.
> >   pcoll.producer = self.producer.parts[0]
> E   AttributeError: 'NoneType' object has no attribute 'parts'
> ../pvalue.py:266: AttributeError
> {code}



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Created] (BEAM-8782) Python typehints: with_output_types breaks multi-output dofns

2019-11-19 Thread Udi Meiri (Jira)
Udi Meiri created BEAM-8782:
---

 Summary: Python typehints: with_output_types breaks multi-output 
dofns
 Key: BEAM-8782
 URL: https://issues.apache.org/jira/browse/BEAM-8782
 Project: Beam
  Issue Type: Bug
  Components: sdk-py-core
Reporter: Udi Meiri
Assignee: Udi Meiri


{code}
  def test_typed_multi_pardo(self):
p = TestPipeline()
res = (p
   | beam.Create([1, 2, 3])
   | beam.Map(lambda e: e).with_outputs().with_output_types(int))
self.assertIsNotNone(res[None].element_type)
res_main = (res[None]
| 'id_none' >> beam.ParDo(lambda e: [e]).with_input_types(int))
assert_that(res_main, equal_to([1, 2, 3]), label='none_check')
p.run()
{code}

Fails with:
{code}
typed_pipeline_test.py:212: 
_ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ 
../pvalue.py:113: in __or__
return self.pipeline.apply(ptransform, self)
../pipeline.py:528: in apply
transform.type_check_outputs(pvalueish_result)
../transforms/ptransform.py:386: in type_check_outputs
self.type_check_inputs_or_outputs(pvalueish, 'output')
../transforms/ptransform.py:401: in type_check_inputs_or_outputs
if pvalue_.element_type is None:
../pvalue.py:241: in __getattr__
return self[tag]
_ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ 

self = )]> at 0x7fa9513f3048>
tag = 'element_type'

def __getitem__(self, tag):
  # Accept int tags so that we can look at Partition tags with the
  # same ints that we used in the partition function.
  # TODO(gildea): Consider requiring string-based tags everywhere.
  # This will require a partition function that does not return ints.
  if isinstance(tag, int):
tag = str(tag)
  if tag == self._main_tag:
tag = None
  elif self._tags and tag not in self._tags:
raise ValueError(
"Tag '%s' is neither the main tag '%s' "
"nor any of the tags %s" % (
tag, self._main_tag, self._tags))
  # Check if we accessed this tag before.
  if tag in self._pcolls:
return self._pcolls[tag]

  if tag is not None:
self._transform.output_tags.add(tag)
pcoll = PCollection(self._pipeline, tag=tag, element_type=typehints.Any)
# Transfer the producer from the DoOutputsTuple to the resulting
# PCollection.
>   pcoll.producer = self.producer.parts[0]
E   AttributeError: 'NoneType' object has no attribute 'parts'

../pvalue.py:266: AttributeError
{code}



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Work logged] (BEAM-8016) Render Beam Pipeline as DOT with Interactive Beam

2019-11-19 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/BEAM-8016?focusedWorklogId=346391=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-346391
 ]

ASF GitHub Bot logged work on BEAM-8016:


Author: ASF GitHub Bot
Created on: 20/Nov/19 01:05
Start Date: 20/Nov/19 01:05
Worklog Time Spent: 10m 
  Work Description: davidyan74 commented on pull request #10132: 
[BEAM-8016] Pipeline Graph
URL: https://github.com/apache/beam/pull/10132#discussion_r348243649
 
 

 ##
 File path: sdks/python/apache_beam/runners/interactive/interactive_runner.py
 ##
 @@ -117,123 +119,43 @@ def apply(self, transform, pvalueish, options):
 return self._underlying_runner.apply(transform, pvalueish, options)
 
   def run_pipeline(self, pipeline, options):
-if not hasattr(self, '_desired_cache_labels'):
-  self._desired_cache_labels = set()
-
-# Invoke a round trip through the runner API. This makes sure the Pipeline
-# proto is stable.
-pipeline = beam.pipeline.Pipeline.from_runner_api(
-pipeline.to_runner_api(use_fake_coders=True),
-pipeline.runner,
-options)
-
-# Snapshot the pipeline in a portable proto before mutating it.
-pipeline_proto, original_context = pipeline.to_runner_api(
-return_context=True, use_fake_coders=True)
-pcolls_to_pcoll_id = self._pcolls_to_pcoll_id(pipeline, original_context)
-
-analyzer = pipeline_analyzer.PipelineAnalyzer(self._cache_manager,
-  pipeline_proto,
-  self._underlying_runner,
-  options,
-  self._desired_cache_labels)
-# Should be only accessed for debugging purpose.
-self._analyzer = analyzer
+pin = inst.pin(pipeline, options)
 
 pipeline_to_execute = beam.pipeline.Pipeline.from_runner_api(
-analyzer.pipeline_proto_to_execute(),
+pin.instrumented_pipeline_proto(),
 self._underlying_runner,
 options)
 
 if not self._skip_display:
-  display = display_manager.DisplayManager(
-  pipeline_proto=pipeline_proto,
-  pipeline_analyzer=analyzer,
-  cache_manager=self._cache_manager,
-  pipeline_graph_renderer=self._renderer)
-  display.start_periodic_update()
+  pg = pipeline_graph.PipelineGraph(pin.original_pipeline,
+render_option=self._render_option)
+  pg.display_graph()
 
 result = pipeline_to_execute.run()
 result.wait_until_finish()
 
-if not self._skip_display:
-  display.stop_periodic_update()
-
-return PipelineResult(result, self, self._analyzer.pipeline_info(),
-  self._cache_manager, pcolls_to_pcoll_id)
-
-  def _pcolls_to_pcoll_id(self, pipeline, original_context):
-"""Returns a dict mapping PCollections string to PCollection IDs.
-
-Using a PipelineVisitor to iterate over every node in the pipeline,
-records the mapping from PCollections to PCollections IDs. This mapping
-will be used to query cached PCollections.
-
-Args:
-  pipeline: (pipeline.Pipeline)
-  original_context: (pipeline_context.PipelineContext)
-
-Returns:
-  (dict from str to str) a dict mapping str(pcoll) to pcoll_id.
-"""
-pcolls_to_pcoll_id = {}
-
-from apache_beam.pipeline import PipelineVisitor  # pylint: 
disable=import-error
-
-class PCollVisitor(PipelineVisitor):  # pylint: 
disable=used-before-assignment
-  A visitor that records input and output values to be replaced.
-
-  Input and output values that should be updated are recorded in maps
-  input_replacements and output_replacements respectively.
-
-  We cannot update input and output values while visiting since that
-  results in validation errors.
-  """
-
-  def enter_composite_transform(self, transform_node):
-self.visit_transform(transform_node)
-
-  def visit_transform(self, transform_node):
-for pcoll in transform_node.outputs.values():
-  pcolls_to_pcoll_id[str(pcoll)] = 
original_context.pcollections.get_id(
-  pcoll)
-
-pipeline.visit(PCollVisitor())
-return pcolls_to_pcoll_id
+return PipelineResult(result, pin)
 
 
 class PipelineResult(beam.runners.runner.PipelineResult):
   """Provides access to information about a pipeline."""
 
-  def __init__(self, underlying_result, runner, pipeline_info, cache_manager,
-   pcolls_to_pcoll_id):
+  def __init__(self, underlying_result, pin):
 super(PipelineResult, self).__init__(underlying_result.state)
-self._runner = runner
-self._pipeline_info = pipeline_info
-self._cache_manager = cache_manager
-self._pcolls_to_pcoll_id = pcolls_to_pcoll_id
-
-  def _cache_label(self, pcoll):
-

[jira] [Work logged] (BEAM-8016) Render Beam Pipeline as DOT with Interactive Beam

2019-11-19 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/BEAM-8016?focusedWorklogId=346390=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-346390
 ]

ASF GitHub Bot logged work on BEAM-8016:


Author: ASF GitHub Bot
Created on: 20/Nov/19 01:05
Start Date: 20/Nov/19 01:05
Worklog Time Spent: 10m 
  Work Description: davidyan74 commented on pull request #10132: 
[BEAM-8016] Pipeline Graph
URL: https://github.com/apache/beam/pull/10132#discussion_r348242820
 
 

 ##
 File path: 
sdks/python/apache_beam/runners/interactive/display/pipeline_graph.py
 ##
 @@ -107,6 +135,16 @@ def _top_level_transforms(self):
 top_level_transform_proto = transforms[top_level_transform_id]
 yield top_level_transform_id, top_level_transform_proto
 
+  def _decorate(self, value):
+"""Decorates label-ish values used for rendering in dot language.
+
+Escapes special characters in the given str value for dot language. '"' is
 
 Review comment:
   https://www.graphviz.org/doc/info/lang.html says '"' can be escaped with 
\\", and contrary to the comment, " is the only escaped character in dot and a 
\ not followed by a " is a literal \ . So perhaps we should change the code to 
return
   `'"{}"'.format(value.replace('"', '\\"'))`
 

This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 346390)
Time Spent: 4.5h  (was: 4h 20m)

> Render Beam Pipeline as DOT with Interactive Beam  
> ---
>
> Key: BEAM-8016
> URL: https://issues.apache.org/jira/browse/BEAM-8016
> Project: Beam
>  Issue Type: Improvement
>  Components: runner-py-interactive
>Reporter: Ning Kang
>Assignee: Ning Kang
>Priority: Major
>  Time Spent: 4.5h
>  Remaining Estimate: 0h
>
> With work in https://issues.apache.org/jira/browse/BEAM-7760, Beam pipeline 
> converted to DOT then rendered should mark user defined variables on edges.
> With work in https://issues.apache.org/jira/browse/BEAM-7926, it might be 
> redundant or confusing to render arbitrary random sample PCollection data on 
> edges.
> We'll also make sure edges in the graph corresponds to output -> input 
> relationship in the user defined pipeline. Each edge is one output. If 
> multiple down stream inputs take the same output, it should be rendered as 
> one edge diverging into two instead of two edges.
> For advanced interactivity highlight where each execution highlights the part 
> of the pipeline really executed from the original pipeline, we'll also 
> provide the support in beta.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Work logged] (BEAM-8016) Render Beam Pipeline as DOT with Interactive Beam

2019-11-19 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/BEAM-8016?focusedWorklogId=346393=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-346393
 ]

ASF GitHub Bot logged work on BEAM-8016:


Author: ASF GitHub Bot
Created on: 20/Nov/19 01:05
Start Date: 20/Nov/19 01:05
Worklog Time Spent: 10m 
  Work Description: davidyan74 commented on pull request #10132: 
[BEAM-8016] Pipeline Graph
URL: https://github.com/apache/beam/pull/10132#discussion_r348243649
 
 

 ##
 File path: sdks/python/apache_beam/runners/interactive/interactive_runner.py
 ##
 @@ -117,123 +119,43 @@ def apply(self, transform, pvalueish, options):
 return self._underlying_runner.apply(transform, pvalueish, options)
 
   def run_pipeline(self, pipeline, options):
-if not hasattr(self, '_desired_cache_labels'):
-  self._desired_cache_labels = set()
-
-# Invoke a round trip through the runner API. This makes sure the Pipeline
-# proto is stable.
-pipeline = beam.pipeline.Pipeline.from_runner_api(
-pipeline.to_runner_api(use_fake_coders=True),
-pipeline.runner,
-options)
-
-# Snapshot the pipeline in a portable proto before mutating it.
-pipeline_proto, original_context = pipeline.to_runner_api(
-return_context=True, use_fake_coders=True)
-pcolls_to_pcoll_id = self._pcolls_to_pcoll_id(pipeline, original_context)
-
-analyzer = pipeline_analyzer.PipelineAnalyzer(self._cache_manager,
-  pipeline_proto,
-  self._underlying_runner,
-  options,
-  self._desired_cache_labels)
-# Should be only accessed for debugging purpose.
-self._analyzer = analyzer
+pin = inst.pin(pipeline, options)
 
 pipeline_to_execute = beam.pipeline.Pipeline.from_runner_api(
-analyzer.pipeline_proto_to_execute(),
+pin.instrumented_pipeline_proto(),
 self._underlying_runner,
 options)
 
 if not self._skip_display:
-  display = display_manager.DisplayManager(
-  pipeline_proto=pipeline_proto,
-  pipeline_analyzer=analyzer,
-  cache_manager=self._cache_manager,
-  pipeline_graph_renderer=self._renderer)
-  display.start_periodic_update()
+  pg = pipeline_graph.PipelineGraph(pin.original_pipeline,
+render_option=self._render_option)
+  pg.display_graph()
 
 result = pipeline_to_execute.run()
 result.wait_until_finish()
 
-if not self._skip_display:
-  display.stop_periodic_update()
-
-return PipelineResult(result, self, self._analyzer.pipeline_info(),
-  self._cache_manager, pcolls_to_pcoll_id)
-
-  def _pcolls_to_pcoll_id(self, pipeline, original_context):
-"""Returns a dict mapping PCollections string to PCollection IDs.
-
-Using a PipelineVisitor to iterate over every node in the pipeline,
-records the mapping from PCollections to PCollections IDs. This mapping
-will be used to query cached PCollections.
-
-Args:
-  pipeline: (pipeline.Pipeline)
-  original_context: (pipeline_context.PipelineContext)
-
-Returns:
-  (dict from str to str) a dict mapping str(pcoll) to pcoll_id.
-"""
-pcolls_to_pcoll_id = {}
-
-from apache_beam.pipeline import PipelineVisitor  # pylint: 
disable=import-error
-
-class PCollVisitor(PipelineVisitor):  # pylint: 
disable=used-before-assignment
-  A visitor that records input and output values to be replaced.
-
-  Input and output values that should be updated are recorded in maps
-  input_replacements and output_replacements respectively.
-
-  We cannot update input and output values while visiting since that
-  results in validation errors.
-  """
-
-  def enter_composite_transform(self, transform_node):
-self.visit_transform(transform_node)
-
-  def visit_transform(self, transform_node):
-for pcoll in transform_node.outputs.values():
-  pcolls_to_pcoll_id[str(pcoll)] = 
original_context.pcollections.get_id(
-  pcoll)
-
-pipeline.visit(PCollVisitor())
-return pcolls_to_pcoll_id
+return PipelineResult(result, pin)
 
 
 class PipelineResult(beam.runners.runner.PipelineResult):
   """Provides access to information about a pipeline."""
 
-  def __init__(self, underlying_result, runner, pipeline_info, cache_manager,
-   pcolls_to_pcoll_id):
+  def __init__(self, underlying_result, pin):
 super(PipelineResult, self).__init__(underlying_result.state)
-self._runner = runner
-self._pipeline_info = pipeline_info
-self._cache_manager = cache_manager
-self._pcolls_to_pcoll_id = pcolls_to_pcoll_id
-
-  def _cache_label(self, pcoll):
-

[jira] [Work logged] (BEAM-8624) Implement FnService for status api in Dataflow runner

2019-11-19 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/BEAM-8624?focusedWorklogId=346382=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-346382
 ]

ASF GitHub Bot logged work on BEAM-8624:


Author: ASF GitHub Bot
Created on: 20/Nov/19 00:44
Start Date: 20/Nov/19 00:44
Worklog Time Spent: 10m 
  Work Description: lukecwik commented on pull request #10115: [BEAM-8624] 
Implement Worker Status FnService in Dataflow runner
URL: https://github.com/apache/beam/pull/10115#discussion_r348241661
 
 

 ##
 File path: model/fn-execution/src/main/proto/beam_fn_api.proto
 ##
 @@ -872,3 +872,25 @@ service BeamFnExternalWorkerPool {
   // Stop the SDK worker.
   rpc StopWorker (StopWorkerRequest) returns (StopWorkerResponse) {}
 }
+
+// Request from runner to SDK Harness asking for its status.
+message WorkerStatusRequest {
+  // (Required) Unique ID identifying this request.
+  string request_id = 1;
+}
+
+// Response from SDK Harness to runner containing the debug related status 
info.
+message WorkerStatusResponse {
+  // (Required) Unique ID from the original request.
+  string request_id = 1;
 
 Review comment:
   ```suggestion
 string id = 1;
   ```
 

This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 346382)
Time Spent: 2h  (was: 1h 50m)

> Implement FnService for status api in Dataflow runner
> -
>
> Key: BEAM-8624
> URL: https://issues.apache.org/jira/browse/BEAM-8624
> Project: Beam
>  Issue Type: Sub-task
>  Components: runner-dataflow
>Reporter: Yichi Zhang
>Assignee: Yichi Zhang
>Priority: Major
>  Time Spent: 2h
>  Remaining Estimate: 0h
>




--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Work logged] (BEAM-8624) Implement FnService for status api in Dataflow runner

2019-11-19 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/BEAM-8624?focusedWorklogId=346384=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-346384
 ]

ASF GitHub Bot logged work on BEAM-8624:


Author: ASF GitHub Bot
Created on: 20/Nov/19 00:44
Start Date: 20/Nov/19 00:44
Worklog Time Spent: 10m 
  Work Description: lukecwik commented on pull request #10115: [BEAM-8624] 
Implement Worker Status FnService in Dataflow runner
URL: https://github.com/apache/beam/pull/10115#discussion_r348242096
 
 

 ##
 File path: model/fn-execution/src/main/proto/beam_fn_api.proto
 ##
 @@ -872,3 +872,25 @@ service BeamFnExternalWorkerPool {
   // Stop the SDK worker.
   rpc StopWorker (StopWorkerRequest) returns (StopWorkerResponse) {}
 }
+
+// Request from runner to SDK Harness asking for its status.
+message WorkerStatusRequest {
+  // (Required) Unique ID identifying this request.
+  string request_id = 1;
+}
+
+// Response from SDK Harness to runner containing the debug related status 
info.
+message WorkerStatusResponse {
+  // (Required) Unique ID from the original request.
+  string request_id = 1;
+  // (Optional) Error message if exception encountered generating the status 
response.
+  string error = 2;
+  // (Optional) Status debugging info reported by SDK harness worker.
+  string status_info = 3;
+}
+
+// Fn Api for SDK harness to report its debug-related statuses to runner.
 
 Review comment:
   ```suggestion
   // API for SDKs to report debug-related statuses to runner during pipeline 
execution.
   ```
 

This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 346384)
Time Spent: 2h 20m  (was: 2h 10m)

> Implement FnService for status api in Dataflow runner
> -
>
> Key: BEAM-8624
> URL: https://issues.apache.org/jira/browse/BEAM-8624
> Project: Beam
>  Issue Type: Sub-task
>  Components: runner-dataflow
>Reporter: Yichi Zhang
>Assignee: Yichi Zhang
>Priority: Major
>  Time Spent: 2h 20m
>  Remaining Estimate: 0h
>




--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Work logged] (BEAM-8624) Implement FnService for status api in Dataflow runner

2019-11-19 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/BEAM-8624?focusedWorklogId=346383=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-346383
 ]

ASF GitHub Bot logged work on BEAM-8624:


Author: ASF GitHub Bot
Created on: 20/Nov/19 00:44
Start Date: 20/Nov/19 00:44
Worklog Time Spent: 10m 
  Work Description: lukecwik commented on pull request #10115: [BEAM-8624] 
Implement Worker Status FnService in Dataflow runner
URL: https://github.com/apache/beam/pull/10115#discussion_r348241948
 
 

 ##
 File path: model/fn-execution/src/main/proto/beam_fn_api.proto
 ##
 @@ -872,3 +872,25 @@ service BeamFnExternalWorkerPool {
   // Stop the SDK worker.
   rpc StopWorker (StopWorkerRequest) returns (StopWorkerResponse) {}
 }
+
+// Request from runner to SDK Harness asking for its status.
+message WorkerStatusRequest {
+  // (Required) Unique ID identifying this request.
+  string request_id = 1;
+}
+
+// Response from SDK Harness to runner containing the debug related status 
info.
+message WorkerStatusResponse {
+  // (Required) Unique ID from the original request.
+  string request_id = 1;
+  // (Optional) Error message if exception encountered generating the status 
response.
+  string error = 2;
+  // (Optional) Status debugging info reported by SDK harness worker.
 
 Review comment:
   You need to describe the format and what needs to be part of the response 
otherwise SDK and runner authors can't tell what they are supposed to do with 
this.
 

This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 346383)
Time Spent: 2h 10m  (was: 2h)

> Implement FnService for status api in Dataflow runner
> -
>
> Key: BEAM-8624
> URL: https://issues.apache.org/jira/browse/BEAM-8624
> Project: Beam
>  Issue Type: Sub-task
>  Components: runner-dataflow
>Reporter: Yichi Zhang
>Assignee: Yichi Zhang
>Priority: Major
>  Time Spent: 2h 10m
>  Remaining Estimate: 0h
>




--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Work logged] (BEAM-8624) Implement FnService for status api in Dataflow runner

2019-11-19 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/BEAM-8624?focusedWorklogId=346381=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-346381
 ]

ASF GitHub Bot logged work on BEAM-8624:


Author: ASF GitHub Bot
Created on: 20/Nov/19 00:44
Start Date: 20/Nov/19 00:44
Worklog Time Spent: 10m 
  Work Description: lukecwik commented on pull request #10115: [BEAM-8624] 
Implement Worker Status FnService in Dataflow runner
URL: https://github.com/apache/beam/pull/10115#discussion_r348241528
 
 

 ##
 File path: model/fn-execution/src/main/proto/beam_fn_api.proto
 ##
 @@ -872,3 +872,25 @@ service BeamFnExternalWorkerPool {
   // Stop the SDK worker.
   rpc StopWorker (StopWorkerRequest) returns (StopWorkerResponse) {}
 }
+
+// Request from runner to SDK Harness asking for its status.
+message WorkerStatusRequest {
+  // (Required) Unique ID identifying this request.
+  string request_id = 1;
 
 Review comment:
   ```suggestion
 string id = 1;
   ```
 

This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 346381)
Time Spent: 1h 50m  (was: 1h 40m)

> Implement FnService for status api in Dataflow runner
> -
>
> Key: BEAM-8624
> URL: https://issues.apache.org/jira/browse/BEAM-8624
> Project: Beam
>  Issue Type: Sub-task
>  Components: runner-dataflow
>Reporter: Yichi Zhang
>Assignee: Yichi Zhang
>Priority: Major
>  Time Spent: 1h 50m
>  Remaining Estimate: 0h
>




--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Work logged] (BEAM-8624) Implement FnService for status api in Dataflow runner

2019-11-19 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/BEAM-8624?focusedWorklogId=346385=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-346385
 ]

ASF GitHub Bot logged work on BEAM-8624:


Author: ASF GitHub Bot
Created on: 20/Nov/19 00:44
Start Date: 20/Nov/19 00:44
Worklog Time Spent: 10m 
  Work Description: lukecwik commented on pull request #10115: [BEAM-8624] 
Implement Worker Status FnService in Dataflow runner
URL: https://github.com/apache/beam/pull/10115#discussion_r348241604
 
 

 ##
 File path: model/fn-execution/src/main/proto/beam_fn_api.proto
 ##
 @@ -872,3 +872,25 @@ service BeamFnExternalWorkerPool {
   // Stop the SDK worker.
   rpc StopWorker (StopWorkerRequest) returns (StopWorkerResponse) {}
 }
+
+// Request from runner to SDK Harness asking for its status.
 
 Review comment:
   Please add the link to the design doc for more details.
 

This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 346385)
Time Spent: 2h 20m  (was: 2h 10m)

> Implement FnService for status api in Dataflow runner
> -
>
> Key: BEAM-8624
> URL: https://issues.apache.org/jira/browse/BEAM-8624
> Project: Beam
>  Issue Type: Sub-task
>  Components: runner-dataflow
>Reporter: Yichi Zhang
>Assignee: Yichi Zhang
>Priority: Major
>  Time Spent: 2h 20m
>  Remaining Estimate: 0h
>




--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Work logged] (BEAM-8623) Add additional message field to Provision API response for passing status endpoint

2019-11-19 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/BEAM-8623?focusedWorklogId=346380=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-346380
 ]

ASF GitHub Bot logged work on BEAM-8623:


Author: ASF GitHub Bot
Created on: 20/Nov/19 00:38
Start Date: 20/Nov/19 00:38
Worklog Time Spent: 10m 
  Work Description: lukecwik commented on pull request #10075: [BEAM-8623] 
Add status_endpoint field to provision api ProvisionInfo
URL: https://github.com/apache/beam/pull/10075#discussion_r348240469
 
 

 ##
 File path: model/fn-execution/src/main/proto/beam_provision_api.proto
 ##
 @@ -71,6 +72,11 @@ message ProvisionInfo {
 // (required) The artifact retrieval token produced by
 // ArtifactStagingService.CommitManifestResponse.
 string retrieval_token = 6;
+
+// (optional) The endpoint for runner to use for hosting the worker status
 
 Review comment:
   please update comment to:
   ```
   (optional) The endpoint that the runner is hosting for the SDK to submit 
status reports to during pipeline execution. This field will only be populated 
if the runner supports SDK status reports. For more details see {link to design 
doc}.
   ```
 

This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 346380)
Time Spent: 0.5h  (was: 20m)

> Add additional message field to Provision API response for passing status 
> endpoint
> --
>
> Key: BEAM-8623
> URL: https://issues.apache.org/jira/browse/BEAM-8623
> Project: Beam
>  Issue Type: Sub-task
>  Components: beam-model
>Reporter: Yichi Zhang
>Assignee: Yichi Zhang
>Priority: Minor
>  Time Spent: 0.5h
>  Remaining Estimate: 0h
>




--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Updated] (BEAM-8781) FlinkUberJarJobServer should respect --flink_job_server_jar

2019-11-19 Thread Kyle Weaver (Jira)


 [ 
https://issues.apache.org/jira/browse/BEAM-8781?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Kyle Weaver updated BEAM-8781:
--
Status: Open  (was: Triage Needed)

> FlinkUberJarJobServer should respect --flink_job_server_jar
> ---
>
> Key: BEAM-8781
> URL: https://issues.apache.org/jira/browse/BEAM-8781
> Project: Beam
>  Issue Type: Improvement
>  Components: runner-flink
>Reporter: Kyle Weaver
>Assignee: Kyle Weaver
>Priority: Major
>




--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Work logged] (BEAM-7049) Merge multiple input to one BeamUnionRel

2019-11-19 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/BEAM-7049?focusedWorklogId=346376=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-346376
 ]

ASF GitHub Bot logged work on BEAM-7049:


Author: ASF GitHub Bot
Created on: 20/Nov/19 00:28
Start Date: 20/Nov/19 00:28
Worklog Time Spent: 10m 
  Work Description: stale[bot] commented on issue #9358: 
(WIP-BEAM-7049)Changes made to make a simple case of threeway union work
URL: https://github.com/apache/beam/pull/9358#issuecomment-555778972
 
 
   This pull request has been closed due to lack of activity. If you think that 
is incorrect, or the pull request requires review, you can revive the PR at any 
time.
   
 

This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 346376)
Time Spent: 1h 20m  (was: 1h 10m)

> Merge multiple input to one BeamUnionRel
> 
>
> Key: BEAM-7049
> URL: https://issues.apache.org/jira/browse/BEAM-7049
> Project: Beam
>  Issue Type: Improvement
>  Components: dsl-sql
>Reporter: Rui Wang
>Assignee: sridhar Reddy
>Priority: Major
>  Time Spent: 1h 20m
>  Remaining Estimate: 0h
>
> BeamUnionRel assumes inputs are two and rejects more. So `a UNION b UNION c` 
> will have to be created as UNION(a, UNION(b, c)) and have two shuffles. If 
> BeamUnionRel can handle multiple shuffles, we will have only one shuffle



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Work logged] (BEAM-8143) Provide a simple LineSource built on top of SDF

2019-11-19 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/BEAM-8143?focusedWorklogId=346375=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-346375
 ]

ASF GitHub Bot logged work on BEAM-8143:


Author: ASF GitHub Bot
Created on: 20/Nov/19 00:28
Start Date: 20/Nov/19 00:28
Worklog Time Spent: 10m 
  Work Description: stale[bot] commented on issue #9366: [BEAM-8143] Build 
simple LineSource directly on top of SDF
URL: https://github.com/apache/beam/pull/9366#issuecomment-555778965
 
 
   This pull request has been closed due to lack of activity. If you think that 
is incorrect, or the pull request requires review, you can revive the PR at any 
time.
   
 

This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 346375)
Time Spent: 1h 10m  (was: 1h)

> Provide a simple LineSource built on top of SDF
> ---
>
> Key: BEAM-8143
> URL: https://issues.apache.org/jira/browse/BEAM-8143
> Project: Beam
>  Issue Type: New Feature
>  Components: sdk-py-core
>Reporter: Boyuan Zhang
>Assignee: Boyuan Zhang
>Priority: Major
>  Time Spent: 1h 10m
>  Remaining Estimate: 0h
>




--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Work logged] (BEAM-8143) Provide a simple LineSource built on top of SDF

2019-11-19 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/BEAM-8143?focusedWorklogId=346377=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-346377
 ]

ASF GitHub Bot logged work on BEAM-8143:


Author: ASF GitHub Bot
Created on: 20/Nov/19 00:28
Start Date: 20/Nov/19 00:28
Worklog Time Spent: 10m 
  Work Description: stale[bot] commented on pull request #9366: [BEAM-8143] 
Build simple LineSource directly on top of SDF
URL: https://github.com/apache/beam/pull/9366
 
 
   
 

This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 346377)
Time Spent: 1h 20m  (was: 1h 10m)

> Provide a simple LineSource built on top of SDF
> ---
>
> Key: BEAM-8143
> URL: https://issues.apache.org/jira/browse/BEAM-8143
> Project: Beam
>  Issue Type: New Feature
>  Components: sdk-py-core
>Reporter: Boyuan Zhang
>Assignee: Boyuan Zhang
>Priority: Major
>  Time Spent: 1h 20m
>  Remaining Estimate: 0h
>




--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Work logged] (BEAM-7049) Merge multiple input to one BeamUnionRel

2019-11-19 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/BEAM-7049?focusedWorklogId=346378=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-346378
 ]

ASF GitHub Bot logged work on BEAM-7049:


Author: ASF GitHub Bot
Created on: 20/Nov/19 00:28
Start Date: 20/Nov/19 00:28
Worklog Time Spent: 10m 
  Work Description: stale[bot] commented on pull request #9358: 
(WIP-BEAM-7049)Changes made to make a simple case of threeway union work
URL: https://github.com/apache/beam/pull/9358
 
 
   
 

This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 346378)
Time Spent: 1.5h  (was: 1h 20m)

> Merge multiple input to one BeamUnionRel
> 
>
> Key: BEAM-7049
> URL: https://issues.apache.org/jira/browse/BEAM-7049
> Project: Beam
>  Issue Type: Improvement
>  Components: dsl-sql
>Reporter: Rui Wang
>Assignee: sridhar Reddy
>Priority: Major
>  Time Spent: 1.5h
>  Remaining Estimate: 0h
>
> BeamUnionRel assumes inputs are two and rejects more. So `a UNION b UNION c` 
> will have to be created as UNION(a, UNION(b, c)) and have two shuffles. If 
> BeamUnionRel can handle multiple shuffles, we will have only one shuffle



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Work logged] (BEAM-3713) Consider moving away from nose to nose2 or pytest.

2019-11-19 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/BEAM-3713?focusedWorklogId=346373=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-346373
 ]

ASF GitHub Bot logged work on BEAM-3713:


Author: ASF GitHub Bot
Created on: 20/Nov/19 00:27
Start Date: 20/Nov/19 00:27
Worklog Time Spent: 10m 
  Work Description: udim commented on issue #7949: [BEAM-3713] Add pytest 
testing infrastructure
URL: https://github.com/apache/beam/pull/7949#issuecomment-555778714
 
 
   still working on this
 

This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 346373)
Time Spent: 12h 10m  (was: 12h)

> Consider moving away from nose to nose2 or pytest.
> --
>
> Key: BEAM-3713
> URL: https://issues.apache.org/jira/browse/BEAM-3713
> Project: Beam
>  Issue Type: Test
>  Components: sdk-py-core, testing
>Reporter: Robert Bradshaw
>Assignee: Udi Meiri
>Priority: Minor
>  Time Spent: 12h 10m
>  Remaining Estimate: 0h
>
> Per 
> [https://nose.readthedocs.io/en/latest/|https://nose.readthedocs.io/en/latest/,]
>  , nose is in maintenance mode.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Work logged] (BEAM-8251) Add worker_region and worker_zone options

2019-11-19 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/BEAM-8251?focusedWorklogId=346369=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-346369
 ]

ASF GitHub Bot logged work on BEAM-8251:


Author: ASF GitHub Bot
Created on: 20/Nov/19 00:13
Start Date: 20/Nov/19 00:13
Worklog Time Spent: 10m 
  Work Description: ibzib commented on issue #10150: [BEAM-8251] plumb 
worker_(region|zone) to Environment proto
URL: https://github.com/apache/beam/pull/10150#issuecomment-555775063
 
 
   > If a region/zone are not set up, then None/null will be set - Dataflow is 
able to handle these properly?
   
   Good question -- doesn't look like it.
   
   I'm planning on running tests against staging before submitting this.
 

This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 346369)
Time Spent: 50m  (was: 40m)

> Add worker_region and worker_zone options
> -
>
> Key: BEAM-8251
> URL: https://issues.apache.org/jira/browse/BEAM-8251
> Project: Beam
>  Issue Type: Improvement
>  Components: runner-dataflow
>Reporter: Kyle Weaver
>Assignee: Kyle Weaver
>Priority: Major
>  Time Spent: 50m
>  Remaining Estimate: 0h
>
> We are refining the way the user specifies worker regions and zones to the 
> Dataflow service. We need to add worker_region and worker_zone pipeline 
> options that will be preferred over the old experiments=worker_region and 
> --zone flags. I will create subtasks for adding these options to each SDK.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Work logged] (BEAM-8624) Implement FnService for status api in Dataflow runner

2019-11-19 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/BEAM-8624?focusedWorklogId=346365=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-346365
 ]

ASF GitHub Bot logged work on BEAM-8624:


Author: ASF GitHub Bot
Created on: 19/Nov/19 23:58
Start Date: 19/Nov/19 23:58
Worklog Time Spent: 10m 
  Work Description: y1chi commented on issue #10115: [BEAM-8624] Implement 
Worker Status FnService in Dataflow runner
URL: https://github.com/apache/beam/pull/10115#issuecomment-555771367
 
 
   Run Python PreCommit
 

This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 346365)
Time Spent: 1h 40m  (was: 1.5h)

> Implement FnService for status api in Dataflow runner
> -
>
> Key: BEAM-8624
> URL: https://issues.apache.org/jira/browse/BEAM-8624
> Project: Beam
>  Issue Type: Sub-task
>  Components: runner-dataflow
>Reporter: Yichi Zhang
>Assignee: Yichi Zhang
>Priority: Major
>  Time Spent: 1h 40m
>  Remaining Estimate: 0h
>




--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Work logged] (BEAM-8624) Implement FnService for status api in Dataflow runner

2019-11-19 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/BEAM-8624?focusedWorklogId=346364=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-346364
 ]

ASF GitHub Bot logged work on BEAM-8624:


Author: ASF GitHub Bot
Created on: 19/Nov/19 23:55
Start Date: 19/Nov/19 23:55
Worklog Time Spent: 10m 
  Work Description: y1chi commented on issue #10115: [BEAM-8624] Implement 
Worker Status FnService in Dataflow runner
URL: https://github.com/apache/beam/pull/10115#issuecomment-555770721
 
 
   Run Java PreCommit
 

This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 346364)
Time Spent: 1.5h  (was: 1h 20m)

> Implement FnService for status api in Dataflow runner
> -
>
> Key: BEAM-8624
> URL: https://issues.apache.org/jira/browse/BEAM-8624
> Project: Beam
>  Issue Type: Sub-task
>  Components: runner-dataflow
>Reporter: Yichi Zhang
>Assignee: Yichi Zhang
>Priority: Major
>  Time Spent: 1.5h
>  Remaining Estimate: 0h
>




--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Work logged] (BEAM-7746) Add type hints to python code

2019-11-19 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/BEAM-7746?focusedWorklogId=346355=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-346355
 ]

ASF GitHub Bot logged work on BEAM-7746:


Author: ASF GitHub Bot
Created on: 19/Nov/19 23:37
Start Date: 19/Nov/19 23:37
Worklog Time Spent: 10m 
  Work Description: chadrik commented on pull request #9915: [BEAM-7746] 
Add python type hints (part 1)
URL: https://github.com/apache/beam/pull/9915#discussion_r348225946
 
 

 ##
 File path: sdks/python/apache_beam/runners/worker/bundle_processor.py
 ##
 @@ -878,6 +1033,7 @@ def get_coder(self, coder_id):
   json.loads(coder_proto.spec.payload.decode('utf-8')))
 
   def get_windowed_coder(self, pcoll_id):
+# type: (str) -> WindowedValueCoder
 
 Review comment:
   Ok, I can change it to `Coder`, but we'll have to come back and change it to 
`Coder[WindowedValue]` once we make `Coder` generic.  
   
 

This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 346355)
Time Spent: 27.5h  (was: 27h 20m)

> Add type hints to python code
> -
>
> Key: BEAM-7746
> URL: https://issues.apache.org/jira/browse/BEAM-7746
> Project: Beam
>  Issue Type: New Feature
>  Components: sdk-py-core
>Reporter: Chad Dombrova
>Assignee: Chad Dombrova
>Priority: Major
>  Time Spent: 27.5h
>  Remaining Estimate: 0h
>
> As a developer of the beam source code, I would like the code to use pep484 
> type hints so that I can clearly see what types are required, get completion 
> in my IDE, and enforce code correctness via a static analyzer like mypy.
> This may be considered a precursor to BEAM-7060
> Work has been started here:  [https://github.com/apache/beam/pull/9056]
>  
>  



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Comment Edited] (BEAM-8651) Python 3 portable pipelines sometimes fail with errors in StockUnpickler.find_class()

2019-11-19 Thread Valentyn Tymofieiev (Jira)


[ 
https://issues.apache.org/jira/browse/BEAM-8651?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16977923#comment-16977923
 ] 

Valentyn Tymofieiev edited comment on BEAM-8651 at 11/19/19 11:36 PM:
--

While investigating the example from 
[https://github.com/tensorflow/tfx/issues/928] I observe the following failure 
mode:

SDK worker starts multiple threads. Each thread happens to unpickle some 
payload, which calls
{code:java}
 dill.loads [1] {code}
Dill calls
{code:java}
 pickle.Unpickler.find_class [2] {code}
which calls
{code:java}
 __import__ [3] {code}
Concurrent import calls cause a deadlock on Python 3 (checked on Python 
3.7.5rc1), but not on Python 2.7. Following snippet, with calls extracted from 
the Chicago taxi pipeline
{code:python}
import threading

def t1():
return __import__("tensorflow_transform.beam.analyzer_impls", 0)

def t2():
return __import__("tensorflow_transform.tf_metadata.metadata_io", 0)

threads = []
threads.append(threading.Thread(target=t1))
threads.append(threading.Thread(target=t2))

for thread in threads:
thread.start()
{code}
fails with
{noformat}
Exception in thread Thread-1:
Traceback (most recent call last):
  File "/usr/lib/python3.7/threading.py", line 926, in _bootstrap_inner
self.run()
  File "/usr/lib/python3.7/threading.py", line 870, in run
self._target(*self._args, **self._kwargs)
  File "dealock_repro.py", line 4, in t1
return __import__("tensorflow_transform.beam.analyzer_impls", level=0)
  File 
"/home/valentyn/tmp/tfx_master_py37/lib/python3.7/site-packages/tensorflow_transform/__init__.py",
 line 23, in 
from tensorflow_transform.output_wrapper import TFTransformOutput
  File 
"/home/valentyn/tmp/tfx_master_py37/lib/python3.7/site-packages/tensorflow_transform/output_wrapper.py",
 line 29, in 
from tensorflow_transform.tf_metadata import metadata_io
  File "", line 980, in _find_and_load
  File "", line 149, in __enter__
  File "", line 94, in acquire
_frozen_importlib._DeadlockError: deadlock detected by 
_ModuleLock('tensorflow_transform.tf_metadata.metadata_io') at 140070103765456
{noformat}
[1] 
[https://github.com/apache/beam/blob/cba445c8da93d9bdd01b30b2f54e9c3b52a98b7d/sdks/python/apache_beam/internal/pickler.py#L261]
 [2] 
[https://github.com/uqfoundation/dill/blob/76e8472502a656f3ab6973cd8375cf7847f33842/dill/_dill.py#L462]
 [3] 
[https://github.com/python/cpython/blob/4ffc569b47bef9f95e443f3c56f7e7e32cb440c0/Lib/pickle.py#L1426]


was (Author: tvalentyn):
While investigating the example from 
[https://github.com/tensorflow/tfx/issues/928] I observe the following failure 
mode:

SDK worker starts multiple threads. Each thread happens to unpickle some 
payload, which calls
{code:java}
 dill.loads [1] {code}
Dill calls
{code:java}
 pickle.Unpickler.find_class [2] {code}
which calls
{code:java}
 __import__ [3] {code}
Concurrent import calls cause a deadlock on Python 3 (checked on Python 
3.7.5rc1), but not on Python 2.7. Following snippet, with calls extracted from 
the Chicago taxi pipeline
{code:python}
import threading

def t1():
return __import__("tensorflow_transform.beam.analyzer_impls", 0)

def t2():
return __import__("tensorflow_transform.tf_metadata.metadata_io", 0)

def t3():
return __import__("tensorflow.core.example.example_pb2", 0)

threads = []
threads.append(threading.Thread(target=t1))
threads.append(threading.Thread(target=t2))
threads.append(threading.Thread(target=t3))

for thread in threads:
thread.start()
{code}
fails with
{noformat}
Exception in thread Thread-1:
Traceback (most recent call last):
  File "/usr/lib/python3.7/threading.py", line 926, in _bootstrap_inner
self.run()
  File "/usr/lib/python3.7/threading.py", line 870, in run
self._target(*self._args, **self._kwargs)
  File "dealock_repro.py", line 4, in t1
return __import__("tensorflow_transform.beam.analyzer_impls", level=0)
  File 
"/home/valentyn/tmp/tfx_master_py37/lib/python3.7/site-packages/tensorflow_transform/__init__.py",
 line 23, in 
from tensorflow_transform.output_wrapper import TFTransformOutput
  File 
"/home/valentyn/tmp/tfx_master_py37/lib/python3.7/site-packages/tensorflow_transform/output_wrapper.py",
 line 29, in 
from tensorflow_transform.tf_metadata import metadata_io
  File "", line 980, in _find_and_load
  File "", line 149, in __enter__
  File "", line 94, in acquire
_frozen_importlib._DeadlockError: deadlock detected by 
_ModuleLock('tensorflow_transform.tf_metadata.metadata_io') at 140070103765456
{noformat}
[1] 
[https://github.com/apache/beam/blob/cba445c8da93d9bdd01b30b2f54e9c3b52a98b7d/sdks/python/apache_beam/internal/pickler.py#L261]
 [2] 
[https://github.com/uqfoundation/dill/blob/76e8472502a656f3ab6973cd8375cf7847f33842/dill/_dill.py#L462]
 [3] 
[https://github.com/python/cpython/blob/4ffc569b47bef9f95e443f3c56f7e7e32cb440c0/Lib/pickle.py#L1426]

> Python 3 portable 

[jira] [Work logged] (BEAM-8523) Add useful timestamp to job servicer GetJobs

2019-11-19 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/BEAM-8523?focusedWorklogId=346352=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-346352
 ]

ASF GitHub Bot logged work on BEAM-8523:


Author: ASF GitHub Bot
Created on: 19/Nov/19 23:29
Start Date: 19/Nov/19 23:29
Worklog Time Spent: 10m 
  Work Description: chadrik commented on issue #9959: [BEAM-8523] JobAPI: 
Give access to timestamped state change history
URL: https://github.com/apache/beam/pull/9959#issuecomment-555763959
 
 
   I reworked this a bit to make it easier for other job services to implement 
state history, and I added some tests for the local job service. 
   
 

This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 346352)
Time Spent: 2h 20m  (was: 2h 10m)

> Add useful timestamp to job servicer GetJobs
> 
>
> Key: BEAM-8523
> URL: https://issues.apache.org/jira/browse/BEAM-8523
> Project: Beam
>  Issue Type: New Feature
>  Components: beam-model
>Reporter: Chad Dombrova
>Assignee: Chad Dombrova
>Priority: Major
>  Time Spent: 2h 20m
>  Remaining Estimate: 0h
>
> As a user querying jobs with JobService.GetJobs, it would be useful if the 
> JobInfo result contained timestamps indicating various state changes that may 
> have been missed by a client.   Useful timestamps include:
>  
>  * submitted (prepared to the job service)
>  * started (executor enters the RUNNING state)
>  * completed (executor enters a terminal state)
>  
>  



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Work logged] (BEAM-7746) Add type hints to python code

2019-11-19 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/BEAM-7746?focusedWorklogId=346351=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-346351
 ]

ASF GitHub Bot logged work on BEAM-7746:


Author: ASF GitHub Bot
Created on: 19/Nov/19 23:28
Start Date: 19/Nov/19 23:28
Worklog Time Spent: 10m 
  Work Description: robertwb commented on pull request #9915: [BEAM-7746] 
Add python type hints (part 1)
URL: https://github.com/apache/beam/pull/9915#discussion_r348223233
 
 

 ##
 File path: sdks/python/apache_beam/transforms/core.py
 ##
 @@ -2320,6 +2359,7 @@ def expand(self, pcoll):
 return super(WindowInto, self).expand(pcoll)
 
   def to_runner_api_parameter(self, context):
+# type: (PipelineContext) -> typing.Tuple[str, message.Message]
 
 Review comment:
   Oh, yes, this is the method on the transform, not the WindowFn. Fine as is. 
 

This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 346351)
Time Spent: 27h 20m  (was: 27h 10m)

> Add type hints to python code
> -
>
> Key: BEAM-7746
> URL: https://issues.apache.org/jira/browse/BEAM-7746
> Project: Beam
>  Issue Type: New Feature
>  Components: sdk-py-core
>Reporter: Chad Dombrova
>Assignee: Chad Dombrova
>Priority: Major
>  Time Spent: 27h 20m
>  Remaining Estimate: 0h
>
> As a developer of the beam source code, I would like the code to use pep484 
> type hints so that I can clearly see what types are required, get completion 
> in my IDE, and enforce code correctness via a static analyzer like mypy.
> This may be considered a precursor to BEAM-7060
> Work has been started here:  [https://github.com/apache/beam/pull/9056]
>  
>  



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Work logged] (BEAM-8746) Allow the local job service to work from inside docker

2019-11-19 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/BEAM-8746?focusedWorklogId=346349=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-346349
 ]

ASF GitHub Bot logged work on BEAM-8746:


Author: ASF GitHub Bot
Created on: 19/Nov/19 23:26
Start Date: 19/Nov/19 23:26
Worklog Time Spent: 10m 
  Work Description: chadrik commented on pull request #10161: [BEAM-8746] 
Make local job service accessible from external machines
URL: https://github.com/apache/beam/pull/10161
 
 
   Simple fix that makes it possible to access a local job service running in 
docker.  
   
   
   
   Thank you for your contribution! Follow this checklist to help us 
incorporate your contribution quickly and easily:
   
- [ ] [**Choose 
reviewer(s)**](https://beam.apache.org/contribute/#make-your-change) and 
mention them in a comment (`R: @username`).
- [ ] Format the pull request title like `[BEAM-XXX] Fixes bug in 
ApproximateQuantiles`, where you replace `BEAM-XXX` with the appropriate JIRA 
issue, if applicable. This will automatically link the pull request to the 
issue.
- [ ] If this contribution is large, please file an Apache [Individual 
Contributor License Agreement](https://www.apache.org/licenses/icla.pdf).
   
   See the [Contributor Guide](https://beam.apache.org/contribute) for more 
tips on [how to make review process 
smoother](https://beam.apache.org/contribute/#make-reviewers-job-easier).
   
   Post-Commit Tests Status (on master branch)
   

   
   Lang | SDK | Apex | Dataflow | Flink | Gearpump | Samza | Spark
   --- | --- | --- | --- | --- | --- | --- | ---
   Go | [![Build 
Status](https://builds.apache.org/job/beam_PostCommit_Go/lastCompletedBuild/badge/icon)](https://builds.apache.org/job/beam_PostCommit_Go/lastCompletedBuild/)
 | --- | --- | [![Build 
Status](https://builds.apache.org/job/beam_PostCommit_Go_VR_Flink/lastCompletedBuild/badge/icon)](https://builds.apache.org/job/beam_PostCommit_Go_VR_Flink/lastCompletedBuild/)
 | --- | --- | [![Build 
Status](https://builds.apache.org/job/beam_PostCommit_Go_VR_Spark/lastCompletedBuild/badge/icon)](https://builds.apache.org/job/beam_PostCommit_Go_VR_Spark/lastCompletedBuild/)
   Java | [![Build 
Status](https://builds.apache.org/job/beam_PostCommit_Java/lastCompletedBuild/badge/icon)](https://builds.apache.org/job/beam_PostCommit_Java/lastCompletedBuild/)
 | [![Build 
Status](https://builds.apache.org/job/beam_PostCommit_Java_ValidatesRunner_Apex/lastCompletedBuild/badge/icon)](https://builds.apache.org/job/beam_PostCommit_Java_ValidatesRunner_Apex/lastCompletedBuild/)
 | [![Build 
Status](https://builds.apache.org/job/beam_PostCommit_Java_ValidatesRunner_Dataflow/lastCompletedBuild/badge/icon)](https://builds.apache.org/job/beam_PostCommit_Java_ValidatesRunner_Dataflow/lastCompletedBuild/)
 | [![Build 
Status](https://builds.apache.org/job/beam_PostCommit_Java_ValidatesRunner_Flink/lastCompletedBuild/badge/icon)](https://builds.apache.org/job/beam_PostCommit_Java_ValidatesRunner_Flink/lastCompletedBuild/)[![Build
 
Status](https://builds.apache.org/job/beam_PostCommit_Java_PVR_Flink_Batch/lastCompletedBuild/badge/icon)](https://builds.apache.org/job/beam_PostCommit_Java_PVR_Flink_Batch/lastCompletedBuild/)[![Build
 
Status](https://builds.apache.org/job/beam_PostCommit_Java_PVR_Flink_Streaming/lastCompletedBuild/badge/icon)](https://builds.apache.org/job/beam_PostCommit_Java_PVR_Flink_Streaming/lastCompletedBuild/)
 | [![Build 
Status](https://builds.apache.org/job/beam_PostCommit_Java_ValidatesRunner_Gearpump/lastCompletedBuild/badge/icon)](https://builds.apache.org/job/beam_PostCommit_Java_ValidatesRunner_Gearpump/lastCompletedBuild/)
 | [![Build 
Status](https://builds.apache.org/job/beam_PostCommit_Java_ValidatesRunner_Samza/lastCompletedBuild/badge/icon)](https://builds.apache.org/job/beam_PostCommit_Java_ValidatesRunner_Samza/lastCompletedBuild/)
 | [![Build 
Status](https://builds.apache.org/job/beam_PostCommit_Java_ValidatesRunner_Spark/lastCompletedBuild/badge/icon)](https://builds.apache.org/job/beam_PostCommit_Java_ValidatesRunner_Spark/lastCompletedBuild/)[![Build
 
Status](https://builds.apache.org/job/beam_PostCommit_Java_PVR_Spark_Batch/lastCompletedBuild/badge/icon)](https://builds.apache.org/job/beam_PostCommit_Java_PVR_Spark_Batch/lastCompletedBuild/)
   Python | [![Build 
Status](https://builds.apache.org/job/beam_PostCommit_Python2/lastCompletedBuild/badge/icon)](https://builds.apache.org/job/beam_PostCommit_Python2/lastCompletedBuild/)[![Build
 
Status](https://builds.apache.org/job/beam_PostCommit_Python35/lastCompletedBuild/badge/icon)](https://builds.apache.org/job/beam_PostCommit_Python35/lastCompletedBuild/)[![Build
 

[jira] [Work logged] (BEAM-8746) Allow the local job service to work from inside docker

2019-11-19 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/BEAM-8746?focusedWorklogId=346350=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-346350
 ]

ASF GitHub Bot logged work on BEAM-8746:


Author: ASF GitHub Bot
Created on: 19/Nov/19 23:26
Start Date: 19/Nov/19 23:26
Worklog Time Spent: 10m 
  Work Description: chadrik commented on issue #10161: [BEAM-8746] Make 
local job service accessible from external machines
URL: https://github.com/apache/beam/pull/10161#issuecomment-555763150
 
 
   R: @mxm 
   R: @robertwb 
 

This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 346350)
Time Spent: 20m  (was: 10m)

> Allow the local job service to work from inside docker
> --
>
> Key: BEAM-8746
> URL: https://issues.apache.org/jira/browse/BEAM-8746
> Project: Beam
>  Issue Type: Improvement
>  Components: sdk-py-core
>Reporter: Chad Dombrova
>Assignee: Chad Dombrova
>Priority: Major
>  Time Spent: 20m
>  Remaining Estimate: 0h
>
> Currently the connection is refused.  It's a simple fix. 



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Work logged] (BEAM-7746) Add type hints to python code

2019-11-19 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/BEAM-7746?focusedWorklogId=346348=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-346348
 ]

ASF GitHub Bot logged work on BEAM-7746:


Author: ASF GitHub Bot
Created on: 19/Nov/19 23:25
Start Date: 19/Nov/19 23:25
Worklog Time Spent: 10m 
  Work Description: robertwb commented on pull request #9915: [BEAM-7746] 
Add python type hints (part 1)
URL: https://github.com/apache/beam/pull/9915#discussion_r348222590
 
 

 ##
 File path: sdks/python/apache_beam/runners/worker/bundle_processor.py
 ##
 @@ -878,6 +1033,7 @@ def get_coder(self, coder_id):
   json.loads(coder_proto.spec.payload.decode('utf-8')))
 
   def get_windowed_coder(self, pcoll_id):
+# type: (str) -> WindowedValueCoder
 
 Review comment:
   But it need not. And there's discussions on the list (e.g. the 
ValueOnlyWindowedValueCoder) to change this. 
 

This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 346348)
Time Spent: 27h 10m  (was: 27h)

> Add type hints to python code
> -
>
> Key: BEAM-7746
> URL: https://issues.apache.org/jira/browse/BEAM-7746
> Project: Beam
>  Issue Type: New Feature
>  Components: sdk-py-core
>Reporter: Chad Dombrova
>Assignee: Chad Dombrova
>Priority: Major
>  Time Spent: 27h 10m
>  Remaining Estimate: 0h
>
> As a developer of the beam source code, I would like the code to use pep484 
> type hints so that I can clearly see what types are required, get completion 
> in my IDE, and enforce code correctness via a static analyzer like mypy.
> This may be considered a precursor to BEAM-7060
> Work has been started here:  [https://github.com/apache/beam/pull/9056]
>  
>  



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Work logged] (BEAM-7746) Add type hints to python code

2019-11-19 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/BEAM-7746?focusedWorklogId=346347=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-346347
 ]

ASF GitHub Bot logged work on BEAM-7746:


Author: ASF GitHub Bot
Created on: 19/Nov/19 23:24
Start Date: 19/Nov/19 23:24
Worklog Time Spent: 10m 
  Work Description: robertwb commented on pull request #9915: [BEAM-7746] 
Add python type hints (part 1)
URL: https://github.com/apache/beam/pull/9915#discussion_r348222344
 
 

 ##
 File path: sdks/python/apache_beam/runners/worker/bundle_processor.py
 ##
 @@ -328,10 +404,12 @@ class _ConcatIterable(object):
   Unlike itertools.chain, this allows reiteration.
   """
   def __init__(self, first, second):
+# type: (Iterable[Any], Iterable[Any]) -> None
 
 Review comment:
   Yeah, follow-up is fine. 
 

This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 346347)
Time Spent: 27h  (was: 26h 50m)

> Add type hints to python code
> -
>
> Key: BEAM-7746
> URL: https://issues.apache.org/jira/browse/BEAM-7746
> Project: Beam
>  Issue Type: New Feature
>  Components: sdk-py-core
>Reporter: Chad Dombrova
>Assignee: Chad Dombrova
>Priority: Major
>  Time Spent: 27h
>  Remaining Estimate: 0h
>
> As a developer of the beam source code, I would like the code to use pep484 
> type hints so that I can clearly see what types are required, get completion 
> in my IDE, and enforce code correctness via a static analyzer like mypy.
> This may be considered a precursor to BEAM-7060
> Work has been started here:  [https://github.com/apache/beam/pull/9056]
>  
>  



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Work logged] (BEAM-7746) Add type hints to python code

2019-11-19 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/BEAM-7746?focusedWorklogId=346346=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-346346
 ]

ASF GitHub Bot logged work on BEAM-7746:


Author: ASF GitHub Bot
Created on: 19/Nov/19 23:24
Start Date: 19/Nov/19 23:24
Worklog Time Spent: 10m 
  Work Description: robertwb commented on pull request #9915: [BEAM-7746] 
Add python type hints (part 1)
URL: https://github.com/apache/beam/pull/9915#discussion_r34844
 
 

 ##
 File path: sdks/python/apache_beam/runners/runner.py
 ##
 @@ -133,7 +152,10 @@ def run_async(self, transform, options=None):
   transform(PBegin(p))
 return p.run()
 
-  def run_pipeline(self, pipeline, options):
+  def run_pipeline(self,
+   pipeline,  # type: Pipeline
+   options  # type: PipelineOptions
+  ):
 
 Review comment:
   Ack. Yeah, this should probably just raise NotImplemented. 
 

This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 346346)
Time Spent: 26h 50m  (was: 26h 40m)

> Add type hints to python code
> -
>
> Key: BEAM-7746
> URL: https://issues.apache.org/jira/browse/BEAM-7746
> Project: Beam
>  Issue Type: New Feature
>  Components: sdk-py-core
>Reporter: Chad Dombrova
>Assignee: Chad Dombrova
>Priority: Major
>  Time Spent: 26h 50m
>  Remaining Estimate: 0h
>
> As a developer of the beam source code, I would like the code to use pep484 
> type hints so that I can clearly see what types are required, get completion 
> in my IDE, and enforce code correctness via a static analyzer like mypy.
> This may be considered a precursor to BEAM-7060
> Work has been started here:  [https://github.com/apache/beam/pull/9056]
>  
>  



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Work logged] (BEAM-8151) Allow the Python SDK to use many many threads

2019-11-19 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/BEAM-8151?focusedWorklogId=346345=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-346345
 ]

ASF GitHub Bot logged work on BEAM-8151:


Author: ASF GitHub Bot
Created on: 19/Nov/19 23:23
Start Date: 19/Nov/19 23:23
Worklog Time Spent: 10m 
  Work Description: robertwb commented on pull request #10134: [BEAM-8151] 
Further cleanup of SDK Workers.
URL: https://github.com/apache/beam/pull/10134
 
 
   
 

This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 346345)
Time Spent: 14.5h  (was: 14h 20m)

> Allow the Python SDK to use many many threads
> -
>
> Key: BEAM-8151
> URL: https://issues.apache.org/jira/browse/BEAM-8151
> Project: Beam
>  Issue Type: Improvement
>  Components: sdk-py-core, sdk-py-harness
>Reporter: Luke Cwik
>Assignee: Luke Cwik
>Priority: Major
> Fix For: 2.18.0
>
>  Time Spent: 14.5h
>  Remaining Estimate: 0h
>
> We need to use a thread pool which shrinks the number of active threads when 
> they are not being used.
>  
> This is to prevent any stuckness issues related to a runner scheduling more 
> work items then there are "work" threads inside the SDK harness.
>  
> By default the control plane should have all "requests" being processed in 
> parallel and the runner is responsible for not overloading the SDK with too 
> much work.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Comment Edited] (BEAM-8651) Python 3 portable pipelines sometimes fail with errors in StockUnpickler.find_class()

2019-11-19 Thread Valentyn Tymofieiev (Jira)


[ 
https://issues.apache.org/jira/browse/BEAM-8651?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16977923#comment-16977923
 ] 

Valentyn Tymofieiev edited comment on BEAM-8651 at 11/19/19 11:22 PM:
--

While investigating the example from 
[https://github.com/tensorflow/tfx/issues/928] I observe the following failure 
mode:

SDK worker starts multiple threads. Each thread happens to unpickle some 
payload, which calls
{code:java}
 dill.loads [1] {code}
Dill calls
{code:java}
 pickle.Unpickler.find_class [2] {code}
which calls
{code:java}
 __import__ [3] {code}
Concurrent import calls cause a deadlock on Python 3 (checked on Python 
3.7.5rc1), but not on Python 2.7. Following snippet, with calls extracted from 
the Chicago taxi pipeline
{code:python}
import threading

def t1():
return __import__("tensorflow_transform.beam.analyzer_impls", 0)

def t2():
return __import__("tensorflow_transform.tf_metadata.metadata_io", 0)

def t3():
return __import__("tensorflow.core.example.example_pb2", 0)

threads = []
threads.append(threading.Thread(target=t1))
threads.append(threading.Thread(target=t2))
threads.append(threading.Thread(target=t3))

for thread in threads:
thread.start()
{code}
fails with
{noformat}
Exception in thread Thread-1:
Traceback (most recent call last):
  File "/usr/lib/python3.7/threading.py", line 926, in _bootstrap_inner
self.run()
  File "/usr/lib/python3.7/threading.py", line 870, in run
self._target(*self._args, **self._kwargs)
  File "dealock_repro.py", line 4, in t1
return __import__("tensorflow_transform.beam.analyzer_impls", level=0)
  File 
"/home/valentyn/tmp/tfx_master_py37/lib/python3.7/site-packages/tensorflow_transform/__init__.py",
 line 23, in 
from tensorflow_transform.output_wrapper import TFTransformOutput
  File 
"/home/valentyn/tmp/tfx_master_py37/lib/python3.7/site-packages/tensorflow_transform/output_wrapper.py",
 line 29, in 
from tensorflow_transform.tf_metadata import metadata_io
  File "", line 980, in _find_and_load
  File "", line 149, in __enter__
  File "", line 94, in acquire
_frozen_importlib._DeadlockError: deadlock detected by 
_ModuleLock('tensorflow_transform.tf_metadata.metadata_io') at 140070103765456
{noformat}
[1] 
[https://github.com/apache/beam/blob/cba445c8da93d9bdd01b30b2f54e9c3b52a98b7d/sdks/python/apache_beam/internal/pickler.py#L261]
 [2] 
[https://github.com/uqfoundation/dill/blob/76e8472502a656f3ab6973cd8375cf7847f33842/dill/_dill.py#L462]
 [3] 
[https://github.com/python/cpython/blob/4ffc569b47bef9f95e443f3c56f7e7e32cb440c0/Lib/pickle.py#L1426]


was (Author: tvalentyn):
While investigating the example from 
[https://github.com/tensorflow/tfx/issues/928] I observe the following failure 
mode:

SDK worker starts multiple threads. Each thread happens to unpickle some 
payload, which calls
{code:java}
 dill.loads [1] {code}
Dill calls
{code:java}
 pickle.Unpickler.find_class [2] {code}
which calls
{code:java}
 __import__ [3] {code}
Concurrent import calls cause a deadlock on Python 3 (checked on Python 
3.7.5rc1), but not on Python 2.7. Following snippet, with calls extracted from 
the Chicago taxi pipeline
{code:python}
import threading

def t1():
return __import__("tensorflow_transform.beam.analyzer_impls")

def t2():
return __import__("tensorflow_transform.tf_metadata.metadata_io")

def t3():
return __import__("tensorflow.core.example.example_pb2")

threads = []
threads.append(threading.Thread(target=t1))
threads.append(threading.Thread(target=t2))
threads.append(threading.Thread(target=t3))

for thread in threads:
thread.start()
{code}
fails with
{noformat}
Exception in thread Thread-1:
Traceback (most recent call last):
  File "/usr/lib/python3.7/threading.py", line 926, in _bootstrap_inner
self.run()
  File "/usr/lib/python3.7/threading.py", line 870, in run
self._target(*self._args, **self._kwargs)
  File "dealock_repro.py", line 4, in t1
return __import__("tensorflow_transform.beam.analyzer_impls", level=0)
  File 
"/home/valentyn/tmp/tfx_master_py37/lib/python3.7/site-packages/tensorflow_transform/__init__.py",
 line 23, in 
from tensorflow_transform.output_wrapper import TFTransformOutput
  File 
"/home/valentyn/tmp/tfx_master_py37/lib/python3.7/site-packages/tensorflow_transform/output_wrapper.py",
 line 29, in 
from tensorflow_transform.tf_metadata import metadata_io
  File "", line 980, in _find_and_load
  File "", line 149, in __enter__
  File "", line 94, in acquire
_frozen_importlib._DeadlockError: deadlock detected by 
_ModuleLock('tensorflow_transform.tf_metadata.metadata_io') at 140070103765456
{noformat}
[1] 
[https://github.com/apache/beam/blob/cba445c8da93d9bdd01b30b2f54e9c3b52a98b7d/sdks/python/apache_beam/internal/pickler.py#L261]
 [2] 
[https://github.com/uqfoundation/dill/blob/76e8472502a656f3ab6973cd8375cf7847f33842/dill/_dill.py#L462]
 [3] 

[jira] [Work logged] (BEAM-8335) Add streaming support to Interactive Beam

2019-11-19 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/BEAM-8335?focusedWorklogId=346344=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-346344
 ]

ASF GitHub Bot logged work on BEAM-8335:


Author: ASF GitHub Bot
Created on: 19/Nov/19 23:21
Start Date: 19/Nov/19 23:21
Worklog Time Spent: 10m 
  Work Description: robertwb commented on pull request #10119: [BEAM-8335] 
Adds the StreamingCache
URL: https://github.com/apache/beam/pull/10119
 
 
   
 

This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 346344)
Time Spent: 33h 50m  (was: 33h 40m)

> Add streaming support to Interactive Beam
> -
>
> Key: BEAM-8335
> URL: https://issues.apache.org/jira/browse/BEAM-8335
> Project: Beam
>  Issue Type: Improvement
>  Components: runner-py-interactive
>Reporter: Sam Rohde
>Assignee: Sam Rohde
>Priority: Major
>  Time Spent: 33h 50m
>  Remaining Estimate: 0h
>
> This issue tracks the work items to introduce streaming support to the 
> Interactive Beam experience. This will allow users to:
>  * Write and run a streaming job in IPython
>  * Automatically cache records from unbounded sources
>  * Add a replay experience that replays all cached records to simulate the 
> original pipeline execution
>  * Add controls to play/pause/stop/step individual elements from the cached 
> records
>  * Add ability to inspect/visualize unbounded PCollections



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Created] (BEAM-8781) FlinkUberJarJobServer should respect --flink_job_server_jar

2019-11-19 Thread Kyle Weaver (Jira)
Kyle Weaver created BEAM-8781:
-

 Summary: FlinkUberJarJobServer should respect 
--flink_job_server_jar
 Key: BEAM-8781
 URL: https://issues.apache.org/jira/browse/BEAM-8781
 Project: Beam
  Issue Type: Improvement
  Components: runner-flink
Reporter: Kyle Weaver
Assignee: Kyle Weaver






--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Comment Edited] (BEAM-8651) Python 3 portable pipelines sometimes fail with errors in StockUnpickler.find_class()

2019-11-19 Thread Valentyn Tymofieiev (Jira)


[ 
https://issues.apache.org/jira/browse/BEAM-8651?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16977923#comment-16977923
 ] 

Valentyn Tymofieiev edited comment on BEAM-8651 at 11/19/19 11:19 PM:
--

While investigating the example from 
[https://github.com/tensorflow/tfx/issues/928] I observe the following failure 
mode:

SDK worker starts multiple threads. Each thread happens to unpickle some 
payload, which calls
{code:java}
 dill.loads [1] {code}
Dill calls
{code:java}
 pickle.Unpickler.find_class [2] {code}
which calls
{code:java}
 __import__ [3] {code}
Concurrent import calls cause a deadlock on Python 3 (checked on Python 
3.7.5rc1), but not on Python 2.7. Following snippet, with calls extracted from 
the Chicago taxi pipeline
{code:python}
import threading

def t1():
return __import__("tensorflow_transform.beam.analyzer_impls")

def t2():
return __import__("tensorflow_transform.tf_metadata.metadata_io")

def t3():
return __import__("tensorflow.core.example.example_pb2")

threads = []
threads.append(threading.Thread(target=t1))
threads.append(threading.Thread(target=t2))
threads.append(threading.Thread(target=t3))

for thread in threads:
thread.start()
{code}
fails with
{noformat}
Exception in thread Thread-1:
Traceback (most recent call last):
  File "/usr/lib/python3.7/threading.py", line 926, in _bootstrap_inner
self.run()
  File "/usr/lib/python3.7/threading.py", line 870, in run
self._target(*self._args, **self._kwargs)
  File "dealock_repro.py", line 4, in t1
return __import__("tensorflow_transform.beam.analyzer_impls", level=0)
  File 
"/home/valentyn/tmp/tfx_master_py37/lib/python3.7/site-packages/tensorflow_transform/__init__.py",
 line 23, in 
from tensorflow_transform.output_wrapper import TFTransformOutput
  File 
"/home/valentyn/tmp/tfx_master_py37/lib/python3.7/site-packages/tensorflow_transform/output_wrapper.py",
 line 29, in 
from tensorflow_transform.tf_metadata import metadata_io
  File "", line 980, in _find_and_load
  File "", line 149, in __enter__
  File "", line 94, in acquire
_frozen_importlib._DeadlockError: deadlock detected by 
_ModuleLock('tensorflow_transform.tf_metadata.metadata_io') at 140070103765456
{noformat}
[1] 
[https://github.com/apache/beam/blob/cba445c8da93d9bdd01b30b2f54e9c3b52a98b7d/sdks/python/apache_beam/internal/pickler.py#L261]
 [2] 
[https://github.com/uqfoundation/dill/blob/76e8472502a656f3ab6973cd8375cf7847f33842/dill/_dill.py#L462]
 [3] 
[https://github.com/python/cpython/blob/4ffc569b47bef9f95e443f3c56f7e7e32cb440c0/Lib/pickle.py#L1426]


was (Author: tvalentyn):
While investigating the example from 
[https://github.com/tensorflow/tfx/issues/928] I observe the following failure 
mode:

SDK worker starts multiple threads. Each thread happens to unpickle some 
payload, which calls
{code:java}
 dill.loads [1] {code}
Dill calls
{code:java}
 pickle.Unpickler.find_class [1] {code}
which calls
{code:java}
 __import__ [3] {code}
Concurrent import calls cause a deadlock on Python 3 (checked on Python 
3.7.5rc1), but not on Python 2.7. Following snippet, with calls extracted from 
the Chicago taxi pipeline
{code:python}
import threading

def t1():
return __import__("tensorflow_transform.beam.analyzer_impls")

def t2():
return __import__("tensorflow_transform.tf_metadata.metadata_io")

def t3():
return __import__("tensorflow.core.example.example_pb2")

threads = []
threads.append(threading.Thread(target=t1))
threads.append(threading.Thread(target=t2))
threads.append(threading.Thread(target=t3))

for thread in threads:
thread.start()
{code}

fails with 

{noformat}
Exception in thread Thread-1:
Traceback (most recent call last):
  File "/usr/lib/python3.7/threading.py", line 926, in _bootstrap_inner
self.run()
  File "/usr/lib/python3.7/threading.py", line 870, in run
self._target(*self._args, **self._kwargs)
  File "dealock_repro.py", line 4, in t1
return __import__("tensorflow_transform.beam.analyzer_impls", level=0)
  File 
"/home/valentyn/tmp/tfx_master_py37/lib/python3.7/site-packages/tensorflow_transform/__init__.py",
 line 23, in 
from tensorflow_transform.output_wrapper import TFTransformOutput
  File 
"/home/valentyn/tmp/tfx_master_py37/lib/python3.7/site-packages/tensorflow_transform/output_wrapper.py",
 line 29, in 
from tensorflow_transform.tf_metadata import metadata_io
  File "", line 980, in _find_and_load
  File "", line 149, in __enter__
  File "", line 94, in acquire
_frozen_importlib._DeadlockError: deadlock detected by 
_ModuleLock('tensorflow_transform.tf_metadata.metadata_io') at 140070103765456
{noformat}


[1] 
[https://github.com/apache/beam/blob/cba445c8da93d9bdd01b30b2f54e9c3b52a98b7d/sdks/python/apache_beam/internal/pickler.py#L261]
[2] 
[https://github.com/uqfoundation/dill/blob/76e8472502a656f3ab6973cd8375cf7847f33842/dill/_dill.py#L462]
[3] 

[jira] [Work logged] (BEAM-7746) Add type hints to python code

2019-11-19 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/BEAM-7746?focusedWorklogId=346342=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-346342
 ]

ASF GitHub Bot logged work on BEAM-7746:


Author: ASF GitHub Bot
Created on: 19/Nov/19 23:19
Start Date: 19/Nov/19 23:19
Worklog Time Spent: 10m 
  Work Description: chadrik commented on pull request #9915: [BEAM-7746] 
Add python type hints (part 1)
URL: https://github.com/apache/beam/pull/9915#discussion_r348220566
 
 

 ##
 File path: sdks/python/apache_beam/runners/worker/bundle_processor.py
 ##
 @@ -460,8 +558,14 @@ def clear(self):
 class FnApiUserStateContext(userstate.UserStateContext):
   """Interface for state and timers from SDK to Fn API servicer of state.."""
 
-  def __init__(
-  self, state_handler, transform_id, key_coder, window_coder, timer_specs):
+  def __init__(self,
+   state_handler,
+   transform_id,  # type: str
+   key_coder,  # type: coders.Coder
+   window_coder,  # type: coders.Coder
+   timer_specs  # type: MutableMapping[str, 
beam_runner_api_pb2.TimerSpec]
 
 Review comment:
   Looks like no.  Addressed in my upcoming review commit. 
 

This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 346342)
Time Spent: 26h 40m  (was: 26.5h)

> Add type hints to python code
> -
>
> Key: BEAM-7746
> URL: https://issues.apache.org/jira/browse/BEAM-7746
> Project: Beam
>  Issue Type: New Feature
>  Components: sdk-py-core
>Reporter: Chad Dombrova
>Assignee: Chad Dombrova
>Priority: Major
>  Time Spent: 26h 40m
>  Remaining Estimate: 0h
>
> As a developer of the beam source code, I would like the code to use pep484 
> type hints so that I can clearly see what types are required, get completion 
> in my IDE, and enforce code correctness via a static analyzer like mypy.
> This may be considered a precursor to BEAM-7060
> Work has been started here:  [https://github.com/apache/beam/pull/9056]
>  
>  



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Work logged] (BEAM-7746) Add type hints to python code

2019-11-19 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/BEAM-7746?focusedWorklogId=346340=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-346340
 ]

ASF GitHub Bot logged work on BEAM-7746:


Author: ASF GitHub Bot
Created on: 19/Nov/19 23:18
Start Date: 19/Nov/19 23:18
Worklog Time Spent: 10m 
  Work Description: chadrik commented on pull request #9915: [BEAM-7746] 
Add python type hints (part 1)
URL: https://github.com/apache/beam/pull/9915#discussion_r348220199
 
 

 ##
 File path: sdks/python/apache_beam/runners/worker/bundle_processor.py
 ##
 @@ -477,16 +581,22 @@ def __init__(
 self._key_coder = key_coder
 self._window_coder = window_coder
 self._timer_specs = timer_specs
-self._timer_receivers = None
-self._all_states = {}
+self._timer_receivers = None  # type: Optional[Dict[str, 
operations.ConsumerSet]]
+self._all_states = {}  # type: Dict[tuple, 
Union[SynchronousBagRuntimeState, SynchronousSetRuntimeState, 
CombiningValueRuntimeState]]
 
 Review comment:
   I could replace this with `Dict[tuple, userstate.AccumulatingRuntimeState]` 
but in order for this to work I need to add `_commit` to the abstract methods 
on this `AccumulatingRuntimeState`, because this method is expected to exist:
   
   ```python
   class FnApiUserStateContext(userstate.UserStateContext):
 def __init__(self):
   ...
   self._all_states = {}  # type: Dict[tuple, 
userstate.AccumulatingRuntimeState]
   
 ...
   
 def commit(self):
   # type: () -> None
   for state in self._all_states.values():
 state._commit()
   ```
 

This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 346340)
Time Spent: 26.5h  (was: 26h 20m)

> Add type hints to python code
> -
>
> Key: BEAM-7746
> URL: https://issues.apache.org/jira/browse/BEAM-7746
> Project: Beam
>  Issue Type: New Feature
>  Components: sdk-py-core
>Reporter: Chad Dombrova
>Assignee: Chad Dombrova
>Priority: Major
>  Time Spent: 26.5h
>  Remaining Estimate: 0h
>
> As a developer of the beam source code, I would like the code to use pep484 
> type hints so that I can clearly see what types are required, get completion 
> in my IDE, and enforce code correctness via a static analyzer like mypy.
> This may be considered a precursor to BEAM-7060
> Work has been started here:  [https://github.com/apache/beam/pull/9056]
>  
>  



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Commented] (BEAM-8651) Python 3 portable pipelines sometimes fail with errors in StockUnpickler.find_class()

2019-11-19 Thread Valentyn Tymofieiev (Jira)


[ 
https://issues.apache.org/jira/browse/BEAM-8651?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16977923#comment-16977923
 ] 

Valentyn Tymofieiev commented on BEAM-8651:
---

While investigating the example from 
[https://github.com/tensorflow/tfx/issues/928] I observe the following failure 
mode:

SDK worker starts multiple threads. Each thread happens to unpickle some 
payload, which calls
{code:java}
 dill.loads [1] {code}
Dill calls
{code:java}
 pickle.Unpickler.find_class [1] {code}
which calls
{code:java}
 __import__ [3] {code}
Concurrent import calls cause a deadlock on Python 3 (checked on Python 
3.7.5rc1), but not on Python 2.7. Following snippet, with calls extracted from 
the Chicago taxi pipeline
{code:python}
import threading

def t1():
return __import__("tensorflow_transform.beam.analyzer_impls")

def t2():
return __import__("tensorflow_transform.tf_metadata.metadata_io")

def t3():
return __import__("tensorflow.core.example.example_pb2")

threads = []
threads.append(threading.Thread(target=t1))
threads.append(threading.Thread(target=t2))
threads.append(threading.Thread(target=t3))

for thread in threads:
thread.start()
{code}

fails with 

{noformat}
Exception in thread Thread-1:
Traceback (most recent call last):
  File "/usr/lib/python3.7/threading.py", line 926, in _bootstrap_inner
self.run()
  File "/usr/lib/python3.7/threading.py", line 870, in run
self._target(*self._args, **self._kwargs)
  File "dealock_repro.py", line 4, in t1
return __import__("tensorflow_transform.beam.analyzer_impls", level=0)
  File 
"/home/valentyn/tmp/tfx_master_py37/lib/python3.7/site-packages/tensorflow_transform/__init__.py",
 line 23, in 
from tensorflow_transform.output_wrapper import TFTransformOutput
  File 
"/home/valentyn/tmp/tfx_master_py37/lib/python3.7/site-packages/tensorflow_transform/output_wrapper.py",
 line 29, in 
from tensorflow_transform.tf_metadata import metadata_io
  File "", line 980, in _find_and_load
  File "", line 149, in __enter__
  File "", line 94, in acquire
_frozen_importlib._DeadlockError: deadlock detected by 
_ModuleLock('tensorflow_transform.tf_metadata.metadata_io') at 140070103765456
{noformat}


[1] 
[https://github.com/apache/beam/blob/cba445c8da93d9bdd01b30b2f54e9c3b52a98b7d/sdks/python/apache_beam/internal/pickler.py#L261]
[2] 
[https://github.com/uqfoundation/dill/blob/76e8472502a656f3ab6973cd8375cf7847f33842/dill/_dill.py#L462]
[3] 
[https://github.com/python/cpython/blob/4ffc569b47bef9f95e443f3c56f7e7e32cb440c0/Lib/pickle.py#L1426]

> Python 3 portable pipelines sometimes fail with errors in 
> StockUnpickler.find_class()
> -
>
> Key: BEAM-8651
> URL: https://issues.apache.org/jira/browse/BEAM-8651
> Project: Beam
>  Issue Type: Sub-task
>  Components: sdk-py-core
>Reporter: Valentyn Tymofieiev
>Assignee: Valentyn Tymofieiev
>Priority: Blocker
> Attachments: beam8651.py
>
>
> Several Beam users [1,2] reported an error which happens on Python 3 in 
> StockUnpickler.find_class.
> So far I've seen reports of the error on Python 3.5, 3.6, and 3.7.1, on Flink 
> and Dataflow runners. On Dataflow runner so far I have seen this in streaming 
> pipelines only, which use portable SDK worker.
> Typical stack trace:
> {noformat}
> File 
> "python3.5/site-packages/apache_beam/runners/worker/bundle_processor.py", 
> line 1148, in _create_pardo_operation
>     dofn_data = pickler.loads(serialized_fn)  
>  
>   File "python3.5/site-packages/apache_beam/internal/pickler.py", line 265, 
> in loads
>     return dill.loads(s)  
>  
>   File "python3.5/site-packages/dill/_dill.py", line 317, in loads
>  
>     return load(file, ignore) 
>  
>   File "python3.5/site-packages/dill/_dill.py", line 305, in load 
>  
>     obj = pik.load()  
>  
>   File "python3.5/site-packages/dill/_dill.py", line 474, in find_class   
>  
>     return StockUnpickler.find_class(self, module, name)  
>  
> AttributeError: Can't get attribute 'ClassName' on  'python3.5/site-packages/filename.py'>
> {noformat}
> According to Guenther from [1]:
> {quote}
> This looks exactly like a race condition that we've encountered on Python
> 3.7.1: There's a bug in some older 3.7.x releases that breaks the
> thread-safety of the unpickler, as concurrent unpickle threads can access a
> module before it has been fully imported. See
> https://bugs.python.org/issue34572 for more information.
> The traceback shows 

[jira] [Work logged] (BEAM-7746) Add type hints to python code

2019-11-19 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/BEAM-7746?focusedWorklogId=346335=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-346335
 ]

ASF GitHub Bot logged work on BEAM-7746:


Author: ASF GitHub Bot
Created on: 19/Nov/19 23:05
Start Date: 19/Nov/19 23:05
Worklog Time Spent: 10m 
  Work Description: chadrik commented on pull request #9915: [BEAM-7746] 
Add python type hints (part 1)
URL: https://github.com/apache/beam/pull/9915#discussion_r348216261
 
 

 ##
 File path: sdks/python/apache_beam/runners/worker/bundle_processor.py
 ##
 @@ -847,16 +993,24 @@ def __init__(self, descriptor, data_channel_factory, 
counter_factory,
 runner=beam_fn_api_pb2.StateKey.Runner(key=token)),
 element_coder_impl))
 
-  _known_urns = {}
+  _known_urns = {}  # type: Dict[str, Tuple[ConstructorFn, 
Union[Type[message.Message], Type[bytes], None]]]
 
   @classmethod
-  def register_urn(cls, urn, parameter_type):
+  def register_urn(cls,
+   urn,  # type: str
+   parameter_type  # type: Optional[Type[T]]
+  ):
+# type: (...) -> Callable[[Callable[[BeamTransformFactory, str, 
beam_runner_api_pb2.PTransform, T, Dict[str, List[operations.Operation]]], 
operations.Operation]], Callable[[BeamTransformFactory, str, 
beam_runner_api_pb2.PTransform, T, Dict[str, List[operations.Operation]]], 
operations.Operation]]
 
 Review comment:
   There are ways to create module-level aliases to break down and hide 
complexity, but since this uses a bunch of TypeVars to create relationships 
between the args and return value I had to do it all at once.
 

This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 346335)
Time Spent: 26h 20m  (was: 26h 10m)

> Add type hints to python code
> -
>
> Key: BEAM-7746
> URL: https://issues.apache.org/jira/browse/BEAM-7746
> Project: Beam
>  Issue Type: New Feature
>  Components: sdk-py-core
>Reporter: Chad Dombrova
>Assignee: Chad Dombrova
>Priority: Major
>  Time Spent: 26h 20m
>  Remaining Estimate: 0h
>
> As a developer of the beam source code, I would like the code to use pep484 
> type hints so that I can clearly see what types are required, get completion 
> in my IDE, and enforce code correctness via a static analyzer like mypy.
> This may be considered a precursor to BEAM-7060
> Work has been started here:  [https://github.com/apache/beam/pull/9056]
>  
>  



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Commented] (BEAM-8277) Make docker build quicker

2019-11-19 Thread Kyle Weaver (Jira)


[ 
https://issues.apache.org/jira/browse/BEAM-8277?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16977912#comment-16977912
 ] 

Kyle Weaver commented on BEAM-8277:
---

Build times have gotten slower still: 11m 48s on current master.

> Make docker build quicker
> -
>
> Key: BEAM-8277
> URL: https://issues.apache.org/jira/browse/BEAM-8277
> Project: Beam
>  Issue Type: Improvement
>  Components: sdk-py-harness
>Reporter: Kyle Weaver
>Assignee: Kyle Weaver
>Priority: Major
>
> Building the Python SDK harness container takes minutes on my machine.
> ```
> ./gradlew :sdks:python:container:buildAll
> BUILD SUCCESSFUL in 9m 33s
> ```
> Possible lead: "We spend mins pulling cmd/beamctl deps."
> [https://github.com/apache/beam/blob/47feeafb21023e2a60ae51737cc4000a2033719c/sdks/python/container/build.gradle#L38]



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Work logged] (BEAM-7746) Add type hints to python code

2019-11-19 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/BEAM-7746?focusedWorklogId=346334=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-346334
 ]

ASF GitHub Bot logged work on BEAM-7746:


Author: ASF GitHub Bot
Created on: 19/Nov/19 23:04
Start Date: 19/Nov/19 23:04
Worklog Time Spent: 10m 
  Work Description: chadrik commented on pull request #9915: [BEAM-7746] 
Add python type hints (part 1)
URL: https://github.com/apache/beam/pull/9915#discussion_r348215834
 
 

 ##
 File path: sdks/python/apache_beam/runners/worker/bundle_processor.py
 ##
 @@ -847,16 +993,24 @@ def __init__(self, descriptor, data_channel_factory, 
counter_factory,
 runner=beam_fn_api_pb2.StateKey.Runner(key=token)),
 element_coder_impl))
 
-  _known_urns = {}
+  _known_urns = {}  # type: Dict[str, Tuple[ConstructorFn, 
Union[Type[message.Message], Type[bytes], None]]]
 
   @classmethod
-  def register_urn(cls, urn, parameter_type):
+  def register_urn(cls,
+   urn,  # type: str
+   parameter_type  # type: Optional[Type[T]]
+  ):
+# type: (...) -> Callable[[Callable[[BeamTransformFactory, str, 
beam_runner_api_pb2.PTransform, T, Dict[str, List[operations.Operation]]], 
operations.Operation]], Callable[[BeamTransformFactory, str, 
beam_runner_api_pb2.PTransform, T, Dict[str, List[operations.Operation]]], 
operations.Operation]]
 
 Review comment:
   Not until we get to python3 annotations :(
 

This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 346334)
Time Spent: 26h 10m  (was: 26h)

> Add type hints to python code
> -
>
> Key: BEAM-7746
> URL: https://issues.apache.org/jira/browse/BEAM-7746
> Project: Beam
>  Issue Type: New Feature
>  Components: sdk-py-core
>Reporter: Chad Dombrova
>Assignee: Chad Dombrova
>Priority: Major
>  Time Spent: 26h 10m
>  Remaining Estimate: 0h
>
> As a developer of the beam source code, I would like the code to use pep484 
> type hints so that I can clearly see what types are required, get completion 
> in my IDE, and enforce code correctness via a static analyzer like mypy.
> This may be considered a precursor to BEAM-7060
> Work has been started here:  [https://github.com/apache/beam/pull/9056]
>  
>  



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Work logged] (BEAM-7746) Add type hints to python code

2019-11-19 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/BEAM-7746?focusedWorklogId=346332=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-346332
 ]

ASF GitHub Bot logged work on BEAM-7746:


Author: ASF GitHub Bot
Created on: 19/Nov/19 23:02
Start Date: 19/Nov/19 23:02
Worklog Time Spent: 10m 
  Work Description: chadrik commented on pull request #9915: [BEAM-7746] 
Add python type hints (part 1)
URL: https://github.com/apache/beam/pull/9915#discussion_r348215242
 
 

 ##
 File path: sdks/python/apache_beam/transforms/core.py
 ##
 @@ -405,6 +417,7 @@ class _RestrictionDoFnParam(_DoFnParam):
   """Restriction Provider DoFn parameter."""
 
   def __init__(self, restriction_provider):
+# type: (RestrictionProvider) -> None
 if not isinstance(restriction_provider, RestrictionProvider):
 
 Review comment:
   agreed!
 

This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 346332)
Time Spent: 26h  (was: 25h 50m)

> Add type hints to python code
> -
>
> Key: BEAM-7746
> URL: https://issues.apache.org/jira/browse/BEAM-7746
> Project: Beam
>  Issue Type: New Feature
>  Components: sdk-py-core
>Reporter: Chad Dombrova
>Assignee: Chad Dombrova
>Priority: Major
>  Time Spent: 26h
>  Remaining Estimate: 0h
>
> As a developer of the beam source code, I would like the code to use pep484 
> type hints so that I can clearly see what types are required, get completion 
> in my IDE, and enforce code correctness via a static analyzer like mypy.
> This may be considered a precursor to BEAM-7060
> Work has been started here:  [https://github.com/apache/beam/pull/9056]
>  
>  



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Work logged] (BEAM-7746) Add type hints to python code

2019-11-19 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/BEAM-7746?focusedWorklogId=346331=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-346331
 ]

ASF GitHub Bot logged work on BEAM-7746:


Author: ASF GitHub Bot
Created on: 19/Nov/19 23:01
Start Date: 19/Nov/19 23:01
Worklog Time Spent: 10m 
  Work Description: chadrik commented on pull request #9915: [BEAM-7746] 
Add python type hints (part 1)
URL: https://github.com/apache/beam/pull/9915#discussion_r348215088
 
 

 ##
 File path: sdks/python/apache_beam/runners/worker/bundle_processor.py
 ##
 @@ -878,6 +1033,7 @@ def get_coder(self, coder_id):
   json.loads(coder_proto.spec.payload.decode('utf-8')))
 
   def get_windowed_coder(self, pcoll_id):
+# type: (str) -> WindowedValueCoder
 
 Review comment:
   I'm confused.  This method is definitely returning `WindowedValueCoder`:
   
   ```python
 def get_windowed_coder(self, pcoll_id):
   # type: (str) -> WindowedValueCoder
   coder = self.get_coder(self.descriptor.pcollections[pcoll_id].coder_id)
   # TODO(robertwb): Remove this condition once all runners are consistent.
   if not isinstance(coder, WindowedValueCoder):
 windowing_strategy = self.descriptor.windowing_strategies[
 self.descriptor.pcollections[pcoll_id].windowing_strategy_id]
 return WindowedValueCoder(
 coder, self.get_coder(windowing_strategy.window_coder_id))
   else:
 return coder
   ```
 

This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 346331)
Time Spent: 25h 50m  (was: 25h 40m)

> Add type hints to python code
> -
>
> Key: BEAM-7746
> URL: https://issues.apache.org/jira/browse/BEAM-7746
> Project: Beam
>  Issue Type: New Feature
>  Components: sdk-py-core
>Reporter: Chad Dombrova
>Assignee: Chad Dombrova
>Priority: Major
>  Time Spent: 25h 50m
>  Remaining Estimate: 0h
>
> As a developer of the beam source code, I would like the code to use pep484 
> type hints so that I can clearly see what types are required, get completion 
> in my IDE, and enforce code correctness via a static analyzer like mypy.
> This may be considered a precursor to BEAM-7060
> Work has been started here:  [https://github.com/apache/beam/pull/9056]
>  
>  



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Work logged] (BEAM-7746) Add type hints to python code

2019-11-19 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/BEAM-7746?focusedWorklogId=346329=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-346329
 ]

ASF GitHub Bot logged work on BEAM-7746:


Author: ASF GitHub Bot
Created on: 19/Nov/19 22:59
Start Date: 19/Nov/19 22:59
Worklog Time Spent: 10m 
  Work Description: chadrik commented on pull request #9915: [BEAM-7746] 
Add python type hints (part 1)
URL: https://github.com/apache/beam/pull/9915#discussion_r348214287
 
 

 ##
 File path: sdks/python/apache_beam/runners/runner.py
 ##
 @@ -133,7 +152,10 @@ def run_async(self, transform, options=None):
   transform(PBegin(p))
 return p.run()
 
-  def run_pipeline(self, pipeline, options):
+  def run_pipeline(self,
+   pipeline,  # type: Pipeline
+   options  # type: PipelineOptions
+  ):
 
 Review comment:
   What is the point of this base implementation?  Should we just raise 
`NotImplementedError` here?
 

This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 346329)
Time Spent: 25h 40m  (was: 25.5h)

> Add type hints to python code
> -
>
> Key: BEAM-7746
> URL: https://issues.apache.org/jira/browse/BEAM-7746
> Project: Beam
>  Issue Type: New Feature
>  Components: sdk-py-core
>Reporter: Chad Dombrova
>Assignee: Chad Dombrova
>Priority: Major
>  Time Spent: 25h 40m
>  Remaining Estimate: 0h
>
> As a developer of the beam source code, I would like the code to use pep484 
> type hints so that I can clearly see what types are required, get completion 
> in my IDE, and enforce code correctness via a static analyzer like mypy.
> This may be considered a precursor to BEAM-7060
> Work has been started here:  [https://github.com/apache/beam/pull/9056]
>  
>  



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Work logged] (BEAM-7746) Add type hints to python code

2019-11-19 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/BEAM-7746?focusedWorklogId=346326=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-346326
 ]

ASF GitHub Bot logged work on BEAM-7746:


Author: ASF GitHub Bot
Created on: 19/Nov/19 22:54
Start Date: 19/Nov/19 22:54
Worklog Time Spent: 10m 
  Work Description: chadrik commented on pull request #9915: [BEAM-7746] 
Add python type hints (part 1)
URL: https://github.com/apache/beam/pull/9915#discussion_r348212865
 
 

 ##
 File path: sdks/python/apache_beam/transforms/core.py
 ##
 @@ -2320,6 +2359,7 @@ def expand(self, pcoll):
 return super(WindowInto, self).expand(pcoll)
 
   def to_runner_api_parameter(self, context):
+# type: (PipelineContext) -> typing.Tuple[str, message.Message]
 
 Review comment:
   Looking at the code I don't see anywhere that 
`WindowInto.to_runner_api_parameter()` returns `bytes`.   This function 
actually returns `beam_runner_api_pb2.WindowingStrategy` by way of 
`self.windowing.to_runner_api(context)` (just below this).  I also don't see 
any subclasses of `WindowInto` that override this method.
   
 

This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 346326)
Time Spent: 25.5h  (was: 25h 20m)

> Add type hints to python code
> -
>
> Key: BEAM-7746
> URL: https://issues.apache.org/jira/browse/BEAM-7746
> Project: Beam
>  Issue Type: New Feature
>  Components: sdk-py-core
>Reporter: Chad Dombrova
>Assignee: Chad Dombrova
>Priority: Major
>  Time Spent: 25.5h
>  Remaining Estimate: 0h
>
> As a developer of the beam source code, I would like the code to use pep484 
> type hints so that I can clearly see what types are required, get completion 
> in my IDE, and enforce code correctness via a static analyzer like mypy.
> This may be considered a precursor to BEAM-7060
> Work has been started here:  [https://github.com/apache/beam/pull/9056]
>  
>  



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Work logged] (BEAM-8645) TimestampCombiner incorrect in beam python

2019-11-19 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/BEAM-8645?focusedWorklogId=346313=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-346313
 ]

ASF GitHub Bot logged work on BEAM-8645:


Author: ASF GitHub Bot
Created on: 19/Nov/19 22:31
Start Date: 19/Nov/19 22:31
Worklog Time Spent: 10m 
  Work Description: HuangLED commented on issue #10081: [BEAM-8645] A test 
case for TimestampCombiner.
URL: https://github.com/apache/beam/pull/10081#issuecomment-555747139
 
 
   would you please point out which PR in the master *Might* resolve the issue? 
 I can follow and trace that part a bit using this test case. 
 

This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 346313)
Time Spent: 5.5h  (was: 5h 20m)

> TimestampCombiner incorrect in beam python
> --
>
> Key: BEAM-8645
> URL: https://issues.apache.org/jira/browse/BEAM-8645
> Project: Beam
>  Issue Type: Bug
>  Components: sdk-py-core
>Reporter: Ruoyun Huang
>Priority: Major
>  Time Spent: 5.5h
>  Remaining Estimate: 0h
>
> When we have a TimestampValue on combine: 
> {code:java}
> main_stream = (p   
> | 'main TestStream' >> TestStream()   
> .add_elements([window.TimestampedValue(('k', 100), 0)])   
> .add_elements([window.TimestampedValue(('k', 400), 9)])   
> .advance_watermark_to_infinity()   
> | 'main windowInto' >> beam.WindowInto( 
> window.FixedWindows(10),  
> timestamp_combiner=TimestampCombiner.OUTPUT_AT_LATEST)   | 
> 'Combine' >> beam.CombinePerKey(sum))
> The expect timestamp should be:
> LATEST:    (('k', 500), Timestamp(9)),
> EARLIEST:    (('k', 500), Timestamp(0)),
> END_OF_WINDOW: (('k', 500), Timestamp(10)),
> But current py streaming gives following results: 
> LATEST:    (('k', 500), Timestamp(10)),
> EARLIEST:    (('k', 500), Timestamp(10)),
> END_OF_WINDOW: (('k', 500), Timestamp(9.)),
> More details and discussions:
> https://lists.apache.org/thread.html/d3af1f2f84a2e59a747196039eae77812b78a991f0f293c717e5f4e1@%3Cdev.beam.apache.org%3E
> {code}



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Work logged] (BEAM-7746) Add type hints to python code

2019-11-19 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/BEAM-7746?focusedWorklogId=346311=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-346311
 ]

ASF GitHub Bot logged work on BEAM-7746:


Author: ASF GitHub Bot
Created on: 19/Nov/19 22:30
Start Date: 19/Nov/19 22:30
Worklog Time Spent: 10m 
  Work Description: chadrik commented on pull request #9915: [BEAM-7746] 
Add python type hints (part 1)
URL: https://github.com/apache/beam/pull/9915#discussion_r348203903
 
 

 ##
 File path: sdks/python/apache_beam/runners/worker/bundle_processor.py
 ##
 @@ -73,8 +108,17 @@
 class RunnerIOOperation(operations.Operation):
   """Common baseclass for runner harness IO operations."""
 
-  def __init__(self, name_context, step_name, consumers, counter_factory,
-   state_sampler, windowed_coder, transform_id, data_channel):
+  def __init__(self,
+   name_context,  # type: Union[str, common.NameContext]
+   step_name,
+   consumers,  # type: Mapping[Any, Iterable[operations.Operation]]
+   counter_factory,
+   state_sampler,
+   windowed_coder,  # type: coders.WindowedValueCoder
+   transform_id,  # type: str
+   data_channel  # type: data_plane.GrpcClientDataChannel
 
 Review comment:
   done. will be included in my upcoming review commit.
 

This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 346311)
Time Spent: 25h 20m  (was: 25h 10m)

> Add type hints to python code
> -
>
> Key: BEAM-7746
> URL: https://issues.apache.org/jira/browse/BEAM-7746
> Project: Beam
>  Issue Type: New Feature
>  Components: sdk-py-core
>Reporter: Chad Dombrova
>Assignee: Chad Dombrova
>Priority: Major
>  Time Spent: 25h 20m
>  Remaining Estimate: 0h
>
> As a developer of the beam source code, I would like the code to use pep484 
> type hints so that I can clearly see what types are required, get completion 
> in my IDE, and enforce code correctness via a static analyzer like mypy.
> This may be considered a precursor to BEAM-7060
> Work has been started here:  [https://github.com/apache/beam/pull/9056]
>  
>  



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Work logged] (BEAM-7746) Add type hints to python code

2019-11-19 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/BEAM-7746?focusedWorklogId=346310=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-346310
 ]

ASF GitHub Bot logged work on BEAM-7746:


Author: ASF GitHub Bot
Created on: 19/Nov/19 22:29
Start Date: 19/Nov/19 22:29
Worklog Time Spent: 10m 
  Work Description: chadrik commented on pull request #9915: [BEAM-7746] 
Add python type hints (part 1)
URL: https://github.com/apache/beam/pull/9915#discussion_r348203629
 
 

 ##
 File path: sdks/python/apache_beam/runners/worker/bundle_processor.py
 ##
 @@ -328,10 +404,12 @@ class _ConcatIterable(object):
   Unlike itertools.chain, this allows reiteration.
   """
   def __init__(self, first, second):
+# type: (Iterable[Any], Iterable[Any]) -> None
 
 Review comment:
   To solve this well, we'd want to make `_ConcatIterable` a generic.  I've 
been holding off on introducing any generics for now.  I'll do this in a 
followup if that's ok with you.
 

This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 346310)
Time Spent: 25h 10m  (was: 25h)

> Add type hints to python code
> -
>
> Key: BEAM-7746
> URL: https://issues.apache.org/jira/browse/BEAM-7746
> Project: Beam
>  Issue Type: New Feature
>  Components: sdk-py-core
>Reporter: Chad Dombrova
>Assignee: Chad Dombrova
>Priority: Major
>  Time Spent: 25h 10m
>  Remaining Estimate: 0h
>
> As a developer of the beam source code, I would like the code to use pep484 
> type hints so that I can clearly see what types are required, get completion 
> in my IDE, and enforce code correctness via a static analyzer like mypy.
> This may be considered a precursor to BEAM-7060
> Work has been started here:  [https://github.com/apache/beam/pull/9056]
>  
>  



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Work logged] (BEAM-7746) Add type hints to python code

2019-11-19 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/BEAM-7746?focusedWorklogId=346308=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-346308
 ]

ASF GitHub Bot logged work on BEAM-7746:


Author: ASF GitHub Bot
Created on: 19/Nov/19 22:27
Start Date: 19/Nov/19 22:27
Worklog Time Spent: 10m 
  Work Description: chadrik commented on pull request #9915: [BEAM-7746] 
Add python type hints (part 1)
URL: https://github.com/apache/beam/pull/9915#discussion_r348203062
 
 

 ##
 File path: sdks/python/apache_beam/transforms/core.py
 ##
 @@ -2167,8 +2198,12 @@ def expand(self, pcoll):
 
 
 class Windowing(object):
-  def __init__(self, windowfn, triggerfn=None, accumulation_mode=None,
-   timestamp_combiner=None):
+  def __init__(self,
+   windowfn,  # type: WindowFn
+   triggerfn=None,  # type: typing.Optional[TriggerFn]
+   accumulation_mode=None,
 
 Review comment:
   done. will be included in my upcoming review commit. 
 

This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 346308)
Time Spent: 24h 50m  (was: 24h 40m)

> Add type hints to python code
> -
>
> Key: BEAM-7746
> URL: https://issues.apache.org/jira/browse/BEAM-7746
> Project: Beam
>  Issue Type: New Feature
>  Components: sdk-py-core
>Reporter: Chad Dombrova
>Assignee: Chad Dombrova
>Priority: Major
>  Time Spent: 24h 50m
>  Remaining Estimate: 0h
>
> As a developer of the beam source code, I would like the code to use pep484 
> type hints so that I can clearly see what types are required, get completion 
> in my IDE, and enforce code correctness via a static analyzer like mypy.
> This may be considered a precursor to BEAM-7060
> Work has been started here:  [https://github.com/apache/beam/pull/9056]
>  
>  



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Work logged] (BEAM-7746) Add type hints to python code

2019-11-19 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/BEAM-7746?focusedWorklogId=346306=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-346306
 ]

ASF GitHub Bot logged work on BEAM-7746:


Author: ASF GitHub Bot
Created on: 19/Nov/19 22:27
Start Date: 19/Nov/19 22:27
Worklog Time Spent: 10m 
  Work Description: chadrik commented on pull request #9915: [BEAM-7746] 
Add python type hints (part 1)
URL: https://github.com/apache/beam/pull/9915#discussion_r348202914
 
 

 ##
 File path: sdks/python/apache_beam/runners/runner.py
 ##
 @@ -133,7 +152,10 @@ def run_async(self, transform, options=None):
   transform(PBegin(p))
 return p.run()
 
-  def run_pipeline(self, pipeline, options):
+  def run_pipeline(self,
+   pipeline,  # type: Pipeline
+   options  # type: PipelineOptions
+  ):
 
 Review comment:
   this appears to return `None`.
 

This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 346306)
Time Spent: 24h 40m  (was: 24.5h)

> Add type hints to python code
> -
>
> Key: BEAM-7746
> URL: https://issues.apache.org/jira/browse/BEAM-7746
> Project: Beam
>  Issue Type: New Feature
>  Components: sdk-py-core
>Reporter: Chad Dombrova
>Assignee: Chad Dombrova
>Priority: Major
>  Time Spent: 24h 40m
>  Remaining Estimate: 0h
>
> As a developer of the beam source code, I would like the code to use pep484 
> type hints so that I can clearly see what types are required, get completion 
> in my IDE, and enforce code correctness via a static analyzer like mypy.
> This may be considered a precursor to BEAM-7060
> Work has been started here:  [https://github.com/apache/beam/pull/9056]
>  
>  



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Work logged] (BEAM-7746) Add type hints to python code

2019-11-19 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/BEAM-7746?focusedWorklogId=346309=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-346309
 ]

ASF GitHub Bot logged work on BEAM-7746:


Author: ASF GitHub Bot
Created on: 19/Nov/19 22:27
Start Date: 19/Nov/19 22:27
Worklog Time Spent: 10m 
  Work Description: chadrik commented on pull request #9915: [BEAM-7746] 
Add python type hints (part 1)
URL: https://github.com/apache/beam/pull/9915#discussion_r348203078
 
 

 ##
 File path: sdks/python/apache_beam/transforms/core.py
 ##
 @@ -2276,10 +2313,11 @@ def process(self, element, 
timestamp=DoFn.TimestampParam,
   yield WindowedValue(element, context.timestamp, new_windows)
 
   def __init__(self,
-   windowfn,
-   trigger=None,
+   windowfn,  # type: typing.Union[Windowing, WindowFn]
+   trigger=None,  # type: typing.Optional[TriggerFn]
accumulation_mode=None,
-   timestamp_combiner=None):
+   timestamp_combiner=None
 
 Review comment:
   done. will be included in my upcoming review commit. 
 

This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 346309)
Time Spent: 25h  (was: 24h 50m)

> Add type hints to python code
> -
>
> Key: BEAM-7746
> URL: https://issues.apache.org/jira/browse/BEAM-7746
> Project: Beam
>  Issue Type: New Feature
>  Components: sdk-py-core
>Reporter: Chad Dombrova
>Assignee: Chad Dombrova
>Priority: Major
>  Time Spent: 25h
>  Remaining Estimate: 0h
>
> As a developer of the beam source code, I would like the code to use pep484 
> type hints so that I can clearly see what types are required, get completion 
> in my IDE, and enforce code correctness via a static analyzer like mypy.
> This may be considered a precursor to BEAM-7060
> Work has been started here:  [https://github.com/apache/beam/pull/9056]
>  
>  



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Work logged] (BEAM-8645) TimestampCombiner incorrect in beam python

2019-11-19 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/BEAM-8645?focusedWorklogId=346305=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-346305
 ]

ASF GitHub Bot logged work on BEAM-8645:


Author: ASF GitHub Bot
Created on: 19/Nov/19 22:26
Start Date: 19/Nov/19 22:26
Worklog Time Spent: 10m 
  Work Description: HuangLED commented on issue #10081: [BEAM-8645] A test 
case for TimestampCombiner.
URL: https://github.com/apache/beam/pull/10081#issuecomment-555745530
 
 
   Pulled from master, and retried.  The EARLIEST case still failed, with 
error: 
   
   "apache_beam.runners.direct.executor: WARNING: A task failed with exception: 
Failed assert: [(('k', 500), Timestamp(7))] not in [(('k', 500), Timestamp(2))] 
[while running 'assert per window/Match']"
 

This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 346305)
Time Spent: 5h 20m  (was: 5h 10m)

> TimestampCombiner incorrect in beam python
> --
>
> Key: BEAM-8645
> URL: https://issues.apache.org/jira/browse/BEAM-8645
> Project: Beam
>  Issue Type: Bug
>  Components: sdk-py-core
>Reporter: Ruoyun Huang
>Priority: Major
>  Time Spent: 5h 20m
>  Remaining Estimate: 0h
>
> When we have a TimestampValue on combine: 
> {code:java}
> main_stream = (p   
> | 'main TestStream' >> TestStream()   
> .add_elements([window.TimestampedValue(('k', 100), 0)])   
> .add_elements([window.TimestampedValue(('k', 400), 9)])   
> .advance_watermark_to_infinity()   
> | 'main windowInto' >> beam.WindowInto( 
> window.FixedWindows(10),  
> timestamp_combiner=TimestampCombiner.OUTPUT_AT_LATEST)   | 
> 'Combine' >> beam.CombinePerKey(sum))
> The expect timestamp should be:
> LATEST:    (('k', 500), Timestamp(9)),
> EARLIEST:    (('k', 500), Timestamp(0)),
> END_OF_WINDOW: (('k', 500), Timestamp(10)),
> But current py streaming gives following results: 
> LATEST:    (('k', 500), Timestamp(10)),
> EARLIEST:    (('k', 500), Timestamp(10)),
> END_OF_WINDOW: (('k', 500), Timestamp(9.)),
> More details and discussions:
> https://lists.apache.org/thread.html/d3af1f2f84a2e59a747196039eae77812b78a991f0f293c717e5f4e1@%3Cdev.beam.apache.org%3E
> {code}



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


  1   2   3   4   5   >