[jira] [Commented] (BEAM-8174) BigQueryIO clustering documentation is incorrect and lacking
[ https://issues.apache.org/jira/browse/BEAM-8174?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16978167#comment-16978167 ] Alex Van Boxel commented on BEAM-8174: -- I've removed the fix version on this. > BigQueryIO clustering documentation is incorrect and lacking > > > Key: BEAM-8174 > URL: https://issues.apache.org/jira/browse/BEAM-8174 > Project: Beam > Issue Type: Bug > Components: io-java-gcp >Affects Versions: 2.15.0 >Reporter: Alex Van Boxel >Assignee: Alex Van Boxel >Priority: Trivial > Labels: documentation > Original Estimate: 2h > Remaining Estimate: 2h > > I noticed that the Javadoc of the clustering feature in BigQueryIO is more of a > copy/paste from the timestamp method. This needs to be corrected. > The Clustering option should also be added to the BigQueryIO page. -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Updated] (BEAM-8174) BigQueryIO clustering documentation is incorrect and lacking
[ https://issues.apache.org/jira/browse/BEAM-8174?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Alex Van Boxel updated BEAM-8174: - Fix Version/s: (was: 2.17.0) -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Comment Edited] (BEAM-8651) Python 3 portable pipelines sometimes fail with errors in StockUnpickler.find_class()
[ https://issues.apache.org/jira/browse/BEAM-8651?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16978155#comment-16978155 ] Valentyn Tymofieiev edited comment on BEAM-8651 at 11/20/19 7:33 AM: - [https://github.com/apache/beam/pull/10167] may be a way to address this issue in Beam plane, and unblock users. I cannot say at the moment whether broken module imports during concurrent unpickling is a known/expected behavior from Dill/CPython perspective. was (Author: tvalentyn): [https://github.com/apache/beam/pull/10167] may be a way to address this issue in Beam plane, and unblock users. I cannot say at the moment whether the race is a known/expected behavior from Dill/CPython perspective. > Python 3 portable pipelines sometimes fail with errors in > StockUnpickler.find_class() > - > > Key: BEAM-8651 > URL: https://issues.apache.org/jira/browse/BEAM-8651 > Project: Beam > Issue Type: Sub-task > Components: sdk-py-core >Reporter: Valentyn Tymofieiev >Assignee: Valentyn Tymofieiev >Priority: Blocker > Attachments: beam8651.py > > Time Spent: 0.5h > Remaining Estimate: 0h > > Several Beam users [1,2] reported an error which happens on Python 3 in > StockUnpickler.find_class. > So far I've seen reports of the error on Python 3.5, 3.6, and 3.7.1, on Flink > and Dataflow runners. On Dataflow runner so far I have seen this in streaming > pipelines only, which use portable SDK worker. 
> Typical stack trace: > {noformat} > File > "python3.5/site-packages/apache_beam/runners/worker/bundle_processor.py", > line 1148, in _create_pardo_operation > dofn_data = pickler.loads(serialized_fn) > > File "python3.5/site-packages/apache_beam/internal/pickler.py", line 265, > in loads > return dill.loads(s) > > File "python3.5/site-packages/dill/_dill.py", line 317, in loads > > return load(file, ignore) > > File "python3.5/site-packages/dill/_dill.py", line 305, in load > > obj = pik.load() > > File "python3.5/site-packages/dill/_dill.py", line 474, in find_class > > return StockUnpickler.find_class(self, module, name) > > AttributeError: Can't get attribute 'ClassName' on 'python3.5/site-packages/filename.py'> > {noformat} > According to Guenther from [1]: > {quote} > This looks exactly like a race condition that we've encountered on Python > 3.7.1: There's a bug in some older 3.7.x releases that breaks the > thread-safety of the unpickler, as concurrent unpickle threads can access a > module before it has been fully imported. See > https://bugs.python.org/issue34572 for more information. > The traceback shows a Python 3.6 venv so this could be a different issue > (the unpickle bug was introduced in version 3.7). If it's the same bug then > upgrading to Python 3.7.3 or higher should fix that issue. One potential > workaround is to ensure that all of the modules get imported during the > initialization of the sdk_worker, as this bug only affects imports done by > the unpickler. > {quote} > Opening this for visibility. Current open questions are: > 1. Find a minimal example to reproduce this issue. > 2. Figure out whether users are still affected by this issue on Python 3.7.3. > 3. Communicate workarounds for 3.5, 3.6 users affected by this. > [1] > https://lists.apache.org/thread.html/5581ddfcf6d2ae10d25b834b8a61ebee265ffbcf650c6ec8d1e69408@%3Cdev.beam.apache.org%3E -- This message was sent by Atlassian Jira (v8.3.4#803005)
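The pull request referenced in the comments (apache/beam#10167) guards pickling operations with a lock. A minimal stdlib sketch of that locking idea follows; the wrapper and lock names are illustrative, not Beam's actual pickler API:

```python
import pickle
import threading

# Process-wide lock. Concurrent unpickling can trigger module imports
# from multiple threads at once, which is the suspected race; taking a
# single lock around every load serializes those imports.
_pickle_lock = threading.Lock()

def guarded_loads(encoded):
    """Thread-safe wrapper around pickle.loads."""
    with _pickle_lock:
        return pickle.loads(encoded)

# Exercise the wrapper from several threads at once.
payload = pickle.dumps({"key": [1, 2, 3]})
results = []
threads = [threading.Thread(target=lambda: results.append(guarded_loads(payload)))
           for _ in range(8)]
for t in threads:
    t.start()
for t in threads:
    t.join()
assert results == [{"key": [1, 2, 3]}] * 8
```

The sketch only demonstrates the pattern; the actual change applies it inside Beam's internal pickler, where the `loads` call in the stack trace above lives.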
[jira] [Commented] (BEAM-8651) Python 3 portable pipelines sometimes fail with errors in StockUnpickler.find_class()
[ https://issues.apache.org/jira/browse/BEAM-8651?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16978155#comment-16978155 ] Valentyn Tymofieiev commented on BEAM-8651: --- [https://github.com/apache/beam/pull/10167] may be a way to address this issue in Beam plane, and unblock users. I cannot say at the moment whether the race is a known/expected behavior from Dill/CPython perspective. -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Work logged] (BEAM-8511) Support for enhanced fan-out in KinesisIO.Read
[ https://issues.apache.org/jira/browse/BEAM-8511?focusedWorklogId=346520=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-346520 ] ASF GitHub Bot logged work on BEAM-8511: Author: ASF GitHub Bot Created on: 20/Nov/19 07:26 Start Date: 20/Nov/19 07:26 Worklog Time Spent: 10m Work Description: jfarr commented on issue #9899: [BEAM-8511] [WIP] KinesisIO.Read enhanced fanout URL: https://github.com/apache/beam/pull/9899#issuecomment-555874874 @aromanenko-dev thank you for the feedback. I’ll work on getting those changes in. I’m sorry, the last week or so has been very busy for me. I will be on vacation from my day job all of next week so I hope to wrap up this one and #9765 as much as I can. I have been testing this against our production data stream for a while now and functionally it’s been working very well but I am still seeing some unexpected latencies at higher throughputs and working to get to the bottom of that. I’ll try to post some info about that as well. This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org Issue Time Tracking --- Worklog Id: (was: 346520) Time Spent: 2h 40m (was: 2.5h) > Support for enhanced fan-out in KinesisIO.Read > -- > > Key: BEAM-8511 > URL: https://issues.apache.org/jira/browse/BEAM-8511 > Project: Beam > Issue Type: New Feature > Components: io-java-kinesis >Reporter: Jonothan Farr >Assignee: Jonothan Farr >Priority: Major > Time Spent: 2h 40m > Remaining Estimate: 0h > > Add support for reading from an enhanced fan-out consumer using KinesisIO. -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Work logged] (BEAM-8651) Python 3 portable pipelines sometimes fail with errors in StockUnpickler.find_class()
[ https://issues.apache.org/jira/browse/BEAM-8651?focusedWorklogId=346519=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-346519 ] ASF GitHub Bot logged work on BEAM-8651: Author: ASF GitHub Bot Created on: 20/Nov/19 07:25 Start Date: 20/Nov/19 07:25 Worklog Time Spent: 10m Work Description: tvalentyn commented on issue #10167: [BEAM-8651] Guard pickling operations with a lock to prevent race condition in module imports. URL: https://github.com/apache/beam/pull/10167#issuecomment-555874590 Run Python 3.5 PostCommit Issue Time Tracking --- Worklog Id: (was: 346519) Time Spent: 0.5h (was: 20m) -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Work logged] (BEAM-8651) Python 3 portable pipelines sometimes fail with errors in StockUnpickler.find_class()
[ https://issues.apache.org/jira/browse/BEAM-8651?focusedWorklogId=346518=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-346518 ] ASF GitHub Bot logged work on BEAM-8651: Author: ASF GitHub Bot Created on: 20/Nov/19 07:25 Start Date: 20/Nov/19 07:25 Worklog Time Spent: 10m Work Description: tvalentyn commented on issue #10167: [BEAM-8651] Guard pickling operations with a lock to prevent race condition in module imports. URL: https://github.com/apache/beam/pull/10167#issuecomment-555874543 Run Python 2.7 PostCommit Issue Time Tracking --- Worklog Id: (was: 346518) Time Spent: 20m (was: 10m) -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Work logged] (BEAM-8651) Python 3 portable pipelines sometimes fail with errors in StockUnpickler.find_class()
[ https://issues.apache.org/jira/browse/BEAM-8651?focusedWorklogId=346516=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-346516 ] ASF GitHub Bot logged work on BEAM-8651: Author: ASF GitHub Bot Created on: 20/Nov/19 07:21 Start Date: 20/Nov/19 07:21 Worklog Time Spent: 10m Work Description: tvalentyn commented on pull request #10167: [BEAM-8651] Guard pickling operations with a lock to prevent race condition in module imports. URL: https://github.com/apache/beam/pull/10167 [Post-Commit Tests Status table (Jenkins build badges for the Go, Java, and Python SDKs on master) omitted]
[jira] [Work logged] (BEAM-8568) Local file system does not match relative path with wildcards
[ https://issues.apache.org/jira/browse/BEAM-8568?focusedWorklogId=346509=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-346509 ] ASF GitHub Bot logged work on BEAM-8568: Author: ASF GitHub Bot Created on: 20/Nov/19 07:15 Start Date: 20/Nov/19 07:15 Worklog Time Spent: 10m Work Description: dmvk commented on issue #10028: [BEAM-8568] Fixed problem that LocalFileSystem no longer supports wil… URL: https://github.com/apache/beam/pull/10028#issuecomment-555871767 Run Java_Examples_Dataflow PreCommit Issue Time Tracking --- Worklog Id: (was: 346509) Time Spent: 3h 40m (was: 3.5h) > Local file system does not match relative path with wildcards > - > > Key: BEAM-8568 > URL: https://issues.apache.org/jira/browse/BEAM-8568 > Project: Beam > Issue Type: Bug > Components: sdk-java-core >Affects Versions: 2.16.0 >Reporter: Ondrej Cerny >Assignee: David Moravek >Priority: Major > Fix For: 2.17.0 > > Time Spent: 3h 40m > Remaining Estimate: 0h > > CWD structure: > {code} > src/test/resources/input/sometestfile.txt > {code} > > Code: > {code:java} > input > .apply(Create.of("src/test/resources/input/*")) > .apply(FileIO.matchAll()) > .apply(FileIO.readMatches()) > {code} > The code above doesn't match any file starting with Beam 2.16.0. The regression > has been introduced in BEAM-7854. -- This message was sent by Atlassian Jira (v8.3.4#803005)
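The relative-wildcard behavior the issue expects can be illustrated with a stdlib sketch, with plain Python glob standing in for Beam's Java LocalFileSystem; the directory layout mirrors the report:

```python
import glob
import os
import tempfile

# Recreate the reported CWD layout in a scratch directory, then check
# that a relative pattern with a trailing wildcard matches the file.
with tempfile.TemporaryDirectory() as root:
    os.makedirs(os.path.join(root, "src", "test", "resources", "input"))
    open(os.path.join(root, "src", "test", "resources", "input",
                      "sometestfile.txt"), "w").close()
    old_cwd = os.getcwd()
    os.chdir(root)
    try:
        matches = glob.glob("src/test/resources/input/*")
    finally:
        os.chdir(old_cwd)

# A relative glob resolved against the CWD should find the file; the
# regression is that LocalFileSystem stopped doing the equivalent.
assert matches == ["src/test/resources/input/sometestfile.txt"]
```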
[jira] [Work logged] (BEAM-8568) Local file system does not match relative path with wildcards
[ https://issues.apache.org/jira/browse/BEAM-8568?focusedWorklogId=346508=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-346508 ] ASF GitHub Bot logged work on BEAM-8568: Author: ASF GitHub Bot Created on: 20/Nov/19 07:14 Start Date: 20/Nov/19 07:14 Worklog Time Spent: 10m Work Description: dmvk commented on issue #10028: [BEAM-8568] Fixed problem that LocalFileSystem no longer supports wil… URL: https://github.com/apache/beam/pull/10028#issuecomment-555871445 @kennknowles The tests always end up in the same state on this branch, e.g. ``` 16:37:51 org.apache.beam.examples.WindowedWordCountIT > testWindowedWordCountInStreamingStaticSharding FAILED 16:37:51 java.lang.RuntimeException at WindowedWordCountIT.java:188 17:20:15 17:20:15 org.apache.beam.examples.WordCountIT > testE2EWordCount FAILED 17:20:15 java.lang.RuntimeException at WordCountIT.java:69 17:40:50 17:40:50 org.apache.beam.examples.WindowedWordCountIT > testWindowedWordCountInBatchStaticSharding FAILED 17:40:50 java.lang.RuntimeException at WindowedWordCountIT.java:188 18:15:28 Build timed out (after 120 minutes). Marking the build as aborted. 18:15:28 Build was aborted 18:15:28 Recording test results ``` When I checked the source code of these tests, they seem to have their input and output configured as `gcs://` paths. When I try to run them locally I end up with... 
``` com.google.api.client.http.HttpResponseException: 400 Bad Request { "error": "invalid_grant", "error_description": "Bad Request" } at com.google.api.client.http.HttpRequest.execute(HttpRequest.java:1102) at com.google.auth.oauth2.UserCredentials.refreshAccessToken(UserCredentials.java:227) at com.google.auth.oauth2.OAuth2Credentials.refresh(OAuth2Credentials.java:181) at com.google.auth.oauth2.OAuth2Credentials.getRequestMetadata(OAuth2Credentials.java:167) at com.google.auth.http.HttpCredentialsAdapter.initialize(HttpCredentialsAdapter.java:96) at com.google.cloud.hadoop.util.ChainingHttpRequestInitializer.initialize(ChainingHttpRequestInitializer.java:52) at com.google.api.client.http.HttpRequestFactory.buildRequest(HttpRequestFactory.java:93) at com.google.api.client.googleapis.services.AbstractGoogleClientRequest.buildHttpRequest(AbstractGoogleClientRequest.java:381) at com.google.api.client.googleapis.services.AbstractGoogleClientRequest.buildHttpRequest(AbstractGoogleClientRequest.java:357) at com.google.api.client.googleapis.services.AbstractGoogleClientRequest.queue(AbstractGoogleClientRequest.java:662) at com.google.api.client.googleapis.services.json.AbstractGoogleJsonClientRequest.queue(AbstractGoogleJsonClientRequest.java:108) at org.apache.beam.sdk.extensions.gcp.util.GcsUtil.enqueueGetFileSize(GcsUtil.java:755) at org.apache.beam.sdk.extensions.gcp.util.GcsUtil.makeGetBatches(GcsUtil.java:608) ``` which also implies that they are using `gcs` instead of the local fs. Btw, this doesn't fail on the master branch: [jenkins](https://builds.apache.org/job/beam_PreCommit_Java_Examples_Dataflow_Commit/5443/) [PR](https://github.com/apache/beam/pull/10024) What should I do next? I don't think this is OK to merge, as the failures seem to be deterministic. 
Issue Time Tracking --- Worklog Id: (was: 346508) Time Spent: 3.5h (was: 3h 20m) > Local file system does not match relative path with wildcards > - > > Key: BEAM-8568 > URL: https://issues.apache.org/jira/browse/BEAM-8568 > Project: Beam > Issue Type: Bug > Components: sdk-java-core >Affects Versions: 2.16.0 >Reporter: Ondrej Cerny >Assignee: David Moravek >Priority: Major > Fix For: 2.17.0 > > Time Spent: 3.5h > Remaining Estimate: 0h > > CWD structure: > {code} > src/test/resources/input/sometestfile.txt > {code} > > Code: > {code:java} > input > .apply(Create.of("src/test/resources/input/*")) > .apply(FileIO.matchAll()) > .apply(FileIO.readMatches()) > {code} > The code above doesn't match any file starting with Beam 2.16.0. The regression > has been introduced in BEAM-7854. -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Work logged] (BEAM-8568) Local file system does not match relative path with wildcards
[ https://issues.apache.org/jira/browse/BEAM-8568?focusedWorklogId=346506=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-346506 ] ASF GitHub Bot logged work on BEAM-8568: Author: ASF GitHub Bot Created on: 20/Nov/19 07:13 Start Date: 20/Nov/19 07:13 Worklog Time Spent: 10m Work Description: dmvk commented on issue #10028: [BEAM-8568] Fixed problem that LocalFileSystem no longer supports wil… URL: https://github.com/apache/beam/pull/10028#issuecomment-555871445 @kennknowles The tests always end up with the same state on this branch. eg. ``` 16:37:51 org.apache.beam.examples.WindowedWordCountIT > testWindowedWordCountInStreamingStaticSharding FAILED 16:37:51 java.lang.RuntimeException at WindowedWordCountIT.java:188 17:20:15 17:20:15 org.apache.beam.examples.WordCountIT > testE2EWordCount FAILED 17:20:15 java.lang.RuntimeException at WordCountIT.java:69 17:40:50 17:40:50 org.apache.beam.examples.WindowedWordCountIT > testWindowedWordCountInBatchStaticSharding FAILED 17:40:50 java.lang.RuntimeException at WindowedWordCountIT.java:188 18:15:28 Build timed out (after 120 minutes). Marking the build as aborted. 18:15:28 Build was aborted 18:15:28 Recording test results ``` When I checked the source code of these tests, they seem configured have input and output configured `gcs://` paths. When I try to run them locally I end up with... 
``` com.google.api.client.http.HttpResponseException: 400 Bad Request { "error": "invalid_grant", "error_description": "Bad Request" } at com.google.api.client.http.HttpRequest.execute(HttpRequest.java:1102) at com.google.auth.oauth2.UserCredentials.refreshAccessToken(UserCredentials.java:227) at com.google.auth.oauth2.OAuth2Credentials.refresh(OAuth2Credentials.java:181) at com.google.auth.oauth2.OAuth2Credentials.getRequestMetadata(OAuth2Credentials.java:167) at com.google.auth.http.HttpCredentialsAdapter.initialize(HttpCredentialsAdapter.java:96) at com.google.cloud.hadoop.util.ChainingHttpRequestInitializer.initialize(ChainingHttpRequestInitializer.java:52) at com.google.api.client.http.HttpRequestFactory.buildRequest(HttpRequestFactory.java:93) at com.google.api.client.googleapis.services.AbstractGoogleClientRequest.buildHttpRequest(AbstractGoogleClientRequest.java:381) at com.google.api.client.googleapis.services.AbstractGoogleClientRequest.buildHttpRequest(AbstractGoogleClientRequest.java:357) at com.google.api.client.googleapis.services.AbstractGoogleClientRequest.queue(AbstractGoogleClientRequest.java:662) at com.google.api.client.googleapis.services.json.AbstractGoogleJsonClientRequest.queue(AbstractGoogleJsonClientRequest.java:108) at org.apache.beam.sdk.extensions.gcp.util.GcsUtil.enqueueGetFileSize(GcsUtil.java:755) at org.apache.beam.sdk.extensions.gcp.util.GcsUtil.makeGetBatches(GcsUtil.java:608) ``` which also implies that they are using `gcs` instead of local fs. Btw, this doesn't fail on Master branch: (jenkins)[https://builds.apache.org/job/beam_PreCommit_Java_Examples_Dataflow_Commit/5443/], (PR)[https://github.com/apache/beam/pull/10024] What should I do next? I'm don't think this is ok to merge as the failure seem to be deterministic. This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. 
For queries about this service, please contact Infrastructure at: us...@infra.apache.org Issue Time Tracking --- Worklog Id: (was: 346506) Time Spent: 3h 20m (was: 3h 10m) > Local file system does not match relative path with wildcards > - > > Key: BEAM-8568 > URL: https://issues.apache.org/jira/browse/BEAM-8568 > Project: Beam > Issue Type: Bug > Components: sdk-java-core >Affects Versions: 2.16.0 >Reporter: Ondrej Cerny >Assignee: David Moravek >Priority: Major > Fix For: 2.17.0 > > Time Spent: 3h 20m > Remaining Estimate: 0h > > CWD structure: > {code} > src/test/resources/input/sometestfile.txt > {code} > > Code: > {code:java} > input > .apply(Create.of("src/test/resources/input/*")) > .apply(FileIO.matchAll()) > .apply(FileIO.readMatches()) > {code} > The code above doesn't match any file starting with Beam 2.16.0. The regression > was introduced in BEAM-7854. -- This message was sent by Atlassian Jira (v8.3.4#803005)
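The wildcard-matching behavior the report expects from the local filesystem can be illustrated outside Beam with Python's standard-library `glob`. This is an illustrative sketch only (not Beam code); the directory layout mirrors the report's CWD structure:

```python
import glob
import os
import tempfile

def match_relative_wildcard(pattern):
    """Return sorted paths matching a (possibly relative) glob pattern."""
    return sorted(glob.glob(pattern))

# Recreate the CWD structure from the report:
# src/test/resources/input/sometestfile.txt
root = tempfile.mkdtemp()
input_dir = os.path.join(root, "src", "test", "resources", "input")
os.makedirs(input_dir)
open(os.path.join(input_dir, "sometestfile.txt"), "w").close()

cwd = os.getcwd()
os.chdir(root)
try:
    # A relative pattern with a trailing wildcard should match the file,
    # which is the behavior the report says regressed in Beam 2.16.0.
    matches = match_relative_wildcard("src/test/resources/input/*")
finally:
    os.chdir(cwd)

print(matches)  # ['src/test/resources/input/sometestfile.txt']
```

The point of the sketch is only the expected contract: a relative pattern is resolved against the current working directory, not treated as unmatched.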
[jira] [Commented] (BEAM-8384) Spark runner is not respecting spark.default.parallelism user defined configuration
[ https://issues.apache.org/jira/browse/BEAM-8384?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16978142#comment-16978142 ] Ismaël Mejía commented on BEAM-8384: I have forgotten about this one. Looks like a regression but not convinced it is a blocker. I will move the Fix version tag to unblock the release and eventually send a cherry pick if still in time. > Spark runner is not respecting spark.default.parallelism user defined > configuration > --- > > Key: BEAM-8384 > URL: https://issues.apache.org/jira/browse/BEAM-8384 > Project: Beam > Issue Type: Bug > Components: runner-spark >Affects Versions: 2.16.0 >Reporter: Ismaël Mejía >Assignee: Ismaël Mejía >Priority: Major > > It was reported in [the mailing > list|https://lists.apache.org/thread.html/792fb7fc2a5113837fbcdafce6a5d9100309881b366c1a7163d2c898@%3Cdev.beam.apache.org%3E] > that the Spark runner is not respecting the user defined Spark default > parallelism configuration. We should investigate and if it is the case ensure > that a user defined configuration is always respected. Runner optimizations > should apply only for default (unconfigured) values otherwise we will confuse > users and limit them from parametrizing Spark for their best convenience. -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Updated] (BEAM-8384) Spark runner is not respecting spark.default.parallelism user defined configuration
[ https://issues.apache.org/jira/browse/BEAM-8384?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Ismaël Mejía updated BEAM-8384: --- Fix Version/s: (was: 2.17.0) > Spark runner is not respecting spark.default.parallelism user defined > configuration > --- > > Key: BEAM-8384 > URL: https://issues.apache.org/jira/browse/BEAM-8384 > Project: Beam > Issue Type: Bug > Components: runner-spark >Affects Versions: 2.16.0 >Reporter: Ismaël Mejía >Assignee: Ismaël Mejía >Priority: Major > > It was reported in [the mailing > list|https://lists.apache.org/thread.html/792fb7fc2a5113837fbcdafce6a5d9100309881b366c1a7163d2c898@%3Cdev.beam.apache.org%3E] > that the Spark runner is not respecting the user defined Spark default > parallelism configuration. We should investigate and if it is the case ensure > that a user defined configuration is always respected. Runner optimizations > should apply only for default (unconfigured) values otherwise we will confuse > users and limit them from parametrizing Spark for their best convenience. -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Work logged] (BEAM-8568) Local file system does not match relative path with wildcards
[ https://issues.apache.org/jira/browse/BEAM-8568?focusedWorklogId=346493&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-346493 ] ASF GitHub Bot logged work on BEAM-8568: Author: ASF GitHub Bot Created on: 20/Nov/19 06:34 Start Date: 20/Nov/19 06:34 Worklog Time Spent: 10m Work Description: kennknowles commented on issue #10028: [BEAM-8568] Fixed problem that LocalFileSystem no longer supports wil… URL: https://github.com/apache/beam/pull/10028#issuecomment-555860949 Looks good to me: https://gradle.com/s/c47e35q47qzpq This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org Issue Time Tracking --- Worklog Id: (was: 346493) Time Spent: 3h 10m (was: 3h) > Local file system does not match relative path with wildcards > - > > Key: BEAM-8568 > URL: https://issues.apache.org/jira/browse/BEAM-8568 > Project: Beam > Issue Type: Bug > Components: sdk-java-core >Affects Versions: 2.16.0 >Reporter: Ondrej Cerny >Assignee: David Moravek >Priority: Major > Fix For: 2.17.0 > > Time Spent: 3h 10m > Remaining Estimate: 0h > > CWD structure: > {code} > src/test/resources/input/sometestfile.txt > {code} > > Code: > {code:java} > input > .apply(Create.of("src/test/resources/input/*")) > .apply(FileIO.matchAll()) > .apply(FileIO.readMatches()) > {code} > The code above doesn't match any file starting with Beam 2.16.0. The regression > was introduced in BEAM-7854. -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Work logged] (BEAM-8645) TimestampCombiner incorrect in beam python
[ https://issues.apache.org/jira/browse/BEAM-8645?focusedWorklogId=346492=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-346492 ] ASF GitHub Bot logged work on BEAM-8645: Author: ASF GitHub Bot Created on: 20/Nov/19 06:33 Start Date: 20/Nov/19 06:33 Worklog Time Spent: 10m Work Description: HuangLED commented on pull request #10143: [BEAM-8645] To test state backed iterable coder in py sdk. URL: https://github.com/apache/beam/pull/10143#discussion_r348309394 ## File path: sdks/python/apache_beam/runners/portability/fn_api_runner_test.py ## @@ -1579,6 +1580,27 @@ def test_lull_logging(self): '.*There has been a processing lull of over.*', 'Unable to find a lull logged for this job.') +@attr('ValidatesRunner') +class FnApiBasedStateBackedCoderTest(unittest.TestCase): + def create_pipeline(self): +return beam.Pipeline( +runner=fn_api_runner.FnApiRunner(use_state_iterables=True)) + + def test_state_backed_coder(self): +class MyDoFn(beam.DoFn): + def process(self, gbk_result): +value_list = gbk_result[1] +return (gbk_result[0], sum(value_list)) + +with self.create_pipeline() as p: + # The number of integers could be a knob to test against + # different runners' default settings on page size. + main = (p | 'main' >> beam.Create([('a', 1) for _ in range(0, 2)]) + | 'GBK' >> beam.GroupByKey() + | 'Sum' >> beam.ParDo(MyDoFn())) Review comment: Maybe we can have both. Would you please explain how to control some values 'alive' while others in the same iterable not? This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. 
For queries about this service, please contact Infrastructure at: us...@infra.apache.org Issue Time Tracking --- Worklog Id: (was: 346492) Time Spent: 6h 40m (was: 6.5h) > TimestampCombiner incorrect in beam python > -- > > Key: BEAM-8645 > URL: https://issues.apache.org/jira/browse/BEAM-8645 > Project: Beam > Issue Type: Bug > Components: sdk-py-core >Reporter: Ruoyun Huang >Priority: Major > Time Spent: 6h 40m > Remaining Estimate: 0h > > When we have a TimestampedValue on combine: > {code:java} > main_stream = (p > | 'main TestStream' >> TestStream() > .add_elements([window.TimestampedValue(('k', 100), 0)]) > .add_elements([window.TimestampedValue(('k', 400), 9)]) > .advance_watermark_to_infinity() > | 'main windowInto' >> beam.WindowInto( > window.FixedWindows(10), > timestamp_combiner=TimestampCombiner.OUTPUT_AT_LATEST) | > 'Combine' >> beam.CombinePerKey(sum)) > The expected timestamp should be: > LATEST: (('k', 500), Timestamp(9)), > EARLIEST: (('k', 500), Timestamp(0)), > END_OF_WINDOW: (('k', 500), Timestamp(10)), > But the current py streaming gives the following results: > LATEST: (('k', 500), Timestamp(10)), > EARLIEST: (('k', 500), Timestamp(10)), > END_OF_WINDOW: (('k', 500), Timestamp(9.)), > More details and discussions: > https://lists.apache.org/thread.html/d3af1f2f84a2e59a747196039eae77812b78a991f0f293c717e5f4e1@%3Cdev.beam.apache.org%3E > {code} -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Work logged] (BEAM-8645) TimestampCombiner incorrect in beam python
[ https://issues.apache.org/jira/browse/BEAM-8645?focusedWorklogId=346490=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-346490 ] ASF GitHub Bot logged work on BEAM-8645: Author: ASF GitHub Bot Created on: 20/Nov/19 06:28 Start Date: 20/Nov/19 06:28 Worklog Time Spent: 10m Work Description: HuangLED commented on pull request #10143: [BEAM-8645] To test state backed iterable coder in py sdk. URL: https://github.com/apache/beam/pull/10143#discussion_r348309394 ## File path: sdks/python/apache_beam/runners/portability/fn_api_runner_test.py ## @@ -1579,6 +1580,27 @@ def test_lull_logging(self): '.*There has been a processing lull of over.*', 'Unable to find a lull logged for this job.') +@attr('ValidatesRunner') +class FnApiBasedStateBackedCoderTest(unittest.TestCase): + def create_pipeline(self): +return beam.Pipeline( +runner=fn_api_runner.FnApiRunner(use_state_iterables=True)) + + def test_state_backed_coder(self): +class MyDoFn(beam.DoFn): + def process(self, gbk_result): +value_list = gbk_result[1] +return (gbk_result[0], sum(value_list)) + +with self.create_pipeline() as p: + # The number of integers could be a knob to test against + # different runners' default settings on page size. + main = (p | 'main' >> beam.Create([('a', 1) for _ in range(0, 2)]) + | 'GBK' >> beam.GroupByKey() + | 'Sum' >> beam.ParDo(MyDoFn())) Review comment: Maybe we can keep both. Would you please explain how to control some values 'alive' while others in the same iterable not? This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. 
For queries about this service, please contact Infrastructure at: us...@infra.apache.org Issue Time Tracking --- Worklog Id: (was: 346490) Time Spent: 6.5h (was: 6h 20m) > TimestampCombiner incorrect in beam python > -- > > Key: BEAM-8645 > URL: https://issues.apache.org/jira/browse/BEAM-8645 > Project: Beam > Issue Type: Bug > Components: sdk-py-core >Reporter: Ruoyun Huang >Priority: Major > Time Spent: 6.5h > Remaining Estimate: 0h > > When we have a TimestampedValue on combine: > {code:java} > main_stream = (p > | 'main TestStream' >> TestStream() > .add_elements([window.TimestampedValue(('k', 100), 0)]) > .add_elements([window.TimestampedValue(('k', 400), 9)]) > .advance_watermark_to_infinity() > | 'main windowInto' >> beam.WindowInto( > window.FixedWindows(10), > timestamp_combiner=TimestampCombiner.OUTPUT_AT_LATEST) | > 'Combine' >> beam.CombinePerKey(sum)) > The expected timestamp should be: > LATEST: (('k', 500), Timestamp(9)), > EARLIEST: (('k', 500), Timestamp(0)), > END_OF_WINDOW: (('k', 500), Timestamp(10)), > But the current py streaming gives the following results: > LATEST: (('k', 500), Timestamp(10)), > EARLIEST: (('k', 500), Timestamp(10)), > END_OF_WINDOW: (('k', 500), Timestamp(9.)), > More details and discussions: > https://lists.apache.org/thread.html/d3af1f2f84a2e59a747196039eae77812b78a991f0f293c717e5f4e1@%3Cdev.beam.apache.org%3E > {code} -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Work logged] (BEAM-8645) TimestampCombiner incorrect in beam python
[ https://issues.apache.org/jira/browse/BEAM-8645?focusedWorklogId=346489=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-346489 ] ASF GitHub Bot logged work on BEAM-8645: Author: ASF GitHub Bot Created on: 20/Nov/19 06:28 Start Date: 20/Nov/19 06:28 Worklog Time Spent: 10m Work Description: HuangLED commented on pull request #10143: [BEAM-8645] To test state backed iterable coder in py sdk. URL: https://github.com/apache/beam/pull/10143#discussion_r348309394 ## File path: sdks/python/apache_beam/runners/portability/fn_api_runner_test.py ## @@ -1579,6 +1580,27 @@ def test_lull_logging(self): '.*There has been a processing lull of over.*', 'Unable to find a lull logged for this job.') +@attr('ValidatesRunner') +class FnApiBasedStateBackedCoderTest(unittest.TestCase): + def create_pipeline(self): +return beam.Pipeline( +runner=fn_api_runner.FnApiRunner(use_state_iterables=True)) + + def test_state_backed_coder(self): +class MyDoFn(beam.DoFn): + def process(self, gbk_result): +value_list = gbk_result[1] +return (gbk_result[0], sum(value_list)) + +with self.create_pipeline() as p: + # The number of integers could be a knob to test against + # different runners' default settings on page size. + main = (p | 'main' >> beam.Create([('a', 1) for _ in range(0, 2)]) + | 'GBK' >> beam.GroupByKey() + | 'Sum' >> beam.ParDo(MyDoFn())) Review comment: Maybe let us keep both. Would you please explain how to control some values 'alive' while others in the same iterable not? This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. 
For queries about this service, please contact Infrastructure at: us...@infra.apache.org Issue Time Tracking --- Worklog Id: (was: 346489) Time Spent: 6h 20m (was: 6h 10m) > TimestampCombiner incorrect in beam python > -- > > Key: BEAM-8645 > URL: https://issues.apache.org/jira/browse/BEAM-8645 > Project: Beam > Issue Type: Bug > Components: sdk-py-core >Reporter: Ruoyun Huang >Priority: Major > Time Spent: 6h 20m > Remaining Estimate: 0h > > When we have a TimestampedValue on combine: > {code:java} > main_stream = (p > | 'main TestStream' >> TestStream() > .add_elements([window.TimestampedValue(('k', 100), 0)]) > .add_elements([window.TimestampedValue(('k', 400), 9)]) > .advance_watermark_to_infinity() > | 'main windowInto' >> beam.WindowInto( > window.FixedWindows(10), > timestamp_combiner=TimestampCombiner.OUTPUT_AT_LATEST) | > 'Combine' >> beam.CombinePerKey(sum)) > The expected timestamp should be: > LATEST: (('k', 500), Timestamp(9)), > EARLIEST: (('k', 500), Timestamp(0)), > END_OF_WINDOW: (('k', 500), Timestamp(10)), > But the current py streaming gives the following results: > LATEST: (('k', 500), Timestamp(10)), > EARLIEST: (('k', 500), Timestamp(10)), > END_OF_WINDOW: (('k', 500), Timestamp(9.)), > More details and discussions: > https://lists.apache.org/thread.html/d3af1f2f84a2e59a747196039eae77812b78a991f0f293c717e5f4e1@%3Cdev.beam.apache.org%3E > {code} -- This message was sent by Atlassian Jira (v8.3.4#803005)
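The three TimestampCombiner modes debated in BEAM-8645 can be modeled in plain Python. This is an assumed sketch of the semantics as given by the report's *expected* values (elements at timestamps 0 and 9 in a fixed window of size 10), not Beam's actual implementation:

```python
def combined_timestamp(mode, element_timestamps, window_end):
    """Output timestamp for a combined value under the expected semantics."""
    if mode == "EARLIEST":
        # Combined output carries the earliest input timestamp.
        return min(element_timestamps)
    if mode == "LATEST":
        # Combined output carries the latest input timestamp.
        return max(element_timestamps)
    if mode == "END_OF_WINDOW":
        # Combined output carries the window's end.
        return window_end
    raise ValueError("unknown TimestampCombiner mode: %s" % mode)

# Elements at timestamps 0 and 9 inside FixedWindows(10), as in the report:
ts = [0, 9]
print(combined_timestamp("LATEST", ts, 10))         # 9
print(combined_timestamp("EARLIEST", ts, 10))       # 0
print(combined_timestamp("END_OF_WINDOW", ts, 10))  # 10
```

The bug report is precisely that the Python streaming runner's actual outputs (10, 10, and 9.x respectively) disagree with these expected values.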
[jira] [Work logged] (BEAM-7390) Colab examples for aggregation transforms (Python)
[ https://issues.apache.org/jira/browse/BEAM-7390?focusedWorklogId=346481=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-346481 ] ASF GitHub Bot logged work on BEAM-7390: Author: ASF GitHub Bot Created on: 20/Nov/19 06:19 Start Date: 20/Nov/19 06:19 Worklog Time Spent: 10m Work Description: davidcavazos commented on pull request #9922: [BEAM-7390] Add code snippets for CombineValues URL: https://github.com/apache/beam/pull/9922#discussion_r348307381 ## File path: sdks/python/apache_beam/examples/snippets/transforms/aggregation/combinevalues.py ## @@ -0,0 +1,246 @@ +# coding=utf-8 Review comment: Changed the examples for more unique use cases This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org Issue Time Tracking --- Worklog Id: (was: 346481) Time Spent: 40m (was: 0.5h) > Colab examples for aggregation transforms (Python) > -- > > Key: BEAM-7390 > URL: https://issues.apache.org/jira/browse/BEAM-7390 > Project: Beam > Issue Type: Improvement > Components: website >Reporter: Rose Nguyen >Assignee: David Cavazos >Priority: Minor > Time Spent: 40m > Remaining Estimate: 0h > > Merge aggregation Colabs into the transform catalog -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Work logged] (BEAM-7390) Colab examples for aggregation transforms (Python)
[ https://issues.apache.org/jira/browse/BEAM-7390?focusedWorklogId=346479=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-346479 ] ASF GitHub Bot logged work on BEAM-7390: Author: ASF GitHub Bot Created on: 20/Nov/19 06:19 Start Date: 20/Nov/19 06:19 Worklog Time Spent: 10m Work Description: davidcavazos commented on pull request #9920: [BEAM-7390] Add code snippets for CombineGlobally URL: https://github.com/apache/beam/pull/9920#discussion_r348307273 ## File path: sdks/python/apache_beam/examples/snippets/transforms/aggregation/combineglobally.py ## @@ -0,0 +1,214 @@ +# coding=utf-8 Review comment: I changed the examples for the Combine* transforms and GBK. Hopefully, they make more sense now. This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org Issue Time Tracking --- Worklog Id: (was: 346479) Time Spent: 0.5h (was: 20m) > Colab examples for aggregation transforms (Python) > -- > > Key: BEAM-7390 > URL: https://issues.apache.org/jira/browse/BEAM-7390 > Project: Beam > Issue Type: Improvement > Components: website >Reporter: Rose Nguyen >Assignee: David Cavazos >Priority: Minor > Time Spent: 0.5h > Remaining Estimate: 0h > > Merge aggregation Colabs into the transform catalog -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Work logged] (BEAM-8568) Local file system does not match relative path with wildcards
[ https://issues.apache.org/jira/browse/BEAM-8568?focusedWorklogId=346477&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-346477 ] ASF GitHub Bot logged work on BEAM-8568: Author: ASF GitHub Bot Created on: 20/Nov/19 06:04 Start Date: 20/Nov/19 06:04 Worklog Time Spent: 10m Work Description: kennknowles commented on issue #10028: [BEAM-8568] Fixed problem that LocalFileSystem no longer supports wil… URL: https://github.com/apache/beam/pull/10028#issuecomment-555853543 Where are you seeing the logs indicating GCS? What I see in the logs is entirely not useful. I can certainly try to reproduce this. This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org Issue Time Tracking --- Worklog Id: (was: 346477) Time Spent: 3h (was: 2h 50m) > Local file system does not match relative path with wildcards > - > > Key: BEAM-8568 > URL: https://issues.apache.org/jira/browse/BEAM-8568 > Project: Beam > Issue Type: Bug > Components: sdk-java-core >Affects Versions: 2.16.0 >Reporter: Ondrej Cerny >Assignee: David Moravek >Priority: Major > Fix For: 2.17.0 > > Time Spent: 3h > Remaining Estimate: 0h > > CWD structure: > {code} > src/test/resources/input/sometestfile.txt > {code} > > Code: > {code:java} > input > .apply(Create.of("src/test/resources/input/*")) > .apply(FileIO.matchAll()) > .apply(FileIO.readMatches()) > {code} > The code above doesn't match any file starting with Beam 2.16.0. The regression > was introduced in BEAM-7854. -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Work logged] (BEAM-8740) TestPubsub ignores timeout
[ https://issues.apache.org/jira/browse/BEAM-8740?focusedWorklogId=346474=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-346474 ] ASF GitHub Bot logged work on BEAM-8740: Author: ASF GitHub Bot Created on: 20/Nov/19 05:46 Start Date: 20/Nov/19 05:46 Worklog Time Spent: 10m Work Description: kennknowles commented on pull request #10153: [BEAM-8740] TestPubsub ignores timeout URL: https://github.com/apache/beam/pull/10153#discussion_r348300443 ## File path: sdks/java/io/google-cloud-platform/src/main/java/org/apache/beam/sdk/io/gcp/pubsub/TestPubsub.java ## @@ -225,7 +227,7 @@ public void publish(List messages) throws IOException { receivedMessages.addAll(pull(n - receivedMessages.size())); while (receivedMessages.size() < n -&& Seconds.secondsBetween(new DateTime(), startTime).getSeconds() < timeoutSeconds) { +&& Seconds.secondsBetween(startTime, new DateTime()).getSeconds() < timeoutSeconds) { Review comment: ouch This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org Issue Time Tracking --- Worklog Id: (was: 346474) Remaining Estimate: 0h Time Spent: 10m > TestPubsub ignores timeout > -- > > Key: BEAM-8740 > URL: https://issues.apache.org/jira/browse/BEAM-8740 > Project: Beam > Issue Type: Bug > Components: testing >Reporter: Brian Hulette >Assignee: Brian Hulette >Priority: Major > Fix For: 2.18.0 > > Time Spent: 10m > Remaining Estimate: 0h > -- This message was sent by Atlassian Jira (v8.3.4#803005)
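The fix in the diff above swaps the arguments to `Seconds.secondsBetween`: Joda-Time measures the interval from the first argument to the second, so the reversed order yields a negative elapsed time and the `< timeoutSeconds` loop condition never becomes false. A minimal Python model of the bug (an illustrative sketch, not the Beam code):

```python
import datetime

def should_keep_waiting(start, now, timeout_seconds, reversed_args=False):
    """Model the loop condition 'elapsed < timeoutSeconds' from TestPubsub."""
    if reversed_args:
        # Buggy form, secondsBetween(now, start): elapsed is negative,
        # so the condition is always true and the timeout is ignored.
        elapsed = (start - now).total_seconds()
    else:
        # Fixed form, secondsBetween(start, now): elapsed grows over time.
        elapsed = (now - start).total_seconds()
    return elapsed < timeout_seconds

start = datetime.datetime(2019, 11, 20, 5, 46, 0)
two_minutes_later = start + datetime.timedelta(seconds=120)

print(should_keep_waiting(start, two_minutes_later, 60))                      # False: timed out
print(should_keep_waiting(start, two_minutes_later, 60, reversed_args=True))  # True: timeout ignored
```

With a 60-second timeout and 120 seconds elapsed, only the corrected argument order lets the wait loop terminate.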
[jira] [Work logged] (BEAM-8740) TestPubsub ignores timeout
[ https://issues.apache.org/jira/browse/BEAM-8740?focusedWorklogId=346475=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-346475 ] ASF GitHub Bot logged work on BEAM-8740: Author: ASF GitHub Bot Created on: 20/Nov/19 05:46 Start Date: 20/Nov/19 05:46 Worklog Time Spent: 10m Work Description: kennknowles commented on pull request #10153: [BEAM-8740] TestPubsub ignores timeout URL: https://github.com/apache/beam/pull/10153#discussion_r348300547 ## File path: sdks/java/io/google-cloud-platform/src/main/java/org/apache/beam/sdk/io/gcp/pubsub/TestPubsub.java ## @@ -202,9 +202,11 @@ public void publish(List messages) throws IOException { public List pull(int maxBatchSize) throws IOException { List messages = pubsub.pull(0, subscriptionPath, maxBatchSize, true); -pubsub.acknowledge( -subscriptionPath, -messages.stream().map(msg -> msg.ackId).collect(ImmutableList.toImmutableList())); +if (!messages.isEmpty()) { Review comment: out of curiosity, was this a crasher? This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org Issue Time Tracking --- Worklog Id: (was: 346475) Time Spent: 20m (was: 10m) > TestPubsub ignores timeout > -- > > Key: BEAM-8740 > URL: https://issues.apache.org/jira/browse/BEAM-8740 > Project: Beam > Issue Type: Bug > Components: testing >Reporter: Brian Hulette >Assignee: Brian Hulette >Priority: Major > Fix For: 2.18.0 > > Time Spent: 20m > Remaining Estimate: 0h > -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Commented] (BEAM-8198) Investigate possible performance regression of Wordcount 1GB batch benchmark on Py3.
[ https://issues.apache.org/jira/browse/BEAM-8198?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16978073#comment-16978073 ] Kenneth Knowles commented on BEAM-8198: --- Ping. This seems to have been sitting for a while. > Investigate possible performance regression of Wordcount 1GB batch benchmark > on Py3. > > > Key: BEAM-8198 > URL: https://issues.apache.org/jira/browse/BEAM-8198 > Project: Beam > Issue Type: Bug > Components: sdk-py-core, testing >Reporter: Valentyn Tymofieiev >Assignee: Valentyn Tymofieiev >Priority: Major > Fix For: 2.17.0 > > > context: > https://lists.apache.org/thread.html/51e000f16481451c207c00ac5e881aa4a46fa020922eddffd00ad527@%3Cdev.beam.apache.org%3E > Setting fix version to 2.16.0 to understand the cause, hopefully before the > vote. > cc: [~altay] [~thw] [~markflyhigh] -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Commented] (BEAM-8174) BigQueryIO clustering documentation is incorrect and lacking
[ https://issues.apache.org/jira/browse/BEAM-8174?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16978075#comment-16978075 ] Kenneth Knowles commented on BEAM-8174: --- What is the status of this? It is blocking 2.17.0 and I agree it would be nice to have good javadoc for this. > BigQueryIO clustering documentation is incorrect and lacking > > > Key: BEAM-8174 > URL: https://issues.apache.org/jira/browse/BEAM-8174 > Project: Beam > Issue Type: Bug > Components: io-java-gcp >Affects Versions: 2.15.0 >Reporter: Alex Van Boxel >Assignee: Alex Van Boxel >Priority: Trivial > Labels: documentation > Fix For: 2.17.0 > > Original Estimate: 2h > Remaining Estimate: 2h > > I noticed that the Java doc of the clustering feature in BigQueryIO is more a > copy/paste from the timestamp method. This needs to be corrected. > The Clustering option should also be added to the BigQueryIO page. -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Commented] (BEAM-8363) Nexmark regression in direct runner in streaming mode
[ https://issues.apache.org/jira/browse/BEAM-8363?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16978072#comment-16978072 ] Kenneth Knowles commented on BEAM-8363: --- I inspected the history of sdks/java/core and did not find anything obvious. I believe this will require a git bisect. > Nexmark regression in direct runner in streaming mode > - > > Key: BEAM-8363 > URL: https://issues.apache.org/jira/browse/BEAM-8363 > Project: Beam > Issue Type: Bug > Components: runner-direct >Affects Versions: 2.16.0 >Reporter: Mark Liu >Assignee: Shehzaad Nakhoda >Priority: Critical > Fix For: 2.17.0 > > Attachments: regression_screenshot.png > > > From Nexmark performance dashboard for direct runner: > https://apache-beam-testing.appspot.com/explore?dashboard=5084698770407424, > there is a regression for streaming mode happened on August 25. > The runtime increased about 25% in two days. -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Commented] (BEAM-8363) Nexmark regression in direct runner in streaming mode
[ https://issues.apache.org/jira/browse/BEAM-8363?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16978070#comment-16978070 ] Kenneth Knowles commented on BEAM-8363: --- The main difference is the use of a custom UnboundedSource if isStreaming() is set. > Nexmark regression in direct runner in streaming mode > - > > Key: BEAM-8363 > URL: https://issues.apache.org/jira/browse/BEAM-8363 > Project: Beam > Issue Type: Bug > Components: runner-direct >Affects Versions: 2.16.0 >Reporter: Mark Liu >Assignee: Shehzaad Nakhoda >Priority: Critical > Fix For: 2.17.0 > > Attachments: regression_screenshot.png > > > From Nexmark performance dashboard for direct runner: > https://apache-beam-testing.appspot.com/explore?dashboard=5084698770407424, > there is a regression for streaming mode happened on August 25. > The runtime increased about 25% in two days. -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Commented] (BEAM-8368) [Python] libprotobuf-generated exception when importing apache_beam
[ https://issues.apache.org/jira/browse/BEAM-8368?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16978069#comment-16978069 ] Kenneth Knowles commented on BEAM-8368: --- I see a linked PR merged into the 2.17.0 branch. Is this done now? > [Python] libprotobuf-generated exception when importing apache_beam > --- > > Key: BEAM-8368 > URL: https://issues.apache.org/jira/browse/BEAM-8368 > Project: Beam > Issue Type: Bug > Components: sdk-py-core >Affects Versions: 2.15.0, 2.17.0 >Reporter: Ubaier Bhat >Assignee: Brian Hulette >Priority: Blocker > Fix For: 2.17.0 > > Attachments: error_log.txt > > Time Spent: 4h 40m > Remaining Estimate: 0h > > Unable to import apache_beam after upgrading to macOS 10.15 (Catalina). > Cleared all the pipenvs but can't get it working again. > {code} > import apache_beam as beam > /Users/***/.local/share/virtualenvs/beam-etl-ims6DitU/lib/python3.7/site-packages/apache_beam/__init__.py:84: > UserWarning: Some syntactic constructs of Python 3 are not yet fully > supported by Apache Beam. > 'Some syntactic constructs of Python 3 are not yet fully supported by ' > [libprotobuf ERROR google/protobuf/descriptor_database.cc:58] File already > exists in database: > [libprotobuf FATAL google/protobuf/descriptor.cc:1370] CHECK failed: > GeneratedDatabase()->Add(encoded_file_descriptor, size): > libc++abi.dylib: terminating with uncaught exception of type > google::protobuf::FatalException: CHECK failed: > GeneratedDatabase()->Add(encoded_file_descriptor, size): > {code} -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Commented] (BEAM-8384) Spark runner is not respecting spark.default.parallelism user defined configuration
[ https://issues.apache.org/jira/browse/BEAM-8384?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16978068#comment-16978068 ] Kenneth Knowles commented on BEAM-8384: --- [~iemejia] is this in progress? Is there a chance for it to be done for 2.17.0? Is it a regression? > Spark runner is not respecting spark.default.parallelism user defined > configuration > --- > > Key: BEAM-8384 > URL: https://issues.apache.org/jira/browse/BEAM-8384 > Project: Beam > Issue Type: Bug > Components: runner-spark >Affects Versions: 2.16.0 >Reporter: Ismaël Mejía >Assignee: Ismaël Mejía >Priority: Major > Fix For: 2.17.0 > > > It was reported in [the mailing > list|https://lists.apache.org/thread.html/792fb7fc2a5113837fbcdafce6a5d9100309881b366c1a7163d2c898@%3Cdev.beam.apache.org%3E] > that the Spark runner is not respecting the user defined Spark default > parallelism configuration. We should investigate and if it is the case ensure > that a user defined configuration is always respected. Runner optimizations > should apply only for default (unconfigured) values otherwise we will confuse > users and limit them from parametrizing Spark for their best convenience. -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Commented] (BEAM-8504) BigQueryIO DIRECT_READ is broken
[ https://issues.apache.org/jira/browse/BEAM-8504?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16978067#comment-16978067 ] Kenneth Knowles commented on BEAM-8504: --- [~kanterov] would you drive getting a cherrypick in since you requested? > BigQueryIO DIRECT_READ is broken > > > Key: BEAM-8504 > URL: https://issues.apache.org/jira/browse/BEAM-8504 > Project: Beam > Issue Type: Bug > Components: io-java-gcp >Affects Versions: 2.16.0, 2.17.0 >Reporter: Gleb Kanterov >Assignee: Aryan Naraghi >Priority: Major > Fix For: 2.17.0 > > Time Spent: 1h 40m > Remaining Estimate: 0h > > The issue is reproducible with 2.16.0, 2.17.0 candidate and 2.18.0-SNAPSHOT > (as of d96c6b21a8a95b01944016584bc8e4ad1ab5f6a6), and not reproducible with > 2.15.0. > {code} > java.io.IOException: Failed to start reading from source: name: > "projects//locations/eu/streams/" > at > org.apache.beam.runners.dataflow.worker.WorkerCustomSources$BoundedReaderIterator.start(WorkerCustomSources.java:604) > at > org.apache.beam.runners.dataflow.worker.util.common.worker.ReadOperation$SynchronizedReaderIterator.start(ReadOperation.java:361) > at > org.apache.beam.runners.dataflow.worker.util.common.worker.ReadOperation.runReadLoop(ReadOperation.java:194) > at > org.apache.beam.runners.dataflow.worker.util.common.worker.ReadOperation.start(ReadOperation.java:159) > at > org.apache.beam.runners.dataflow.worker.util.common.worker.MapTaskExecutor.execute(MapTaskExecutor.java:77) > at > org.apache.beam.runners.dataflow.worker.BatchDataflowWorker.executeWork(BatchDataflowWorker.java:411) > at > org.apache.beam.runners.dataflow.worker.BatchDataflowWorker.doWork(BatchDataflowWorker.java:380) > at > org.apache.beam.runners.dataflow.worker.BatchDataflowWorker.getAndPerformWork(BatchDataflowWorker.java:305) > at > org.apache.beam.runners.dataflow.worker.DataflowBatchWorkerHarness$WorkerThread.doWork(DataflowBatchWorkerHarness.java:140) > at > 
org.apache.beam.runners.dataflow.worker.DataflowBatchWorkerHarness$WorkerThread.call(DataflowBatchWorkerHarness.java:120) > at > org.apache.beam.runners.dataflow.worker.DataflowBatchWorkerHarness$WorkerThread.call(DataflowBatchWorkerHarness.java:107) > at java.util.concurrent.FutureTask.run(FutureTask.java:266) > at > java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142) > at > java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617) > at java.lang.Thread.run(Thread.java:745) > Caused by: java.lang.IllegalArgumentException: Fraction consumed from > previous response (0.0) is not less than fraction consumed from current > response (0.0). > at > org.apache.beam.vendor.guava.v26_0_jre.com.google.common.base.Preconditions.checkArgument(Preconditions.java:440) > at > org.apache.beam.sdk.io.gcp.bigquery.BigQueryStorageStreamSource$BigQueryStorageStreamReader.readNextRecord(BigQueryStorageStreamSource.java:243) > at > org.apache.beam.sdk.io.gcp.bigquery.BigQueryStorageStreamSource$BigQueryStorageStreamReader.start(BigQueryStorageStreamSource.java:206) > at > org.apache.beam.runners.dataflow.worker.WorkerCustomSources$BoundedReaderIterator.start(WorkerCustomSources.java:601) > ... 14 more > {code} -- This message was sent by Atlassian Jira (v8.3.4#803005)
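The precondition that fails in the stack trace above can be sketched in plain Python. This is illustrative only; Beam's actual check is the Java `Preconditions.checkArgument` call in `BigQueryStorageStreamSource`, and the class and method names below are hypothetical stand-ins. The reader requires the server-reported fraction consumed to be strictly increasing across responses, so two consecutive responses both reporting 0.0 trip the check:

```python
# Illustrative sketch (not Beam's code) of the monotonicity precondition in
# BigQueryStorageStreamReader: the fraction consumed reported by the previous
# response must be strictly less than the one from the current response.

class StreamReaderSketch:
    def __init__(self):
        self.prev_fraction = -1.0  # sentinel: before the first response

    def read_next(self, fraction_consumed):
        # Mirrors Preconditions.checkArgument(prev < current) from the trace.
        if not self.prev_fraction < fraction_consumed:
            raise ValueError(
                "Fraction consumed from previous response (%s) is not less "
                "than fraction consumed from current response (%s)."
                % (self.prev_fraction, fraction_consumed))
        self.prev_fraction = fraction_consumed

reader = StreamReaderSketch()
reader.read_next(0.0)      # first response passes (prev is the -1.0 sentinel)
try:
    reader.read_next(0.0)  # a second 0.0 reproduces the reported failure
    failed = False
except ValueError:
    failed = True
```

This is why the bug surfaces only against server responses that repeat the same fraction, which explains why 2.15.0 (which did not enforce strict increase the same way) is unaffected.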
[jira] [Work logged] (BEAM-8016) Render Beam Pipeline as DOT with Interactive Beam
[ https://issues.apache.org/jira/browse/BEAM-8016?focusedWorklogId=346446=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-346446 ] ASF GitHub Bot logged work on BEAM-8016: Author: ASF GitHub Bot Created on: 20/Nov/19 03:44 Start Date: 20/Nov/19 03:44 Worklog Time Spent: 10m Work Description: davidyan74 commented on pull request #10132: [BEAM-8016] Pipeline Graph URL: https://github.com/apache/beam/pull/10132#discussion_r348279216 ## File path: sdks/python/apache_beam/runners/interactive/display/pipeline_graph.py ## @@ -107,6 +135,17 @@ def _top_level_transforms(self): top_level_transform_proto = transforms[top_level_transform_id] yield top_level_transform_id, top_level_transform_proto + def _decorate(self, value): +"""Decorates label-ish values used for rendering in dot language. + +Escapes special characters in the given str value for dot language. Please +escape all PTransform unique names when building dot representation. +Otherwise, special characters will break the graph rendered. Review comment: Is this statement still valid? Does 'value' need to be escaped any more? This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org Issue Time Tracking --- Worklog Id: (was: 346446) Time Spent: 5h 20m (was: 5h 10m) > Render Beam Pipeline as DOT with Interactive Beam > --- > > Key: BEAM-8016 > URL: https://issues.apache.org/jira/browse/BEAM-8016 > Project: Beam > Issue Type: Improvement > Components: runner-py-interactive >Reporter: Ning Kang >Assignee: Ning Kang >Priority: Major > Time Spent: 5h 20m > Remaining Estimate: 0h > > With work in https://issues.apache.org/jira/browse/BEAM-7760, Beam pipeline > converted to DOT then rendered should mark user defined variables on edges. 
> With work in https://issues.apache.org/jira/browse/BEAM-7926, it might be > redundant or confusing to render arbitrary random sample PCollection data on > edges. > We'll also make sure edges in the graph correspond to the output -> input > relationship in the user-defined pipeline. Each edge is one output. If > multiple downstream inputs take the same output, it should be rendered as > one edge diverging into two instead of two edges. > For advanced interactivity, where each execution highlights the part > of the original pipeline that was really executed, we'll also > provide support in beta. -- This message was sent by Atlassian Jira (v8.3.4#803005)
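The `_decorate` escaping discussed in the review above can be sketched as follows. This is a hypothetical stand-in, not the actual `pipeline_graph._decorate` implementation: double quotes delimit quoted strings in the dot language, so a literal `"` inside a PTransform label must be escaped as `\"`, and backslashes must be escaped first so the two rules don't interfere:

```python
# Hypothetical sketch of dot-language label escaping (the real _decorate in
# pipeline_graph.py may differ). Without this, a PTransform unique name
# containing '"' would terminate the quoted label early and break rendering.

def decorate(value):
    """Wraps value in quotes, escaping backslashes and double quotes."""
    return '"%s"' % value.replace('\\', '\\\\').replace('"', '\\"')

label = decorate('Map("x")')
# The result is safe to embed in a dot statement such as:
#   node0 [label=<escaped value>];
```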
[jira] [Work logged] (BEAM-8016) Render Beam Pipeline as DOT with Interactive Beam
[ https://issues.apache.org/jira/browse/BEAM-8016?focusedWorklogId=346447=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-346447 ] ASF GitHub Bot logged work on BEAM-8016: Author: ASF GitHub Bot Created on: 20/Nov/19 03:44 Start Date: 20/Nov/19 03:44 Worklog Time Spent: 10m Work Description: davidyan74 commented on pull request #10132: [BEAM-8016] Pipeline Graph URL: https://github.com/apache/beam/pull/10132#discussion_r348279347 ## File path: sdks/python/apache_beam/runners/interactive/display/pipeline_graph.py ## @@ -55,9 +63,17 @@ def __init__(self, pipeline: (Pipeline proto) or (Pipeline) pipeline to be rendered. default_vertex_attrs: (Dict[str, str]) a dict of default vertex attributes default_edge_attrs: (Dict[str, str]) a dict of default edge attributes + render_option: (str) this parameter decides how the pipeline graph is + rendered. See display.pipeline_graph_renderer for available options. """ self._lock = threading.Lock() self._graph = None +self._pin = None Review comment: Can we also spell out this one? This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org Issue Time Tracking --- Worklog Id: (was: 346447) Time Spent: 5h 20m (was: 5h 10m) > Render Beam Pipeline as DOT with Interactive Beam > --- > > Key: BEAM-8016 > URL: https://issues.apache.org/jira/browse/BEAM-8016 > Project: Beam > Issue Type: Improvement > Components: runner-py-interactive >Reporter: Ning Kang >Assignee: Ning Kang >Priority: Major > Time Spent: 5h 20m > Remaining Estimate: 0h > > With work in https://issues.apache.org/jira/browse/BEAM-7760, Beam pipeline > converted to DOT then rendered should mark user defined variables on edges. 
> With work in https://issues.apache.org/jira/browse/BEAM-7926, it might be > redundant or confusing to render arbitrary random sample PCollection data on > edges. > We'll also make sure edges in the graph correspond to the output -> input > relationship in the user-defined pipeline. Each edge is one output. If > multiple downstream inputs take the same output, it should be rendered as > one edge diverging into two instead of two edges. > For advanced interactivity, where each execution highlights the part > of the original pipeline that was really executed, we'll also > provide support in beta. -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Work logged] (BEAM-7390) Colab examples for aggregation transforms (Python)
[ https://issues.apache.org/jira/browse/BEAM-7390?focusedWorklogId=346439=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-346439 ] ASF GitHub Bot logged work on BEAM-7390: Author: ASF GitHub Bot Created on: 20/Nov/19 03:13 Start Date: 20/Nov/19 03:13 Worklog Time Spent: 10m Work Description: davidcavazos commented on pull request #10166: [BEAM-7390] Add code snippet for Latest URL: https://github.com/apache/beam/pull/10166 Adding the code snippets for `Latest`. Thank you for your contribution! Follow this checklist to help us incorporate your contribution quickly and easily: - [x] [**Choose reviewer(s)**](https://beam.apache.org/contribute/#make-your-change) and mention them in a comment (`R: @username`). - [x] Format the pull request title like `[BEAM-XXX] Fixes bug in ApproximateQuantiles`, where you replace `BEAM-XXX` with the appropriate JIRA issue, if applicable. This will automatically link the pull request to the issue. - [x] If this contribution is large, please file an Apache [Individual Contributor License Agreement](https://www.apache.org/licenses/icla.pdf). See the [Contributor Guide](https://beam.apache.org/contribute) for more tips on [how to make review process smoother](https://beam.apache.org/contribute/#make-reviewers-job-easier). 
Post-Commit Tests Status (on master branch): table of Jenkins build-status badges for the Go, Java, and Python SDKs across the Apex, Dataflow, Flink, Gearpump, Samza, and Spark runners (truncated in the original message).
[jira] [Work logged] (BEAM-7390) Colab examples for aggregation transforms (Python)
[ https://issues.apache.org/jira/browse/BEAM-7390?focusedWorklogId=346438=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-346438 ] ASF GitHub Bot logged work on BEAM-7390: Author: ASF GitHub Bot Created on: 20/Nov/19 03:11 Start Date: 20/Nov/19 03:11 Worklog Time Spent: 10m Work Description: davidcavazos commented on pull request #10165: [BEAM-7390] Add code snippet for GroupIntoBatches URL: https://github.com/apache/beam/pull/10165 Adding the code snippets for `GroupIntoBatches`. > Questions: > * What is the difference between `GroupIntoBatches` and `BatchElements`? > * Which is the recommended approach? > * I found that `BatchElements` was both more intuitive and flexible since it doesn't require a key; should we get rid of the `GroupIntoBatches` example in favor of `BatchElements`, or is there a scenario where `GroupIntoBatches` makes more sense? Thank you for your contribution! Follow this checklist to help us incorporate your contribution quickly and easily: - [x] [**Choose reviewer(s)**](https://beam.apache.org/contribute/#make-your-change) and mention them in a comment (`R: @username`). - [x] Format the pull request title like `[BEAM-XXX] Fixes bug in ApproximateQuantiles`, where you replace `BEAM-XXX` with the appropriate JIRA issue, if applicable. This will automatically link the pull request to the issue. - [x] If this contribution is large, please file an Apache [Individual Contributor License Agreement](https://www.apache.org/licenses/icla.pdf). See the [Contributor Guide](https://beam.apache.org/contribute) for more tips on [how to make review process smoother](https://beam.apache.org/contribute/#make-reviewers-job-easier). 
[jira] [Updated] (BEAM-8784) Remove deprecated 'compare' argument from combiners.Top in PyDocs
[ https://issues.apache.org/jira/browse/BEAM-8784?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] David Cavazos updated BEAM-8784: Description: PyDoc: [https://beam.apache.org/releases/pydoc/2.12.0/apache_beam.transforms.combiners.html?#apache_beam.transforms.combiners.Top] The combiners.Top PyDoc still shows the `compare` argument as usable. It is deprecated and results in an error in Python 3, but the PyDoc does not reflect that. It should say that the argument is deprecated in favor of using the `key` and `reverse` arguments. was: The [combiners.Top PyDoc|[https://beam.apache.org/releases/pydoc/2.12.0/apache_beam.transforms.combiners.html?#apache_beam.transforms.combiners.Top]] still shows the `compare` argument as usable. It is deprecated and results in an error in Python 3, but the PyDoc does not reflect that. It should say that the argument is deprecated in favor of using the `key` and `reverse` arguments. > Remove deprecated 'compare' argument from combiners.Top in PyDocs > - > > Key: BEAM-8784 > URL: https://issues.apache.org/jira/browse/BEAM-8784 > Project: Beam > Issue Type: Improvement > Components: sdk-py-core >Reporter: David Cavazos >Priority: Trivial > > PyDoc: > [https://beam.apache.org/releases/pydoc/2.12.0/apache_beam.transforms.combiners.html?#apache_beam.transforms.combiners.Top] > The combiners.Top PyDoc still shows the `compare` argument as usable. It is > deprecated and results in an error in Python 3, but the PyDoc does not > reflect that. > It should say that the argument is deprecated in favor of using the `key` and > `reverse` arguments. -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Updated] (BEAM-8784) Remove deprecated 'compare' argument from combiners.Top in PyDocs
[ https://issues.apache.org/jira/browse/BEAM-8784?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] David Cavazos updated BEAM-8784: Description: The [[combiners.Top PyDoc||#apache_beam.transforms.combiners.Top]] [https://beam.apache.org/releases/pydoc/2.12.0/apache_beam.transforms.combiners.html?#apache_beam.transforms.combiners.Top] []|#apache_beam.transforms.combiners.Top]] still shows the `compare` argument as usable. It is deprecated and results in an error in Python 3, but the PyDoc does not reflect that. It should say that the argument is deprecated in favor of using the `key` and `reverse` arguments. was: The [combiners.Top PyDoc|[https://beam.apache.org/releases/pydoc/2.12.0/apache_beam.transforms.combiners.html?#apache_beam.transforms.combiners.Top]] still shows the `compare` argument as usable. It is deprecated and results in an error in Python 3, but the PyDoc does not reflect that. It should say that the argument is deprecated in favor of using the `key` and `reverse` arguments. > Remove deprecated 'compare' argument from combiners.Top in PyDocs > - > > Key: BEAM-8784 > URL: https://issues.apache.org/jira/browse/BEAM-8784 > Project: Beam > Issue Type: Improvement > Components: sdk-py-core >Reporter: David Cavazos >Priority: Trivial > > The [[combiners.Top PyDoc||#apache_beam.transforms.combiners.Top]] > [https://beam.apache.org/releases/pydoc/2.12.0/apache_beam.transforms.combiners.html?#apache_beam.transforms.combiners.Top] > []|#apache_beam.transforms.combiners.Top]] still shows the `compare` > argument as usable. It is deprecated and results in an error in Python 3, but > the PyDoc does not reflect that. > It should say that the argument is deprecated in favor of using the `key` and > `reverse` arguments. -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Updated] (BEAM-8784) Remove deprecated 'compare' argument from combiners.Top in PyDocs
[ https://issues.apache.org/jira/browse/BEAM-8784?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] David Cavazos updated BEAM-8784: Description: The [combiners.Top PyDoc|[https://beam.apache.org/releases/pydoc/2.12.0/apache_beam.transforms.combiners.html?#apache_beam.transforms.combiners.Top]] still shows the `compare` argument as usable. It is deprecated and results in an error in Python 3, but the PyDoc does not reflect that. It should say that the argument is deprecated in favor of using the `key` and `reverse` arguments. was: The [[combiners.Top PyDoc||#apache_beam.transforms.combiners.Top]] [https://beam.apache.org/releases/pydoc/2.12.0/apache_beam.transforms.combiners.html?#apache_beam.transforms.combiners.Top] []|#apache_beam.transforms.combiners.Top]] still shows the `compare` argument as usable. It is deprecated and results in an error in Python 3, but the PyDoc does not reflect that. It should say that the argument is deprecated in favor of using the `key` and `reverse` arguments. > Remove deprecated 'compare' argument from combiners.Top in PyDocs > - > > Key: BEAM-8784 > URL: https://issues.apache.org/jira/browse/BEAM-8784 > Project: Beam > Issue Type: Improvement > Components: sdk-py-core >Reporter: David Cavazos >Priority: Trivial > > The [combiners.Top > PyDoc|[https://beam.apache.org/releases/pydoc/2.12.0/apache_beam.transforms.combiners.html?#apache_beam.transforms.combiners.Top]] > still shows the `compare` argument as usable. It is deprecated and results > in an error in Python 3, but the PyDoc does not reflect that. > It should say that the argument is deprecated in favor of using the `key` and > `reverse` arguments. -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Created] (BEAM-8784) Remove deprecated 'compare' argument from combiners.Top in PyDocs
David Cavazos created BEAM-8784: --- Summary: Remove deprecated 'compare' argument from combiners.Top in PyDocs Key: BEAM-8784 URL: https://issues.apache.org/jira/browse/BEAM-8784 Project: Beam Issue Type: Improvement Components: sdk-py-core Reporter: David Cavazos The [combiners.Top PyDoc|[https://beam.apache.org/releases/pydoc/2.12.0/apache_beam.transforms.combiners.html?#apache_beam.transforms.combiners.Top]] still shows the `compare` argument as usable. It is deprecated and results in an error in Python 3, but the PyDoc does not reflect that. It should say that the argument is deprecated in favor of using the `key` and `reverse` arguments. -- This message was sent by Atlassian Jira (v8.3.4#803005)
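The `key`/`reverse` semantics that BEAM-8784 asks the PyDoc to document can be illustrated with the standard library. This is an analogue, not Beam code: `combiners.Top.Of(n, key=..., reverse=...)` selects the n largest elements (or smallest, with `reverse=True`), much like `heapq`:

```python
# Stdlib analogue of combiners.Top's replacement arguments: 'key' picks the
# comparison value (replacing the deprecated 'compare' callable, which raises
# an error on Python 3), and reverse=True corresponds to taking the smallest
# elements instead of the largest.
import heapq

values = [('a', 3), ('b', 1), ('c', 2)]

top2 = heapq.nlargest(2, values, key=lambda kv: kv[1])
bottom2 = heapq.nsmallest(2, values, key=lambda kv: kv[1])
```

Unlike a `compare` callable, a `key` function avoids the Python 3 removal of `cmp`-style comparison, which is why the docs should steer users to `key` and `reverse`.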
[jira] [Work logged] (BEAM-8016) Render Beam Pipeline as DOT with Interactive Beam
[ https://issues.apache.org/jira/browse/BEAM-8016?focusedWorklogId=346430=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-346430 ] ASF GitHub Bot logged work on BEAM-8016: Author: ASF GitHub Bot Created on: 20/Nov/19 02:44 Start Date: 20/Nov/19 02:44 Worklog Time Spent: 10m Work Description: KevinGG commented on issue #10132: [BEAM-8016] Pipeline Graph URL: https://github.com/apache/beam/pull/10132#issuecomment-555810399 > I would say always avoid abbreviations that can be confusing. Perhaps just spell it out? >_pipeline_instrument Spelled out all variable names of this in the module. This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org Issue Time Tracking --- Worklog Id: (was: 346430) Time Spent: 5h 10m (was: 5h) > Render Beam Pipeline as DOT with Interactive Beam > --- > > Key: BEAM-8016 > URL: https://issues.apache.org/jira/browse/BEAM-8016 > Project: Beam > Issue Type: Improvement > Components: runner-py-interactive >Reporter: Ning Kang >Assignee: Ning Kang >Priority: Major > Time Spent: 5h 10m > Remaining Estimate: 0h > > With work in https://issues.apache.org/jira/browse/BEAM-7760, Beam pipeline > converted to DOT then rendered should mark user defined variables on edges. > With work in https://issues.apache.org/jira/browse/BEAM-7926, it might be > redundant or confusing to render arbitrary random sample PCollection data on > edges. > We'll also make sure edges in the graph corresponds to output -> input > relationship in the user defined pipeline. Each edge is one output. If > multiple down stream inputs take the same output, it should be rendered as > one edge diverging into two instead of two edges. 
> For advanced interactivity, where each execution highlights the part > of the original pipeline that was really executed, we'll also > provide support in beta. -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Closed] (BEAM-8737) beam_Dependency_Check is missing bigtable-client-core as high priority items
[ https://issues.apache.org/jira/browse/BEAM-8737?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Tomo Suzuki closed BEAM-8737. - Fix Version/s: Not applicable Resolution: Fixed > beam_Dependency_Check is missing bigtable-client-core as high priority items > > > Key: BEAM-8737 > URL: https://issues.apache.org/jira/browse/BEAM-8737 > Project: Beam > Issue Type: Bug > Components: build-system >Reporter: Tomo Suzuki >Assignee: Tomo Suzuki >Priority: Major > Fix For: Not applicable > > Time Spent: 50m > Remaining Estimate: 0h > > beam_Dependency_Check is missing high priority items such as > bigtable-client-core. > For example, > https://builds.apache.org/job/beam_Dependency_Check/235/consoleFull sent > email but the email did not contain bigtable-client-core. The current version > 1.8.0 is 4-minor-version older than the latest 1.12.1. > Initially I was suspecting the line with {{high_priority_deps.append}}: > {noformat} > if (version_comparer.compare_dependency_versions(curr_ver, latest_ver) > or > compare_dependency_release_dates(curr_release_date, > latest_release_date)): > # Create a new issue or update on the existing issue > jira_issue = jira_manager.run(dep_name, curr_ver, latest_ver, > sdk_type, group_id = group_id) > if (jira_issue.fields.status.name == 'Open' or > jira_issue.fields.status.name == 'Reopened'): > dep_info += "{1}".format( > ReportGeneratorConfig.BEAM_JIRA_HOST+"browse/"+ jira_issue.key, > jira_issue.key) > high_priority_deps.append(dep_info) > {noformat} > and 2nd run would include the artifact in the email. 
But it did not: > https://builds.apache.org/job/beam_Dependency_Check/237/ > {noformat} > 15:43:02 Start processing: - com.google.cloud.bigtable:bigtable-client-core > [1.8.0 -> 1.12.1] > 15:43:02 > 15:43:02 INFO:root:Finding release date of > com.google.cloud.bigtable:bigtable-client-core 1.8.0 from the Maven Central > 15:43:02 INFO:root:Finding release date of > com.google.cloud.bigtable:bigtable-client-core 1.12.1 from the Maven Central > 15:43:03 INFO:root:Start handling the JIRA issues for Java dependency: > com.google.cloud.bigtable:bigtable-client-core 1.12.1 > 15:43:04 INFO:root:The parent issue BEAM-8690 is not opening. Attempt > reopening the issue > 15:43:04 Traceback (most recent call last): > 15:43:04 File > "/home/jenkins/jenkins-slave/workspace/beam_Dependency_Check/src/.test-infra/jenkins/jira_utils/jira_manager.py", > line 90, in run > 15:43:04 self.jira.reopen_issue(parent_issue) > 15:43:04 File > "/home/jenkins/jenkins-slave/workspace/beam_Dependency_Check/src/.test-infra/jenkins/jira_utils/jira_client.py", > line 130, in reopen_issue > 15:43:04 self.jira.transition_issue(issue.key, 3) > 15:43:04 File > "/home/jenkins/jenkins-slave/workspace/beam_Dependency_Check/src/dependency/check/lib/python3.5/site-packages/jira/client.py", > line 126, in wrapper > 15:43:04 result = func(*arg_list, **kwargs) > 15:43:04 File > "/home/jenkins/jenkins-slave/workspace/beam_Dependency_Check/src/dependency/check/lib/python3.5/site-packages/jira/client.py", > line 1578, in transition_issue > 15:43:04 url, data=json.dumps(data)) > 15:43:04 File > "/home/jenkins/jenkins-slave/workspace/beam_Dependency_Check/src/dependency/check/lib/python3.5/site-packages/jira/resilientsession.py", > line 154, in post > 15:43:04 return self.__verb('POST', url, **kwargs) > 15:43:04 File > "/home/jenkins/jenkins-slave/workspace/beam_Dependency_Check/src/dependency/check/lib/python3.5/site-packages/jira/resilientsession.py", > line 147, in __verb > 15:43:04 raise_on_error(response, 
verb=verb, **kwargs) > 15:43:04 File > "/home/jenkins/jenkins-slave/workspace/beam_Dependency_Check/src/dependency/check/lib/python3.5/site-packages/jira/resilientsession.py", > line 57, in raise_on_error > 15:43:04 r.status_code, error, r.url, request=request, response=r, > **kwargs) > 15:43:04 jira.exceptions.JIRAError: JiraError HTTP 400 url: > https://issues.apache.org/jira/rest/api/2/issue/BEAM-8690/transitions > 15:43:04 text: It seems that you have tried to perform a workflow > operation (Reopen Issue) that is not valid for the current state of this > issue (BEAM-8690). The likely cause is that somebody has changed the issue > recently, please look at the issue history for details. > 15:43:04 > 15:43:04 response headers = {'X-AREQUESTID': '1243x60347388x3', 'Date': > 'Mon, 18 Nov 2019 20:43:04 GMT', 'X-AUSERNAME': , 'Cache-Control': > 'no-cache, no-store, no-transform', 'Server': 'Apache', 'Connection': > 'close', 'X-XSS-Protection': '1; mode=block', 'Content-Type':
[jira] [Closed] (BEAM-8744) Fix error in Beam Dependency Check Report
[ https://issues.apache.org/jira/browse/BEAM-8744?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Tomo Suzuki closed BEAM-8744. - Fix Version/s: Not applicable Resolution: Fixed > Fix error in Beam Dependency Check Report > - > > Key: BEAM-8744 > URL: https://issues.apache.org/jira/browse/BEAM-8744 > Project: Beam > Issue Type: Bug > Components: build-system >Reporter: Tomo Suzuki >Assignee: Tomo Suzuki >Priority: Major > Fix For: Not applicable > > Time Spent: 40m > Remaining Estimate: 0h > > The latest changes had a bug. > https://builds.apache.org/job/beam_Dependency_Check/238/console > {noformat} > 12:41:11 python -m dependency_check.dependency_check_report_generator_test > 12:41:11 Traceback (most recent call last): > 12:41:11 File "/usr/lib/python3.5/runpy.py", line 184, in > _run_module_as_main > 12:41:11 "__main__", mod_spec) > 12:41:11 File "/usr/lib/python3.5/runpy.py", line 85, in _run_code > 12:41:11 exec(code, run_globals) > 12:41:11 File > "/home/jenkins/jenkins-slave/workspace/beam_Dependency_Check/src/.test-infra/jenkins/dependency_check/dependency_check_report_generator_test.py", > line 26, in > 12:41:11 from .dependency_check_report_generator import > prioritize_dependencies > 12:41:11 File > "/home/jenkins/jenkins-slave/workspace/beam_Dependency_Check/src/.test-infra/jenkins/dependency_check/dependency_check_report_generator.py", > line 146 > 12:41:11 if (jira_issue and jira_issue.fields.status.name in ['Open', > 'Reopened', 'TRIAGE NEEDED'): > 12:41:11 >^ > 12:41:11 SyntaxError: invalid syntax > {noformat} -- This message was sent by Atlassian Jira (v8.3.4#803005)
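The SyntaxError in the log above comes from a missing `]` before the closing parenthesis. A corrected form of the membership test can be sketched as follows; `jira_issue` here is a minimal stand-in for a real JIRA issue object, not the library's type:

```python
# Sketch of the corrected conditional from dependency_check_report_generator.py
# (line 146 in the log). SimpleNamespace stands in for a jira issue object.
from types import SimpleNamespace

jira_issue = SimpleNamespace(
    fields=SimpleNamespace(status=SimpleNamespace(name='Reopened')))

# The broken line was missing the closing ']' of the status-name list.
if (jira_issue and
        jira_issue.fields.status.name in ['Open', 'Reopened', 'TRIAGE NEEDED']):
    needs_update = True
else:
    needs_update = False
```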
[jira] [Work logged] (BEAM-8016) Render Beam Pipeline as DOT with Interactive Beam
[ https://issues.apache.org/jira/browse/BEAM-8016?focusedWorklogId=346410=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-346410 ] ASF GitHub Bot logged work on BEAM-8016: Author: ASF GitHub Bot Created on: 20/Nov/19 01:49 Start Date: 20/Nov/19 01:49 Worklog Time Spent: 10m Work Description: KevinGG commented on pull request #10132: [BEAM-8016] Pipeline Graph URL: https://github.com/apache/beam/pull/10132#discussion_r348257200 ## File path: sdks/python/apache_beam/runners/interactive/display/pipeline_graph.py ## @@ -107,6 +135,16 @@ def _top_level_transforms(self): top_level_transform_proto = transforms[top_level_transform_id] yield top_level_transform_id, top_level_transform_proto + def _decorate(self, value): +"""Decorates label-ish values used for rendering in dot language. + +Escapes special characters in the given str value for dot language. '"' is Review comment: Thanks, David! Yes, `'\\"'` works for escaping `"` and I'll add a test for this class. This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org Issue Time Tracking --- Worklog Id: (was: 346410) Time Spent: 5h (was: 4h 50m) > Render Beam Pipeline as DOT with Interactive Beam > --- > > Key: BEAM-8016 > URL: https://issues.apache.org/jira/browse/BEAM-8016 > Project: Beam > Issue Type: Improvement > Components: runner-py-interactive >Reporter: Ning Kang >Assignee: Ning Kang >Priority: Major > Time Spent: 5h > Remaining Estimate: 0h > > With work in https://issues.apache.org/jira/browse/BEAM-7760, Beam pipeline > converted to DOT then rendered should mark user defined variables on edges. > With work in https://issues.apache.org/jira/browse/BEAM-7926, it might be > redundant or confusing to render arbitrary random sample PCollection data on > edges. 
> We'll also make sure edges in the graph correspond to the output -> input > relationship in the user-defined pipeline. Each edge is one output. If > multiple downstream inputs take the same output, it should be rendered as > one edge diverging into two, instead of two edges. > For advanced interactivity highlighting, where each execution highlights the > part of the original pipeline that was actually executed, we'll also > provide the support in beta. -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Work logged] (BEAM-8645) TimestampCombiner incorrect in beam python
[ https://issues.apache.org/jira/browse/BEAM-8645?focusedWorklogId=346399=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-346399 ] ASF GitHub Bot logged work on BEAM-8645: Author: ASF GitHub Bot Created on: 20/Nov/19 01:18 Start Date: 20/Nov/19 01:18 Worklog Time Spent: 10m Work Description: robertwb commented on pull request #10143: [BEAM-8645] To test state backed iterable coder in py sdk. URL: https://github.com/apache/beam/pull/10143#discussion_r348250056 ## File path: sdks/python/apache_beam/runners/portability/fn_api_runner_test.py ## @@ -1579,6 +1580,27 @@ def test_lull_logging(self): '.*There has been a processing lull of over.*', 'Unable to find a lull logged for this job.') +@attr('ValidatesRunner') +class FnApiBasedStateBackedCoderTest(unittest.TestCase): + def create_pipeline(self): +return beam.Pipeline( +runner=fn_api_runner.FnApiRunner(use_state_iterables=True)) + + def test_state_backed_coder(self): +class MyDoFn(beam.DoFn): + def process(self, gbk_result): +value_list = gbk_result[1] +return (gbk_result[0], sum(value_list)) + +with self.create_pipeline() as p: + # The number of integers could be a knob to test against + # different runners' default settings on page size. + main = (p | 'main' >> beam.Create([('a', 1) for _ in range(0, 2)]) + | 'GBK' >> beam.GroupByKey() + | 'Sum' >> beam.ParDo(MyDoFn())) + +assert_that(main, equal_to(['a', 2])) +p.run() Review comment: No p.run needed when using the `with` statement. This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. 
For queries about this service, please contact Infrastructure at: us...@infra.apache.org Issue Time Tracking --- Worklog Id: (was: 346399) > TimestampCombiner incorrect in beam python > -- > > Key: BEAM-8645 > URL: https://issues.apache.org/jira/browse/BEAM-8645 > Project: Beam > Issue Type: Bug > Components: sdk-py-core >Reporter: Ruoyun Huang >Priority: Major > Time Spent: 5h 50m > Remaining Estimate: 0h > > When we have a TimestampValue on combine: > {code:java} > main_stream = (p > | 'main TestStream' >> TestStream() > .add_elements([window.TimestampedValue(('k', 100), 0)]) > .add_elements([window.TimestampedValue(('k', 400), 9)]) > .advance_watermark_to_infinity() > | 'main windowInto' >> beam.WindowInto( > window.FixedWindows(10), > timestamp_combiner=TimestampCombiner.OUTPUT_AT_LATEST) | > 'Combine' >> beam.CombinePerKey(sum)) > The expect timestamp should be: > LATEST: (('k', 500), Timestamp(9)), > EARLIEST: (('k', 500), Timestamp(0)), > END_OF_WINDOW: (('k', 500), Timestamp(10)), > But current py streaming gives following results: > LATEST: (('k', 500), Timestamp(10)), > EARLIEST: (('k', 500), Timestamp(10)), > END_OF_WINDOW: (('k', 500), Timestamp(9.)), > More details and discussions: > https://lists.apache.org/thread.html/d3af1f2f84a2e59a747196039eae77812b78a991f0f293c717e5f4e1@%3Cdev.beam.apache.org%3E > {code} -- This message was sent by Atlassian Jira (v8.3.4#803005)
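The expected behavior quoted in the bug report can be sketched in plain Python (a hypothetical `combine_timestamp` helper, not Beam's implementation): for elements at timestamps 0 and 9 in a `FixedWindows(10)` window, each combiner mode should assign the combined output a timestamp as follows.

```python
def combine_timestamp(element_timestamps, window_end, mode):
    """Pick the output timestamp for a combined value, per combiner mode.

    A plain-Python sketch of TimestampCombiner semantics; window_end is
    the end of the fixed window holding the elements.
    """
    if mode == 'OUTPUT_AT_LATEST':
        return max(element_timestamps)
    if mode == 'OUTPUT_AT_EARLIEST':
        return min(element_timestamps)
    if mode == 'OUTPUT_AT_EOW':  # end of window
        return window_end
    raise ValueError('unknown mode: %s' % mode)

# Elements ('k', 100) at t=0 and ('k', 400) at t=9; FixedWindows(10)
# puts both in the window [0, 10).
timestamps, window_end = [0, 9], 10
print(combine_timestamp(timestamps, window_end, 'OUTPUT_AT_LATEST'))    # 9
print(combine_timestamp(timestamps, window_end, 'OUTPUT_AT_EARLIEST'))  # 0
print(combine_timestamp(timestamps, window_end, 'OUTPUT_AT_EOW'))       # 10
```

The bug is that Python streaming currently reports 10 for both LATEST and EARLIEST, i.e. the window end rather than the element timestamps.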
[jira] [Work logged] (BEAM-8645) TimestampCombiner incorrect in beam python
[ https://issues.apache.org/jira/browse/BEAM-8645?focusedWorklogId=346402=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-346402 ] ASF GitHub Bot logged work on BEAM-8645: Author: ASF GitHub Bot Created on: 20/Nov/19 01:18 Start Date: 20/Nov/19 01:18 Worklog Time Spent: 10m Work Description: robertwb commented on pull request #10143: [BEAM-8645] To test state backed iterable coder in py sdk. URL: https://github.com/apache/beam/pull/10143#discussion_r348249550 ## File path: sdks/python/apache_beam/runners/portability/fn_api_runner_test.py ## @@ -1579,6 +1580,27 @@ def test_lull_logging(self): '.*There has been a processing lull of over.*', 'Unable to find a lull logged for this job.') +@attr('ValidatesRunner') +class FnApiBasedStateBackedCoderTest(unittest.TestCase): + def create_pipeline(self): +return beam.Pipeline( +runner=fn_api_runner.FnApiRunner(use_state_iterables=True)) + + def test_state_backed_coder(self): Review comment: I would call this something like test_gbk_many_values or similar. This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. 
For queries about this service, please contact Infrastructure at: us...@infra.apache.org Issue Time Tracking --- Worklog Id: (was: 346402) > TimestampCombiner incorrect in beam python > -- > > Key: BEAM-8645 > URL: https://issues.apache.org/jira/browse/BEAM-8645 > Project: Beam > Issue Type: Bug > Components: sdk-py-core >Reporter: Ruoyun Huang >Priority: Major > Time Spent: 6h 10m > Remaining Estimate: 0h > > When we have a TimestampValue on combine: > {code:java} > main_stream = (p > | 'main TestStream' >> TestStream() > .add_elements([window.TimestampedValue(('k', 100), 0)]) > .add_elements([window.TimestampedValue(('k', 400), 9)]) > .advance_watermark_to_infinity() > | 'main windowInto' >> beam.WindowInto( > window.FixedWindows(10), > timestamp_combiner=TimestampCombiner.OUTPUT_AT_LATEST) | > 'Combine' >> beam.CombinePerKey(sum)) > The expect timestamp should be: > LATEST: (('k', 500), Timestamp(9)), > EARLIEST: (('k', 500), Timestamp(0)), > END_OF_WINDOW: (('k', 500), Timestamp(10)), > But current py streaming gives following results: > LATEST: (('k', 500), Timestamp(10)), > EARLIEST: (('k', 500), Timestamp(10)), > END_OF_WINDOW: (('k', 500), Timestamp(9.)), > More details and discussions: > https://lists.apache.org/thread.html/d3af1f2f84a2e59a747196039eae77812b78a991f0f293c717e5f4e1@%3Cdev.beam.apache.org%3E > {code} -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Work logged] (BEAM-8645) TimestampCombiner incorrect in beam python
[ https://issues.apache.org/jira/browse/BEAM-8645?focusedWorklogId=346401=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-346401 ] ASF GitHub Bot logged work on BEAM-8645: Author: ASF GitHub Bot Created on: 20/Nov/19 01:18 Start Date: 20/Nov/19 01:18 Worklog Time Spent: 10m Work Description: robertwb commented on pull request #10143: [BEAM-8645] To test state backed iterable coder in py sdk. URL: https://github.com/apache/beam/pull/10143#discussion_r348250260 ## File path: sdks/python/apache_beam/runners/portability/fn_api_runner_test.py ## @@ -1579,6 +1580,27 @@ def test_lull_logging(self): '.*There has been a processing lull of over.*', 'Unable to find a lull logged for this job.') +@attr('ValidatesRunner') +class FnApiBasedStateBackedCoderTest(unittest.TestCase): + def create_pipeline(self): +return beam.Pipeline( +runner=fn_api_runner.FnApiRunner(use_state_iterables=True)) + + def test_state_backed_coder(self): +class MyDoFn(beam.DoFn): + def process(self, gbk_result): +value_list = gbk_result[1] +return (gbk_result[0], sum(value_list)) + +with self.create_pipeline() as p: + # The number of integers could be a knob to test against + # different runners' default settings on page size. + main = (p | 'main' >> beam.Create([('a', 1) for _ in range(0, 2)]) + | 'GBK' >> beam.GroupByKey() + | 'Sum' >> beam.ParDo(MyDoFn())) Review comment: Actually, a better test would be to ensure no more than N (for some value of N < number of elements) instances of the value type are alive at any given moment. This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. 
For queries about this service, please contact Infrastructure at: us...@infra.apache.org Issue Time Tracking --- Worklog Id: (was: 346401) Time Spent: 6h 10m (was: 6h) > TimestampCombiner incorrect in beam python > -- > > Key: BEAM-8645 > URL: https://issues.apache.org/jira/browse/BEAM-8645 > Project: Beam > Issue Type: Bug > Components: sdk-py-core >Reporter: Ruoyun Huang >Priority: Major > Time Spent: 6h 10m > Remaining Estimate: 0h > > When we have a TimestampValue on combine: > {code:java} > main_stream = (p > | 'main TestStream' >> TestStream() > .add_elements([window.TimestampedValue(('k', 100), 0)]) > .add_elements([window.TimestampedValue(('k', 400), 9)]) > .advance_watermark_to_infinity() > | 'main windowInto' >> beam.WindowInto( > window.FixedWindows(10), > timestamp_combiner=TimestampCombiner.OUTPUT_AT_LATEST) | > 'Combine' >> beam.CombinePerKey(sum)) > The expect timestamp should be: > LATEST: (('k', 500), Timestamp(9)), > EARLIEST: (('k', 500), Timestamp(0)), > END_OF_WINDOW: (('k', 500), Timestamp(10)), > But current py streaming gives following results: > LATEST: (('k', 500), Timestamp(10)), > EARLIEST: (('k', 500), Timestamp(10)), > END_OF_WINDOW: (('k', 500), Timestamp(9.)), > More details and discussions: > https://lists.apache.org/thread.html/d3af1f2f84a2e59a747196039eae77812b78a991f0f293c717e5f4e1@%3Cdev.beam.apache.org%3E > {code} -- This message was sent by Atlassian Jira (v8.3.4#803005)
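The stronger test Robert suggests — assert that no more than N instances of the value type are alive at any moment — can be sketched outside Beam with a `weakref.WeakSet` (hypothetical names; a real ValidatesRunner test would hook this into the pipeline's value coder instead):

```python
import weakref

class Value:
    """Stand-in for the element value type under test."""
    def __init__(self, n):
        self.n = n

live = weakref.WeakSet()  # tracks only instances that are still alive

def lazy_values(count):
    """Yield Value instances one at a time, registering each in `live`."""
    for i in range(count):
        v = Value(i)
        live.add(v)
        yield v

peak = 0
total = 0
for v in lazy_values(1000):
    peak = max(peak, len(live))  # how many instances exist right now
    total += 1

# With lazy iteration only a handful of instances are ever alive at
# once, far fewer than the 1000 produced overall.
print(total, peak)
```

This relies on instances being collected promptly once dropped, which holds under CPython's reference counting; on other interpreters the peak would need a GC nudge.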
[jira] [Work logged] (BEAM-8645) TimestampCombiner incorrect in beam python
[ https://issues.apache.org/jira/browse/BEAM-8645?focusedWorklogId=346400=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-346400 ] ASF GitHub Bot logged work on BEAM-8645: Author: ASF GitHub Bot Created on: 20/Nov/19 01:18 Start Date: 20/Nov/19 01:18 Worklog Time Spent: 10m Work Description: robertwb commented on pull request #10143: [BEAM-8645] To test state backed iterable coder in py sdk. URL: https://github.com/apache/beam/pull/10143#discussion_r348249978 ## File path: sdks/python/apache_beam/runners/portability/fn_api_runner_test.py ## @@ -1579,6 +1580,27 @@ def test_lull_logging(self): '.*There has been a processing lull of over.*', 'Unable to find a lull logged for this job.') +@attr('ValidatesRunner') +class FnApiBasedStateBackedCoderTest(unittest.TestCase): + def create_pipeline(self): +return beam.Pipeline( +runner=fn_api_runner.FnApiRunner(use_state_iterables=True)) + + def test_state_backed_coder(self): +class MyDoFn(beam.DoFn): + def process(self, gbk_result): +value_list = gbk_result[1] +return (gbk_result[0], sum(value_list)) + +with self.create_pipeline() as p: + # The number of integers could be a knob to test against + # different runners' default settings on page size. + main = (p | 'main' >> beam.Create([('a', 1) for _ in range(0, 2)]) Review comment: Rather than create all the values in memory, I'd create these with a DoFn. E.g. beam.Create([None]) | beam.FlatMap(lambda x: ((x, 1) for _ in range(2))) | ... This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. 
For queries about this service, please contact Infrastructure at: us...@infra.apache.org Issue Time Tracking --- Worklog Id: (was: 346400) Time Spent: 6h (was: 5h 50m) > TimestampCombiner incorrect in beam python > -- > > Key: BEAM-8645 > URL: https://issues.apache.org/jira/browse/BEAM-8645 > Project: Beam > Issue Type: Bug > Components: sdk-py-core >Reporter: Ruoyun Huang >Priority: Major > Time Spent: 6h > Remaining Estimate: 0h > > When we have a TimestampValue on combine: > {code:java} > main_stream = (p > | 'main TestStream' >> TestStream() > .add_elements([window.TimestampedValue(('k', 100), 0)]) > .add_elements([window.TimestampedValue(('k', 400), 9)]) > .advance_watermark_to_infinity() > | 'main windowInto' >> beam.WindowInto( > window.FixedWindows(10), > timestamp_combiner=TimestampCombiner.OUTPUT_AT_LATEST) | > 'Combine' >> beam.CombinePerKey(sum)) > The expect timestamp should be: > LATEST: (('k', 500), Timestamp(9)), > EARLIEST: (('k', 500), Timestamp(0)), > END_OF_WINDOW: (('k', 500), Timestamp(10)), > But current py streaming gives following results: > LATEST: (('k', 500), Timestamp(10)), > EARLIEST: (('k', 500), Timestamp(10)), > END_OF_WINDOW: (('k', 500), Timestamp(9.)), > More details and discussions: > https://lists.apache.org/thread.html/d3af1f2f84a2e59a747196039eae77812b78a991f0f293c717e5f4e1@%3Cdev.beam.apache.org%3E > {code} -- This message was sent by Atlassian Jira (v8.3.4#803005)
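The review suggestion above — seed the pipeline with a single element and expand it in a DoFn rather than materializing every element in `beam.Create` — has a plain-Python analogue (a sketch of the idea, not runner code):

```python
def flat_map(fn, iterable):
    """Plain-Python analogue of beam.FlatMap: expand each element lazily."""
    for element in iterable:
        for out in fn(element):
            yield out

# One seed element expands to n key-value pairs without ever holding the
# whole list in memory, unlike building the full list before Create.
n = 100000
pairs = flat_map(lambda seed: (('a', 1) for _ in range(n)), [None])
total = sum(value for _key, value in pairs)
print(total)  # 100000
```

In the actual test, the same shape would be `beam.Create([None]) | beam.FlatMap(lambda x: (('a', 1) for _ in range(n)))` feeding the GroupByKey.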
[jira] [Work logged] (BEAM-8645) TimestampCombiner incorrect in beam python
[ https://issues.apache.org/jira/browse/BEAM-8645?focusedWorklogId=346398=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-346398 ] ASF GitHub Bot logged work on BEAM-8645: Author: ASF GitHub Bot Created on: 20/Nov/19 01:18 Start Date: 20/Nov/19 01:18 Worklog Time Spent: 10m Work Description: robertwb commented on pull request #10143: [BEAM-8645] To test state backed iterable coder in py sdk. URL: https://github.com/apache/beam/pull/10143#discussion_r348249763 ## File path: sdks/python/apache_beam/runners/portability/fn_api_runner_test.py ## @@ -1579,6 +1580,27 @@ def test_lull_logging(self): '.*There has been a processing lull of over.*', 'Unable to find a lull logged for this job.') +@attr('ValidatesRunner') +class FnApiBasedStateBackedCoderTest(unittest.TestCase): + def create_pipeline(self): +return beam.Pipeline( +runner=fn_api_runner.FnApiRunner(use_state_iterables=True)) + + def test_state_backed_coder(self): +class MyDoFn(beam.DoFn): + def process(self, gbk_result): +value_list = gbk_result[1] +return (gbk_result[0], sum(value_list)) + +with self.create_pipeline() as p: + # The number of integers could be a knob to test against + # different runners' default settings on page size. + main = (p | 'main' >> beam.Create([('a', 1) for _ in range(0, 2)]) + | 'GBK' >> beam.GroupByKey() Review comment: As before, no need to name the GBK (or others). This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. 
For queries about this service, please contact Infrastructure at: us...@infra.apache.org Issue Time Tracking --- Worklog Id: (was: 346398) Time Spent: 5h 50m (was: 5h 40m) > TimestampCombiner incorrect in beam python > -- > > Key: BEAM-8645 > URL: https://issues.apache.org/jira/browse/BEAM-8645 > Project: Beam > Issue Type: Bug > Components: sdk-py-core >Reporter: Ruoyun Huang >Priority: Major > Time Spent: 5h 50m > Remaining Estimate: 0h > > When we have a TimestampValue on combine: > {code:java} > main_stream = (p > | 'main TestStream' >> TestStream() > .add_elements([window.TimestampedValue(('k', 100), 0)]) > .add_elements([window.TimestampedValue(('k', 400), 9)]) > .advance_watermark_to_infinity() > | 'main windowInto' >> beam.WindowInto( > window.FixedWindows(10), > timestamp_combiner=TimestampCombiner.OUTPUT_AT_LATEST) | > 'Combine' >> beam.CombinePerKey(sum)) > The expect timestamp should be: > LATEST: (('k', 500), Timestamp(9)), > EARLIEST: (('k', 500), Timestamp(0)), > END_OF_WINDOW: (('k', 500), Timestamp(10)), > But current py streaming gives following results: > LATEST: (('k', 500), Timestamp(10)), > EARLIEST: (('k', 500), Timestamp(10)), > END_OF_WINDOW: (('k', 500), Timestamp(9.)), > More details and discussions: > https://lists.apache.org/thread.html/d3af1f2f84a2e59a747196039eae77812b78a991f0f293c717e5f4e1@%3Cdev.beam.apache.org%3E > {code} -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Work logged] (BEAM-8645) TimestampCombiner incorrect in beam python
[ https://issues.apache.org/jira/browse/BEAM-8645?focusedWorklogId=346397=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-346397 ] ASF GitHub Bot logged work on BEAM-8645: Author: ASF GitHub Bot Created on: 20/Nov/19 01:18 Start Date: 20/Nov/19 01:18 Worklog Time Spent: 10m Work Description: robertwb commented on pull request #10143: [BEAM-8645] To test state backed iterable coder in py sdk. URL: https://github.com/apache/beam/pull/10143#discussion_r348249348 ## File path: sdks/python/apache_beam/runners/portability/fn_api_runner_test.py ## @@ -1579,6 +1580,27 @@ def test_lull_logging(self): '.*There has been a processing lull of over.*', 'Unable to find a lull logged for this job.') +@attr('ValidatesRunner') +class FnApiBasedStateBackedCoderTest(unittest.TestCase): + def create_pipeline(self): +return beam.Pipeline( +runner=fn_api_runner.FnApiRunner(use_state_iterables=True)) + + def test_state_backed_coder(self): +class MyDoFn(beam.DoFn): + def process(self, gbk_result): +value_list = gbk_result[1] +return (gbk_result[0], sum(value_list)) + +with self.create_pipeline() as p: + # The number of integers could be a knob to test against + # different runners' default settings on page size. + main = (p | 'main' >> beam.Create([('a', 1) for _ in range(0, 2)]) + | 'GBK' >> beam.GroupByKey() + | 'Sum' >> beam.ParDo(MyDoFn())) Review comment: This could just be beam.MapTuple(lambda key, values: sum(values)) This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. 
For queries about this service, please contact Infrastructure at: us...@infra.apache.org Issue Time Tracking --- Worklog Id: (was: 346397) Time Spent: 5h 40m (was: 5.5h) > TimestampCombiner incorrect in beam python > -- > > Key: BEAM-8645 > URL: https://issues.apache.org/jira/browse/BEAM-8645 > Project: Beam > Issue Type: Bug > Components: sdk-py-core >Reporter: Ruoyun Huang >Priority: Major > Time Spent: 5h 40m > Remaining Estimate: 0h > > When we have a TimestampValue on combine: > {code:java} > main_stream = (p > | 'main TestStream' >> TestStream() > .add_elements([window.TimestampedValue(('k', 100), 0)]) > .add_elements([window.TimestampedValue(('k', 400), 9)]) > .advance_watermark_to_infinity() > | 'main windowInto' >> beam.WindowInto( > window.FixedWindows(10), > timestamp_combiner=TimestampCombiner.OUTPUT_AT_LATEST) | > 'Combine' >> beam.CombinePerKey(sum)) > The expect timestamp should be: > LATEST: (('k', 500), Timestamp(9)), > EARLIEST: (('k', 500), Timestamp(0)), > END_OF_WINDOW: (('k', 500), Timestamp(10)), > But current py streaming gives following results: > LATEST: (('k', 500), Timestamp(10)), > EARLIEST: (('k', 500), Timestamp(10)), > END_OF_WINDOW: (('k', 500), Timestamp(9.)), > More details and discussions: > https://lists.apache.org/thread.html/d3af1f2f84a2e59a747196039eae77812b78a991f0f293c717e5f4e1@%3Cdev.beam.apache.org%3E > {code} -- This message was sent by Atlassian Jira (v8.3.4#803005)
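`beam.MapTuple` applies the function after unpacking each element as positional arguments, which is why it can replace the custom DoFn here; the behavior can be sketched in plain Python:

```python
def map_tuple(fn, elements):
    """Plain-Python analogue of beam.MapTuple: apply fn(*element)."""
    return [fn(*element) for element in elements]

# A GroupByKey result for the test's input [('a', 1), ('a', 1)]:
grouped = [('a', [1, 1])]
print(map_tuple(lambda key, values: sum(values), grouped))  # [2]
```

Note the suggested `lambda key, values: sum(values)` emits only the sums; the original DoFn returned `(key, sum(values))`, so the `assert_that` expectation would change accordingly.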
[jira] [Work logged] (BEAM-8658) Optionally set artifact staging port in FlinkUberJarJobServer
[ https://issues.apache.org/jira/browse/BEAM-8658?focusedWorklogId=346395=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-346395 ] ASF GitHub Bot logged work on BEAM-8658: Author: ASF GitHub Bot Created on: 20/Nov/19 01:11 Start Date: 20/Nov/19 01:11 Worklog Time Spent: 10m Work Description: ibzib commented on pull request #10163: [BEAM-8658] [BEAM-8781] Optionally set jar and artifact staging port … URL: https://github.com/apache/beam/pull/10163 …in FlinkUberJarJobServer Pass around options to fix: - BEAM-8658: add an `artifact_port` option. We need this because we need to expose a static port to use for artifact staging on Kubernetes (AFAIK gRPC does not support in-process mode on Python). - BEAM-8781: respect the existing `flink_job_server_jar` option. I wanted this so I could download the released job server jar from Maven instead of having to build it locally. This allows me to override the default behavior for .dev, which is to fail: https://github.com/apache/beam/blob/0b415fd7e9dd5c80034ca237e08c3959ec78ffe3/sdks/python/apache_beam/utils/subprocess_server.py#L176-L181 We could consider changing that behavior (as indicated in the TODO), but it matters less if you can get around it if you really want to. Thank you for your contribution! Follow this checklist to help us incorporate your contribution quickly and easily: - [ ] [**Choose reviewer(s)**](https://beam.apache.org/contribute/#make-your-change) and mention them in a comment (`R: @username`). - [ ] Format the pull request title like `[BEAM-XXX] Fixes bug in ApproximateQuantiles`, where you replace `BEAM-XXX` with the appropriate JIRA issue, if applicable. This will automatically link the pull request to the issue. - [ ] If this contribution is large, please file an Apache [Individual Contributor License Agreement](https://www.apache.org/licenses/icla.pdf). 
See the [Contributor Guide](https://beam.apache.org/contribute) for more tips on [how to make review process smoother](https://beam.apache.org/contribute/#make-reviewers-job-easier).
[jira] [Created] (BEAM-8783) Document Python SDK pickling
Udi Meiri created BEAM-8783: --- Summary: Document Python SDK pickling Key: BEAM-8783 URL: https://issues.apache.org/jira/browse/BEAM-8783 Project: Beam Issue Type: Improvement Components: sdk-py-core Reporter: Udi Meiri Assignee: Udi Meiri -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Updated] (BEAM-8782) Python typehints: with_output_types breaks multi-output dofns
[ https://issues.apache.org/jira/browse/BEAM-8782?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Udi Meiri updated BEAM-8782: Status: Open (was: Triage Needed) > Python typehints: with_output_types breaks multi-output dofns > - > > Key: BEAM-8782 > URL: https://issues.apache.org/jira/browse/BEAM-8782 > Project: Beam > Issue Type: Bug > Components: sdk-py-core >Reporter: Udi Meiri >Assignee: Udi Meiri >Priority: Major > > {code} > def test_typed_multi_pardo(self): > p = TestPipeline() > res = (p >| beam.Create([1, 2, 3]) >| beam.Map(lambda e: e).with_outputs().with_output_types(int)) > self.assertIsNotNone(res[None].element_type) > res_main = (res[None] > | 'id_none' >> beam.ParDo(lambda e: > [e]).with_input_types(int)) > assert_that(res_main, equal_to([1, 2, 3]), label='none_check') > p.run() > {code} > Fails with: > {code} > typed_pipeline_test.py:212: > _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ > _ > ../pvalue.py:113: in __or__ > return self.pipeline.apply(ptransform, self) > ../pipeline.py:528: in apply > transform.type_check_outputs(pvalueish_result) > ../transforms/ptransform.py:386: in type_check_outputs > self.type_check_inputs_or_outputs(pvalueish, 'output') > ../transforms/ptransform.py:401: in type_check_inputs_or_outputs > if pvalue_.element_type is None: > ../pvalue.py:241: in __getattr__ > return self[tag] > _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ > _ > self = label=[Map()]> at 0x7fa9513f3048> > tag = 'element_type' > def __getitem__(self, tag): > # Accept int tags so that we can look at Partition tags with the > # same ints that we used in the partition function. > # TODO(gildea): Consider requiring string-based tags everywhere. > # This will require a partition function that does not return ints. 
> if isinstance(tag, int): > tag = str(tag) > if tag == self._main_tag: > tag = None > elif self._tags and tag not in self._tags: > raise ValueError( > "Tag '%s' is neither the main tag '%s' " > "nor any of the tags %s" % ( > tag, self._main_tag, self._tags)) > # Check if we accessed this tag before. > if tag in self._pcolls: > return self._pcolls[tag] > > if tag is not None: > self._transform.output_tags.add(tag) > pcoll = PCollection(self._pipeline, tag=tag, > element_type=typehints.Any) > # Transfer the producer from the DoOutputsTuple to the resulting > # PCollection. > > pcoll.producer = self.producer.parts[0] > E AttributeError: 'NoneType' object has no attribute 'parts' > ../pvalue.py:266: AttributeError > {code} -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Created] (BEAM-8782) Python typehints: with_output_types breaks multi-output dofns
Udi Meiri created BEAM-8782: --- Summary: Python typehints: with_output_types breaks multi-output dofns Key: BEAM-8782 URL: https://issues.apache.org/jira/browse/BEAM-8782 Project: Beam Issue Type: Bug Components: sdk-py-core Reporter: Udi Meiri Assignee: Udi Meiri {code} def test_typed_multi_pardo(self): p = TestPipeline() res = (p | beam.Create([1, 2, 3]) | beam.Map(lambda e: e).with_outputs().with_output_types(int)) self.assertIsNotNone(res[None].element_type) res_main = (res[None] | 'id_none' >> beam.ParDo(lambda e: [e]).with_input_types(int)) assert_that(res_main, equal_to([1, 2, 3]), label='none_check') p.run() {code} Fails with: {code} typed_pipeline_test.py:212: _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ ../pvalue.py:113: in __or__ return self.pipeline.apply(ptransform, self) ../pipeline.py:528: in apply transform.type_check_outputs(pvalueish_result) ../transforms/ptransform.py:386: in type_check_outputs self.type_check_inputs_or_outputs(pvalueish, 'output') ../transforms/ptransform.py:401: in type_check_inputs_or_outputs if pvalue_.element_type is None: ../pvalue.py:241: in __getattr__ return self[tag] _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ self = )]> at 0x7fa9513f3048> tag = 'element_type' def __getitem__(self, tag): # Accept int tags so that we can look at Partition tags with the # same ints that we used in the partition function. # TODO(gildea): Consider requiring string-based tags everywhere. # This will require a partition function that does not return ints. if isinstance(tag, int): tag = str(tag) if tag == self._main_tag: tag = None elif self._tags and tag not in self._tags: raise ValueError( "Tag '%s' is neither the main tag '%s' " "nor any of the tags %s" % ( tag, self._main_tag, self._tags)) # Check if we accessed this tag before. 
if tag in self._pcolls: return self._pcolls[tag] if tag is not None: self._transform.output_tags.add(tag) pcoll = PCollection(self._pipeline, tag=tag, element_type=typehints.Any) # Transfer the producer from the DoOutputsTuple to the resulting # PCollection. > pcoll.producer = self.producer.parts[0] E AttributeError: 'NoneType' object has no attribute 'parts' ../pvalue.py:266: AttributeError {code} -- This message was sent by Atlassian Jira (v8.3.4#803005)
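The `__getitem__` logic quoted in the traceback normalizes tags before lookup, and `__getattr__` delegates unknown attribute names (like `element_type` on a tuple whose type hint was replaced) into that same lookup path. A standalone sketch of the normalization step (a hypothetical `normalize_tag`, not Beam's `DoOutputsTuple`):

```python
def normalize_tag(tag, main_tag, tags):
    """Sketch of the tag normalization in the quoted __getitem__:
    int tags become strings, and the main tag maps to None."""
    if isinstance(tag, int):
        tag = str(tag)  # Partition uses int tags
    if tag == main_tag:
        return None
    if tags and tag not in tags:
        raise ValueError(
            "Tag %r is neither the main tag %r nor any of the tags %s"
            % (tag, main_tag, tags))
    return tag

print(normalize_tag('main', 'main', ('side',)))  # None
print(normalize_tag(0, 'main', ('0', 'side')))   # '0'
```

In the failing test, the lookup succeeds far enough to build a new PCollection for the tag, and then crashes because `with_output_types` left the tuple's `producer` unset (`None`), so `self.producer.parts[0]` raises the `AttributeError` shown above.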
[jira] [Work logged] (BEAM-8016) Render Beam Pipeline as DOT with Interactive Beam
[ https://issues.apache.org/jira/browse/BEAM-8016?focusedWorklogId=346391=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-346391 ] ASF GitHub Bot logged work on BEAM-8016: Author: ASF GitHub Bot Created on: 20/Nov/19 01:05 Start Date: 20/Nov/19 01:05 Worklog Time Spent: 10m Work Description: davidyan74 commented on pull request #10132: [BEAM-8016] Pipeline Graph URL: https://github.com/apache/beam/pull/10132#discussion_r348243649 ## File path: sdks/python/apache_beam/runners/interactive/interactive_runner.py ## @@ -117,123 +119,43 @@ def apply(self, transform, pvalueish, options): return self._underlying_runner.apply(transform, pvalueish, options) def run_pipeline(self, pipeline, options): -if not hasattr(self, '_desired_cache_labels'): - self._desired_cache_labels = set() - -# Invoke a round trip through the runner API. This makes sure the Pipeline -# proto is stable. -pipeline = beam.pipeline.Pipeline.from_runner_api( -pipeline.to_runner_api(use_fake_coders=True), -pipeline.runner, -options) - -# Snapshot the pipeline in a portable proto before mutating it. -pipeline_proto, original_context = pipeline.to_runner_api( -return_context=True, use_fake_coders=True) -pcolls_to_pcoll_id = self._pcolls_to_pcoll_id(pipeline, original_context) - -analyzer = pipeline_analyzer.PipelineAnalyzer(self._cache_manager, - pipeline_proto, - self._underlying_runner, - options, - self._desired_cache_labels) -# Should be only accessed for debugging purpose. 
-self._analyzer = analyzer +pin = inst.pin(pipeline, options) pipeline_to_execute = beam.pipeline.Pipeline.from_runner_api( -analyzer.pipeline_proto_to_execute(), +pin.instrumented_pipeline_proto(), self._underlying_runner, options) if not self._skip_display: - display = display_manager.DisplayManager( - pipeline_proto=pipeline_proto, - pipeline_analyzer=analyzer, - cache_manager=self._cache_manager, - pipeline_graph_renderer=self._renderer) - display.start_periodic_update() + pg = pipeline_graph.PipelineGraph(pin.original_pipeline, +render_option=self._render_option) + pg.display_graph() result = pipeline_to_execute.run() result.wait_until_finish() -if not self._skip_display: - display.stop_periodic_update() - -return PipelineResult(result, self, self._analyzer.pipeline_info(), - self._cache_manager, pcolls_to_pcoll_id) - - def _pcolls_to_pcoll_id(self, pipeline, original_context): -"""Returns a dict mapping PCollections string to PCollection IDs. - -Using a PipelineVisitor to iterate over every node in the pipeline, -records the mapping from PCollections to PCollections IDs. This mapping -will be used to query cached PCollections. - -Args: - pipeline: (pipeline.Pipeline) - original_context: (pipeline_context.PipelineContext) - -Returns: - (dict from str to str) a dict mapping str(pcoll) to pcoll_id. -""" -pcolls_to_pcoll_id = {} - -from apache_beam.pipeline import PipelineVisitor # pylint: disable=import-error - -class PCollVisitor(PipelineVisitor): # pylint: disable=used-before-assignment - A visitor that records input and output values to be replaced. - - Input and output values that should be updated are recorded in maps - input_replacements and output_replacements respectively. - - We cannot update input and output values while visiting since that - results in validation errors. 
- """ - - def enter_composite_transform(self, transform_node): -self.visit_transform(transform_node) - - def visit_transform(self, transform_node): -for pcoll in transform_node.outputs.values(): - pcolls_to_pcoll_id[str(pcoll)] = original_context.pcollections.get_id( - pcoll) - -pipeline.visit(PCollVisitor()) -return pcolls_to_pcoll_id +return PipelineResult(result, pin) class PipelineResult(beam.runners.runner.PipelineResult): """Provides access to information about a pipeline.""" - def __init__(self, underlying_result, runner, pipeline_info, cache_manager, - pcolls_to_pcoll_id): + def __init__(self, underlying_result, pin): super(PipelineResult, self).__init__(underlying_result.state) -self._runner = runner -self._pipeline_info = pipeline_info -self._cache_manager = cache_manager -self._pcolls_to_pcoll_id = pcolls_to_pcoll_id - - def _cache_label(self, pcoll): -
[jira] [Work logged] (BEAM-8016) Render Beam Pipeline as DOT with Interactive Beam
[ https://issues.apache.org/jira/browse/BEAM-8016?focusedWorklogId=346390=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-346390 ] ASF GitHub Bot logged work on BEAM-8016: Author: ASF GitHub Bot Created on: 20/Nov/19 01:05 Start Date: 20/Nov/19 01:05 Worklog Time Spent: 10m Work Description: davidyan74 commented on pull request #10132: [BEAM-8016] Pipeline Graph URL: https://github.com/apache/beam/pull/10132#discussion_r348242820 ## File path: sdks/python/apache_beam/runners/interactive/display/pipeline_graph.py ## @@ -107,6 +135,16 @@ def _top_level_transforms(self): top_level_transform_proto = transforms[top_level_transform_id] yield top_level_transform_id, top_level_transform_proto + def _decorate(self, value): +"""Decorates label-ish values used for rendering in dot language. + +Escapes special characters in the given str value for dot language. '"' is Review comment: https://www.graphviz.org/doc/info/lang.html says '"' can be escaped with \\", and contrary to the comment, " is the only escaped character in dot and a \ not followed by a " is a literal \ . So perhaps we should change the code to return `'"{}"'.format(value.replace('"', '\\"'))` This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org Issue Time Tracking --- Worklog Id: (was: 346390) Time Spent: 4.5h (was: 4h 20m) > Render Beam Pipeline as DOT with Interactive Beam > --- > > Key: BEAM-8016 > URL: https://issues.apache.org/jira/browse/BEAM-8016 > Project: Beam > Issue Type: Improvement > Components: runner-py-interactive >Reporter: Ning Kang >Assignee: Ning Kang >Priority: Major > Time Spent: 4.5h > Remaining Estimate: 0h > > With work in https://issues.apache.org/jira/browse/BEAM-7760, Beam pipeline > converted to DOT then rendered should mark user defined variables on edges. 
> With work in https://issues.apache.org/jira/browse/BEAM-7926, it might be > redundant or confusing to render arbitrary random sample PCollection data on > edges. > We'll also make sure edges in the graph corresponds to output -> input > relationship in the user defined pipeline. Each edge is one output. If > multiple down stream inputs take the same output, it should be rendered as > one edge diverging into two instead of two edges. > For advanced interactivity highlight where each execution highlights the part > of the pipeline really executed from the original pipeline, we'll also > provide the support in beta. -- This message was sent by Atlassian Jira (v8.3.4#803005)
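The escaping behavior proposed in the review above — per the DOT grammar, '"' is the only character that needs escaping inside a double-quoted string, and a backslash not followed by '"' is literal — can be sketched as a standalone helper. This is a sketch only; the function name `decorate` mirrors the `_decorate` method under review but is not the actual Beam implementation:

```python
def decorate(value):
    """Quote a value for use as a double-quoted DOT language ID.

    Per the DOT grammar, '"' is the only character that must be
    escaped inside a double-quoted string, so a single replace
    suffices.
    """
    return '"{}"'.format(str(value).replace('"', '\\"'))

print(decorate('count "words"'))  # prints "count \"words\""
```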
[jira] [Work logged] (BEAM-8624) Implement FnService for status api in Dataflow runner
[ https://issues.apache.org/jira/browse/BEAM-8624?focusedWorklogId=346382=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-346382 ] ASF GitHub Bot logged work on BEAM-8624: Author: ASF GitHub Bot Created on: 20/Nov/19 00:44 Start Date: 20/Nov/19 00:44 Worklog Time Spent: 10m Work Description: lukecwik commented on pull request #10115: [BEAM-8624] Implement Worker Status FnService in Dataflow runner URL: https://github.com/apache/beam/pull/10115#discussion_r348241661 ## File path: model/fn-execution/src/main/proto/beam_fn_api.proto ## @@ -872,3 +872,25 @@ service BeamFnExternalWorkerPool { // Stop the SDK worker. rpc StopWorker (StopWorkerRequest) returns (StopWorkerResponse) {} } + +// Request from runner to SDK Harness asking for its status. +message WorkerStatusRequest { + // (Required) Unique ID identifying this request. + string request_id = 1; +} + +// Response from SDK Harness to runner containing the debug related status info. +message WorkerStatusResponse { + // (Required) Unique ID from the original request. + string request_id = 1; Review comment: ```suggestion string id = 1; ``` This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org Issue Time Tracking --- Worklog Id: (was: 346382) Time Spent: 2h (was: 1h 50m) > Implement FnService for status api in Dataflow runner > - > > Key: BEAM-8624 > URL: https://issues.apache.org/jira/browse/BEAM-8624 > Project: Beam > Issue Type: Sub-task > Components: runner-dataflow >Reporter: Yichi Zhang >Assignee: Yichi Zhang >Priority: Major > Time Spent: 2h > Remaining Estimate: 0h > -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Work logged] (BEAM-8624) Implement FnService for status api in Dataflow runner
[ https://issues.apache.org/jira/browse/BEAM-8624?focusedWorklogId=346384=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-346384 ] ASF GitHub Bot logged work on BEAM-8624: Author: ASF GitHub Bot Created on: 20/Nov/19 00:44 Start Date: 20/Nov/19 00:44 Worklog Time Spent: 10m Work Description: lukecwik commented on pull request #10115: [BEAM-8624] Implement Worker Status FnService in Dataflow runner URL: https://github.com/apache/beam/pull/10115#discussion_r348242096 ## File path: model/fn-execution/src/main/proto/beam_fn_api.proto ## @@ -872,3 +872,25 @@ service BeamFnExternalWorkerPool { // Stop the SDK worker. rpc StopWorker (StopWorkerRequest) returns (StopWorkerResponse) {} } + +// Request from runner to SDK Harness asking for its status. +message WorkerStatusRequest { + // (Required) Unique ID identifying this request. + string request_id = 1; +} + +// Response from SDK Harness to runner containing the debug related status info. +message WorkerStatusResponse { + // (Required) Unique ID from the original request. + string request_id = 1; + // (Optional) Error message if exception encountered generating the status response. + string error = 2; + // (Optional) Status debugging info reported by SDK harness worker. + string status_info = 3; +} + +// Fn Api for SDK harness to report its debug-related statuses to runner. Review comment: ```suggestion // API for SDKs to report debug-related statuses to runner during pipeline execution. ``` This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. 
For queries about this service, please contact Infrastructure at: us...@infra.apache.org Issue Time Tracking --- Worklog Id: (was: 346384) Time Spent: 2h 20m (was: 2h 10m) > Implement FnService for status api in Dataflow runner > - > > Key: BEAM-8624 > URL: https://issues.apache.org/jira/browse/BEAM-8624 > Project: Beam > Issue Type: Sub-task > Components: runner-dataflow >Reporter: Yichi Zhang >Assignee: Yichi Zhang >Priority: Major > Time Spent: 2h 20m > Remaining Estimate: 0h > -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Work logged] (BEAM-8624) Implement FnService for status api in Dataflow runner
[ https://issues.apache.org/jira/browse/BEAM-8624?focusedWorklogId=346383=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-346383 ] ASF GitHub Bot logged work on BEAM-8624: Author: ASF GitHub Bot Created on: 20/Nov/19 00:44 Start Date: 20/Nov/19 00:44 Worklog Time Spent: 10m Work Description: lukecwik commented on pull request #10115: [BEAM-8624] Implement Worker Status FnService in Dataflow runner URL: https://github.com/apache/beam/pull/10115#discussion_r348241948 ## File path: model/fn-execution/src/main/proto/beam_fn_api.proto ## @@ -872,3 +872,25 @@ service BeamFnExternalWorkerPool { // Stop the SDK worker. rpc StopWorker (StopWorkerRequest) returns (StopWorkerResponse) {} } + +// Request from runner to SDK Harness asking for its status. +message WorkerStatusRequest { + // (Required) Unique ID identifying this request. + string request_id = 1; +} + +// Response from SDK Harness to runner containing the debug related status info. +message WorkerStatusResponse { + // (Required) Unique ID from the original request. + string request_id = 1; + // (Optional) Error message if exception encountered generating the status response. + string error = 2; + // (Optional) Status debugging info reported by SDK harness worker. Review comment: You need to describe the format and what needs to be part of the response otherwise SDK and runner authors can't tell what they are supposed to do with this. This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. 
For queries about this service, please contact Infrastructure at: us...@infra.apache.org Issue Time Tracking --- Worklog Id: (was: 346383) Time Spent: 2h 10m (was: 2h) > Implement FnService for status api in Dataflow runner > - > > Key: BEAM-8624 > URL: https://issues.apache.org/jira/browse/BEAM-8624 > Project: Beam > Issue Type: Sub-task > Components: runner-dataflow >Reporter: Yichi Zhang >Assignee: Yichi Zhang >Priority: Major > Time Spent: 2h 10m > Remaining Estimate: 0h > -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Work logged] (BEAM-8624) Implement FnService for status api in Dataflow runner
[ https://issues.apache.org/jira/browse/BEAM-8624?focusedWorklogId=346381=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-346381 ] ASF GitHub Bot logged work on BEAM-8624: Author: ASF GitHub Bot Created on: 20/Nov/19 00:44 Start Date: 20/Nov/19 00:44 Worklog Time Spent: 10m Work Description: lukecwik commented on pull request #10115: [BEAM-8624] Implement Worker Status FnService in Dataflow runner URL: https://github.com/apache/beam/pull/10115#discussion_r348241528 ## File path: model/fn-execution/src/main/proto/beam_fn_api.proto ## @@ -872,3 +872,25 @@ service BeamFnExternalWorkerPool { // Stop the SDK worker. rpc StopWorker (StopWorkerRequest) returns (StopWorkerResponse) {} } + +// Request from runner to SDK Harness asking for its status. +message WorkerStatusRequest { + // (Required) Unique ID identifying this request. + string request_id = 1; Review comment: ```suggestion string id = 1; ``` This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org Issue Time Tracking --- Worklog Id: (was: 346381) Time Spent: 1h 50m (was: 1h 40m) > Implement FnService for status api in Dataflow runner > - > > Key: BEAM-8624 > URL: https://issues.apache.org/jira/browse/BEAM-8624 > Project: Beam > Issue Type: Sub-task > Components: runner-dataflow >Reporter: Yichi Zhang >Assignee: Yichi Zhang >Priority: Major > Time Spent: 1h 50m > Remaining Estimate: 0h > -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Work logged] (BEAM-8624) Implement FnService for status api in Dataflow runner
[ https://issues.apache.org/jira/browse/BEAM-8624?focusedWorklogId=346385=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-346385 ] ASF GitHub Bot logged work on BEAM-8624: Author: ASF GitHub Bot Created on: 20/Nov/19 00:44 Start Date: 20/Nov/19 00:44 Worklog Time Spent: 10m Work Description: lukecwik commented on pull request #10115: [BEAM-8624] Implement Worker Status FnService in Dataflow runner URL: https://github.com/apache/beam/pull/10115#discussion_r348241604 ## File path: model/fn-execution/src/main/proto/beam_fn_api.proto ## @@ -872,3 +872,25 @@ service BeamFnExternalWorkerPool { // Stop the SDK worker. rpc StopWorker (StopWorkerRequest) returns (StopWorkerResponse) {} } + +// Request from runner to SDK Harness asking for its status. Review comment: Please add the link to the design doc for more details. This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org Issue Time Tracking --- Worklog Id: (was: 346385) Time Spent: 2h 20m (was: 2h 10m) > Implement FnService for status api in Dataflow runner > - > > Key: BEAM-8624 > URL: https://issues.apache.org/jira/browse/BEAM-8624 > Project: Beam > Issue Type: Sub-task > Components: runner-dataflow >Reporter: Yichi Zhang >Assignee: Yichi Zhang >Priority: Major > Time Spent: 2h 20m > Remaining Estimate: 0h > -- This message was sent by Atlassian Jira (v8.3.4#803005)
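The request/response correlation discussed in these reviews can be sketched in plain Python. This is a hypothetical handler, not the real gRPC FnService: the dictionary keys follow the proposed proto fields (the review suggests renaming `request_id` to `id`), and the per-thread status payload is only an illustration of what a harness might report:

```python
import threading
import traceback

def handle_worker_status_request(request_id):
    # Echo the request id back so the runner can correlate this
    # response with its original WorkerStatusRequest.
    try:
        # Illustrative status payload: one line per live thread.
        status_info = "\n".join(
            "Thread: %s" % t.name for t in threading.enumerate())
        return {"id": request_id, "status_info": status_info, "error": ""}
    except Exception:
        # On failure, report the error instead of the status info.
        return {"id": request_id, "status_info": "", "error": traceback.format_exc()}

response = handle_worker_status_request("req-42")
```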
[jira] [Work logged] (BEAM-8623) Add additional message field to Provision API response for passing status endpoint
[ https://issues.apache.org/jira/browse/BEAM-8623?focusedWorklogId=346380=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-346380 ] ASF GitHub Bot logged work on BEAM-8623: Author: ASF GitHub Bot Created on: 20/Nov/19 00:38 Start Date: 20/Nov/19 00:38 Worklog Time Spent: 10m Work Description: lukecwik commented on pull request #10075: [BEAM-8623] Add status_endpoint field to provision api ProvisionInfo URL: https://github.com/apache/beam/pull/10075#discussion_r348240469 ## File path: model/fn-execution/src/main/proto/beam_provision_api.proto ## @@ -71,6 +72,11 @@ message ProvisionInfo { // (required) The artifact retrieval token produced by // ArtifactStagingService.CommitManifestResponse. string retrieval_token = 6; + +// (optional) The endpoint for runner to use for hosting the worker status Review comment: please update comment to: ``` (optional) The endpoint that the runner is hosting for the SDK to submit status reports to during pipeline execution. This field will only be populated if the runner supports SDK status reports. For more details see {link to design doc}. ``` This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org Issue Time Tracking --- Worklog Id: (was: 346380) Time Spent: 0.5h (was: 20m) > Add additional message field to Provision API response for passing status > endpoint > -- > > Key: BEAM-8623 > URL: https://issues.apache.org/jira/browse/BEAM-8623 > Project: Beam > Issue Type: Sub-task > Components: beam-model >Reporter: Yichi Zhang >Assignee: Yichi Zhang >Priority: Minor > Time Spent: 0.5h > Remaining Estimate: 0h > -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Updated] (BEAM-8781) FlinkUberJarJobServer should respect --flink_job_server_jar
[ https://issues.apache.org/jira/browse/BEAM-8781?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Kyle Weaver updated BEAM-8781: -- Status: Open (was: Triage Needed) > FlinkUberJarJobServer should respect --flink_job_server_jar > --- > > Key: BEAM-8781 > URL: https://issues.apache.org/jira/browse/BEAM-8781 > Project: Beam > Issue Type: Improvement > Components: runner-flink >Reporter: Kyle Weaver >Assignee: Kyle Weaver >Priority: Major > -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Work logged] (BEAM-7049) Merge multiple input to one BeamUnionRel
[ https://issues.apache.org/jira/browse/BEAM-7049?focusedWorklogId=346376=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-346376 ] ASF GitHub Bot logged work on BEAM-7049: Author: ASF GitHub Bot Created on: 20/Nov/19 00:28 Start Date: 20/Nov/19 00:28 Worklog Time Spent: 10m Work Description: stale[bot] commented on issue #9358: (WIP-BEAM-7049)Changes made to make a simple case of threeway union work URL: https://github.com/apache/beam/pull/9358#issuecomment-555778972 This pull request has been closed due to lack of activity. If you think that is incorrect, or the pull request requires review, you can revive the PR at any time. This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org Issue Time Tracking --- Worklog Id: (was: 346376) Time Spent: 1h 20m (was: 1h 10m) > Merge multiple input to one BeamUnionRel > > > Key: BEAM-7049 > URL: https://issues.apache.org/jira/browse/BEAM-7049 > Project: Beam > Issue Type: Improvement > Components: dsl-sql >Reporter: Rui Wang >Assignee: sridhar Reddy >Priority: Major > Time Spent: 1h 20m > Remaining Estimate: 0h > > BeamUnionRel assumes inputs are two and rejects more. So `a UNION b UNION c` > will have to be created as UNION(a, UNION(b, c)) and have two shuffles. If > BeamUnionRel can handle multiple shuffles, we will have only one shuffle -- This message was sent by Atlassian Jira (v8.3.4#803005)
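The rewrite described in the issue — turning the binary tree UNION(a, UNION(b, c)) into a single n-ary UNION so one shuffle can merge all inputs — can be sketched over a hypothetical tuple-based relational tree. This is not the actual Calcite/Beam node representation, only an illustration of the flattening step:

```python
def flatten_union(node):
    # Flatten UNION(a, UNION(b, c)) -> UNION(a, b, c) by hoisting the
    # inputs of any nested UNION child into the parent's input list.
    if isinstance(node, tuple) and node[0] == "UNION":
        inputs = []
        for child in node[1:]:
            flattened = flatten_union(child)
            if isinstance(flattened, tuple) and flattened[0] == "UNION":
                inputs.extend(flattened[1:])
            else:
                inputs.append(flattened)
        return ("UNION", *inputs)
    return node

print(flatten_union(("UNION", "a", ("UNION", "b", "c"))))  # ('UNION', 'a', 'b', 'c')
```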
[jira] [Work logged] (BEAM-8143) Provide a simple LineSource built on top of SDF
[ https://issues.apache.org/jira/browse/BEAM-8143?focusedWorklogId=346375=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-346375 ] ASF GitHub Bot logged work on BEAM-8143: Author: ASF GitHub Bot Created on: 20/Nov/19 00:28 Start Date: 20/Nov/19 00:28 Worklog Time Spent: 10m Work Description: stale[bot] commented on issue #9366: [BEAM-8143] Build simple LineSource directly on top of SDF URL: https://github.com/apache/beam/pull/9366#issuecomment-555778965 This pull request has been closed due to lack of activity. If you think that is incorrect, or the pull request requires review, you can revive the PR at any time. This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org Issue Time Tracking --- Worklog Id: (was: 346375) Time Spent: 1h 10m (was: 1h) > Provide a simple LineSource built on top of SDF > --- > > Key: BEAM-8143 > URL: https://issues.apache.org/jira/browse/BEAM-8143 > Project: Beam > Issue Type: New Feature > Components: sdk-py-core >Reporter: Boyuan Zhang >Assignee: Boyuan Zhang >Priority: Major > Time Spent: 1h 10m > Remaining Estimate: 0h > -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Work logged] (BEAM-8143) Provide a simple LineSource built on top of SDF
[ https://issues.apache.org/jira/browse/BEAM-8143?focusedWorklogId=346377=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-346377 ] ASF GitHub Bot logged work on BEAM-8143: Author: ASF GitHub Bot Created on: 20/Nov/19 00:28 Start Date: 20/Nov/19 00:28 Worklog Time Spent: 10m Work Description: stale[bot] commented on pull request #9366: [BEAM-8143] Build simple LineSource directly on top of SDF URL: https://github.com/apache/beam/pull/9366 This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org Issue Time Tracking --- Worklog Id: (was: 346377) Time Spent: 1h 20m (was: 1h 10m) > Provide a simple LineSource built on top of SDF > --- > > Key: BEAM-8143 > URL: https://issues.apache.org/jira/browse/BEAM-8143 > Project: Beam > Issue Type: New Feature > Components: sdk-py-core >Reporter: Boyuan Zhang >Assignee: Boyuan Zhang >Priority: Major > Time Spent: 1h 20m > Remaining Estimate: 0h > -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Work logged] (BEAM-7049) Merge multiple input to one BeamUnionRel
[ https://issues.apache.org/jira/browse/BEAM-7049?focusedWorklogId=346378=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-346378 ] ASF GitHub Bot logged work on BEAM-7049: Author: ASF GitHub Bot Created on: 20/Nov/19 00:28 Start Date: 20/Nov/19 00:28 Worklog Time Spent: 10m Work Description: stale[bot] commented on pull request #9358: (WIP-BEAM-7049)Changes made to make a simple case of threeway union work URL: https://github.com/apache/beam/pull/9358 This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org Issue Time Tracking --- Worklog Id: (was: 346378) Time Spent: 1.5h (was: 1h 20m) > Merge multiple input to one BeamUnionRel > > > Key: BEAM-7049 > URL: https://issues.apache.org/jira/browse/BEAM-7049 > Project: Beam > Issue Type: Improvement > Components: dsl-sql >Reporter: Rui Wang >Assignee: sridhar Reddy >Priority: Major > Time Spent: 1.5h > Remaining Estimate: 0h > > BeamUnionRel assumes inputs are two and rejects more. So `a UNION b UNION c` > will have to be created as UNION(a, UNION(b, c)) and have two shuffles. If > BeamUnionRel can handle multiple shuffles, we will have only one shuffle -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Work logged] (BEAM-3713) Consider moving away from nose to nose2 or pytest.
[ https://issues.apache.org/jira/browse/BEAM-3713?focusedWorklogId=346373=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-346373 ] ASF GitHub Bot logged work on BEAM-3713: Author: ASF GitHub Bot Created on: 20/Nov/19 00:27 Start Date: 20/Nov/19 00:27 Worklog Time Spent: 10m Work Description: udim commented on issue #7949: [BEAM-3713] Add pytest testing infrastructure URL: https://github.com/apache/beam/pull/7949#issuecomment-555778714 still working on this This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org Issue Time Tracking --- Worklog Id: (was: 346373) Time Spent: 12h 10m (was: 12h) > Consider moving away from nose to nose2 or pytest. > -- > > Key: BEAM-3713 > URL: https://issues.apache.org/jira/browse/BEAM-3713 > Project: Beam > Issue Type: Test > Components: sdk-py-core, testing >Reporter: Robert Bradshaw >Assignee: Udi Meiri >Priority: Minor > Time Spent: 12h 10m > Remaining Estimate: 0h > > Per > [https://nose.readthedocs.io/en/latest/|https://nose.readthedocs.io/en/latest/,] > , nose is in maintenance mode. -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Work logged] (BEAM-8251) Add worker_region and worker_zone options
[ https://issues.apache.org/jira/browse/BEAM-8251?focusedWorklogId=346369=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-346369 ] ASF GitHub Bot logged work on BEAM-8251: Author: ASF GitHub Bot Created on: 20/Nov/19 00:13 Start Date: 20/Nov/19 00:13 Worklog Time Spent: 10m Work Description: ibzib commented on issue #10150: [BEAM-8251] plumb worker_(region|zone) to Environment proto URL: https://github.com/apache/beam/pull/10150#issuecomment-555775063 > If a region/zone are not set up, then None/null will be set - Dataflow is able to handle these properly? Good question -- doesn't look like it. I'm planning on running tests against staging before submitting this. This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org Issue Time Tracking --- Worklog Id: (was: 346369) Time Spent: 50m (was: 40m) > Add worker_region and worker_zone options > - > > Key: BEAM-8251 > URL: https://issues.apache.org/jira/browse/BEAM-8251 > Project: Beam > Issue Type: Improvement > Components: runner-dataflow >Reporter: Kyle Weaver >Assignee: Kyle Weaver >Priority: Major > Time Spent: 50m > Remaining Estimate: 0h > > We are refining the way the user specifies worker regions and zones to the > Dataflow service. We need to add worker_region and worker_zone pipeline > options that will be preferred over the old experiments=worker_region and > --zone flags. I will create subtasks for adding these options to each SDK. -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Work logged] (BEAM-8624) Implement FnService for status api in Dataflow runner
[ https://issues.apache.org/jira/browse/BEAM-8624?focusedWorklogId=346365=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-346365 ] ASF GitHub Bot logged work on BEAM-8624: Author: ASF GitHub Bot Created on: 19/Nov/19 23:58 Start Date: 19/Nov/19 23:58 Worklog Time Spent: 10m Work Description: y1chi commented on issue #10115: [BEAM-8624] Implement Worker Status FnService in Dataflow runner URL: https://github.com/apache/beam/pull/10115#issuecomment-555771367 Run Python PreCommit This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org Issue Time Tracking --- Worklog Id: (was: 346365) Time Spent: 1h 40m (was: 1.5h) > Implement FnService for status api in Dataflow runner > - > > Key: BEAM-8624 > URL: https://issues.apache.org/jira/browse/BEAM-8624 > Project: Beam > Issue Type: Sub-task > Components: runner-dataflow >Reporter: Yichi Zhang >Assignee: Yichi Zhang >Priority: Major > Time Spent: 1h 40m > Remaining Estimate: 0h > -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Work logged] (BEAM-8624) Implement FnService for status api in Dataflow runner
[ https://issues.apache.org/jira/browse/BEAM-8624?focusedWorklogId=346364=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-346364 ] ASF GitHub Bot logged work on BEAM-8624: Author: ASF GitHub Bot Created on: 19/Nov/19 23:55 Start Date: 19/Nov/19 23:55 Worklog Time Spent: 10m Work Description: y1chi commented on issue #10115: [BEAM-8624] Implement Worker Status FnService in Dataflow runner URL: https://github.com/apache/beam/pull/10115#issuecomment-555770721 Run Java PreCommit This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org Issue Time Tracking --- Worklog Id: (was: 346364) Time Spent: 1.5h (was: 1h 20m) > Implement FnService for status api in Dataflow runner > - > > Key: BEAM-8624 > URL: https://issues.apache.org/jira/browse/BEAM-8624 > Project: Beam > Issue Type: Sub-task > Components: runner-dataflow >Reporter: Yichi Zhang >Assignee: Yichi Zhang >Priority: Major > Time Spent: 1.5h > Remaining Estimate: 0h > -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Work logged] (BEAM-7746) Add type hints to python code
[ https://issues.apache.org/jira/browse/BEAM-7746?focusedWorklogId=346355=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-346355 ] ASF GitHub Bot logged work on BEAM-7746: Author: ASF GitHub Bot Created on: 19/Nov/19 23:37 Start Date: 19/Nov/19 23:37 Worklog Time Spent: 10m Work Description: chadrik commented on pull request #9915: [BEAM-7746] Add python type hints (part 1) URL: https://github.com/apache/beam/pull/9915#discussion_r348225946 ## File path: sdks/python/apache_beam/runners/worker/bundle_processor.py ## @@ -878,6 +1033,7 @@ def get_coder(self, coder_id): json.loads(coder_proto.spec.payload.decode('utf-8'))) def get_windowed_coder(self, pcoll_id): +# type: (str) -> WindowedValueCoder Review comment: Ok, I can change it to `Coder`, but we'll have to come back and change it to `Coder[WindowedValue]` once we make `Coder` generic. This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org Issue Time Tracking --- Worklog Id: (was: 346355) Time Spent: 27.5h (was: 27h 20m) > Add type hints to python code > - > > Key: BEAM-7746 > URL: https://issues.apache.org/jira/browse/BEAM-7746 > Project: Beam > Issue Type: New Feature > Components: sdk-py-core >Reporter: Chad Dombrova >Assignee: Chad Dombrova >Priority: Major > Time Spent: 27.5h > Remaining Estimate: 0h > > As a developer of the beam source code, I would like the code to use pep484 > type hints so that I can clearly see what types are required, get completion > in my IDE, and enforce code correctness via a static analyzer like mypy. > This may be considered a precursor to BEAM-7060 > Work has been started here: [https://github.com/apache/beam/pull/9056] > > -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Comment Edited] (BEAM-8651) Python 3 portable pipelines sometimes fail with errors in StockUnpickler.find_class()
[ https://issues.apache.org/jira/browse/BEAM-8651?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16977923#comment-16977923 ] Valentyn Tymofieiev edited comment on BEAM-8651 at 11/19/19 11:36 PM: -- While investigating the example from [https://github.com/tensorflow/tfx/issues/928] I observe the following failure mode: SDK worker starts multiple threads. Each thread happens to unpickle some payload, which calls {code:java} dill.loads [1] {code} Dill calls {code:java} pickle.Unpickler.find_class [2] {code} which calls {code:java} __import__ [3] {code} Concurrent import calls cause a deadlock on Python 3 (checked on Python 3.7.5rc1), but not on Python 2.7. Following snippet, with calls extracted from the Chicago taxi pipeline {code:python} import threading def t1(): return __import__("tensorflow_transform.beam.analyzer_impls", 0) def t2(): return __import__("tensorflow_transform.tf_metadata.metadata_io", 0) threads = [] threads.append(threading.Thread(target=t1)) threads.append(threading.Thread(target=t2)) for thread in threads: thread.start() {code} fails with {noformat} Exception in thread Thread-1: Traceback (most recent call last): File "/usr/lib/python3.7/threading.py", line 926, in _bootstrap_inner self.run() File "/usr/lib/python3.7/threading.py", line 870, in run self._target(*self._args, **self._kwargs) File "dealock_repro.py", line 4, in t1 return __import__("tensorflow_transform.beam.analyzer_impls", level=0) File "/home/valentyn/tmp/tfx_master_py37/lib/python3.7/site-packages/tensorflow_transform/__init__.py", line 23, in from tensorflow_transform.output_wrapper import TFTransformOutput File "/home/valentyn/tmp/tfx_master_py37/lib/python3.7/site-packages/tensorflow_transform/output_wrapper.py", line 29, in from tensorflow_transform.tf_metadata import metadata_io File "", line 980, in _find_and_load File "", line 149, in __enter__ File "", line 94, in acquire _frozen_importlib._DeadlockError: deadlock detected by 
_ModuleLock('tensorflow_transform.tf_metadata.metadata_io') at 140070103765456 {noformat} [1] [https://github.com/apache/beam/blob/cba445c8da93d9bdd01b30b2f54e9c3b52a98b7d/sdks/python/apache_beam/internal/pickler.py#L261] [2] [https://github.com/uqfoundation/dill/blob/76e8472502a656f3ab6973cd8375cf7847f33842/dill/_dill.py#L462] [3] [https://github.com/python/cpython/blob/4ffc569b47bef9f95e443f3c56f7e7e32cb440c0/Lib/pickle.py#L1426] was (Author: tvalentyn): While investigating the example from [https://github.com/tensorflow/tfx/issues/928] I observe the following failure mode: SDK worker starts multiple threads. Each thread happens to unpickle some payload, which calls {code:java} dill.loads [1] {code} Dill calls {code:java} pickle.Unpickler.find_class [2] {code} which calls {code:java} __import__ [3] {code} Concurrent import calls cause a deadlock on Python 3 (checked on Python 3.7.5rc1), but not on Python 2.7. Following snippet, with calls extracted from the Chicago taxi pipeline {code:python} import threading def t1(): return __import__("tensorflow_transform.beam.analyzer_impls", 0) def t2(): return __import__("tensorflow_transform.tf_metadata.metadata_io", 0) def t3(): return __import__("tensorflow.core.example.example_pb2", 0) threads = [] threads.append(threading.Thread(target=t1)) threads.append(threading.Thread(target=t2)) threads.append(threading.Thread(target=t3)) for thread in threads: thread.start() {code} fails with {noformat} Exception in thread Thread-1: Traceback (most recent call last): File "/usr/lib/python3.7/threading.py", line 926, in _bootstrap_inner self.run() File "/usr/lib/python3.7/threading.py", line 870, in run self._target(*self._args, **self._kwargs) File "dealock_repro.py", line 4, in t1 return __import__("tensorflow_transform.beam.analyzer_impls", level=0) File "/home/valentyn/tmp/tfx_master_py37/lib/python3.7/site-packages/tensorflow_transform/__init__.py", line 23, in from tensorflow_transform.output_wrapper import TFTransformOutput 
File "/home/valentyn/tmp/tfx_master_py37/lib/python3.7/site-packages/tensorflow_transform/output_wrapper.py", line 29, in from tensorflow_transform.tf_metadata import metadata_io File "", line 980, in _find_and_load File "", line 149, in __enter__ File "", line 94, in acquire _frozen_importlib._DeadlockError: deadlock detected by _ModuleLock('tensorflow_transform.tf_metadata.metadata_io') at 140070103765456 {noformat} [1] [https://github.com/apache/beam/blob/cba445c8da93d9bdd01b30b2f54e9c3b52a98b7d/sdks/python/apache_beam/internal/pickler.py#L261] [2] [https://github.com/uqfoundation/dill/blob/76e8472502a656f3ab6973cd8375cf7847f33842/dill/_dill.py#L462] [3] [https://github.com/python/cpython/blob/4ffc569b47bef9f95e443f3c56f7e7e32cb440c0/Lib/pickle.py#L1426] > Python 3 portable
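The comment above shows concurrent `__import__` calls deadlocking across threads. One common mitigation, not taken from this ticket but a sketch of the general technique, is to serialize imports behind a single process-wide lock (or to pre-import the needed modules in the main thread before workers start). The module names below are stand-ins for the `tensorflow_transform` modules in the repro:

```python
import importlib
import threading

_import_lock = threading.Lock()


def safe_import(name):
    """Serialize dynamic imports across threads so that no two module
    locks are ever acquired concurrently in conflicting order."""
    with _import_lock:
        return importlib.import_module(name)


results = {}


def worker(name):
    # Worker threads call safe_import instead of bare __import__.
    results[name] = safe_import(name)


threads = [threading.Thread(target=worker, args=(n,))
           for n in ("json", "csv")]
for t in threads:
    t.start()
for t in threads:
    t.join()
```

This trades import parallelism for safety; pre-importing hot modules before spawning threads achieves the same effect without the lock on every call.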
[jira] [Work logged] (BEAM-8523) Add useful timestamp to job servicer GetJobs
[ https://issues.apache.org/jira/browse/BEAM-8523?focusedWorklogId=346352=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-346352 ] ASF GitHub Bot logged work on BEAM-8523: Author: ASF GitHub Bot Created on: 19/Nov/19 23:29 Start Date: 19/Nov/19 23:29 Worklog Time Spent: 10m Work Description: chadrik commented on issue #9959: [BEAM-8523] JobAPI: Give access to timestamped state change history URL: https://github.com/apache/beam/pull/9959#issuecomment-555763959 I reworked this a bit to make it easier for other job services to implement state history, and I added some tests for the local job service. This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org Issue Time Tracking --- Worklog Id: (was: 346352) Time Spent: 2h 20m (was: 2h 10m) > Add useful timestamp to job servicer GetJobs > > > Key: BEAM-8523 > URL: https://issues.apache.org/jira/browse/BEAM-8523 > Project: Beam > Issue Type: New Feature > Components: beam-model >Reporter: Chad Dombrova >Assignee: Chad Dombrova >Priority: Major > Time Spent: 2h 20m > Remaining Estimate: 0h > > As a user querying jobs with JobService.GetJobs, it would be useful if the > JobInfo result contained timestamps indicating various state changes that may > have been missed by a client. Useful timestamps include: > > * submitted (prepared to the job service) > * started (executor enters the RUNNING state) > * completed (executor enters a terminal state) > > -- This message was sent by Atlassian Jira (v8.3.4#803005)
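The description above asks for submitted/started/completed timestamps on job state changes. A minimal sketch of recording a timestamped state-change history (hypothetical names, not the actual Beam Job API protos):

```python
import time
from typing import List, Tuple


class JobStateTracker:
    """Records (state, unix_timestamp) pairs as a job moves through its
    lifecycle, so clients can recover transitions they missed."""

    TERMINAL_STATES = {"DONE", "FAILED", "CANCELLED"}

    def __init__(self):
        self.history = []  # type: List[Tuple[str, float]]

    def transition(self, state):
        # type: (str) -> None
        self.history.append((state, time.time()))

    @property
    def submitted_at(self):
        return next((ts for s, ts in self.history if s == "PREPARED"), None)

    @property
    def started_at(self):
        return next((ts for s, ts in self.history if s == "RUNNING"), None)

    @property
    def completed_at(self):
        return next((ts for s, ts in self.history
                     if s in self.TERMINAL_STATES), None)


tracker = JobStateTracker()
tracker.transition("PREPARED")
tracker.transition("RUNNING")
tracker.transition("DONE")
```

Keeping the full history, rather than only the latest state, is what lets a late-connecting client reconstruct the submitted/started/completed timestamps the ticket describes.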
[jira] [Work logged] (BEAM-7746) Add type hints to python code
[ https://issues.apache.org/jira/browse/BEAM-7746?focusedWorklogId=346351=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-346351 ] ASF GitHub Bot logged work on BEAM-7746: Author: ASF GitHub Bot Created on: 19/Nov/19 23:28 Start Date: 19/Nov/19 23:28 Worklog Time Spent: 10m Work Description: robertwb commented on pull request #9915: [BEAM-7746] Add python type hints (part 1) URL: https://github.com/apache/beam/pull/9915#discussion_r348223233 ## File path: sdks/python/apache_beam/transforms/core.py ## @@ -2320,6 +2359,7 @@ def expand(self, pcoll): return super(WindowInto, self).expand(pcoll) def to_runner_api_parameter(self, context): +# type: (PipelineContext) -> typing.Tuple[str, message.Message] Review comment: Oh, yes, this is the method on the transform, not the WindowFn. Fine as is. This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org Issue Time Tracking --- Worklog Id: (was: 346351) Time Spent: 27h 20m (was: 27h 10m) > Add type hints to python code > - > > Key: BEAM-7746 > URL: https://issues.apache.org/jira/browse/BEAM-7746 > Project: Beam > Issue Type: New Feature > Components: sdk-py-core >Reporter: Chad Dombrova >Assignee: Chad Dombrova >Priority: Major > Time Spent: 27h 20m > Remaining Estimate: 0h > > As a developer of the beam source code, I would like the code to use pep484 > type hints so that I can clearly see what types are required, get completion > in my IDE, and enforce code correctness via a static analyzer like mypy. > This may be considered a precursor to BEAM-7060 > Work has been started here: [https://github.com/apache/beam/pull/9056] > > -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Work logged] (BEAM-8746) Allow the local job service to work from inside docker
[ https://issues.apache.org/jira/browse/BEAM-8746?focusedWorklogId=346349=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-346349 ] ASF GitHub Bot logged work on BEAM-8746: Author: ASF GitHub Bot Created on: 19/Nov/19 23:26 Start Date: 19/Nov/19 23:26 Worklog Time Spent: 10m Work Description: chadrik commented on pull request #10161: [BEAM-8746] Make local job service accessible from external machines URL: https://github.com/apache/beam/pull/10161 Simple fix that makes it possible to access a local job service running in docker. Thank you for your contribution! Follow this checklist to help us incorporate your contribution quickly and easily: - [ ] [**Choose reviewer(s)**](https://beam.apache.org/contribute/#make-your-change) and mention them in a comment (`R: @username`). - [ ] Format the pull request title like `[BEAM-XXX] Fixes bug in ApproximateQuantiles`, where you replace `BEAM-XXX` with the appropriate JIRA issue, if applicable. This will automatically link the pull request to the issue. - [ ] If this contribution is large, please file an Apache [Individual Contributor License Agreement](https://www.apache.org/licenses/icla.pdf). See the [Contributor Guide](https://beam.apache.org/contribute) for more tips on [how to make review process smoother](https://beam.apache.org/contribute/#make-reviewers-job-easier). 
[jira] [Work logged] (BEAM-8746) Allow the local job service to work from inside docker
[ https://issues.apache.org/jira/browse/BEAM-8746?focusedWorklogId=346350=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-346350 ] ASF GitHub Bot logged work on BEAM-8746: Author: ASF GitHub Bot Created on: 19/Nov/19 23:26 Start Date: 19/Nov/19 23:26 Worklog Time Spent: 10m Work Description: chadrik commented on issue #10161: [BEAM-8746] Make local job service accessible from external machines URL: https://github.com/apache/beam/pull/10161#issuecomment-555763150 R: @mxm R: @robertwb This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org Issue Time Tracking --- Worklog Id: (was: 346350) Time Spent: 20m (was: 10m) > Allow the local job service to work from inside docker > -- > > Key: BEAM-8746 > URL: https://issues.apache.org/jira/browse/BEAM-8746 > Project: Beam > Issue Type: Improvement > Components: sdk-py-core >Reporter: Chad Dombrova >Assignee: Chad Dombrova >Priority: Major > Time Spent: 20m > Remaining Estimate: 0h > > Currently the connection is refused. It's a simple fix. -- This message was sent by Atlassian Jira (v8.3.4#803005)
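The ticket says the connection is refused when the local job service runs inside Docker. A typical cause of that symptom, an assumption here rather than something confirmed by the ticket, is a server socket bound to the loopback interface, which is unreachable from outside the container; binding to all interfaces makes it reachable through Docker's published ports. A stdlib sketch of the difference:

```python
import socket


def make_server_socket(host, port=0):
    """Bind a listening TCP socket. host='127.0.0.1' is loopback-only;
    host='0.0.0.0' accepts connections from other machines/containers."""
    sock = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    sock.setsockopt(socket.SOL_SOCKET, socket.SO_REUSEADDR, 1)
    sock.bind((host, port))
    sock.listen(1)
    return sock


# Loopback-only: connections from outside the container are refused.
loopback = make_server_socket("127.0.0.1")
# All interfaces: reachable through Docker's port mappings.
public = make_server_socket("0.0.0.0")
loopback_addr = loopback.getsockname()
public_addr = public.getsockname()
loopback.close()
public.close()
```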
[jira] [Work logged] (BEAM-7746) Add type hints to python code
[ https://issues.apache.org/jira/browse/BEAM-7746?focusedWorklogId=346348=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-346348 ] ASF GitHub Bot logged work on BEAM-7746: Author: ASF GitHub Bot Created on: 19/Nov/19 23:25 Start Date: 19/Nov/19 23:25 Worklog Time Spent: 10m Work Description: robertwb commented on pull request #9915: [BEAM-7746] Add python type hints (part 1) URL: https://github.com/apache/beam/pull/9915#discussion_r348222590 ## File path: sdks/python/apache_beam/runners/worker/bundle_processor.py ## @@ -878,6 +1033,7 @@ def get_coder(self, coder_id): json.loads(coder_proto.spec.payload.decode('utf-8'))) def get_windowed_coder(self, pcoll_id): +# type: (str) -> WindowedValueCoder Review comment: But it need not. And there's discussions on the list (e.g. the ValueOnlyWindowedValueCoder) to change this. This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org Issue Time Tracking --- Worklog Id: (was: 346348) Time Spent: 27h 10m (was: 27h) > Add type hints to python code > - > > Key: BEAM-7746 > URL: https://issues.apache.org/jira/browse/BEAM-7746 > Project: Beam > Issue Type: New Feature > Components: sdk-py-core >Reporter: Chad Dombrova >Assignee: Chad Dombrova >Priority: Major > Time Spent: 27h 10m > Remaining Estimate: 0h > > As a developer of the beam source code, I would like the code to use pep484 > type hints so that I can clearly see what types are required, get completion > in my IDE, and enforce code correctness via a static analyzer like mypy. > This may be considered a precursor to BEAM-7060 > Work has been started here: [https://github.com/apache/beam/pull/9056] > > -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Work logged] (BEAM-7746) Add type hints to python code
[ https://issues.apache.org/jira/browse/BEAM-7746?focusedWorklogId=346347=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-346347 ] ASF GitHub Bot logged work on BEAM-7746: Author: ASF GitHub Bot Created on: 19/Nov/19 23:24 Start Date: 19/Nov/19 23:24 Worklog Time Spent: 10m Work Description: robertwb commented on pull request #9915: [BEAM-7746] Add python type hints (part 1) URL: https://github.com/apache/beam/pull/9915#discussion_r348222344 ## File path: sdks/python/apache_beam/runners/worker/bundle_processor.py ## @@ -328,10 +404,12 @@ class _ConcatIterable(object): Unlike itertools.chain, this allows reiteration. """ def __init__(self, first, second): +# type: (Iterable[Any], Iterable[Any]) -> None Review comment: Yeah, follow-up is fine. This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org Issue Time Tracking --- Worklog Id: (was: 346347) Time Spent: 27h (was: 26h 50m) > Add type hints to python code > - > > Key: BEAM-7746 > URL: https://issues.apache.org/jira/browse/BEAM-7746 > Project: Beam > Issue Type: New Feature > Components: sdk-py-core >Reporter: Chad Dombrova >Assignee: Chad Dombrova >Priority: Major > Time Spent: 27h > Remaining Estimate: 0h > > As a developer of the beam source code, I would like the code to use pep484 > type hints so that I can clearly see what types are required, get completion > in my IDE, and enforce code correctness via a static analyzer like mypy. > This may be considered a precursor to BEAM-7060 > Work has been started here: [https://github.com/apache/beam/pull/9056] > > -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Work logged] (BEAM-7746) Add type hints to python code
[ https://issues.apache.org/jira/browse/BEAM-7746?focusedWorklogId=346346=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-346346 ] ASF GitHub Bot logged work on BEAM-7746: Author: ASF GitHub Bot Created on: 19/Nov/19 23:24 Start Date: 19/Nov/19 23:24 Worklog Time Spent: 10m Work Description: robertwb commented on pull request #9915: [BEAM-7746] Add python type hints (part 1) URL: https://github.com/apache/beam/pull/9915#discussion_r34844 ## File path: sdks/python/apache_beam/runners/runner.py ## @@ -133,7 +152,10 @@ def run_async(self, transform, options=None): transform(PBegin(p)) return p.run() - def run_pipeline(self, pipeline, options): + def run_pipeline(self, + pipeline, # type: Pipeline + options # type: PipelineOptions + ): Review comment: Ack. Yeah, this should probably just raise NotImplemented. This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org Issue Time Tracking --- Worklog Id: (was: 346346) Time Spent: 26h 50m (was: 26h 40m) > Add type hints to python code > - > > Key: BEAM-7746 > URL: https://issues.apache.org/jira/browse/BEAM-7746 > Project: Beam > Issue Type: New Feature > Components: sdk-py-core >Reporter: Chad Dombrova >Assignee: Chad Dombrova >Priority: Major > Time Spent: 26h 50m > Remaining Estimate: 0h > > As a developer of the beam source code, I would like the code to use pep484 > type hints so that I can clearly see what types are required, get completion > in my IDE, and enforce code correctness via a static analyzer like mypy. > This may be considered a precursor to BEAM-7060 > Work has been started here: [https://github.com/apache/beam/pull/9056] > > -- This message was sent by Atlassian Jira (v8.3.4#803005)
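The review above suggests the base `run_pipeline` should raise rather than silently do nothing. A sketch of that convention in the comment-annotation style used in the thread (class and parameter names simplified); note it is `NotImplementedError` the exception that is raised, not the `NotImplemented` sentinel:

```python
class PipelineRunner(object):
    """Base class: concrete runners must override run_pipeline."""

    def run_pipeline(self,
                     pipeline,  # type: object
                     options    # type: object
                     ):
        # type: (...) -> object
        raise NotImplementedError(
            "%s did not implement run_pipeline" % type(self).__name__)


class DirectishRunner(PipelineRunner):
    """A toy subclass showing the override."""

    def run_pipeline(self, pipeline, options):
        return "ran %r" % (pipeline,)
```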
[jira] [Work logged] (BEAM-8151) Allow the Python SDK to use many many threads
[ https://issues.apache.org/jira/browse/BEAM-8151?focusedWorklogId=346345=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-346345 ] ASF GitHub Bot logged work on BEAM-8151: Author: ASF GitHub Bot Created on: 19/Nov/19 23:23 Start Date: 19/Nov/19 23:23 Worklog Time Spent: 10m Work Description: robertwb commented on pull request #10134: [BEAM-8151] Further cleanup of SDK Workers. URL: https://github.com/apache/beam/pull/10134 This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org Issue Time Tracking --- Worklog Id: (was: 346345) Time Spent: 14.5h (was: 14h 20m) > Allow the Python SDK to use many many threads > - > > Key: BEAM-8151 > URL: https://issues.apache.org/jira/browse/BEAM-8151 > Project: Beam > Issue Type: Improvement > Components: sdk-py-core, sdk-py-harness >Reporter: Luke Cwik >Assignee: Luke Cwik >Priority: Major > Fix For: 2.18.0 > > Time Spent: 14.5h > Remaining Estimate: 0h > > We need to use a thread pool which shrinks the number of active threads when > they are not being used. > > This is to prevent any stuckness issues related to a runner scheduling more > work items then there are "work" threads inside the SDK harness. > > By default the control plane should have all "requests" being processed in > parallel and the runner is responsible for not overloading the SDK with too > much work. -- This message was sent by Atlassian Jira (v8.3.4#803005)
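The issue above asks for a pool that shrinks its thread count when threads sit unused. A minimal sketch of that idea, an illustration rather than Beam's actual implementation: workers pull tasks from a queue and exit after an idle timeout, so the pool grows with load and shrinks afterward. The race handling here is deliberately simplified.

```python
import queue
import threading
import time


class ShrinkingPool(object):
    """Spawns a worker per submit when none are idle; workers that stay
    idle past `idle_timeout` seconds exit, shrinking the pool."""

    def __init__(self, idle_timeout=0.2):
        self._tasks = queue.Queue()
        self._idle_timeout = idle_timeout
        self._lock = threading.Lock()
        self._idle = 0   # workers currently waiting for a task
        self.alive = 0   # workers currently running (for inspection)

    def submit(self, fn, *args):
        self._tasks.put((fn, args))
        with self._lock:
            if self._idle == 0:  # no spare worker: grow the pool
                self.alive += 1
                threading.Thread(target=self._worker, daemon=True).start()

    def _worker(self):
        while True:
            with self._lock:
                self._idle += 1
            try:
                fn, args = self._tasks.get(timeout=self._idle_timeout)
                with self._lock:
                    self._idle -= 1
            except queue.Empty:
                with self._lock:
                    self._idle -= 1
                # Double-check for a task that raced with our timeout.
                try:
                    fn, args = self._tasks.get_nowait()
                except queue.Empty:
                    with self._lock:
                        self.alive -= 1
                    return  # idle too long: shrink the pool
            fn(*args)


results = []
pool = ShrinkingPool(idle_timeout=0.2)
for i in range(5):
    pool.submit(results.append, i)
time.sleep(1.0)  # let tasks finish and idle workers expire
```

After the sleep, every idle worker has timed out, so the pool is back to zero threads even though it may have grown to several during the burst of submits.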
[jira] [Comment Edited] (BEAM-8651) Python 3 portable pipelines sometimes fail with errors in StockUnpickler.find_class()
[ https://issues.apache.org/jira/browse/BEAM-8651?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16977923#comment-16977923 ] Valentyn Tymofieiev edited comment on BEAM-8651 at 11/19/19 11:22 PM: -- While investigating the example from [https://github.com/tensorflow/tfx/issues/928] I observe the following failure mode: SDK worker starts multiple threads. Each thread happens to unpickle some payload, which calls {code:java} dill.loads [1] {code} Dill calls {code:java} pickle.Unpickler.find_class [2] {code} which calls {code:java} __import__ [3] {code} Concurrent import calls cause a deadlock on Python 3 (checked on Python 3.7.5rc1), but not on Python 2.7. Following snippet, with calls extracted from the Chicago taxi pipeline {code:python} import threading def t1(): return __import__("tensorflow_transform.beam.analyzer_impls", 0) def t2(): return __import__("tensorflow_transform.tf_metadata.metadata_io", 0) def t3(): return __import__("tensorflow.core.example.example_pb2", 0) threads = [] threads.append(threading.Thread(target=t1)) threads.append(threading.Thread(target=t2)) threads.append(threading.Thread(target=t3)) for thread in threads: thread.start() {code} fails with {noformat} Exception in thread Thread-1: Traceback (most recent call last): File "/usr/lib/python3.7/threading.py", line 926, in _bootstrap_inner self.run() File "/usr/lib/python3.7/threading.py", line 870, in run self._target(*self._args, **self._kwargs) File "dealock_repro.py", line 4, in t1 return __import__("tensorflow_transform.beam.analyzer_impls", level=0) File "/home/valentyn/tmp/tfx_master_py37/lib/python3.7/site-packages/tensorflow_transform/__init__.py", line 23, in from tensorflow_transform.output_wrapper import TFTransformOutput File "/home/valentyn/tmp/tfx_master_py37/lib/python3.7/site-packages/tensorflow_transform/output_wrapper.py", line 29, in from tensorflow_transform.tf_metadata import metadata_io File "", line 980, in _find_and_load File "", line 149, in 
__enter__ File "", line 94, in acquire _frozen_importlib._DeadlockError: deadlock detected by _ModuleLock('tensorflow_transform.tf_metadata.metadata_io') at 140070103765456 {noformat} [1] [https://github.com/apache/beam/blob/cba445c8da93d9bdd01b30b2f54e9c3b52a98b7d/sdks/python/apache_beam/internal/pickler.py#L261] [2] [https://github.com/uqfoundation/dill/blob/76e8472502a656f3ab6973cd8375cf7847f33842/dill/_dill.py#L462] [3] [https://github.com/python/cpython/blob/4ffc569b47bef9f95e443f3c56f7e7e32cb440c0/Lib/pickle.py#L1426] was (Author: tvalentyn): While investigating the example from [https://github.com/tensorflow/tfx/issues/928] I observe the following failure mode: SDK worker starts multiple threads. Each thread happens to unpickle some payload, which calls {code:java} dill.loads [1] {code} Dill calls {code:java} pickle.Unpickler.find_class [2] {code} which calls {code:java} __import__ [3] {code} Concurrent import calls cause a deadlock on Python 3 (checked on Python 3.7.5rc1), but not on Python 2.7. 
Following snippet, with calls extracted from the Chicago taxi pipeline {code:python} import threading def t1(): return __import__("tensorflow_transform.beam.analyzer_impls") def t2(): return __import__("tensorflow_transform.tf_metadata.metadata_io") def t3(): return __import__("tensorflow.core.example.example_pb2") threads = [] threads.append(threading.Thread(target=t1)) threads.append(threading.Thread(target=t2)) threads.append(threading.Thread(target=t3)) for thread in threads: thread.start() {code} fails with {noformat} Exception in thread Thread-1: Traceback (most recent call last): File "/usr/lib/python3.7/threading.py", line 926, in _bootstrap_inner self.run() File "/usr/lib/python3.7/threading.py", line 870, in run self._target(*self._args, **self._kwargs) File "dealock_repro.py", line 4, in t1 return __import__("tensorflow_transform.beam.analyzer_impls", level=0) File "/home/valentyn/tmp/tfx_master_py37/lib/python3.7/site-packages/tensorflow_transform/__init__.py", line 23, in from tensorflow_transform.output_wrapper import TFTransformOutput File "/home/valentyn/tmp/tfx_master_py37/lib/python3.7/site-packages/tensorflow_transform/output_wrapper.py", line 29, in from tensorflow_transform.tf_metadata import metadata_io File "", line 980, in _find_and_load File "", line 149, in __enter__ File "", line 94, in acquire _frozen_importlib._DeadlockError: deadlock detected by _ModuleLock('tensorflow_transform.tf_metadata.metadata_io') at 140070103765456 {noformat} [1] [https://github.com/apache/beam/blob/cba445c8da93d9bdd01b30b2f54e9c3b52a98b7d/sdks/python/apache_beam/internal/pickler.py#L261] [2] [https://github.com/uqfoundation/dill/blob/76e8472502a656f3ab6973cd8375cf7847f33842/dill/_dill.py#L462] [3]
[jira] [Work logged] (BEAM-8335) Add streaming support to Interactive Beam
[ https://issues.apache.org/jira/browse/BEAM-8335?focusedWorklogId=346344=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-346344 ] ASF GitHub Bot logged work on BEAM-8335: Author: ASF GitHub Bot Created on: 19/Nov/19 23:21 Start Date: 19/Nov/19 23:21 Worklog Time Spent: 10m Work Description: robertwb commented on pull request #10119: [BEAM-8335] Adds the StreamingCache URL: https://github.com/apache/beam/pull/10119 This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org Issue Time Tracking --- Worklog Id: (was: 346344) Time Spent: 33h 50m (was: 33h 40m) > Add streaming support to Interactive Beam > - > > Key: BEAM-8335 > URL: https://issues.apache.org/jira/browse/BEAM-8335 > Project: Beam > Issue Type: Improvement > Components: runner-py-interactive >Reporter: Sam Rohde >Assignee: Sam Rohde >Priority: Major > Time Spent: 33h 50m > Remaining Estimate: 0h > > This issue tracks the work items to introduce streaming support to the > Interactive Beam experience. This will allow users to: > * Write and run a streaming job in IPython > * Automatically cache records from unbounded sources > * Add a replay experience that replays all cached records to simulate the > original pipeline execution > * Add controls to play/pause/stop/step individual elements from the cached > records > * Add ability to inspect/visualize unbounded PCollections -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Created] (BEAM-8781) FlinkUberJarJobServer should respect --flink_job_server_jar
Kyle Weaver created BEAM-8781: - Summary: FlinkUberJarJobServer should respect --flink_job_server_jar Key: BEAM-8781 URL: https://issues.apache.org/jira/browse/BEAM-8781 Project: Beam Issue Type: Improvement Components: runner-flink Reporter: Kyle Weaver Assignee: Kyle Weaver -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Comment Edited] (BEAM-8651) Python 3 portable pipelines sometimes fail with errors in StockUnpickler.find_class()
[ https://issues.apache.org/jira/browse/BEAM-8651?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16977923#comment-16977923 ] Valentyn Tymofieiev edited comment on BEAM-8651 at 11/19/19 11:19 PM: -- While investigating the example from [https://github.com/tensorflow/tfx/issues/928] I observe the following failure mode: SDK worker starts multiple threads. Each thread happens to unpickle some payload, which calls {code:java} dill.loads [1] {code} Dill calls {code:java} pickle.Unpickler.find_class [2] {code} which calls {code:java} __import__ [3] {code} Concurrent import calls cause a deadlock on Python 3 (checked on Python 3.7.5rc1), but not on Python 2.7. Following snippet, with calls extracted from the Chicago taxi pipeline {code:python} import threading def t1(): return __import__("tensorflow_transform.beam.analyzer_impls") def t2(): return __import__("tensorflow_transform.tf_metadata.metadata_io") def t3(): return __import__("tensorflow.core.example.example_pb2") threads = [] threads.append(threading.Thread(target=t1)) threads.append(threading.Thread(target=t2)) threads.append(threading.Thread(target=t3)) for thread in threads: thread.start() {code} fails with {noformat} Exception in thread Thread-1: Traceback (most recent call last): File "/usr/lib/python3.7/threading.py", line 926, in _bootstrap_inner self.run() File "/usr/lib/python3.7/threading.py", line 870, in run self._target(*self._args, **self._kwargs) File "dealock_repro.py", line 4, in t1 return __import__("tensorflow_transform.beam.analyzer_impls", level=0) File "/home/valentyn/tmp/tfx_master_py37/lib/python3.7/site-packages/tensorflow_transform/__init__.py", line 23, in from tensorflow_transform.output_wrapper import TFTransformOutput File "/home/valentyn/tmp/tfx_master_py37/lib/python3.7/site-packages/tensorflow_transform/output_wrapper.py", line 29, in from tensorflow_transform.tf_metadata import metadata_io File "", line 980, in _find_and_load File "", line 149, in __enter__ 
File "<frozen importlib._bootstrap>", line 94, in acquire _frozen_importlib._DeadlockError: deadlock detected by _ModuleLock('tensorflow_transform.tf_metadata.metadata_io') at 140070103765456 {noformat} [1] [https://github.com/apache/beam/blob/cba445c8da93d9bdd01b30b2f54e9c3b52a98b7d/sdks/python/apache_beam/internal/pickler.py#L261] [2] [https://github.com/uqfoundation/dill/blob/76e8472502a656f3ab6973cd8375cf7847f33842/dill/_dill.py#L462] [3] [https://github.com/python/cpython/blob/4ffc569b47bef9f95e443f3c56f7e7e32cb440c0/Lib/pickle.py#L1426]
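One common way to sidestep this class of deadlock — a sketch under the assumption that the heavyweight modules are known up front; not necessarily the fix adopted in Beam — is to import them serially on the main thread before any worker threads start, so the threads only ever see fully initialized entries in `sys.modules`:

```python
import importlib
import threading

def preload(module_names):
    """Import modules one by one on the calling thread, so later
    concurrent lookups hit fully-initialized sys.modules entries."""
    for name in module_names:
        importlib.import_module(name)

# Hypothetical stand-ins for the heavyweight TFX modules in the repro.
modules = ["json", "csv", "decimal"]
preload(modules)

results = []
threads = [threading.Thread(target=lambda n=n: results.append(__import__(n)))
           for n in modules]
for t in threads:
    t.start()
for t in threads:
    t.join()
```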
[jira] [Work logged] (BEAM-7746) Add type hints to python code
[ https://issues.apache.org/jira/browse/BEAM-7746?focusedWorklogId=346342=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-346342 ] ASF GitHub Bot logged work on BEAM-7746: Author: ASF GitHub Bot Created on: 19/Nov/19 23:19 Start Date: 19/Nov/19 23:19 Worklog Time Spent: 10m Work Description: chadrik commented on pull request #9915: [BEAM-7746] Add python type hints (part 1) URL: https://github.com/apache/beam/pull/9915#discussion_r348220566 ## File path: sdks/python/apache_beam/runners/worker/bundle_processor.py ## @@ -460,8 +558,14 @@ def clear(self): class FnApiUserStateContext(userstate.UserStateContext): """Interface for state and timers from SDK to Fn API servicer of state..""" - def __init__( - self, state_handler, transform_id, key_coder, window_coder, timer_specs): + def __init__(self, + state_handler, + transform_id, # type: str + key_coder, # type: coders.Coder + window_coder, # type: coders.Coder + timer_specs # type: MutableMapping[str, beam_runner_api_pb2.TimerSpec] Review comment: Looks like no. Addressed in my upcoming review commit. This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org Issue Time Tracking --- Worklog Id: (was: 346342) Time Spent: 26h 40m (was: 26.5h) > Add type hints to python code > - > > Key: BEAM-7746 > URL: https://issues.apache.org/jira/browse/BEAM-7746 > Project: Beam > Issue Type: New Feature > Components: sdk-py-core >Reporter: Chad Dombrova >Assignee: Chad Dombrova >Priority: Major > Time Spent: 26h 40m > Remaining Estimate: 0h > > As a developer of the beam source code, I would like the code to use pep484 > type hints so that I can clearly see what types are required, get completion > in my IDE, and enforce code correctness via a static analyzer like mypy. 
> This may be considered a precursor to BEAM-7060 > Work has been started here: [https://github.com/apache/beam/pull/9056] > > -- This message was sent by Atlassian Jira (v8.3.4#803005)
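The review threads above use PEP 484 type comments rather than Python 3 annotations, to stay Python 2 compatible. A minimal, self-contained sketch of the two comment forms involved — per-argument comments plus a `(...) -> None` signature comment (the class here is a hypothetical stand-in, not Beam's actual `FnApiUserStateContext`):

```python
from typing import Dict

class UserStateContext(object):
    """Hypothetical stand-in illustrating type-comment style hints."""

    def __init__(self,
                 transform_id,  # type: str
                 timer_specs  # type: Dict[str, int]
                ):
        # type: (...) -> None
        self.transform_id = transform_id
        self.timer_specs = timer_specs

ctx = UserStateContext("pardo-1", {"gc-timer": 10})
```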
[jira] [Work logged] (BEAM-7746) Add type hints to python code
[ https://issues.apache.org/jira/browse/BEAM-7746?focusedWorklogId=346340=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-346340 ] ASF GitHub Bot logged work on BEAM-7746: Author: ASF GitHub Bot Created on: 19/Nov/19 23:18 Start Date: 19/Nov/19 23:18 Worklog Time Spent: 10m Work Description: chadrik commented on pull request #9915: [BEAM-7746] Add python type hints (part 1) URL: https://github.com/apache/beam/pull/9915#discussion_r348220199 ## File path: sdks/python/apache_beam/runners/worker/bundle_processor.py ## @@ -477,16 +581,22 @@ def __init__( self._key_coder = key_coder self._window_coder = window_coder self._timer_specs = timer_specs -self._timer_receivers = None -self._all_states = {} +self._timer_receivers = None # type: Optional[Dict[str, operations.ConsumerSet]] +self._all_states = {} # type: Dict[tuple, Union[SynchronousBagRuntimeState, SynchronousSetRuntimeState, CombiningValueRuntimeState]] Review comment: I could replace this with `Dict[tuple, userstate.AccumulatingRuntimeState]` but in order for this to work I need to add `_commit` to the abstract methods on this `AccumulatingRuntimeState`, because this method is expected to exist: ```python class FnApiUserStateContext(userstate.UserStateContext): def __init__(self): ... self._all_states = {} # type: Dict[tuple, userstate.AccumulatingRuntimeState] ... def commit(self): # type: () -> None for state in self._all_states.values(): state._commit() ``` This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. 
For queries about this service, please contact Infrastructure at: us...@infra.apache.org Issue Time Tracking --- Worklog Id: (was: 346340) Time Spent: 26.5h (was: 26h 20m) > Add type hints to python code > - > > Key: BEAM-7746 > URL: https://issues.apache.org/jira/browse/BEAM-7746 > Project: Beam > Issue Type: New Feature > Components: sdk-py-core >Reporter: Chad Dombrova >Assignee: Chad Dombrova >Priority: Major > Time Spent: 26.5h > Remaining Estimate: 0h > > As a developer of the beam source code, I would like the code to use pep484 > type hints so that I can clearly see what types are required, get completion > in my IDE, and enforce code correctness via a static analyzer like mypy. > This may be considered a precursor to BEAM-7060 > Work has been started here: [https://github.com/apache/beam/pull/9056] > > -- This message was sent by Atlassian Jira (v8.3.4#803005)
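The suggestion in the comment above — widening the dict's value type to a common base and making `_commit` part of that base's contract — can be sketched with `abc` (class names here are simplified stand-ins for the Beam types):

```python
import abc

class AccumulatingRuntimeState(abc.ABC):
    """Stand-in base; declaring _commit lets callers rely on it."""

    @abc.abstractmethod
    def _commit(self):
        # type: () -> None
        """Flush locally buffered state to the state backend."""

class BagRuntimeState(AccumulatingRuntimeState):
    def __init__(self):
        self.committed = False

    def _commit(self):
        self.committed = True

# Mirrors the commit loop in the review comment above.
all_states = {("key", "window"): BagRuntimeState()}
for state in all_states.values():
    state._commit()
```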
[jira] [Commented] (BEAM-8651) Python 3 portable pipelines sometimes fail with errors in StockUnpickler.find_class()
[ https://issues.apache.org/jira/browse/BEAM-8651?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16977923#comment-16977923 ] Valentyn Tymofieiev commented on BEAM-8651: --- While investigating the example from [https://github.com/tensorflow/tfx/issues/928] I observe the following failure mode: SDK worker starts multiple threads. Each thread happens to unpickle some payload, which calls {code:java} dill.loads [1] {code} Dill calls {code:java} pickle.Unpickler.find_class [1] {code} which calls {code:java} __import__ [3] {code} Concurrent import calls cause a deadlock on Python 3 (checked on Python 3.7.5rc1), but not on Python 2.7. Following snippet, with calls extracted from the Chicago taxi pipeline {code:python} import threading def t1(): return __import__("tensorflow_transform.beam.analyzer_impls") def t2(): return __import__("tensorflow_transform.tf_metadata.metadata_io") def t3(): return __import__("tensorflow.core.example.example_pb2") threads = [] threads.append(threading.Thread(target=t1)) threads.append(threading.Thread(target=t2)) threads.append(threading.Thread(target=t3)) for thread in threads: thread.start() {code} fails with {noformat} Exception in thread Thread-1: Traceback (most recent call last): File "/usr/lib/python3.7/threading.py", line 926, in _bootstrap_inner self.run() File "/usr/lib/python3.7/threading.py", line 870, in run self._target(*self._args, **self._kwargs) File "dealock_repro.py", line 4, in t1 return __import__("tensorflow_transform.beam.analyzer_impls", level=0) File "/home/valentyn/tmp/tfx_master_py37/lib/python3.7/site-packages/tensorflow_transform/__init__.py", line 23, in from tensorflow_transform.output_wrapper import TFTransformOutput File "/home/valentyn/tmp/tfx_master_py37/lib/python3.7/site-packages/tensorflow_transform/output_wrapper.py", line 29, in from tensorflow_transform.tf_metadata import metadata_io File "", line 980, in _find_and_load File "", line 149, in __enter__ File "", line 94, in 
acquire _frozen_importlib._DeadlockError: deadlock detected by _ModuleLock('tensorflow_transform.tf_metadata.metadata_io') at 140070103765456 {noformat} [1] [https://github.com/apache/beam/blob/cba445c8da93d9bdd01b30b2f54e9c3b52a98b7d/sdks/python/apache_beam/internal/pickler.py#L261] [2] [https://github.com/uqfoundation/dill/blob/76e8472502a656f3ab6973cd8375cf7847f33842/dill/_dill.py#L462] [3] [https://github.com/python/cpython/blob/4ffc569b47bef9f95e443f3c56f7e7e32cb440c0/Lib/pickle.py#L1426] > Python 3 portable pipelines sometimes fail with errors in > StockUnpickler.find_class() > - > > Key: BEAM-8651 > URL: https://issues.apache.org/jira/browse/BEAM-8651 > Project: Beam > Issue Type: Sub-task > Components: sdk-py-core >Reporter: Valentyn Tymofieiev >Assignee: Valentyn Tymofieiev >Priority: Blocker > Attachments: beam8651.py > > > Several Beam users [1,2] reported an error which happens on Python 3 in > StockUnpickler.find_class. > So far I've seen reports of the error on Python 3.5, 3.6, and 3.7.1, on Flink > and Dataflow runners. On Dataflow runner so far I have seen this in streaming > pipelines only, which use portable SDK worker. 
> Typical stack trace: > {noformat} > File > "python3.5/site-packages/apache_beam/runners/worker/bundle_processor.py", > line 1148, in _create_pardo_operation > dofn_data = pickler.loads(serialized_fn) > > File "python3.5/site-packages/apache_beam/internal/pickler.py", line 265, > in loads > return dill.loads(s) > > File "python3.5/site-packages/dill/_dill.py", line 317, in loads > > return load(file, ignore) > > File "python3.5/site-packages/dill/_dill.py", line 305, in load > > obj = pik.load() > > File "python3.5/site-packages/dill/_dill.py", line 474, in find_class > > return StockUnpickler.find_class(self, module, name) > > AttributeError: Can't get attribute 'ClassName' on 'python3.5/site-packages/filename.py'> > {noformat} > According to Guenther from [1]: > {quote} > This looks exactly like a race condition that we've encountered on Python > 3.7.1: There's a bug in some older 3.7.x releases that breaks the > thread-safety of the unpickler, as concurrent unpickle threads can access a > module before it has been fully imported. See > https://bugs.python.org/issue34572 for more information. > The traceback shows
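Until the interpreter-level race is fixed, one defensive workaround — a sketch, not necessarily what Beam ships — is to serialize all unpickling through a process-wide lock, so only one thread at a time can trigger imports from inside `find_class`:

```python
import pickle
import threading

_pickle_lock = threading.Lock()

def safe_loads(payload):
    # type: (bytes) -> object
    """Deserialize under a lock so concurrent threads never race
    inside find_class()/__import__ (illustrative; Beam's pickler differs)."""
    with _pickle_lock:
        return pickle.loads(payload)

payload = pickle.dumps({"k": [1, 2, 3]})
results = []
threads = [threading.Thread(target=lambda: results.append(safe_loads(payload)))
           for _ in range(4)]
for t in threads:
    t.start()
for t in threads:
    t.join()
```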
[jira] [Work logged] (BEAM-7746) Add type hints to python code
[ https://issues.apache.org/jira/browse/BEAM-7746?focusedWorklogId=346335=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-346335 ] ASF GitHub Bot logged work on BEAM-7746: Author: ASF GitHub Bot Created on: 19/Nov/19 23:05 Start Date: 19/Nov/19 23:05 Worklog Time Spent: 10m Work Description: chadrik commented on pull request #9915: [BEAM-7746] Add python type hints (part 1) URL: https://github.com/apache/beam/pull/9915#discussion_r348216261 ## File path: sdks/python/apache_beam/runners/worker/bundle_processor.py ## @@ -847,16 +993,24 @@ def __init__(self, descriptor, data_channel_factory, counter_factory, runner=beam_fn_api_pb2.StateKey.Runner(key=token)), element_coder_impl)) - _known_urns = {} + _known_urns = {} # type: Dict[str, Tuple[ConstructorFn, Union[Type[message.Message], Type[bytes], None]]] @classmethod - def register_urn(cls, urn, parameter_type): + def register_urn(cls, + urn, # type: str + parameter_type # type: Optional[Type[T]] + ): +# type: (...) -> Callable[[Callable[[BeamTransformFactory, str, beam_runner_api_pb2.PTransform, T, Dict[str, List[operations.Operation]]], operations.Operation]], Callable[[BeamTransformFactory, str, beam_runner_api_pb2.PTransform, T, Dict[str, List[operations.Operation]]], operations.Operation]] Review comment: There are ways to create module-level aliases to break down and hide complexity, but since this uses a bunch of TypeVars to create relationships between the args and return value I had to do it all at once. This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. 
For queries about this service, please contact Infrastructure at: us...@infra.apache.org Issue Time Tracking --- Worklog Id: (was: 346335) Time Spent: 26h 20m (was: 26h 10m) > Add type hints to python code > - > > Key: BEAM-7746 > URL: https://issues.apache.org/jira/browse/BEAM-7746 > Project: Beam > Issue Type: New Feature > Components: sdk-py-core >Reporter: Chad Dombrova >Assignee: Chad Dombrova >Priority: Major > Time Spent: 26h 20m > Remaining Estimate: 0h > > As a developer of the beam source code, I would like the code to use pep484 > type hints so that I can clearly see what types are required, get completion > in my IDE, and enforce code correctness via a static analyzer like mypy. > This may be considered a precursor to BEAM-7060 > Work has been started here: [https://github.com/apache/beam/pull/9056] > > -- This message was sent by Atlassian Jira (v8.3.4#803005)
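The module-level aliases mentioned above can shrink a long `Callable` signature considerably. A hedged sketch with simplified stand-in types — none of these names are Beam's actual definitions:

```python
from typing import Callable, Dict, List

class Operation(object):
    pass

class BeamTransformFactory(object):
    pass

# One alias captures the constructor signature that would otherwise be
# spelled out twice in the decorator's return type.
ConstructorFn = Callable[
    [BeamTransformFactory, str, bytes, Dict[str, List[Operation]]],
    Operation]

def register_urn(urn, parameter_type):
    # type: (str, type) -> Callable[[ConstructorFn], ConstructorFn]
    def wrapper(fn):
        # type: (ConstructorFn) -> ConstructorFn
        return fn
    return wrapper

@register_urn("beam:transform:example:v1", bytes)
def make_op(factory, transform_id, payload, consumers):
    # type: (BeamTransformFactory, str, bytes, Dict[str, List[Operation]]) -> Operation
    return Operation()
```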
[jira] [Commented] (BEAM-8277) Make docker build quicker
[ https://issues.apache.org/jira/browse/BEAM-8277?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16977912#comment-16977912 ] Kyle Weaver commented on BEAM-8277: --- Build times have gotten slower still: 11m 48s on current master. > Make docker build quicker > - > > Key: BEAM-8277 > URL: https://issues.apache.org/jira/browse/BEAM-8277 > Project: Beam > Issue Type: Improvement > Components: sdk-py-harness >Reporter: Kyle Weaver >Assignee: Kyle Weaver >Priority: Major > > Building the Python SDK harness container takes minutes on my machine. > ``` > ./gradlew :sdks:python:container:buildAll > BUILD SUCCESSFUL in 9m 33s > ``` > Possible lead: "We spend mins pulling cmd/beamctl deps." > [https://github.com/apache/beam/blob/47feeafb21023e2a60ae51737cc4000a2033719c/sdks/python/container/build.gradle#L38] -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Work logged] (BEAM-7746) Add type hints to python code
[ https://issues.apache.org/jira/browse/BEAM-7746?focusedWorklogId=346334=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-346334 ] ASF GitHub Bot logged work on BEAM-7746: Author: ASF GitHub Bot Created on: 19/Nov/19 23:04 Start Date: 19/Nov/19 23:04 Worklog Time Spent: 10m Work Description: chadrik commented on pull request #9915: [BEAM-7746] Add python type hints (part 1) URL: https://github.com/apache/beam/pull/9915#discussion_r348215834 ## File path: sdks/python/apache_beam/runners/worker/bundle_processor.py ## @@ -847,16 +993,24 @@ def __init__(self, descriptor, data_channel_factory, counter_factory, runner=beam_fn_api_pb2.StateKey.Runner(key=token)), element_coder_impl)) - _known_urns = {} + _known_urns = {} # type: Dict[str, Tuple[ConstructorFn, Union[Type[message.Message], Type[bytes], None]]] @classmethod - def register_urn(cls, urn, parameter_type): + def register_urn(cls, + urn, # type: str + parameter_type # type: Optional[Type[T]] + ): +# type: (...) -> Callable[[Callable[[BeamTransformFactory, str, beam_runner_api_pb2.PTransform, T, Dict[str, List[operations.Operation]]], operations.Operation]], Callable[[BeamTransformFactory, str, beam_runner_api_pb2.PTransform, T, Dict[str, List[operations.Operation]]], operations.Operation]] Review comment: Not until we get to python3 annotations :( This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. 
For queries about this service, please contact Infrastructure at: us...@infra.apache.org Issue Time Tracking --- Worklog Id: (was: 346334) Time Spent: 26h 10m (was: 26h) > Add type hints to python code > - > > Key: BEAM-7746 > URL: https://issues.apache.org/jira/browse/BEAM-7746 > Project: Beam > Issue Type: New Feature > Components: sdk-py-core >Reporter: Chad Dombrova >Assignee: Chad Dombrova >Priority: Major > Time Spent: 26h 10m > Remaining Estimate: 0h > > As a developer of the beam source code, I would like the code to use pep484 > type hints so that I can clearly see what types are required, get completion > in my IDE, and enforce code correctness via a static analyzer like mypy. > This may be considered a precursor to BEAM-7060 > Work has been started here: [https://github.com/apache/beam/pull/9056] > > -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Work logged] (BEAM-7746) Add type hints to python code
[ https://issues.apache.org/jira/browse/BEAM-7746?focusedWorklogId=346332=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-346332 ] ASF GitHub Bot logged work on BEAM-7746: Author: ASF GitHub Bot Created on: 19/Nov/19 23:02 Start Date: 19/Nov/19 23:02 Worklog Time Spent: 10m Work Description: chadrik commented on pull request #9915: [BEAM-7746] Add python type hints (part 1) URL: https://github.com/apache/beam/pull/9915#discussion_r348215242 ## File path: sdks/python/apache_beam/transforms/core.py ## @@ -405,6 +417,7 @@ class _RestrictionDoFnParam(_DoFnParam): """Restriction Provider DoFn parameter.""" def __init__(self, restriction_provider): +# type: (RestrictionProvider) -> None if not isinstance(restriction_provider, RestrictionProvider): Review comment: agreed! This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org Issue Time Tracking --- Worklog Id: (was: 346332) Time Spent: 26h (was: 25h 50m) > Add type hints to python code > - > > Key: BEAM-7746 > URL: https://issues.apache.org/jira/browse/BEAM-7746 > Project: Beam > Issue Type: New Feature > Components: sdk-py-core >Reporter: Chad Dombrova >Assignee: Chad Dombrova >Priority: Major > Time Spent: 26h > Remaining Estimate: 0h > > As a developer of the beam source code, I would like the code to use pep484 > type hints so that I can clearly see what types are required, get completion > in my IDE, and enforce code correctness via a static analyzer like mypy. > This may be considered a precursor to BEAM-7060 > Work has been started here: [https://github.com/apache/beam/pull/9056] > > -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Work logged] (BEAM-7746) Add type hints to python code
[ https://issues.apache.org/jira/browse/BEAM-7746?focusedWorklogId=346331=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-346331 ] ASF GitHub Bot logged work on BEAM-7746: Author: ASF GitHub Bot Created on: 19/Nov/19 23:01 Start Date: 19/Nov/19 23:01 Worklog Time Spent: 10m Work Description: chadrik commented on pull request #9915: [BEAM-7746] Add python type hints (part 1) URL: https://github.com/apache/beam/pull/9915#discussion_r348215088 ## File path: sdks/python/apache_beam/runners/worker/bundle_processor.py ## @@ -878,6 +1033,7 @@ def get_coder(self, coder_id): json.loads(coder_proto.spec.payload.decode('utf-8'))) def get_windowed_coder(self, pcoll_id): +# type: (str) -> WindowedValueCoder Review comment: I'm confused. This method is definitely returning `WindowedValueCoder`: ```python def get_windowed_coder(self, pcoll_id): # type: (str) -> WindowedValueCoder coder = self.get_coder(self.descriptor.pcollections[pcoll_id].coder_id) # TODO(robertwb): Remove this condition once all runners are consistent. if not isinstance(coder, WindowedValueCoder): windowing_strategy = self.descriptor.windowing_strategies[ self.descriptor.pcollections[pcoll_id].windowing_strategy_id] return WindowedValueCoder( coder, self.get_coder(windowing_strategy.window_coder_id)) else: return coder ``` This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. 
For queries about this service, please contact Infrastructure at: us...@infra.apache.org Issue Time Tracking --- Worklog Id: (was: 346331) Time Spent: 25h 50m (was: 25h 40m) > Add type hints to python code > - > > Key: BEAM-7746 > URL: https://issues.apache.org/jira/browse/BEAM-7746 > Project: Beam > Issue Type: New Feature > Components: sdk-py-core >Reporter: Chad Dombrova >Assignee: Chad Dombrova >Priority: Major > Time Spent: 25h 50m > Remaining Estimate: 0h > > As a developer of the beam source code, I would like the code to use pep484 > type hints so that I can clearly see what types are required, get completion > in my IDE, and enforce code correctness via a static analyzer like mypy. > This may be considered a precursor to BEAM-7060 > Work has been started here: [https://github.com/apache/beam/pull/9056] > > -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Work logged] (BEAM-7746) Add type hints to python code
[ https://issues.apache.org/jira/browse/BEAM-7746?focusedWorklogId=346329=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-346329 ] ASF GitHub Bot logged work on BEAM-7746: Author: ASF GitHub Bot Created on: 19/Nov/19 22:59 Start Date: 19/Nov/19 22:59 Worklog Time Spent: 10m Work Description: chadrik commented on pull request #9915: [BEAM-7746] Add python type hints (part 1) URL: https://github.com/apache/beam/pull/9915#discussion_r348214287 ## File path: sdks/python/apache_beam/runners/runner.py ## @@ -133,7 +152,10 @@ def run_async(self, transform, options=None): transform(PBegin(p)) return p.run() - def run_pipeline(self, pipeline, options): + def run_pipeline(self, + pipeline, # type: Pipeline + options # type: PipelineOptions + ): Review comment: What is the point of this base implementation? Should we just raise `NotImplementedError` here? This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org Issue Time Tracking --- Worklog Id: (was: 346329) Time Spent: 25h 40m (was: 25.5h) > Add type hints to python code > - > > Key: BEAM-7746 > URL: https://issues.apache.org/jira/browse/BEAM-7746 > Project: Beam > Issue Type: New Feature > Components: sdk-py-core >Reporter: Chad Dombrova >Assignee: Chad Dombrova >Priority: Major > Time Spent: 25h 40m > Remaining Estimate: 0h > > As a developer of the beam source code, I would like the code to use pep484 > type hints so that I can clearly see what types are required, get completion > in my IDE, and enforce code correctness via a static analyzer like mypy. > This may be considered a precursor to BEAM-7060 > Work has been started here: [https://github.com/apache/beam/pull/9056] > > -- This message was sent by Atlassian Jira (v8.3.4#803005)
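Raising `NotImplementedError` in the base, as the reviewer suggests, makes the contract explicit while keeping the method's signature as documentation; a sketch with simplified stand-ins:

```python
class PipelineRunner(object):
    """Stand-in for the runner base class discussed above."""

    def run_pipeline(self, pipeline, options):
        # Abstract by convention: every concrete runner must override.
        raise NotImplementedError(
            '%s must implement run_pipeline' % type(self).__name__)

class ToyRunner(PipelineRunner):
    def run_pipeline(self, pipeline, options):
        return 'DONE'

result = ToyRunner().run_pipeline(None, None)
```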
[jira] [Work logged] (BEAM-7746) Add type hints to python code
[ https://issues.apache.org/jira/browse/BEAM-7746?focusedWorklogId=346326=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-346326 ] ASF GitHub Bot logged work on BEAM-7746: Author: ASF GitHub Bot Created on: 19/Nov/19 22:54 Start Date: 19/Nov/19 22:54 Worklog Time Spent: 10m Work Description: chadrik commented on pull request #9915: [BEAM-7746] Add python type hints (part 1) URL: https://github.com/apache/beam/pull/9915#discussion_r348212865 ## File path: sdks/python/apache_beam/transforms/core.py ## @@ -2320,6 +2359,7 @@ def expand(self, pcoll): return super(WindowInto, self).expand(pcoll) def to_runner_api_parameter(self, context): +# type: (PipelineContext) -> typing.Tuple[str, message.Message] Review comment: Looking at the code I don't see anywhere that `WindowInto.to_runner_api_parameter()` returns `bytes`. This function actually returns `beam_runner_api_pb2.WindowingStrategy` by way of `self.windowing.to_runner_api(context)` (just below this). I also don't see any subclasses of `WindowInto` that override this method. This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org Issue Time Tracking --- Worklog Id: (was: 346326) Time Spent: 25.5h (was: 25h 20m) > Add type hints to python code > - > > Key: BEAM-7746 > URL: https://issues.apache.org/jira/browse/BEAM-7746 > Project: Beam > Issue Type: New Feature > Components: sdk-py-core >Reporter: Chad Dombrova >Assignee: Chad Dombrova >Priority: Major > Time Spent: 25.5h > Remaining Estimate: 0h > > As a developer of the beam source code, I would like the code to use pep484 > type hints so that I can clearly see what types are required, get completion > in my IDE, and enforce code correctness via a static analyzer like mypy. 
> This may be considered a precursor to BEAM-7060 > Work has been started here: [https://github.com/apache/beam/pull/9056] > > -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Work logged] (BEAM-8645) TimestampCombiner incorrect in beam python
[ https://issues.apache.org/jira/browse/BEAM-8645?focusedWorklogId=346313=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-346313 ] ASF GitHub Bot logged work on BEAM-8645: Author: ASF GitHub Bot Created on: 19/Nov/19 22:31 Start Date: 19/Nov/19 22:31 Worklog Time Spent: 10m Work Description: HuangLED commented on issue #10081: [BEAM-8645] A test case for TimestampCombiner. URL: https://github.com/apache/beam/pull/10081#issuecomment-555747139 would you please point out which PR in the master *Might* resolve the issue? I can follow and trace that part a bit using this test case. This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org Issue Time Tracking --- Worklog Id: (was: 346313) Time Spent: 5.5h (was: 5h 20m) > TimestampCombiner incorrect in beam python > -- > > Key: BEAM-8645 > URL: https://issues.apache.org/jira/browse/BEAM-8645 > Project: Beam > Issue Type: Bug > Components: sdk-py-core >Reporter: Ruoyun Huang >Priority: Major > Time Spent: 5.5h > Remaining Estimate: 0h > > When we have a TimestampValue on combine: > {code:java} > main_stream = (p > | 'main TestStream' >> TestStream() > .add_elements([window.TimestampedValue(('k', 100), 0)]) > .add_elements([window.TimestampedValue(('k', 400), 9)]) > .advance_watermark_to_infinity() > | 'main windowInto' >> beam.WindowInto( > window.FixedWindows(10), > timestamp_combiner=TimestampCombiner.OUTPUT_AT_LATEST) | > 'Combine' >> beam.CombinePerKey(sum)) > The expect timestamp should be: > LATEST: (('k', 500), Timestamp(9)), > EARLIEST: (('k', 500), Timestamp(0)), > END_OF_WINDOW: (('k', 500), Timestamp(10)), > But current py streaming gives following results: > LATEST: (('k', 500), Timestamp(10)), > EARLIEST: (('k', 500), Timestamp(10)), > END_OF_WINDOW: (('k', 500), Timestamp(9.)), > More details 
and discussions: > https://lists.apache.org/thread.html/d3af1f2f84a2e59a747196039eae77812b78a991f0f293c717e5f4e1@%3Cdev.beam.apache.org%3E > {code} -- This message was sent by Atlassian Jira (v8.3.4#803005)
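The expected results in the report follow directly from the combiner semantics: `LATEST` takes the maximum input timestamp, `EARLIEST` the minimum, and `END_OF_WINDOW` the window bound. A pure-Python sketch of those expectations (not Beam's implementation, which is where the bug lives):

```python
def combined_timestamp(timestamps, mode, window_end):
    """Output timestamp for a combined value under each combiner mode."""
    if mode == 'LATEST':
        return max(timestamps)
    if mode == 'EARLIEST':
        return min(timestamps)
    if mode == 'END_OF_WINDOW':
        return window_end
    raise ValueError('unknown mode: %r' % mode)

# ('k', 100) at t=0 and ('k', 400) at t=9, FixedWindows(10) -> window [0, 10)
inputs = [0, 9]
expected = {
    'LATEST': combined_timestamp(inputs, 'LATEST', 10),
    'EARLIEST': combined_timestamp(inputs, 'EARLIEST', 10),
    'END_OF_WINDOW': combined_timestamp(inputs, 'END_OF_WINDOW', 10),
}
```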
[jira] [Work logged] (BEAM-7746) Add type hints to python code
[ https://issues.apache.org/jira/browse/BEAM-7746?focusedWorklogId=346311=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-346311 ] ASF GitHub Bot logged work on BEAM-7746: Author: ASF GitHub Bot Created on: 19/Nov/19 22:30 Start Date: 19/Nov/19 22:30 Worklog Time Spent: 10m Work Description: chadrik commented on pull request #9915: [BEAM-7746] Add python type hints (part 1) URL: https://github.com/apache/beam/pull/9915#discussion_r348203903 ## File path: sdks/python/apache_beam/runners/worker/bundle_processor.py ## @@ -73,8 +108,17 @@ class RunnerIOOperation(operations.Operation): """Common baseclass for runner harness IO operations.""" - def __init__(self, name_context, step_name, consumers, counter_factory, - state_sampler, windowed_coder, transform_id, data_channel): + def __init__(self, + name_context, # type: Union[str, common.NameContext] + step_name, + consumers, # type: Mapping[Any, Iterable[operations.Operation]] + counter_factory, + state_sampler, + windowed_coder, # type: coders.WindowedValueCoder + transform_id, # type: str + data_channel # type: data_plane.GrpcClientDataChannel Review comment: done. will be included in my upcoming review commit. This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. 
For queries about this service, please contact Infrastructure at: us...@infra.apache.org Issue Time Tracking --- Worklog Id: (was: 346311) Time Spent: 25h 20m (was: 25h 10m) > Add type hints to python code > - > > Key: BEAM-7746 > URL: https://issues.apache.org/jira/browse/BEAM-7746 > Project: Beam > Issue Type: New Feature > Components: sdk-py-core >Reporter: Chad Dombrova >Assignee: Chad Dombrova >Priority: Major > Time Spent: 25h 20m > Remaining Estimate: 0h > > As a developer of the beam source code, I would like the code to use pep484 > type hints so that I can clearly see what types are required, get completion > in my IDE, and enforce code correctness via a static analyzer like mypy. > This may be considered a precursor to BEAM-7060 > Work has been started here: [https://github.com/apache/beam/pull/9056] > > -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Work logged] (BEAM-7746) Add type hints to python code
[ https://issues.apache.org/jira/browse/BEAM-7746?focusedWorklogId=346310=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-346310 ] ASF GitHub Bot logged work on BEAM-7746: Author: ASF GitHub Bot Created on: 19/Nov/19 22:29 Start Date: 19/Nov/19 22:29 Worklog Time Spent: 10m Work Description: chadrik commented on pull request #9915: [BEAM-7746] Add python type hints (part 1) URL: https://github.com/apache/beam/pull/9915#discussion_r348203629 ## File path: sdks/python/apache_beam/runners/worker/bundle_processor.py ## @@ -328,10 +404,12 @@ class _ConcatIterable(object): Unlike itertools.chain, this allows reiteration. """ def __init__(self, first, second): +# type: (Iterable[Any], Iterable[Any]) -> None Review comment: To solve this well, we'd want to make `_ConcatIterable` a generic. I've been holding off on introducing any generics for now. I'll do this in a followup if that's ok with you. This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org Issue Time Tracking --- Worklog Id: (was: 346310) Time Spent: 25h 10m (was: 25h) > Add type hints to python code > - > > Key: BEAM-7746 > URL: https://issues.apache.org/jira/browse/BEAM-7746 > Project: Beam > Issue Type: New Feature > Components: sdk-py-core >Reporter: Chad Dombrova >Assignee: Chad Dombrova >Priority: Major > Time Spent: 25h 10m > Remaining Estimate: 0h > > As a developer of the beam source code, I would like the code to use pep484 > type hints so that I can clearly see what types are required, get completion > in my IDE, and enforce code correctness via a static analyzer like mypy. > This may be considered a precursor to BEAM-7060 > Work has been started here: [https://github.com/apache/beam/pull/9056] > > -- This message was sent by Atlassian Jira (v8.3.4#803005)
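Making the class generic, as proposed in that follow-up, would let the element type flow through to callers; a sketch simplified from the `_ConcatIterable` in the diff:

```python
from typing import Generic, Iterable, Iterator, TypeVar

T = TypeVar('T')

class ConcatIterable(Generic[T]):
    """Chains two iterables but, unlike itertools.chain, supports
    re-iteration (a generic sketch of _ConcatIterable above)."""

    def __init__(self, first, second):
        # type: (Iterable[T], Iterable[T]) -> None
        self.first = first
        self.second = second

    def __iter__(self):
        # type: () -> Iterator[T]
        for elem in self.first:
            yield elem
        for elem in self.second:
            yield elem

pair = ConcatIterable([1, 2], [3])
```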
[jira] [Work logged] (BEAM-7746) Add type hints to python code
[ https://issues.apache.org/jira/browse/BEAM-7746?focusedWorklogId=346308&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-346308 ]

ASF GitHub Bot logged work on BEAM-7746:
Author: ASF GitHub Bot
Created on: 19/Nov/19 22:27
Worklog Time Spent: 10m

Work Description: chadrik commented on pull request #9915: [BEAM-7746] Add python type hints (part 1)
URL: https://github.com/apache/beam/pull/9915#discussion_r348203062

File path: sdks/python/apache_beam/transforms/core.py
@@ -2167,8 +2198,12 @@ def expand(self, pcoll):
 class Windowing(object):
-  def __init__(self, windowfn, triggerfn=None, accumulation_mode=None,
-               timestamp_combiner=None):
+  def __init__(self,
+               windowfn,  # type: WindowFn
+               triggerfn=None,  # type: typing.Optional[TriggerFn]
+               accumulation_mode=None,

Review comment: done. will be included in my upcoming review commit.

Issue Time Tracking
-------------------
Worklog Id: (was: 346308)
Time Spent: 24h 50m (was: 24h 40m)
[jira] [Work logged] (BEAM-7746) Add type hints to python code
[ https://issues.apache.org/jira/browse/BEAM-7746?focusedWorklogId=346306&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-346306 ]

ASF GitHub Bot logged work on BEAM-7746:
Author: ASF GitHub Bot
Created on: 19/Nov/19 22:27
Worklog Time Spent: 10m

Work Description: chadrik commented on pull request #9915: [BEAM-7746] Add python type hints (part 1)
URL: https://github.com/apache/beam/pull/9915#discussion_r348202914

File path: sdks/python/apache_beam/runners/runner.py
@@ -133,7 +152,10 @@ def run_async(self, transform, options=None):
     transform(PBegin(p))
     return p.run()

-  def run_pipeline(self, pipeline, options):
+  def run_pipeline(self,
+                   pipeline,  # type: Pipeline
+                   options    # type: PipelineOptions
+                   ):

Review comment: this appears to return `None`.

Issue Time Tracking
-------------------
Worklog Id: (was: 346306)
Time Spent: 24h 40m (was: 24.5h)
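The reviewer's point is that `run_pipeline` never returns a value, so its comment-style annotation should declare `-> None`. A self-contained sketch of that style — the classes here are stand-ins for illustration, not Beam's real `Pipeline`/`PipelineOptions`:

```python
# PEP 484 comment-style annotations, as used in the PR for Python 2
# compatibility: one `# type:` comment per argument, plus a final
# `# type: (...) -> RET` comment giving the return type.

class PipelineOptions(object):  # stand-in, not the real Beam class
    pass


class Pipeline(object):  # stand-in, not the real Beam class
    pass


class PipelineRunner(object):
    def run_pipeline(self,
                     pipeline,  # type: Pipeline
                     options    # type: PipelineOptions
                     ):
        # type: (...) -> None
        # Runs the pipeline for its side effects; nothing is returned,
        # hence the `-> None` annotation mypy will enforce.
        self.last_run = (pipeline, options)
```

With this annotation, code such as `result = runner.run_pipeline(p, opts); result.wait_until_finish()` is rejected by mypy, since `result` is `None`.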
[jira] [Work logged] (BEAM-7746) Add type hints to python code
[ https://issues.apache.org/jira/browse/BEAM-7746?focusedWorklogId=346309&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-346309 ]

ASF GitHub Bot logged work on BEAM-7746:
Author: ASF GitHub Bot
Created on: 19/Nov/19 22:27
Worklog Time Spent: 10m

Work Description: chadrik commented on pull request #9915: [BEAM-7746] Add python type hints (part 1)
URL: https://github.com/apache/beam/pull/9915#discussion_r348203078

File path: sdks/python/apache_beam/transforms/core.py
@@ -2276,10 +2313,11 @@ def process(self, element, timestamp=DoFn.TimestampParam,
     yield WindowedValue(element, context.timestamp, new_windows)

   def __init__(self,
-               windowfn,
-               trigger=None,
+               windowfn,  # type: typing.Union[Windowing, WindowFn]
+               trigger=None,  # type: typing.Optional[TriggerFn]
                accumulation_mode=None,
-               timestamp_combiner=None):
+               timestamp_combiner=None

Review comment: done. will be included in my upcoming review commit.

Issue Time Tracking
-------------------
Worklog Id: (was: 346309)
Time Spent: 25h (was: 24h 50m)
[jira] [Work logged] (BEAM-8645) TimestampCombiner incorrect in beam python
[ https://issues.apache.org/jira/browse/BEAM-8645?focusedWorklogId=346305&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-346305 ]

ASF GitHub Bot logged work on BEAM-8645:
Author: ASF GitHub Bot
Created on: 19/Nov/19 22:26
Worklog Time Spent: 10m

Work Description: HuangLED commented on issue #10081: [BEAM-8645] A test case for TimestampCombiner.
URL: https://github.com/apache/beam/pull/10081#issuecomment-555745530
Pulled from master and retried. The EARLIEST case still failed, with error:
"apache_beam.runners.direct.executor: WARNING: A task failed with exception: Failed assert: [(('k', 500), Timestamp(7))] not in [(('k', 500), Timestamp(2))] [while running 'assert per window/Match']"

Issue Time Tracking
-------------------
Worklog Id: (was: 346305)
Time Spent: 5h 20m (was: 5h 10m)

> TimestampCombiner incorrect in beam python
> ------------------------------------------
>
> Key: BEAM-8645
> URL: https://issues.apache.org/jira/browse/BEAM-8645
> Project: Beam
> Issue Type: Bug
> Components: sdk-py-core
> Reporter: Ruoyun Huang
> Priority: Major
> Time Spent: 5h 20m
> Remaining Estimate: 0h
>
> When we have a timestamp_combiner on a combine:
> {code:python}
> main_stream = (p
>                | 'main TestStream' >> TestStream()
>                    .add_elements([window.TimestampedValue(('k', 100), 0)])
>                    .add_elements([window.TimestampedValue(('k', 400), 9)])
>                    .advance_watermark_to_infinity()
>                | 'main windowInto' >> beam.WindowInto(
>                    window.FixedWindows(10),
>                    timestamp_combiner=TimestampCombiner.OUTPUT_AT_LATEST)
>                | 'Combine' >> beam.CombinePerKey(sum))
> {code}
> The expected timestamps should be:
> LATEST: (('k', 500), Timestamp(9))
> EARLIEST: (('k', 500), Timestamp(0))
> END_OF_WINDOW: (('k', 500), Timestamp(10))
> But current py streaming gives the following results:
> LATEST: (('k', 500), Timestamp(10))
> EARLIEST: (('k', 500), Timestamp(10))
> END_OF_WINDOW: (('k', 500), Timestamp(9.))
> More details and discussion:
> https://lists.apache.org/thread.html/d3af1f2f84a2e59a747196039eae77812b78a991f0f293c717e5f4e1@%3Cdev.beam.apache.org%3E

--
This message was sent by Atlassian Jira (v8.3.4#803005)
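The expected values in the bug report follow directly from what each combiner mode is supposed to do with the element timestamps in a window. A Beam-independent sketch of those semantics — `combine_timestamps` and the plain string mode names are hypothetical helpers for illustration, not Beam's actual `TimestampCombiner` API:

```python
# For the TestStream above: ('k', 100) at t=0 and ('k', 400) at t=9,
# both falling into the FixedWindows(10) window [0, 10).

def combine_timestamps(timestamps, mode, window_end):
    """Returns the output timestamp of a combined value per combiner mode."""
    if mode == 'EARLIEST':        # cf. TimestampCombiner.OUTPUT_AT_EARLIEST
        return min(timestamps)
    if mode == 'LATEST':          # cf. TimestampCombiner.OUTPUT_AT_LATEST
        return max(timestamps)
    if mode == 'END_OF_WINDOW':   # cf. TimestampCombiner.OUTPUT_AT_EOW
        return window_end
    raise ValueError('unknown mode: %s' % mode)


element_timestamps = [0, 9]  # timestamps of the two elements in the window
window_end = 10              # end of the [0, 10) fixed window

# These match the "expected" column in the report, not the buggy output:
assert combine_timestamps(element_timestamps, 'EARLIEST', window_end) == 0
assert combine_timestamps(element_timestamps, 'LATEST', window_end) == 9
assert combine_timestamps(element_timestamps, 'END_OF_WINDOW', window_end) == 10
```

The bug is that the Python streaming runner produced Timestamp(10) for both EARLIEST and LATEST, i.e. it behaved like END_OF_WINDOW regardless of the requested mode.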