[jira] [Work logged] (BEAM-8457) Instrument Dataflow jobs that are launched from Notebooks
[ https://issues.apache.org/jira/browse/BEAM-8457?focusedWorklogId=344487&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-344487 ]

ASF GitHub Bot logged work on BEAM-8457:

Author: ASF GitHub Bot
Created on: 15/Nov/19 18:58
Start Date: 15/Nov/19 18:58
Worklog Time Spent: 10m
Work Description: Ardagan commented on pull request #9887: [release-2.17.0] Revert "Merge pull request #9854 from [BEAM-8457] Label Dataflow jobs…
URL: https://github.com/apache/beam/pull/9887

This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org

Issue Time Tracking
---
Worklog Id: (was: 344487)
Time Spent: 10h 20m (was: 10h 10m)

> Instrument Dataflow jobs that are launched from Notebooks
> -
>
> Key: BEAM-8457
> URL: https://issues.apache.org/jira/browse/BEAM-8457
> Project: Beam
> Issue Type: Improvement
> Components: runner-py-interactive
> Reporter: Ning Kang
> Assignee: Ning Kang
> Priority: Major
> Fix For: 2.18.0
>
> Time Spent: 10h 20m
> Remaining Estimate: 0h
>
> Dataflow needs the capability to tell how many Dataflow jobs are launched
> from the Notebook environment.
> We are doing it by checking if the current execution path is with ipython and
> if the ipython kernel is connected to a notebook frontend.

--
This message was sent by Atlassian Jira
(v8.3.4#803005)
[jira] [Work logged] (BEAM-8457) Instrument Dataflow jobs that are launched from Notebooks
[ https://issues.apache.org/jira/browse/BEAM-8457?focusedWorklogId=343810&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-343810 ]

ASF GitHub Bot logged work on BEAM-8457:

Author: ASF GitHub Bot
Created on: 14/Nov/19 22:40
Start Date: 14/Nov/19 22:40
Worklog Time Spent: 10m
Work Description: Ardagan commented on issue #9887: [release-2.17.0] Revert "Merge pull request #9854 from [BEAM-8457] Label Dataflow jobs…
URL: https://github.com/apache/beam/pull/9887#issuecomment-554115250

Run Python PreCommit

Issue Time Tracking
---
Worklog Id: (was: 343810)
Time Spent: 10h 10m (was: 10h)
[jira] [Work logged] (BEAM-8457) Instrument Dataflow jobs that are launched from Notebooks
[ https://issues.apache.org/jira/browse/BEAM-8457?focusedWorklogId=343679&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-343679 ]

ASF GitHub Bot logged work on BEAM-8457:

Author: ASF GitHub Bot
Created on: 14/Nov/19 19:13
Start Date: 14/Nov/19 19:13
Worklog Time Spent: 10m
Work Description: Ardagan commented on issue #9887: [release-2.17.0] Revert "Merge pull request #9854 from [BEAM-8457] Label Dataflow jobs…
URL: https://github.com/apache/beam/pull/9887#issuecomment-554036059

Run Python PreCommit

Issue Time Tracking
---
Worklog Id: (was: 343679)
Time Spent: 10h (was: 9h 50m)
[jira] [Work logged] (BEAM-8457) Instrument Dataflow jobs that are launched from Notebooks
[ https://issues.apache.org/jira/browse/BEAM-8457?focusedWorklogId=343116&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-343116 ]

ASF GitHub Bot logged work on BEAM-8457:

Author: ASF GitHub Bot
Created on: 14/Nov/19 04:28
Start Date: 14/Nov/19 04:28
Worklog Time Spent: 10m
Work Description: Ardagan commented on issue #9887: [release-2.17.0] Revert "Merge pull request #9854 from [BEAM-8457] Label Dataflow jobs…
URL: https://github.com/apache/beam/pull/9887#issuecomment-553719079

Run Python PreCommit

Issue Time Tracking
---
Worklog Id: (was: 343116)
Time Spent: 9h 50m (was: 9h 40m)
[jira] [Work logged] (BEAM-8457) Instrument Dataflow jobs that are launched from Notebooks
[ https://issues.apache.org/jira/browse/BEAM-8457?focusedWorklogId=341411&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-341411 ]

ASF GitHub Bot logged work on BEAM-8457:

Author: ASF GitHub Bot
Created on: 11/Nov/19 18:20
Start Date: 11/Nov/19 18:20
Worklog Time Spent: 10m
Work Description: KevinGG commented on pull request #9885: [BEAM-8457] Label Dataflow jobs from Notebook
URL: https://github.com/apache/beam/pull/9885#discussion_r344843896

## File path: sdks/python/apache_beam/runners/interactive/interactive_environment.py ##
@@ -93,17 +94,7 @@ def __init__(self, cache_manager=None):
           'install apache-beam[interactive]` to install necessary '
           'dependencies to enable all data visualization features.')
 
-    self._is_in_ipython = False
-    self._is_in_notebook = False
-    # Check if the runtime is within an interactive environment, i.e., ipython.
-    try:
-      from IPython import get_ipython  # pylint: disable=import-error
-      if get_ipython():
-        self._is_in_ipython = True
-        if 'IPKernelApp' in get_ipython().config:
-          self._is_in_notebook = True
-    except ImportError:
-      pass
+    self._is_in_ipython, self._is_in_notebook = is_interactive()

Review comment:
Roger, will make it into 2 separate APIs.

Issue Time Tracking
---
Worklog Id: (was: 341411)
Time Spent: 9h 40m (was: 9.5h)

> Instrument Dataflow jobs that are launched from Notebooks
> -
>
> Key: BEAM-8457
> URL: https://issues.apache.org/jira/browse/BEAM-8457
> Project: Beam
> Issue Type: Improvement
> Components: runner-py-interactive
> Reporter: Ning Kang
> Assignee: Ning Kang
> Priority: Major
> Fix For: 2.17.0
>
> Time Spent: 9h 40m
> Remaining Estimate: 0h
>
> Dataflow needs the capability to tell how many Dataflow jobs are launched
> from the Notebook environment, i.e., the Interactive Runner.
> # Change the pipeline.run() API to allow supply a runner and an option
> parameter so that a pipeline initially bundled w/ an interactive runner can
> be directly run by other runners from notebook.
> # Implicitly add the necessary source information through user labels when
> the user does p.run(runner=DataflowRunner()).
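The "2 separate APIs" mentioned in the reply could look roughly like the sketch below. This is not the code that was eventually merged; the function names `is_in_ipython` and `is_in_notebook` are assumptions made for illustration, and the bodies simply reuse the detection logic that the diff above removes from `__init__`. Each function returns a single boolean, and both fall back to `False` when IPython is not installed.

```python
def is_in_ipython():
    """Return True if the code runs inside an IPython shell or kernel.

    Hypothetical helper name; the import is guarded so that IPython
    stays an optional dependency.
    """
    try:
        from IPython import get_ipython  # pylint: disable=import-error
    except ImportError:
        return False
    return bool(get_ipython())


def is_in_notebook():
    """Return True if the IPython kernel is attached to a notebook frontend.

    Hypothetical helper name; uses the same 'IPKernelApp' config check
    as the code removed in the diff.
    """
    try:
        from IPython import get_ipython  # pylint: disable=import-error
    except ImportError:
        return False
    shell = get_ipython()
    return bool(shell) and 'IPKernelApp' in shell.config
```

With two functions, a caller that only cares about the notebook case can write `if is_in_notebook():` without unpacking a pair.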
[jira] [Work logged] (BEAM-8457) Instrument Dataflow jobs that are launched from Notebooks
[ https://issues.apache.org/jira/browse/BEAM-8457?focusedWorklogId=341407&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-341407 ]

ASF GitHub Bot logged work on BEAM-8457:

Author: ASF GitHub Bot
Created on: 11/Nov/19 18:17
Start Date: 11/Nov/19 18:17
Worklog Time Spent: 10m
Work Description: pabloem commented on issue #9887: [release-2.17.0] Revert "Merge pull request #9854 from [BEAM-8457] Label Dataflow jobs…
URL: https://github.com/apache/beam/pull/9887#issuecomment-552554913

Run Python PreCommit

Issue Time Tracking
---
Worklog Id: (was: 341407)
Time Spent: 9.5h (was: 9h 20m)
[jira] [Work logged] (BEAM-8457) Instrument Dataflow jobs that are launched from Notebooks
[ https://issues.apache.org/jira/browse/BEAM-8457?focusedWorklogId=341402&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-341402 ]

ASF GitHub Bot logged work on BEAM-8457:

Author: ASF GitHub Bot
Created on: 11/Nov/19 18:14
Start Date: 11/Nov/19 18:14
Worklog Time Spent: 10m
Work Description: robertwb commented on pull request #9885: [BEAM-8457] Label Dataflow jobs from Notebook
URL: https://github.com/apache/beam/pull/9885#discussion_r344841478

## File path: sdks/python/apache_beam/runners/interactive/interactive_environment.py ##
@@ -93,17 +94,7 @@ def __init__(self, cache_manager=None):
           'install apache-beam[interactive]` to install necessary '
           'dependencies to enable all data visualization features.')
 
-    self._is_in_ipython = False
-    self._is_in_notebook = False
-    # Check if the runtime is within an interactive environment, i.e., ipython.
-    try:
-      from IPython import get_ipython  # pylint: disable=import-error
-      if get_ipython():
-        self._is_in_ipython = True
-        if 'IPKernelApp' in get_ipython().config:
-          self._is_in_notebook = True
-    except ImportError:
-      pass
+    self._is_in_ipython, self._is_in_notebook = is_interactive()

Review comment:
Conventionally, `is_xxx` functions return a boolean. Returning a pair will be especially surprising if one writes statements like `if is_interactive()` and the return value is `(False, False)` (which, as a non-zero-length tuple, evaluates to `True`).

Issue Time Tracking
---
Worklog Id: (was: 341402)
Time Spent: 9h 20m (was: 9h 10m)
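The truthiness pitfall described in the review comment can be reproduced in a few lines. The function below is a hypothetical stand-in that only mimics the pair-returning shape of `is_interactive()` from the diff; it does not inspect the environment.

```python
def is_interactive_pair():
    """Hypothetical stand-in: returns (in_ipython, in_notebook) as a pair."""
    return (False, False)


result = is_interactive_pair()

# A non-empty tuple is truthy regardless of its elements, so a plain
# `if is_interactive_pair():` would take the "interactive" branch here.
pair_is_truthy = bool(result)

# Checking the elements gives the answer the caller most likely wanted.
any_flag_set = any(result)
```

Here `pair_is_truthy` is `True` while `any_flag_set` is `False`, which is the surprise the reviewer is warning about.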
[jira] [Work logged] (BEAM-8457) Instrument Dataflow jobs that are launched from Notebooks
[ https://issues.apache.org/jira/browse/BEAM-8457?focusedWorklogId=341401&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-341401 ]

ASF GitHub Bot logged work on BEAM-8457:

Author: ASF GitHub Bot
Created on: 11/Nov/19 18:11
Start Date: 11/Nov/19 18:11
Worklog Time Spent: 10m
Work Description: pabloem commented on pull request #9885: [BEAM-8457] Label Dataflow jobs from Notebook
URL: https://github.com/apache/beam/pull/9885

Issue Time Tracking
---
Worklog Id: (was: 341401)
Time Spent: 9h 10m (was: 9h)
[jira] [Work logged] (BEAM-8457) Instrument Dataflow jobs that are launched from Notebooks
[ https://issues.apache.org/jira/browse/BEAM-8457?focusedWorklogId=340780&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-340780 ]

ASF GitHub Bot logged work on BEAM-8457:

Author: ASF GitHub Bot
Created on: 08/Nov/19 22:18
Start Date: 08/Nov/19 22:18
Worklog Time Spent: 10m
Work Description: KevinGG commented on issue #9885: [BEAM-8457] Label Dataflow jobs from Notebook
URL: https://github.com/apache/beam/pull/9885#issuecomment-552011987

Run Portable_Python PreCommit

Issue Time Tracking
---
Worklog Id: (was: 340780)
Time Spent: 9h (was: 8h 50m)
[jira] [Work logged] (BEAM-8457) Instrument Dataflow jobs that are launched from Notebooks
[ https://issues.apache.org/jira/browse/BEAM-8457?focusedWorklogId=340292&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-340292 ]

ASF GitHub Bot logged work on BEAM-8457:

Author: ASF GitHub Bot
Created on: 08/Nov/19 01:59
Start Date: 08/Nov/19 01:59
Worklog Time Spent: 10m
Work Description: KevinGG commented on issue #9885: [BEAM-8457] Label Dataflow jobs from Notebook
URL: https://github.com/apache/beam/pull/9885#issuecomment-551349679

Run Portable_Python PreCommit

Issue Time Tracking
---
Worklog Id: (was: 340292)
Time Spent: 8h 50m (was: 8h 40m)
[jira] [Work logged] (BEAM-8457) Instrument Dataflow jobs that are launched from Notebooks
[ https://issues.apache.org/jira/browse/BEAM-8457?focusedWorklogId=340235&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-340235 ]

ASF GitHub Bot logged work on BEAM-8457:

Author: ASF GitHub Bot
Created on: 07/Nov/19 23:43
Start Date: 07/Nov/19 23:43
Worklog Time Spent: 10m
Work Description: KevinGG commented on issue #9885: [BEAM-8457] Label Dataflow jobs from Notebook
URL: https://github.com/apache/beam/pull/9885#issuecomment-551317534

Run Portable_Python PreCommit

Issue Time Tracking
---
Worklog Id: (was: 340235)
Time Spent: 8h 40m (was: 8.5h)
[jira] [Work logged] (BEAM-8457) Instrument Dataflow jobs that are launched from Notebooks
[ https://issues.apache.org/jira/browse/BEAM-8457?focusedWorklogId=340125&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-340125 ]

ASF GitHub Bot logged work on BEAM-8457:

Author: ASF GitHub Bot
Created on: 07/Nov/19 20:03
Start Date: 07/Nov/19 20:03
Worklog Time Spent: 10m
Work Description: KevinGG commented on pull request #9885: [BEAM-8457] Label Dataflow jobs from Notebook
URL: https://github.com/apache/beam/pull/9885#discussion_r343848058

## File path: sdks/python/apache_beam/runners/dataflow/dataflow_runner.py ##
@@ -360,6 +360,16 @@ def visit_transform(self, transform_node):
   def run_pipeline(self, pipeline, options):
     """Remotely executes entire pipeline or parts reachable from node."""
+    # Label goog-dataflow-notebook if pipeline is initiated from interactive
+    # runner.
+    if pipeline.interactive:

Review comment:
Discussed with David and Sam. Since we also want to track jobs started from a notebook even if the user never uses `InteractiveRunner`, checking the environment might be the only way to do it.
By putting the logic into a try-except block as it is, we can avoid introducing an `ipython` dependency into `DataflowRunner`. If the `[interactive]` dependency is never installed and the current execution_path has never imported `ipython`, the code is simply never executed.
I'll move the logic into a standalone utility module and import it in DataflowRunner to do the check.

Issue Time Tracking
---
Worklog Id: (was: 340125)
Time Spent: 8.5h (was: 8h 20m)
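Once a runner knows whether it was started from a notebook, the labelling step itself is small. The sketch below is not Beam's implementation: the helper name `add_notebook_label`, the `key=value` string representation of labels, and the default value `'true'` are all assumptions for illustration. Only the label key `goog-dataflow-notebook` comes from the diff comment above.

```python
# Label key taken from the comment in the diff; everything else is assumed.
NOTEBOOK_LABEL_KEY = 'goog-dataflow-notebook'


def add_notebook_label(labels, in_notebook, value='true'):
    """Return a copy of `labels` with the notebook label added if needed.

    `labels` is assumed to be a list of 'key=value' strings (or None).
    The input list is never mutated, and an existing label with the same
    key is left untouched.
    """
    result = list(labels or [])
    if not in_notebook:
        return result
    existing_keys = {label.split('=', 1)[0] for label in result}
    if NOTEBOOK_LABEL_KEY not in existing_keys:
        result.append('%s=%s' % (NOTEBOOK_LABEL_KEY, value))
    return result
```

Keeping the environment check outside this helper (it only receives a boolean) means the labelling logic has no IPython dependency at all, which matches the concern raised in the comment.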
[jira] [Work logged] (BEAM-8457) Instrument Dataflow jobs that are launched from Notebooks
[ https://issues.apache.org/jira/browse/BEAM-8457?focusedWorklogId=340120&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-340120 ]

ASF GitHub Bot logged work on BEAM-8457:

Author: ASF GitHub Bot
Created on: 07/Nov/19 19:57
Start Date: 07/Nov/19 19:57
Worklog Time Spent: 10m
Work Description: KevinGG commented on pull request #9885: [BEAM-8457] Label Dataflow jobs from Notebook
URL: https://github.com/apache/beam/pull/9885#discussion_r343845181

## File path: sdks/python/apache_beam/pipeline.py ##
@@ -396,28 +400,57 @@ def replace_all(self, replacements):
     for override in replacements:
       self._check_replacement(override)
 
-  def run(self, test_runner_api=True):
-    """Runs the pipeline. Returns whatever our runner returns after running."""
-
+  def run(self, test_runner_api=True, runner=None, options=None,

Review comment:
Putting this discussion on next Monday's agenda and will remove changes to the API.

Issue Time Tracking
---
Worklog Id: (was: 340120)
Time Spent: 8h 20m (was: 8h 10m)
[jira] [Work logged] (BEAM-8457) Instrument Dataflow jobs that are launched from Notebooks
[ https://issues.apache.org/jira/browse/BEAM-8457?focusedWorklogId=340119&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-340119 ]

ASF GitHub Bot logged work on BEAM-8457:

Author: ASF GitHub Bot
Created on: 07/Nov/19 19:56
Start Date: 07/Nov/19 19:56
Worklog Time Spent: 10m
Work Description: KevinGG commented on pull request #9885: [BEAM-8457] Label Dataflow jobs from Notebook
URL: https://github.com/apache/beam/pull/9885#discussion_r343844948

## File path: sdks/python/apache_beam/pipeline.py ##
@@ -172,6 +172,10 @@ def __init__(self, runner=None, options=None, argv=None):
     # If a transform is applied and the full label is already in the set
     # then the transform will have to be cloned with a new label.
     self.applied_labels = set()
+    # A boolean value indicating whether the pipeline is created in an
+    # interactive environment such as interactive notebooks. Initialized as
+    # None. The value is set ad hoc when `pipeline.run()` is invoked.
+    self.interactive = None

Review comment:
I'll go with the check environment route and make it a standalone utility module in the interactive package.

Issue Time Tracking
---
Worklog Id: (was: 340119)
Time Spent: 8h 10m (was: 8h)
[jira] [Work logged] (BEAM-8457) Instrument Dataflow jobs that are launched from Notebooks
[ https://issues.apache.org/jira/browse/BEAM-8457?focusedWorklogId=339650&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-339650 ]

ASF GitHub Bot logged work on BEAM-8457:

Author: ASF GitHub Bot
Created on: 07/Nov/19 00:44
Start Date: 07/Nov/19 00:44
Worklog Time Spent: 10m
Work Description: aaltay commented on pull request #9885: [BEAM-8457] Label Dataflow jobs from Notebook
URL: https://github.com/apache/beam/pull/9885#discussion_r343401907

## File path: sdks/python/apache_beam/pipeline.py ##
@@ -396,28 +400,57 @@ def replace_all(self, replacements):
     for override in replacements:
       self._check_replacement(override)
 
-  def run(self, test_runner_api=True):
-    """Runs the pipeline. Returns whatever our runner returns after running."""
-
+  def run(self, test_runner_api=True, runner=None, options=None,

Review comment:
> Do you think we should put it into a separate PR,

Yes. At least let's have a separate discussion for API changes like this.

> or simply not supporting it at all?

Maybe not. This could be just an override for the interactive runner's run() (e.g. run_with(NewRunner, NewOptions)). At least, let's discuss with all stakeholders.

Issue Time Tracking
---
Worklog Id: (was: 339650)
Time Spent: 8h (was: 7h 50m)
[jira] [Work logged] (BEAM-8457) Instrument Dataflow jobs that are launched from Notebooks
[ https://issues.apache.org/jira/browse/BEAM-8457?focusedWorklogId=339649&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-339649 ]

ASF GitHub Bot logged work on BEAM-8457:

Author: ASF GitHub Bot
Created on: 07/Nov/19 00:41
Start Date: 07/Nov/19 00:41
Worklog Time Spent: 10m
Work Description: aaltay commented on pull request #9885: [BEAM-8457] Label Dataflow jobs from Notebook
URL: https://github.com/apache/beam/pull/9885#discussion_r343401313

## File path: sdks/python/apache_beam/runners/dataflow/dataflow_runner.py ##
@@ -360,6 +360,16 @@ def visit_transform(self, transform_node):
   def run_pipeline(self, pipeline, options):
     """Remotely executes entire pipeline or parts reachable from node."""
+    # Label goog-dataflow-notebook if pipeline is initiated from interactive
+    # runner.
+    if pipeline.interactive:

Review comment:
Could we move https://github.com/apache/beam/blob/master/sdks/python/apache_beam/runners/interactive/interactive_environment.py#L100:L102 to a common utility function, so that each runner that wants to could call it without worrying about requiring additional imports?

Issue Time Tracking
---
Worklog Id: (was: 339649)
Time Spent: 7h 50m (was: 7h 40m)
[jira] [Work logged] (BEAM-8457) Instrument Dataflow jobs that are launched from Notebooks
[ https://issues.apache.org/jira/browse/BEAM-8457?focusedWorklogId=339085&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-339085 ] ASF GitHub Bot logged work on BEAM-8457: Author: ASF GitHub Bot Created on: 06/Nov/19 00:22 Start Date: 06/Nov/19 00:22 Worklog Time Spent: 10m Work Description: KevinGG commented on pull request #9885: [BEAM-8457] Label Dataflow jobs from Notebook URL: https://github.com/apache/beam/pull/9885#discussion_r342858336 ## File path: sdks/python/apache_beam/runners/dataflow/dataflow_runner.py ## @@ -360,6 +360,16 @@ def visit_transform(self, transform_node): def run_pipeline(self, pipeline, options): """Remotely executes entire pipeline or parts reachable from node.""" +# Label goog-dataflow-notebook if pipeline is initiated from interactive +# runner. +if pipeline.interactive: Review comment: I see your point! Yes, I have the [capability](https://github.com/apache/beam/blob/master/sdks/python/apache_beam/runners/interactive/interactive_environment.py#L131) to check if current interpreted code is in a notebook or not. This branch will need a rebase against master to take those changes. To clarify the process: When a DataflowRunner tries to run a job from a given pipeline, 1. Check if the module `interactive_environment` is imported by checking the `sys.modules` dictionary; 2. Check if `current_env().is_in_notebook`; 3. If yes, label the job. I think we have a bit of a trade-off here: 1. What we have here: Determining if the job is started from a pipeline that was originally bundled with an Interactive Runner. Doing it with string comparison, we don't introduce a new dependency into DataflowRunner. 2. Deduce if the job is started from a notebook environment. We'll introduce [interactive] dependencies including at least ipython into DataflowRunner. 
This will label Dataflow jobs from any pipeline originally bundled with arbitrary runner in any kind of ipython-notebook as long as `interactive_environment` module in `interactive` package has been (transitively) imported but not necessarily used. This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org Issue Time Tracking --- Worklog Id: (was: 339085) Time Spent: 7h 40m (was: 7.5h) > Instrument Dataflow jobs that are launched from Notebooks > - > > Key: BEAM-8457 > URL: https://issues.apache.org/jira/browse/BEAM-8457 > Project: Beam > Issue Type: Improvement > Components: runner-py-interactive >Reporter: Ning Kang >Assignee: Ning Kang >Priority: Major > Fix For: 2.17.0 > > Time Spent: 7h 40m > Remaining Estimate: 0h > > Dataflow needs the capability to tell how many Dataflow jobs are launched > from the Notebook environment, i.e., the Interactive Runner. > # Change the pipeline.run() API to allow supply a runner and an option > parameter so that a pipeline initially bundled w/ an interactive runner can > be directly run by other runners from notebook. > # Implicitly add the necessary source information through user labels when > the user does p.run(runner=DataflowRunner()). -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Work logged] (BEAM-8457) Instrument Dataflow jobs that are launched from Notebooks
[ https://issues.apache.org/jira/browse/BEAM-8457?focusedWorklogId=339084&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-339084 ] ASF GitHub Bot logged work on BEAM-8457: Author: ASF GitHub Bot Created on: 06/Nov/19 00:20 Start Date: 06/Nov/19 00:20 Worklog Time Spent: 10m Work Description: KevinGG commented on pull request #9885: [BEAM-8457] Label Dataflow jobs from Notebook URL: https://github.com/apache/beam/pull/9885#discussion_r342858336 ## File path: sdks/python/apache_beam/runners/dataflow/dataflow_runner.py ## @@ -360,6 +360,16 @@ def visit_transform(self, transform_node): def run_pipeline(self, pipeline, options): """Remotely executes entire pipeline or parts reachable from node.""" +# Label goog-dataflow-notebook if pipeline is initiated from interactive +# runner. +if pipeline.interactive: Review comment: I see your point! Yes, I have the [capability](https://github.com/apache/beam/blob/master/sdks/python/apache_beam/runners/interactive/interactive_environment.py#L131) to check if current interpreted code is in a notebook or not. This branch will need a rebase against master to take those changes. To clarify the process: When a DataflowRunner tries to run a job from a given pipeline, 1. Check if the module `interactive_environment` is imported by checking the `sys.modules` dictionary; 2. Check if `current_env().is_in_notebook`; 3. If yes, label the job. I think we have a bit of a trade-off here: 1. What we have here: Determining if the job is started from a pipeline that was originally bundled with an Interactive Runner. Doing it with string comparison, we don't introduce a new dependency into DataflowRunner. 2. Deduce if the job is started from a notebook environment. We'll introduce [interactive] dependencies including at least ipython into DataflowRunner. 
This will label Dataflow jobs from any pipeline originally bundled with arbitrary runner in any kind of ipython-notebook as long as `interactive_environment` module in `interactive` package has been (transitively) imported. This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org Issue Time Tracking --- Worklog Id: (was: 339084) Time Spent: 7.5h (was: 7h 20m) > Instrument Dataflow jobs that are launched from Notebooks > - > > Key: BEAM-8457 > URL: https://issues.apache.org/jira/browse/BEAM-8457 > Project: Beam > Issue Type: Improvement > Components: runner-py-interactive >Reporter: Ning Kang >Assignee: Ning Kang >Priority: Major > Fix For: 2.17.0 > > Time Spent: 7.5h > Remaining Estimate: 0h > > Dataflow needs the capability to tell how many Dataflow jobs are launched > from the Notebook environment, i.e., the Interactive Runner. > # Change the pipeline.run() API to allow supply a runner and an option > parameter so that a pipeline initially bundled w/ an interactive runner can > be directly run by other runners from notebook. > # Implicitly add the necessary source information through user labels when > the user does p.run(runner=DataflowRunner()). -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Work logged] (BEAM-8457) Instrument Dataflow jobs that are launched from Notebooks
[ https://issues.apache.org/jira/browse/BEAM-8457?focusedWorklogId=339083&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-339083 ] ASF GitHub Bot logged work on BEAM-8457: Author: ASF GitHub Bot Created on: 06/Nov/19 00:17 Start Date: 06/Nov/19 00:17 Worklog Time Spent: 10m Work Description: KevinGG commented on pull request #9885: [BEAM-8457] Label Dataflow jobs from Notebook URL: https://github.com/apache/beam/pull/9885#discussion_r342855200 ## File path: sdks/python/apache_beam/pipeline.py ## @@ -172,6 +172,10 @@ def __init__(self, runner=None, options=None, argv=None): # If a transform is applied and the full label is already in the set # then the transform will have to be cloned with a new label. self.applied_labels = set() +# A boolean value indicating whether the pipeline is created in an +# interactive environment such as interactive notebooks. Initialized as +# None. The value is set ad hoc when `pipeline.run()` is invoked. +self.interactive = None Review comment: Thanks! If we track `interactive` as a property of runner, we cannot implicitly pass along the property from runner to runner. And if we deduce `interactive` from the environment, we'll introduce new dependencies into DataflowRunner. See below comment. This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. 
For queries about this service, please contact Infrastructure at: us...@infra.apache.org Issue Time Tracking --- Worklog Id: (was: 339083) Time Spent: 7h 20m (was: 7h 10m) > Instrument Dataflow jobs that are launched from Notebooks > - > > Key: BEAM-8457 > URL: https://issues.apache.org/jira/browse/BEAM-8457 > Project: Beam > Issue Type: Improvement > Components: runner-py-interactive >Reporter: Ning Kang >Assignee: Ning Kang >Priority: Major > Fix For: 2.17.0 > > Time Spent: 7h 20m > Remaining Estimate: 0h > > Dataflow needs the capability to tell how many Dataflow jobs are launched > from the Notebook environment, i.e., the Interactive Runner. > # Change the pipeline.run() API to allow supply a runner and an option > parameter so that a pipeline initially bundled w/ an interactive runner can > be directly run by other runners from notebook. > # Implicitly add the necessary source information through user labels when > the user does p.run(runner=DataflowRunner()). -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Work logged] (BEAM-8457) Instrument Dataflow jobs that are launched from Notebooks
[ https://issues.apache.org/jira/browse/BEAM-8457?focusedWorklogId=339082&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-339082 ] ASF GitHub Bot logged work on BEAM-8457: Author: ASF GitHub Bot Created on: 06/Nov/19 00:16 Start Date: 06/Nov/19 00:16 Worklog Time Spent: 10m Work Description: KevinGG commented on pull request #9885: [BEAM-8457] Label Dataflow jobs from Notebook URL: https://github.com/apache/beam/pull/9885#discussion_r342858336 ## File path: sdks/python/apache_beam/runners/dataflow/dataflow_runner.py ## @@ -360,6 +360,16 @@ def visit_transform(self, transform_node): def run_pipeline(self, pipeline, options): """Remotely executes entire pipeline or parts reachable from node.""" +# Label goog-dataflow-notebook if pipeline is initiated from interactive +# runner. +if pipeline.interactive: Review comment: I see your point! Yes, I have the [capability](https://github.com/apache/beam/blob/master/sdks/python/apache_beam/runners/interactive/interactive_environment.py#L131) to check if current interpreted code is in a notebook or not. This branch will need a rebase against master to take those changes. To clarify the process: When a DataflowRunner tries to run a job from a given pipeline, 1. Check if the module `interactive_environment` is imported by checking the `sys.modules` dictionary; 2. Check if `current_env().is_in_notebook`; 3. If yes, label the job. I think we have a bit of a trade-off here: 1. What we have here: Determining if the job is started from a pipeline that was originally bundled with an Interactive Runner. Doing it with string comparison, we don't introduce a new dependency into DataflowRunner. 2. Deduce if the job is started from a notebook environment. We'll introduce [interactive] dependencies including at least ipython into DataflowRunner. 
This will label Dataflow jobs from any pipeline originally bundled with arbitrary runner in any kind of ipython-notebook as long as `interactive_environment` module in `interactive` package has been (transitively) imported. This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org Issue Time Tracking --- Worklog Id: (was: 339082) Time Spent: 7h 10m (was: 7h) > Instrument Dataflow jobs that are launched from Notebooks > - > > Key: BEAM-8457 > URL: https://issues.apache.org/jira/browse/BEAM-8457 > Project: Beam > Issue Type: Improvement > Components: runner-py-interactive >Reporter: Ning Kang >Assignee: Ning Kang >Priority: Major > Fix For: 2.17.0 > > Time Spent: 7h 10m > Remaining Estimate: 0h > > Dataflow needs the capability to tell how many Dataflow jobs are launched > from the Notebook environment, i.e., the Interactive Runner. > # Change the pipeline.run() API to allow supply a runner and an option > parameter so that a pipeline initially bundled w/ an interactive runner can > be directly run by other runners from notebook. > # Implicitly add the necessary source information through user labels when > the user does p.run(runner=DataflowRunner()). -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Work logged] (BEAM-8457) Instrument Dataflow jobs that are launched from Notebooks
[ https://issues.apache.org/jira/browse/BEAM-8457?focusedWorklogId=339081&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-339081 ] ASF GitHub Bot logged work on BEAM-8457: Author: ASF GitHub Bot Created on: 06/Nov/19 00:14 Start Date: 06/Nov/19 00:14 Worklog Time Spent: 10m Work Description: KevinGG commented on pull request #9885: [BEAM-8457] Label Dataflow jobs from Notebook URL: https://github.com/apache/beam/pull/9885#discussion_r342858336 ## File path: sdks/python/apache_beam/runners/dataflow/dataflow_runner.py ## @@ -360,6 +360,16 @@ def visit_transform(self, transform_node): def run_pipeline(self, pipeline, options): """Remotely executes entire pipeline or parts reachable from node.""" +# Label goog-dataflow-notebook if pipeline is initiated from interactive +# runner. +if pipeline.interactive: Review comment: I see your point! Yes, I have the [capability](https://github.com/apache/beam/blob/master/sdks/python/apache_beam/runners/interactive/interactive_environment.py#L131) to check if current interpreted code is in a notebook or not. This branch will need a rebase against master to take those changes. To clarify the process: When a DataflowRunner tries to run a job from a given pipeline, 1. Check if the module `interactive_environment` is imported by checking the `sys.modules` dictionary; 2. Check if `current_env().is_in_notebook`; 3. If yes, label the job. I think we have a bit of a trade-off here: 1. What we have here: Determining if the job is started from a pipeline that was originally bundled with an Interactive Runner. Doing it with string comparison 2. Deduce if the job is started from a notebook environment. We'll introduce [interactive] dependencies including ipython into DataflowRunner. This will label Dataflow jobs from any pipeline originally bundled with arbitrary runner in any kind of ipython-notebook as long as `interactive_environment` module in `interactive` package has been (transitively) imported. 
This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org Issue Time Tracking --- Worklog Id: (was: 339081) Time Spent: 7h (was: 6h 50m) > Instrument Dataflow jobs that are launched from Notebooks > - > > Key: BEAM-8457 > URL: https://issues.apache.org/jira/browse/BEAM-8457 > Project: Beam > Issue Type: Improvement > Components: runner-py-interactive >Reporter: Ning Kang >Assignee: Ning Kang >Priority: Major > Fix For: 2.17.0 > > Time Spent: 7h > Remaining Estimate: 0h > > Dataflow needs the capability to tell how many Dataflow jobs are launched > from the Notebook environment, i.e., the Interactive Runner. > # Change the pipeline.run() API to allow supply a runner and an option > parameter so that a pipeline initially bundled w/ an interactive runner can > be directly run by other runners from notebook. > # Implicitly add the necessary source information through user labels when > the user does p.run(runner=DataflowRunner()). -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Work logged] (BEAM-8457) Instrument Dataflow jobs that are launched from Notebooks
[ https://issues.apache.org/jira/browse/BEAM-8457?focusedWorklogId=339080&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-339080 ] ASF GitHub Bot logged work on BEAM-8457: Author: ASF GitHub Bot Created on: 06/Nov/19 00:12 Start Date: 06/Nov/19 00:12 Worklog Time Spent: 10m Work Description: KevinGG commented on pull request #9885: [BEAM-8457] Label Dataflow jobs from Notebook URL: https://github.com/apache/beam/pull/9885#discussion_r342855200 ## File path: sdks/python/apache_beam/pipeline.py ## @@ -172,6 +172,10 @@ def __init__(self, runner=None, options=None, argv=None): # If a transform is applied and the full label is already in the set # then the transform will have to be cloned with a new label. self.applied_labels = set() +# A boolean value indicating whether the pipeline is created in an +# interactive environment such as interactive notebooks. Initialized as +# None. The value is set ad hoc when `pipeline.run()` is invoked. +self.interactive = None Review comment: Thanks! If we track `interactive` as a property of runner, we cannot implicitly pass along the property from runner to runner. This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org Issue Time Tracking --- Worklog Id: (was: 339080) Time Spent: 6h 50m (was: 6h 40m) > Instrument Dataflow jobs that are launched from Notebooks > - > > Key: BEAM-8457 > URL: https://issues.apache.org/jira/browse/BEAM-8457 > Project: Beam > Issue Type: Improvement > Components: runner-py-interactive >Reporter: Ning Kang >Assignee: Ning Kang >Priority: Major > Fix For: 2.17.0 > > Time Spent: 6h 50m > Remaining Estimate: 0h > > Dataflow needs the capability to tell how many Dataflow jobs are launched > from the Notebook environment, i.e., the Interactive Runner. 
> # Change the pipeline.run() API to allow supply a runner and an option > parameter so that a pipeline initially bundled w/ an interactive runner can > be directly run by other runners from notebook. > # Implicitly add the necessary source information through user labels when > the user does p.run(runner=DataflowRunner()). -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Work logged] (BEAM-8457) Instrument Dataflow jobs that are launched from Notebooks
[ https://issues.apache.org/jira/browse/BEAM-8457?focusedWorklogId=339064&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-339064 ] ASF GitHub Bot logged work on BEAM-8457: Author: ASF GitHub Bot Created on: 06/Nov/19 00:01 Start Date: 06/Nov/19 00:01 Worklog Time Spent: 10m Work Description: KevinGG commented on pull request #9885: [BEAM-8457] Label Dataflow jobs from Notebook URL: https://github.com/apache/beam/pull/9885#discussion_r342858336 ## File path: sdks/python/apache_beam/runners/dataflow/dataflow_runner.py ## @@ -360,6 +360,16 @@ def visit_transform(self, transform_node): def run_pipeline(self, pipeline, options): """Remotely executes entire pipeline or parts reachable from node.""" +# Label goog-dataflow-notebook if pipeline is initiated from interactive +# runner. +if pipeline.interactive: Review comment: I see your point! Yes, I have the [capability](https://github.com/apache/beam/blob/master/sdks/python/apache_beam/runners/interactive/interactive_environment.py#L131) to check if current interpreted code is in a notebook or not. This branch will need a rebase against master to take those changes. To clarify the process: When a DataflowRunner tries to run a job from a given pipeline, 1. Check if the module `interactive_environment` is imported by checking the `sys.modules` dictionary; 2. Check if `current_env().is_in_notebook`; 3. If yes, label the job. This will label Dataflow jobs from any pipeline originally bundled with arbitrary runner in any kind of ipython-notebook as long as `interactive_environment` module in `interactive` package has been (transitively) imported. This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. 
For queries about this service, please contact Infrastructure at: us...@infra.apache.org Issue Time Tracking --- Worklog Id: (was: 339064) Time Spent: 6h 40m (was: 6.5h) > Instrument Dataflow jobs that are launched from Notebooks > - > > Key: BEAM-8457 > URL: https://issues.apache.org/jira/browse/BEAM-8457 > Project: Beam > Issue Type: Improvement > Components: runner-py-interactive >Reporter: Ning Kang >Assignee: Ning Kang >Priority: Major > Fix For: 2.17.0 > > Time Spent: 6h 40m > Remaining Estimate: 0h > > Dataflow needs the capability to tell how many Dataflow jobs are launched > from the Notebook environment, i.e., the Interactive Runner. > # Change the pipeline.run() API to allow supply a runner and an option > parameter so that a pipeline initially bundled w/ an interactive runner can > be directly run by other runners from notebook. > # Implicitly add the necessary source information through user labels when > the user does p.run(runner=DataflowRunner()). -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Work logged] (BEAM-8457) Instrument Dataflow jobs that are launched from Notebooks
[ https://issues.apache.org/jira/browse/BEAM-8457?focusedWorklogId=339061&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-339061 ] ASF GitHub Bot logged work on BEAM-8457: Author: ASF GitHub Bot Created on: 05/Nov/19 23:57 Start Date: 05/Nov/19 23:57 Worklog Time Spent: 10m Work Description: KevinGG commented on pull request #9885: [BEAM-8457] Label Dataflow jobs from Notebook URL: https://github.com/apache/beam/pull/9885#discussion_r342858336 ## File path: sdks/python/apache_beam/runners/dataflow/dataflow_runner.py ## @@ -360,6 +360,16 @@ def visit_transform(self, transform_node): def run_pipeline(self, pipeline, options): """Remotely executes entire pipeline or parts reachable from node.""" +# Label goog-dataflow-notebook if pipeline is initiated from interactive +# runner. +if pipeline.interactive: Review comment: I see your point! Yes, I have the [capability](https://github.com/apache/beam/blob/master/sdks/python/apache_beam/runners/interactive/interactive_environment.py#L131) to check if current interpreted code is in a notebook or not. This branch will need a rebase against master to take those changes. To clarify the process: When a DataflowRunner tries to run a job from a given pipeline, 1. Check if the module `interactive_environment` is imported by checking the `sys.modules` dictionary; 2. Check if `current_env().is_in_notebook`; 3. If yes, label the job. This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. 
For queries about this service, please contact Infrastructure at: us...@infra.apache.org Issue Time Tracking --- Worklog Id: (was: 339061) Time Spent: 6.5h (was: 6h 20m) > Instrument Dataflow jobs that are launched from Notebooks > - > > Key: BEAM-8457 > URL: https://issues.apache.org/jira/browse/BEAM-8457 > Project: Beam > Issue Type: Improvement > Components: runner-py-interactive >Reporter: Ning Kang >Assignee: Ning Kang >Priority: Major > Fix For: 2.17.0 > > Time Spent: 6.5h > Remaining Estimate: 0h > > Dataflow needs the capability to tell how many Dataflow jobs are launched > from the Notebook environment, i.e., the Interactive Runner. > # Change the pipeline.run() API to allow supply a runner and an option > parameter so that a pipeline initially bundled w/ an interactive runner can > be directly run by other runners from notebook. > # Implicitly add the necessary source information through user labels when > the user does p.run(runner=DataflowRunner()). -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Work logged] (BEAM-8457) Instrument Dataflow jobs that are launched from Notebooks
[ https://issues.apache.org/jira/browse/BEAM-8457?focusedWorklogId=339054&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-339054 ] ASF GitHub Bot logged work on BEAM-8457: Author: ASF GitHub Bot Created on: 05/Nov/19 23:48 Start Date: 05/Nov/19 23:48 Worklog Time Spent: 10m Work Description: KevinGG commented on pull request #9885: [BEAM-8457] Label Dataflow jobs from Notebook URL: https://github.com/apache/beam/pull/9885#discussion_r342855715 ## File path: sdks/python/apache_beam/pipeline.py ## @@ -396,28 +400,57 @@ def replace_all(self, replacements): for override in replacements: self._check_replacement(override) - def run(self, test_runner_api=True): -"""Runs the pipeline. Returns whatever our runner returns after running.""" - + def run(self, test_runner_api=True, runner=None, options=None, Review comment: IIRC, we want to allow the user to switch to `DataflowRunner` using the `p.run()` pattern instead of limiting the user to `Runner().run_pipeline(p, options)`. Do you think we should put it into a separate PR, or simply not supporting it at all? This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org Issue Time Tracking --- Worklog Id: (was: 339054) Time Spent: 6h 20m (was: 6h 10m) > Instrument Dataflow jobs that are launched from Notebooks > - > > Key: BEAM-8457 > URL: https://issues.apache.org/jira/browse/BEAM-8457 > Project: Beam > Issue Type: Improvement > Components: runner-py-interactive >Reporter: Ning Kang >Assignee: Ning Kang >Priority: Major > Fix For: 2.17.0 > > Time Spent: 6h 20m > Remaining Estimate: 0h > > Dataflow needs the capability to tell how many Dataflow jobs are launched > from the Notebook environment, i.e., the Interactive Runner. 
> # Change the pipeline.run() API to allow supply a runner and an option > parameter so that a pipeline initially bundled w/ an interactive runner can > be directly run by other runners from notebook. > # Implicitly add the necessary source information through user labels when > the user does p.run(runner=DataflowRunner()). -- This message was sent by Atlassian Jira (v8.3.4#803005)
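The `p.run()` pattern discussed in this comment, where a pipeline constructed with one runner is handed to a different runner at run time, can be illustrated with toy classes. Everything below is hypothetical stand-in code: `ToyPipeline`, `ToyRunner`, and the fallback logic are assumptions for illustration and are not Beam's `Pipeline` implementation. Only the idea of optional `runner` and `options` keyword parameters on `run()` mirrors the signature shown in the diff.

```python
class ToyRunner:
  """Stand-in for a runner; reports what it was asked to run."""

  def __init__(self, name):
    self.name = name

  def run_pipeline(self, pipeline, options):
    return '%s ran pipeline with options=%r' % (self.name, options)


class ToyPipeline:
  """Stand-in for a pipeline that remembers its construction-time runner."""

  def __init__(self, runner, options=None):
    self.runner = runner
    self.options = options

  def run(self, runner=None, options=None):
    # Fall back to the runner/options the pipeline was constructed with
    # when the caller does not override them.
    runner = self.runner if runner is None else runner
    options = self.options if options is None else options
    return runner.run_pipeline(self, options)


# Built with one runner, executed by another without rebuilding the pipeline.
p = ToyPipeline(ToyRunner('interactive'))
result = p.run(runner=ToyRunner('dataflow'), options={'project': 'demo'})
```

The alternative mentioned in the comment, `Runner().run_pipeline(p, options)`, reaches the same `run_pipeline` call directly; the override on `run()` only saves the user from switching call styles in a notebook.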
[jira] [Work logged] (BEAM-8457) Instrument Dataflow jobs that are launched from Notebooks
[ https://issues.apache.org/jira/browse/BEAM-8457?focusedWorklogId=339053&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-339053 ] ASF GitHub Bot logged work on BEAM-8457: Author: ASF GitHub Bot Created on: 05/Nov/19 23:46 Start Date: 05/Nov/19 23:46 Worklog Time Spent: 10m Work Description: KevinGG commented on pull request #9885: [BEAM-8457] Label Dataflow jobs from Notebook URL: https://github.com/apache/beam/pull/9885#discussion_r342855200 ## File path: sdks/python/apache_beam/pipeline.py ## @@ -172,6 +172,10 @@ def __init__(self, runner=None, options=None, argv=None): # If a transform is applied and the full label is already in the set # then the transform will have to be cloned with a new label. self.applied_labels = set() +# A boolean value indicating whether the pipeline is created in an +# interactive environment such as interactive notebooks. Initialized as +# None. The value is set ad hoc when `pipeline.run()` is invoked. +self.interactive = None Review comment: Thanks! I'll go with your suggestion below to deduce if the job is started from a notebook environment instead of determining if the job is started from a pipeline that was originally bundled with an Interactive Runner. This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. 
For queries about this service, please contact Infrastructure at: us...@infra.apache.org Issue Time Tracking --- Worklog Id: (was: 339053) Time Spent: 6h 10m (was: 6h) > Instrument Dataflow jobs that are launched from Notebooks > - > > Key: BEAM-8457 > URL: https://issues.apache.org/jira/browse/BEAM-8457 > Project: Beam > Issue Type: Improvement > Components: runner-py-interactive >Reporter: Ning Kang >Assignee: Ning Kang >Priority: Major > Fix For: 2.17.0 > > Time Spent: 6h 10m > Remaining Estimate: 0h > > Dataflow needs the capability to tell how many Dataflow jobs are launched > from the Notebook environment, i.e., the Interactive Runner. > # Change the pipeline.run() API to allow supply a runner and an option > parameter so that a pipeline initially bundled w/ an interactive runner can > be directly run by other runners from notebook. > # Implicitly add the necessary source information through user labels when > the user does p.run(runner=DataflowRunner()). -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Work logged] (BEAM-8457) Instrument Dataflow jobs that are launched from Notebooks
[ https://issues.apache.org/jira/browse/BEAM-8457?focusedWorklogId=339031&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-339031 ] ASF GitHub Bot logged work on BEAM-8457: Author: ASF GitHub Bot Created on: 05/Nov/19 22:32 Start Date: 05/Nov/19 22:32 Worklog Time Spent: 10m Work Description: aaltay commented on pull request #9885: [BEAM-8457] Label Dataflow jobs from Notebook URL: https://github.com/apache/beam/pull/9885#discussion_r342831745 ## File path: sdks/python/apache_beam/pipeline.py ## @@ -396,28 +400,57 @@ def replace_all(self, replacements): for override in replacements: self._check_replacement(override) - def run(self, test_runner_api=True): -"""Runs the pipeline. Returns whatever our runner returns after running.""" - + def run(self, test_runner_api=True, runner=None, options=None, Review comment: Why are we adding runner and options parameters here? This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org Issue Time Tracking --- Worklog Id: (was: 339031) Time Spent: 5h 50m (was: 5h 40m) > Instrument Dataflow jobs that are launched from Notebooks > - > > Key: BEAM-8457 > URL: https://issues.apache.org/jira/browse/BEAM-8457 > Project: Beam > Issue Type: Improvement > Components: runner-py-interactive >Reporter: Ning Kang >Assignee: Ning Kang >Priority: Major > Fix For: 2.17.0 > > Time Spent: 5h 50m > Remaining Estimate: 0h > > Dataflow needs the capability to tell how many Dataflow jobs are launched > from the Notebook environment, i.e., the Interactive Runner. > # Change the pipeline.run() API to allow supply a runner and an option > parameter so that a pipeline initially bundled w/ an interactive runner can > be directly run by other runners from notebook. 
> # Implicitly add the necessary source information through user labels when > the user does p.run(runner=DataflowRunner()). -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Work logged] (BEAM-8457) Instrument Dataflow jobs that are launched from Notebooks
[ https://issues.apache.org/jira/browse/BEAM-8457?focusedWorklogId=339030&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-339030 ] ASF GitHub Bot logged work on BEAM-8457: Author: ASF GitHub Bot Created on: 05/Nov/19 22:32 Start Date: 05/Nov/19 22:32 Worklog Time Spent: 10m Work Description: aaltay commented on pull request #9885: [BEAM-8457] Label Dataflow jobs from Notebook URL: https://github.com/apache/beam/pull/9885#discussion_r342830295 ## File path: sdks/python/apache_beam/pipeline.py ## @@ -172,6 +172,10 @@ def __init__(self, runner=None, options=None, argv=None): # If a transform is applied and the full label is already in the set # then the transform will have to be cloned with a new label. self.applied_labels = set() +# A boolean value indicating whether the pipeline is created in an +# interactive environment such as interactive notebooks. Initialized as +# None. The value is set ad hoc when `pipeline.run()` is invoked. +self.interactive = None Review comment: I believe one more suggestion from the previous PR was to not keep track of interactivity as part of the Pipeline, but to make it a property of the runner. This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org Issue Time Tracking --- Worklog Id: (was: 339030) Time Spent: 5h 40m (was: 5.5h) > Instrument Dataflow jobs that are launched from Notebooks > - > > Key: BEAM-8457 > URL: https://issues.apache.org/jira/browse/BEAM-8457 > Project: Beam > Issue Type: Improvement > Components: runner-py-interactive >Reporter: Ning Kang >Assignee: Ning Kang >Priority: Major > Fix For: 2.17.0 > > Time Spent: 5h 40m > Remaining Estimate: 0h > > Dataflow needs the capability to tell how many Dataflow jobs are launched > from the Notebook environment, i.e., the Interactive Runner. 
> # Change the pipeline.run() API to allow supply a runner and an option > parameter so that a pipeline initially bundled w/ an interactive runner can > be directly run by other runners from notebook. > # Implicitly add the necessary source information through user labels when > the user does p.run(runner=DataflowRunner()). -- This message was sent by Atlassian Jira (v8.3.4#803005)
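The two steps in the issue description above can be sketched roughly as follows. This is a minimal illustration and not Beam's implementation: `SketchPipeline` and `EchoRunner` are hypothetical stand-ins, and the rule that `runner` and `options` must be supplied together is an assumption about the intended behavior. Only the `runner.run_pipeline(pipeline, options)` call shape is taken from the discussion.

```python
class EchoRunner:
    """Hypothetical runner that just reports who ran the pipeline."""

    def __init__(self, name):
        self.name = name

    def run_pipeline(self, pipeline, options):
        # A real runner would execute the pipeline; this one echoes its inputs.
        return (self.name, options)


class SketchPipeline:
    """Hypothetical pipeline bundled with a default runner and options."""

    def __init__(self, runner, options):
        self.runner = runner
        self.options = options

    def run(self, runner=None, options=None):
        # Assumption: an override is only valid when both values are given.
        if (runner is None) != (options is None):
            raise ValueError('runner and options must be supplied together')
        if runner is not None:
            # Delegate to the supplied runner instead of the bundled one.
            return runner.run_pipeline(self, options)
        return self.runner.run_pipeline(self, self.options)


p = SketchPipeline(EchoRunner('interactive'), {'streaming': False})
print(p.run())
print(p.run(runner=EchoRunner('dataflow'), options={'project': 'demo'}))
```

With this shape, a pipeline created for one runner in a notebook can be handed to another runner without being rebuilt, which is the use case the description names.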
[jira] [Work logged] (BEAM-8457) Instrument Dataflow jobs that are launched from Notebooks
[ https://issues.apache.org/jira/browse/BEAM-8457?focusedWorklogId=339032&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-339032 ] ASF GitHub Bot logged work on BEAM-8457: Author: ASF GitHub Bot Created on: 05/Nov/19 22:32 Start Date: 05/Nov/19 22:32 Worklog Time Spent: 10m Work Description: aaltay commented on pull request #9885: [BEAM-8457] Label Dataflow jobs from Notebook URL: https://github.com/apache/beam/pull/9885#discussion_r342832459 ## File path: sdks/python/apache_beam/runners/dataflow/dataflow_runner.py ## @@ -360,6 +360,16 @@ def visit_transform(self, transform_node): def run_pipeline(self, pipeline, options): """Remotely executes entire pipeline or parts reachable from node.""" +# Label goog-dataflow-notebook if pipeline is initiated from interactive +# runner. +if pipeline.interactive: Review comment: The change could be limited to: - Here, detect whether we are in an interactive environment or not. (For example, check whether certain imports are loaded?) This could also be a utility method somewhere that checks whether this is an interactive environment. - If yes, add the labels. This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org Issue Time Tracking --- Worklog Id: (was: 339032) Time Spent: 6h (was: 5h 50m) > Instrument Dataflow jobs that are launched from Notebooks > - > > Key: BEAM-8457 > URL: https://issues.apache.org/jira/browse/BEAM-8457 > Project: Beam > Issue Type: Improvement > Components: runner-py-interactive >Reporter: Ning Kang >Assignee: Ning Kang >Priority: Major > Fix For: 2.17.0 > > Time Spent: 6h > Remaining Estimate: 0h > > Dataflow needs the capability to tell how many Dataflow jobs are launched > from the Notebook environment, i.e., the Interactive Runner. 
> # Change the pipeline.run() API to allow supply a runner and an option > parameter so that a pipeline initially bundled w/ an interactive runner can > be directly run by other runners from notebook. > # Implicitly add the necessary source information through user labels when > the user does p.run(runner=DataflowRunner()). -- This message was sent by Atlassian Jira (v8.3.4#803005)
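The review suggestion in the message above (detect the interactive environment at the point of use, for example by checking whether certain imports are loaded, then add the label) might look something like the sketch below. The helper names `is_in_ipython` and `add_notebook_label` are hypothetical and not part of Beam, and treating labels as a plain list of strings is an assumption made for illustration. The label text `goog-dataflow-notebook` comes from the diff comment. `IPython.get_ipython()` is a real IPython function that returns `None` when no shell is active.

```python
import sys
from types import SimpleNamespace


def is_in_ipython(modules=None):
    """Return True if IPython is already imported and a shell is active.

    Hypothetical helper. It never imports IPython itself; it only inspects
    modules that are already loaded, so it adds no import-time dependency.
    """
    modules = sys.modules if modules is None else modules
    ipython = modules.get('IPython')
    if ipython is None:
        return False
    get_ipython = getattr(ipython, 'get_ipython', None)
    return callable(get_ipython) and get_ipython() is not None


def add_notebook_label(labels, modules=None):
    """Return a copy of `labels`, with the notebook marker added if interactive."""
    labels = list(labels or [])
    if is_in_ipython(modules) and 'goog-dataflow-notebook' not in labels:
        labels.append('goog-dataflow-notebook')
    return labels


# Usage with a stand-in for the IPython module (no real notebook needed).
fake_modules = {'IPython': SimpleNamespace(get_ipython=lambda: object())}
print(add_notebook_label(['team=data'], fake_modules))
print(add_notebook_label(['team=data'], {}))
```

Because the check only reads already-loaded modules, the runner does not need to import anything from the interactive package, which is the import problem the earlier rollback was about. Note that this only detects an active IPython shell; telling a notebook frontend apart from a terminal session would need an additional check.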
[jira] [Work logged] (BEAM-8457) Instrument Dataflow jobs that are launched from Notebooks
[ https://issues.apache.org/jira/browse/BEAM-8457?focusedWorklogId=337933&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-337933 ] ASF GitHub Bot logged work on BEAM-8457: Author: ASF GitHub Bot Created on: 04/Nov/19 03:40 Start Date: 04/Nov/19 03:40 Worklog Time Spent: 10m Work Description: pabloem commented on issue #9885: [BEAM-8457] Label Dataflow jobs from Notebook URL: https://github.com/apache/beam/pull/9885#issuecomment-549216682 This is pretty much the same PR as before, except it checks the runner via string matching. It LGTM. This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org Issue Time Tracking --- Worklog Id: (was: 337933) Time Spent: 5.5h (was: 5h 20m) > Instrument Dataflow jobs that are launched from Notebooks > - > > Key: BEAM-8457 > URL: https://issues.apache.org/jira/browse/BEAM-8457 > Project: Beam > Issue Type: Improvement > Components: runner-py-interactive >Reporter: Ning Kang >Assignee: Ning Kang >Priority: Major > Fix For: 2.17.0 > > Time Spent: 5.5h > Remaining Estimate: 0h > > Dataflow needs the capability to tell how many Dataflow jobs are launched > from the Notebook environment, i.e., the Interactive Runner. > # Change the pipeline.run() API to allow supply a runner and an option > parameter so that a pipeline initially bundled w/ an interactive runner can > be directly run by other runners from notebook. > # Implicitly add the necessary source information through user labels when > the user does p.run(runner=DataflowRunner()). -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Work logged] (BEAM-8457) Instrument Dataflow jobs that are launched from Notebooks
[ https://issues.apache.org/jira/browse/BEAM-8457?focusedWorklogId=337642&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-337642 ] ASF GitHub Bot logged work on BEAM-8457: Author: ASF GitHub Bot Created on: 02/Nov/19 01:04 Start Date: 02/Nov/19 01:04 Worklog Time Spent: 10m Work Description: aaltay commented on issue #9885: [BEAM-8457] Label Dataflow jobs from Notebook URL: https://github.com/apache/beam/pull/9885#issuecomment-548995354 R: @pabloem could you make a first review pass? This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org Issue Time Tracking --- Worklog Id: (was: 337642) Time Spent: 5h 20m (was: 5h 10m) > Instrument Dataflow jobs that are launched from Notebooks > - > > Key: BEAM-8457 > URL: https://issues.apache.org/jira/browse/BEAM-8457 > Project: Beam > Issue Type: Improvement > Components: runner-py-interactive >Reporter: Ning Kang >Assignee: Ning Kang >Priority: Major > Fix For: 2.17.0 > > Time Spent: 5h 20m > Remaining Estimate: 0h > > Dataflow needs the capability to tell how many Dataflow jobs are launched > from the Notebook environment, i.e., the Interactive Runner. > # Change the pipeline.run() API to allow supply a runner and an option > parameter so that a pipeline initially bundled w/ an interactive runner can > be directly run by other runners from notebook. > # Implicitly add the necessary source information through user labels when > the user does p.run(runner=DataflowRunner()). -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Work logged] (BEAM-8457) Instrument Dataflow jobs that are launched from Notebooks
[ https://issues.apache.org/jira/browse/BEAM-8457?focusedWorklogId=336366&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-336366 ] ASF GitHub Bot logged work on BEAM-8457: Author: ASF GitHub Bot Created on: 30/Oct/19 18:32 Start Date: 30/Oct/19 18:32 Worklog Time Spent: 10m Work Description: KevinGG commented on issue #9885: [BEAM-8457] Label Dataflow jobs from Notebook URL: https://github.com/apache/beam/pull/9885#issuecomment-548053063 R: @aaltay R: @pabloem Sorry for the previous rollback. I've removed the dependency of interactive_runner from pipeline, so there shouldn't be import issues now. Thanks! This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org Issue Time Tracking --- Worklog Id: (was: 336366) Time Spent: 5h 10m (was: 5h) > Instrument Dataflow jobs that are launched from Notebooks > - > > Key: BEAM-8457 > URL: https://issues.apache.org/jira/browse/BEAM-8457 > Project: Beam > Issue Type: Improvement > Components: runner-py-interactive >Reporter: Ning Kang >Assignee: Ning Kang >Priority: Major > Fix For: 2.17.0 > > Time Spent: 5h 10m > Remaining Estimate: 0h > > Dataflow needs the capability to tell how many Dataflow jobs are launched > from the Notebook environment, i.e., the Interactive Runner. > # Change the pipeline.run() API to allow supply a runner and an option > parameter so that a pipeline initially bundled w/ an interactive runner can > be directly run by other runners from notebook. > # Implicitly add the necessary source information through user labels when > the user does p.run(runner=DataflowRunner()). -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Work logged] (BEAM-8457) Instrument Dataflow jobs that are launched from Notebooks
[ https://issues.apache.org/jira/browse/BEAM-8457?focusedWorklogId=335801&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-335801 ] ASF GitHub Bot logged work on BEAM-8457: Author: ASF GitHub Bot Created on: 29/Oct/19 22:40 Start Date: 29/Oct/19 22:40 Worklog Time Spent: 10m Work Description: KevinGG commented on issue #9885: [BEAM-8457] Label Dataflow jobs from Notebook URL: https://github.com/apache/beam/pull/9885#issuecomment-547660969 Run Python PreCommit This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org Issue Time Tracking --- Worklog Id: (was: 335801) Time Spent: 5h (was: 4h 50m) > Instrument Dataflow jobs that are launched from Notebooks > - > > Key: BEAM-8457 > URL: https://issues.apache.org/jira/browse/BEAM-8457 > Project: Beam > Issue Type: Improvement > Components: runner-py-interactive >Reporter: Ning Kang >Assignee: Ning Kang >Priority: Major > Fix For: 2.17.0 > > Time Spent: 5h > Remaining Estimate: 0h > > Dataflow needs the capability to tell how many Dataflow jobs are launched > from the Notebook environment, i.e., the Interactive Runner. > # Change the pipeline.run() API to allow supply a runner and an option > parameter so that a pipeline initially bundled w/ an interactive runner can > be directly run by other runners from notebook. > # Implicitly add the necessary source information through user labels when > the user does p.run(runner=DataflowRunner()). -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Work logged] (BEAM-8457) Instrument Dataflow jobs that are launched from Notebooks
[ https://issues.apache.org/jira/browse/BEAM-8457?focusedWorklogId=335684&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-335684 ] ASF GitHub Bot logged work on BEAM-8457: Author: ASF GitHub Bot Created on: 29/Oct/19 18:41 Start Date: 29/Oct/19 18:41 Worklog Time Spent: 10m Work Description: KevinGG commented on issue #9885: [BEAM-8457] Label Dataflow jobs from Notebook URL: https://github.com/apache/beam/pull/9885#issuecomment-547571605 Run Python PreCommit This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org Issue Time Tracking --- Worklog Id: (was: 335684) Time Spent: 4h 50m (was: 4h 40m) > Instrument Dataflow jobs that are launched from Notebooks > - > > Key: BEAM-8457 > URL: https://issues.apache.org/jira/browse/BEAM-8457 > Project: Beam > Issue Type: Improvement > Components: runner-py-interactive >Reporter: Ning Kang >Assignee: Ning Kang >Priority: Major > Fix For: 2.17.0 > > Time Spent: 4h 50m > Remaining Estimate: 0h > > Dataflow needs the capability to tell how many Dataflow jobs are launched > from the Notebook environment, i.e., the Interactive Runner. > # Change the pipeline.run() API to allow supply a runner and an option > parameter so that a pipeline initially bundled w/ an interactive runner can > be directly run by other runners from notebook. > # Implicitly add the necessary source information through user labels when > the user does p.run(runner=DataflowRunner()). -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Work logged] (BEAM-8457) Instrument Dataflow jobs that are launched from Notebooks
[ https://issues.apache.org/jira/browse/BEAM-8457?focusedWorklogId=335222&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-335222 ] ASF GitHub Bot logged work on BEAM-8457: Author: ASF GitHub Bot Created on: 28/Oct/19 22:07 Start Date: 28/Oct/19 22:07 Worklog Time Spent: 10m Work Description: Ardagan commented on issue #9887: [release-2.17.0] Revert "Merge pull request #9854 from [BEAM-8457] Label Dataflow jobs… URL: https://github.com/apache/beam/pull/9887#issuecomment-547166274 Precommits fail due to missing docker container. I'm working on it now. This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org Issue Time Tracking --- Worklog Id: (was: 335222) Time Spent: 4h 40m (was: 4.5h) > Instrument Dataflow jobs that are launched from Notebooks > - > > Key: BEAM-8457 > URL: https://issues.apache.org/jira/browse/BEAM-8457 > Project: Beam > Issue Type: Improvement > Components: runner-py-interactive >Reporter: Ning Kang >Assignee: Ning Kang >Priority: Major > Fix For: 2.17.0 > > Time Spent: 4h 40m > Remaining Estimate: 0h > > Dataflow needs the capability to tell how many Dataflow jobs are launched > from the Notebook environment, i.e., the Interactive Runner. > # Change the pipeline.run() API to allow supply a runner and an option > parameter so that a pipeline initially bundled w/ an interactive runner can > be directly run by other runners from notebook. > # Implicitly add the necessary source information through user labels when > the user does p.run(runner=DataflowRunner()). -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Work logged] (BEAM-8457) Instrument Dataflow jobs that are launched from Notebooks
[ https://issues.apache.org/jira/browse/BEAM-8457?focusedWorklogId=335164&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-335164 ] ASF GitHub Bot logged work on BEAM-8457: Author: ASF GitHub Bot Created on: 28/Oct/19 20:36 Start Date: 28/Oct/19 20:36 Worklog Time Spent: 10m Work Description: KevinGG commented on issue #9885: [BEAM-8457] Label Dataflow jobs from Notebook URL: https://github.com/apache/beam/pull/9885#issuecomment-547134650 Run Python PreCommit This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org Issue Time Tracking --- Worklog Id: (was: 335164) Time Spent: 4.5h (was: 4h 20m) > Instrument Dataflow jobs that are launched from Notebooks > - > > Key: BEAM-8457 > URL: https://issues.apache.org/jira/browse/BEAM-8457 > Project: Beam > Issue Type: Improvement > Components: runner-py-interactive >Reporter: Ning Kang >Assignee: Ning Kang >Priority: Major > Fix For: 2.17.0 > > Time Spent: 4.5h > Remaining Estimate: 0h > > Dataflow needs the capability to tell how many Dataflow jobs are launched > from the Notebook environment, i.e., the Interactive Runner. > # Change the pipeline.run() API to allow supply a runner and an option > parameter so that a pipeline initially bundled w/ an interactive runner can > be directly run by other runners from notebook. > # Implicitly add the necessary source information through user labels when > the user does p.run(runner=DataflowRunner()). -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Work logged] (BEAM-8457) Instrument Dataflow jobs that are launched from Notebooks
[ https://issues.apache.org/jira/browse/BEAM-8457?focusedWorklogId=335059&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-335059 ] ASF GitHub Bot logged work on BEAM-8457: Author: ASF GitHub Bot Created on: 28/Oct/19 17:32 Start Date: 28/Oct/19 17:32 Worklog Time Spent: 10m Work Description: KevinGG commented on issue #9885: [BEAM-8457] Label Dataflow jobs from Notebook URL: https://github.com/apache/beam/pull/9885#issuecomment-547060373 Run Python PreCommit This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org Issue Time Tracking --- Worklog Id: (was: 335059) Time Spent: 4h 20m (was: 4h 10m) > Instrument Dataflow jobs that are launched from Notebooks > - > > Key: BEAM-8457 > URL: https://issues.apache.org/jira/browse/BEAM-8457 > Project: Beam > Issue Type: Improvement > Components: runner-py-interactive >Reporter: Ning Kang >Assignee: Ning Kang >Priority: Major > Fix For: 2.17.0 > > Time Spent: 4h 20m > Remaining Estimate: 0h > > Dataflow needs the capability to tell how many Dataflow jobs are launched > from the Notebook environment, i.e., the Interactive Runner. > # Change the pipeline.run() API to allow supply a runner and an option > parameter so that a pipeline initially bundled w/ an interactive runner can > be directly run by other runners from notebook. > # Implicitly add the necessary source information through user labels when > the user does p.run(runner=DataflowRunner()). -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Work logged] (BEAM-8457) Instrument Dataflow jobs that are launched from Notebooks
[ https://issues.apache.org/jira/browse/BEAM-8457?focusedWorklogId=335049&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-335049 ] ASF GitHub Bot logged work on BEAM-8457: Author: ASF GitHub Bot Created on: 28/Oct/19 17:17 Start Date: 28/Oct/19 17:17 Worklog Time Spent: 10m Work Description: pabloem commented on issue #9887: [release-2.17.0] Revert "Merge pull request #9854 from [BEAM-8457] Label Dataflow jobs… URL: https://github.com/apache/beam/pull/9887#issuecomment-547053125 Run Python PreCommit This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org Issue Time Tracking --- Worklog Id: (was: 335049) Time Spent: 4h 10m (was: 4h) > Instrument Dataflow jobs that are launched from Notebooks > - > > Key: BEAM-8457 > URL: https://issues.apache.org/jira/browse/BEAM-8457 > Project: Beam > Issue Type: Improvement > Components: runner-py-interactive >Reporter: Ning Kang >Assignee: Ning Kang >Priority: Major > Fix For: 2.17.0 > > Time Spent: 4h 10m > Remaining Estimate: 0h > > Dataflow needs the capability to tell how many Dataflow jobs are launched > from the Notebook environment, i.e., the Interactive Runner. > # Change the pipeline.run() API to allow supply a runner and an option > parameter so that a pipeline initially bundled w/ an interactive runner can > be directly run by other runners from notebook. > # Implicitly add the necessary source information through user labels when > the user does p.run(runner=DataflowRunner()). -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Work logged] (BEAM-8457) Instrument Dataflow jobs that are launched from Notebooks
[ https://issues.apache.org/jira/browse/BEAM-8457?focusedWorklogId=335039&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-335039 ] ASF GitHub Bot logged work on BEAM-8457: Author: ASF GitHub Bot Created on: 28/Oct/19 17:12 Start Date: 28/Oct/19 17:12 Worklog Time Spent: 10m Work Description: pabloem commented on issue #9887: [release-2.17.0] Revert "Merge pull request #9854 from [BEAM-8457] Label Dataflow jobs… URL: https://github.com/apache/beam/pull/9887#issuecomment-547050887 Run Python PreCommit This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org Issue Time Tracking --- Worklog Id: (was: 335039) Time Spent: 4h (was: 3h 50m) > Instrument Dataflow jobs that are launched from Notebooks > - > > Key: BEAM-8457 > URL: https://issues.apache.org/jira/browse/BEAM-8457 > Project: Beam > Issue Type: Improvement > Components: runner-py-interactive >Reporter: Ning Kang >Assignee: Ning Kang >Priority: Major > Fix For: 2.17.0 > > Time Spent: 4h > Remaining Estimate: 0h > > Dataflow needs the capability to tell how many Dataflow jobs are launched > from the Notebook environment, i.e., the Interactive Runner. > # Change the pipeline.run() API to allow supply a runner and an option > parameter so that a pipeline initially bundled w/ an interactive runner can > be directly run by other runners from notebook. > # Implicitly add the necessary source information through user labels when > the user does p.run(runner=DataflowRunner()). -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Work logged] (BEAM-8457) Instrument Dataflow jobs that are launched from Notebooks
[ https://issues.apache.org/jira/browse/BEAM-8457?focusedWorklogId=334570&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-334570 ] ASF GitHub Bot logged work on BEAM-8457: Author: ASF GitHub Bot Created on: 26/Oct/19 21:08 Start Date: 26/Oct/19 21:08 Worklog Time Spent: 10m Work Description: pabloem commented on issue #9887: [release-2.17.0] Revert "Merge pull request #9854 from [BEAM-8457] Label Dataflow jobs… URL: https://github.com/apache/beam/pull/9887#issuecomment-546639550 Run Python PreCommit This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org Issue Time Tracking --- Worklog Id: (was: 334570) Time Spent: 3h 50m (was: 3h 40m) > Instrument Dataflow jobs that are launched from Notebooks > - > > Key: BEAM-8457 > URL: https://issues.apache.org/jira/browse/BEAM-8457 > Project: Beam > Issue Type: Improvement > Components: runner-py-interactive >Reporter: Ning Kang >Assignee: Ning Kang >Priority: Major > Fix For: 2.17.0 > > Time Spent: 3h 50m > Remaining Estimate: 0h > > Dataflow needs the capability to tell how many Dataflow jobs are launched > from the Notebook environment, i.e., the Interactive Runner. > # Change the pipeline.run() API to allow supply a runner and an option > parameter so that a pipeline initially bundled w/ an interactive runner can > be directly run by other runners from notebook. > # Implicitly add the necessary source information through user labels when > the user does p.run(runner=DataflowRunner()). -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Work logged] (BEAM-8457) Instrument Dataflow jobs that are launched from Notebooks
[ https://issues.apache.org/jira/browse/BEAM-8457?focusedWorklogId=334431&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-334431 ] ASF GitHub Bot logged work on BEAM-8457: Author: ASF GitHub Bot Created on: 25/Oct/19 23:39 Start Date: 25/Oct/19 23:39 Worklog Time Spent: 10m Work Description: pabloem commented on issue #9887: [release-2.17.0] Revert "Merge pull request #9854 from [BEAM-8457] Label Dataflow jobs… URL: https://github.com/apache/beam/pull/9887#issuecomment-546545161 Run Python PreCommit This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org Issue Time Tracking --- Worklog Id: (was: 334431) Time Spent: 3h 40m (was: 3.5h) > Instrument Dataflow jobs that are launched from Notebooks > - > > Key: BEAM-8457 > URL: https://issues.apache.org/jira/browse/BEAM-8457 > Project: Beam > Issue Type: Improvement > Components: runner-py-interactive >Reporter: Ning Kang >Assignee: Ning Kang >Priority: Major > Fix For: 2.17.0 > > Time Spent: 3h 40m > Remaining Estimate: 0h > > Dataflow needs the capability to tell how many Dataflow jobs are launched > from the Notebook environment, i.e., the Interactive Runner. > # Change the pipeline.run() API to allow supply a runner and an option > parameter so that a pipeline initially bundled w/ an interactive runner can > be directly run by other runners from notebook. > # Implicitly add the necessary source information through user labels when > the user does p.run(runner=DataflowRunner()). -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Work logged] (BEAM-8457) Instrument Dataflow jobs that are launched from Notebooks
[ https://issues.apache.org/jira/browse/BEAM-8457?focusedWorklogId=334343&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-334343 ] ASF GitHub Bot logged work on BEAM-8457: Author: ASF GitHub Bot Created on: 25/Oct/19 19:58 Start Date: 25/Oct/19 19:58 Worklog Time Spent: 10m Work Description: pabloem commented on pull request #9887: [release-2.17.0] Revert "Merge pull request #9854 from [BEAM-8457] Label Dataflow jobs… URL: https://github.com/apache/beam/pull/9887 … from Notebook" This reverts commit 1a8391da9222ab8d0493b0007bd60bdbeeb5e275. **This is a cherry pick of PR #9879** **Please** add a meaningful description for your change here Thank you for your contribution! Follow this checklist to help us incorporate your contribution quickly and easily: - [ ] [**Choose reviewer(s)**](https://beam.apache.org/contribute/#make-your-change) and mention them in a comment (`R: @username`). - [ ] Format the pull request title like `[BEAM-XXX] Fixes bug in ApproximateQuantiles`, where you replace `BEAM-XXX` with the appropriate JIRA issue, if applicable. This will automatically link the pull request to the issue. - [ ] If this contribution is large, please file an Apache [Individual Contributor License Agreement](https://www.apache.org/licenses/icla.pdf). See the [Contributor Guide](https://beam.apache.org/contribute) for more tips on [how to make review process smoother](https://beam.apache.org/contribute/#make-reviewers-job-easier). 
[jira] [Work logged] (BEAM-8457) Instrument Dataflow jobs that are launched from Notebooks
[ https://issues.apache.org/jira/browse/BEAM-8457?focusedWorklogId=334374&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-334374 ] ASF GitHub Bot logged work on BEAM-8457: Author: ASF GitHub Bot Created on: 25/Oct/19 21:15 Start Date: 25/Oct/19 21:15 Worklog Time Spent: 10m Work Description: robertwb commented on issue #9854: [BEAM-8457] Label Dataflow jobs from Notebook URL: https://github.com/apache/beam/pull/9854#issuecomment-546516062 +1. We should not be importing the interactive runner (it's causing problems with tests as well), and interactivity should not be a property of the pipeline, but of the runner (and I'd prefer a design that avoids passing an interactive bit around everywhere). This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org Issue Time Tracking --- Worklog Id: (was: 334374) Time Spent: 3.5h (was: 3h 20m) > Instrument Dataflow jobs that are launched from Notebooks > - > > Key: BEAM-8457 > URL: https://issues.apache.org/jira/browse/BEAM-8457 > Project: Beam > Issue Type: Improvement > Components: runner-py-interactive >Reporter: Ning Kang >Assignee: Ning Kang >Priority: Major > Fix For: 2.17.0 > > Time Spent: 3.5h > Remaining Estimate: 0h > > Dataflow needs the capability to tell how many Dataflow jobs are launched > from the Notebook environment, i.e., the Interactive Runner. > # Change the pipeline.run() API to allow supply a runner and an option > parameter so that a pipeline initially bundled w/ an interactive runner can > be directly run by other runners from notebook. > # Implicitly add the necessary source information through user labels when > the user does p.run(runner=DataflowRunner()). -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Work logged] (BEAM-8457) Instrument Dataflow jobs that are launched from Notebooks
[ https://issues.apache.org/jira/browse/BEAM-8457?focusedWorklogId=334368&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-334368 ] ASF GitHub Bot logged work on BEAM-8457: Author: ASF GitHub Bot Created on: 25/Oct/19 20:55 Start Date: 25/Oct/19 20:55 Worklog Time Spent: 10m Work Description: KevinGG commented on pull request #9885: [BEAM-8457] Label Dataflow jobs from Notebook URL: https://github.com/apache/beam/pull/9885#discussion_r339235798 ## File path: sdks/python/apache_beam/pipeline.py ## @@ -396,28 +400,57 @@ def replace_all(self, replacements): for override in replacements: self._check_replacement(override) - def run(self, test_runner_api=True): -"""Runs the pipeline. Returns whatever our runner returns after running.""" - + def run(self, test_runner_api=True, runner=None, options=None, + interactive=None): +"""Runs the pipeline. Returns whatever our runner returns after running. + +If another runner instance and options are provided, that runner will +execute the pipeline with the given options. If either of them is not set, +a ValueError is raised. The usage is similar to directly invoking +`runner.run_pipeline(pipeline, options)`. +Additionally, an interactive field can be set to override the pipeline's +self.interactive field to mark current pipeline as being initiated from an +interactive environment. +""" +if interactive: + self.interactive = interactive +elif (type(self.runner).__module__ + == 'apache_beam.runners.interactive.interactive_runner' and + type(self.runner).__name__ == 'InteractiveRunner'): Review comment: This is the difference from previous [PR](https://www.google.com/url?q=https://github.com/apache/beam/pull/9854). All runners are using "new-style" classes in Python, the `type(obj).__module__/__name__` should always work. Please let me know if there would be backward incompatible cases. Thanks! This is an automated message from the Apache Git Service. 
To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org Issue Time Tracking --- Worklog Id: (was: 334368) Time Spent: 3h 20m (was: 3h 10m) > Instrument Dataflow jobs that are launched from Notebooks > - > > Key: BEAM-8457 > URL: https://issues.apache.org/jira/browse/BEAM-8457 > Project: Beam > Issue Type: Improvement > Components: runner-py-interactive >Reporter: Ning Kang >Assignee: Ning Kang >Priority: Major > Fix For: 2.17.0 > > Time Spent: 3h 20m > Remaining Estimate: 0h > > Dataflow needs the capability to tell how many Dataflow jobs are launched > from the Notebook environment, i.e., the Interactive Runner. > # Change the pipeline.run() API to allow supply a runner and an option > parameter so that a pipeline initially bundled w/ an interactive runner can > be directly run by other runners from notebook. > # Implicitly add the necessary source information through user labels when > the user does p.run(runner=DataflowRunner()). -- This message was sent by Atlassian Jira (v8.3.4#803005)
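The name-based check in the diff above can be sketched in isolation. This is a minimal sketch rather than Beam's code: `is_interactive_runner` and the stand-in class are hypothetical names, and only the module path and class name strings are taken from the diff. It assumes the reason for comparing strings is to avoid importing the interactive runner module.

```python
# Hypothetical helper: identify a runner by its class's module and name,
# so the caller never has to import the interactive runner module itself.
INTERACTIVE_MODULE = 'apache_beam.runners.interactive.interactive_runner'
INTERACTIVE_NAME = 'InteractiveRunner'


def is_interactive_runner(runner):
  """Returns True if the runner's exact class is named like InteractiveRunner."""
  runner_type = type(runner)
  return (runner_type.__module__ == INTERACTIVE_MODULE and
          runner_type.__name__ == INTERACTIVE_NAME)


# Stand-in class for demonstration only; it is not the real Beam runner.
FakeInteractiveRunner = type(
    'InteractiveRunner', (object,), {'__module__': INTERACTIVE_MODULE})
```

Unlike `isinstance`, this comparison looks only at the exact class, so a subclass defined under a different name or module would not match.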
[jira] [Work logged] (BEAM-8457) Instrument Dataflow jobs that are launched from Notebooks
[ https://issues.apache.org/jira/browse/BEAM-8457?focusedWorklogId=334352&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-334352 ] ASF GitHub Bot logged work on BEAM-8457: Author: ASF GitHub Bot Created on: 25/Oct/19 20:21 Start Date: 25/Oct/19 20:21 Worklog Time Spent: 10m Work Description: pabloem commented on issue #9887: [release-2.17.0] Revert "Merge pull request #9854 from [BEAM-8457] Label Dataflow jobs… URL: https://github.com/apache/beam/pull/9887#issuecomment-546498911 cc: @Ardagan This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org Issue Time Tracking --- Worklog Id: (was: 334352) Time Spent: 3h 10m (was: 3h) > Instrument Dataflow jobs that are launched from Notebooks > - > > Key: BEAM-8457 > URL: https://issues.apache.org/jira/browse/BEAM-8457 > Project: Beam > Issue Type: Improvement > Components: runner-py-interactive >Reporter: Ning Kang >Assignee: Ning Kang >Priority: Major > Fix For: 2.17.0 > > Time Spent: 3h 10m > Remaining Estimate: 0h > > Dataflow needs the capability to tell how many Dataflow jobs are launched > from the Notebook environment, i.e., the Interactive Runner. > # Change the pipeline.run() API to allow supply a runner and an option > parameter so that a pipeline initially bundled w/ an interactive runner can > be directly run by other runners from notebook. > # Implicitly add the necessary source information through user labels when > the user does p.run(runner=DataflowRunner()). -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Work logged] (BEAM-8457) Instrument Dataflow jobs that are launched from Notebooks
[ https://issues.apache.org/jira/browse/BEAM-8457?focusedWorklogId=334265&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-334265 ] ASF GitHub Bot logged work on BEAM-8457: Author: ASF GitHub Bot Created on: 25/Oct/19 18:14 Start Date: 25/Oct/19 18:14 Worklog Time Spent: 10m Work Description: KevinGG commented on pull request #9885: [BEAM-8457] Label Dataflow jobs from Notebook URL: https://github.com/apache/beam/pull/9885
1. Changed the pipeline.run() API to accept a runner and an options parameter so that a pipeline initially bundled with an interactive runner can be run directly by other runners from a notebook.
2. Implicitly added the necessary source information through user labels when the user does p.run(runner=DataflowRunner(), options=options) or DataflowRunner().run_pipeline(p, options).
3. The user '--labels' option doesn't support the characters '.' or '"'. When applying a version-related label, replace '.' with '_'. Avoid enclosing any label in '"'.
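The label rules in item 3 of the PR description above can be illustrated with a short sketch. The helper names `to_label_value` and `format_label`, the `key=value` layout, and the example key are assumptions made for illustration; only the two rules (replace '.' with '_', never wrap a label in '"') come from the description.

```python
def to_label_value(raw):
  """Makes a string usable as a label value under the two rules above."""
  # Drop any double quotes, then replace the unsupported dots.
  return raw.replace('"', '').replace('.', '_')


def format_label(key, value):
  """Builds a key=value label string without surrounding quotes."""
  return '{}={}'.format(key, to_label_value(value))


# Hypothetical key name; the PR description does not state the real one.
example = format_label('sdk-version', '2.17.0')
```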
[jira] [Work logged] (BEAM-8457) Instrument Dataflow jobs that are launched from Notebooks
[ https://issues.apache.org/jira/browse/BEAM-8457?focusedWorklogId=334197&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-334197 ] ASF GitHub Bot logged work on BEAM-8457: Author: ASF GitHub Bot Created on: 25/Oct/19 15:23 Start Date: 25/Oct/19 15:23 Worklog Time Spent: 10m Work Description: aaltay commented on issue #9879: Revert "[BEAM-8457] Label Dataflow jobs from Notebook" URL: https://github.com/apache/beam/pull/9879#issuecomment-546398359 @pabloem or @KevinGG could one of you cherry pick this to the release branch? This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org Issue Time Tracking --- Worklog Id: (was: 334197) Time Spent: 2h 40m (was: 2.5h) > Instrument Dataflow jobs that are launched from Notebooks > - > > Key: BEAM-8457 > URL: https://issues.apache.org/jira/browse/BEAM-8457 > Project: Beam > Issue Type: Improvement > Components: runner-py-interactive >Reporter: Ning Kang >Assignee: Ning Kang >Priority: Major > Fix For: 2.17.0 > > Time Spent: 2h 40m > Remaining Estimate: 0h > > Dataflow needs the capability to tell how many Dataflow jobs are launched > from the Notebook environment, i.e., the Interactive Runner. > # Change the pipeline.run() API to allow supply a runner and an option > parameter so that a pipeline initially bundled w/ an interactive runner can > be directly run by other runners from notebook. > # Implicitly add the necessary source information through user labels when > the user does p.run(runner=DataflowRunner()). -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Work logged] (BEAM-8457) Instrument Dataflow jobs that are launched from Notebooks
[ https://issues.apache.org/jira/browse/BEAM-8457?focusedWorklogId=334195&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-334195 ] ASF GitHub Bot logged work on BEAM-8457: Author: ASF GitHub Bot Created on: 25/Oct/19 15:23 Start Date: 25/Oct/19 15:23 Worklog Time Spent: 10m Work Description: aaltay commented on pull request #9879: Revert "[BEAM-8457] Label Dataflow jobs from Notebook" URL: https://github.com/apache/beam/pull/9879 This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org Issue Time Tracking --- Worklog Id: (was: 334195) Time Spent: 2.5h (was: 2h 20m) > Instrument Dataflow jobs that are launched from Notebooks > - > > Key: BEAM-8457 > URL: https://issues.apache.org/jira/browse/BEAM-8457 > Project: Beam > Issue Type: Improvement > Components: runner-py-interactive >Reporter: Ning Kang >Assignee: Ning Kang >Priority: Major > Fix For: 2.17.0 > > Time Spent: 2.5h > Remaining Estimate: 0h > > Dataflow needs the capability to tell how many Dataflow jobs are launched > from the Notebook environment, i.e., the Interactive Runner. > # Change the pipeline.run() API to allow supply a runner and an option > parameter so that a pipeline initially bundled w/ an interactive runner can > be directly run by other runners from notebook. > # Implicitly add the necessary source information through user labels when > the user does p.run(runner=DataflowRunner()). -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Work logged] (BEAM-8457) Instrument Dataflow jobs that are launched from Notebooks
[ https://issues.apache.org/jira/browse/BEAM-8457?focusedWorklogId=333794&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-333794 ] ASF GitHub Bot logged work on BEAM-8457: Author: ASF GitHub Bot Created on: 25/Oct/19 00:27 Start Date: 25/Oct/19 00:27 Worklog Time Spent: 10m Work Description: aaltay commented on issue #9879: Revert "[BEAM-8457] Label Dataflow jobs from Notebook" URL: https://github.com/apache/beam/pull/9879#issuecomment-546152188 R: @KevinGG cc: @charlesccychen cc: @Ardagan -- This also needs to go to the release branch. This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org Issue Time Tracking --- Worklog Id: (was: 333794) Time Spent: 2h 20m (was: 2h 10m) > Instrument Dataflow jobs that are launched from Notebooks > - > > Key: BEAM-8457 > URL: https://issues.apache.org/jira/browse/BEAM-8457 > Project: Beam > Issue Type: Improvement > Components: runner-py-interactive >Reporter: Ning Kang >Assignee: Ning Kang >Priority: Major > Fix For: 2.17.0 > > Time Spent: 2h 20m > Remaining Estimate: 0h > > Dataflow needs the capability to tell how many Dataflow jobs are launched > from the Notebook environment, i.e., the Interactive Runner. > # Change the pipeline.run() API to allow supply a runner and an option > parameter so that a pipeline initially bundled w/ an interactive runner can > be directly run by other runners from notebook. > # Implicitly add the necessary source information through user labels when > the user does p.run(runner=DataflowRunner()). -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Work logged] (BEAM-8457) Instrument Dataflow jobs that are launched from Notebooks
[ https://issues.apache.org/jira/browse/BEAM-8457?focusedWorklogId=333793&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-333793 ] ASF GitHub Bot logged work on BEAM-8457: Author: ASF GitHub Bot Created on: 25/Oct/19 00:26 Start Date: 25/Oct/19 00:26 Worklog Time Spent: 10m Work Description: aaltay commented on pull request #9879: Revert "[BEAM-8457] Label Dataflow jobs from Notebook" URL: https://github.com/apache/beam/pull/9879 Reverts apache/beam#9854. The pipeline code references interactive_runner without guarding for optional imports. This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org Issue Time Tracking --- Worklog Id: (was: 333793) Time Spent: 2h 10m (was: 2h) > Instrument Dataflow jobs that are launched from Notebooks > - > > Key: BEAM-8457 > URL: https://issues.apache.org/jira/browse/BEAM-8457 > Project: Beam > Issue Type: Improvement > Components: runner-py-interactive >Reporter: Ning Kang >Assignee: Ning Kang >Priority: Major > Fix For: 2.17.0 > > Time Spent: 2h 10m > Remaining Estimate: 0h > > Dataflow needs the capability to tell how many Dataflow jobs are launched > from the Notebook environment, i.e., the Interactive Runner. > # Change the pipeline.run() API to allow supply a runner and an option > parameter so that a pipeline initially bundled w/ an interactive runner can > be directly run by other runners from notebook. > # Implicitly add the necessary source information through user labels when > the user does p.run(runner=DataflowRunner()). -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Work logged] (BEAM-8457) Instrument Dataflow jobs that are launched from Notebooks
[ https://issues.apache.org/jira/browse/BEAM-8457?focusedWorklogId=333792&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-333792 ] ASF GitHub Bot logged work on BEAM-8457: Author: ASF GitHub Bot Created on: 25/Oct/19 00:25 Start Date: 25/Oct/19 00:25 Worklog Time Spent: 10m Work Description: aaltay commented on issue #9854: [BEAM-8457] Label Dataflow jobs from Notebook URL: https://github.com/apache/beam/pull/9854#issuecomment-546151918 Let's roll this back. I do not think we should be importing interactive in pipeline.py. Also, my understanding is that the runner will keep track of the interactivity, not the pipeline. This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org Issue Time Tracking --- Worklog Id: (was: 333792) Time Spent: 2h (was: 1h 50m) > Instrument Dataflow jobs that are launched from Notebooks > - > > Key: BEAM-8457 > URL: https://issues.apache.org/jira/browse/BEAM-8457 > Project: Beam > Issue Type: Improvement > Components: runner-py-interactive >Reporter: Ning Kang >Assignee: Ning Kang >Priority: Major > Fix For: 2.17.0 > > Time Spent: 2h > Remaining Estimate: 0h > > Dataflow needs the capability to tell how many Dataflow jobs are launched > from the Notebook environment, i.e., the Interactive Runner. > # Change the pipeline.run() API to allow supply a runner and an option > parameter so that a pipeline initially bundled w/ an interactive runner can > be directly run by other runners from notebook. > # Implicitly add the necessary source information through user labels when > the user does p.run(runner=DataflowRunner()). -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Work logged] (BEAM-8457) Instrument Dataflow jobs that are launched from Notebooks
[ https://issues.apache.org/jira/browse/BEAM-8457?focusedWorklogId=333790&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-333790 ] ASF GitHub Bot logged work on BEAM-8457: Author: ASF GitHub Bot Created on: 25/Oct/19 00:20 Start Date: 25/Oct/19 00:20 Worklog Time Spent: 10m Work Description: charlesccychen commented on issue #9854: [BEAM-8457] Label Dataflow jobs from Notebook URL: https://github.com/apache/beam/pull/9854#issuecomment-546150981 @KevinGG @pabloem I believe this PR may be causing issues in environments without IPython because it unconditionally imports the interactive runner (and thereby IPython). This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org Issue Time Tracking --- Worklog Id: (was: 333790) Time Spent: 1h 50m (was: 1h 40m) > Instrument Dataflow jobs that are launched from Notebooks > - > > Key: BEAM-8457 > URL: https://issues.apache.org/jira/browse/BEAM-8457 > Project: Beam > Issue Type: Improvement > Components: runner-py-interactive >Reporter: Ning Kang >Assignee: Ning Kang >Priority: Major > Fix For: 2.17.0 > > Time Spent: 1h 50m > Remaining Estimate: 0h > > Dataflow needs the capability to tell how many Dataflow jobs are launched > from the Notebook environment, i.e., the Interactive Runner. > # Change the pipeline.run() API to allow supply a runner and an option > parameter so that a pipeline initially bundled w/ an interactive runner can > be directly run by other runners from notebook. > # Implicitly add the necessary source information through user labels when > the user does p.run(runner=DataflowRunner()). -- This message was sent by Atlassian Jira (v8.3.4#803005)
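The problem raised in the comment above, an unconditional import of a package that may not be installed, is commonly avoided by guarding the import. The following is a minimal sketch of that general pattern, not the fix that was applied in Beam; `import_optional` and `in_ipython` are hypothetical helper names.

```python
import importlib


def import_optional(module_name):
  """Returns the imported module, or None if it is not installed."""
  try:
    return importlib.import_module(module_name)
  except ImportError:
    return None


# Resolved once at import time; callers must handle the None case.
ipython = import_optional('IPython')


def in_ipython():
  """Returns True only when IPython is installed and a shell is active."""
  return ipython is not None and ipython.get_ipython() is not None
```

With this guard, code that merely imports the module keeps working in environments without IPython, and only the interactive code path depends on it.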
[jira] [Work logged] (BEAM-8457) Instrument Dataflow jobs that are launched from Notebooks
[ https://issues.apache.org/jira/browse/BEAM-8457?focusedWorklogId=332850&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-332850 ] ASF GitHub Bot logged work on BEAM-8457: Author: ASF GitHub Bot Created on: 23/Oct/19 20:25 Start Date: 23/Oct/19 20:25 Worklog Time Spent: 10m Work Description: pabloem commented on pull request #9854: [BEAM-8457] Label Dataflow jobs from Notebook URL: https://github.com/apache/beam/pull/9854 This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org Issue Time Tracking --- Worklog Id: (was: 332850) Time Spent: 1.5h (was: 1h 20m) > Instrument Dataflow jobs that are launched from Notebooks > - > > Key: BEAM-8457 > URL: https://issues.apache.org/jira/browse/BEAM-8457 > Project: Beam > Issue Type: Improvement > Components: runner-py-interactive >Reporter: Ning Kang >Assignee: Ning Kang >Priority: Major > Fix For: 2.17.0 > > Time Spent: 1.5h > Remaining Estimate: 0h > > Dataflow needs the capability to tell how many Dataflow jobs are launched > from the Notebook environment, i.e., the Interactive Runner. > # Change the pipeline.run() API to allow supply a runner and an option > parameter so that a pipeline initially bundled w/ an interactive runner can > be directly run by other runners from notebook. > # Implicitly add the necessary source information through user labels when > the user does p.run(runner=DataflowRunner()). -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Work logged] (BEAM-8457) Instrument Dataflow jobs that are launched from Notebooks
[ https://issues.apache.org/jira/browse/BEAM-8457?focusedWorklogId=332851&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-332851 ] ASF GitHub Bot logged work on BEAM-8457: Author: ASF GitHub Bot Created on: 23/Oct/19 20:25 Start Date: 23/Oct/19 20:25 Worklog Time Spent: 10m Work Description: pabloem commented on issue #9854: [BEAM-8457] Label Dataflow jobs from Notebook URL: https://github.com/apache/beam/pull/9854#issuecomment-545620626 Thanks Ning This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org Issue Time Tracking --- Worklog Id: (was: 332851) Time Spent: 1h 40m (was: 1.5h) > Instrument Dataflow jobs that are launched from Notebooks > - > > Key: BEAM-8457 > URL: https://issues.apache.org/jira/browse/BEAM-8457 > Project: Beam > Issue Type: Improvement > Components: runner-py-interactive >Reporter: Ning Kang >Assignee: Ning Kang >Priority: Major > Fix For: 2.17.0 > > Time Spent: 1h 40m > Remaining Estimate: 0h > > Dataflow needs the capability to tell how many Dataflow jobs are launched > from the Notebook environment, i.e., the Interactive Runner. > # Change the pipeline.run() API to allow supply a runner and an option > parameter so that a pipeline initially bundled w/ an interactive runner can > be directly run by other runners from notebook. > # Implicitly add the necessary source information through user labels when > the user does p.run(runner=DataflowRunner()). -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Work logged] (BEAM-8457) Instrument Dataflow jobs that are launched from Notebooks
[ https://issues.apache.org/jira/browse/BEAM-8457?focusedWorklogId=332750&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-332750 ] ASF GitHub Bot logged work on BEAM-8457: Author: ASF GitHub Bot Created on: 23/Oct/19 18:06 Start Date: 23/Oct/19 18:06 Worklog Time Spent: 10m Work Description: KevinGG commented on pull request #9854: [BEAM-8457] Label Dataflow jobs from Notebook URL: https://github.com/apache/beam/pull/9854#discussion_r338199293
## File path: sdks/python/apache_beam/pipeline.py ##
@@ -396,28 +405,46 @@ def replace_all(self, replacements):
     for override in replacements:
       self._check_replacement(override)
-  def run(self, test_runner_api=True):
-    """Runs the pipeline. Returns whatever our runner returns after running."""
+  def run(self, test_runner_api=True, runner=None, options=None):
+    """Runs the pipeline. Returns whatever our runner returns after running.
+    If another runner instance and options are provided, that runner will
+    execute the pipeline with the given options. If either of them is not set,
+    the default runner will run the pipeline with the original options
+    assigned to the pipeline. The usage is similar to directly invoking
+    `runner.run_pipeline(pipeline, options)`.
+    """
+    runner_in_use = self.runner
+    options_in_use = self._options
+    if runner and options:
+      runner_in_use = runner
+      options_in_use = options
+    elif not runner and options:
+      raise ValueError('Parameter runner is not given when parameter options '
+                       'is given.')
+    elif not options and runner:
+      raise ValueError('Parameter options is not given when parameter runner '
+                       'is given.')
     # When possible, invoke a round trip through the runner API.
     if test_runner_api and self._verify_runner_api_compatible():
       return Pipeline.from_runner_api(
           self.to_runner_api(use_fake_coders=True),
-          self.runner,
-          self._options).run(False)
+          runner_in_use,
+          options_in_use,
+          interactive=self.interactive).run(False)
Review comment: No, it's not necessary. On second thought, I can just move the logic that determines `interactive` into `run()` and make `interactive` a parameter of the `run()` method. Then I don't even need to change the Pipeline constructor. Also, I've added this to the interactive_runner for the following use case: `InteractiveRunner(underlying_runner=DataflowRunner()).run_pipeline(Pipeline(DataflowRunner()))`
This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org Issue Time Tracking --- Worklog Id: (was: 332750) Time Spent: 1h 20m (was: 1h 10m) > Instrument Dataflow jobs that are launched from Notebooks > - > > Key: BEAM-8457 > URL: https://issues.apache.org/jira/browse/BEAM-8457 > Project: Beam > Issue Type: Improvement > Components: runner-py-interactive >Reporter: Ning Kang >Assignee: Ning Kang >Priority: Blocker > Fix For: 2.17.0 > > Time Spent: 1h 20m > Remaining Estimate: 0h > > Dataflow needs the capability to tell how many Dataflow jobs are launched > from the Notebook environment, i.e., the Interactive Runner. > # Change the pipeline.run() API to allow supply a runner and an option > parameter so that a pipeline initially bundled w/ an interactive runner can > be directly run by other runners from notebook. > # Implicitly add the necessary source information through user labels when > the user does p.run(runner=DataflowRunner()). -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Work logged] (BEAM-8457) Instrument Dataflow jobs that are launched from Notebooks
[ https://issues.apache.org/jira/browse/BEAM-8457?focusedWorklogId=332355&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-332355 ] ASF GitHub Bot logged work on BEAM-8457: Author: ASF GitHub Bot Created on: 23/Oct/19 01:15 Start Date: 23/Oct/19 01:15 Worklog Time Spent: 10m Work Description: pabloem commented on pull request #9854: [BEAM-8457] Label Dataflow jobs from Notebook URL: https://github.com/apache/beam/pull/9854#discussion_r337813433
## File path: sdks/python/apache_beam/pipeline.py ##
@@ -396,28 +405,46 @@ def replace_all(self, replacements):
     for override in replacements:
       self._check_replacement(override)
-  def run(self, test_runner_api=True):
-    """Runs the pipeline. Returns whatever our runner returns after running."""
+  def run(self, test_runner_api=True, runner=None, options=None):
+    """Runs the pipeline. Returns whatever our runner returns after running.
+    If another runner instance and options are provided, that runner will
+    execute the pipeline with the given options. If either of them is not set,
+    the default runner will run the pipeline with the original options
+    assigned to the pipeline. The usage is similar to directly invoking
+    `runner.run_pipeline(pipeline, options)`.
+    """
+    runner_in_use = self.runner
+    options_in_use = self._options
+    if runner and options:
+      runner_in_use = runner
+      options_in_use = options
+    elif not runner and options:
+      raise ValueError('Parameter runner is not given when parameter options '
+                       'is given.')
+    elif not options and runner:
+      raise ValueError('Parameter options is not given when parameter runner '
+                       'is given.')
     # When possible, invoke a round trip through the runner API.
     if test_runner_api and self._verify_runner_api_compatible():
       return Pipeline.from_runner_api(
           self.to_runner_api(use_fake_coders=True),
-          self.runner,
-          self._options).run(False)
+          runner_in_use,
+          options_in_use,
+          interactive=self.interactive).run(False)
Review comment: Did you find that this was necessary?
I don't think we should change the signature of the `from_runner_api` call. The pipeline protobuf should contain all the necessary information... Though I'd defer to @robertwb on this. This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org Issue Time Tracking --- Worklog Id: (was: 332355) Time Spent: 1h 10m (was: 1h) > Instrument Dataflow jobs that are launched from Notebooks > - > > Key: BEAM-8457 > URL: https://issues.apache.org/jira/browse/BEAM-8457 > Project: Beam > Issue Type: Improvement > Components: runner-py-interactive >Reporter: Ning Kang >Assignee: Ning Kang >Priority: Major > Time Spent: 1h 10m > Remaining Estimate: 0h > > Dataflow needs the capability to tell how many Dataflow jobs are launched > from the Notebook environment, i.e., the Interactive Runner. > # Change the pipeline.run() API to allow supply a runner and an option > parameter so that a pipeline initially bundled w/ an interactive runner can > be directly run by other runners from notebook. > # Implicitly add the necessary source information through user labels when > the user does p.run(runner=DataflowRunner()). -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Work logged] (BEAM-8457) Instrument Dataflow jobs that are launched from Notebooks
[ https://issues.apache.org/jira/browse/BEAM-8457?focusedWorklogId=332279&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-332279 ] ASF GitHub Bot logged work on BEAM-8457: Author: ASF GitHub Bot Created on: 22/Oct/19 23:05 Start Date: 22/Oct/19 23:05 Worklog Time Spent: 10m Work Description: KevinGG commented on pull request #9854: [BEAM-8457] Label Dataflow jobs from Notebook URL: https://github.com/apache/beam/pull/9854#discussion_r337787464
## File path: sdks/python/apache_beam/runners/dataflow/dataflow_runner.py ##
@@ -360,6 +360,15 @@ def visit_transform(self, transform_node):
   def run_pipeline(self, pipeline, options):
     """Remotely executes entire pipeline or parts reachable from node."""
+    # Label goog-notebook if pipeline is initiated from interactive runner.
+    from apache_beam.runners.interactive import interactive_runner
+    if isinstance(pipeline.runner, interactive_runner.InteractiveRunner):
Review comment: I missed the path where a new Pipeline is created and `run()` is invoked again. Yes, all of these would be possible. I've added an `interactive` parameter at the constructor level for `Pipeline`, with a default value of `None`. `run()` and `from_runner_api()` will pass the `None` or `bool` value down no matter how the user chains the runners. I'm not very confident about the naming, but the change should be backward compatible for Beam. Currently, I'm running into a problem when testing. Once I set `labels`, the Dataflow job fails immediately with an `Error processing pipeline.` error. There is no job graph, no worker started, and no logs. It looks like when there is a user label in the job request, Dataflow cannot convert the work item into its internal representation. I'll investigate and figure out why.
This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment.
For queries about this service, please contact Infrastructure at: us...@infra.apache.org Issue Time Tracking --- Worklog Id: (was: 332279) Time Spent: 1h (was: 50m) > Instrument Dataflow jobs that are launched from Notebooks > - > > Key: BEAM-8457 > URL: https://issues.apache.org/jira/browse/BEAM-8457 > Project: Beam > Issue Type: Improvement > Components: runner-py-interactive >Reporter: Ning Kang >Assignee: Ning Kang >Priority: Major > Time Spent: 1h > Remaining Estimate: 0h > > Dataflow needs the capability to tell how many Dataflow jobs are launched > from the Notebook environment, i.e., the Interactive Runner. > # Change the pipeline.run() API to allow supply a runner and an option > parameter so that a pipeline initially bundled w/ an interactive runner can > be directly run by other runners from notebook. > # Implicitly add the necessary source information through user labels when > the user does p.run(runner=DataflowRunner()). -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Work logged] (BEAM-8457) Instrument Dataflow jobs that are launched from Notebooks
[ https://issues.apache.org/jira/browse/BEAM-8457?focusedWorklogId=332274&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-332274 ] ASF GitHub Bot logged work on BEAM-8457: Author: ASF GitHub Bot Created on: 22/Oct/19 22:58 Start Date: 22/Oct/19 22:58 Worklog Time Spent: 10m Work Description: KevinGG commented on pull request #9854: [BEAM-8457] Label Dataflow jobs from Notebook URL: https://github.com/apache/beam/pull/9854#discussion_r337785938 ## File path: sdks/python/apache_beam/pipeline.py ## @@ -396,28 +396,40 @@ def replace_all(self, replacements): for override in replacements: self._check_replacement(override) - def run(self, test_runner_api=True): -"""Runs the pipeline. Returns whatever our runner returns after running.""" + def run(self, test_runner_api=True, runner=None, options=None): +"""Runs the pipeline. Returns whatever our runner returns after running. + +If another runner instance and options are provided, that runner will +execute the pipeline with the given options. If either of them is not set, +the default runner will run the pipeline with the original options +assigned to the pipeline. The usage is similar to directly invoking +`runner.run_pipeline(pipeline, options)`. +""" +runner_in_use = self.runner +options_in_use = self._options +if runner and options: Review comment: You're right! This will surprise the user. I've changed it to throw an error if either is not provided, instead of ignoring the input by default. This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment.
For queries about this service, please contact Infrastructure at: us...@infra.apache.org Issue Time Tracking --- Worklog Id: (was: 332274) Time Spent: 50m (was: 40m) > Instrument Dataflow jobs that are launched from Notebooks > - > > Key: BEAM-8457 > URL: https://issues.apache.org/jira/browse/BEAM-8457 > Project: Beam > Issue Type: Improvement > Components: runner-py-interactive >Reporter: Ning Kang >Assignee: Ning Kang >Priority: Major > Time Spent: 50m > Remaining Estimate: 0h > > Dataflow needs the capability to tell how many Dataflow jobs are launched > from the Notebook environment, i.e., the Interactive Runner. > # Change the pipeline.run() API to allow supplying a runner and an options > parameter so that a pipeline initially bundled w/ an interactive runner can > be directly run by other runners from a notebook. > # Implicitly add the necessary source information through user labels when > the user does p.run(runner=DataflowRunner()). -- This message was sent by Atlassian Jira (v8.3.4#803005)
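The exchange above is about what `run(runner=..., options=...)` should do when only one of the two arguments is supplied. One possible reading is sketched below as a standalone helper: use the pipeline's defaults when neither is given, and raise when exactly one is given. The helper name `resolve_runner_and_options` and the choice of `ValueError` are hypothetical; the thread does not show the final code. Note also that this sketch compares against `None`, whereas the diff under review uses a truthiness check (`if runner and options:`).

```python
from typing import Any, Optional, Tuple


def resolve_runner_and_options(
    default_runner: Any,
    default_options: Any,
    runner: Optional[Any] = None,
    options: Optional[Any] = None,
) -> Tuple[Any, Any]:
    """Pick the (runner, options) pair that should execute the pipeline."""
    if runner is None and options is None:
        # Nothing overridden: fall back to what the pipeline was built with.
        return default_runner, default_options
    if runner is None or options is None:
        # Exactly one override was supplied; fail loudly rather than
        # silently ignoring it.
        raise ValueError("Provide both `runner` and `options`, or neither.")
    return runner, options
```

Failing loudly here addresses the reviewer's concern that a half-specified override would otherwise be dropped without the user noticing.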
[jira] [Work logged] (BEAM-8457) Instrument Dataflow jobs that are launched from Notebooks
[ https://issues.apache.org/jira/browse/BEAM-8457?focusedWorklogId=332231&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-332231 ] ASF GitHub Bot logged work on BEAM-8457: Author: ASF GitHub Bot Created on: 22/Oct/19 21:23 Start Date: 22/Oct/19 21:23 Worklog Time Spent: 10m Work Description: pabloem commented on pull request #9854: [BEAM-8457] Label Dataflow jobs from Notebook URL: https://github.com/apache/beam/pull/9854#discussion_r337757120 ## File path: sdks/python/apache_beam/runners/dataflow/dataflow_runner.py ## @@ -360,6 +360,15 @@ def visit_transform(self, transform_node): def run_pipeline(self, pipeline, options): """Remotely executes entire pipeline or parts reachable from node.""" +# Label goog-notebook if pipeline is initiated from interactive runner. +from apache_beam.runners.interactive import interactive_runner +if isinstance(pipeline.runner, interactive_runner.InteractiveRunner): Review comment: This seems fine - but what if we go with the `runner_in_use` codepath? Would users do: `p.run(runner=InteractiveRunner(DataflowRunner()), options=...)`? Or would users create a pipeline with InteractiveRunner and then do `p.run(runner=DataflowRunner()...`? Is it possible for users to do `p = beam.Pipeline()`, and then do `InteractiveRunner().run_pipeline(p)`/`InteractiveRunner(DataflowRunner()).run_pipeline(p)`? IIUC users would have to pass the interactive runner in `p = beam.Pipeline()` to activate the interactive mode, right? InteractiveRunner is not automatically selected? This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment.
For queries about this service, please contact Infrastructure at: us...@infra.apache.org Issue Time Tracking --- Worklog Id: (was: 332231) Time Spent: 40m (was: 0.5h) > Instrument Dataflow jobs that are launched from Notebooks > - > > Key: BEAM-8457 > URL: https://issues.apache.org/jira/browse/BEAM-8457 > Project: Beam > Issue Type: Improvement > Components: runner-py-interactive >Reporter: Ning Kang >Assignee: Ning Kang >Priority: Major > Time Spent: 40m > Remaining Estimate: 0h > > Dataflow needs the capability to tell how many Dataflow jobs are launched > from the Notebook environment, i.e., the Interactive Runner. > # Change the pipeline.run() API to allow supplying a runner and an options > parameter so that a pipeline initially bundled w/ an interactive runner can > be directly run by other runners from a notebook. > # Implicitly add the necessary source information through user labels when > the user does p.run(runner=DataflowRunner()). -- This message was sent by Atlassian Jira (v8.3.4#803005)
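The reviewer's question is whether the `isinstance(pipeline.runner, InteractiveRunner)` check in the diff covers every way a user can combine runners. The sketch below illustrates the concern with stand-in classes only; none of them are the real Beam classes, and `detected_as_notebook` is a hypothetical helper that mirrors the check in the diff. It shows that the check depends solely on the runner the pipeline was constructed with.

```python
class DirectRunner:
    """Stand-in for a default, non-interactive runner."""


class DataflowRunner:
    """Stand-in for the remote runner."""


class InteractiveRunner:
    """Stand-in for a runner that wraps an underlying runner."""

    def __init__(self, underlying_runner=None):
        self.underlying_runner = underlying_runner or DirectRunner()


class Pipeline:
    """Stand-in pipeline that remembers the runner it was built with."""

    def __init__(self, runner=None):
        self.runner = runner or DirectRunner()


def detected_as_notebook(pipeline):
    # Mirrors the check in the diff: only the pipeline's own runner is inspected.
    return isinstance(pipeline.runner, InteractiveRunner)


# Pipeline built with an interactive runner: detected.
built_interactive = Pipeline(InteractiveRunner(DataflowRunner()))
# Pipeline built with the default runner and only later handed to an
# interactive runner: the check cannot see that.
built_default = Pipeline()
```

In this sketch `detected_as_notebook(built_default)` stays false regardless of which runner later executes the pipeline, which is the gap the reviewer is probing.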
[jira] [Work logged] (BEAM-8457) Instrument Dataflow jobs that are launched from Notebooks
[ https://issues.apache.org/jira/browse/BEAM-8457?focusedWorklogId=332230&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-332230 ] ASF GitHub Bot logged work on BEAM-8457: Author: ASF GitHub Bot Created on: 22/Oct/19 21:23 Start Date: 22/Oct/19 21:23 Worklog Time Spent: 10m Work Description: pabloem commented on pull request #9854: [BEAM-8457] Label Dataflow jobs from Notebook URL: https://github.com/apache/beam/pull/9854#discussion_r337751395 ## File path: sdks/python/apache_beam/pipeline.py ## @@ -396,28 +396,40 @@ def replace_all(self, replacements): for override in replacements: self._check_replacement(override) - def run(self, test_runner_api=True): -"""Runs the pipeline. Returns whatever our runner returns after running.""" + def run(self, test_runner_api=True, runner=None, options=None): +"""Runs the pipeline. Returns whatever our runner returns after running. + +If another runner instance and options are provided, that runner will +execute the pipeline with the given options. If either of them is not set, +the default runner will run the pipeline with the original options +assigned to the pipeline. The usage is similar to directly invoking +`runner.run_pipeline(pipeline, options)`. +""" +runner_in_use = self.runner +options_in_use = self._options +if runner and options: Review comment: What if either runner or options are not provided? Should that throw an error? Currently, if only one is provided, it'll be ignored - and that would be quite surprising for users. This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. 
For queries about this service, please contact Infrastructure at: us...@infra.apache.org Issue Time Tracking --- Worklog Id: (was: 332230) Time Spent: 0.5h (was: 20m) > Instrument Dataflow jobs that are launched from Notebooks > - > > Key: BEAM-8457 > URL: https://issues.apache.org/jira/browse/BEAM-8457 > Project: Beam > Issue Type: Improvement > Components: runner-py-interactive >Reporter: Ning Kang >Assignee: Ning Kang >Priority: Major > Time Spent: 0.5h > Remaining Estimate: 0h > > Dataflow needs the capability to tell how many Dataflow jobs are launched > from the Notebook environment, i.e., the Interactive Runner. > # Change the pipeline.run() API to allow supplying a runner and an options > parameter so that a pipeline initially bundled w/ an interactive runner can > be directly run by other runners from a notebook. > # Implicitly add the necessary source information through user labels when > the user does p.run(runner=DataflowRunner()). -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Work logged] (BEAM-8457) Instrument Dataflow jobs that are launched from Notebooks
[ https://issues.apache.org/jira/browse/BEAM-8457?focusedWorklogId=332213&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-332213 ] ASF GitHub Bot logged work on BEAM-8457: Author: ASF GitHub Bot Created on: 22/Oct/19 20:45 Start Date: 22/Oct/19 20:45 Worklog Time Spent: 10m Work Description: KevinGG commented on pull request #9854: [BEAM-8457] Label Dataflow jobs from Notebook URL: https://github.com/apache/beam/pull/9854 1. Changed the pipeline.run() API to allow a runner and an option parameter so that a pipeline initially bundled w/ an interactive runner can be directly run by other runners from notebook. 2. Implicitly added the necessary source information through user labels when the user does p.run(runner=DataflowRunner(), options=options) or DataflowRunner().run_pipeline(p, options).
[jira] [Work logged] (BEAM-8457) Instrument Dataflow jobs that are launched from Notebooks
[ https://issues.apache.org/jira/browse/BEAM-8457?focusedWorklogId=332214&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-332214 ] ASF GitHub Bot logged work on BEAM-8457: Author: ASF GitHub Bot Created on: 22/Oct/19 20:45 Start Date: 22/Oct/19 20:45 Worklog Time Spent: 10m Work Description: KevinGG commented on issue #9854: [BEAM-8457] Label Dataflow jobs from Notebook URL: https://github.com/apache/beam/pull/9854#issuecomment-545146836 R: @pabloem PTAL, thanks! This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org Issue Time Tracking --- Worklog Id: (was: 332214) Time Spent: 20m (was: 10m) > Instrument Dataflow jobs that are launched from Notebooks > - > > Key: BEAM-8457 > URL: https://issues.apache.org/jira/browse/BEAM-8457 > Project: Beam > Issue Type: Improvement > Components: runner-py-interactive >Reporter: Ning Kang >Assignee: Ning Kang >Priority: Major > Time Spent: 20m > Remaining Estimate: 0h > > Dataflow needs the capability to tell how many Dataflow jobs are launched > from the Notebook environment, i.e., the Interactive Runner. > # Change the pipeline.run() API to allow supplying a runner and an options > parameter so that a pipeline initially bundled w/ an interactive runner can > be directly run by other runners from a notebook. > # Implicitly add the necessary source information through user labels when > the user does p.run(runner=DataflowRunner()). -- This message was sent by Atlassian Jira (v8.3.4#803005)