[jira] [Work logged] (BEAM-8457) Instrument Dataflow jobs that are launched from Notebooks

2019-11-15 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/BEAM-8457?focusedWorklogId=344487&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-344487
 ]

ASF GitHub Bot logged work on BEAM-8457:


Author: ASF GitHub Bot
Created on: 15/Nov/19 18:58
Start Date: 15/Nov/19 18:58
Worklog Time Spent: 10m 
  Work Description: Ardagan commented on pull request #9887: 
[release-2.17.0] Revert "Merge pull request #9854 from [BEAM-8457] Label 
Dataflow jobs…
URL: https://github.com/apache/beam/pull/9887
 
 
   
 

This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 344487)
Time Spent: 10h 20m  (was: 10h 10m)

> Instrument Dataflow jobs that are launched from Notebooks
> -
>
> Key: BEAM-8457
> URL: https://issues.apache.org/jira/browse/BEAM-8457
> Project: Beam
>  Issue Type: Improvement
>  Components: runner-py-interactive
>Reporter: Ning Kang
>Assignee: Ning Kang
>Priority: Major
> Fix For: 2.18.0
>
>  Time Spent: 10h 20m
>  Remaining Estimate: 0h
>
> Dataflow needs the capability to tell how many Dataflow jobs are launched 
> from the Notebook environment.
> We are doing it by checking if the current execution path is with ipython and 
> if the ipython kernel is connected to a notebook frontend.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Work logged] (BEAM-8457) Instrument Dataflow jobs that are launched from Notebooks

2019-11-14 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/BEAM-8457?focusedWorklogId=343810&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-343810
 ]

ASF GitHub Bot logged work on BEAM-8457:


Author: ASF GitHub Bot
Created on: 14/Nov/19 22:40
Start Date: 14/Nov/19 22:40
Worklog Time Spent: 10m 
  Work Description: Ardagan commented on issue #9887: [release-2.17.0] 
Revert "Merge pull request #9854 from [BEAM-8457] Label Dataflow jobs…
URL: https://github.com/apache/beam/pull/9887#issuecomment-554115250
 
 
   Run Python PreCommit
 

This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 343810)
Time Spent: 10h 10m  (was: 10h)

> Instrument Dataflow jobs that are launched from Notebooks
> -
>
> Key: BEAM-8457
> URL: https://issues.apache.org/jira/browse/BEAM-8457
> Project: Beam
>  Issue Type: Improvement
>  Components: runner-py-interactive
>Reporter: Ning Kang
>Assignee: Ning Kang
>Priority: Major
> Fix For: 2.18.0
>
>  Time Spent: 10h 10m
>  Remaining Estimate: 0h
>
> Dataflow needs the capability to tell how many Dataflow jobs are launched 
> from the Notebook environment.
> We are doing it by checking if the current execution path is with ipython and 
> if the ipython kernel is connected to a notebook frontend.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Work logged] (BEAM-8457) Instrument Dataflow jobs that are launched from Notebooks

2019-11-14 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/BEAM-8457?focusedWorklogId=343679&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-343679
 ]

ASF GitHub Bot logged work on BEAM-8457:


Author: ASF GitHub Bot
Created on: 14/Nov/19 19:13
Start Date: 14/Nov/19 19:13
Worklog Time Spent: 10m 
  Work Description: Ardagan commented on issue #9887: [release-2.17.0] 
Revert "Merge pull request #9854 from [BEAM-8457] Label Dataflow jobs…
URL: https://github.com/apache/beam/pull/9887#issuecomment-554036059
 
 
   Run Python PreCommit
 

This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 343679)
Time Spent: 10h  (was: 9h 50m)

> Instrument Dataflow jobs that are launched from Notebooks
> -
>
> Key: BEAM-8457
> URL: https://issues.apache.org/jira/browse/BEAM-8457
> Project: Beam
>  Issue Type: Improvement
>  Components: runner-py-interactive
>Reporter: Ning Kang
>Assignee: Ning Kang
>Priority: Major
> Fix For: 2.18.0
>
>  Time Spent: 10h
>  Remaining Estimate: 0h
>
> Dataflow needs the capability to tell how many Dataflow jobs are launched 
> from the Notebook environment.
> We are doing it by checking if the current execution path is with ipython and 
> if the ipython kernel is connected to a notebook frontend.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Work logged] (BEAM-8457) Instrument Dataflow jobs that are launched from Notebooks

2019-11-13 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/BEAM-8457?focusedWorklogId=343116&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-343116
 ]

ASF GitHub Bot logged work on BEAM-8457:


Author: ASF GitHub Bot
Created on: 14/Nov/19 04:28
Start Date: 14/Nov/19 04:28
Worklog Time Spent: 10m 
  Work Description: Ardagan commented on issue #9887: [release-2.17.0] 
Revert "Merge pull request #9854 from [BEAM-8457] Label Dataflow jobs…
URL: https://github.com/apache/beam/pull/9887#issuecomment-553719079
 
 
   Run Python PreCommit
 

This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 343116)
Time Spent: 9h 50m  (was: 9h 40m)

> Instrument Dataflow jobs that are launched from Notebooks
> -
>
> Key: BEAM-8457
> URL: https://issues.apache.org/jira/browse/BEAM-8457
> Project: Beam
>  Issue Type: Improvement
>  Components: runner-py-interactive
>Reporter: Ning Kang
>Assignee: Ning Kang
>Priority: Major
> Fix For: 2.18.0
>
>  Time Spent: 9h 50m
>  Remaining Estimate: 0h
>
> Dataflow needs the capability to tell how many Dataflow jobs are launched 
> from the Notebook environment.
> We are doing it by checking if the current execution path is with ipython and 
> if the ipython kernel is connected to a notebook frontend.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Work logged] (BEAM-8457) Instrument Dataflow jobs that are launched from Notebooks

2019-11-11 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/BEAM-8457?focusedWorklogId=341411&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-341411
 ]

ASF GitHub Bot logged work on BEAM-8457:


Author: ASF GitHub Bot
Created on: 11/Nov/19 18:20
Start Date: 11/Nov/19 18:20
Worklog Time Spent: 10m 
  Work Description: KevinGG commented on pull request #9885: [BEAM-8457] 
Label Dataflow jobs from Notebook
URL: https://github.com/apache/beam/pull/9885#discussion_r344843896
 
 

 ##
 File path: 
sdks/python/apache_beam/runners/interactive/interactive_environment.py
 ##
 @@ -93,17 +94,7 @@ def __init__(self, cache_manager=None):
   'install apache-beam[interactive]` to install necessary '
   'dependencies to enable all data visualization 
features.')
 
-self._is_in_ipython = False
-self._is_in_notebook = False
-# Check if the runtime is within an interactive environment, i.e., ipython.
-try:
-  from IPython import get_ipython  # pylint: disable=import-error
-  if get_ipython():
-self._is_in_ipython = True
-if 'IPKernelApp' in get_ipython().config:
-  self._is_in_notebook = True
-except ImportError:
-  pass
+self._is_in_ipython, self._is_in_notebook = is_interactive()
 
 Review comment:
   Roger, will make it into 2 separate APIs.
 

This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 341411)
Time Spent: 9h 40m  (was: 9.5h)

> Instrument Dataflow jobs that are launched from Notebooks
> -
>
> Key: BEAM-8457
> URL: https://issues.apache.org/jira/browse/BEAM-8457
> Project: Beam
>  Issue Type: Improvement
>  Components: runner-py-interactive
>Reporter: Ning Kang
>Assignee: Ning Kang
>Priority: Major
> Fix For: 2.17.0
>
>  Time Spent: 9h 40m
>  Remaining Estimate: 0h
>
> Dataflow needs the capability to tell how many Dataflow jobs are launched 
> from the Notebook environment, i.e., the Interactive Runner.
>  # Change the pipeline.run() API to allow supply a runner and an option 
> parameter so that a pipeline initially bundled w/ an interactive runner can 
> be directly run by other runners from notebook.
>  # Implicitly add the necessary source information through user labels when 
> the user does p.run(runner=DataflowRunner()).



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Work logged] (BEAM-8457) Instrument Dataflow jobs that are launched from Notebooks

2019-11-11 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/BEAM-8457?focusedWorklogId=341407&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-341407
 ]

ASF GitHub Bot logged work on BEAM-8457:


Author: ASF GitHub Bot
Created on: 11/Nov/19 18:17
Start Date: 11/Nov/19 18:17
Worklog Time Spent: 10m 
  Work Description: pabloem commented on issue #9887: [release-2.17.0] 
Revert "Merge pull request #9854 from [BEAM-8457] Label Dataflow jobs…
URL: https://github.com/apache/beam/pull/9887#issuecomment-552554913
 
 
   Run Python PreCommit
 

This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 341407)
Time Spent: 9.5h  (was: 9h 20m)

> Instrument Dataflow jobs that are launched from Notebooks
> -
>
> Key: BEAM-8457
> URL: https://issues.apache.org/jira/browse/BEAM-8457
> Project: Beam
>  Issue Type: Improvement
>  Components: runner-py-interactive
>Reporter: Ning Kang
>Assignee: Ning Kang
>Priority: Major
> Fix For: 2.17.0
>
>  Time Spent: 9.5h
>  Remaining Estimate: 0h
>
> Dataflow needs the capability to tell how many Dataflow jobs are launched 
> from the Notebook environment, i.e., the Interactive Runner.
>  # Change the pipeline.run() API to allow supply a runner and an option 
> parameter so that a pipeline initially bundled w/ an interactive runner can 
> be directly run by other runners from notebook.
>  # Implicitly add the necessary source information through user labels when 
> the user does p.run(runner=DataflowRunner()).



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Work logged] (BEAM-8457) Instrument Dataflow jobs that are launched from Notebooks

2019-11-11 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/BEAM-8457?focusedWorklogId=341402&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-341402
 ]

ASF GitHub Bot logged work on BEAM-8457:


Author: ASF GitHub Bot
Created on: 11/Nov/19 18:14
Start Date: 11/Nov/19 18:14
Worklog Time Spent: 10m 
  Work Description: robertwb commented on pull request #9885: [BEAM-8457] 
Label Dataflow jobs from Notebook
URL: https://github.com/apache/beam/pull/9885#discussion_r344841478
 
 

 ##
 File path: 
sdks/python/apache_beam/runners/interactive/interactive_environment.py
 ##
 @@ -93,17 +94,7 @@ def __init__(self, cache_manager=None):
   'install apache-beam[interactive]` to install necessary '
   'dependencies to enable all data visualization 
features.')
 
-self._is_in_ipython = False
-self._is_in_notebook = False
-# Check if the runtime is within an interactive environment, i.e., ipython.
-try:
-  from IPython import get_ipython  # pylint: disable=import-error
-  if get_ipython():
-self._is_in_ipython = True
-if 'IPKernelApp' in get_ipython().config:
-  self._is_in_notebook = True
-except ImportError:
-  pass
+self._is_in_ipython, self._is_in_notebook = is_interactive()
 
 Review comment:
   Conventionally, `is_xxx` functions return a boolean. Returning a pair will 
be especially surprising if one writes statements like `if is_interactive()` 
and the return value is `(False, False)` (which as a non-zero-length tuple 
evaluates to `True`.
 

This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 341402)
Time Spent: 9h 20m  (was: 9h 10m)

> Instrument Dataflow jobs that are launched from Notebooks
> -
>
> Key: BEAM-8457
> URL: https://issues.apache.org/jira/browse/BEAM-8457
> Project: Beam
>  Issue Type: Improvement
>  Components: runner-py-interactive
>Reporter: Ning Kang
>Assignee: Ning Kang
>Priority: Major
> Fix For: 2.17.0
>
>  Time Spent: 9h 20m
>  Remaining Estimate: 0h
>
> Dataflow needs the capability to tell how many Dataflow jobs are launched 
> from the Notebook environment, i.e., the Interactive Runner.
>  # Change the pipeline.run() API to allow supply a runner and an option 
> parameter so that a pipeline initially bundled w/ an interactive runner can 
> be directly run by other runners from notebook.
>  # Implicitly add the necessary source information through user labels when 
> the user does p.run(runner=DataflowRunner()).



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Work logged] (BEAM-8457) Instrument Dataflow jobs that are launched from Notebooks

2019-11-11 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/BEAM-8457?focusedWorklogId=341401&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-341401
 ]

ASF GitHub Bot logged work on BEAM-8457:


Author: ASF GitHub Bot
Created on: 11/Nov/19 18:11
Start Date: 11/Nov/19 18:11
Worklog Time Spent: 10m 
  Work Description: pabloem commented on pull request #9885: [BEAM-8457] 
Label Dataflow jobs from Notebook
URL: https://github.com/apache/beam/pull/9885
 
 
   
 

This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 341401)
Time Spent: 9h 10m  (was: 9h)

> Instrument Dataflow jobs that are launched from Notebooks
> -
>
> Key: BEAM-8457
> URL: https://issues.apache.org/jira/browse/BEAM-8457
> Project: Beam
>  Issue Type: Improvement
>  Components: runner-py-interactive
>Reporter: Ning Kang
>Assignee: Ning Kang
>Priority: Major
> Fix For: 2.17.0
>
>  Time Spent: 9h 10m
>  Remaining Estimate: 0h
>
> Dataflow needs the capability to tell how many Dataflow jobs are launched 
> from the Notebook environment, i.e., the Interactive Runner.
>  # Change the pipeline.run() API to allow supply a runner and an option 
> parameter so that a pipeline initially bundled w/ an interactive runner can 
> be directly run by other runners from notebook.
>  # Implicitly add the necessary source information through user labels when 
> the user does p.run(runner=DataflowRunner()).



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Work logged] (BEAM-8457) Instrument Dataflow jobs that are launched from Notebooks

2019-11-08 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/BEAM-8457?focusedWorklogId=340780&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-340780
 ]

ASF GitHub Bot logged work on BEAM-8457:


Author: ASF GitHub Bot
Created on: 08/Nov/19 22:18
Start Date: 08/Nov/19 22:18
Worklog Time Spent: 10m 
  Work Description: KevinGG commented on issue #9885: [BEAM-8457] Label 
Dataflow jobs from Notebook
URL: https://github.com/apache/beam/pull/9885#issuecomment-552011987
 
 
   Run Portable_Python PreCommit
 

This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 340780)
Time Spent: 9h  (was: 8h 50m)

> Instrument Dataflow jobs that are launched from Notebooks
> -
>
> Key: BEAM-8457
> URL: https://issues.apache.org/jira/browse/BEAM-8457
> Project: Beam
>  Issue Type: Improvement
>  Components: runner-py-interactive
>Reporter: Ning Kang
>Assignee: Ning Kang
>Priority: Major
> Fix For: 2.17.0
>
>  Time Spent: 9h
>  Remaining Estimate: 0h
>
> Dataflow needs the capability to tell how many Dataflow jobs are launched 
> from the Notebook environment, i.e., the Interactive Runner.
>  # Change the pipeline.run() API to allow supply a runner and an option 
> parameter so that a pipeline initially bundled w/ an interactive runner can 
> be directly run by other runners from notebook.
>  # Implicitly add the necessary source information through user labels when 
> the user does p.run(runner=DataflowRunner()).



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Work logged] (BEAM-8457) Instrument Dataflow jobs that are launched from Notebooks

2019-11-07 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/BEAM-8457?focusedWorklogId=340292&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-340292
 ]

ASF GitHub Bot logged work on BEAM-8457:


Author: ASF GitHub Bot
Created on: 08/Nov/19 01:59
Start Date: 08/Nov/19 01:59
Worklog Time Spent: 10m 
  Work Description: KevinGG commented on issue #9885: [BEAM-8457] Label 
Dataflow jobs from Notebook
URL: https://github.com/apache/beam/pull/9885#issuecomment-551349679
 
 
   Run Portable_Python PreCommit
 

This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 340292)
Time Spent: 8h 50m  (was: 8h 40m)

> Instrument Dataflow jobs that are launched from Notebooks
> -
>
> Key: BEAM-8457
> URL: https://issues.apache.org/jira/browse/BEAM-8457
> Project: Beam
>  Issue Type: Improvement
>  Components: runner-py-interactive
>Reporter: Ning Kang
>Assignee: Ning Kang
>Priority: Major
> Fix For: 2.17.0
>
>  Time Spent: 8h 50m
>  Remaining Estimate: 0h
>
> Dataflow needs the capability to tell how many Dataflow jobs are launched 
> from the Notebook environment, i.e., the Interactive Runner.
>  # Change the pipeline.run() API to allow supply a runner and an option 
> parameter so that a pipeline initially bundled w/ an interactive runner can 
> be directly run by other runners from notebook.
>  # Implicitly add the necessary source information through user labels when 
> the user does p.run(runner=DataflowRunner()).



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Work logged] (BEAM-8457) Instrument Dataflow jobs that are launched from Notebooks

2019-11-07 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/BEAM-8457?focusedWorklogId=340235&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-340235
 ]

ASF GitHub Bot logged work on BEAM-8457:


Author: ASF GitHub Bot
Created on: 07/Nov/19 23:43
Start Date: 07/Nov/19 23:43
Worklog Time Spent: 10m 
  Work Description: KevinGG commented on issue #9885: [BEAM-8457] Label 
Dataflow jobs from Notebook
URL: https://github.com/apache/beam/pull/9885#issuecomment-551317534
 
 
   Run Portable_Python PreCommit
 

This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 340235)
Time Spent: 8h 40m  (was: 8.5h)

> Instrument Dataflow jobs that are launched from Notebooks
> -
>
> Key: BEAM-8457
> URL: https://issues.apache.org/jira/browse/BEAM-8457
> Project: Beam
>  Issue Type: Improvement
>  Components: runner-py-interactive
>Reporter: Ning Kang
>Assignee: Ning Kang
>Priority: Major
> Fix For: 2.17.0
>
>  Time Spent: 8h 40m
>  Remaining Estimate: 0h
>
> Dataflow needs the capability to tell how many Dataflow jobs are launched 
> from the Notebook environment, i.e., the Interactive Runner.
>  # Change the pipeline.run() API to allow supply a runner and an option 
> parameter so that a pipeline initially bundled w/ an interactive runner can 
> be directly run by other runners from notebook.
>  # Implicitly add the necessary source information through user labels when 
> the user does p.run(runner=DataflowRunner()).



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Work logged] (BEAM-8457) Instrument Dataflow jobs that are launched from Notebooks

2019-11-07 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/BEAM-8457?focusedWorklogId=340125&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-340125
 ]

ASF GitHub Bot logged work on BEAM-8457:


Author: ASF GitHub Bot
Created on: 07/Nov/19 20:03
Start Date: 07/Nov/19 20:03
Worklog Time Spent: 10m 
  Work Description: KevinGG commented on pull request #9885: [BEAM-8457] 
Label Dataflow jobs from Notebook
URL: https://github.com/apache/beam/pull/9885#discussion_r343848058
 
 

 ##
 File path: sdks/python/apache_beam/runners/dataflow/dataflow_runner.py
 ##
 @@ -360,6 +360,16 @@ def visit_transform(self, transform_node):
 
   def run_pipeline(self, pipeline, options):
 """Remotely executes entire pipeline or parts reachable from node."""
+# Label goog-dataflow-notebook if pipeline is initiated from interactive
+# runner.
+if pipeline.interactive:
 
 Review comment:
   Discussed with David and Sam. Since we also want to track jobs started from 
notebook even if the user never uses `InteractiveRunner`, checking the 
environment might just be the only way to do it.
   By putting the logic into a try-except block as it is, we could avoid 
introducing `ipython` dependency into `DataflowRunner`. If the `[interactive]` 
dependency is never installed and current execution_path has never imported 
`ipython`, the code would just never be executed.
   
   I'll move the logic into a standalone utility module and import it in 
DataflowRunner to do the check.
 

This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 340125)
Time Spent: 8.5h  (was: 8h 20m)

> Instrument Dataflow jobs that are launched from Notebooks
> -
>
> Key: BEAM-8457
> URL: https://issues.apache.org/jira/browse/BEAM-8457
> Project: Beam
>  Issue Type: Improvement
>  Components: runner-py-interactive
>Reporter: Ning Kang
>Assignee: Ning Kang
>Priority: Major
> Fix For: 2.17.0
>
>  Time Spent: 8.5h
>  Remaining Estimate: 0h
>
> Dataflow needs the capability to tell how many Dataflow jobs are launched 
> from the Notebook environment, i.e., the Interactive Runner.
>  # Change the pipeline.run() API to allow supply a runner and an option 
> parameter so that a pipeline initially bundled w/ an interactive runner can 
> be directly run by other runners from notebook.
>  # Implicitly add the necessary source information through user labels when 
> the user does p.run(runner=DataflowRunner()).



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Work logged] (BEAM-8457) Instrument Dataflow jobs that are launched from Notebooks

2019-11-07 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/BEAM-8457?focusedWorklogId=340120&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-340120
 ]

ASF GitHub Bot logged work on BEAM-8457:


Author: ASF GitHub Bot
Created on: 07/Nov/19 19:57
Start Date: 07/Nov/19 19:57
Worklog Time Spent: 10m 
  Work Description: KevinGG commented on pull request #9885: [BEAM-8457] 
Label Dataflow jobs from Notebook
URL: https://github.com/apache/beam/pull/9885#discussion_r343845181
 
 

 ##
 File path: sdks/python/apache_beam/pipeline.py
 ##
 @@ -396,28 +400,57 @@ def replace_all(self, replacements):
 for override in replacements:
   self._check_replacement(override)
 
-  def run(self, test_runner_api=True):
-"""Runs the pipeline. Returns whatever our runner returns after running."""
-
+  def run(self, test_runner_api=True, runner=None, options=None,
 
 Review comment:
   Putting this discussion on next Monday's agenda and will remove changes to 
the API.
 

This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 340120)
Time Spent: 8h 20m  (was: 8h 10m)

> Instrument Dataflow jobs that are launched from Notebooks
> -
>
> Key: BEAM-8457
> URL: https://issues.apache.org/jira/browse/BEAM-8457
> Project: Beam
>  Issue Type: Improvement
>  Components: runner-py-interactive
>Reporter: Ning Kang
>Assignee: Ning Kang
>Priority: Major
> Fix For: 2.17.0
>
>  Time Spent: 8h 20m
>  Remaining Estimate: 0h
>
> Dataflow needs the capability to tell how many Dataflow jobs are launched 
> from the Notebook environment, i.e., the Interactive Runner.
>  # Change the pipeline.run() API to allow supply a runner and an option 
> parameter so that a pipeline initially bundled w/ an interactive runner can 
> be directly run by other runners from notebook.
>  # Implicitly add the necessary source information through user labels when 
> the user does p.run(runner=DataflowRunner()).



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Work logged] (BEAM-8457) Instrument Dataflow jobs that are launched from Notebooks

2019-11-07 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/BEAM-8457?focusedWorklogId=340119&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-340119
 ]

ASF GitHub Bot logged work on BEAM-8457:


Author: ASF GitHub Bot
Created on: 07/Nov/19 19:56
Start Date: 07/Nov/19 19:56
Worklog Time Spent: 10m 
  Work Description: KevinGG commented on pull request #9885: [BEAM-8457] 
Label Dataflow jobs from Notebook
URL: https://github.com/apache/beam/pull/9885#discussion_r343844948
 
 

 ##
 File path: sdks/python/apache_beam/pipeline.py
 ##
 @@ -172,6 +172,10 @@ def __init__(self, runner=None, options=None, argv=None):
 # If a transform is applied and the full label is already in the set
 # then the transform will have to be cloned with a new label.
 self.applied_labels = set()
+# A boolean value indicating whether the pipeline is created in an
+# interactive environment such as interactive notebooks. Initialized as
+# None. The value is set ad hoc when `pipeline.run()` is invoked.
+self.interactive = None
 
 Review comment:
   I'll go with the check environment route and make it a standalone utility 
module in the interactive package.
 

This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 340119)
Time Spent: 8h 10m  (was: 8h)

> Instrument Dataflow jobs that are launched from Notebooks
> -
>
> Key: BEAM-8457
> URL: https://issues.apache.org/jira/browse/BEAM-8457
> Project: Beam
>  Issue Type: Improvement
>  Components: runner-py-interactive
>Reporter: Ning Kang
>Assignee: Ning Kang
>Priority: Major
> Fix For: 2.17.0
>
>  Time Spent: 8h 10m
>  Remaining Estimate: 0h
>
> Dataflow needs the capability to tell how many Dataflow jobs are launched 
> from the Notebook environment, i.e., the Interactive Runner.
>  # Change the pipeline.run() API to allow supply a runner and an option 
> parameter so that a pipeline initially bundled w/ an interactive runner can 
> be directly run by other runners from notebook.
>  # Implicitly add the necessary source information through user labels when 
> the user does p.run(runner=DataflowRunner()).



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Work logged] (BEAM-8457) Instrument Dataflow jobs that are launched from Notebooks

2019-11-06 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/BEAM-8457?focusedWorklogId=339650&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-339650
 ]

ASF GitHub Bot logged work on BEAM-8457:


Author: ASF GitHub Bot
Created on: 07/Nov/19 00:44
Start Date: 07/Nov/19 00:44
Worklog Time Spent: 10m 
  Work Description: aaltay commented on pull request #9885: [BEAM-8457] 
Label Dataflow jobs from Notebook
URL: https://github.com/apache/beam/pull/9885#discussion_r343401907
 
 

 ##
 File path: sdks/python/apache_beam/pipeline.py
 ##
 @@ -396,28 +400,57 @@ def replace_all(self, replacements):
 for override in replacements:
   self._check_replacement(override)
 
-  def run(self, test_runner_api=True):
-"""Runs the pipeline. Returns whatever our runner returns after running."""
-
+  def run(self, test_runner_api=True, runner=None, options=None,
 
 Review comment:
   > Do you think we should put it into a separate PR,
   
   Yes. At least let's have a seperate discussion for API changes like this.
   
   >  or simply not supporting it at all?
   
   Maybe not. This could be just a override for the interactive runners run() 
(e.g. run_with(NewRunner, NewOptions). At least, let's discuss with all 
stakeholders.
 

This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 339650)
Time Spent: 8h  (was: 7h 50m)

> Instrument Dataflow jobs that are launched from Notebooks
> -
>
> Key: BEAM-8457
> URL: https://issues.apache.org/jira/browse/BEAM-8457
> Project: Beam
>  Issue Type: Improvement
>  Components: runner-py-interactive
>Reporter: Ning Kang
>Assignee: Ning Kang
>Priority: Major
> Fix For: 2.17.0
>
>  Time Spent: 8h
>  Remaining Estimate: 0h
>
> Dataflow needs the capability to tell how many Dataflow jobs are launched 
> from the Notebook environment, i.e., the Interactive Runner.
>  # Change the pipeline.run() API to allow supply a runner and an option 
> parameter so that a pipeline initially bundled w/ an interactive runner can 
> be directly run by other runners from notebook.
>  # Implicitly add the necessary source information through user labels when 
> the user does p.run(runner=DataflowRunner()).



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Work logged] (BEAM-8457) Instrument Dataflow jobs that are launched from Notebooks

2019-11-06 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/BEAM-8457?focusedWorklogId=339649&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-339649
 ]

ASF GitHub Bot logged work on BEAM-8457:


Author: ASF GitHub Bot
Created on: 07/Nov/19 00:41
Start Date: 07/Nov/19 00:41
Worklog Time Spent: 10m 
  Work Description: aaltay commented on pull request #9885: [BEAM-8457] 
Label Dataflow jobs from Notebook
URL: https://github.com/apache/beam/pull/9885#discussion_r343401313
 
 

 ##
 File path: sdks/python/apache_beam/runners/dataflow/dataflow_runner.py
 ##
 @@ -360,6 +360,16 @@ def visit_transform(self, transform_node):
 
   def run_pipeline(self, pipeline, options):
 """Remotely executes entire pipeline or parts reachable from node."""
+# Label goog-dataflow-notebook if pipeline is initiated from interactive
+# runner.
+if pipeline.interactive:
 
 Review comment:
   Could we move 
https://github.com/apache/beam/blob/master/sdks/python/apache_beam/runners/interactive/interactive_environment.py#L100:L102
 to a common utility function, and each runner if they want could call this 
without worrying about require additional imports?
 

This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 339649)
Time Spent: 7h 50m  (was: 7h 40m)

> Instrument Dataflow jobs that are launched from Notebooks
> -
>
> Key: BEAM-8457
> URL: https://issues.apache.org/jira/browse/BEAM-8457
> Project: Beam
>  Issue Type: Improvement
>  Components: runner-py-interactive
>Reporter: Ning Kang
>Assignee: Ning Kang
>Priority: Major
> Fix For: 2.17.0
>
>  Time Spent: 7h 50m
>  Remaining Estimate: 0h
>
> Dataflow needs the capability to tell how many Dataflow jobs are launched 
> from the Notebook environment, i.e., the Interactive Runner.
>  # Change the pipeline.run() API to allow supply a runner and an option 
> parameter so that a pipeline initially bundled w/ an interactive runner can 
> be directly run by other runners from notebook.
>  # Implicitly add the necessary source information through user labels when 
> the user does p.run(runner=DataflowRunner()).



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Work logged] (BEAM-8457) Instrument Dataflow jobs that are launched from Notebooks

2019-11-05 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/BEAM-8457?focusedWorklogId=339085&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-339085
 ]

ASF GitHub Bot logged work on BEAM-8457:


Author: ASF GitHub Bot
Created on: 06/Nov/19 00:22
Start Date: 06/Nov/19 00:22
Worklog Time Spent: 10m 
  Work Description: KevinGG commented on pull request #9885: [BEAM-8457] 
Label Dataflow jobs from Notebook
URL: https://github.com/apache/beam/pull/9885#discussion_r342858336
 
 

 ##
 File path: sdks/python/apache_beam/runners/dataflow/dataflow_runner.py
 ##
 @@ -360,6 +360,16 @@ def visit_transform(self, transform_node):
 
   def run_pipeline(self, pipeline, options):
 """Remotely executes entire pipeline or parts reachable from node."""
+# Label goog-dataflow-notebook if pipeline is initiated from interactive
+# runner.
+if pipeline.interactive:
 
 Review comment:
   I see your point! Yes, I have the 
[capability](https://github.com/apache/beam/blob/master/sdks/python/apache_beam/runners/interactive/interactive_environment.py#L131)
 to check if current interpreted code is in a notebook or not. This branch will 
need a rebase against master to take those changes.
   
   To clarify the process:
   When a DataflowRunner tries to run a job from a given pipeline,
   1. Check if the module `interactive_environment` is imported by checking the 
`sys.modules` dictionary;
   2. Check if `current_env().is_in_notebook`;
   3. If yes, label the job. 
   
   I think we have a little bit trade off here:
   1.  What we have here: Determining if the job is started from a pipeline 
that was originally bundled with an Interactive Runner.
   Doing it with string comparison, we don't introduce new dependency into 
DataflowRunner.
   2. Deduce if the job is started from a notebook environment.
   We'll introduce [interactive] dependencies including at least ipython into 
DataflowRunner.
   This will label Dataflow jobs from any pipeline originally bundled with 
arbitrary runner in any kind of ipython-notebook as long as 
`interactive_environment` module in `interactive` package has been 
(transitively) imported but not necessarily used.
   
 

This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 339085)
Time Spent: 7h 40m  (was: 7.5h)

> Instrument Dataflow jobs that are launched from Notebooks
> -
>
> Key: BEAM-8457
> URL: https://issues.apache.org/jira/browse/BEAM-8457
> Project: Beam
>  Issue Type: Improvement
>  Components: runner-py-interactive
>Reporter: Ning Kang
>Assignee: Ning Kang
>Priority: Major
> Fix For: 2.17.0
>
>  Time Spent: 7h 40m
>  Remaining Estimate: 0h
>
> Dataflow needs the capability to tell how many Dataflow jobs are launched 
> from the Notebook environment, i.e., the Interactive Runner.
>  # Change the pipeline.run() API to allow supply a runner and an option 
> parameter so that a pipeline initially bundled w/ an interactive runner can 
> be directly run by other runners from notebook.
>  # Implicitly add the necessary source information through user labels when 
> the user does p.run(runner=DataflowRunner()).



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Work logged] (BEAM-8457) Instrument Dataflow jobs that are launched from Notebooks

2019-11-05 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/BEAM-8457?focusedWorklogId=339084&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-339084
 ]

ASF GitHub Bot logged work on BEAM-8457:


Author: ASF GitHub Bot
Created on: 06/Nov/19 00:20
Start Date: 06/Nov/19 00:20
Worklog Time Spent: 10m 
  Work Description: KevinGG commented on pull request #9885: [BEAM-8457] 
Label Dataflow jobs from Notebook
URL: https://github.com/apache/beam/pull/9885#discussion_r342858336
 
 

 ##
 File path: sdks/python/apache_beam/runners/dataflow/dataflow_runner.py
 ##
 @@ -360,6 +360,16 @@ def visit_transform(self, transform_node):
 
   def run_pipeline(self, pipeline, options):
 """Remotely executes entire pipeline or parts reachable from node."""
+# Label goog-dataflow-notebook if pipeline is initiated from interactive
+# runner.
+if pipeline.interactive:
 
 Review comment:
   I see your point! Yes, I have the 
[capability](https://github.com/apache/beam/blob/master/sdks/python/apache_beam/runners/interactive/interactive_environment.py#L131)
 to check if current interpreted code is in a notebook or not. This branch will 
need a rebase against master to take those changes.
   
   To clarify the process:
   When a DataflowRunner tries to run a job from a given pipeline,
   1. Check if the module `interactive_environment` is imported by checking the 
`sys.modules` dictionary;
   2. Check if `current_env().is_in_notebook`;
   3. If yes, label the job. 
   
   I think we have a little bit trade off here:
   1.  What we have here: Determining if the job is started from a pipeline 
that was originally bundled with an Interactive Runner.
   Doing it with string comparison, we don't introduce new dependency into 
DataflowRunner.
   2. Deduce if the job is started from a notebook environment.
   We'll introduce [interactive] dependencies including at least ipython into 
DataflowRunner.
   This will label Dataflow jobs from any pipeline originally bundled with 
arbitrary runner in any kind of ipython-notebook as long as 
`interactive_environment` module in `interactive` package has been 
(transitively) imported.
   
 

This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 339084)
Time Spent: 7.5h  (was: 7h 20m)

> Instrument Dataflow jobs that are launched from Notebooks
> -
>
> Key: BEAM-8457
> URL: https://issues.apache.org/jira/browse/BEAM-8457
> Project: Beam
>  Issue Type: Improvement
>  Components: runner-py-interactive
>Reporter: Ning Kang
>Assignee: Ning Kang
>Priority: Major
> Fix For: 2.17.0
>
>  Time Spent: 7.5h
>  Remaining Estimate: 0h
>
> Dataflow needs the capability to tell how many Dataflow jobs are launched 
> from the Notebook environment, i.e., the Interactive Runner.
>  # Change the pipeline.run() API to allow supply a runner and an option 
> parameter so that a pipeline initially bundled w/ an interactive runner can 
> be directly run by other runners from notebook.
>  # Implicitly add the necessary source information through user labels when 
> the user does p.run(runner=DataflowRunner()).



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Work logged] (BEAM-8457) Instrument Dataflow jobs that are launched from Notebooks

2019-11-05 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/BEAM-8457?focusedWorklogId=339083&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-339083
 ]

ASF GitHub Bot logged work on BEAM-8457:


Author: ASF GitHub Bot
Created on: 06/Nov/19 00:17
Start Date: 06/Nov/19 00:17
Worklog Time Spent: 10m 
  Work Description: KevinGG commented on pull request #9885: [BEAM-8457] 
Label Dataflow jobs from Notebook
URL: https://github.com/apache/beam/pull/9885#discussion_r342855200
 
 

 ##
 File path: sdks/python/apache_beam/pipeline.py
 ##
 @@ -172,6 +172,10 @@ def __init__(self, runner=None, options=None, argv=None):
 # If a transform is applied and the full label is already in the set
 # then the transform will have to be cloned with a new label.
 self.applied_labels = set()
+# A boolean value indicating whether the pipeline is created in an
+# interactive environment such as interactive notebooks. Initialized as
+# None. The value is set ad hoc when `pipeline.run()` is invoked.
+self.interactive = None
 
 Review comment:
   Thanks!
   If we track `interactive` as a property of runner, we cannot implicitly pass 
along the property from runner to runner.
   
   And if we deduce `interactive` from the environment, we'll introduce new 
dependencies into DataflowRunner. See below comment.
   
   
 

This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 339083)
Time Spent: 7h 20m  (was: 7h 10m)

> Instrument Dataflow jobs that are launched from Notebooks
> -
>
> Key: BEAM-8457
> URL: https://issues.apache.org/jira/browse/BEAM-8457
> Project: Beam
>  Issue Type: Improvement
>  Components: runner-py-interactive
>Reporter: Ning Kang
>Assignee: Ning Kang
>Priority: Major
> Fix For: 2.17.0
>
>  Time Spent: 7h 20m
>  Remaining Estimate: 0h
>
> Dataflow needs the capability to tell how many Dataflow jobs are launched 
> from the Notebook environment, i.e., the Interactive Runner.
>  # Change the pipeline.run() API to allow supply a runner and an option 
> parameter so that a pipeline initially bundled w/ an interactive runner can 
> be directly run by other runners from notebook.
>  # Implicitly add the necessary source information through user labels when 
> the user does p.run(runner=DataflowRunner()).



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Work logged] (BEAM-8457) Instrument Dataflow jobs that are launched from Notebooks

2019-11-05 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/BEAM-8457?focusedWorklogId=339082&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-339082
 ]

ASF GitHub Bot logged work on BEAM-8457:


Author: ASF GitHub Bot
Created on: 06/Nov/19 00:16
Start Date: 06/Nov/19 00:16
Worklog Time Spent: 10m 
  Work Description: KevinGG commented on pull request #9885: [BEAM-8457] 
Label Dataflow jobs from Notebook
URL: https://github.com/apache/beam/pull/9885#discussion_r342858336
 
 

 ##
 File path: sdks/python/apache_beam/runners/dataflow/dataflow_runner.py
 ##
 @@ -360,6 +360,16 @@ def visit_transform(self, transform_node):
 
   def run_pipeline(self, pipeline, options):
 """Remotely executes entire pipeline or parts reachable from node."""
+# Label goog-dataflow-notebook if pipeline is initiated from interactive
+# runner.
+if pipeline.interactive:
 
 Review comment:
   I see your point! Yes, I have the 
[capability](https://github.com/apache/beam/blob/master/sdks/python/apache_beam/runners/interactive/interactive_environment.py#L131)
 to check if current interpreted code is in a notebook or not. This branch will 
need a rebase against master to take those changes.
   
   To clartify the process:
   When a DataflowRunner tries to run a job from a given pipeline,
   1. Check if the module `interactive_environment` is imported by checking the 
`sys.modules` dictionary;
   2. Check if `current_env().is_in_notebook`;
   3. If yes, label the job. 
   
   I think we have a little bit trade off here:
   1.  What we have here: Determining if the job is started from a pipeline 
that was originally bundled with an Interactive Runner.
   Doing it with string comparison, we don't introduce new dependency into 
DataflowRunner.
   2. Deduce if the job is started from a notebook environment.
   We'll introduce [interactive] dependencies including at least ipython into 
DataflowRunner.
   This will label Dataflow jobs from any pipeline originally bundled with 
arbitrary runner in any kind of ipython-notebook as long as 
`interactive_environment` module in `interactive` package has been 
(transitively) imported.
   
 

This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 339082)
Time Spent: 7h 10m  (was: 7h)

> Instrument Dataflow jobs that are launched from Notebooks
> -
>
> Key: BEAM-8457
> URL: https://issues.apache.org/jira/browse/BEAM-8457
> Project: Beam
>  Issue Type: Improvement
>  Components: runner-py-interactive
>Reporter: Ning Kang
>Assignee: Ning Kang
>Priority: Major
> Fix For: 2.17.0
>
>  Time Spent: 7h 10m
>  Remaining Estimate: 0h
>
> Dataflow needs the capability to tell how many Dataflow jobs are launched 
> from the Notebook environment, i.e., the Interactive Runner.
>  # Change the pipeline.run() API to allow supply a runner and an option 
> parameter so that a pipeline initially bundled w/ an interactive runner can 
> be directly run by other runners from notebook.
>  # Implicitly add the necessary source information through user labels when 
> the user does p.run(runner=DataflowRunner()).



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Work logged] (BEAM-8457) Instrument Dataflow jobs that are launched from Notebooks

2019-11-05 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/BEAM-8457?focusedWorklogId=339081&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-339081
 ]

ASF GitHub Bot logged work on BEAM-8457:


Author: ASF GitHub Bot
Created on: 06/Nov/19 00:14
Start Date: 06/Nov/19 00:14
Worklog Time Spent: 10m 
  Work Description: KevinGG commented on pull request #9885: [BEAM-8457] 
Label Dataflow jobs from Notebook
URL: https://github.com/apache/beam/pull/9885#discussion_r342858336
 
 

 ##
 File path: sdks/python/apache_beam/runners/dataflow/dataflow_runner.py
 ##
 @@ -360,6 +360,16 @@ def visit_transform(self, transform_node):
 
   def run_pipeline(self, pipeline, options):
 """Remotely executes entire pipeline or parts reachable from node."""
+# Label goog-dataflow-notebook if pipeline is initiated from interactive
+# runner.
+if pipeline.interactive:
 
 Review comment:
   I see your point! Yes, I have the 
[capability](https://github.com/apache/beam/blob/master/sdks/python/apache_beam/runners/interactive/interactive_environment.py#L131)
 to check if current interpreted code is in a notebook or not. This branch will 
need a rebase against master to take those changes.
   
   To clartify the process:
   When a DataflowRunner tries to run a job from a given pipeline,
   1. Check if the module `interactive_environment` is imported by checking the 
`sys.modules` dictionary;
   2. Check if `current_env().is_in_notebook`;
   3. If yes, label the job. 
   
   I think we have a little bit trade off here:
   1.  What we have here: Determining if the job is started from a pipeline 
that was originally bundled with an Interactive Runner.
   Doing it with string comparison
   2. Deduce if the job is started from a notebook environment.
   We'll introduce [interactive] dependencies including ipython into 
DataflowRunner.
   
   This will label Dataflow jobs from any pipeline originally bundled with 
arbitrary runner in any kind of ipython-notebook as long as 
`interactive_environment` module in `interactive` package has been 
(transitively) imported.
   
 

This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 339081)
Time Spent: 7h  (was: 6h 50m)

> Instrument Dataflow jobs that are launched from Notebooks
> -
>
> Key: BEAM-8457
> URL: https://issues.apache.org/jira/browse/BEAM-8457
> Project: Beam
>  Issue Type: Improvement
>  Components: runner-py-interactive
>Reporter: Ning Kang
>Assignee: Ning Kang
>Priority: Major
> Fix For: 2.17.0
>
>  Time Spent: 7h
>  Remaining Estimate: 0h
>
> Dataflow needs the capability to tell how many Dataflow jobs are launched 
> from the Notebook environment, i.e., the Interactive Runner.
>  # Change the pipeline.run() API to allow supply a runner and an option 
> parameter so that a pipeline initially bundled w/ an interactive runner can 
> be directly run by other runners from notebook.
>  # Implicitly add the necessary source information through user labels when 
> the user does p.run(runner=DataflowRunner()).



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Work logged] (BEAM-8457) Instrument Dataflow jobs that are launched from Notebooks

2019-11-05 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/BEAM-8457?focusedWorklogId=339080&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-339080
 ]

ASF GitHub Bot logged work on BEAM-8457:


Author: ASF GitHub Bot
Created on: 06/Nov/19 00:12
Start Date: 06/Nov/19 00:12
Worklog Time Spent: 10m 
  Work Description: KevinGG commented on pull request #9885: [BEAM-8457] 
Label Dataflow jobs from Notebook
URL: https://github.com/apache/beam/pull/9885#discussion_r342855200
 
 

 ##
 File path: sdks/python/apache_beam/pipeline.py
 ##
 @@ -172,6 +172,10 @@ def __init__(self, runner=None, options=None, argv=None):
 # If a transform is applied and the full label is already in the set
 # then the transform will have to be cloned with a new label.
 self.applied_labels = set()
+# A boolean value indicating whether the pipeline is created in an
+# interactive environment such as interactive notebooks. Initialized as
+# None. The value is set ad hoc when `pipeline.run()` is invoked.
+self.interactive = None
 
 Review comment:
   Thanks!
   If we track `interactive` as a property of runner, we cannot implicitly pass 
along the property from runner to runner.
   
   
   
 

This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 339080)
Time Spent: 6h 50m  (was: 6h 40m)

> Instrument Dataflow jobs that are launched from Notebooks
> -
>
> Key: BEAM-8457
> URL: https://issues.apache.org/jira/browse/BEAM-8457
> Project: Beam
>  Issue Type: Improvement
>  Components: runner-py-interactive
>Reporter: Ning Kang
>Assignee: Ning Kang
>Priority: Major
> Fix For: 2.17.0
>
>  Time Spent: 6h 50m
>  Remaining Estimate: 0h
>
> Dataflow needs the capability to tell how many Dataflow jobs are launched 
> from the Notebook environment, i.e., the Interactive Runner.
>  # Change the pipeline.run() API to allow supply a runner and an option 
> parameter so that a pipeline initially bundled w/ an interactive runner can 
> be directly run by other runners from notebook.
>  # Implicitly add the necessary source information through user labels when 
> the user does p.run(runner=DataflowRunner()).



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Work logged] (BEAM-8457) Instrument Dataflow jobs that are launched from Notebooks

2019-11-05 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/BEAM-8457?focusedWorklogId=339064&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-339064
 ]

ASF GitHub Bot logged work on BEAM-8457:


Author: ASF GitHub Bot
Created on: 06/Nov/19 00:01
Start Date: 06/Nov/19 00:01
Worklog Time Spent: 10m 
  Work Description: KevinGG commented on pull request #9885: [BEAM-8457] 
Label Dataflow jobs from Notebook
URL: https://github.com/apache/beam/pull/9885#discussion_r342858336
 
 

 ##
 File path: sdks/python/apache_beam/runners/dataflow/dataflow_runner.py
 ##
 @@ -360,6 +360,16 @@ def visit_transform(self, transform_node):
 
   def run_pipeline(self, pipeline, options):
 """Remotely executes entire pipeline or parts reachable from node."""
+# Label goog-dataflow-notebook if pipeline is initiated from interactive
+# runner.
+if pipeline.interactive:
 
 Review comment:
   I see your point! Yes, I have the 
[capability](https://github.com/apache/beam/blob/master/sdks/python/apache_beam/runners/interactive/interactive_environment.py#L131)
 to check if current interpreted code is in a notebook or not. This branch will 
need a rebase against master to take those changes.
   
   To clartify the process:
   When a DataflowRunner tries to run a job from a given pipeline,
   1. Check if the module `interactive_environment` is imported by checking the 
`sys.modules` dictionary;
   2. Check if `current_env().is_in_notebook`;
   3. If yes, label the job. 
   
   This will label Dataflow jobs from any pipeline originally bundled with 
arbitrary runner in any kind of ipython-notebook as long as 
`interactive_environment` module in `interactive` package has been 
(transitively) imported.
   
 

This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 339064)
Time Spent: 6h 40m  (was: 6.5h)

> Instrument Dataflow jobs that are launched from Notebooks
> -
>
> Key: BEAM-8457
> URL: https://issues.apache.org/jira/browse/BEAM-8457
> Project: Beam
>  Issue Type: Improvement
>  Components: runner-py-interactive
>Reporter: Ning Kang
>Assignee: Ning Kang
>Priority: Major
> Fix For: 2.17.0
>
>  Time Spent: 6h 40m
>  Remaining Estimate: 0h
>
> Dataflow needs the capability to tell how many Dataflow jobs are launched 
> from the Notebook environment, i.e., the Interactive Runner.
>  # Change the pipeline.run() API to allow supply a runner and an option 
> parameter so that a pipeline initially bundled w/ an interactive runner can 
> be directly run by other runners from notebook.
>  # Implicitly add the necessary source information through user labels when 
> the user does p.run(runner=DataflowRunner()).



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Work logged] (BEAM-8457) Instrument Dataflow jobs that are launched from Notebooks

2019-11-05 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/BEAM-8457?focusedWorklogId=339061&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-339061
 ]

ASF GitHub Bot logged work on BEAM-8457:


Author: ASF GitHub Bot
Created on: 05/Nov/19 23:57
Start Date: 05/Nov/19 23:57
Worklog Time Spent: 10m 
  Work Description: KevinGG commented on pull request #9885: [BEAM-8457] 
Label Dataflow jobs from Notebook
URL: https://github.com/apache/beam/pull/9885#discussion_r342858336
 
 

 ##
 File path: sdks/python/apache_beam/runners/dataflow/dataflow_runner.py
 ##
 @@ -360,6 +360,16 @@ def visit_transform(self, transform_node):
 
   def run_pipeline(self, pipeline, options):
 """Remotely executes entire pipeline or parts reachable from node."""
+# Label goog-dataflow-notebook if pipeline is initiated from interactive
+# runner.
+if pipeline.interactive:
 
 Review comment:
   I see your point! Yes, I have the 
[capability](https://github.com/apache/beam/blob/master/sdks/python/apache_beam/runners/interactive/interactive_environment.py#L131)
 to check if current interpreted code is in a notebook or not. This branch will 
need a rebase against master to take those changes.
   
   To clartify the process:
   When a DataflowRunner tries to run a job from a given pipeline,
   1. Check if the module `interactive_environment` is imported by checking the 
`sys.modules` dictionary;
   2. Check if `current_env().is_in_notebook`;
   3. If yes, label the job. 
   
   
 

This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 339061)
Time Spent: 6.5h  (was: 6h 20m)

> Instrument Dataflow jobs that are launched from Notebooks
> -
>
> Key: BEAM-8457
> URL: https://issues.apache.org/jira/browse/BEAM-8457
> Project: Beam
>  Issue Type: Improvement
>  Components: runner-py-interactive
>Reporter: Ning Kang
>Assignee: Ning Kang
>Priority: Major
> Fix For: 2.17.0
>
>  Time Spent: 6.5h
>  Remaining Estimate: 0h
>
> Dataflow needs the capability to tell how many Dataflow jobs are launched 
> from the Notebook environment, i.e., the Interactive Runner.
>  # Change the pipeline.run() API to allow supply a runner and an option 
> parameter so that a pipeline initially bundled w/ an interactive runner can 
> be directly run by other runners from notebook.
>  # Implicitly add the necessary source information through user labels when 
> the user does p.run(runner=DataflowRunner()).



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Work logged] (BEAM-8457) Instrument Dataflow jobs that are launched from Notebooks

2019-11-05 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/BEAM-8457?focusedWorklogId=339054&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-339054
 ]

ASF GitHub Bot logged work on BEAM-8457:


Author: ASF GitHub Bot
Created on: 05/Nov/19 23:48
Start Date: 05/Nov/19 23:48
Worklog Time Spent: 10m 
  Work Description: KevinGG commented on pull request #9885: [BEAM-8457] 
Label Dataflow jobs from Notebook
URL: https://github.com/apache/beam/pull/9885#discussion_r342855715
 
 

 ##
 File path: sdks/python/apache_beam/pipeline.py
 ##
 @@ -396,28 +400,57 @@ def replace_all(self, replacements):
 for override in replacements:
   self._check_replacement(override)
 
-  def run(self, test_runner_api=True):
-"""Runs the pipeline. Returns whatever our runner returns after running."""
-
+  def run(self, test_runner_api=True, runner=None, options=None,
 
 Review comment:
   IIRC, we want to allow the user to switch to `DataflowRunner` using the 
`p.run()` pattern instead of limiting the user to `Runner().run_pipeline(p, 
options)`.
   
   Do you think we should put it into a separate PR, or simply not supporting 
it at all?
 

This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 339054)
Time Spent: 6h 20m  (was: 6h 10m)

> Instrument Dataflow jobs that are launched from Notebooks
> -
>
> Key: BEAM-8457
> URL: https://issues.apache.org/jira/browse/BEAM-8457
> Project: Beam
>  Issue Type: Improvement
>  Components: runner-py-interactive
>Reporter: Ning Kang
>Assignee: Ning Kang
>Priority: Major
> Fix For: 2.17.0
>
>  Time Spent: 6h 20m
>  Remaining Estimate: 0h
>
> Dataflow needs the capability to tell how many Dataflow jobs are launched 
> from the Notebook environment, i.e., the Interactive Runner.
>  # Change the pipeline.run() API to allow supply a runner and an option 
> parameter so that a pipeline initially bundled w/ an interactive runner can 
> be directly run by other runners from notebook.
>  # Implicitly add the necessary source information through user labels when 
> the user does p.run(runner=DataflowRunner()).



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Work logged] (BEAM-8457) Instrument Dataflow jobs that are launched from Notebooks

2019-11-05 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/BEAM-8457?focusedWorklogId=339053&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-339053
 ]

ASF GitHub Bot logged work on BEAM-8457:


Author: ASF GitHub Bot
Created on: 05/Nov/19 23:46
Start Date: 05/Nov/19 23:46
Worklog Time Spent: 10m 
  Work Description: KevinGG commented on pull request #9885: [BEAM-8457] 
Label Dataflow jobs from Notebook
URL: https://github.com/apache/beam/pull/9885#discussion_r342855200
 
 

 ##
 File path: sdks/python/apache_beam/pipeline.py
 ##
 @@ -172,6 +172,10 @@ def __init__(self, runner=None, options=None, argv=None):
 # If a transform is applied and the full label is already in the set
 # then the transform will have to be cloned with a new label.
 self.applied_labels = set()
+# A boolean value indicating whether the pipeline is created in an
+# interactive environment such as interactive notebooks. Initialized as
+# None. The value is set ad hoc when `pipeline.run()` is invoked.
+self.interactive = None
 
 Review comment:
   Thanks! I'll go with your suggestion below to deduce if the job is started 
from a notebook environment instead of determining if the job is started from a 
pipeline that was originally bundled with an Interactive Runner.
 

This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 339053)
Time Spent: 6h 10m  (was: 6h)

> Instrument Dataflow jobs that are launched from Notebooks
> -
>
> Key: BEAM-8457
> URL: https://issues.apache.org/jira/browse/BEAM-8457
> Project: Beam
>  Issue Type: Improvement
>  Components: runner-py-interactive
>Reporter: Ning Kang
>Assignee: Ning Kang
>Priority: Major
> Fix For: 2.17.0
>
>  Time Spent: 6h 10m
>  Remaining Estimate: 0h
>
> Dataflow needs the capability to tell how many Dataflow jobs are launched 
> from the Notebook environment, i.e., the Interactive Runner.
>  # Change the pipeline.run() API to allow supply a runner and an option 
> parameter so that a pipeline initially bundled w/ an interactive runner can 
> be directly run by other runners from notebook.
>  # Implicitly add the necessary source information through user labels when 
> the user does p.run(runner=DataflowRunner()).



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Work logged] (BEAM-8457) Instrument Dataflow jobs that are launched from Notebooks

2019-11-05 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/BEAM-8457?focusedWorklogId=339031&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-339031
 ]

ASF GitHub Bot logged work on BEAM-8457:


Author: ASF GitHub Bot
Created on: 05/Nov/19 22:32
Start Date: 05/Nov/19 22:32
Worklog Time Spent: 10m 
  Work Description: aaltay commented on pull request #9885: [BEAM-8457] 
Label Dataflow jobs from Notebook
URL: https://github.com/apache/beam/pull/9885#discussion_r342831745
 
 

 ##
 File path: sdks/python/apache_beam/pipeline.py
 ##
 @@ -396,28 +400,57 @@ def replace_all(self, replacements):
 for override in replacements:
   self._check_replacement(override)
 
-  def run(self, test_runner_api=True):
-"""Runs the pipeline. Returns whatever our runner returns after running."""
-
+  def run(self, test_runner_api=True, runner=None, options=None,
 
 Review comment:
   Why are we adding runner and options parameters here?
 

This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 339031)
Time Spent: 5h 50m  (was: 5h 40m)

> Instrument Dataflow jobs that are launched from Notebooks
> -
>
> Key: BEAM-8457
> URL: https://issues.apache.org/jira/browse/BEAM-8457
> Project: Beam
>  Issue Type: Improvement
>  Components: runner-py-interactive
>Reporter: Ning Kang
>Assignee: Ning Kang
>Priority: Major
> Fix For: 2.17.0
>
>  Time Spent: 5h 50m
>  Remaining Estimate: 0h
>
> Dataflow needs the capability to tell how many Dataflow jobs are launched 
> from the Notebook environment, i.e., the Interactive Runner.
>  # Change the pipeline.run() API to allow supply a runner and an option 
> parameter so that a pipeline initially bundled w/ an interactive runner can 
> be directly run by other runners from notebook.
>  # Implicitly add the necessary source information through user labels when 
> the user does p.run(runner=DataflowRunner()).



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Work logged] (BEAM-8457) Instrument Dataflow jobs that are launched from Notebooks

2019-11-05 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/BEAM-8457?focusedWorklogId=339030&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-339030
 ]

ASF GitHub Bot logged work on BEAM-8457:


Author: ASF GitHub Bot
Created on: 05/Nov/19 22:32
Start Date: 05/Nov/19 22:32
Worklog Time Spent: 10m 
  Work Description: aaltay commented on pull request #9885: [BEAM-8457] 
Label Dataflow jobs from Notebook
URL: https://github.com/apache/beam/pull/9885#discussion_r342830295
 
 

 ##
 File path: sdks/python/apache_beam/pipeline.py
 ##
 @@ -172,6 +172,10 @@ def __init__(self, runner=None, options=None, argv=None):
 # If a transform is applied and the full label is already in the set
 # then the transform will have to be cloned with a new label.
 self.applied_labels = set()
+# A boolean value indicating whether the pipeline is created in an
+# interactive environment such as interactive notebooks. Initialized as
+# None. The value is set ad hoc when `pipeline.run()` is invoked.
+self.interactive = None
 
 Review comment:
   I believe one more suggestion from the previous PR was to not keepting track 
of interactivity as part of the Pipeline but making it a property of the runner.
 

This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 339030)
Time Spent: 5h 40m  (was: 5.5h)

> Instrument Dataflow jobs that are launched from Notebooks
> -
>
> Key: BEAM-8457
> URL: https://issues.apache.org/jira/browse/BEAM-8457
> Project: Beam
>  Issue Type: Improvement
>  Components: runner-py-interactive
>Reporter: Ning Kang
>Assignee: Ning Kang
>Priority: Major
> Fix For: 2.17.0
>
>  Time Spent: 5h 40m
>  Remaining Estimate: 0h
>
> Dataflow needs the capability to tell how many Dataflow jobs are launched 
> from the Notebook environment, i.e., the Interactive Runner.
>  # Change the pipeline.run() API to allow supply a runner and an option 
> parameter so that a pipeline initially bundled w/ an interactive runner can 
> be directly run by other runners from notebook.
>  # Implicitly add the necessary source information through user labels when 
> the user does p.run(runner=DataflowRunner()).



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Work logged] (BEAM-8457) Instrument Dataflow jobs that are launched from Notebooks

2019-11-05 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/BEAM-8457?focusedWorklogId=339032&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-339032
 ]

ASF GitHub Bot logged work on BEAM-8457:


Author: ASF GitHub Bot
Created on: 05/Nov/19 22:32
Start Date: 05/Nov/19 22:32
Worklog Time Spent: 10m 
  Work Description: aaltay commented on pull request #9885: [BEAM-8457] 
Label Dataflow jobs from Notebook
URL: https://github.com/apache/beam/pull/9885#discussion_r342832459
 
 

 ##
 File path: sdks/python/apache_beam/runners/dataflow/dataflow_runner.py
 ##
 @@ -360,6 +360,16 @@ def visit_transform(self, transform_node):
 
   def run_pipeline(self, pipeline, options):
 """Remotely executes entire pipeline or parts reachable from node."""
+# Label goog-dataflow-notebook if pipeline is initiated from interactive
+# runner.
+if pipeline.interactive:
 
 Review comment:
   The change could be limited to:
   - here detect, whether we are in interactive environment or not. (For 
example, check whether certain imports are loaded?) Could also be a utility 
method somewhere, to check whether this in an interactive method.
   - If yes, add the labels.
 

This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 339032)
Time Spent: 6h  (was: 5h 50m)

> Instrument Dataflow jobs that are launched from Notebooks
> -
>
> Key: BEAM-8457
> URL: https://issues.apache.org/jira/browse/BEAM-8457
> Project: Beam
>  Issue Type: Improvement
>  Components: runner-py-interactive
>Reporter: Ning Kang
>Assignee: Ning Kang
>Priority: Major
> Fix For: 2.17.0
>
>  Time Spent: 6h
>  Remaining Estimate: 0h
>
> Dataflow needs the capability to tell how many Dataflow jobs are launched 
> from the Notebook environment, i.e., the Interactive Runner.
>  # Change the pipeline.run() API to allow supply a runner and an option 
> parameter so that a pipeline initially bundled w/ an interactive runner can 
> be directly run by other runners from notebook.
>  # Implicitly add the necessary source information through user labels when 
> the user does p.run(runner=DataflowRunner()).



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Work logged] (BEAM-8457) Instrument Dataflow jobs that are launched from Notebooks

2019-11-03 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/BEAM-8457?focusedWorklogId=337933&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-337933
 ]

ASF GitHub Bot logged work on BEAM-8457:


Author: ASF GitHub Bot
Created on: 04/Nov/19 03:40
Start Date: 04/Nov/19 03:40
Worklog Time Spent: 10m 
  Work Description: pabloem commented on issue #9885: [BEAM-8457] Label 
Dataflow jobs from Notebook
URL: https://github.com/apache/beam/pull/9885#issuecomment-549216682
 
 
   This is pretty much the same PR as before, except it checks the runner via 
string matching. It LGTM.
 

This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 337933)
Time Spent: 5.5h  (was: 5h 20m)

> Instrument Dataflow jobs that are launched from Notebooks
> -
>
> Key: BEAM-8457
> URL: https://issues.apache.org/jira/browse/BEAM-8457
> Project: Beam
>  Issue Type: Improvement
>  Components: runner-py-interactive
>Reporter: Ning Kang
>Assignee: Ning Kang
>Priority: Major
> Fix For: 2.17.0
>
>  Time Spent: 5.5h
>  Remaining Estimate: 0h
>
> Dataflow needs the capability to tell how many Dataflow jobs are launched 
> from the Notebook environment, i.e., the Interactive Runner.
>  # Change the pipeline.run() API to allow supply a runner and an option 
> parameter so that a pipeline initially bundled w/ an interactive runner can 
> be directly run by other runners from notebook.
>  # Implicitly add the necessary source information through user labels when 
> the user does p.run(runner=DataflowRunner()).



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Work logged] (BEAM-8457) Instrument Dataflow jobs that are launched from Notebooks

2019-11-01 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/BEAM-8457?focusedWorklogId=337642&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-337642
 ]

ASF GitHub Bot logged work on BEAM-8457:


Author: ASF GitHub Bot
Created on: 02/Nov/19 01:04
Start Date: 02/Nov/19 01:04
Worklog Time Spent: 10m 
  Work Description: aaltay commented on issue #9885: [BEAM-8457] Label 
Dataflow jobs from Notebook
URL: https://github.com/apache/beam/pull/9885#issuecomment-548995354
 
 
   R: @pabloem could you make a first review pass?
 

This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 337642)
Time Spent: 5h 20m  (was: 5h 10m)

> Instrument Dataflow jobs that are launched from Notebooks
> -
>
> Key: BEAM-8457
> URL: https://issues.apache.org/jira/browse/BEAM-8457
> Project: Beam
>  Issue Type: Improvement
>  Components: runner-py-interactive
>Reporter: Ning Kang
>Assignee: Ning Kang
>Priority: Major
> Fix For: 2.17.0
>
>  Time Spent: 5h 20m
>  Remaining Estimate: 0h
>
> Dataflow needs the capability to tell how many Dataflow jobs are launched 
> from the Notebook environment, i.e., the Interactive Runner.
>  # Change the pipeline.run() API to allow supply a runner and an option 
> parameter so that a pipeline initially bundled w/ an interactive runner can 
> be directly run by other runners from notebook.
>  # Implicitly add the necessary source information through user labels when 
> the user does p.run(runner=DataflowRunner()).



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Work logged] (BEAM-8457) Instrument Dataflow jobs that are launched from Notebooks

2019-10-30 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/BEAM-8457?focusedWorklogId=336366&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-336366
 ]

ASF GitHub Bot logged work on BEAM-8457:


Author: ASF GitHub Bot
Created on: 30/Oct/19 18:32
Start Date: 30/Oct/19 18:32
Worklog Time Spent: 10m 
  Work Description: KevinGG commented on issue #9885: [BEAM-8457] Label 
Dataflow jobs from Notebook
URL: https://github.com/apache/beam/pull/9885#issuecomment-548053063
 
 
   R: @aaltay 
   R: @pabloem 
   Sorry for the previous rollback. I've removed the dependency of 
interactive_runner from pipeline, so there shouldn't be import issues now. 
Thanks!
   
 

This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 336366)
Time Spent: 5h 10m  (was: 5h)

> Instrument Dataflow jobs that are launched from Notebooks
> -
>
> Key: BEAM-8457
> URL: https://issues.apache.org/jira/browse/BEAM-8457
> Project: Beam
>  Issue Type: Improvement
>  Components: runner-py-interactive
>Reporter: Ning Kang
>Assignee: Ning Kang
>Priority: Major
> Fix For: 2.17.0
>
>  Time Spent: 5h 10m
>  Remaining Estimate: 0h
>
> Dataflow needs the capability to tell how many Dataflow jobs are launched 
> from the Notebook environment, i.e., the Interactive Runner.
>  # Change the pipeline.run() API to allow supply a runner and an option 
> parameter so that a pipeline initially bundled w/ an interactive runner can 
> be directly run by other runners from notebook.
>  # Implicitly add the necessary source information through user labels when 
> the user does p.run(runner=DataflowRunner()).



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Work logged] (BEAM-8457) Instrument Dataflow jobs that are launched from Notebooks

2019-10-29 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/BEAM-8457?focusedWorklogId=335801&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-335801
 ]

ASF GitHub Bot logged work on BEAM-8457:


Author: ASF GitHub Bot
Created on: 29/Oct/19 22:40
Start Date: 29/Oct/19 22:40
Worklog Time Spent: 10m 
  Work Description: KevinGG commented on issue #9885: [BEAM-8457] Label 
Dataflow jobs from Notebook
URL: https://github.com/apache/beam/pull/9885#issuecomment-547660969
 
 
   Run Python PreCommit
 

This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 335801)
Time Spent: 5h  (was: 4h 50m)

> Instrument Dataflow jobs that are launched from Notebooks
> -
>
> Key: BEAM-8457
> URL: https://issues.apache.org/jira/browse/BEAM-8457
> Project: Beam
>  Issue Type: Improvement
>  Components: runner-py-interactive
>Reporter: Ning Kang
>Assignee: Ning Kang
>Priority: Major
> Fix For: 2.17.0
>
>  Time Spent: 5h
>  Remaining Estimate: 0h
>
> Dataflow needs the capability to tell how many Dataflow jobs are launched 
> from the Notebook environment, i.e., the Interactive Runner.
>  # Change the pipeline.run() API to allow supply a runner and an option 
> parameter so that a pipeline initially bundled w/ an interactive runner can 
> be directly run by other runners from notebook.
>  # Implicitly add the necessary source information through user labels when 
> the user does p.run(runner=DataflowRunner()).



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Work logged] (BEAM-8457) Instrument Dataflow jobs that are launched from Notebooks

2019-10-29 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/BEAM-8457?focusedWorklogId=335684&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-335684
 ]

ASF GitHub Bot logged work on BEAM-8457:


Author: ASF GitHub Bot
Created on: 29/Oct/19 18:41
Start Date: 29/Oct/19 18:41
Worklog Time Spent: 10m 
  Work Description: KevinGG commented on issue #9885: [BEAM-8457] Label 
Dataflow jobs from Notebook
URL: https://github.com/apache/beam/pull/9885#issuecomment-547571605
 
 
   Run Python PreCommit
 

This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 335684)
Time Spent: 4h 50m  (was: 4h 40m)

> Instrument Dataflow jobs that are launched from Notebooks
> -
>
> Key: BEAM-8457
> URL: https://issues.apache.org/jira/browse/BEAM-8457
> Project: Beam
>  Issue Type: Improvement
>  Components: runner-py-interactive
>Reporter: Ning Kang
>Assignee: Ning Kang
>Priority: Major
> Fix For: 2.17.0
>
>  Time Spent: 4h 50m
>  Remaining Estimate: 0h
>
> Dataflow needs the capability to tell how many Dataflow jobs are launched 
> from the Notebook environment, i.e., the Interactive Runner.
>  # Change the pipeline.run() API to allow supply a runner and an option 
> parameter so that a pipeline initially bundled w/ an interactive runner can 
> be directly run by other runners from notebook.
>  # Implicitly add the necessary source information through user labels when 
> the user does p.run(runner=DataflowRunner()).



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Work logged] (BEAM-8457) Instrument Dataflow jobs that are launched from Notebooks

2019-10-28 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/BEAM-8457?focusedWorklogId=335222&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-335222
 ]

ASF GitHub Bot logged work on BEAM-8457:


Author: ASF GitHub Bot
Created on: 28/Oct/19 22:07
Start Date: 28/Oct/19 22:07
Worklog Time Spent: 10m 
  Work Description: Ardagan commented on issue #9887: [release-2.17.0] 
Revert "Merge pull request #9854 from [BEAM-8457] Label Dataflow jobs…
URL: https://github.com/apache/beam/pull/9887#issuecomment-547166274
 
 
   Precommits fail due to missing docker container. I'm working on it now.
 

This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 335222)
Time Spent: 4h 40m  (was: 4.5h)

> Instrument Dataflow jobs that are launched from Notebooks
> -
>
> Key: BEAM-8457
> URL: https://issues.apache.org/jira/browse/BEAM-8457
> Project: Beam
>  Issue Type: Improvement
>  Components: runner-py-interactive
>Reporter: Ning Kang
>Assignee: Ning Kang
>Priority: Major
> Fix For: 2.17.0
>
>  Time Spent: 4h 40m
>  Remaining Estimate: 0h
>
> Dataflow needs the capability to tell how many Dataflow jobs are launched 
> from the Notebook environment, i.e., the Interactive Runner.
>  # Change the pipeline.run() API to allow supply a runner and an option 
> parameter so that a pipeline initially bundled w/ an interactive runner can 
> be directly run by other runners from notebook.
>  # Implicitly add the necessary source information through user labels when 
> the user does p.run(runner=DataflowRunner()).



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Work logged] (BEAM-8457) Instrument Dataflow jobs that are launched from Notebooks

2019-10-28 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/BEAM-8457?focusedWorklogId=335164&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-335164
 ]

ASF GitHub Bot logged work on BEAM-8457:


Author: ASF GitHub Bot
Created on: 28/Oct/19 20:36
Start Date: 28/Oct/19 20:36
Worklog Time Spent: 10m 
  Work Description: KevinGG commented on issue #9885: [BEAM-8457] Label 
Dataflow jobs from Notebook
URL: https://github.com/apache/beam/pull/9885#issuecomment-547134650
 
 
   Run Python PreCommit
 

This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 335164)
Time Spent: 4.5h  (was: 4h 20m)

> Instrument Dataflow jobs that are launched from Notebooks
> -
>
> Key: BEAM-8457
> URL: https://issues.apache.org/jira/browse/BEAM-8457
> Project: Beam
>  Issue Type: Improvement
>  Components: runner-py-interactive
>Reporter: Ning Kang
>Assignee: Ning Kang
>Priority: Major
> Fix For: 2.17.0
>
>  Time Spent: 4.5h
>  Remaining Estimate: 0h
>
> Dataflow needs the capability to tell how many Dataflow jobs are launched 
> from the Notebook environment, i.e., the Interactive Runner.
>  # Change the pipeline.run() API to allow supply a runner and an option 
> parameter so that a pipeline initially bundled w/ an interactive runner can 
> be directly run by other runners from notebook.
>  # Implicitly add the necessary source information through user labels when 
> the user does p.run(runner=DataflowRunner()).



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Work logged] (BEAM-8457) Instrument Dataflow jobs that are launched from Notebooks

2019-10-28 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/BEAM-8457?focusedWorklogId=335059&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-335059
 ]

ASF GitHub Bot logged work on BEAM-8457:


Author: ASF GitHub Bot
Created on: 28/Oct/19 17:32
Start Date: 28/Oct/19 17:32
Worklog Time Spent: 10m 
  Work Description: KevinGG commented on issue #9885: [BEAM-8457] Label 
Dataflow jobs from Notebook
URL: https://github.com/apache/beam/pull/9885#issuecomment-547060373
 
 
   Run Python PreCommit
 

This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 335059)
Time Spent: 4h 20m  (was: 4h 10m)

> Instrument Dataflow jobs that are launched from Notebooks
> -
>
> Key: BEAM-8457
> URL: https://issues.apache.org/jira/browse/BEAM-8457
> Project: Beam
>  Issue Type: Improvement
>  Components: runner-py-interactive
>Reporter: Ning Kang
>Assignee: Ning Kang
>Priority: Major
> Fix For: 2.17.0
>
>  Time Spent: 4h 20m
>  Remaining Estimate: 0h
>
> Dataflow needs the capability to tell how many Dataflow jobs are launched 
> from the Notebook environment, i.e., the Interactive Runner.
>  # Change the pipeline.run() API to allow supply a runner and an option 
> parameter so that a pipeline initially bundled w/ an interactive runner can 
> be directly run by other runners from notebook.
>  # Implicitly add the necessary source information through user labels when 
> the user does p.run(runner=DataflowRunner()).



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Work logged] (BEAM-8457) Instrument Dataflow jobs that are launched from Notebooks

2019-10-28 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/BEAM-8457?focusedWorklogId=335049&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-335049
 ]

ASF GitHub Bot logged work on BEAM-8457:


Author: ASF GitHub Bot
Created on: 28/Oct/19 17:17
Start Date: 28/Oct/19 17:17
Worklog Time Spent: 10m 
  Work Description: pabloem commented on issue #9887: [release-2.17.0] 
Revert "Merge pull request #9854 from [BEAM-8457] Label Dataflow jobs…
URL: https://github.com/apache/beam/pull/9887#issuecomment-547053125
 
 
   Run Python PreCommit
 

This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 335049)
Time Spent: 4h 10m  (was: 4h)

> Instrument Dataflow jobs that are launched from Notebooks
> -
>
> Key: BEAM-8457
> URL: https://issues.apache.org/jira/browse/BEAM-8457
> Project: Beam
>  Issue Type: Improvement
>  Components: runner-py-interactive
>Reporter: Ning Kang
>Assignee: Ning Kang
>Priority: Major
> Fix For: 2.17.0
>
>  Time Spent: 4h 10m
>  Remaining Estimate: 0h
>
> Dataflow needs the capability to tell how many Dataflow jobs are launched 
> from the Notebook environment, i.e., the Interactive Runner.
>  # Change the pipeline.run() API to allow supply a runner and an option 
> parameter so that a pipeline initially bundled w/ an interactive runner can 
> be directly run by other runners from notebook.
>  # Implicitly add the necessary source information through user labels when 
> the user does p.run(runner=DataflowRunner()).



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Work logged] (BEAM-8457) Instrument Dataflow jobs that are launched from Notebooks

2019-10-28 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/BEAM-8457?focusedWorklogId=335039&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-335039
 ]

ASF GitHub Bot logged work on BEAM-8457:


Author: ASF GitHub Bot
Created on: 28/Oct/19 17:12
Start Date: 28/Oct/19 17:12
Worklog Time Spent: 10m 
  Work Description: pabloem commented on issue #9887: [release-2.17.0] 
Revert "Merge pull request #9854 from [BEAM-8457] Label Dataflow jobs…
URL: https://github.com/apache/beam/pull/9887#issuecomment-547050887
 
 
   Run Python PreCommit
 

This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 335039)
Time Spent: 4h  (was: 3h 50m)

> Instrument Dataflow jobs that are launched from Notebooks
> -
>
> Key: BEAM-8457
> URL: https://issues.apache.org/jira/browse/BEAM-8457
> Project: Beam
>  Issue Type: Improvement
>  Components: runner-py-interactive
>Reporter: Ning Kang
>Assignee: Ning Kang
>Priority: Major
> Fix For: 2.17.0
>
>  Time Spent: 4h
>  Remaining Estimate: 0h
>
> Dataflow needs the capability to tell how many Dataflow jobs are launched 
> from the Notebook environment, i.e., the Interactive Runner.
>  # Change the pipeline.run() API to allow supply a runner and an option 
> parameter so that a pipeline initially bundled w/ an interactive runner can 
> be directly run by other runners from notebook.
>  # Implicitly add the necessary source information through user labels when 
> the user does p.run(runner=DataflowRunner()).



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Work logged] (BEAM-8457) Instrument Dataflow jobs that are launched from Notebooks

2019-10-26 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/BEAM-8457?focusedWorklogId=334570&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-334570
 ]

ASF GitHub Bot logged work on BEAM-8457:


Author: ASF GitHub Bot
Created on: 26/Oct/19 21:08
Start Date: 26/Oct/19 21:08
Worklog Time Spent: 10m 
  Work Description: pabloem commented on issue #9887: [release-2.17.0] 
Revert "Merge pull request #9854 from [BEAM-8457] Label Dataflow jobs…
URL: https://github.com/apache/beam/pull/9887#issuecomment-546639550
 
 
   Run Python PreCommit
   
   
 

This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 334570)
Time Spent: 3h 50m  (was: 3h 40m)

> Instrument Dataflow jobs that are launched from Notebooks
> -
>
> Key: BEAM-8457
> URL: https://issues.apache.org/jira/browse/BEAM-8457
> Project: Beam
>  Issue Type: Improvement
>  Components: runner-py-interactive
>Reporter: Ning Kang
>Assignee: Ning Kang
>Priority: Major
> Fix For: 2.17.0
>
>  Time Spent: 3h 50m
>  Remaining Estimate: 0h
>
> Dataflow needs the capability to tell how many Dataflow jobs are launched 
> from the Notebook environment, i.e., the Interactive Runner.
>  # Change the pipeline.run() API to allow supply a runner and an option 
> parameter so that a pipeline initially bundled w/ an interactive runner can 
> be directly run by other runners from notebook.
>  # Implicitly add the necessary source information through user labels when 
> the user does p.run(runner=DataflowRunner()).



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Work logged] (BEAM-8457) Instrument Dataflow jobs that are launched from Notebooks

2019-10-25 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/BEAM-8457?focusedWorklogId=334431&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-334431
 ]

ASF GitHub Bot logged work on BEAM-8457:


Author: ASF GitHub Bot
Created on: 25/Oct/19 23:39
Start Date: 25/Oct/19 23:39
Worklog Time Spent: 10m 
  Work Description: pabloem commented on issue #9887: [release-2.17.0] 
Revert "Merge pull request #9854 from [BEAM-8457] Label Dataflow jobs…
URL: https://github.com/apache/beam/pull/9887#issuecomment-546545161
 
 
   Run Python PreCommit
 

This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 334431)
Time Spent: 3h 40m  (was: 3.5h)

> Instrument Dataflow jobs that are launched from Notebooks
> -
>
> Key: BEAM-8457
> URL: https://issues.apache.org/jira/browse/BEAM-8457
> Project: Beam
>  Issue Type: Improvement
>  Components: runner-py-interactive
>Reporter: Ning Kang
>Assignee: Ning Kang
>Priority: Major
> Fix For: 2.17.0
>
>  Time Spent: 3h 40m
>  Remaining Estimate: 0h
>
> Dataflow needs the capability to tell how many Dataflow jobs are launched 
> from the Notebook environment, i.e., the Interactive Runner.
>  # Change the pipeline.run() API to allow supply a runner and an option 
> parameter so that a pipeline initially bundled w/ an interactive runner can 
> be directly run by other runners from notebook.
>  # Implicitly add the necessary source information through user labels when 
> the user does p.run(runner=DataflowRunner()).



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Work logged] (BEAM-8457) Instrument Dataflow jobs that are launched from Notebooks

2019-10-25 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/BEAM-8457?focusedWorklogId=334343&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-334343
 ]

ASF GitHub Bot logged work on BEAM-8457:


Author: ASF GitHub Bot
Created on: 25/Oct/19 19:58
Start Date: 25/Oct/19 19:58
Worklog Time Spent: 10m 
  Work Description: pabloem commented on pull request #9887: 
[release-2.17.0] Revert "Merge pull request #9854 from [BEAM-8457] Label 
Dataflow jobs…
URL: https://github.com/apache/beam/pull/9887
 
 
   … from Notebook"
   
   This reverts commit 1a8391da9222ab8d0493b0007bd60bdbeeb5e275.
   
   **This is a cherry pick of PR #9879**
   
   **Please** add a meaningful description for your change here
   
   
   
   Thank you for your contribution! Follow this checklist to help us 
incorporate your contribution quickly and easily:
   
- [ ] [**Choose 
reviewer(s)**](https://beam.apache.org/contribute/#make-your-change) and 
mention them in a comment (`R: @username`).
- [ ] Format the pull request title like `[BEAM-XXX] Fixes bug in 
ApproximateQuantiles`, where you replace `BEAM-XXX` with the appropriate JIRA 
issue, if applicable. This will automatically link the pull request to the 
issue.
- [ ] If this contribution is large, please file an Apache [Individual 
Contributor License Agreement](https://www.apache.org/licenses/icla.pdf).
   
   See the [Contributor Guide](https://beam.apache.org/contribute) for more 
tips on [how to make review process 
smoother](https://beam.apache.org/contribute/#make-reviewers-job-easier).
   
   Post-Commit Tests Status (on master branch)
   

   
   Lang | SDK | Apex | Dataflow | Flink | Gearpump | Samza | Spark
   --- | --- | --- | --- | --- | --- | --- | ---
   Go | [![Build 
Status](https://builds.apache.org/job/beam_PostCommit_Go/lastCompletedBuild/badge/icon)](https://builds.apache.org/job/beam_PostCommit_Go/lastCompletedBuild/)
 | --- | --- | [![Build 
Status](https://builds.apache.org/job/beam_PostCommit_Go_VR_Flink/lastCompletedBuild/badge/icon)](https://builds.apache.org/job/beam_PostCommit_Go_VR_Flink/lastCompletedBuild/)
 | --- | --- | [![Build 
Status](https://builds.apache.org/job/beam_PostCommit_Go_VR_Spark/lastCompletedBuild/badge/icon)](https://builds.apache.org/job/beam_PostCommit_Go_VR_Spark/lastCompletedBuild/)
   Java | [![Build 
Status](https://builds.apache.org/job/beam_PostCommit_Java/lastCompletedBuild/badge/icon)](https://builds.apache.org/job/beam_PostCommit_Java/lastCompletedBuild/)
 | [![Build 
Status](https://builds.apache.org/job/beam_PostCommit_Java_ValidatesRunner_Apex/lastCompletedBuild/badge/icon)](https://builds.apache.org/job/beam_PostCommit_Java_ValidatesRunner_Apex/lastCompletedBuild/)
 | [![Build 
Status](https://builds.apache.org/job/beam_PostCommit_Java_ValidatesRunner_Dataflow/lastCompletedBuild/badge/icon)](https://builds.apache.org/job/beam_PostCommit_Java_ValidatesRunner_Dataflow/lastCompletedBuild/)
 | [![Build 
Status](https://builds.apache.org/job/beam_PostCommit_Java_ValidatesRunner_Flink/lastCompletedBuild/badge/icon)](https://builds.apache.org/job/beam_PostCommit_Java_ValidatesRunner_Flink/lastCompletedBuild/)[![Build
 
Status](https://builds.apache.org/job/beam_PostCommit_Java_PVR_Flink_Batch/lastCompletedBuild/badge/icon)](https://builds.apache.org/job/beam_PostCommit_Java_PVR_Flink_Batch/lastCompletedBuild/)[![Build
 
Status](https://builds.apache.org/job/beam_PostCommit_Java_PVR_Flink_Streaming/lastCompletedBuild/badge/icon)](https://builds.apache.org/job/beam_PostCommit_Java_PVR_Flink_Streaming/lastCompletedBuild/)
 | [![Build 
Status](https://builds.apache.org/job/beam_PostCommit_Java_ValidatesRunner_Gearpump/lastCompletedBuild/badge/icon)](https://builds.apache.org/job/beam_PostCommit_Java_ValidatesRunner_Gearpump/lastCompletedBuild/)
 | [![Build 
Status](https://builds.apache.org/job/beam_PostCommit_Java_ValidatesRunner_Samza/lastCompletedBuild/badge/icon)](https://builds.apache.org/job/beam_PostCommit_Java_ValidatesRunner_Samza/lastCompletedBuild/)
 | [![Build 
Status](https://builds.apache.org/job/beam_PostCommit_Java_ValidatesRunner_Spark/lastCompletedBuild/badge/icon)](https://builds.apache.org/job/beam_PostCommit_Java_ValidatesRunner_Spark/lastCompletedBuild/)[![Build
 
Status](https://builds.apache.org/job/beam_PostCommit_Java_PVR_Spark_Batch/lastCompletedBuild/badge/icon)](https://builds.apache.org/job/beam_PostCommit_Java_PVR_Spark_Batch/lastCompletedBuild/)
   Python | [![Build 
Status](https://builds.apache.org/job/beam_PostCommit_Python2/lastCompletedBuild/badge/icon)](https://builds.apache.org/job/beam_PostCommit_Python2/lastCompletedBuild/)[![Build
 
Status](https://builds.apache.org/job/beam_PostCommit_Python35/lastCompletedBuild/badge/icon)](https://builds.apache.org/job/b

[jira] [Work logged] (BEAM-8457) Instrument Dataflow jobs that are launched from Notebooks

2019-10-25 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/BEAM-8457?focusedWorklogId=334374&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-334374
 ]

ASF GitHub Bot logged work on BEAM-8457:


Author: ASF GitHub Bot
Created on: 25/Oct/19 21:15
Start Date: 25/Oct/19 21:15
Worklog Time Spent: 10m 
  Work Description: robertwb commented on issue #9854: [BEAM-8457] Label 
Dataflow jobs from Notebook
URL: https://github.com/apache/beam/pull/9854#issuecomment-546516062
 
 
   +1. We should not be importing the interactive runner (it's causing problems 
with tests as well), and interactivity should not be a property of the 
pipeline, but of the runner (and I'd prefer a design that avoid passing an 
interactive bit around everywhere). 
 

This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 334374)
Time Spent: 3.5h  (was: 3h 20m)

> Instrument Dataflow jobs that are launched from Notebooks
> -
>
> Key: BEAM-8457
> URL: https://issues.apache.org/jira/browse/BEAM-8457
> Project: Beam
>  Issue Type: Improvement
>  Components: runner-py-interactive
>Reporter: Ning Kang
>Assignee: Ning Kang
>Priority: Major
> Fix For: 2.17.0
>
>  Time Spent: 3.5h
>  Remaining Estimate: 0h
>
> Dataflow needs the capability to tell how many Dataflow jobs are launched 
> from the Notebook environment, i.e., the Interactive Runner.
>  # Change the pipeline.run() API to allow supply a runner and an option 
> parameter so that a pipeline initially bundled w/ an interactive runner can 
> be directly run by other runners from notebook.
>  # Implicitly add the necessary source information through user labels when 
> the user does p.run(runner=DataflowRunner()).



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Work logged] (BEAM-8457) Instrument Dataflow jobs that are launched from Notebooks

2019-10-25 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/BEAM-8457?focusedWorklogId=334368&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-334368
 ]

ASF GitHub Bot logged work on BEAM-8457:


Author: ASF GitHub Bot
Created on: 25/Oct/19 20:55
Start Date: 25/Oct/19 20:55
Worklog Time Spent: 10m 
  Work Description: KevinGG commented on pull request #9885: [BEAM-8457] 
Label Dataflow jobs from Notebook
URL: https://github.com/apache/beam/pull/9885#discussion_r339235798
 
 

 ##
 File path: sdks/python/apache_beam/pipeline.py
 ##
 @@ -396,28 +400,57 @@ def replace_all(self, replacements):
 for override in replacements:
   self._check_replacement(override)
 
-  def run(self, test_runner_api=True):
-"""Runs the pipeline. Returns whatever our runner returns after running."""
-
+  def run(self, test_runner_api=True, runner=None, options=None,
+  interactive=None):
+"""Runs the pipeline. Returns whatever our runner returns after running.
+
+If another runner instance and options are provided, that runner will
+execute the pipeline with the given options. If either of them is not set,
+a ValueError is raised. The usage is similar to directly invoking
+`runner.run_pipeline(pipeline, options)`.
+Additionally, an interactive field can be set to override the pipeline's
+self.interactive field to mark current pipeline as being initiated from an
+interactive environment.
+"""
+if interactive:
+  self.interactive = interactive
+elif (type(self.runner).__module__
+  == 'apache_beam.runners.interactive.interactive_runner' and
+  type(self.runner).__name__ == 'InteractiveRunner'):
 
 Review comment:
   This is the difference from previous 
[PR](https://www.google.com/url?q=https://github.com/apache/beam/pull/9854). 
   All runners are using  "new-style" classes in Python, the 
`type(obj).__module__/__name__` should always work. Please let me know if there 
would be backward incompatible cases. 
   Thanks!
 

This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 334368)
Time Spent: 3h 20m  (was: 3h 10m)

> Instrument Dataflow jobs that are launched from Notebooks
> -
>
> Key: BEAM-8457
> URL: https://issues.apache.org/jira/browse/BEAM-8457
> Project: Beam
>  Issue Type: Improvement
>  Components: runner-py-interactive
>Reporter: Ning Kang
>Assignee: Ning Kang
>Priority: Major
> Fix For: 2.17.0
>
>  Time Spent: 3h 20m
>  Remaining Estimate: 0h
>
> Dataflow needs the capability to tell how many Dataflow jobs are launched 
> from the Notebook environment, i.e., the Interactive Runner.
>  # Change the pipeline.run() API to allow supply a runner and an option 
> parameter so that a pipeline initially bundled w/ an interactive runner can 
> be directly run by other runners from notebook.
>  # Implicitly add the necessary source information through user labels when 
> the user does p.run(runner=DataflowRunner()).



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Work logged] (BEAM-8457) Instrument Dataflow jobs that are launched from Notebooks

2019-10-25 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/BEAM-8457?focusedWorklogId=334352&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-334352
 ]

ASF GitHub Bot logged work on BEAM-8457:


Author: ASF GitHub Bot
Created on: 25/Oct/19 20:21
Start Date: 25/Oct/19 20:21
Worklog Time Spent: 10m 
  Work Description: pabloem commented on issue #9887: [release-2.17.0] 
Revert "Merge pull request #9854 from [BEAM-8457] Label Dataflow jobs…
URL: https://github.com/apache/beam/pull/9887#issuecomment-546498911
 
 
   cc: @Ardagan 
 

This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 334352)
Time Spent: 3h 10m  (was: 3h)

> Instrument Dataflow jobs that are launched from Notebooks
> -
>
> Key: BEAM-8457
> URL: https://issues.apache.org/jira/browse/BEAM-8457
> Project: Beam
>  Issue Type: Improvement
>  Components: runner-py-interactive
>Reporter: Ning Kang
>Assignee: Ning Kang
>Priority: Major
> Fix For: 2.17.0
>
>  Time Spent: 3h 10m
>  Remaining Estimate: 0h
>
> Dataflow needs the capability to tell how many Dataflow jobs are launched 
> from the Notebook environment, i.e., the Interactive Runner.
>  # Change the pipeline.run() API to allow supply a runner and an option 
> parameter so that a pipeline initially bundled w/ an interactive runner can 
> be directly run by other runners from notebook.
>  # Implicitly add the necessary source information through user labels when 
> the user does p.run(runner=DataflowRunner()).



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Work logged] (BEAM-8457) Instrument Dataflow jobs that are launched from Notebooks

2019-10-25 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/BEAM-8457?focusedWorklogId=334265&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-334265
 ]

ASF GitHub Bot logged work on BEAM-8457:


Author: ASF GitHub Bot
Created on: 25/Oct/19 18:14
Start Date: 25/Oct/19 18:14
Worklog Time Spent: 10m 
  Work Description: KevinGG commented on pull request #9885: [BEAM-8457] 
Label Dataflow jobs from Notebook
URL: https://github.com/apache/beam/pull/9885
 
 
   1. Changed the pipeline.run() API to allow a runner and an option
   parameter so that a pipeline initially bundled w/ an interactive runner
   can be directly run by other runners from notebook.
   2. Implicitly added the necessary source information through user labels
   when the user does p.run(runner=DataflowRunner(), options=options) or
   DataflowRunner().run_pipeline(p, options).
   3. User '--labels' doesn't support character '.' or '"'. When applying
   version related label, replace '.' w/ '_'. Avoid enclosing any label
   with '"'.
   
   **Please** add a meaningful description for your change here
   
   
   
   Thank you for your contribution! Follow this checklist to help us 
incorporate your contribution quickly and easily:
   
- [ ] [**Choose 
reviewer(s)**](https://beam.apache.org/contribute/#make-your-change) and 
mention them in a comment (`R: @username`).
- [x] Format the pull request title like `[BEAM-XXX] Fixes bug in 
ApproximateQuantiles`, where you replace `BEAM-XXX` with the appropriate JIRA 
issue, if applicable. This will automatically link the pull request to the 
issue.
- [x] If this contribution is large, please file an Apache [Individual 
Contributor License Agreement](https://www.apache.org/licenses/icla.pdf).
   
   See the [Contributor Guide](https://beam.apache.org/contribute) for more 
tips on [how to make review process 
smoother](https://beam.apache.org/contribute/#make-reviewers-job-easier).
   
   Post-Commit Tests Status (on master branch)
   

   
   Lang | SDK | Apex | Dataflow | Flink | Gearpump | Samza | Spark
   --- | --- | --- | --- | --- | --- | --- | ---
   Go | [![Build 
Status](https://builds.apache.org/job/beam_PostCommit_Go/lastCompletedBuild/badge/icon)](https://builds.apache.org/job/beam_PostCommit_Go/lastCompletedBuild/)
 | --- | --- | [![Build 
Status](https://builds.apache.org/job/beam_PostCommit_Go_VR_Flink/lastCompletedBuild/badge/icon)](https://builds.apache.org/job/beam_PostCommit_Go_VR_Flink/lastCompletedBuild/)
 | --- | --- | [![Build 
Status](https://builds.apache.org/job/beam_PostCommit_Go_VR_Spark/lastCompletedBuild/badge/icon)](https://builds.apache.org/job/beam_PostCommit_Go_VR_Spark/lastCompletedBuild/)
   Java | [![Build 
Status](https://builds.apache.org/job/beam_PostCommit_Java/lastCompletedBuild/badge/icon)](https://builds.apache.org/job/beam_PostCommit_Java/lastCompletedBuild/)
 | [![Build 
Status](https://builds.apache.org/job/beam_PostCommit_Java_ValidatesRunner_Apex/lastCompletedBuild/badge/icon)](https://builds.apache.org/job/beam_PostCommit_Java_ValidatesRunner_Apex/lastCompletedBuild/)
 | [![Build 
Status](https://builds.apache.org/job/beam_PostCommit_Java_ValidatesRunner_Dataflow/lastCompletedBuild/badge/icon)](https://builds.apache.org/job/beam_PostCommit_Java_ValidatesRunner_Dataflow/lastCompletedBuild/)
 | [![Build 
Status](https://builds.apache.org/job/beam_PostCommit_Java_ValidatesRunner_Flink/lastCompletedBuild/badge/icon)](https://builds.apache.org/job/beam_PostCommit_Java_ValidatesRunner_Flink/lastCompletedBuild/)[![Build
 
Status](https://builds.apache.org/job/beam_PostCommit_Java_PVR_Flink_Batch/lastCompletedBuild/badge/icon)](https://builds.apache.org/job/beam_PostCommit_Java_PVR_Flink_Batch/lastCompletedBuild/)[![Build
 
Status](https://builds.apache.org/job/beam_PostCommit_Java_PVR_Flink_Streaming/lastCompletedBuild/badge/icon)](https://builds.apache.org/job/beam_PostCommit_Java_PVR_Flink_Streaming/lastCompletedBuild/)
 | [![Build 
Status](https://builds.apache.org/job/beam_PostCommit_Java_ValidatesRunner_Gearpump/lastCompletedBuild/badge/icon)](https://builds.apache.org/job/beam_PostCommit_Java_ValidatesRunner_Gearpump/lastCompletedBuild/)
 | [![Build 
Status](https://builds.apache.org/job/beam_PostCommit_Java_ValidatesRunner_Samza/lastCompletedBuild/badge/icon)](https://builds.apache.org/job/beam_PostCommit_Java_ValidatesRunner_Samza/lastCompletedBuild/)
 | [![Build 
Status](https://builds.apache.org/job/beam_PostCommit_Java_ValidatesRunner_Spark/lastCompletedBuild/badge/icon)](https://builds.apache.org/job/beam_PostCommit_Java_ValidatesRunner_Spark/lastCompletedBuild/)[![Build
 
Status](https://builds.apache.org/job/beam_PostCommit_Java_PVR_Spark_Batch/lastCompletedBuild/badge/icon)](https://builds.apache.org/job/beam_Post

[jira] [Work logged] (BEAM-8457) Instrument Dataflow jobs that are launched from Notebooks

2019-10-25 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/BEAM-8457?focusedWorklogId=334197&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-334197
 ]

ASF GitHub Bot logged work on BEAM-8457:


Author: ASF GitHub Bot
Created on: 25/Oct/19 15:23
Start Date: 25/Oct/19 15:23
Worklog Time Spent: 10m 
  Work Description: aaltay commented on issue #9879: Revert "[BEAM-8457] 
Label Dataflow jobs from Notebook"
URL: https://github.com/apache/beam/pull/9879#issuecomment-546398359
 
 
   @pabloem or @KevinGG could one of you cherry pick this to the release branch?
 

This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 334197)
Time Spent: 2h 40m  (was: 2.5h)

> Instrument Dataflow jobs that are launched from Notebooks
> -
>
> Key: BEAM-8457
> URL: https://issues.apache.org/jira/browse/BEAM-8457
> Project: Beam
>  Issue Type: Improvement
>  Components: runner-py-interactive
>Reporter: Ning Kang
>Assignee: Ning Kang
>Priority: Major
> Fix For: 2.17.0
>
>  Time Spent: 2h 40m
>  Remaining Estimate: 0h
>
> Dataflow needs the capability to tell how many Dataflow jobs are launched 
> from the Notebook environment, i.e., the Interactive Runner.
>  # Change the pipeline.run() API to allow supply a runner and an option 
> parameter so that a pipeline initially bundled w/ an interactive runner can 
> be directly run by other runners from notebook.
>  # Implicitly add the necessary source information through user labels when 
> the user does p.run(runner=DataflowRunner()).



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Work logged] (BEAM-8457) Instrument Dataflow jobs that are launched from Notebooks

2019-10-25 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/BEAM-8457?focusedWorklogId=334195&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-334195
 ]

ASF GitHub Bot logged work on BEAM-8457:


Author: ASF GitHub Bot
Created on: 25/Oct/19 15:23
Start Date: 25/Oct/19 15:23
Worklog Time Spent: 10m 
  Work Description: aaltay commented on pull request #9879: Revert 
"[BEAM-8457] Label Dataflow jobs from Notebook"
URL: https://github.com/apache/beam/pull/9879
 
 
   
 

This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 334195)
Time Spent: 2.5h  (was: 2h 20m)

> Instrument Dataflow jobs that are launched from Notebooks
> -
>
> Key: BEAM-8457
> URL: https://issues.apache.org/jira/browse/BEAM-8457
> Project: Beam
>  Issue Type: Improvement
>  Components: runner-py-interactive
>Reporter: Ning Kang
>Assignee: Ning Kang
>Priority: Major
> Fix For: 2.17.0
>
>  Time Spent: 2.5h
>  Remaining Estimate: 0h
>
> Dataflow needs the capability to tell how many Dataflow jobs are launched 
> from the Notebook environment, i.e., the Interactive Runner.
>  # Change the pipeline.run() API to allow supply a runner and an option 
> parameter so that a pipeline initially bundled w/ an interactive runner can 
> be directly run by other runners from notebook.
>  # Implicitly add the necessary source information through user labels when 
> the user does p.run(runner=DataflowRunner()).



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Work logged] (BEAM-8457) Instrument Dataflow jobs that are launched from Notebooks

2019-10-24 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/BEAM-8457?focusedWorklogId=333794&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-333794
 ]

ASF GitHub Bot logged work on BEAM-8457:


Author: ASF GitHub Bot
Created on: 25/Oct/19 00:27
Start Date: 25/Oct/19 00:27
Worklog Time Spent: 10m 
  Work Description: aaltay commented on issue #9879: Revert "[BEAM-8457] 
Label Dataflow jobs from Notebook"
URL: https://github.com/apache/beam/pull/9879#issuecomment-546152188
 
 
   R: @KevinGG 
   cc: @charlesccychen 
   cc: @Ardagan -- This also need to goto the release branch.
 

This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 333794)
Time Spent: 2h 20m  (was: 2h 10m)

> Instrument Dataflow jobs that are launched from Notebooks
> -
>
> Key: BEAM-8457
> URL: https://issues.apache.org/jira/browse/BEAM-8457
> Project: Beam
>  Issue Type: Improvement
>  Components: runner-py-interactive
>Reporter: Ning Kang
>Assignee: Ning Kang
>Priority: Major
> Fix For: 2.17.0
>
>  Time Spent: 2h 20m
>  Remaining Estimate: 0h
>
> Dataflow needs the capability to tell how many Dataflow jobs are launched 
> from the Notebook environment, i.e., the Interactive Runner.
>  # Change the pipeline.run() API to allow supply a runner and an option 
> parameter so that a pipeline initially bundled w/ an interactive runner can 
> be directly run by other runners from notebook.
>  # Implicitly add the necessary source information through user labels when 
> the user does p.run(runner=DataflowRunner()).



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Work logged] (BEAM-8457) Instrument Dataflow jobs that are launched from Notebooks

2019-10-24 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/BEAM-8457?focusedWorklogId=333793&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-333793
 ]

ASF GitHub Bot logged work on BEAM-8457:


Author: ASF GitHub Bot
Created on: 25/Oct/19 00:26
Start Date: 25/Oct/19 00:26
Worklog Time Spent: 10m 
  Work Description: aaltay commented on pull request #9879: Revert 
"[BEAM-8457] Label Dataflow jobs from Notebook"
URL: https://github.com/apache/beam/pull/9879
 
 
   Reverts apache/beam#9854
   
   pipeline code references interactive_runner without guarding for optional 
imports.
 

This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 333793)
Time Spent: 2h 10m  (was: 2h)

> Instrument Dataflow jobs that are launched from Notebooks
> -
>
> Key: BEAM-8457
> URL: https://issues.apache.org/jira/browse/BEAM-8457
> Project: Beam
>  Issue Type: Improvement
>  Components: runner-py-interactive
>Reporter: Ning Kang
>Assignee: Ning Kang
>Priority: Major
> Fix For: 2.17.0
>
>  Time Spent: 2h 10m
>  Remaining Estimate: 0h
>
> Dataflow needs the capability to tell how many Dataflow jobs are launched 
> from the Notebook environment, i.e., the Interactive Runner.
>  # Change the pipeline.run() API to allow supply a runner and an option 
> parameter so that a pipeline initially bundled w/ an interactive runner can 
> be directly run by other runners from notebook.
>  # Implicitly add the necessary source information through user labels when 
> the user does p.run(runner=DataflowRunner()).



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Work logged] (BEAM-8457) Instrument Dataflow jobs that are launched from Notebooks

2019-10-24 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/BEAM-8457?focusedWorklogId=333792&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-333792
 ]

ASF GitHub Bot logged work on BEAM-8457:


Author: ASF GitHub Bot
Created on: 25/Oct/19 00:25
Start Date: 25/Oct/19 00:25
Worklog Time Spent: 10m 
  Work Description: aaltay commented on issue #9854: [BEAM-8457] Label 
Dataflow jobs from Notebook
URL: https://github.com/apache/beam/pull/9854#issuecomment-546151918
 
 
   Let's roll this back. I do not think we should be importing interactive in 
pipeline.py. And also my understanding is that runner will keep track of the 
interactivity not the pipeline.
 

This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 333792)
Time Spent: 2h  (was: 1h 50m)

> Instrument Dataflow jobs that are launched from Notebooks
> -
>
> Key: BEAM-8457
> URL: https://issues.apache.org/jira/browse/BEAM-8457
> Project: Beam
>  Issue Type: Improvement
>  Components: runner-py-interactive
>Reporter: Ning Kang
>Assignee: Ning Kang
>Priority: Major
> Fix For: 2.17.0
>
>  Time Spent: 2h
>  Remaining Estimate: 0h
>
> Dataflow needs the capability to tell how many Dataflow jobs are launched 
> from the Notebook environment, i.e., the Interactive Runner.
>  # Change the pipeline.run() API to allow supply a runner and an option 
> parameter so that a pipeline initially bundled w/ an interactive runner can 
> be directly run by other runners from notebook.
>  # Implicitly add the necessary source information through user labels when 
> the user does p.run(runner=DataflowRunner()).



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Work logged] (BEAM-8457) Instrument Dataflow jobs that are launched from Notebooks

2019-10-24 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/BEAM-8457?focusedWorklogId=333790&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-333790
 ]

ASF GitHub Bot logged work on BEAM-8457:


Author: ASF GitHub Bot
Created on: 25/Oct/19 00:20
Start Date: 25/Oct/19 00:20
Worklog Time Spent: 10m 
  Work Description: charlesccychen commented on issue #9854: [BEAM-8457] 
Label Dataflow jobs from Notebook
URL: https://github.com/apache/beam/pull/9854#issuecomment-546150981
 
 
   @KevinGG @pabloem  I believe this PR may be causing issues in environments 
without IPython because it unconditionally imports the interactive runner (and 
thereby IPython).
 

This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 333790)
Time Spent: 1h 50m  (was: 1h 40m)

> Instrument Dataflow jobs that are launched from Notebooks
> -
>
> Key: BEAM-8457
> URL: https://issues.apache.org/jira/browse/BEAM-8457
> Project: Beam
>  Issue Type: Improvement
>  Components: runner-py-interactive
>Reporter: Ning Kang
>Assignee: Ning Kang
>Priority: Major
> Fix For: 2.17.0
>
>  Time Spent: 1h 50m
>  Remaining Estimate: 0h
>
> Dataflow needs the capability to tell how many Dataflow jobs are launched 
> from the Notebook environment, i.e., the Interactive Runner.
>  # Change the pipeline.run() API to allow supply a runner and an option 
> parameter so that a pipeline initially bundled w/ an interactive runner can 
> be directly run by other runners from notebook.
>  # Implicitly add the necessary source information through user labels when 
> the user does p.run(runner=DataflowRunner()).



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Work logged] (BEAM-8457) Instrument Dataflow jobs that are launched from Notebooks

2019-10-23 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/BEAM-8457?focusedWorklogId=332850&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-332850
 ]

ASF GitHub Bot logged work on BEAM-8457:


Author: ASF GitHub Bot
Created on: 23/Oct/19 20:25
Start Date: 23/Oct/19 20:25
Worklog Time Spent: 10m 
  Work Description: pabloem commented on pull request #9854: [BEAM-8457] 
Label Dataflow jobs from Notebook
URL: https://github.com/apache/beam/pull/9854
 
 
   
 

This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 332850)
Time Spent: 1.5h  (was: 1h 20m)

> Instrument Dataflow jobs that are launched from Notebooks
> -
>
> Key: BEAM-8457
> URL: https://issues.apache.org/jira/browse/BEAM-8457
> Project: Beam
>  Issue Type: Improvement
>  Components: runner-py-interactive
>Reporter: Ning Kang
>Assignee: Ning Kang
>Priority: Major
> Fix For: 2.17.0
>
>  Time Spent: 1.5h
>  Remaining Estimate: 0h
>
> Dataflow needs the capability to tell how many Dataflow jobs are launched 
> from the Notebook environment, i.e., the Interactive Runner.
>  # Change the pipeline.run() API to allow supply a runner and an option 
> parameter so that a pipeline initially bundled w/ an interactive runner can 
> be directly run by other runners from notebook.
>  # Implicitly add the necessary source information through user labels when 
> the user does p.run(runner=DataflowRunner()).



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Work logged] (BEAM-8457) Instrument Dataflow jobs that are launched from Notebooks

2019-10-23 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/BEAM-8457?focusedWorklogId=332851&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-332851
 ]

ASF GitHub Bot logged work on BEAM-8457:


Author: ASF GitHub Bot
Created on: 23/Oct/19 20:25
Start Date: 23/Oct/19 20:25
Worklog Time Spent: 10m 
  Work Description: pabloem commented on issue #9854: [BEAM-8457] Label 
Dataflow jobs from Notebook
URL: https://github.com/apache/beam/pull/9854#issuecomment-545620626
 
 
   Thanks Ning
 

This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 332851)
Time Spent: 1h 40m  (was: 1.5h)

> Instrument Dataflow jobs that are launched from Notebooks
> -
>
> Key: BEAM-8457
> URL: https://issues.apache.org/jira/browse/BEAM-8457
> Project: Beam
>  Issue Type: Improvement
>  Components: runner-py-interactive
>Reporter: Ning Kang
>Assignee: Ning Kang
>Priority: Major
> Fix For: 2.17.0
>
>  Time Spent: 1h 40m
>  Remaining Estimate: 0h
>
> Dataflow needs the capability to tell how many Dataflow jobs are launched 
> from the Notebook environment, i.e., the Interactive Runner.
>  # Change the pipeline.run() API to allow supply a runner and an option 
> parameter so that a pipeline initially bundled w/ an interactive runner can 
> be directly run by other runners from notebook.
>  # Implicitly add the necessary source information through user labels when 
> the user does p.run(runner=DataflowRunner()).



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Work logged] (BEAM-8457) Instrument Dataflow jobs that are launched from Notebooks

2019-10-23 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/BEAM-8457?focusedWorklogId=332750&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-332750
 ]

ASF GitHub Bot logged work on BEAM-8457:


Author: ASF GitHub Bot
Created on: 23/Oct/19 18:06
Start Date: 23/Oct/19 18:06
Worklog Time Spent: 10m 
  Work Description: KevinGG commented on pull request #9854: [BEAM-8457] 
Label Dataflow jobs from Notebook
URL: https://github.com/apache/beam/pull/9854#discussion_r338199293
 
 

 ##
 File path: sdks/python/apache_beam/pipeline.py
 ##
 @@ -396,28 +405,46 @@ def replace_all(self, replacements):
 for override in replacements:
   self._check_replacement(override)
 
-  def run(self, test_runner_api=True):
-"""Runs the pipeline. Returns whatever our runner returns after running."""
+  def run(self, test_runner_api=True, runner=None, options=None):
+"""Runs the pipeline. Returns whatever our runner returns after running.
 
+If another runner instance and options are provided, that runner will
+execute the pipeline with the given options. If either of them is not set,
+the default runner will run the pipeline with the original options
+assigned to the pipeline. The usage is similar to directly invoking
+`runner.run_pipeline(pipeline, options)`.
+"""
+runner_in_use = self.runner
+options_in_use = self._options
+if runner and options:
+  runner_in_use = runner
+  options_in_use = options
+elif not runner and options:
+  raise ValueError('Parameter runner is not given when parameter options '
+   'is given.')
+elif not options and runner:
+  raise ValueError('Parameter options is not given when parameter runner '
+   'is given.')
 # When possible, invoke a round trip through the runner API.
 if test_runner_api and self._verify_runner_api_compatible():
   return Pipeline.from_runner_api(
   self.to_runner_api(use_fake_coders=True),
-  self.runner,
-  self._options).run(False)
+  runner_in_use,
+  options_in_use,
+  interactive=self.interactive).run(False)
 
 Review comment:
   No, it's not necessary. On a second thought, I can just move the logic to 
determine `interactive` ad hoc in `run()` and put the `interactive` field as a 
parameter in the `run()` method. Then I don't even need to change the Pipeline 
constructor.
   
   Also, I've added this to the interactive_runner for below use case:
   
`InteractiveRunner(underlying_runner=DataflowRunner()).run_pipeline(Pipeline(DataflowRunner()))`
 

This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 332750)
Time Spent: 1h 20m  (was: 1h 10m)

> Instrument Dataflow jobs that are launched from Notebooks
> -
>
> Key: BEAM-8457
> URL: https://issues.apache.org/jira/browse/BEAM-8457
> Project: Beam
>  Issue Type: Improvement
>  Components: runner-py-interactive
>Reporter: Ning Kang
>Assignee: Ning Kang
>Priority: Blocker
> Fix For: 2.17.0
>
>  Time Spent: 1h 20m
>  Remaining Estimate: 0h
>
> Dataflow needs the capability to tell how many Dataflow jobs are launched 
> from the Notebook environment, i.e., the Interactive Runner.
>  # Change the pipeline.run() API to allow supply a runner and an option 
> parameter so that a pipeline initially bundled w/ an interactive runner can 
> be directly run by other runners from notebook.
>  # Implicitly add the necessary source information through user labels when 
> the user does p.run(runner=DataflowRunner()).



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Work logged] (BEAM-8457) Instrument Dataflow jobs that are launched from Notebooks

2019-10-22 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/BEAM-8457?focusedWorklogId=332355&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-332355
 ]

ASF GitHub Bot logged work on BEAM-8457:


Author: ASF GitHub Bot
Created on: 23/Oct/19 01:15
Start Date: 23/Oct/19 01:15
Worklog Time Spent: 10m 
  Work Description: pabloem commented on pull request #9854: [BEAM-8457] 
Label Dataflow jobs from Notebook
URL: https://github.com/apache/beam/pull/9854#discussion_r337813433
 
 

 ##
 File path: sdks/python/apache_beam/pipeline.py
 ##
 @@ -396,28 +405,46 @@ def replace_all(self, replacements):
 for override in replacements:
   self._check_replacement(override)
 
-  def run(self, test_runner_api=True):
-"""Runs the pipeline. Returns whatever our runner returns after running."""
+  def run(self, test_runner_api=True, runner=None, options=None):
+"""Runs the pipeline. Returns whatever our runner returns after running.
 
+If another runner instance and options are provided, that runner will
+execute the pipeline with the given options. If either of them is not set,
+the default runner will run the pipeline with the original options
+assigned to the pipeline. The usage is similar to directly invoking
+`runner.run_pipeline(pipeline, options)`.
+"""
+runner_in_use = self.runner
+options_in_use = self._options
+if runner and options:
+  runner_in_use = runner
+  options_in_use = options
+elif not runner and options:
+  raise ValueError('Parameter runner is not given when parameter options '
+   'is given.')
+elif not options and runner:
+  raise ValueError('Parameter options is not given when parameter runner '
+   'is given.')
 # When possible, invoke a round trip through the runner API.
 if test_runner_api and self._verify_runner_api_compatible():
   return Pipeline.from_runner_api(
   self.to_runner_api(use_fake_coders=True),
-  self.runner,
-  self._options).run(False)
+  runner_in_use,
+  options_in_use,
+  interactive=self.interactive).run(False)
 
 Review comment:
   Did you find that this was necessary? I don't think we should change the 
signature of the `from_runner_api` call. The pipeline protobuf should contain 
all the necessary information... Though I'd defer to @robertwb on this.
 

This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 332355)
Time Spent: 1h 10m  (was: 1h)

> Instrument Dataflow jobs that are launched from Notebooks
> -
>
> Key: BEAM-8457
> URL: https://issues.apache.org/jira/browse/BEAM-8457
> Project: Beam
>  Issue Type: Improvement
>  Components: runner-py-interactive
>Reporter: Ning Kang
>Assignee: Ning Kang
>Priority: Major
>  Time Spent: 1h 10m
>  Remaining Estimate: 0h
>
> Dataflow needs the capability to tell how many Dataflow jobs are launched 
> from the Notebook environment, i.e., the Interactive Runner.
>  # Change the pipeline.run() API to allow supply a runner and an option 
> parameter so that a pipeline initially bundled w/ an interactive runner can 
> be directly run by other runners from notebook.
>  # Implicitly add the necessary source information through user labels when 
> the user does p.run(runner=DataflowRunner()).



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Work logged] (BEAM-8457) Instrument Dataflow jobs that are launched from Notebooks

2019-10-22 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/BEAM-8457?focusedWorklogId=332279&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-332279
 ]

ASF GitHub Bot logged work on BEAM-8457:


Author: ASF GitHub Bot
Created on: 22/Oct/19 23:05
Start Date: 22/Oct/19 23:05
Worklog Time Spent: 10m 
  Work Description: KevinGG commented on pull request #9854: [BEAM-8457] 
Label Dataflow jobs from Notebook
URL: https://github.com/apache/beam/pull/9854#discussion_r337787464
 
 

 ##
 File path: sdks/python/apache_beam/runners/dataflow/dataflow_runner.py
 ##
 @@ -360,6 +360,15 @@ def visit_transform(self, transform_node):
 
   def run_pipeline(self, pipeline, options):
 """Remotely executes entire pipeline or parts reachable from node."""
+# Label goog-notebook if pipeline is initiated from interactive runner.
+from apache_beam.runners.interactive import interactive_runner
+if isinstance(pipeline.runner, interactive_runner.InteractiveRunner):
 
 Review comment:
   I've missed the path where a new Pipeline is created and `run()` is invoked 
again.
   Yes, all of these would be possible.
   I've added an `interactive` parameter at the constructor level for 
`Pipeline` using default value `None`. `run()` and `from_runner_api()` will 
pass the `None` or `bool` value down no matter how the user chains the runners. 
I'm not very confident with the naming but the change should be backward 
compatible for Beam.
   
   Currently, I'm running into a problem when testing. Once I set `labels`, 
Dataflow job will fail immediately and throw `Error processing pipeline.` 
error. There will be no job graph, no worker started, no logs. Looks like when 
there is user label in the job request, Dataflow cannot convert the work item 
into internal representation.
   
   I'll do some investigation and figure out why.
   
 

This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 332279)
Time Spent: 1h  (was: 50m)

> Instrument Dataflow jobs that are launched from Notebooks
> -
>
> Key: BEAM-8457
> URL: https://issues.apache.org/jira/browse/BEAM-8457
> Project: Beam
>  Issue Type: Improvement
>  Components: runner-py-interactive
>Reporter: Ning Kang
>Assignee: Ning Kang
>Priority: Major
>  Time Spent: 1h
>  Remaining Estimate: 0h
>
> Dataflow needs the capability to tell how many Dataflow jobs are launched 
> from the Notebook environment, i.e., the Interactive Runner.
>  # Change the pipeline.run() API to allow supply a runner and an option 
> parameter so that a pipeline initially bundled w/ an interactive runner can 
> be directly run by other runners from notebook.
>  # Implicitly add the necessary source information through user labels when 
> the user does p.run(runner=DataflowRunner()).



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Work logged] (BEAM-8457) Instrument Dataflow jobs that are launched from Notebooks

2019-10-22 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/BEAM-8457?focusedWorklogId=332274&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-332274
 ]

ASF GitHub Bot logged work on BEAM-8457:


Author: ASF GitHub Bot
Created on: 22/Oct/19 22:58
Start Date: 22/Oct/19 22:58
Worklog Time Spent: 10m 
  Work Description: KevinGG commented on pull request #9854: [BEAM-8457] 
Label Dataflow jobs from Notebook
URL: https://github.com/apache/beam/pull/9854#discussion_r337785938
 
 

 ##
 File path: sdks/python/apache_beam/pipeline.py
 ##
 @@ -396,28 +396,40 @@ def replace_all(self, replacements):
 for override in replacements:
   self._check_replacement(override)
 
-  def run(self, test_runner_api=True):
-"""Runs the pipeline. Returns whatever our runner returns after running."""
+  def run(self, test_runner_api=True, runner=None, options=None):
+"""Runs the pipeline. Returns whatever our runner returns after running.
+
+If another runner instance and options are provided, that runner will
+execute the pipeline with the given options. If either of them is not set,
+the default runner will run the pipeline with the original options
+assigned to the pipeline. The usage is similar to directly invoking
+`runner.run_pipeline(pipeline, options)`.
+"""
+runner_in_use = self.runner
+options_in_use = self._options
+if runner and options:
 
 Review comment:
   You're right! This will surprise the user. I've changed it to throw error if 
either is not provided instead of ignoring the input by default.
 

This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 332274)
Time Spent: 50m  (was: 40m)

> Instrument Dataflow jobs that are launched from Notebooks
> -
>
> Key: BEAM-8457
> URL: https://issues.apache.org/jira/browse/BEAM-8457
> Project: Beam
>  Issue Type: Improvement
>  Components: runner-py-interactive
>Reporter: Ning Kang
>Assignee: Ning Kang
>Priority: Major
>  Time Spent: 50m
>  Remaining Estimate: 0h
>
> Dataflow needs the capability to tell how many Dataflow jobs are launched 
> from the Notebook environment, i.e., the Interactive Runner.
>  # Change the pipeline.run() API to allow supply a runner and an option 
> parameter so that a pipeline initially bundled w/ an interactive runner can 
> be directly run by other runners from notebook.
>  # Implicitly add the necessary source information through user labels when 
> the user does p.run(runner=DataflowRunner()).



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Work logged] (BEAM-8457) Instrument Dataflow jobs that are launched from Notebooks

2019-10-22 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/BEAM-8457?focusedWorklogId=332231&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-332231
 ]

ASF GitHub Bot logged work on BEAM-8457:


Author: ASF GitHub Bot
Created on: 22/Oct/19 21:23
Start Date: 22/Oct/19 21:23
Worklog Time Spent: 10m 
  Work Description: pabloem commented on pull request #9854: [BEAM-8457] 
Label Dataflow jobs from Notebook
URL: https://github.com/apache/beam/pull/9854#discussion_r337757120
 
 

 ##
 File path: sdks/python/apache_beam/runners/dataflow/dataflow_runner.py
 ##
 @@ -360,6 +360,15 @@ def visit_transform(self, transform_node):
 
   def run_pipeline(self, pipeline, options):
 """Remotely executes entire pipeline or parts reachable from node."""
+# Label goog-notebook if pipeline is initiated from interactive runner.
+from apache_beam.runners.interactive import interactive_runner
+if isinstance(pipeline.runner, interactive_runner.InteractiveRunner):
 
 Review comment:
   This seems fine - but what if we go with the `runner_in_use` codepath? Would 
users do: `p.run(runner=InteractiveRunner(DataflowRunner()), options=...)`? Or 
would users create a pipeline with InteractiveRunner and then do 
`p.run(runner=DataflowRunner()...`? Is it poissible for users to do `p = 
beam.Pipeline()`, and then do 
`InteractiveRunner().run_pipeline(p)`/`InteractiveRunner(DataflowRunner()).run_pipeline(p)`?
   
   IIUC users would have to pass the interactive runner in `p = 
beam.Pipeline()` to activate the interactive mode, right? InteractiveRunner is 
not automatically selected?
 

This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 332231)
Time Spent: 40m  (was: 0.5h)

> Instrument Dataflow jobs that are launched from Notebooks
> -
>
> Key: BEAM-8457
> URL: https://issues.apache.org/jira/browse/BEAM-8457
> Project: Beam
>  Issue Type: Improvement
>  Components: runner-py-interactive
>Reporter: Ning Kang
>Assignee: Ning Kang
>Priority: Major
>  Time Spent: 40m
>  Remaining Estimate: 0h
>
> Dataflow needs the capability to tell how many Dataflow jobs are launched 
> from the Notebook environment, i.e., the Interactive Runner.
>  # Change the pipeline.run() API to allow supply a runner and an option 
> parameter so that a pipeline initially bundled w/ an interactive runner can 
> be directly run by other runners from notebook.
>  # Implicitly add the necessary source information through user labels when 
> the user does p.run(runner=DataflowRunner()).



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Work logged] (BEAM-8457) Instrument Dataflow jobs that are launched from Notebooks

2019-10-22 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/BEAM-8457?focusedWorklogId=332230&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-332230
 ]

ASF GitHub Bot logged work on BEAM-8457:


Author: ASF GitHub Bot
Created on: 22/Oct/19 21:23
Start Date: 22/Oct/19 21:23
Worklog Time Spent: 10m 
  Work Description: pabloem commented on pull request #9854: [BEAM-8457] 
Label Dataflow jobs from Notebook
URL: https://github.com/apache/beam/pull/9854#discussion_r337751395
 
 

 ##
 File path: sdks/python/apache_beam/pipeline.py
 ##
 @@ -396,28 +396,40 @@ def replace_all(self, replacements):
 for override in replacements:
   self._check_replacement(override)
 
-  def run(self, test_runner_api=True):
-"""Runs the pipeline. Returns whatever our runner returns after running."""
+  def run(self, test_runner_api=True, runner=None, options=None):
+"""Runs the pipeline. Returns whatever our runner returns after running.
+
+If another runner instance and options are provided, that runner will
+execute the pipeline with the given options. If either of them is not set,
+the default runner will run the pipeline with the original options
+assigned to the pipeline. The usage is similar to directly invoking
+`runner.run_pipeline(pipeline, options)`.
+"""
+runner_in_use = self.runner
+options_in_use = self._options
+if runner and options:
 
 Review comment:
   What if either runner or options are not provided? Should that throw an 
error? Currently, if only one is provided, it'll be ignored - and that would be 
quite surprising for users.
 

This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 332230)
Time Spent: 0.5h  (was: 20m)

> Instrument Dataflow jobs that are launched from Notebooks
> -
>
> Key: BEAM-8457
> URL: https://issues.apache.org/jira/browse/BEAM-8457
> Project: Beam
>  Issue Type: Improvement
>  Components: runner-py-interactive
>Reporter: Ning Kang
>Assignee: Ning Kang
>Priority: Major
>  Time Spent: 0.5h
>  Remaining Estimate: 0h
>
> Dataflow needs the capability to tell how many Dataflow jobs are launched 
> from the Notebook environment, i.e., the Interactive Runner.
>  # Change the pipeline.run() API to allow supply a runner and an option 
> parameter so that a pipeline initially bundled w/ an interactive runner can 
> be directly run by other runners from notebook.
>  # Implicitly add the necessary source information through user labels when 
> the user does p.run(runner=DataflowRunner()).



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Work logged] (BEAM-8457) Instrument Dataflow jobs that are launched from Notebooks

2019-10-22 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/BEAM-8457?focusedWorklogId=332213&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-332213
 ]

ASF GitHub Bot logged work on BEAM-8457:


Author: ASF GitHub Bot
Created on: 22/Oct/19 20:45
Start Date: 22/Oct/19 20:45
Worklog Time Spent: 10m 
  Work Description: KevinGG commented on pull request #9854: [BEAM-8457] 
Label Dataflow jobs from Notebook
URL: https://github.com/apache/beam/pull/9854
 
 
   1. Changed the pipeline.run() API to allow a runner and an option
   parameter so that a pipeline initially bundled w/ an interactive runner
   can be directly run by other runners from notebook.
   2. Implicitly added the necessary source information through user labels
   when the user does p.run(runner=DataflowRunner(), options=options) or
   DataflowRunner().run_pipeline(p, options).
   
   **Please** add a meaningful description for your change here
   
   
   
   Thank you for your contribution! Follow this checklist to help us 
incorporate your contribution quickly and easily:
   
- [x] [**Choose 
reviewer(s)**](https://beam.apache.org/contribute/#make-your-change) and 
mention them in a comment (`R: @username`).
- [x] Format the pull request title like `[BEAM-XXX] Fixes bug in 
ApproximateQuantiles`, where you replace `BEAM-XXX` with the appropriate JIRA 
issue, if applicable. This will automatically link the pull request to the 
issue.
- [x] If this contribution is large, please file an Apache [Individual 
Contributor License Agreement](https://www.apache.org/licenses/icla.pdf).
   
   See the [Contributor Guide](https://beam.apache.org/contribute) for more 
tips on [how to make review process 
smoother](https://beam.apache.org/contribute/#make-reviewers-job-easier).
   
   Post-Commit Tests Status (on master branch)
   

   
   Lang | SDK | Apex | Dataflow | Flink | Gearpump | Samza | Spark
   --- | --- | --- | --- | --- | --- | --- | ---
   Go | [![Build 
Status](https://builds.apache.org/job/beam_PostCommit_Go/lastCompletedBuild/badge/icon)](https://builds.apache.org/job/beam_PostCommit_Go/lastCompletedBuild/)
 | --- | --- | [![Build 
Status](https://builds.apache.org/job/beam_PostCommit_Go_VR_Flink/lastCompletedBuild/badge/icon)](https://builds.apache.org/job/beam_PostCommit_Go_VR_Flink/lastCompletedBuild/)
 | --- | --- | [![Build 
Status](https://builds.apache.org/job/beam_PostCommit_Go_VR_Spark/lastCompletedBuild/badge/icon)](https://builds.apache.org/job/beam_PostCommit_Go_VR_Spark/lastCompletedBuild/)
   Java | [![Build 
Status](https://builds.apache.org/job/beam_PostCommit_Java/lastCompletedBuild/badge/icon)](https://builds.apache.org/job/beam_PostCommit_Java/lastCompletedBuild/)
 | [![Build 
Status](https://builds.apache.org/job/beam_PostCommit_Java_ValidatesRunner_Apex/lastCompletedBuild/badge/icon)](https://builds.apache.org/job/beam_PostCommit_Java_ValidatesRunner_Apex/lastCompletedBuild/)
 | [![Build 
Status](https://builds.apache.org/job/beam_PostCommit_Java_ValidatesRunner_Dataflow/lastCompletedBuild/badge/icon)](https://builds.apache.org/job/beam_PostCommit_Java_ValidatesRunner_Dataflow/lastCompletedBuild/)
 | [![Build 
Status](https://builds.apache.org/job/beam_PostCommit_Java_ValidatesRunner_Flink/lastCompletedBuild/badge/icon)](https://builds.apache.org/job/beam_PostCommit_Java_ValidatesRunner_Flink/lastCompletedBuild/)[![Build
 
Status](https://builds.apache.org/job/beam_PostCommit_Java_PVR_Flink_Batch/lastCompletedBuild/badge/icon)](https://builds.apache.org/job/beam_PostCommit_Java_PVR_Flink_Batch/lastCompletedBuild/)[![Build
 
Status](https://builds.apache.org/job/beam_PostCommit_Java_PVR_Flink_Streaming/lastCompletedBuild/badge/icon)](https://builds.apache.org/job/beam_PostCommit_Java_PVR_Flink_Streaming/lastCompletedBuild/)
 | [![Build 
Status](https://builds.apache.org/job/beam_PostCommit_Java_ValidatesRunner_Gearpump/lastCompletedBuild/badge/icon)](https://builds.apache.org/job/beam_PostCommit_Java_ValidatesRunner_Gearpump/lastCompletedBuild/)
 | [![Build 
Status](https://builds.apache.org/job/beam_PostCommit_Java_ValidatesRunner_Samza/lastCompletedBuild/badge/icon)](https://builds.apache.org/job/beam_PostCommit_Java_ValidatesRunner_Samza/lastCompletedBuild/)
 | [![Build 
Status](https://builds.apache.org/job/beam_PostCommit_Java_ValidatesRunner_Spark/lastCompletedBuild/badge/icon)](https://builds.apache.org/job/beam_PostCommit_Java_ValidatesRunner_Spark/lastCompletedBuild/)[![Build
 
Status](https://builds.apache.org/job/beam_PostCommit_Java_PVR_Spark_Batch/lastCompletedBuild/badge/icon)](https://builds.apache.org/job/beam_PostCommit_Java_PVR_Spark_Batch/lastCompletedBuild/)
   Python | [![Build 
Status](https://builds.apache.org/job/beam_PostCommit_Python2/lastCompletedBuild/badge/i

[jira] [Work logged] (BEAM-8457) Instrument Dataflow jobs that are launched from Notebooks

2019-10-22 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/BEAM-8457?focusedWorklogId=332214&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-332214
 ]

ASF GitHub Bot logged work on BEAM-8457:


Author: ASF GitHub Bot
Created on: 22/Oct/19 20:45
Start Date: 22/Oct/19 20:45
Worklog Time Spent: 10m 
  Work Description: KevinGG commented on issue #9854: [BEAM-8457] Label 
Dataflow jobs from Notebook
URL: https://github.com/apache/beam/pull/9854#issuecomment-545146836
 
 
   R: @pabloem 
   PTAL, thanks!
 

This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 332214)
Time Spent: 20m  (was: 10m)

> Instrument Dataflow jobs that are launched from Notebooks
> -
>
> Key: BEAM-8457
> URL: https://issues.apache.org/jira/browse/BEAM-8457
> Project: Beam
>  Issue Type: Improvement
>  Components: runner-py-interactive
>Reporter: Ning Kang
>Assignee: Ning Kang
>Priority: Major
>  Time Spent: 20m
>  Remaining Estimate: 0h
>
> Dataflow needs the capability to tell how many Dataflow jobs are launched 
> from the Notebook environment, i.e., the Interactive Runner.
>  # Change the pipeline.run() API to allow supply a runner and an option 
> parameter so that a pipeline initially bundled w/ an interactive runner can 
> be directly run by other runners from notebook.
>  # Implicitly add the necessary source information through user labels when 
> the user does p.run(runner=DataflowRunner()).



--
This message was sent by Atlassian Jira
(v8.3.4#803005)