[jira] [Work logged] (BEAM-5637) Python support for custom dataflow worker jar

2018-10-23 Thread ASF GitHub Bot (JIRA)


 [ 
https://issues.apache.org/jira/browse/BEAM-5637?focusedWorklogId=157770&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-157770
 ]

ASF GitHub Bot logged work on BEAM-5637:


Author: ASF GitHub Bot
Created on: 23/Oct/18 19:05
Start Date: 23/Oct/18 19:05
Worklog Time Spent: 10m 
  Work Description: HuangLED edited a comment on issue #6680: [BEAM-5637] 
Python support for custom dataflow worker jar
URL: https://github.com/apache/beam/pull/6680#issuecomment-432365710
 
 
   > How does this interact with installing the packages in boot.go? Would not 
this 
(https://github.com/apache/beam/blob/master/sdks/python/container/boot.go#L104) 
fail?
   
   A quick update after tracing the code, @aaltay: I have pinned down the 
issue with hints from @herohde.
   
   **Root Cause.** In my test runs I did not specify 
worker_harness_container_image, assuming the most recent code version would be 
used. However, the boot.go currently used in prod lags behind: it is still 
the internal one in google3 (third_party/.../python_fnapi/boot.go). That is 
the key thing my test runs missed.
   
   That being said, the external Python boot.go does not need further changes 
either: it lists all the files it depends on explicitly, so it is already 
robust when an unexpected file exists.
   
   **What next.** There are multiple options for aligning this effort with the 
worker image migration, as well as with some other integration-testing 
migrations. Let me dig into this a bit more and get back soon.
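
To illustrate the point about explicit file lists, here is a minimal Python sketch. The real boot code is written in Go; the artifact names and the helper `select_installable` are hypothetical and chosen only for illustration. Because the boot step acts only on files it names explicitly, an extra staged artifact such as a worker jar is never touched:

```python
# Hypothetical sketch: not the actual boot.go logic, which is written in Go.
# All file names below are illustrative assumptions.

def select_installable(staged_files, expected_files):
    """Return the expected files that are actually staged, in expected order.

    Files that are staged but not explicitly expected (for example a
    worker jar) are ignored rather than passed to the installer.
    """
    staged = set(staged_files)
    return [name for name in expected_files if name in staged]


staged = ["requirements.txt", "workflow.tar.gz", "dataflow-worker.jar"]
expected = ["requirements.txt", "extra_packages.txt", "workflow.tar.gz"]

installable = select_installable(staged, expected)
ignored = sorted(set(staged) - set(expected))
print(installable)
print(ignored)
```

An installer that instead globbed everything in the staging directory would have to filter out the jar itself, which is the situation the issue description worries about.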


This is an automated message from the Apache Git Service.
To respond to the message, please log on GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 157770)
Time Spent: 5h 50m  (was: 5h 40m)

> Python support for custom dataflow worker jar
> -
>
> Key: BEAM-5637
> URL: https://issues.apache.org/jira/browse/BEAM-5637
> Project: Beam
> Issue Type: Sub-task
> Components: sdk-py-core
> Reporter: Henning Rohde
> Assignee: Ruoyun Huang
> Priority: Major
> Time Spent: 5h 50m
> Remaining Estimate: 0h
>
> One of the slightly subtle aspects is that we would need to ignore one of the 
> staged jars for portable Python jobs. That requires a change to the Python 
> boot code: 
> https://github.com/apache/beam/blob/66d7c865b7267f388ee60752891a9141fad43774/sdks/python/container/boot.go#L104



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Work logged] (BEAM-5637) Python support for custom dataflow worker jar

2018-10-19 Thread ASF GitHub Bot (JIRA)


 [ 
https://issues.apache.org/jira/browse/BEAM-5637?focusedWorklogId=156472&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-156472
 ]

ASF GitHub Bot logged work on BEAM-5637:


Author: ASF GitHub Bot
Created on: 19/Oct/18 20:33
Start Date: 19/Oct/18 20:33
Worklog Time Spent: 10m 
  Work Description: aaltay closed pull request #6747: [BEAM-5637]Improve 
docs on worker jar option and add verification.
URL: https://github.com/apache/beam/pull/6747
 
 
   

This is a PR merged from a forked repository.
As GitHub hides the original diff on merge, it is displayed below for
the sake of provenance:

As this is a foreign pull request (from a fork), the diff is supplied
below (as it won't show otherwise due to GitHub magic):

diff --git a/sdks/python/apache_beam/options/pipeline_options.py b/sdks/python/apache_beam/options/pipeline_options.py
index db7e7087ffe..4b4fbf640d2 100644
--- a/sdks/python/apache_beam/options/pipeline_options.py
+++ b/sdks/python/apache_beam/options/pipeline_options.py
@@ -554,7 +554,9 @@ def _add_argparse_args(cls, parser):
         '--dataflow_worker_jar',
         dest='dataflow_worker_jar',
         type=str,
-        help='Dataflow worker jar.'
+        help='Dataflow worker jar file. If specified, the jar file is staged '
+             'in GCS, then gets loaded by workers. End users usually '
+             'should not use this feature.'
     )
 
   def validate(self, validator):
diff --git a/sdks/python/apache_beam/runners/dataflow/dataflow_runner.py b/sdks/python/apache_beam/runners/dataflow/dataflow_runner.py
index 09e4f7d58da..ecaeda07c46 100644
--- a/sdks/python/apache_beam/runners/dataflow/dataflow_runner.py
+++ b/sdks/python/apache_beam/runners/dataflow/dataflow_runner.py
@@ -351,7 +351,7 @@ def run_pipeline(self, pipeline):
     setup_options.beam_plugins = plugins
 
     # Elevate "min_cpu_platform" to pipeline option, but using the existing
-    # experiment
+    # experiment.
     debug_options = pipeline._options.view_as(DebugOptions)
     worker_options = pipeline._options.view_as(WorkerOptions)
     if worker_options.min_cpu_platform:
@@ -384,6 +384,11 @@ def run_pipeline(self, pipeline):
 
     dataflow_worker_jar = getattr(worker_options, 'dataflow_worker_jar', None)
     if dataflow_worker_jar is not None:
+      if not apiclient._use_fnapi(pipeline._options):
+        logging.fatal(
+            'Typical end users should not use this worker jar feature. '
+            'It can only be used when fnapi is enabled.')
+
       experiments = ["use_staged_dataflow_worker_jar"]
       if debug_options.experiments is not None:
         experiments = list(set(experiments + debug_options.experiments))
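
A note on the check added in the last hunk above, offered as a hedged sketch rather than as the PR's code: in the Python standard library, `logging.fatal` is an alias of `logging.critical`, so it logs the message but does not stop execution. If aborting is the intent, raising an exception is one option. The function `check_worker_jar_option` and its arguments are hypothetical stand-ins for the real option objects:

```python
import logging


def check_worker_jar_option(dataflow_worker_jar, fnapi_enabled):
    """Raise if a custom worker jar is requested without the Fn API.

    Both parameters are illustrative assumptions: the jar path as given on
    the command line (or None), and a boolean saying whether fnapi is on.
    """
    if dataflow_worker_jar is not None and not fnapi_enabled:
        message = ('Typical end users should not use this worker jar feature. '
                   'It can only be used when fnapi is enabled.')
        logging.critical(message)  # logs only; does not terminate the program
        raise ValueError(message)
    return dataflow_worker_jar


# No jar requested: nothing to validate.
assert check_worker_jar_option(None, fnapi_enabled=False) is None
# Jar requested with fnapi enabled: accepted unchanged.
assert check_worker_jar_option('worker.jar', fnapi_enabled=True) == 'worker.jar'
```

Whether a hard failure is preferable to a logged warning is a design choice for the reviewers; the sketch only shows what the stricter variant could look like.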


 




Issue Time Tracking
---

Worklog Id: (was: 156472)
Time Spent: 5.5h  (was: 5h 20m)



[jira] [Work logged] (BEAM-5637) Python support for custom dataflow worker jar

2018-10-19 Thread ASF GitHub Bot (JIRA)


 [ 
https://issues.apache.org/jira/browse/BEAM-5637?focusedWorklogId=156374&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-156374
 ]

ASF GitHub Bot logged work on BEAM-5637:


Author: ASF GitHub Bot
Created on: 19/Oct/18 15:40
Start Date: 19/Oct/18 15:40
Worklog Time Spent: 10m 
  Work Description: HuangLED commented on issue #6747: [DRAFT] 
[BEAM-5637]Improve docs on worker jar option and add verification.
URL: https://github.com/apache/beam/pull/6747#issuecomment-431406456
 
 
   Please double-check the accuracy. Thanks.
   
   R: @aaltay 
   C: @herohde 
   
   Also, regarding the other issue Ahmet brought up in the original PR 
(#6680): I need to trace the Go code a little to understand its behavior. 
Whatever fix we decide on will be in a separate PR.




Issue Time Tracking
---

Worklog Id: (was: 156374)
Time Spent: 5h 20m  (was: 5h 10m)



[jira] [Work logged] (BEAM-5637) Python support for custom dataflow worker jar

2018-10-16 Thread ASF GitHub Bot (JIRA)


 [ 
https://issues.apache.org/jira/browse/BEAM-5637?focusedWorklogId=155183&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-155183
 ]

ASF GitHub Bot logged work on BEAM-5637:


Author: ASF GitHub Bot
Created on: 16/Oct/18 22:44
Start Date: 16/Oct/18 22:44
Worklog Time Spent: 10m 
  Work Description: aaltay commented on a change in pull request #6680: 
[BEAM-5637] Python support for custom dataflow worker jar
URL: https://github.com/apache/beam/pull/6680#discussion_r225731222
 
 

 ##
 File path: sdks/python/apache_beam/options/pipeline_options.py
 ##
 @@ -520,6 +520,12 @@ def _add_argparse_args(cls, parser):
          type=str,
          help='GCE minimum CPU platform. Default is determined by GCP.'
      )
 +    parser.add_argument(
 
 Review comment:
   In light of the discussion on the dev@ list related to runner options 
(https://lists.apache.org/thread.html/78fe33dc41b04886f5355d66d50359265bfa2985580bb70f79c53545@%3Cdev.beam.apache.org%3E),
 would it be better to expose this as a runner option?
   
   @robertwb 




Issue Time Tracking
---

Worklog Id: (was: 155183)
Time Spent: 4h 50m  (was: 4h 40m)



[jira] [Work logged] (BEAM-5637) Python support for custom dataflow worker jar

2018-10-16 Thread ASF GitHub Bot (JIRA)


 [ 
https://issues.apache.org/jira/browse/BEAM-5637?focusedWorklogId=155184&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-155184
 ]

ASF GitHub Bot logged work on BEAM-5637:


Author: ASF GitHub Bot
Created on: 16/Oct/18 22:44
Start Date: 16/Oct/18 22:44
Worklog Time Spent: 10m 
  Work Description: aaltay commented on a change in pull request #6680: 
[BEAM-5637] Python support for custom dataflow worker jar
URL: https://github.com/apache/beam/pull/6680#discussion_r225732036
 
 

 ##
 File path: sdks/python/apache_beam/options/pipeline_options.py
 ##
 @@ -520,6 +520,12 @@ def _add_argparse_args(cls, parser):
          type=str,
          help='GCE minimum CPU platform. Default is determined by GCP.'
      )
 +    parser.add_argument(
 +        '--dataflow_worker_jar',
 +        dest='dataflow_worker_jar',
 +        type=str,
 +        help='Dataflow worker jar.'
 
 Review comment:
   Could you update the description here?
   
   We would not typically expect users to use this option. The biggest use 
case is probably development-related changes. It also cannot be used for 
legacy pipelines. (Should this be an error if the fn api experiment is not set 
but this flag is used?)




Issue Time Tracking
---

Worklog Id: (was: 155184)
Time Spent: 5h  (was: 4h 50m)



[jira] [Work logged] (BEAM-5637) Python support for custom dataflow worker jar

2018-10-16 Thread ASF GitHub Bot (JIRA)


 [ 
https://issues.apache.org/jira/browse/BEAM-5637?focusedWorklogId=155170&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-155170
 ]

ASF GitHub Bot logged work on BEAM-5637:


Author: ASF GitHub Bot
Created on: 16/Oct/18 22:38
Start Date: 16/Oct/18 22:38
Worklog Time Spent: 10m 
  Work Description: aaltay commented on issue #6680: [BEAM-5637] Python 
support for custom dataflow worker jar
URL: https://github.com/apache/beam/pull/6680#issuecomment-430425707
 
 
   How does this interact with installing the packages in boot.go? Would not 
this 
(https://github.com/apache/beam/blob/master/sdks/python/container/boot.go#L104) 
fail?




Issue Time Tracking
---

Worklog Id: (was: 155170)
Time Spent: 4h 40m  (was: 4.5h)



[jira] [Work logged] (BEAM-5637) Python support for custom dataflow worker jar

2018-10-16 Thread ASF GitHub Bot (JIRA)


 [ 
https://issues.apache.org/jira/browse/BEAM-5637?focusedWorklogId=155151&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-155151
 ]

ASF GitHub Bot logged work on BEAM-5637:


Author: ASF GitHub Bot
Created on: 16/Oct/18 22:06
Start Date: 16/Oct/18 22:06
Worklog Time Spent: 10m 
  Work Description: pabloem closed pull request #6680: [BEAM-5637] Python 
support for custom dataflow worker jar
URL: https://github.com/apache/beam/pull/6680
 
 
   

This is a PR merged from a forked repository.
As GitHub hides the original diff on merge, it is displayed below for
the sake of provenance:

As this is a foreign pull request (from a fork), the diff is supplied
below (as it won't show otherwise due to GitHub magic):

diff --git a/sdks/python/apache_beam/options/pipeline_options.py b/sdks/python/apache_beam/options/pipeline_options.py
index a0059dbb381..357c97ea6da 100644
--- a/sdks/python/apache_beam/options/pipeline_options.py
+++ b/sdks/python/apache_beam/options/pipeline_options.py
@@ -520,6 +520,12 @@ def _add_argparse_args(cls, parser):
         type=str,
         help='GCE minimum CPU platform. Default is determined by GCP.'
     )
+    parser.add_argument(
+        '--dataflow_worker_jar',
+        dest='dataflow_worker_jar',
+        type=str,
+        help='Dataflow worker jar.'
+    )
 
   def validate(self, validator):
     errors = []
diff --git a/sdks/python/apache_beam/runners/dataflow/dataflow_runner.py b/sdks/python/apache_beam/runners/dataflow/dataflow_runner.py
index 1acd3488524..4143f2dbb1d 100644
--- a/sdks/python/apache_beam/runners/dataflow/dataflow_runner.py
+++ b/sdks/python/apache_beam/runners/dataflow/dataflow_runner.py
@@ -381,6 +381,13 @@ def run_pipeline(self, pipeline):
     self.dataflow_client = apiclient.DataflowApplicationClient(
         pipeline._options)
 
+    dataflow_worker_jar = getattr(worker_options, 'dataflow_worker_jar', None)
+    if dataflow_worker_jar is not None:
+      experiments = ["use_staged_dataflow_worker_jar"]
+      if debug_options.experiments is not None:
+        experiments = list(set(experiments + debug_options.experiments))
+      debug_options.experiments = experiments
+
     # Create the job description and send a request to the service. The result
     # can be None if there is no need to send a request to the service (e.g.
     # template creation). If a request was sent and failed then the call will
diff --git a/sdks/python/apache_beam/runners/portability/stager.py b/sdks/python/apache_beam/runners/portability/stager.py
index ef7401ac6aa..cd7e24fce51 100644
--- a/sdks/python/apache_beam/runners/portability/stager.py
+++ b/sdks/python/apache_beam/runners/portability/stager.py
@@ -59,6 +59,7 @@
 from apache_beam.internal import pickler
 from apache_beam.io.filesystems import FileSystems
 from apache_beam.options.pipeline_options import SetupOptions
+from apache_beam.options.pipeline_options import WorkerOptions
 # TODO(angoenka): Remove reference to dataflow internal names
 from apache_beam.runners.dataflow.internal import names
 from apache_beam.utils import processes
@@ -123,8 +124,7 @@ def stage_job_resources(self,
 
     Returns:
       A list of file names (no paths) for the resources staged. All the
-      files
-      are assumed to be staged at staging_location.
+      files are assumed to be staged at staging_location.
 
     Raises:
       RuntimeError: If files specified are not found or error encountered
@@ -256,6 +256,14 @@ def stage_job_resources(self,
 'The file "%s" cannot be found. Its location was specified by '
 'the --sdk_location command-line option.' % sdk_path)
 
+    worker_options = options.view_as(WorkerOptions)
+    dataflow_worker_jar = getattr(worker_options, 'dataflow_worker_jar', None)
+    if dataflow_worker_jar is not None:
+      jar_staged_filename = 'dataflow-worker.jar'
+      staged_path = FileSystems.join(staging_location, jar_staged_filename)
+      self.stage_artifact(dataflow_worker_jar, staged_path)
+      resources.append(jar_staged_filename)
+
     # Delete all temp files created while staging job resources.
     shutil.rmtree(temp_dir)
     retrieval_token = self.commit_manifest()
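
The experiment-merging step in the dataflow_runner.py hunk above uses `list(set(...))`, which removes duplicates but does not guarantee the order of the result. As a minimal sketch, assuming the order of experiments matters to the caller, an order-preserving variant could look like this (the helper name `merge_experiments` and the sample values are hypothetical):

```python
def merge_experiments(required, existing):
    """Merge two experiment lists, dropping duplicates but keeping order.

    `existing` may be None, mirroring an unset experiments option.
    """
    merged = []
    for name in list(required) + list(existing or []):
        if name not in merged:
            merged.append(name)
    return merged


result = merge_experiments(
    ["use_staged_dataflow_worker_jar"],
    ["beam_fn_api", "use_staged_dataflow_worker_jar"])
print(result)
```

If order is irrelevant to the service, the set-based form in the diff is equivalent in content and shorter.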


 




Issue Time Tracking
---

Worklog Id: (was: 155151)
Time Spent: 4.5h  (was: 4h 20m)


[jira] [Work logged] (BEAM-5637) Python support for custom dataflow worker jar

2018-10-16 Thread ASF GitHub Bot (JIRA)


 [ 
https://issues.apache.org/jira/browse/BEAM-5637?focusedWorklogId=155083&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-155083
 ]

ASF GitHub Bot logged work on BEAM-5637:


Author: ASF GitHub Bot
Created on: 16/Oct/18 19:30
Start Date: 16/Oct/18 19:30
Worklog Time Spent: 10m 
  Work Description: HuangLED commented on issue #6680: [BEAM-5637] Python 
support for custom dataflow worker jar
URL: https://github.com/apache/beam/pull/6680#issuecomment-430369135
 
 
   Run Python PostCommit




Issue Time Tracking
---

Worklog Id: (was: 155083)
Time Spent: 4h 20m  (was: 4h 10m)



[jira] [Work logged] (BEAM-5637) Python support for custom dataflow worker jar

2018-10-16 Thread ASF GitHub Bot (JIRA)


 [ 
https://issues.apache.org/jira/browse/BEAM-5637?focusedWorklogId=154953&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-154953
 ]

ASF GitHub Bot logged work on BEAM-5637:


Author: ASF GitHub Bot
Created on: 16/Oct/18 15:59
Start Date: 16/Oct/18 15:59
Worklog Time Spent: 10m 
  Work Description: HuangLED removed a comment on issue #6680: [BEAM-5637] 
Python support for custom dataflow worker jar
URL: https://github.com/apache/beam/pull/6680#issuecomment-430101823
 
 
   Run Python PostCommit




Issue Time Tracking
---

Worklog Id: (was: 154953)
Time Spent: 4h 10m  (was: 4h)



[jira] [Work logged] (BEAM-5637) Python support for custom dataflow worker jar

2018-10-16 Thread ASF GitHub Bot (JIRA)


 [ 
https://issues.apache.org/jira/browse/BEAM-5637?focusedWorklogId=154934&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-154934
 ]

ASF GitHub Bot logged work on BEAM-5637:


Author: ASF GitHub Bot
Created on: 16/Oct/18 15:42
Start Date: 16/Oct/18 15:42
Worklog Time Spent: 10m 
  Work Description: HuangLED commented on issue #6680: [BEAM-5637] Python 
support for custom dataflow worker jar
URL: https://github.com/apache/beam/pull/6680#issuecomment-430288120
 
 
   > Is it possible to add an integration test, using a jar built from the repo?
   
   Yes. That is planned (tracked in JIRA BEAM-5703) and will be done in 
separate PRs.




Issue Time Tracking
---

Worklog Id: (was: 154934)
Time Spent: 4h  (was: 3h 50m)



[jira] [Work logged] (BEAM-5637) Python support for custom dataflow worker jar

2018-10-16 Thread ASF GitHub Bot (JIRA)


 [ 
https://issues.apache.org/jira/browse/BEAM-5637?focusedWorklogId=154677&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-154677
 ]

ASF GitHub Bot logged work on BEAM-5637:


Author: ASF GitHub Bot
Created on: 16/Oct/18 09:18
Start Date: 16/Oct/18 09:18
Worklog Time Spent: 10m 
  Work Description: robertwb commented on issue #6680: [BEAM-5637] Python 
support for custom dataflow worker jar
URL: https://github.com/apache/beam/pull/6680#issuecomment-430164368
 
 
   Is it possible to add an integration test, using a jar built from the repo?




Issue Time Tracking
---

Worklog Id: (was: 154677)
Time Spent: 3h 50m  (was: 3h 40m)



[jira] [Work logged] (BEAM-5637) Python support for custom dataflow worker jar

2018-10-15 Thread ASF GitHub Bot (JIRA)


 [ 
https://issues.apache.org/jira/browse/BEAM-5637?focusedWorklogId=154598&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-154598
 ]

ASF GitHub Bot logged work on BEAM-5637:


Author: ASF GitHub Bot
Created on: 16/Oct/18 05:03
Start Date: 16/Oct/18 05:03
Worklog Time Spent: 10m 
  Work Description: HuangLED commented on issue #6680: [BEAM-5637] Python 
support for custom dataflow worker jar
URL: https://github.com/apache/beam/pull/6680#issuecomment-430101823
 
 
   Run Python PostCommit




Issue Time Tracking
---

Worklog Id: (was: 154598)
Time Spent: 3h 40m  (was: 3.5h)



[jira] [Work logged] (BEAM-5637) Python support for custom dataflow worker jar

2018-10-15 Thread ASF GitHub Bot (JIRA)


 [ 
https://issues.apache.org/jira/browse/BEAM-5637?focusedWorklogId=154593&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-154593
 ]

ASF GitHub Bot logged work on BEAM-5637:


Author: ASF GitHub Bot
Created on: 16/Oct/18 04:17
Start Date: 16/Oct/18 04:17
Worklog Time Spent: 10m 
  Work Description: boyuanzz commented on a change in pull request #6680: 
[BEAM-5637] Python support for custom dataflow worker jar
URL: https://github.com/apache/beam/pull/6680#discussion_r225392436
 
 

 ##
 File path: sdks/python/jar.txt
 ##
 @@ -0,0 +1 @@
+
 
 Review comment:
   Could you please remove this file?




Issue Time Tracking
---

Worklog Id: (was: 154593)
Time Spent: 3.5h  (was: 3h 20m)



[jira] [Work logged] (BEAM-5637) Python support for custom dataflow worker jar

2018-10-15 Thread ASF GitHub Bot (JIRA)


 [ 
https://issues.apache.org/jira/browse/BEAM-5637?focusedWorklogId=154538&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-154538
 ]

ASF GitHub Bot logged work on BEAM-5637:


Author: ASF GitHub Bot
Created on: 15/Oct/18 23:58
Start Date: 15/Oct/18 23:58
Worklog Time Spent: 10m 
  Work Description: pabloem commented on issue #6680: [BEAM-5637] Python 
support for custom dataflow worker jar
URL: https://github.com/apache/beam/pull/6680#issuecomment-430054356
 
 
   LGTM, modulo Boyuan's comment.




Issue Time Tracking
---

Worklog Id: (was: 154538)
Time Spent: 3h 20m  (was: 3h 10m)



[jira] [Work logged] (BEAM-5637) Python support for custom dataflow worker jar

2018-10-15 Thread ASF GitHub Bot (JIRA)


 [ 
https://issues.apache.org/jira/browse/BEAM-5637?focusedWorklogId=154536&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-154536
 ]

ASF GitHub Bot logged work on BEAM-5637:


Author: ASF GitHub Bot
Created on: 15/Oct/18 23:54
Start Date: 15/Oct/18 23:54
Worklog Time Spent: 10m 
  Work Description: HuangLED commented on a change in pull request #6680: 
[BEAM-5637] Python support for custom dataflow worker jar
URL: https://github.com/apache/beam/pull/6680#discussion_r225353704
 
 

 ##
 File path: sdks/python/apache_beam/runners/portability/stager.py
 ##
 @@ -256,6 +256,14 @@ def stage_job_resources(self,
 'The file "%s" cannot be found. Its location was specified by '
 'the --sdk_location command-line option.' % sdk_path)
 
+worker_options = options.view_as(WorkerOptions)
+if hasattr(worker_options, 'dataflow_worker_jar') and \
 
 Review comment:
   Interesting read. However, in this particular case (a pipeline option, which 
is nothing more than a data field), aren't we completely safe? Thoughts?
   
   That being said, I will apply it for this PR anyway.
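
For reference, a small sketch of the two ways to read an optional field discussed here. `FakeWorkerOptions` is a hypothetical stand-in, not the real `WorkerOptions` class, and the second half of the `hasattr` condition is an assumption, since the quoted line is cut off. With a default of `None`, `getattr` folds the "attribute missing" and "attribute unset" cases into a single check:

```python
class FakeWorkerOptions(object):
    """Hypothetical stand-in for an options object; not Beam's WorkerOptions."""


def jar_via_hasattr(options):
    # Two-step form: test for the attribute, then read it.
    if hasattr(options, 'dataflow_worker_jar') and \
            options.dataflow_worker_jar is not None:
        return options.dataflow_worker_jar
    return None


def jar_via_getattr(options):
    # One-step form: a missing attribute and an unset one both yield None.
    return getattr(options, 'dataflow_worker_jar', None)


unset = FakeWorkerOptions()
configured = FakeWorkerOptions()
configured.dataflow_worker_jar = '/tmp/worker.jar'

for opts in (unset, configured):
    assert jar_via_hasattr(opts) == jar_via_getattr(opts)
```

For a plain data field the two forms behave the same, which matches the point made in the comment; the `getattr` form is simply shorter.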




Issue Time Tracking
---

Worklog Id: (was: 154536)
Time Spent: 3h 10m  (was: 3h)



[jira] [Work logged] (BEAM-5637) Python support for custom dataflow worker jar

2018-10-15 Thread ASF GitHub Bot (JIRA)


 [ 
https://issues.apache.org/jira/browse/BEAM-5637?focusedWorklogId=154478&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-154478
 ]

ASF GitHub Bot logged work on BEAM-5637:


Author: ASF GitHub Bot
Created on: 15/Oct/18 21:54
Start Date: 15/Oct/18 21:54
Worklog Time Spent: 10m 
  Work Description: herohde commented on issue #6680: [BEAM-5637] Python 
support for custom dataflow worker jar
URL: https://github.com/apache/beam/pull/6680#issuecomment-430028150
 
 
   LGTM, but someone with Python experience should review: @pabloem?




Issue Time Tracking
---

Worklog Id: (was: 154478)
Time Spent: 3h  (was: 2h 50m)
