[jira] [Work logged] (BEAM-5637) Python support for custom dataflow worker jar
[ https://issues.apache.org/jira/browse/BEAM-5637?focusedWorklogId=157770=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-157770 ]

ASF GitHub Bot logged work on BEAM-5637:

Author: ASF GitHub Bot
Created on: 23/Oct/18 19:05
Start Date: 23/Oct/18 19:05
Worklog Time Spent: 10m
Work Description: HuangLED edited a comment on issue #6680: [BEAM-5637] Python support for custom dataflow worker jar
URL: https://github.com/apache/beam/pull/6680#issuecomment-432365710

> How does this interact with installing the packages in boot.go? Would this (https://github.com/apache/beam/blob/master/sdks/python/container/boot.go#L104) not fail?

A quick update after tracing the code, @aaltay: I pinned down the issue with hints from @herohde.

**Root Cause.** In my test runs I didn't specify worker_harness_container_image, assuming the most recent code version would be used. However, the worker image currently used in prod lags behind: it still uses the internal boot.go in google3 (third_party/.../python_fnapi/boot.go). That is the key thing missing from my test runs. That being said, the external Python boot.go does not need further changes either: it lists all the files it depends on explicitly, so it already tolerates an unexpected staged file.

**What next.** There are multiple choices for how to align this effort with the worker image migration, as well as with some other integration testing migrations. Let me dive into this a bit more and get back soon.

This is an automated message from the Apache Git Service. To respond to the message, please log on GitHub and use the URL above to go to the specific comment.
For queries about this service, please contact Infrastructure at: us...@infra.apache.org

Issue Time Tracking
---
Worklog Id: (was: 157770)
Time Spent: 5h 50m (was: 5h 40m)

> Python support for custom dataflow worker jar
> -
>
> Key: BEAM-5637
> URL: https://issues.apache.org/jira/browse/BEAM-5637
> Project: Beam
> Issue Type: Sub-task
> Components: sdk-py-core
> Reporter: Henning Rohde
> Assignee: Ruoyun Huang
> Priority: Major
> Time Spent: 5h 50m
> Remaining Estimate: 0h
>
> One of the slightly subtle aspects is that we would need to ignore one of the
> staged jars for portable Python jobs. That requires a change to the Python
> boot code:
> https://github.com/apache/beam/blob/66d7c865b7267f388ee60752891a9141fad43774/sdks/python/container/boot.go#L104

--
This message was sent by Atlassian JIRA (v7.6.3#76005)
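
Editorial note: the point about boot.go listing its files explicitly can be illustrated with a small sketch. It is written in Python for consistency with the PR, although the real boot code is Go. The helper `files_to_install` and the file names are illustrative assumptions, not taken from boot.go.

```python
def files_to_install(staged, expected):
    """Return the staged artifacts that the boot code explicitly expects.

    Hypothetical helper: anything not named in `expected` (for example a
    staged worker jar) is simply skipped rather than treated as an error.
    """
    expected = set(expected)
    return [name for name in staged if name in expected]


# Illustrative file names only.
staged = ['requirements.txt', 'dataflow-worker.jar', 'workflow.tar.gz']
expected = ['requirements.txt', 'workflow.tar.gz']
installable = files_to_install(staged, expected)
```

Under this assumption, the extra jar never reaches the package-installation step, which matches the claim that no change to the external boot.go is needed.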
[jira] [Work logged] (BEAM-5637) Python support for custom dataflow worker jar
[ https://issues.apache.org/jira/browse/BEAM-5637?focusedWorklogId=156472=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-156472 ]

ASF GitHub Bot logged work on BEAM-5637:

Author: ASF GitHub Bot
Created on: 19/Oct/18 20:33
Start Date: 19/Oct/18 20:33
Worklog Time Spent: 10m
Work Description: aaltay closed pull request #6747: [BEAM-5637]Improve docs on worker jar option and add verification.
URL: https://github.com/apache/beam/pull/6747

This is a PR merged from a forked repository. As GitHub hides the original diff on merge, it is displayed below for the sake of provenance:

As this is a foreign pull request (from a fork), the diff is supplied below (as it won't show otherwise due to GitHub magic):

diff --git a/sdks/python/apache_beam/options/pipeline_options.py b/sdks/python/apache_beam/options/pipeline_options.py
index db7e7087ffe..4b4fbf640d2 100644
--- a/sdks/python/apache_beam/options/pipeline_options.py
+++ b/sdks/python/apache_beam/options/pipeline_options.py
@@ -554,7 +554,9 @@ def _add_argparse_args(cls, parser):
         '--dataflow_worker_jar',
         dest='dataflow_worker_jar',
         type=str,
-        help='Dataflow worker jar.'
+        help='Dataflow worker jar file. If specified, the jar file is staged '
+             'in GCS, then gets loaded by workers. End users usually '
+             'should not use this feature.'
     )

   def validate(self, validator):
diff --git a/sdks/python/apache_beam/runners/dataflow/dataflow_runner.py b/sdks/python/apache_beam/runners/dataflow/dataflow_runner.py
index 09e4f7d58da..ecaeda07c46 100644
--- a/sdks/python/apache_beam/runners/dataflow/dataflow_runner.py
+++ b/sdks/python/apache_beam/runners/dataflow/dataflow_runner.py
@@ -351,7 +351,7 @@ def run_pipeline(self, pipeline):
       setup_options.beam_plugins = plugins

     # Elevate "min_cpu_platform" to pipeline option, but using the existing
-    # experiment
+    # experiment.
     debug_options = pipeline._options.view_as(DebugOptions)
     worker_options = pipeline._options.view_as(WorkerOptions)
     if worker_options.min_cpu_platform:
@@ -384,6 +384,11 @@ def run_pipeline(self, pipeline):

     dataflow_worker_jar = getattr(worker_options, 'dataflow_worker_jar', None)
     if dataflow_worker_jar is not None:
+      if not apiclient._use_fnapi(pipeline._options):
+        logging.fatal(
+            'Typical end users should not use this worker jar feature. '
+            'It can only be used when fnapi is enabled.')
+
       experiments = ["use_staged_dataflow_worker_jar"]
       if debug_options.experiments is not None:
         experiments = list(set(experiments + debug_options.experiments))

Issue Time Tracking
---
Worklog Id: (was: 156472)
Time Spent: 5.5h (was: 5h 20m)
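
Editorial note: in Python's standard library, `logging.fatal` is an alias for `logging.critical`. It emits a CRITICAL record and then execution continues, so the check in the diff above reports the misuse but does not by itself stop the job. A minimal sketch of a stricter variant follows, assuming the intent is to reject the flag outright. The helper `check_worker_jar_allowed` and its `use_fnapi` parameter are hypothetical stand-ins for the `apiclient._use_fnapi(pipeline._options)` call and are not part of Beam.

```python
def check_worker_jar_allowed(dataflow_worker_jar, use_fnapi):
    """Raise if a custom worker jar is requested without the Fn API.

    Hypothetical helper, not part of Beam: `use_fnapi` stands in for the
    result of apiclient._use_fnapi(pipeline._options) in the diff.
    """
    if dataflow_worker_jar is not None and not use_fnapi:
        # logging.fatal() would only log at CRITICAL level and carry on,
        # so raise to actually stop job submission.
        raise ValueError(
            '--dataflow_worker_jar can only be used when fnapi is enabled. '
            'Typical end users should not use this worker jar feature.')
    return dataflow_worker_jar
```

Whether a hard error is preferable to a logged message is a design choice for the PR authors; the sketch only shows what the stricter behavior would look like.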
[jira] [Work logged] (BEAM-5637) Python support for custom dataflow worker jar
[ https://issues.apache.org/jira/browse/BEAM-5637?focusedWorklogId=156374=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-156374 ]

ASF GitHub Bot logged work on BEAM-5637:

Author: ASF GitHub Bot
Created on: 19/Oct/18 15:40
Start Date: 19/Oct/18 15:40
Worklog Time Spent: 10m
Work Description: HuangLED commented on issue #6747: [DRAFT] [BEAM-5637]Improve docs on worker jar option and add verification.
URL: https://github.com/apache/beam/pull/6747#issuecomment-431406456

Please double check the accuracy. Thanks.

R: @aaltay
C: @herohde

Also, regarding the other issue Ahmet brought up in the original PR #6680: I will need to trace the Go code a little to understand its behavior. Whatever fix we decide on will be in a separate PR.

Issue Time Tracking
---
Worklog Id: (was: 156374)
Time Spent: 5h 20m (was: 5h 10m)
[jira] [Work logged] (BEAM-5637) Python support for custom dataflow worker jar
[ https://issues.apache.org/jira/browse/BEAM-5637?focusedWorklogId=155183=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-155183 ]

ASF GitHub Bot logged work on BEAM-5637:

Author: ASF GitHub Bot
Created on: 16/Oct/18 22:44
Start Date: 16/Oct/18 22:44
Worklog Time Spent: 10m
Work Description: aaltay commented on a change in pull request #6680: [BEAM-5637] Python support for custom dataflow worker jar
URL: https://github.com/apache/beam/pull/6680#discussion_r225731222

## File path: sdks/python/apache_beam/options/pipeline_options.py ##
@@ -520,6 +520,12 @@ def _add_argparse_args(cls, parser):
         type=str,
         help='GCE minimum CPU platform. Default is determined by GCP.'
     )
+    parser.add_argument(

Review comment:
In light of the discussion on the dev@ list related to runner options (https://lists.apache.org/thread.html/78fe33dc41b04886f5355d66d50359265bfa2985580bb70f79c53545@%3Cdev.beam.apache.org%3E), would it be better to expose this as a runner option? @robertwb

Issue Time Tracking
---
Worklog Id: (was: 155183)
Time Spent: 4h 50m (was: 4h 40m)
[jira] [Work logged] (BEAM-5637) Python support for custom dataflow worker jar
[ https://issues.apache.org/jira/browse/BEAM-5637?focusedWorklogId=155184=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-155184 ]

ASF GitHub Bot logged work on BEAM-5637:

Author: ASF GitHub Bot
Created on: 16/Oct/18 22:44
Start Date: 16/Oct/18 22:44
Worklog Time Spent: 10m
Work Description: aaltay commented on a change in pull request #6680: [BEAM-5637] Python support for custom dataflow worker jar
URL: https://github.com/apache/beam/pull/6680#discussion_r225732036

## File path: sdks/python/apache_beam/options/pipeline_options.py ##
@@ -520,6 +520,12 @@ def _add_argparse_args(cls, parser):
         type=str,
         help='GCE minimum CPU platform. Default is determined by GCP.'
     )
+    parser.add_argument(
+        '--dataflow_worker_jar',
+        dest='dataflow_worker_jar',
+        type=str,
+        help='Dataflow worker jar.'

Review comment:
Could you update the description here? We would not typically expect users to use this option; the biggest use case is probably development-related changes. It also cannot be used for legacy pipelines. (Should this be an error if the fn api experiment is not set but this flag is used?)

Issue Time Tracking
---
Worklog Id: (was: 155184)
Time Spent: 5h (was: 4h 50m)
[jira] [Work logged] (BEAM-5637) Python support for custom dataflow worker jar
[ https://issues.apache.org/jira/browse/BEAM-5637?focusedWorklogId=155170=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-155170 ]

ASF GitHub Bot logged work on BEAM-5637:

Author: ASF GitHub Bot
Created on: 16/Oct/18 22:38
Start Date: 16/Oct/18 22:38
Worklog Time Spent: 10m
Work Description: aaltay commented on issue #6680: [BEAM-5637] Python support for custom dataflow worker jar
URL: https://github.com/apache/beam/pull/6680#issuecomment-430425707

How does this interact with installing the packages in boot.go? Would this (https://github.com/apache/beam/blob/master/sdks/python/container/boot.go#L104) not fail?

Issue Time Tracking
---
Worklog Id: (was: 155170)
Time Spent: 4h 40m (was: 4.5h)
[jira] [Work logged] (BEAM-5637) Python support for custom dataflow worker jar
[ https://issues.apache.org/jira/browse/BEAM-5637?focusedWorklogId=155151=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-155151 ]

ASF GitHub Bot logged work on BEAM-5637:

Author: ASF GitHub Bot
Created on: 16/Oct/18 22:06
Start Date: 16/Oct/18 22:06
Worklog Time Spent: 10m
Work Description: pabloem closed pull request #6680: [BEAM-5637] Python support for custom dataflow worker jar
URL: https://github.com/apache/beam/pull/6680

This is a PR merged from a forked repository. As GitHub hides the original diff on merge, it is displayed below for the sake of provenance:

As this is a foreign pull request (from a fork), the diff is supplied below (as it won't show otherwise due to GitHub magic):

diff --git a/sdks/python/apache_beam/options/pipeline_options.py b/sdks/python/apache_beam/options/pipeline_options.py
index a0059dbb381..357c97ea6da 100644
--- a/sdks/python/apache_beam/options/pipeline_options.py
+++ b/sdks/python/apache_beam/options/pipeline_options.py
@@ -520,6 +520,12 @@ def _add_argparse_args(cls, parser):
         type=str,
         help='GCE minimum CPU platform. Default is determined by GCP.'
     )
+    parser.add_argument(
+        '--dataflow_worker_jar',
+        dest='dataflow_worker_jar',
+        type=str,
+        help='Dataflow worker jar.'
+    )

   def validate(self, validator):
     errors = []
diff --git a/sdks/python/apache_beam/runners/dataflow/dataflow_runner.py b/sdks/python/apache_beam/runners/dataflow/dataflow_runner.py
index 1acd3488524..4143f2dbb1d 100644
--- a/sdks/python/apache_beam/runners/dataflow/dataflow_runner.py
+++ b/sdks/python/apache_beam/runners/dataflow/dataflow_runner.py
@@ -381,6 +381,13 @@ def run_pipeline(self, pipeline):
     self.dataflow_client = apiclient.DataflowApplicationClient(
         pipeline._options)

+    dataflow_worker_jar = getattr(worker_options, 'dataflow_worker_jar', None)
+    if dataflow_worker_jar is not None:
+      experiments = ["use_staged_dataflow_worker_jar"]
+      if debug_options.experiments is not None:
+        experiments = list(set(experiments + debug_options.experiments))
+      debug_options.experiments = experiments
+
     # Create the job description and send a request to the service. The result
     # can be None if there is no need to send a request to the service (e.g.
     # template creation). If a request was sent and failed then the call will
diff --git a/sdks/python/apache_beam/runners/portability/stager.py b/sdks/python/apache_beam/runners/portability/stager.py
index ef7401ac6aa..cd7e24fce51 100644
--- a/sdks/python/apache_beam/runners/portability/stager.py
+++ b/sdks/python/apache_beam/runners/portability/stager.py
@@ -59,6 +59,7 @@ from apache_beam.internal import pickler
 from apache_beam.io.filesystems import FileSystems
 from apache_beam.options.pipeline_options import SetupOptions
+from apache_beam.options.pipeline_options import WorkerOptions
 # TODO(angoenka): Remove reference to dataflow internal names
 from apache_beam.runners.dataflow.internal import names
 from apache_beam.utils import processes
@@ -123,8 +124,7 @@ def stage_job_resources(self,

     Returns:
       A list of file names (no paths) for the resources staged. All the
-      files
-      are assumed to be staged at staging_location.
+      files are assumed to be staged at staging_location.

     Raises:
       RuntimeError: If files specified are not found or error encountered
@@ -256,6 +256,14 @@ def stage_job_resources(self,
             'The file "%s" cannot be found. Its location was specified by '
             'the --sdk_location command-line option.' % sdk_path)

+    worker_options = options.view_as(WorkerOptions)
+    dataflow_worker_jar = getattr(worker_options, 'dataflow_worker_jar', None)
+    if dataflow_worker_jar is not None:
+      jar_staged_filename = 'dataflow-worker.jar'
+      staged_path = FileSystems.join(staging_location, jar_staged_filename)
+      self.stage_artifact(dataflow_worker_jar, staged_path)
+      resources.append(jar_staged_filename)
+
     # Delete all temp files created while staging job resources.
     shutil.rmtree(temp_dir)
     retrieval_token = self.commit_manifest()

Issue Time Tracking
---
Worklog Id: (was: 155151)
Time Spent: 4.5h (was: 4h 20m)
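
Editorial note: the dataflow_runner.py hunk above merges the required experiment into any user-supplied experiments via `list(set(...))`, which removes duplicates but does not keep the original ordering. A minimal sketch of an order-preserving alternative follows. The helper `merge_experiments` is hypothetical and not part of Beam, and whether the ordering of experiments matters to the service is an assumption this note does not settle.

```python
REQUIRED_EXPERIMENT = 'use_staged_dataflow_worker_jar'


def merge_experiments(existing, required=REQUIRED_EXPERIMENT):
    """Return a copy of `existing` with `required` appended if missing.

    Hypothetical helper: `existing` plays the role of
    debug_options.experiments, which may be None when no experiments
    were passed on the command line.
    """
    merged = list(existing or [])
    if required not in merged:
        merged.append(required)
    return merged
```

Keeping the user's ordering makes the resulting job options deterministic across runs, which can simplify comparing job descriptions while debugging.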
[jira] [Work logged] (BEAM-5637) Python support for custom dataflow worker jar
[ https://issues.apache.org/jira/browse/BEAM-5637?focusedWorklogId=155083=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-155083 ]

ASF GitHub Bot logged work on BEAM-5637:

Author: ASF GitHub Bot
Created on: 16/Oct/18 19:30
Start Date: 16/Oct/18 19:30
Worklog Time Spent: 10m
Work Description: HuangLED commented on issue #6680: [BEAM-5637] Python support for custom dataflow worker jar
URL: https://github.com/apache/beam/pull/6680#issuecomment-430369135

Run Python PostCommit

Issue Time Tracking
---
Worklog Id: (was: 155083)
Time Spent: 4h 20m (was: 4h 10m)
[jira] [Work logged] (BEAM-5637) Python support for custom dataflow worker jar
[ https://issues.apache.org/jira/browse/BEAM-5637?focusedWorklogId=154953=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-154953 ]

ASF GitHub Bot logged work on BEAM-5637:

Author: ASF GitHub Bot
Created on: 16/Oct/18 15:59
Start Date: 16/Oct/18 15:59
Worklog Time Spent: 10m
Work Description: HuangLED removed a comment on issue #6680: [BEAM-5637] Python support for custom dataflow worker jar
URL: https://github.com/apache/beam/pull/6680#issuecomment-430101823

Run Python PostCommit

Issue Time Tracking
---
Worklog Id: (was: 154953)
Time Spent: 4h 10m (was: 4h)
[jira] [Work logged] (BEAM-5637) Python support for custom dataflow worker jar
[ https://issues.apache.org/jira/browse/BEAM-5637?focusedWorklogId=154934=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-154934 ]

ASF GitHub Bot logged work on BEAM-5637:

Author: ASF GitHub Bot
Created on: 16/Oct/18 15:42
Start Date: 16/Oct/18 15:42
Worklog Time Spent: 10m
Work Description: HuangLED commented on issue #6680: [BEAM-5637] Python support for custom dataflow worker jar
URL: https://github.com/apache/beam/pull/6680#issuecomment-430288120

> Is it possible to add an integration test, using a jar built from the repo?

Yes. That is planned (tracked in JIRA BEAM-5703) and will be done in separate PRs.

Issue Time Tracking
---
Worklog Id: (was: 154934)
Time Spent: 4h (was: 3h 50m)
[jira] [Work logged] (BEAM-5637) Python support for custom dataflow worker jar
[ https://issues.apache.org/jira/browse/BEAM-5637?focusedWorklogId=154677=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-154677 ]

ASF GitHub Bot logged work on BEAM-5637:

Author: ASF GitHub Bot
Created on: 16/Oct/18 09:18
Start Date: 16/Oct/18 09:18
Worklog Time Spent: 10m
Work Description: robertwb commented on issue #6680: [BEAM-5637] Python support for custom dataflow worker jar
URL: https://github.com/apache/beam/pull/6680#issuecomment-430164368

Is it possible to add an integration test, using a jar built from the repo?

Issue Time Tracking
---
Worklog Id: (was: 154677)
Time Spent: 3h 50m (was: 3h 40m)
[jira] [Work logged] (BEAM-5637) Python support for custom dataflow worker jar
[ https://issues.apache.org/jira/browse/BEAM-5637?focusedWorklogId=154598=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-154598 ]

ASF GitHub Bot logged work on BEAM-5637:

Author: ASF GitHub Bot
Created on: 16/Oct/18 05:03
Start Date: 16/Oct/18 05:03
Worklog Time Spent: 10m
Work Description: HuangLED commented on issue #6680: [BEAM-5637] Python support for custom dataflow worker jar
URL: https://github.com/apache/beam/pull/6680#issuecomment-430101823

Run Python PostCommit

Issue Time Tracking
---
Worklog Id: (was: 154598)
Time Spent: 3h 40m (was: 3.5h)
[jira] [Work logged] (BEAM-5637) Python support for custom dataflow worker jar
[ https://issues.apache.org/jira/browse/BEAM-5637?focusedWorklogId=154593=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-154593 ]

ASF GitHub Bot logged work on BEAM-5637:

Author: ASF GitHub Bot
Created on: 16/Oct/18 04:17
Start Date: 16/Oct/18 04:17
Worklog Time Spent: 10m
Work Description: boyuanzz commented on a change in pull request #6680: [BEAM-5637] Python support for custom dataflow worker jar
URL: https://github.com/apache/beam/pull/6680#discussion_r225392436

## File path: sdks/python/jar.txt ##
@@ -0,0 +1 @@
+

Review comment:
Could you please remove this file?

Issue Time Tracking
---
Worklog Id: (was: 154593)
Time Spent: 3.5h (was: 3h 20m)
[jira] [Work logged] (BEAM-5637) Python support for custom dataflow worker jar
[ https://issues.apache.org/jira/browse/BEAM-5637?focusedWorklogId=154538=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-154538 ]

ASF GitHub Bot logged work on BEAM-5637:

Author: ASF GitHub Bot
Created on: 15/Oct/18 23:58
Start Date: 15/Oct/18 23:58
Worklog Time Spent: 10m
Work Description: pabloem commented on issue #6680: [BEAM-5637] Python support for custom dataflow worker jar
URL: https://github.com/apache/beam/pull/6680#issuecomment-430054356

LGTM, modulo Boyuan's comment.

Issue Time Tracking
---
Worklog Id: (was: 154538)
Time Spent: 3h 20m (was: 3h 10m)
[jira] [Work logged] (BEAM-5637) Python support for custom dataflow worker jar
[ https://issues.apache.org/jira/browse/BEAM-5637?focusedWorklogId=154536=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-154536 ]

ASF GitHub Bot logged work on BEAM-5637:

Author: ASF GitHub Bot
Created on: 15/Oct/18 23:54
Start Date: 15/Oct/18 23:54
Worklog Time Spent: 10m
Work Description: HuangLED commented on a change in pull request #6680: [BEAM-5637] Python support for custom dataflow worker jar
URL: https://github.com/apache/beam/pull/6680#discussion_r225353704

## File path: sdks/python/apache_beam/runners/portability/stager.py ##
@@ -256,6 +256,14 @@ def stage_job_resources(self,
             'The file "%s" cannot be found. Its location was specified by '
             'the --sdk_location command-line option.' % sdk_path)

+    worker_options = options.view_as(WorkerOptions)
+    if hasattr(worker_options, 'dataflow_worker_jar') and \

Review comment:
Interesting read. However, in this particular case (a pipeline option, which is nothing more than a data field), aren't we completely safe? Thoughts?

That being said, I will apply it for this PR anyway.

Issue Time Tracking
---
Worklog Id: (was: 154536)
Time Spent: 3h 10m (was: 3h)
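
Editorial note: the review thread above ends with the `hasattr(...) and ...` check being replaced, and the merged diff uses a single `getattr` call with a `None` default. A minimal sketch of that pattern follows, using `argparse.Namespace` from the standard library as a stand-in for Beam's `WorkerOptions` (an assumption made so the example runs without Beam installed). The helper name `worker_jar_of` is hypothetical.

```python
import argparse

# Stand-ins for parsed options objects; Beam's WorkerOptions is not used here.
with_jar = argparse.Namespace(dataflow_worker_jar='/tmp/worker.jar')
without_jar = argparse.Namespace()


def worker_jar_of(options):
    """Return the configured jar path, or None if it is unset or absent.

    One getattr call with a default covers both "attribute missing" and
    "attribute present but None", so no separate hasattr check is needed.
    """
    return getattr(options, 'dataflow_worker_jar', None)
```

A caller can then branch on `worker_jar_of(options) is not None`, which is the shape of the condition in the merged stager.py change.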
[jira] [Work logged] (BEAM-5637) Python support for custom dataflow worker jar
[ https://issues.apache.org/jira/browse/BEAM-5637?focusedWorklogId=154478=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-154478 ]

ASF GitHub Bot logged work on BEAM-5637:

Author: ASF GitHub Bot
Created on: 15/Oct/18 21:54
Start Date: 15/Oct/18 21:54
Worklog Time Spent: 10m
Work Description: herohde commented on issue #6680: [BEAM-5637] Python support for custom dataflow worker jar
URL: https://github.com/apache/beam/pull/6680#issuecomment-430028150

LGTM, but someone with Python experience should review: @pabloem?

Issue Time Tracking
---
Worklog Id: (was: 154478)
Time Spent: 3h (was: 2h 50m)