[jira] [Commented] (BEAM-2927) Python SDK support for portable side input
[ https://issues.apache.org/jira/browse/BEAM-2927?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16467691#comment-16467691 ] Luke Cwik commented on BEAM-2927: - pull/5302 has current progress. The remaining issue is to clean up some legacy usages of side inputs via a non portable way to get tests to pass. Also to put in the side input optimization. > Python SDK support for portable side input > -- > > Key: BEAM-2927 > URL: https://issues.apache.org/jira/browse/BEAM-2927 > Project: Beam > Issue Type: Sub-task > Components: sdk-py-core >Reporter: Henning Rohde >Assignee: Luke Cwik >Priority: Blocker > Labels: portability > Fix For: 2.5.0 > > Time Spent: 3h 20m > Remaining Estimate: 0h > -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Commented] (BEAM-2927) Python SDK support for portable side input
[ https://issues.apache.org/jira/browse/BEAM-2927?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16423602#comment-16423602 ] Aljoscha Krettek commented on BEAM-2927: The problematic piece is this: https://github.com/apache/beam/blob/1f681bb1687aaa1ec23741e87f8622e6a4e59f7d/sdks/python/apache_beam/runners/worker/sdk_worker.py#L72 The python harness tries to create a state client that uses the control API descriptor when starting the worker. However, the worker harness should use the state API descriptor that it gets from the {{ProcessBundleDescriptor}} to access state and side inputs. cc [~lcwik] > Python SDK support for portable side input > -- > > Key: BEAM-2927 > URL: https://issues.apache.org/jira/browse/BEAM-2927 > Project: Beam > Issue Type: Sub-task > Components: sdk-py-core >Reporter: Henning Rohde >Assignee: Robert Bradshaw >Priority: Major > Labels: portability > Time Spent: 2h > Remaining Estimate: 0h > -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Commented] (BEAM-2927) Python SDK support for portable side input
[ https://issues.apache.org/jira/browse/BEAM-2927?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16420719#comment-16420719 ] Thomas Weise commented on BEAM-2927: Note that I run into this even though the pipeline does not use side inputs. {code:java} with beam.Pipeline(runner=runner, options=pipeline_options) as p: (p | 'Create' >> beam.Create(['hello', 'world']) #| 'Read' >> ReadFromText("gs://dataflow-samples/shakespeare/kinglear.txt") | 'Split' >> (beam.FlatMap(lambda x: re.findall(r'[A-Za-z\']+', x)) .with_output_types(unicode)) #| 'PairWithOne' >> beam.Map(lambda x: (x, 1)) #| 'GroupAndSum' >> beam.CombinePerKey(sum) #| beam.Map(lambda x: logging.info("Got %s", x) or (x, 1)) ){code} {code:java} 2018/03/30 16:26:46 Initializing python harness: /opt/apache/beam/boot --id=-123 --logging_endpoint=docker.for.mac.host.internal:52193 --artifact_endpoint=docker.for.mac.host.internal:52194 --provision_endpoint=docker.for.mac.host.internal:52195 --control_endpoint=docker.for.mac.host.internal:52191 --semi_persist_dir=/tmp/semi_persistent_dir2308232882860594197 2018/03/30 16:26:46 Executing: python -m apache_beam.runners.worker.sdk_worker_main Exception in thread read_state: Traceback (most recent call last): File "/usr/local/lib/python2.7/threading.py", line 801, in __bootstrap_inner self.run() File "/usr/local/lib/python2.7/threading.py", line 754, in run self.__target(*self.__args, **self.__kwargs) File "/usr/local/lib/python2.7/site-packages/apache_beam/runners/worker/sdk_worker.py", line 262, in pull_responses for response in responses: File "/usr/local/lib/python2.7/site-packages/grpc/_channel.py", line 350, in next return self._next() File "/usr/local/lib/python2.7/site-packages/grpc/_channel.py", line 341, in _next raise self _Rendezvous: <_Rendezvous of RPC that terminated with (StatusCode.UNIMPLEMENTED, Method not found: org.apache.beam.model.fn_execution.v1.BeamFnState/State)>{code} > Python SDK support for portable side input > -- > > Key: BEAM-2927 > URL: https://issues.apache.org/jira/browse/BEAM-2927 > Project: Beam > Issue Type: Sub-task > Components: sdk-py-core >Reporter: Henning Rohde >Assignee: Robert Bradshaw >Priority: Major > Labels: portability > Time Spent: 1h 40m > Remaining Estimate: 0h > -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Commented] (BEAM-2927) Python SDK support for portable side input
[ https://issues.apache.org/jira/browse/BEAM-2927?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16420703#comment-16420703 ] Luke Cwik commented on BEAM-2927: - When [~thw] tried to run a pipeline on Flink containing side inputs, he got: *_Rendezvous: <_Rendezvous of RPC that terminated with (StatusCode.UNIMPLEMENTED, Method not found: org.apache.beam.model.fn_execution.v1.BeamFnState/State)>* > Python SDK support for portable side input > -- > > Key: BEAM-2927 > URL: https://issues.apache.org/jira/browse/BEAM-2927 > Project: Beam > Issue Type: Sub-task > Components: sdk-py-core >Reporter: Henning Rohde >Assignee: Robert Bradshaw >Priority: Major > Labels: portability > Time Spent: 1h 40m > Remaining Estimate: 0h > -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Commented] (BEAM-2927) Python SDK support for portable side input
[ https://issues.apache.org/jira/browse/BEAM-2927?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16411749#comment-16411749 ] Ahmet Altay commented on BEAM-2927: --- [https://github.com/apache/beam/pull/4781] introduced a failure in post commit tests for: apache_beam.examples.wordcount_it_test.WordCountIT failed jenkins run: [https://builds.apache.org/view/A-D/view/Beam/job/beam_PostCommit_Python_Verify/4472/consoleFull] == ERROR: test_wordcount_it (apache_beam.examples.wordcount_it_test.WordCountIT) -- Traceback (most recent call last): File "/home/jenkins/jenkins-slave/workspace/beam_PostCommit_Python_Verify/src/sdks/python/apache_beam/examples/wordcount_it_test.py", line 66, in test_wordcount_it wordcount.run(test_pipeline.get_full_options_as_args(**extra_opts)) File "/home/jenkins/jenkins-slave/workspace/beam_PostCommit_Python_Verify/src/sdks/python/apache_beam/examples/wordcount.py", line 115, in run result = p.run() File "/home/jenkins/jenkins-slave/workspace/beam_PostCommit_Python_Verify/src/sdks/python/apache_beam/pipeline.py", line 369, in run self.to_runner_api(), self.runner, self._options).run(False) File "/home/jenkins/jenkins-slave/workspace/beam_PostCommit_Python_Verify/src/sdks/python/apache_beam/pipeline.py", line 382, in run return self.runner.run_pipeline(self) File "/home/jenkins/jenkins-slave/workspace/beam_PostCommit_Python_Verify/src/sdks/python/apache_beam/runners/dataflow/test_dataflow_runner.py", line 57, in run_pipeline self.result.wait_until_finish() File "/home/jenkins/jenkins-slave/workspace/beam_PostCommit_Python_Verify/src/sdks/python/apache_beam/runners/dataflow/dataflow_runner.py", line 1084, in wait_until_finish (self.state, getattr(self._runner, 'last_error_msg', None)), self) DataflowRuntimeException: Dataflow pipeline failed. State: FAILED, Error: Traceback (most recent call last): File "/usr/local/lib/python2.7/dist-packages/dataflow_worker/batchworker.py", line 609, in do_work work_executor.execute() File "/usr/local/lib/python2.7/dist-packages/dataflow_worker/executor.py", line 167, in execute op.start() File "apache_beam/runners/worker/operations.py", line 340, in apache_beam.runners.worker.operations.DoOperation.start def start(self): File "apache_beam/runners/worker/operations.py", line 341, in apache_beam.runners.worker.operations.DoOperation.start with self.scoped_start_state: File "apache_beam/runners/worker/operations.py", line 373, in apache_beam.runners.worker.operations.DoOperation.start self.dofn_runner = common.DoFnRunner( File "apache_beam/runners/common.py", line 483, in apache_beam.runners.common.DoFnRunner.__init__ self.do_fn_invoker = DoFnInvoker.create_invoker( File "apache_beam/runners/common.py", line 203, in apache_beam.runners.common.DoFnInvoker.create_invoker return PerWindowInvoker( File "apache_beam/runners/common.py", line 313, in apache_beam.runners.common.PerWindowInvoker.__init__ input_args, input_kwargs, [si[global_window] for si in side_inputs]) File "/usr/local/lib/python2.7/dist-packages/apache_beam/transforms/sideinputs.py", line 62, in __getitem__ self._cache[window] = self._view_class._from_runtime_iterable( AttributeError: type object '_DataflowIterableSideInput' has no attribute '_from_runtime_iterable' > Python SDK support for portable side input > -- > > Key: BEAM-2927 > URL: https://issues.apache.org/jira/browse/BEAM-2927 > Project: Beam > Issue Type: Sub-task > Components: sdk-py-core >Reporter: Henning Rohde >Assignee: Robert Bradshaw >Priority: Major > Labels: portability > Time Spent: 1h 40m > Remaining Estimate: 0h > -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Commented] (BEAM-2927) Python SDK support for portable side input
[ https://issues.apache.org/jira/browse/BEAM-2927?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16214222#comment-16214222 ] ASF GitHub Bot commented on BEAM-2927: -- Github user asfgit closed the pull request at: https://github.com/apache/beam/pull/4020 > Python SDK support for portable side input > -- > > Key: BEAM-2927 > URL: https://issues.apache.org/jira/browse/BEAM-2927 > Project: Beam > Issue Type: Sub-task > Components: sdk-py-core >Reporter: Henning Rohde >Assignee: Robert Bradshaw > Labels: portability > -- This message was sent by Atlassian JIRA (v6.4.14#64029)
[jira] [Commented] (BEAM-2927) Python SDK support for portable side input
[ https://issues.apache.org/jira/browse/BEAM-2927?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16211992#comment-16211992 ] ASF GitHub Bot commented on BEAM-2927: -- GitHub user robertwb opened a pull request: https://github.com/apache/beam/pull/4020 [BEAM-2927] Python SDK support for portable side input Follow this checklist to help us incorporate your contribution quickly and easily: - [ ] Make sure there is a [JIRA issue](https://issues.apache.org/jira/projects/BEAM/issues/) filed for the change (usually before you start working on it). Trivial changes like typos do not require a JIRA issue. Your pull request should address just this issue, without pulling in other changes. - [ ] Each commit in the pull request should have a meaningful subject line and body. - [ ] Format the pull request title like `[BEAM-XXX] Fixes bug in ApproximateQuantiles`, where you replace `BEAM-XXX` with the appropriate JIRA issue. - [ ] Write a pull request description that is detailed enough to understand what the pull request does, how, and why. - [ ] Run `mvn clean verify` to make sure basic checks pass. A more thorough check will be performed on your pull request automatically. - [ ] If this contribution is large, please file an Apache [Individual Contributor License Agreement](https://www.apache.org/licenses/icla.pdf). --- You can merge this pull request into a Git repository by running: $ git pull https://github.com/robertwb/incubator-beam side-inputs Alternatively you can review and apply these changes as the patch at: https://github.com/apache/beam/pull/4020.patch To close this pull request, make a commit to your master/trunk branch with (at least) the following in the commit message: This closes #4020 commit 45c046308e1afeb6d4fcf18a8a3861a12e9933f9 Author: Robert Bradshaw Date: 2017-10-19T19:11:44Z Implement FnApi side inputs in Python. commit 202dfc74e58e9736bb40ed0382fa03dd072b7ab4 Author: Robert Bradshaw Date: 2017-10-19T23:59:34Z cleanup commit 198c864aca35b6bca1313e2f216dbb40c73b27e0 Author: Robert Bradshaw Date: 2017-10-20T00:02:02Z Revert unneeded changes. commit 3dfe862220afbc5e57d7ac674a329cb5e66164cf Author: Robert Bradshaw Date: 2017-10-20T00:20:40Z lint commit abc8cd910559a5a8ac40f752d59091e6a2c38cd3 Author: Robert Bradshaw Date: 2017-10-20T00:48:59Z more cleanup > Python SDK support for portable side input > -- > > Key: BEAM-2927 > URL: https://issues.apache.org/jira/browse/BEAM-2927 > Project: Beam > Issue Type: Sub-task > Components: sdk-py-core >Reporter: Henning Rohde >Assignee: Robert Bradshaw > Labels: portability > -- This message was sent by Atlassian JIRA (v6.4.14#64029)