[jira] [Commented] (BEAM-2927) Python SDK support for portable side input

2018-05-08 Thread Luke Cwik (JIRA)

[ 
https://issues.apache.org/jira/browse/BEAM-2927?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16467691#comment-16467691
 ] 

Luke Cwik commented on BEAM-2927:
-

pull/5302 has current progress.

The remaining issue is to clean up some legacy usages of side inputs via a non 
portable way to get tests to pass. Also to put in the side input optimization.

> Python SDK support for portable side input
> --
>
> Key: BEAM-2927
> URL: https://issues.apache.org/jira/browse/BEAM-2927
> Project: Beam
>  Issue Type: Sub-task
>  Components: sdk-py-core
>Reporter: Henning Rohde
>Assignee: Luke Cwik
>Priority: Blocker
>  Labels: portability
> Fix For: 2.5.0
>
>  Time Spent: 3h 20m
>  Remaining Estimate: 0h
>




--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Commented] (BEAM-2927) Python SDK support for portable side input

2018-04-03 Thread Aljoscha Krettek (JIRA)

[ 
https://issues.apache.org/jira/browse/BEAM-2927?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16423602#comment-16423602
 ] 

Aljoscha Krettek commented on BEAM-2927:


The problematic piece is this: 
https://github.com/apache/beam/blob/1f681bb1687aaa1ec23741e87f8622e6a4e59f7d/sdks/python/apache_beam/runners/worker/sdk_worker.py#L72

The python harness tries to create a state client that uses the control API 
descriptor when starting the worker. However, the worker harness should use the 
state API descriptor that it gets from the {{ProcessBundleDescriptor}} to 
access state and side inputs.

cc [~lcwik]

> Python SDK support for portable side input
> --
>
> Key: BEAM-2927
> URL: https://issues.apache.org/jira/browse/BEAM-2927
> Project: Beam
>  Issue Type: Sub-task
>  Components: sdk-py-core
>Reporter: Henning Rohde
>Assignee: Robert Bradshaw
>Priority: Major
>  Labels: portability
>  Time Spent: 2h
>  Remaining Estimate: 0h
>




--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Commented] (BEAM-2927) Python SDK support for portable side input

2018-03-30 Thread Thomas Weise (JIRA)

[ 
https://issues.apache.org/jira/browse/BEAM-2927?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16420719#comment-16420719
 ] 

Thomas Weise commented on BEAM-2927:


Note that I run into this even though the pipeline does not use side inputs. 
{code:java}
  with beam.Pipeline(runner=runner, options=pipeline_options) as p:

    (p

        | 'Create' >> beam.Create(['hello', 'world'])

        #| 'Read' >> 
ReadFromText("gs://dataflow-samples/shakespeare/kinglear.txt")

        | 'Split' >> (beam.FlatMap(lambda x: re.findall(r'[A-Za-z\']+', x))

                        .with_output_types(unicode))

        #| 'PairWithOne' >> beam.Map(lambda x: (x, 1))

        #| 'GroupAndSum' >> beam.CombinePerKey(sum)

        #| beam.Map(lambda x: logging.info("Got %s", x) or (x, 1))

){code}
{code:java}
2018/03/30 16:26:46 Initializing python harness: /opt/apache/beam/boot 
--id=-123 --logging_endpoint=docker.for.mac.host.internal:52193 
--artifact_endpoint=docker.for.mac.host.internal:52194 
--provision_endpoint=docker.for.mac.host.internal:52195 
--control_endpoint=docker.for.mac.host.internal:52191 
--semi_persist_dir=/tmp/semi_persistent_dir2308232882860594197

2018/03/30 16:26:46 Executing: python -m 
apache_beam.runners.worker.sdk_worker_main

Exception in thread read_state:

Traceback (most recent call last):

  File "/usr/local/lib/python2.7/threading.py", line 801, in __bootstrap_inner

    self.run()

  File "/usr/local/lib/python2.7/threading.py", line 754, in run

    self.__target(*self.__args, **self.__kwargs)

  File 
"/usr/local/lib/python2.7/site-packages/apache_beam/runners/worker/sdk_worker.py",
 line 262, in pull_responses

    for response in responses:

  File "/usr/local/lib/python2.7/site-packages/grpc/_channel.py", line 350, in 
next

    return self._next()

  File "/usr/local/lib/python2.7/site-packages/grpc/_channel.py", line 341, in 
_next

    raise self

_Rendezvous: <_Rendezvous of RPC that terminated with 
(StatusCode.UNIMPLEMENTED, Method not found: 
org.apache.beam.model.fn_execution.v1.BeamFnState/State)>{code}

> Python SDK support for portable side input
> --
>
> Key: BEAM-2927
> URL: https://issues.apache.org/jira/browse/BEAM-2927
> Project: Beam
>  Issue Type: Sub-task
>  Components: sdk-py-core
>Reporter: Henning Rohde
>Assignee: Robert Bradshaw
>Priority: Major
>  Labels: portability
>  Time Spent: 1h 40m
>  Remaining Estimate: 0h
>




--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Commented] (BEAM-2927) Python SDK support for portable side input

2018-03-30 Thread Luke Cwik (JIRA)

[ 
https://issues.apache.org/jira/browse/BEAM-2927?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16420703#comment-16420703
 ] 

Luke Cwik commented on BEAM-2927:
-

When [~thw] tried to run a pipeline on Flink containing side inputs, he got:
*_Rendezvous: <_Rendezvous of RPC that terminated with 
(StatusCode.UNIMPLEMENTED, Method not found: 
org.apache.beam.model.fn_execution.v1.BeamFnState/State)>*

> Python SDK support for portable side input
> --
>
> Key: BEAM-2927
> URL: https://issues.apache.org/jira/browse/BEAM-2927
> Project: Beam
>  Issue Type: Sub-task
>  Components: sdk-py-core
>Reporter: Henning Rohde
>Assignee: Robert Bradshaw
>Priority: Major
>  Labels: portability
>  Time Spent: 1h 40m
>  Remaining Estimate: 0h
>




--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Commented] (BEAM-2927) Python SDK support for portable side input

2018-03-23 Thread Ahmet Altay (JIRA)

[ 
https://issues.apache.org/jira/browse/BEAM-2927?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16411749#comment-16411749
 ] 

Ahmet Altay commented on BEAM-2927:
---

[https://github.com/apache/beam/pull/4781] introduced a failure in post commit 
tests for: apache_beam.examples.wordcount_it_test.WordCountIT

failed jenkins run: 
[https://builds.apache.org/view/A-D/view/Beam/job/beam_PostCommit_Python_Verify/4472/consoleFull]
==
ERROR: test_wordcount_it (apache_beam.examples.wordcount_it_test.WordCountIT)
--
Traceback (most recent call last):
  File 
"/home/jenkins/jenkins-slave/workspace/beam_PostCommit_Python_Verify/src/sdks/python/apache_beam/examples/wordcount_it_test.py",
 line 66, in test_wordcount_it
wordcount.run(test_pipeline.get_full_options_as_args(**extra_opts))
  File 
"/home/jenkins/jenkins-slave/workspace/beam_PostCommit_Python_Verify/src/sdks/python/apache_beam/examples/wordcount.py",
 line 115, in run
result = p.run()
  File 
"/home/jenkins/jenkins-slave/workspace/beam_PostCommit_Python_Verify/src/sdks/python/apache_beam/pipeline.py",
 line 369, in run
self.to_runner_api(), self.runner, self._options).run(False)
  File 
"/home/jenkins/jenkins-slave/workspace/beam_PostCommit_Python_Verify/src/sdks/python/apache_beam/pipeline.py",
 line 382, in run
return self.runner.run_pipeline(self)
  File 
"/home/jenkins/jenkins-slave/workspace/beam_PostCommit_Python_Verify/src/sdks/python/apache_beam/runners/dataflow/test_dataflow_runner.py",
 line 57, in run_pipeline
self.result.wait_until_finish()
  File 
"/home/jenkins/jenkins-slave/workspace/beam_PostCommit_Python_Verify/src/sdks/python/apache_beam/runners/dataflow/dataflow_runner.py",
 line 1084, in wait_until_finish
(self.state, getattr(self._runner, 'last_error_msg', None)), self)
DataflowRuntimeException: Dataflow pipeline failed. State: FAILED, Error:
Traceback (most recent call last):
  File "/usr/local/lib/python2.7/dist-packages/dataflow_worker/batchworker.py", 
line 609, in do_work
work_executor.execute()
  File "/usr/local/lib/python2.7/dist-packages/dataflow_worker/executor.py", 
line 167, in execute
op.start()
  File "apache_beam/runners/worker/operations.py", line 340, in 
apache_beam.runners.worker.operations.DoOperation.start
def start(self):
  File "apache_beam/runners/worker/operations.py", line 341, in 
apache_beam.runners.worker.operations.DoOperation.start
with self.scoped_start_state:
  File "apache_beam/runners/worker/operations.py", line 373, in 
apache_beam.runners.worker.operations.DoOperation.start
self.dofn_runner = common.DoFnRunner(
  File "apache_beam/runners/common.py", line 483, in 
apache_beam.runners.common.DoFnRunner.__init__
self.do_fn_invoker = DoFnInvoker.create_invoker(
  File "apache_beam/runners/common.py", line 203, in 
apache_beam.runners.common.DoFnInvoker.create_invoker
return PerWindowInvoker(
  File "apache_beam/runners/common.py", line 313, in 
apache_beam.runners.common.PerWindowInvoker.__init__
input_args, input_kwargs, [si[global_window] for si in side_inputs])
  File 
"/usr/local/lib/python2.7/dist-packages/apache_beam/transforms/sideinputs.py", 
line 62, in __getitem__
self._cache[window] = self._view_class._from_runtime_iterable(
AttributeError: type object '_DataflowIterableSideInput' has no attribute 
'_from_runtime_iterable'
 

> Python SDK support for portable side input
> --
>
> Key: BEAM-2927
> URL: https://issues.apache.org/jira/browse/BEAM-2927
> Project: Beam
>  Issue Type: Sub-task
>  Components: sdk-py-core
>Reporter: Henning Rohde
>Assignee: Robert Bradshaw
>Priority: Major
>  Labels: portability
>  Time Spent: 1h 40m
>  Remaining Estimate: 0h
>




--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Commented] (BEAM-2927) Python SDK support for portable side input

2017-10-22 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/BEAM-2927?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16214222#comment-16214222
 ] 

ASF GitHub Bot commented on BEAM-2927:
--

Github user asfgit closed the pull request at:

https://github.com/apache/beam/pull/4020


> Python SDK support for portable side input
> --
>
> Key: BEAM-2927
> URL: https://issues.apache.org/jira/browse/BEAM-2927
> Project: Beam
>  Issue Type: Sub-task
>  Components: sdk-py-core
>Reporter: Henning Rohde
>Assignee: Robert Bradshaw
>  Labels: portability
>




--
This message was sent by Atlassian JIRA
(v6.4.14#64029)


[jira] [Commented] (BEAM-2927) Python SDK support for portable side input

2017-10-19 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/BEAM-2927?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16211992#comment-16211992
 ] 

ASF GitHub Bot commented on BEAM-2927:
--

GitHub user robertwb opened a pull request:

https://github.com/apache/beam/pull/4020

[BEAM-2927] Python SDK support for portable side input

Follow this checklist to help us incorporate your contribution quickly and 
easily:

 - [ ] Make sure there is a [JIRA 
issue](https://issues.apache.org/jira/projects/BEAM/issues/) filed for the 
change (usually before you start working on it).  Trivial changes like typos do 
not require a JIRA issue.  Your pull request should address just this issue, 
without pulling in other changes.
 - [ ] Each commit in the pull request should have a meaningful subject 
line and body.
 - [ ] Format the pull request title like `[BEAM-XXX] Fixes bug in 
ApproximateQuantiles`, where you replace `BEAM-XXX` with the appropriate JIRA 
issue.
 - [ ] Write a pull request description that is detailed enough to 
understand what the pull request does, how, and why.
 - [ ] Run `mvn clean verify` to make sure basic checks pass. A more 
thorough check will be performed on your pull request automatically.
 - [ ] If this contribution is large, please file an Apache [Individual 
Contributor License Agreement](https://www.apache.org/licenses/icla.pdf).

---


You can merge this pull request into a Git repository by running:

$ git pull https://github.com/robertwb/incubator-beam side-inputs

Alternatively you can review and apply these changes as the patch at:

https://github.com/apache/beam/pull/4020.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

This closes #4020


commit 45c046308e1afeb6d4fcf18a8a3861a12e9933f9
Author: Robert Bradshaw 
Date:   2017-10-19T19:11:44Z

Implement FnApi side inputs in Python.

commit 202dfc74e58e9736bb40ed0382fa03dd072b7ab4
Author: Robert Bradshaw 
Date:   2017-10-19T23:59:34Z

cleanup

commit 198c864aca35b6bca1313e2f216dbb40c73b27e0
Author: Robert Bradshaw 
Date:   2017-10-20T00:02:02Z

Revert unneeded changes.

commit 3dfe862220afbc5e57d7ac674a329cb5e66164cf
Author: Robert Bradshaw 
Date:   2017-10-20T00:20:40Z

lint

commit abc8cd910559a5a8ac40f752d59091e6a2c38cd3
Author: Robert Bradshaw 
Date:   2017-10-20T00:48:59Z

more cleanup




> Python SDK support for portable side input
> --
>
> Key: BEAM-2927
> URL: https://issues.apache.org/jira/browse/BEAM-2927
> Project: Beam
>  Issue Type: Sub-task
>  Components: sdk-py-core
>Reporter: Henning Rohde
>Assignee: Robert Bradshaw
>  Labels: portability
>




--
This message was sent by Atlassian JIRA
(v6.4.14#64029)