[jira] [Commented] (BEAM-8551) Beam Python containers should include all Beam SDK dependencies, and do not have conflicting dependencies
    [ https://issues.apache.org/jira/browse/BEAM-8551?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17061206#comment-17061206 ]

David Yan commented on BEAM-8551:
---------------------------------

`pip check` is another way to check for broken dependencies.

> Beam Python containers should include all Beam SDK dependencies, and do not have conflicting dependencies
> ---------------------------------
>
> Key: BEAM-8551
> URL: https://issues.apache.org/jira/browse/BEAM-8551
> Project: Beam
> Issue Type: Bug
> Components: sdk-py-core
> Reporter: Valentyn Tymofieiev
> Priority: Major
>
> Checks could be introduced during container creation, and be enforced by ValidatesContainer test suites. We could:
> - Check pip output or status code for incompatible dependency errors.
> - Remove internet access when installing apache-beam in the container, to make sure all dependencies are installed.

--
This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Closed] (BEAM-9530) Add `pip check` to ensure good python dependencies
    [ https://issues.apache.org/jira/browse/BEAM-9530?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

David Yan closed BEAM-9530.
--------------------------
    Fix Version/s: Not applicable
    Resolution: Duplicate

> Add `pip check` to ensure good python dependencies
> --------------------------------------------------
>
> Key: BEAM-9530
> URL: https://issues.apache.org/jira/browse/BEAM-9530
> Project: Beam
> Issue Type: Improvement
> Components: sdk-py-harness
> Reporter: David Yan
> Priority: Major
> Fix For: Not applicable
>
> We should add {{pip check}} after pip install in our tests to make sure there is no incompatibility. {{pip install}} does not return an error exit code for broken dependencies for historical reasons.
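A minimal sketch of the check proposed in this ticket, assuming the tests are allowed to shell out to pip. The helper name `run_pip_check` is hypothetical and not part of Beam; the sketch relies only on `pip check` exiting with a non-zero status when it finds broken requirements.

```python
import subprocess
import sys


def run_pip_check(python=sys.executable):
    """Run `pip check` for the given interpreter (hypothetical helper).

    Returns (ok, output): ok is True when pip exits with status 0,
    i.e. it reported no broken requirements.
    """
    proc = subprocess.run(
        [python, "-m", "pip", "check"],
        stdout=subprocess.PIPE,
        stderr=subprocess.STDOUT,
        universal_newlines=True,
    )
    return proc.returncode == 0, proc.stdout
```

A test could call `run_pip_check()` right after the install step and fail with the captured output when `ok` is False.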
[jira] [Commented] (BEAM-9510) Dependencies in base_image_requirements.txt are not compatible with each other
    [ https://issues.apache.org/jira/browse/BEAM-9510?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17061158#comment-17061158 ]

David Yan commented on BEAM-9510:
---------------------------------

Also related: BEAM-9530

> Dependencies in base_image_requirements.txt are not compatible with each other
> ------------------------------------------------------------------------------
>
> Key: BEAM-9510
> URL: https://issues.apache.org/jira/browse/BEAM-9510
> Project: Beam
> Issue Type: Bug
> Components: sdk-py-harness
> Reporter: David Yan
> Assignee: Hannah Jiang
> Priority: Major
> Time Spent: 2h 10m
> Remaining Estimate: 0h
>
> [https://github.com/apache/beam/blob/master/sdks/python/container/base_image_requirements.txt#L56]
> says it requires google-cloud-bigquery==1.24.0, google-cloud-core==1.0.2, google-cloud-bigtable==0.32.1, grpcio==1.22.0 and tensorflow==2.1.0.
> But they are incompatible with each other:
> ERROR: google-cloud-bigquery 1.24.0 has requirement google-cloud-core<2.0dev,>=1.1.0, but you'll have google-cloud-core 1.0.2 which is incompatible.
> ERROR: google-cloud-bigtable 0.32.1 has requirement google-cloud-core<0.30dev,>=0.29.0, but you'll have google-cloud-core 1.0.2 which is incompatible.
> ERROR: tensorboard 2.1.1 has requirement grpcio>=1.24.3, but you'll have grpcio 1.22.0 which is incompatible.
> ERROR: tensorflow 2.1.0 has requirement scipy==1.4.1; python_version >= "3", but you'll have scipy 1.2.2 which is incompatible.
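The ERROR lines quoted above follow a regular pattern, so a container build could collect them instead of relying on the exit code. The sketch below is an illustration only: `CONFLICT_RE` and `find_conflicts` are hypothetical names, and the pattern is written against the exact wording in this ticket, which other pip versions may not use.

```python
import re

# Matches the "ERROR: ... which is incompatible." lines quoted in this ticket.
# Assumption: each message sits on a single line of pip's output.
CONFLICT_RE = re.compile(
    r"ERROR: (?P<package>\S+) (?P<version>\S+) has requirement "
    r"(?P<requirement>.+?), but you'll have "
    r"(?P<dep>\S+) (?P<dep_version>\S+) which is incompatible\."
)


def find_conflicts(pip_output):
    """Return one dict per dependency conflict found in pip's output."""
    return [m.groupdict() for m in CONFLICT_RE.finditer(pip_output)]
```

An empty result means no line matched the pattern, which is not proof that the environment is consistent.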
[jira] [Created] (BEAM-9530) Add `pip check` to ensure good python dependencies
David Yan created BEAM-9530:
----------------------------

    Summary: Add `pip check` to ensure good python dependencies
    Key: BEAM-9530
    URL: https://issues.apache.org/jira/browse/BEAM-9530
    Project: Beam
    Issue Type: Improvement
    Components: sdk-py-harness
    Reporter: David Yan

We should add {{pip check}} after pip install in our tests to make sure there is no incompatibility. {{pip install}} does not return an error exit code for broken dependencies for historical reasons.
[jira] [Updated] (BEAM-9510) Dependencies in base_image_requirements.txt are not compatible with each other
    [ https://issues.apache.org/jira/browse/BEAM-9510?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

David Yan updated BEAM-9510:
---------------------------
    Summary: Dependencies in base_image_requirements.txt are not compatible with each other  (was: Dependencies in base_image_requirements.txt are not compatible with apache-beam pypi deps)

> Dependencies in base_image_requirements.txt are not compatible with each other
> ------------------------------------------------------------------------------
>
> Key: BEAM-9510
> URL: https://issues.apache.org/jira/browse/BEAM-9510
> Project: Beam
> Issue Type: Bug
> Components: sdk-py-harness
> Reporter: David Yan
> Priority: Major
>
> [https://github.com/apache/beam/blob/master/sdks/python/container/base_image_requirements.txt#L56]
> says it requires google-cloud-bigquery==1.24.0, google-cloud-core==1.0.2, google-cloud-bigtable==0.32.1, grpcio==1.22.0 and tensorflow==2.1.0.
> But they are incompatible with each other:
> ERROR: google-cloud-bigquery 1.24.0 has requirement google-cloud-core<2.0dev,>=1.1.0, but you'll have google-cloud-core 1.0.2 which is incompatible.
> ERROR: google-cloud-bigtable 0.32.1 has requirement google-cloud-core<0.30dev,>=0.29.0, but you'll have google-cloud-core 1.0.2 which is incompatible.
> ERROR: tensorboard 2.1.1 has requirement grpcio>=1.24.3, but you'll have grpcio 1.22.0 which is incompatible.
> ERROR: tensorflow 2.1.0 has requirement scipy==1.4.1; python_version >= "3", but you'll have scipy 1.2.2 which is incompatible.
[jira] [Created] (BEAM-9510) Dependencies in base_image_requirements.txt are not compatible with apache-beam pypi deps
David Yan created BEAM-9510:
----------------------------

    Summary: Dependencies in base_image_requirements.txt are not compatible with apache-beam pypi deps
    Key: BEAM-9510
    URL: https://issues.apache.org/jira/browse/BEAM-9510
    Project: Beam
    Issue Type: Bug
    Components: sdk-py-harness
    Reporter: David Yan

[https://github.com/apache/beam/blob/master/sdks/python/container/base_image_requirements.txt#L56]
says it requires google-cloud-bigquery==1.24.0, google-cloud-core==1.0.2, google-cloud-bigtable==0.32.1, grpcio==1.22.0 and tensorflow==2.1.0.
But they are incompatible with each other:
ERROR: google-cloud-bigquery 1.24.0 has requirement google-cloud-core<2.0dev,>=1.1.0, but you'll have google-cloud-core 1.0.2 which is incompatible.
ERROR: google-cloud-bigtable 0.32.1 has requirement google-cloud-core<0.30dev,>=0.29.0, but you'll have google-cloud-core 1.0.2 which is incompatible.
ERROR: tensorboard 2.1.1 has requirement grpcio>=1.24.3, but you'll have grpcio 1.22.0 which is incompatible.
ERROR: tensorflow 2.1.0 has requirement scipy==1.4.1; python_version >= "3", but you'll have scipy 1.2.2 which is incompatible.
[jira] [Commented] (BEAM-9508) Python installation fails if grpc_tools is not installed
    [ https://issues.apache.org/jira/browse/BEAM-9508?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17060427#comment-17060427 ]

David Yan commented on BEAM-9508:
---------------------------------

This is fixed by installing mypy-protobuf, which is not immediately obvious from the stacktrace. I'll leave this ticket open for a better error message.

> Python installation fails if grpc_tools is not installed
> ---------------------------------------------------------
>
> Key: BEAM-9508
> URL: https://issues.apache.org/jira/browse/BEAM-9508
> Project: Beam
> Issue Type: Bug
> Components: sdk-py-core
> Reporter: David Yan
> Priority: Major
>
> When installing from the master branch, I get the exception below. It looks like the ImportError exception handling throws an exception itself. I'll manually install grpc_tools and try again, but the handling of ImportError has issues.
>
> ```
> Traceback (most recent call last):
>   File "/root/apache-beam-custom/packages/beam/sdks/python/gen_protos.py", line 292, in generate_proto_files
>     from grpc_tools import protoc
> ModuleNotFoundError: No module named 'grpc_tools'
>
> During handling of the above exception, another exception occurred:
>
> Traceback (most recent call last):
>   File "/opt/conda/lib/python3.7/multiprocessing/process.py", line 297, in _bootstrap
>     self.run()
>   File "/opt/conda/lib/python3.7/multiprocessing/process.py", line 99, in run
>     self._target(*self._args, **self._kwargs)
>   File "/root/apache-beam-custom/packages/beam/sdks/python/gen_protos.py", line 378, in _install_grpcio_tools_and_generate_proto_files
>     generate_proto_files(force=force)
>   File "/root/apache-beam-custom/packages/beam/sdks/python/gen_protos.py", line 315, in generate_proto_files
>     protoc_gen_mypy = _find_protoc_gen_mypy()
>   File "/root/apache-beam-custom/packages/beam/sdks/python/gen_protos.py", line 233, in _find_protoc_gen_mypy
>     (fname, ', '.join(search_paths)))
> RuntimeError: Could not find protoc-gen-mypy in /root/apache-beam-custom/bin, /root/apache-beam-custom/bin, /usr/local/bin, /opt/conda/bin, /usr/local/sbin, /usr/local/bin, /usr/sbin, /usr/bin, /sbin, /bin
> ```
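One possible shape for the better error message, sketched under assumptions: `find_protoc_gen_mypy` below is a hypothetical stand-in, not the `_find_protoc_gen_mypy` function in `gen_protos.py`, and the only fact taken from the ticket is that installing mypy-protobuf resolves the failure.

```python
import os
import shutil


def find_protoc_gen_mypy(search_paths):
    """Locate protoc-gen-mypy in search_paths, or fail with an actionable hint."""
    fname = "protoc-gen-mypy"
    # shutil.which accepts an os.pathsep-joined string as its search path.
    found = shutil.which(fname, path=os.pathsep.join(search_paths))
    if found is None:
        raise RuntimeError(
            "Could not find %s in %s. It is provided by the mypy-protobuf "
            "package; try `pip install mypy-protobuf`."
            % (fname, ", ".join(search_paths)))
    return found
```

Naming the package in the message spares the reader from inferring it from the missing executable's name.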
[jira] [Created] (BEAM-9508) Python installation fails if grpc_tools is not installed
David Yan created BEAM-9508:
----------------------------

    Summary: Python installation fails if grpc_tools is not installed
    Key: BEAM-9508
    URL: https://issues.apache.org/jira/browse/BEAM-9508
    Project: Beam
    Issue Type: Bug
    Components: sdk-py-core
    Reporter: David Yan

When installing from the master branch, I get the exception below. It looks like the ImportError exception handling throws an exception itself. I'll manually install grpc_tools and try again, but the handling of ImportError has issues.

```
Traceback (most recent call last):
  File "/root/apache-beam-custom/packages/beam/sdks/python/gen_protos.py", line 292, in generate_proto_files
    from grpc_tools import protoc
ModuleNotFoundError: No module named 'grpc_tools'

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "/opt/conda/lib/python3.7/multiprocessing/process.py", line 297, in _bootstrap
    self.run()
  File "/opt/conda/lib/python3.7/multiprocessing/process.py", line 99, in run
    self._target(*self._args, **self._kwargs)
  File "/root/apache-beam-custom/packages/beam/sdks/python/gen_protos.py", line 378, in _install_grpcio_tools_and_generate_proto_files
    generate_proto_files(force=force)
  File "/root/apache-beam-custom/packages/beam/sdks/python/gen_protos.py", line 315, in generate_proto_files
    protoc_gen_mypy = _find_protoc_gen_mypy()
  File "/root/apache-beam-custom/packages/beam/sdks/python/gen_protos.py", line 233, in _find_protoc_gen_mypy
    (fname, ', '.join(search_paths)))
RuntimeError: Could not find protoc-gen-mypy in /root/apache-beam-custom/bin, /root/apache-beam-custom/bin, /usr/local/bin, /opt/conda/bin, /usr/local/sbin, /usr/local/bin, /usr/sbin, /usr/bin, /sbin, /bin
```
[jira] [Updated] (BEAM-9487) GBKs on unbounded pcolls with global windows and no triggers should fail
    [ https://issues.apache.org/jira/browse/BEAM-9487?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

David Yan updated BEAM-9487:
---------------------------
    Labels: EaseOfUse starter  (was: starter)

> GBKs on unbounded pcolls with global windows and no triggers should fail
> -------------------------------------------------------------------------
>
> Key: BEAM-9487
> URL: https://issues.apache.org/jira/browse/BEAM-9487
> Project: Beam
> Issue Type: Bug
> Components: sdk-py-core
> Reporter: Udi Meiri
> Priority: Major
> Labels: EaseOfUse, starter
>
> This is according to "4.2.2.1 GroupByKey and unbounded PCollections" in https://beam.apache.org/documentation/programming-guide/.
> bq. If you do apply GroupByKey or CoGroupByKey to a group of unbounded PCollections without setting either a non-global windowing strategy, a trigger strategy, or both for each collection, Beam generates an IllegalStateException error at pipeline construction time.
> Example where this doesn't happen in the Python SDK:
> https://stackoverflow.com/questions/60623246/merge-pcollection-with-apache-beam
> I also believe that this unit test should fail, since test_stream is unbounded, uses the global window, and has no triggers.
> {code}
> def test_global_window_gbk_fail(self):
>   with TestPipeline() as p:
>     test_stream = TestStream()
>     _ = p | test_stream | GroupByKey()
> {code}
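The rule quoted from the programming guide can be written down as a small predicate. This is only a sketch of the condition, not Beam code: `PCollectionInfo` and `check_group_by_key_allowed` are hypothetical names, and a real check would read these properties from the PCollection's boundedness and windowing strategy.

```python
from dataclasses import dataclass


@dataclass
class PCollectionInfo:
    """Hypothetical stand-in for the three properties the check needs."""
    is_bounded: bool
    uses_global_window: bool
    has_non_default_trigger: bool


def check_group_by_key_allowed(pcoll):
    """Raise for the combination the programming guide says must be rejected."""
    if (not pcoll.is_bounded
            and pcoll.uses_global_window
            and not pcoll.has_non_default_trigger):
        raise ValueError(
            "GroupByKey cannot be applied to an unbounded PCollection with "
            "global windowing and no trigger; set a non-global windowing "
            "strategy or a trigger first.")
```

Under this predicate, the unit test in the ticket corresponds to `PCollectionInfo(False, True, False)`, which raises at construction time.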
[jira] [Resolved] (BEAM-3453) Allow usage of public Google PubSub topics in Python DirectRunner
    [ https://issues.apache.org/jira/browse/BEAM-3453?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

David Yan resolved BEAM-3453.
----------------------------
    Fix Version/s: 2.20.0
    Resolution: Fixed

> Allow usage of public Google PubSub topics in Python DirectRunner
> ------------------------------------------------------------------
>
> Key: BEAM-3453
> URL: https://issues.apache.org/jira/browse/BEAM-3453
> Project: Beam
> Issue Type: New Feature
> Components: sdk-py-core
> Affects Versions: 2.2.0
> Reporter: Charles Chen
> Assignee: David Yan
> Priority: Major
> Fix For: 2.20.0
> Time Spent: 5h
> Remaining Estimate: 0h
>
> Currently, the Beam Python DirectRunner does not allow the usage of data from public Google Cloud PubSub topics. We should allow this functionality so that users can more easily test Beam Python's streaming functionality.
[jira] [Assigned] (BEAM-3453) Allow usage of public Google PubSub topics in Python DirectRunner
    [ https://issues.apache.org/jira/browse/BEAM-3453?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

David Yan reassigned BEAM-3453:
------------------------------
    Assignee: David Yan

> Allow usage of public Google PubSub topics in Python DirectRunner
> ------------------------------------------------------------------
>
> Key: BEAM-3453
> URL: https://issues.apache.org/jira/browse/BEAM-3453
> Project: Beam
> Issue Type: New Feature
> Components: sdk-py-core
> Affects Versions: 2.2.0
> Reporter: Charles Chen
> Assignee: David Yan
> Priority: Major
> Time Spent: 5h
> Remaining Estimate: 0h
>
> Currently, the Beam Python DirectRunner does not allow the usage of data from public Google Cloud PubSub topics. We should allow this functionality so that users can more easily test Beam Python's streaming functionality.
[jira] [Commented] (BEAM-3453) Allow usage of public Google PubSub topics in Python DirectRunner
    [ https://issues.apache.org/jira/browse/BEAM-3453?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17033859#comment-17033859 ]

David Yan commented on BEAM-3453:
---------------------------------

This is fixed by [GitHub Pull Request #10762|https://github.com/apache/beam/pull/10762].

> Allow usage of public Google PubSub topics in Python DirectRunner
> ------------------------------------------------------------------
>
> Key: BEAM-3453
> URL: https://issues.apache.org/jira/browse/BEAM-3453
> Project: Beam
> Issue Type: New Feature
> Components: sdk-py-core
> Affects Versions: 2.2.0
> Reporter: Charles Chen
> Priority: Major
> Time Spent: 5h
> Remaining Estimate: 0h
>
> Currently, the Beam Python DirectRunner does not allow the usage of data from public Google Cloud PubSub topics. We should allow this functionality so that users can more easily test Beam Python's streaming functionality.
[jira] [Created] (BEAM-8415) Improve error message when adding a PTransform with a name that already exists in the pipeline
David Yan created BEAM-8415:
----------------------------

    Summary: Improve error message when adding a PTransform with a name that already exists in the pipeline
    Key: BEAM-8415
    URL: https://issues.apache.org/jira/browse/BEAM-8415
    Project: Beam
    Issue Type: Improvement
    Components: sdk-py-core
    Reporter: David Yan

Currently, applying a PTransform with a name that already exists in the pipeline produces a confusing error:

Transform "XXX" does not have a stable unique label. This will prevent updating of pipelines. To apply a transform with a specified label write pvalue | "label" >> transform

We'd like to improve this error message.
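As an illustration of what a clearer message might say, here is a sketch with a hypothetical `LabelRegistry` class. It is not how Beam tracks applied labels, and the wording is only a suggestion; the point is to state that the label is a duplicate before explaining how to set one.

```python
class LabelRegistry:
    """Hypothetical registry of transform labels already used in a pipeline."""

    def __init__(self):
        self._labels = set()

    def register(self, label):
        """Record a label, or raise an error that names the real problem."""
        if label in self._labels:
            raise RuntimeError(
                'A transform with label "%s" already exists in the pipeline. '
                'Give this transform a unique label, for example: '
                'pvalue | "%s_2" >> transform' % (label, label))
        self._labels.add(label)
```

Registering `"Read"` twice raises an error that names the duplicated label and shows the `pvalue | "label" >> transform` syntax from the current message.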
[jira] [Created] (BEAM-7982) Dataflow runner needs to identify the new format of metric names for distribution metrics
David Yan created BEAM-7982:
----------------------------

    Summary: Dataflow runner needs to identify the new format of metric names for distribution metrics
    Key: BEAM-7982
    URL: https://issues.apache.org/jira/browse/BEAM-7982
    Project: Beam
    Issue Type: Improvement
    Components: runner-dataflow
    Reporter: David Yan

For example, [https://github.com/apache/beam/blob/master/sdks/python/apache_beam/runners/dataflow/dataflow_metrics.py#L157] uses [MAX], [MIN], etc. but the new format will be _MAX, _MIN, etc.

--
This message was sent by Atlassian JIRA (v7.6.14#76016)
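A sketch of accepting both suffix styles while the format changes. The function name `split_distribution_suffix` is hypothetical and this is not the code in `dataflow_metrics.py`; only the MAX and MIN kinds named in the ticket are enabled by default, and any further kinds would have to be passed in explicitly.

```python
def split_distribution_suffix(name, kinds=("MAX", "MIN")):
    """Split 'foo[MAX]' (old style) or 'foo_MAX' (new style) into ('foo', 'MAX').

    Returns (name, None) when the name carries no known suffix.
    """
    for kind in kinds:
        for suffix in ("[%s]" % kind, "_%s" % kind):
            if name.endswith(suffix):
                return name[:-len(suffix)], kind
    return name, None
```

One caveat of matching on `_MAX` alone: an ordinary metric whose own name happens to end in `_MAX` would be split as well, so a real implementation probably needs an additional signal that the metric is a distribution.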
[jira] [Updated] (BEAM-7957) Warn at job submit time if a step is named with a / or empty in DataflowRunner
    [ https://issues.apache.org/jira/browse/BEAM-7957?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

David Yan updated BEAM-7957:
---------------------------
    Summary: Warn at job submit time if a step is named with a / or empty in DataflowRunner  (was: Warn users if a step is named with a / or empty in DataflowRunner)

> Warn at job submit time if a step is named with a / or empty in DataflowRunner
> -------------------------------------------------------------------------------
>
> Key: BEAM-7957
> URL: https://issues.apache.org/jira/browse/BEAM-7957
> Project: Beam
> Issue Type: Improvement
> Components: runner-dataflow
> Reporter: David Yan
> Priority: Major
>
> When a job has an empty step name or a step name that contains a "/", it quietly breaks the job graph in the Dataflow UI. We should at least warn the user at job submit time.
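A minimal sketch of the submit-time warning, assuming the runner can enumerate step names before submitting. `warn_on_bad_step_names` is a hypothetical helper, not an existing DataflowRunner function.

```python
import warnings


def warn_on_bad_step_names(step_names):
    """Warn about step names that are empty or contain '/'.

    Returns the list of offending names so the caller can also log them.
    """
    bad = [name for name in step_names if not name or "/" in name]
    for name in bad:
        warnings.warn(
            "Step name %r is empty or contains '/'; this may break the job "
            "graph shown in the Dataflow UI." % name)
    return bad
```

Returning the offending names rather than raising keeps this a warning, as the ticket asks, and leaves the decision to abort with the caller.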
[jira] [Created] (BEAM-7957) Warn users if a step is named with a / or empty in DataflowRunner
David Yan created BEAM-7957:
----------------------------

    Summary: Warn users if a step is named with a / or empty in DataflowRunner
    Key: BEAM-7957
    URL: https://issues.apache.org/jira/browse/BEAM-7957
    Project: Beam
    Issue Type: Improvement
    Components: runner-dataflow
    Reporter: David Yan

When a job has an empty step name or a step name that contains a "/", it quietly breaks the job graph in the Dataflow UI. We should at least warn the user at job submit time.
[jira] [Resolved] (BEAM-7876) Interactive Beam example does not work with Python3
    [ https://issues.apache.org/jira/browse/BEAM-7876?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

David Yan resolved BEAM-7876.
----------------------------
    Resolution: Fixed
    Fix Version/s: 2.15.0

> Interactive Beam example does not work with Python3
> ----------------------------------------------------
>
> Key: BEAM-7876
> URL: https://issues.apache.org/jira/browse/BEAM-7876
> Project: Beam
> Issue Type: Bug
> Components: examples-python
> Reporter: David Yan
> Priority: Major
> Fix For: 2.15.0
> Time Spent: 0.5h
> Remaining Estimate: 0h
>
> When going through the example [https://github.com/apache/beam/blob/master/sdks/python/apache_beam/runners/interactive/README.md] using Jupyter Notebook running in Python 3, the run() method throws the following error:
> {{TypeError Traceback (most recent call last)}}
> {{ in }}
> {{ 3 squares = init_pcoll | 'Square' >> beam.Map(lambda x: x*x)}}
> {{ 4 cubes = init_pcoll | 'Cube' >> beam.Map(lambda x: x**3)}}
> {{> 5 result = p.run()}}
> {{ 6 result.wait_until_finish()}}
> {{~/beam/sdks/python/apache_beam/pipeline.py in run(self, test_runner_api)}}
> {{ 404 self.to_runner_api(use_fake_coders=True),}}
> {{ 405 self.runner,}}
> {{--> 406 self._options).run(False)}}
> {{ 407 }}
> {{ 408 if self._options.view_as(TypeOptions).runtime_type_check:}}
> {{~/beam/sdks/python/apache_beam/pipeline.py in run(self, test_runner_api)}}
> {{ 417 finally:}}
> {{ 418 shutil.rmtree(tmpdir)}}
> {{--> 419 return self.runner.run_pipeline(self, self._options)}}
> {{ 420 }}
> {{ 421 def __enter__(self):}}
> {{~/beam/sdks/python/apache_beam/runners/interactive/interactive_runner.py in run_pipeline(self, pipeline, options)}}
> {{ 142 cache_manager=self._cache_manager,}}
> {{ 143 pipeline_graph_renderer=self._renderer)}}
> {{--> 144 display.start_periodic_update()}}
> {{ 145 result = pipeline_to_execute.run()}}
> {{ 146 result.wait_until_finish()}}
> {{~/beam/sdks/python/apache_beam/runners/interactive/display/display_manager.py in start_periodic_update(self)}}
> {{ 158 def start_periodic_update(self):}}
> {{ 159 """Start a thread that periodically updates the display."""}}
> {{--> 160 self.update_display(True)}}
> {{ 161 self._periodic_update = True}}
> {{ 162}}
> {{~/beam/sdks/python/apache_beam/runners/interactive/display/display_manager.py in update_display(self, force)}}
> {{ 149 rendered_graph = self._renderer.render_pipeline_graph(}}
> {{ 150 self._pipeline_graph)}}
> {{--> 151 display.display(display.HTML(rendered_graph))}}
> {{ 152 }}
> {{ 153 _display_progress('Running...')}}
> {{~/beam/sdks/python/notebook3/lib/python3.6/site-packages/IPython/core/display.py in __init__(self, data, url, filename, metadata)}}
> {{ 691 return prefix.startswith("")}}
> {{ 692 }}
> {{--> 693 if warn():}}
> {{ 694 warnings.warn("Consider using IPython.display.IFrame instead")}}
> {{ 695 super(HTML, self).__init__(data=data, url=url, filename=filename, metadata=metadata)}}
> {{~/beam/sdks/python/notebook3/lib/python3.6/site-packages/IPython/core/display.py in warn()}}
> {{ 689 prefix = data[:10].lower()}}
> {{ 690 suffix = data[-10:].lower()}}
> {{--> 691 return prefix.startswith(" suffix.endswith("")}}
> {{ 692 }}
> {{ 693 if warn():}}
> {{TypeError: startswith first arg must be bytes or a tuple of bytes, not str}}
>
> This does not happen with Python 2.
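The final TypeError suggests that `rendered_graph` reaches `display.HTML` as bytes on Python 3, where `bytes.startswith` rejects a str argument. The sketch below reproduces that failure in isolation and shows one possible remedy, decoding before display. The helper `to_text` and the UTF-8 default are assumptions for illustration; the ticket does not say how the actual fix was made.

```python
def to_text(rendered_graph, encoding="utf-8"):
    """Return rendered_graph as str, decoding it first if it is bytes."""
    if isinstance(rendered_graph, bytes):
        return rendered_graph.decode(encoding)
    return rendered_graph


# On Python 3, bytes.startswith raises TypeError for a str argument,
# which is the failure at the bottom of the traceback.
try:
    b"<svg>".startswith("<svg")
except TypeError as exc:
    print(exc)

# After decoding, the same prefix check works on str.
print(to_text(b"<svg>").startswith("<svg"))
```

On Python 2, `str` and bytes are the same type, which is consistent with the report that the example works there.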
[jira] [Updated] (BEAM-7876) Interactive Beam example does not work with Python3
    [ https://issues.apache.org/jira/browse/BEAM-7876?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

David Yan updated BEAM-7876:
---------------------------
    Status: Open  (was: Triage Needed)

> Interactive Beam example does not work with Python3
> ----------------------------------------------------
>
> Key: BEAM-7876
> URL: https://issues.apache.org/jira/browse/BEAM-7876
> Project: Beam
> Issue Type: Bug
> Components: examples-python
> Reporter: David Yan
> Priority: Major
> Time Spent: 0.5h
> Remaining Estimate: 0h
[jira] [Created] (BEAM-7876) Interactive Beam example does not work with Python3
David Yan created BEAM-7876: --- Summary: Interactive Beam example does not work with Python3 Key: BEAM-7876 URL: https://issues.apache.org/jira/browse/BEAM-7876 Project: Beam Issue Type: Bug Components: examples-python Reporter: David Yan When going through the example [https://github.com/apache/beam/blob/master/sdks/python/apache_beam/runners/interactive/README.md] using Jupyter Notebook running in Python 3, the run() method throws an error: TypeError Traceback (most recent call last) in 3 squares = init_pcoll | 'Square' >> beam.Map(lambda x: x*x) 4 cubes = init_pcoll | 'Cube' >> beam.Map(lambda x: x**3) > 5 result = p.run() 6 result.wait_until_finish() ~/beam/sdks/python/apache_beam/pipeline.py in run(self, test_runner_api) 404 self.to_runner_api(use_fake_coders=True), 405 self.runner, --> 406 self._options).run(False) 407 408 if self._options.view_as(TypeOptions).runtime_type_check: ~/beam/sdks/python/apache_beam/pipeline.py in run(self, test_runner_api) 417 finally: 418 shutil.rmtree(tmpdir) --> 419 return self.runner.run_pipeline(self, self._options) 420 421 def __enter__(self): ~/beam/sdks/python/apache_beam/runners/interactive/interactive_runner.py in run_pipeline(self, pipeline, options) 142 cache_manager=self._cache_manager, 143 pipeline_graph_renderer=self._renderer) --> 144 display.start_periodic_update() 145 result = pipeline_to_execute.run() 146 result.wait_until_finish() ~/beam/sdks/python/apache_beam/runners/interactive/display/display_manager.py in start_periodic_update(self) 158 def start_periodic_update(self): 159 """Start a thread that periodically updates the display.""" --> 160 self.update_display(True) 161 self._periodic_update = True 162 ~/beam/sdks/python/apache_beam/runners/interactive/display/display_manager.py in update_display(self, force) 149 rendered_graph = self._renderer.render_pipeline_graph( 150 self._pipeline_graph) --> 151 display.display(display.HTML(rendered_graph)) 152 153 _display_progress('Running...') 
~/beam/sdks/python/notebook3/lib/python3.6/site-packages/IPython/core/display.py in __init__(self, data, url, filename, metadata) 691 return prefix.startswith("") 692 --> 693 if warn(): 694 warnings.warn("Consider using IPython.display.IFrame instead") 695 super(HTML, self).__init__(data=data, url=url, filename=filename, metadata=metadata) ~/beam/sdks/python/notebook3/lib/python3.6/site-packages/IPython/core/display.py in warn() 689 prefix = data[:10].lower() 690 suffix = data[-10:].lower() --> 691 return prefix.startswith("") 692 693 if warn(): TypeError: startswith first arg must be bytes or a tuple of bytes, not str -- This message was sent by Atlassian JIRA (v7.6.14#76016)
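The final frame of the traceback above can be reproduced without Beam or IPython. The following is a minimal sketch, assuming (as the error message suggests) that the rendered graph reaches the display code as a bytes object. The helper names `looks_like_iframe` and `to_text`, the `"<iframe"` literal, and the sample payload are hypothetical stand-ins, not the actual IPython or Beam code.

```python
def looks_like_iframe(data):
    """Prefix check in the style of the failing frame.

    The "<iframe" literal is an assumption; the literal in the
    original traceback was lost in extraction.
    """
    prefix = data[:10].lower()
    return prefix.startswith("<iframe")


def to_text(data, encoding="utf-8"):
    """Decode bytes to str; pass str through unchanged."""
    return data.decode(encoding) if isinstance(data, bytes) else data


# Hypothetical payload standing in for the rendered pipeline graph.
rendered_graph = b"<svg>...</svg>"

# Python 3: bytes.startswith() rejects a str argument.
try:
    looks_like_iframe(rendered_graph)
    error_message = None
except TypeError as exc:
    error_message = str(exc)
print(error_message)

# Decoding before the check avoids the error.
result = looks_like_iframe(to_text(rendered_graph))
print(result)
```

In Python 2, `str` and bytes are the same type, so the same comparison goes through, which is consistent with the report that the problem only appears under Python 3.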
[jira] [Resolved] (BEAM-7408) Beam Programming Guide inconsistencies
[ https://issues.apache.org/jira/browse/BEAM-7408?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] David Yan resolved BEAM-7408. - Resolution: Fixed > Beam Programming Guide inconsistencies > -- > > Key: BEAM-7408 > URL: https://issues.apache.org/jira/browse/BEAM-7408 > Project: Beam > Issue Type: Improvement > Components: website >Affects Versions: Not applicable >Reporter: David Yan >Priority: Major > Labels: documentation, newbie > Fix For: Not applicable > > Time Spent: 40m > Remaining Estimate: 0h > > [https://beam.apache.org/documentation/programming-guide/] > > Pipeline option example: > > Examples in Java, Python and Go are not consistent. Java has myCustomOption, > while Python and Go have "input" and "output". > > When Python is chosen, the doc says --myCustomOption=value is supported, > which only corresponds to the java example. > > Reading from external source: > > Java, Python and Go are not consistent. Python example reads from a GCS file, > while others specify a generic file. > [https://beam.apache.org/documentation/programming-guide/#applying-transforms]: > The last workflow graph does not correspond to the code example. -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Commented] (BEAM-7408) Beam Programming Guide inconsistencies
[ https://issues.apache.org/jira/browse/BEAM-7408?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16857005#comment-16857005 ] David Yan commented on BEAM-7408: - Yes, thank you. :) > Beam Programming Guide inconsistencies > -- > > Key: BEAM-7408 > URL: https://issues.apache.org/jira/browse/BEAM-7408 > Project: Beam > Issue Type: Improvement > Components: website >Affects Versions: Not applicable >Reporter: David Yan >Priority: Major > Labels: documentation, newbie > Fix For: Not applicable > > Time Spent: 40m > Remaining Estimate: 0h > > [https://beam.apache.org/documentation/programming-guide/] > > Pipeline option example: > > Examples in Java, Python and Go are not consistent. Java has myCustomOption, > while Python and Go have "input" and "output". > > When Python is chosen, the doc says --myCustomOption=value is supported, > which only corresponds to the java example. > > Reading from external source: > > Java, Python and Go are not consistent. Python example reads from a GCS file, > while others specify a generic file. > [https://beam.apache.org/documentation/programming-guide/#applying-transforms]: > The last workflow graph does not correspond to the code example.
[jira] [Created] (BEAM-7408) Beam Programming Guide inconsistencies
David Yan created BEAM-7408: --- Summary: Beam Programming Guide inconsistencies Key: BEAM-7408 URL: https://issues.apache.org/jira/browse/BEAM-7408 Project: Beam Issue Type: Improvement Components: website Reporter: David Yan [https://beam.apache.org/documentation/programming-guide/] Pipeline option example: Examples in Java, Python and Go are not consistent. Java has myCustomOption, while Python and Go have "input" and "output". When Python is chosen, the doc says --myCustomOption=value is supported, which only corresponds to the java example. Reading from external source: Java, Python and Go are not consistent. Python example reads from a GCS file, while others specify a generic file. [https://beam.apache.org/documentation/programming-guide/#applying-transforms]: The last workflow graph does not correspond to the code example.
[jira] [Created] (BEAM-7215) Wordcount example page does not tell the user to create the maven project using archetype
David Yan created BEAM-7215: --- Summary: Wordcount example page does not tell the user to create the maven project using archetype Key: BEAM-7215 URL: https://issues.apache.org/jira/browse/BEAM-7215 Project: Beam Issue Type: Improvement Components: website Reporter: David Yan [https://beam.apache.org/get-started/wordcount-example/#wordcount-example] does not have a link back to [https://beam.apache.org/get-started/quickstart-java/#get-the-wordcount-code]. If the user just follows the instructions in the first link (from a search engine let's say), they would get: {{$ mvn compile exec:java -Dexec.mainClass=org.apache.beam.examples.WordCount -Dexec.args="--runner=DataflowRunner --gcpTempLocation=gs://clouddfe-test/staging-$USER --inputFile=gs://apache-beam-samples/shakespeare/* --output=gs://world-readable-mkcq69tkcu/$USER/result.txt" -Pdataflow-runner [INFO] Scanning for projects... [INFO] [INFO] BUILD FAILURE [INFO] [INFO] Total time: 0.068 s [INFO] Finished at: 2019-05-02T13:32:15-07:00 [INFO] Final Memory: 23M/1948M [INFO] [WARNING] The requested profile "dataflow-runner" could not be activated because it does not exist. [ERROR] The goal you specified requires a project to execute but there is no POM in this directory (/usr/local/google/home/davidyan/beam). Please verify you invoked Maven from the correct directory. -> [Help 1]}}
[jira] [Created] (BEAM-7020) Reduce the log severity of profiling agent discovery
David Yan created BEAM-7020: --- Summary: Reduce the log severity of profiling agent discovery Key: BEAM-7020 URL: https://issues.apache.org/jira/browse/BEAM-7020 Project: Beam Issue Type: Improvement Components: runner-dataflow Reporter: David Yan Example: [https://github.com/apache/beam/blob/b953645ed6db837d24284d7fe1fe091e7309f821/runners/google-cloud-dataflow-java/worker/src/main/java/org/apache/beam/runners/dataflow/worker/profiler/ScopedProfiler.java#L138] These should not be at warning severity, even if the profiling agent is not present, since in most cases users do not run their jobs with profiling.
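The request in BEAM-7020 can be illustrated with a small sketch in Python rather than the Java of the linked ScopedProfiler.java. The logger name, the `discover_agent` function, and the use of `ImportError` as the "agent not found" signal are all hypothetical; the point is only that an absent optional component is logged at debug level and execution continues without it.

```python
import logging

# Hypothetical logger name, used only for this illustration.
logger = logging.getLogger("profiler_discovery")


def discover_agent(loader):
    """Try to load an optional profiling agent via the given callable.

    Returns the agent, or None when it is not installed. Absence is the
    common case, so it is reported at DEBUG rather than WARNING.
    """
    try:
        return loader()
    except ImportError as exc:  # stand-in for "agent library not found"
        logger.debug("Profiling agent not found; profiling disabled: %s", exc)
        return None


def missing_agent():
    """Hypothetical loader that simulates a worker without the agent."""
    raise ImportError("no profiling agent on this worker")


agent = discover_agent(missing_agent)
print(agent)
```

With the default logging configuration, the debug message is suppressed, so jobs that never asked for profiling produce no log noise; raising the logger to DEBUG makes the message visible when someone is troubleshooting agent discovery.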
[jira] [Created] (BEAM-6918) Github link requires login and example link is broken
David Yan created BEAM-6918: --- Summary: Github link requires login and example link is broken Key: BEAM-6918 URL: https://issues.apache.org/jira/browse/BEAM-6918 Project: Beam Issue Type: Improvement Components: examples-python Reporter: David Yan Two minor issues in [https://github.com/apache/beam/blob/master/sdks/python/apache_beam/runners/interactive/README.md] 1. git clone g...@github.com:apache/beam.git requires the user to be logged in, while https://github.com/apache/beam does not. 2. Spaces in the example link need to be escaped.
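The second point in BEAM-6918 (spaces in a link must be escaped) can be sketched with the standard library. The base URL and file name below are hypothetical placeholders, not the actual link from the README.

```python
from urllib.parse import quote

# Hypothetical base URL and file name, for illustration only.
base_url = "https://example.invalid/notebooks/"
file_name = "Interactive Beam Example.ipynb"

# quote() percent-encodes characters that are not safe in a URL path,
# so each space becomes %20.
escaped_link = base_url + quote(file_name)
print(escaped_link)
```

Only the path component is passed to `quote()`; applying it to the whole URL would also encode the colon in the scheme.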