[jira] [Assigned] (BEAM-10291) Lull detection log to include full thread dump
[ https://issues.apache.org/jira/browse/BEAM-10291?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] David Yan reassigned BEAM-10291: Assignee: David Yan > Lull detection log to include full thread dump > -- > > Key: BEAM-10291 > URL: https://issues.apache.org/jira/browse/BEAM-10291 > Project: Beam > Issue Type: Improvement > Components: runner-dataflow >Reporter: David Yan >Assignee: David Yan >Priority: P2 > Labels: stale-P2 > Time Spent: 6h 10m > Remaining Estimate: 0h > > What we have today is a thread dump of the thread that's stuck, but in many > cases (most notably BQ) I/O happens in a separate thread that is not included > in the dump. Ideally, we'd need to have a full thread dump of the entire > process. -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Updated] (BEAM-10291) Lull detection log to include full thread dump
[ https://issues.apache.org/jira/browse/BEAM-10291?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] David Yan updated BEAM-10291: - Status: Resolved (was: Open) > Lull detection log to include full thread dump > -- > > Key: BEAM-10291 > URL: https://issues.apache.org/jira/browse/BEAM-10291 > Project: Beam > Issue Type: Improvement > Components: runner-dataflow >Reporter: David Yan >Assignee: David Yan >Priority: P2 > Labels: stale-P2 > Time Spent: 6h 10m > Remaining Estimate: 0h > > What we have today is a thread dump of the thread that's stuck, but in many > cases (most notably BQ) I/O happens in a separate thread that is not included > in the dump. Ideally, we'd need to have a full thread dump of the entire > process. -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Comment Edited] (BEAM-8551) Beam Python containers should include all Beam SDK dependencies, and not have conflicting dependencies
[ https://issues.apache.org/jira/browse/BEAM-8551?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17186178#comment-17186178 ]

David Yan edited comment on BEAM-8551 at 8/28/20, 12:16 AM:

Also BEAM-10827 is the latest issue that is caused by lack of dependency presubmit check. I'm raising the priority of this issue.

was (Author: davidyan):
Also BEAM-10827 is another issue that is caused by lack of dependency presubmit check. I'm raising the priority of this issue.

> Beam Python containers should include all Beam SDK dependencies, and not have
> conflicting dependencies
> --
>
> Key: BEAM-8551
> URL: https://issues.apache.org/jira/browse/BEAM-8551
> Project: Beam
> Issue Type: Bug
> Components: sdk-py-core
> Reporter: Valentyn Tymofieiev
> Assignee: Valentyn Tymofieiev
> Priority: P1
>
> Checks could be introduced during container creation, and be enforced by
> ValidatesContainer test suites. We could:
> - Check pip output or status code for incompatible dependency errors.
> - Remove internet access when installing apache-beam in the container, to
> make sure all dependencies are installed.
[jira] [Commented] (BEAM-8551) Beam Python containers should include all Beam SDK dependencies, and not have conflicting dependencies
[ https://issues.apache.org/jira/browse/BEAM-8551?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17186178#comment-17186178 ]

David Yan commented on BEAM-8551:
-

Also BEAM-10827 is another issue that is caused by lack of dependency presubmit check. I'm raising the priority of this issue.

> Beam Python containers should include all Beam SDK dependencies, and not have
> conflicting dependencies
> --
>
> Key: BEAM-8551
> URL: https://issues.apache.org/jira/browse/BEAM-8551
> Project: Beam
> Issue Type: Bug
> Components: sdk-py-core
> Reporter: Valentyn Tymofieiev
> Assignee: Valentyn Tymofieiev
> Priority: P2
>
> Checks could be introduced during container creation, and be enforced by
> ValidatesContainer test suites. We could:
> - Check pip output or status code for incompatible dependency errors.
> - Remove internet access when installing apache-beam in the container, to
> make sure all dependencies are installed.
[jira] [Updated] (BEAM-8551) Beam Python containers should include all Beam SDK dependencies, and not have conflicting dependencies
[ https://issues.apache.org/jira/browse/BEAM-8551?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

David Yan updated BEAM-8551:
    Priority: P1 (was: P2)

> Beam Python containers should include all Beam SDK dependencies, and not have
> conflicting dependencies
> --
>
> Key: BEAM-8551
> URL: https://issues.apache.org/jira/browse/BEAM-8551
> Project: Beam
> Issue Type: Bug
> Components: sdk-py-core
> Reporter: Valentyn Tymofieiev
> Assignee: Valentyn Tymofieiev
> Priority: P1
>
> Checks could be introduced during container creation, and be enforced by
> ValidatesContainer test suites. We could:
> - Check pip output or status code for incompatible dependency errors.
> - Remove internet access when installing apache-beam in the container, to
> make sure all dependencies are installed.
[jira] [Commented] (BEAM-8415) Improve error message when adding a PTransform with a name that already exists in the pipeline
[ https://issues.apache.org/jira/browse/BEAM-8415?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17161538#comment-17161538 ] David Yan commented on BEAM-8415: - For Java, looks like it's done when the pipeline is in the [validate|[https://github.com/apache/beam/blob/master/sdks/java/core/src/main/java/org/apache/beam/sdk/Pipeline.java#L591]] stage rather than when adding the PTransform to the pipeline like in Python so we cannot just use the error message we use in Python for Java. Should we just change the term "stable unique" to just "unique"? I'm not sure what "stable unique" means since the PTransform label AFAIK cannot be changed after the pipeline has been submitted. > Improve error message when adding a PTransform with a name that already > exists in the pipeline > -- > > Key: BEAM-8415 > URL: https://issues.apache.org/jira/browse/BEAM-8415 > Project: Beam > Issue Type: Improvement > Components: sdk-java-core, sdk-py-core >Reporter: David Yan >Priority: P2 > Time Spent: 1h 40m > Remaining Estimate: 0h > > Currently, when trying to apply a PTransform with a name that already exists > in the pipeline, it returns a confusing error: > Transform "XXX" does not have a stable unique label. This will prevent > updating of pipelines. To apply a transform with a specified label write > pvalue | "label" >> transform > We'd like to improve this error message. -- This message was sent by Atlassian Jira (v8.3.4#803005)
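One way to picture a clearer message is to reject the duplicate where the transform is applied and name the conflicting label directly. The sketch below is a hypothetical stand-in, not Beam code: `LabelRegistry` and the message wording are assumptions made for illustration only.

```python
class LabelRegistry:
    """Tracks transform labels already applied to a pipeline (hypothetical)."""

    def __init__(self):
        self._labels = set()

    def register(self, label):
        # Fail fast with a message that names the duplicate and shows the fix.
        if label in self._labels:
            raise ValueError(
                'A transform with label "%s" already exists in the pipeline. '
                'To apply a transform with a different label, write '
                'pvalue | "label" >> transform' % label)
        self._labels.add(label)
```

As the comment notes, the Java SDK detects duplicates during validation rather than on apply, so a check of this shape would map more directly onto the Python SDK.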
[jira] [Created] (BEAM-10291) Lull detection log to include full thread dump
David Yan created BEAM-10291: Summary: Lull detection log to include full thread dump Key: BEAM-10291 URL: https://issues.apache.org/jira/browse/BEAM-10291 Project: Beam Issue Type: Improvement Components: runner-dataflow Reporter: David Yan Assignee: David Yan What we have today is a thread dump of the thread that's stuck, but in many cases (most notably BQ) I/O happens in a separate thread that is not included in the dump. Ideally, we'd need to have a full thread dump of the entire process. -- This message was sent by Atlassian Jira (v8.3.4#803005)
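A dump of every thread in the process, as requested here, could look roughly like the following Python sketch. It is only an illustration: the ticket does not say which SDK harness emits the lull log, and `format_full_thread_dump` is a hypothetical helper name, not a Beam function. It relies on the standard-library calls `sys._current_frames()` and `traceback.format_stack()`.

```python
import sys
import threading
import traceback


def format_full_thread_dump():
    """Return a text dump of the stack of every live thread (hypothetical helper)."""
    # Map thread ids to names so the dump is readable.
    names = {t.ident: t.name for t in threading.enumerate()}
    lines = []
    # sys._current_frames() maps each thread id to that thread's topmost frame.
    for thread_id, frame in sys._current_frames().items():
        lines.append("--- Thread %s (%s) ---" % (thread_id, names.get(thread_id, "unknown")))
        lines.extend(entry.rstrip("\n") for entry in traceback.format_stack(frame))
    return "\n".join(lines)
```

Logging the returned string alongside the existing single-thread dump would also show a separate I/O thread, such as the BQ case mentioned above, if it is alive at the time.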
[jira] [Resolved] (BEAM-10247) google-api-core 1.20.0 is incompatible with the pinned version of grpc
[ https://issues.apache.org/jira/browse/BEAM-10247?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] David Yan resolved BEAM-10247. -- Fix Version/s: 2.23.0 Resolution: Fixed > google-api-core 1.20.0 is incompatible with the pinned version of grpc > -- > > Key: BEAM-10247 > URL: https://issues.apache.org/jira/browse/BEAM-10247 > Project: Beam > Issue Type: Bug > Components: sdk-py-harness >Reporter: David Yan >Assignee: David Yan >Priority: P1 > Fix For: 2.23.0 > > Time Spent: 4h 20m > Remaining Estimate: 0h > > It looks like the google-api-core 1.20.0 has an issue with required > dependency or the lack thereof. This is causing this issue when using > datastore: > > {{Traceback (most recent call last):}} > {{ File "./query_license.py", line 11, in }} > {{ from google.cloud import datastore}} > {{ File > "/root/apache-beam-custom/lib/python3.7/site-packages/google/cloud/datastore/__init__.py", > line 62, in }} > {{ from google.cloud.datastore.batch import Batch}} > {{ File > "/root/apache-beam-custom/lib/python3.7/site-packages/google/cloud/datastore/batch.py", > line 24, in }} > {{ from google.cloud.datastore import helpers}} > {{ File > "/root/apache-beam-custom/lib/python3.7/site-packages/google/cloud/datastore/helpers.py", > line 29, in }} > {{ from google.cloud.datastore_v1.proto import datastore_pb2}} > {{ File > "/root/apache-beam-custom/lib/python3.7/site-packages/google/cloud/datastore_v1/__init__.py", > line 18, in }} > {{ from google.cloud.datastore_v1.gapic import datastore_client}} > {{ File > "/root/apache-beam-custom/lib/python3.7/site-packages/google/cloud/datastore_v1/gapic/datastore_client.py", > line 22, in }} > {{ import google.api_core.gapic_v1.client_info}} > {{ File > "/root/apache-beam-custom/lib/python3.7/site-packages/google/api_core/gapic_v1/__init__.py", > line 26, in }} > {{ from google.api_core.gapic_v1 import method_async # noqa: F401}} > {{ File > 
"/root/apache-beam-custom/lib/python3.7/site-packages/google/api_core/gapic_v1/method_async.py", > line 20, in }} > {{ from google.api_core import general_helpers, grpc_helpers_async}} > {{ File > "/root/apache-beam-custom/lib/python3.7/site-packages/google/api_core/grpc_helpers_async.py", > line 25, in }} > {{ from grpc.experimental import aio}} > {{ ImportError: cannot import name 'aio' from 'grpc.experimental' > (/root/apache-beam-custom/lib/python3.7/site-packages/grpc/experimental/__init__.py)}} -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Commented] (BEAM-10247) google-api-core 1.20.0 is incompatible with the pinned version of grpc
[ https://issues.apache.org/jira/browse/BEAM-10247?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17133814#comment-17133814 ] David Yan commented on BEAM-10247: -- This issue is the exact same issue described in [https://github.com/googleapis/python-api-core/issues/40] > google-api-core 1.20.0 is incompatible with the pinned version of grpc > -- > > Key: BEAM-10247 > URL: https://issues.apache.org/jira/browse/BEAM-10247 > Project: Beam > Issue Type: Bug > Components: sdk-py-harness >Reporter: David Yan >Assignee: David Yan >Priority: P1 > Time Spent: 10m > Remaining Estimate: 0h > > It looks like the google-api-core 1.20.0 has an issue with required > dependency or the lack thereof. This is causing this issue when using > datastore: > > {{Traceback (most recent call last):}} > {{ File "./query_license.py", line 11, in }} > {{ from google.cloud import datastore}} > {{ File > "/root/apache-beam-custom/lib/python3.7/site-packages/google/cloud/datastore/__init__.py", > line 62, in }} > {{ from google.cloud.datastore.batch import Batch}} > {{ File > "/root/apache-beam-custom/lib/python3.7/site-packages/google/cloud/datastore/batch.py", > line 24, in }} > {{ from google.cloud.datastore import helpers}} > {{ File > "/root/apache-beam-custom/lib/python3.7/site-packages/google/cloud/datastore/helpers.py", > line 29, in }} > {{ from google.cloud.datastore_v1.proto import datastore_pb2}} > {{ File > "/root/apache-beam-custom/lib/python3.7/site-packages/google/cloud/datastore_v1/__init__.py", > line 18, in }} > {{ from google.cloud.datastore_v1.gapic import datastore_client}} > {{ File > "/root/apache-beam-custom/lib/python3.7/site-packages/google/cloud/datastore_v1/gapic/datastore_client.py", > line 22, in }} > {{ import google.api_core.gapic_v1.client_info}} > {{ File > "/root/apache-beam-custom/lib/python3.7/site-packages/google/api_core/gapic_v1/__init__.py", > line 26, in }} > {{ from google.api_core.gapic_v1 import method_async # 
noqa: F401}} > {{ File > "/root/apache-beam-custom/lib/python3.7/site-packages/google/api_core/gapic_v1/method_async.py", > line 20, in }} > {{ from google.api_core import general_helpers, grpc_helpers_async}} > {{ File > "/root/apache-beam-custom/lib/python3.7/site-packages/google/api_core/grpc_helpers_async.py", > line 25, in }} > {{ from grpc.experimental import aio}} > {{ ImportError: cannot import name 'aio' from 'grpc.experimental' > (/root/apache-beam-custom/lib/python3.7/site-packages/grpc/experimental/__init__.py)}} -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Updated] (BEAM-10247) google-api-core 1.20.0 is incompatible with the pinned version of grpc
[ https://issues.apache.org/jira/browse/BEAM-10247?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] David Yan updated BEAM-10247: - Description: It looks like the google-api-core 1.20.0 has an issue with required dependency or the lack thereof. This is causing this issue when using datastore: {{Traceback (most recent call last):}} {{ File "./query_license.py", line 11, in }} {{ from google.cloud import datastore}} {{ File "/root/apache-beam-custom/lib/python3.7/site-packages/google/cloud/datastore/__init__.py", line 62, in }} {{ from google.cloud.datastore.batch import Batch}} {{ File "/root/apache-beam-custom/lib/python3.7/site-packages/google/cloud/datastore/batch.py", line 24, in }} {{ from google.cloud.datastore import helpers}} {{ File "/root/apache-beam-custom/lib/python3.7/site-packages/google/cloud/datastore/helpers.py", line 29, in }} {{ from google.cloud.datastore_v1.proto import datastore_pb2}} {{ File "/root/apache-beam-custom/lib/python3.7/site-packages/google/cloud/datastore_v1/__init__.py", line 18, in }} {{ from google.cloud.datastore_v1.gapic import datastore_client}} {{ File "/root/apache-beam-custom/lib/python3.7/site-packages/google/cloud/datastore_v1/gapic/datastore_client.py", line 22, in }} {{ import google.api_core.gapic_v1.client_info}} {{ File "/root/apache-beam-custom/lib/python3.7/site-packages/google/api_core/gapic_v1/__init__.py", line 26, in }} {{ from google.api_core.gapic_v1 import method_async # noqa: F401}} {{ File "/root/apache-beam-custom/lib/python3.7/site-packages/google/api_core/gapic_v1/method_async.py", line 20, in }} {{ from google.api_core import general_helpers, grpc_helpers_async}} {{ File "/root/apache-beam-custom/lib/python3.7/site-packages/google/api_core/grpc_helpers_async.py", line 25, in }} {{ from grpc.experimental import aio}} {{ ImportError: cannot import name 'aio' from 'grpc.experimental' (/root/apache-beam-custom/lib/python3.7/site-packages/grpc/experimental/__init__.py)}} was: It looks 
like the google-api-core 1.20.0 has an issue with required dependency or the lack thereof. This is causing this issue when using datastore: ``` {{Traceback (most recent call last): File "./query_license.py", line 11, in from google.cloud import datastore File "/root/apache-beam-custom/lib/python3.7/site-packages/google/cloud/datastore/__init__.py", line 62, in from google.cloud.datastore.batch import Batch File "/root/apache-beam-custom/lib/python3.7/site-packages/google/cloud/datastore/batch.py", line 24, in from google.cloud.datastore import helpers File "/root/apache-beam-custom/lib/python3.7/site-packages/google/cloud/datastore/helpers.py", line 29, in from google.cloud.datastore_v1.proto import datastore_pb2 File "/root/apache-beam-custom/lib/python3.7/site-packages/google/cloud/datastore_v1/__init__.py", line 18, in from google.cloud.datastore_v1.gapic import datastore_client File "/root/apache-beam-custom/lib/python3.7/site-packages/google/cloud/datastore_v1/gapic/datastore_client.py", line 22, in import google.api_core.gapic_v1.client_info File "/root/apache-beam-custom/lib/python3.7/site-packages/google/api_core/gapic_v1/__init__.py", line 26, in from google.api_core.gapic_v1 import method_async # noqa: F401 File "/root/apache-beam-custom/lib/python3.7/site-packages/google/api_core/gapic_v1/method_async.py", line 20, in from google.api_core import general_helpers, grpc_helpers_async File "/root/apache-beam-custom/lib/python3.7/site-packages/google/api_core/grpc_helpers_async.py", line 25, in from grpc.experimental import aio ImportError: cannot import name 'aio' from 'grpc.experimental' (/root/apache-beam-custom/lib/python3.7/site-packages/grpc/experimental/__init__.py)}} {{```}} > google-api-core 1.20.0 is incompatible with the pinned version of grpc > -- > > Key: BEAM-10247 > URL: https://issues.apache.org/jira/browse/BEAM-10247 > Project: Beam > Issue Type: Bug > Components: sdk-py-harness >Reporter: David Yan >Assignee: David Yan >Priority: P1 > > It 
looks like the google-api-core 1.20.0 has an issue with required > dependency or the lack thereof. This is causing this issue when using > datastore: > > {{Traceback (most recent call last):}} > {{ File "./query_license.py", line 11, in }} > {{ from google.cloud import datastore}} > {{ File > "/root/apache-beam-custom/lib/python3.7/site-packages/google/cloud/datastore/__init__.py", > line 62, in }} > {{ from google.cloud.datastore.batch import Batch}} > {{ File > "/root/apache-beam-custom/lib/python3.7/site-packages/google/cloud/datastore/batch.py", > line 24, in }} > {{ from google.cloud.datastore import helpers}} > {{ File > "/root/apache-beam-custom/lib/python3.
[jira] [Created] (BEAM-10247) google-api-core 1.20.0 is incompatible with the pinned version of grpc
David Yan created BEAM-10247: Summary: google-api-core 1.20.0 is incompatible with the pinned version of grpc Key: BEAM-10247 URL: https://issues.apache.org/jira/browse/BEAM-10247 Project: Beam Issue Type: Bug Components: sdk-py-harness Reporter: David Yan Assignee: David Yan It looks like the google-api-core 1.20.0 has an issue with required dependency or the lack thereof. This is causing this issue when using datastore: ``` {{Traceback (most recent call last): File "./query_license.py", line 11, in from google.cloud import datastore File "/root/apache-beam-custom/lib/python3.7/site-packages/google/cloud/datastore/__init__.py", line 62, in from google.cloud.datastore.batch import Batch File "/root/apache-beam-custom/lib/python3.7/site-packages/google/cloud/datastore/batch.py", line 24, in from google.cloud.datastore import helpers File "/root/apache-beam-custom/lib/python3.7/site-packages/google/cloud/datastore/helpers.py", line 29, in from google.cloud.datastore_v1.proto import datastore_pb2 File "/root/apache-beam-custom/lib/python3.7/site-packages/google/cloud/datastore_v1/__init__.py", line 18, in from google.cloud.datastore_v1.gapic import datastore_client File "/root/apache-beam-custom/lib/python3.7/site-packages/google/cloud/datastore_v1/gapic/datastore_client.py", line 22, in import google.api_core.gapic_v1.client_info File "/root/apache-beam-custom/lib/python3.7/site-packages/google/api_core/gapic_v1/__init__.py", line 26, in from google.api_core.gapic_v1 import method_async # noqa: F401 File "/root/apache-beam-custom/lib/python3.7/site-packages/google/api_core/gapic_v1/method_async.py", line 20, in from google.api_core import general_helpers, grpc_helpers_async File "/root/apache-beam-custom/lib/python3.7/site-packages/google/api_core/grpc_helpers_async.py", line 25, in from grpc.experimental import aio ImportError: cannot import name 'aio' from 'grpc.experimental' (/root/apache-beam-custom/lib/python3.7/site-packages/grpc/experimental/__init__.py)}} 
{{```}} -- This message was sent by Atlassian Jira (v8.3.4#803005)
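The failure above happens at import time, so it can be probed without running a pipeline. The following sketch uses only the standard library and reports whether a dotted module name can be imported. `can_import` is a hypothetical helper; `grpc.experimental.aio` is the module named in the traceback.

```python
import importlib


def can_import(module_name):
    """Return True if module_name imports cleanly, False on ImportError."""
    try:
        importlib.import_module(module_name)
    except ImportError:
        return False
    return True


if __name__ == "__main__":
    # In the environment from the traceback, this import failed.
    print("grpc.experimental.aio importable:", can_import("grpc.experimental.aio"))
```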
[jira] [Commented] (BEAM-8551) Beam Python containers should include all Beam SDK dependencies, and do not have conflicting dependencies
[ https://issues.apache.org/jira/browse/BEAM-8551?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17061206#comment-17061206 ]

David Yan commented on BEAM-8551:
-

`pip check` is another way to check for broken dependencies.

> Beam Python containers should include all Beam SDK dependencies, and do not
> have conflicting dependencies
> -
>
> Key: BEAM-8551
> URL: https://issues.apache.org/jira/browse/BEAM-8551
> Project: Beam
> Issue Type: Bug
> Components: sdk-py-core
> Reporter: Valentyn Tymofieiev
> Priority: Major
>
> Checks could be introduced during container creation, and be enforced by
> ValidatesContainer test suites. We could:
> - Check pip output or status code for incompatible dependency errors.
> - Remove internet access when installing apache-beam in the container, to
> make sure all dependencies are installed.
[jira] [Closed] (BEAM-9530) Add `pip check` to ensure good python dependencies
[ https://issues.apache.org/jira/browse/BEAM-9530?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] David Yan closed BEAM-9530. --- Fix Version/s: Not applicable Resolution: Duplicate > Add `pip check` to ensure good python dependencies > -- > > Key: BEAM-9530 > URL: https://issues.apache.org/jira/browse/BEAM-9530 > Project: Beam > Issue Type: Improvement > Components: sdk-py-harness >Reporter: David Yan >Priority: Major > Fix For: Not applicable > > > We should add {{pip check}} after pip install in our tests to make sure there > is no incompatibility. {{pip install}} does not return an error exit code > for broken dependencies for historical reasons. -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Commented] (BEAM-9510) Dependencies in base_image_requirements.txt are not compatible with each other
[ https://issues.apache.org/jira/browse/BEAM-9510?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17061158#comment-17061158 ] David Yan commented on BEAM-9510: - Also related: BEAM-9530 > Dependencies in base_image_requirements.txt are not compatible with each other > -- > > Key: BEAM-9510 > URL: https://issues.apache.org/jira/browse/BEAM-9510 > Project: Beam > Issue Type: Bug > Components: sdk-py-harness >Reporter: David Yan >Assignee: Hannah Jiang >Priority: Major > Time Spent: 2h 10m > Remaining Estimate: 0h > > [https://github.com/apache/beam/blob/master/sdks/python/container/base_image_requirements.txt#L56] > says it requires google-cloud-bigquery==1.24.0, google-cloud-core==1.0.2, > google-cloud-bigtable==0.32.1, grpc-1.22.0 and tensorflow-2.1.0 > But they are incompatible with each other: > ERROR: google-cloud-bigquery 1.24.0 has requirement > google-cloud-core<2.0dev,>=1.1.0, but you'll have google-cloud-core 1.0.2 > which is incompatible. > ERROR: google-cloud-bigtable 0.32.1 has requirement > google-cloud-core<0.30dev,>=0.29.0, but you'll have google-cloud-core 1.0.2 > which is incompatible. > ERROR: tensorboard 2.1.1 has requirement grpcio>=1.24.3, but you'll have > grpcio 1.22.0 which is incompatible. > ERROR: tensorflow 2.1.0 has requirement scipy==1.4.1; python_version >= "3", > but you'll have scipy 1.2.2 which is incompatible. > > > > -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Created] (BEAM-9530) Add `pip check` to ensure good python dependencies
David Yan created BEAM-9530: --- Summary: Add `pip check` to ensure good python dependencies Key: BEAM-9530 URL: https://issues.apache.org/jira/browse/BEAM-9530 Project: Beam Issue Type: Improvement Components: sdk-py-harness Reporter: David Yan We should add {{pip check}} after pip install in our tests to make sure there is no incompatibility. {{pip install}} does not return an error exit code for broken dependencies for historical reasons. -- This message was sent by Atlassian Jira (v8.3.4#803005)
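As a rough illustration of the proposal, a test step could run `pip check` after installation and fail on a non-zero exit status. This is a minimal sketch, not the check that was eventually added to Beam; `run_pip_check` is a hypothetical helper name.

```python
import subprocess
import sys


def run_pip_check(python=sys.executable):
    """Run `pip check`; return (ok, output). ok is True only on exit status 0."""
    result = subprocess.run(
        [python, "-m", "pip", "check"],
        stdout=subprocess.PIPE,
        stderr=subprocess.STDOUT,
        universal_newlines=True,
    )
    return result.returncode == 0, result.stdout


if __name__ == "__main__":
    ok, output = run_pip_check()
    print(output)
    # Propagate the failure so a CI job running this script fails too.
    sys.exit(0 if ok else 1)
```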
[jira] [Updated] (BEAM-9510) Dependencies in base_image_requirements.txt are not compatible with each other
[ https://issues.apache.org/jira/browse/BEAM-9510?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] David Yan updated BEAM-9510: Summary: Dependencies in base_image_requirements.txt are not compatible with each other (was: Dependencies in base_image_requirements.txt are not compatible with apache-beam pypi deps) > Dependencies in base_image_requirements.txt are not compatible with each other > -- > > Key: BEAM-9510 > URL: https://issues.apache.org/jira/browse/BEAM-9510 > Project: Beam > Issue Type: Bug > Components: sdk-py-harness >Reporter: David Yan >Priority: Major > > [https://github.com/apache/beam/blob/master/sdks/python/container/base_image_requirements.txt#L56] > says it requires google-cloud-bigquery==1.24.0, google-cloud-core==1.0.2, > google-cloud-bigtable==0.32.1, grpc-1.22.0 and tensorflow-2.1.0 > But they are incompatible with each other: > ERROR: google-cloud-bigquery 1.24.0 has requirement > google-cloud-core<2.0dev,>=1.1.0, but you'll have google-cloud-core 1.0.2 > which is incompatible. > ERROR: google-cloud-bigtable 0.32.1 has requirement > google-cloud-core<0.30dev,>=0.29.0, but you'll have google-cloud-core 1.0.2 > which is incompatible. > ERROR: tensorboard 2.1.1 has requirement grpcio>=1.24.3, but you'll have > grpcio 1.22.0 which is incompatible. > ERROR: tensorflow 2.1.0 has requirement scipy==1.4.1; python_version >= "3", > but you'll have scipy 1.2.2 which is incompatible. > > > > -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Created] (BEAM-9510) Dependencies in base_image_requirements.txt are not compatible with apache-beam pypi deps
David Yan created BEAM-9510:
---

Summary: Dependencies in base_image_requirements.txt are not compatible with apache-beam pypi deps
Key: BEAM-9510
URL: https://issues.apache.org/jira/browse/BEAM-9510
Project: Beam
Issue Type: Bug
Components: sdk-py-harness
Reporter: David Yan

[https://github.com/apache/beam/blob/master/sdks/python/container/base_image_requirements.txt#L56] says it requires google-cloud-bigquery==1.24.0, google-cloud-core==1.0.2, google-cloud-bigtable==0.32.1, grpc-1.22.0 and tensorflow-2.1.0

But they are incompatible with each other:

ERROR: google-cloud-bigquery 1.24.0 has requirement google-cloud-core<2.0dev,>=1.1.0, but you'll have google-cloud-core 1.0.2 which is incompatible.
ERROR: google-cloud-bigtable 0.32.1 has requirement google-cloud-core<0.30dev,>=0.29.0, but you'll have google-cloud-core 1.0.2 which is incompatible.
ERROR: tensorboard 2.1.1 has requirement grpcio>=1.24.3, but you'll have grpcio 1.22.0 which is incompatible.
ERROR: tensorflow 2.1.0 has requirement scipy==1.4.1; python_version >= "3", but you'll have scipy 1.2.2 which is incompatible.
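Since `pip install` prints these conflicts without failing, one option is to scan its output for them. The sketch below parses lines in the exact wording quoted in this issue. That wording is an assumption about the pip version in use and may differ in other versions, and `find_conflicts` is a hypothetical helper.

```python
import re

# Matches the "ERROR: ... has requirement ..." lines quoted in this issue.
_CONFLICT = re.compile(
    r"ERROR: (?P<package>\S+) (?P<version>\S+) has requirement (?P<requirement>.+?), "
    r"but you'll have (?P<installed>\S+) (?P<installed_version>\S+) which is incompatible\."
)


def find_conflicts(pip_output):
    """Return one dict per dependency conflict reported in pip's output."""
    return [match.groupdict() for match in _CONFLICT.finditer(pip_output)]
```

A container build could treat a non-empty result as a failure, which is one form of the "check pip output" idea discussed in BEAM-8551.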
[jira] [Commented] (BEAM-9508) Python installation fails if grpc_tools is not installed
[ https://issues.apache.org/jira/browse/BEAM-9508?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17060427#comment-17060427 ] David Yan commented on BEAM-9508: - This is fixed by installing mypy-protobuf, which is not immediately obvious from the stacktrace. I'll leave this ticket open for a better error message. > Python installation fails if grpc_tools is not installed > > > Key: BEAM-9508 > URL: https://issues.apache.org/jira/browse/BEAM-9508 > Project: Beam > Issue Type: Bug > Components: sdk-py-core >Reporter: David Yan >Priority: Major > > When installing from master branch, I'm getting an exception below. Looks > like the ImportError exception handling throws an exception itself. I'll > manually install grpc_tools and try again but the handling of ImportError has > issues. > > ``` > Traceback (most recent call last): > File > "/root/apache-beam-custom/packages/beam/sdks/python/gen_protos.py", line 292, > in generate_proto_files > from grpc_tools import protoc > ModuleNotFoundError: No module named 'grpc_tools' > > During handling of the above exception, another exception occurred: > > Traceback (most recent call last): > File "/opt/conda/lib/python3.7/multiprocessing/process.py", line 297, > in _bootstrap > self.run() > File "/opt/conda/lib/python3.7/multiprocessing/process.py", line 99, in > run > self._target(*self._args, **self._kwargs) > File > "/root/apache-beam-custom/packages/beam/sdks/python/gen_protos.py", line 378, > in _install_grpcio_tools_and_generate_proto_files > generate_proto_files(force=force) > File > "/root/apache-beam-custom/packages/beam/sdks/python/gen_protos.py", line 315, > in generate_proto_files > protoc_gen_mypy = _find_protoc_gen_mypy() > File > "/root/apache-beam-custom/packages/beam/sdks/python/gen_protos.py", line 233, > in _find_protoc_gen_mypy > (fname, ', '.join(search_paths))) > RuntimeError: Could not find protoc-gen-mypy in > /root/apache-beam-custom/bin, 
/root/apache-beam-custom/bin, /usr/local/bin, > /opt/conda/bin, /usr/local/sbin, /usr/local/bin, /usr/sbin, /usr/bin, /sbin, > /bin > ``` -- This message was sent by Atlassian Jira (v8.3.4#803005)
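A more helpful error could name the package that provides the missing executable. The sketch below is a hypothetical illustration and not the code in `gen_protos.py`: `find_protoc_gen_mypy` and its `which` parameter are assumptions, and the hint reflects the comment above that installing mypy-protobuf resolves the failure.

```python
import shutil


def find_protoc_gen_mypy(which=shutil.which):
    """Locate the protoc-gen-mypy executable or fail with an actionable hint."""
    path = which("protoc-gen-mypy")
    if path is None:
        # Tell the user which package to install instead of only listing paths.
        raise RuntimeError(
            "Could not find protoc-gen-mypy on PATH. "
            "It is provided by the mypy-protobuf package; "
            "try `pip install mypy-protobuf`.")
    return path
```

The `which` argument defaults to `shutil.which` and exists only so the lookup can be replaced in a test.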
[jira] [Created] (BEAM-9508) Python installation fails if grpc_tools is not installed
David Yan created BEAM-9508:
---

Summary: Python installation fails if grpc_tools is not installed
Key: BEAM-9508
URL: https://issues.apache.org/jira/browse/BEAM-9508
Project: Beam
Issue Type: Bug
Components: sdk-py-core
Reporter: David Yan

When installing from the master branch, I'm getting the exception below. It looks like the ImportError exception handling throws an exception itself. I'll manually install grpc_tools and try again, but the handling of ImportError has issues.

```
Traceback (most recent call last):
  File "/root/apache-beam-custom/packages/beam/sdks/python/gen_protos.py", line 292, in generate_proto_files
    from grpc_tools import protoc
ModuleNotFoundError: No module named 'grpc_tools'

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "/opt/conda/lib/python3.7/multiprocessing/process.py", line 297, in _bootstrap
    self.run()
  File "/opt/conda/lib/python3.7/multiprocessing/process.py", line 99, in run
    self._target(*self._args, **self._kwargs)
  File "/root/apache-beam-custom/packages/beam/sdks/python/gen_protos.py", line 378, in _install_grpcio_tools_and_generate_proto_files
    generate_proto_files(force=force)
  File "/root/apache-beam-custom/packages/beam/sdks/python/gen_protos.py", line 315, in generate_proto_files
    protoc_gen_mypy = _find_protoc_gen_mypy()
  File "/root/apache-beam-custom/packages/beam/sdks/python/gen_protos.py", line 233, in _find_protoc_gen_mypy
    (fname, ', '.join(search_paths)))
RuntimeError: Could not find protoc-gen-mypy in /root/apache-beam-custom/bin, /root/apache-beam-custom/bin, /usr/local/bin, /opt/conda/bin, /usr/local/sbin, /usr/local/bin, /usr/sbin, /usr/bin, /sbin, /bin
```
[jira] [Updated] (BEAM-9487) GBKs on unbounded pcolls with global windows and no triggers should fail
[ https://issues.apache.org/jira/browse/BEAM-9487?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

David Yan updated BEAM-9487:
    Labels: EaseOfUse starter (was: starter)

> GBKs on unbounded pcolls with global windows and no triggers should fail
>
>
> Key: BEAM-9487
> URL: https://issues.apache.org/jira/browse/BEAM-9487
> Project: Beam
> Issue Type: Bug
> Components: sdk-py-core
> Reporter: Udi Meiri
> Priority: Major
> Labels: EaseOfUse, starter
>
> This, according to "4.2.2.1 GroupByKey and unbounded PCollections" in
> https://beam.apache.org/documentation/programming-guide/.
> bq. If you do apply GroupByKey or CoGroupByKey to a group of unbounded
> PCollections without setting either a non-global windowing strategy, a
> trigger strategy, or both for each collection, Beam generates an
> IllegalStateException error at pipeline construction time.
> Example where this doesn't happen in Python SDK:
> https://stackoverflow.com/questions/60623246/merge-pcollection-with-apache-beam
> I also believe that this unit test should fail, since test_stream is
> unbounded, uses global window, and has no triggers.
> {code}
> def test_global_window_gbk_fail(self):
>   with TestPipeline() as p:
>     test_stream = TestStream()
>     _ = p | test_stream | GroupByKey()
> {code}
[jira] [Resolved] (BEAM-3453) Allow usage of public Google PubSub topics in Python DirectRunner
[ https://issues.apache.org/jira/browse/BEAM-3453?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] David Yan resolved BEAM-3453. - Fix Version/s: 2.20.0 Resolution: Fixed > Allow usage of public Google PubSub topics in Python DirectRunner > - > > Key: BEAM-3453 > URL: https://issues.apache.org/jira/browse/BEAM-3453 > Project: Beam > Issue Type: New Feature > Components: sdk-py-core >Affects Versions: 2.2.0 >Reporter: Charles Chen >Assignee: David Yan >Priority: Major > Fix For: 2.20.0 > > Time Spent: 5h > Remaining Estimate: 0h > > Currently, the Beam Python DirectRunner does not allow the usage of data from > public Google Cloud PubSub topics. We should allow this functionality so > that users can more easily test Beam Python's streaming functionality. -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Assigned] (BEAM-3453) Allow usage of public Google PubSub topics in Python DirectRunner
[ https://issues.apache.org/jira/browse/BEAM-3453?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] David Yan reassigned BEAM-3453: --- Assignee: David Yan > Allow usage of public Google PubSub topics in Python DirectRunner > - > > Key: BEAM-3453 > URL: https://issues.apache.org/jira/browse/BEAM-3453 > Project: Beam > Issue Type: New Feature > Components: sdk-py-core >Affects Versions: 2.2.0 >Reporter: Charles Chen >Assignee: David Yan >Priority: Major > Time Spent: 5h > Remaining Estimate: 0h > > Currently, the Beam Python DirectRunner does not allow the usage of data from > public Google Cloud PubSub topics. We should allow this functionality so > that users can more easily test Beam Python's streaming functionality. -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Commented] (BEAM-3453) Allow usage of public Google PubSub topics in Python DirectRunner
[ https://issues.apache.org/jira/browse/BEAM-3453?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17033859#comment-17033859 ] David Yan commented on BEAM-3453: - This is fixed by [GitHub Pull Request #10762|https://github.com/apache/beam/pull/10762]. > Allow usage of public Google PubSub topics in Python DirectRunner > - > > Key: BEAM-3453 > URL: https://issues.apache.org/jira/browse/BEAM-3453 > Project: Beam > Issue Type: New Feature > Components: sdk-py-core >Affects Versions: 2.2.0 >Reporter: Charles Chen >Priority: Major > Time Spent: 5h > Remaining Estimate: 0h > > Currently, the Beam Python DirectRunner does not allow the usage of data from > public Google Cloud PubSub topics. We should allow this functionality so > that users can more easily test Beam Python's streaming functionality. -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Created] (BEAM-8415) Improve error message when adding a PTransform with a name that already exists in the pipeline
David Yan created BEAM-8415: --- Summary: Improve error message when adding a PTransform with a name that already exists in the pipeline Key: BEAM-8415 URL: https://issues.apache.org/jira/browse/BEAM-8415 Project: Beam Issue Type: Improvement Components: sdk-py-core Reporter: David Yan Currently, applying a PTransform with a name that already exists in the pipeline produces a confusing error: Transform "XXX" does not have a stable unique label. This will prevent updating of pipelines. To apply a transform with a specified label write pvalue | "label" >> transform We'd like to improve this error message. -- This message was sent by Atlassian Jira (v8.3.4#803005)
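As an illustration of the kind of message this issue asks for, here is a small sketch that names the duplicate label directly. It is not Beam's code: `LabelRegistry`, `DuplicateLabelError`, and the message wording are all hypothetical.

```python
class DuplicateLabelError(ValueError):
    """Raised when a label is applied twice (hypothetical error type)."""


class LabelRegistry:
    """Tracks the labels already used in one pipeline."""

    def __init__(self):
        self._labels = set()

    def register(self, label):
        if label in self._labels:
            raise DuplicateLabelError(
                f'A transform with label "{label}" already exists in the '
                f'pipeline. Give each application its own label, for example '
                f'pvalue | "{label} 2" >> transform.')
        self._labels.add(label)
        return label
```

The point of the design is that the message states the actual cause (a repeated label) before suggesting the fix.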
[jira] [Created] (BEAM-7982) Dataflow runner needs to identify the new format of metric names for distribution metrics
David Yan created BEAM-7982: --- Summary: Dataflow runner needs to identify the new format of metric names for distribution metrics Key: BEAM-7982 URL: https://issues.apache.org/jira/browse/BEAM-7982 Project: Beam Issue Type: Improvement Components: runner-dataflow Reporter: David Yan For example, [https://github.com/apache/beam/blob/master/sdks/python/apache_beam/runners/dataflow/dataflow_metrics.py#L157] uses [MAX], [MIN], etc. but the new format will be _MAX, _MIN, etc. -- This message was sent by Atlassian JIRA (v7.6.14#76016)
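A rough sketch of recognizing both suffix formats. The issue only names `[MAX]`, `[MIN]` and `_MAX`, `_MIN` followed by "etc.", so the `MEAN` and `COUNT` entries below are an assumption, and `split_distribution_metric` is a hypothetical helper rather than anything in dataflow_metrics.py.

```python
import re

# Old style: "name[MAX]"; new style: "name_MAX".
_SUFFIX = re.compile(
    r"^(?P<base>.+?)"
    r"(?:\[(?P<old>MAX|MIN|MEAN|COUNT)\]|_(?P<new>MAX|MIN|MEAN|COUNT))$")


def split_distribution_metric(name):
    """Return (base_name, statistic), or (name, None) if there is no suffix."""
    match = _SUFFIX.match(name)
    if match is None:
        return name, None
    return match.group("base"), match.group("old") or match.group("new")
```

One caveat of the new format: a metric that is legitimately named something like `queue_MAX` cannot be told apart from a distribution suffix by name alone.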
[jira] [Updated] (BEAM-7957) Warn at job submit time if a step is named with a / or empty in DataflowRunner
[ https://issues.apache.org/jira/browse/BEAM-7957?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] David Yan updated BEAM-7957: Summary: Warn at job submit time if a step is named with a / or empty in DataflowRunner (was: Warn users if a step is named with a / or empty in DataflowRunner) > Warn at job submit time if a step is named with a / or empty in DataflowRunner > -- > > Key: BEAM-7957 > URL: https://issues.apache.org/jira/browse/BEAM-7957 > Project: Beam > Issue Type: Improvement > Components: runner-dataflow >Reporter: David Yan >Priority: Major > > When a job has an empty step name or a step name that contains a "/", it > quietly breaks the job graph in the Dataflow UI. We should at least warn the > user at job submit time. -- This message was sent by Atlassian JIRA (v7.6.14#76016)
[jira] [Created] (BEAM-7957) Warn users if a step is named with a / or empty in DataflowRunner
David Yan created BEAM-7957: --- Summary: Warn users if a step is named with a / or empty in DataflowRunner Key: BEAM-7957 URL: https://issues.apache.org/jira/browse/BEAM-7957 Project: Beam Issue Type: Improvement Components: runner-dataflow Reporter: David Yan When a job has an empty step name or a step name that contains a "/", it quietly breaks the job graph in the Dataflow UI. We should at least warn the user at job submit time. -- This message was sent by Atlassian JIRA (v7.6.14#76016)
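A minimal sketch of the submit-time check described here, assuming the step names are available as a plain list of strings. The function names and the warning text are hypothetical and do not come from DataflowRunner.

```python
import warnings


def find_step_name_problems(step_names):
    """Return one message per step name that is empty or contains '/'."""
    problems = []
    for index, name in enumerate(step_names):
        if not name:
            problems.append(f"step #{index} has an empty name")
        elif "/" in name:
            problems.append(f"step name {name!r} contains '/'")
    return problems


def warn_about_step_names(step_names):
    """Emit a UserWarning for each problem found before submitting the job."""
    problems = find_step_name_problems(step_names)
    for message in problems:
        warnings.warn(message + "; the job graph may not render correctly",
                      UserWarning, stacklevel=2)
    return problems
```

Warning rather than raising matches the request in the issue: the job still runs, but the user is told why the graph may look broken.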
[jira] [Resolved] (BEAM-7876) Interactive Beam example does not work with Python3
[ https://issues.apache.org/jira/browse/BEAM-7876?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] David Yan resolved BEAM-7876. - Resolution: Fixed Fix Version/s: 2.15.0 > Interactive Beam example does not work with Python3 > --- > > Key: BEAM-7876 > URL: https://issues.apache.org/jira/browse/BEAM-7876 > Project: Beam > Issue Type: Bug > Components: examples-python >Reporter: David Yan >Priority: Major > Fix For: 2.15.0 > > Time Spent: 0.5h > Remaining Estimate: 0h > > When going through the example > [https://github.com/apache/beam/blob/master/sdks/python/apache_beam/runners/interactive/README.md] > using Jupyter Notebook running in Python 3, the run() method throws the > following error: > {{TypeError Traceback (most recent call last)}} > {{ in }} > {{ 3 squares = init_pcoll | 'Square' >> beam.Map(lambda x: x*x)}} > {{ 4 cubes = init_pcoll | 'Cube' >> beam.Map(lambda x: x**3)}} > {{> 5 result = p.run()}} > {{ 6 result.wait_until_finish()}}{{~/beam/sdks/python/apache_beam/pipeline.py > in run(self, test_runner_api)}} > {{ 404 self.to_runner_api(use_fake_coders=True),}} > {{ 405 self.runner,}} > {{--> 406 self._options).run(False)}} > {{ 407 }} > {{ 408 if > self._options.view_as(TypeOptions).runtime_type_check:}}{{~/beam/sdks/python/apache_beam/pipeline.py > in run(self, test_runner_api)}} > {{ 417 finally:}} > {{ 418 shutil.rmtree(tmpdir)}} > {{--> 419 return self.runner.run_pipeline(self, self._options)}} > {{ 420 }} > {{ 421 def > __enter__(self):}}{{~/beam/sdks/python/apache_beam/runners/interactive/interactive_runner.py > in run_pipeline(self, pipeline, options)}} > {{ 142 cache_manager=self._cache_manager,}} > {{ 143 pipeline_graph_renderer=self._renderer)}} > {{--> 144 display.start_periodic_update()}} > {{ 145 result = pipeline_to_execute.run()}} > {{ 146 > result.wait_until_finish()}}{{~/beam/sdks/python/apache_beam/runners/interactive/display/display_manager.py > in start_periodic_update(self)}} > {{ 158 def start_periodic_update(self):}} > {{ 159 """Start a thread that periodically updates the display."""}} > {{--> 160 self.update_display(True)}} > {{ 161 self._periodic_update = True}} > {{ > 162}}{{~/beam/sdks/python/apache_beam/runners/interactive/display/display_manager.py > in update_display(self, force)}} > {{ 149 rendered_graph = self._renderer.render_pipeline_graph(}} > {{ 150 self._pipeline_graph)}} > {{--> 151 display.display(display.HTML(rendered_graph))}} > {{ 152 }} > {{ 153 > _display_progress('Running...')}}{{~/beam/sdks/python/notebook3/lib/python3.6/site-packages/IPython/core/display.py > in __init__(self, data, url, filename, metadata)}} > {{ 691 return prefix.startswith("")}} > {{ 692 }} > {{--> 693 if warn():}} > {{ 694 warnings.warn("Consider using IPython.display.IFrame instead")}} > {{ 695 super(HTML, self).__init__(data=data, url=url, filename=filename, > metadata=metadata)}}{{~/beam/sdks/python/notebook3/lib/python3.6/site-packages/IPython/core/display.py > in warn()}} > {{ 689 prefix = data[:10].lower()}} > {{ 690 suffix = data[-10:].lower()}} > {{--> 691 return prefix.startswith(" suffix.endswith("")}} > {{ 692 }} > {{ 693 if warn():}}{{TypeError: startswith first arg must be bytes or a tuple > of bytes, not str}} > > > > This does not happen with Python 2. -- This message was sent by Atlassian JIRA (v7.6.14#76016)
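The final TypeError in this traceback can be reproduced with the standard library alone. The sketch below assumes the rendered graph reaches `display.HTML` as `bytes` on Python 3, which is what the error message suggests. The helper `to_text` and the sample value are hypothetical, and this is not necessarily how the issue was fixed in Beam.

```python
def to_text(data, encoding="utf-8"):
    """Return `data` as str, decoding it first if it is bytes."""
    if isinstance(data, bytes):
        return data.decode(encoding)
    return data


rendered_graph = b"<svg><!-- hypothetical rendered graph --></svg>"

# On Python 3, bytes.startswith() rejects a str argument.
try:
    rendered_graph[:10].lower().startswith("<iframe")
    error_message = None
except TypeError as exc:
    error_message = str(exc)

# Decoding first makes the same check work on text.
is_iframe = to_text(rendered_graph)[:10].lower().startswith("<iframe")
```

On Python 2, `str` and `bytes` are the same type, which is consistent with the report that the problem does not appear there.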
[jira] [Updated] (BEAM-7876) Interactive Beam example does not work with Python3
[ https://issues.apache.org/jira/browse/BEAM-7876?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] David Yan updated BEAM-7876: Status: Open (was: Triage Needed) > Interactive Beam example does not work with Python3 > --- > > Key: BEAM-7876 > URL: https://issues.apache.org/jira/browse/BEAM-7876 > Project: Beam > Issue Type: Bug > Components: examples-python >Reporter: David Yan >Priority: Major > Time Spent: 0.5h > Remaining Estimate: 0h -- This message was sent by Atlassian JIRA (v7.6.14#76016)
[jira] [Updated] (BEAM-7876) Interactive Beam example does not work with Python3
[ https://issues.apache.org/jira/browse/BEAM-7876?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] David Yan updated BEAM-7876: Description: When going through the example [https://github.com/apache/beam/blob/master/sdks/python/apache_beam/runners/interactive/README.md] using Jupyter Notebook running in Python 3, the run() method throws an error the following error: {{TypeError Traceback (most recent call last)}} {{ in }} {{ 3 squares = init_pcoll | 'Square' >> beam.Map(lambda x: x*x)}} {{ 4 cubes = init_pcoll | 'Cube' >> beam.Map(lambda x: x**3)}} {{> 5 result = p.run()}} {{ 6 result.wait_until_finish()}}{{~/beam/sdks/python/apache_beam/pipeline.py in run(self, test_runner_api)}} {{ 404 self.to_runner_api(use_fake_coders=True),}} {{ 405 self.runner,}} {{--> 406 self._options).run(False)}} {{ 407 }} {{ 408 if self._options.view_as(TypeOptions).runtime_type_check:}}{{~/beam/sdks/python/apache_beam/pipeline.py in run(self, test_runner_api)}} {{ 417 finally:}} {{ 418 shutil.rmtree(tmpdir)}} {{--> 419 return self.runner.run_pipeline(self, self._options)}} {{ 420 }} {{ 421 def __enter__(self):}}{{~/beam/sdks/python/apache_beam/runners/interactive/interactive_runner.py in run_pipeline(self, pipeline, options)}} {{ 142 cache_manager=self._cache_manager,}} {{ 143 pipeline_graph_renderer=self._renderer)}} {{--> 144 display.start_periodic_update()}} {{ 145 result = pipeline_to_execute.run()}} {{ 146 result.wait_until_finish()}}{{~/beam/sdks/python/apache_beam/runners/interactive/display/display_manager.py in start_periodic_update(self)}} {{ 158 def start_periodic_update(self):}} {{ 159 """Start a thread that periodically updates the display."""}} {{--> 160 self.update_display(True)}} {{ 161 self._periodic_update = True}} {{ 162}}{{~/beam/sdks/python/apache_beam/runners/interactive/display/display_manager.py in update_display(self, force)}} {{ 149 rendered_graph = self._renderer.render_pipeline_graph(}} {{ 150 self._pipeline_graph)}} {{--> 151 
display.display(display.HTML(rendered_graph))}} {{ 152 }} {{ 153 _display_progress('Running...')}}{{~/beam/sdks/python/notebook3/lib/python3.6/site-packages/IPython/core/display.py in __init__(self, data, url, filename, metadata)}} {{ 691 return prefix.startswith("")}} {{ 692 }} {{--> 693 if warn():}} {{ 694 warnings.warn("Consider using IPython.display.IFrame instead")}} {{ 695 super(HTML, self).__init__(data=data, url=url, filename=filename, metadata=metadata)}}{{~/beam/sdks/python/notebook3/lib/python3.6/site-packages/IPython/core/display.py in warn()}} {{ 689 prefix = data[:10].lower()}} {{ 690 suffix = data[-10:].lower()}} {{--> 691 return prefix.startswith("")}} {{ 692 }} {{ 693 if warn():}}{{TypeError: startswith first arg must be bytes or a tuple of bytes, not str}} This does not happen with Python 2. was: When going through the example [https://github.com/apache/beam/blob/master/sdks/python/apache_beam/runners/interactive/README.md] using Jupyter Notebook running in Python 3, the run() method throws an error the following error: {{TypeError Traceback (most recent call last)}} {{ in }} {{ 3 squares = init_pcoll | 'Square' >> beam.Map(lambda x: x*x)}} {{ 4 cubes = init_pcoll | 'Cube' >> beam.Map(lambda x: x**3)}} {{ > 5 result = p.run()}} {{ 6 result.wait_until_finish()~/beam/sdks/python/apache_beam/pipeline.py in run(self, test_runner_api)}} {{ 404 self.to_runner_api(use_fake_coders=True),}} {{ 405 self.runner,}} {{ --> 406 self._options).run(False)}} {{ 407 }} {{ 408 if self._options.view_as(TypeOptions).runtime_type_check:~/beam/sdks/python/apache_beam/pipeline.py in run(self, test_runner_api)}} {{ 417 finally:}} {{ 418 shutil.rmtree(tmpdir)}} {{ --> 419 return self.runner.run_pipeline(self, self._options)}} {{ 420 }} {{ 421 def __enter__(self):~/beam/sdks/python/apache_beam/runners/interactive/interactive_runner.py in run_pipeline(self, pipeline, options)}} {{ 142 cache_manager=self._cache_manager,}} {{ 143 pipeline_graph_renderer=self._renderer)}} {{ 
--> 144 display.start_periodic_update()}} {{ 145 result = pipeline_to_execute.run()}} {{ 146 result.wait_until_finish()~/beam/sdks/python/apache_beam/runners/interactive/display/display_manager.py in start_periodic_update(self)}} {{ 158 def start_periodic_update(self):}} {{ 159 """Start a thread that periodically updates the display."""}} {{ --> 160 self.update_display(True)}} {{ 161 self._periodic_update = True}} {{ 162~/beam/sdks/python/apache_beam/runners/interactive/display/display_manager.py in update_display(self, force)}} {{ 149 rendered_graph = self._renderer.render_pipeline_graph(}} {{ 150 self._pipeline_graph)}} {{ --> 151 display.display(display.HTML(rendered_graph))}} {{ 152 }} {{ 153 _display_progress('Running...')~/beam/sdks/python/notebook3/lib/python3.6/site-packages/IPython/core/display.py in __init__(self, data, url, filename, metadata)}} {{ 691 return prefix.startswith("")}} {{ 692 }
[jira] [Updated] (BEAM-7876) Interactive Beam example does not work with Python3
[ https://issues.apache.org/jira/browse/BEAM-7876?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] David Yan updated BEAM-7876: Description: When going through the example [https://github.com/apache/beam/blob/master/sdks/python/apache_beam/runners/interactive/README.md] using Jupyter Notebook running in Python 3, the run() method throws an error the following error: {{TypeError Traceback (most recent call last)}} {{ in }} {{ 3 squares = init_pcoll | 'Square' >> beam.Map(lambda x: x*x)}} {{ 4 cubes = init_pcoll | 'Cube' >> beam.Map(lambda x: x**3)}} {{ > 5 result = p.run()}} {{ 6 result.wait_until_finish()~/beam/sdks/python/apache_beam/pipeline.py in run(self, test_runner_api)}} {{ 404 self.to_runner_api(use_fake_coders=True),}} {{ 405 self.runner,}} {{ --> 406 self._options).run(False)}} {{ 407 }} {{ 408 if self._options.view_as(TypeOptions).runtime_type_check:~/beam/sdks/python/apache_beam/pipeline.py in run(self, test_runner_api)}} {{ 417 finally:}} {{ 418 shutil.rmtree(tmpdir)}} {{ --> 419 return self.runner.run_pipeline(self, self._options)}} {{ 420 }} {{ 421 def __enter__(self):~/beam/sdks/python/apache_beam/runners/interactive/interactive_runner.py in run_pipeline(self, pipeline, options)}} {{ 142 cache_manager=self._cache_manager,}} {{ 143 pipeline_graph_renderer=self._renderer)}} {{ --> 144 display.start_periodic_update()}} {{ 145 result = pipeline_to_execute.run()}} {{ 146 result.wait_until_finish()~/beam/sdks/python/apache_beam/runners/interactive/display/display_manager.py in start_periodic_update(self)}} {{ 158 def start_periodic_update(self):}} {{ 159 """Start a thread that periodically updates the display."""}} {{ --> 160 self.update_display(True)}} {{ 161 self._periodic_update = True}} {{ 162~/beam/sdks/python/apache_beam/runners/interactive/display/display_manager.py in update_display(self, force)}} {{ 149 rendered_graph = self._renderer.render_pipeline_graph(}} {{ 150 self._pipeline_graph)}} {{ --> 151 
display.display(display.HTML(rendered_graph))}} {{ 152 }} {{ 153 _display_progress('Running...')~/beam/sdks/python/notebook3/lib/python3.6/site-packages/IPython/core/display.py in __init__(self, data, url, filename, metadata)}} {{ 691 return prefix.startswith("")}} {{ 692 }} {{ --> 693 if warn():}} {{ 694 warnings.warn("Consider using IPython.display.IFrame instead")}} {{ 695 super(HTML, self).__init__(data=data, url=url, filename=filename, metadata=metadata)~/beam/sdks/python/notebook3/lib/python3.6/site-packages/IPython/core/display.py in warn()}} {{ 689 prefix = data[:10].lower()}} {{ 690 suffix = data[-10:].lower()}} {{ --> 691 return prefix.startswith("")}} {{ 692 }} {{ 693 if warn():TypeError: startswith first arg must be bytes or a tuple of bytes, not str }} This does not happen with Python 2. was: When going through the example [https://github.com/apache/beam/blob/master/sdks/python/apache_beam/runners/interactive/README.md] using Jupyter Notebook running in Python 3, the run() method throws an error: TypeError Traceback (most recent call last) in 3 squares = init_pcoll | 'Square' >> beam.Map(lambda x: x*x) 4 cubes = init_pcoll | 'Cube' >> beam.Map(lambda x: x**3) > 5 result = p.run() 6 result.wait_until_finish() ~/beam/sdks/python/apache_beam/pipeline.py in run(self, test_runner_api) 404 self.to_runner_api(use_fake_coders=True), 405 self.runner, --> 406 self._options).run(False) 407 408 if self._options.view_as(TypeOptions).runtime_type_check: ~/beam/sdks/python/apache_beam/pipeline.py in run(self, test_runner_api) 417 finally: 418 shutil.rmtree(tmpdir) --> 419 return self.runner.run_pipeline(self, self._options) 420 421 def __enter__(self): ~/beam/sdks/python/apache_beam/runners/interactive/interactive_runner.py in run_pipeline(self, pipeline, options) 142 cache_manager=self._cache_manager, 143 pipeline_graph_renderer=self._renderer) --> 144 display.start_periodic_update() 145 result = pipeline_to_execute.run() 146 result.wait_until_finish() 
~/beam/sdks/python/apache_beam/runners/interactive/display/display_manager.py in start_periodic_update(self) 158 def start_periodic_update(self): 159 """Start a thread that periodically updates the display.""" --> 160 self.update_display(True) 161 self._periodic_update = True 162 ~/beam/sdks/python/apache_beam/runners/interactive/display/display_manager.py in update_display(self, force) 149 rendered_graph = self._renderer.render_pipeline_graph( 150 self._pipeline_graph) --> 151 display.display(display.HTML(rendered_graph)) 152 153 _display_progress('Running...') ~/beam/sdks/python/notebook3/lib/python3.6/site-packages/IPython/core/display.py in __init__(self, data, url, filename, metadata) 691 return prefix.startswith("") 692 --> 693 if warn(): 694 warnings.warn("Consider using IPython.display.IFrame instead") 695 super(HTML, self).__init__(data=data, url=url, filename=filename, metadata=metada
[jira] [Created] (BEAM-7876) Interactive Beam example does not work with Python3
David Yan created BEAM-7876: --- Summary: Interactive Beam example does not work with Python3 Key: BEAM-7876 URL: https://issues.apache.org/jira/browse/BEAM-7876 Project: Beam Issue Type: Bug Components: examples-python Reporter: David Yan When going through the example [https://github.com/apache/beam/blob/master/sdks/python/apache_beam/runners/interactive/README.md] using Jupyter Notebook running in Python 3, the run() method throws an error: TypeError Traceback (most recent call last) in 3 squares = init_pcoll | 'Square' >> beam.Map(lambda x: x*x) 4 cubes = init_pcoll | 'Cube' >> beam.Map(lambda x: x**3) > 5 result = p.run() 6 result.wait_until_finish() ~/beam/sdks/python/apache_beam/pipeline.py in run(self, test_runner_api) 404 self.to_runner_api(use_fake_coders=True), 405 self.runner, --> 406 self._options).run(False) 407 408 if self._options.view_as(TypeOptions).runtime_type_check: ~/beam/sdks/python/apache_beam/pipeline.py in run(self, test_runner_api) 417 finally: 418 shutil.rmtree(tmpdir) --> 419 return self.runner.run_pipeline(self, self._options) 420 421 def __enter__(self): ~/beam/sdks/python/apache_beam/runners/interactive/interactive_runner.py in run_pipeline(self, pipeline, options) 142 cache_manager=self._cache_manager, 143 pipeline_graph_renderer=self._renderer) --> 144 display.start_periodic_update() 145 result = pipeline_to_execute.run() 146 result.wait_until_finish() ~/beam/sdks/python/apache_beam/runners/interactive/display/display_manager.py in start_periodic_update(self) 158 def start_periodic_update(self): 159 """Start a thread that periodically updates the display.""" --> 160 self.update_display(True) 161 self._periodic_update = True 162 ~/beam/sdks/python/apache_beam/runners/interactive/display/display_manager.py in update_display(self, force) 149 rendered_graph = self._renderer.render_pipeline_graph( 150 self._pipeline_graph) --> 151 display.display(display.HTML(rendered_graph)) 152 153 _display_progress('Running...') 
~/beam/sdks/python/notebook3/lib/python3.6/site-packages/IPython/core/display.py in __init__(self, data, url, filename, metadata) 691 return prefix.startswith("") 692 --> 693 if warn(): 694 warnings.warn("Consider using IPython.display.IFrame instead") 695 super(HTML, self).__init__(data=data, url=url, filename=filename, metadata=metadata) ~/beam/sdks/python/notebook3/lib/python3.6/site-packages/IPython/core/display.py in warn() 689 prefix = data[:10].lower() 690 suffix = data[-10:].lower() --> 691 return prefix.startswith("") 692 693 if warn(): TypeError: startswith first arg must be bytes or a tuple of bytes, not str -- This message was sent by Atlassian JIRA (v7.6.14#76016)
[jira] [Resolved] (BEAM-7408) Beam Programming Guide inconsistencies
[ https://issues.apache.org/jira/browse/BEAM-7408?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] David Yan resolved BEAM-7408. - Resolution: Fixed > Beam Programming Guide inconsistencies > -- > > Key: BEAM-7408 > URL: https://issues.apache.org/jira/browse/BEAM-7408 > Project: Beam > Issue Type: Improvement > Components: website >Affects Versions: Not applicable >Reporter: David Yan >Priority: Major > Labels: documentation, newbie > Fix For: Not applicable > > Time Spent: 40m > Remaining Estimate: 0h > > [https://beam.apache.org/documentation/programming-guide/] > > Pipeline option example: > > Examples in Java, Python and Go are not consistent. Java has myCustomOption, > while Python and Go have "input" and "output". > > When Python is chosen, the doc says --myCustomOption=value is supported, > which only corresponds to the java example. > > Reading from external source: > > Java, Python and Go are not consistent. Python example reads from a GCS file, > while others specify a generic file. > [https://beam.apache.org/documentation/programming-guide/#applying-transforms]: > The last workflow graph does not correspond to the code example. -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Commented] (BEAM-7408) Beam Programming Guide inconsistencies
[ https://issues.apache.org/jira/browse/BEAM-7408?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16857005#comment-16857005 ] David Yan commented on BEAM-7408: - Yes, thank you. :) > Beam Programming Guide inconsistencies > -- > > Key: BEAM-7408 > URL: https://issues.apache.org/jira/browse/BEAM-7408 > Project: Beam > Issue Type: Improvement > Components: website >Affects Versions: Not applicable >Reporter: David Yan >Priority: Major > Labels: documentation, newbie > Fix For: Not applicable > > Time Spent: 40m > Remaining Estimate: 0h > > [https://beam.apache.org/documentation/programming-guide/] > > Pipeline option example: > > Examples in Java, Python and Go are not consistent. Java has myCustomOption, > while Python and Go have "input" and "output". > > When Python is chosen, the doc says --myCustomOption=value is supported, > which only corresponds to the java example. > > Reading from external source: > > Java, Python and Go are not consistent. Python example reads from a GCS file, > while others specify a generic file. > [https://beam.apache.org/documentation/programming-guide/#applying-transforms]: > The last workflow graph does not correspond to the code example. -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Created] (BEAM-7408) Beam Programming Guide inconsistencies
David Yan created BEAM-7408: --- Summary: Beam Programming Guide inconsistencies Key: BEAM-7408 URL: https://issues.apache.org/jira/browse/BEAM-7408 Project: Beam Issue Type: Improvement Components: website Reporter: David Yan [https://beam.apache.org/documentation/programming-guide/] Pipeline option example: Examples in Java, Python and Go are not consistent. Java has myCustomOption, while Python and Go have "input" and "output". When Python is chosen, the doc says --myCustomOption=value is supported, which only corresponds to the java example. Reading from external source: Java, Python and Go are not consistent. Python example reads from a GCS file, while others specify a generic file. [https://beam.apache.org/documentation/programming-guide/#applying-transforms]: The last workflow graph does not correspond to the code example. -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Created] (BEAM-7215) Wordcount example page does not tell the user to create the maven project using archetype
David Yan created BEAM-7215: --- Summary: Wordcount example page does not tell the user to create the maven project using archetype Key: BEAM-7215 URL: https://issues.apache.org/jira/browse/BEAM-7215 Project: Beam Issue Type: Improvement Components: website Reporter: David Yan [https://beam.apache.org/get-started/wordcount-example/#wordcount-example] does not have a link back to [https://beam.apache.org/get-started/quickstart-java/#get-the-wordcount-code]. If the user just follows the instructions in the first link (from a search engine let's say), they would get: {{$ mvn compile exec:java -Dexec.mainClass=org.apache.beam.examples.WordCount -Dexec.args="--runner=DataflowRunner --gcpTempLocation=gs://clouddfe-test/staging-$USER --inputFile=gs://apache-beam-samples/shakespeare/* --output=gs://world-readable-mkcq69tkcu/$USER/result.txt" -Pdataflow-runner [INFO] Scanning for projects... [INFO] [INFO] BUILD FAILURE [INFO] [INFO] Total time: 0.068 s [INFO] Finished at: 2019-05-02T13:32:15-07:00 [INFO] Final Memory: 23M/1948M [INFO] [WARNING] The requested profile "dataflow-runner" could not be activated because it does not exist. [ERROR] The goal you specified requires a project to execute but there is no POM in this directory (/usr/local/google/home/davidyan/beam). Please verify you invoked Maven from the correct directory. -> [Help 1]}} -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Created] (BEAM-7020) Reduce the log severity of profiling agent discovery
David Yan created BEAM-7020: --- Summary: Reduce the log severity of profiling agent discovery Key: BEAM-7020 URL: https://issues.apache.org/jira/browse/BEAM-7020 Project: Beam Issue Type: Improvement Components: runner-dataflow Reporter: David Yan Example: [https://github.com/apache/beam/blob/b953645ed6db837d24284d7fe1fe091e7309f821/runners/google-cloud-dataflow-java/worker/src/main/java/org/apache/beam/runners/dataflow/worker/profiler/ScopedProfiler.java#L138] These messages should not be logged at warning severity, even if the profiling agent is not present, since in most cases users do not run their jobs with profiling. -- This message was sent by Atlassian JIRA (v7.6.3#76005)
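The referenced code is Java, but the idea can be sketched in Python with the standard `logging` module: when an optional component is absent in the common case, record that at debug level instead of warning. The logger name, the `load_profiling_agent` helper, and the choice of caught exceptions are all hypothetical.

```python
import logging

logger = logging.getLogger("profiler_discovery")  # hypothetical logger name


def load_profiling_agent(loader):
    """Return the agent from `loader()`, or None if it is unavailable."""
    try:
        return loader()
    except (ImportError, OSError) as exc:
        # An absent agent is the common case, so it is not worth a warning.
        logger.debug("Profiling agent not found; profiling disabled: %s", exc)
        return None
```

Users who do want profiling can still find the message by enabling debug logging for that logger.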
[jira] [Created] (BEAM-6918) Github link requires login and example link is broken
David Yan created BEAM-6918: --- Summary: Github link requires login and example link is broken Key: BEAM-6918 URL: https://issues.apache.org/jira/browse/BEAM-6918 Project: Beam Issue Type: Improvement Components: examples-python Reporter: David Yan Two minor issues in [https://github.com/apache/beam/blob/master/sdks/python/apache_beam/runners/interactive/README.md] 1. git clone g...@github.com:apache/beam.git requires the user to be logged in, while https://github.com/apache/beam does not. 2. Spaces in the example link need to be escaped. -- This message was sent by Atlassian JIRA (v7.6.3#76005)
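For the second point, a small sketch of escaping spaces in a link path with the standard library. The notebook path used here is a made-up example, not the actual link from the README, and `escape_url_path` is a hypothetical helper.

```python
from urllib.parse import quote


def escape_url_path(path):
    """Percent-encode characters such as spaces, keeping '/' separators."""
    return quote(path, safe="/")


# Hypothetical notebook path containing spaces.
escaped = escape_url_path("examples/Interactive Beam Example.ipynb")
```

Each space becomes `%20`, so the link stays intact when it is pasted into Markdown or a browser.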