[jira] [Work logged] (BEAM-4536) Python SDK: Pubsub reading with_attributes broken for Dataflow
[ https://issues.apache.org/jira/browse/BEAM-4536?focusedWorklogId=111786&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-111786 ]

ASF GitHub Bot logged work on BEAM-4536:

Author: ASF GitHub Bot
Created on: 14/Jun/18 05:28
Start Date: 14/Jun/18 05:28
Worklog Time Spent: 10m
Work Description: jbonofre closed pull request #5607: [BEAM-4536] Remove with_attributes keyword from ReadFromPubSub.
URL: https://github.com/apache/beam/pull/5607

This is a PR merged from a forked repository.
As GitHub hides the original diff on merge, it is displayed below for the sake of provenance:

As this is a foreign pull request (from a fork), the diff is supplied below (as it won't show otherwise due to GitHub magic):

diff --git a/sdks/python/apache_beam/io/gcp/pubsub.py b/sdks/python/apache_beam/io/gcp/pubsub.py
index e45dd23bfef..6db45bdbfa5 100644
--- a/sdks/python/apache_beam/io/gcp/pubsub.py
+++ b/sdks/python/apache_beam/io/gcp/pubsub.py
@@ -108,7 +108,7 @@ class ReadFromPubSub(PTransform):
   # Implementation note: This ``PTransform`` is overridden by Directrunner.
 
   def __init__(self, topic=None, subscription=None, id_label=None,
-               with_attributes=False, timestamp_attribute=None):
+               timestamp_attribute=None):
     """Initializes ``ReadFromPubSub``.
 
     Args:
@@ -118,12 +118,8 @@ def __init__(self, topic=None, subscription=None, id_label=None,
         deduplication of messages. If not provided, we cannot guarantee
         that no duplicate data will be delivered on the Pub/Sub stream. In this
         case, deduplication of the stream will be strictly best effort.
-      with_attributes:
-        True - output elements will be :class:`~PubsubMessage` objects.
-        False - output elements will be of type ``str`` (message payload only).
       timestamp_attribute: Message value to use as element timestamp. If None,
         uses message publishing time as the timestamp.
-        Note that this argument doesn't require with_attributes=True.
 
         Timestamp values should be in one of two formats:
 
@@ -135,12 +131,13 @@ def __init__(self, topic=None, subscription=None, id_label=None,
         units smaller than milliseconds) may be ignored.
     """
     super(ReadFromPubSub, self).__init__()
-    self.with_attributes = with_attributes
+    # TODO(BEAM-4536): Add with_attributes to kwargs once fixed.
+    self.with_attributes = False
     self._source = _PubSubSource(
         topic=topic,
         subscription=subscription,
         id_label=id_label,
-        with_attributes=with_attributes,
+        with_attributes=self.with_attributes,
         timestamp_attribute=timestamp_attribute)
 
   def expand(self, pvalue):
@@ -174,8 +171,7 @@ def __init__(self, topic=None, subscription=None, id_label=None):
 
   def expand(self, pvalue):
     p = (pvalue.pipeline
-         | ReadFromPubSub(self.topic, self.subscription, self.id_label,
-                          with_attributes=False)
+         | ReadFromPubSub(self.topic, self.subscription, self.id_label)
          | 'DecodeString' >> Map(lambda b: b.decode('utf-8')))
     p.element_type = text_type
     return p
diff --git a/sdks/python/apache_beam/io/gcp/pubsub_test.py b/sdks/python/apache_beam/io/gcp/pubsub_test.py
index f987947d454..165c072abb1 100644
--- a/sdks/python/apache_beam/io/gcp/pubsub_test.py
+++ b/sdks/python/apache_beam/io/gcp/pubsub_test.py
@@ -63,7 +63,7 @@ def test_expand_with_topic(self):
     p.options.view_as(StandardOptions).streaming = True
     pcoll = (p
              | ReadFromPubSub('projects/fakeprj/topics/a_topic',
-                              None, 'a_label', with_attributes=False,
+                              None, 'a_label',
                               timestamp_attribute=None)
              | beam.Map(lambda x: x))
     self.assertEqual(str, pcoll.element_type)
@@ -87,7 +87,7 @@ def test_expand_with_subscription(self):
     pcoll = (p
              | ReadFromPubSub(
                  None, 'projects/fakeprj/subscriptions/a_subscription',
-                 'a_label', with_attributes=False, timestamp_attribute=None)
+                 'a_label', timestamp_attribute=None)
              | beam.Map(lambda x: x))
     self.assertEqual(str, pcoll.element_type)
 
@@ -107,16 +107,17 @@ def test_expand_with_subscription(self):
   def test_expand_with_no_topic_or_subscription(self):
     with self.assertRaisesRegexp(
         ValueError, "Either a topic or subscription must be provided."):
-      ReadFromPubSub(None, None, 'a_label', with_attributes=False,
+      ReadFromPubSub(None, None, 'a_label',
                      timestamp_attribute=None)
 
   def test_expand_with_both_topic_and_subscription(self):
     with self.assertRaisesRegexp(
         ValueError, "Only one of topic or subscription should be provided."):
       ReadFromPubSub('a_topic', 'a_subs
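
The core of the change in `pubsub.py` is a small pattern: the `with_attributes` keyword disappears from the constructor signature, while the attribute stays on the instance, pinned to `False`, and is still forwarded to the source. A minimal, Beam-free sketch of that pattern follows. `PinnedOptionReader` and `_FakeSource` are hypothetical stand-ins and not Beam classes; the real `_PubSubSource` presumably does more than store its arguments.

```python
class _FakeSource(object):
    """Hypothetical stand-in for the source object; it only records its arguments."""

    def __init__(self, topic=None, subscription=None, id_label=None,
                 with_attributes=False, timestamp_attribute=None):
        self.topic = topic
        self.subscription = subscription
        self.id_label = id_label
        self.with_attributes = with_attributes
        self.timestamp_attribute = timestamp_attribute


class PinnedOptionReader(object):
    """Hypothetical reader whose constructor no longer accepts with_attributes."""

    def __init__(self, topic=None, subscription=None, id_label=None,
                 timestamp_attribute=None):
        # The option is fixed here instead of being caller-controlled,
        # mirroring the shape of the diff (not its actual classes).
        self.with_attributes = False
        self._source = _FakeSource(
            topic=topic,
            subscription=subscription,
            id_label=id_label,
            with_attributes=self.with_attributes,
            timestamp_attribute=timestamp_attribute)


def accepts_with_attributes():
    """Returns True if the constructor still accepts the removed keyword."""
    try:
        PinnedOptionReader(topic='projects/fakeprj/topics/a_topic',
                           with_attributes=True)
    except TypeError:
        # Python rejects unknown keyword arguments with TypeError.
        return False
    return True
```

With a signature like this, a caller that still passes `with_attributes=...` gets a `TypeError` at construction time, which is consistent with the test changes in the diff dropping that argument.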
[jira] [Work logged] (BEAM-4536) Python SDK: Pubsub reading with_attributes broken for Dataflow
[ https://issues.apache.org/jira/browse/BEAM-4536?focusedWorklogId=111710&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-111710 ]

ASF GitHub Bot logged work on BEAM-4536:

Author: ASF GitHub Bot
Created on: 13/Jun/18 22:28
Start Date: 13/Jun/18 22:28
Worklog Time Spent: 10m
Work Description: pabloem commented on issue #5607: [BEAM-4536] Remove with_attributes keyword from ReadFromPubSub.
URL: https://github.com/apache/beam/pull/5607#issuecomment-397100509

I've filed https://issues.apache.org/jira/projects/BEAM/issues/BEAM-4558
I marked it as a release blocker, but is it? @swegner

This is an automated message from the Apache Git Service.
To respond to the message, please log on GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org

Issue Time Tracking
---

Worklog Id: (was: 111710)
Time Spent: 2h 40m (was: 2.5h)

> Python SDK: Pubsub reading with_attributes broken for Dataflow
> --
>
> Key: BEAM-4536
> URL: https://issues.apache.org/jira/browse/BEAM-4536
> Project: Beam
> Issue Type: Bug
> Components: sdk-py-core
> Affects Versions: 2.5.0
> Reporter: Udi Meiri
> Assignee: Udi Meiri
> Priority: Blocker
> Time Spent: 2h 40m
> Remaining Estimate: 0h
>
> Using [ReadFromPubsub|https://github.com/apache/beam/blob/e30e0c807321934e862358e1e3be32dc74374aeb/sdks/python/apache_beam/io/gcp/pubsub.py#L106](with_attributes=True) will fail on Dataflow.

--
This message was sent by Atlassian JIRA
(v7.6.3#76005)
[jira] [Work logged] (BEAM-4536) Python SDK: Pubsub reading with_attributes broken for Dataflow
[ https://issues.apache.org/jira/browse/BEAM-4536?focusedWorklogId=111685&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-111685 ]

ASF GitHub Bot logged work on BEAM-4536:

Author: ASF GitHub Bot
Created on: 13/Jun/18 21:56
Start Date: 13/Jun/18 21:56
Worklog Time Spent: 10m
Work Description: pabloem commented on issue #5607: [BEAM-4536] Remove with_attributes keyword from ReadFromPubSub.
URL: https://github.com/apache/beam/pull/5607#issuecomment-397100509

I've filed https://issues.apache.org/jira/projects/BEAM/issues/BEAM-4558
I marked it as a release blocker, but is it? @swegner

Issue Time Tracking
---

Worklog Id: (was: 111685)
Time Spent: 2.5h (was: 2h 20m)
[jira] [Work logged] (BEAM-4536) Python SDK: Pubsub reading with_attributes broken for Dataflow
[ https://issues.apache.org/jira/browse/BEAM-4536?focusedWorklogId=111680&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-111680 ]

ASF GitHub Bot logged work on BEAM-4536:

Author: ASF GitHub Bot
Created on: 13/Jun/18 21:48
Start Date: 13/Jun/18 21:48
Worklog Time Spent: 10m
Work Description: swegner commented on issue #5607: [BEAM-4536] Remove with_attributes keyword from ReadFromPubSub.
URL: https://github.com/apache/beam/pull/5607#issuecomment-397098734

FYI, the failing test [`org.apache.beam.runners.direct.portable.ReferenceRunnerTest.pipelineExecution`](https://github.com/apache/beam/blob/da22e1808ec372e526c5af7c8bf483e72baef1eb/runners/direct-java/src/test/java/org/apache/beam/runners/direct/portable/ReferenceRunnerTest.java), I recognize from other PRs as being flaky ([history](https://builds.apache.org/job/beam_PreCommit_Java_GradleBuild/6441/testReport/junit/org.apache.beam.runners.direct.portable/ReferenceRunnerTest/pipelineExecution/history/)). This PR seems unrelated and seems unlikely that it caused the regression.

/cc @reuvenlax @tgroh as the owners for this test.

Issue Time Tracking
---

Worklog Id: (was: 111680)
Time Spent: 2h 20m (was: 2h 10m)
[jira] [Work logged] (BEAM-4536) Python SDK: Pubsub reading with_attributes broken for Dataflow
[ https://issues.apache.org/jira/browse/BEAM-4536?focusedWorklogId=111674&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-111674 ]

ASF GitHub Bot logged work on BEAM-4536:

Author: ASF GitHub Bot
Created on: 13/Jun/18 21:37
Start Date: 13/Jun/18 21:37
Worklog Time Spent: 10m
Work Description: pabloem commented on issue #5607: [BEAM-4536] Remove with_attributes keyword from ReadFromPubSub.
URL: https://github.com/apache/beam/pull/5607#issuecomment-397095959

I triggered a build on the release branch itself, and it failed: https://builds.apache.org/job/beam_PreCommit_Java_GradleBuild/6441/consoleFull

Issue Time Tracking
---

Worklog Id: (was: 111674)
Time Spent: 2h 10m (was: 2h)
[jira] [Work logged] (BEAM-4536) Python SDK: Pubsub reading with_attributes broken for Dataflow
[ https://issues.apache.org/jira/browse/BEAM-4536?focusedWorklogId=111666&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-111666 ]

ASF GitHub Bot logged work on BEAM-4536:

Author: ASF GitHub Bot
Created on: 13/Jun/18 21:29
Start Date: 13/Jun/18 21:29
Worklog Time Spent: 10m
Work Description: udim commented on issue #5607: [BEAM-4536] Remove with_attributes keyword from ReadFromPubSub.
URL: https://github.com/apache/beam/pull/5607#issuecomment-397093621

Run Java PreCommit

Issue Time Tracking
---

Worklog Id: (was: 111666)
Time Spent: 2h (was: 1h 50m)
[jira] [Work logged] (BEAM-4536) Python SDK: Pubsub reading with_attributes broken for Dataflow
[ https://issues.apache.org/jira/browse/BEAM-4536?focusedWorklogId=111649&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-111649 ]

ASF GitHub Bot logged work on BEAM-4536:

Author: ASF GitHub Bot
Created on: 13/Jun/18 20:47
Start Date: 13/Jun/18 20:47
Worklog Time Spent: 10m
Work Description: udim commented on issue #5607: [BEAM-4536] Remove with_attributes keyword from ReadFromPubSub.
URL: https://github.com/apache/beam/pull/5607#issuecomment-397080770

@jbonofre `./gradlew :beam-runners-direct-java:test` was broken for me on my local machine. I did a `git pull --rebase` and that fixed it.
I then pushed the commit again in the hopes that it'd fix the precommit somehow, however now https://github.com/apache/beam/pull/5611 has disabled running Java precommits.

I've asked @swegner to revert https://github.com/apache/beam/pull/5611 so we can verify that Java precommits are not broken.

Issue Time Tracking
---

Worklog Id: (was: 111649)
Time Spent: 1h 50m (was: 1h 40m)
[jira] [Work logged] (BEAM-4536) Python SDK: Pubsub reading with_attributes broken for Dataflow
[ https://issues.apache.org/jira/browse/BEAM-4536?focusedWorklogId=111637&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-111637 ]

ASF GitHub Bot logged work on BEAM-4536:

Author: ASF GitHub Bot
Created on: 13/Jun/18 20:22
Start Date: 13/Jun/18 20:22
Worklog Time Spent: 10m
Work Description: boyuanzz commented on issue #5607: [BEAM-4536] Remove with_attributes keyword from ReadFromPubSub.
URL: https://github.com/apache/beam/pull/5607#issuecomment-397073078

Run Java PreCommit

Issue Time Tracking
---

Worklog Id: (was: 111637)
Time Spent: 1.5h (was: 1h 20m)
[jira] [Work logged] (BEAM-4536) Python SDK: Pubsub reading with_attributes broken for Dataflow
[ https://issues.apache.org/jira/browse/BEAM-4536?focusedWorklogId=111638&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-111638 ]

ASF GitHub Bot logged work on BEAM-4536:

Author: ASF GitHub Bot
Created on: 13/Jun/18 20:23
Start Date: 13/Jun/18 20:23
Worklog Time Spent: 10m
Work Description: udim commented on issue #5607: [BEAM-4536] Remove with_attributes keyword from ReadFromPubSub.
URL: https://github.com/apache/beam/pull/5607#issuecomment-397073242

retest this please

Issue Time Tracking
---

Worklog Id: (was: 111638)
Time Spent: 1h 40m (was: 1.5h)
[jira] [Work logged] (BEAM-4536) Python SDK: Pubsub reading with_attributes broken for Dataflow
[ https://issues.apache.org/jira/browse/BEAM-4536?focusedWorklogId=111632&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-111632 ]

ASF GitHub Bot logged work on BEAM-4536:

Author: ASF GitHub Bot
Created on: 13/Jun/18 20:13
Start Date: 13/Jun/18 20:13
Worklog Time Spent: 10m
Work Description: udim commented on issue #5607: [BEAM-4536] Remove with_attributes keyword from ReadFromPubSub.
URL: https://github.com/apache/beam/pull/5607#issuecomment-397070137

Looking at java breakage.

Issue Time Tracking
---

Worklog Id: (was: 111632)
Time Spent: 1h 20m (was: 1h 10m)
[jira] [Work logged] (BEAM-4536) Python SDK: Pubsub reading with_attributes broken for Dataflow
[ https://issues.apache.org/jira/browse/BEAM-4536?focusedWorklogId=111621&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-111621 ]

ASF GitHub Bot logged work on BEAM-4536:

Author: ASF GitHub Bot
Created on: 13/Jun/18 19:25
Start Date: 13/Jun/18 19:25
Worklog Time Spent: 10m
Work Description: jbonofre commented on issue #5607: [BEAM-4536] Remove with_attributes keyword from ReadFromPubSub.
URL: https://github.com/apache/beam/pull/5607#issuecomment-397056429

Same issue on this PR: `java.lang.IllegalStateException: sendHeaders has already been called`.

Issue Time Tracking
---

Worklog Id: (was: 111621)
Time Spent: 1h 10m (was: 1h)
[jira] [Work logged] (BEAM-4536) Python SDK: Pubsub reading with_attributes broken for Dataflow
[ https://issues.apache.org/jira/browse/BEAM-4536?focusedWorklogId=73&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-73 ]

ASF GitHub Bot logged work on BEAM-4536:

Author: ASF GitHub Bot
Created on: 12/Jun/18 18:02
Start Date: 12/Jun/18 18:02
Worklog Time Spent: 10m
Work Description: udim commented on issue #5607: [BEAM-4536] Remove with_attributes keyword from ReadFromPubSub.
URL: https://github.com/apache/beam/pull/5607#issuecomment-396680775

R: @jbonofre

Issue Time Tracking
---

Worklog Id: (was: 73)
Time Spent: 1h (was: 50m)
[jira] [Work logged] (BEAM-4536) Python SDK: Pubsub reading with_attributes broken for Dataflow
[ https://issues.apache.org/jira/browse/BEAM-4536?focusedWorklogId=69&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-69 ]

ASF GitHub Bot logged work on BEAM-4536:

Author: ASF GitHub Bot
Created on: 12/Jun/18 17:56
Start Date: 12/Jun/18 17:56
Worklog Time Spent: 10m
Work Description: udim opened a new pull request #5607: [BEAM-4536] Remove with_attributes keyword from ReadFromPubSub.
URL: https://github.com/apache/beam/pull/5607

Cherrypick for 2.5.0

Follow this checklist to help us incorporate your contribution quickly and easily:

 - [ ] Format the pull request title like `[BEAM-XXX] Fixes bug in ApproximateQuantiles`, where you replace `BEAM-XXX` with the appropriate JIRA issue, if applicable. This will automatically link the pull request to the issue.
 - [ ] If this contribution is large, please file an Apache [Individual Contributor License Agreement](https://www.apache.org/licenses/icla.pdf).

It will help us expedite review of your Pull Request if you tag someone (e.g. `@username`) to look at it.

Issue Time Tracking
---

Worklog Id: (was: 69)
Time Spent: 50m (was: 40m)
[jira] [Work logged] (BEAM-4536) Python SDK: Pubsub reading with_attributes broken for Dataflow
[ https://issues.apache.org/jira/browse/BEAM-4536?focusedWorklogId=66&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-66 ]

ASF GitHub Bot logged work on BEAM-4536:

Author: ASF GitHub Bot
Created on: 12/Jun/18 17:50
Start Date: 12/Jun/18 17:50
Worklog Time Spent: 10m
Work Description: chamikaramj closed pull request #5605: [BEAM-4536] Remove with_attributes keyword from ReadFromPubSub.
URL: https://github.com/apache/beam/pull/5605

This is a PR merged from a forked repository.
As GitHub hides the original diff on merge, it is displayed below for the sake of provenance:

As this is a foreign pull request (from a fork), the diff is supplied below (as it won't show otherwise due to GitHub magic):

diff --git a/sdks/python/apache_beam/io/gcp/pubsub.py b/sdks/python/apache_beam/io/gcp/pubsub.py
index e45dd23bfef..6db45bdbfa5 100644
--- a/sdks/python/apache_beam/io/gcp/pubsub.py
+++ b/sdks/python/apache_beam/io/gcp/pubsub.py
@@ -108,7 +108,7 @@ class ReadFromPubSub(PTransform):
   # Implementation note: This ``PTransform`` is overridden by Directrunner.
 
   def __init__(self, topic=None, subscription=None, id_label=None,
-               with_attributes=False, timestamp_attribute=None):
+               timestamp_attribute=None):
     """Initializes ``ReadFromPubSub``.
 
     Args:
@@ -118,12 +118,8 @@ def __init__(self, topic=None, subscription=None, id_label=None,
         deduplication of messages. If not provided, we cannot guarantee
         that no duplicate data will be delivered on the Pub/Sub stream. In this
         case, deduplication of the stream will be strictly best effort.
-      with_attributes:
-        True - output elements will be :class:`~PubsubMessage` objects.
-        False - output elements will be of type ``str`` (message payload only).
       timestamp_attribute: Message value to use as element timestamp. If None,
         uses message publishing time as the timestamp.
-        Note that this argument doesn't require with_attributes=True.
 
         Timestamp values should be in one of two formats:
 
@@ -135,12 +131,13 @@ def __init__(self, topic=None, subscription=None, id_label=None,
         units smaller than milliseconds) may be ignored.
     """
     super(ReadFromPubSub, self).__init__()
-    self.with_attributes = with_attributes
+    # TODO(BEAM-4536): Add with_attributes to kwargs once fixed.
+    self.with_attributes = False
     self._source = _PubSubSource(
         topic=topic,
         subscription=subscription,
         id_label=id_label,
-        with_attributes=with_attributes,
+        with_attributes=self.with_attributes,
         timestamp_attribute=timestamp_attribute)
 
   def expand(self, pvalue):
@@ -174,8 +171,7 @@ def __init__(self, topic=None, subscription=None, id_label=None):
 
   def expand(self, pvalue):
     p = (pvalue.pipeline
-         | ReadFromPubSub(self.topic, self.subscription, self.id_label,
-                          with_attributes=False)
+         | ReadFromPubSub(self.topic, self.subscription, self.id_label)
          | 'DecodeString' >> Map(lambda b: b.decode('utf-8')))
     p.element_type = text_type
     return p
diff --git a/sdks/python/apache_beam/io/gcp/pubsub_test.py b/sdks/python/apache_beam/io/gcp/pubsub_test.py
index f987947d454..165c072abb1 100644
--- a/sdks/python/apache_beam/io/gcp/pubsub_test.py
+++ b/sdks/python/apache_beam/io/gcp/pubsub_test.py
@@ -63,7 +63,7 @@ def test_expand_with_topic(self):
     p.options.view_as(StandardOptions).streaming = True
     pcoll = (p
              | ReadFromPubSub('projects/fakeprj/topics/a_topic',
-                              None, 'a_label', with_attributes=False,
+                              None, 'a_label',
                               timestamp_attribute=None)
              | beam.Map(lambda x: x))
     self.assertEqual(str, pcoll.element_type)
@@ -87,7 +87,7 @@ def test_expand_with_subscription(self):
     pcoll = (p
              | ReadFromPubSub(
                  None, 'projects/fakeprj/subscriptions/a_subscription',
-                 'a_label', with_attributes=False, timestamp_attribute=None)
+                 'a_label', timestamp_attribute=None)
              | beam.Map(lambda x: x))
     self.assertEqual(str, pcoll.element_type)
 
@@ -107,16 +107,17 @@ def test_expand_with_subscription(self):
   def test_expand_with_no_topic_or_subscription(self):
     with self.assertRaisesRegexp(
         ValueError, "Either a topic or subscription must be provided."):
-      ReadFromPubSub(None, None, 'a_label', with_attributes=False,
+      ReadFromPubSub(None, None, 'a_label',
                      timestamp_attribute=None)
 
   def test_expand_with_both_topic_and_subscription(self):
     with self.assertRaisesRegexp(
         ValueError, "Only one of topic or subscription should be provided."):
       ReadFromPubSub('a_topic', 'a_s
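
The last two tests in the diff expect a `ValueError` when neither or both of `topic` and `subscription` are given. The diff does not show where Beam performs that check, so the following is only a sketch of the rule the tests imply; `check_topic_or_subscription` is a hypothetical helper, and only the two error messages are taken from the tests.

```python
def check_topic_or_subscription(topic=None, subscription=None):
    """Hypothetical validator: exactly one of topic or subscription may be set.

    Returns whichever of the two was provided.
    """
    if not topic and not subscription:
        raise ValueError('Either a topic or subscription must be provided.')
    if topic and subscription:
        raise ValueError('Only one of topic or subscription should be provided.')
    return topic or subscription
```

Calling it with only a topic or only a subscription returns that value; the other two combinations raise with the messages the tests match against.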
[jira] [Work logged] (BEAM-4536) Python SDK: Pubsub reading with_attributes broken for Dataflow
[ https://issues.apache.org/jira/browse/BEAM-4536?focusedWorklogId=57&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-57 ]

ASF GitHub Bot logged work on BEAM-4536:

Author: ASF GitHub Bot
Created on: 12/Jun/18 17:28
Start Date: 12/Jun/18 17:28
Worklog Time Spent: 10m
Work Description: chamikaramj commented on issue #5605: [BEAM-4536] Remove with_attributes keyword from ReadFromPubSub.
URL: https://github.com/apache/beam/pull/5605#issuecomment-396670482

LGTM. Waiting for tests to pass to merge.

Issue Time Tracking
---

Worklog Id: (was: 57)
Time Spent: 0.5h (was: 20m)
[jira] [Work logged] (BEAM-4536) Python SDK: Pubsub reading with_attributes broken for Dataflow
[ https://issues.apache.org/jira/browse/BEAM-4536?focusedWorklogId=48&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-48 ]

ASF GitHub Bot logged work on BEAM-4536:

Author: ASF GitHub Bot
Created on: 12/Jun/18 17:23
Start Date: 12/Jun/18 17:23
Worklog Time Spent: 10m
Work Description: udim commented on issue #5605: [BEAM-4536] Remove with_attributes keyword from ReadFromPubSub.
URL: https://github.com/apache/beam/pull/5605#issuecomment-396668846

R: @chamikaramj

Issue Time Tracking
---

Worklog Id: (was: 48)
Time Spent: 20m (was: 10m)
[jira] [Work logged] (BEAM-4536) Python SDK: Pubsub reading with_attributes broken for Dataflow
[ https://issues.apache.org/jira/browse/BEAM-4536?focusedWorklogId=38&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-38 ]

ASF GitHub Bot logged work on BEAM-4536:

Author: ASF GitHub Bot
Created on: 12/Jun/18 17:08
Start Date: 12/Jun/18 17:08
Worklog Time Spent: 10m
Work Description: udim opened a new pull request #5605: [BEAM-4536] Remove with_attributes keyword from ReadFromPubsub.
URL: https://github.com/apache/beam/pull/5605

BEAM-4536: with_attributes is broken for Dataflow. This commit removes the feature for the 2.5.0 release of Beam.

Follow this checklist to help us incorporate your contribution quickly and easily:

 - [ ] Format the pull request title like `[BEAM-XXX] Fixes bug in ApproximateQuantiles`, where you replace `BEAM-XXX` with the appropriate JIRA issue, if applicable. This will automatically link the pull request to the issue.
 - [ ] If this contribution is large, please file an Apache [Individual Contributor License Agreement](https://www.apache.org/licenses/icla.pdf).

It will help us expedite review of your Pull Request if you tag someone (e.g. `@username`) to look at it.

Issue Time Tracking
---

Worklog Id: (was: 38)
Time Spent: 10m
Remaining Estimate: 0h
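
For context, the docstring removed by this change described `with_attributes` as a switch between two element shapes: message objects when `True`, and the message payload only when `False`. The sketch below illustrates that distinction in plain Python. `SketchMessage`, its field names, and `to_output_element` are hypothetical and chosen for illustration; they are not Beam's `PubsubMessage` API.

```python
from collections import namedtuple

# Hypothetical stand-in for a message-with-attributes type; not Beam's class.
SketchMessage = namedtuple('SketchMessage', ['payload', 'attributes'])


def to_output_element(payload, attributes, with_attributes=False):
    """Returns the element shape for each setting (illustrative only)."""
    if with_attributes:
        # Payload and attributes travel together as one object.
        return SketchMessage(payload=payload, attributes=dict(attributes))
    # Payload only; the attributes are dropped.
    return payload
```

With the keyword removed for the 2.5.0 release, only the payload-only branch of this sketch corresponds to what the reader can produce.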