[jira] [Work logged] (BEAM-3865) Incorrect timestamp on merging window outputs.
[ https://issues.apache.org/jira/browse/BEAM-3865?focusedWorklogId=82484=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-82484 ] ASF GitHub Bot logged work on BEAM-3865: Author: ASF GitHub Bot Created on: 20/Mar/18 21:18 Start Date: 20/Mar/18 21:18 Worklog Time Spent: 10m Work Description: robertwb closed pull request #4879: [BEAM-3865] Fix watermark hold handling bug. URL: https://github.com/apache/beam/pull/4879 This is a PR merged from a forked repository. As GitHub hides the original diff on merge, it is displayed below for the sake of provenance: As this is a foreign pull request (from a fork), the diff is supplied below (as it won't show otherwise due to GitHub magic): diff --git a/examples/java/src/main/java/org/apache/beam/examples/complete/game/UserScore.java b/examples/java/src/main/java/org/apache/beam/examples/complete/game/UserScore.java index 8205baf35e3..0ebd744e0cf 100644 --- a/examples/java/src/main/java/org/apache/beam/examples/complete/game/UserScore.java +++ b/examples/java/src/main/java/org/apache/beam/examples/complete/game/UserScore.java @@ -128,6 +128,7 @@ public Long getTimestamp() { @ProcessElement public void processElement(ProcessContext c) { + System.out.println("GOT " + c.element()); String[] components = c.element().split(","); try { String user = components[0].trim(); diff --git a/examples/java/src/main/java/org/apache/beam/examples/complete/game/injector/InjectorUtils.java b/examples/java/src/main/java/org/apache/beam/examples/complete/game/injector/InjectorUtils.java index 1667f3a4691..6bad1cbe616 100644 --- a/examples/java/src/main/java/org/apache/beam/examples/complete/game/injector/InjectorUtils.java +++ b/examples/java/src/main/java/org/apache/beam/examples/complete/game/injector/InjectorUtils.java @@ -86,6 +86,7 @@ public static String getFullyQualifiedTopicName( */ public static void createTopic(Pubsub client, String fullTopicName) throws IOException { +System.out.println("fullTopicName " + fullTopicName); try { client.projects().topics().get(fullTopicName).execute(); } catch (GoogleJsonResponseException e) { diff --git a/sdks/python/apache_beam/testing/data/trigger_transcripts.yaml b/sdks/python/apache_beam/testing/data/trigger_transcripts.yaml index a736e943ac5..1244736c1ce 100644 --- a/sdks/python/apache_beam/testing/data/trigger_transcripts.yaml +++ b/sdks/python/apache_beam/testing/data/trigger_transcripts.yaml @@ -121,7 +121,7 @@ transcript: - input: [4]# no output - input: [5] - expect: -- {window: [1, 14], values: [1, 2, 3, 4, 5], timestamp: 14, early: true} +- {window: [1, 14], values: [1, 2, 3, 4, 5], timestamp: 15, early: true} - input: [6] - watermark: 100 - expect: diff --git a/sdks/python/apache_beam/transforms/trigger.py b/sdks/python/apache_beam/transforms/trigger.py index 7d03240f941..d47c740d0e1 100644 --- a/sdks/python/apache_beam/transforms/trigger.py +++ b/sdks/python/apache_beam/transforms/trigger.py @@ -1083,6 +1083,7 @@ def merge(_, to_be_merged, merge_result): # pylint: disable=no-self-argument for unused_value, timestamp in elements) if element_output_time >= output_watermark)) if output_time is not None: +state.clear_state(window, self.WATERMARK_HOLD) state.add_state(window, self.WATERMARK_HOLD, output_time) context = state.at(window, self.clock) diff --git a/sdks/python/apache_beam/transforms/trigger_test.py b/sdks/python/apache_beam/transforms/trigger_test.py index 2247b01..a765493ae81 100644 --- a/sdks/python/apache_beam/transforms/trigger_test.py +++ b/sdks/python/apache_beam/transforms/trigger_test.py @@ -65,10 +65,21 @@ class TriggerTest(unittest.TestCase): def run_trigger_simple(self, window_fn, trigger_fn, accumulation_mode, timestamped_data, expected_panes, *groupings, **kwargs): +# Groupings is a list of integers indicating the (uniform) size of bundles +# to try. For example, if timestamped_data has elements [a, b, c, d, e] +# then groupings=(5, 2) would first run the test with everything in the same +# bundle, and then re-run the test with bundling [a, b], [c, d], [e]. +# A negative value will reverse the order, e.g. -2 would result in bundles +# [e, d], [c, b], [a]. This is useful for deterministic triggers in testing +# that the output is not a function of ordering or bundling. +# If empty, defaults to bundles of size 1 in the given order. late_data = kwargs.pop('late_data', []) assert not kwargs def bundle_data(data, size): + if size < 0: +data = list(data)[::-1] +size = -size bundle = [] for timestamp, elem in data: windows =
[jira] [Work logged] (BEAM-3865) Incorrect timestamp on merging window outputs.
[ https://issues.apache.org/jira/browse/BEAM-3865?focusedWorklogId=82017=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-82017 ] ASF GitHub Bot logged work on BEAM-3865: Author: ASF GitHub Bot Created on: 19/Mar/18 20:27 Start Date: 19/Mar/18 20:27 Worklog Time Spent: 10m Work Description: charlesccychen commented on a change in pull request #4879: [BEAM-3865] Fix watermark hold handling bug. URL: https://github.com/apache/beam/pull/4879#discussion_r175573879 ## File path: sdks/python/apache_beam/transforms/trigger_test.py ## @@ -65,10 +65,21 @@ class TriggerTest(unittest.TestCase): def run_trigger_simple(self, window_fn, trigger_fn, accumulation_mode, timestamped_data, expected_panes, *groupings, **kwargs): +# Groupings is a list of integers indicating the (uniform) size of bundles +# to try. For example, if timestamped_data is has elements [a, b, c, d, e] Review comment: "is has" This is an automated message from the Apache Git Service. To respond to the message, please log on GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org Issue Time Tracking --- Worklog Id: (was: 82017) Time Spent: 1h 40m (was: 1.5h) > Incorrect timestamp on merging window outputs. > -- > > Key: BEAM-3865 > URL: https://issues.apache.org/jira/browse/BEAM-3865 > Project: Beam > Issue Type: Bug > Components: sdk-py-core >Affects Versions: 2.2.0, 2.3.0 >Reporter: Robert Bradshaw >Assignee: Ahmet Altay >Priority: Major > Time Spent: 1h 40m > Remaining Estimate: 0h > > Looks like we're setting multiple watermark holds with one arbitrarily being > held. -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Work logged] (BEAM-3865) Incorrect timestamp on merging window outputs.
[ https://issues.apache.org/jira/browse/BEAM-3865?focusedWorklogId=81991=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-81991 ] ASF GitHub Bot logged work on BEAM-3865: Author: ASF GitHub Bot Created on: 19/Mar/18 19:50 Start Date: 19/Mar/18 19:50 Worklog Time Spent: 10m Work Description: charlesccychen commented on a change in pull request #4879: [BEAM-3865] Fix watermark hold handling bug. URL: https://github.com/apache/beam/pull/4879#discussion_r175563315 ## File path: sdks/python/apache_beam/transforms/trigger_test.py ## @@ -81,16 +84,8 @@ def bundle_data(data, size): if not groupings: groupings = [1] +grouping = list(groupings) + [-group_by for group_by in groupings] Review comment: > **robertwb** wrote: > Oh, yes, nice eye! > > And fixing this makes some of the non-deterministic ones fail... Enabling only explicitly on some tests. Thanks! Could you also add a comment at the beginning of this method on the behavior of `groupings`? It is not obvious without a close reading. This is an automated message from the Apache Git Service. To respond to the message, please log on GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org Issue Time Tracking --- Worklog Id: (was: 81991) Time Spent: 1.5h (was: 1h 20m) > Incorrect timestamp on merging window outputs. > -- > > Key: BEAM-3865 > URL: https://issues.apache.org/jira/browse/BEAM-3865 > Project: Beam > Issue Type: Bug > Components: sdk-py-core >Affects Versions: 2.2.0, 2.3.0 >Reporter: Robert Bradshaw >Assignee: Ahmet Altay >Priority: Major > Time Spent: 1.5h > Remaining Estimate: 0h > > Looks like we're setting multiple watermark holds with one arbitrarily being > held. -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Work logged] (BEAM-3865) Incorrect timestamp on merging window outputs.
[ https://issues.apache.org/jira/browse/BEAM-3865?focusedWorklogId=81909=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-81909 ] ASF GitHub Bot logged work on BEAM-3865: Author: ASF GitHub Bot Created on: 19/Mar/18 16:09 Start Date: 19/Mar/18 16:09 Worklog Time Spent: 10m Work Description: robertwb commented on a change in pull request #4879: [BEAM-3865] Fix watermark hold handling bug. URL: https://github.com/apache/beam/pull/4879#discussion_r175492279 ## File path: sdks/python/apache_beam/transforms/trigger_test.py ## @@ -81,16 +84,8 @@ def bundle_data(data, size): if not groupings: groupings = [1] +grouping = list(groupings) + [-group_by for group_by in groupings] Review comment: Oh, yes, nice eye! And fixing this makes some of the non-deterministic ones fail... Enabling only explicitly on some tests. This is an automated message from the Apache Git Service. To respond to the message, please log on GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org Issue Time Tracking --- Worklog Id: (was: 81909) Time Spent: 1h 20m (was: 1h 10m) > Incorrect timestamp on merging window outputs. > -- > > Key: BEAM-3865 > URL: https://issues.apache.org/jira/browse/BEAM-3865 > Project: Beam > Issue Type: Bug > Components: sdk-py-core >Affects Versions: 2.2.0, 2.3.0 >Reporter: Robert Bradshaw >Assignee: Ahmet Altay >Priority: Major > Time Spent: 1h 20m > Remaining Estimate: 0h > > Looks like we're setting multiple watermark holds with one arbitrarily being > held. -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Work logged] (BEAM-3865) Incorrect timestamp on merging window outputs.
[ https://issues.apache.org/jira/browse/BEAM-3865?focusedWorklogId=81525=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-81525 ] ASF GitHub Bot logged work on BEAM-3865: Author: ASF GitHub Bot Created on: 17/Mar/18 07:49 Start Date: 17/Mar/18 07:49 Worklog Time Spent: 10m Work Description: charlesccychen commented on a change in pull request #4879: [BEAM-3865] Fix watermark hold handling bug. URL: https://github.com/apache/beam/pull/4879#discussion_r175250988 ## File path: sdks/python/apache_beam/transforms/trigger_test.py ## @@ -81,16 +84,8 @@ def bundle_data(data, size): if not groupings: groupings = [1] +grouping = list(groupings) + [-group_by for group_by in groupings] Review comment: What I mean is that the variable `grouping` (without "s" at the end) is never used in the rest of the function. Is this a typo? This is an automated message from the Apache Git Service. To respond to the message, please log on GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org Issue Time Tracking --- Worklog Id: (was: 81525) Time Spent: 1h 10m (was: 1h) > Incorrect timestamp on merging window outputs. > -- > > Key: BEAM-3865 > URL: https://issues.apache.org/jira/browse/BEAM-3865 > Project: Beam > Issue Type: Bug > Components: sdk-py-core >Affects Versions: 2.2.0, 2.3.0 >Reporter: Robert Bradshaw >Assignee: Ahmet Altay >Priority: Major > Time Spent: 1h 10m > Remaining Estimate: 0h > > Looks like we're setting multiple watermark holds with one arbitrarily being > held. -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Work logged] (BEAM-3865) Incorrect timestamp on merging window outputs.
[ https://issues.apache.org/jira/browse/BEAM-3865?focusedWorklogId=81524=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-81524 ] ASF GitHub Bot logged work on BEAM-3865: Author: ASF GitHub Bot Created on: 17/Mar/18 07:49 Start Date: 17/Mar/18 07:49 Worklog Time Spent: 10m Work Description: charlesccychen commented on a change in pull request #4879: [BEAM-3865] Fix watermark hold handling bug. URL: https://github.com/apache/beam/pull/4879#discussion_r175250980 ## File path: sdks/python/apache_beam/transforms/trigger_test.py ## @@ -81,16 +84,8 @@ def bundle_data(data, size): if not groupings: groupings = [1] +grouping = list(groupings) + [-group_by for group_by in groupings] Review comment: What I mean is that the variable `grouping` (without "s" at the end) is never used in the rest of the function. Is this a typo? This is an automated message from the Apache Git Service. To respond to the message, please log on GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org Issue Time Tracking --- Worklog Id: (was: 81524) Time Spent: 1h (was: 50m) > Incorrect timestamp on merging window outputs. > -- > > Key: BEAM-3865 > URL: https://issues.apache.org/jira/browse/BEAM-3865 > Project: Beam > Issue Type: Bug > Components: sdk-py-core >Affects Versions: 2.2.0, 2.3.0 >Reporter: Robert Bradshaw >Assignee: Ahmet Altay >Priority: Major > Time Spent: 1h > Remaining Estimate: 0h > > Looks like we're setting multiple watermark holds with one arbitrarily being > held. -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Work logged] (BEAM-3865) Incorrect timestamp on merging window outputs.
[ https://issues.apache.org/jira/browse/BEAM-3865?focusedWorklogId=81523=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-81523 ] ASF GitHub Bot logged work on BEAM-3865: Author: ASF GitHub Bot Created on: 17/Mar/18 07:49 Start Date: 17/Mar/18 07:49 Worklog Time Spent: 10m Work Description: charlesccychen commented on a change in pull request #4879: [BEAM-3865] Fix watermark hold handling bug. URL: https://github.com/apache/beam/pull/4879#discussion_r175250980 ## File path: sdks/python/apache_beam/transforms/trigger_test.py ## @@ -81,16 +84,8 @@ def bundle_data(data, size): if not groupings: groupings = [1] +grouping = list(groupings) + [-group_by for group_by in groupings] Review comment: What I mean is that the variable `grouping` (without "s" at the end) is never used in the rest of the function. Is this a typo? This is an automated message from the Apache Git Service. To respond to the message, please log on GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org Issue Time Tracking --- Worklog Id: (was: 81523) Time Spent: 50m (was: 40m) > Incorrect timestamp on merging window outputs. > -- > > Key: BEAM-3865 > URL: https://issues.apache.org/jira/browse/BEAM-3865 > Project: Beam > Issue Type: Bug > Components: sdk-py-core >Affects Versions: 2.2.0, 2.3.0 >Reporter: Robert Bradshaw >Assignee: Ahmet Altay >Priority: Major > Time Spent: 50m > Remaining Estimate: 0h > > Looks like we're setting multiple watermark holds with one arbitrarily being > held. -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Work logged] (BEAM-3865) Incorrect timestamp on merging window outputs.
[ https://issues.apache.org/jira/browse/BEAM-3865?focusedWorklogId=81510=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-81510 ] ASF GitHub Bot logged work on BEAM-3865: Author: ASF GitHub Bot Created on: 17/Mar/18 05:28 Start Date: 17/Mar/18 05:28 Worklog Time Spent: 10m Work Description: robertwb commented on issue #4879: [BEAM-3865] Fix watermark hold handling bug. URL: https://github.com/apache/beam/pull/4879#issuecomment-373895826 Jenkins: retest this please This is an automated message from the Apache Git Service. To respond to the message, please log on GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org Issue Time Tracking --- Worklog Id: (was: 81510) Time Spent: 40m (was: 0.5h) > Incorrect timestamp on merging window outputs. > -- > > Key: BEAM-3865 > URL: https://issues.apache.org/jira/browse/BEAM-3865 > Project: Beam > Issue Type: Bug > Components: sdk-py-core >Affects Versions: 2.2.0, 2.3.0 >Reporter: Robert Bradshaw >Assignee: Ahmet Altay >Priority: Major > Time Spent: 40m > Remaining Estimate: 0h > > Looks like we're setting multiple watermark holds with one arbitrarily being > held. -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Work logged] (BEAM-3865) Incorrect timestamp on merging window outputs.
[ https://issues.apache.org/jira/browse/BEAM-3865?focusedWorklogId=81509=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-81509 ] ASF GitHub Bot logged work on BEAM-3865: Author: ASF GitHub Bot Created on: 17/Mar/18 05:27 Start Date: 17/Mar/18 05:27 Worklog Time Spent: 10m Work Description: robertwb commented on a change in pull request #4879: [BEAM-3865] Fix watermark hold handling bug. URL: https://github.com/apache/beam/pull/4879#discussion_r175248717 ## File path: sdks/python/apache_beam/transforms/trigger_test.py ## @@ -81,16 +84,8 @@ def bundle_data(data, size): if not groupings: groupings = [1] +grouping = list(groupings) + [-group_by for group_by in groupings] Review comment: It expands grouping by also adding negative values, which also tests adding elements in reverse order. This is an automated message from the Apache Git Service. To respond to the message, please log on GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org Issue Time Tracking --- Worklog Id: (was: 81509) Time Spent: 0.5h (was: 20m) > Incorrect timestamp on merging window outputs. > -- > > Key: BEAM-3865 > URL: https://issues.apache.org/jira/browse/BEAM-3865 > Project: Beam > Issue Type: Bug > Components: sdk-py-core >Affects Versions: 2.2.0, 2.3.0 >Reporter: Robert Bradshaw >Assignee: Ahmet Altay >Priority: Major > Time Spent: 0.5h > Remaining Estimate: 0h > > Looks like we're setting multiple watermark holds with one arbitrarily being > held. -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Work logged] (BEAM-3865) Incorrect timestamp on merging window outputs.
[ https://issues.apache.org/jira/browse/BEAM-3865?focusedWorklogId=81435=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-81435 ] ASF GitHub Bot logged work on BEAM-3865: Author: ASF GitHub Bot Created on: 16/Mar/18 23:37 Start Date: 16/Mar/18 23:37 Worklog Time Spent: 10m Work Description: charlesccychen commented on a change in pull request #4879: [BEAM-3865] Fix watermark hold handling bug. URL: https://github.com/apache/beam/pull/4879#discussion_r175236867 ## File path: sdks/python/apache_beam/transforms/trigger_test.py ## @@ -81,16 +84,8 @@ def bundle_data(data, size): if not groupings: groupings = [1] +grouping = list(groupings) + [-group_by for group_by in groupings] Review comment: What is this? This is currently a no-op. This is an automated message from the Apache Git Service. To respond to the message, please log on GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org Issue Time Tracking --- Worklog Id: (was: 81435) Time Spent: 20m (was: 10m) > Incorrect timestamp on merging window outputs. > -- > > Key: BEAM-3865 > URL: https://issues.apache.org/jira/browse/BEAM-3865 > Project: Beam > Issue Type: Bug > Components: sdk-py-core >Affects Versions: 2.2.0, 2.3.0 >Reporter: Robert Bradshaw >Assignee: Ahmet Altay >Priority: Major > Time Spent: 20m > Remaining Estimate: 0h > > Looks like we're setting multiple watermark holds with one arbitrarily being > held. -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Work logged] (BEAM-3865) Incorrect timestamp on merging window outputs.
[ https://issues.apache.org/jira/browse/BEAM-3865?focusedWorklogId=81341=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-81341 ] ASF GitHub Bot logged work on BEAM-3865: Author: ASF GitHub Bot Created on: 16/Mar/18 20:12 Start Date: 16/Mar/18 20:12 Worklog Time Spent: 10m Work Description: robertwb commented on issue #4879: [BEAM-3865] Fix watermark hold handling bug. URL: https://github.com/apache/beam/pull/4879#issuecomment-373831533 R: @charlesccychen This is an automated message from the Apache Git Service. To respond to the message, please log on GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org Issue Time Tracking --- Worklog Id: (was: 81341) Time Spent: 10m Remaining Estimate: 0h > Incorrect timestamp on merging window outputs. > -- > > Key: BEAM-3865 > URL: https://issues.apache.org/jira/browse/BEAM-3865 > Project: Beam > Issue Type: Bug > Components: sdk-py-core >Affects Versions: 2.2.0, 2.3.0 >Reporter: Robert Bradshaw >Assignee: Ahmet Altay >Priority: Major > Time Spent: 10m > Remaining Estimate: 0h > > Looks like we're setting multiple watermark holds with one arbitrarily being > held. -- This message was sent by Atlassian JIRA (v7.6.3#76005)