[jira] [Work logged] (BEAM-6557) Add IPython notebooks for quickstarts and custom I/O
[ https://issues.apache.org/jira/browse/BEAM-6557?focusedWorklogId=230568=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-230568 ] ASF GitHub Bot logged work on BEAM-6557: Author: ASF GitHub Bot Created on: 22/Apr/19 02:35 Start Date: 22/Apr/19 02:35 Worklog Time Spent: 10m Work Description: melap commented on issue #7679: [BEAM-6557] Adds an interactive "Try Apache Beam" page URL: https://github.com/apache/beam/pull/7679#issuecomment-485306953 I made some minor edit suggestions and added a language switcher in a couple commits -- PTAL. The content LGTM. I'm not familiar with the gradle build files though, so not a good person to review those. This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org Issue Time Tracking --- Worklog Id: (was: 230568) Time Spent: 3h 40m (was: 3.5h) > Add IPython notebooks for quickstarts and custom I/O > > > Key: BEAM-6557 > URL: https://issues.apache.org/jira/browse/BEAM-6557 > Project: Beam > Issue Type: Improvement > Components: website >Reporter: David Cavazos >Assignee: David Cavazos >Priority: Minor > Labels: triaged > Time Spent: 3h 40m > Remaining Estimate: 0h > -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Work logged] (BEAM-5775) Make the spark runner not serialize data unless spark is spilling to disk
[ https://issues.apache.org/jira/browse/BEAM-5775?focusedWorklogId=230558=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-230558 ]

ASF GitHub Bot logged work on BEAM-5775:

Author: ASF GitHub Bot
Created on: 22/Apr/19 00:19
Start Date: 22/Apr/19 00:19
Worklog Time Spent: 10m
Work Description: mikekap commented on pull request #8371: [BEAM-5775] Move (most) of the batch spark pipelines' transformations to using lazy serialization.
URL: https://github.com/apache/beam/pull/8371

This avoids unnecessary serialization. For example, if a groupByKey is happening & part of the shuffle ends up on the current worker, we'll avoid the unnecessary serialize/deserialize cycle.

The main actual change in this PR (other than replacing `byte[]` with `ValueAndCoderSerializable`) is in `GroupNonMergingWindowsFunctions`. The semantics are slightly different in that we defer to Spark's serializer to serialize the values. This allows the previous optimization to keep working in a lazy way - if there are a lot of windows for a single value, Spark *should* serialize them only once since it's the same reference. In case Kryo is being used, the option `spark.kryo.referenceTracking` controls this behavior & defaults to true. For Java serialization, it's the only behavior available.

I didn't touch spark streaming in this PR because I'm not sure how to address the backwards compatibility problem. Any thoughts there?

R: @iemejia

Thank you for your contribution! Follow this checklist to help us incorporate your contribution quickly and easily:

- [x] [**Choose reviewer(s)**](https://beam.apache.org/contribute/#make-your-change) and mention them in a comment (`R: @username`).
- [x] Format the pull request title like `[BEAM-XXX] Fixes bug in ApproximateQuantiles`, where you replace `BEAM-XXX` with the appropriate JIRA issue, if applicable. This will automatically link the pull request to the issue.
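The lazy-serialization idea described in the BEAM-5775 pull request can be illustrated outside of Spark. What follows is a minimal Python sketch under stated assumptions, not the PR's Java implementation: `LazyEncoded`, `shuffle`, `utf8_encode`, and the worker names are all hypothetical, and the cached `to_bytes` result only loosely mimics the "same reference is serialized once" effect that the PR attributes to Spark's serializer.

```python
class LazyEncoded:
    """Holds a value and encodes it only when bytes are actually requested.

    Hypothetical stand-in for the idea behind ValueAndCoderSerializable.
    """

    def __init__(self, value, encode):
        self.value = value
        self._encode = encode
        self._cached = None

    def to_bytes(self):
        # Encode at most once, however often this same reference is shipped.
        if self._cached is None:
            self._cached = self._encode(self.value)
        return self._cached


def shuffle(records, local_worker):
    """Route (target_worker, LazyEncoded) pairs; only remote ones are encoded."""
    local, remote = [], []
    for target, item in records:
        if target == local_worker:
            local.append(item.value)  # stays in memory, no encode call
        else:
            remote.append((target, item.to_bytes()))
    return local, remote


encode_calls = []


def utf8_encode(text):
    encode_calls.append(text)  # record every real encode for inspection
    return text.encode("utf-8")


shared = LazyEncoded("v1", utf8_encode)
records = [
    ("worker-a", LazyEncoded("stays-local", utf8_encode)),
    ("worker-b", shared),  # the same reference assigned to two windows
    ("worker-b", shared),
]
local, remote = shuffle(records, local_worker="worker-a")
```

In this toy run the record that stays on `worker-a` is never encoded, and the value shared by two windows is encoded a single time even though it is shipped twice.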
[jira] [Work logged] (BEAM-5822) Vendor bytebuddy
[ https://issues.apache.org/jira/browse/BEAM-5822?focusedWorklogId=230534=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-230534 ] ASF GitHub Bot logged work on BEAM-5822: Author: ASF GitHub Bot Created on: 21/Apr/19 21:09 Start Date: 21/Apr/19 21:09 Worklog Time Spent: 10m Work Description: kennknowles commented on issue #8357: [BEAM-5822] Vendor bytebuddy URL: https://github.com/apache/beam/pull/8357#issuecomment-485282514 There are not very good instructions. We have only published a few vendored artifacts. You can read the mailing lists for the discussions and votes. That is the best I know of. What you will need to do is check in the vendored project first in a separate pull request. This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org Issue Time Tracking --- Worklog Id: (was: 230534) Time Spent: 1h (was: 50m) > Vendor bytebuddy > > > Key: BEAM-5822 > URL: https://issues.apache.org/jira/browse/BEAM-5822 > Project: Beam > Issue Type: Sub-task > Components: sdk-java-core >Reporter: Kenneth Knowles >Assignee: Kai Jiang >Priority: Major > Time Spent: 1h > Remaining Estimate: 0h > -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Work logged] (BEAM-3489) Expose the message id of received messages within PubsubMessage
[ https://issues.apache.org/jira/browse/BEAM-3489?focusedWorklogId=230531=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-230531 ] ASF GitHub Bot logged work on BEAM-3489: Author: ASF GitHub Bot Created on: 21/Apr/19 20:18 Start Date: 21/Apr/19 20:18 Worklog Time Spent: 10m Work Description: thinhha commented on issue #8370: [BEAM-3489] add PubSub messageId in PubsubMessage URL: https://github.com/apache/beam/pull/8370#issuecomment-485279607 retest this please This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org Issue Time Tracking --- Worklog Id: (was: 230531) Time Spent: 0.5h (was: 20m) > Expose the message id of received messages within PubsubMessage > --- > > Key: BEAM-3489 > URL: https://issues.apache.org/jira/browse/BEAM-3489 > Project: Beam > Issue Type: New Feature > Components: io-java-gcp >Reporter: Luke Cwik >Assignee: Thinh Ha >Priority: Minor > Labels: newbie, starter > Time Spent: 0.5h > Remaining Estimate: 0h > > This task is about passing forward the message id from the pubsub proto to > the java PubsubMessage. > Add a message id field to PubsubMessage. > Update the coder for PubsubMessage to encode the message id. > Update the translation from the Pubsub proto message to the Dataflow message: > https://github.com/apache/beam/blob/2e275264b21db45787833502e5e42907b05e28b8/sdks/java/io/google-cloud-platform/src/main/java/org/apache/beam/sdk/io/gcp/pubsub/PubsubUnboundedSource.java#L976 -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Work logged] (BEAM-5795) Can SQL Query 5 be simplified?
[ https://issues.apache.org/jira/browse/BEAM-5795?focusedWorklogId=230530=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-230530 ] ASF GitHub Bot logged work on BEAM-5795: Author: ASF GitHub Bot Created on: 21/Apr/19 19:54 Start Date: 21/Apr/19 19:54 Worklog Time Spent: 10m Work Description: kennknowles commented on issue #6757: [BEAM-5795] Simplify SQL Query 5 URL: https://github.com/apache/beam/pull/6757#issuecomment-485278143 Looks like the plan for the current version even results in a join with a condition that is non-equijoin, which we don't support. So there's more to it. Probably the plan for the revised version is the same, but not necessarily. So that would be another benefit. This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org Issue Time Tracking --- Worklog Id: (was: 230530) Time Spent: 1h 10m (was: 1h) > Can SQL Query 5 be simplified? > -- > > Key: BEAM-5795 > URL: https://issues.apache.org/jira/browse/BEAM-5795 > Project: Beam > Issue Type: Improvement > Components: dsl-sql >Reporter: Kenneth Knowles >Assignee: Kenneth Knowles >Priority: Major > Labels: triaged > Time Spent: 1h 10m > Remaining Estimate: 0h > > The original CQL query uses the ALL operator over the set of rows that are > within a certain period from the watermark. We instead have a fancy join, due > to windowing. Nonetheless, can this be simplified? -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Commented] (BEAM-7127) Timer parameter not supported in timer callback
[ https://issues.apache.org/jira/browse/BEAM-7127?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16822730#comment-16822730 ]

Thomas Weise commented on BEAM-7127:

{code:java}
  File "/Users/tweise/src/beam/sdks/python/apache_beam/examples/flink/flink_state.py", line 78, in run
    | 'statefulCount' >> beam.ParDo(StateTimerFn())
  File "apache_beam/transforms/core.py", line 979, in __init__
    super(ParDo, self).__init__(fn, *args, **kwargs)
  File "apache_beam/transforms/ptransform.py", line 689, in __init__
    self.fn = pickler.loads(pickler.dumps(self.fn))
  File "apache_beam/internal/pickler.py", line 230, in dumps
    s = dill.dumps(o)
  File "/Users/tweise/python-ve/beam/lib/python2.7/site-packages/dill/_dill.py", line 294, in dumps
    dump(obj, file, protocol, byref, fmode, recurse)#, strictio)
  File "/Users/tweise/python-ve/beam/lib/python2.7/site-packages/dill/_dill.py", line 287, in dump
    pik.dump(obj)
  File "/System/Library/Frameworks/Python.framework/Versions/2.7/lib/python2.7/pickle.py", line 224, in dump
    self.save(obj)
  File "/System/Library/Frameworks/Python.framework/Versions/2.7/lib/python2.7/pickle.py", line 331, in save
    self.save_reduce(obj=obj, *rv)
  File "/System/Library/Frameworks/Python.framework/Versions/2.7/lib/python2.7/pickle.py", line 396, in save_reduce
    save(cls)
  File "/System/Library/Frameworks/Python.framework/Versions/2.7/lib/python2.7/pickle.py", line 286, in save
    f(self, obj) # Call unbound method with explicit self
  File "apache_beam/internal/pickler.py", line 107, in wrapper
    return fun(pickler, obj)
  File "/Users/tweise/python-ve/beam/lib/python2.7/site-packages/dill/_dill.py", line 1315, in save_type
    obj.__bases__, _dict), obj=obj)
  File "/System/Library/Frameworks/Python.framework/Versions/2.7/lib/python2.7/pickle.py", line 401, in save_reduce
    save(args)
  File "/System/Library/Frameworks/Python.framework/Versions/2.7/lib/python2.7/pickle.py", line 286, in save
    f(self, obj) # Call unbound method with explicit self
  File "/System/Library/Frameworks/Python.framework/Versions/2.7/lib/python2.7/pickle.py", line 562, in save_tuple
    save(element)
  File "/System/Library/Frameworks/Python.framework/Versions/2.7/lib/python2.7/pickle.py", line 286, in save
    f(self, obj) # Call unbound method with explicit self
  File "apache_beam/internal/pickler.py", line 198, in new_save_module_dict
    return old_save_module_dict(pickler, obj)
  File "/Users/tweise/python-ve/beam/lib/python2.7/site-packages/dill/_dill.py", line 902, in save_module_dict
    StockPickler.save_dict(pickler, obj)
  File "/System/Library/Frameworks/Python.framework/Versions/2.7/lib/python2.7/pickle.py", line 649, in save_dict
    self._batch_setitems(obj.iteritems())
  File "/System/Library/Frameworks/Python.framework/Versions/2.7/lib/python2.7/pickle.py", line 681, in _batch_setitems
    save(v)
  File "/System/Library/Frameworks/Python.framework/Versions/2.7/lib/python2.7/pickle.py", line 286, in save
    f(self, obj) # Call unbound method with explicit self
  File "/Users/tweise/python-ve/beam/lib/python2.7/site-packages/dill/_dill.py", line 1394, in save_function
    obj.__dict__), obj=obj)
  File "/System/Library/Frameworks/Python.framework/Versions/2.7/lib/python2.7/pickle.py", line 405, in save_reduce
    self.memoize(obj)
  File "/System/Library/Frameworks/Python.framework/Versions/2.7/lib/python2.7/pickle.py", line 244, in memoize
    assert id(obj) not in self.memo
AssertionError
{code}

> Timer parameter not supported in timer callback
> ---
>
> Key: BEAM-7127
> URL: https://issues.apache.org/jira/browse/BEAM-7127
> Project: Beam
> Issue Type: Bug
> Components: sdk-py-harness
> Affects Versions: 2.12.0
> Reporter: Thomas Weise
> Priority: Major
> Labels: portability
>
> Referencing the timer in its on_timer callback produces a recursive pickle
> error.
> {code:java}
> @userstate.on_timer(timer_spec)
> def process_timer(self, timer_1=beam.DoFn.TimerParam(timer_spec)):
> {code}
> Unit test:
> [https://github.com/apache/beam/blob/cbe4dfbdbe5d0da5152568853ee5e17334dd1b54/sdks/python/apache_beam/transforms/userstate_test.py#L67]
>

--
This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Created] (BEAM-7127) Timer parameter not supported in timer callback
Thomas Weise created BEAM-7127:
--

Summary: Timer parameter not supported in timer callback
Key: BEAM-7127
URL: https://issues.apache.org/jira/browse/BEAM-7127
Project: Beam
Issue Type: Bug
Components: sdk-py-harness
Affects Versions: 2.12.0
Reporter: Thomas Weise

Referencing the timer in its on_timer callback produces a recursive pickle error.

{code:java}
@userstate.on_timer(timer_spec)
def process_timer(self, timer_1=beam.DoFn.TimerParam(timer_spec)):
{code}

Unit test: [https://github.com/apache/beam/blob/cbe4dfbdbe5d0da5152568853ee5e17334dd1b54/sdks/python/apache_beam/transforms/userstate_test.py#L67]

--
This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Work logged] (BEAM-3489) Expose the message id of received messages within PubsubMessage
[ https://issues.apache.org/jira/browse/BEAM-3489?focusedWorklogId=230519=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-230519 ] ASF GitHub Bot logged work on BEAM-3489: Author: ASF GitHub Bot Created on: 21/Apr/19 15:16 Start Date: 21/Apr/19 15:16 Worklog Time Spent: 10m Work Description: thinhha commented on issue #8370: [BEAM-3489] add PubSub messageId in PubsubMessage URL: https://github.com/apache/beam/pull/8370#issuecomment-485259621 R: @lukecwik This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org Issue Time Tracking --- Worklog Id: (was: 230519) Time Spent: 20m (was: 10m) > Expose the message id of received messages within PubsubMessage > --- > > Key: BEAM-3489 > URL: https://issues.apache.org/jira/browse/BEAM-3489 > Project: Beam > Issue Type: New Feature > Components: io-java-gcp >Reporter: Luke Cwik >Assignee: Thinh Ha >Priority: Minor > Labels: newbie, starter > Time Spent: 20m > Remaining Estimate: 0h > > This task is about passing forward the message id from the pubsub proto to > the java PubsubMessage. > Add a message id field to PubsubMessage. > Update the coder for PubsubMessage to encode the message id. > Update the translation from the Pubsub proto message to the Dataflow message: > https://github.com/apache/beam/blob/2e275264b21db45787833502e5e42907b05e28b8/sdks/java/io/google-cloud-platform/src/main/java/org/apache/beam/sdk/io/gcp/pubsub/PubsubUnboundedSource.java#L976 -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Work logged] (BEAM-3489) Expose the message id of received messages within PubsubMessage
[ https://issues.apache.org/jira/browse/BEAM-3489?focusedWorklogId=230518=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-230518 ]

ASF GitHub Bot logged work on BEAM-3489:

Author: ASF GitHub Bot
Created on: 21/Apr/19 15:11
Start Date: 21/Apr/19 15:11
Worklog Time Spent: 10m
Work Description: thinhha commented on pull request #8370: [BEAM-3489] add PubSub messageId in PubsubMessage
URL: https://github.com/apache/beam/pull/8370

Make the `recordId` field from `PubsubClient.IncomingMessage` available as a new field called `messageId` in `PubsubMessage`.

Updated the coder for `PubsubMessage` to encode and decode the `messageId`.

Thank you for your contribution! Follow this checklist to help us incorporate your contribution quickly and easily:

- [ ] [**Choose reviewer(s)**](https://beam.apache.org/contribute/#make-your-change) and mention them in a comment (`R: @username`).
- [ ] Format the pull request title like `[BEAM-XXX] Fixes bug in ApproximateQuantiles`, where you replace `BEAM-XXX` with the appropriate JIRA issue, if applicable. This will automatically link the pull request to the issue.
- [ ] If this contribution is large, please file an Apache [Individual Contributor License Agreement](https://www.apache.org/licenses/icla.pdf).
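The coder change described for BEAM-3489 (encode and decode an extra message id next to the payload and attributes) can be sketched as a round trip. This is an illustrative Python sketch only: the `Message` type, the length-prefixed layout, the presence flag, and the JSON-encoded attributes are assumptions made for the example and are not the wire format of Beam's Java `PubsubMessage` coder.

```python
import json
import struct
from typing import Dict, NamedTuple, Optional


class Message(NamedTuple):
    """Hypothetical stand-in for PubsubMessage with a message_id field."""
    payload: bytes
    attributes: Dict[str, str]
    message_id: Optional[str] = None


def _put(chunk: bytes) -> bytes:
    # 4-byte big-endian length prefix followed by the bytes themselves.
    return struct.pack(">I", len(chunk)) + chunk


def _take(data: bytes, offset: int):
    # Read one length-prefixed chunk; return it with the next offset.
    (size,) = struct.unpack_from(">I", data, offset)
    start = offset + 4
    return data[start:start + size], start + size


def encode(msg: Message) -> bytes:
    attrs = json.dumps(msg.attributes, sort_keys=True).encode("utf-8")
    # One flag byte marks whether a message id is present.
    has_id = b"\x01" if msg.message_id is not None else b"\x00"
    ident = (msg.message_id or "").encode("utf-8")
    return _put(msg.payload) + _put(attrs) + has_id + _put(ident)


def decode(data: bytes) -> Message:
    payload, pos = _take(data, 0)
    attrs, pos = _take(data, pos)
    has_id = data[pos:pos + 1] == b"\x01"
    ident, _ = _take(data, pos + 1)
    return Message(
        payload,
        json.loads(attrs.decode("utf-8")),
        ident.decode("utf-8") if has_id else None,
    )


with_id = Message(b"hello", {"origin": "test"}, "1234")
without_id = Message(b"", {}, None)
```

The presence flag is one possible way to distinguish "no id" from an empty id; any change of this kind alters the encoded form, so existing encoded data would need separate consideration.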
[jira] [Work logged] (BEAM-7112) State cleanup interferes with user timer callback
[ https://issues.apache.org/jira/browse/BEAM-7112?focusedWorklogId=230514=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-230514 ] ASF GitHub Bot logged work on BEAM-7112: Author: ASF GitHub Bot Created on: 21/Apr/19 13:59 Start Date: 21/Apr/19 13:59 Worklog Time Spent: 10m Work Description: tweise commented on issue #8351: [BEAM-7112] [flink] Defer state cleanup till bundle completion URL: https://github.com/apache/beam/pull/8351#issuecomment-485253923 @mxm thanks for the expedited review! This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org Issue Time Tracking --- Worklog Id: (was: 230514) Time Spent: 3h 40m (was: 3.5h) > State cleanup interferes with user timer callback > - > > Key: BEAM-7112 > URL: https://issues.apache.org/jira/browse/BEAM-7112 > Project: Beam > Issue Type: Bug > Components: runner-flink >Affects Versions: 2.12.0 >Reporter: Thomas Weise >Assignee: Thomas Weise >Priority: Major > Labels: portability-flink > Time Spent: 3h 40m > Remaining Estimate: 0h > > Cleanup timers and user timers are fired at the watermark. Processing of > timers in the SDK worker is asynchronous, so it is possible that the state is > already removed when the user timer callback executes. -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Work logged] (BEAM-7112) State cleanup interferes with user timer callback
[ https://issues.apache.org/jira/browse/BEAM-7112?focusedWorklogId=230513=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-230513 ]

ASF GitHub Bot logged work on BEAM-7112:

Author: ASF GitHub Bot
Created on: 21/Apr/19 13:58
Start Date: 21/Apr/19 13:58
Worklog Time Spent: 10m
Work Description: tweise commented on pull request #8351: [BEAM-7112] [flink] Defer state cleanup till bundle completion
URL: https://github.com/apache/beam/pull/8351#discussion_r277169275

## File path: runners/flink/src/main/java/org/apache/beam/runners/flink/translation/wrappers/streaming/ExecutableStageDoFnOperator.java ##

@@ -400,8 +398,35 @@ private void setTimer(WindowedValue timerElement, TimerInternals.TimerDa
     }
   }

+  @SuppressWarnings("ByteBufferBackingArray")
+  private void fireCleanupTimers() {
+    while (!cleanupTimers.isEmpty()) {
+      InternalTimer timer = cleanupTimers.remove();
+      final ByteBuffer encodedKey = (ByteBuffer) timer.getKey();
+      try {
+        // still need to process as timer, see CleanupTimer
+        stateBackendLock.lock();

Review comment: Probably; it would also require first checking that there is at least one timer before calling this method, so as not to take the lock unnecessarily.

This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org

Issue Time Tracking
---

Worklog Id: (was: 230513)
Time Spent: 3.5h (was: 3h 20m)

> State cleanup interferes with user timer callback
> -
>
> Key: BEAM-7112
> URL: https://issues.apache.org/jira/browse/BEAM-7112
> Project: Beam
> Issue Type: Bug
> Components: runner-flink
> Affects Versions: 2.12.0
> Reporter: Thomas Weise
> Assignee: Thomas Weise
> Priority: Major
> Labels: portability-flink
> Time Spent: 3.5h
> Remaining Estimate: 0h
>
> Cleanup timers and user timers are fired at the watermark. Processing of
> timers in the SDK worker is asynchronous, so it is possible that the state is
> already removed when the user timer callback executes.

--
This message was sent by Atlassian JIRA (v7.6.3#76005)
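The two review remarks on `fireCleanupTimers` (take the lock once for the whole batch, and skip the lock entirely when no timer is queued) can be sketched as follows. This is a hedged Python illustration, not the Flink runner's Java code: `fire_cleanup_timers`, `CountingLock`, and the timer keys are hypothetical, and whether holding the lock across the whole loop is acceptable in the real operator was not verified here.

```python
import threading
from collections import deque


def fire_cleanup_timers(cleanup_timers, lock, handle):
    """Drain queued cleanup timers under a single lock acquisition.

    Returns the number of timers fired. The lock is never touched when the
    queue is empty.
    """
    if not cleanup_timers:
        return 0
    fired = 0
    with lock:
        while cleanup_timers:
            handle(cleanup_timers.popleft())
            fired += 1
    return fired


class CountingLock:
    """Wraps threading.Lock and counts acquisitions, for illustration only."""

    def __init__(self):
        self._lock = threading.Lock()
        self.acquisitions = 0

    def __enter__(self):
        self._lock.acquire()
        self.acquisitions += 1
        return self

    def __exit__(self, exc_type, exc, tb):
        self._lock.release()
        return False


lock = CountingLock()
seen = []
fired = fire_cleanup_timers(deque(["k1", "k2", "k3"]), lock, seen.append)
fired_empty = fire_cleanup_timers(deque(), lock, seen.append)
```

Here three timers are processed with one acquisition, and the second call on an empty queue returns without acquiring the lock at all. As the review itself notes, the saving is probably negligible in practice.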
[jira] [Work logged] (BEAM-7112) State cleanup interferes with user timer callback
[ https://issues.apache.org/jira/browse/BEAM-7112?focusedWorklogId=230512=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-230512 ] ASF GitHub Bot logged work on BEAM-7112: Author: ASF GitHub Bot Created on: 21/Apr/19 13:56 Start Date: 21/Apr/19 13:56 Worklog Time Spent: 10m Work Description: tweise commented on pull request #8351: [BEAM-7112] [flink] Defer state cleanup till bundle completion URL: https://github.com/apache/beam/pull/8351 This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org Issue Time Tracking --- Worklog Id: (was: 230512) Time Spent: 3h 20m (was: 3h 10m) > State cleanup interferes with user timer callback > - > > Key: BEAM-7112 > URL: https://issues.apache.org/jira/browse/BEAM-7112 > Project: Beam > Issue Type: Bug > Components: runner-flink >Affects Versions: 2.12.0 >Reporter: Thomas Weise >Assignee: Thomas Weise >Priority: Major > Labels: portability-flink > Time Spent: 3h 20m > Remaining Estimate: 0h > > Cleanup timers and user timers are fired at the watermark. Processing of > timers in the SDK worker is asynchronous, so it is possible that the state is > already removed when the user timer callback executes. -- This message was sent by Atlassian JIRA (v7.6.3#76005)
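The problem and the fix named in the PR title (defer state cleanup until the bundle completes, so that asynchronous user timer callbacks still see their state) can be modelled with a toy simulation. This Python sketch is an assumption-laden illustration rather than the runner's logic: the `Bundle` class, its queues, and the callback signature are hypothetical.

```python
from collections import deque


class Bundle:
    """Toy model of one bundle: user timers run later, cleanup is deferred."""

    def __init__(self, state):
        self.state = state                  # key -> value, shared keyed state
        self.pending_user_timers = deque()  # callbacks processed asynchronously
        self.cleanup_timers = deque()       # keys whose state should be dropped

    def on_watermark(self, key, user_callback):
        # Both timers become eligible at the same watermark.
        self.pending_user_timers.append((key, user_callback))
        self.cleanup_timers.append(key)     # queued, not applied immediately

    def finish(self):
        results = []
        # 1. Let every outstanding user timer callback observe its state.
        while self.pending_user_timers:
            key, callback = self.pending_user_timers.popleft()
            results.append(callback(key, self.state.get(key)))
        # 2. Only then remove the state.
        while self.cleanup_timers:
            self.state.pop(self.cleanup_timers.popleft(), None)
        return results


state = {"user-1": 42}
bundle = Bundle(state)
bundle.on_watermark("user-1", lambda key, value: (key, value))
results = bundle.finish()
```

If the cleanup were applied inside `on_watermark` instead, the callback would receive `None` for `user-1`, which corresponds to the race described in the issue.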
[jira] [Updated] (BEAM-3489) Expose the message id of received messages within PubsubMessage
[ https://issues.apache.org/jira/browse/BEAM-3489?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Thinh Ha updated BEAM-3489: --- Attachment: index.html > Expose the message id of received messages within PubsubMessage > --- > > Key: BEAM-3489 > URL: https://issues.apache.org/jira/browse/BEAM-3489 > Project: Beam > Issue Type: New Feature > Components: io-java-gcp >Reporter: Luke Cwik >Assignee: Ahmed El.Hussaini >Priority: Minor > Labels: newbie, starter > Attachments: index.html > > > This task is about passing forward the message id from the pubsub proto to > the java PubsubMessage. > Add a message id field to PubsubMessage. > Update the coder for PubsubMessage to encode the message id. > Update the translation from the Pubsub proto message to the Dataflow message: > https://github.com/apache/beam/blob/2e275264b21db45787833502e5e42907b05e28b8/sdks/java/io/google-cloud-platform/src/main/java/org/apache/beam/sdk/io/gcp/pubsub/PubsubUnboundedSource.java#L976 -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Work logged] (BEAM-7112) State cleanup interferes with user timer callback
[ https://issues.apache.org/jira/browse/BEAM-7112?focusedWorklogId=230509=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-230509 ]

ASF GitHub Bot logged work on BEAM-7112:

Author: ASF GitHub Bot
Created on: 21/Apr/19 12:06
Start Date: 21/Apr/19 12:06
Worklog Time Spent: 10m
Work Description: mxm commented on pull request #8351: [BEAM-7112] [flink] Defer state cleanup till bundle completion
URL: https://github.com/apache/beam/pull/8351#discussion_r277164943

## File path: runners/flink/src/main/java/org/apache/beam/runners/flink/translation/wrappers/streaming/ExecutableStageDoFnOperator.java ##

@@ -400,8 +398,35 @@ private void setTimer(WindowedValue timerElement, TimerInternals.TimerDa
     }
   }

+  @SuppressWarnings("ByteBufferBackingArray")
+  private void fireCleanupTimers() {
+    while (!cleanupTimers.isEmpty()) {
+      InternalTimer timer = cleanupTimers.remove();
+      final ByteBuffer encodedKey = (ByteBuffer) timer.getKey();
+      try {
+        // still need to process as timer, see CleanupTimer
+        stateBackendLock.lock();

Review comment: Consider only locking once before processing all the timers, but probably negligible.

This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org

Issue Time Tracking
---

Worklog Id: (was: 230509)
Time Spent: 3h 10m (was: 3h)

> State cleanup interferes with user timer callback
> -
>
> Key: BEAM-7112
> URL: https://issues.apache.org/jira/browse/BEAM-7112
> Project: Beam
> Issue Type: Bug
> Components: runner-flink
> Affects Versions: 2.12.0
> Reporter: Thomas Weise
> Assignee: Thomas Weise
> Priority: Major
> Labels: portability-flink
> Time Spent: 3h 10m
> Remaining Estimate: 0h
>
> Cleanup timers and user timers are fired at the watermark. Processing of
> timers in the SDK worker is asynchronous, so it is possible that the state is
> already removed when the user timer callback executes.

--
This message was sent by Atlassian JIRA (v7.6.3#76005)