[GitHub] spark issue #21676: [SPARK-24699][SS][WIP] Watermark / Append mode should wo...
Github user c-horn commented on the issue: https://github.com/apache/spark/pull/21676 already resolved by https://github.com/apache/spark/pull/21746 --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark issue #21676: [SPARK-24699][SS][WIP] Watermark / Append mode should wo...
Github user c-horn commented on the issue: https://github.com/apache/spark/pull/21676 Hi @tdas sorry for delay. My email for github account: chorn4...@gmail.com This looks fine to me, we can close this PR (and jira ticket) when yours is merged. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark issue #21676: [SPARK-24699][SS][WIP] Watermark / Append mode should wo...
Github user tdas commented on the issue: https://github.com/apache/spark/pull/21676 ping ^^^ --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark issue #21676: [SPARK-24699][SS][WIP] Watermark / Append mode should wo...
Github user tdas commented on the issue: https://github.com/apache/spark/pull/21676 hey @c-horn , I am ready to merge your PR, and to add you as coauthor i think i need to know your email address i the github account. Can you provide me that? --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark issue #21676: [SPARK-24699][SS][WIP] Watermark / Append mode should wo...
Github user c-horn commented on the issue: https://github.com/apache/spark/pull/21676 @tdas I merged your changes into my branch, test passed, thank you ð --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark issue #21676: [SPARK-24699][SS][WIP] Watermark / Append mode should wo...
Github user tdas commented on the issue: https://github.com/apache/spark/pull/21676 Here is my solution based on my suggestion - https://github.com/apache/spark/pull/21746 I stole your unit test from this PR :) Thank you! I will add you as a co-author in that PR. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark issue #21676: [SPARK-24699][SS][WIP] Watermark / Append mode should wo...
Github user tdas commented on the issue: https://github.com/apache/spark/pull/21676 The offset log contains the watermark value that is going to be used in the batch corresponding to that offset. For example, "checkpoint/offsets/10" will contain the watermark value to be used for batch 10. The problem is that when batch 10 completes and new watermark values is computed, it is not saved in a persistent location until batch 11 is planned and "offsets/11" is written out. In trigger.once, this never happens as the query is terminated as soon as batch 10 completes. So the new watermark value is not saved. If the query running in trigger.once mode right from the beginning, that is batch 0, then no new watermark value is ever written, and so the watermark shows up always as 0. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark issue #21676: [SPARK-24699][SS][WIP] Watermark / Append mode should wo...
Github user c-horn commented on the issue: https://github.com/apache/spark/pull/21676 I was under the assumption that the offset log contained this data? https://github.com/apache/spark/blob/master/sql/core/src/main/scala/org/apache/spark/sql/execution/streaming/OffsetSeq.scala#L32 https://github.com/apache/spark/blob/master/sql/core/src/main/scala/org/apache/spark/sql/execution/streaming/OffsetSeq.scala#L81 https://github.com/apache/spark/blob/master/sql/core/src/main/scala/org/apache/spark/sql/execution/streaming/MicroBatchExecution.scala#L260 It does not seem to pull the correct data, however. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark issue #21676: [SPARK-24699][SS][WIP] Watermark / Append mode should wo...
Github user tdas commented on the issue: https://github.com/apache/spark/pull/21676 I think the right solution is to record the updateat watermark in the commit log, so that the updated watermark can be read back from the commit log next time the stream is started. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark issue #21676: [SPARK-24699][SS][WIP] Watermark / Append mode should wo...
Github user c-horn commented on the issue: https://github.com/apache/spark/pull/21676 Changing `OneTimeExecutor` like this resolves this issue: ``` case class OneTimeExecutor() extends TriggerExecutor { /** * Execute a single batch using `batchRunner`. */ - override def execute(batchRunner: () => Boolean): Unit = batchRunner() + override def execute(batchRunner: () => Boolean): Unit = batchRunner() && batchRunner() } ``` ... but the type becomes semantically incorrect. Is this an acceptable solution? it appears that a lot of the `MicroBatchExecution` code makes assumptions about state from the previous batch, which may or may not be realized in the first iteration of a stream restart. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark issue #21676: [SPARK-24699][SS][WIP] Watermark / Append mode should wo...
Github user c-horn commented on the issue: https://github.com/apache/spark/pull/21676 @tdas @marmbrus --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org