[GitHub] spark issue #21676: [SPARK-24699][SS][WIP] Watermark / Append mode should wo...

2018-07-23 Thread c-horn
Github user c-horn commented on the issue:

https://github.com/apache/spark/pull/21676
  
already resolved by https://github.com/apache/spark/pull/21746


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark issue #21676: [SPARK-24699][SS][WIP] Watermark / Append mode should wo...

2018-07-23 Thread c-horn
Github user c-horn commented on the issue:

https://github.com/apache/spark/pull/21676
  
Hi @tdas sorry for delay.
My email for github account: chorn4...@gmail.com

This looks fine to me, we can close this PR (and jira ticket) when yours is 
merged.


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark issue #21676: [SPARK-24699][SS][WIP] Watermark / Append mode should wo...

2018-07-23 Thread tdas
Github user tdas commented on the issue:

https://github.com/apache/spark/pull/21676
  
ping ^^^


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark issue #21676: [SPARK-24699][SS][WIP] Watermark / Append mode should wo...

2018-07-20 Thread tdas
Github user tdas commented on the issue:

https://github.com/apache/spark/pull/21676
  
hey @c-horn , I am ready to merge your PR, and to add you as coauthor i 
think i need to know your email address i the github account. Can you provide 
me that?


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark issue #21676: [SPARK-24699][SS][WIP] Watermark / Append mode should wo...

2018-07-11 Thread c-horn
Github user c-horn commented on the issue:

https://github.com/apache/spark/pull/21676
  
@tdas I merged your changes into my branch, test passed, thank you 👍 


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark issue #21676: [SPARK-24699][SS][WIP] Watermark / Append mode should wo...

2018-07-11 Thread tdas
Github user tdas commented on the issue:

https://github.com/apache/spark/pull/21676
  
Here is my solution based on my suggestion - 
https://github.com/apache/spark/pull/21746
I stole your unit test from this PR :) Thank you! I will add you as a 
co-author in that PR.


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark issue #21676: [SPARK-24699][SS][WIP] Watermark / Append mode should wo...

2018-07-10 Thread tdas
Github user tdas commented on the issue:

https://github.com/apache/spark/pull/21676
  
The offset log contains the watermark value that is going to be used in the 
batch corresponding to that offset. For example, "checkpoint/offsets/10" will 
contain the watermark value to be used for batch 10. The problem is that when 
batch 10 completes and new watermark values is computed, it is not saved in a 
persistent location until batch 11 is planned and "offsets/11" is written out. 
In trigger.once, this never happens as the query is terminated as soon as batch 
10 completes. So the new watermark value is not saved. If the query running in 
trigger.once mode right from the beginning, that is batch 0, then no new 
watermark value is ever written, and so the watermark shows  up always as 0.


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark issue #21676: [SPARK-24699][SS][WIP] Watermark / Append mode should wo...

2018-07-06 Thread c-horn
Github user c-horn commented on the issue:

https://github.com/apache/spark/pull/21676
  
I was under the assumption that the offset log contained this data?


https://github.com/apache/spark/blob/master/sql/core/src/main/scala/org/apache/spark/sql/execution/streaming/OffsetSeq.scala#L32

https://github.com/apache/spark/blob/master/sql/core/src/main/scala/org/apache/spark/sql/execution/streaming/OffsetSeq.scala#L81

https://github.com/apache/spark/blob/master/sql/core/src/main/scala/org/apache/spark/sql/execution/streaming/MicroBatchExecution.scala#L260

It does not seem to pull the correct data, however.


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark issue #21676: [SPARK-24699][SS][WIP] Watermark / Append mode should wo...

2018-07-05 Thread tdas
Github user tdas commented on the issue:

https://github.com/apache/spark/pull/21676
  
I think the right solution is to record the updateat watermark in the 
commit log, so that the updated watermark can be read back from the commit log 
next time the stream is started.


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark issue #21676: [SPARK-24699][SS][WIP] Watermark / Append mode should wo...

2018-07-02 Thread c-horn
Github user c-horn commented on the issue:

https://github.com/apache/spark/pull/21676
  
Changing `OneTimeExecutor` like this resolves this issue:
```
case class OneTimeExecutor() extends TriggerExecutor {

  /**
   * Execute a single batch using `batchRunner`.
   */
-  override def execute(batchRunner: () => Boolean): Unit = batchRunner()
+  override def execute(batchRunner: () => Boolean): Unit = batchRunner() 
&& batchRunner()
}
```
... but the type becomes semantically incorrect.

Is this an acceptable solution? it appears that a lot of the 
`MicroBatchExecution` code makes assumptions about state from the previous 
batch, which may or may not be realized in the first iteration of a stream 
restart.


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark issue #21676: [SPARK-24699][SS][WIP] Watermark / Append mode should wo...

2018-06-29 Thread c-horn
Github user c-horn commented on the issue:

https://github.com/apache/spark/pull/21676
  
@tdas @marmbrus


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org