[ 
https://issues.apache.org/jira/browse/SPARK-26008?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16684341#comment-16684341
 ] 

Tom Bar Yacov edited comment on SPARK-26008 at 11/12/18 8:40 PM:
-----------------------------------------------------------------

I think this should be a wish not a question. I reopened to allow review of the 
wish and discuss any technical risks of such implementation.


was (Author: tombarya):
I believe this is more a wish not a question. I reopened to allow review of the 
wish and discuss any technical risks of such implementation.

> Structured Streaming Manual clock for simulation
> ------------------------------------------------
>
>                 Key: SPARK-26008
>                 URL: https://issues.apache.org/jira/browse/SPARK-26008
>             Project: Spark
>          Issue Type: Wish
>          Components: Structured Streaming
>    Affects Versions: 2.3.0, 2.3.1, 2.3.2, 2.4.0
>            Reporter: Tom Bar Yacov
>            Priority: Major
>
> Structured streaming Internal {color:#333333}StreamTest{color} class allows 
> to test incremental logic and verify outputs between multiple triggers. It 
> support changing the internal spark clock to get full deterministic 
> simulation of the incremental state and APIs. This is not possible outside 
> tests since {color:#333333}DataStreamWriter{color} hides the triggerClock 
> parameter and is final.
> This can be very useful not only in unit test mode but also for a real 
> running query. for example when you have all the Kafka historical data 
> persisted to hdfs with its Kafka timestamp and you want to "play"  the data 
> and simulate the streaming application output as if  running on this data in 
> live streaming including incremental output between triggers.
> Currently I can simulate multiple triggers and incremental logic for some of 
> the APIs, but for APIs that depend on the execution clock like 
> {color:#333333}mapGroupsWithState{color} with execution based timeout I did 
> not find a way to do this.
> I would like to allow passing an externally controlled clock as parameter to 
> DataStreamWriter and to the query itself.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org

Reply via email to