[ https://issues.apache.org/jira/browse/SPARK-26008?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16684341#comment-16684341 ]
Tom Bar Yacov edited comment on SPARK-26008 at 11/12/18 8:40 PM:
-----------------------------------------------------------------

I think this should be a wish, not a question. I reopened it to allow review of the wish and to discuss any technical risks of such an implementation.

was (Author: tombarya):
    I believe this is more a wish not a question. I reopened to allow review of the wish and discuss any technical risks of such implementation.

> Structured Streaming Manual clock for simulation
> ------------------------------------------------
>
>                 Key: SPARK-26008
>                 URL: https://issues.apache.org/jira/browse/SPARK-26008
>             Project: Spark
>          Issue Type: Wish
>          Components: Structured Streaming
>    Affects Versions: 2.3.0, 2.3.1, 2.3.2, 2.4.0
>            Reporter: Tom Bar Yacov
>            Priority: Major
>
> The internal StreamTest class in Structured Streaming makes it possible to
> test incremental logic and verify outputs between multiple triggers. It
> supports changing the internal Spark clock to get a fully deterministic
> simulation of the incremental state and APIs. This is not possible outside
> tests, since DataStreamWriter hides the triggerClock parameter and is final.
> This can be very useful not only in unit tests but also for a real running
> query. For example, you may have all the historical Kafka data persisted to
> HDFS with its Kafka timestamps and want to "play" the data and simulate the
> streaming application's output as if it were running on this data live,
> including the incremental output between triggers.
> Currently I can simulate multiple triggers and incremental logic for some of
> the APIs, but for APIs that depend on the execution clock, such as
> mapGroupsWithState with an execution-based timeout, I did not find a way to
> do this.
> I would like to allow passing an externally controlled clock as a parameter
> to DataStreamWriter and to the query itself.
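
To make the request concrete, here is a minimal, self-contained sketch of the kind of simulation described above. It is plain Python, not Spark code: the ManualClock class, the replay function, the record layout and the per-key counting logic are all hypothetical names chosen for illustration, and none of them is an existing DataStreamWriter API. It assumes records are sorted by timestamp, that each trigger processes every record stamped at or before the current clock time, and that a key times out once it has gone timeout_ms of clock time without an update.

```python
class ManualClock:
    """Hypothetical clock whose time moves only when the caller advances it."""

    def __init__(self, start_ms=0):
        self._now_ms = start_ms

    def get_time_millis(self):
        return self._now_ms

    def advance(self, delta_ms):
        self._now_ms += delta_ms
        return self._now_ms


def replay(records, clock, trigger_interval_ms, timeout_ms):
    """Replay (timestamp_ms, key) records trigger by trigger.

    Returns one (trigger_time_ms, updated_counts, timed_out_keys) tuple per
    trigger. Assumes trigger_interval_ms > 0 so the loop always terminates.
    """
    state = {}    # key -> (running count, clock time of the last update)
    outputs = []
    i = 0
    while i < len(records) or state:
        # The caller-controlled clock decides when the next trigger fires.
        now = clock.advance(trigger_interval_ms)
        updated = {}
        # Process every record stamped at or before the current clock time.
        while i < len(records) and records[i][0] <= now:
            key = records[i][1]
            count = state.get(key, (0, now))[0] + 1
            state[key] = (count, now)
            updated[key] = count
            i += 1
        # Expire keys that have gone timeout_ms of clock time without updates.
        expired = sorted(k for k, (_, seen) in state.items()
                         if seen + timeout_ms <= now)
        for k in expired:
            del state[k]
        outputs.append((now, updated, expired))
    return outputs


records = [(100, "a"), (200, "b"), (1100, "a"), (2500, "b")]
outputs = replay(records, ManualClock(), trigger_interval_ms=1000, timeout_ms=1500)
for trigger_output in outputs:
    print(trigger_output)
```

Because the clock is advanced explicitly, the per-trigger output and the timeouts are the same on every run, regardless of how fast the replay actually executes. That determinism is what an externally controlled clock would provide for a real query.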