Hi,

We intend to run ad hoc windowed continuous queries on Spark Streaming data. The queries could be registered and deregistered dynamically, or submitted through the command line. Currently, Spark Streaming does not allow adding new inputs, transformations, or output operations after a StreamingContext has started. However, making the following changes in DStream.scala allows me to create a window on a DStream even after the StreamingContext has started (i.e., while it is in StreamingContextState.ACTIVE).
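To make the intent concrete, here is a minimal Spark 2.x-style sketch of the usage we are after, assuming the validateAtInit() check has been relaxed as described below. The socket source, port, column name, and SQL query are illustrative only; on stock Spark, the window() call after start() would throw an IllegalStateException.

```scala
import org.apache.spark.sql.SparkSession
import org.apache.spark.streaming.{Seconds, StreamingContext}

object AdHocWindowSketch {
  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder
      .master("local[2]")
      .appName("adhoc-windows")
      .getOrCreate()
    val ssc = new StreamingContext(spark.sparkContext, Seconds(1))

    // Illustrative input source; any DStream would do.
    val lines = ssc.socketTextStream("localhost", 9999)
    lines.foreachRDD(_ => ()) // at least one output op is required before start()
    ssc.start()

    // On stock Spark this throws: the context is already ACTIVE and
    // DStream.validateAtInit() rejects new transformations. With the
    // patched validateAtInit(), the window is registered on the fly.
    val windowed = lines.window(Seconds(30), Seconds(10))
    windowed.foreachRDD { rdd =>
      import spark.implicits._
      // One DataFrame per window, queried with ad hoc SQL.
      val df = rdd.toDF("value")
      df.createOrReplaceTempView("window_view")
      spark.sql("SELECT COUNT(*) AS cnt FROM window_view").show()
    }

    ssc.awaitTermination()
  }
}
```

The same pattern would apply to queries submitted from the command line: each submission would register a new WindowedDStream plus a foreachRDD output operation against the already-running context.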
The two changes:

1) In DStream.validateAtInit(): allow adding new inputs, transformations, and output operations after the streaming context has started.
2) In DStream.persist(): allow changing the storage level of a DStream after the streaming context has started.

Ultimately, the window API just calls slice() on the parent DStream and returns allRDDsInWindow. We create DataFrames out of the RDDs from this particular WindowedDStream and evaluate queries on those DataFrames.

Questions:

1) Do you see any challenges or consequences with this approach?
2) Will these on-the-fly created WindowedDStreams be accounted for properly in runtime and memory management?
3) Why is creating new windows not allowed while the context is in the StreamingContextState.ACTIVE state?
4) Does it make sense to add our own implementation of WindowedDStream in this case?

- Yogesh