[ https://issues.apache.org/jira/browse/SPARK-10071?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14732406#comment-14732406 ]

Apache Spark commented on SPARK-10071:
--------------------------------------

User 'zsxwing' has created a pull request for this issue:
https://github.com/apache/spark/pull/8624

> QueueInputDStream Should Allow Checkpointing
> --------------------------------------------
>
>                 Key: SPARK-10071
>                 URL: https://issues.apache.org/jira/browse/SPARK-10071
>             Project: Spark
>          Issue Type: Improvement
>          Components: Streaming
>    Affects Versions: 1.4.1
>            Reporter: Asim Jalis
>
> I would like for https://issues.apache.org/jira/browse/SPARK-8630 to be 
> reverted and that issue resolved as won’t fix, and for QueueInputDStream to 
> revert to its old behavior of not throwing an exception if checkpointing is
> enabled.
> Why? Because the fix for SPARK-8630, which throws an exception if the DStream
> is being checkpointed, breaks the primary use case for QueueInputDStream,
> which is testing. Indeed, the Spark Streaming documentation recommends using
> QueueInputDStream for testing.
> Why does throwing an exception when checkpointing is enabled break this class?
> Because if I use updateStateByKey or incremental windowed operations (such as
> reduceByKeyAndWindow with an inverse reduce function), the StreamingContext
> requires checkpointing and throws an exception if it is not enabled. But if
> I do enable checkpointing, QueueInputDStream throws an exception saying that
> checkpointing cannot be used with a queue stream. As a result, I cannot use
> QueueInputDStream to test these windowed operations or updateStateByKey; it
> can only be used for trivial stateless DStreams.
> But would removing the exception-throwing logic make this code fragile? It
> should not. In the testing scenario, the RDDs passed into the
> QueueInputDStream are created through parallelize and are checkpointable.
> But what about people who use QueueInputDStream in non-testing scenarios
> with non-recoverable RDDs? Perhaps a warning suffices here, stating that
> checkpointing cannot recover state from non-recoverable RDDs. It is then up
> to them how to resolve the situation.
> Since we currently have no good way of determining whether a
> QueueInputDStream contains recoverable RDDs, why not err on the side of
> leaving it to the user of the class not to expect recoverability, rather
> than forbidding checkpointing?
> In conclusion, my recommendation is to revert to the old behavior and to
> resolve SPARK-8630 as won’t fix.
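
The dilemma in the description can be modelled without Spark. The sketch below is a hypothetical toy, not Spark's API. `ToyQueueStream` stands in for the RDD queue behind QueueInputDStream. `running_counts` stands in for an updateStateByKey job. The `"raise"` and `"warn"` policies model, respectively, the post-SPARK-8630 behavior and the warning-only behavior requested in this issue. The sketch also assumes, for simplicity, that the stream is checkpointed after every batch.

```python
import warnings
from collections import deque

# Hypothetical toy model: none of these names are Spark APIs.
RECOVERY_MSG = "queue stream RDDs cannot be recovered from a checkpoint"


class ToyQueueStream:
    """Stand-in for the RDD queue behind QueueInputDStream (hypothetical)."""

    def __init__(self, batches, checkpoint_policy):
        self.batches = deque(batches)
        # "raise" models the SPARK-8630 behavior; "warn" models this request.
        self.checkpoint_policy = checkpoint_policy

    def on_checkpoint(self):
        if self.checkpoint_policy == "raise":
            raise RuntimeError(RECOVERY_MSG)
        warnings.warn(RECOVERY_MSG)


def running_counts(stream, checkpointing_enabled):
    """Stand-in for an updateStateByKey job: per-key running counts (hypothetical)."""
    if not checkpointing_enabled:
        # Models the StreamingContext rejecting a stateful job without checkpointing.
        raise RuntimeError("stateful operation requires checkpointing")
    state, history = {}, []
    while stream.batches:
        for key in stream.batches.popleft():
            state[key] = state.get(key, 0) + 1
        stream.on_checkpoint()  # assumption: checkpoint after every batch
        history.append(dict(state))
    return history


if __name__ == "__main__":
    batches = [["a", "b"], ["a"]]
    # With the "raise" policy the job fails whether or not checkpointing is on.
    for enabled in (False, True):
        try:
            running_counts(ToyQueueStream(batches, "raise"), enabled)
        except RuntimeError as err:
            print("failed:", err)
    # With the "warn" policy the stateful job runs; recovery is simply not promised.
    with warnings.catch_warnings():
        warnings.simplefilter("ignore")
        print(running_counts(ToyQueueStream(batches, "warn"), True))
        # prints [{'a': 1, 'b': 1}, {'a': 2, 'b': 1}]
```

Under the "raise" policy, no setting of `checkpointing_enabled` lets the stateful job complete. Under "warn", the job completes and the user is told that state cannot be recovered.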



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
