etienne created SPARK-17606:
-------------------------------

             Summary: No new batches are created once 1000 have been created 
after restarting streaming from a checkpoint.
                 Key: SPARK-17606
                 URL: https://issues.apache.org/jira/browse/SPARK-17606
             Project: Spark
          Issue Type: Bug
          Components: Spark Core
    Affects Versions: 1.6.1
            Reporter: etienne


When Spark restarts from a checkpoint after being down for a while, it 
recreates the batches missed during the downtime.

When only a few batches are missing, Spark keeps creating a new incoming batch 
every batchTime; but when the downtime is long enough to produce 1000 missed 
batches, no new batches are created at all.

So once all of these batches have completed, the stream sits idle.

I suspect a hard limit is set somewhere.

I expected Spark to keep recreating the missed batches, perhaps not all at 
once (since that appears to cause driver memory problems), and then to create 
new batches every batchTime.

Another solution would be to skip recreating the missing batches but still 
restart the direct input.

Right now, the only way I can restart a stream after a long break is to delete 
the checkpoint so that a new stream can be created, but that loses all of my 
state.
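For context, the restart path being discussed is the standard checkpoint-based recovery via StreamingContext.getOrCreate; a minimal sketch, with a hypothetical checkpoint path and app name (the direct Kafka DStream setup is elided):

```scala
import org.apache.spark.SparkConf
import org.apache.spark.streaming.{Seconds, StreamingContext}

object CheckpointRestartSketch {
  // Hypothetical checkpoint location; in practice an HDFS/S3 path.
  val checkpointDir = "hdfs:///user/app/checkpoint"

  def createContext(): StreamingContext = {
    val conf = new SparkConf().setAppName("kafka-direct-app")
    val ssc = new StreamingContext(conf, Seconds(10))
    ssc.checkpoint(checkpointDir)
    // ... build the direct Kafka DStream and stateful transformations here ...
    ssc
  }

  def main(args: Array[String]): Unit = {
    // getOrCreate rebuilds the context (including the pending, missed
    // batches) from the checkpoint if one exists; otherwise it calls
    // createContext() to start a fresh stream.
    val ssc = StreamingContext.getOrCreate(checkpointDir, createContext _)
    ssc.start()
    ssc.awaitTermination()
  }
}
```

Deleting checkpointDir before launch is what forces the createContext() branch, which is the workaround described above (a fresh stream, at the cost of all accumulated state).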

PS: I'm speaking about the direct Kafka input because it's the source I'm 
currently using; I don't know what happens with other sources.





--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
