Clément MATHIEU created SPARK-21558:
---------------------------------------

             Summary: Kinesis lease failover time should be increased or made 
configurable
                 Key: SPARK-21558
                 URL: https://issues.apache.org/jira/browse/SPARK-21558
             Project: Spark
          Issue Type: Bug
          Components: DStreams
    Affects Versions: 2.0.2
            Reporter: Clément MATHIEU


I have a Spark Streaming application reading from a Kinesis stream which 
exhibits serious shard lease fickleness. The root cause as been identified as 
KCL default failover time being too low for our typical JVM pauses time:

#  KinesisClientLibConfiguration#DEFAULT_FAILOVER_TIME_MILLIS is 10 seconds, 
meaning that if a worker does not renew a lease within 10s, others workers will 
steal it
# spark-streaming-kinesis-asl uses default KCL failover time and does not allow 
to configure it
# Executor's JVM logs show frequent 10+ seconds pauses

While we could spend some time to fine tune GC configuration to reduce pause 
times, I am wondering if 10 seconds is not too low. Typical Spark executors 
have very large heaps and GCs available in HotSpot are not great at ensuring 
low and deterministic pause times. One might also want to use ParallelGC. 

What do you think about:

# Increasing fail over time (it might hurts application with low latency 
requirements)
# Making it configurable




--
This message was sent by Atlassian JIRA
(v6.4.14#64029)

---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org

Reply via email to