Clément MATHIEU created SPARK-21558: ---------------------------------------
Summary: Kinesis lease failover time should be increased or made configurable Key: SPARK-21558 URL: https://issues.apache.org/jira/browse/SPARK-21558 Project: Spark Issue Type: Bug Components: DStreams Affects Versions: 2.0.2 Reporter: Clément MATHIEU I have a Spark Streaming application reading from a Kinesis stream which exhibits serious shard lease fickleness. The root cause as been identified as KCL default failover time being too low for our typical JVM pauses time: # KinesisClientLibConfiguration#DEFAULT_FAILOVER_TIME_MILLIS is 10 seconds, meaning that if a worker does not renew a lease within 10s, others workers will steal it # spark-streaming-kinesis-asl uses default KCL failover time and does not allow to configure it # Executor's JVM logs show frequent 10+ seconds pauses While we could spend some time to fine tune GC configuration to reduce pause times, I am wondering if 10 seconds is not too low. Typical Spark executors have very large heaps and GCs available in HotSpot are not great at ensuring low and deterministic pause times. One might also want to use ParallelGC. What do you think about: # Increasing fail over time (it might hurts application with low latency requirements) # Making it configurable -- This message was sent by Atlassian JIRA (v6.4.14#64029) --------------------------------------------------------------------- To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org