Github user HeartSaVioR commented on the issue:
https://github.com/apache/spark/pull/21700
Pasting JIRA issue description to explain why this patch is needed:
Since the default value of "spark.sql.streaming.minBatchesToRetain" is high
(100), many state versions are kept around. This doesn't require strictly 100x
of memory, but I'm seeing 10x ~ 80x of memory consumption for various
workloads. In addition, in some cases even requiring 2x of memory is
unacceptable, so we should split out a separate configuration for in-memory
retention and let users adjust the trade-off between memory usage and cache
misses (rebuilding state from files).
In the normal case, the default value '2' would cover both cases (success and
restoring after failure) with less than or around 2x of memory usage, while '1'
would only cover the success case and would not require more than 1x of memory.
In the extreme case, users can set the value to '0' to disable the map cache
completely and maximize executor memory (covers #21500).
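
For illustration, here is a minimal sketch of how a user could tune that
trade-off, assuming the new setting is exposed as
"spark.sql.streaming.maxBatchesToRetainInMemory" (that name is an assumption
for this sketch, not stated in the description above):

    import org.apache.spark.sql.SparkSession

    // Sketch only: "spark.sql.streaming.maxBatchesToRetainInMemory" is assumed;
    // "spark.sql.streaming.minBatchesToRetain" still controls how many batches
    // of state files are kept on disk for recovery.
    val spark = SparkSession.builder()
      .appName("state-cache-tuning")
      // keep state files for the last 100 batches (the existing default)
      .config("spark.sql.streaming.minBatchesToRetain", "100")
      // keep only the latest 2 state versions in executor memory:
      //   "2" -> covers success and restore-after-failure (~2x memory)
      //   "1" -> covers only the success path (~1x memory)
      //   "0" -> disables the map cache; state is rebuilt from files
      .config("spark.sql.streaming.maxBatchesToRetainInMemory", "2")
      .getOrCreate()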
---