[GitHub] spark issue #21698: [SPARK-23243][Core] Fix RDD.repartition() data correctne...

squito Wed, 08 Aug 2018 08:47:45 -0700

GitHub user squito commented on the issue:

    https://github.com/apache/spark/pull/21698
  
    > statistically fine considering most Spark jobs are short-running and
    > don't hit FetchFailure quite often (The major advantage of this approach is
    > that you don't pay for any penalty if you don't hit FetchFailure).
    
    I don't think this is the right metric to consider. The majority of jobs
    may be small, but (a) that's different from total compute time (one big job
    can account for as much compute as many small jobs), and (b) many users start
    small because they believe Spark will scale well for them.
    
    Fetch failures are way more common on big jobs.


