Here are two limitations we found: http://apache-spark-user-list.1001560.n3.nabble.com/OutOfMemory-in-quot-cogroup-quot-td17349.html
Best Regards,
Shixiong Zhu

2014-11-06 2:04 GMT+08:00 Yangcheng Huang <yangcheng.hu...@huawei.com>:
> Hi,
>
> One question about the power of spark.shuffle.spill
> (I know this has been asked several times :-) ).
>
> Basically, when handling a (cached) dataset that doesn't fit in memory,
> Spark can spill it to disk.
>
> However, can I say that, when this is enabled, Spark can handle the
> situation faultlessly, no matter:
>
> (1) how big the dataset is (compared to the available memory)
> (2) how complex the calculation being carried out is
>
> Can spark.shuffle.spill handle this perfectly?
>
> Here we assume that (1) disk space is unlimited and (2) the code
> is written correctly according to the functional requirements.
>
> I ask because, in such situations, I keep receiving
> warnings like "FetchFailed" once memory usage reaches the limit.
>
> Thanks
>
> YC