I got the open-file limit from our Hadoop admin: it's actually 1M. So I suspect consolidation isn't working as expected. Could there be any other reason?
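A limit reported by an admin is not necessarily the limit a given process inherits, since open-file limits are per process. Whether that is what happens on this cluster is only an assumption. As a minimal sketch, the values a process actually sees can be printed with Python's standard `resource` module (Unix only); the helper names `open_file_limits` and `describe` are made up for illustration:

```python
import resource


def open_file_limits():
    """Return the (soft, hard) RLIMIT_NOFILE values seen by this process."""
    return resource.getrlimit(resource.RLIMIT_NOFILE)


def describe(limit):
    """Render a limit value, mapping RLIM_INFINITY to 'unlimited'."""
    return "unlimited" if limit == resource.RLIM_INFINITY else str(limit)


if __name__ == "__main__":
    soft, hard = open_file_limits()
    # The soft limit is the one that triggers "too many open files".
    print("soft limit:", describe(soft))
    print("hard limit:", describe(hard))
```

Running this from the same user and environment that launches the executors gives a more direct answer than the configured value alone.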
On Thu, Jul 31, 2014 at 11:01 AM, Shao, Saisai <saisai.s...@intel.com> wrote:
> I don’t think it’s a bug of consolidated shuffle, it’s a Linux
> configuration problem. The default open files in Linux is 1024, while your
> open file is larger than 1024 you will get the error as you mentioned
> below. So you can set the open file numbers to a large one by: ulimit -n
> xxx or write into /etc/security/limits.conf in Ubuntu.
>
> Shuffle consolidation can reduce the total shuffle file numbers, but the
> concurrent opened file number is the same as basic hash-based shuffle.
>
> Thanks
>
> Jerry
>
> *From:* Jianshi Huang [mailto:jianshi.hu...@gmail.com]
> *Sent:* Thursday, July 31, 2014 10:34 AM
> *To:* user@spark.apache.org
> *Cc:* xia...@sjtu.edu.cn
> *Subject:* Re: spark.shuffle.consolidateFiles seems not working
>
> Ok... but my question is why spark.shuffle.consolidateFiles is working
> (or is it)? Is this a bug?
>
> On Wed, Jul 30, 2014 at 4:29 PM, Larry Xiao <xia...@sjtu.edu.cn> wrote:
>
> Hi Jianshi,
>
> I've met similar situation before.
> And my solution was 'ulimit', you can use
>
> -a to see your current settings
> -n to set open files limit
> (and other limits also)
>
> And I set -n to 10240.
>
> I see spark.shuffle.consolidateFiles helps by reusing open files.
> (so I don't know to what extend does it help)
>
> Hope it helps.
>
> Larry
>
> On 7/30/14, 4:01 PM, Jianshi Huang wrote:
>
> I'm using Spark 1.0.1 on Yarn-Client mode.
>
> SortByKey always reports a FileNotFoundExceptions with messages says "too
> many open files".
>
> I already set spark.shuffle.consolidateFiles to true:
>
> conf.set("spark.shuffle.consolidateFiles", "true")
>
> But it seems not working. What are the other possible reasons? How to fix
> it?
>
> Jianshi

--
Jianshi Huang

LinkedIn: jianshi
Twitter: @jshuang
Github & Blog: http://huangjs.github.com/
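To make Jerry's point about total versus concurrently open files concrete, here is a rough back-of-the-envelope sketch. It assumes a simplified model of hash-based shuffle in which every running map task keeps one writer open per reducer, and consolidation reuses one file group per core. The function name and all numbers are hypothetical and not taken from this thread:

```python
def hash_shuffle_file_counts(map_tasks, reducers, cores_per_executor, executors):
    """Estimate shuffle file counts under a simplified hash-shuffle model."""
    return {
        # Without consolidation: one file per (map task, reducer) pair.
        "total_files_plain": map_tasks * reducers,
        # With consolidation (assumed model): one file per (core, reducer) pair.
        "total_files_consolidated": executors * cores_per_executor * reducers,
        # Open at the same time on one executor, with or without consolidation.
        "concurrent_open_per_executor": cores_per_executor * reducers,
    }


if __name__ == "__main__":
    # Hypothetical job: 1000 map tasks, 2000 reducers, 10 executors with 8 cores.
    counts = hash_shuffle_file_counts(
        map_tasks=1000, reducers=2000, cores_per_executor=8, executors=10
    )
    for name, value in counts.items():
        print(name, value)
```

Under these assumptions consolidation shrinks the total number of files, but the per-executor concurrent count stays at cores times reducers, so it can still exceed a limit such as 1024 or 10240.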