I had also thought that Hadoop mapper output is saved on HDFS, at least when
the job has only a Mapper and no Reducer.
If there is a Reducer, is the map output saved on local disk instead?
From: Shao, Saisai
Date: 2015-01-26 15:23
To: Larry Liu
CC: u...@spark.incubator.apache.org
Subject: RE: Shuffle to HDFS
Hey Larry,
 
I don’t think Hadoop puts shuffle output in HDFS; its behavior is the same as
Spark’s: it stores mapper output (shuffle) data on local disks.
You might have misunderstood something :).
 
Thanks
Jerry
 
From: Larry Liu [mailto:larryli...@gmail.com] 
Sent: Monday, January 26, 2015 3:03 PM
To: Shao, Saisai
Cc: u...@spark.incubator.apache.org
Subject: Re: Shuffle to HDFS
 
Hi Jerry,
 
Thanks for your reply.
 
The reason I ask is that in Hadoop, mapper intermediate output (shuffle) is
stored in HDFS. I think the default location for Spark is /tmp.
 
Larry
 
On Sun, Jan 25, 2015 at 9:44 PM, Shao, Saisai <saisai.s...@intel.com> wrote:
Hi Larry,
 
I don’t think Spark’s current shuffle can use HDFS as a shuffle output.
Is there a specific reason you want to spill shuffle data to HDFS or NFS? Doing
so would severely increase the shuffle time.
 
Thanks
Jerry
 
From: Larry Liu [mailto:larryli...@gmail.com] 
Sent: Sunday, January 25, 2015 4:45 PM
To: u...@spark.incubator.apache.org
Subject: Shuffle to HDFS
 
How can I change the shuffle output location to HDFS or NFS?
 