Re: Can we redirect Spark shuffle spill data to HDFS or Alluxio?

2016-08-24 Thread Sun Rui

Re: Re: Can we redirect Spark shuffle spill data to HDFS or Alluxio?

2016-08-24 Thread Saisai Shao

Re: Re: Can we redirect Spark shuffle spill data to HDFS or Alluxio?

2016-08-24 Thread tony....@tendcloud.com
From: Sun Rui
Date: 2016-08-24 22:17
To: Saisai Shao
CC: tony@tendcloud.com; user
Subject: Re: Can we redirect Spark shuffle spill data to HDFS or Alluxio?

Yes, I also tried FUSE before, it is not stable and I don’t recommend it

On Aug 24, 2016, at 22:15, Saisai Shao

Re: Can we redirect Spark shuffle spill data to HDFS or Alluxio?

2016-08-24 Thread Sun Rui
Yes, I also tried FUSE before; it is not stable and I don’t recommend it.

> On Aug 24, 2016, at 22:15, Saisai Shao wrote:
>
> Also fuse is another candidate (https://wiki.apache.org/hadoop/MountableHDFS), but not so stable

Re: Can we redirect Spark shuffle spill data to HDFS or Alluxio?

2016-08-24 Thread Saisai Shao
Also, FUSE is another candidate (https://wiki.apache.org/hadoop/MountableHDFS), but it was not very stable when I tried it before.

On Wed, Aug 24, 2016 at 10:09 PM, Sun Rui wrote:
> For HDFS, maybe you can try mount HDFS as NFS. But not sure about the
> stability, and also there is

Re: Can we redirect Spark shuffle spill data to HDFS or Alluxio?

2016-08-24 Thread Sun Rui
For HDFS, maybe you can try mounting HDFS as NFS. But I am not sure about the stability, and there is also the additional overhead of network I/O and of HDFS file replication.

> On Aug 24, 2016, at 21:02, Saisai Shao wrote:
>
> Spark Shuffle uses Java File related API to create local

Re: Can we redirect Spark shuffle spill data to HDFS or Alluxio?

2016-08-24 Thread Saisai Shao
Spark shuffle uses the Java File API to create local dirs and read/write data, so it only works with file systems supported by the OS. It doesn't use the Hadoop FileSystem API, so writing to a Hadoop-compatible FS does not work. Also, it is not suitable to write temporary shuffle data into a distributed FS, this
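
To illustrate the point about the Java File API, here is a minimal sketch. It is not Spark's actual shuffle code; the class `LocalDirSketch`, the helper names, the `blockmgr-sketch` directory, and the `hdfs://namenode:8020/tmp` address are all made up for illustration. It only shows that `java.io.File` writes work against a path the OS can see, and that a URI-style string such as `hdfs://...` is treated as an ordinary (relative) local path name rather than as a distributed file system.

```java
import java.io.File;
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;

public class LocalDirSketch {

    // Hypothetical helper: create a scratch dir under `root` and write one
    // spill-like file into it, using only local-file APIs.
    static File writeSpill(String root, byte[] data) throws IOException {
        File dir = new File(root, "blockmgr-sketch"); // made-up directory name
        if (!dir.isDirectory() && !dir.mkdirs()) {
            throw new IOException("cannot create local dir " + dir);
        }
        File spill = File.createTempFile("spill", ".data", dir);
        Files.write(spill.toPath(), data);
        return spill;
    }

    // java.io.File has no notion of URI schemes: a string like "hdfs://..."
    // is parsed as a plain path name, and it is not an absolute local path.
    static boolean isLocalAbsolutePath(String root) {
        return new File(root).isAbsolute();
    }

    public static void main(String[] args) throws IOException {
        // Works: the JVM temp dir is a directory on an OS-mounted file system.
        String localRoot = System.getProperty("java.io.tmpdir");
        File spill = writeSpill(localRoot, "shuffle bytes".getBytes(StandardCharsets.UTF_8));
        System.out.println("wrote " + spill.length() + " bytes to " + spill);
        spill.delete();

        // Assumed example address; File sees it as a relative local path.
        System.out.println(isLocalAbsolutePath("hdfs://namenode:8020/tmp"));
    }
}
```

Under this view, the NFS or FUSE suggestions in the thread work only because the mount makes HDFS look like an OS-level directory, which could then be listed in `spark.local.dir`; the stability and network-overhead caveats raised above still apply.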