Hi,
I think intermediate results are stored in spark.local.dir (/tmp
by default), not in HDFS.
So if you see high latency in a shuffle-heavy operation, it's
probably due to the device where /tmp resides. You can run iotop on
your slaves to check.
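A minimal sketch for finding which device to watch with iotop: it resolves each directory in a spark.local.dir-style value (assumed comma-separated, falling back to /tmp as described above) to the mount point that holds it. The helpers `local_dirs` and `mount_point_of` are hypothetical; they only inspect the local filesystem and do not read Spark's actual configuration.

```python
import os
import sys


def local_dirs(value=None):
    """Split a spark.local.dir-style value into directory paths.

    Hypothetical helper: assumes a comma-separated list and falls back to
    /tmp (the default mentioned in the thread) when no value is given.
    """
    raw = value if value else "/tmp"
    return [d.strip() for d in raw.split(",") if d.strip()]


def mount_point_of(path):
    """Walk up from `path` to the mount point of the filesystem holding it."""
    path = os.path.realpath(path)
    while not os.path.ismount(path):
        parent = os.path.dirname(path)
        if parent == path:  # reached the root; stop walking
            break
        path = parent
    return path


if __name__ == "__main__":
    # Pass the spark.local.dir value as the first argument, e.g. "/mnt/ssd1,/mnt/ssd2".
    value = sys.argv[1] if len(sys.argv) > 1 else None
    for d in local_dirs(value):
        print(f"{d} is on the filesystem mounted at {mount_point_of(d)}")
```

Running iotop on that slave during a shuffle-heavy job then shows whether that particular device is busy.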
EC2 doesn't mount SSDs automatically, so I actually reformat those as
ext4, mount them, and point HDFS to them. I haven't tweaked any FS
options and I'm not sure what causes the problem. I will follow
Christopher's suggestion and see whether I can find the cause.
-chen
On Fri, Jan 17, 2014 at 11:22 AM,
Chen, I would also look at the actual I/O patterns of the operations. SSD
write performance can vary significantly depending on the
exact scenario, and SSDs can easily underperform HDDs given the "right"
conditions. Generically quoted IOPS numbers are not reliable across a
variety of commo
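To look at actual I/O patterns rather than quoted IOPS, a rough sketch like the following compares sequential large-block writes with random small-block writes in a chosen directory. The function name, the 256 MiB test size, and the block sizes are illustrative assumptions. Writes pass through the OS page cache and are flushed once with fsync, so treat the numbers as indicative only.

```python
import os
import random
import tempfile
import time


def write_throughput(directory, total_bytes, block_size, random_offsets):
    """Write `total_bytes` to a temp file in `directory` and return MiB/s.

    Sequential mode writes blocks back to back; random mode writes the same
    blocks at shuffled offsets. One fsync at the end pushes the data to the
    device, but the writes still pass through the OS page cache.
    """
    n_blocks = total_bytes // block_size
    offsets = [i * block_size for i in range(n_blocks)]
    if random_offsets:
        random.shuffle(offsets)
    payload = os.urandom(block_size)
    fd, path = tempfile.mkstemp(dir=directory)
    try:
        start = time.perf_counter()
        for off in offsets:
            os.pwrite(fd, payload, off)
        os.fsync(fd)
        elapsed = time.perf_counter() - start
    finally:
        os.close(fd)
        os.remove(path)
    return n_blocks * block_size / (1024 * 1024) / max(elapsed, 1e-9)


if __name__ == "__main__":
    target = "/tmp"  # assumption: point this at the SSD mount under test
    size = 256 * 1024 * 1024  # arbitrary 256 MiB test size
    for label, bs, rnd in [("sequential 1 MiB", 1 << 20, False),
                           ("random 4 KiB", 4096, True)]:
        print(f"{label}: {write_throughput(target, size, bs, rnd):.1f} MiB/s")
```

For more trustworthy numbers, a dedicated tool such as fio with direct I/O bypasses the page cache entirely.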
This would be good to support backward and forward versions.
> On Jan 17, 2014, at 11:23 AM, Matei Zaharia wrote:
Also I should say with this that 1.0 will stay on Scala 2.10, and more
generally I think we want to keep having releases for Scala 2.10 at least for
this year. It should be easier to cross-build future releases for both 2.10 and
2.11 than it was with the 2.9 -> 2.10 jump.
Matei
On Jan 17, 2014
What file system do you have? One thing we’ve observed is that ext3, which is
the default on ephemeral disks on EC2, scales very poorly to multicore
workloads. We recommend reformatting those as XFS (which is very fast to
format) or ext4 (which unfortunately takes a few hours to finalize). Maybe
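A quick way to answer "what file system do you have?" for a given directory is to look it up in /proc/mounts. This is a minimal sketch assuming Linux and the standard /proc/mounts layout (device, mount point, fstype, options, ...); the helper `fstype_for` is hypothetical, and octal escapes for spaces in mount paths are not decoded.

```python
import os
import sys


def fstype_for(path, mounts_text):
    """Return (mount_point, fstype) for the longest mount point covering `path`.

    `mounts_text` uses the /proc/mounts layout: device, mount point, fstype,
    options, ...
    """
    best = ("", None)
    for line in mounts_text.splitlines():
        fields = line.split()
        if len(fields) < 3:
            continue
        mnt, fstype = fields[1], fields[2]
        prefix = mnt.rstrip("/") + "/"
        covers = path == mnt or path.startswith(prefix)
        if covers and len(mnt) > len(best[0]):
            best = (mnt, fstype)
    return best


if __name__ == "__main__":
    with open("/proc/mounts") as f:
        text = f.read()
    for d in sys.argv[1:] or ["/tmp"]:
        mnt, fs = fstype_for(os.path.realpath(d), text)
        hint = "  (ext3: consider reformatting as XFS or ext4)" if fs == "ext3" else ""
        print(f"{d}: {fs} on {mnt}{hint}")
```

Matching the longest covering mount point matters because an SSD mounted at, say, /mnt/ssd1 would otherwise be reported as the ext3 ephemeral disk at /mnt.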
Are there different amounts of RAM on the SSD machines vs. the spinning-disk
machines?
Sent from my mobile phone
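One way to compare RAM across the SSD and spinning-disk slaves, and to see how much data the OS page cache is holding (which can hide disk differences), is to read two fields from /proc/meminfo on each machine. A minimal sketch, assuming Linux, where MemTotal and Cached are reported in kB; the helper `meminfo_gib` is hypothetical.

```python
def meminfo_gib(text, keys=("MemTotal", "Cached")):
    """Pick fields out of /proc/meminfo-style text and convert kB to GiB."""
    out = {}
    for line in text.splitlines():
        name, _, rest = line.partition(":")
        if name.strip() in keys and rest.split():
            out[name.strip()] = int(rest.split()[0]) / (1024 * 1024)
    return out


if __name__ == "__main__":
    with open("/proc/meminfo") as f:
        info = meminfo_gib(f.read())
    print(", ".join(f"{k}: {v:.1f} GiB" for k, v in info.items()))
```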
On Jan 17, 2014 5:22 AM, "Jay" wrote:
OS memory cache??
> On Jan 16, 2014, at 6:04 AM, Chen Jin wrote:
>
> Dear Spark developers:
>
> We are benchmarking Spark operations such as filter, group, and join on
> the SSD instance type i2.2xlarge on EC2. Most operations are similar to or
> slightly better than on ephemeral disks on EC2; however, th