Re: The performance of group operation on SSD

2014-01-22 Thread Guillaume Pitel
Hi, I think intermediate results are stored in spark.local.dir (/tmp by default), not in HDFS. So if you see high latency in a shuffle-heavy operation, it's probably due to the device where /tmp resides. You can run iotop on your slaves to check.
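A small check can confirm which mount actually backs the shuffle directory on a slave. This is a minimal sketch that assumes a Linux/POSIX host. It walks up from the directory until `os.path.ismount` is true. `/tmp` stands in for whatever `spark.local.dir` is set to, and the function name `mount_point` is illustrative only.

```python
import os


def mount_point(path: str) -> str:
    """Walk up from `path` until reaching the mount point that contains it."""
    current = os.path.realpath(path)
    while not os.path.ismount(current):
        parent = os.path.dirname(current)
        if parent == current:  # reached the filesystem root
            break
        current = parent
    return current


if __name__ == "__main__":
    # "/tmp" is Spark's default spark.local.dir; substitute your configured value.
    local_dir = "/tmp"
    print(f"{local_dir} lives on the filesystem mounted at {mount_point(local_dir)}")
```

If the printed mount is the root volume or an ephemeral disk rather than the SSD, the next thing to try would be pointing `spark.local.dir` at a directory on the SSD mount. The exact path depends on your setup.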

Re: The performance of group operation on SSD

2014-01-22 Thread Chen Jin
EC2 doesn't mount SSDs automatically, so I actually reformatted them as ext4, mounted them, and pointed HDFS to them. I haven't tweaked any FS options and I'm not sure what causes the problem. I will follow Christopher's suggestion and see whether I can find the cause. -chen On Fri, Jan 17, 2014 at 11:22 AM,

Re: The performance of group operation on SSD

2014-01-17 Thread Christopher Nguyen
Chen, I would also look at the actual I/O patterns of the operations. SSD write performance can vary significantly depending on the exact scenario, and SSDs can easily underperform HDDs under the "right" conditions. Generically quoted IOPS numbers are not reliable across a variety of commo
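One rough way to see the actual I/O pattern is to diff two snapshots of `/proc/diskstats` for the SSD while the group operation runs. This complements iotop. The sketch below assumes Linux, where `/proc/diskstats` counts 512-byte sectors. All function names are illustrative, and the device name passed to `sample` is up to you. High IOPS combined with a small average request size points to many small, scattered writes. That pattern may be worth a closer look when an SSD underperforms.

```python
import time

SECTOR_BYTES = 512  # /proc/diskstats always counts 512-byte sectors


def parse_diskstats(text):
    """Map device name -> (reads, sectors_read, writes, sectors_written)."""
    stats = {}
    for line in text.splitlines():
        fields = line.split()
        if len(fields) < 14:  # skip blank or old-style short lines
            continue
        stats[fields[2]] = (int(fields[3]), int(fields[5]),
                            int(fields[7]), int(fields[9]))
    return stats


def io_pattern(before, after, seconds):
    """Summarise one device's I/O between two snapshots taken `seconds` apart."""
    d_reads, d_rsec = after[0] - before[0], after[1] - before[1]
    d_writes, d_wsec = after[2] - before[2], after[3] - before[3]

    def avg_kb(sectors, ops):
        # Average request size in KiB; small values suggest small, scattered I/Os.
        return sectors * SECTOR_BYTES / 1024 / ops if ops else 0.0

    return {
        "read_iops": d_reads / seconds,
        "write_iops": d_writes / seconds,
        "avg_read_kb": avg_kb(d_rsec, d_reads),
        "avg_write_kb": avg_kb(d_wsec, d_writes),
    }


def sample(device, seconds=5.0):
    """Snapshot /proc/diskstats twice and summarise `device` (Linux only)."""
    with open("/proc/diskstats") as f:
        before = parse_diskstats(f.read())[device]
    time.sleep(seconds)
    with open("/proc/diskstats") as f:
        after = parse_diskstats(f.read())[device]
    return io_pattern(before, after, seconds)
```

For example, call `sample("xvdb", 5.0)` while the job runs. Here `xvdb` is a hypothetical device name, so substitute the name `/proc/diskstats` lists for your SSD.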

Re: The performance of group operation on SSD

2014-01-17 Thread Kk.gmail
It would be good to support both backward and forward versions. > On Jan 17, 2014, at 11:23 AM, Matei Zaharia wrote: > > Also I should say with this that 1.0 will stay on Scala 2.10, and more > generally I think we want to keep having releases for Scala 2.10 at least for > this year. It should

Re: The performance of group operation on SSD

2014-01-17 Thread Matei Zaharia
Also I should say with this that 1.0 will stay on Scala 2.10, and more generally I think we want to keep having releases for Scala 2.10 at least for this year. It should be easier to cross-build future releases for both 2.10 and 2.11 than it was with the 2.9 -> 2.10 jump. Matei On Jan 17, 2014

Re: The performance of group operation on SSD

2014-01-17 Thread Matei Zaharia
What file system do you have? One thing we’ve observed is that ext3, which is the default on ephemeral disks on EC2, scales very poorly to multicore workloads. We recommend reformatting those as XFS (which is very fast to format) or ext4 (which unfortunately takes a few hours to finalize). Maybe
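Matei's question can be answered on each node by reading the filesystem type of the directory in use from `/proc/mounts`. This is a minimal sketch that assumes Linux, and the function names are illustrative. `lookup_fs` picks the longest mount point that is a prefix of the path. If a mount point appears twice, the later (top) entry wins. `fs_type_of` is a convenience wrapper that resolves the path and reads the live mount table. Mount points containing spaces appear octal-escaped (for example `\040`) in `/proc/mounts`, and this sketch does not handle them.

```python
import os


def lookup_fs(path, mounts_text):
    """Return (mount_point, fs_type) for an absolute, already-resolved `path`."""
    best_mnt, best_type = "", ""
    for line in mounts_text.splitlines():
        fields = line.split()
        if len(fields) < 3:
            continue
        mnt, fstype = fields[1], fields[2]
        prefix = mnt.rstrip("/") + "/"  # "/" stays "/", "/mnt" becomes "/mnt/"
        covers = path == mnt or path.startswith(prefix)
        # Longest matching mount point wins; ties go to the later (top) entry.
        if covers and len(mnt) >= len(best_mnt):
            best_mnt, best_type = mnt, fstype
    return best_mnt, best_type


def fs_type_of(path):
    """Resolve `path` and consult this machine's mount table (Linux only)."""
    with open("/proc/mounts") as f:
        return lookup_fs(os.path.realpath(path), f.read())
```

If this reports `ext3` for the directory holding shuffle or HDFS data, that matches the case Matei describes, and reformatting as XFS or ext4 is what he suggests.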

Re: The performance of group operation on SSD

2014-01-17 Thread Andrew Ash
Are there different amounts of RAM on the SSD machines vs. the spinning-disk machines? On Jan 17, 2014 5:22 AM, "Jay" wrote: > OS memory cache?? > > On Jan 16, 2014, at 6:04 AM, Chen Jin wrote: > > > > Dear Spark developers: > > > > We are benchmarking Spark
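The RAM and page-cache questions can be checked quickly by comparing the SSD and spinning-disk machines on total memory and on the memory currently used for the page cache. Both figures appear in `/proc/meminfo`. This is a sketch assuming Linux. `MemTotal` and `Cached` are standard `/proc/meminfo` fields reported in kB, and the function names are illustrative only.

```python
KB_PER_GIB = 1024 * 1024


def meminfo_kb(text):
    """Parse /proc/meminfo lines like 'MemTotal:  <n> kB' into {field: n}."""
    info = {}
    for line in text.splitlines():
        key, sep, rest = line.partition(":")
        parts = rest.split()
        if sep and parts and parts[0].isdigit():
            info[key.strip()] = int(parts[0])
    return info


def memory_summary(text):
    """Total RAM and page-cache size in GiB (both fields are reported in kB)."""
    info = meminfo_kb(text)
    return {
        "total_gib": info["MemTotal"] / KB_PER_GIB,
        "page_cache_gib": info.get("Cached", 0) / KB_PER_GIB,
    }


def local_memory_summary():
    """Read this machine's /proc/meminfo (Linux only)."""
    with open("/proc/meminfo") as f:
        return memory_summary(f.read())
```

Suppose one set of machines has noticeably more memory. More of the intermediate data could then be served from the OS cache rather than from disk, which might explain part of the difference independently of the device.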

Re: The performance of group operation on SSD

2014-01-17 Thread Jay
OS memory cache?? > On Jan 16, 2014, at 6:04 AM, Chen Jin wrote: > > Dear Spark developers: > > We are benchmarking Spark operations such as filter, group, and join on > the SSD instance i2.2xlarge on EC2. Most operations perform similarly to or > slightly better than on ephemeral disks on EC2; however, th