Re: Breaking the previous large-scale sort record with Spark

2014-11-05 Thread Reynold Xin
Hi all, We are excited to announce that the benchmark entry has been reviewed by the Sort Benchmark committee and Spark has officially won the Daytona GraySort contest in sorting 100TB of data. Our entry tied with a UCSD research team building high performance systems and we jointly set a new

Re: Breaking the previous large-scale sort record with Spark

2014-11-05 Thread Matei Zaharia
Congrats to everyone who helped make this happen. And if anyone has even more machines they'd like us to run on next year, let us know :). Matei On Nov 5, 2014, at 3:11 PM, Reynold Xin r...@databricks.com wrote: Hi all, We are excited to announce that the benchmark entry has been

Re: Breaking the previous large-scale sort record with Spark

2014-10-13 Thread Matei Zaharia
The biggest scaling issue was supporting a large number of reduce tasks efficiently, which the JIRAs in that post handle. In particular, our current default shuffle (the hash-based one) has each map task open a separate file output stream for each reduce task, which wastes a lot of memory
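
As a rough illustration of the memory cost described above, here is a minimal back-of-the-envelope sketch. It is not Spark code: the function names, the task counts, and the 32 KB per-stream buffer are all assumptions chosen for illustration. It only encodes the point in the message that each running map task keeps one open output stream per reduce task, and compares that with a hypothetical design in which each map task writes through a single stream.

```python
def hash_shuffle_buffer_bytes(concurrent_map_tasks, reduce_tasks, buffer_bytes_per_stream):
    """Estimate buffer memory on one node when every running map task
    holds one open output stream per reduce task (hypothetical helper)."""
    open_streams = concurrent_map_tasks * reduce_tasks
    return open_streams * buffer_bytes_per_stream


def single_stream_buffer_bytes(concurrent_map_tasks, buffer_bytes_per_stream):
    """Same estimate for a hypothetical design where each running map task
    writes through a single output stream, regardless of reduce count."""
    return concurrent_map_tasks * buffer_bytes_per_stream


if __name__ == "__main__":
    # Illustrative numbers only, not measurements from the benchmark run.
    maps_running = 16          # assumed map tasks running at once on a node
    reducers = 10_000          # assumed number of reduce tasks
    buf = 32 * 1024            # assumed 32 KB buffer per open stream

    per_reducer = hash_shuffle_buffer_bytes(maps_running, reducers, buf)
    single = single_stream_buffer_bytes(maps_running, buf)
    print(f"one stream per reduce task: {per_reducer / 2**30:.2f} GiB of buffers")
    print(f"one stream per map task:    {single / 2**10:.0f} KiB of buffers")
```

The estimate grows linearly with the number of reduce tasks in the first case and stays constant in the second, which is why a large reduce count is the pressure point.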

Re: Breaking the previous large-scale sort record with Spark

2014-10-13 Thread Ilya Ganelin
Thank you for the details! Would you mind speaking to what tools proved most useful as far as identifying bottlenecks or bugs? Thanks again. On Oct 13, 2014 5:36 PM, Matei Zaharia matei.zaha...@gmail.com wrote: The biggest scaling issue was supporting a large number of reduce tasks efficiently,

Re: Breaking the previous large-scale sort record with Spark

2014-10-13 Thread Krishna Sankar
Well done, guys. MapReduce sort at that time was a good feat, and Spark has now raised the bar with the ability to sort a PB. Like some of the folks on the list, I think a summary of what worked (and didn't), as well as the monitoring practices, would be good. Cheers k/ P.S.: What are you folks planning next?

Re: Breaking the previous large-scale sort record with Spark

2014-10-11 Thread Henry Saputra
Congrats to Reynold et al leading this effort! - Henry On Fri, Oct 10, 2014 at 7:54 AM, Matei Zaharia matei.zaha...@gmail.com wrote: Hi folks, I interrupt your regularly scheduled user / dev list to bring you some pretty cool news for the project, which is that we've been able to use Spark

Re: Breaking the previous large-scale sort record with Spark

2014-10-10 Thread Mridul Muralidharan
Brilliant stuff ! Congrats all :-) This is indeed really heartening news ! Regards, Mridul On Fri, Oct 10, 2014 at 8:24 PM, Matei Zaharia matei.zaha...@gmail.com wrote: Hi folks, I interrupt your regularly scheduled user / dev list to bring you some pretty cool news for the project, which

Re: Breaking the previous large-scale sort record with Spark

2014-10-10 Thread Dinesh J. Weerakkody
Wow.. Cool.. Congratulations.. :) On Fri, Oct 10, 2014 at 8:51 PM, Ted Malaska ted.mala...@cloudera.com wrote: This is a big deal, great job. On Fri, Oct 10, 2014 at 11:19 AM, Mridul Muralidharan mri...@gmail.com wrote: Brilliant stuff ! Congrats all :-) This is indeed really heartening

Re: Breaking the previous large-scale sort record with Spark

2014-10-10 Thread Nan Zhu
Great! Congratulations! -- Nan Zhu On Friday, October 10, 2014 at 11:19 AM, Mridul Muralidharan wrote: Brilliant stuff ! Congrats all :-) This is indeed really heartening news ! Regards, Mridul On Fri, Oct 10, 2014 at 8:24 PM, Matei Zaharia matei.zaha...@gmail.com

Re: Breaking the previous large-scale sort record with Spark

2014-10-10 Thread arthur.hk.c...@gmail.com
Wonderful !! On 11 Oct, 2014, at 12:00 am, Nan Zhu zhunanmcg...@gmail.com wrote: Great! Congratulations! -- Nan Zhu On Friday, October 10, 2014 at 11:19 AM, Mridul Muralidharan wrote: Brilliant stuff ! Congrats all :-) This is indeed really heartening news ! Regards, Mridul

Re: Breaking the previous large-scale sort record with Spark

2014-10-10 Thread Steve Nunez
Awesome news, Matei! Congratulations to the Databricks team and all the community members... On Fri, Oct 10, 2014 at 7:54 AM, Matei Zaharia matei.zaha