subject:"Breaking the previous large-scale sort record with Spark"

Re: Breaking the previous large-scale sort record with Spark

2014-11-05 Thread Matei Zaharia

Congrats to everyone who helped make this happen. And if anyone has even more machines they'd like us to run on next year, let us know :). Matei > On Nov 5, 2014, at 3:11 PM, Reynold Xin wrote: > > Hi all, > > We are excited to announce that the benchmark entry has been reviewed by > the Sort

Re: Breaking the previous large-scale sort record with Spark

2014-11-05 Thread Reynold Xin

Hi all, We are excited to announce that the benchmark entry has been reviewed by the Sort Benchmark committee and Spark has officially won the Daytona GraySort contest in sorting 100TB of data. Our entry tied with a UCSD research team building high performance systems and we jointly set a new wor

Re: Breaking the previous large-scale sort record with Spark

2014-10-13 Thread Krishna Sankar

Well done guys. MapReduce sort at that time was a good feat and Spark now has raised the bar with the ability to sort a PB. Like some of the folks in the list, a summary of what worked (and didn't) as well as the monitoring practices would be good. Cheers P.S: What are you folks planning next ? O

Re: Breaking the previous large-scale sort record with Spark

2014-10-13 Thread Ilya Ganelin

Thank you for the details! Would you mind speaking to what tools proved most useful as far as identifying bottlenecks or bugs? Thanks again. On Oct 13, 2014 5:36 PM, "Matei Zaharia" wrote: > The biggest scaling issue was supporting a large number of reduce tasks > efficiently, which the JIRAs in

Re: Breaking the previous large-scale sort record with Spark

2014-10-13 Thread Matei Zaharia

The biggest scaling issue was supporting a large number of reduce tasks efficiently, which the JIRAs in that post handle. In particular, our current default shuffle (the hash-based one) has each map task open a separate file output stream for each reduce task, which wastes a lot of memory (since
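As a rough illustration of the point above, the sketch below estimates how much buffer memory one machine might spend when every map task keeps one open output stream per reduce task. It is only a back-of-the-envelope model: the function name, the per-stream buffer size, and the task counts are hypothetical, and the assumption that each open stream holds a fixed-size in-memory buffer is not confirmed by the truncated message.

```python
def hash_shuffle_buffer_bytes(reduce_tasks: int,
                              concurrent_map_tasks: int,
                              buffer_bytes_per_stream: int) -> int:
    """Estimate buffer memory on one machine under a hash-style shuffle.

    Assumption (hypothetical model): each map task opens one output
    stream per reduce task, and each stream holds a fixed-size buffer.
    """
    # One open stream per (running map task, reduce task) pair.
    open_streams = concurrent_map_tasks * reduce_tasks
    return open_streams * buffer_bytes_per_stream


# Illustrative numbers only, not taken from the thread.
few_reducers = hash_shuffle_buffer_bytes(
    reduce_tasks=1_000, concurrent_map_tasks=4, buffer_bytes_per_stream=32 * 1024)
many_reducers = hash_shuffle_buffer_bytes(
    reduce_tasks=30_000, concurrent_map_tasks=4, buffer_bytes_per_stream=32 * 1024)

print(f"{few_reducers / 2**20:.0f} MiB with 1,000 reduce tasks")
print(f"{many_reducers / 2**20:.0f} MiB with 30,000 reduce tasks")
```

Under this model the cost grows linearly with the number of reduce tasks, which is why a very large reduce-side fan-out becomes the bottleneck.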

Re: Breaking the previous large-scale sort record with Spark

2014-10-10 Thread Ilya Ganelin

Hi Matei - I read your post with great interest. Could you possibly comment in more depth on some of the issues you guys saw when scaling up Spark and how you resolved them? I am interested specifically in Spark-related problems. I'm working on scaling up Spark to very large datasets and have been

Re: Breaking the previous large-scale sort record with Spark

2014-10-10 Thread Henry Saputra

Congrats to Reynold et al leading this effort! - Henry On Fri, Oct 10, 2014 at 7:54 AM, Matei Zaharia wrote: > Hi folks, > > I interrupt your regularly scheduled user / dev list to bring you some pretty > cool news for the project, which is that we've been able to use Spark to > break MapReduc

Re: Breaking the previous large-scale sort record with Spark

2014-10-10 Thread Steve Nunez

> Awesome news Matei ! > > Congratulations to the databricks team and all the community members... > > On Fri, Oct 10, 2014 at 7:54 AM, Matei Zaharia > wrote: >> Hi folks, >> >> I interrupt your regularly schedul

Re: Breaking the previous large-scale sort record with Spark

2014-10-10 Thread arthur.hk.c...@gmail.com

Wonderful !! On 11 Oct, 2014, at 12:00 am, Nan Zhu wrote: > Great! Congratulations! > > -- > Nan Zhu > On Friday, October 10, 2014 at 11:19 AM, Mridul Muralidharan wrote: > >> Brilliant stuff ! Congrats all :-) >> This is indeed really heartening news ! >> >> Regards, >> Mridul >> >> >> On

Re: Breaking the previous large-scale sort record with Spark

2014-10-10 Thread Nan Zhu

Great! Congratulations! -- Nan Zhu On Friday, October 10, 2014 at 11:19 AM, Mridul Muralidharan wrote: > Brilliant stuff ! Congrats all :-) > This is indeed really heartening news ! > > Regards, > Mridul > > > On Fri, Oct 10, 2014 at 8:24 PM, Matei Zaharia

Re: Breaking the previous large-scale sort record with Spark

2014-10-10 Thread Dinesh J. Weerakkody

Wow.. Cool.. Congratulations.. :) On Fri, Oct 10, 2014 at 8:51 PM, Ted Malaska wrote: > This is a bad deal, great job. > > On Fri, Oct 10, 2014 at 11:19 AM, Mridul Muralidharan > wrote: > > > Brilliant stuff ! Congrats all :-) > > This is indeed really heartening news ! > > > > Regards, > > Mri

Re: Breaking the previous large-scale sort record with Spark

2014-10-10 Thread Ted Malaska

This is a bad deal, great job. On Fri, Oct 10, 2014 at 11:19 AM, Mridul Muralidharan wrote: > Brilliant stuff ! Congrats all :-) > This is indeed really heartening news ! > > Regards, > Mridul > > > On Fri, Oct 10, 2014 at 8:24 PM, Matei Zaharia > wrote: > > Hi folks, > > > > I interrupt your r

Re: Breaking the previous large-scale sort record with Spark

2014-10-10 Thread Mridul Muralidharan

Brilliant stuff ! Congrats all :-) This is indeed really heartening news ! Regards, Mridul On Fri, Oct 10, 2014 at 8:24 PM, Matei Zaharia wrote: > Hi folks, > > I interrupt your regularly scheduled user / dev list to bring you some pretty > cool news for the project, which is that we've been a

Re: Breaking the previous large-scale sort record with Spark

2014-10-10 Thread Debasish Das

Awesome news Matei ! Congratulations to the databricks team and all the community members... On Fri, Oct 10, 2014 at 7:54 AM, Matei Zaharia wrote: > Hi folks, > > I interrupt your regularly scheduled user / dev list to bring you some > pretty cool news for the project, which is that we've been

Breaking the previous large-scale sort record with Spark

2014-10-10 Thread Matei Zaharia

Hi folks, I interrupt your regularly scheduled user / dev list to bring you some pretty cool news for the project, which is that we've been able to use Spark to break MapReduce's 100 TB and 1 PB sort records, sorting data 3x faster on 10x fewer nodes. There's a detailed writeup at http://datab


15 matches
