Hi all,
We are excited to announce that the benchmark entry has been reviewed by
the Sort Benchmark committee and Spark has officially won the Daytona
GraySort contest in sorting 100TB of data.
Our entry tied with a UCSD research team building high performance systems
and we jointly set a new
Congrats to everyone who helped make this happen. And if anyone has even more
machines they'd like us to run on next year, let us know :).
Matei
On Nov 5, 2014, at 3:11 PM, Reynold Xin r...@databricks.com wrote:
Hi all,
We are excited to announce that the benchmark entry has been
The biggest scaling issue was supporting a large number of reduce tasks
efficiently, which the JIRAs in that post handle. In particular, our current
default shuffle (the hash-based one) has each map task open a separate file
output stream for each reduce task, which wastes a lot of memory
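A rough back-of-the-envelope sketch of why that adds up, assuming each open output stream holds its own write buffer. The function name, the task counts, and the buffer size below are made-up illustrative values, not figures from the benchmark run or from Spark's defaults.

```python
def hash_shuffle_streams(concurrent_map_tasks, reduce_tasks, buffer_bytes):
    """Estimate open streams and buffer memory on one machine.

    Assumes every concurrently running map task keeps one open
    output stream (with its own buffer) per reduce task.
    """
    open_streams = concurrent_map_tasks * reduce_tasks
    return open_streams, open_streams * buffer_bytes


# Hypothetical numbers: 16 map tasks running at once on a machine,
# 10,000 reduce tasks, and a 32 KiB buffer per stream.
streams, memory = hash_shuffle_streams(16, 10_000, 32 * 1024)
print(f"{streams} open streams, ~{memory / 2**30:.1f} GiB of buffers")
```

The count grows with the product of the two task counts, so raising the number of reduce tasks raises memory use on every machine at once.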
Thank you for the details! Would you mind speaking to which tools proved
most useful for identifying bottlenecks or bugs? Thanks again.
On Oct 13, 2014 5:36 PM, Matei Zaharia matei.zaha...@gmail.com wrote:
The biggest scaling issue was supporting a large number of reduce tasks
efficiently,
Well done guys. MapReduce sort at that time was a good feat and Spark now
has raised the bar with the ability to sort a PB.
Like some of the folks on the list, I think a summary of what worked (and
didn't), as well as the monitoring practices, would be good.
Cheers
k/
P.S. What are you folks planning next?
Congrats to Reynold et al leading this effort!
- Henry
On Fri, Oct 10, 2014 at 7:54 AM, Matei Zaharia matei.zaha...@gmail.com wrote:
Hi folks,
I interrupt your regularly scheduled user / dev list to bring you some pretty
cool news for the project, which is that we've been able to use Spark
Brilliant stuff ! Congrats all :-)
This is indeed really heartening news !
Regards,
Mridul
On Fri, Oct 10, 2014 at 8:24 PM, Matei Zaharia matei.zaha...@gmail.com wrote:
Hi folks,
I interrupt your regularly scheduled user / dev list to bring you some pretty
cool news for the project, which
Wow.. Cool.. Congratulations.. :)
On Fri, Oct 10, 2014 at 8:51 PM, Ted Malaska ted.mala...@cloudera.com
wrote:
This is a big deal, great job.
Great! Congratulations!
--
Nan Zhu
On Friday, October 10, 2014 at 11:19 AM, Mridul Muralidharan wrote:
Brilliant stuff ! Congrats all :-)
This is indeed really heartening news !
Regards,
Mridul
On Fri, Oct 10, 2014 at 8:24 PM, Matei Zaharia matei.zaha...@gmail.com
Wonderful !!
On 11 Oct, 2014, at 12:00 am, Nan Zhu zhunanmcg...@gmail.com wrote:
Great! Congratulations!
--
Nan Zhu
Cc: user u...@spark.apache.org, dev dev@spark.apache.org
Subject: Re: Breaking the previous large-scale sort record with Spark
Awesome news Matei !
Congratulations to the databricks team and all the community members...
On Fri, Oct 10, 2014 at 7:54 AM, Matei Zaharia matei.zaha