Hi all,

We are a small team doing research on low-power (and low-cost) ARM clusters. We built a 20-node ARM cluster that is able to run Hadoop. But as you all know, Hadoop performs a lot of on-disk operations, so it's not a good fit for resource-constrained ARM machines.
We then switched to Spark and have to say: wow!! Spark on HDFS let us crunch the 2012 Wikipedia articles, 34 GB in size, in 1h50m. We have identified the bottleneck, and it's our 100 Mbps network.

Here's the cluster: https://dl.dropboxusercontent.com/u/381580/aiyara_cluster/Mk-I_SSD.png

And this is what we got from the Spark shell: https://dl.dropboxusercontent.com/u/381580/aiyara_cluster/result_00.png

I think it's the first ARM cluster that can process a non-trivial amount of Big Data. (Please correct me if I'm wrong.) I really want to thank the Spark team for making this possible!!

Best regards,

-chanwit

--
Chanwit Kaewkasi
linkedin.com/in/chanwit
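
P.S. For anyone who wants to try something similar, below is a minimal word-count-style job for the Spark shell against a text dump stored on HDFS. The HDFS path and the job itself are illustrative sketches, not our exact workload:

  // In the Spark shell, the SparkContext `sc` is already defined.
  // The path below is illustrative -- adjust it to your HDFS layout.
  val articles = sc.textFile("hdfs://master:9000/wikipedia/articles-2012.txt")

  // Classic word count: split lines into words, then count occurrences.
  val counts = articles
    .flatMap(line => line.split("\\s+"))
    .map(word => (word, 1))
    .reduceByKey(_ + _)

  // Pull a small sample of results back to the driver for inspection.
  counts.take(10).foreach(println)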