Hi all,

We are a small team doing research on low-power (and low-cost) ARM clusters. We built a 20-node ARM cluster that is able to run Hadoop. But as you all know, Hadoop performs a lot of on-disk operations, so it's not a good fit for resource-constrained ARM machines.
We then switched to Spark and have to say: wow!! Spark on HDFS let us crunch the 2012 Wikipedia articles, 34 GB in size, in 1h50m. We have identified the bottleneck, and it's our 100 Mbps network.

Here's the cluster: https://dl.dropboxusercontent.com/u/381580/aiyara_cluster/Mk-I_SSD.png

And this is what we got from the Spark shell: https://dl.dropboxusercontent.com/u/381580/aiyara_cluster/result_00.png

I think it's the first ARM cluster that can process a non-trivial amount of Big Data. (Please correct me if I'm wrong.) I really want to thank the Spark team for making this possible!!

Best regards,

-chanwit

--
Chanwit Kaewkasi
linkedin.com/in/chanwit
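
P.S. For anyone who wants to try something similar, below is a minimal word-count-style job for the Spark shell against a text dump stored on HDFS. The HDFS path and the job itself are illustrative sketches, not our exact workload:

  // In the Spark shell, the SparkContext `sc` is already defined.
  // The path below is illustrative -- adjust it to your HDFS layout.
  val articles = sc.textFile("hdfs://master:9000/wikipedia/articles-2012.txt")

  // Classic word count: split lines into words, then count occurrences.
  val counts = articles
    .flatMap(line => line.split("\\s+"))
    .map(word => (word, 1))
    .reduceByKey(_ + _)

  // Pull a small sample of results back to the driver for inspection.
  counts.take(10).foreach(println)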