Another simple throughput test is repartition()ing a large RDD. This also stresses the disks, though, so you might try mounting Spark's temporary directory as a ramfs.
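For illustration, a minimal sketch of that setup on a Linux box (the mount point, 16g size, and RDD dimensions are illustrative assumptions; tmpfs is used here as a size-capped stand-in for ramfs):

```shell
# Back Spark's scratch space with RAM
# (tmpfs enforces a size cap; plain ramfs does not)
sudo mkdir -p /mnt/spark-ram
sudo mount -t tmpfs -o size=16g tmpfs /mnt/spark-ram

# Point spark.local.dir (shuffle/spill files) at the RAM-backed directory
spark-shell --conf spark.local.dir=/mnt/spark-ram
```

Inside the shell, a repartition() throughput test could then look something like `sc.parallelize(1L to 500000000L, 100).repartition(400).count()`, which forces a full shuffle of the RDD across the network.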
On Fri, Jun 27, 2014 at 5:57 PM, danilopds <danilob...@gmail.com> wrote:
> Hi,
> According to the research paper below by Matei Zaharia, Spark's creator,
> http://people.csail.mit.edu/matei/papers/2013/sosp_spark_streaming.pdf
>
> he says on page 10 that:
> "Grep is network-bound due to the cost to replicate the input data to
> multiple nodes."
>
> So I guess it can be a good initial recommendation.
> But I would like to know about other workloads too.
> Best Regards.
>
> --
> View this message in context:
> http://apache-spark-user-list.1001560.n3.nabble.com/Interconnect-benchmarking-tp8467p8470.html
> Sent from the Apache Spark User List mailing list archive at Nabble.com.