Hi, I've found a puzzling behavior in my Spark cluster but don't know why it
happens. First, I was testing Spark by submitting applications: counting
10 million strings/items (2 GB) took about 20 seconds on a cluster of 8 nodes
(8 cores per node), which is very poor performance for a parallel
programming framework. Today, I ran the same counting program from
spark-shell over much larger data, 100 million strings/items (20 GB), and it
completed all the tasks in just 10 seconds, which works out to roughly 20x
the per-item throughput of the first run. That performance is very good and
promising. Do you think the difference is due to my Spark settings?
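
For reference, the counting job is essentially the following (a minimal sketch of what runs in spark-shell; the input path is a placeholder, and `sc` is the SparkContext that spark-shell predefines):

```scala
// Minimal sketch of the counting job, assuming the input is a text file
// with one string per line; the HDFS path below is a placeholder.
val lines = sc.textFile("hdfs:///path/to/strings.txt")
val n = lines.count()   // action: triggers the distributed job, returns item count
println(s"counted $n items")
```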

Joe



--
View this message in context: 
http://apache-spark-user-list.1001560.n3.nabble.com/help-tp4648.html
Sent from the Apache Spark User List mailing list archive at Nabble.com.
