My detailed test process:
1. During initialization, the application creates 100 string RDDs and
distributes them across the Spark workers.
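for (int i = 1; i <= numOfRDDs; i++) {
    // One single-element RDD per key, coalesced into a single partition
    JavaRDD<String> rddData =
        sc.parallelize(Arrays.asList(Integer.toString(i))).coalesce(1);
    rddData.cache().count();  // count() is an action, so it materializes the cache
    simpleRDDs.put(Integer.toString(i), rddData);
}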
2. In JMeter, I configure 100 threads that each loop 100 times. Each thread
sends a GET request that uses its thread number as the RDDId:
3. The function below simply returns the string held in the RDD. Note that
the dictionary simpleRDDs has already been initialized with the 100 RDDs.
public static String simpleRDDTest(String keyOfRDD) {
    JavaRDD<String> rddData = simpleRDDs.get(keyOfRDD);
    return rddData.first();  // first() is an action, so every call runs a Spark job
}
4. I tested three cases with different numbers of workers:
For each case I ran the test several times until the throughput was stable.
The throughput in all three cases varies between 85 and 95 requests/sec. There
is no significant difference between the different worker counts.
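As a rough sanity check on these numbers, Little's law (concurrency = throughput x latency) gives the implied time per request. This is only back-of-the-envelope arithmetic and assumes all 100 JMeter threads are kept busy with no think time between requests; the class and method names are made up for illustration. At 85 to 95 requests/sec with 100 concurrent threads, it works out to a bit over one second per request.

```java
public class LatencyEstimate {
    // Little's law rearranged: latency = concurrency / throughput.
    // Assumes every thread always has exactly one request in flight.
    static double meanLatencySeconds(int concurrentThreads, double requestsPerSecond) {
        return concurrentThreads / requestsPerSecond;
    }

    public static void main(String[] args) {
        int threads = 100; // JMeter thread count from step 2
        for (double throughput : new double[] {85.0, 95.0}) {
            System.out.printf("%.0f req/sec -> about %.2f s per request%n",
                    throughput, meanLatencySeconds(threads, throughput));
        }
    }
}
```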
5. I think this result means that, even when there is no calculation, the
throughput is limited by Spark job initialization and dispatch, and that
adding more workers cannot improve it. Can anyone explain this?
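To make the reasoning in step 5 concrete, here is a toy model, not a measurement and not a description of how Spark actually schedules jobs. It assumes every job pays a fixed dispatch cost that is serialized in one place, while the task itself can run in parallel on the workers. The 11 ms dispatch cost, the 1 ms task time, and the worker counts 1, 2 and 4 are all hypothetical values picked for illustration.

```java
public class DispatchBoundModel {
    // Toy model: throughput is capped by whichever is slower, the serialized
    // per-job dispatch step or the combined capacity of the workers.
    static double maxThroughputPerSecond(double dispatchMillis, double taskMillis, int workers) {
        double dispatchLimit = 1000.0 / dispatchMillis;
        double workerLimit = taskMillis <= 0
                ? Double.POSITIVE_INFINITY
                : workers * 1000.0 / taskMillis;
        return Math.min(dispatchLimit, workerLimit);
    }

    public static void main(String[] args) {
        double dispatchMillis = 11.0; // hypothetical fixed cost per job
        double taskMillis = 1.0;      // hypothetical, near-zero work per task
        for (int workers : new int[] {1, 2, 4}) { // hypothetical worker counts
            System.out.printf("%d worker(s): at most %.1f jobs/sec%n",
                    workers, maxThroughputPerSecond(dispatchMillis, taskMillis, workers));
        }
    }
}
```

In this model the cap is the same for every worker count as long as the task time is small compared with the dispatch cost, which is the pattern I see in the test. Whether the real overhead is serialized like this is exactly what I am asking about.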