Hi all,

I implemented a transformation on HDFS files with Spark. I first tested it in
spark-shell (on YARN), then implemented essentially the same logic as a Spark
program (Scala), built a jar file, and used spark-submit to execute it on my
YARN cluster. The weird thing is that the spark-submit approach is almost 3x
as slow (500s vs 1500s). I am curious why...

I am essentially writing a benchmarking program to test the performance of
Spark in various settings, so my Spark program has a Benchmark abstract
class, a trait for some common things, and a concrete class that performs one
specific benchmark. My Spark main creates an instance of my benchmark class
and calls something like benchmark1.run(), which in turn kicks off the Spark
context, performs the data manipulation, etc. I wonder if such constructs
introduce some overhead compared to running the manipulation commands
directly in spark-shell.
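For reference, the structure is roughly like the following sketch (all names here are made up for illustration; the actual Spark job is elided and only indicated in comments):

```scala
// Hypothetical sketch of the benchmark structure described above.
// Names (BenchmarkUtils, HdfsTransformBenchmark, etc.) are assumptions,
// not the actual class names in my program.

// Trait for some common things, e.g. timing a block of code.
trait BenchmarkUtils {
  def timed[T](body: => T): (T, Long) = {
    val start = System.nanoTime()
    val result = body
    (result, (System.nanoTime() - start) / 1000000) // elapsed millis
  }
}

// Abstract base class for all benchmarks.
abstract class Benchmark extends BenchmarkUtils {
  def name: String
  def run(): Unit
}

// Concrete class performing one specific benchmark.
class HdfsTransformBenchmark extends Benchmark {
  def name: String = "hdfs-transform"
  def run(): Unit = {
    // In the real program this is where the SparkContext is created and
    // the actual HDFS transformation runs, e.g.:
    //   val sc = new SparkContext(new SparkConf().setAppName(name))
    //   sc.textFile(inputPath).map(transform).saveAsTextFile(outputPath)
    println(s"running $name")
  }
}

// Spark main: create an instance of the benchmark class and run it.
object BenchmarkMain {
  def main(args: Array[String]): Unit = {
    val benchmark1: Benchmark = new HdfsTransformBenchmark
    val (_, millis) = benchmark1.timed { benchmark1.run() }
    println(s"${benchmark1.name} took ${millis}ms")
  }
}
```

This object/class wrapping itself should be negligible overhead; the actual work happens inside run().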

Thanks.
-Simon