You can use the UI to debug what is going on. For instance, are the tasks themselves taking longer, or is the job acquiring fewer executors overall? Another thing that influences this: when you run a standalone job, the measured time includes starting up all of the executors. When you launch the shell, you might be ignoring that time because you only start counting once the shell is already running (this depends on how you are measuring the time).
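One way to make the two numbers comparable is to start the clock in the job only after the SparkContext is up, so executor startup is excluded. A rough sketch (the object name and the sleep are just illustrative; the paths and master URL are taken from your output below, and executor registration is asynchronous, so the first run can still include some ramp-up):

  import org.apache.spark.SparkContext
  import org.apache.spark.SparkContext._   // implicits for reduceByKey

  object TimedWordCount {
    def main(args: Array[String]) {
      val sc = new SparkContext("spark://x.com:7077", "TimedWordCount")

      // Give executors a chance to register before timing starts.
      Thread.sleep(10000)

      val start = System.currentTimeMillis
      val file = sc.textFile("hdfs://x.com:9000/sandbox/data/wordcount/input")
      val counts = file.flatMap(_.split(" "))
                       .map(word => (word, 1))
                       .reduceByKey(_ + _)
      counts.saveAsTextFile("hdfs://x.com:9000/sandbox/data/wordcount/output_spark")
      val end = System.currentTimeMillis

      println("Non-cached wordcount runtime " + (end - start) / 1000 + " sec")
      sc.stop()
    }
  }

If the gap mostly disappears with this kind of timing, it was startup overhead; if it doesn't, the UI should show whether it's slower tasks or fewer executors.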
- Patrick

On Wed, Dec 18, 2013 at 11:29 PM, Debasish Das <[email protected]> wrote:
> Hi,
>
> I have the equivalent code written in a spark script and spark job.
>
> My script runs 3X faster than the job.
>
> Any idea why I am noticing this discrepancy ? Is spark shell using kryo
> serialization by default ?
>
> Spark shell: use script ./wordcount.scala
>
> SPARK_MEM=2g ./spark-shell
> scala> :load wordcount.scala
> Loading wordcount.scala...
> inputPath: String = hdfs://x.com:9000/sandbox/data/wordcount/input
> outputPath: String = hdfs://x.com:9000/sandbox/data/wordcount/output_spark
> start: Long = 1387388284050
> file: org.apache.spark.rdd.RDD[String] = MappedRDD[1] at textFile at
> <console>:14
> words: org.apache.spark.rdd.RDD[(String, Int)] = MappedRDD[3] at map at
> <console>:17
> counts: org.apache.spark.rdd.RDD[(String, Int)] = MapPartitionsRDD[6] at
> reduceByKey at <console>:18
> end: Long = 1387388301740
>
> Non-cached wordcount runtime 17 sec
>
> Spark job: use org.apache.spark.examples.HdfsWordCount
>
> [debasish@istgbd011 sag_spark]$ SPARK_MEM=2g ./run-example
> org.apache.spark.examples.HdfsWordCount spark://x.com:7077
> hdfs://x.com:9000/sandbox/data/wordcount/input
> hdfs://x.com:9000/sandbox/data/wordcount/output_spark
>
> Non-cached wordcount runtime 53 sec
>
> I like the 17 sec runtime since it is around 3X faster than exact same code
> in scalding and I have not yet utilized the caching feature.
>
> Thanks.
> Deb
