I was trying to create a base DataFrame in an EMR cluster from a CSV file using:

    val baseDF = spark.read.csv("s3://l4b-d4t4/wikipedia/pageviews-by-second-tsv")

I omitted the options to infer the schema and specify the header, just to understand what happens behind the scenes. The Spark UI shows that this kicked off a job with one stage, and the stage shows that a filter was applied. I got a little curious about this. Is there any place where I could better understand why a filter was applied here, and why there was an action in this case? Thanks.
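For reference, the variant with the options I left out would look something like this. The header and inferSchema options are standard Spark CSV reader options; the sep setting is just my assumption, since the path suggests a tab-separated file:

    // Same read, but with the options omitted above.
    val withOptionsDF = spark.read
      .option("header", "true")      // treat the first line as column names
      .option("inferSchema", "true") // sample the data to infer column types
      .option("sep", "\t")           // assumption: the file is tab-separated
      .csv("s3://l4b-d4t4/wikipedia/pageviews-by-second-tsv")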