I was trying to create a base-data-frame in an EMR cluster from a csv file
using

val baseDF =
  spark.read.csv("s3://l4b-d4t4/wikipedia/pageviews-by-second-tsv")

I omitted the options to infer the schema and specify the header, just to
understand what happens behind the scenes.
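For comparison, the variant with the omitted options would look roughly like this (a sketch; `spark` is the usual SparkSession, and `header`/`inferSchema` are standard CSV reader options):

```
// Sketch: same read, but with header parsing and schema inference enabled.
// Note that inferSchema forces an extra pass over the data up front.
val baseDF = spark.read
  .option("header", "true")
  .option("inferSchema", "true")
  .csv("s3://l4b-d4t4/wikipedia/pageviews-by-second-tsv")
```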


The Spark UI shows that this kicked off a job with one stage. The stage
shows that a filter was applied.

I got a little curious about this. Is there any place where I could
better understand why a filter was applied here, and why there was an
action in this case?


thanks
