I was trying to create a base DataFrame in an EMR cluster from a CSV file using:

    val baseDF = spark.read.csv("s3://l4b-d4t4/wikipedia/pageviews-by-second-tsv")

I omitted the options to infer the schema and specify the header, just to understand what happens behind the scenes. The Spark UI shows that this kicked off a job with one stage, and the stage shows that a filter was applied. I got a little curious about this. Is there any place where I could better understand why a filter was applied here, and why there was an action in this case? Thanks.
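For reference, the variant with the options I left out would look something like this. The header and inferSchema options are standard Spark CSV reader options; the sep setting is just my assumption, since the path suggests a tab-separated file:

    // Same read, but with the options omitted above.
    val withOptionsDF = spark.read
      .option("header", "true")      // treat the first line as column names
      .option("inferSchema", "true") // sample the data to infer column types
      .option("sep", "\t")           // assumption: the file is tab-separated
      .csv("s3://l4b-d4t4/wikipedia/pageviews-by-second-tsv")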