Is sparkSession.sql now an action in Spark 3 and later?
Hi, I remember that spark.sql() previously wasn’t an action, and you needed to call something like show() for the query to actually be executed. Today I noticed that when I call just spark.sql(), without show() or any other action, many executors are started, and their logs show that they are actually opening files and reading them. Was there a change in Spark 3 and later that altered this behavior? I am using Spark 3.1.2. This happens even if I disable AQE. Thanks, S.
Reading the last line of each file in a set of text files
Hi users, Does anyone here have experience writing Spark code that reads only the last line of each text file in a directory, S3 bucket, etc.? I am looking for a solution that doesn’t require reading the whole file. Basically, I wonder whether you can create a DataFrame/RDD using file seek. I am not sure whether such a thing is already available in Spark. Thank you very much in advance.
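To make "file seek" concrete, here is a minimal sketch for a single local file in plain Python, not Spark. The function name last_line and the chunk_size parameter are hypothetical, chosen only for illustration. It seeks to the end of the file and reads backwards in chunks until it finds a line break, so only the tail of the file is read.

```python
import os


def last_line(path, chunk_size=4096):
    """Return the last non-empty line of a text file, reading only its tail.

    Hypothetical helper for illustration; assumes UTF-8 text and a local path.
    """
    with open(path, "rb") as f:
        # Jump to the end of the file to learn its size.
        f.seek(0, os.SEEK_END)
        pos = f.tell()
        buf = b""
        while pos > 0:
            # Step backwards by one chunk (or less at the start of the file).
            step = min(chunk_size, pos)
            pos -= step
            f.seek(pos)
            buf = f.read(step) + buf
            # Drop trailing line terminators, then look for the previous one.
            stripped = buf.rstrip(b"\r\n")
            idx = stripped.rfind(b"\n")
            if idx != -1:
                return stripped[idx + 1:].decode("utf-8")
        # No line break found: the whole file is a single line (or empty).
        return buf.rstrip(b"\r\n").decode("utf-8")
```

This only covers the local-file case. For S3 the analogous idea would presumably be a ranged read of the end of each object, and in Spark one might distribute a list of paths and apply such a function to each path, but both of those are untested assumptions on my part.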