Is sparkSession.sql now an action in Spark 3 and later?

2023-02-08 Thread Sayeh Roshan
Hi,
I remember that previously spark.sql() wasn’t a final action,
and you needed to run something like show() for the query to
actually be executed. Today I noticed that when I call just spark.sql()
without show() or anything else, lots of executors are fired up, and
their logs show that they are actually opening files and reading them.
Was there a change in Spark 3 and later that altered this behavior?
I am using Spark 3.1.2. This happens even if I disable AQE.
Thanks,
S.


Reading the last line of each file in a set of text files

2021-08-02 Thread Sayeh Roshan
Hi users,
Does anyone here have experience writing Spark code that reads just the
last line of each text file in a directory, S3 bucket, etc.?
I am looking for a solution that doesn’t require reading the whole file. I
basically wonder whether you can create a DataFrame/RDD using file seek.
I am not sure whether such a thing is already available in Spark.
Thank you very much in advance.
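
To make the "file seek" idea concrete, here is a minimal sketch in plain
Python of reading only the tail of a file. It is an illustration under
assumptions, not a Spark API: read_last_line and chunk_size are hypothetical
names, the file is assumed to be on a local filesystem and UTF-8 encoded, and
trailing blank lines are skipped. For S3, a ranged GET on the object would
play the role that seek() plays here.

```python
import os


def read_last_line(path, chunk_size=4096):
    """Return the last non-empty line of a file, reading backwards from the end.

    Hypothetical helper for illustration only; assumes a local, UTF-8 file.
    """
    with open(path, "rb") as f:
        f.seek(0, os.SEEK_END)      # jump to the end without reading anything
        pos = f.tell()              # file size in bytes
        buf = b""
        while pos > 0:
            step = min(chunk_size, pos)
            pos -= step
            f.seek(pos)             # move back one chunk
            buf = f.read(step) + buf
            # Ignore trailing newline characters, then look for the line
            # break that precedes the last line.
            stripped = buf.rstrip(b"\r\n")
            idx = stripped.rfind(b"\n")
            if idx != -1:
                return stripped[idx + 1:].decode("utf-8")
        # Reached the start of the file: it holds at most one line.
        return buf.rstrip(b"\r\n").decode("utf-8")
```

Only the final chunk or chunks are read, so the cost depends on the length of
the last line rather than on the file size. One possible way to use this from
Spark would be to distribute a list of file paths and call such a function on
each path inside a map, assuming every executor can reach the files.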