Assuming you are running Linux, an easy option would be to use the tail command to extract the last line (or last few lines) of each file and save it to a different file/directory before feeding it to Spark.  It shouldn't be hard to write a shell script that runs tail on every file in a directory (or in an S3 bucket, using the AWS CLI).  If you really want this kind of file preprocessing done inside Spark, you will have to extend Spark's DataFrameReader API, which may not be an easy task if you don't have experienced Scala developers.  Hope this helps...
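
For what it's worth, if you'd rather stay inside Spark without writing a custom DataFrameReader, a middle ground is to do the seek yourself with the Hadoop FileSystem API and parallelize only the file names, so each executor reads just the tail of its files.  Below is a rough, untested sketch along those lines; the object name, the /data/input path, and the 4 KB tail size are all placeholders, and it assumes every last line fits inside that tail window:

import org.apache.hadoop.conf.Configuration
import org.apache.hadoop.fs.{FileSystem, Path}
import org.apache.spark.sql.SparkSession

object LastLines {
  // Seek to roughly the last `tailBytes` of a file and keep only the final line,
  // instead of scanning the whole file.
  def lastLine(pathStr: String, tailBytes: Int = 4096): String = {
    val path = new Path(pathStr)
    // Note: on executors this picks up the default Hadoop config; for S3 you may
    // need to ship credentials/settings explicitly.
    val fs  = path.getFileSystem(new Configuration())
    val len = fs.getFileStatus(path).getLen
    val in  = fs.open(path)
    try {
      val start = math.max(0L, len - tailBytes)
      in.seek(start)                          // jump near the end of the file
      val buf = new Array[Byte]((len - start).toInt)
      in.readFully(buf)
      new String(buf, "UTF-8")
        .split("\n")
        .map(_.stripSuffix("\r"))
        .filter(_.nonEmpty)
        .lastOption
        .getOrElse("")
    } finally in.close()
  }

  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder.appName("last-lines").getOrCreate()
    val sc    = spark.sparkContext

    // List the input files on the driver; an s3a:// path works the same way
    // provided the S3A connector is configured.
    val files = FileSystem.get(sc.hadoopConfiguration)
      .listStatus(new Path("/data/input"))
      .map(_.getPath.toString)

    // One seek per file rather than one full read per file.
    val lastLines = sc.parallelize(files).map(lastLine(_))
    lastLines.collect().foreach(println)

    spark.stop()
  }
}

Listing on the driver is usually fine for up to a few thousand files; beyond that you'd want to partition the listing as well.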

-- ND

On 8/2/21 6:50 PM, Sayeh Roshan wrote:
Hi users,
Does anyone here have experience writing Spark code that reads just the last line of each text file in a directory, S3 bucket, etc.? I am looking for a solution that doesn't require reading the whole file. I basically wonder whether you can create a DataFrame/RDD using a file seek. Not sure whether there is such a thing already available in Spark.
Thank you very much in advance.


