Assuming you are running Linux, an easy option would be to use the
Linux tail command to extract the last line (or last few lines) of
each file and save it to a separate file/directory before feeding
that to Spark. It shouldn't be hard to write a shell script that runs
tail on every file in a directory (or on every object in an S3 bucket
via the AWS CLI).
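Something along these lines for the local-directory case (an untested
sketch; both paths are made up):

#!/bin/sh
# Write the last line of every file in SRC to a matching file in DST.
# SRC and DST are hypothetical -- adjust to your layout.
SRC=/data/incoming
DST=/data/lastlines
mkdir -p "$DST"
for f in "$SRC"/*; do
  [ -f "$f" ] || continue            # skip subdirectories
  tail -n 1 "$f" > "$DST/$(basename "$f")"
done

For S3, I believe aws s3api get-object also accepts a --range option,
so you could fetch just the last few KB of each object (e.g.
--range bytes=-4096) rather than downloading whole files.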
If you really want this kind of file preprocessing done in Spark, you
will have to extend Spark's DataFrameReader API, which may not be an
easy task if you don't have experienced Scala developers.
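That said, if your files are uncompressed text, you can get close to
the file-seek approach you describe without a custom data source: list
the files on the driver, then seek near the end of each one inside an
RDD using Hadoop's FileSystem API. A rough, untested sketch
(spark-shell style; the bucket name and the 64 KB tail size are
assumptions, and depending on your setup you may need to ship the
driver's Hadoop configuration to the executors rather than building a
fresh one):

import org.apache.hadoop.conf.Configuration
import org.apache.hadoop.fs.{FileSystem, Path}

val dir = new Path("s3a://some-bucket/some-prefix/")  // made-up location
val tailBytes = 64 * 1024L                            // assumed max line length

// Driver side: listing the files is cheap; no file data is read here.
val files = FileSystem.get(dir.toUri, sc.hadoopConfiguration)
  .listStatus(dir)
  .filter(_.isFile)
  .map(s => (s.getPath.toString, s.getLen))
  .toSeq

// Executor side: seek to (length - tailBytes), read only the tail, and
// keep the text after the last newline. This cannot work on .gz files,
// since you can't seek into a gzip stream.
val lastLines = sc.parallelize(files).map { case (p, len) =>
  val path = new Path(p)
  val fs   = FileSystem.get(path.toUri, new Configuration())
  val in   = fs.open(path)
  try {
    val start = math.max(0L, len - tailBytes)
    in.seek(start)
    val buf = new Array[Byte]((len - start).toInt)
    in.readFully(buf)
    val text = new String(buf, "UTF-8").stripLineEnd
    (p, text.substring(text.lastIndexOf('\n') + 1))
  } finally in.close()
}

lastLines.toDF("file", "last_line").show(false)

The obvious caveat is that tailBytes must be larger than your longest
line, so this only works if you can bound the line length up front.
Hope this helps...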
-- ND
On 8/2/21 6:50 PM, Sayeh Roshan wrote:
Hi users,
Does anyone here have experience writing Spark code that reads just
the last line of each text file in a directory, S3 bucket, etc.?
I am looking for a solution that doesn’t require reading the whole
file. I basically wonder whether you can create a DataFrame/RDD using
a file seek. Not sure whether such a thing is already available in
Spark.
Thank you very much in advance.
---------------------------------------------------------------------
To unsubscribe e-mail: user-unsubscr...@spark.apache.org