Re: Reading the last line of each file in a set of text files

2021-08-03 Thread Artemis User
Assuming you are running Linux, an easy option would be just to use the Linux tail command to extract the last line (or last couple of lines) of a file and save them to a different file/directory, before feeding it to Spark.  It shouldn't be hard to write a shell script that executes tail on al
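
A minimal sketch of the pre-processing step suggested above, written in Python rather than as a shell script. It assumes a Linux machine with `tail` on the PATH; the function name `collect_last_lines`, the `*.txt` pattern, and the tab-separated output layout are illustrative choices, not something from the thread.

```python
import subprocess
from pathlib import Path


def collect_last_lines(src_dir, out_file, pattern="*.txt"):
    """Run `tail -n 1` on every matching file and write the results to out_file.

    Hypothetical helper: each output row is "<file name>\t<last line>".
    out_file should live outside src_dir (or not match `pattern`),
    otherwise it would be picked up as an input file.
    """
    src = Path(src_dir)
    with open(out_file, "w", encoding="utf-8") as out:
        for path in sorted(src.glob(pattern)):
            if not path.is_file():
                continue
            # tail seeks from the end of the file, so large files stay cheap.
            result = subprocess.run(
                ["tail", "-n", "1", str(path)],
                capture_output=True,
                text=True,
                check=True,
            )
            last_line = result.stdout.rstrip("\n")
            out.write(f"{path.name}\t{last_line}\n")
```

The resulting file is small (one row per input file), so it could then be loaded into Spark as an ordinary delimited text file.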

Reading the last line of each file in a set of text files

2021-08-02 Thread Sayeh Roshan
Hi users, Does anyone here have experience writing Spark code that reads only the last line of each text file in a directory, S3 bucket, etc.? I am looking for a solution that doesn’t require reading the whole file. I basically wonder whether you can create a DataFrame/RDD using file seek. Not s
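
For reference, the "file seek" idea in the question can be sketched in plain Python for local files. This is only an illustration under stated assumptions: `read_last_line` is a hypothetical helper, it handles local paths only (not S3), and it treats trailing newlines as line terminators rather than as an empty last line.

```python
import os


def read_last_line(path, chunk_size=4096, encoding="utf-8"):
    """Return the last line of a file by reading backwards from the end.

    Hypothetical helper: only the tail of the file is read, in
    chunk_size-byte steps, until one complete line has been buffered.
    """
    with open(path, "rb") as f:
        f.seek(0, os.SEEK_END)
        pos = f.tell()
        buf = b""
        while pos > 0:
            step = min(chunk_size, pos)
            pos -= step
            f.seek(pos)
            buf = f.read(step) + buf
            # Drop trailing line terminators, then look for the newline
            # that precedes the last line.
            stripped = buf.rstrip(b"\r\n")
            if b"\n" in stripped:
                return stripped.rsplit(b"\n", 1)[1].decode(encoding)
        # Reached the start of the file: it holds at most one line.
        return buf.rstrip(b"\r\n").decode(encoding)
```

A function like this could, in principle, be mapped over a list of file paths so that each task reads only a few kilobytes per file, provided the paths are reachable from wherever the function runs.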