Hi,

We have a Spark Structured Streaming job monitoring a folder and converting JSONL files into Parquet. However, if some JSONL files already exist in the folder before the job's first run (i.e., before any checkpoint has been created), those files are not processed when the job starts. We need something like what is described in https://stackoverflow.com/questions/44618783/spark-streaming-only-streams-files-created-after-the-stream-initialization-time .
Is there a way for the streaming job to pick up the pre-existing files? For example, is there a configuration option for this? Any pointers would be appreciated.