Given a directory with input files in the following layout:

/data/shard1/file1.json
/data/shard1/file2.json
/data/shard1/file3.json
/data/shard2/file1.json
/data/shard2/file2.json
/data/shard2/file3.json
Is there a way to make a FileInputFormat with parallelism 2 split the processing by "shard" (folder), and then process the files within each shard in chronological order (file1.json, file2.json, file3.json)? Will I have to implement a custom FileInputFormat for that?
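To pin down the intended result, here is a minimal, framework-independent sketch in plain Python. It does not use the FileInputFormat API. It groups the paths by parent folder ("shard"), orders the files inside each shard by the number in their name, and hands whole shards to 2 workers round-robin. The function names `assign_shards` and `natural_key`, and the round-robin policy, are hypothetical choices for illustration.

```python
import re
from collections import defaultdict
from pathlib import PurePosixPath


def natural_key(name):
    # Split "file10.json" into ["file", 10, ".json"] so file2 sorts before file10.
    return [int(part) if part.isdigit() else part for part in re.split(r"(\d+)", name)]


def assign_shards(paths, parallelism):
    """Group files by parent folder ("shard"), order each shard's files,
    and assign whole shards to workers round-robin (hypothetical policy)."""
    shards = defaultdict(list)
    for p in paths:
        path = PurePosixPath(p)
        shards[str(path.parent)].append(path.name)

    workers = [[] for _ in range(parallelism)]
    for i, shard in enumerate(sorted(shards)):
        # Each worker would read its shards' files strictly in this order.
        ordered = sorted(shards[shard], key=natural_key)
        workers[i % parallelism].append((shard, ordered))
    return workers


if __name__ == "__main__":
    paths = [
        "/data/shard1/file2.json", "/data/shard2/file1.json", "/data/shard1/file1.json",
        "/data/shard2/file3.json", "/data/shard1/file3.json", "/data/shard2/file2.json",
    ]
    for worker_id, assignments in enumerate(assign_shards(paths, 2)):
        print(worker_id, assignments)
    # 0 [('/data/shard1', ['file1.json', 'file2.json', 'file3.json'])]
    # 1 [('/data/shard2', ['file1.json', 'file2.json', 'file3.json'])]
```

A plain lexical sort would put file10.json before file2.json. That is why the sketch uses a natural sort key. Whether FileInputFormat can produce this assignment without subclassing is exactly what is being asked. The sketch only fixes the expected outcome.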