How to make bucket listing faster while using S3 with wholeTextFile

Alchemist Mon, 15 Mar 2021 09:31:13 -0700

How to optimize S3 file listing when using wholeTextFiles(): We are using 
wholeTextFiles to read data from S3.  As I understand it, wholeTextFiles 
first lists the files under the given path.  Because our input source is S3, 
this listing is single-threaded, and the S3 API for listing the keys in a 
bucket returns at most 1000 keys per call.  Since we have millions of files, 
we are making thousands of API calls.  This listing makes our processing 
very slow.  How can we make the S3 listing faster?
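
To put rough numbers on this, here is a back-of-the-envelope sketch in 
Python.  It is not a measurement: the file count and the per-call latency 
are made-up assumptions, and the function names are only for illustration.  
The only figure taken from the description above is the 1000 keys per call.

```python
import math

PAGE_SIZE = 1000  # keys returned per S3 list call, as described above


def list_calls_needed(num_files: int, page_size: int = PAGE_SIZE) -> int:
    """Number of list calls needed to enumerate num_files keys."""
    return math.ceil(num_files / page_size)


def estimated_listing_seconds(num_files: int, seconds_per_call: float) -> float:
    """Wall-clock estimate when the calls are issued one after another."""
    return list_calls_needed(num_files) * seconds_per_call


if __name__ == "__main__":
    # Hypothetical inputs: 5 million files, 0.2 s per list call (assumed).
    files = 5_000_000
    print("list calls:", list_calls_needed(files))
    print("estimated seconds:", estimated_listing_seconds(files, 0.2))
```

Because the calls run one after another on a single thread, the estimated 
time grows linearly with the number of files.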
Thanks,
Rachana
