I'd say the thing you're going to be most concerned with is running out of
memory, since ListS3 produces one flow file per object in the listing. Is
there any sensible prefix structure that partitions your files? If there
is, I would build some iterative logic that constructs each prefix path,
lists the objects under it, and then either combines them into a single
flow file whose body is the list of keys, or alternatively processes them
in batches.
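Outside NiFi, the same idea can be sketched in plain Python. This is just a
sketch under assumptions: it assumes keys are hex-named so that hex prefixes
partition the bucket, and the bucket name, batch size, and `handle` callback
are hypothetical; only the prefix and batching helpers are generic.

```python
from itertools import islice

def hex_prefixes(depth=1):
    """Yield hex key prefixes ('0'..'f', then '00'..'ff', ...) that
    partition a bucket whose keys are hex-named (an assumed layout)."""
    digits = "0123456789abcdef"
    if depth == 1:
        yield from digits
    else:
        for head in digits:
            for tail in hex_prefixes(depth - 1):
                yield head + tail

def batched(iterable, size):
    """Walk an arbitrarily large listing in fixed-size batches so the
    whole result never has to sit in memory at once."""
    it = iter(iterable)
    while True:
        chunk = list(islice(it, size))
        if not chunk:
            return
        yield chunk

# With boto3 (not run here), each prefix would be listed separately,
# and each page of results handled in bounded batches:
#
# s3 = boto3.client("s3")
# paginator = s3.get_paginator("list_objects_v2")
# for prefix in hex_prefixes(2):
#     for page in paginator.paginate(Bucket="my-bucket", Prefix=prefix):
#         for batch in batched(page.get("Contents", []), 1000):
#             handle(batch)  # e.g. emit one file per batch of keys
```

The point is that no single listing call ever has to enumerate the whole
bucket; memory is bounded by one prefix's page times the batch size.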

On Thu, Feb 6, 2020 at 12:41 PM Mike Thomsen <[email protected]> wrote:

> We might be having to pull down potentially tens of millions or more
> objects from a bucket soon in one big pull. Are there any best practices
> for handling this with ListS3? Last time I used it, I recall that on the
> initial pull it would just keep going even if you scheduled a stop request
> on it, but that might have been just a bad perception on my part.
>
> Thanks,
>
> Mike
>


-- 
http://www.google.com/profiles/grapesmoker
