Yeah, fixed slicing may help. I'll put more thought into this.
You mentioned that you didn't put the custom partitioner into production.
Would you mind sharing how you currently work around this?

Srikanth

On Tue, May 3, 2016 at 5:43 PM, Wesley Chow <w...@chartbeat.com> wrote:

> >
> > Upload to S3 is partitioned by the "key" field. I.e., one folder per key.
> > It does offset management to make sure offset commit is in sync with S3
> > upload.
>
> We do this in several spots and I wish we had built our system in such a
> way that we could just open source it. I’m sure many people have solved
> this repeatedly. We’ve had significant disk performance issues when the
> number of keys is large (40,000-ish in our case) — you can’t be expected to
> open a file per key. That’s why something like the fixed slicing strategy I
> described can make a big difference.
>
> Wes
>
>
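
For anyone following the thread, here is a minimal sketch of what a fixed
slicing approach could look like. It assumes "fixed slicing" means hashing
each key into one of a fixed number of slice files, so the number of open
files is bounded by the slice count rather than by the number of keys
(40,000-ish above). The thread does not show the actual implementation;
NUM_SLICES, slice_for, and write_sliced are hypothetical names, and the
tab-separated file layout is only an illustration.

```python
import os
import zlib

# Assumed value: any fixed count well below the number of distinct keys.
NUM_SLICES = 64


def slice_for(key, num_slices=NUM_SLICES):
    """Map a key to a stable slice index.

    CRC32 is used because it is deterministic across processes and runs,
    unlike Python's built-in hash() for strings.
    """
    return zlib.crc32(key.encode("utf-8")) % num_slices


def write_sliced(records, out_dir, num_slices=NUM_SLICES):
    """Append (key, value) records to at most num_slices files.

    Returns the sorted list of slice indices that received data.
    """
    os.makedirs(out_dir, exist_ok=True)
    handles = {}
    try:
        for key, value in records:
            idx = slice_for(key, num_slices)
            if idx not in handles:
                # Open each slice file lazily, once, and reuse the handle.
                path = os.path.join(out_dir, "slice-%04d.tsv" % idx)
                handles[idx] = open(path, "a", encoding="utf-8")
            # The key is kept in each row so a reader can split by key later.
            handles[idx].write("%s\t%s\n" % (key, value))
    finally:
        for fh in handles.values():
            fh.close()
    return sorted(handles)
```

The trade-off in this sketch is that a consumer wanting a single key has to
scan the one slice that key hashes to, instead of reading a per-key folder.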
