Understood. Thanks.

On Wed, May 4, 2016 at 3:47 PM, Wesley Chow <w...@chartbeat.com> wrote:
> We don't do this on the Kafka side, but for a different system that has
> similar distribution problems we manually maintain a map of "hot" keys. On
> the Kafka side, we distribute keys with an even distribution in our
> largest-volume topic, and then squash the data and repartition based on a
> skewed key. The resulting skew is insignificant compared to our
> largest-volume topic, so we tend not to care.
>
> Wes
>
>> On May 4, 2016, at 2:57 PM, Srikanth <srikanth...@gmail.com> wrote:
>>
>> Yeah, fixed slicing may help. I'll put more thought into this.
>> You had mentioned that you didn't put the custom partitioner into
>> production. Would you mind sharing how you work around this currently?
>>
>> Srikanth
>>
>> On Tue, May 3, 2016 at 5:43 PM, Wesley Chow <w...@chartbeat.com> wrote:
>>
>>>> Upload to S3 is partitioned by the "key" field, i.e., one folder per
>>>> key. It does offset management to make sure the offset commit is in
>>>> sync with the S3 upload.
>>>
>>> We do this in several spots, and I wish we had built our system in such
>>> a way that we could just open source it. I'm sure many people have
>>> solved this repeatedly. We've had significant disk performance issues
>>> when the number of keys is large (40,000-ish in our case) -- you can't
>>> be expected to open a file per key. That's why something like the fixed
>>> slicing strategy I described can make a big difference.
>>>
>>> Wes
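
For readers following the thread: the "map of hot keys" idea Wes describes
could be expressed as a custom Kafka Partitioner along the lines of the
sketch below. This is a minimal illustration, not Chartbeat's actual code;
the HotKeyPartitioner name, the hard-coded HOT_KEYS set, and the choice to
spread hot keys randomly across all partitions are assumptions. Non-hot keys
fall through to the same murmur2 hashing Kafka's default partitioner uses.

    import java.util.Arrays;
    import java.util.HashSet;
    import java.util.List;
    import java.util.Map;
    import java.util.Set;
    import java.util.concurrent.ThreadLocalRandom;

    import org.apache.kafka.clients.producer.Partitioner;
    import org.apache.kafka.common.Cluster;
    import org.apache.kafka.common.PartitionInfo;
    import org.apache.kafka.common.utils.Utils;

    public class HotKeyPartitioner implements Partitioner {

        // Manually maintained set of known-hot keys. In a real deployment
        // this would presumably be loaded from config or an external store.
        private static final Set<String> HOT_KEYS =
                new HashSet<>(Arrays.asList("key-a", "key-b"));

        @Override
        public void configure(Map<String, ?> configs) {}

        @Override
        public int partition(String topic, Object key, byte[] keyBytes,
                             Object value, byte[] valueBytes, Cluster cluster) {
            List<PartitionInfo> partitions = cluster.partitionsForTopic(topic);
            int numPartitions = partitions.size();

            if (keyBytes == null || HOT_KEYS.contains(String.valueOf(key))) {
                // Spread unkeyed and hot-key records across all partitions,
                // trading per-key locality for even load.
                return ThreadLocalRandom.current().nextInt(numPartitions);
            }
            // Everything else: standard murmur2 hash, like the default
            // partitioner, so a key always maps to the same partition.
            return Utils.toPositive(Utils.murmur2(keyBytes)) % numPartitions;
        }

        @Override
        public void close() {}
    }

A producer would opt in with the "partitioner.class" config property; the
downside, as the thread notes, is that consumers can no longer assume all
records for a hot key live in one partition, hence the later squash-and-
repartition step.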
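The "fixed slicing" strategy Wes refers to (hashing many keys into a fixed
number of output files, so you never hold 40,000 files open at once) might
look something like the sketch below. NUM_SLICES, the FixedSliceWriter name,
and the tab-separated file layout are illustrative assumptions, not details
from the thread.

    import java.io.BufferedWriter;
    import java.io.IOException;
    import java.nio.charset.StandardCharsets;
    import java.nio.file.Files;
    import java.nio.file.Path;
    import java.util.HashMap;
    import java.util.Map;

    public class FixedSliceWriter implements AutoCloseable {

        // Fixed slice count, independent of how many distinct keys exist.
        private static final int NUM_SLICES = 64;

        private final Map<Integer, BufferedWriter> writers = new HashMap<>();
        private final Path baseDir;

        public FixedSliceWriter(Path baseDir) {
            this.baseDir = baseDir;
        }

        // Deterministic key -> slice mapping, so a given key always lands
        // in the same file and readers can locate it by rehashing.
        static int sliceFor(String key) {
            return Math.floorMod(key.hashCode(), NUM_SLICES);
        }

        public void write(String key, String record) throws IOException {
            int slice = sliceFor(key);
            BufferedWriter w = writers.get(slice);
            if (w == null) {
                // At most NUM_SLICES files are ever open, regardless of
                // how many distinct keys flow through.
                Path file = baseDir.resolve("slice-" + slice + ".log");
                w = Files.newBufferedWriter(file, StandardCharsets.UTF_8);
                writers.put(slice, w);
            }
            w.write(key + "\t" + record);
            w.newLine();
        }

        @Override
        public void close() throws IOException {
            for (BufferedWriter w : writers.values()) {
                w.close();
            }
        }
    }

Because the key-to-slice mapping is deterministic, the S3 layout can stay
queryable per key (rehash the key to find its slice) while disk and file-
handle pressure stays bounded by the slice count rather than the key count.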