Sounds like a goer then :) Those strings in the protobuf always get you - you can't use clever encodings for them like you can with numbers.
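To make that concrete, here's a quick back-of-the-envelope check (my own sketch, not from the thread): gzip stands in for whichever codec Kafka would use, the payloads are invented, and `varint` is a hand-rolled protobuf-style encoder for illustration. Redundant string fields compress dramatically; the same values packed as varints leave almost nothing for the codec to squeeze.

```python
import gzip
import random

random.seed(42)

def varint(n: int) -> bytes:
    """Protobuf-style varint: 7 payload bits per byte, MSB = continuation."""
    out = bytearray()
    while True:
        b = n & 0x7F
        n >>= 7
        if n:
            out.append(b | 0x80)
        else:
            out.append(b)
            return bytes(out)

# A batch of records whose fields are redundant strings (e.g. repeated
# client-supplied labels), vs. the same numeric values packed as varints.
string_batch = b"".join(
    f"user-id={random.randint(0, 9999)};region=us-east-1;".encode()
    for _ in range(1000)
)
numeric_batch = b"".join(varint(random.randint(0, 9999)) for _ in range(1000))

for name, payload in [("strings", string_batch), ("varints", numeric_batch)]:
    compressed = gzip.compress(payload)
    print(f"{name}: {len(payload)} -> {len(compressed)} bytes "
          f"({len(compressed) / len(payload):.0%} of original)")
```

The string batch shrinks to a small fraction of its size because the field names and values repeat; the varint batch is already near-random bytes, so gzip barely helps. That's consistent with Dan's observed ~75% reduction on string-heavy protos.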
On Wed, 16 Mar 2022 at 11:29, Dan Hill <quietgol...@gmail.com> wrote:

> We're using protos but there are still a bunch of custom fields where
> clients specify redundant strings.
>
> My local test is showing a 75% reduction in size if I use zstd or gzip. I
> care the most about Kafka storage costs right now.
>
> On Tue, Mar 15, 2022 at 2:25 PM Liam Clarke-Hutchinson <lclar...@redhat.com> wrote:
>
> > Hi Dan,
> >
> > Okay, so if you're looking for low latency, I'm guessing that you're using
> > a very low linger.ms in the producers? Also, what format are the records?
> > If they're already in a binary format like Protobuf or Avro, unless
> > they're composed largely of strings, compression may offer little benefit.
> >
> > With your small records, I'd suggest running some tests with your current
> > config with different compression settings - none, snappy, lz4 (don't
> > bother with gzip unless that's all you have) - and checking producer
> > metrics (available via JMX if you're using the Java clients) for
> > avg-batch-size and compression-ratio.
> >
> > You may just wish to start with no compression, and then consider moving
> > to it if/when network bandwidth becomes a bottleneck.
> >
> > Regards,
> >
> > Liam
> >
> > On Tue, 15 Mar 2022 at 17:05, Dan Hill <quietgol...@gmail.com> wrote:
> >
> > > Thanks, Liam!
> > >
> > > I have a mixture of Kafka record sizes: 10% of the records are large
> > > (>100 KB) and 90% are smaller than 1 KB. I'm working on a streaming
> > > analytics solution that streams impressions, user actions and serving
> > > info and combines them together. End-to-end latency is more important
> > > than storage size.
> > >
> > > On Mon, Mar 14, 2022 at 3:27 PM Liam Clarke-Hutchinson <lclar...@redhat.com> wrote:
> > >
> > > > Hi Dan,
> > > >
> > > > Decompression generally only happens on the broker if the topic has a
> > > > particular compression algorithm set and the producer is using a
> > > > different one - then the broker will decompress records from the
> > > > producer, then recompress them using the topic's configured algorithm.
> > > > (The LogCleaner will also decompress and recompress records when
> > > > compacting compressed topics.)
> > > >
> > > > The consumer decompresses compressed record batches it receives.
> > > >
> > > > In my opinion, using topic compression instead of producer compression
> > > > would only make sense if the overhead of the few extra CPU cycles
> > > > compression uses was not tolerable for the producing app. In all of my
> > > > use cases, network throughput becomes a bottleneck long before
> > > > producer compression CPU cost does.
> > > >
> > > > For your "if X, do Y" formulation I'd say: if your producer is sending
> > > > tiny batches, do some analysis of compressed vs. uncompressed size for
> > > > your given compression algorithm - you may find that compression
> > > > overhead increases batch size for tiny batches.
> > > >
> > > > If you're sending a large amount of data, do tune your batching and
> > > > use compression to reduce the data being sent over the wire.
> > > >
> > > > If you can tell us more about your problem domain, there might be more
> > > > advice that's applicable :)
> > > >
> > > > Cheers,
> > > >
> > > > Liam Clarke-Hutchinson
> > > >
> > > > On Tue, 15 Mar 2022 at 10:05, Dan Hill <quietgol...@gmail.com> wrote:
> > > >
> > > > > Hi. I looked around for advice about Kafka compression. I've seen
> > > > > mixed and conflicting advice.
> > > > >
> > > > > Is there any sort of "if X, do Y" documentation around Kafka
> > > > > compression?
> > > > >
> > > > > Any advice? Any good posts to read that talk about this trade-off?
> > > > >
> > > > > *Detailed comments*
> > > > > I tried looking for producer vs. topic compression. I didn't find
> > > > > much. Some of the information I see is from 2011 (which I'm guessing
> > > > > is pretty stale).
> > > > >
> > > > > I can guess some potential benefits, but I don't know if they are
> > > > > actually real. I've also seen some sites claim certain trade-offs,
> > > > > but it's unclear if they're true.
> > > > >
> > > > > It looks like I can modify an existing topic's compression. I don't
> > > > > know if that actually works. I'd assume it'd just impact data going
> > > > > forward.
> > > > >
> > > > > I've seen multiple sites say that decompression happens in the
> > > > > broker, and multiple that say it happens in the consumer.
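Liam's point above about tiny batches is easy to check locally. A sketch (mine, not from the thread): gzip and lzma from the Python stdlib stand in for Kafka's codecs (snappy/lz4/zstd need third-party modules), and the record contents are invented. For a single small record, the codec's framing overhead can make the "compressed" output larger than the input; for a batch of similar records, compression pays off.

```python
import gzip
import lzma

# A made-up small record, roughly the shape of a <1 KB analytics event.
record = b'{"impression_id":"abc123","user":"u-42","action":"click"}'

# A tiny batch (one small record) vs. a larger batch of similar records.
tiny_batch = record
big_batch = record * 500

for name, batch in [("tiny", tiny_batch), ("big", big_batch)]:
    for codec, compress in [("gzip", gzip.compress), ("lzma", lzma.compress)]:
        out = compress(batch)
        print(f"{name:4} batch, {codec}: {len(batch):6} -> {len(out):6} bytes")
```

The tiny batch grows under both codecs (header/dictionary overhead dominates), while the big batch shrinks drastically. At real batch sizes the producer's JMX `compression-ratio` and `batch-size-avg` metrics, which Liam mentions above, tell you which side of that line you're on.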