Hi Ted,

The largest a partition can grow is the size of the disk that stores the
partition's log on the broker. You can use RAID to increase the "disk"
size, and hence the maximum partition size. Whether you can fit all messages
in Kafka depends on your use case -- how much data you have, the size of
your disks, the number of partitions, etc.

If you can't fit all messages on disk, consider using a compacted topic.
Most of the use cases I see that treat Kafka as the source of truth are
built on compacted topics. Learn more about compaction here:

http://kafka.apache.org/documentation.html#compaction
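
In case it's useful, here is a rough sketch of creating a compacted topic
with the Java AdminClient (available in newer clients; the topic name,
broker address, partition count, and replication factor below are just
placeholders -- on older clusters you can set cleanup.policy=compact with
the kafka-topics tool instead):

import java.util.Collections;
import java.util.Properties;

import org.apache.kafka.clients.admin.AdminClient;
import org.apache.kafka.clients.admin.AdminClientConfig;
import org.apache.kafka.clients.admin.NewTopic;
import org.apache.kafka.common.config.TopicConfig;

public class CreateCompactedTopic {
    public static void main(String[] args) throws Exception {
        Properties props = new Properties();
        // Placeholder broker address -- point this at your own cluster.
        props.put(AdminClientConfig.BOOTSTRAP_SERVERS_CONFIG, "localhost:9092");

        try (AdminClient admin = AdminClient.create(props)) {
            // Example partition count and replication factor, not recommendations.
            NewTopic topic = new NewTopic("orders-latest-state", 6, (short) 3);
            // Compaction keeps the most recent record per key indefinitely,
            // rather than deleting whole segments by age or size.
            topic.configs(Collections.singletonMap(
                    TopicConfig.CLEANUP_POLICY_CONFIG,
                    TopicConfig.CLEANUP_POLICY_COMPACT));
            admin.createTopics(Collections.singleton(topic)).all().get();
        }
    }
}

A new consumer can then rebuild its state by reading the topic from the
beginning; it will see at least the latest value for every key.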

Alex

On Wed, Feb 24, 2016 at 12:47 AM, Gerard Klijs <gerard.kl...@dizzit.com>
wrote:

> Hi Ted,
>
> Maybe it's useful to take a look at Samza, http://samza.apache.org/ -- they
> use Kafka in a way that sounds similar to how you want to use it. As I
> recall from a YouTube talk, the creator of Samza also mentioned never
> deleting the events. These things are of course very dependent on your
> use case; some events aren't worth keeping around for long.
>
> On Wed, Feb 24, 2016 at 9:08 AM Ted Swerve <ted.swe...@gmail.com> wrote:
>
> > Hello,
> >
> > One of the big attractions of Kafka for me was the ability to write new
> > consumers that could connect to a topic and replay all the previous
> > events.
> >
> > However, most of the time, Kafka appears to be used with a retention
> period
> > - presumably in such cases, the events have been warehoused into HDFS
> > or something similar.
> >
> > So my question is: how do people typically approach the scenario where a
> > new piece of code needs to process all events in a topic from "day one",
> > but has to source some of them from e.g. HDFS and then connect to the
> > real-time Kafka topic?  Are there any wrinkles with such an approach?
> >
> > Thanks,
> > Ted
> >
>
