Re: Broker crashes when no space left for log.dirs

Jay Kreps Thu, 15 Aug 2013 10:59:24 -0700

I don't think the filesystem will overcommit its disk space, but I'm
actually not sure. This would only come into play on a filesystem like ext4,
which does lazy (delayed) block allocation in addition to lazy writing. But I
think even ext4 is probably not allowed to hand out more disk space than it has.
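If you would rather not depend on exactly when a filesystem like ext4 commits blocks, one option is to reserve the space explicitly before writing to it. Below is a minimal Python sketch built on that assumption. `reserve_space` is a hypothetical helper and is not something Kafka 0.8 is known to do. `os.posix_fallocate` asks the filesystem to back the whole range with real blocks, so a shortage shows up as an `OSError` (`ENOSPC`) at reservation time rather than on some later append.

```python
import os


def reserve_space(path: str, nbytes: int) -> None:
    """Hypothetical helper: reserve `nbytes` of disk for `path` up front.

    If the filesystem cannot supply the blocks, OSError (errno ENOSPC) is
    raised here, before any real data has been written to the file.
    """
    if nbytes <= 0:
        raise ValueError("nbytes must be positive")
    fd = os.open(path, os.O_RDWR | os.O_CREAT, 0o644)
    try:
        if hasattr(os, "posix_fallocate"):
            # Ask the filesystem to allocate real blocks for [0, nbytes).
            os.posix_fallocate(fd, 0, nbytes)
        else:
            # Fallback where posix_fallocate is unavailable (e.g. macOS):
            # write actual zero bytes instead of seeking past EOF, which
            # would only create a sparse hole with no blocks behind it.
            chunk = b"\0" * min(nbytes, 1 << 20)
            remaining = nbytes
            while remaining > 0:
                remaining -= os.write(fd, chunk[:remaining])
    finally:
        os.close(fd)
```

The trade-off is that reserved space counts as used even before any messages land in it.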



On Thu, Aug 15, 2013 at 10:18 AM, Jason Rosenberg wrote:

> A related question: will producers sending messages with acknowledgment
> get a failed ack if a broker is out of disk space, or will messages get
> buffered in memory successfully (resulting in a good ack before they fail to
> be written)?
>
> It seems like it might be a good feature to have the broker auto-detect if
> its log dir is nearing full, so that there is some runway to shut down
> gracefully while still writing any messages buffered in memory. It could be
> an optional threshold, like 98% full, or X MB free, etc.
>
> Jason
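Jason's proposed check might look roughly like the following Python sketch. The names `SpacePolicy`, `nearly_full` and `log_dir_nearly_full`, along with both default thresholds, are hypothetical and do not correspond to any Kafka configuration. `shutil.disk_usage` reports totals for whichever filesystem holds the given path.

```python
import shutil
from dataclasses import dataclass


@dataclass(frozen=True)
class SpacePolicy:
    # Both thresholds are illustrative knobs, not real Kafka settings.
    max_used_fraction: float = 0.98          # "98% full"
    min_free_bytes: int = 512 * 1024 * 1024  # "X MB free" (here 512 MiB)


def nearly_full(total: int, free: int, policy: SpacePolicy) -> bool:
    """True if either threshold in `policy` has been crossed.

    `free` is space available to an unprivileged process, so root-reserved
    blocks count as used: the broker could not write to them anyway.
    """
    used_fraction = (total - free) / total if total else 1.0
    return used_fraction >= policy.max_used_fraction or free < policy.min_free_bytes


def log_dir_nearly_full(log_dir: str, policy: SpacePolicy = SpacePolicy()) -> bool:
    """Check the filesystem holding `log_dir` against `policy`."""
    usage = shutil.disk_usage(log_dir)
    return nearly_full(usage.total, usage.free, policy)
```

A broker could poll this on a timer and start an orderly shutdown once it returns True. The pure `nearly_full` function is kept separate so the threshold logic can be tested without touching a real disk.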
>
>
> On Wed, Aug 14, 2013 at 7:58 PM, Jay Kreps wrote:
>
> > The crash is actually just a call to shutdown. We think this is the right
> > thing to do, though I agree it is unintuitive. Here is why. When you get an
> > out-of-space error, it is likely that the operating system did a partial
> > write, leaving you with a corrupt log. Furthermore, space may later free up,
> > at which point more writes to the log could succeed, so you wouldn't even
> > know there was a problem, but all your consumers would hit this data and
> > choke.
> >
> > By "crashing" the node we ensure that recovery is run on the log to bring
> > it into a consistent state.
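The recovery described here comes down to scanning the log and truncating at the first record that is incomplete or fails its checksum. Below is a minimal Python sketch. It assumes a made-up record layout (4-byte length, 4-byte CRC32, payload) and not Kafka's real message format. The function names are hypothetical.

```python
import struct
import zlib

# Hypothetical on-disk layout for this sketch (NOT Kafka's real format):
# [4-byte big-endian payload length][4-byte big-endian CRC32][payload]
HEADER = struct.Struct(">II")


def encode_record(payload: bytes) -> bytes:
    """Frame one non-empty payload with its length and CRC32."""
    if not payload:
        raise ValueError("empty payloads are not allowed in this format")
    return HEADER.pack(len(payload), zlib.crc32(payload)) + payload


def valid_prefix_length(log: bytes) -> int:
    """Return the offset just past the last complete, checksum-valid record."""
    pos = 0
    while pos + HEADER.size <= len(log):
        length, crc = HEADER.unpack_from(log, pos)
        start = pos + HEADER.size
        end = start + length
        if length == 0 or end > len(log):
            break  # zero-filled tail, or payload cut off by a partial write
        if zlib.crc32(log[start:end]) != crc:
            break  # bytes are present but garbled
        pos = end
    return pos


def recover(path: str) -> int:
    """Truncate the log at `path` after its last valid record; return new size."""
    with open(path, "r+b") as f:
        good = valid_prefix_length(f.read())
        f.truncate(good)
    return good
```

Empty payloads are rejected on purpose. The CRC32 of an empty payload is 0, so without that rule a zero-filled tail would parse as a run of valid zero-length records.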
> >
> > Theoretically we could leave the node up, accepting reads but rejecting
> > writes while attempting to recover the log. But this is very complex, and
> > there are a bunch of problems with it. Likely, if you are out of space, you
> > are just going to keep getting writes, run out of space again, run
> > recovery again, and so on. This kind of crazy loop is much worse than just
> > needing to bring the node back up.
> >
> > Alternatively, we could leave the node up but put it into some kind of
> > write-rejecting mode forever. But this would still require that you restart
> > the node, and we would have to implement that write-rejecting mode.
> >
> > Cheers,
> >
> > -Jay
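For comparison, here is a minimal Python sketch of the write-rejecting mode that Jay decided against. `GuardedLog` and `WritesRejected` are hypothetical names. The disk write is injected as `write_fn`, which lets the out-of-space path be exercised without filling a disk. The sketch only blocks further writes. Any torn tail left behind by the failed write would still need a recovery pass, and that is part of Jay's argument for restarting.

```python
import errno


class WritesRejected(Exception):
    """Hypothetical error returned to producers while the log is read-only."""


class GuardedLog:
    """Toy log that flips to read-only on the first ENOSPC.

    Records are also kept in memory so reads can be served in this sketch;
    the actual disk write is delegated to the injected `write_fn`.
    """

    def __init__(self, write_fn):
        self._write_fn = write_fn
        self._records = []
        self.read_only = False

    def append(self, data: bytes) -> int:
        if self.read_only:
            raise WritesRejected("log is read-only after running out of space")
        try:
            self._write_fn(data)
        except OSError as e:
            if e.errno == errno.ENOSPC:
                # Stay up for reads, but refuse all further writes until an
                # operator frees space and restarts (which runs recovery).
                self.read_only = True
                raise WritesRejected("out of disk space") from e
            raise
        self._records.append(data)
        return len(self._records) - 1  # offset of the new record

    def read(self, offset: int) -> bytes:
        return self._records[offset]
```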
> >
> >
> > On Wed, Aug 14, 2013 at 9:52 AM, Bryan Baugher wrote:
> >
> > > Hi,
> > >
> > > This is more of a thought question than a problem that I need support
> > > for. I have been trying out Kafka 0.8.0-beta1 with replication. For our
> > > use case we want to try to guarantee that our consumers will see all
> > > messages even if they have fallen far behind the broker/producer. For
> > > this reason I wanted to know how the broker would react when the
> > > filesystem it writes its messages to is full. What I found was that the
> > > broker crashes and cannot be started until the filesystem has space
> > > again.
> > >
> > > Is there, or would it make sense to provide, configuration allowing the
> > > broker to reject writes in this case rather than crashing, so that a new
> > > leader is elected and the write is attempted again? I can clearly
> > > understand the use case where we don't want to 'lose' messages from the
> > > producer, and I could also see how lack of filesystem space could be
> > > considered a machine failure, but with replication I would think that if
> > > you are running out of space on 1 broker, you are likely running out of
> > > space on others.
> > >
> > > Bryan
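On the producer side, Bryan's suggestion comes down to retrying against whichever replica becomes leader next. Below is a minimal Python sketch. `send_with_failover`, `WriteRejected` and the list of send callables are hypothetical stand-ins for a real client and its metadata refresh.

```python
from typing import Callable, Sequence


class WriteRejected(Exception):
    """Hypothetical error a broker in write-rejecting mode would return."""


def send_with_failover(
    leaders: Sequence[Callable[[bytes], int]],
    record: bytes,
) -> int:
    """Try each successive leader for a partition until one accepts `record`.

    `leaders` stands in for "current leader, then whichever replica is
    elected next"; each entry sends the record and returns its offset.
    """
    last_error = None
    for send in leaders:
        try:
            return send(record)
        except WriteRejected as e:
            last_error = e  # this replica is out of space; fail over
    raise RuntimeError("every replica rejected the write") from last_error
```

Bryan expects replicas of the same partition to fill up at roughly the same rate. If they do, this loop only postpones the failure, and the producer still receives an error in the end.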
> > >
> >
>