Re: Broker crashes when no space left for log.dirs

Jay Kreps Thu, 15 Aug 2013 11:32:30 -0700

I am saying we always immediately write to the fs. So the question is: is it
possible, with delayed allocation in ext4, to do a successful write that
later cannot be flushed to disk due to running out of space? I don't know
the answer to this, though I would hope it is not possible.


Basically, if our write to the fs succeeds and the replicas acknowledge, then
we send back the ack.
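
A minimal sketch of the ordering described above (write to the filesystem
first, acknowledge only after that succeeds), written as a hypothetical
Python helper `append_then_ack`. It is not Kafka's code, and replica
acknowledgement is left out. The optional `os.fsync` call is one conservative
way around the delayed-allocation question: fsync(2) is documented to report
errors such as ENOSPC and EIO, so a failure deferred past `write()` can
surface before the ack is sent.

```python
import os


def append_then_ack(path: str, payload: bytes, fsync: bool = True) -> bool:
    """Append payload to a log file; return True only if every step succeeded.

    Hypothetical helper for illustration only. The caller would send the
    ack (and, in a replicated setup, wait for replicas) only on True.
    """
    try:
        fd = os.open(path, os.O_WRONLY | os.O_APPEND | os.O_CREAT, 0o644)
    except OSError:
        return False  # e.g. missing directory or permission error
    try:
        written = 0
        while written < len(payload):
            # os.write may write fewer bytes than requested (a partial write),
            # so keep writing the remainder until everything is handed over.
            written += os.write(fd, payload[written:])
        if fsync:
            # Force the data to disk; errors the kernel deferred during
            # writeback (such as ENOSPC or EIO) can be reported here.
            os.fsync(fd)
        return True
    except OSError:
        return False  # do not ack: the write did not fully succeed
    finally:
        os.close(fd)
```

Calling fsync on every append is expensive, so the `fsync` flag makes the
throughput-versus-certainty trade-off an explicit choice for the caller.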

-Jay


On Thu, Aug 15, 2013 at 11:12 AM, Jason Rosenberg <[email protected]> wrote:

> Hmmm... I guess I was thinking that a broker could receive a message and
> keep it in memory before having disk space reserved for its eventual
> storage. Are you saying that memory is not allocated for a message without
> there already being disk space allocated for it? In which case, there
> should be no problem!
>
> Jason
>
>
> On Thu, Aug 15, 2013 at 10:44 AM, Jay Kreps <[email protected]> wrote:
>
> > I don't think the filesystem will overcommit its disk space, but I'm
> > actually not sure. I think this would only come into play on a fs like
> > ext4, which does lazy block allocation in addition to lazy writing. But I
> > think even ext4 is probably not allowed to hand out more disk space than
> > it has.
> >
> >
> > On Thu, Aug 15, 2013 at 10:18 AM, Jason Rosenberg <[email protected]> wrote:
> >
> > > A related question: will producers sending messages with acknowledgment
> > > get a failed ack if a broker is out of disk space, or will messages get
> > > buffered in memory successfully (resulting in a good ack before failing
> > > to be written)?
> > >
> > > It seems like it might be a good feature to have the broker auto-detect
> > > if its log dir is nearing full, so that there is some runway to shut
> > > down gracefully while still writing any in-memory buffered messages. It
> > > could be an optional threshold, like 98% full, or X MB free, etc.
> > >
> > > Jason
> > >
> > >
> > > On Wed, Aug 14, 2013 at 7:58 PM, Jay Kreps <[email protected]> wrote:
> > >
> > > > The crash is actually just a call to shutdown. We think this is the
> > > > right thing to do, though I agree it is unintuitive. Here is why. When
> > > > you get an out-of-space error, it is likely that the operating system
> > > > did a partial write, leaving you with a corrupt log. Furthermore, it is
> > > > possible that space will free up, at which point more writes on the log
> > > > could succeed, so you wouldn't even know there was a problem, but all
> > > > your consumers would hit this data and choke.
> > > >
> > > > By "crashing" the node we ensure that recovery is run on the log to
> > > > bring it into a consistent state.
> > > >
> > > > Theoretically we could leave the node up, accepting reads but rejecting
> > > > writes while attempting to recover the log. But there are a bunch of
> > > > problems with this, and it is very complex. Likely if you are out of
> > > > space you are just going to keep getting writes, running out of space
> > > > again, running recovery, and so on. This kind of crazy loop is much
> > > > worse than just needing to bring the node back up.
> > > >
> > > > Alternately we could leave the node up but go into some kind of
> > > > write-rejecting mode forever. But this would still require that you
> > > > restart the node, and we would have to implement that write-rejecting
> > > > mode.
> > > >
> > > > Cheers,
> > > >
> > > > -Jay
> > > >
> > > >
> > > > On Wed, Aug 14, 2013 at 9:52 AM, Bryan Baugher <[email protected]> wrote:
> > > >
> > > > > Hi,
> > > > >
> > > > > This is more of a thought question than a problem that I need support
> > > > > for. I have been trying out Kafka 0.8.0-beta1 with replication. For
> > > > > our use case we want to try to guarantee that our consumers will see
> > > > > all messages even if they have fallen far behind the broker/producer.
> > > > > For this reason I wanted to know how the broker would react when the
> > > > > filesystem it writes its messages to is full. What I found was that
> > > > > the broker crashes and cannot be started until the filesystem has
> > > > > space again.
> > > > >
> > > > > Is there, or would it make sense to provide, configuration allowing
> > > > > the broker to reject writes in this case rather than crashing,
> > > > > electing a new leader and attempting the write again? I can clearly
> > > > > understand the use case that we don't want to 'lose' messages from
> > > > > the producer, and I could also see how lack of filesystem space could
> > > > > be considered a machine failure, but with replication I would think
> > > > > that if you are running out of space on one broker, you are likely
> > > > > running out of space on others.
> > > > >
> > > > > Bryan
> > > > >
> > > >
> > >
> >
>
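
Jay's point that restarting the node runs recovery to bring the log back to a
consistent state can be illustrated with a minimal sketch. It assumes a
hypothetical length-prefixed, CRC32-checked record layout (not Kafka's actual
segment format) and shows the general idea: scan the log and stop at the
first incomplete or corrupt record, which is the kind of tail a partial write
caused by running out of space would leave behind.

```python
import struct
import zlib
from typing import List, Tuple

# Hypothetical record layout (NOT Kafka's on-disk format):
# 4-byte big-endian payload length, 4-byte big-endian CRC32 of the payload,
# followed by the payload bytes.
HEADER = struct.Struct(">II")


def encode_record(payload: bytes) -> bytes:
    """Serialize one record in the hypothetical layout above."""
    return HEADER.pack(len(payload), zlib.crc32(payload)) + payload


def recover(log: bytes) -> Tuple[List[bytes], int]:
    """Return the valid records and the byte length to truncate the log to.

    Scanning stops at the first record whose header or payload is cut off
    (e.g. a partial write) or whose CRC does not match; everything from
    that position onward is treated as garbage.
    """
    records: List[bytes] = []
    pos = 0
    while pos + HEADER.size <= len(log):
        length, crc = HEADER.unpack_from(log, pos)
        start = pos + HEADER.size
        end = start + length
        if end > len(log):
            break  # payload truncated mid-record
        payload = log[start:end]
        if zlib.crc32(payload) != crc:
            break  # payload bytes are corrupt
        records.append(payload)
        pos = end
    return records, pos
```

In a real system the caller would then truncate the file to the returned
length (for example with `os.truncate`) before accepting new appends, so
consumers never read the damaged tail.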

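Jason's proposed free-space threshold ("98% full, or X MB free") could look
roughly like the sketch below. The function names and parameters are
hypothetical and are not Kafka configuration options; it only uses
`shutil.disk_usage` from the Python standard library. What to do when the
check trips (shut down gracefully, or reject writes as Bryan suggests) is
left to the caller.

```python
import shutil


def exceeds_limits(total: int, used: int, free: int,
                   max_used_fraction: float = 0.98,
                   min_free_bytes: int = 0) -> bool:
    """Return True if usage is past either hypothetical limit."""
    if total <= 0:
        return True  # treat an empty or unreadable filesystem as full
    return used / total >= max_used_fraction or free < min_free_bytes


def log_dir_nearly_full(log_dir: str, max_used_fraction: float = 0.98,
                        min_free_bytes: int = 0) -> bool:
    """Check the filesystem that holds log_dir against the limits."""
    usage = shutil.disk_usage(log_dir)  # named tuple: total, used, free
    return exceeds_limits(usage.total, usage.used, usage.free,
                          max_used_fraction, min_free_bytes)
```

Splitting the arithmetic from the `disk_usage` call keeps the threshold logic
testable without a nearly full disk; a broker-side version would presumably
poll each log directory periodically.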