I am saying we always immediately write to the fs. So the question is: with delayed allocation in ext4, is it possible to do a successful write that later cannot be flushed to disk because the disk has run out of space? I don't know the answer to this, though I would hope it is not possible.
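
To make the question concrete, here is a minimal Python sketch (not Kafka code) of the two places an out-of-space error could surface for an append: at the write itself, or later when the data is forced to disk. The function name `append_durably` and its return strings are made up for illustration. The sketch does not answer whether ext4 can actually fail at the second stage; it only shows how a caller would tell the two cases apart, given that `fsync` is documented as able to report `ENOSPC`.

```python
import errno
import os


def append_durably(path: str, payload: bytes) -> str:
    """Append payload to path and force it to disk.

    Returns "ok", "enospc-at-write", or "enospc-at-fsync" (hypothetical
    labels) so the caller can see where an out-of-space error surfaced.
    Any other OS error is re-raised unchanged.
    """
    fd = os.open(path, os.O_WRONLY | os.O_APPEND | os.O_CREAT, 0o644)
    try:
        try:
            view = memoryview(payload)
            while view:
                # os.write may write fewer bytes than requested, so loop.
                # If ENOSPC hits after a short write, the file is left with
                # a partial record, which is the corruption case discussed
                # further down the thread.
                written = os.write(fd, view)
                view = view[written:]
        except OSError as e:
            if e.errno == errno.ENOSPC:
                return "enospc-at-write"
            raise
        try:
            # Ask the kernel to flush this file's dirty pages to the device.
            os.fsync(fd)
        except OSError as e:
            if e.errno == errno.ENOSPC:
                return "enospc-at-fsync"
            raise
        return "ok"
    finally:
        os.close(fd)
```

If "enospc-at-fsync" could never be returned on a given filesystem, then a successful write would be enough to know the space is there, which is the property hoped for above.
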
Basically if our write to the fs succeeds and replicas acknowledge then we send back the ack.

-Jay

On Thu, Aug 15, 2013 at 11:12 AM, Jason Rosenberg <[email protected]> wrote:

> Hmmm....I guess I was thinking that a broker could receive a message and keep it in memory, before having disk space reserved for its eventual storage. Are you saying that memory is not allocated for a message without there already being disk space allocated for it? In which case, there should be no problem!
>
> Jason
>
> On Thu, Aug 15, 2013 at 10:44 AM, Jay Kreps <[email protected]> wrote:
>
> > I don't think the filesystem will overcommit its disk space, but I'm actually not sure. I think this would only come into play on a fs like ext4 which does lazy block allocation in addition to lazy writing. But I think even ext4 is probably not allowed to hand out more disk space than it has.
> >
> > On Thu, Aug 15, 2013 at 10:18 AM, Jason Rosenberg <[email protected]> wrote:
> >
> > > A related question: Will producers sending messages with acknowledgment get a failed ack if a broker is out of disk space, or will messages get buffered in memory successfully (resulting in a good ack, before failing to be written)?
> > >
> > > It seems like it might be a good feature to have the broker auto-detect if its log dir is nearing full, so that there is some runway to gracefully shut down, while still writing any in-memory buffered messages. It could be an optional threshold, like 98% full, or X MB free, etc.
> > >
> > > Jason
> > >
> > > On Wed, Aug 14, 2013 at 7:58 PM, Jay Kreps <[email protected]> wrote:
> > >
> > > > The crash is actually just a call to shutdown. We think this is the right thing to do, though I agree it is unintuitive. Here is why. When you get an out of space error it is likely that the operating system did a partial write, leaving you with a corrupt log. Furthermore it is possible that space will free up, at which point more writes on the log could succeed, so you wouldn't even know there was a problem but all your consumers would hit this data and choke.
> > > >
> > > > By "crashing" the node we ensure that recovery is run on the log to bring it into a consistent state.
> > > >
> > > > Theoretically we could leave the node up accepting reads but rejecting writes while attempting to recover the log. But there are a bunch of problems with this, and it is very complex. Likely if you are out of space you are just going to keep getting writes, and running out of space again and then running recovery and so on. This kind of crazy loop is much worse than just needing to bring the node back up.
> > > >
> > > > Alternately we could leave the node up but go into some kind of write-rejecting mode forever. But this would still require that you restart the node, and we would have to implement that write-rejecting mode.
> > > >
> > > > Cheers,
> > > >
> > > > -Jay
> > > >
> > > > On Wed, Aug 14, 2013 at 9:52 AM, Bryan Baugher <[email protected]> wrote:
> > > >
> > > > > Hi,
> > > > >
> > > > > This is more of a thought question than a problem that I need support for. I have been trying out Kafka 0.8.0-beta1 with replication. For our use case we want to try and guarantee that our consumers will see all messages even if they have fallen greatly behind the broker/producer. For this reason I wanted to know how the broker would react when the filesystem it writes its messages to is full. What I found was that the broker crashes and cannot be started until the filesystem has space again.
> > > > >
> > > > > Is there, or would it make sense to provide, configuration allowing the broker to reject writes in this case rather than crashing, electing a new leader and attempting the write again? I can clearly understand the use case that we don't want to 'lose' messages from the producer and I could also see how lack of filesystem space could be considered a machine failure, but with replication I would think if you are running out of space on 1 broker you are likely running out of space on others.
> > > > >
> > > > > Bryan
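
For reference, the free-space threshold suggested earlier in the thread (a percentage-full limit or a minimum amount of free space) could be sketched roughly as below. This is a standalone Python illustration, not an existing Kafka feature or configuration: the names `nearly_full`, `log_dir_nearly_full`, `max_used_fraction` and `min_free_bytes` are all hypothetical. It relies only on the standard library's `shutil.disk_usage`.

```python
import shutil
from typing import Optional


def nearly_full(total: int, used: int, free: int,
                max_used_fraction: float = 0.98,
                min_free_bytes: Optional[int] = None) -> bool:
    """Return True if either (hypothetical) threshold is crossed.

    max_used_fraction: trip when used/total reaches this fraction.
    min_free_bytes:    if given, also trip when free space drops below it.
    """
    if total <= 0:
        # No usable capacity reported; treat as full to be safe.
        return True
    if used / total >= max_used_fraction:
        return True
    if min_free_bytes is not None and free < min_free_bytes:
        return True
    return False


def log_dir_nearly_full(log_dir: str,
                        max_used_fraction: float = 0.98,
                        min_free_bytes: Optional[int] = None) -> bool:
    """Check the filesystem holding log_dir against the thresholds."""
    usage = shutil.disk_usage(log_dir)  # named tuple: total, used, free
    return nearly_full(usage.total, usage.used, usage.free,
                       max_used_fraction, min_free_bytes)
```

A broker-like process could poll `log_dir_nearly_full` periodically and begin a clean shutdown when it returns True. Polling only narrows the window: it would not remove the need to handle an out-of-space error on the write itself, since the disk can fill between checks.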
