Hmmm... I guess I was thinking that a broker could receive a message and keep it in memory before having disk space reserved for its eventual storage. Are you saying that memory is not allocated for a message unless disk space has already been allocated for it? In that case, there should be no problem!
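For illustration, here is a minimal sketch of a producer that requests acknowledgements; a broker-side write failure would then surface as an exception on the send rather than as a silently lost message. This uses the newer Java producer client, which postdates the 0.8.0-beta1 broker discussed below, and the bootstrap address, topic name, and acks setting are placeholders, not anything from this thread:

```java
import java.util.Properties;
import java.util.concurrent.ExecutionException;

import org.apache.kafka.clients.producer.KafkaProducer;
import org.apache.kafka.clients.producer.Producer;
import org.apache.kafka.clients.producer.ProducerRecord;
import org.apache.kafka.clients.producer.RecordMetadata;

public class AckedSendExample {
    public static void main(String[] args) {
        Properties props = new Properties();
        props.put("bootstrap.servers", "localhost:9092"); // placeholder broker address
        props.put("acks", "all"); // wait for the broker(s) to acknowledge the write
        props.put("key.serializer",
                "org.apache.kafka.common.serialization.StringSerializer");
        props.put("value.serializer",
                "org.apache.kafka.common.serialization.StringSerializer");

        try (Producer<String, String> producer = new KafkaProducer<>(props)) {
            ProducerRecord<String, String> record =
                    new ProducerRecord<>("example-topic", "key", "value");
            try {
                // Blocking on the returned future surfaces any broker-side
                // failure (e.g. the leader shutting down because its log
                // directory filled up) as an exception instead of a good ack.
                RecordMetadata metadata = producer.send(record).get();
                System.out.println("acked at offset " + metadata.offset());
            } catch (InterruptedException | ExecutionException e) {
                System.err.println("send was not acknowledged: " + e.getCause());
            }
        }
    }
}
```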
Jason

On Thu, Aug 15, 2013 at 10:44 AM, Jay Kreps <[email protected]> wrote:

> I don't think the filesystem will overcommit its disk space, but I'm actually not sure. I think this would only come into play on a fs like ext4, which does lazy block allocation in addition to lazy writing. But I think even ext4 is probably not allowed to hand out more disk space than it has.
>
>
> On Thu, Aug 15, 2013 at 10:18 AM, Jason Rosenberg <[email protected]> wrote:
>
> > A related question: will producers sending messages with acknowledgment get a failed ack if a broker is out of disk space, or will messages get buffered in memory successfully (resulting in a good ack before they fail to be written)?
> >
> > It seems like it might be a good feature to have the broker auto-detect when its log dir is nearing full, so that there is some runway to gracefully shut down while still writing any in-memory buffered messages. It could be an optional threshold, like 98% full, or X MB free, etc.
> >
> > Jason
> >
> >
> > On Wed, Aug 14, 2013 at 7:58 PM, Jay Kreps <[email protected]> wrote:
> >
> > > The crash is actually just a call to shutdown. We think this is the right thing to do, though I agree it is unintuitive. Here is why. When you get an out-of-space error, it is likely that the operating system did a partial write, leaving you with a corrupt log. Furthermore, it is possible that space will free up, at which point more writes on the log could succeed, so you wouldn't even know there was a problem, but all your consumers would hit this data and choke.
> > >
> > > By "crashing" the node we ensure that recovery is run on the log to bring it into a consistent state.
> > >
> > > Theoretically we could leave the node up, accepting reads but rejecting writes, while attempting to recover the log, but this is very complex. Likely, if you are out of space you are just going to keep getting writes, run out of space again, run recovery again, and so on. That kind of loop is much worse than just needing to bring the node back up.
> > >
> > > Alternately, we could leave the node up but go into some kind of write-rejecting mode forever. But this would still require that you restart the node, and we would have to implement that write-rejecting mode.
> > >
> > > Cheers,
> > >
> > > -Jay
> > >
> > >
> > > On Wed, Aug 14, 2013 at 9:52 AM, Bryan Baugher <[email protected]> wrote:
> > >
> > > > Hi,
> > > >
> > > > This is more of a thought question than a problem that I need support for. I have been trying out Kafka 0.8.0-beta1 with replication. For our use case we want to try to guarantee that our consumers will see all messages, even if they have fallen greatly behind the broker/producer. For this reason I wanted to know how the broker would react when the filesystem it writes its messages to is full. What I found was that the broker crashes and cannot be started until the filesystem has space again.
> > > >
> > > > Is there, or would it make sense to provide, configuration allowing the broker to reject writes in this case rather than crashing, electing a new leader, and attempting the write again? I can clearly understand the use case that we don't want to 'lose' messages from the producer, and I could also see how a lack of filesystem space could be considered a machine failure, but with replication I would think that if you are running out of space on one broker you are likely running out of space on the others.
> > > >
> > > > Bryan
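The free-space threshold idea suggested above boils down to periodically comparing the log directory's usable space against a configured floor and shutting down gracefully when it is crossed. A rough sketch of what such a check might look like follows; the log directory path, the 1 GB floor, and the one-minute polling interval are illustrative assumptions, and this is not an existing broker option:

```java
import java.io.File;

public class LogDirSpaceMonitor {
    public static void main(String[] args) throws InterruptedException {
        File logDir = new File("/var/kafka-logs");  // illustrative log.dirs path
        long minFreeBytes = 1024L * 1024L * 1024L;  // e.g. refuse to run below 1 GB free

        while (true) {
            long free = logDir.getUsableSpace();
            if (free < minFreeBytes) {
                System.err.printf(
                        "log dir %s has only %d bytes free; shutting down gracefully%n",
                        logDir, free);
                // In a real broker this would flush in-memory data and run a
                // controlled shutdown instead of waiting for writes to fail.
                break;
            }
            Thread.sleep(60_000); // check once a minute
        }
    }
}
```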
