Yes, manually removing the old log files is the simplest solution right now.
Thanks,

Jun


On Mon, Sep 23, 2013 at 9:16 AM, Paul Mackles <[email protected]> wrote:

> Done:
>
> https://issues.apache.org/jira/browse/KAFKA-1063
>
> Out of curiosity, is manually removing the older log files the only
> option at this point?
>
> From: Paul Mackles <[email protected]>
> To: "[email protected]" <[email protected]>
> Subject: full disk
>
> Hi -
>
> We ran into a situation on our dev cluster (3 nodes, v0.8) where we ran
> out of disk on one of the nodes. As expected, the broker shut itself down
> and all of the clients switched over to the other nodes. So far so good.
>
> To free up disk space, I reduced log.retention.hours to something more
> manageable (from 172 to 12). I did this on all 3 nodes. Since the other 2
> nodes were running OK, I first tried to restart the node which ran out of
> disk. Unfortunately, it kept shutting itself down due to the full disk.
> From the logs, I think this was because it was trying to sync up the
> replicas it was responsible for and of course couldn't due to the lack of
> disk space. My hope was that upon restart, it would see the new retention
> settings and free up a bunch of disk space before trying to do any syncs.
>
> I then went and restarted the other 2 nodes. They both picked up the new
> retention settings and freed up a bunch of storage as a result. I then went
> back and tried to restart the 3rd node but to no avail. It still had
> problems with the full disk.
>
> I thought about trying to reassign partitions so that the node in question
> had less to manage, but that turned out to be a hassle, so I wound up
> manually deleting some of the old log/segment files. The broker seemed to
> come back fine after that, but that's not something I would want to do on a
> production server.
>
> We obviously need better monitoring/alerting to avoid this situation
> altogether, but I am wondering if the order of operations at startup
> could/should be changed to better account for scenarios like this. Or maybe
> a utility to remove old logs after changing the TTL? Did I miss a better
> way to handle this?
>
> Thanks,
> Paul
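
For anyone landing on this thread later: the manual cleanup discussed above could be sketched roughly as below. This is only an illustration, not a Kafka-provided tool. The function names `find_old_segments` and `remove_segments` are made up for this example, and it assumes the broker is stopped, that segment files end in `.log` / `.index` inside per-partition directories under the broker's log directory, and that file modification time is a reasonable stand-in for segment age. It always keeps the newest segment in each directory and defaults to a dry run.

```python
import os
import time


def find_old_segments(log_dir, retention_hours, now=None):
    """Return segment files older than the retention window.

    Hypothetical helper: walks log_dir, looks at *.log / *.index files,
    and never returns the newest segment (highest base offset) of a
    directory, since that is the one the broker would append to.
    """
    now = time.time() if now is None else now
    cutoff = now - retention_hours * 3600
    old = []
    for root, _dirs, files in os.walk(log_dir):
        segments = sorted(f for f in files if f.endswith((".log", ".index")))
        if not segments:
            continue
        # Segment names are zero-padded base offsets, so the last name
        # in sorted order belongs to the newest segment.
        newest_base = os.path.splitext(segments[-1])[0]
        for name in segments:
            if os.path.splitext(name)[0] == newest_base:
                continue
            path = os.path.join(root, name)
            if os.path.getmtime(path) < cutoff:
                old.append(path)
    return sorted(old)


def remove_segments(paths, dry_run=True):
    """Return the bytes that would be freed; delete only if dry_run=False."""
    freed = 0
    for path in paths:
        freed += os.path.getsize(path)
        if not dry_run:
            os.remove(path)
    return freed
```

A cautious way to use it would be to call `remove_segments(find_old_segments(path, 12))` first, check the reported byte count and file list, and only then rerun with `dry_run=False`. As Paul notes, this is not something to do lightly on a production server.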
