Paul,

This is likely because the log cleaner only runs every log.cleanup.interval.mins (default: 10 minutes), so a retention change may not take effect before a broker that is out of disk shuts itself down again. We should probably consider also running the cleaner on broker startup. Could you file a jira for that?
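For reference, a minimal sketch of the relevant server.properties entries (the values below are only illustrative, not recommendations):

  # segments older than this are eligible for deletion (default: 168)
  log.retention.hours=12
  # how often the scheduled cleanup task checks for deletable segments (default: 10)
  log.cleanup.interval.mins=1

Lowering the interval only helps if the broker stays up long enough for the scheduled task to fire, which is the gap that running the cleaner on startup would close.

Thanks,

Jun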
On Sat, Sep 21, 2013 at 12:06 PM, Paul Mackles <[email protected]> wrote:

> Hi -
>
> We ran into a situation on our dev cluster (3 nodes, v0.8) where we ran out
> of disk on one of the nodes. As expected, the broker shut itself down and
> all of the clients switched over to the other nodes. So far so good.
>
> To free up disk space, I reduced log.retention.hours to something more
> manageable (from 172 to 12). I did this on all 3 nodes. Since the other 2
> nodes were running OK, I first tried to restart the node which ran out of
> disk. Unfortunately, it kept shutting itself down due to the full disk. From
> the logs, I think this was because it was trying to sync up the replicas it
> was responsible for and of course couldn't due to the lack of disk space. My
> hope was that upon restart, it would see the new retention settings and free
> up a bunch of disk space before trying to do any syncs.
>
> I then went and restarted the other 2 nodes. They both picked up the new
> retention settings and freed up a bunch of storage as a result. I then went
> back and tried to restart the 3rd node, but to no avail. It still had
> problems with the full disk.
>
> I thought about trying to reassign partitions so that the node in question
> had less to manage, but that turned out to be a hassle, so I wound up
> manually deleting some of the old log/segment files. The broker seemed to
> come back fine after that, but that's not something I would want to do on a
> production server.
>
> We obviously need better monitoring/alerting to avoid this situation
> altogether, but I am wondering if the order of operations at startup
> could/should be changed to better account for scenarios like this. Or maybe
> a utility to remove old logs after changing the ttl? Did I miss a better way
> to handle this?
>
> Thanks,
> Paul
