Re: full disk

Jun Rao Mon, 23 Sep 2013 20:42:53 -0700

Yes, manually removing the old log files is the simplest solution right now.
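
For anyone in the same spot, here is a minimal sketch of how the candidates for manual removal could be listed before deleting anything. It is not a Kafka tool. It assumes the 0.8 on-disk layout (one directory per topic-partition under the log directory, with segments named by zero-padded base offset as `.log` plus a matching `.index`), that the broker is stopped, and that file modification time is a good enough stand-in for segment age. The function name `old_segments` and its parameters are made up for illustration.

```python
import time
from pathlib import Path


def old_segments(log_dir, retention_hours, now=None):
    """List segment files older than the retention window (dry run only).

    The newest segment of each partition is always skipped, because it is
    the one the broker appends to when it comes back up.
    """
    now = time.time() if now is None else now
    cutoff = now - retention_hours * 3600
    candidates = []
    for partition_dir in sorted(Path(log_dir).iterdir()):
        if not partition_dir.is_dir():
            continue
        # Assumption: names are zero-padded base offsets, so sorting by
        # name also sorts by offset.
        segments = sorted(partition_dir.glob("*.log"))
        for seg in segments[:-1]:  # keep the newest (active) segment
            if seg.stat().st_mtime < cutoff:
                candidates.append(seg)
                index = seg.with_suffix(".index")
                if index.exists():
                    candidates.append(index)
    return candidates
```

Calling `old_segments("/path/to/kafka-logs", 12)` (the path is a placeholder) only returns the paths. Review the list, then delete by hand.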


Thanks,

Jun


On Mon, Sep 23, 2013 at 9:16 AM, Paul Mackles <[email protected]> wrote:

> Done:
>
> https://issues.apache.org/jira/browse/KAFKA-1063
>
> Out of curiosity, is manually removing the older log files the only
> option at this point?
>
> From: Paul Mackles <[email protected]>
> To: [email protected]
> Subject: full disk
>
> Hi -
>
> We ran into a situation on our dev cluster (3 nodes, v0.8) where we ran
> out of disk space on one of the nodes. As expected, the broker shut itself down
> and all of the clients switched over to the other nodes. So far so good.
>
> To free up disk space, I reduced log.retention.hours to something more
> manageable (from 172 to 12). I did this on all 3 nodes. Since the other 2
> nodes were running OK, I first tried to restart the node which ran out of
> disk. Unfortunately, it kept shutting itself down due to the full disk.
> From the logs, I think this was because it was trying to sync up the
> replicas it was responsible for and of course couldn't due to the lack of
> disk space. My hope was that upon restart, it would see the new retention
> settings and free up a bunch of disk space before trying to do any syncs.
>
> I then went and restarted the other 2 nodes. They both picked up the new
> retention settings and freed up a bunch of storage as a result. I then went
> back and tried to restart the 3rd node but to no avail. It still had
> problems with the full disks.
>
> I thought about trying to reassign partitions so that the node in question
> had less to manage but that turned out to be a hassle so I wound up
> manually deleting some of the old log/segment files. The broker seemed to
> come back fine after that but that's not something I would want to do on a
> production server.
>
> We obviously need better monitoring/alerting to avoid this situation
> altogether, but I am wondering if the order of operations at startup
> could/should be changed to better account for scenarios like this. Or maybe
> a utility to remove old logs after changing the TTL? Did I miss a better way to
> handle this?
>
> Thanks,
> Paul
>
>
>
>
