logrotate error (was A full /var partition destroyed 3 hours of my life!)
> [rsyslog maintainer speaking here]
>
>> One of the culprits in my full /var partition was a 3 gig syslog file
>> which has only been getting bigger since January despite running
>> logrotate -f. I try to run it this time but I'm told that it can't
>
> I'd be interested to find out, why logrotation was not done
> automatically. Do you have cron installed and running?
> Do you have /etc/cron.daily/logrotate which works when executed and a
> corresponding /etc/logrotate.d/rsyslog?
>
> Any idea why logrotate was not run or failed to do its job?

Following up on this question, when I had run logrotate -f /etc/logrotate.conf in the past, I got these errors:

error: error creating output file /var/log/apache2/access.log.1.gz: File exists
error: error creating output file /var/log/apache2/error.log.1.gz: File exists
error: error creating output file /var/log/cups/access_log.1.gz: File exists
error: error creating output file /var/log/exim4/mainlog.1.gz: File exists
error: error creating output file /var/log/syslog.1.gz: File exists

It seems that logrotate doesn't rotate these, so the original files increase in size indefinitely. I removed the offending files and logrotate runs without errors, for the time being.

Is this a known software issue, possibly from a version upgrade?
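For anyone hitting the same "File exists" errors, here is a minimal sketch for spotting candidates before removing anything. It assumes (the thread does not confirm this) that the error appears when an uncompressed NAME.1 and an already compressed NAME.1.gz sit side by side, so logrotate cannot write its compressed output. The function name find_rotation_conflicts is made up for illustration and is not part of logrotate.

```shell
#!/bin/sh
# Sketch: list rotated logs where both NAME.1 and NAME.1.gz exist.
# Assumption: this pairing is what triggers "error creating output file".
# find_rotation_conflicts is a hypothetical helper, not a logrotate command.
find_rotation_conflicts() {
    dir=${1:-/var/log}
    # Walk every compressed first-generation archive under the directory ...
    find "$dir" -type f -name '*.1.gz' 2>/dev/null | sort |
    while IFS= read -r gz; do
        plain=${gz%.gz}
        # ... and report it only if the uncompressed twin is present too.
        if [ -e "$plain" ]; then
            printf '%s\n' "$gz"
        fi
    done
}
```

Calling find_rotation_conflicts /var/log only prints paths; deciding which copy of each pair to keep is left to the reader.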
Re: btrfs filesystem full problems (was Re: A full /var partition destroyed 3 hours of my life!)
On Tue, Nov 15, 2016 at 02:25:55PM -0500, Borden Rhodes wrote:
> Correct you are! The various incantations used different filters and
> one of them worked. I have no idea what filters are and I would die a
> happy man without needing to know.

Last time I used 'btrfs balance', I had to run it with increasing (or decreasing) values for the -fi argument (iirc), as the initial values freed up just enough working space for the subsequent values to work within, and the subsequent values would not work when I first tried them. If that makes sense. I was left with a very sour taste in my mouth.

Nowadays I only use btrfs for a development space (as a docker storage backend) and when it goes pear-shaped, I just blow it away and start again; there's nothing stored there which can't be recreated.

> I use Debian testing, so it's whatever kernel and btrfs packages that
> were in that as of yesterday.

Ah ok, thanks.

-- 
Jonathan Dowland
Please do not CC me, I am subscribed to the list.
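The stepwise approach described above might look roughly like the sketch below. It assumes the filter in question is the balance "usage" filter (-dusage=N), which is a guess, since the message only recalls "-fi ... (iirc)". The thresholds 0, 5, 10, 25 and 50 and the helper name balance_in_steps are illustrative, not taken from the thread. With DRY_RUN=1 it only prints the commands; without it, it needs root and a btrfs mount.

```shell
#!/bin/sh
# Sketch: run 'btrfs balance' with a gradually increasing usage filter,
# so that early passes free room for the later ones.
# Assumptions: the filter is -dusage=N; the thresholds are illustrative.
# balance_in_steps is a hypothetical helper name.
balance_in_steps() {
    mountpoint=${1:?usage: balance_in_steps MOUNTPOINT}
    for pct in 0 5 10 25 50; do
        if [ "${DRY_RUN:-0}" = 1 ]; then
            # Dry run: show what would be executed.
            echo "btrfs balance start -dusage=$pct $mountpoint"
        else
            # Stop at the first failure so the error stays visible.
            btrfs balance start -dusage="$pct" "$mountpoint" || return 1
        fi
    done
}
```

For example, DRY_RUN=1 balance_in_steps /var lists the five commands without touching the filesystem.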
RE: btrfs filesystem full problems (was Re: A full /var partition destroyed 3 hours of my life!)
> It sounds like btrfs specific behaviour. It would be interesting to know
> what kernel version and btrfs version you were using, if only to confirm
> my suspicion that even the versions in Debian are not suitable for use in
> production.
>
> I'm going to guess that it was a series of 'btrfs balance' commands that
> fixed things for you.

Correct you are! The various incantations used different filters and one of them worked. I have no idea what filters are and I would die a happy man without needing to know.

I use Debian testing, so it's whatever kernel and btrfs packages that were in that as of yesterday.

> Yes, this is right. The problem is not 'rm', the problem is that you use
> sudo without understanding why it is set up like that: sudo logs the
> command it executes to /var/log/auth.log

Makes sense. So why did I get the exact same problem when I enabled the debug-shell? Unless it's also lying to me, doesn't it boot into a proper root shell?
Re: A full /var partition destroyed 3 hours of my life!
> [rsyslog maintainer speaking here]
>
> On 15.11.2016 at 06:00, Borden Rhodes wrote:
>> One of the culprits in my full /var partition was a 3 gig syslog file
>> which has only been getting bigger since January despite running
>> logrotate -f. I try to run it this time but I'm told that it can't
>
> I'd be interested to find out, why logrotation was not done
> automatically. Do you have cron installed and running?
> Do you have /etc/cron.daily/logrotate which works when executed and a
> corresponding /etc/logrotate.d/rsyslog?
>
> Any idea why logrotate was not run or failed to do its job?

Here are the contents of /etc/cron.daily/logrotate:

    #!/bin/sh
    test -x /usr/sbin/logrotate || exit 0
    /usr/sbin/logrotate /etc/logrotate.conf

and /etc/logrotate.d/rsyslog:

    /var/log/syslog
    {
        rotate 7
        daily
        missingok
        notifempty
        delaycompress
        compress
        postrotate
            invoke-rc.d rsyslog rotate > /dev/null
        endscript
    }

    /var/log/mail.info
    /var/log/mail.warn
    /var/log/mail.err
    /var/log/mail.log
    /var/log/daemon.log
    /var/log/kern.log
    /var/log/auth.log
    /var/log/user.log
    /var/log/lpr.log
    /var/log/cron.log
    /var/log/debug
    /var/log/messages
    {
        rotate 4
        weekly
        missingok
        notifempty
        compress
        delaycompress
        sharedscripts
        postrotate
            invoke-rc.d rsyslog rotate > /dev/null
        endscript
    }

Both looked normal to me and, without knowing more about the structure of logrotate config files, I didn't pick further. When I run logrotate -f, it runs and finishes without complaining, but syslog doesn't seem to get smaller. I think it just kept getting bigger.

>> My question, therefore, is whether this is a btrfs bug that got
>> triggered by the full /var partition or whether Debian is designed to
>> break irrecoverably when /var fills up. Any ideas of what happened?
>>
>
> That sounds like a btrfs issue. Which kernel is that?
> I do remember btrfs having problems when the disk runs full.

I'm running a 4.8.0-1-amd64 kernel. I'm on the testing branch.
It makes me feel better knowing that it may be a btrfs bug (or at least not part of the Linux design) since that's a rough edge I can (try to) work around by checking /var every so often. Still, "A Cowboy's Guide to Cleaning /var and /tmp" would help in cases where some process gets greedy with space.

>> My question, therefore, is whether this is a btrfs bug that got
>> triggered by the full /var partition or whether Debian is designed to
>> break irrecoverably when /var fills up. Any ideas of what happened?
>>
>
> Does anything on the Debian Wiki on Btrfs [1] seem familiar? Other than
> that I can only guess, but maybe check the SMART information of your
> disk(s) for excessive errors, as it _could_ be that defective sectors
> prevent Btrfs from doing its COW magic.

I don't think it's that, unless smartctl is lying to me. It passes all of the tests and the only historical failure (which I think has almost always been there) is an airflow warning. Error logs are empty. If I start getting strange behaviour, I can do a more comprehensive SMART scan.

> [1] https://wiki.debian.org/Btrfs#WARNINGS

Nothing seems on point here. My configuration is btrfs partitions within an LVM within an MBR hard drive. I'm not doing any fancy RAID or anything.

Thank you for the hints!
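"Checking /var every so often" could be scripted along these lines. This is only a sketch: the helper names avail_kb and check_free and the 512 MiB default are invented for illustration. It looks at the available column of df rather than the percentage used, because in this thread df reported 0 available while well under the full partition size was in use.

```shell
#!/bin/sh
# Sketch: warn when the space df reports as available drops below a floor.
# avail_kb and check_free are hypothetical helpers; 524288 KiB (512 MiB)
# is an arbitrary default, not a recommendation from the thread.
avail_kb() {
    # df -P gives one POSIX-format line per filesystem; with -k,
    # field 4 is the available space in KiB.
    df -Pk "$1" | awk 'NR == 2 { print $4 }'
}

check_free() {
    path=$1
    min_kb=${2:-524288}
    avail=$(avail_kb "$path")
    # An empty result means df could not report on the path.
    [ -n "$avail" ] || return 2
    if [ "$avail" -lt "$min_kb" ]; then
        echo "WARNING: only ${avail} KiB available on $path"
        return 1
    fi
}
```

Something like check_free /var could then be run from a cron job, with the warning mailed to root.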
Re: A full /var partition destroyed 3 hours of my life!
On 15-11-2016 03:00, Borden Rhodes wrote:
> My question, therefore, is whether this is a btrfs bug that got
> triggered by the full /var partition or whether Debian is designed to
> break irrecoverably when /var fills up. Any ideas of what happened?

First, you mentioned the crucial bit of information (that it's a btrfs filesystem) only at the end. Also, you've left out important things such as your running kernel and (to a lesser extent) version of btrfs-tools (or btrfs-progs in newer systems).

As others have pointed out, btrfs is the culprit. Take a look at these links to try to understand what might have happened:

https://btrfs.wiki.kernel.org/index.php/FAQ#Understanding_free_space.2C_using_the_original_tools
https://btrfs.wiki.kernel.org/index.php/Problem_FAQ#I_get_.22No_space_left_on_device.22_errors.2C_but_df_says_I.27ve_got_lots_of_space
http://marc.merlins.org/perso/btrfs/post_2014-05-04_Fixing-Btrfs-Filesystem-Full-Problems.html

-- 
Love is being stupid together.
    -- Paul Valery

Eduardo M KALINOWSKI
edua...@kalinowski.com.br
Re: A full /var partition destroyed 3 hours of my life!
Borden Rhodes writes:
> Since there's almost no documentation as to what can be safely rm'd in
> /var without breaking your system, I decide the least risky choice is
> to sudo rm -rf the offending 3-gig syslog file from single-user mode
> and the systemd debug shell. But *THIS* command failed because there
> was 'no space left on the device'. Is this right? Does rm need space
> on a drive to free other space?

Yes, this is right. The problem is not 'rm', the problem is that you use sudo without understanding why it is set up like that: sudo logs the command it executes to /var/log/auth.log

-- 
"We will need a longer wall when the revolution comes."
--- AJS, quoting an uncertain source.
btrfs filesystem full problems (was Re: A full /var partition destroyed 3 hours of my life!)
On Tue, Nov 15, 2016 at 12:00:37AM -0500, Borden Rhodes wrote:
> I tried booting up into Debian and got all sorts of systemd breakages
> apparently because my /var partition was full.
...
> I start blindly casting whatever btrfs spells...

Aha! btrfs!

> My question, therefore, is whether this is a btrfs bug that got
> triggered by the full /var partition or whether Debian is designed to
> break irrecoverably when /var fills up. Any ideas of what happened?

It sounds like btrfs specific behaviour. It would be interesting to know what kernel version and btrfs version you were using, if only to confirm my suspicion that even the versions in Debian are not suitable for use in production.

I'm going to guess that it was a series of 'btrfs balance' commands that fixed things for you.

-- 
Jonathan Dowland
Please do not CC me, I am subscribed to the list.
Re: A full /var partition destroyed 3 hours of my life!
On 15.11.2016 at 06:00, Borden Rhodes wrote:
> I start blindly casting whatever btrfs spells I can find on the
> Internet to fix 'no space left on device' errors. One of them
> eventually works and df -h correctly reports the free space in my /var
> partition and Debian boots normally again.
>
> My question, therefore, is whether this is a btrfs bug that got
> triggered by the full /var partition or whether Debian is designed to
> break irrecoverably when /var fills up. Any ideas of what happened?
>

Does anything on the Debian Wiki on Btrfs [1] seem familiar? Other than that I can only guess, but maybe check the SMART information of your disk(s) for excessive errors, as it _could_ be that defective sectors prevent Btrfs from doing its COW magic.

[1] https://wiki.debian.org/Btrfs#WARNINGS
Re: A full /var partition destroyed 3 hours of my life!
[rsyslog maintainer speaking here]

On 15.11.2016 at 06:00, Borden Rhodes wrote:
> One of the culprits in my full /var partition was a 3 gig syslog file
> which has only been getting bigger since January despite running
> logrotate -f. I try to run it this time but I'm told that it can't

I'd be interested to find out, why logrotation was not done automatically. Do you have cron installed and running?
Do you have /etc/cron.daily/logrotate which works when executed and a corresponding /etc/logrotate.d/rsyslog?

Any idea why logrotate was not run or failed to do its job?

> My question, therefore, is whether this is a btrfs bug that got
> triggered by the full /var partition or whether Debian is designed to
> break irrecoverably when /var fills up. Any ideas of what happened?
>

That sounds like a btrfs issue. Which kernel is that?
I do remember btrfs having problems when the disk runs full.

-- 
Why is it that all of the instruments seeking intelligent life in the
universe are pointed away from Earth?
A full /var partition destroyed 3 hours of my life!
I tried booting up into Debian and got all sorts of systemd breakages apparently because my /var partition was full. That's fair, but the pain started when Debian frustrated any attempt to free up space. I'm wondering if this is a 'feature' that needs removing or if there might be a bug in the underlying filesystem. I really don't want any more finite hours of my life (or anyone else's) lost in this problem if I can find the cause.

One of the culprits in my full /var partition was a 3 gig syslog file which has only been getting bigger since January despite running logrotate -f. I try to run it this time but I'm told that it can't rotate anything because there's no space left on the device.

OK, Plan B. Another thing that the interwebs say is to run apt-get clean to sweep out downloaded packages, of which I collected hundreds of megabytes. Again, this command failed because there was no space left on the device.

Since there's almost no documentation as to what can be safely rm'd in /var without breaking your system, I decide the least risky choice is to sudo rm -rf the offending 3-gig syslog file from single-user mode and the systemd debug shell. But *THIS* command failed because there was 'no space left on the device'. Is this right? Does rm need space on a drive to free other space? If so, how on earth can you fix a full partition if you can't remove anything from it?!

Since Debian can't delete files from its own partitions, I have to boot from an Ubuntu DVD. I'm able to rm -rf the syslog file from that, but when I reboot into Debian, I get the same 'no space left on device' errors. That's weird, so I df -h to figure out what's going on and df correctly reports a 5G var partition, of which under 3G are now used and avail space is 0G. Whoa, wait, what?!?! How can 5G - 3G = 0G?!

I start blindly casting whatever btrfs spells I can find on the Internet to fix 'no space left on device' errors. One of them eventually works and df -h correctly reports the free space in my /var partition and Debian boots normally again.

My question, therefore, is whether this is a btrfs bug that got triggered by the full /var partition or whether Debian is designed to break irrecoverably when /var fills up. Any ideas of what happened?
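For the "5G - 3G = 0G" puzzle, one possible way to look further is to compare plain df with btrfs's own view of how space is allocated, as sketched below. The assumption here is that the mismatch comes from how btrfs has allocated space to data and metadata; this message does not establish that. The helper names run and show_btrfs_space are made up for illustration. With DRY_RUN=1 nothing is executed; otherwise the btrfs commands may need root.

```shell
#!/bin/sh
# Sketch: print df's numbers next to btrfs's own allocation report.
# run and show_btrfs_space are hypothetical helpers; /var is the default
# only because that is the partition discussed in this thread.
run() {
    if [ "${DRY_RUN:-0}" = 1 ]; then
        # Dry run: just show the command line.
        echo "$*"
    else
        "$@"
    fi
}

show_btrfs_space() {
    mp=${1:-/var}
    run df -h "$mp"                  # what generic tools report
    run btrfs filesystem show "$mp"  # devices and how much is allocated
    run btrfs filesystem df "$mp"    # split between data and metadata
}
```

Comparing the output of the last two commands with df is a starting point for reading the btrfs FAQ on free space.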