Bug#389157: noflushd: Noflushd uses up all the cpu-time it can get
OK it happened today and here's the backtrace:

# gdb /usr/sbin/noflushd `pidof noflushd`
GNU gdb 6.4.90-debian
Copyright (C) 2006 Free Software Foundation, Inc.
GDB is free software, covered by the GNU General Public License, and you are
welcome to change it and/or distribute copies of it under certain conditions.
Type "show copying" to see the conditions.
There is absolutely no warranty for GDB.  Type "show warranty" for details.
This GDB was configured as "i486-linux-gnu"...Using host libthread_db library
"/lib/tls/i686/cmov/libthread_db.so.1".
Attaching to program: /usr/sbin/noflushd, process 3159
Reading symbols from /lib/tls/i686/cmov/libc.so.6...done.
Loaded symbols for /lib/tls/i686/cmov/libc.so.6
Reading symbols from /lib/ld-linux.so.2...done.
Loaded symbols for /lib/ld-linux.so.2
Failed to read a valid object file image from memory.
0xa7e49881 in malloc () from /lib/tls/i686/cmov/libc.so.6
(gdb) bt
#0  0xa7e49881 in malloc () from /lib/tls/i686/cmov/libc.so.6
#1  0xa7e49d60 in realloc () from /lib/tls/i686/cmov/libc.so.6
#2  0x0804eb3d in get_line (fp=0x8054030) at util.c:51
#3  0x0804f235 in eat_line (part=0x8054008) at part_info.c:164
#4  0x0804f860 in part_info_next (part=0x8054008, flag=1) at part_info.c:303
#5  0x0804f8e9 in part_info_disk_next (part=0x8054008) at part_info.c:324
#6  0x0804ddbc in sync_spinning_disks (head=0x8054e68) at state.c:209
#7  0x0804e5fe in nfd_daemon (head=0x8054e68, stat=0x8056360) at state.c:437
#8  0x0804cb2c in main (argc=3, argv=0xaf9df844) at noflushd.c:269

--
To UNSUBSCRIBE, email to [EMAIL PROTECTED] with a subject of "unsubscribe". Trouble? Contact [EMAIL PROTECTED]
Bug#389157: noflushd: Noflushd uses up all the cpu-time it can get
I did a couple more backtraces and got different results (after detaching and reattaching):

(gdb) bt
#0  0xa7f2c410 in ?? ()
#1  0xaf9df678 in ?? ()
#2  0x08056468 in ?? ()
#3  0x0001 in ?? ()
#4  0xa7e9e423 in open () from /lib/tls/i686/cmov/libc.so.6
#5  0x0804dbef in sync_part (name=0x8054db0 "/dev/sda1") at state.c:158
#6  0x0804dc5d in sync_current_disk () at state.c:176
#7  0x0804ddaf in sync_spinning_disks (head=0x8054e68) at state.c:221
#8  0x0804e5fe in nfd_daemon (head=0x8054e68, stat=0x8056360) at state.c:437
#9  0x0804cb2c in main (argc=3, argv=0xaf9df844) at noflushd.c:269
(gdb) bt
#0  release_line (line=0x80576081 "14 ram14 0 0 0 0 0 0 0 0 0 0 0\n") at util.c:69
#1  0x0804a966 in update_io_25 (ds=0x8056360) at disk_stat.c:502
#2  0x0804aba5 in disk_stat_update (ds=0x8056360) at disk_stat.c:556
#3  0x0804e0ec in check_io (di=0x8054e68, ds=0x8056360, interval=0) at state.c:316
#4  0x0804e580 in nfd_daemon (head=0x8054e68, stat=0x8056360) at state.c:414
#5  0x0804cb2c in main (argc=3, argv=0xaf9df844) at noflushd.c:269
(gdb) bt
#0  0xa7f2c410 in ?? ()
#1  0xaf9df4cc in ?? ()
#2  0x0400 in ?? ()
#3  0xa7f27000 in ?? ()
#4  0xa7e9e603 in read () from /lib/tls/i686/cmov/libc.so.6
#5  0xa7e41638 in _IO_file_read () from /lib/tls/i686/cmov/libc.so.6
#6  0xa7e429e8 in _IO_file_underflow () from /lib/tls/i686/cmov/libc.so.6
#7  0xa7e4313b in _IO_default_uflow () from /lib/tls/i686/cmov/libc.so.6
#8  0xa7e443fd in __uflow () from /lib/tls/i686/cmov/libc.so.6
#9  0xa7e386a6 in _IO_getline_info () from /lib/tls/i686/cmov/libc.so.6
#10 0xa7e385ef in _IO_getline () from /lib/tls/i686/cmov/libc.so.6
#11 0xa7e3757f in fgets () from /lib/tls/i686/cmov/libc.so.6
#12 0x0804eb71 in get_line (fp=0x8056e08) at util.c:56
#13 0x0804a8d6 in update_io_25 (ds=0x8056360) at disk_stat.c:493
#14 0x0804aba5 in disk_stat_update (ds=0x8056360) at disk_stat.c:556
#15 0x0804e0ec in check_io (di=0x8054e68, ds=0x8056360, interval=0) at state.c:316
#16 0x0804e580 in nfd_daemon (head=0x8054e68, stat=0x8056360) at state.c:414
#17 0x0804cb2c in main (argc=3, argv=0xaf9df844) at noflushd.c:269

Also, strace shows this endless loop:

_llseek(3, 0, [0], SEEK_SET)            = 0
read(3, "major minor  #blocks  name\n\n   8"..., 1024) = 323
open("/dev/sda", O_WRONLY)              = 7
fsync(7)                                = 0
close(7)                                = 0
open("/dev/sda1", O_WRONLY)             = 7
fsync(7)                                = 0
close(7)                                = 0
open("/dev/sda2", O_WRONLY)             = 7
fsync(7)                                = 0
close(7)                                = 0
open("/dev/sda5", O_WRONLY)             = 7
fsync(7)                                = 0
close(7)                                = 0
open("/dev/sda6", O_WRONLY)             = 7
fsync(7)                                = 0
close(7)                                = 0
read(3, "", 1024)                       = 0
time(NULL)                              = 1163667408
_llseek(5, 0, [0], SEEK_SET)            = 0
read(5, "10 ram0 0 0 0 0 0 0 0 0 0"..., 1024) = 1024
read(5, " hda5 198 396 0 0\n 36 hda6 "..., 1024) = 163
read(5, "", 1024)                       = 0
time(NULL)                              = 1163667408

where:

# lsof -n | grep noflushd
noflushd 3159 root  cwd  DIR  8,1     4096      2 /
noflushd 3159 root  rtd  DIR  8,1     4096      2 /
noflushd 3159 root  txt  REG  8,1   105783 726181 /usr/sbin/noflushd
noflushd 3159 root  mem  REG  0,0        0        [heap] (stat: No such file or directory)
noflushd 3159 root  mem  REG  8,1  1241580 613116 /lib/tls/i686/cmov/libc-2.3.6.so
noflushd 3159 root  mem  REG  8,1    88164 290409 /lib/ld-2.3.6.so
noflushd 3159 root    0u CHR  1,3            1075 /dev/null
noflushd 3159 root    1u CHR  1,3            1075 /dev/null
noflushd 3159 root    2u CHR  1,3            1075 /dev/null
noflushd 3159 root    3r REG  0,3        0 4026531852 /proc/partitions
noflushd 3159 root    4u REG  0,3        0 4026531937 /proc/sys/vm/dirty_writeback_centisecs
noflushd 3159 root    5r REG  0,3        0 4026531859 /proc/diskstats
noflushd 3159 root    6r DIR  0,8        0    273 inotify

and another backtrace:

(gdb) bt full
#0  0xa7e0f174 in strtol_l () from /lib/tls/i686/cmov/libc.so.6
No symbol table info available.
#1  0xa7e0e82f in __strtoul_internal () from /lib/tls/i686/cmov/libc.so.6
No symbol table info available.
#2  0xa7e2c5fb in _IO_vfscanf () from /lib/tls/i686/cmov/libc.so.6
No symbol table info available.
#3  0xa7e39a79 in vsscanf () from /lib/tls/i686/cmov/libc.so.6
No symbol table info available.
#4  0xa7e34f2e in sscanf () from
Bug#389157: noflushd: Noflushd uses up all the cpu-time it can get
On Thu, Nov 16, 2006 at 11:04:23AM +0200, Udi Meiri wrote:
> I did a couple more backtraces and got different results (after
> detaching and reattaching):

Thanks Heiko and Udi for the traces! They seem to indicate that noflushd's main loop is iterated with a zero sleep timeout. I don't see how this condition could be reached from noflushd itself, but it is possible when someone else starts tweaking pdflush parameters, and noflushd didn't take that into account. Maybe some other power tuning, hotplug, or whatever daemon started doing so recently?

Anyway, noflushd will still preserve external tweaks to the pdflush parameter, but it won't be forced into a tight loop any longer. At least if my hypothesis is correct. A new revision of noflushd is on its way to the archive. Please let me know if you still see noflushd hogging the CPU with this version.

Thanks, Daniel.
Bug#389157: noflushd: Noflushd uses up all the cpu-time it can get
On Thu, Nov 09, 2006 at 08:49:53AM +0200, Udi Meiri wrote:
> I get this too every week or so, after a daily script that spins up
> /dev/hda runs (noflushd has already spun it back down when it happens).
> /dev/sda is not ever spun down (not supposed to be).

Hm, this doesn't seem to happen on my test system, so I need to ask you for a bit more assistance: could you please download and install

  http://people.debian.org/~kobras/noflushd/noflushd_2.7.5-2+b1_i386.deb

It's simply a rebuild of the official package without optimisation and with debugging enabled. The next time you find noflushd hogging your CPU, please don't kill it right away, but run

  gdb /usr/sbin/noflushd `pidof noflushd`

instead and send me a backtrace (command "bt" in gdb). This should give me a hint where to start looking.

Thanks, Daniel.
Bug#389157: noflushd: Noflushd uses up all the cpu-time it can get
I get this too every week or so, after a daily script that spins up /dev/hda runs (noflushd has already spun it back down when it happens). /dev/sda is not ever spun down (not supposed to be). I'm using 2.6.18 with the Con Kolivas patchset (http://members.optusnet.com.au/ckolivas/kernel/).

# ps ax | grep noflushd
14550 ?        Ss     0:00 /usr/sbin/noflushd -n 20

# cat /proc/partitions
major minor  #blocks  name

   8     0  156290904 sda
   8     1    6835626 sda1
   8     2          1 sda2
   8     5     488530 sda5
   8     6  146801938 sda6
   3     0  244198584 hda
   3     1   24418768 hda1
   3     2          1 hda2
   3     5   23430771 hda5
   3     6  195366433 hda6
   3     7     979933 hda7

# cat /proc/diskstats
   1    0 ram0 0 0 0 0 0 0 0 0 0 0 0
   1    1 ram1 0 0 0 0 0 0 0 0 0 0 0
   1    2 ram2 0 0 0 0 0 0 0 0 0 0 0
   1    3 ram3 0 0 0 0 0 0 0 0 0 0 0
   1    4 ram4 0 0 0 0 0 0 0 0 0 0 0
   1    5 ram5 0 0 0 0 0 0 0 0 0 0 0
   1    6 ram6 0 0 0 0 0 0 0 0 0 0 0
   1    7 ram7 0 0 0 0 0 0 0 0 0 0 0
   1    8 ram8 0 0 0 0 0 0 0 0 0 0 0
   1    9 ram9 0 0 0 0 0 0 0 0 0 0 0
   1   10 ram10 0 0 0 0 0 0 0 0 0 0 0
   1   11 ram11 0 0 0 0 0 0 0 0 0 0 0
   1   12 ram12 0 0 0 0 0 0 0 0 0 0 0
   1   13 ram13 0 0 0 0 0 0 0 0 0 0 0
   1   14 ram14 0 0 0 0 0 0 0 0 0 0 0
   1   15 ram15 0 0 0 0 0 0 0 0 0 0 0
   8    0 sda 5382246 654978 199347321 1912906176 1659272 7149765 70514304 36018288 0 25272725 1949389238
   8    1 sda1 494234 12362298 3699517 29596136
   8    2 sda2 2 4 0 0
   8    5 sda5 3754246 30033780 353544 2828352
   8    6 sda6 1808589 156950959 4761234 38089816
   3    0 hda 155217 6518 1344365 453992 42805 1174172 9737616 10401203 0 575269 10855940
   3    1 hda1 322 322 0 0
   3    2 hda2 2 4 0 0
   3    5 hda5 198 396 0 0
   3    6 hda6 160803 1342981 1217202 9737616
   3    7 hda7 374 374 0 0
  22    0 hdc 0 0 0 0 0 0 0 0 0 0 0
  22   64 hdd 10 40 200 3 0 0 0 0 0 3 3
Bug#389157: noflushd: Noflushd uses up all the cpu-time it can get
Package: noflushd
Version: 2.7.5-2
Severity: normal

This bug is not reproducible here. At random times, noflushd starts eating up all free cpu-cycles and keeps doing so until it is restarted by hand or killed. No log messages or similar get recorded. I think it is possible that this bug only happens on my local kernel, which has the realtime (rt8) and suspend2 patches added, as I didn't notice the above behaviour until I upgraded the whole system and added this particular kernel. As such, this bug is filed as normal. Other people might confirm a severity upgrade :)

-- System Information:
Debian Release: testing/unstable
  APT prefers unstable
  APT policy: (500, 'unstable')
Architecture: i386 (i686)
Shell:  /bin/sh linked to /bin/zsh
Kernel: Linux 2.6.17-rt8resist
Locale: [EMAIL PROTECTED], [EMAIL PROTECTED] (charmap=ISO-8859-15)

Versions of packages noflushd depends on:
ii  debconf [debconf-2.0]  1.5.2     Debian configuration management sy
ii  ed                     0.2-20    The classic unix line editor
ii  libc6                  2.3.6-15  GNU C Library: Shared libraries

noflushd recommends no packages.

-- debconf information:
* noflushd/expert: false
* noflushd/disks:
  noflushd/params:
* noflushd/timeout: 30
Bug#389157: noflushd: Noflushd uses up all the cpu-time it can get
On Sun, Sep 24, 2006 at 02:10:32PM +0200, Heiko Weinen wrote:
> This bug is not reproducible here. At random times, noflushd starts
> eating up all free cpu-cycles and keeps doing so until it is restarted
> by hand or killed. No log messages or similar get recorded. I think it
> is possible that this bug only happens on my local kernel, which has
> the realtime (rt8) and suspend2 patches added, as I didn't notice the
> above behaviour until I upgraded the whole system and added this
> particular kernel. As such, this bug is filed as normal. Other people
> might confirm a severity upgrade :)

Thanks for the report. Have you just installed noflushd for the first time, or did it work for you on previous kernels? Also, can you please send in a copy of your /proc/partitions and /proc/diskstats?

Thanks, Daniel.