Bug#389157: noflushd: Noflushd uses up all the cpu-time it can get

2006-11-16 Thread Udi Meiri
OK, it happened today, and here's the backtrace:

# gdb /usr/sbin/noflushd `pidof noflushd`
GNU gdb 6.4.90-debian
Copyright (C) 2006 Free Software Foundation, Inc.
GDB is free software, covered by the GNU General Public License, and you are
welcome to change it and/or distribute copies of it under certain conditions.
Type "show copying" to see the conditions.
There is absolutely no warranty for GDB.  Type "show warranty" for details.
This GDB was configured as "i486-linux-gnu"...Using host libthread_db library
"/lib/tls/i686/cmov/libthread_db.so.1".

Attaching to program: /usr/sbin/noflushd, process 3159
Reading symbols from /lib/tls/i686/cmov/libc.so.6...done.
Loaded symbols for /lib/tls/i686/cmov/libc.so.6
Reading symbols from /lib/ld-linux.so.2...done.
Loaded symbols for /lib/ld-linux.so.2
Failed to read a valid object file image from memory.
0xa7e49881 in malloc () from /lib/tls/i686/cmov/libc.so.6
(gdb) bt
#0  0xa7e49881 in malloc () from /lib/tls/i686/cmov/libc.so.6
#1  0xa7e49d60 in realloc () from /lib/tls/i686/cmov/libc.so.6
#2  0x0804eb3d in get_line (fp=0x8054030) at util.c:51
#3  0x0804f235 in eat_line (part=0x8054008) at part_info.c:164
#4  0x0804f860 in part_info_next (part=0x8054008, flag=1) at part_info.c:303
#5  0x0804f8e9 in part_info_disk_next (part=0x8054008) at part_info.c:324
#6  0x0804ddbc in sync_spinning_disks (head=0x8054e68) at state.c:209
#7  0x0804e5fe in nfd_daemon (head=0x8054e68, stat=0x8056360) at state.c:437
#8  0x0804cb2c in main (argc=3, argv=0xaf9df844) at noflushd.c:269



Bug#389157: noflushd: Noflushd uses up all the cpu-time it can get

2006-11-16 Thread Udi Meiri
I did a couple more backtraces and got different results (after
detaching and reattaching):

(gdb) bt
#0  0xa7f2c410 in ?? ()
#1  0xaf9df678 in ?? ()
#2  0x08056468 in ?? ()
#3  0x0001 in ?? ()
#4  0xa7e9e423 in open () from /lib/tls/i686/cmov/libc.so.6
#5  0x0804dbef in sync_part (name=0x8054db0 "/dev/sda1") at state.c:158
#6  0x0804dc5d in sync_current_disk () at state.c:176
#7  0x0804ddaf in sync_spinning_disks (head=0x8054e68) at state.c:221
#8  0x0804e5fe in nfd_daemon (head=0x8054e68, stat=0x8056360) at state.c:437
#9  0x0804cb2c in main (argc=3, argv=0xaf9df844) at noflushd.c:269

(gdb) bt
#0  release_line (line=0x8057608 "   1   14 ram14 0 0 0 0 0 0 0 0 0 0 0\n")
at util.c:69
#1  0x0804a966 in update_io_25 (ds=0x8056360) at disk_stat.c:502
#2  0x0804aba5 in disk_stat_update (ds=0x8056360) at disk_stat.c:556
#3  0x0804e0ec in check_io (di=0x8054e68, ds=0x8056360, interval=0)
at state.c:316
#4  0x0804e580 in nfd_daemon (head=0x8054e68, stat=0x8056360) at state.c:414
#5  0x0804cb2c in main (argc=3, argv=0xaf9df844) at noflushd.c:269

(gdb) bt
#0  0xa7f2c410 in ?? ()
#1  0xaf9df4cc in ?? ()
#2  0x0400 in ?? ()
#3  0xa7f27000 in ?? ()
#4  0xa7e9e603 in read () from /lib/tls/i686/cmov/libc.so.6
#5  0xa7e41638 in _IO_file_read () from /lib/tls/i686/cmov/libc.so.6
#6  0xa7e429e8 in _IO_file_underflow () from /lib/tls/i686/cmov/libc.so.6
#7  0xa7e4313b in _IO_default_uflow () from /lib/tls/i686/cmov/libc.so.6
#8  0xa7e443fd in __uflow () from /lib/tls/i686/cmov/libc.so.6
#9  0xa7e386a6 in _IO_getline_info () from /lib/tls/i686/cmov/libc.so.6
#10 0xa7e385ef in _IO_getline () from /lib/tls/i686/cmov/libc.so.6
#11 0xa7e3757f in fgets () from /lib/tls/i686/cmov/libc.so.6
#12 0x0804eb71 in get_line (fp=0x8056e08) at util.c:56
#13 0x0804a8d6 in update_io_25 (ds=0x8056360) at disk_stat.c:493
#14 0x0804aba5 in disk_stat_update (ds=0x8056360) at disk_stat.c:556
#15 0x0804e0ec in check_io (di=0x8054e68, ds=0x8056360, interval=0)
at state.c:316
#16 0x0804e580 in nfd_daemon (head=0x8054e68, stat=0x8056360) at state.c:414
#17 0x0804cb2c in main (argc=3, argv=0xaf9df844) at noflushd.c:269


Also, strace shows this endless loop:
_llseek(3, 0, [0], SEEK_SET)            = 0
read(3, "major minor  #blocks  name\n\n   8"..., 1024) = 323
open("/dev/sda", O_WRONLY)              = 7
fsync(7)                                = 0
close(7)                                = 0
open("/dev/sda1", O_WRONLY)             = 7
fsync(7)                                = 0
close(7)                                = 0
open("/dev/sda2", O_WRONLY)             = 7
fsync(7)                                = 0
close(7)                                = 0
open("/dev/sda5", O_WRONLY)             = 7
fsync(7)                                = 0
close(7)                                = 0
open("/dev/sda6", O_WRONLY)             = 7
fsync(7)                                = 0
close(7)                                = 0
read(3, "", 1024)                       = 0
time(NULL)                              = 1163667408
_llseek(5, 0, [0], SEEK_SET)            = 0
read(5, "   1    0 ram0 0 0 0 0 0 0 0 0 0"..., 1024) = 1024
read(5, "hda5 198 396 0 0\n   3    6 hda6 "..., 1024) = 163
read(5, "", 1024)                       = 0
time(NULL)                              = 1163667408

where:
# lsof -n|grep noflushd
noflushd  3159 root  cwd    DIR    8,1       4096          2 /
noflushd  3159 root  rtd    DIR    8,1       4096          2 /
noflushd  3159 root  txt    REG    8,1     105783     726181 /usr/sbin/noflushd
noflushd  3159 root  mem    REG    0,0          0            [heap] (stat: No such file or directory)
noflushd  3159 root  mem    REG    8,1    1241580     613116 /lib/tls/i686/cmov/libc-2.3.6.so
noflushd  3159 root  mem    REG    8,1      88164     290409 /lib/ld-2.3.6.so
noflushd  3159 root    0u   CHR    1,3                  1075 /dev/null
noflushd  3159 root    1u   CHR    1,3                  1075 /dev/null
noflushd  3159 root    2u   CHR    1,3                  1075 /dev/null
noflushd  3159 root    3r   REG    0,3          0 4026531852 /proc/partitions
noflushd  3159 root    4u   REG    0,3          0 4026531937 /proc/sys/vm/dirty_writeback_centisecs
noflushd  3159 root    5r   REG    0,3          0 4026531859 /proc/diskstats
noflushd  3159 root    6r   DIR    0,8          0        273 inotify


and another backtrace:

(gdb) bt full
#0  0xa7e0f174 in strtol_l () from /lib/tls/i686/cmov/libc.so.6
No symbol table info available.
#1  0xa7e0e82f in __strtoul_internal () from /lib/tls/i686/cmov/libc.so.6
No symbol table info available.
#2  0xa7e2c5fb in _IO_vfscanf () from /lib/tls/i686/cmov/libc.so.6
No symbol table info available.
#3  0xa7e39a79 in vsscanf () from /lib/tls/i686/cmov/libc.so.6
No symbol table info available.
#4  0xa7e34f2e in sscanf () from 

Bug#389157: noflushd: Noflushd uses up all the cpu-time it can get

2006-11-16 Thread Daniel Kobras
On Thu, Nov 16, 2006 at 11:04:23AM +0200, Udi Meiri wrote:
> I did a couple more backtraces and got different results (after
> detaching and reattaching):

Thanks, Heiko and Udi, for the traces! They seem to indicate that
noflushd's main loop is iterated with a zero sleep timeout. I don't see
how this condition could be reached from noflushd itself, but it can
happen when something else starts tweaking the pdflush parameters, a
case noflushd didn't take into account. Maybe some power-tuning,
hotplug, or similar daemon started doing so recently? Anyway, noflushd
will still preserve external tweaks to the pdflush parameter, but it
won't be forced into a tight loop any longer, at least if my hypothesis
is correct. A new revision of noflushd is on its way to the archive.
Please let me know if you still see noflushd hogging the CPU with this
version.
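
For illustration, here is a minimal C sketch of the kind of guard
described above. It is not noflushd's actual code; the names
poll_interval and MIN_INTERVAL_SEC are made up, while the file path is
the one visible in the lsof output above. The idea: derive the poll
interval from /proc/sys/vm/dirty_writeback_centisecs but clamp it to a
lower bound, so an externally written value of 0 cannot force the
daemon into a busy loop.

/* Illustrative sketch only, not the real noflushd source. */
#include <stdio.h>
#include <unistd.h>

#define MIN_INTERVAL_SEC 1   /* never spin, even if the kernel value is 0 */

/* Read the pdflush wakeup interval (centiseconds) and convert to seconds. */
static unsigned int poll_interval(void)
{
    unsigned long centisecs = 0;
    FILE *fp = fopen("/proc/sys/vm/dirty_writeback_centisecs", "r");

    if (fp) {
        if (fscanf(fp, "%lu", &centisecs) != 1)
            centisecs = 0;
        fclose(fp);
    }

    unsigned int secs = (unsigned int)(centisecs / 100);
    return secs < MIN_INTERVAL_SEC ? MIN_INTERVAL_SEC : secs;
}

int main(void)
{
    for (;;) {
        /* ... check disk activity, sync spun-up disks, etc. ... */
        sleep(poll_interval());  /* clamped, so a zero setting cannot cause a tight loop */
    }
}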

Thanks,

Daniel.



Bug#389157: noflushd: Noflushd uses up all the cpu-time it can get

2006-11-09 Thread Daniel Kobras
On Thu, Nov 09, 2006 at 08:49:53AM +0200, Udi Meiri wrote:
> I get this too every week or so, after a daily script that spins
> up /dev/hda runs (noflushd has already spun it back down when it
> happens). /dev/sda is never spun down (nor is it supposed to be).

Hm, this doesn't seem to happen on my test system, so I need to ask you
for a bit more assistance: could you please download and install
http://people.debian.org/~kobras/noflushd/noflushd_2.7.5-2+b1_i386.deb
It's simply a rebuild of the official package without optimisation and
with debugging enabled. The next time you find noflushd hogging your
CPU, please don't kill it right away, but run

gdb /usr/sbin/noflushd `pidof noflushd`

instead and send me a backtrace (command "bt" in gdb). This should give
me a hint at where to start looking.

Thanks,

Daniel.



Bug#389157: noflushd: Noflushd uses up all the cpu-time it can get

2006-11-08 Thread Udi Meiri
I get this too every week or so, after a daily script that spins
up /dev/hda runs (noflushd has already spun it back down when it
happens). /dev/sda is never spun down (nor is it supposed to be).

I'm using 2.6.18 with the Con Kolivas patchset
(http://members.optusnet.com.au/ckolivas/kernel/).

# ps ax|grep noflushd
14550 ?        Ss     0:00 /usr/sbin/noflushd -n 20

# cat /proc/partitions
major minor  #blocks  name

   8     0  156290904 sda
   8     1    6835626 sda1
   8     2          1 sda2
   8     5     488530 sda5
   8     6  146801938 sda6
   3     0  244198584 hda
   3     1   24418768 hda1
   3     2          1 hda2
   3     5   23430771 hda5
   3     6  195366433 hda6
   3     7     979933 hda7
# cat /proc/diskstats
   1    0 ram0 0 0 0 0 0 0 0 0 0 0 0
   1    1 ram1 0 0 0 0 0 0 0 0 0 0 0
   1    2 ram2 0 0 0 0 0 0 0 0 0 0 0
   1    3 ram3 0 0 0 0 0 0 0 0 0 0 0
   1    4 ram4 0 0 0 0 0 0 0 0 0 0 0
   1    5 ram5 0 0 0 0 0 0 0 0 0 0 0
   1    6 ram6 0 0 0 0 0 0 0 0 0 0 0
   1    7 ram7 0 0 0 0 0 0 0 0 0 0 0
   1    8 ram8 0 0 0 0 0 0 0 0 0 0 0
   1    9 ram9 0 0 0 0 0 0 0 0 0 0 0
   1   10 ram10 0 0 0 0 0 0 0 0 0 0 0
   1   11 ram11 0 0 0 0 0 0 0 0 0 0 0
   1   12 ram12 0 0 0 0 0 0 0 0 0 0 0
   1   13 ram13 0 0 0 0 0 0 0 0 0 0 0
   1   14 ram14 0 0 0 0 0 0 0 0 0 0 0
   1   15 ram15 0 0 0 0 0 0 0 0 0 0 0
   8    0 sda 5382246 654978 199347321 1912906176 1659272 7149765 70514304 36018288 0 25272725 1949389238
   8    1 sda1 494234 12362298 3699517 29596136
   8    2 sda2 2 4 0 0
   8    5 sda5 3754246 30033780 353544 2828352
   8    6 sda6 1808589 156950959 4761234 38089816
   3    0 hda 155217 6518 1344365 453992 42805 1174172 9737616 10401203 0 575269 10855940
   3    1 hda1 322 322 0 0
   3    2 hda2 2 4 0 0
   3    5 hda5 198 396 0 0
   3    6 hda6 160803 1342981 1217202 9737616
   3    7 hda7 374 374 0 0
  22    0 hdc 0 0 0 0 0 0 0 0 0 0 0
  22   64 hdd 10 40 200 3 0 0 0 0 0 3 3


Bug#389157: noflushd: Noflushd uses up all the cpu-time it can get

2006-09-24 Thread Heiko Weinen
Package: noflushd
Version: 2.7.5-2
Severity: normal

This bug is not reliably reproducible here. At random times, noflushd
starts eating up all free CPU cycles and keeps doing so until it is
restarted by hand or killed. No log messages or similar get recorded.
I think it is possible that this bug only happens on my local kernel,
which has the realtime (rt8) and suspend2 patches added, as I didn't
notice the above behaviour until I upgraded the whole system and added
this particular kernel. As such, this bug is filed as normal severity.
Other people might confirm a severity upgrade :)

-- System Information:
Debian Release: testing/unstable
  APT prefers unstable
  APT policy: (500, 'unstable')
Architecture: i386 (i686)
Shell:  /bin/sh linked to /bin/zsh
Kernel: Linux 2.6.17-rt8resist
Locale: [EMAIL PROTECTED], [EMAIL PROTECTED] (charmap=ISO-8859-15)

Versions of packages noflushd depends on:
ii  debconf [debconf-2.0] 1.5.2  Debian configuration management sy
ii  ed0.2-20 The classic unix line editor
ii  libc6 2.3.6-15   GNU C Library: Shared libraries

noflushd recommends no packages.

-- debconf information:
* noflushd/expert: false
* noflushd/disks:
  noflushd/params:
* noflushd/timeout: 30


Bug#389157: noflushd: Noflushd uses up all the cpu-time it can get

2006-09-24 Thread Daniel Kobras
On Sun, Sep 24, 2006 at 02:10:32PM +0200, Heiko Weinen wrote:
> This bug is not reliably reproducible here. At random times, noflushd
> starts eating up all free CPU cycles and keeps doing so until it is
> restarted by hand or killed. No log messages or similar get recorded.
> I think it is possible that this bug only happens on my local kernel,
> which has the realtime (rt8) and suspend2 patches added, as I didn't
> notice the above behaviour until I upgraded the whole system and added
> this particular kernel. As such, this bug is filed as normal severity.
> Other people might confirm a severity upgrade :)

Thanks for the report. Have you just installed noflushd for the first
time, or did it work for you on previous kernels? Also, could you please
send in a copy of your /proc/partitions and /proc/diskstats?

Thanks,

Daniel.


