Re: 3.13.5 : rm -rf running forever, one cpu at approx 100%

2014-03-03 Thread Ken Moffat
On Thu, Feb 27, 2014 at 05:52:42AM +0100, Mike Galbraith wrote:
> On Thu, 2014-02-27 at 03:45 +, Ken Moffat wrote: 
> > On Thu, Feb 27, 2014 at 04:26:35AM +0100, Mike Galbraith wrote:
> > > 
> > > I would start with strace to see if a task is looping in userspace, then
> > > move on to perf top -g -p  (or perf record/report) to peek at what
> > > it's up to in the kernel.  Once you have the where, trace_printk() is
> > > the best thing since sliced bread (which ranks just below printk()).
> > > 
> > > -Mike
> >  Thanks.  I'll need to build perf.
> 
> You may want to build the kernel with frame-pointers too, for easy gdb
> list *0x(hexnum) of *func()+0x(hexoffset) use.  Crash is also pretty
> handy both for rummaging live via crash vmlinux /proc/kcore, and for
> leisurely postmortem analysis if you set the box up to crashdump in
> advance, and force a dump (poke sysrq-c or echo c > /proc/sysrq-trigger)
> when you see the bad thing happen.  Crash has all kinds of goodies,
> including invocation of gdb.
> 
> -Mike

 In fact, it seems I had already enabled perf, but not built the
userspace part.  However, I was motivated to build crash, and a
kernel with frame pointers plus a _lot_ of debug options.
Unsurprisingly, that ran like a dog, and many things seemed to use a
lot of CPU according to top (e.g. 'X', at least for the first
several minutes after it started), but I couldn't replicate the
problem.  Also proved that my OpenJDK script doesn't normally trash
/etc/passwd- :)

 Then, I went back to the original 3.13.5, proved that perf was
working, and built a new LFS system in chroot, again with 3.13.5, in
chroot.  Booted it as soon as I'd built xorg, then went on through
glib/gtk, firefox, print/photo programs, libreoffice.  No problems at
all, apart from the common "needed to use -j3 for gcc, because it's
a phonon".  NB I forgot to mention that this is all 64-bit, and the
box has 8GB.

 If something does happen here in the next few days (I've still got
to sort out my kde scripts and build that) then I'll reply to this.
But otherwise, I guess it must be cosmic rays, or maybe Gaffer forgot
to feed and muck-out the imps.

ĸen
-- 
das eine Mal als Tragödie, dieses Mal als Farce
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: 3.13.5 : rm -rf running forever, one cpu at approx 100%

2014-03-03 Thread Ken Moffat
On Thu, Feb 27, 2014 at 05:52:42AM +0100, Mike Galbraith wrote:
 On Thu, 2014-02-27 at 03:45 +, Ken Moffat wrote: 
  On Thu, Feb 27, 2014 at 04:26:35AM +0100, Mike Galbraith wrote:
   
   I would start with strace to see if a task is looping in userspace, then
   move on to perf top -g -p pid (or perf record/report) to peek at what
   it's up to in the kernel.  Once you have the where, trace_printk() is
   the best thing since sliced bread (which ranks just below printk()).
   
   -Mike
   Thanks.  I'll need to build perf.
 
 You may want to build the kernel with frame-pointers too, for easy gdb
 list *0x(hexnum) of *func()+0x(hexoffset) use.  Crash is also pretty
 handy both for rummaging live via crash vmlinux /proc/kcore, and for
 leisurely postmortem analysis if you set the box up to crashdump in
 advance, and force a dump (poke sysrq-c or echo c  /proc/sysrq-trigger)
 when you see the bad thing happen.  Crash has all kinds of goodies,
 including invocation of gdb.
 
 -Mike

 In fact, it seems I had already enabled perf, but not built the
userspace part.  However, I was motivated to build crash, and a
kernel with frame pointers plus a _lot_ of debug options.
Unsurprisingly, that ran like a dog, and many things seemed to use a
lot of CPU according to top (e.g. 'X', at least for the first
several minutes after it started), but I couldn't replicate the
problem.  Also proved that my OpenJDK script doesn't normally trash
/etc/passwd- :)

 Then, I went back to the original 3.13.5, proved that perf was
working, and built a new LFS system in chroot, again with 3.13.5, in
chroot.  Booted it as soon as I'd built xorg, then went on through
glib/gtk, firefox, print/photo programs, libreoffice.  No problems at
all, apart from the common needed to use -j3 for gcc, because it's
a phonon.  NB I forgot to mention that this is all 64-bit, and the
box has 8GB.

 If something does happen here in the next few days (I've still got
to sort out my kde scripts and build that) then I'll reply to this.
But otherwise, I guess it must be cosmic rays, or maybe Gaffer forgot
to feed and muck-out the imps.

ĸen
-- 
das eine Mal als Tragödie, dieses Mal als Farce
--
To unsubscribe from this list: send the line unsubscribe linux-kernel in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: 3.13.5 : rm -rf running forever, one cpu at approx 100%

2014-02-27 Thread Gene Heskett
On Wednesday 26 February 2014, Mike Galbraith wrote:
>On Thu, 2014-02-27 at 03:45 +, Ken Moffat wrote:
>> On Thu, Feb 27, 2014 at 04:26:35AM +0100, Mike Galbraith wrote:
>> > I would start with strace to see if a task is looping in userspace,
>> > then move on to perf top -g -p  (or perf record/report) to peek
>> > at what it's up to in the kernel.  Once you have the where,
>> > trace_printk() is the best thing since sliced bread (which ranks
>> > just below printk()).
>> > 
>> > -Mike
>>  
>>  Thanks.  I'll need to build perf.
>
>You may want to build the kernel with frame-pointers too, for easy gdb
>list *0x(hexnum) of *func()+0x(hexoffset) use.  Crash is also pretty
>handy both for rummaging live via crash vmlinux /proc/kcore, and for
>leisurely postmortem analysis if you set the box up to crashdump in
>advance, and force a dump (poke sysrq-c or echo c > /proc/sysrq-trigger)
>when you see the bad thing happen.  Crash has all kinds of goodies,
>including invocation of gdb.
>
>-Mike

It doesn't look as if I'll have to guys.  Amanda is about 1/2 done and 
running normally by switching from tar-1.22 to tar-1.27.  So I'm going to 
toddle off in the general direction of a bed and let amanda send me an 
email when its done.

And it did, including some "strange" reports I have yet to decipher. Diffs 
in the tar's probably.  But it did not hang.

Cheers, Gene
-- 
"There are four boxes to be used in defense of liberty:
 soap, ballot, jury, and ammo. Please use in that order."
-Ed Howdershelt (Author)
Genes Web page 

NOTICE: Will pay 100 USD for an HP-4815A defective but
complete probe assembly.

--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: 3.13.5 : rm -rf running forever, one cpu at approx 100%

2014-02-27 Thread Gene Heskett
On Wednesday 26 February 2014, Mike Galbraith wrote:
On Thu, 2014-02-27 at 03:45 +, Ken Moffat wrote:
 On Thu, Feb 27, 2014 at 04:26:35AM +0100, Mike Galbraith wrote:
  I would start with strace to see if a task is looping in userspace,
  then move on to perf top -g -p pid (or perf record/report) to peek
  at what it's up to in the kernel.  Once you have the where,
  trace_printk() is the best thing since sliced bread (which ranks
  just below printk()).
  
  -Mike
  
  Thanks.  I'll need to build perf.

You may want to build the kernel with frame-pointers too, for easy gdb
list *0x(hexnum) of *func()+0x(hexoffset) use.  Crash is also pretty
handy both for rummaging live via crash vmlinux /proc/kcore, and for
leisurely postmortem analysis if you set the box up to crashdump in
advance, and force a dump (poke sysrq-c or echo c  /proc/sysrq-trigger)
when you see the bad thing happen.  Crash has all kinds of goodies,
including invocation of gdb.

-Mike

It doesn't look as if I'll have to guys.  Amanda is about 1/2 done and 
running normally by switching from tar-1.22 to tar-1.27.  So I'm going to 
toddle off in the general direction of a bed and let amanda send me an 
email when its done.

And it did, including some strange reports I have yet to decipher. Diffs 
in the tar's probably.  But it did not hang.

Cheers, Gene
-- 
There are four boxes to be used in defense of liberty:
 soap, ballot, jury, and ammo. Please use in that order.
-Ed Howdershelt (Author)
Genes Web page http://geneslinuxbox.net:6309/gene

NOTICE: Will pay 100 USD for an HP-4815A defective but
complete probe assembly.

--
To unsubscribe from this list: send the line unsubscribe linux-kernel in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: 3.13.5 : rm -rf running forever, one cpu at approx 100%

2014-02-26 Thread Mike Galbraith
On Thu, 2014-02-27 at 03:45 +, Ken Moffat wrote: 
> On Thu, Feb 27, 2014 at 04:26:35AM +0100, Mike Galbraith wrote:
> > 
> > I would start with strace to see if a task is looping in userspace, then
> > move on to perf top -g -p  (or perf record/report) to peek at what
> > it's up to in the kernel.  Once you have the where, trace_printk() is
> > the best thing since sliced bread (which ranks just below printk()).
> > 
> > -Mike
>  Thanks.  I'll need to build perf.

You may want to build the kernel with frame-pointers too, for easy gdb
list *0x(hexnum) of *func()+0x(hexoffset) use.  Crash is also pretty
handy both for rummaging live via crash vmlinux /proc/kcore, and for
leisurely postmortem analysis if you set the box up to crashdump in
advance, and force a dump (poke sysrq-c or echo c > /proc/sysrq-trigger)
when you see the bad thing happen.  Crash has all kinds of goodies,
including invocation of gdb.

-Mike

--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: 3.13.5 : rm -rf running forever, one cpu at approx 100%

2014-02-26 Thread Gene Heskett
On Wednesday 26 February 2014, Ken Moffat wrote:
>On Thu, Feb 27, 2014 at 04:26:35AM +0100, Mike Galbraith wrote:
>> I would start with strace to see if a task is looping in userspace,
>> then move on to perf top -g -p  (or perf record/report) to peek
>> at what it's up to in the kernel.  Once you have the where,
>> trace_printk() is the best thing since sliced bread (which ranks just
>> below printk()).
>> 
>> -Mike
>
> Thanks.  I'll need to build perf.
>
>ؤ¸en
I probably will too, but I don't have a huge amount of tracing turned on in 
this kernel.  We'll see what happens tonight & go from there.
FWIW, about all I see in htop is the command line that launched it.


Cheers, Gene
-- 
"There are four boxes to be used in defense of liberty:
 soap, ballot, jury, and ammo. Please use in that order."
-Ed Howdershelt (Author)
Genes Web page 

NOTICE: Will pay 100 USD for an HP-4815A defective but
complete probe assembly.

--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: 3.13.5 : rm -rf running forever, one cpu at approx 100%

2014-02-26 Thread Ken Moffat
On Thu, Feb 27, 2014 at 04:26:35AM +0100, Mike Galbraith wrote:
> 
> I would start with strace to see if a task is looping in userspace, then
> move on to perf top -g -p  (or perf record/report) to peek at what
> it's up to in the kernel.  Once you have the where, trace_printk() is
> the best thing since sliced bread (which ranks just below printk()).
> 
> -Mike
 Thanks.  I'll need to build perf.

ĸen
-- 
das eine Mal als Tragödie, dieses Mal als Farce
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: 3.13.5 : rm -rf running forever, one cpu at approx 100%

2014-02-26 Thread Mike Galbraith
On Thu, 2014-02-27 at 00:52 +, Ken Moffat wrote: 
> Hi,
> 
>  Short summary : on 3.13.5, rm -rf of an application source
> directory on an ext4 filesystem sometimes takes forever (probably
> isn't going anywhere), with one CPU pegged at all-but 100% utilization.
> 
>  I've nearly finished building a new system from source, to check
> various desktop packages in linuxfromscratch.  On this build, much of
> it is things I don't normally use and I needed to upgrade my
> buildscripts, so most of it was built in chroot using 3.10.32.  But
> late last night I booted the new system using 3.13.5 to finish the
> build.  This morning I discovered that rm -rf for the icedtea source
> directory was still running, and had taken over 5 hours of CPU time
> (one CPU seemd to be running at close to 100%, the others had dropped
> to their slowest frequency).  That script was running as root (yeah,
> but it's a new system) and it looks as if /etc/passwd~ had got
> trashed, because I could no longer su or login.  Not sure if that is
> related, at this stage it might just be a side-effect of my scripts.
> 
>  Booted another system, chrooted, fixed up passwords.  Started
> again after commenting out icedtea - I hadn't intended to build
> what was an old version, I'd just forgotten it was in this script -
> that's why I do things in userspace, not the kernel :-(
> 
>  Continued with remaining packages, but a couple of hours later I
> saw a similar "one CPU at 100%, rm -rf GConf source taking forever"
> problem.  Dumped all the processes with Alt-SysRQ-T [ huge log ] but
> at that point 'rm' was merely 'ready' so I doubt there is anything
> useful to see in the log.
> 
>  Built 3.13.4, booted to that.  So far, everything looks good - but
> I'm now building the _current_ version of icedtea, so if this isn't
> a new 3.13.5 problem I guess I'm fairly likely to see it tomorrow.
> 
>  Meanwhile, any suggestions about how I can debug this if I hit it
> again, please ?

I would start with strace to see if a task is looping in userspace, then
move on to perf top -g -p  (or perf record/report) to peek at what
it's up to in the kernel.  Once you have the where, trace_printk() is
the best thing since sliced bread (which ranks just below printk()).

-Mike

--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: 3.13.5 : rm -rf running forever, one cpu at approx 100%

2014-02-26 Thread Gene Heskett
On Wednesday 26 February 2014, Ken Moffat wrote:
>On Wed, Feb 26, 2014 at 08:28:29PM -0500, Gene Heskett wrote:
>> On Wednesday 26 February 2014, Ken Moffat wrote:
>> 
>> I don't have any help to offer Ken, but this walks and quacks much like
>> a duck I'm encountering in 3.13.5, with the backup program amanda,
>> which uses gnu tar.  To facilitate intelligent guesses as to the size
>> of the various levels of backup, amanda does a dummy collection using
>> tar, sent to /dev/null, using only the size it reports on the first
>> pass.  Version 1.22, quite old, works on 3.12.9, but not on 3.13.5.  I
>> have now pulled in, built and installed tar-1.27, and rebuilt amanda
>> to let it know that the tar its using is not in /usr/bin, but in
>> /usr/local/bin.  Next run at 1:30AM
>> 
>> This freeze, using 100% of a core, but causing no visible disk activity
>> has killed my backups 3 nights running.  At this point I've no clue as
>> to the cause, but I will be watching this thread closely, it has a
>> similar description.
>> 
>> Cheers, Gene
>
> My box where I see this is a phenom (my intel is still on 3.13.4),
>and from past threads I suspect yours is as well ?
>
>ؤ¸en

Yes, it is Ken, an old slow 2.1 Ghz 9550 phenom, with 8Gb of ram. 32 bit 
PAE build.  I wonder if that is also connected...  But my crystal ball is 
broke, in addition to the wet ram  starting to rust out.  Can't be the 
years, I've only 40 years practice at being 39. ;-)

Cheers, Gene
-- 
"There are four boxes to be used in defense of liberty:
 soap, ballot, jury, and ammo. Please use in that order."
-Ed Howdershelt (Author)
Genes Web page 

NOTICE: Will pay 100 USD for an HP-4815A defective but
complete probe assembly.

--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: 3.13.5 : rm -rf running forever, one cpu at approx 100%

2014-02-26 Thread Ken Moffat
On Wed, Feb 26, 2014 at 08:28:29PM -0500, Gene Heskett wrote:
> On Wednesday 26 February 2014, Ken Moffat wrote:
> 
> I don't have any help to offer Ken, but this walks and quacks much like a 
> duck I'm encountering in 3.13.5, with the backup program amanda, which uses 
> gnu tar.  To facilitate intelligent guesses as to the size of the various 
> levels of backup, amanda does a dummy collection using tar, sent to 
> /dev/null, using only the size it reports on the first pass.  Version 1.22, 
> quite old, works on 3.12.9, but not on 3.13.5.  I have now pulled in, built 
> and installed tar-1.27, and rebuilt amanda to let it know that the tar its 
> using is not in /usr/bin, but in /usr/local/bin.  Next run at 1:30AM
> 
> This freeze, using 100% of a core, but causing no visible disk activity has 
> killed my backups 3 nights running.  At this point I've no clue as to the 
> cause, but I will be watching this thread closely, it has a similar 
> description.
> 
> Cheers, Gene

 My box where I see this is a phenom (my intel is still on 3.13.4),
and from past threads I suspect yours is as well ?

ĸen
-- 
das eine Mal als Tragödie, dieses Mal als Farce
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: 3.13.5 : rm -rf running forever, one cpu at approx 100%

2014-02-26 Thread Gene Heskett
On Wednesday 26 February 2014, Ken Moffat wrote:
>Hi,
>
> Short summary : on 3.13.5, rm -rf of an application source
>directory on an ext4 filesystem sometimes takes forever (probably
>isn't going anywhere), with one CPU pegged at all-but 100% utilization.
>
> I've nearly finished building a new system from source, to check
>various desktop packages in linuxfromscratch.  On this build, much of
>it is things I don't normally use and I needed to upgrade my
>buildscripts, so most of it was built in chroot using 3.10.32.  But
>late last night I booted the new system using 3.13.5 to finish the
>build.  This morning I discovered that rm -rf for the icedtea source
>directory was still running, and had taken over 5 hours of CPU time
>(one CPU seemd to be running at close to 100%, the others had dropped
>to their slowest frequency).  That script was running as root (yeah,
>but it's a new system) and it looks as if /etc/passwd~ had got
>trashed, because I could no longer su or login.  Not sure if that is
>related, at this stage it might just be a side-effect of my scripts.
>
> Booted another system, chrooted, fixed up passwords.  Started
>again after commenting out icedtea - I hadn't intended to build
>what was an old version, I'd just forgotten it was in this script -
>that's why I do things in userspace, not the kernel :-(
>
> Continued with remaining packages, but a couple of hours later I
>saw a similar "one CPU at 100%, rm -rf GConf source taking forever"
>problem.  Dumped all the processes with Alt-SysRQ-T [ huge log ] but
>at that point 'rm' was merely 'ready' so I doubt there is anything
>useful to see in the log.
>
> Built 3.13.4, booted to that.  So far, everything looks good - but
>I'm now building the _current_ version of icedtea, so if this isn't
>a new 3.13.5 problem I guess I'm fairly likely to see it tomorrow.
>
> Meanwhile, any suggestions about how I can debug this if I hit it
>again, please ?
>
>ؤ¸en

I don't have any help to offer Ken, but this walks and quacks much like a 
duck I'm encountering in 3.13.5, with the backup program amanda, which uses 
gnu tar.  To facilitate intelligent guesses as to the size of the various 
levels of backup, amanda does a dummy collection using tar, sent to 
/dev/null, using only the size it reports on the first pass.  Version 1.22, 
quite old, works on 3.12.9, but not on 3.13.5.  I have now pulled in, built 
and installed tar-1.27, and rebuilt amanda to let it know that the tar its 
using is not in /usr/bin, but in /usr/local/bin.  Next run at 1:30AM

This freeze, using 100% of a core, but causing no visible disk activity has 
killed my backups 3 nights running.  At this point I've no clue as to the 
cause, but I will be watching this thread closely, it has a similar 
description.

Cheers, Gene
-- 
"There are four boxes to be used in defense of liberty:
 soap, ballot, jury, and ammo. Please use in that order."
-Ed Howdershelt (Author)
Genes Web page 

NOTICE: Will pay 100 USD for an HP-4815A defective but
complete probe assembly.

--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: 3.13.5 : rm -rf running forever, one cpu at approx 100%

2014-02-26 Thread Gene Heskett
On Wednesday 26 February 2014, Ken Moffat wrote:
Hi,

 Short summary : on 3.13.5, rm -rf of an application source
directory on an ext4 filesystem sometimes takes forever (probably
isn't going anywhere), with one CPU pegged at all-but 100% utilization.

 I've nearly finished building a new system from source, to check
various desktop packages in linuxfromscratch.  On this build, much of
it is things I don't normally use and I needed to upgrade my
buildscripts, so most of it was built in chroot using 3.10.32.  But
late last night I booted the new system using 3.13.5 to finish the
build.  This morning I discovered that rm -rf for the icedtea source
directory was still running, and had taken over 5 hours of CPU time
(one CPU seemd to be running at close to 100%, the others had dropped
to their slowest frequency).  That script was running as root (yeah,
but it's a new system) and it looks as if /etc/passwd~ had got
trashed, because I could no longer su or login.  Not sure if that is
related, at this stage it might just be a side-effect of my scripts.

 Booted another system, chrooted, fixed up passwords.  Started
again after commenting out icedtea - I hadn't intended to build
what was an old version, I'd just forgotten it was in this script -
that's why I do things in userspace, not the kernel :-(

 Continued with remaining packages, but a couple of hours later I
saw a similar one CPU at 100%, rm -rf GConf source taking forever
problem.  Dumped all the processes with Alt-SysRQ-T [ huge log ] but
at that point 'rm' was merely 'ready' so I doubt there is anything
useful to see in the log.

 Built 3.13.4, booted to that.  So far, everything looks good - but
I'm now building the _current_ version of icedtea, so if this isn't
a new 3.13.5 problem I guess I'm fairly likely to see it tomorrow.

 Meanwhile, any suggestions about how I can debug this if I hit it
again, please ?

ؤ¸en

I don't have any help to offer Ken, but this walks and quacks much like a 
duck I'm encountering in 3.13.5, with the backup program amanda, which uses 
gnu tar.  To facilitate intelligent guesses as to the size of the various 
levels of backup, amanda does a dummy collection using tar, sent to 
/dev/null, using only the size it reports on the first pass.  Version 1.22, 
quite old, works on 3.12.9, but not on 3.13.5.  I have now pulled in, built 
and installed tar-1.27, and rebuilt amanda to let it know that the tar its 
using is not in /usr/bin, but in /usr/local/bin.  Next run at 1:30AM

This freeze, using 100% of a core, but causing no visible disk activity has 
killed my backups 3 nights running.  At this point I've no clue as to the 
cause, but I will be watching this thread closely, it has a similar 
description.

Cheers, Gene
-- 
There are four boxes to be used in defense of liberty:
 soap, ballot, jury, and ammo. Please use in that order.
-Ed Howdershelt (Author)
Genes Web page http://geneslinuxbox.net:6309/gene

NOTICE: Will pay 100 USD for an HP-4815A defective but
complete probe assembly.

--
To unsubscribe from this list: send the line unsubscribe linux-kernel in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: 3.13.5 : rm -rf running forever, one cpu at approx 100%

2014-02-26 Thread Ken Moffat
On Wed, Feb 26, 2014 at 08:28:29PM -0500, Gene Heskett wrote:
 On Wednesday 26 February 2014, Ken Moffat wrote:
 
 I don't have any help to offer Ken, but this walks and quacks much like a 
 duck I'm encountering in 3.13.5, with the backup program amanda, which uses 
 gnu tar.  To facilitate intelligent guesses as to the size of the various 
 levels of backup, amanda does a dummy collection using tar, sent to 
 /dev/null, using only the size it reports on the first pass.  Version 1.22, 
 quite old, works on 3.12.9, but not on 3.13.5.  I have now pulled in, built 
 and installed tar-1.27, and rebuilt amanda to let it know that the tar its 
 using is not in /usr/bin, but in /usr/local/bin.  Next run at 1:30AM
 
 This freeze, using 100% of a core, but causing no visible disk activity has 
 killed my backups 3 nights running.  At this point I've no clue as to the 
 cause, but I will be watching this thread closely, it has a similar 
 description.
 
 Cheers, Gene

 My box where I see this is a phenom (my intel is still on 3.13.4),
and from past threads I suspect yours is as well ?

ĸen
-- 
das eine Mal als Tragödie, dieses Mal als Farce
--
To unsubscribe from this list: send the line unsubscribe linux-kernel in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: 3.13.5 : rm -rf running forever, one cpu at approx 100%

2014-02-26 Thread Gene Heskett
On Wednesday 26 February 2014, Ken Moffat wrote:
On Wed, Feb 26, 2014 at 08:28:29PM -0500, Gene Heskett wrote:
 On Wednesday 26 February 2014, Ken Moffat wrote:
 
 I don't have any help to offer Ken, but this walks and quacks much like
 a duck I'm encountering in 3.13.5, with the backup program amanda,
 which uses gnu tar.  To facilitate intelligent guesses as to the size
 of the various levels of backup, amanda does a dummy collection using
 tar, sent to /dev/null, using only the size it reports on the first
 pass.  Version 1.22, quite old, works on 3.12.9, but not on 3.13.5.  I
 have now pulled in, built and installed tar-1.27, and rebuilt amanda
 to let it know that the tar its using is not in /usr/bin, but in
 /usr/local/bin.  Next run at 1:30AM
 
 This freeze, using 100% of a core, but causing no visible disk activity
 has killed my backups 3 nights running.  At this point I've no clue as
 to the cause, but I will be watching this thread closely, it has a
 similar description.
 
 Cheers, Gene

 My box where I see this is a phenom (my intel is still on 3.13.4),
and from past threads I suspect yours is as well ?

ؤ¸en

Yes, it is Ken, an old slow 2.1 Ghz 9550 phenom, with 8Gb of ram. 32 bit 
PAE build.  I wonder if that is also connected...  But my crystal ball is 
broke, in addition to the wet ram  starting to rust out.  Can't be the 
years, I've only 40 years practice at being 39. ;-)

Cheers, Gene
-- 
There are four boxes to be used in defense of liberty:
 soap, ballot, jury, and ammo. Please use in that order.
-Ed Howdershelt (Author)
Genes Web page http://geneslinuxbox.net:6309/gene

NOTICE: Will pay 100 USD for an HP-4815A defective but
complete probe assembly.

--
To unsubscribe from this list: send the line unsubscribe linux-kernel in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: 3.13.5 : rm -rf running forever, one cpu at approx 100%

2014-02-26 Thread Mike Galbraith
On Thu, 2014-02-27 at 00:52 +, Ken Moffat wrote: 
 Hi,
 
  Short summary : on 3.13.5, rm -rf of an application source
 directory on an ext4 filesystem sometimes takes forever (probably
 isn't going anywhere), with one CPU pegged at all-but 100% utilization.
 
  I've nearly finished building a new system from source, to check
 various desktop packages in linuxfromscratch.  On this build, much of
 it is things I don't normally use and I needed to upgrade my
 buildscripts, so most of it was built in chroot using 3.10.32.  But
 late last night I booted the new system using 3.13.5 to finish the
 build.  This morning I discovered that rm -rf for the icedtea source
 directory was still running, and had taken over 5 hours of CPU time
 (one CPU seemd to be running at close to 100%, the others had dropped
 to their slowest frequency).  That script was running as root (yeah,
 but it's a new system) and it looks as if /etc/passwd~ had got
 trashed, because I could no longer su or login.  Not sure if that is
 related, at this stage it might just be a side-effect of my scripts.
 
  Booted another system, chrooted, fixed up passwords.  Started
 again after commenting out icedtea - I hadn't intended to build
 what was an old version, I'd just forgotten it was in this script -
 that's why I do things in userspace, not the kernel :-(
 
  Continued with remaining packages, but a couple of hours later I
 saw a similar one CPU at 100%, rm -rf GConf source taking forever
 problem.  Dumped all the processes with Alt-SysRQ-T [ huge log ] but
 at that point 'rm' was merely 'ready' so I doubt there is anything
 useful to see in the log.
 
  Built 3.13.4, booted to that.  So far, everything looks good - but
 I'm now building the _current_ version of icedtea, so if this isn't
 a new 3.13.5 problem I guess I'm fairly likely to see it tomorrow.
 
  Meanwhile, any suggestions about how I can debug this if I hit it
 again, please ?

I would start with strace to see if a task is looping in userspace, then
move on to perf top -g -p pid (or perf record/report) to peek at what
it's up to in the kernel.  Once you have the where, trace_printk() is
the best thing since sliced bread (which ranks just below printk()).

-Mike

--
To unsubscribe from this list: send the line unsubscribe linux-kernel in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: 3.13.5 : rm -rf running forever, one cpu at approx 100%

2014-02-26 Thread Ken Moffat
On Thu, Feb 27, 2014 at 04:26:35AM +0100, Mike Galbraith wrote:
 
 I would start with strace to see if a task is looping in userspace, then
 move on to perf top -g -p pid (or perf record/report) to peek at what
 it's up to in the kernel.  Once you have the where, trace_printk() is
 the best thing since sliced bread (which ranks just below printk()).
 
 -Mike
 Thanks.  I'll need to build perf.

ĸen
-- 
das eine Mal als Tragödie, dieses Mal als Farce
--
To unsubscribe from this list: send the line unsubscribe linux-kernel in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: 3.13.5 : rm -rf running forever, one cpu at approx 100%

2014-02-26 Thread Gene Heskett
On Wednesday 26 February 2014, Ken Moffat wrote:
On Thu, Feb 27, 2014 at 04:26:35AM +0100, Mike Galbraith wrote:
 I would start with strace to see if a task is looping in userspace,
 then move on to perf top -g -p pid (or perf record/report) to peek
 at what it's up to in the kernel.  Once you have the where,
 trace_printk() is the best thing since sliced bread (which ranks just
 below printk()).
 
 -Mike

 Thanks.  I'll need to build perf.

ؤ¸en
I probably will too, but I don't have a huge amount of tracing turned on in 
this kernel.  We'll see what happens tonight  go from there.
FWIW, about all I see in htop is the command line that launched it.


Cheers, Gene
-- 
There are four boxes to be used in defense of liberty:
 soap, ballot, jury, and ammo. Please use in that order.
-Ed Howdershelt (Author)
Genes Web page http://geneslinuxbox.net:6309/gene

NOTICE: Will pay 100 USD for an HP-4815A defective but
complete probe assembly.

--
To unsubscribe from this list: send the line unsubscribe linux-kernel in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: 3.13.5 : rm -rf running forever, one cpu at approx 100%

2014-02-26 Thread Mike Galbraith
On Thu, 2014-02-27 at 03:45 +, Ken Moffat wrote: 
 On Thu, Feb 27, 2014 at 04:26:35AM +0100, Mike Galbraith wrote:
  
  I would start with strace to see if a task is looping in userspace, then
  move on to perf top -g -p pid (or perf record/report) to peek at what
  it's up to in the kernel.  Once you have the where, trace_printk() is
  the best thing since sliced bread (which ranks just below printk()).
  
  -Mike
  Thanks.  I'll need to build perf.

You may want to build the kernel with frame-pointers too, for easy gdb
list *0x(hexnum) of *func()+0x(hexoffset) use.  Crash is also pretty
handy both for rummaging live via crash vmlinux /proc/kcore, and for
leisurely postmortem analysis if you set the box up to crashdump in
advance, and force a dump (poke sysrq-c or echo c  /proc/sysrq-trigger)
when you see the bad thing happen.  Crash has all kinds of goodies,
including invocation of gdb.

-Mike

--
To unsubscribe from this list: send the line unsubscribe linux-kernel in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/