Re: Kernel oops after heavy filesystem work

2014-07-25 sfjro
James B: > The system has now been up for 46 hours and is still going strong. > I think we can reasonably be sure that this problem has been fixed - that one > single "else" fixes it :) Great! > My only wonder is why it hasn't surfaced until now, well not in x86_64 and > x86 area anyway. > I'

2014-07-25 James B
The system has now been up for 46 hours and is still going strong. I think we can reasonably be sure that this problem has been fixed - that one single "else" fixes it :) I'm going to stop the test and continue to do other things that I need to do. Many thanks for helping me going through this.

2014-07-24 sfjro
James B: > The system has been up for 15 hours; normally by now it would have given me > the oops. But it still looks like it is going strong. I'll update you again > as time goes on, keep fingers crossed ... Digging into the aufs history, I've found the bug was born in aufs1 (CVS-age) On 2008

2014-07-24 sfjro
James B: > The system has been up for 15 hours; normally by now it would have given me > the oops. But it still looks like it is going strong. I'll update you again > as time goes on, keep fingers crossed ... tHat Is a gOOd nEWs. Sorry, I cannot type well with my fingers crossing... J. R. Oka

2014-07-24 James B
On Wed, 23 Jul 2014 23:36:07 +0700 James B wrote: > > Thank you. > I will bring up the system to the latest AND apply the patch you gave me just > now. > The system has been up for 15 hours; normally by now it would have given me the oops. But it still looks like it is going strong. I'll up

2014-07-23 James B
On Thu, 24 Jul 2014 01:29:47 +0900 sf...@users.sourceforge.net wrote: > > Hmm,,, > I think the missing au_unpin is unrelated to this problem. But > unfortunately I am unsure because the root cause of your problem is not > identified yet. > Mostly the missing au_unpin bug causes the problem of ret

2014-07-23 sfjro
James B: > I also saw earlier last week you fixed a bug on au_unpin. > I haven't applied that - you think I should do that too? Hmm,,, I think the missing au_unpin is unrelated to this problem. But unfortunately I am unsure because the root cause of your problem is not identified yet. Mostly the

2014-07-23 James B
Thanks, I will apply this. I also saw earlier last week you fixed a bug on au_unpin. I haven't applied that - you think I should do that too? cheers! On Thu, 24 Jul 2014 01:15:36 +0900 sf...@users.sourceforge.net wrote: > > James B: > > Okay now I confirm that it is zero. I had to reboot the b

2014-07-23 sfjro
James B: > Okay now I confirm that it is zero. I had to reboot the box as it gave me rcu > sched stall anyway ... so this time around I will run with the default > printk, it seems to be giving me some output from "dmesg -w" (I've got > something like "d-1" or "i-1 aufs do_rename" followed by t

2014-07-23 James B
On Wed, 23 Jul 2014 22:36:48 +0700 James B wrote: > On Thu, 24 Jul 2014 00:27:53 +0900 > sf...@users.sourceforge.net wrote: > > > No, I am quite sure that one is set to zero. I can't verify it without > rebooting because the system gets so bogged down trying to push the logs > through the ser

2014-07-23 James B
On Thu, 24 Jul 2014 00:27:53 +0900 sf...@users.sourceforge.net wrote: > > > Just a guess, you still set 1 to /sys/module/aufs/parameters/debug? > If so, stop it. au_debug_on() sets 1 to the module parameter, and > au_debug_off() resets it to zero. The debug messages are printed during > "debug"
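The question above turns on whether /sys/module/aufs/parameters/debug was still 1. A minimal userspace sketch for checking it follows, assuming (as the messages imply, with au_debug_on() writing 1 and au_debug_off() writing 0) that the parameter reads back as a plain decimal integer followed by a newline. The helper names parse_debug_value() and read_debug_param() are hypothetical and are not part of aufs.

```c
#include <assert.h>
#include <limits.h>
#include <stdio.h>
#include <stdlib.h>

/* Parse the text of a numeric module parameter such as "0\n" or "1\n".
 * Assumption: the value is a plain non-negative decimal integer.
 * Returns the value, or -1 if the text is not in that form. */
int parse_debug_value(const char *buf)
{
    char *end;
    long v;

    assert(buf != NULL);
    v = strtol(buf, &end, 10);
    if (end == buf || v < 0 || v > INT_MAX)
        return -1;
    /* sysfs values normally end in '\n'; allow trailing whitespace only */
    while (*end == '\n' || *end == ' ')
        end++;
    return *end == '\0' ? (int)v : -1;
}

/* Read a parameter file, e.g. the hypothetical check target
 * /sys/module/aufs/parameters/debug.
 * Returns the parsed value, or -1 if the file cannot be opened or read. */
int read_debug_param(const char *path)
{
    char buf[32];
    FILE *f;

    assert(path != NULL);
    f = fopen(path, "r");
    if (f == NULL)
        return -1;
    if (fgets(buf, sizeof(buf), f) == NULL) {
        fclose(f);
        return -1;
    }
    fclose(f);
    return parse_debug_value(buf);
}
```

For example, read_debug_param("/sys/module/aufs/parameters/debug") should return 0 once debug output is off; it returns -1 if aufs is not loaded or the file cannot be read.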

2014-07-23 sfjro
James B: > Even with this I still get thousands of entries coming out from the serial > console (I've already killed klogd and syslogd - if I don't I will get a > multiple of that, each time klogd tries to open "etc"). > Mainly output h_d_revalidate from "ps" reading from "proc". > Is this norm

2014-07-20 sfjro
James B: > Update on the issue: I've decided to try another path - Russell King the ARM > kernel maintainer has updated his cubox-i patches to work with 3.16-rc kernel > (so now the cubox-i can run mainline kernel + his 40-odd patches). I have > tried building 3.16-rc5 kernel with his patches (u

2014-07-19 James B
Thanks Pete. On Fri, 18 Jul 2014 11:37:54 +0200 Hans-Peter Jansen wrote: > On Donnerstag, 17. Juli 2014 20:41:09 James B wrote: > > On Thu, 17 Jul 2014 13:36:47 +0700 > > > > For the receiving part, you can use a usb serial converter just fine! Actually for i4p all I need is a USB cable since

2014-07-18 Hans-Peter Jansen
On Donnerstag, 17. Juli 2014 20:41:09 James B wrote: > On Thu, 17 Jul 2014 13:36:47 +0700 > > James B wrote: > > No worries at all. I'll rebuild the kernel with the conditions, and get > > back to you again. As it turns out, I didn't enable netconsole so I need > > to re-build the kernel anyway :

2014-07-17 James B
On Thu, 17 Jul 2014 13:36:47 +0700 James B wrote: > > No worries at all. I'll rebuild the kernel with the conditions, and get back > to you again. > As it turns out, I didn't enable netconsole so I need to re-build the kernel > anyway :) > Netconsole doesn't work :( netpoll: netconsole: wlan

2014-07-16 James B
Hi, On Thu, 17 Jul 2014 14:56:19 +0900 sf...@users.sourceforge.net wrote: > > > Thanks. The diff looks good, but I found an issue. > Please add a condition to the code we added such like this. > if (d) { > au_debug_on(); > AuDbgDentry(d); > au_deb

2014-07-16 sfjro
James B: > > Would post the output of diff(1) to confirm the changes you made? > > Attached. Thanks. The diff looks good, but I found an issue. Please add a condition to the code we added such like this. if (d) { au_debug_on(); AuDbgDentry(d);
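The suggested condition can be modelled in userspace to show why it matters. This is a hedged sketch, not aufs code: debug_on(), debug_off(), dbg_dentry() and struct dentry_model are hypothetical stand-ins for au_debug_on(), au_debug_off(), AuDbgDentry() and struct dentry. The assumption is that a dentry about to be passed to dput() in aufs_rename() may be NULL; dput(NULL) is a harmless no-op in Linux, but a dump helper that dereferences its argument is not, so the dump is skipped for NULL and debug output is always switched back off.

```c
#include <assert.h>
#include <stdio.h>

/* Hypothetical stand-in for struct dentry; only the name is modelled. */
struct dentry_model {
    const char *name;
};

static int debug_flag; /* models /sys/module/aufs/parameters/debug */
int dumped;            /* counts dumps so the behaviour can be checked */

void debug_on(void)  { debug_flag = 1; }  /* like au_debug_on() */
void debug_off(void) { debug_flag = 0; }  /* like au_debug_off() */
int debug_is_on(void) { return debug_flag; }

/* Like a dentry dump helper: it dereferences d, so a NULL argument
 * would crash here (the assert models that oops). */
void dbg_dentry(const struct dentry_model *d)
{
    assert(d != NULL);
    if (debug_flag) {
        printf("dentry %s\n", d->name);
        dumped++;
    }
}

/* The guarded pattern inserted before each dput(): dump only a real
 * dentry, and leave the debug flag at zero afterwards. */
void dump_before_put(const struct dentry_model *d)
{
    if (d) {
        debug_on();
        dbg_dentry(d);
        debug_off();
    }
}
```

Calling dump_before_put(NULL) does nothing, mirroring dput(NULL), while a non-NULL dentry is dumped exactly once with the flag restored to 0.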

2014-07-16 James B
On Thu, 17 Jul 2014 12:50:43 +0900 sf...@users.sourceforge.net wrote: > > Would post the output of diff(1) to confirm the changes you made? Attached. > By the way, my weekly test for aufs release takes 6 - 8 hours per a > version and I am maintaining several versions. That's a lot of work, I

2014-07-16 sfjro
James B: > My first attempt resulting in a total crash while I was sleeping, so I can't > see anything. > I will try again. Would post the output of diff(1) to confirm the changes you made? By the way, my weekly test for aufs release takes 6 - 8 hours per a version and I am maintaining several v

2014-07-16 James B
On Wed, 16 Jul 2014 02:14:49 +0700 James B wrote: > On Tue, 15 Jul 2014 23:49:20 +0900 > sf...@users.sourceforge.net wrote: > > > > > In this case, this approach may be more effective. > > - insert this just before every dput() in aufs_rename(). > > au_debug_on(); > > AuDbgDentry(d); >

2014-07-15 sfjro
James B: > The filesystem operations are indeed not heavy; I have a few cron jobs that > perform various sanity checks by doing those mv/ls/cat/touch etc and they > will run at the same frequency whether the CPU is loaded or not. When the CPU > is not loaded the kernel can last longer (so far I

2014-07-15 James B
Thank you for looking into this. On Tue, 15 Jul 2014 14:18:09 +0900 sf...@users.sourceforge.net wrote: > > Although I can guess it is the repeated mv, ls, cat, touch, etc. under > aufs, would you describe more specifically about the heavy load? > Sorry for the misleading subject title :( The

2014-07-14 sfjro
Hello James, James B: > I've gotten this kernel oops after running the system continuously for about > 14-15 hours. > This only happens when the system is under load most of the time (load is > around 95% of CPU most of the time). > If the system load is lower (idle being 40-50% load) this doe