Re: 2.6.20-rc5: cp 18gb 18gb.2 = OOM killer, reproducible just like 2.16.19.2
On Thu, 25 Jan 2007, Mark Hahn wrote:

> > Something is seriously wrong with that OOM killer.
>
> do you know you don't have to operate in OOM-slaughter mode?
>
> "vm.overcommit_memory = 2" in your /etc/sysctl.conf puts you into a mode
> where the kernel tracks your "committed" memory needs, and will eventually
> cause some allocations to fail. this is often much nicer than the default
> random OOM slaughter. (you probably also need to adjust vm.overcommit_ratio
> with some knowledge of your MemTotal and SwapTotal.)
>
> regards, mark hahn.

# sysctl -a | grep vm.over
vm.overcommit_ratio = 50
vm.overcommit_memory = 0

I'll have to experiment with these options, thanks for the info!

Justin.
-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at http://vger.kernel.org/majordomo-info.html
Please read the FAQ at http://www.tux.org/lkml/
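The arithmetic behind Mark's suggestion can be sketched as follows. The swap size and ratio below match the figures posted in this thread; the MemTotal value is a hypothetical stand-in, not taken from the poster's machine.

```shell
# Hypothetical MemTotal (kB); SwapTotal and ratio are the thread's values.
mem_total_kb=2072668
swap_total_kb=2200760
overcommit_ratio=50   # the default shown by "sysctl -a | grep vm.over"

# With vm.overcommit_memory=2 ("don't overcommit"), allocations that would
# push committed memory past this limit fail instead of waking the OOM killer:
#   CommitLimit = SwapTotal + MemTotal * overcommit_ratio / 100
commit_limit_kb=$(( swap_total_kb + mem_total_kb * overcommit_ratio / 100 ))
echo "CommitLimit: ${commit_limit_kb} kB"

# To enable the strict mode (as root):
#   sysctl -w vm.overcommit_memory=2
# or put "vm.overcommit_memory = 2" in /etc/sysctl.conf as suggested above.
```

With these figures the limit comes out below MemTotal + SwapTotal, which is why the ratio usually needs tuning against the machine's real MemTotal and SwapTotal.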
Re: 2.6.20-rc5: cp 18gb 18gb.2 = OOM killer, reproducible just like 2.16.19.2
> Something is seriously wrong with that OOM killer.

do you know you don't have to operate in OOM-slaughter mode?

"vm.overcommit_memory = 2" in your /etc/sysctl.conf puts you into a mode
where the kernel tracks your "committed" memory needs, and will eventually
cause some allocations to fail. this is often much nicer than the default
random OOM slaughter. (you probably also need to adjust vm.overcommit_ratio
with some knowledge of your MemTotal and SwapTotal.)

regards, mark hahn.
Re: 2.6.20-rc5: cp 18gb 18gb.2 = OOM killer, reproducible just like 2.16.19.2
On Wed, 24 Jan 2007, Bill Cizek wrote:

> Justin Piszcz wrote:
> > On Mon, 22 Jan 2007, Andrew Morton wrote:
> > > On Sun, 21 Jan 2007 14:27:34 -0500 (EST) Justin Piszcz
> > > <[EMAIL PROTECTED]> wrote:
> > > > Why does copying an 18GB file on a 74GB raptor raid1 cause the
> > > > kernel to invoke the OOM killer and kill all of my processes?
> >
> > Running with PREEMPT OFF lets me copy the file!! The machine LAGS
> > occasionally every 5-30-60 seconds or so VERY BADLY, talking 5-10
> > seconds of lag, but hey, it does not crash!! I will boot the older
> > kernel with preempt on and see if I can get you that information you
> > requested.
>
> Justin,
>
> According to your kernel_ring_buffer.txt (attached to another email), you
> are using "anticipatory" as your io scheduler:
>
> 289 Jan 24 18:35:25 p34 kernel: [0.142130] io scheduler noop registered
> 290 Jan 24 18:35:25 p34 kernel: [0.142194] io scheduler anticipatory registered (default)
>
> I had a problem with this scheduler where my system would occasionally
> lock up during heavy I/O. Sometimes it would fix itself, sometimes I had
> to reboot. I changed to the "CFQ" io scheduler and my system has worked
> fine since then.
>
> CFQ has to be built into the kernel (under Block Layer / IO Schedulers).
> It can be selected as default or you can set it during runtime:
>
> echo cfq > /sys/block/<device>/queue/scheduler
> ...
>
> Hope this helps,
> Bill

I used to run CFQ awhile back, but then I switched over to AS as it has
better performance for my workloads. Currently I am running with PREEMPT
off; if I see any additional issues, I will switch to the CFQ scheduler.
Right now, it's the OOM killer that is going crazy.

Justin.
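Bill's runtime switch can be sketched like this. The device name "sda" is a hypothetical placeholder (the path in his mail lost its device component), and the sample file contents are hard-coded here so the sketch runs without sysfs access.

```shell
# Hypothetical block device; substitute your own.
dev=sda

# Reading /sys/block/$dev/queue/scheduler lists the compiled-in schedulers
# with the active one in brackets. Sample contents used for illustration:
contents="noop [anticipatory] cfq"
active=$(printf '%s\n' "$contents" | sed 's/.*\[\([^]]*\)\].*/\1/')
echo "active scheduler on $dev: $active"

# Switching at runtime (as root):
#   echo cfq > /sys/block/$dev/queue/scheduler
```

The change applies only to that device and only until reboot; making CFQ the default requires selecting it in the kernel config as Bill notes.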
Re: 2.6.20-rc5: cp 18gb 18gb.2 = OOM killer, reproducible just like 2.16.19.2
On Thu, 25 Jan 2007, Nick Piggin wrote:

> Justin Piszcz wrote:
> > On Mon, 22 Jan 2007, Andrew Morton wrote:
> > > After the oom-killing, please see if you can free up the ZONE_NORMAL
> > > memory via a few `echo 3 > /proc/sys/vm/drop_caches' commands. See if
> > > you can work out what happened to the missing couple-of-hundred MB
> > > from ZONE_NORMAL.
> >
> > Running with PREEMPT OFF lets me copy the file!! The machine LAGS
> > occasionally every 5-30-60 seconds or so VERY BADLY, talking 5-10
> > seconds of lag, but hey, it does not crash!! I will boot the older
> > kernel with preempt on and see if I can get you that information you
> > requested.
>
> It wouldn't be a bad idea to recompile the new kernel with preempt on
> and get the info from there.
>
> It is usually best to be working with the most recent kernels. We can
> always backport any important fixes if we need to.
>
> Thanks,
> Nick
>
> --
> SUSE Labs, Novell Inc.

In my tests for the most part I am using the latest kernels.

Justin.
Re: 2.6.20-rc5: cp 18gb 18gb.2 = OOM killer, reproducible just like 2.16.19.2
On Thu, 25 Jan 2007, Pavel Machek wrote:

> Hi!
>
> > > Is it highmem-related? Can you try it with mem=256M?
> >
> > Bad idea, the kernel crashes & burns when I use mem=256, I had to boot
> > 2.6.20-rc5-6 single to get back into my machine, very nasty. Remember I
> > use an onboard graphics controller that has 128MB of RAM allocated to
> > it and I believe the ICH8 chipset also uses some memory; in any event,
> > mem=256 causes the machine to lock up before it can even get to the
> > boot/init processes; the two LEDs on the keyboard were blinking, caps
> > lock and scroll lock, and I saw no console at all!
>
> Okay, so try mem=700M or disable CONFIG_HIGHMEM or something.
>
> Pavel

I forgot to remove the mem=700M with PREEMPT off and this is what I saw
this morning:

[18005.261875] Killed process 4768 (screen)
[18005.262343] Out of memory: kill process 4793 (screen) score 385 or a child
[18005.262350] Killed process 4793 (screen)
[18005.378536] Out of memory: kill process 4825 (screen) score 385 or a child
[18005.378542] Killed process 4825 (screen)
[18005.378547] Out of memory: kill process 4825 (screen) score 385 or a child
[18005.378553] Killed process 4825 (screen)
[18005.413072] Out of memory: kill process 4875 (screen) score 385 or a child
[18005.413079] Killed process 4875 (screen)
[18005.423735] Out of memory: kill process 4970 (screen) score 385 or a child
[18005.423742] Killed process 4970 (screen)
[18005.431391] Out of memory: kill process 21365 (xfs_fsr) score 286 or a child
[18005.431398] Killed process 21365 (xfs_fsr)

$ screen -ls
There are screens on:
        2532.pts-0.p34   (Dead ???)
        3776.pts-2.p34   (Dead ???)
        4768.pts-7.p34   (Dead ???)
        4793.pts-9.p34   (Dead ???)
        4825.pts-11.p34  (Dead ???)
        4875.pts-13.p34  (Dead ???)
        4970.pts-15.p34  (Dead ???)

Lovely...

$ uname -r
2.6.20-rc5

Something is seriously wrong with that OOM killer.

Justin.
Re: 2.6.20-rc5: cp 18gb 18gb.2 = OOM killer, reproducible just like 2.16.19.2
Justin Piszcz wrote:
> On Mon, 22 Jan 2007, Andrew Morton wrote:
> > On Sun, 21 Jan 2007 14:27:34 -0500 (EST) Justin Piszcz
> > <[EMAIL PROTECTED]> wrote:
> > > Why does copying an 18GB file on a 74GB raptor raid1 cause the kernel
> > > to invoke the OOM killer and kill all of my processes?
>
> Running with PREEMPT OFF lets me copy the file!! The machine LAGS
> occasionally every 5-30-60 seconds or so VERY BADLY, talking 5-10 seconds
> of lag, but hey, it does not crash!! I will boot the older kernel with
> preempt on and see if I can get you that information you requested.

Justin,

According to your kernel_ring_buffer.txt (attached to another email), you
are using "anticipatory" as your io scheduler:

289 Jan 24 18:35:25 p34 kernel: [0.142130] io scheduler noop registered
290 Jan 24 18:35:25 p34 kernel: [0.142194] io scheduler anticipatory registered (default)

I had a problem with this scheduler where my system would occasionally lock
up during heavy I/O. Sometimes it would fix itself, sometimes I had to
reboot. I changed to the "CFQ" io scheduler and my system has worked fine
since then.

CFQ has to be built into the kernel (under Block Layer / IO Schedulers). It
can be selected as default or you can set it during runtime:

echo cfq > /sys/block/<device>/queue/scheduler
...

Hope this helps,
Bill
Re: 2.6.20-rc5: cp 18gb 18gb.2 = OOM killer, reproducible just like 2.16.19.2
On Thu, 25 Jan 2007, Pavel Machek wrote:

> Hi!
>
> > > Is it highmem-related? Can you try it with mem=256M?
> >
> > Bad idea, the kernel crashes & burns when I use mem=256, I had to boot
> > 2.6.20-rc5-6 single to get back into my machine, very nasty. Remember I
> > use an onboard graphics controller that has 128MB of RAM allocated to
> > it and I believe the ICH8 chipset also uses some memory; in any event,
> > mem=256 causes the machine to lock up before it can even get to the
> > boot/init processes; the two LEDs on the keyboard were blinking, caps
> > lock and scroll lock, and I saw no console at all!
>
> Okay, so try mem=700M or disable CONFIG_HIGHMEM or something.
>
> Pavel

Looks like you may be onto something, Pavel: it has not invoked the OOM
killer yet using mem=700M. I see the swap increasing, and the speed of the
copy is only 14-15MB/s; when mem= is off (giving me all memory w/preempt) I
get 45-65MB/s.

With append="mem=700M", seen below:

top - 19:38:46 up 1 min, 3 users, load average: 1.24, 0.38, 0.13
Tasks: 172 total, 1 running, 171 sleeping, 0 stopped, 0 zombie
Cpu(s): 10.6%us, 4.6%sy, 1.3%ni, 69.6%id, 13.9%wa, 0.0%hi, 0.1%si, 0.0%st
Mem:   705512k total,  699268k used,    6244k free,      12k buffers
Swap: 2200760k total,   18520k used, 2182240k free,   34968k cached

(with mem=700M):

procs -----------memory---------- ---swap-- -----io---- --system-- ----cpu----
 r  b   swpd   free  buff  cache    si   so    bi    bo    in    cs us sy id wa
 2  0  40276   7456    12 231840     0    0 13196 14904  1204  4756  1  2 50 47
 0  2  40276   6156    12 242640     0    0 20096 18352  1293  6400  1  4 49 46
 0  2  40276   6264    12 242640     0    0 15504 17836  1219  5202  0  2 50 48
 0  2  40276   6144    12 243760     0    0 14348 14453  1190  4815  0  3 49 48
 0  2  40276   6156    12 245040     0    0 11396 12724  1169  3724  1  2 50 48
 0  1  40276   7532    12 232720     0    0 11412 13121  1183  4017  0  2 50 48
 0  1  40276   7944    12 226440     0    0 19084 19144  1234  6548  0  4 50 46

Almost there, looks like it's going to make it.

-rw-r--r-- 1 user group 18630127104 2007-01-21 12:41 18gb
-rw-r--r-- 1 user group 15399329792 2007-01-24 19:55 18gb.copy

Hrmm, with preempt off it works, and with preempt ON and mem=700M it works.
I guess I will run with preempt off as I'd prefer to have the memory
available -- and tolerate the large lag bursts w/ no preemption.

Yup, it worked.

-rw-r--r-- 1 user group 18630127104 2007-01-21 12:41 18gb
-rw-r--r-- 1 user group 18630127104 2007-01-21 12:41 18gb.copy

Justin.
Re: 2.6.20-rc5: cp 18gb 18gb.2 = OOM killer, reproducible just like 2.16.19.2
Justin Piszcz wrote:
> On Mon, 22 Jan 2007, Andrew Morton wrote:
> > After the oom-killing, please see if you can free up the ZONE_NORMAL
> > memory via a few `echo 3 > /proc/sys/vm/drop_caches' commands. See if
> > you can work out what happened to the missing couple-of-hundred MB from
> > ZONE_NORMAL.
>
> Running with PREEMPT OFF lets me copy the file!! The machine LAGS
> occasionally every 5-30-60 seconds or so VERY BADLY, talking 5-10 seconds
> of lag, but hey, it does not crash!! I will boot the older kernel with
> preempt on and see if I can get you that information you requested.

It wouldn't be a bad idea to recompile the new kernel with preempt on
and get the info from there.

It is usually best to be working with the most recent kernels. We can
always backport any important fixes if we need to.

Thanks,
Nick

--
SUSE Labs, Novell Inc.
Re: 2.6.20-rc5: cp 18gb 18gb.2 = OOM killer, reproducible just like 2.16.19.2
On Thu, 25 Jan 2007, Pavel Machek wrote:

> Hi!
>
> > > Is it highmem-related? Can you try it with mem=256M?
> >
> > Bad idea, the kernel crashes & burns when I use mem=256, I had to boot
> > 2.6.20-rc5-6 single to get back into my machine, very nasty. Remember I
> > use an onboard graphics controller that has 128MB of RAM allocated to
> > it and I believe the ICH8 chipset also uses some memory; in any event,
> > mem=256 causes the machine to lock up before it can even get to the
> > boot/init processes; the two LEDs on the keyboard were blinking, caps
> > lock and scroll lock, and I saw no console at all!
>
> Okay, so try mem=700M or disable CONFIG_HIGHMEM or something.
>
> Pavel

Ok, this will be my last test for tonight; trying now.

Justin.
Re: 2.6.20-rc5: cp 18gb 18gb.2 = OOM killer, reproducible just like 2.16.19.2
Hi!

> > Is it highmem-related? Can you try it with mem=256M?
>
> Bad idea, the kernel crashes & burns when I use mem=256, I had to boot
> 2.6.20-rc5-6 single to get back into my machine, very nasty. Remember I
> use an onboard graphics controller that has 128MB of RAM allocated to it
> and I believe the ICH8 chipset also uses some memory; in any event,
> mem=256 causes the machine to lock up before it can even get to the
> boot/init processes; the two LEDs on the keyboard were blinking, caps
> lock and scroll lock, and I saw no console at all!

Okay, so try mem=700M or disable CONFIG_HIGHMEM or something.

Pavel
--
(english) http://www.livejournal.com/~pavelmachek
(cesky, pictures) http://atrey.karlin.mff.cuni.cz/~pavel/picture/horses/blog.html
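For reference, a mem= limit is passed on the kernel command line by the boot loader. A minimal LILO stanza might look like the following; the image path and label are hypothetical, though a later reply in this thread shows the same append= syntax in use:

```
# /etc/lilo.conf fragment (hypothetical image path and label)
image=/boot/vmlinuz-2.6.20-rc5
        label=test-700m
        append="mem=700M"
```

Run lilo after editing so the new stanza is written out; GRUB users would append mem=700M to the kernel line instead.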
Re: 2.6.20-rc5: cp 18gb 18gb.2 = OOM killer, reproducible just like 2.16.19.2
On Mon, 22 Jan 2007, Andrew Morton wrote:

> > On Sun, 21 Jan 2007 14:27:34 -0500 (EST) Justin Piszcz <[EMAIL PROTECTED]> wrote:
> > Why does copying an 18GB file on a 74GB raptor raid1 cause the kernel
> > to invoke the OOM killer and kill all of my processes?
>
> What's that? Software raid or hardware raid? If the latter, which driver?
>
> > Doing this on a single disk 2.6.19.2 is OK, no issues. However, this
> > happens every time!
> >
> > Anything to try? Any other output needed? Can someone shed some light
> > on this situation?
> >
> > Thanks.
> >
> > The last lines of vmstat 1 (right before it kill -9'd my shell/ssh):
> >
> > procs ---memory-- ---swap-- -io -system-- cpu
> > r b swpd free buff cache si so bi bo in cs us sy id wa
> > 0 7764 50348 12 126998800 53632 172 1902 4600 1 8 29 62
> > 0 7764 49420 12 126000400 53632 34368 1871 6357 2 11 48 40
>
> The wordwrapping is painful :(
>
> > The last lines of dmesg:
> >
> > [ 5947.199985] lowmem_reserve[]: 0 0 0
> > [ 5947.12] DMA: 0*4kB 1*8kB 1*16kB 0*32kB 1*64kB 1*128kB 1*256kB 0*512kB 1*1024kB 1*2048kB 0*4096kB = 3544kB
> > [ 5947.200010] Normal: 1*4kB 0*8kB 1*16kB 1*32kB 0*64kB 1*128kB 0*256kB 1*512kB 0*1024kB 1*2048kB 0*4096kB = 2740kB
> > [ 5947.200035] HighMem: 98*4kB 35*8kB 9*16kB 69*32kB 4*64kB 1*128kB 1*256kB 0*512kB 0*1024kB 0*2048kB 0*4096kB = 3664kB
> > [ 5947.200052] Swap cache: add 789, delete 189, find 16/17, race 0+0
> > [ 5947.200055] Free swap = 2197628kB
> > [ 5947.200058] Total swap = 2200760kB
> > [ 5947.200060] Free swap: 2197628kB
> > [ 5947.205664] 517888 pages of RAM
> > [ 5947.205671] 288512 pages of HIGHMEM
> > [ 5947.205673] 5666 reserved pages
> > [ 5947.205675] 257163 pages shared
> > [ 5947.205678] 600 pages swap cached
> > [ 5947.205680] 88876 pages dirty
> > [ 5947.205682] 115111 pages writeback
> > [ 5947.205684] 5608 pages mapped
> > [ 5947.205686] 49367 pages slab
> > [ 5947.205688] 541 pages pagetables
> > [ 5947.205795] Out of memory: kill process 1853 (named) score 9937 or a child
> > [ 5947.205801] Killed process 1853 (named)
> > [ 5947.206616] bash invoked oom-killer: gfp_mask=0x84d0, order=0, oomkilladj=0
> > [ 5947.206621] [] out_of_memory+0x17b/0x1b0
> > [ 5947.206631] [] __alloc_pages+0x29c/0x2f0
> > [ 5947.206636] [] __pte_alloc+0x1d/0x90
> > [ 5947.206643] [] copy_page_range+0x357/0x380
> > [ 5947.206649] [] copy_process+0x765/0xfc0
> > [ 5947.206655] [] alloc_pid+0x1b9/0x280
> > [ 5947.206662] [] do_fork+0x79/0x1e0
> > [ 5947.206674] [] do_pipe+0x5f/0xc0
> > [ 5947.206680] [] sys_clone+0x36/0x40
> > [ 5947.206686] [] syscall_call+0x7/0xb
> > [ 5947.206691] [] __sched_text_start+0x853/0x950
> > [ 5947.206698] ===
>
> Important information from the oom-killing event is missing. Please send
> it all.
>
> From your earlier reports we have several hundred MB of ZONE_NORMAL
> memory which has gone awol.
>
> Please include /proc/meminfo from after the oom-killing.
>
> Please work out what is using all that slab memory, via /proc/slabinfo.
>
> After the oom-killing, please see if you can free up the ZONE_NORMAL
> memory via a few `echo 3 > /proc/sys/vm/drop_caches' commands. See if
> you can work out what happened to the missing couple-of-hundred MB from
> ZONE_NORMAL.

Running with PREEMPT OFF lets me copy the file!! The machine LAGS
occasionally every 5-30-60 seconds or so VERY BADLY, talking 5-10 seconds
of lag, but hey, it does not crash!! I will boot the older kernel with
preempt on and see if I can get you that information you requested.

Justin.
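Andrew's drop_caches suggestion can be sketched as below. The control file is normally /proc/sys/vm/drop_caches (root only); the path is parameterised here purely so the sketch can be exercised without privileges.

```shell
# Sketch of Andrew's suggestion; target is parameterised for illustration.
drop_caches() {
    target=$1
    # Flush dirty pages first: drop_caches only discards *clean* cache.
    sync
    # 1 = pagecache, 2 = dentries and inodes (reclaimable slab), 3 = both.
    echo 3 > "$target"
}

# Real invocation (as root), repeated a few times while watching
# LowFree/Slab in /proc/meminfo to see what ZONE_NORMAL gives back:
#   drop_caches /proc/sys/vm/drop_caches
#   grep -E 'LowFree|Slab' /proc/meminfo
```

If LowFree barely moves after several passes, the missing ZONE_NORMAL memory is pinned by something other than clean pagecache or reclaimable slab, which is the diagnostic Andrew is after.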
Re: 2.6.20-rc5: cp 18gb 18gb.2 = OOM killer, reproducible just like 2.16.19.2
And FYI, yes, I used mem=256M just as you said, not mem=256.

Justin.

On Wed, 24 Jan 2007, Justin Piszcz wrote:

> > Is it highmem-related? Can you try it with mem=256M?
>
> Bad idea, the kernel crashes & burns when I use mem=256, I had to boot
> 2.6.20-rc5-6 single to get back into my machine, very nasty. Remember I
> use an onboard graphics controller that has 128MB of RAM allocated to it
> and I believe the ICH8 chipset also uses some memory; in any event,
> mem=256 causes the machine to lock up before it can even get to the
> boot/init processes; the two LEDs on the keyboard were blinking, caps
> lock and scroll lock, and I saw no console at all!
>
> Justin.
>
> On Mon, 22 Jan 2007, Justin Piszcz wrote:
>
> > On Mon, 22 Jan 2007, Pavel Machek wrote:
> >
> > > On Sun 2007-01-21 14:27:34, Justin Piszcz wrote:
> > > > Why does copying an 18GB file on a 74GB raptor raid1 cause the
> > > > kernel to invoke the OOM killer and kill all of my processes?
> > > >
> > > > Doing this on a single disk 2.6.19.2 is OK, no issues. However,
> > > > this happens every time!
> > > >
> > > > Anything to try? Any other output needed? Can someone shed some
> > > > light on this situation?
> > >
> > > Is it highmem-related? Can you try it with mem=256M?
> > >
> > > Pavel
> >
> > I will give this a try later or tomorrow, I cannot have my machine
> > crash at the moment.
> >
> > Also, the onboard video on the Intel 965 chipset uses 128MB; not sure
> > if that has anything to do with it, because after the system kill -9's
> > all the processes etc, my terminal looks like garbage.
> >
> > Justin.
Re: 2.6.20-rc5: cp 18gb 18gb.2 = OOM killer, reproducible just like 2.16.19.2
On Mon, 22 Jan 2007, Andrew Morton wrote:

> > On Sun, 21 Jan 2007 14:27:34 -0500 (EST) Justin Piszcz <[EMAIL PROTECTED]> wrote:
> > Why does copying an 18GB file on a 74GB raptor raid1 cause the kernel
> > to invoke the OOM killer and kill all of my processes?
>
> What's that? Software raid or hardware raid? If the latter, which driver?
>
> > Doing this on a single disk 2.6.19.2 is OK, no issues. However, this
> > happens every time!
> >
> > Anything to try? Any other output needed? Can someone shed some light
> > on this situation?
> >
> > Thanks.
> >
> > The last lines of vmstat 1 (right before it kill -9'd my shell/ssh):
> >
> > procs ---memory-- ---swap-- -io -system-- cpu
> > r b swpd free buff cache si so bi bo in cs us sy id wa
> > 0 7764 50348 12 126998800 53632 172 1902 4600 1 8 29 62
> > 0 7764 49420 12 126000400 53632 34368 1871 6357 2 11 48 40
>
> The wordwrapping is painful :(
>
> > The last lines of dmesg:
> >
> > [ 5947.199985] lowmem_reserve[]: 0 0 0
> > [ 5947.12] DMA: 0*4kB 1*8kB 1*16kB 0*32kB 1*64kB 1*128kB 1*256kB 0*512kB 1*1024kB 1*2048kB 0*4096kB = 3544kB
> > [ 5947.200010] Normal: 1*4kB 0*8kB 1*16kB 1*32kB 0*64kB 1*128kB 0*256kB 1*512kB 0*1024kB 1*2048kB 0*4096kB = 2740kB
> > [ 5947.200035] HighMem: 98*4kB 35*8kB 9*16kB 69*32kB 4*64kB 1*128kB 1*256kB 0*512kB 0*1024kB 0*2048kB 0*4096kB = 3664kB
> > [ 5947.200052] Swap cache: add 789, delete 189, find 16/17, race 0+0
> > [ 5947.200055] Free swap = 2197628kB
> > [ 5947.200058] Total swap = 2200760kB
> > [ 5947.200060] Free swap: 2197628kB
> > [ 5947.205664] 517888 pages of RAM
> > [ 5947.205671] 288512 pages of HIGHMEM
> > [ 5947.205673] 5666 reserved pages
> > [ 5947.205675] 257163 pages shared
> > [ 5947.205678] 600 pages swap cached
> > [ 5947.205680] 88876 pages dirty
> > [ 5947.205682] 115111 pages writeback
> > [ 5947.205684] 5608 pages mapped
> > [ 5947.205686] 49367 pages slab
> > [ 5947.205688] 541 pages pagetables
> > [ 5947.205795] Out of memory: kill process 1853 (named) score 9937 or a child
> > [ 5947.205801] Killed process 1853 (named)
> > [ 5947.206616] bash invoked oom-killer: gfp_mask=0x84d0, order=0, oomkilladj=0
> > [ 5947.206621] [] out_of_memory+0x17b/0x1b0
> > [ 5947.206631] [] __alloc_pages+0x29c/0x2f0
> > [ 5947.206636] [] __pte_alloc+0x1d/0x90
> > [ 5947.206643] [] copy_page_range+0x357/0x380
> > [ 5947.206649] [] copy_process+0x765/0xfc0
> > [ 5947.206655] [] alloc_pid+0x1b9/0x280
> > [ 5947.206662] [] do_fork+0x79/0x1e0
> > [ 5947.206674] [] do_pipe+0x5f/0xc0
> > [ 5947.206680] [] sys_clone+0x36/0x40
> > [ 5947.206686] [] syscall_call+0x7/0xb
> > [ 5947.206691] [] __sched_text_start+0x853/0x950
> > [ 5947.206698] ===
>
> Important information from the oom-killing event is missing. Please send
> it all.
>
> From your earlier reports we have several hundred MB of ZONE_NORMAL
> memory which has gone awol.
>
> Please include /proc/meminfo from after the oom-killing.
>
> Please work out what is using all that slab memory, via /proc/slabinfo.
>
> After the oom-killing, please see if you can free up the ZONE_NORMAL
> memory via a few `echo 3 > /proc/sys/vm/drop_caches' commands. See if
> you can work out what happened to the missing couple-of-hundred MB from
> ZONE_NORMAL.

Trying this now.
Re: 2.6.20-rc5: cp 18gb 18gb.2 = OOM killer, reproducible just like 2.16.19.2
> Is it highmem-related? Can you try it with mem=256M?

Bad idea, the kernel crashes & burns when I use mem=256, I had to boot
2.6.20-rc5-6 single to get back into my machine, very nasty. Remember I use
an onboard graphics controller that has 128MB of RAM allocated to it and I
believe the ICH8 chipset also uses some memory; in any event, mem=256
causes the machine to lock up before it can even get to the boot/init
processes; the two LEDs on the keyboard were blinking, caps lock and scroll
lock, and I saw no console at all!

Justin.

On Mon, 22 Jan 2007, Justin Piszcz wrote:

> On Mon, 22 Jan 2007, Pavel Machek wrote:
>
> > On Sun 2007-01-21 14:27:34, Justin Piszcz wrote:
> > > Why does copying an 18GB file on a 74GB raptor raid1 cause the kernel
> > > to invoke the OOM killer and kill all of my processes?
> > >
> > > Doing this on a single disk 2.6.19.2 is OK, no issues. However, this
> > > happens every time!
> > >
> > > Anything to try? Any other output needed? Can someone shed some light
> > > on this situation?
> >
> > Is it highmem-related? Can you try it with mem=256M?
> >
> > Pavel
>
> I will give this a try later or tomorrow, I cannot have my machine crash
> at the moment.
>
> Also, the onboard video on the Intel 965 chipset uses 128MB; not sure if
> that has anything to do with it, because after the system kill -9's all
> the processes etc, my terminal looks like garbage.
>
> Justin.
Re: 2.6.20-rc5: cp 18gb 18gb.2 = OOM killer, reproducible just like 2.16.19.2
On Tue, 23 Jan 2007 11:37:09 +1100 Donald Douwsma <[EMAIL PROTECTED]> wrote:

> Andrew Morton wrote:
> >> On Sun, 21 Jan 2007 14:27:34 -0500 (EST) Justin Piszcz <[EMAIL PROTECTED]> wrote:
> >> Why does copying an 18GB file on a 74GB raptor raid1 cause the kernel
> >> to invoke the OOM killer and kill all of my processes?
> >
> > What's that? Software raid or hardware raid? If the latter, which driver?
>
> I've hit this using local disk while testing xfs built against 2.6.20-rc4
> (SMP x86_64)
>
> dmesg follows, I'm not sure if anything in this is useful after the first
> event as our automated tests continued on after the failure.

This looks different.

> ...
>
> Mem-info:
> Node 0 DMA per-cpu:
> CPU0: Hot: hi: 0, btch: 1 usd: 0 Cold: hi: 0, btch: 1 usd: 0
> CPU1: Hot: hi: 0, btch: 1 usd: 0 Cold: hi: 0, btch: 1 usd: 0
> CPU2: Hot: hi: 0, btch: 1 usd: 0 Cold: hi: 0, btch: 1 usd: 0
> CPU3: Hot: hi: 0, btch: 1 usd: 0 Cold: hi: 0, btch: 1 usd: 0
> Node 0 DMA32 per-cpu:
> CPU0: Hot: hi: 186, btch: 31 usd: 31 Cold: hi: 62, btch: 15 usd: 53
> CPU1: Hot: hi: 186, btch: 31 usd: 2 Cold: hi: 62, btch: 15 usd: 60
> CPU2: Hot: hi: 186, btch: 31 usd: 20 Cold: hi: 62, btch: 15 usd: 47
> CPU3: Hot: hi: 186, btch: 31 usd: 25 Cold: hi: 62, btch: 15 usd: 56
> Active:76 inactive:495856 dirty:0 writeback:0 unstable:0 free:3680
> slab:9119 mapped:32 pagetables:637

No dirty pages, no pages under writeback.

> Node 0 DMA free:8036kB min:24kB low:28kB high:36kB active:0kB
> inactive:1856kB present:9376kB pages_scanned:3296 all_unreclaimable? yes
> lowmem_reserve[]: 0 2003 2003
> Node 0 DMA32 free:6684kB min:5712kB low:7140kB high:8568kB active:304kB
> inactive:1981624kB present:2052068kB

Inactive list is filled.

> pages_scanned:4343329 all_unreclaimable? yes

We scanned our guts out and decided that nothing was reclaimable.

> No available memory (MPOL_BIND): kill process 3492 (hald) score 0 or a child
> No available memory (MPOL_BIND): kill process 7914 (top) score 0 or a child
> No available memory (MPOL_BIND): kill process 4166 (nscd) score 0 or a child
> No available memory (MPOL_BIND): kill process 17869 (xfs_repair) score 0 or a child

But in all cases a constrained memory policy was in use.
Re: 2.6.20-rc5: cp 18gb 18gb.2 = OOM killer, reproducible just like 2.16.19.2
Andrew Morton wrote:
>> On Sun, 21 Jan 2007 14:27:34 -0500 (EST) Justin Piszcz <[EMAIL PROTECTED]> wrote:
>> Why does copying an 18GB file on a 74GB raptor raid1 cause the kernel to invoke the OOM killer and kill all of my processes?
>
> What's that? Software raid or hardware raid? If the latter, which driver?

I've hit this using local disk while testing xfs built against 2.6.20-rc4 (SMP x86_64). dmesg follows; I'm not sure if anything in it is useful after the first event, as our automated tests continued on after the failure.

> Please include /proc/meminfo from after the oom-killing.
>
> Please work out what is using all that slab memory, via /proc/slabinfo.

Sorry, I didn't pick this up either. I'll try to reproduce this and gather some more detailed info for a single event.

Donald

...
XFS mounting filesystem sdb5
Ending clean XFS mount for filesystem: sdb5
XFS mounting filesystem sdb5
Ending clean XFS mount for filesystem: sdb5

hald invoked oom-killer: gfp_mask=0x200d2, order=0, oomkilladj=0

Call Trace:
 [] out_of_memory+0x70/0x25d
 [] __alloc_pages+0x22c/0x2b5
 [] alloc_page_vma+0x71/0x76
 [] read_swap_cache_async+0x45/0xd8
 [] swapin_readahead+0x60/0xd3
 [] __handle_mm_fault+0x703/0x9d8
 [] do_page_fault+0x42b/0x7b3
 [] do_readv_writev+0x176/0x18b
 [] thread_return+0x0/0xed
 [] __const_udelay+0x2c/0x2d
 [] scsi_done+0x0/0x17
 [] error_exit+0x0/0x84

Mem-info:
Node 0 DMA per-cpu:
CPU0: Hot: hi: 0, btch: 1 usd: 0   Cold: hi: 0, btch: 1 usd: 0
CPU1: Hot: hi: 0, btch: 1 usd: 0   Cold: hi: 0, btch: 1 usd: 0
CPU2: Hot: hi: 0, btch: 1 usd: 0   Cold: hi: 0, btch: 1 usd: 0
CPU3: Hot: hi: 0, btch: 1 usd: 0   Cold: hi: 0, btch: 1 usd: 0
Node 0 DMA32 per-cpu:
CPU0: Hot: hi: 186, btch: 31 usd: 31   Cold: hi: 62, btch: 15 usd: 53
CPU1: Hot: hi: 186, btch: 31 usd: 2    Cold: hi: 62, btch: 15 usd: 60
CPU2: Hot: hi: 186, btch: 31 usd: 20   Cold: hi: 62, btch: 15 usd: 47
CPU3: Hot: hi: 186, btch: 31 usd: 25   Cold: hi: 62, btch: 15 usd: 56
Active:76 inactive:495856 dirty:0 writeback:0 unstable:0 free:3680 slab:9119 mapped:32 pagetables:637
Node 0 DMA free:8036kB min:24kB low:28kB high:36kB active:0kB inactive:1856kB present:9376kB pages_scanned:3296 all_unreclaimable? yes
lowmem_reserve[]: 0 2003 2003
Node 0 DMA32 free:6684kB min:5712kB low:7140kB high:8568kB active:304kB inactive:1981624kB present:2052068kB pages_scanned:4343329 all_unreclaimable? yes
lowmem_reserve[]: 0 0 0
Node 0 DMA: 1*4kB 0*8kB 0*16kB 1*32kB 1*64kB 0*128kB 1*256kB 1*512kB 1*1024kB 1*2048kB 1*4096kB = 8036kB
Node 0 DMA32: 273*4kB 29*8kB 1*16kB 1*32kB 1*64kB 1*128kB 2*256kB 1*512kB 0*1024kB 0*2048kB 1*4096kB = 6684kB
Swap cache: add 741048, delete 244661, find 84826/143198, race 680+239
Free swap  = 1088524kB
Total swap = 3140668kB
Free swap:       1088524kB
524224 pages of RAM
9619 reserved pages
259 pages shared
496388 pages swap cached
No available memory (MPOL_BIND): kill process 3492 (hald) score 0 or a child
Killed process 3626 (hald-addon-acpi)

top invoked oom-killer: gfp_mask=0x201d2, order=0, oomkilladj=0

Call Trace:
 [] out_of_memory+0x70/0x25d
 [] __alloc_pages+0x22c/0x2b5
 [] alloc_pages_current+0x74/0x79
 [] __page_cache_alloc+0xb/0xe
 [] __do_page_cache_readahead+0xa1/0x217
 [] io_schedule+0x28/0x33
 [] __wait_on_bit_lock+0x5b/0x66
 [] __lock_page+0x72/0x78
 [] do_page_cache_readahead+0x4e/0x5a
 [] filemap_nopage+0x140/0x30c
 [] __handle_mm_fault+0x1fb/0x9d8
 [] do_page_fault+0x42b/0x7b3
 [] __wake_up+0x43/0x50
 [] tty_ldisc_deref+0x71/0x76
 [] error_exit+0x0/0x84

Mem-info:
Node 0 DMA per-cpu:
CPU0: Hot: hi: 0, btch: 1 usd: 0   Cold: hi: 0, btch: 1 usd: 0
CPU1: Hot: hi: 0, btch: 1 usd: 0   Cold: hi: 0, btch: 1 usd: 0
CPU2: Hot: hi: 0, btch: 1 usd: 0   Cold: hi: 0, btch: 1 usd: 0
CPU3: Hot: hi: 0, btch: 1 usd: 0   Cold: hi: 0, btch: 1 usd: 0
Node 0 DMA32 per-cpu:
CPU0: Hot: hi: 186, btch: 31 usd: 31   Cold: hi: 62, btch: 15 usd: 53
CPU1: Hot: hi: 186, btch: 31 usd: 2    Cold: hi: 62, btch: 15 usd: 60
CPU2: Hot: hi: 186, btch: 31 usd: 1    Cold: hi: 62, btch: 15 usd: 10
CPU3: Hot: hi: 186, btch: 31 usd: 25   Cold: hi: 62, btch: 15 usd: 26
Active:90 inactive:496233 dirty:0 writeback:0 unstable:0 free:3485 slab:9119 mapped:32 pagetables:637
Node 0 DMA free:8036kB min:24kB low:28kB high:36kB active:0kB inactive:1856kB present:9376kB pages_scanned:3328 all_unreclaimable? yes
lowmem_reserve[]: 0 2003 2003
Node 0 DMA32 free:5904kB min:5712kB low:7140kB high:8568kB active:360kB inactive:1983092kB present:2052068kB pages_scanned:4587649 all_unreclaimable? yes
lowmem_reserve[]: 0 0 0
Node 0 DMA: 1*4kB 0*8kB 0*16kB 1*32kB 1*64kB 0*128kB 1*256kB 1*512kB 1*1024kB 1*2048kB 1*4096kB = 8036kB
Node 0 DMA32: 78*4kB 29*8kB 1*16kB 1*32kB 1*64kB 1*128kB 2*256kB 1*512kB 0*1024kB 0*2048kB 1*4096kB = 5904k
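The counters in the Mem-info dumps above are 4 kB pages, which makes the numbers easier to read once converted. A minimal sketch (pages_to_mb is my own helper name, not a real tool); note that inactive:495856 is nearly the entire 2 GB DMA32 zone sitting on the inactive list, all unreclaimable:

```shell
# Mem-info counters are in 4 kB pages; convert a page count to megabytes.
pages_to_mb() { echo $(( $1 * 4 / 1024 )); }

pages_to_mb 495856   # inactive pages from the first dump -> ~1936 MB
pages_to_mb 9119     # slab pages from the same dump      -> ~35 MB
```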
Re: 2.6.20-rc5: cp 18gb 18gb.2 = OOM killer, reproducible just like 2.16.19.2
On Mon 2007-01-22 13:48:44, Justin Piszcz wrote:
> On Mon, 22 Jan 2007, Pavel Machek wrote:
> > On Sun 2007-01-21 14:27:34, Justin Piszcz wrote:
> > > Why does copying an 18GB file on a 74GB raptor raid1 cause the kernel to invoke the OOM killer and kill all of my processes?
> > >
> > > Doing this on a single disk with 2.6.19.2 is OK, no issues. However, this happens every time!
> > >
> > > Anything to try? Any other output needed? Can someone shed some light on this situation?
> >
> > Is it highmem-related? Can you try it with mem=256M?
> >
> > Pavel
>
> I will give this a try later or tomorrow; I cannot have my machine crash at the moment.
>
> Also, the onboard video on the Intel 965 chipset uses 128MB. Not sure if that has anything to do with it, because after the system kill -9's all the processes etc., my terminal looks like garbage.

That looks like a separate problem. Switch to a text-mode console (vgacon, not fbcon) for tests.

Pavel
--
(english) http://www.livejournal.com/~pavelmachek
(cesky, pictures) http://atrey.karlin.mff.cuni.cz/~pavel/picture/horses/blog.html
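For anyone wanting to run Pavel's mem=256M test, the limit goes on the kernel command line. A hedged sketch for GRUB legacy (the title, kernel image path, and root=/dev/md2 below are made-up examples, and the edit is done on a scratch copy rather than the real /boot/grub/menu.lst):

```shell
# Build a scratch copy of a menu.lst entry; everything in it is an example.
cat > /tmp/menu.lst <<'EOF'
title  2.6.20-rc5
root   (hd0,0)
kernel /boot/vmlinuz-2.6.20-rc5 root=/dev/md2 ro
EOF

# mem=256M caps usable RAM for the highmem test; vga=normal keeps the
# console on vgacon rather than fbcon, as Pavel suggests for the test.
sed -i 's|^kernel .*|& mem=256M vga=normal|' /tmp/menu.lst
grep '^kernel' /tmp/menu.lst
```

After rebooting into that entry, `grep MemTotal /proc/meminfo` should report well under 262144 kB.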
Re: 2.6.20-rc5: cp 18gb 18gb.2 = OOM killer, reproducible just like 2.16.19.2
> What's that? Software raid or hardware raid? If the latter, which driver?

Software RAID (md).

On Mon, 22 Jan 2007, Andrew Morton wrote:
> > On Sun, 21 Jan 2007 14:27:34 -0500 (EST) Justin Piszcz <[EMAIL PROTECTED]> wrote:
> > Why does copying an 18GB file on a 74GB raptor raid1 cause the kernel to invoke the OOM killer and kill all of my processes?
>
> What's that? Software raid or hardware raid? If the latter, which driver?
>
> > Doing this on a single disk with 2.6.19.2 is OK, no issues. However, this happens every time!
> >
> > Anything to try? Any other output needed? Can someone shed some light on this situation?
> >
> > Thanks.
> >
> > The last lines of vmstat 1 (right before it kill -9'd my shell/ssh):
> > [...]
>
> The wordwrapping is painful :(
>
> > The last lines of dmesg:
> > [...]
>
> Important information from the oom-killing event is missing. Please send it all.
>
> From your earlier reports we have several hundred MB of ZONE_NORMAL memory which has gone awol.
>
> Please include /proc/meminfo from after the oom-killing.
>
> Please work out what is using all that slab memory, via /proc/slabinfo.
>
> After the oom-killing, please see if you can free up the ZONE_NORMAL memory via a few `echo 3 > /proc/sys/vm/drop_caches' commands. See if you can work out what happened to the missing couple-of-hundred MB from ZONE_NORMAL.

I believe this is the first part of it (hopefully):

2908kB active:86104kB inactive:1061904kB present:1145032kB pages_scanned:0 all_unreclaimable? no
[ 5947.199985] lowmem_reserve[]: 0 0 0
[ 5947.12] DMA: 0*4kB 1*8kB 1*16kB 0*32kB 1*64kB 1*128kB 1*256kB 0*512kB 1*1024kB 1*2048kB 0*4096kB = 3544kB
[ 5947.200010] Normal: 1*4kB 0*8kB 1*16kB 1*32kB 0*64kB 1*128kB 0*256kB 1*512kB 0*1024kB 1*2048kB 0*4096kB = 2740kB
[ 5947.200035] HighMem: 98*4kB 35*8kB 9*16kB 69*32kB 4*64kB 1*128kB 1*256kB 0*512kB 0*1024kB 0*2048kB 0*4096kB = 3664kB
[ 5947.200052] Swap cache: add 789, delete 189, find 16/17, race 0+0
[ 5947.200055] Free swap  = 2197628kB
[ 5947.200058] Total swap = 2200760kB
[ 5947.200060] Free swap:       2197628kB
[ 5947.205664] 517888 pages of RAM
[ 5947.205671] 288512 pages of HIGHMEM
[ 5947.205673] 5666 reserved pages
[ 5947.205675] 257163 pages shared
[ 5947.205678] 600 pages swap cached
[ 5947.205680] 88876 pages dirty
[ 5947.205682] 115111 pages writeback
[ 5947.205684] 5608 pages mapped
[ 5947.205686] 49367 pages slab
[ 5947.205688] 541 pages pagetables
[ 5947.205795] Out of memory: kill process 1853 (named) score 9937 or a child
[ 5947.205801] Killed process 1853 (named)
[ 5947.206616] bash invoked oom-killer: gfp_mask=0x84d0, order=0, oomkilladj=0
[ 5947.206621] [] out_of_memory+0x17b/0x1b0
[ 5947.206631] [] __alloc_pages+0x29c/0x2f0
[ 5947.206636] [] __pte_alloc+0x1d/0x90
[ 5947.206643] [] copy_page_rang
Re: 2.6.20-rc5: cp 18gb 18gb.2 = OOM killer, reproducible just like 2.16.19.2
> On Sun, 21 Jan 2007 14:27:34 -0500 (EST) Justin Piszcz <[EMAIL PROTECTED]> wrote:
> Why does copying an 18GB file on a 74GB raptor raid1 cause the kernel to invoke the OOM killer and kill all of my processes?

What's that? Software raid or hardware raid? If the latter, which driver?

> Doing this on a single disk with 2.6.19.2 is OK, no issues. However, this happens every time!
>
> Anything to try? Any other output needed? Can someone shed some light on this situation?
>
> Thanks.
>
> The last lines of vmstat 1 (right before it kill -9'd my shell/ssh):
>
> procs -----------memory---------- ---swap-- -----io---- --system-- ----cpu----
>  r  b   swpd   free   buff  cache   si   so    bi    bo   in    cs us sy id wa
> 0 7764 50348 12 126998800 53632 172 1902 4600 1 8 29 62
> 0 7764 49420 12 126000400 53632 34368 1871 6357 2 11 48 40

The wordwrapping is painful :(

> The last lines of dmesg:
> [ 5947.199985] lowmem_reserve[]: 0 0 0
> [ 5947.12] DMA: 0*4kB 1*8kB 1*16kB 0*32kB 1*64kB 1*128kB 1*256kB 0*512kB 1*1024kB 1*2048kB 0*4096kB = 3544kB
> [ 5947.200010] Normal: 1*4kB 0*8kB 1*16kB 1*32kB 0*64kB 1*128kB 0*256kB 1*512kB 0*1024kB 1*2048kB 0*4096kB = 2740kB
> [ 5947.200035] HighMem: 98*4kB 35*8kB 9*16kB 69*32kB 4*64kB 1*128kB 1*256kB 0*512kB 0*1024kB 0*2048kB 0*4096kB = 3664kB
> [ 5947.200052] Swap cache: add 789, delete 189, find 16/17, race 0+0
> [ 5947.200055] Free swap  = 2197628kB
> [ 5947.200058] Total swap = 2200760kB
> [ 5947.200060] Free swap:       2197628kB
> [ 5947.205664] 517888 pages of RAM
> [ 5947.205671] 288512 pages of HIGHMEM
> [ 5947.205673] 5666 reserved pages
> [ 5947.205675] 257163 pages shared
> [ 5947.205678] 600 pages swap cached
> [ 5947.205680] 88876 pages dirty
> [ 5947.205682] 115111 pages writeback
> [ 5947.205684] 5608 pages mapped
> [ 5947.205686] 49367 pages slab
> [ 5947.205688] 541 pages pagetables
> [ 5947.205795] Out of memory: kill process 1853 (named) score 9937 or a child
> [ 5947.205801] Killed process 1853 (named)
> [ 5947.206616] bash invoked oom-killer: gfp_mask=0x84d0, order=0, oomkilladj=0
> [ 5947.206621] [] out_of_memory+0x17b/0x1b0
> [ 5947.206631] [] __alloc_pages+0x29c/0x2f0
> [ 5947.206636] [] __pte_alloc+0x1d/0x90
> [ 5947.206643] [] copy_page_range+0x357/0x380
> [ 5947.206649] [] copy_process+0x765/0xfc0
> [ 5947.206655] [] alloc_pid+0x1b9/0x280
> [ 5947.206662] [] do_fork+0x79/0x1e0
> [ 5947.206674] [] do_pipe+0x5f/0xc0
> [ 5947.206680] [] sys_clone+0x36/0x40
> [ 5947.206686] [] syscall_call+0x7/0xb
> [ 5947.206691] [] __sched_text_start+0x853/0x950
> [ 5947.206698] ===

Important information from the oom-killing event is missing. Please send it all.

From your earlier reports we have several hundred MB of ZONE_NORMAL memory which has gone awol.

Please include /proc/meminfo from after the oom-killing.

Please work out what is using all that slab memory, via /proc/slabinfo.

After the oom-killing, please see if you can free up the ZONE_NORMAL memory via a few `echo 3 > /proc/sys/vm/drop_caches' commands. See if you can work out what happened to the missing couple-of-hundred MB from ZONE_NORMAL.
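A hedged sketch of how one might gather what Andrew asks for. slab_top is my own helper name, not a real tool, and it assumes the standard "slabinfo - version: 2.x" column layout (name, active_objs, num_objs, objsize, ...); the root-only steps are left as comments, and the helper is demonstrated on a canned sample so it can be checked anywhere:

```shell
# Rank slab caches by approximate memory use (num_objs x objsize).
# Assumes slabinfo 2.x layout: $1=name, $3=num_objs, $4=objsize (bytes).
slab_top() {
    awk 'NR > 2 { printf "%8d kB  %s\n", $3 * $4 / 1024, $1 }' "$1" \
        | sort -rn | head -20
}

# After the OOM kill (as root):
#   cat /proc/meminfo > meminfo.after-oom
#   slab_top /proc/slabinfo
#   echo 3 > /proc/sys/vm/drop_caches   # drop pagecache, dentries, inodes
#   cat /proc/meminfo                   # did the missing ZONE_NORMAL return?

# Demonstrate on a made-up sample (object counts are illustrative only):
cat > /tmp/slabinfo.sample <<'EOF'
slabinfo - version: 2.1
# name <active_objs> <num_objs> <objsize> <objperslab> <pagesperslab>
buffer_head 49367 49367 56 67 1
dentry 1200 1500 132 29 1
EOF
slab_top /tmp/slabinfo.sample
```

The sorted output makes the biggest consumer obvious at the top, which is usually enough to see where a couple of hundred MB of lowmem went.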
Re: 2.6.20-rc5: cp 18gb 18gb.2 = OOM killer, reproducible just like 2.16.19.2
On Mon, 22 Jan 2007, Pavel Machek wrote:
> On Sun 2007-01-21 14:27:34, Justin Piszcz wrote:
> > Why does copying an 18GB file on a 74GB raptor raid1 cause the kernel to invoke the OOM killer and kill all of my processes?
> >
> > Doing this on a single disk with 2.6.19.2 is OK, no issues. However, this happens every time!
> >
> > Anything to try? Any other output needed? Can someone shed some light on this situation?
>
> Is it highmem-related? Can you try it with mem=256M?
>
> Pavel

I will give this a try later or tomorrow; I cannot have my machine crash at the moment.

Also, the onboard video on the Intel 965 chipset uses 128MB. Not sure if that has anything to do with it, because after the system kill -9's all the processes etc., my terminal looks like garbage.

Justin.
Re: 2.6.20-rc5: cp 18gb 18gb.2 = OOM killer, reproducible just like 2.16.19.2
On Sun 2007-01-21 14:27:34, Justin Piszcz wrote:
> Why does copying an 18GB file on a 74GB raptor raid1 cause the kernel to invoke the OOM killer and kill all of my processes?
>
> Doing this on a single disk with 2.6.19.2 is OK, no issues. However, this happens every time!
>
> Anything to try? Any other output needed? Can someone shed some light on this situation?

Is it highmem-related? Can you try it with mem=256M?

Pavel
--
(english) http://www.livejournal.com/~pavelmachek
(cesky, pictures) http://atrey.karlin.mff.cuni.cz/~pavel/picture/horses/blog.html
2.6.20-rc5: cp 18gb 18gb.2 = OOM killer, reproducible just like 2.16.19.2
Why does copying an 18GB file on a 74GB raptor raid1 cause the kernel to invoke the OOM killer and kill all of my processes?

Doing this on a single disk with 2.6.19.2 is OK, no issues. However, this happens every time!

Anything to try? Any other output needed? Can someone shed some light on this situation?

Thanks.

The last lines of vmstat 1 (right before it kill -9'd my shell/ssh):

procs -----------memory---------- ---swap-- -----io---- --system-- ----cpu----
 r  b   swpd   free   buff  cache   si   so    bi    bo   in    cs us sy id wa
0 7764 50348 12 126998800 53632 172 1902 4600 1 8 29 62
0 7764 49420 12 126000400 53632 34368 1871 6357 2 11 48 40
0 6764 39608 12 123742000 52880 94696 1891 7424 1 12 47 39
0 6764 44264 12 122606400 42496 29723 1908 6035 1 9 31 58
0 6764 26672 12 121418000 43520 117472 1944 6189 1 13 0 87
0 7764 9132 12 121173200 22016 80400 1570 3304 1 8 0 92
1 8764 9512 12 120038800 33288 62212 1687 4843 1 9 0 91
0 5764 13980 12 119709600 5012 161 1619 2115 1 4 42 54
0 5764 29604 12 119722000 0 112 1548 1602 0 3 50 48
0 5764 49692 12 119739600 0 152 1484 1438 1 3 50 47
0 5764 73128 12 119764400 0 120 1463 1392 1 3 49 47
0 4764 99460 12 11977040024 168 1545 1803 1 3 39 57
0 4764 100088 12 121929600 11672 75450 1614 1371 0 5 73 22
0 6764 50404 12 126907200 53632 145 1989 3871 1 9 34 56
0 6764 51500 12 126768400 53632 608 1834 4437 1 8 21 71
4 5764 51424 12 126679200 53504 7584 1847 4393 2 9 48 42
0 6764 51456 12 126373600 53636 9804 1880 4326 1 10 9 81
0 6764 50640 12 126306000 53504 4392 1929 4430 1 8 28 63
0 6764 50956 12 1257884 240 50724 17214 1858 4755 1 11 35 54
0 6764 48360 12 124769200 50840 48880 1871 6242 1 10 0 89
0 6764 40028 12 122586000 42512 93346 1770 5599 2 11 0 87
procs -----------memory---------- ---swap-- -----io---- --system-- ----cpu----
 r  b   swpd   free   buff  cache   si   so    bi    bo   in    cs us sy id wa
0 6764 20372 12 121174400 21512 123664 1747 4378 0 9 3 88
0 6764 12140 12 118858400 20224 111244 1628 4244 1 9 15 76
0 7764 11280 12 117114400 22300 80936 1669 5314 1 8 11 80
0 6764 12168 12 116284000 28168 44072 1808 5065 1 8 0 92
1 7 3132 123740 12 10518480 2368 3852 34246 2097 2376 0 5 0 94
1 6 3132 19996 12 115566400 51752 290 1999 2136 2 8 0 91

The last lines of iostat (right before it kill -9'd my shell/ssh):

avg-cpu:  %user   %nice %system %iowait  %steal   %idle
           0.51    0.00    7.65   91.84    0.00    0.00

Device:  rrqm/s  wrqm/s    r/s     w/s     rkB/s     wkB/s avgrq-sz avgqu-sz  await  svctm  %util
sda       87.63 3873.20 254.64   65.98  21905.15  24202.06   287.61   144.37 209.54   3.22 103.30
sdb        0.00 3873.20   1.03  132.99     12.37  60045.36   896.25    41.19 398.53   6.93  92.89
sdc        0.00    0.00   0.00    0.00      0.00      0.00     0.00     0.00   0.00   0.00   0.00
sdd        0.00    0.00   0.00    0.00      0.00      0.00     0.00     0.00   0.00   0.00   0.00
sde        0.00    0.00   0.00    0.00      0.00      0.00     0.00     0.00   0.00   0.00   0.00
sdf        0.00    0.00   0.00    0.00      0.00      0.00     0.00     0.00   0.00   0.00   0.00
sdg        0.00   30.93   0.00    9.28      0.00    157.73    34.00     0.01   1.11   1.11   1.03
sdh        0.00    0.00   0.00    0.00      0.00      0.00     0.00     0.00   0.00   0.00   0.00
sdi       12.37    0.00   7.22    1.03     78.35      1.03    19.25     0.04   5.38   5.38   4.43
sdj       12.37    0.00   6.19    1.03     74.23      1.03    20.86     0.02   2.43   2.43   1.75
sdk        0.00   30.93   0.00    9.28      0.00    157.73    34.00     0.01   0.56   0.56   0.52
md0        0.00    0.00   0.00  610.31      0.00   2441.24     8.00     0.00   0.00   0.00   0.00
md2        0.00    0.00 347.42  996.91  22181.44   7917.53    44.78     0.00   0.00   0.00   0.00
md1        0.00    0.00   0.00    0.00      0.00      0.00     0.00     0.00   0.00   0.00   0.00
md3        0.00    0.00   1.03    9.28      4.12    164.95    32.80     0.00   0.00   0.00   0.00
md4        0.00    0.00   0.00    0.00      0.00      0.00     0.00     0.00   0.00   0.00   0.00

avg-cpu:  %user   %nice %system %iowait  %steal   %idle
           1.02    0.00   12.24   86.73    0.00    0.00

Device:  rrqm/s  wrqm/s    r/s     w/s     rkB/s     wkB/s avgrq-sz avgqu-