Re: btrfs crash when low on memory.

2013-03-04 Thread Martin Steigerwald
On Wednesday, 27 February 2013, Ahmet Inan wrote:
> > Yeah we have a lot of
> > 
> > ptr = kmalloc();
> > BUG_ON(!ptr);
> > 
> > everywhere.  I'll fix this one up but I really need to sit down and go
> > through all of them and make sure we do the right thing in all these
> > places.  Thanks,
> 
> But what would be the right thing to do when you got no memory?
> Spinlock until you can kmalloc? Pre-reserve some memory?
> 
> At the moment I'm using:
> 
> vm.min_free_kbytes = 65536
> 
> Which helps most of the time, and I think it is the better way to handle
> this kind of situation.

Thank you.

Raising /proc/sys/vm/min_free_kbytes from about 65000 to 20 KiB helped
here on a ThinkPad T520 equipped with 8 GiB of RAM.

I have now seen the OOM killer trigger while the Planeshift client was
accessing BTRFS.

Thus I have a complete OOM backtrace, and I bet that is likely the place
where BTRFS previously crashed the kernel before the OOM killer could chime
in to clear up the situation.

The rtkit-daemon invoked the OOM killer. Since I do not use PulseAudio
anymore, because it didn't work to my satisfaction on this machine either,
I might remove it. But then someone else will likely trigger it. :)

If need be, I could reduce min_free_kbytes again until it crashes once
more and take a screenshot, but a screenshot only ever shows part of the
trace. From memory, the backtrace I saw on tty1 was similar to the
psclient.bin backtrace in the following OOM excerpt.

Otherwise I will leave it at that and make the min_free_kbytes setting
permanent :), or maybe disable overcommit to let the Planeshift client
fail and possibly crash earlier on out-of-memory conditions.

Mar  4 22:56:00 merkaba rtkit-daemon[1575]: The canary thread is apparently 
starving. Taking action.
Mar  4 22:56:00 merkaba kernel: [183059.738831] rtkit-daemon invoked 
oom-killer: gfp_mask=0x201da, order=0, oom_score_adj=0
Mar  4 22:56:00 merkaba kernel: [183059.738837] rtkit-daemon cpuset=/ 
mems_allowed=0
Mar  4 22:56:00 merkaba kernel: [183059.738841] Pid: 1579, comm: rtkit-daemon 
Tainted: G   O 3.8.0-tp520 #40
Mar  4 22:56:00 merkaba kernel: [183059.738843] Call Trace:
Mar  4 22:56:00 merkaba kernel: [183059.738851]  [] ? 
_raw_spin_unlock+0x26/0x31
Mar  4 22:56:00 merkaba kernel: [183059.738856]  [] 
dump_header.isra.9+0x6b/0x1cd
Mar  4 22:56:00 merkaba kernel: [183059.738860]  [] ? 
_raw_spin_unlock_irqrestore+0x2e/0x39
Mar  4 22:56:00 merkaba kernel: [183059.738865]  [] ? 
___ratelimit+0xc9/0xe7
Mar  4 22:56:00 merkaba kernel: [183059.738869]  [] 
oom_kill_process+0x62/0x2bc
Mar  4 22:56:00 merkaba kernel: [183059.738874]  [] ? 
rcu_read_unlock_special+0x138/0x162
Mar  4 22:56:00 merkaba kernel: [183059.738877]  [] 
out_of_memory+0x3c4/0x3f7
Mar  4 22:56:00 merkaba kernel: [183059.738881]  [] 
__alloc_pages_nodemask+0x548/0x6c7
Mar  4 22:56:00 merkaba kernel: [183059.738886]  [] 
alloc_pages_current+0xc0/0xdd
Mar  4 22:56:00 merkaba kernel: [183059.738889]  [] 
__page_cache_alloc+0x87/0x93
Mar  4 22:56:00 merkaba kernel: [183059.738893]  [] 
filemap_fault+0x250/0x35f
Mar  4 22:56:00 merkaba kernel: [183059.738897]  [] 
__do_fault+0xa6/0x351
Mar  4 22:56:00 merkaba kernel: [183059.738900]  [] 
handle_pte_fault+0x28e/0x73f
Mar  4 22:56:00 merkaba kernel: [183059.738905]  [] ? 
try_to_wake_up+0x1b7/0x1c9
Mar  4 22:56:00 merkaba kernel: [183059.738908]  [] ? 
pmd_offset+0x10/0x3d
Mar  4 22:56:00 merkaba kernel: [183059.738911]  [] 
handle_mm_fault+0x1d8/0x1f2
Mar  4 22:56:00 merkaba kernel: [183059.738915]  [] 
__do_page_fault+0x37b/0x3c5
Mar  4 22:56:00 merkaba kernel: [183059.738920]  [] ? 
timespec_add_safe+0x22/0x51
Mar  4 22:56:00 merkaba kernel: [183059.738924]  [] ? 
paravirt_read_tsc+0x9/0xd
Mar  4 22:56:00 merkaba kernel: [183059.738928]  [] ? 
read_tsc+0x9/0x19
Mar  4 22:56:00 merkaba kernel: [183059.738931]  [] ? 
timekeeping_get_ns.constprop.8+0x13/0x3a
Mar  4 22:56:00 merkaba kernel: [183059.738935]  [] ? 
ktime_get_ts+0x47/0x87
Mar  4 22:56:00 merkaba kernel: [183059.738939]  [] ? 
poll_select_set_timeout+0x53/0x6f
Mar  4 22:56:00 merkaba kernel: [183059.738942]  [] 
do_page_fault+0x9/0xb
Mar  4 22:56:00 merkaba kernel: [183059.738945]  [] 
page_fault+0x28/0x30
Mar  4 22:56:00 merkaba kernel: [183059.738947] Mem-Info:
Mar  4 22:56:00 merkaba kernel: [183059.738949] Node 0 DMA per-cpu:
Mar  4 22:56:00 merkaba kernel: [183059.738952] CPU0: hi:0, btch:   1 
usd:   0
Mar  4 22:56:00 merkaba kernel: [183059.738954] CPU1: hi:0, btch:   1 
usd:   0
Mar  4 22:56:00 merkaba kernel: [183059.738956] CPU2: hi:0, btch:   1 
usd:   0
Mar  4 22:56:00 merkaba kernel: [183059.738958] CPU3: hi:0, btch:   1 
usd:   0
Mar  4 22:56:00 merkaba kernel: [183059.738960] Node 0 DMA32 per-cpu:
Mar  4 22:56:00 merkaba kernel: [183059.738962] CPU0: hi:  186, btch:  31 
usd:   0
Mar  4 22:56:00 merkaba kernel: [183059.738964] CPU1: hi:  186, btch:  31 
usd:   0
Mar  4 22:56:00 merkaba kernel: [183059.738966] CPU   

Re: btrfs crash when low on memory.

2013-02-27 Thread Ahmet Inan
> If we're corrupting on abort, that is a bug too and it needs to be fixed.
> I've banged on the abort stuff a lot recently when trying to
> make it not panic the box and it appears to work fine.  Obviously that
> kind of stuff needs to be tested as well, but so far I haven't seen
> abort corrupt the file system.  Thanks,

Thank you for the info, Josef.
I will report a bug the next time I hit such a case, then.

Ahmet
--
To unsubscribe from this list: send the line "unsubscribe linux-btrfs" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: btrfs crash when low on memory.

2013-02-27 Thread Josef Bacik
On Wed, Feb 27, 2013 at 3:10 PM, Ahmet Inan wrote:
> On Wed, Feb 27, 2013 at 7:26 PM, Josef Bacik  wrote:
>> On Wed, Feb 27, 2013 at 07:31:11AM -0700, Ahmet Inan wrote:
>>> > Yeah we have a lot of
>>> >
>>> > ptr = kmalloc();
>>> > BUG_ON(!ptr);
>>> >
>>> > everywhere.  I'll fix this one up but I really need to sit down and go
>>> > through all of them and make sure we do the right thing in all these
>>> > places.  Thanks,
>>>
>>> But what would be the right thing to do when you got no memory?
>>> Spinlock until you can kmalloc? Pre-reserve some memory?
>>>
>>
>> Return ENOMEM?  We have a way to abort transactions now, if it's in a
>> horrible enough spot we can just abort the transaction and let the user
>> deal with the aftermath, it's nicer than panicking.  Thanks,
>
> You're right. I am only afraid of silent corruption of data on aborts:
> our guys here trigger OOM all the time with their compilers and
> numerical codes (go figure).
> And until now we have had no more aborts / panics because of
> "vm.min_free_kbytes = 65536" and thus no corruption.
>
> My point is:
> I prefer a freezing computer to a corrupting computer, even if
> it's a server. Reboot to the rescue.
>

If we're corrupting on abort, that is a bug too and it needs to be fixed.
I've banged on the abort stuff a lot recently when trying to
make it not panic the box and it appears to work fine.  Obviously that
kind of stuff needs to be tested as well, but so far I haven't seen
abort corrupt the file system.  Thanks,

Josef


Re: btrfs crash when low on memory.

2013-02-27 Thread Ahmet Inan
On Wed, Feb 27, 2013 at 7:26 PM, Josef Bacik  wrote:
> On Wed, Feb 27, 2013 at 07:31:11AM -0700, Ahmet Inan wrote:
>> > Yeah we have a lot of
>> >
>> > ptr = kmalloc();
>> > BUG_ON(!ptr);
>> >
>> > everywhere.  I'll fix this one up but I really need to sit down and go
>> > through all of them and make sure we do the right thing in all these
>> > places.  Thanks,
>>
>> But what would be the right thing to do when you got no memory?
>> Spinlock until you can kmalloc? Pre-reserve some memory?
>>
>
> Return ENOMEM?  We have a way to abort transactions now, if it's in a
> horrible enough spot we can just abort the transaction and let the user
> deal with the aftermath, it's nicer than panicking.  Thanks,

You're right. I am only afraid of silent corruption of data on aborts:
our guys here trigger OOM all the time with their compilers and
numerical codes (go figure).
And until now we have had no more aborts / panics because of
"vm.min_free_kbytes = 65536" and thus no corruption.

My point is:
I prefer a freezing computer to a corrupting computer, even if
it's a server. Reboot to the rescue.

Ahmet


Re: btrfs crash when low on memory.

2013-02-27 Thread Josef Bacik
On Wed, Feb 27, 2013 at 07:31:11AM -0700, Ahmet Inan wrote:
> > Yeah we have a lot of
> >
> > ptr = kmalloc();
> > BUG_ON(!ptr);
> >
> > everywhere.  I'll fix this one up but I really need to sit down and go
> > through all of them and make sure we do the right thing in all these
> > places.  Thanks,
> 
> But what would be the right thing to do when you got no memory?
> Spinlock until you can kmalloc? Pre-reserve some memory?
>

Return ENOMEM?  We have a way to abort transactions now, if it's in a horrible
enough spot we can just abort the transaction and let the user deal with the
aftermath, it's nicer than panicking.  Thanks,
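
To make the "return ENOMEM" idea concrete, here is a minimal userspace
sketch, not the actual btrfs change. It assumes a hypothetical record type
(state_sketch) and a hypothetical allocator hook (alloc_fn) that stands in
for kmalloc() so the failure path can be exercised. The point is only that
the allocation failure travels up to the caller as an error code instead
of stopping the machine.

```c
#include <assert.h>
#include <errno.h>
#include <stdlib.h>

/* Hypothetical stand-in for a per-range state record; this is NOT the
 * real struct extent_state from fs/btrfs. */
struct state_sketch {
    unsigned long start;
    unsigned long end;
};

/* Hypothetical allocator hook. In the kernel this role would be played
 * by kmalloc() or a slab cache; hooks here must be malloc()-compatible
 * because the result is released with free(). */
typedef void *(*alloc_fn)(size_t size);

/* Allocator that always fails, to simulate memory exhaustion. */
static void *failing_alloc(size_t size)
{
    (void)size;
    return NULL;
}

/* Allocate and fill a record. Returns 0 on success, -ENOMEM if the
 * allocation fails. The BUG_ON(!ptr) pattern would crash at that point. */
static int state_create(alloc_fn alloc, unsigned long start,
                        unsigned long end, struct state_sketch **out)
{
    struct state_sketch *s;

    assert(alloc != NULL && out != NULL);
    s = alloc(sizeof(*s));
    if (!s)
        return -ENOMEM;
    s->start = start;
    s->end = end;
    *out = s;
    return 0;
}

/* Caller that passes the error upward, so a higher layer can decide what
 * to do with it (for example, abort the current transaction). */
static int lock_range_sketch(alloc_fn alloc, unsigned long start,
                             unsigned long end)
{
    struct state_sketch *s = NULL;
    int ret = state_create(alloc, start, end, &s);

    if (ret)
        return ret;
    /* ... real work on the range would happen here ... */
    free(s);
    return 0;
}
```

Calling lock_range_sketch(malloc, 0, 4095) takes the normal path, while
lock_range_sketch(failing_alloc, 0, 4095) returns -ENOMEM to its caller.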

Josef 


Re: btrfs crash when low on memory.

2013-02-27 Thread Ahmet Inan
> Yeah we have a lot of
>
> ptr = kmalloc();
> BUG_ON(!ptr);
>
> everywhere.  I'll fix this one up but I really need to sit down and go through
> all of them and make sure we do the right thing in all these places.  Thanks,

But what would be the right thing to do when you got no memory?
Spinlock until you can kmalloc? Pre-reserve some memory?
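
For illustration only, the "pre-reserve some memory" option might be
sketched in userspace C as below. Everything here is an assumption made
for the example: the names (reserve_pool, pool_alloc, ...), the reserve
size, and the simulate_oom knob that fakes an allocation failure. The
object size of 176 bytes is borrowed from the SLUB message in the original
report. The kernel's own mempool interface follows a similar idea, but
this is not that API.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>

#define RESERVE_SLOTS 4   /* hypothetical number of reserved objects */
#define OBJ_SIZE 176      /* object size quoted in the SLUB message */

/* Hypothetical emergency pool: objects are set aside up front, while
 * memory is still available, and handed out only when the normal
 * allocator fails. */
struct reserve_pool {
    void *slot[RESERVE_SLOTS];
    int nfree;
};

/* Release everything still held in the reserve. */
static void pool_destroy(struct reserve_pool *p)
{
    while (p->nfree > 0)
        free(p->slot[--p->nfree]);
}

/* Fill the reserve. Returns 0 on success, -1 if it could not be filled. */
static int pool_init(struct reserve_pool *p)
{
    assert(p != NULL);
    p->nfree = 0;
    for (int i = 0; i < RESERVE_SLOTS; i++) {
        void *obj = malloc(OBJ_SIZE);
        if (!obj) {
            pool_destroy(p);
            return -1;
        }
        p->slot[p->nfree++] = obj;
    }
    return 0;
}

/* Try the normal allocator first and fall back to the reserve.
 * simulate_oom is a test knob standing in for a real failure. */
static void *pool_alloc(struct reserve_pool *p, int simulate_oom)
{
    void *obj = simulate_oom ? NULL : malloc(OBJ_SIZE);

    if (obj)
        return obj;
    if (p->nfree > 0)
        return p->slot[--p->nfree];
    return NULL; /* reserve exhausted: the caller still has to cope */
}

/* Give an object back: refill the reserve first, otherwise free it. */
static void pool_free(struct reserve_pool *p, void *obj)
{
    if (!obj)
        return;
    if (p->nfree < RESERVE_SLOTS)
        p->slot[p->nfree++] = obj;
    else
        free(obj);
}
```

Note that a reserve only postpones the problem: once it is empty,
pool_alloc() returns NULL again, so the caller needs an error path anyway.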

At the moment I'm using:

vm.min_free_kbytes = 65536

Which helps most of the time, and I think it is the better way to handle
this kind of situation.
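
To check whether a machine is actually running with such a floor, a small
helper along these lines could be used. It is only a sketch: the function
names are made up for this example, and the path is passed in as a
parameter so the code can be tried against any file that contains a single
number.

```c
#include <assert.h>
#include <stdio.h>

/* Read one integer value from a sysctl-style file.
 * Returns 0 on success, -1 if the file cannot be opened or parsed. */
static int read_sysctl_long(const char *path, long *out)
{
    FILE *f;
    int ok;

    assert(path != NULL && out != NULL);
    f = fopen(path, "r");
    if (!f)
        return -1;
    ok = (fscanf(f, "%ld", out) == 1);
    fclose(f);
    return ok ? 0 : -1;
}

/* Compare the value in the file against a wanted floor in KiB.
 * Returns 1 if the current value is at least wanted_kib, 0 if it is
 * lower, and -1 if the value could not be read. */
static int min_free_at_least(const char *path, long wanted_kib)
{
    long cur;

    if (read_sysctl_long(path, &cur) != 0)
        return -1;
    return cur >= wanted_kib;
}
```

On Linux the path would be /proc/sys/vm/min_free_kbytes, for example
min_free_at_least("/proc/sys/vm/min_free_kbytes", 65536).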

Ahmet


Re: btrfs crash when low on memory.

2013-02-27 Thread Josef Bacik
On Tue, Feb 26, 2013 at 10:22:47PM -0700, Dave Jones wrote:
> Something I've yet to repeat managed to leak a whole bunch of memory
> while I was travelling, and locked up my workstation.
> 
> When I got home, this was the last thing printed out before it locked up
> (it did make it into the logs thankfully) after a bunch of instances of
> the oom-killer's handiwork.

Yeah we have a lot of

ptr = kmalloc();
BUG_ON(!ptr);

everywhere.  I'll fix this one up but I really need to sit down and go through
all of them and make sure we do the right thing in all these places.  Thanks,

Josef


Re: btrfs crash when low on memory.

2013-02-27 Thread Martin Steigerwald
On Wednesday, 27 February 2013, Dave Jones wrote:
> Something I've yet to repeat managed to leak a whole bunch of memory
> while I was travelling, and locked up my workstation.
> 
> When I got home, this was the last thing printed out before it locked up
> (it did make it into the logs thankfully) after a bunch of instances of
> the oom-killer's handiwork.
> 
> 
> 
> SLUB: Unable to allocate memory on node -1 (gfp=0x50)
>   cache: btrfs_extent_state, object size: 176, buffer size: 504,
>   default order: 1, min order: 0
>   node 0: slabs: 49, objs: 640, free: 0
> [ cut here ]
> kernel BUG at fs/btrfs/extent_io.c:748!

Thank you for reporting this, Dave.

I have lockups due to memory pressure conditions on my ThinkPad T520 as well 
when playing Planeshift for some time. (AFAIR since I switched my home 
directory to BTRFS (/ was BTRFS before), but I am not sure about this.) 
Planeshift goes from 2 GB to about 4 GB RSS and then the machine usually 
starts to swap to SSD. 

I have not gotten around to reporting this yet. The machine is basically
locked (at least for long periods of time, like minutes). I intend to collect
some photos and upload them somewhere, because I do not see anything in the
logs after a reboot.

I think this happens *before* real OOM conditions are met (i.e. all of swap 
is being used up as well).

In backtraces btrfs related stuff appears.

Expected result, of course: the system continues swapping and, if OOM
conditions are met, calls the OOM killer (which might try to get rid of the
running Planeshift client).

Current workaround: Develop a good feeling on when to better restart the PS 
client. :)

So for now just a heads up that I have seen similar issues. (But I think my 
backtraces might have been different, difficult to say since some of it 
scrolls by quite quickly.)

Ciao,
-- 
Martin 'Helios' Steigerwald - http://www.Lichtvoll.de
GPG: 03B0 0D6C 0040 0710 4AFA  B82F 991B EAAC A599 84C7


btrfs crash when low on memory.

2013-02-26 Thread Dave Jones
Something I've yet to repeat managed to leak a whole bunch of memory
while I was travelling, and locked up my workstation.

When I got home, this was the last thing printed out before it locked up
(it did make it into the logs thankfully) after a bunch of instances of
the oom-killer's handiwork.



SLUB: Unable to allocate memory on node -1 (gfp=0x50)
  cache: btrfs_extent_state, object size: 176, buffer size: 504, default order: 1, min order: 0
  node 0: slabs: 49, objs: 640, free: 0
[ cut here ]
kernel BUG at fs/btrfs/extent_io.c:748!
invalid opcode:  [#1] PREEMPT SMP 
Modules linked in: xfs vfat fat ip6t_REJECT nf_conntrack_ipv6 nf_defrag_ipv6 
xt_conntrack nf_conntrack ip6table_filter ip6_tables snd_emu10k1 coretemp 
snd_hwdep snd_util_mem snd_ac97_codec ac97_bus snd_rawmidi snd_seq 
snd_seq_device microcode snd_pcm pcspkr snd_page_alloc snd_timer snd soundcore 
e1000e vhost_net tun macvtap macvlan kvm_intel kvm binfmt_misc nfsd auth_rpcgss 
nfs_acl lockd sunrpc btrfs libcrc32c lzo_compress zlib_deflate ata_piix 
usb_storage firewire_ohci firewire_core sata_sil crc_itu_t radeon i2c_algo_bit 
hwmon drm_kms_helper ttm drm i2c_core floppy
CPU 1 
Pid: 7017, comm: mutt Not tainted 3.8.0+ #67  /D975XBX
RIP: 0010:[]  [] 
__set_extent_bit+0x3ae/0x4d0 [btrfs]
RSP: :8800a4c31838  EFLAGS: 00010246
RAX:  RBX: 001bbfff RCX: 
RDX: 0001 RSI: 00b0 RDI: 
RBP: 8800a4c318b8 R08: 81cf0b80 R09: 0400
R10: 0001 R11: 0508 R12: 8800ba4ab2c8
R13: 8800ba4ab2c8 R14:  R15: 001bb000
FS:  7eff96e14800() GS:8800bfc0() knlGS:
CS:  0010 DS:  ES:  CR0: 80050033
CR2: 00449cee CR3: 80ebb000 CR4: 07e0
DR0:  DR1:  DR2: 
DR3:  DR6: 0ff0 DR7: 0400
Process mutt (pid: 7017, threadinfo 8800a4c3, task 880025cba4c0)
Stack:
 2ab2 8800a4c318e0 8800a4c31fd8 0292
 8800a4c31fd8 1000 001bbfff 10080008
 034a39a8 8800ba4ab2c8 8800a4c31898 001bbfff
Call Trace:
 [] lock_extent_bits+0x74/0xa0 [btrfs]
 [] lock_extent+0x13/0x20 [btrfs]
 [] __extent_read_full_page+0xc4/0x720 [btrfs]
 [] ? repair_io_failure+0x440/0x440 [btrfs]
 [] ? btrfs_submit_direct+0x640/0x640 [btrfs]
 [] ? btrfs_submit_direct+0x640/0x640 [btrfs]
 [] ? btrfs_submit_direct+0x640/0x640 [btrfs]
 [] extent_readpages+0x116/0x1f0 [btrfs]
 [] btrfs_readpages+0x1f/0x30 [btrfs]
 [] __do_page_cache_readahead+0x2aa/0x350
 [] ? __do_page_cache_readahead+0x110/0x350
 [] ? find_get_page+0x5/0x280
 [] ra_submit+0x21/0x30
 [] filemap_fault+0x267/0x4a0
 [] __do_fault+0x6e/0x530
 [] handle_pte_fault+0x8f/0x900
 [] handle_mm_fault+0x210/0x300
 [] __do_page_fault+0x15c/0x4e0
 [] ? rcu_eqs_exit_common+0xc7/0x380
 [] ? rcu_eqs_exit+0x65/0xb0
 [] do_page_fault+0x2b/0x50
 [] page_fault+0x1f/0x30
Code: c9 0f 85 c7 fc ff ff 66 0f 1f 44 00 00 f6 45 18 10 0f 84 b7 fc ff ff 8b 
7d 18 e8 8e f2 ff ff 48 85 c0 48 89 c1 0f 85 a3 fc ff ff <0f> 0b 4d 89 ef 31 c9 
eb 89 66 0f 1f 84 00 00 00 00 00 48 83 7b 

WARNING: at kernel/exit.c:721 do_exit+0x55/0xc70()
Hardware name: 
Modules linked in: xfs vfat fat ip6t_REJECT nf_conntrack_ipv6 nf_defrag_ipv6 
xt_conntrack nf_conntrack ip6table_filter ip6_tables snd_emu10k1 coretemp 
snd_hwdep snd_util_mem snd_ac97_codec ac97_bus snd_rawmidi snd_seq 
snd_seq_device microcode snd_pcm pcspkr snd_page_alloc snd_timer snd soundcore 
e1000e vhost_net tun macvtap macvlan kvm_intel kvm binfmt_misc nfsd auth_rpcgss 
nfs_acl lockd sunrpc btrfs libcrc32c lzo_compress zlib_deflate ata_piix 
usb_storage firewire_ohci firewire_core sata_sil crc_itu_t radeon i2c_algo_bit 
hwmon drm_kms_helper ttm drm i2c_core floppy
Pid: 7017, comm: mutt Tainted: G  D  3.8.0+ #67
Call Trace:
 [] warn_slowpath_common+0x7f/0xc0
 [] warn_slowpath_null+0x1a/0x20
 [] do_exit+0x55/0xc70
 [] ? __const_udelay+0x28/0x30
 [] ? __rcu_read_unlock+0x5c/0xa0
 [] ? kmsg_dump+0x1bd/0x230
 [] ? kmsg_dump+0x25/0x230
 [] oops_end+0x96/0xe0
 [] die+0x58/0x90
 [] do_trap+0x6b/0x170
 [] do_invalid_op+0x9a/0xc0
 [] ? __set_extent_bit+0x3ae/0x4d0 [btrfs]
 [] ? alloc_extent_state+0x2e/0x1b0 [btrfs]
 [] ? trace_hardirqs_off_thunk+0x3a/0x3c
 [] ? restore_args+0x30/0x30
 [] invalid_op+0x15/0x20
 [] ? __set_extent_bit+0x3ae/0x4d0 [btrfs]
 [] ? __set_extent_bit+0x3a2/0x4d0 [btrfs]
 [] lock_extent_bits+0x74/0xa0 [btrfs]
 [] lock_extent+0x13/0x20 [btrfs]
 [] __extent_read_full_page+0xc4/0x720 [btrfs]
 [] ? repair_io_failure+0x440/0x440 [btrfs]
 [] ? btrfs_submit_direct+0x640/0x640 [btrfs]
 [] ? btrfs_submit_direct+0x640/0x640 [btrfs]
 [] ? btrfs_submit_direct+0x640/0x640 [btrfs]
 [] extent_readpages+0x116/0x1f0 [btrfs]
 [] btrfs_readpages+0x1f/0x30 [btrfs]
 [] __