On Sun, May 20, 2012 at 12:05 PM, Christoph Bartoschek <po...@pontohonk.de> wrote:
> Hi,
>
> we have a two-node setup with drbd below LVM and an Ext4 filesystem
> that is shared via NFS. The system shows low performance and lots of
> timeouts, resulting in unnecessary failovers from pacemaker.
>
> The connection between both nodes is capable of 1 GByte/s as shown by
> iperf. The network between the clients and the nodes is capable of
> 110 MByte/s. The RAID can be filled with 450 MByte/s.

No it can't (most likely); see below.

> Thus I would expect to have a write performance of about 100 MByte/s.
> But dd gives me only 20 MByte/s.
>
> dd if=/dev/zero of=bigfile.10G bs=8192 count=1310720
> 1310720+0 records in
> 1310720+0 records out
> 10737418240 bytes (11 GB) copied, 498.26 s, 21.5 MB/s

If you used that same dd invocation for your local test that allegedly
produced 450 MB/s, you've probably been testing only your page cache.
Add oflag=dsync or oflag=direct (the latter will only work locally, as
NFS doesn't support O_DIRECT); see the example commands further down in
this mail.

If your RAID is made up of reasonably contemporary SAS or SATA drives,
then a sustained to-disk throughput of 450 MB/s would require about 7-9
stripes in a RAID-0 or RAID-10 configuration (that's figuring roughly
50-65 MB/s of sustained sequential writes per spindle). Is that what
you've got? Or are you writing to SSDs?

> While the slow dd runs, there are timeouts on the server, resulting in
> a restart of some resources. In the logfile I also see:
>
> [329014.592452] INFO: task nfsd:2252 blocked for more than 120 seconds.
> [329014.592820] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
> [329014.593273] nfsd D 0000000000000007 0 2252 2 0x00000000
> [329014.593278] ffff88060a847c40 0000000000000046 ffff88060a847bf8 0000000300000001
> [329014.593284] ffff88060a847fd8 ffff88060a847fd8 ffff88060a847fd8 0000000000013780
> [329014.593290] ffff8806091416f0 ffff8806085bc4d0 ffff88060a847c50 ffff88061870c800
> [329014.593295] Call Trace:
> [329014.593303] [<ffffffff8165a55f>] schedule+0x3f/0x60
> [329014.593309] [<ffffffff81265085>] jbd2_log_wait_commit+0xb5/0x130
> [329014.593315] [<ffffffff8108aec0>] ? add_wait_queue+0x60/0x60
> [329014.593321] [<ffffffff812111b8>] ext4_sync_file+0x208/0x2d0
> [329014.593328] [<ffffffff811a62dd>] vfs_fsync_range+0x1d/0x40
> [329014.593339] [<ffffffffa0227e51>] nfsd_commit+0xb1/0xd0 [nfsd]
> [329014.593349] [<ffffffffa022f28d>] nfsd3_proc_commit+0x9d/0x100 [nfsd]
> [329014.593356] [<ffffffffa0222a4b>] nfsd_dispatch+0xeb/0x230 [nfsd]
> [329014.593373] [<ffffffffa00e9d95>] svc_process_common+0x345/0x690 [sunrpc]
> [329014.593379] [<ffffffff8105f990>] ? try_to_wake_up+0x200/0x200
> [329014.593391] [<ffffffffa00ea1e2>] svc_process+0x102/0x150 [sunrpc]
> [329014.593397] [<ffffffffa02221ad>] nfsd+0xbd/0x160 [nfsd]
> [329014.593403] [<ffffffffa02220f0>] ? nfsd_startup+0xf0/0xf0 [nfsd]
> [329014.593407] [<ffffffff8108a42c>] kthread+0x8c/0xa0
> [329014.593412] [<ffffffff81666bf4>] kernel_thread_helper+0x4/0x10
> [329014.593416] [<ffffffff8108a3a0>] ? flush_kthread_worker+0xa0/0xa0
> [329014.593420] [<ffffffff81666bf0>] ? gs_change+0x13/0x13
>
> Has anyone an idea what could cause such problems? I have no ideas for
> further analysis.

As a knee-jerk response, that might be the classic issue of NFS filling
up the page cache until it hits vm.dirty_ratio and then suddenly having
a ton of dirty data to write to disk, more than the local I/O subsystem
can cope with.
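If that turns out to be what's happening, the writeback thresholds are
worth a look. A minimal sketch of how to inspect and experimentally
lower them -- the numbers below are only an illustration, not a
recommendation for your workload:

  # show the current writeback thresholds
  sysctl vm.dirty_ratio vm.dirty_background_ratio

  # example values only: start background writeback earlier and cap the
  # amount of dirty page cache, so each flush is smaller
  sysctl -w vm.dirty_background_ratio=5
  sysctl -w vm.dirty_ratio=10

  # vm.dirty_background_bytes / vm.dirty_bytes are the byte-based
  # equivalents if you prefer absolute limits

Whether lower values actually help depends on how fast the DRBD/RAID
stack can drain dirty pages, which is why sorting out the real to-disk
throughput first matters.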
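And to make the oflag suggestion above concrete, here is roughly what
the re-test could look like; the paths are made up, so adjust them to
your actual export and mount point:

  # locally on the server, bypassing the page cache (O_DIRECT):
  dd if=/dev/zero of=/srv/export/ddtest.10G bs=1M count=10240 oflag=direct

  # from an NFS client, syncing every write to stable storage:
  dd if=/dev/zero of=/mnt/nfs/ddtest.10G bs=1M count=10240 oflag=dsync

  # or keep your original invocation and at least include the final
  # fsync() in the measured time:
  dd if=/dev/zero of=bigfile.10G bs=8192 count=1310720 conv=fsync

Those numbers, rather than the cached ones, are what your 100 MB/s
expectation should be compared against.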
Cheers,
Florian

--
Need help with High Availability?
http://www.hastexo.com/now

_______________________________________________
Pacemaker mailing list: Pacemaker@oss.clusterlabs.org
http://oss.clusterlabs.org/mailman/listinfo/pacemaker

Project Home: http://www.clusterlabs.org
Getting started: http://www.clusterlabs.org/doc/Cluster_from_Scratch.pdf
Bugs: http://bugs.clusterlabs.org