Hello,

During some testing I found that OrangeFS behaves badly under intense
parallel I/O in the same directory. For testing I used a parallel make:
just untar some relatively large tarball and run
make -j10
I used torque-3.0.5, but the exact tarball should not matter.
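
For reference, the whole reproduction boils down to something like the
following sketch (the mount point and tarball name are just placeholders
for whatever is at hand):

cd /mnt/orangefs/test                 # some directory on the OrangeFS mount
tar xf ~/some-large-project.tar.gz    # any sufficiently large source tree
cd some-large-project
./configure && make -j10              # random targets start failing with EIO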

My current setup is: orangefs-2.8.5, 15 servers serving both data and
metadata, 16 clients, 15 of which are on the same nodes as the servers;
this testing was conducted on a separate node with no server on it.
The kernel is linux-3.2.14; ACL support is disabled due to previously
found bugs:
http://www.beowulf-underground.org/pipermail/pvfs2-developers/2012-April/004974.html
I run with TroveSync disabled.

During a parallel make, random files (rarely directories) become
inaccessible: any attempt to use them results in EIO (system error 5,
input/output error). However, these files can be accessed normally
from other nodes, or even from the same node using pvfs2-cp, which to
my knowledge does not go through the kernel VFS.
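
To illustrate, on an affected node the difference looks roughly like
this (the mount point and file name here are made up):

cat /mnt/orangefs/test/foo.o                       # through the kernel VFS
cat: /mnt/orangefs/test/foo.o: Input/output error
pvfs2-cp /mnt/orangefs/test/foo.o /tmp/foo.o       # userspace copy works fine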

I ran a series of tests to find out what may affect this behaviour and
found the following:

1) The error rate depends on the level of parallelism: make -j2 is
often fine, -j5 produces more problems, -j10 tends to produce broken
files very often, and so on.

2) With client-side caching disabled (the defaults are -a5 -n5):
pvfs2-client -a 0 -n 0 ...
things became worse: the frequency of errors rose significantly. A
somewhat larger cache (-a10 -n10) seems to work better, but does not
eliminate the problem completely (the exact invocations are sketched
right after this list).

3) During these tests I found that the kernel sometimes produces
backtraces and complains about a NULL pointer dereference. See the
attached kernel.log for details. pvfs2-client also complains a lot in
its log, always with the same message:
[E 09:49:22.580278] Completed upcall of unknown type ff00000d!
though these messages are not strictly in sync with the kernel
backtraces.

4) When I tried to increase the client cache significantly (-a500 -b500)
and ran make -j10, I got a kernel crash: the whole disk subsystem (not
only pvfs2) became unresponsive and only the hardware watchdog saved
the situation. It was a general protection fault; I managed to save the
kernel trace, see kernel.crash.log.

5) There are no errors logged on the pvfs2 servers.

6) Setting TroveSyncMeta yes has no noticeable effect on this issue.

7) Setting TroveSyncData yes makes things somewhat better in some cases
and worse in others.

8) I tried increasing the AttrCacheSize and AttrCacheMaxNumElems values,
but with no effect. Nevertheless I plan to keep the larger values: they
shouldn't hurt, and we have plenty of RAM available.
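
The cache experiments in 2) amount to restarting the client daemon with
different timeouts, roughly like this (just a sketch; the remaining
options are our usual ones and the numbers are those mentioned above):

# stop the currently running pvfs2-client first, then:
pvfs2-client -a 0 -n 0 ...      # caching effectively disabled: many more EIO errors
pvfs2-client -a 10 -n 10 ...    # larger cache: noticeably fewer errors, but not zero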

My current pvfs2 config is attached for reference.

For now I can partially mitigate this issue with a cron script that
either runs mount -o remount on the affected nodes (though remounting
under live applications may cause problems of its own) or rewrites the
broken file with the following sequence:
pvfs2-cp badfile tempfile
pvfs2-rm badfile
cp tempfile badfile
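
Or, as a minimal sketch of the per-file recovery step (detection of the
broken files, locking, etc. are left out; the path handling is purely
illustrative):

#!/bin/sh
# Rewrite a single broken file through the userspace pvfs2 tools.
# $1 is the full path of the file that returns EIO via the kernel VFS.
bad="$1"
tmp="/tmp/$(basename "$bad").$$"
pvfs2-cp "$bad" "$tmp" || exit 1   # still readable via userspace I/O
pvfs2-rm "$bad"                    # remove the confused file
cp "$tmp" "$bad"                   # recreate it through the VFS
rm -f "$tmp"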

In any case this will not help applications that have already been
confused by the errors...

I'm aware that support for 3.1 and 3.2 kernels is still experimental,
but I can't downgrade this system because other applications require
newer kernel features.

I also found some interesting options, DBCacheSizeBytes and
DBCacheType, though as far as I understand they only take effect with
TroveMethod dbpf and are useless with the alt-aio method used in my
setup. Please correct me if I'm wrong.

Best regards,
Andrew Savchenko
PVFS: kernel debug mask has been modified to "none" (0x00000000)
PVFS: client debug mask has been modified to "none" (0x00000000)
general protection fault: 0000 [#1] SMP
CPU 4
Modules linked in: pvfs2(O) knem(O) md5 nfsd 8021q garp stp llc xt_NOTRACK 
iptable_raw iptable_nat nf_nat iptable_mangle ipt_REJECT ipt_LOG xt_pkttype 
xt_limit xt_tcpudp xt_recent nf_conntrack_ipv4 nf_defrag_ipv4 xt_hashlimit 
xt_conntrack iptable_filter ip_tables x_tables nf_conntrack_ftp nf_conntrack 
[last unloaded: pvfs2]

Pid: 3871, comm: pvfs2-client-co Tainted: G           O 3.2.14-unicluster #2 HP 
ProLiant BL2x220c G5
RIP: 0010:[<ffffffffa00ac705>]  [<ffffffffa00ac705>] 
PVFS_proc_mask_to_eventlog+0x995/0x1110 [pvfs2]
RSP: 0018:ffff8807dadc7ca8  EFLAGS: 00010246
RAX: 0000000000000350 RBX: 0000000000002030 RCX: 0000000000009d48
RDX: 00000000000000c4 RSI: ffff8807f37f41c8 RDI: ffff8807dadc7d00
RBP: ffff8807f37f41c8 R08: ffff8807dadc7f58 R09: ffffffffa00ac5a0
R10: 0000000000002030 R11: ffff8807dadc7fd8 R12: ffff8807da98a038
R13: ffff8807dadc7e78 R14: dead000000100100 R15: ffff8807dafaed40
FS:  00007f4666f78720(0000) GS:ffff88081fd00000(0000) knlGS:0000000000000000
CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
CR2: 00007f2f1aa85070 CR3: 00000007f8b15000 CR4: 00000000000406e0
DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
DR3: 0000000000000000 DR6: 00000000ffff0ff0 DR7: 0000000000000400
Process pvfs2-client-co (pid: 3871, threadinfo ffff8807dadc6000, task 
ffff8807faf10000)
Stack:
 ffff8807faf10000 0000000000000350 0000000000000004 ffff8807da98a040
 ffffffff00000001 0000000000000000 0000000000000064 ffff8807dadc7ce0
 ffff8807dadc7ce0 ffff8807dadc7cf0 ffff8807dadc7cf0 0000000000009d48
Call Trace:
 [<ffffffffa00ac5a0>] ? PVFS_proc_mask_to_eventlog+0x830/0x1110 [pvfs2]
 [<ffffffff810c6929>] ? do_sync_readv_writev+0xa9/0xf0
 [<ffffffff810529c8>] ? thread_group_cputime+0x78/0xb0
 [<ffffffff810c6aba>] ? rw_copy_check_uvector+0x9a/0x140
 [<ffffffff810c6c46>] ? do_readv_writev+0xe6/0x210
 [<ffffffff81033260>] ? get_task_mm+0x10/0x40
 [<ffffffff810c6f0e>] ? sys_writev+0x4e/0x90
 [<ffffffff8140d57b>] ? system_call_fastpath+0x16/0x1b
Code: 08 48 c1 64 24 08 04 48 8b 44 24 08 49 03 07 48 8b 28 4c 8b 75 00 48 39 
e8 75 25 e9 76 01 00 00 66 0f 1f 44 00 00 48 8b 44 24 08 <49> 8b 16 49 03 07 49 
39 c6 0f 84 5c 01 00 00 4c 89 f5 49 89 d6
RIP  [<ffffffffa00ac705>] PVFS_proc_mask_to_eventlog+0x995/0x1110 [pvfs2]
 RSP <ffff8807dadc7ca8>
---[ end trace 6109a457ca33b74a ]---

Attachment: kernel.log.xz
Description: Binary data

<Defaults>
        UnexpectedRequests 50
        EventLogging none
        EnableTracing no
        LogStamp datetime
        BMIModules bmi_tcp
        FlowModules flowproto_multiqueue
        PerfUpdateInterval 1000
        ServerJobBMITimeoutSecs 30
        ServerJobFlowTimeoutSecs 30
        ClientJobBMITimeoutSecs 300
        ClientJobFlowTimeoutSecs 300
        ClientRetryLimit 5
        ClientRetryDelayMilliSecs 2000
        PrecreateBatchSize 0,32,512,32,32,32,0
        PrecreateLowThreshold 0,16,256,16,16,16,0

        DataStorageSpace /mnt/pvfs2
        MetadataStorageSpace /mnt/pvfs2

        LogFile /var/log/pvfs2/server.log
</Defaults>

<Aliases>
        Alias n01 tcp://n01:3334
        Alias n02 tcp://n02:3334
        Alias n03 tcp://n03:3334
        Alias n04 tcp://n04:3334
        Alias n05 tcp://n05:3334
        Alias n06 tcp://n06:3334
        Alias n07 tcp://n07:3334
        Alias n08 tcp://n08:3334
        Alias n09 tcp://n09:3334
        Alias n10 tcp://n10:3334
        Alias n11 tcp://n11:3334
        Alias n12 tcp://n12:3334
        Alias n13 tcp://n13:3334
        Alias n14 tcp://n14:3334
        Alias n15 tcp://n15:3334
</Aliases>

<Filesystem>
        Name pvfs2-fs
        ID 158402586
        RootHandle 1048576
        FileStuffing yes
        <MetaHandleRanges>
                Range n01 3-307445734561825862
                Range n02 307445734561825863-614891469123651722
                Range n03 614891469123651723-922337203685477582
                Range n04 922337203685477583-1229782938247303442
                Range n05 1229782938247303443-1537228672809129302
                Range n06 1537228672809129303-1844674407370955162
                Range n07 1844674407370955163-2152120141932781022
                Range n08 2152120141932781023-2459565876494606882
                Range n09 2459565876494606883-2767011611056432742
                Range n10 2767011611056432743-3074457345618258602
                Range n11 3074457345618258603-3381903080180084462
                Range n12 3381903080180084463-3689348814741910322
                Range n13 3689348814741910323-3996794549303736182
                Range n14 3996794549303736183-4304240283865562042
                Range n15 4304240283865562043-4611686018427387902
        </MetaHandleRanges>
        <DataHandleRanges>
                Range n01 4611686018427387903-4919131752989213762
                Range n02 4919131752989213763-5226577487551039622
                Range n03 5226577487551039623-5534023222112865482
                Range n04 5534023222112865483-5841468956674691342
                Range n05 5841468956674691343-6148914691236517202
                Range n06 6148914691236517203-6456360425798343062
                Range n07 6456360425798343063-6763806160360168922
                Range n08 6763806160360168923-7071251894921994782
                Range n09 7071251894921994783-7378697629483820642
                Range n10 7378697629483820643-7686143364045646502
                Range n11 7686143364045646503-7993589098607472362
                Range n12 7993589098607472363-8301034833169298222
                Range n13 8301034833169298223-8608480567731124082
                Range n14 8608480567731124083-8915926302292949942
                Range n15 8915926302292949943-9223372036854775802
        </DataHandleRanges>
        <StorageHints>
                TroveSyncMeta no
                TroveSyncData no
                CoalescingHighWatermark 128
                CoalescingLowWatermark 1
                TroveMethod alt-aio
                AttrCacheSize 8191
                AttrCacheMaxNumElems 65536
        </StorageHints>
        <Distribution>
           Name  simple_stripe
           Param strip_size
           Value 1048576
        </Distribution>
</Filesystem>
