Hi,

We have PathScale InfiniPath QLE7140 cards using the driver that's
included with the 2.6.20.x kernels. The machines in question have two
dual-core Opteron 280 processors, SuperMicro H8DCE-HTe mainboards, and
4GB of ECC DDR400. I've tried pvfs-2.6.2 with kernel 2.6.20.1 and the
latest CVS with kernel 2.6.20.3.

When testing with bonnie (not the preferred benchmark, I realize), a
filesystem mounted over the TCP interface works fine, but when mounted
over the IB interface the kernel reports a NULL pointer dereference in
put_back_slot within one or two test attempts (complete reports below).
When the test over the openib interface does complete, it's up to
2.5 times faster than gigabit/TCP, so we're very interested in making
use of it.

I've attached the pvfs2 fs config file.

The test is run with:
bonnie -s 8G:1024k -f -n 0

Please let me know if you need any more info...

Thanks!
Tad

Error report using 2.6.20.1 w/pvfs-2.6.2:

Mar 16 11:03:24 gx00 kernel: pvfs2: pvfs2_file_read -- wait timed out;
aborting attempt.
Mar 16 11:03:41 gx00 kernel: pvfs2: pvfs2_lookup -- wait timed out;
aborting attempt.
Mar 16 11:03:44 gx00 kernel: pvfs2: pvfs2_cancel -- wait timed out;
aborting attempt.
Mar 16 11:03:44 gx00 kernel: Unable to handle kernel NULL pointer
dereference at 0000000000000000 RIP:
Mar 16 11:03:44 gx00 kernel:  [<ffffffff881e061b>]
:pvfs2:put_back_slot+0x2b/0x70
Mar 16 11:03:44 gx00 kernel: PGD 192c8067 PUD 70ca0067 PMD 0
Mar 16 11:03:44 gx00 kernel: Oops: 0002 [1] SMP
Mar 16 11:03:44 gx00 kernel: CPU 3
Mar 16 11:03:44 gx00 kernel: Modules linked in: pvfs2 binfmt_misc ppdev
parport_pc lp parport thermal fan button processor ac battery autofs4
ib_ipoib ib_umad ib_uverbs md_mod rdma_cm ib_cm iw_cm ib_sa ib_mad
ib_addr iscsi_tcp libiscsi scsi_transport_iscsi ipv6 ext2 mbcache
dm_snapshot dm_mirror dm_mod w83627hf_wdt w83627hf eeprom adm1026
hwmon_vid i2c_isa tsdev i2c_nforce2 k8temp psmouse serio_raw ib_ipath
ib_core pcspkr ehci_hcd ohci_hcd evdev fbcon tileblit font bitblit
fbcon_rotate fbcon_cw fbcon_ud fbcon_ccw softcursor fb
Mar 16 11:03:44 gx00 kernel: Pid: 22025, comm: bonnie Not tainted
2.6.20.1-opteron #1
Mar 16 11:03:44 gx00 kernel: RIP: 0010:[<ffffffff881e061b>] 
[<ffffffff881e061b>] :pvfs2:put_back_slot+0x2b/0x70
Mar 16 11:03:44 gx00 kernel: RSP: 0018:ffff810017189d08  EFLAGS: 00010247
Mar 16 11:03:44 gx00 kernel: RAX: 0000000000000000 RBX: 0000000000000000
RCX: 0000000002242660
Mar 16 11:03:44 gx00 kernel: RDX: 0000000000000000 RSI: 0000000000000000
RDI: ffffffff881f11c8
Mar 16 11:03:44 gx00 kernel: RBP: ffff810017189d28 R08: 0000000000000040
R09: 0000000000000003
Mar 16 11:03:44 gx00 kernel: R10: ffffffff80696740 R11: ffffffff80219210
R12: ffff810017189e68
Mar 16 11:03:44 gx00 kernel: R13: ffff810017189ed8 R14: 0000000000000001
R15: ffff810017189e18
Mar 16 11:03:44 gx00 kernel: FS:  00002aaaab28eb00(0000)
GS:ffff81011fd3a8c0(0000) knlGS:00000000f7e9e6b0
Mar 16 11:03:44 gx00 kernel: CS:  0010 DS: 0000 ES: 0000 CR0:
000000008005003b
Mar 16 11:03:44 gx00 kernel: CR2: 0000000000000000 CR3: 0000000026fcf000
CR4: 00000000000006e0
Mar 16 11:03:44 gx00 kernel: Process bonnie (pid: 22025, threadinfo
ffff810017188000, task ffff81007f85d800)
Mar 16 11:03:44 gx00 kernel: Stack:  ffff81008e94a338 ffff81009c9d42f8
0000000000100000 ffffffff881e06d7
Mar 16 11:03:44 gx00 kernel:  ffff810000000005 0000000000000000
ffffffff881f11c8 ffffffff881f11d0
Mar 16 11:03:44 gx00 kernel:  0000000000000001 ffffffff881dc18e
ffff81011afdd000 ffffffff00008003
Mar 16 11:03:44 gx00 kernel: Call Trace:
Mar 16 11:03:44 gx00 kernel:  [<ffffffff881e06d7>]
:pvfs2:pvfs_bufmap_put+0x37/0x40
Mar 16 11:03:44 gx00 kernel:  [<ffffffff881dc18e>]
:pvfs2:do_direct_readv_writev+0x9be/0xd70
Mar 16 11:03:44 gx00 kernel:  [<ffffffff8022e6a9>]
release_console_sem+0x1e9/0x240
Mar 16 11:03:44 gx00 kernel:  [<ffffffff802433d9>]
remove_wait_queue+0x19/0x60
Mar 16 11:03:44 gx00 kernel:  [<ffffffff881dd955>]
:pvfs2:pvfs2_file_read+0xe5/0x120
Mar 16 11:03:44 gx00 kernel:  [<ffffffff8028411b>] vfs_read+0xdb/0x1a0
Mar 16 11:03:44 gx00 kernel:  [<ffffffff80284633>] sys_read+0x53/0x90
Mar 16 11:03:44 gx00 kernel:  [<ffffffff80209bbe>] system_call+0x7e/0x83
Mar 16 11:03:44 gx00 kernel:
Mar 16 11:03:44 gx00 kernel:
Mar 16 11:03:44 gx00 kernel: Code: c7 04 90 00 00 00 00 48 8b 45 10 c7
00 01 00 00 00 48 8b 7d
Mar 16 11:03:44 gx00 kernel: RIP  [<ffffffff881e061b>]
:pvfs2:put_back_slot+0x2b/0x70
Mar 16 11:03:44 gx00 kernel:  RSP <ffff810017189d08>
Mar 16 11:03:44 gx00 kernel: CR2: 0000000000000000
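
A note on reading the report above: the number after "Oops:" is the x86
page-fault error code. The following is only a minimal Python sketch of
decoding it, assuming the standard bit layout of that error code; the
function name is made up for illustration and is not part of any kernel
or PVFS2 tool.

```python
# Bit layout of the x86 page-fault error code (the value after "Oops:").
# Each entry: (mask, meaning if the bit is set, meaning if it is clear).
PF_BITS = [
    (0x01, "protection violation", "page not present"),
    (0x02, "write access", "read access"),
    (0x04, "user mode", "kernel mode"),
    (0x08, "reserved bit set in page table", None),
    (0x10, "instruction fetch", None),
]


def decode_oops_error_code(code_hex):
    """Hypothetical helper: turn e.g. '0002' into a list of descriptions."""
    code = int(code_hex, 16)
    parts = []
    for mask, if_set, if_clear in PF_BITS:
        if code & mask:
            parts.append(if_set)
        elif if_clear is not None:
            parts.append(if_clear)
    return parts


print(decode_oops_error_code("0002"))
```

For 0002 this yields "page not present", "write access", "kernel mode",
which is consistent with CR2 being 0000000000000000 in the report.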

Error report using 2.6.20.3 with CVS updated on 3/16:

Mar 16 14:44:03 gx00 kernel: pvfs2: module version
2.6.2pre1-2007-03-16-175232 loaded
Mar 16 14:45:40 gx00 kernel: pvfs2: pvfs2_file_read -- wait timed out;
aborting attempt.
Mar 16 14:46:00 gx00 kernel: pvfs2: pvfs2_cancel -- wait timed out;
aborting attempt.
Mar 16 14:46:00 gx00 kernel: Unable to handle kernel NULL pointer
dereference at 0000000000000000 RIP:
Mar 16 14:46:00 gx00 kernel:  [<ffffffff882555db>]
:pvfs2:put_back_slot+0x2b/0x70
Mar 16 14:46:00 gx00 kernel: PGD 6bdd2067 PUD 6bc8d067 PMD 0
Mar 16 14:46:00 gx00 kernel: Oops: 0002 [1] SMP
Mar 16 14:46:00 gx00 kernel: CPU 0
Mar 16 14:46:00 gx00 kernel: Modules linked in: pvfs2 binfmt_misc ppdev
parport_pc lp parport thermal fan button processor ac battery autofs4
ib_ipoib ib_umad ib_uverbs md_mod ib_iser rdma_cm ib_cm iw_cm ib_sa
ib_mad ib_addr iscsi_tcp libiscsi scsi_transport_iscsi ipv6 ext2
mbcache dm_snapshot dm_mirror dm_mod w83627hf_wdt w83627hf eeprom
adm1026 hwmon_vid i2c_isa tsdev ib_ipath ib_core e1000 ehci_hcd k8temp
psmouse serio_raw pcspkr ohci_hcd i2c_nforce2 evdev fbcon tileblit font
bitblit fbcon_rotate fbcon_cw fbcon_ud fbcon_ccw softcursor fb
Mar 16 14:46:00 gx00 kernel: Pid: 13759, comm: bonnie Not tainted
2.6.20.3-opteron #1
Mar 16 14:46:00 gx00 kernel: RIP: 0010:[<ffffffff882555db>] 
[<ffffffff882555db>] :pvfs2:put_back_slot+0x2b/0x70
Mar 16 14:46:00 gx00 kernel: RSP: 0018:ffff81007487bd08  EFLAGS: 00010247
Mar 16 14:46:00 gx00 kernel: RAX: 0000000000000000 RBX: 0000000000000000
RCX: 0000000001264360
Mar 16 14:46:00 gx00 kernel: RDX: 0000000000000000 RSI: 0000000000000000
RDI: ffffffff88266188
Mar 16 14:46:00 gx00 kernel: RBP: ffff81007487bd28 R08: 0000000000000000
R09: 0000000000000000
Mar 16 14:46:00 gx00 kernel: R10: 0000000000000000 R11: 0000000000000000
R12: ffff81007487be68
Mar 16 14:46:00 gx00 kernel: R13: ffff81007487bed8 R14: 0000000000000001
R15: ffff81007487be18
Mar 16 14:46:00 gx00 kernel: FS:  00002aaaab28eb00(0000)
GS:ffffffff80627000(0000) knlGS:0000000000000000
Mar 16 14:46:00 gx00 kernel: CS:  0010 DS: 0000 ES: 0000 CR0:
000000008005003b
Mar 16 14:46:00 gx00 kernel: CR2: 0000000000000000 CR3: 000000007cb75000
CR4: 00000000000006e0
Mar 16 14:46:00 gx00 kernel: Process bonnie (pid: 13759, threadinfo
ffff81007487a000, task ffff81007fc9a1c0)
Mar 16 14:46:00 gx00 kernel: Stack:  ffff81007cac2538 ffff8100541344b8
0000000000100000 ffffffff88255697
Mar 16 14:46:00 gx00 kernel:  ffff810000000005 0000000000000000
ffffffff88266188 ffffffff88266190
Mar 16 14:46:00 gx00 kernel:  0000000000000001 ffffffff8825118e
00002aaaab290010 0000000000008003
Mar 16 14:46:00 gx00 kernel: Call Trace:
Mar 16 14:46:00 gx00 kernel:  [<ffffffff88255697>]
:pvfs2:pvfs_bufmap_put+0x37/0x40
Mar 16 14:46:00 gx00 kernel:  [<ffffffff8825118e>]
:pvfs2:do_direct_readv_writev+0x9be/0xd70
Mar 16 14:46:00 gx00 kernel:  [<ffffffff8028bfaa>] permission+0xca/0x140
Mar 16 14:46:00 gx00 kernel:  [<ffffffff802435b9>]
remove_wait_queue+0x19/0x60
Mar 16 14:46:00 gx00 kernel:  [<ffffffff88252955>]
:pvfs2:pvfs2_file_read+0xe5/0x120
Mar 16 14:46:00 gx00 kernel:  [<ffffffff8028432b>] vfs_read+0xdb/0x1a0
Mar 16 14:46:00 gx00 kernel:  [<ffffffff80284843>] sys_read+0x53/0x90
Mar 16 14:46:00 gx00 kernel:  [<ffffffff80209bbe>] system_call+0x7e/0x83
Mar 16 14:46:00 gx00 kernel:
Mar 16 14:46:00 gx00 kernel:
Mar 16 14:46:00 gx00 kernel: Code: c7 04 90 00 00 00 00 48 8b 45 10 c7
00 01 00 00 00 48 8b 7d
Mar 16 14:46:00 gx00 kernel: RIP  [<ffffffff882555db>]
:pvfs2:put_back_slot+0x2b/0x70
Mar 16 14:46:00 gx00 kernel:  RSP <ffff81007487bd08>
Mar 16 14:46:00 gx00 kernel: CR2: 0000000000000000
Mar 16 14:46:20 gx00 kernel:  pvfs2: pvfs2_inode_setattr -- wait timed
out; aborting attempt.
Mar 16 14:47:03 gx00 kernel: pvfs2: pvfs2_inode_getattr -- wait timed
out; aborting attempt.
Mar 16 14:47:43 gx00 last message repeated 2 times
Mar 16 14:48:03 gx00 kernel: pvfs2: pvfs2_inode_getattr -- wait timed
out; aborting attempt.

<Defaults>
        UnexpectedRequests 50
        LogFile /var/log/pvfs2-server.log
        EventLogging none
        LogStamp datetime
        BMIModules bmi_ib,bmi_tcp
        FlowModules flowproto_multiqueue
        PerfUpdateInterval 1000
        ServerJobBMITimeoutSecs 30
        ServerJobFlowTimeoutSecs 30
        ClientJobBMITimeoutSecs 300
        ClientJobFlowTimeoutSecs 300
        ClientRetryLimit 5
        ClientRetryDelayMilliSecs 2000
        TroveMethod alt-aio
</Defaults>

<Aliases>
        Alias gx47 ib://gx47:3335,tcp://gx47:3334
        Alias gx48 ib://gx48:3335,tcp://gx48:3334
        Alias gx49 ib://gx49:3335,tcp://gx49:3334
        Alias gx50 ib://gx50:3335,tcp://gx50:3334
        Alias gx51 ib://gx51:3335,tcp://gx51:3334
        Alias gx52 ib://gx52:3335,tcp://gx52:3334
        Alias gx53 ib://gx53:3335,tcp://gx53:3334
        Alias gx54 ib://gx54:3335,tcp://gx54:3334
</Aliases>

<Filesystem>
        Name pvfs2-fs
        ID 1221584540
        RootHandle 1048576
        <MetaHandleRanges>
                Range gx47 4-536870914
                Range gx48 536870915-1073741825
        </MetaHandleRanges>
        <DataHandleRanges>
                Range gx49 1073741826-1610612736
                Range gx50 1610612737-2147483647
                Range gx51 2147483648-2684354558
                Range gx52 2684354559-3221225469
                Range gx53 3221225470-3758096380
                Range gx54 3758096381-4294967291
        </DataHandleRanges>
        <StorageHints>
                TroveSyncMeta yes
                TroveSyncData no
                TroveMethod alt-aio
        </StorageHints>
</Filesystem>
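
As a quick sanity check on the handle ranges in the config above, here
is a small Python sketch that parses the Range lines and reports any
overlapping pairs. It is not part of PVFS2; the function names are
hypothetical, and it assumes each range is an inclusive "start-end"
pair exactly as written in the file.

```python
import re

# Range lines copied from the <MetaHandleRanges> and <DataHandleRanges>
# sections of the config above.
CONFIG_TEXT = """
Range gx47 4-536870914
Range gx48 536870915-1073741825
Range gx49 1073741826-1610612736
Range gx50 1610612737-2147483647
Range gx51 2147483648-2684354558
Range gx52 2684354559-3221225469
Range gx53 3221225470-3758096380
Range gx54 3758096381-4294967291
"""


def parse_ranges(text):
    """Return a list of (alias, start, end) tuples from 'Range' lines."""
    pattern = re.compile(r"Range\s+(\S+)\s+(\d+)-(\d+)")
    return [(m.group(1), int(m.group(2)), int(m.group(3)))
            for m in pattern.finditer(text)]


def find_overlaps(ranges):
    """Return pairs of aliases whose inclusive handle ranges overlap."""
    ordered = sorted(ranges, key=lambda r: r[1])
    overlaps = []
    for (a1, _s1, e1), (a2, s2, _e2) in zip(ordered, ordered[1:]):
        if s2 <= e1:  # next range starts before the previous one ends
            overlaps.append((a1, a2))
    return overlaps


ranges = parse_ranges(CONFIG_TEXT)
print(len(ranges), "ranges parsed; overlaps:", find_overlaps(ranges))
```

Run against the ranges above, the sketch parses eight ranges and finds
no overlaps.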