Hi Bart,
From your strace output, my guess is that cp is running into trouble
with the value of one of the fstat() fields, but it's hard to say which one.
Are you able to reproduce this reliably? Could you run the strace again
with the -v option to see if it gives a full listing of what values were
in the stat structs it got before crashing?
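
If strace -v is awkward to capture, another way to look at the same data is to dump every field that fstat() returns for the source and target files and compare them by eye. The script below is only a rough sketch, not a PVFS2 tool. It assumes a Python 3 interpreter is available on the client (which may not hold on a stock RHEL3 box), and fstat_fields() is a helper name made up for illustration. Fields that the platform does not provide are reported as None.

```python
import os
import sys

# Fields to report. st_rdev, st_blksize and st_blocks are platform-specific,
# so every field is read with getattr() and may come back as None.
FIELDS = ("st_dev", "st_ino", "st_mode", "st_nlink", "st_uid", "st_gid",
          "st_rdev", "st_size", "st_blksize", "st_blocks",
          "st_atime", "st_mtime", "st_ctime")


def fstat_fields(path):
    """Open path read-only, fstat() the descriptor, return the fields as a dict."""
    fd = os.open(path, os.O_RDONLY)
    try:
        st = os.fstat(fd)
    finally:
        os.close(fd)
    return {name: getattr(st, name, None) for name in FIELDS}


def main(paths):
    for path in paths:
        print(path)
        for name, value in fstat_fields(path).items():
            if name == "st_mode" and value is not None:
                value = "0%o" % value  # octal, as strace prints it
            print("  %-10s %s" % (name, value))


if __name__ == "__main__":
    main(sys.argv[1:])
```

Saved under any name, it takes the two paths from the strace as arguments (test.file and /mnt/pvfs2/test.file). Any field that looks very different between the local file and the PVFS2 file would be a good place to start.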
-Phil
Bart Taylor wrote:
Hey guys,
I am running into a problem with a system copy command segfaulting on
2.4 kernels. Specifically, I am seeing this show up on RHEL3 machines
running a patched version of PVFS 2.6. Machines running Linux 2.6
kernels do not experience this problem. I believe we may have mentioned
this recently but hoped it would be fixed by some updates pulled into
dcache. That, apparently, is not the case.
The segfault is extremely consistent; it happens every time a cp is
executed with a PVFS2 file system as the target. The target file is
always created with a size of zero, so at least part of the command is
completing. 'dd' commands execute normally.
The setup is simple: 1 server node (RHEL4 2.6 kernel) with the default
interactive genconfig output, and 1 client with a 2.4 kernel. Mount the
file system, execute a copy onto the file system.
Here are the conf file contents:
<Defaults>
    UnexpectedRequests 50
    EventLogging none
    LogStamp datetime
    BMIModules bmi_tcp
    FlowModules flowproto_multiqueue
    PerfUpdateInterval 1000
    ServerJobBMITimeoutSecs 30
    ServerJobFlowTimeoutSecs 30
    ClientJobBMITimeoutSecs 300
    ClientJobFlowTimeoutSecs 300
    ClientRetryLimit 5
    ClientRetryDelayMilliSecs 2000
    TCPBindSpecific yes
</Defaults>
<Aliases>
    Alias node1 tcp://node1:3334
</Aliases>
<Filesystem>
    Name pvfs2-fs
    ID 1227216139
    RootHandle 1048576
    <MetaHandleRanges>
        Range node1 4-2147483650
    </MetaHandleRanges>
    <DataHandleRanges>
        Range node1 2147483651-4294967297
    </DataHandleRanges>
    <StorageHints>
        TroveSyncMeta no
        TroveSyncData no
        CoalescingHighWatermark infinity
        CoalescingLowWatermark 0
        TroveSyncMetaTimerSecs 5
        DBCacheSizeBytes 1073741824
    </StorageHints>
</Filesystem>
And here is the last bit of an strace on a copy command:
[r...@node1 root]# strace cp test.file /mnt/pvfs2/
.....
brk(0) = 0x95ce000
open("/usr/lib/locale/locale-archive", O_RDONLY|O_LARGEFILE) = 3
fstat64(3, {st_mode=S_IFREG|0644, st_size=32148976, ...}) = 0
mmap2(NULL, 2097152, PROT_READ, MAP_PRIVATE, 3, 0) = 0xb73f4000
close(3) = 0
geteuid32() = 0
lstat64("/mnt/pvfs2/", {st_mode=S_IFDIR|S_ISVTX|0777, st_size=4096, ...}) = 0
stat64("/mnt/pvfs2/", {st_mode=S_IFDIR|S_ISVTX|0777, st_size=4096, ...}) = 0
stat64("test.file", {st_mode=S_IFREG|0644, st_size=5, ...}) = 0
stat64("/mnt/pvfs2/test.file", {st_mode=S_IFREG|0644, st_size=0, ...}) = 0
open("test.file", O_RDONLY|O_LARGEFILE) = 3
fstat64(3, {st_mode=S_IFREG|0644, st_size=5, ...}) = 0
open("/mnt/pvfs2/test.file", O_WRONLY|O_TRUNC|O_LARGEFILE) = 4
fstat64(4, {st_mode=S_IFREG|0644, st_size=0, ...}) = 0
fstat64(3, {st_mode=S_IFREG|0644, st_size=5, ...}) = 0
--- SIGSEGV (Segmentation fault) @ 0 (0) ---
+++ killed by SIGSEGV +++
There is nothing in the client or server logs without turning on
additional logging.
Are there any suggestions on what might be causing this? Can I provide
any additional information that will be helpful for debugging?
Bart.
------------------------------------------------------------------------
_______________________________________________
Pvfs2-developers mailing list
[email protected]
http://www.beowulf-underground.org/mailman/listinfo/pvfs2-developers