Hi,
We have installed PVFS2 2.6.3 over Ethernet on a SUSE distribution,
running locally on a dual-processor (PowerPC 970FX) machine.
Some commands, such as pvfs2-ping, pvfs2-mkdir, and pvfs2-ls (without
parameters), work fine.
However, we cannot get some other pvfs2-* commands to run. For instance,
pvfs2-cp gets stuck. Here is the gdb backtrace:
(gdb) bt
#0 0x0ff5596c in epoll_wait () from /lib/tls/libc.so.6
#1 0x100a062c in BMI_socket_collection_testglobal (scp=0x100e48b0,
incount=128, outcount=0xffff97b0, maps=0xffff93b0, status=0xffff95b0,
poll_timeout=10, external_mutex=0x100d2ce0)
at socket-collection-epoll.c:281
#2 0x1009bf24 in tcp_do_work (max_idle_time=10) at bmi-tcp.c:2681
#3 0x10098d10 in BMI_tcp_testcontext (incount=5, out_id_array=0x100d2b58,
outcount=0xffff9864, error_code_array=0x100d2b80,
actual_size_array=0x100d2b98, user_ptr_array=0x100d2bc0, max_idle_time=10,
context_id=0) at bmi-tcp.c:1303
#4 0x1005aa18 in BMI_testcontext (incount=5, out_id_array=0x100d2b58,
outcount=0x100d14cc, error_code_array=0x100d2b80,
actual_size_array=0x100d2b98, user_ptr_array=0x100d2bc0,
max_idle_time_ms=10, context_id=0) at bmi.c:944
#5 0x10071fc8 in bmi_thread_function (ptr=0x0) at thread-mgr.c:239
#6 0x10072e24 in PINT_thread_mgr_bmi_push (max_idle_time=10)
at thread-mgr.c:815
#7 0x10071460 in do_one_work_cycle_all (idle_time_ms=10) at job.c:4661
#8 0x1007025c in job_testcontext (out_id_array_p=0xffff99d0,
inout_count_p=0xffff99b8, returned_user_ptr_array=0xffffd1d0,
out_status_array_p=0xffffa1d0, timeout_ms=10, context_id=1) at job.c:4068
#9 0x1000fdb0 in PINT_client_state_machine_test (op_id=3,
error_code=0xffffd670) at client-state-machine.c:536
#10 0x1001041c in PINT_client_wait_internal (op_id=3,
in_op_str=0x100b209c "fs_add", out_error=0xffffd670,
in_class_str=0x100a97d4 "sys") at client-state-machine.c:733
#11 0x10010734 in PVFS_sys_wait (op_id=3, in_op_str=0x100b209c "fs_add",
out_error=0xffffd670) at client-state-machine.c:861
#12 0x10035c4c in PVFS_sys_fs_add (mntent=0x100d3030) at fs-add.sm:205
#13 0x1004c220 in PVFS_util_init_defaults () at pvfs2-util.c:1040
#14 0x1000a5c8 in main (argc=3, argv=0xffffe3b4) at pvfs2-cp.c:135
At other times (rarely), it gets stuck in a different place:
(gdb) bt
#0 0x0ff5596c in epoll_wait () from /lib/tls/libc.so.6
#1 0x100a062c in BMI_socket_collection_testglobal (scp=0x100e48b0,
incount=128, outcount=0xffff9b30, maps=0xffff9730, status=0xffff9930,
poll_timeout=10, external_mutex=0x100d2ce0)
at socket-collection-epoll.c:281
#2 0x1009bf24 in tcp_do_work (max_idle_time=10) at bmi-tcp.c:2681
#3 0x10098d10 in BMI_tcp_testcontext (incount=5, out_id_array=0x100d2b58,
outcount=0xffff9be4, error_code_array=0x100d2b80,
actual_size_array=0x100d2b98, user_ptr_array=0x100d2bc0, max_idle_time=10,
context_id=0) at bmi-tcp.c:1303
#4 0x1005aa18 in BMI_testcontext (incount=5, out_id_array=0x100d2b58,
outcount=0x100d14cc, error_code_array=0x100d2b80,
actual_size_array=0x100d2b98, user_ptr_array=0x100d2bc0,
max_idle_time_ms=10, context_id=0) at bmi.c:944
#5 0x10071fc8 in bmi_thread_function (ptr=0x0) at thread-mgr.c:239
#6 0x10072e24 in PINT_thread_mgr_bmi_push (max_idle_time=10)
at thread-mgr.c:815
#7 0x10071460 in do_one_work_cycle_all (idle_time_ms=10) at job.c:4661
#8 0x1007025c in job_testcontext (out_id_array_p=0xffff9d50,
inout_count_p=0xffff9d38, returned_user_ptr_array=0xffffd550,
out_status_array_p=0xffffa550, timeout_ms=10, context_id=1) at job.c:4068
#9 0x1000fdb0 in PINT_client_state_machine_test (op_id=28,
error_code=0xffffda1c) at client-state-machine.c:536
#10 0x1001041c in PINT_client_wait_internal (op_id=28,
in_op_str=0x100ac1b8 "io", out_error=0xffffda1c,
in_class_str=0x100a97d4 "sys") at client-state-machine.c:733
#11 0x10010734 in PVFS_sys_wait (op_id=28, in_op_str=0x100ac1b8 "io",
out_error=0xffffda1c) at client-state-machine.c:861
#12 0x1001b78c in PVFS_sys_io (ref=
{handle = 1048570, fs_id = 1957135728, __pad1 = -26176},
file_req=0x100d07d8, file_req_offset=0, buffer=0x40068008,
mem_req=0x100efbd0, credentials=0xffffe060, resp_p=0xffffda90,
io_type=PVFS_IO_WRITE) at sys-io.sm:363
#13 0x1000b078 in generic_write (dest=0xffffddb0,
buffer=0x40068008 "\177ELF\001\002\001", offset=0, count=2469777,
credentials=0xffffe060) at pvfs2-cp.c:365
#14 0x1000a824 in main (argc=3, argv=0xffffe3b4) at pvfs2-cp.c:180
After interrupting the program with Ctrl-C, the files appear to have
been created. Any clue where this could come from? It looks as though
the metadata communication works but the data communication does not.
Below is the output of the ping command.
Many thanks
Florin
pvfs2-ping -m ~/florin/mnt/pvfs2/
(1) Parsing tab file...
(2) Initializing system interface...
(3) Initializing each file system found in tab file:
/home/A40001/u72877927/florin/apps/etc/pvfs2tab...
PVFS2 servers: tcp://localhost:55555
Storage name: pvfs2-fs
Local mount point: /home/A40001/u72877927/florin/mnt/pvfs2
/home/A40001/u72877927/florin/mnt/pvfs2: Ok
(4) Searching for /home/A40001/u72877927/florin/mnt/pvfs2/ in pvfstab...
PVFS2 servers: tcp://localhost:55555
Storage name: pvfs2-fs
Local mount point: /home/A40001/u72877927/florin/mnt/pvfs2
meta servers:
tcp://localhost:55555
data servers:
tcp://localhost:55555
(5) Verifying that all servers are responding...
meta servers:
tcp://localhost:55555 Ok
data servers:
tcp://localhost:55555 Ok
(6) Verifying that fsid 1957135728 is acceptable to all servers...
Ok; all servers understand fs_id 1957135728
(7) Verifying that root handle is owned by one server...
Root handle: 1048576
Ok; root handle is owned by exactly one server.
=============================================================
The PVFS2 filesystem at /home/A40001/u72877927/florin/mnt/pvfs2/
appears to be correctly configured.
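Since pvfs2-ping reports both servers as responding, one way to narrow
down whether the stuck I/O request ever reaches the server is to trace
the client's network system calls with strace. A suggestion only; the
file names below are placeholders, not paths from the report:

```shell
# Trace only network-related syscalls of the hanging copy; -f follows
# the BMI worker thread as well. Paths are example placeholders.
strace -f -e trace=network -o /tmp/pvfs2-cp.trace \
    pvfs2-cp /etc/hosts ~/florin/mnt/pvfs2/hosts.copy
# After interrupting with Ctrl-C, look for send/recv activity on the
# connection to localhost:55555:
grep -E 'connect|send|recv' /tmp/pvfs2-cp.trace | tail
```

If the trace shows a connect to port 55555 but no data being sent, that
would support the hypothesis that the data path, not the metadata path,
is failing.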
_______________________________________________
Pvfs2-users mailing list
[email protected]
http://www.beowulf-underground.org/mailman/listinfo/pvfs2-users