Hi,

many thanks, Murali. I have just tried that (a clean rebuild with
--disable-epoll), but it still gets stuck, now with an even stranger
stack trace. The hang is now in poll() and gdb reports a possibly
corrupt stack:

(gdb) bt
#0  0x0ff4b2d0 in poll () from /lib/tls/libc.so.6
#1  0x0ffc871c in ?? () from /lib/tls/libc.so.6
#2  0x0ffc871c in ?? () from /lib/tls/libc.so.6
Previous frame identical to this frame (corrupt stack?)

Any other suggestions?

Best regards
Florin

On 7/2/07, Murali Vilayannur <[EMAIL PROTECTED]> wrote:
Hi Florin,
Given that both your backtraces point to epoll(), can you run make
clean, reconfigure with --disable-epoll, rebuild everything, and see
if that works?
If it does, it probably points to an epoll-specific bug on ppc,
either in pvfs2 or in the libepoll code.
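If you want to double-check the kernel side independently of PVFS2,
a tiny standalone test along these lines (purely illustrative, not
PVFS2 code; the file name is just a suggestion) should print
"epoll_wait returned 1" right away. If it hangs or errors out on
your ppc box, that would point at the kernel/glibc epoll support
rather than at PVFS2:

/* epoll_check.c -- minimal epoll sanity test (illustrative only).
 * Build: gcc -o epoll_check epoll_check.c
 */
#include <stdio.h>
#include <string.h>
#include <unistd.h>
#include <sys/epoll.h>

int main(void)
{
    int pipefd[2];
    struct epoll_event ev, out;
    int epfd, n;

    if (pipe(pipefd) < 0) { perror("pipe"); return 1; }

    epfd = epoll_create(16);
    if (epfd < 0) { perror("epoll_create"); return 1; }

    memset(&ev, 0, sizeof(ev));
    ev.events = EPOLLIN;
    ev.data.fd = pipefd[0];
    if (epoll_ctl(epfd, EPOLL_CTL_ADD, pipefd[0], &ev) < 0) {
        perror("epoll_ctl"); return 1;
    }

    /* make the read end readable, then wait for the event */
    if (write(pipefd[1], "x", 1) != 1) { perror("write"); return 1; }

    n = epoll_wait(epfd, &out, 1, 5000 /* ms timeout */);
    if (n < 0) { perror("epoll_wait"); return 1; }

    printf("epoll_wait returned %d (expected 1)\n", n);
    return (n == 1) ? 0 : 1;
}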
thanks,
Murali

On 7/2/07, Florin Isaila <[EMAIL PROTECTED]> wrote:
> Hi,
>
> We have installed PVFS2 2.6.3 over Ethernet on a SUSE distribution,
> running locally on a dual-processor (PowerPC 970FX) machine.
>
> Some commands such as pvfs2-ping, pvfs2-mkdir, and pvfs2-ls (without
> parameters) work fine.
>
> But we cannot get some of the other pvfs2-* commands to work. For
> instance, pvfs2-cp gets stuck. Here is the gdb backtrace:
>
> (gdb) bt
> #0  0x0ff5596c in epoll_wait () from /lib/tls/libc.so.6
> #1  0x100a062c in BMI_socket_collection_testglobal (scp=0x100e48b0,
>     incount=128, outcount=0xffff97b0, maps=0xffff93b0, status=0xffff95b0,
>     poll_timeout=10, external_mutex=0x100d2ce0)
>     at socket-collection-epoll.c:281
> #2  0x1009bf24 in tcp_do_work (max_idle_time=10) at bmi-tcp.c:2681
> #3  0x10098d10 in BMI_tcp_testcontext (incount=5, out_id_array=0x100d2b58,
>     outcount=0xffff9864, error_code_array=0x100d2b80,
>     actual_size_array=0x100d2b98, user_ptr_array=0x100d2bc0, max_idle_time=10,
>     context_id=0) at bmi-tcp.c:1303
> #4  0x1005aa18 in BMI_testcontext (incount=5, out_id_array=0x100d2b58,
>     outcount=0x100d14cc, error_code_array=0x100d2b80,
>     actual_size_array=0x100d2b98, user_ptr_array=0x100d2bc0,
>     max_idle_time_ms=10, context_id=0) at bmi.c:944
> #5  0x10071fc8 in bmi_thread_function (ptr=0x0) at thread-mgr.c:239
> #6  0x10072e24 in PINT_thread_mgr_bmi_push (max_idle_time=10)
>     at thread-mgr.c:815
> #7  0x10071460 in do_one_work_cycle_all (idle_time_ms=10) at job.c:4661
> #8  0x1007025c in job_testcontext (out_id_array_p=0xffff99d0,
>     inout_count_p=0xffff99b8, returned_user_ptr_array=0xffffd1d0,
>     out_status_array_p=0xffffa1d0, timeout_ms=10, context_id=1) at job.c:4068
> #9  0x1000fdb0 in PINT_client_state_machine_test (op_id=3,
>     error_code=0xffffd670) at client-state-machine.c:536
> #10 0x1001041c in PINT_client_wait_internal (op_id=3,
>     in_op_str=0x100b209c "fs_add", out_error=0xffffd670,
>     in_class_str=0x100a97d4 "sys") at client-state-machine.c:733
> #11 0x10010734 in PVFS_sys_wait (op_id=3, in_op_str=0x100b209c "fs_add",
>     out_error=0xffffd670) at client-state-machine.c:861
> #12 0x10035c4c in PVFS_sys_fs_add (mntent=0x100d3030) at fs-add.sm:205
> #13 0x1004c220 in PVFS_util_init_defaults () at pvfs2-util.c:1040
> #14 0x1000a5c8 in main (argc=3, argv=0xffffe3b4) at pvfs2-cp.c:135
>
> Sometimes (but rarely) it gets stuck in a different place:
>
> (gdb) bt
> #0  0x0ff5596c in epoll_wait () from /lib/tls/libc.so.6
> #1  0x100a062c in BMI_socket_collection_testglobal (scp=0x100e48b0,
>     incount=128, outcount=0xffff9b30, maps=0xffff9730, status=0xffff9930,
>     poll_timeout=10, external_mutex=0x100d2ce0)
>     at socket-collection-epoll.c:281
> #2  0x1009bf24 in tcp_do_work (max_idle_time=10) at bmi-tcp.c:2681
> #3  0x10098d10 in BMI_tcp_testcontext (incount=5, out_id_array=0x100d2b58,
>     outcount=0xffff9be4, error_code_array=0x100d2b80,
>     actual_size_array=0x100d2b98, user_ptr_array=0x100d2bc0, max_idle_time=10,
>     context_id=0) at bmi-tcp.c:1303
> #4  0x1005aa18 in BMI_testcontext (incount=5, out_id_array=0x100d2b58,
>     outcount=0x100d14cc, error_code_array=0x100d2b80,
>     actual_size_array=0x100d2b98, user_ptr_array=0x100d2bc0,
>     max_idle_time_ms=10, context_id=0) at bmi.c:944
> #5  0x10071fc8 in bmi_thread_function (ptr=0x0) at thread-mgr.c:239
> #6  0x10072e24 in PINT_thread_mgr_bmi_push (max_idle_time=10)
>     at thread-mgr.c:815
> #7  0x10071460 in do_one_work_cycle_all (idle_time_ms=10) at job.c:4661
> #8  0x1007025c in job_testcontext (out_id_array_p=0xffff9d50,
>     inout_count_p=0xffff9d38, returned_user_ptr_array=0xffffd550,
>     out_status_array_p=0xffffa550, timeout_ms=10, context_id=1) at job.c:4068
> #9  0x1000fdb0 in PINT_client_state_machine_test (op_id=28,
>     error_code=0xffffda1c) at client-state-machine.c:536
> #10 0x1001041c in PINT_client_wait_internal (op_id=28,
>     in_op_str=0x100ac1b8 "io", out_error=0xffffda1c,
>     in_class_str=0x100a97d4 "sys") at client-state-machine.c:733
> #11 0x10010734 in PVFS_sys_wait (op_id=28, in_op_str=0x100ac1b8 "io",
>     out_error=0xffffda1c) at client-state-machine.c:861
> #12 0x1001b78c in PVFS_sys_io (ref=
>       {handle = 1048570, fs_id = 1957135728, __pad1 = -26176},
>     file_req=0x100d07d8, file_req_offset=0, buffer=0x40068008,
>     mem_req=0x100efbd0, credentials=0xffffe060, resp_p=0xffffda90,
>     io_type=PVFS_IO_WRITE) at sys-io.sm:363
> #13 0x1000b078 in generic_write (dest=0xffffddb0,
>     buffer=0x40068008 "\177ELF\001\002\001", offset=0, count=2469777,
>     credentials=0xffffe060) at pvfs2-cp.c:365
> #14 0x1000a824 in main (argc=3, argv=0xffffe3b4) at pvfs2-cp.c:180
>
>
> After breaking the program with Ctrl-C, the files appear to have been
> created. Any clue where this could come from? It looks as if the
> metadata communication works but the data transfer does not.
>
> Below is the output of the ping command.
>
> Many thanks
> Florin
>
> pvfs2-ping -m ~/florin/mnt/pvfs2/
>
> (1) Parsing tab file...
>
> (2) Initializing system interface...
>
> (3) Initializing each file system found in tab file:
> /home/A40001/u72877927/florin/apps/etc/pvfs2tab...
>
>    PVFS2 servers: tcp://localhost:55555
>    Storage name: pvfs2-fs
>    Local mount point: /home/A40001/u72877927/florin/mnt/pvfs2
>    /home/A40001/u72877927/florin/mnt/pvfs2: Ok
>
> (4) Searching for /home/A40001/u72877927/florin/mnt/pvfs2/ in pvfstab...
>
>    PVFS2 servers: tcp://localhost:55555
>    Storage name: pvfs2-fs
>    Local mount point: /home/A40001/u72877927/florin/mnt/pvfs2
>
>    meta servers:
>    tcp://localhost:55555
>
>    data servers:
>    tcp://localhost:55555
>
> (5) Verifying that all servers are responding...
>
>    meta servers:
>    tcp://localhost:55555 Ok
>
>    data servers:
>    tcp://localhost:55555 Ok
>
> (6) Verifying that fsid 1957135728 is acceptable to all servers...
>
>    Ok; all servers understand fs_id 1957135728
>
> (7) Verifying that root handle is owned by one server...
>
>    Root handle: 1048576
>      Ok; root handle is owned by exactly one server.
>
> =============================================================
>
> The PVFS2 filesystem at /home/A40001/u72877927/florin/mnt/pvfs2/
> appears to be correctly configured.
>

_______________________________________________
Pvfs2-users mailing list
[email protected]
http://www.beowulf-underground.org/mailman/listinfo/pvfs2-users
