Hi Florin,
Thanks for getting back on that!
This is quite weird; it probably points to some platform-specific library issue.
Since we do use threads, it may be worth re-running configure with thread
usage disabled to see if that helps.

One thing you can try is:

  ./configure --disable-thread-safety

Alternatively (not together with the previous one, though), you can try:

  ./configure --enable-nptl-workaround

which works around some glibc oddities.
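In case it helps, the full cycle I have in mind is roughly the following
(the source path below is just a placeholder for wherever you unpacked
pvfs2, and keep whatever other configure options and --prefix you
normally use):

  cd /path/to/pvfs2-2.6.3                 # wherever the source tree lives
  make clean
  ./configure --disable-thread-safety     # or --enable-nptl-workaround, but not both
  make
  make install

then restart pvfs2-server and retry pvfs2-cp.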
Sam, RobL, Pete, any ideas? I am lost.. :(
A final alternative would be to do a live debug on your machine, if possible..
thanks,
Murali

On 7/2/07, Florin Isaila <[EMAIL PROTECTED]> wrote:
Hi,

Many thanks, Murali. I have just tried that, but it keeps getting stuck,
this time with an even stranger stack trace:

(gdb) bt
#0  0x0ff4b2d0 in poll () from /lib/tls/libc.so.6
#1  0x0ffc871c in ?? () from /lib/tls/libc.so.6
#2  0x0ffc871c in ?? () from /lib/tls/libc.so.6
Previous frame identical to this frame (corrupt stack?)

Any other suggestions?

Best regards
Florin

On 7/2/07, Murali Vilayannur <[EMAIL PROTECTED]> wrote:
> Hi Florin,
> Given that both your backtraces point to epoll(), can you run make
> clean followed by configure with --disable-epoll, rebuild everything
> and see if that works?
> If it does work, it probably points to some epoll-specific bug on ppc,
> either in pvfs2 or the libepoll code..
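> To be concrete, the sequence I mean is roughly (assuming a plain
> source-tree build; keep whatever --prefix and other configure options
> you normally use):
>
>   make clean
>   ./configure --disable-epoll
>   make && make install
>
> then restart pvfs2-server before rerunning pvfs2-cp.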
> thanks,
> Murali
>
> On 7/2/07, Florin Isaila <[EMAIL PROTECTED]> wrote:
> > Hi,
> >
> > We have installed PVFS2 2.6.3 over Ethernet on a SUSE distribution,
> > running locally on a dual-processor (PowerPC 970FX) machine.
> >
> > Some commands like pvfs2-ping, pvfs2-mkdir, pvfs2-ls (w/o parameters)
> > work fine.
> >
> > But we cannot get some pvfs2-* commands to run. For instance,
> > pvfs2-cp gets stuck. Here is the gdb backtrace:
> >
> > (gdb) bt
> > #0  0x0ff5596c in epoll_wait () from /lib/tls/libc.so.6
> > #1  0x100a062c in BMI_socket_collection_testglobal (scp=0x100e48b0,
> >     incount=128, outcount=0xffff97b0, maps=0xffff93b0, status=0xffff95b0,
> >     poll_timeout=10, external_mutex=0x100d2ce0)
> >     at socket-collection-epoll.c:281
> > #2  0x1009bf24 in tcp_do_work (max_idle_time=10) at bmi-tcp.c:2681
> > #3  0x10098d10 in BMI_tcp_testcontext (incount=5, out_id_array=0x100d2b58,
> >     outcount=0xffff9864, error_code_array=0x100d2b80,
> >     actual_size_array=0x100d2b98, user_ptr_array=0x100d2bc0, max_idle_time=10,
> >     context_id=0) at bmi-tcp.c:1303
> > #4  0x1005aa18 in BMI_testcontext (incount=5, out_id_array=0x100d2b58,
> >     outcount=0x100d14cc, error_code_array=0x100d2b80,
> >     actual_size_array=0x100d2b98, user_ptr_array=0x100d2bc0,
> >     max_idle_time_ms=10, context_id=0) at bmi.c:944
> > #5  0x10071fc8 in bmi_thread_function (ptr=0x0) at thread-mgr.c:239
> > #6  0x10072e24 in PINT_thread_mgr_bmi_push (max_idle_time=10)
> >     at thread-mgr.c:815
> > #7  0x10071460 in do_one_work_cycle_all (idle_time_ms=10) at job.c:4661
> > #8  0x1007025c in job_testcontext (out_id_array_p=0xffff99d0,
> >     inout_count_p=0xffff99b8, returned_user_ptr_array=0xffffd1d0,
> >     out_status_array_p=0xffffa1d0, timeout_ms=10, context_id=1) at job.c:4068
> > #9  0x1000fdb0 in PINT_client_state_machine_test (op_id=3,
> >     error_code=0xffffd670) at client-state-machine.c:536
> > #10 0x1001041c in PINT_client_wait_internal (op_id=3,
> >     in_op_str=0x100b209c "fs_add", out_error=0xffffd670,
> >     in_class_str=0x100a97d4 "sys") at client-state-machine.c:733
> > #11 0x10010734 in PVFS_sys_wait (op_id=3, in_op_str=0x100b209c "fs_add",
> >     out_error=0xffffd670) at client-state-machine.c:861
> > #12 0x10035c4c in PVFS_sys_fs_add (mntent=0x100d3030) at fs-add.sm:205
> > #13 0x1004c220 in PVFS_util_init_defaults () at pvfs2-util.c:1040
> > #14 0x1000a5c8 in main (argc=3, argv=0xffffe3b4) at pvfs2-cp.c:135
> >
> > Other times (but rarely) it gets stuck at a different place:
> >
> > (gdb) bt
> > #0  0x0ff5596c in epoll_wait () from /lib/tls/libc.so.6
> > #1  0x100a062c in BMI_socket_collection_testglobal (scp=0x100e48b0,
> >     incount=128, outcount=0xffff9b30, maps=0xffff9730, status=0xffff9930,
> >     poll_timeout=10, external_mutex=0x100d2ce0)
> >     at socket-collection-epoll.c:281
> > #2  0x1009bf24 in tcp_do_work (max_idle_time=10) at bmi-tcp.c:2681
> > #3  0x10098d10 in BMI_tcp_testcontext (incount=5, out_id_array=0x100d2b58,
> >     outcount=0xffff9be4, error_code_array=0x100d2b80,
> >     actual_size_array=0x100d2b98, user_ptr_array=0x100d2bc0, max_idle_time=10,
> >     context_id=0) at bmi-tcp.c:1303
> > #4  0x1005aa18 in BMI_testcontext (incount=5, out_id_array=0x100d2b58,
> >     outcount=0x100d14cc, error_code_array=0x100d2b80,
> >     actual_size_array=0x100d2b98, user_ptr_array=0x100d2bc0,
> >     max_idle_time_ms=10, context_id=0) at bmi.c:944
> > #5  0x10071fc8 in bmi_thread_function (ptr=0x0) at thread-mgr.c:239
> > #6  0x10072e24 in PINT_thread_mgr_bmi_push (max_idle_time=10)
> >     at thread-mgr.c:815
> > #7  0x10071460 in do_one_work_cycle_all (idle_time_ms=10) at job.c:4661
> > #8  0x1007025c in job_testcontext (out_id_array_p=0xffff9d50,
> >     inout_count_p=0xffff9d38, returned_user_ptr_array=0xffffd550,
> >     out_status_array_p=0xffffa550, timeout_ms=10, context_id=1) at job.c:4068
> > #9  0x1000fdb0 in PINT_client_state_machine_test (op_id=28,
> >     error_code=0xffffda1c) at client-state-machine.c:536
> > #10 0x1001041c in PINT_client_wait_internal (op_id=28,
> >     in_op_str=0x100ac1b8 "io", out_error=0xffffda1c,
> >     in_class_str=0x100a97d4 "sys") at client-state-machine.c:733
> > #11 0x10010734 in PVFS_sys_wait (op_id=28, in_op_str=0x100ac1b8 "io",
> >     out_error=0xffffda1c) at client-state-machine.c:861
> > #12 0x1001b78c in PVFS_sys_io (ref=
> >       {handle = 1048570, fs_id = 1957135728, __pad1 = -26176},
> >     file_req=0x100d07d8, file_req_offset=0, buffer=0x40068008,
> >     mem_req=0x100efbd0, credentials=0xffffe060, resp_p=0xffffda90,
> >     io_type=PVFS_IO_WRITE) at sys-io.sm:363
> > #13 0x1000b078 in generic_write (dest=0xffffddb0,
> >     buffer=0x40068008 "\177ELF\001\002\001", offset=0, count=2469777,
> >     credentials=0xffffe060) at pvfs2-cp.c:365
> > #14 0x1000a824 in main (argc=3, argv=0xffffe3b4) at pvfs2-cp.c:180
> >
> >
> > After breaking the program with Ctrl-C, the files appear to have been
> > created. Any clue where this could come from? It seems that the metadata
> > communication works, but the data communication does not.
> >
> > Below is the output of the ping command.
> >
> > Many thanks
> > Florin
> >
> > pvfs2-ping -m ~/florin/mnt/pvfs2/
> >
> > (1) Parsing tab file...
> >
> > (2) Initializing system interface...
> >
> > (3) Initializing each file system found in tab file:
> > /home/A40001/u72877927/florin/apps/etc/pvfs2tab...
> >
> >    PVFS2 servers: tcp://localhost:55555
> >    Storage name: pvfs2-fs
> >    Local mount point: /home/A40001/u72877927/florin/mnt/pvfs2
> >    /home/A40001/u72877927/florin/mnt/pvfs2: Ok
> >
> > (4) Searching for /home/A40001/u72877927/florin/mnt/pvfs2/ in pvfstab...
> >
> >    PVFS2 servers: tcp://localhost:55555
> >    Storage name: pvfs2-fs
> >    Local mount point: /home/A40001/u72877927/florin/mnt/pvfs2
> >
> >    meta servers:
> >    tcp://localhost:55555
> >
> >    data servers:
> >    tcp://localhost:55555
> >
> > (5) Verifying that all servers are responding...
> >
> >    meta servers:
> >    tcp://localhost:55555 Ok
> >
> >    data servers:
> >    tcp://localhost:55555 Ok
> >
> > (6) Verifying that fsid 1957135728 is acceptable to all servers...
> >
> >    Ok; all servers understand fs_id 1957135728
> >
> > (7) Verifying that root handle is owned by one server...
> >
> >    Root handle: 1048576
> >      Ok; root handle is owned by exactly one server.
> >
> > =============================================================
> >
> > The PVFS2 filesystem at /home/A40001/u72877927/florin/mnt/pvfs2/
> > appears to be correctly configured.

_______________________________________________
Pvfs2-users mailing list
[email protected]
http://www.beowulf-underground.org/mailman/listinfo/pvfs2-users
