Hi,
Many thanks, Murali. I have just tried that, but it still gets stuck,
this time with an even stranger stack trace:
(gdb) bt
#0 0x0ff4b2d0 in poll () from /lib/tls/libc.so.6
#1 0x0ffc871c in ?? () from /lib/tls/libc.so.6
#2 0x0ffc871c in ?? () from /lib/tls/libc.so.6
Previous frame identical to this frame (corrupt stack?)
Any other suggestions?
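
For reference, the rebuild I ran was along these lines (the source
directory and install prefix below are placeholders, not my real paths):

```shell
# Assumed rebuild sequence for PVFS2 2.6.3 with epoll disabled.
# ~/pvfs2-2.6.3 and the --prefix value are placeholders.
cd ~/pvfs2-2.6.3
make clean
./configure --disable-epoll --prefix=/usr/local/pvfs2
make
make install
```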
Best regards
Florin
On 7/2/07, Murali Vilayannur <[EMAIL PROTECTED]> wrote:
> Hi Florin,
> Given that both your backtraces point to epoll(), can you run make
> clean followed by configure with --disable-epoll, rebuild everything
> and see if that works?
> If it does work, it probably points to some epoll specific bug on ppc
> either in pvfs2 or the libepoll code..
> thanks,
> Murali
>
> On 7/2/07, Florin Isaila <[EMAIL PROTECTED]> wrote:
> > Hi,
> >
> > We have installed PVFS2 2.6.3 over Ethernet on a SUSE distribution,
> > locally on a biprocessor (PowerPC 970FX) machine.
> >
> > Some commands like pvfs2-ping, pvfs2-mkdir, pvfs2-ls (w/o parameters)
> > work fine.
> >
> > However, we cannot get some pvfs2-* commands to run. For instance,
> > pvfs2-cp gets stuck. Here is the gdb backtrace:
> >
> > (gdb) bt
> > #0 0x0ff5596c in epoll_wait () from /lib/tls/libc.so.6
> > #1 0x100a062c in BMI_socket_collection_testglobal (scp=0x100e48b0,
> > incount=128, outcount=0xffff97b0, maps=0xffff93b0, status=0xffff95b0,
> > poll_timeout=10, external_mutex=0x100d2ce0)
> > at socket-collection-epoll.c:281
> > #2 0x1009bf24 in tcp_do_work (max_idle_time=10) at bmi-tcp.c:2681
> > #3 0x10098d10 in BMI_tcp_testcontext (incount=5, out_id_array=0x100d2b58,
> > outcount=0xffff9864, error_code_array=0x100d2b80,
> > actual_size_array=0x100d2b98, user_ptr_array=0x100d2bc0, max_idle_time=10,
> > context_id=0) at bmi-tcp.c:1303
> > #4 0x1005aa18 in BMI_testcontext (incount=5, out_id_array=0x100d2b58,
> > outcount=0x100d14cc, error_code_array=0x100d2b80,
> > actual_size_array=0x100d2b98, user_ptr_array=0x100d2bc0,
> > max_idle_time_ms=10, context_id=0) at bmi.c:944
> > #5 0x10071fc8 in bmi_thread_function (ptr=0x0) at thread-mgr.c:239
> > #6 0x10072e24 in PINT_thread_mgr_bmi_push (max_idle_time=10)
> > at thread-mgr.c:815
> > #7 0x10071460 in do_one_work_cycle_all (idle_time_ms=10) at job.c:4661
> > #8 0x1007025c in job_testcontext (out_id_array_p=0xffff99d0,
> > inout_count_p=0xffff99b8, returned_user_ptr_array=0xffffd1d0,
> > out_status_array_p=0xffffa1d0, timeout_ms=10, context_id=1) at job.c:4068
> > #9 0x1000fdb0 in PINT_client_state_machine_test (op_id=3,
> > error_code=0xffffd670) at client-state-machine.c:536
> > #10 0x1001041c in PINT_client_wait_internal (op_id=3,
> > in_op_str=0x100b209c "fs_add", out_error=0xffffd670,
> > in_class_str=0x100a97d4 "sys") at client-state-machine.c:733
> > #11 0x10010734 in PVFS_sys_wait (op_id=3, in_op_str=0x100b209c "fs_add",
> > out_error=0xffffd670) at client-state-machine.c:861
> > #12 0x10035c4c in PVFS_sys_fs_add (mntent=0x100d3030) at fs-add.sm:205
> > #13 0x1004c220 in PVFS_util_init_defaults () at pvfs2-util.c:1040
> > #14 0x1000a5c8 in main (argc=3, argv=0xffffe3b4) at pvfs2-cp.c:135
> >
> > Occasionally (but rarely), it gets stuck in a different place:
> >
> > (gdb) bt
> > #0 0x0ff5596c in epoll_wait () from /lib/tls/libc.so.6
> > #1 0x100a062c in BMI_socket_collection_testglobal (scp=0x100e48b0,
> > incount=128, outcount=0xffff9b30, maps=0xffff9730, status=0xffff9930,
> > poll_timeout=10, external_mutex=0x100d2ce0)
> > at socket-collection-epoll.c:281
> > #2 0x1009bf24 in tcp_do_work (max_idle_time=10) at bmi-tcp.c:2681
> > #3 0x10098d10 in BMI_tcp_testcontext (incount=5, out_id_array=0x100d2b58,
> > outcount=0xffff9be4, error_code_array=0x100d2b80,
> > actual_size_array=0x100d2b98, user_ptr_array=0x100d2bc0, max_idle_time=10,
> > context_id=0) at bmi-tcp.c:1303
> > #4 0x1005aa18 in BMI_testcontext (incount=5, out_id_array=0x100d2b58,
> > outcount=0x100d14cc, error_code_array=0x100d2b80,
> > actual_size_array=0x100d2b98, user_ptr_array=0x100d2bc0,
> > max_idle_time_ms=10, context_id=0) at bmi.c:944
> > #5 0x10071fc8 in bmi_thread_function (ptr=0x0) at thread-mgr.c:239
> > #6 0x10072e24 in PINT_thread_mgr_bmi_push (max_idle_time=10)
> > at thread-mgr.c:815
> > #7 0x10071460 in do_one_work_cycle_all (idle_time_ms=10) at job.c:4661
> > #8 0x1007025c in job_testcontext (out_id_array_p=0xffff9d50,
> > inout_count_p=0xffff9d38, returned_user_ptr_array=0xffffd550,
> > out_status_array_p=0xffffa550, timeout_ms=10, context_id=1) at job.c:4068
> > #9 0x1000fdb0 in PINT_client_state_machine_test (op_id=28,
> > error_code=0xffffda1c) at client-state-machine.c:536
> > #10 0x1001041c in PINT_client_wait_internal (op_id=28,
> > in_op_str=0x100ac1b8 "io", out_error=0xffffda1c,
> > in_class_str=0x100a97d4 "sys") at client-state-machine.c:733
> > #11 0x10010734 in PVFS_sys_wait (op_id=28, in_op_str=0x100ac1b8 "io",
> > out_error=0xffffda1c) at client-state-machine.c:861
> > #12 0x1001b78c in PVFS_sys_io (ref=
> > {handle = 1048570, fs_id = 1957135728, __pad1 = -26176},
> > file_req=0x100d07d8, file_req_offset=0, buffer=0x40068008,
> > mem_req=0x100efbd0, credentials=0xffffe060, resp_p=0xffffda90,
> > io_type=PVFS_IO_WRITE) at sys-io.sm:363
> > #13 0x1000b078 in generic_write (dest=0xffffddb0,
> > buffer=0x40068008 "\177ELF\001\002\001", offset=0, count=2469777,
> > credentials=0xffffe060) at pvfs2-cp.c:365
> > #14 0x1000a824 in main (argc=3, argv=0xffffe3b4) at pvfs2-cp.c:180
> >
> >
> > After interrupting the program with Ctrl-C, the files appear to have
> > been created. Any clue where this might come from? It looks as if the
> > metadata communication works but the data transfer does not.
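
One quick way to test that hypothesis is to round-trip a file through the
file system and compare the contents; a sketch (file names are placeholders,
and I am assuming pvfs2-cp works in both directions):

```shell
# Hypothetical data-path check: copy a file into PVFS2, copy it back out,
# and compare. Paths below are placeholders.
SRC=/bin/ls
MNT=$HOME/florin/mnt/pvfs2

pvfs2-cp "$SRC" "$MNT/ls.copy"          # write through the data servers
pvfs2-cp "$MNT/ls.copy" /tmp/ls.back    # read it back
cmp "$SRC" /tmp/ls.back && echo "data path OK"
```

If pvfs2-cp hangs on the first copy, the metadata-only commands succeeding
while cmp never runs would match the behavior above.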
> >
> > Below is the output of the ping command.
> >
> > Many thanks
> > Florin
> >
> > pvfs2-ping -m ~/florin/mnt/pvfs2/
> >
> > (1) Parsing tab file...
> >
> > (2) Initializing system interface...
> >
> > (3) Initializing each file system found in tab file:
> > /home/A40001/u72877927/florin/apps/etc/pvfs2tab...
> >
> > PVFS2 servers: tcp://localhost:55555
> > Storage name: pvfs2-fs
> > Local mount point: /home/A40001/u72877927/florin/mnt/pvfs2
> > /home/A40001/u72877927/florin/mnt/pvfs2: Ok
> >
> > (4) Searching for /home/A40001/u72877927/florin/mnt/pvfs2/ in pvfstab...
> >
> > PVFS2 servers: tcp://localhost:55555
> > Storage name: pvfs2-fs
> > Local mount point: /home/A40001/u72877927/florin/mnt/pvfs2
> >
> > meta servers:
> > tcp://localhost:55555
> >
> > data servers:
> > tcp://localhost:55555
> >
> > (5) Verifying that all servers are responding...
> >
> > meta servers:
> > tcp://localhost:55555 Ok
> >
> > data servers:
> > tcp://localhost:55555 Ok
> >
> > (6) Verifying that fsid 1957135728 is acceptable to all servers...
> >
> > Ok; all servers understand fs_id 1957135728
> >
> > (7) Verifying that root handle is owned by one server...
> >
> > Root handle: 1048576
> > Ok; root handle is owned by exactly one server.
> >
> > =============================================================
> >
> > The PVFS2 filesystem at /home/A40001/u72877927/florin/mnt/pvfs2/
> > appears to be correctly configured.
> > _______________________________________________
> > Pvfs2-users mailing list
> > [email protected]
> > http://www.beowulf-underground.org/mailman/listinfo/pvfs2-users
> >
>