Hi. I have a problem:
[mix@smart bin]$ trun ./pvfs2-cp ./pvfs2-cp /home/mix/pvfs2fs -n 4
[E 17:54:31.582929] mem_to_bmi_callback_fn: I/O error occurred
[E 17:54:31.583273] handle_io_error: flow proto error cleanup started on 0x9d71604: Message too long
[E 17:54:31.583374] handle_io_error: flow proto 0x9d71604 canceled 0 operations, will clean up.
[E 17:54:31.583469] handle_io_error: flow proto 0x9d71604 error cleanup finished: Message too long
[E 17:54:31.584654] mem_to_bmi_callback_fn: I/O error occurred
[E 17:54:31.584767] handle_io_error: flow proto error cleanup started on 0x9d71cc0: Message too long
[E 17:54:31.584860] handle_io_error: flow proto 0x9d71cc0 canceled 0 operations, will clean up.
[E 17:54:31.584961] handle_io_error: flow proto 0x9d71cc0 error cleanup finished: Message too long
^Ctask2: Program /home/mix/orfs/bin/pvfs2-cp exited with exitcode 255.

The server's reaction:

[mix@smart sbin]$ trun ./pvfs2-server ./fs.conf -d -n 0
task1: pvfs2-server started on nodes 0
[S 11/22/2011 20:49:32] PVFS2 Server on node torus0 version 2.8.4-orangefs starting...
[E 11/22/2011 20:49:32] BMI_initialize: j=0, ladr = m2://0, proto=m2: bmi_m2
[S 11/22/2011 20:49:34] PVFS2 Server ready.
[E 11/22/2011 20:55:37] job_time_mgr_expire: job time out: cancelling flow operation, job_id: 1000.
[E 11/22/2011 20:55:37] fp_multiqueue_cancel: flow proto cancel called on 0x83c2a38
[E 11/22/2011 20:55:37] fp_multiqueue_cancel: I/O error occurred
[E 11/22/2011 20:55:37] handle_io_error: flow proto error cleanup started on 0x83c2a38: Operation cancelled (possibly due to timeout)
[E 11/22/2011 20:55:37] handle_io_error: flow proto 0x83c2a38 canceled 1 operations, will clean up.

[mix@smart bin]$ trun ./pvfs2-ls -l /home/mix/pvfs2fs -n 4
task2: pvfs2-ls started on nodes 4
-rwxr-xr-x 1 mix mix 0 2011-11-22 20:53 pvfs2-cp
drwxrwxrwx 1 mix mix 4096 2011-11-21 17:24 lost+found
task2: Program /home/mix/orfs/bin/pvfs2-ls exited with exitcode 0

The file exists, but it is empty. When I run pvfs2-validate, one server crashes and the other servers stop responding to requests from the other pvfs2 utilities:

[mix@smart bin]$ trun ./pvfs2-validate -d /home/mix/pvfs2fs -n 4
task2: pvfs2-validate started on nodes 4
^Ctask2: Program /home/mix/orfs/bin/pvfs2-validate exited with exitcode 255.

(pvfs2-validate also hangs.)

The server that crashes:

[E 11/22/2011 20:57:55] Error: poorly formatted protocol message received.
[E 11/22/2011 20:57:55] Too small: message only 0 bytes.
[E 11/22/2011 20:57:55] msgpairarray decode error: Protocol error
[E 11/22/2011 20:57:55] PVFS2 server: signal 11, faulty address is (nil), from 0x80c30ac
[E 11/22/2011 20:57:55] [bt] /home/mix/orfs/sbin/pvfs2-server [0x80c30ac]
[E 11/22/2011 20:57:55] [bt] /home/mix/orfs/sbin/pvfs2-server [0x80e1493]
[E 11/22/2011 20:57:55] [bt] /home/mix/orfs/sbin/pvfs2-server(PINT_state_machine_invoke+0x12f) [0x80de9b1]
[E 11/22/2011 20:57:55] [bt] /home/mix/orfs/sbin/pvfs2-server(PINT_state_machine_next+0x23c) [0x80ded83]
[E 11/22/2011 20:57:55] [bt] /home/mix/orfs/sbin/pvfs2-server(PINT_state_machine_continue+0x18) [0x80dedb7]
[E 11/22/2011 20:57:55] [bt] /home/mix/orfs/sbin/pvfs2-server(main+0x665) [0x80586c9]
[E 11/22/2011 20:57:55] [bt] /lib/libc.so.6(__libc_start_main+0xe0) [0xbb5390]
[E 11/22/2011 20:57:55] [bt] /home/mix/orfs/sbin/pvfs2-server [0x8057f01]
task0: Program /home/mix/orfs/sbin/pvfs2-server exited with exitcode 11.

Then I start the servers again, and pvfs2-validate does not complain about any errors:

[mix@smart bin]$ trun ./pvfs2-ls -l /home/mix/pvfs2fs -n 4
task2: pvfs2-ls started on nodes 4
-rwxr-xr-x 1 mix mix 0 2011-11-22 20:53 pvfs2-cp
drwxrwxrwx 1 mix mix 4096 2011-11-21 17:24 lost+found
task2: Program /home/mix/orfs/bin/pvfs2-ls exited with exitcode 0.

The zero-size file still exists, and pvfs2-validate now accepts it.

This problem does not occur when I copy a small file:

[mix@smart bin]$ trun ./pvfs2-cp ./pvfs2tab /home/mix/pvfs2fs -n 4
task2: pvfs2-cp started on nodes 4
task2: Program /home/mix/orfs/bin/pvfs2-cp exited with exitcode 0.

and back:

[mix@smart bin]$ trun ./pvfs2-cp /home/mix/pvfs2fs/pvfs2tab /home/mix/p2tab -n 4
task2: pvfs2-cp started on nodes 4
task2: Program /home/mix/orfs/bin/pvfs2-cp exited with exitcode 0.

[mix@smart bin]$ trun ./pvfs2-ls -l /home/mix/pvfs2fs -n 4
task2: pvfs2-ls started on nodes 4
-rwxr-xr-x 1 mix mix 0 2011-11-22 20:53 pvfs2-cp
-rw-rw-r-- 1 mix mix 60 2011-11-22 21:04 pvfs2tab
drwxrwxrwx 1 mix mix 4096 2011-11-21 17:24 lost+found
task2: Program /home/mix/orfs/bin/pvfs2-ls exited with exitcode 0.

[mix@smart bin]$ diff ./pvfs2tab /home/mix/p2tab
[mix@smart bin]$ ls -l /home/mix/p2tab
-rw-rw-r-- 1 mix mix 60 2011-11-22 21:44 /home/mix/p2tab

There are no differences between the files.

It seems that pvfs2-cp tries to send the file in a single message, but the maximum message size is 8 KB (in my bmi_m2 method), and in the log I found that BMI_post_send_list tries to send one buffer of 49216 bytes. Earlier, it calls bmi_get_info with option 10 (BMI_GET_UNEXP_SIZE), sends an unexpected message to the server, and receives a message from the server. But it never calls bmi_get_info with option 3 (BMI_CHECK_MAXSIZE), and bmi_post_send_list returns BMI_EMSGSIZE.

Is this a problem in pvfs2-cp? Or must a BMI method support sending large expected messages (10 MB, for instance)?

Thanks,
Mikhail Gilmendinov
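
P.S. To make the size mismatch concrete, here is a minimal sketch of the arithmetic only. It does not use the BMI API: the function names are hypothetical, and the two constants are simply the 8 KB limit of my bmi_m2 method and the 49216-byte buffer from the log. It shows why the 60-byte pvfs2tab goes through while the larger buffer does not, and how many pieces the method would have to send if it had to split such a buffer itself (which is only one possible answer to my question, not something I know BMI requires).

```python
# Hypothetical illustration: these names are NOT part of the BMI API.
MAX_MSG_SIZE = 8 * 1024   # 8 KB limit of the bmi_m2 method (from the post)
BUFFER_SIZE = 49216       # buffer size seen in the BMI_post_send_list log


def fits_in_one_message(total_size, max_size):
    """True if a buffer of total_size bytes can go out as a single message."""
    return total_size <= max_size


def split_into_fragments(total_size, max_size):
    """Return (offset, length) pairs covering total_size bytes,
    each fragment being at most max_size bytes long."""
    if total_size < 0 or max_size <= 0:
        raise ValueError("total_size must be >= 0 and max_size must be > 0")
    fragments = []
    offset = 0
    while offset < total_size:
        length = min(max_size, total_size - offset)
        fragments.append((offset, length))
        offset += length
    return fragments


if __name__ == "__main__":
    # The 60-byte pvfs2tab fits; the 49216-byte buffer does not.
    print(fits_in_one_message(60, MAX_MSG_SIZE))
    print(fits_in_one_message(BUFFER_SIZE, MAX_MSG_SIZE))
    frags = split_into_fragments(BUFFER_SIZE, MAX_MSG_SIZE)
    print(len(frags), frags[-1])
```

With these numbers the buffer would need seven fragments: six full 8192-byte pieces and a final 64-byte piece.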
_______________________________________________ Pvfs2-developers mailing list [email protected] http://www.beowulf-underground.org/mailman/listinfo/pvfs2-developers
