Good catch on that error, Chuck. As you noticed, it has been fixed in trunk, from which future releases will be created. 2.8.5 was already released by the time of this message. Thanks so much for reporting the problem and providing a patch.
-- Elaine

-----Original Message-----
From: [email protected] [mailto:[email protected]] On Behalf Of Chuck Cranor
Sent: Thursday, February 02, 2012 1:17 PM
To: [email protected]
Subject: [Pvfs2-users] OrangeFS 2.8.4 lebf_encode_req op 42 PVFS_SERV_TREE_GET_FILE_SIZE bug

[[ forwarding rejected message, as no one seems to answer at [email protected] ]]

From: [email protected]
To: [email protected]
Date: Mon, 30 Jan 2012 16:22:47 -0500

You are not allowed to post to this mailing list, and your message has
been automatically rejected. If you think that your messages are being
rejected in error, contact the mailing list owner at [email protected].

Date: Mon, 30 Jan 2012 16:22:42 -0500
From: Chuck Cranor <[email protected]>
To: [email protected]
Cc: Garth Gibson <[email protected]>
Subject: OrangeFS 2.8.4 lebf_encode_req op 42 PVFS_SERV_TREE_GET_FILE_SIZE bug
User-Agent: Mutt/1.4.2.3i
Organization: Carnegie Mellon University

hi-

I've been trying to run the File System Test Suite (test-fs/MPI-IO TEST)
on PVFS (it is open source, here: https://sourceforge.net/projects/test-fs/ ),
and I found a bug in OrangeFS 2.8.4 that manifests itself with the
following error in pvfs2-client.log when the number of I/O nodes is > 58:

    [E 12:22:20.488759] lebf_encode_req: op 42 needed 536 bytes but alloced only 524

Google search didn't come up with anything on this, so I thought I'd send
a note here so the info gets indexed on the web (in case anyone else needs
help with this).

The problem is due to a buffer size management error in
src/proto/PINT-le-bytefield.c. The code is missing the line:

    reqsize = extra_size_PVFS_servreq_tree_get_file_size;

in the PVFS_SERV_TREE_GET_FILE_SIZE case of the lebf_initialize() function.

Oddly enough, this very bug was inadvertently(?)
fixed way back on 21-Jun-2010 on a different branch:

    http://www.beowulf-underground.org/pipermail/pvfs2-cvs/2010-June/013283.html

with the comment "fixes to make the new server state machines compile with
robust security" and that fix was only recently merged into the SVN head
(r9123 on 2011-11-04).

This is all with UBUNTU-10. I was initially trying to use UBUNTU-11 but I
found that the current release of OrangeFS (2.8.4) does not compile on
UBUNTU-11 due to Linux kernel API changes (see:
http://www.beowulf-underground.org/pipermail/pvfs2-users/2011-September/003479.html ).
I updated to the CVS head ....

    cvs -q -d :pserver:[email protected]:/anoncvs \
        co -r Orange-Branch pvfs2

(this was back on 15-Oct-2011 before the move to subversion), but I found
that the developer branch was unstable, and test-fs/MPI-IO TEST would break
it like this (this is with a simple 3 node cluster, with machines h0, h1,
and h2):

    mpirun -H h0,h1,h2 /usr/cmu/fs_test.x -target /m/pvfs/fs_test.out.death \
        -strided 1 -supersize 1024 -shift -barriers aopen \
        -experiment pvfs_20120130_0 -io posix -time 60 -touch 2 -type 2 \
        -check 2 -size 47001 -noextra

runs OK the first time. When you run it a second time it fails with:

    MPI-IO TEST v$Revision: 1.109 $: Mon Jan 30 15:26:35 2012
    Mon Jan 30 15:26:35 2012: at operation bopen.
    Rank 1 Host h1.plfsbed.testbed.marmot.pdl.cmu.local FATAL ERROR 1327955195: Unable to open file /m/pvfs/fs_test.out.death for write. (errno=No such file or directory)
    Rank 2 Host h2.plfsbed.testbed.marmot.pdl.cmu.local FATAL ERROR 1327955195: Unable to open file /m/pvfs/fs_test.out.death for write. (errno=No such file or directory)
    Mon Jan 30 15:26:35 2012: at operation aopen.

and it leaves a garbage file in 2 of the 3 nodes:

    h0:/users/chuck/usr/bin# ssh h0 ls -l /m/pvfs
    total 8
    -rw-r--r-- 1 root root 0 2012-01-30 15:26 fs_test.out.death
    drwxrwxrwx 1 root root 4096 2012-01-30 14:51 lost+found
    h0:/users/chuck/usr/bin# ssh h1 ls -l /m/pvfs
    ls: cannot access /m/pvfs/fs_test.out.death: No such file or directory
    total 4
    ?????????? ? ? ? ? ? fs_test.out.death
    drwxrwxrwx 1 root root 4096 2012-01-30 14:51 lost+found
    h0:/users/chuck/usr/bin# ssh h2 ls -l /m/pvfs
    ls: cannot access /m/pvfs/fs_test.out.death: No such file or directory
    total 4
    ?????????? ? ? ? ? ? fs_test.out.death
    drwxrwxrwx 1 root root 4096 2012-01-30 14:51 lost+found
    h0:/users/chuck/usr/bin#

At that point I decided I would be better off not using developer code, so
I had to downgrade my OS from UBUNTU-11 to UBUNTU-10 in order to be able to
compile the current OrangeFS 2.8.4 release. That's when I discovered the
"lebf_encode_req op 42 error"...

--- src/proto/PINT-le-bytefield.c-ORIG 2012-01-26 08:22:09.000000000 -0700
+++ src/proto/PINT-le-bytefield.c 2012-01-26 08:22:26.000000000 -0700
@@ -276,6 +276,7 @@
             resp.u.tree_get_file_size.error = NULL;
             resp.u.tree_get_file_size.handle_count = 0;
             resp.u.tree_get_file_size.caller_handle_index = 0;
+            reqsize = extra_size_PVFS_servreq_tree_get_file_size;
             respsize = extra_size_PVFS_servresp_tree_get_file_size;
             break;
         case PVFS_SERV_NUM_OPS:  /* sentinel, should not hit */

chuck

_______________________________________________
Pvfs2-users mailing list
[email protected]
http://www.beowulf-underground.org/mailman/listinfo/pvfs2-users
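
For anyone finding this thread through a search, the kind of sizing bug Chuck describes can be sketched in a few lines of C. This is not the OrangeFS code: every name and number below (BASE_REQ_SIZE, PAD_SIZE, HANDLE_SIZE, MAX_HANDLES, max_req_size, needed_size, encode_check) is hypothetical and chosen only for illustration. The sketch assumes the encoder allocates a fixed maximum per operation and that the variable-length part of the request must be added to that maximum at initialization time. When that addition is skipped, small requests still fit in the padding and only larger ones fail, which would be consistent with an error that shows up only past a certain number of I/O nodes.

```c
#include <stdio.h>
#include <stddef.h>

/* Hypothetical sizes, for illustration only (not OrangeFS values). */
enum {
    BASE_REQ_SIZE = 64,   /* fixed part of every request */
    PAD_SIZE      = 32,   /* slack that hides the bug for small requests */
    HANDLE_SIZE   = 8,    /* bytes per handle in the variable part */
    MAX_HANDLES   = 128   /* upper bound on handles per request */
};

/* Worst-case size of the variable-length part of the request. */
#define EXTRA_SIZE_TREE_REQ (MAX_HANDLES * HANDLE_SIZE)

/* Buffer size allocated for the operation. Passing include_extra = 0
 * mimics an initialization path that forgets to set reqsize. */
static size_t max_req_size(int include_extra)
{
    size_t reqsize = 0;
    if (include_extra)
        reqsize = EXTRA_SIZE_TREE_REQ;
    return BASE_REQ_SIZE + PAD_SIZE + reqsize;
}

/* Bytes actually needed to encode a request carrying nhandles handles. */
static size_t needed_size(size_t nhandles)
{
    return BASE_REQ_SIZE + nhandles * HANDLE_SIZE;
}

/* Returns 0 if the request fits, -1 if it would overflow the buffer. */
static int encode_check(size_t nhandles, int include_extra)
{
    size_t alloced = max_req_size(include_extra);
    size_t needed = needed_size(nhandles);

    if (needed > alloced) {
        fprintf(stderr, "needed %zu bytes but alloced only %zu\n",
                needed, alloced);
        return -1;
    }
    return 0;
}
```

In this sketch, encode_check(4, 0) succeeds because the padding absorbs the handles, encode_check(5, 0) fails, and the same request succeeds once the extra size is included (encode_check(5, 1)).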
