Good catch on that error, Chuck. As you noticed, it has been fixed in trunk,
from which future releases will be created. 2.8.5 was already released by
the time of this message. Thanks so much for reporting the problem and
providing a patch.

-- Elaine

-----Original Message-----
From: [email protected]
[mailto:[email protected]] On Behalf Of Chuck Cranor
Sent: Thursday, February 02, 2012 1:17 PM
To: [email protected]
Subject: [Pvfs2-users] OrangeFS 2.8.4 lebf_encode_req op 42 PVFS_SERV_TREE_GET_FILE_SIZE bug


[[ forwarding rejected message, as no one seems to 
   answer at [email protected] ]]


From: [email protected]
To: [email protected]
Date: Mon, 30 Jan 2012 16:22:47 -0500

You are not allowed to post to this mailing list, and your message has
been automatically rejected.  If you think that your messages are
being rejected in error, contact the mailing list owner at
[email protected].


Date: Mon, 30 Jan 2012 16:22:42 -0500
From: Chuck Cranor <[email protected]>
To: [email protected]
Cc: Garth Gibson <[email protected]>
Subject: OrangeFS 2.8.4 lebf_encode_req op 42 PVFS_SERV_TREE_GET_FILE_SIZE bug
User-Agent: Mutt/1.4.2.3i
Organization: Carnegie Mellon University

hi-

    I've been trying to run the File System Test Suite (test-fs/MPI-IO TEST)
on PVFS (it is open source, here: https://sourceforge.net/projects/test-fs/),
and I found a bug in OrangeFS 2.8.4 that manifests itself with the 
following error in pvfs2-client.log when the number of I/O nodes is > 58:


[E 12:22:20.488759] lebf_encode_req: op 42 needed 536 bytes but alloced only 524

A Google search didn't turn up anything on this, so I thought I'd
send a note here so the info gets indexed on the web (in case anyone
else needs help with this).

The problem is due to a buffer size management error in 
src/proto/PINT-le-bytefield.c.  The code is missing the line:

         reqsize = extra_size_PVFS_servreq_tree_get_file_size;

in the PVFS_SERV_TREE_GET_FILE_SIZE case of the lebf_initialize() 
function.
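
To illustrate the kind of size-table mistake described above, here is a
minimal sketch. It is not the OrangeFS source: every name and constant in
it (init_sizes, encode_check, HEADER_BYTES, SLACK_BYTES, HANDLE_BYTES,
MAX_HANDLES) is hypothetical and chosen only for illustration. It assumes
an init routine that records a worst-case encoded size per operation, and
an encoder that refuses any request larger than that recorded size.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical names and sizes throughout; this is NOT the OrangeFS code. */
enum op { OP_FIXED = 0, OP_TREE_GET_FILE_SIZE = 1, NUM_OPS };

#define HEADER_BYTES 16  /* assumed fixed part of every encoded request */
#define SLACK_BYTES  32  /* assumed padding that hides the bug for small requests */
#define HANDLE_BYTES  8  /* assumed encoded size of one handle */
#define MAX_HANDLES  64  /* assumed upper bound on handles per request */

static size_t max_req_size[NUM_OPS];

/* Record the worst-case request size per op. With apply_fix == 0 the
 * variable-length op forgets its extra size, which models the missing
 * "reqsize = ..." assignment. */
void init_sizes(int apply_fix)
{
    for (int op = 0; op < NUM_OPS; op++) {
        size_t reqsize = 0;  /* extra bytes for variable-length fields */
        switch (op) {
        case OP_FIXED:
            break;
        case OP_TREE_GET_FILE_SIZE:
            if (apply_fix)
                reqsize = (size_t)MAX_HANDLES * HANDLE_BYTES;
            break;
        }
        max_req_size[op] = HEADER_BYTES + SLACK_BYTES + reqsize;
    }
}

/* Bytes actually needed to encode a request carrying handle_count handles. */
size_t needed_bytes(enum op op, size_t handle_count)
{
    size_t extra = (op == OP_TREE_GET_FILE_SIZE) ? handle_count * HANDLE_BYTES : 0;
    return HEADER_BYTES + extra;
}

/* Returns 0 if the request fits the preallocated size, -1 otherwise. */
int encode_check(enum op op, size_t handle_count)
{
    assert(op < NUM_OPS);
    return needed_bytes(op, handle_count) <= max_req_size[op] ? 0 : -1;
}
```

In this sketch, calling init_sizes(0) lets small requests through because
of the assumed slack, and encode_check() starts failing once the handle
count grows past it. That matches the general shape of the symptom (an
error that only appears beyond a certain number of I/O nodes), though the
real thresholds and sizes in OrangeFS are not derived here.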


Oddly enough, this very bug was inadvertently(?) fixed way back 
on 21-Jun-2010 on a different branch:

http://www.beowulf-underground.org/pipermail/pvfs2-cvs/2010-June/013283.html

with the comment "fixes to make the new server state machines compile 
with robust security" and that fix was only recently merged into the SVN
head (r9123 on 2011-11-04).



This is all with UBUNTU-10.   I was initially trying to use UBUNTU-11
but I found that the current release of OrangeFS (2.8.4) does not 
compile on UBUNTU-11 due to Linux kernel API changes (see:

http://www.beowulf-underground.org/pipermail/pvfs2-users/2011-September/003479.html

).   I updated to the CVS head ....

  cvs -q -d :pserver:[email protected]:/anoncvs \
          co -r Orange-Branch pvfs2

(this was back on 15-Oct-2011 before the move to subversion), but I
found that the developer branch was unstable, and test-fs/MPI-IO TEST
would break it like this (this is with a simple 3 node cluster, with
machines h0, h1, and h2):


mpirun -H h0,h1,h2 /usr/cmu/fs_test.x -target /m/pvfs/fs_test.out.death \
  -strided 1 -supersize 1024 -shift -barriers aopen \
  -experiment pvfs_20120130_0 -io posix -time 60 -touch 2 -type 2 \
  -check 2 -size 47001 -noextra 

runs OK the first time.  When you run it a second time it fails with:

MPI-IO TEST v$Revision: 1.109 $: Mon Jan 30 15:26:35 2012

Mon Jan 30 15:26:35 2012: at operation bopen.
Rank 1 Host h1.plfsbed.testbed.marmot.pdl.cmu.local FATAL ERROR 1327955195: Unable to open file /m/pvfs/fs_test.out.death for write. (errno=No such file or directory)
Rank 2 Host h2.plfsbed.testbed.marmot.pdl.cmu.local FATAL ERROR 1327955195: Unable to open file /m/pvfs/fs_test.out.death for write. (errno=No such file or directory)
Mon Jan 30 15:26:35 2012: at operation aopen.


and it leaves a garbage file on 2 of the 3 nodes:

h0:/users/chuck/usr/bin# ssh h0 ls -l /m/pvfs
total 8
-rw-r--r-- 1 root root    0 2012-01-30 15:26 fs_test.out.death
drwxrwxrwx 1 root root 4096 2012-01-30 14:51 lost+found
h0:/users/chuck/usr/bin# ssh h1 ls -l /m/pvfs
ls: cannot access /m/pvfs/fs_test.out.death: No such file or directory
total 4
?????????? ? ?    ?       ?                ? fs_test.out.death
drwxrwxrwx 1 root root 4096 2012-01-30 14:51 lost+found
h0:/users/chuck/usr/bin# ssh h2 ls -l /m/pvfs
ls: cannot access /m/pvfs/fs_test.out.death: No such file or directory
total 4
?????????? ? ?    ?       ?                ? fs_test.out.death
drwxrwxrwx 1 root root 4096 2012-01-30 14:51 lost+found
h0:/users/chuck/usr/bin# 




At that point I decided I would be better off not using developer code,
so I had to downgrade my OS from UBUNTU-11 to UBUNTU-10 in order to
be able to compile the current OrangeFS 2.8.4 release.  That's when
I discovered the "lebf_encode_req op 42 error"...

--- src/proto/PINT-le-bytefield.c-ORIG  2012-01-26 08:22:09.000000000 -0700
+++ src/proto/PINT-le-bytefield.c       2012-01-26 08:22:26.000000000 -0700
@@ -276,6 +276,7 @@
                resp.u.tree_get_file_size.error = NULL;
                resp.u.tree_get_file_size.handle_count = 0;
                 resp.u.tree_get_file_size.caller_handle_index = 0;
+                reqsize = extra_size_PVFS_servreq_tree_get_file_size;
                respsize = extra_size_PVFS_servresp_tree_get_file_size;
                break;
             case PVFS_SERV_NUM_OPS:  /* sentinel, should not hit */



chuck


_______________________________________________
Pvfs2-users mailing list
[email protected]
http://www.beowulf-underground.org/mailman/listinfo/pvfs2-users
