I wrote:
At 11:42 AM -0500 12/5/03, Tom Edelson wrote:

(Still back at Perl 5.6.1, and VMS V7.2-2), I can get an access violation from Perl by giving the "glob" function a really long argument, which also would return a lot of files. For example, rename a copy of Perl to a very deep directory location (I did it on an ODS-5 volume) and do the equivalent of a glob on PERL_ROOT:[LIB]:

$ set ver
$ @ temp
$ PERL -e "PRINT (glob '$1$DUA1400:[RE_TOEDEL.VERY_VERY_LONG_DIRECTORY_NAME.MAN_IS_THIS_EVER_A_LONG_DIR_NAME.HOW_COULD_ANYONE_GIVE_A_DIRECTORY_SUCH_A_LONG_NAME.2003-11-25.INSTALL.PGM.PERL.LIB]*.*') ? 'yes' : 'no';"
%SYSTEM-F-ACCVIO, access violation, reason mask=00, virtual address=00000000636C7FFF, PC=0000000000145664, PS=0000001B

One thing to keep in mind is that we definitely don't handle
resultant path specs longer than NAM$C_MAXRSS (i.e., 255).

OK, I've now taken a look at this and I don't think length is the
problem, or at least not in the way I thought it was. Either RMS or
lib$find_file is converting the directory name to a short version
composed of the DID if the full result won't fit in the string we give
it, so the 255 limit is not the issue. But there is a genuine and very subtle bug here.


Tom, if you could try the patch below and confirm that it solves the
problem you are seeing, I'll try to explain what I *think* was
happening. The patch is against the latest bleadperl but in a really
ancient section of code, so if you have trouble applying it, just
replace "(unsigned long int) *rslt" with "*((unsigned short int*)rslt)".

It's actually harder to explain why the existing code ever worked than
it is to explain why it failed. The background here is that
lib$find_file, when passed a varying string descriptor (DSC$K_CLASS_VS)
for the result string, will write the resultant length into the first
word of the buffer that the descriptor points to.

I believe the problem starts from the fact that the buffer here is a
character string, which, this being C, is an array of one-byte signed
integers (we should never forget how truly bizarre that is, however
familiar). When we try to get the resultant length by doing the cast
"(unsigned long int) *rslt" we are saying take the one-byte signed
integer at rslt[0] and convert it to an unsigned long. This works fine
up to a certain value of rslt[0] (0x7F, or 127 decimal), but
beyond that the rules for converting the sign bit give us a longword
that is definitely not a real string length, and when added to a pointer
gives really strange results off in never-never land. Were we attempting
to handle string lengths beyond what can be stored in a byte, we would
have still other problems since the second byte of the length word would
be ignored.
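
To make the difference concrete, here is a minimal sketch of the two ways of reading the length. It is not code from doio.c: the helper names are made up for illustration, the buffer is declared as signed char so the behaviour does not depend on the compiler's default for plain char, and memcpy stands in for the pointer cast so the example is safe on any platform.

```c
#include <string.h>

/* Hypothetical helpers for illustration only; neither is in doio.c. */

/* The old expression: fetch the single signed byte at rslt[0] and
 * convert it to unsigned long.  Any byte value above 0x7F is negative
 * as a signed char, so the conversion wraps to a huge number. */
static unsigned long len_via_char_cast(const signed char *rslt)
{
    return (unsigned long) *rslt;
}

/* The intended read: the first 16-bit word of the buffer, in native
 * byte order (assumes unsigned short is 16 bits, as it is on VMS). */
static unsigned short len_via_word(const signed char *rslt)
{
    unsigned short len;
    memcpy(&len, rslt, sizeof len);
    return len;
}
```

With a length word of 200 stored in a little-endian buffer, the first byte is 0xC8, which is -56 as a signed char. The first helper therefore returns an enormous value rather than 200, while the second returns 200.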

So what we really want to do is "*((unsigned short int*)rslt)", or, in
other words, treat rslt as a pointer to an unsigned word, and only then
(now that we know what kind of beast it points to) dereference it. The
patch also has an additional safety check to make sure we never get
stuck thinking the end of the string is before the beginning.
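
The safety check can be sketched on its own as a bounded backward scan. This is only an illustration with a hypothetical helper name, not the code in the patch; it tests the bound before looking at the character, and it assumes the byte at rstr[len] is readable.

```c
#include <stddef.h>

/* Hypothetical helper: scan backwards from the end of a filespec of
 * length len for the ';' that introduces the version number.
 * Returns a pointer to that ';', or rstr if the scan reaches the
 * start of the string without finding one. */
static char *find_version_sep(char *rstr, size_t len)
{
    char *end = rstr + len;

    /* Stop at the start of the buffer rather than walking off it. */
    while (end > rstr && *end != ';')
        end--;
    return end;
}
```

Given "LOGIN.COM;12" this returns a pointer to the ';', and given a string with no ';' at all it stops at the first character instead of running backwards through memory.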

--- doio.c;-0   Thu Oct 23 03:28:10 2003
+++ doio.c      Fri Dec  5 18:53:14 2003
@@ -2290,8 +2290,9 @@
                if (*cp == '?') *cp = '%';  /* VMS style single-char wildcard */
            while (ok && ((sts = lib$find_file(&wilddsc,&rsdsc,&cxt,
                                               &dfltdsc,NULL,NULL,NULL))&1)) {
-               end = rstr + (unsigned long int) *rslt;
-               if (!hasver) while (*end != ';') end--;
+               /* 1st word of varying string descriptor contains result length */
+               end = rstr + *((unsigned short int*)rslt);
+               if (!hasver) while (*end != ';' && end > rstr) end--;
                *(end++) = '\n';  *end = '\0';
                for (cp = rstr; *cp; cp++) *cp = _tolower(*cp);
                if (hasdir) {
