John E. Malmberg wrote:
Craig A. Berry wrote:
On May 3, 2009, at 9:20 PM, John E. Malmberg wrote:
John E. Malmberg wrote:
Perl_sv_upgrade(pTHX_ register SV *const sv, svtype new_type)
case SVt_PVMG:
...
new_body_inline(new_body, new_type);
new_type = SVt_PVMG,
SVt_PVMG has a value of 7.
new_body = 44.
PL_Body_roots[sv_type] = 44.
From the code, it looks like this was expected to contain a valid
pointer.
From looking at the source code, it appears that the linked list of
bodies is corrupted. my_perl->Ibodyroots[7] has 44.
Yes, I see the same thing.
I have been looking at the S_more_bodies routine. Would it be
practical to put an assert on for a pointer being added to the linked
list with a value above 512? On VMS, the first page of memory is
protected no access.
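A rough sketch of the kind of assert being proposed, for illustration only. The names body_ptr_plausible and push_body are hypothetical and are not taken from sv.c. The 512 limit is the figure mentioned above, and the layout (first word of each body holds the next pointer) is an assumption.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Assumption: no valid body can live below this address because the
 * first page of memory is not accessible. 512 is the figure from the
 * message above; the real limit is platform-specific. */
#define LOW_ADDRESS_LIMIT ((uintptr_t)512)

/* Returns 1 if p is NULL (end of list) or lies above the protected low
 * page, 0 if it looks like a small integer stored in a pointer slot. */
static int body_ptr_plausible(const void *p)
{
    return p == NULL || (uintptr_t)p >= LOW_ADDRESS_LIMIT;
}

/* Hypothetical helper: push a body onto a singly linked free list whose
 * first word is the "next" pointer, checking both pointers first. */
static void push_body(void **root, void *body)
{
    assert(body != NULL && body_ptr_plausible(body));
    assert(body_ptr_plausible(*root));
    *(void **)body = *root;   /* link the new body to the old head */
    *root = body;             /* new body becomes the head */
}
```

A check like this only fires if a bad value is present at the moment a body is linked in. It says nothing about a good pointer that is overwritten afterwards.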
I haven't had much time to poke at this, but I think an assert there
would only help if the body is created with a bogus pointer in the
SVt_PVMG slot rather than created with a good pointer that gets
clobbered later, and I think the second explanation is more likely. I
merely observe (without yet a chance fully to pursue) that 44 is a
suspicious number on a couple of different fronts.
I put some asserts in, and can confirm that the linked list is not
corrupted when it is set up.
With the bodies code, an "arena" of memory is allocated and for the body
type 7 in question, it is divided up into 32 byte chunks.
So it is possible that there is a buffer overrun if something writes 40
bytes into the body.
That might be possible if some struct is being cast on the memory and it
has a different size on VMS than on other platforms due to alignment
issues. On Alpha / VMS by default the compiler adds padding to
structures so that the members are more naturally aligned.
The structure below will have a size of 64 bits as padding will be added
to have member b start on a longword boundary.
struct foo {
    char a;
    long b;
};
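One way to check this on a given compiler is to ask it for the offsets directly. This is only a sketch and the helper names are made up for illustration. With a 4-byte long and natural alignment I would expect 3 bytes of padding and a total size of 8 bytes; where long is 8 bytes the figures would be 7 and 16.

```c
#include <stddef.h>

struct foo {
    char a;
    long b;
};

/* Bytes of padding the compiler inserted between members a and b. */
static size_t foo_padding_before_b(void)
{
    return offsetof(struct foo, b) - sizeof(char);
}

/* Bytes in the struct beyond the sum of the member sizes, i.e. all
 * padding, wherever the compiler placed it. */
static size_t foo_total_padding(void)
{
    return sizeof(struct foo) - (sizeof(char) + sizeof(long));
}
```

Comparing these numbers between VMS and another platform for the struct actually being cast onto the body would show whether the sizes differ.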
The corruption is consistently on this same linked list.
If this were the case of a memory cell being used after it was freed, I
would expect corruptions to occur in more random places.
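To make the overrun theory concrete, here is a small stand-alone model of it. It is not the real arena code: the function names and the 4-chunk arena are invented, and it assumes the next pointer is stored in the first bytes of each 32-byte chunk.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

#define BODY_SIZE  32   /* chunk size quoted above for body type 7 */
#define BODY_COUNT 4    /* arbitrary small arena, for illustration */

/* Thread the arena into a free list: the first sizeof(void *) bytes of
 * each chunk hold the address of the next chunk, NULL for the last. */
static void *carve_arena(unsigned char *arena)
{
    for (size_t i = 0; i < BODY_COUNT; i++) {
        void *next = (i + 1 < BODY_COUNT)
                   ? (void *)(arena + (i + 1) * BODY_SIZE)
                   : NULL;
        memcpy(arena + i * BODY_SIZE, &next, sizeof next);
    }
    return arena;
}

/* Read the link stored at the start of a chunk. */
static void *next_of(const void *body)
{
    void *next;
    memcpy(&next, body, sizeof next);
    return next;
}

/* Simulate a writer that believes a body is len bytes long. */
static void fill_body(void *body, unsigned char value, size_t len)
{
    memset(body, value, len);
}
```

Calling fill_body() on the first chunk with a length of 32 leaves the second chunk's link intact. With a length of 40 the extra 8 bytes land on the start of the second chunk, so next_of() on it no longer returns the third chunk. That damage would always show up on the list for this one body type.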
Running with -Dm shows that various 44-byte chunks of memory get
allocated, including arenas that are multiples of 44 in size, so if
there is a legitimate size of 44 that is added to something that
should be a good value but is actually NULL, that might be one
explanation for where the bad smell is coming from.
44 / 0x2c is the value of SS$_ABORT, which is the return value of the
system() call in IPC::Cmd::_run, which is called somewhere in the
chain following from CPANPLUS::Dist::_resolve_prereqs.[1] If there is
something inappropriate going on with a vmsish pragma and the return
value of the system() call, that's another place where something could
go wrong, but that is also yet another wacky theory that I haven't been
able to prove.
Since no one else is reporting this failure, I will start looking at the
VMS-specific code for implementing system() to see if I can find anything.
Unfortunately many other things in VMS can return the same code, but so far,
that is the best theory I have seen.
I've attached a version of the test script that is slimmed down from
400+ lines to 99 lines but still produces the access violation.
Thanks, I will try that.
It consistently crashes when not run in debug, but only crashes
sometimes when I have it in the debugger.
I have it crashing on my assert now instead of the access violation.
My next plan is to put some code to walk that body linked list at
various places where the code implementing the system() call is writing
the status value to memory to see if the corruption can be detected.
The base of the body linked list is off of the my_perl context variable.
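The walker I have in mind would look roughly like the sketch below. The name check_body_list is made up, and it assumes each body's first word is the next pointer and that nothing valid lives below address 512. It would be called with the list root for the body type in question at each point of interest.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical checker: walks a singly linked list of bodies and
 * returns the number of bodies visited, or -1 if a link looks bogus
 * (small integer, misaligned, or more than max_bodies entries, which
 * would suggest a cycle). */
static long check_body_list(const void *root, size_t max_bodies)
{
    long count = 0;
    const void *p = root;

    while (p != NULL) {
        /* Reject the pointer before dereferencing it. */
        if ((uintptr_t)p < 512 || (uintptr_t)p % sizeof(void *) != 0)
            return -1;
        if ((size_t)count >= max_bodies)
            return -1;
        p = *(const void *const *)p;   /* follow the next link */
        count++;
    }
    return count;
}
```

A return of -1 between two calls narrows the corruption down to the code that ran between them.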
[1] IPC::Cmd::_run does not quote arguments, so in its current form
it's not really suitable as a cross-platform way to run Perl
one-liners. For example, when it means to run:
perl "-M100" "-e1"
Perl v1410065408.0.0 required--this is only v5.11.0, stopped.
BEGIN failed--compilation aborted.
%SYSTEM-F-ABORT, abort
it's actually running:
$ perl -M100 -e1
syntax error at -e line 0, near "use 100 ("
Execution of -e aborted due to compilation errors.
%SYSTEM-F-ABORT, abort
So the CPANPLUS::Dist test is not distinguishing between a syntax
error and a version check failure. I don't think it makes any
difference for the access violation, but it's something I noticed
while trying to pursue that.
That probably explains some of the failures besides the access
violation. The other is probably related to a sample file having a '~'
character in the name.
-John
wb8...@qsl.net
Personal Opinion Only