Re: Access violation in SV.C new_body_inline()

2009-05-09 Thread John E. Malmberg

John E. Malmberg wrote:

Craig A. Berry wrote:

On May 3, 2009, at 9:20 PM, John E. Malmberg wrote:


John E. Malmberg wrote:



Perl_sv_upgrade(pTHX_ register SV *const sv, svtype new_type)

case SVt_PVMG:
...
   new_body_inline(new_body, new_type);
new_type = SVt_PVMG,
SVt_PVMG has a value of 7.
new_body = 44.
PL_Body_roots[sv_type] = 44.
From the code, it looks like this was expected to contain a valid 
pointer.


From looking at the source code, it appears that the linked list of 
bodies is corrupted.  my_perl->Ibodyroots[7] has 44.


Yes, I see the same thing.

I have been looking at the S_more_bodies routine.  Would it be 
practical to add an assert that any pointer being added to the linked 
list has a value above 512?  On VMS, the first page of memory is 
protected as no-access.
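A minimal sketch of the kind of check being proposed, not the actual sv.c code. It assumes a link is either NULL (end of list) or an address at or above the 512-byte threshold mentioned above; LOW_ADDRESS_LIMIT, link_is_plausible and check_link are hypothetical names.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical limit: addresses below this are assumed never to be
   valid body pointers (the protected low-memory region). */
#define LOW_ADDRESS_LIMIT ((uintptr_t)512)

/* A link is plausible if it ends the list (NULL) or lies at or above
   the protected low-address region. */
static int link_is_plausible(const void *link)
{
    return link == NULL || (uintptr_t)link >= LOW_ADDRESS_LIMIT;
}

/* Intended to be called wherever a pointer is stored into the list. */
static void check_link(const void *link)
{
    assert(link_is_plausible(link));
}
```

A value such as 44 fails this check, while NULL and ordinary heap or stack addresses pass.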



I haven't had much time to poke at this, but I think an assert there 
would only help if the body is created with a bogus pointer in the 
SVt_PVMG slot rather than created with a good pointer that gets 
clobbered later, and I think the second explanation is more likely.  I 
merely observe (without having yet had a chance to pursue it fully) that 
44 is a suspicious number on a couple of different fronts.


I put some asserts in, and can confirm that the linked list is not 
corrupted when it is set up.


With the bodies code, an "arena" of memory is allocated and, for the body 
type 7 in question, it is divided up into 32-byte chunks.


So it is possible that there is a buffer overrun if something writes 40 
bytes into the body.
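To illustrate the overrun theory, here is a minimal sketch, not the real sv.c allocator. It assumes each free chunk stores the link to the next chunk in its first bytes, and that the tail of the oversized write happens to hold the value 44. BODY_COUNT and all function names are hypothetical.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define BODY_SIZE  32   /* chunk size quoted in the thread */
#define BODY_COUNT 4    /* hypothetical, tiny arena for illustration */

/* Store the address of the next chunk (or 0 for the last one) in the
   first bytes of each chunk, forming a free list inside the arena. */
static void carve_arena(unsigned char *arena)
{
    size_t i;
    for (i = 0; i < BODY_COUNT; i++) {
        uintptr_t next = (i + 1 < BODY_COUNT)
            ? (uintptr_t)(arena + (i + 1) * BODY_SIZE) : 0;
        memcpy(arena + i * BODY_SIZE, &next, sizeof next);
    }
}

/* Read the link stored at the start of a chunk. */
static uintptr_t read_link(const unsigned char *body)
{
    uintptr_t next;
    memcpy(&next, body, sizeof next);
    return next;
}

/* Simulate a writer that believes the body is 40 bytes long and whose
   trailing field holds the value 44. The caller must ensure at least
   40 bytes are available starting at body. */
static void write_40_bytes(unsigned char *body)
{
    unsigned char payload[40] = {0};
    uintptr_t tail = 44;
    assert(sizeof tail <= sizeof payload - BODY_SIZE);
    memcpy(payload + BODY_SIZE, &tail, sizeof tail);
    memcpy(body, payload, sizeof payload);
}
```

After carve_arena(), calling write_40_bytes() on the first chunk leaves the second chunk's link reading 44, while the later chunks stay intact.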


That might be possible if some struct is being cast on the memory and it 
has a different size on VMS than on other platforms due to alignment 
issues.  On Alpha / VMS by default the compiler adds padding to 
structures so that the members are more naturally aligned.


The structure below will have a size of 64 bits as padding will be added
to have member b start on a longword boundary.

struct foo {
    char a;
    long b;
};
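One way to see what padding a given compiler inserts is to compare sizeof with the sum of the member sizes. A minimal sketch; foo_b_offset and foo_padding_bytes are hypothetical helper names, and the results depend on the compiler and its alignment settings.

```c
#include <stddef.h>

struct foo {
    char a;
    long b;
};

/* Byte offset at which the compiler placed member b. */
static size_t foo_b_offset(void)
{
    return offsetof(struct foo, b);
}

/* Total number of padding bytes the compiler added to the structure. */
static size_t foo_padding_bytes(void)
{
    return sizeof(struct foo) - sizeof(char) - sizeof(long);
}
```

With natural member alignment and a 32-bit long, this gives an offset of 4 and 3 bytes of padding, for the 8-byte (64-bit) total described above.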

The corruption is consistently on this same linked list.

If this were the case of a memory cell being used after it was freed, I 
would expect corruptions to occur in more random places.


Running with -Dm shows that various 44-byte chunks of memory get 
allocated, including arenas that are multiples of 44 in size, so if 
there is a legitimate size of 44 that is added to something that 
should be a good value but is actually NULL, that might be one 
explanation for where the bad smell is coming from.


44 / 0x2c is the value of SS$_ABORT, which is the return value of the 
system() call in IPC::Cmd::_run, which is called somewhere in the 
chain following from CPANPLUS::Dist::_resolve_prereqs.[1]  If there is 
something inappropriate going on with a vmsish pragma and the return 
value of the system() call, that's another place where something could 
go wrong, but that is also just another wacky theory that I haven't been 
able to prove.
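For reference, a minimal sketch of how 44 decodes as an OpenVMS condition value. It assumes the usual layout (severity in bits 0-2, message number in bits 3-15, facility number in bits 16-27). The helper names are hypothetical; real VMS code would use the system-provided definitions instead.

```c
#include <stdint.h>

/* Severity field: bits 0-2 of the condition value. */
static unsigned cond_severity(uint32_t status)
{
    return status & 0x7u;
}

/* Message number field: bits 3-15. */
static unsigned cond_message(uint32_t status)
{
    return (status >> 3) & 0x1FFFu;
}

/* Facility number field: bits 16-27. */
static unsigned cond_facility(uint32_t status)
{
    return (status >> 16) & 0xFFFu;
}
```

For 44 (0x2c) this gives severity 4 and facility 0, which is consistent with the %SYSTEM-F-ABORT message text.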


Since no one else is reporting this failure, I will start looking at the 
VMS-specific code for implementing system() to see if I can find anything.


Unfortunately many other things in VMS can return the same code, but so far 
that is the best theory I have seen.
 
I've attached a version of the test script that is slimmed down from 
400+ lines to 99 lines but still produces the access violation.


Thanks, I will try that.


It consistently crashes when not run in debug, but only crashes 
sometimes when I have it in the debugger.


I have it crashing on my assert now instead of the access violation.

My next plan is to put some code to walk that body linked list at 
various places where the code implementing the system() call is writing 
the status value to memory to see if the corruption can be detected.
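A minimal sketch of such a list walker, not the code actually used. It assumes a single arena, links stored in the first bytes of each chunk, a nonzero body size at least as large as a pointer, and a flat address space so that addresses can be compared as integers. walk_body_list is a hypothetical name.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Walk a free list whose links are stored in the first bytes of each
   chunk. Returns the number of chunks visited, or -1 if a link points
   outside the arena, is not on a chunk boundary, or the list is longer
   than the arena can hold (which indicates a cycle). */
static long walk_body_list(const unsigned char *arena, size_t arena_size,
                           size_t body_size, uintptr_t head)
{
    uintptr_t base = (uintptr_t)arena;
    uintptr_t link = head;
    size_t max_bodies = arena_size / body_size;
    long count = 0;

    while (link != 0) {
        if (link < base
            || link - base + body_size > arena_size
            || (link - base) % body_size != 0
            || (size_t)count >= max_bodies)
            return -1;
        /* Follow the link stored at the start of this chunk. */
        memcpy(&link, arena + (link - base), sizeof link);
        count++;
    }
    return count;
}
```

Calling this before and after each suspect write narrows down which write leaves a value such as 44 in the list.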


The base of the body linked list is off of the my_perl context variable.

[1]  IPC::Cmd::_run does not quote arguments, so in its current form 
it's not really suitable as a cross-platform way to run Perl 
one-liners.  For example, when it means to run:


$ perl "-M100" "-e1"
Perl v1410065408.0.0 required--this is only v5.11.0, stopped.
BEGIN failed--compilation aborted.
%SYSTEM-F-ABORT, abort

it's actually running:

$ perl -M100 -e1
syntax error at -e line 0, near "use 100 ("
Execution of -e aborted due to compilation errors.
%SYSTEM-F-ABORT, abort

So the CPANPLUS::Dist test is not distinguishing between a syntax 
error and a version check failure.  I don't think it makes any 
difference for the access violation, but it's something I noticed 
while trying to pursue that.


That probably explains some of the failures besides the access 
violation.  The other is probably related to a sample file having a '~' 
character in the name.


-John
wb8...@qsl.net
Personal Opinion Only



Re: maint-5.10-1131-gdfc0ab6 on VMS status

2009-05-09 Thread John E. Malmberg

Craig A. Berry wrote:
With a -Duseithreads build using HP C V7.3-018 on OpenVMS IA64 V8.3-1H1, 
I get:


ext/Cwd/t/cwd.....................................FAILED at test 23
lib/Archive/Extract/t/01_Archive-Extract..........FAILED at test 74
lib/CPANPLUS/Dist/Build/t/02_CPANPLUS-Dist-Build..FAILED at test 11
lib/CPANPLUS/t/19_CPANPLUS-Dist...................FAILED at test 62
lib/ExtUtils/t/basic..............................FAILED--expected 81 tests, saw 84
lib/Module/Build/t/compat.........................FAILED at test 22
lib/Module/Build/t/ext............................FAILED at test 142

Failed 7 tests out of 1560, 99.55% okay.

This is identical to blead except for the Cwd test, which succeeds in 
blead.


The next step is digging around in all the relevant RT queues to see 
what's already been fixed but not brought into the core yet.


As of blead 2009-04-25.23.16:08, with the patches that I have previously 
submitted, I only have the two CPANPLUS tests failing.


-John
wb8...@qsl.net
Personal Opinion Only