Re: forcing a user: possible reason

Tom Duerbusch Wed, 21 Dec 2005 10:05:35 -0800

I ended up crashing VM.  (I think I played around just too much.)  But
crashing VM caused a lesser outage than bringing everything down normally.


Anyway...here is what happened, based on the console logs for the VSE
machine that was hung.

Saturday morning.

Three backup jobs started in dynamic partitions.  (this is normal)
The second one got "1Q85I  N2, FEE  ~~~~GETVIS 24".
Operations ignored that machine the entire weekend.
As this machine is the central node in a Power PNET network....
Each time a job was shipped from one VSE machine to another VSE
machine, the job went thru this "hung" machine.  Now we started getting
"1Q85I J-RV1,A70  ~~~~~GETVIS 24" for each transmission and they hung.
(funny how Operations didn't notice some jobs were disappearing....)

Monday morning.
Someone noticed that jobs were not being transmitted.
They signed on to CICS and looked at the current console.  They didn't
display backward to see what the problem was, just that there seemed to
be some problem.
So they forced the VSE system (and it came down, normally)....

When the VSE machine was IPL'ed, it started coming up.  I see Power
being started and that is where the console log is lost.  I'm guessing I
lost a console buffer or two.  One of the next things that would have
started was PNET and the other 8 VSE machines would have had their PNET
connections started.

When I look at the console logs for some of the other machines, I do
see the PNET connection established message.

My guess is that at this time, all the PNET traffic started at once, and
I ran out of GETVIS-24 memory again.  (I never tested for all machines
PNETing at the same time.)

VSE hangs, no jobs started (many jobs were in cdisp=x).  

They don't know what to do, so they flush VSE again.  And this time,
VSE is really hung.  "Use count" is 5.  I assume that is the number of
transmitters that are waiting for getvis-24 memory.

Does it seem reasonable that VSE could hang, and VM couldn't force it
when it is waiting for I/O from a virtual device?  I have VCTCA for
connections...not real devices!

I'm trying to understand the conditions that caused this problem.  I
now have PCS issue a GETVIS and GETVIS F1 on the console every 4 hours. 
I initially thought the problem was caused by fragmentation of the 24-bit
GETVIS area.  But there shouldn't have been any fragmentation after the
IPL, so it seems that PNET is taking more than I expected.
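
Since the shortage sat unnoticed for a whole weekend, one cheap safeguard
would be to scan a saved copy of the console log for the 1Q85I message.
The following is only a rough sketch in Python, not something running
here: it assumes the console log is available as plain text lines, and
the function name and sample lines are made up for illustration (the two
1Q85I lines are the ones quoted above).

```python
import re

# Assumption: the storage-shortage condition shows up as message ID 1Q85I,
# as in the two console lines quoted earlier in this post.
SHORTAGE_ID = re.compile(r"\b1Q85I\b")


def find_getvis_shortages(console_lines):
    """Return (line_number, text) for every line that contains 1Q85I."""
    hits = []
    for number, line in enumerate(console_lines, start=1):
        if SHORTAGE_ID.search(line):
            hits.append((number, line.strip()))
    return hits


# Hypothetical sample; a real run would read the saved console log instead.
sample = [
    "1Q85I  N2, FEE  ~~~~GETVIS 24",
    "some unrelated console line",
    "1Q85I J-RV1,A70  ~~~~~GETVIS 24",
]

for number, text in find_getvis_shortages(sample):
    print(number, text)
```

Anything this finds would still need a person to look at the GETVIS
displays; the scan only flags that the message appeared.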

I need to add 3 more VSE machines to the "collective".  But now I think
I need to upgrade the VSE/ESA 2.3.2 machine that is the center of the
hub, to VSE/ESA 2.7 to get more virtual space.  (z/VSE 3.1 is out as SVC
seems to be in effect.  We will never be able to migrate all of our
VSE/ESA 2.3.2 applications in 12 months...short story is we have to
migrate from our Total database to DB2 which takes a while.)

Tom Duerbusch
THD Consulting
