I ended up crashing VM. (I think I played around just too much.) But crashing VM caused a lesser outage than bringing everything down normally.
Anyway...based on the console logs for the VSE machine that was hung:

Saturday morning. Three backup jobs started in dynamic partitions (this is normal). The second one got "1Q85I N2, FEE ~~~~GETVIS 24". Operations ignored that machine the entire weekend. This machine is the central node in a Power PNET network, so each time a job is shipped from one VSE machine to another, it goes through this "hung" machine. We then started getting "1Q85I J-RV1,A70 ~~~~~GETVIS 24" for each transmission, and the transmissions hung. (Funny how Operations didn't notice some jobs were disappearing....)

Monday morning. Someone noticed that jobs were not being transmitted. They signed on to CICS and looked at the current console. They didn't display backward to see what the problem was, just that there seemed to be some problem. So they forced the VSE system (and it came down normally)....

When the VSE machine was IPL'ed, it started coming up. I see Power being started, and that is where the console log is lost. I'm guessing I lost a console buffer or two. One of the next things that would have started was PNET, and the other 8 VSE machines would have had their PNET connections started. When I look at the console logs for some of the other machines, I do see the PNET connection established message. My guess is that at this point all the PNET traffic started at once, and I ran out of GETVIS-24 memory again. (I never tested for all machines PNETing at the same time.)

VSE hangs, no jobs started (many jobs were in cdisp=x). They don't know what to do, so they flush VSE again. And this time, VSE is really hung. "Use count" is 5. I assume that is the number of transmitters that are waiting for GETVIS-24 memory.

Does it seem reasonable that VSE could hang, and VM couldn't force it, when it is waiting for I/O from a virtual device? I have VCTCA for connections...not real devices! I'm trying to understand the conditions that caused this problem.
I now have PCS issue a GETVIS and GETVIS F1 on the console every 4 hours. I initially thought the problem was caused by fragmentation of the 24-bit GETVIS area. But there shouldn't have been any fragmentation right after the IPL, so it seems that PNET is taking more than I expected.

I need to add 3 more VSE machines to the "collective". But now I think I need to upgrade the VSE/ESA 2.3.2 machine that is the center of the hub to VSE/ESA 2.7 to get more virtual space. (z/VSE 3.1 is out, as SVC seems to be in effect. We will never be able to migrate all of our VSE/ESA 2.3.2 applications in 12 months...short story is we have to migrate from our Total database to DB2, which takes a while.)

Tom Duerbusch
THD Consulting
