Re: SVCDUMP capture phase statistics.....

2008-04-17 Thread Jim Mulder
IBM Mainframe Discussion List IBM-MAIN@BAMA.UA.EDU wrote on 04/17/2008 
01:05:29 AM:

 thanks for fixing the time stamp! 

  The thanks for that one should go to Ralph Sharpe. 

 I looked at RMF M3, at the storage statistics, in particular the 
 working set size:
 18:36:30   234
 18:37:00 19097 (60%delay, of this 27%common, 33% locl, 33% other)
 18:37:30 70784 (57%delay, of this 57% locl, 53% other)
 18:38:00 no delay (capture phase is done), WSS is 72xxx

  In addition to the page-in delays, they may be delays 
waiting for page-out I/O to make frames available.
 
 PGIN Rate at 18:37:00 33 for GRS
  18:37:30 50 for GRS, 433 for NDM and 63 for VTAM
  18:38:00 52 for VTAM, 18 for GRS
 
 Off the top of my head, I have no real clue how we can drop the 
 time the global capture phase takes, well, other than turning off 
 GRSQ completely (or setting this slip trap: sl set,c=0c4,j=NDM,
 a=nodump,e; which probably wouldn't do much good as this is a dump 
 scheduled by VTAM:-) )
 
 That lpar has 8704M real with the local page ds's normally at 10%. 
 Should IMS ever decide to take a dump, chances are good that it will
 be useless as we have to run q=no with these capture times in order 
 not to hit the TCPIP timeouts.

  There isn't much you can do other than spend real money to buy
more real storage to avoid paging, and spend real money to buy
the fastest DASD boxes and attach them to the fastest channels 
(if you aren't already doing that).

  There are some things z/OS development could do, the most 
important of which for your needs would be to move GRSQ processing
to occur after system nondispatchability has been reset, which 
might allow you to use q=yes without TCPIP timeouts. 

Jim Mulder   z/OS System Test   IBM Corp.  Poughkeepsie,  NY




Re: SVCDUMP capture phase statistics.....

2008-04-17 Thread Barbara Nitz
The thanks for that one should go to Ralph Sharpe. 
Thanks Ralph! :-)

There isn't much you can do other than spend real money to buy
more real storage to avoid paging, and spend real money to buy
the fastest DASD boxes and attach them to the fastest channels 
(if you aren't already doing that).

I was afraid of that. As it happens, we're just buying more real, but that was 
intended for our ever-growing VMs, or rather the linuxes under VM. My colleague 
will attempt to get some more for that system. Unfortunately, NDM dumping with 
maddening regularity is a fact of life, and we cannot create the same load on 
the test systems that we have in production, so we cannot get them fixed before 
going to production.

And then we'll wait for z/OS development to do something for us :-)

Best regards, Barbara




Re: SVCDUMP capture phase statistics.....

2008-04-17 Thread Tom Marchant
On Thu, 17 Apr 2008 07:05:29 +0200, Barbara Nitz wrote:

That lpar has 8704M real with the local page ds's normally at 10%.

You didn't mention how many local page data sets you have.  Perhaps more 
locals, spread across more devices and channels would help.

-- 
Tom Marchant




Re: SVCDUMP capture phase statistics.....

2008-04-17 Thread Barbara Nitz
You didn't mention how many local page data sets you have. Perhaps more 
locals, spread across more devices and channels would help.

10 locals, each on its own volume, each 3300 cyl. More channels? No clue, I 
consider that 'hardware'. They're all behind the same controller. 
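
As a rough sizing check (a quick sketch, assuming the usual 12 4K slots per 
3390 track and 15 tracks per cylinder; Python just for the arithmetic):

   locals = 10
   cyls   = 3300
   slots  = locals * cyls * 15 * 12        # tracks/cyl * 4K slots/track
   print(slots, "slots =", slots * 4 // 1024, "MB")
   # 5940000 slots = 23203 MB

so the aux pool is a bit under 23 GB, and 10% in use is roughly 2.3 GB paged out.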

The interesting thing was looking at RMF (only showing three here):
X12P27  33903  3990-6
X12P01  33903  2105
X12P31  33903  2105
I went and asked why there is one 3990-6 when the others are 2105 (all), and 
was told that RMF has no clue. Hardware configuration is identical.
DS P,D23E,1 
UNIT DTYPE  M CNT VOLSER  CHPID=PATH STATUS 
 RTYPE   SSID CFW TC   DFW   PIN  DC-STATE CCA  DDC   ALT  CU-TYPE  
D23E,33903 ,A,000,X12P27,B3=+ B4=+ B5=+ B6=+ B7=+ BE=+  
 2105D200  Y  YY.  YY.N   SIMPLEX   3E   3E2105 

Nobody (including me) wanted to open an ETR for this. Well, we're rolling out a 
refresh, maybe it's fixed in there somewhere...

Best regards, Barbara




Re: SVCDUMP capture phase statistics.....

2008-04-16 Thread Barbara Nitz
Well,

after setting GRSQ to CONTENTION (IBM default) and fortunately Q=NO, too, NDM 
proved true to prediction and dumped again in production with an 'already 
fixed' problem :-)

But: Here's the relevant part of the dump statistics (some stuff snipped):
Dump was complete 

Total dump capture time   00:00:59.980731   

System nondispatchability start   09/18/2042 01:53:47.370496 - nice
System set nondispatchable09/18/2042 01:53:47.370496
Time to become nondispatchable00:00:00.00   

Global storage start  04/15/2008 18:37:11.899468
Global storage end04/15/2008 18:37:19.323612
Global storage capture time   00:00:07.424144   

System reset dispatchable 09/18/2042 01:53:47.370496
System was nondispatchable00:00:00.00   

Asid 01DA (NDM):  
  Local storage start 04/15/2008 18:37:12.316164
  Local storage end   04/15/2008 18:38:11.861746
  Local storage capture time  00:00:59.545582   
  Tasks reset dispatchable    04/15/2008 18:38:11.861760
  Tasks were nondispatchable  00:00:59.545596   

Asid 0042 (VTAM):  
  Local storage start 04/15/2008 18:37:12.316141
  Local storage end   04/15/2008 18:37:50.354166
  Local storage capture time  00:00:38.038024   
  Tasks reset dispatchable    04/15/2008 18:37:50.354187
  Tasks were nondispatchable  00:00:38.038046   

Dump Exits  
  Exit address    04353880  
  Home ASID   0005  
  Exit start  04/15/2008 18:37:19.323615
  Exit end04/15/2008 18:37:45.453518
  Exit time   00:00:26.129902   
  Exit attributes:  Global, Sdump, SYSMDUMP   

What the heck is the GRS exit doing so long with the IBM default of 
grsq(CONTENTION)? This was longer than with the grsq(all) setting! It is still 
a global exit and would still run during global nondisp when Q=YES is set.

Best regards, Barbara
  




Re: SVCDUMP capture phase statistics.....

2008-04-16 Thread Jim Mulder
IBM Mainframe Discussion List IBM-MAIN@BAMA.UA.EDU wrote on 04/16/2008 
03:38:13 AM:

 after setting GRSQ to CONTENTION (IBM default) and fortunately Q=NO,
 too, NDM proved true to prediction and dumped again in production 
 with an 'already fixed' problem :-)
 
 But: Here's the relevant part of the dump statistics (some stuff 
snipped):
 Dump was complete 
 
 Total dump capture time   00:00:59.980731 
 
 System nondispatchability start   09/18/2042 01:53:47.370496 - nice
 System set nondispatchable09/18/2042 01:53:47.370496
 Time to become nondispatchable00:00:00.00 

  There is a FIN APAR OA22730 for IEAVTSFS displaying 
year 2042 timestamps, and this has been fixed in z/OS 1.10.
 
 
 Global storage start  04/15/2008 18:37:11.899468
 Global storage end04/15/2008 18:37:19.323612
 Global storage capture time   00:00:07.424144 
 
 System reset dispatchable 09/18/2042 01:53:47.370496
 System was nondispatchable00:00:00.00 
 
 Asid 01DA (NDM): 
   Local storage start 04/15/2008 18:37:12.316164
   Local storage end   04/15/2008 18:38:11.861746
   Local storage capture time  00:00:59.545582 
   Tasks reset dispatchable04/15/2008 18:38:11.861760
   Tasks were nondispatchable  00:00:59.545596 
 
 Asid 0042 (VTAM): 
   Local storage start 04/15/2008 18:37:12.316141
   Local storage end   04/15/2008 18:37:50.354166
   Local storage capture time  00:00:38.038024 
   Tasks reset dispatchable04/15/2008 18:37:50.354187
   Tasks were nondispatchable  00:00:38.038046 
 
 Dump Exits 
   Exit address04353880 
   Home ASID   0005 
   Exit start  04/15/2008 18:37:19.323615
   Exit end04/15/2008 18:37:45.453518
   Exit time   00:00:26.129902 
   Exit attributes:  Global, Sdump, SYSMDUMP 
 
 What the heck is the GRS exit doing so long with the IBM default of 
 grsq(CONTENTION)? This was longer than with the grsq(all) setting! 
 It is still a global exit and would still run during global nondisp 
 when Q=YES is set.

  Seeing as NDM and VTAM dumping also took a fairly long time, 
my first guess would be that dump capture caused significant paging
activity.  Even with GRSQ(CONTENTION), GRS still dumps all of the
local ENQ information for the system being dumped, and that 
can be a very large amount of storage.  The purpose of 
GRSQ(CONTENTION) is to reduce the amount of data
that GRSQ processing needs to obtain via XCF signalling from
the other systems. 

  The dump exit interface currently does not allow 
dump exits to take advantage of an RSM internal interface
used by other parts of SDUMP data capture processing to reduce
paging effects.  I think that is currently being considered for the
second release after z/OS 1.10 (no promises, of course).

  Also some time we would like to look at including some
system performance information in the SVCDUMP capture statistics.
For example, it should not be too hard to snapshot some counters
from the RCE or ASMVT at the beginning and end of dump capture
in order to calculate the amount of paging that the system did
while the dump was being captured. 
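
  The snapshot-and-delta idea itself is simple; here is a rough sketch of 
the pattern (Python, purely illustrative: the counters are dummies standing 
in for whatever RCE/ASMVT fields would actually be used):

   counters = {"page_ins": 0, "page_outs": 0}   # stand-in system counters

   def read_counters():
       return dict(counters)                    # copy, so the delta is clean

   def capture_dump():
       # pretend dump capture drove 1500 page-ins and 900 page-outs
       counters["page_ins"]  += 1500
       counters["page_outs"] += 900

   before = read_counters()                     # snapshot at capture start
   capture_dump()
   after  = read_counters()                     # snapshot at capture end

   print({k: after[k] - before[k] for k in before})
   # {'page_ins': 1500, 'page_outs': 900}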

Jim Mulder   z/OS System Test   IBM Corp.  Poughkeepsie,  NY




Re: SVCDUMP capture phase statistics.....

2008-04-16 Thread Barbara Nitz
Hi Jim,

thanks for fixing the time stamp! I think this time around it showed up somewhere 
other than last time.

I looked at RMF M3, at the storage statistics, in particular the working set 
size:
18:36:30   234
18:37:00 19097 (60%delay, of this 27%common, 33% locl, 33% other)
18:37:30 70784 (57%delay, of this 57% locl, 53% other)
18:38:00 no delay (capture phase is done), WSS is 72xxx

PGIN Rate at 18:37:00 33 for GRS
 18:37:30 50 for GRS, 433 for NDM and 63 for VTAM
 18:38:00 52 for VTAM, 18 for GRS

Off the top of my head, I have no real clue how we can drop the time the 
global capture phase takes, well, other than turning off GRSQ completely (or 
setting this slip trap: sl set,c=0c4,j=NDM,a=nodump,e; which probably wouldn't 
do much good as this is a dump scheduled by VTAM:-) )

That lpar has 8704M real with the local page ds's normally at 10%. Should IMS 
ever decide to take a dump, chances are good that it will be useless as we have 
to run q=no with these capture times in order not to hit the TCPIP timeouts.

Best regards, Barbara




Re: SVCDUMP capture phase statistics.....

2008-04-11 Thread Jim Mulder
IBM Mainframe Discussion List IBM-MAIN@BAMA.UA.EDU wrote on 04/11/2008 
01:11:14 AM:
 I was quite unaware, 
 though (or didn't read it properly because I didn't want to 
 subconsciously), that GRS collection is a global exit. I would have 
 sworn it is local.
 
 I just tested with grsq set to contention and the capture time 
 decreased drastically. So let my scars be a warning to you! :-) 
 Guess I need to say goodbye to that debugging help for the sake of 
 20s gain in production. (That means we can go back to q=yes, though!)

  We are painfully aware of the requirement to move GRSQ processing 
to occur after system nondispatchability has been reset.  We can't
just change that GRS exit to be a local exit, because then it 
would run for each address space being dumped, and we only want
it to run once per dump. 

 (using another Jim-Mulder-
 Special, LISTTOD)

  LISTTOD is a Bob Wright special.  You are thinking of the 
local (to Poughkeepsie System Test) VERBX CNVTOD, which we used 
before LISTTOD existed.

Jim Mulder   z/OS System Test   IBM Corp.  Poughkeepsie,  NY




Re: SVCDUMP capture phase statistics.....

2008-04-11 Thread Patrick Falcone
much snipped below

Hi Jim,

Which was one of the reasons I questioned Barbara about having NDM in 
discretionary. The 25 seconds would have spanned at least 2 WLM adjustment 
intervals, but on a potentially fully loaded LPAR I'm wondering if there is 
something that alerts WLM/SRM to make a discretionary address space 
dispatchable under certain circumstances. Although Barbara stated that NDM 
appeared to be in, it could have been in ready.

Jim Mulder [EMAIL PROTECTED] wrote:
  IBM Mainframe Discussion List wrote on 04/10/2008 
02:05:33 AM:


 
This is probably because of the partial dump reason code that 
says only fixed storage was dumped for the address space. This
happens when SDUMP detects that the dump task in the address 
space never got started (after 25 seconds), so it instead
dumps the fixed frames of the address space, accessing them
via their real storage addresses. Apparently this processing 
doesn't set the Local Storage End timestamp (that can be corrected
in a future release). So the question would be, why didn't
the NDM dump task get going for 25 seconds (that should be 25 
seconds after resetting system nondispatchability, I think).
Since NDM's LSQA should be in the dump, possibly 
SUMM FOR ASID(x'16A') might have some clues. Is the dump task
still in a wait? Is the ECB POSTed? 


Jim Mulder z/OS System Test IBM Corp. Poughkeepsie, NY






SVCDUMP capture phase statistics.....

2008-04-10 Thread Barbara Nitz
..and how to interpret them.

Yesterday connect:direct took another of those abend0c4 that Sterling always 
tells us 'they're all fixed'. They're all from ISTAICPT in SRB mode... And of 
course they always occur in production where there is extremely high load (both 
CPU and workload)

The problem was that it took a full minute between IST413I VTAM DUMPING FOR JOB 
NDM and IEA794I SVC DUMP HAS CAPTURED, with system-wide non-dispatchability due 
to Q=YES 28 seconds. This causes TCPIP to get 'adjacency failures' and to drop 
lots of MQ channel connections, which has a major impact on customers connected 
to us. Which means a lot of management attention.

The dump statistics tell me this:
Total dump capture time   00:00:57.956068
System nondispatchability start   04/09/2008 15:04:43.405987 
System set nondispatchable04/09/2008 15:04:43.406106 
Global storage start  04/09/2008 15:04:43.053199 
Global storage end04/09/2008 15:04:48.466431 
Global storage capture time   00:00:05.413231 
System reset dispatchable 04/09/2008 15:05:12.204912 
System was nondispatchable00:00:28.798924

Asid 016A (NDM):   
  Local storage start 04/09/2008 15:05:12.204988 
  Local storage end   09/18/2042 01:53:47.370496  -- very interesting time stamp
  Local storage capture time  10:48:35.165507

  Tasks reset dispatchable    04/09/2008 15:05:39.356416 
  Tasks were nondispatchable  00:00:27.151414

Exit address    04353880   
  Home ASID   0005 DUMPSRV  
  Exit time   00:00:20.810908
  Exit attributes:  Global, Sdump, SYSMDUMP

I've got three questions here:
1. Why is there this interesting time stamp that says the dumps will be 
finished in 2042?
2. Global capture phase was a mere 5 seconds, why did it take 24 seconds after 
global capture was finished for the system to become dispatchable again?
3. What the heck took DUMPSRV 20 seconds in the exit?

== FLAGS SET IN SDUSDATA: Dump all PSAs, current PSA, nucleus SQA, LSQA,
rgn-private area, LPA mod. for rgn, trace, CSA, SWA,summary dump
== FLAGS SET IN SDUFLAG2: SUBPLST, KEYLIST, LISTD
== FLAGS SET IN SDUCNTL1: SRB 
== FLAGS SET IN SDUTYP1: FAILRC 
== FLAGS SET IN SDUEXIT: GRSQ, MASTER trace, SMSX, XESDATA, IOS, RSM, OE 
== FLAGS SET IN SDUSDAT3: IO

The dump is 8929 trks big and was partial, MAXSPACE is 1500M, 6 logical cps, 
8.7G real.
partial dump reason codes:
During dump processing of local storage, the system issued a PURGEDQ because a 
hung address space was detected. This will result in the loss of some storage 
related to the address space.
During dump processing of a possibly hung address space, dump processing 
obtained only fixed storage for the address space

NDM runs in a discretionary SC, VTAM in SYSSTC.

Any idea what's going on? (I am hoping to get a faster answer/ideas what to 
change here than by opening an ETR with IBM, especially as this may be some 
sort of tuning problem, except for the 2042 time stamp.)

Thanks for reading, best regards, Barbara





Re: SVCDUMP capture phase statistics.....

2008-04-10 Thread Patrick Falcone
Hi Barbara,
   
  I might consider moving NDM up in importance, middle level, to get it some 
resources. I might also watch its behavior as NDM has been known to take 
resources that might affect other workloads. If that's the case you might 
consider resource capping it as well.
   
  



Re: SVCDUMP capture phase statistics.....

2008-04-10 Thread Barbara Nitz
Patrick,

thanks for the tip, but we have a definite reason to run it in discretionary: 
It takes too many resources CPU-wise :-)

We do not want to use resource capping, because at certain times C:D is set to 
STCHIGH (we have two cutoff times a day, where transfers have to be in before a 
certain time, and when there are too many transfers coming in, we increase CPU, 
but only under management direction...)

In addition, my extra time is spent in the global non-disp phase (20 seconds too 
many that I don't know why they are spent), and during that time nothing but 
dump capture runs on the lpar (that's the crux). So C:D's dispatching priority 
wouldn't make a difference that I can see. 

Oh, and one more thing: Exit address 04353880 
Home ASID 0005 DUMPSRV 
Exit time 00:00:20.810908 
Exit attributes: Global, Sdump, SYSMDUMP
is ISGSDUMP. I just hope that *that* doesn't run during global capture 
phase... It *is* listed later, and we go against explicit IBM recommendations 
here, which had helped us tremendously in the past.

But looking again at the time stamps, this may just be the additional 20 
seconds. 

Regards, Barbara




Re: SVCDUMP capture phase statistics.....

2008-04-10 Thread Bob Rutledge

Barbara Nitz wrote:
...
  Local storage end   09/18/2042 01:53:47.370496  -- very interesting time stamp
...
1. Why is there this interesting time stamp that says the dumps will be 
finished in 2042?


Dunno why, but it's the highest possible TOD clock (all ones) at GMT+2.

Bob




SVCDUMP capture phase statistics.....

2008-04-10 Thread George Kozakos
Hi Barbara,
I think I can answer two of your questions

 1. Why is there this interesting time stamp that says the dumps will be   
 finished in 2042?  


The timestamp was X'FFFFFFFFFFFFFFFF'. This is 09/17/2042 23:53:47.370495 UTC.
Your system must be GMT+2.

 2. Global capture phase was a mere 5 seconds, why did it take 24 seconds  
 after global   
 capture was finished for the system to become dispatchable again?  


I think this is because the system stays non-dispatchable until the global
exits complete.
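
A quick cross-check against the timestamps in the original post bears this 
out (a sketch in Python, nothing z/OS-specific, just date arithmetic):

   from datetime import datetime

   fmt        = "%m/%d/%Y %H:%M:%S.%f"
   global_end = datetime.strptime("04/09/2008 15:04:48.466431", fmt)
   reset_disp = datetime.strptime("04/09/2008 15:05:12.204912", fmt)
   print(reset_disp - global_end)      # 0:00:23.738481

which is roughly the reported exit time of 00:00:20.810908 plus a few seconds 
of other global processing.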

Regards,
George Kozakos
z/OS Function Test/Level 3 Supervisor




Re: SVCDUMP capture phase statistics.....

2008-04-10 Thread Tony Harminc
2008/4/10 Barbara Nitz [EMAIL PROTECTED]:

   Local storage end   09/18/2042 01:53:47.370496  -- very 
 interesting time stamp

  I've got three questions here:
  1. Why is there this interesting time stamp that says the dumps will be 
 finished in 2042?

A TOD value of all ones (X'FFFFFFFFFFFFFFFF') is 2042-09-17
23:53:47.370495, so I imagine this is that value converted to your
local time.

Sounds like it displayed a field that was initialized to the worst
possible estimate.
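
Easy enough to check with a little date arithmetic (a sketch; the TOD clock 
carries microseconds in bit 51 and, leap seconds aside, counts from 
1900-01-01 UTC):

   from datetime import datetime, timedelta

   tod   = 0xFFFFFFFFFFFFFFFF             # 64-bit TOD clock, all ones
   usecs = tod >> 12                      # bit 51 = 1 microsecond
   print(datetime(1900, 1, 1) + timedelta(microseconds=usecs))
   # 2042-09-17 23:53:47.370495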

Tony H.




Re: SVCDUMP capture phase statistics.....

2008-04-10 Thread Jim Mulder
IBM Mainframe Discussion List IBM-MAIN@BAMA.UA.EDU wrote on 04/10/2008 
02:05:33 AM:

 ..and how to interpret them.
 
 Yesterday connect:direct took another of those abend0c4 that 
 Sterling always tells us 'they're all fixed'. They're all from 
 ISTAICPT in SRB mode... And of course they always occur in 
 production where there is extremely high load (both CPU and workload)
 
 The problem was that it took a full minute between IST413I VTAM 
 DUMPING FOR JOB NDM and IEA794I SVC DUMP HAS CAPTURED, with system-
 wide non-dispatchability due to Q=YES 28 seconds. This causes TCPIP 
 to get 'adjacency failures' and to drop lots of MQ channel 
 connections, which has a major impact on customers connected to us. 
 Which means a lot of management attention.
 
 The dump statistics tell me this:
 Total dump capture time   00:00:57.956068 
 System nondispatchability start   04/09/2008 15:04:43.405987 
 System set nondispatchable04/09/2008 15:04:43.406106 
 Global storage start  04/09/2008 15:04:43.053199 
 Global storage end04/09/2008 15:04:48.466431 
 Global storage capture time   00:00:05.413231 
 System reset dispatchable 04/09/2008 15:05:12.204912 
 System was nondispatchable00:00:28.798924
 
 Asid 016A (NDM): 
   Local storage start 04/09/2008 15:05:12.204988 
   Local storage end   09/18/2042 01:53:47.370496  -- 
 Local storage capture time  10:48:35.165507 
 
   Tasks reset dispatchable04/09/2008 15:05:39.356416 
   Tasks were nondispatchable  00:00:27.151414
 
 Exit address04353880 
   Home ASID   0005 DUMPSRV 
   Exit time   00:00:20.810908 
   Exit attributes:  Global, Sdump, SYSMDUMP
 
 I've got three questions here:
 1. Why is there this interesting time stamp that says the dumps will
 be finished in 2042?

  That's just how a timestamp of all zeros gets converted by
BLSUXTOD.  I kind of hacked that IEAVTSFS formatter together in a
hurry one evening when I really wanted to look at some of the 
SDUMP statistics, but didn't want to decipher it by hand.
So I didn't think at the time to check for zeros before
converting it.  The more interesting question would be why
the local storage end timestamp apparently didn't get stored. 
This is probably because of the partial dump reason code that 
says only fixed storage was dumped for the address space.  This
happens when SDUMP detects that the dump task in the address 
space never got started (after 25 seconds), so it instead
dumps the fixed frames of the address space, accessing them
via their real storage addresses.  Apparently this processing 
doesn't set the Local Storage End timestamp (that can be corrected
in a future release).  So the question would be, why didn't
the NDM dump task get going for 25 seconds (that should be 25 
seconds after resetting system nondispatchability, I think).
Since NDM's LSQA should be in the dump, possibly 
SUMM FOR ASID(x'16A') might have some clues.  Is the dump task
still in a wait?  Is the ECB POSTed? 

 2. Global capture phase was a mere 5 seconds, why did it take 24 
 seconds after global capture was finished for the system to become 
 dispatchable again?

  System nondispatchability is not reset until the global 
exits complete.

 3. What the heck took DUMPSRV 20 seconds in the exit?

  You are probably running in GRS Star mode, possibly with the 
option that requests all data from all systems for SDATA=GRSQ. 
Does your GRSCNFxx specify GRSQ(ALL)? 
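
  For reference, the GRSQ setting comes from the GRSCNFxx parmlib member, 
along the lines of the fragment below (a sketch only; check the member on 
your own system):

   GRSDEF GRSQ(CONTENTION)      /* or GRSQ(ALL) / GRSQ(LOCAL) */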

 
 == FLAGS SET IN SDUSDATA: Dump all PSAs, current PSA, nucleus SQA, 
LSQA,
 rgn-private area, LPA mod. for rgn, trace, CSA, SWA,summary dump
 == FLAGS SET IN SDUFLAG2: SUBPLST, KEYLIST, LISTD
 == FLAGS SET IN SDUCNTL1: SRB 
 == FLAGS SET IN SDUTYP1: FAILRC 
 == FLAGS SET IN SDUEXIT: GRSQ, MASTER trace, SMSX, XESDATA, IOS, RSM, 
OE 
 == FLAGS SET IN SDUSDAT3: IO
 
 The dump is 8929 trks big and was partial, MAXSPACE is 1500M, 6 
 logical cps, 8.7G real.
 partial dump reason codes:
 During dump processing of local storage, the system issued a PURGEDQ
 because a hung address space was detected. This will result in the 
 loss of some storage related to the address space.
 During dump processing of a possibly hung address space, dump 
 processing obtained only fixed storage for the address space
 
 NDM runs in a discretionary SC, VTAM in SYSSTC.
 
 Any idea what's going on? (I am hoping to get a faster answer/ideas 
 what to change here than by opening an ETR with IBM, especially as 
 this may be some sort of tuning problem, except for the 2042 time 
stamp.)


Jim Mulder   z/OS System Test   IBM Corp.  Poughkeepsie,  NY




Re: SVCDUMP capture phase statistics.....

2008-04-10 Thread Barbara Nitz
Everyone's spot on: We're GMT+2 (stupid DST in Europe :-) ), so that explains 
why we didn't go through a time warp. :-)

Now, why am I not surprised that this verbx (I just wrote verby again!) ieavtsfs 
is a Jim-Mulder-Special? :-) Thank goodness it is there, though, and if you, 
Jim, want to fix something in a later release, that's fine.

The blame lies squarely in my court, I guess. I wanted GRSQ(ALL) because we 
just had to merge two completely disparate sysplexes into one. They don't share 
anything except GRS Star, and I wanted to see any cross-system deadly embraces 
that may have been caused by our throwing together the RNLs of both sysplexes. 
I was quite unaware, though (or didn't read it properly because I didn't want 
to subconsciously), that GRS collection is a global exit. I would have sworn it 
is local.

I just tested with grsq set to contention and the capture time decreased 
drastically. So let my scars be a warning to you! :-) Guess I need to say 
goodbye to that debugging help for the sake of 20s gain in production. (That 
means we can go back to q=yes, though!)

And sorry that I just didn't put the timestamps into the post. Because 
system-wide non-disp was reported to be reset a few lines above, I assumed that 
everything below that line would be local, again. (Talk about making 
assumptions!) I realized later that the time stamp was actually inside the 
global non-disp window, and the exit address did the rest to make me feel 
guilty.

I just looked at the dump task in the NDM address space. It must have been 
posted (link is 00), but the tcbflags say that the task is being swapped out. 
(Actually, all tasks are being swapped out.) 
I would have said that this is due to NDM running in discretionary, but I am 
not sure. SRM says that the address space is on the IN q, the WEB still 
pointing back to that SSRB has a DP of x'00C0 8000', while ASCBDPH=x'00C1'. 
ASCBEWST is (using another Jim-Mulder-Special, LISTTOD) 04/09/2008 
15:04:43.107551, which is almost right at the beginning when the problem 
occurred (15:04:43.048477).
-
There are a ton of global SRBs (that I didn't check, with 1.8 they're always 
there for a dynamic dump) and there is one local ssrb with a cpsw in ieavsrbs 
and RMTR=IEAVESPM. regs in the ssrb are all zero, and ptcb is the one that had 
the 0C4 in srb-to-task percolation for the original abend. Given my luck, the 
RT1W for that SRB is in use (my debugging teacher told me that I would never 
need to look at RT1Ws because they're never in use - I always get those that 
are) with a linkage stack entry for IEAVTSSD BAKR'ing to IEAVTSSM. That's 
summary dump processing, I think.

Any more words of wisdom on this? (well, other than 'stupidity on the part of 
the OP' :-) )?

Best regards, Barbara
