Re: DFSORT statement sequence
Frank Yaeger wrote:

>It's false. The INCLUDE statement is processed before the SORT statement regardless of the order in which you specify them.

Thanks very much.

>For a figure showing the order of processing of DFSORT control statements, see:

Thanks for supplying that figure. Who said a pic is worth 1000 words? ;)

>Note that OUTFIL INCLUDE is processed after SORT so that would be less efficient than using an INCLUDE statement which is processed before SORT.

Thanks for the note about OUTFIL INCLUDE. It is really kind of you to help out here.

Groete / Greetings
Elardus Engelbrecht

--
For IBM-MAIN subscribe / signoff / archive access instructions,
send email to lists...@bama.ua.edu with the message: GET IBM-MAIN INFO
Search the archives at http://bama.ua.edu/archives/ibm-main.html
Re: DFSORT statement sequence
Elardus Engelbrecht wrote on 02/09/2009 04:52:51 AM:

> Where is it documented that the sequence of sort control statements does
> play a role in performance? My searches on Google, IBM-MAIN and DFSORT
> couldn't produce something 'hard and fast'...
>
> My gut feeling is that
>
>   INCLUDE COND=(1,8,CH,EQ,C'ABCDEFGH')
>   SORT FIELDS=(1,8,CH,A)
>
> is better than this
>
>   SORT FIELDS=(1,8,CH,A)
>   INCLUDE COND=(1,8,CH,EQ,C'ABCDEFGH')
>
> for very LARGE input.
>
> Is this true or false? (If false, I can always do a COPY with
> INCLUDE and then do my SORT FIELDS)

It's false. The INCLUDE statement is processed before the SORT statement regardless of the order in which you specify them.

For a figure showing the order of processing of DFSORT control statements, see:

http://publibz.boulder.ibm.com/cgi-bin/bookmgr_OS390/BOOKS/ICE1CA30/FIGSTMTSEQ?SHELF=&DT=20080528171007&CASE=&ScrollTOP=FIGSTMTSEQ#FIGSTMTSEQ

Note that OUTFIL INCLUDE is processed after SORT so that would be less efficient than using an INCLUDE statement which is processed before SORT.

Frank Yaeger - DFSORT Development Team (IBM) - yae...@us.ibm.com
Specialties: FINDREP, WHEN=GROUP, DATASORT, ICETOOL, Symbols, Migration
=> DFSORT/MVS is on the Web at http://www.ibm.com/storage/dfsort/
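To make the difference between the INCLUDE statement and OUTFIL INCLUDE concrete, here is a minimal sketch of two job steps that produce the same output. The dataset names, step names, and space values are placeholders chosen for illustration, not taken from the thread, and work data sets are assumed to be allocated dynamically by the installation defaults.

```jcl
//* Variant A: INCLUDE statement. Records are filtered as they are
//* read, so only matching records enter the sort. The SORT statement
//* is deliberately coded first to show that statement order is
//* irrelevant.
//STEPA    EXEC PGM=ICEMAN
//SYSOUT   DD SYSOUT=*
//SORTIN   DD DSN=YOUR.INPUT.DATASET,DISP=SHR
//SORTOUT  DD DSN=YOUR.OUTPUT.A,DISP=(NEW,CATLG,DELETE),
//            UNIT=SYSDA,SPACE=(CYL,(50,50),RLSE)
//SYSIN    DD *
  SORT FIELDS=(1,8,CH,A)
  INCLUDE COND=(1,8,CH,EQ,C'ABCDEFGH')
/*
//* Variant B: OUTFIL INCLUDE. Every input record is sorted first and
//* the filter is applied only when the output is written.
//STEPB    EXEC PGM=ICEMAN
//SYSOUT   DD SYSOUT=*
//SORTIN   DD DSN=YOUR.INPUT.DATASET,DISP=SHR
//SORTOUT  DD DSN=YOUR.OUTPUT.B,DISP=(NEW,CATLG,DELETE),
//            UNIT=SYSDA,SPACE=(CYL,(50,50),RLSE)
//SYSIN    DD *
  SORT FIELDS=(1,8,CH,A)
  OUTFIL INCLUDE=(1,8,CH,EQ,C'ABCDEFGH')
/*
```

Variant A is the form recommended above whenever a single set of selection criteria applies to the whole output. OUTFIL INCLUDE is mainly useful when one sorted pass has to feed several differently filtered output data sets.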
Re: DFSORT statement sequence
David Betten wrote:

>Correct. As each record is processed, the INCLUDE criteria is evaluated. If the record does not meet the INCLUDE criteria it is discarded and not included in the sort.

Great! Thanks for your excellent answer. That answered my question!

>This is one of those "it depends" types of situations. If a large percentage of the records are likely to be included, I think you are fine with just doing 1 sort with the include statements. If a small percentage are likely to be included, you may be better off doing the copy with INCLUDE followed by the sort.

Ok. I'll sort out my DFSORT jobs and see what I get.

>The reason I say this is that when sorting with INCLUDE, DFSORT has no way of knowing how many records will match the criteria. Therefore, we need to allocate resources (main storage, work space, etc.) under the assumption that all records will be included. So you may have some wasted resources if only a small percentage of the records are actually included.

Interesting that you mentioned wasted resources. I will look into this too.

Thanks for your help. I appreciate it very much! Back to my table to sort things out. (pun intended... ;D )

Groete / Greetings
Elardus Engelbrecht
Re: DFSORT statement sequence
> Thanks for your kind answer. Does that means that AFTER reading all the
> matching input records, that subset is THEN sorted?

Correct. As each record is processed, the INCLUDE criteria is evaluated. If the record does not meet the INCLUDE criteria it is discarded and not included in the sort.

> The reason is I have to select a wide variety of different inputs to produce a
> subset and THEN sort that lot for reporting. My goal: to have as few bytes to
> sort after selecting what I need with one pass of the input dataset.
>
> I wonder, if I do all my COPY and using INCLUDE statements and then do this
> two stage method:
>
>   COPY FROM(INPUT) TO(TEMP1) USING()
>   COPY FROM(TEMP1) TO(TEMP2) USING()
>
> Will this work better for large input datasets?

This is one of those "it depends" types of situations. If a large percentage of the records are likely to be included, I think you are fine with just doing 1 sort with the include statements. If a small percentage are likely to be included, you may be better off doing the copy with INCLUDE followed by the sort.

The reason I say this is that when sorting with INCLUDE, DFSORT has no way of knowing how many records will match the criteria. Therefore, we need to allocate resources (main storage, work space, etc.) under the assumption that all records will be included. So you may have some wasted resources if only a small percentage of the records are actually included.
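The "copy with INCLUDE followed by the sort" approach described above could be sketched as a single ICETOOL step like the one below. This is only an illustration under assumptions: the dataset names, the space values, and the control data set prefixes SEL1 and SRT1 are hypothetical and do not come from the thread. It pays off only when a small fraction of the input is expected to match.

```jcl
//TWOPASS  EXEC PGM=ICETOOL
//TOOLMSG  DD SYSOUT=*
//DFSMSG   DD SYSOUT=*
//INPUT    DD DSN=YOUR.INPUT.DATASET,DISP=SHR
//* TEMP1 holds only the selected records, so the sort that follows
//* is sized for the subset rather than for the full input.
//TEMP1    DD DSN=&&TEMP1,DISP=(NEW,PASS),UNIT=SYSDA,
//            SPACE=(CYL,(50,50),RLSE)
//SORTED   DD DSN=YOUR.OUTPUT.DATASET,DISP=(NEW,CATLG,DELETE),
//            UNIT=SYSDA,SPACE=(CYL,(50,50),RLSE)
//TOOLIN   DD *
* Pass 1: copy only the records that meet the INCLUDE criteria.
  COPY FROM(INPUT) TO(TEMP1) USING(SEL1)
* Pass 2: sort the much smaller subset.
  SORT FROM(TEMP1) TO(SORTED) USING(SRT1)
/*
//SEL1CNTL DD *
  INCLUDE COND=(1,8,CH,EQ,C'ABCDEFGH')
/*
//SRT1CNTL DD *
  SORT FIELDS=(1,8,CH,A)
/*
```

The trade-off is one extra write and read of the selected subset in exchange for a sort whose resources are based on the subset alone. When most records are selected, the single SORT with an INCLUDE statement remains the simpler choice.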
Re: DFSORT statement sequence
David Betten wrote:

>It is FALSE. All of the control statements are read at initialization and the include fields are evaluated during the input phase of the sort to select only the records that meet the criteria.

Thanks for your kind answer. Does that mean that AFTER reading all the matching input records, that subset is THEN sorted?

The reason is that I have to select a wide variety of different inputs to produce a subset and THEN sort that lot for reporting. My goal: to have as few bytes as possible to sort after selecting what I need with one pass of the input dataset.

I wonder, if I do all my COPY operations using INCLUDE statements and then use this two-stage method:

  COPY FROM(INPUT) TO(TEMP1) USING()
  COPY FROM(TEMP1) TO(TEMP2) USING()

Will this work better for large input datasets?

>Have a nice day,

The same to you! ;-D

Groete / Greetings
Elardus Engelbrecht
Re: DFSORT statement sequence
It is FALSE. All of the control statements are read at initialization and the include fields are evaluated during the input phase of the sort to select only the records that meet the criteria.

Have a nice day,

Dave Betten
DFSORT Development, Performance Lead
IBM Corporation
email: bet...@us.ibm.com
DFSORT/MVS on the web at http://www.ibm.com/storage/dfsort/

IBM Mainframe Discussion List wrote on 02/09/2009 07:52:51 AM:

> Good day
>
> Where is it documented that the sequence of sort control statements does
> play a role in performance? My searches on Google, IBM-MAIN and DFSORT
> couldn't produce something 'hard and fast'...
>
> My gut feeling is that
>
>   INCLUDE COND=(1,8,CH,EQ,C'ABCDEFGH')
>   SORT FIELDS=(1,8,CH,A)
>
> is better than this
>
>   SORT FIELDS=(1,8,CH,A)
>   INCLUDE COND=(1,8,CH,EQ,C'ABCDEFGH')
>
> for very LARGE input.
>
> Is this true or false? (If false, I can always do a COPY with
> INCLUDE and then do my SORT FIELDS)
>
> Thank you in advance.
>
> Groete / Greetings
> Elardus Engelbrecht
DFSORT statement sequence
Good day

Where is it documented that the sequence of sort control statements does play a role in performance? My searches on Google, IBM-MAIN and DFSORT couldn't produce something 'hard and fast'...

My gut feeling is that

  INCLUDE COND=(1,8,CH,EQ,C'ABCDEFGH')
  SORT FIELDS=(1,8,CH,A)

is better than this

  SORT FIELDS=(1,8,CH,A)
  INCLUDE COND=(1,8,CH,EQ,C'ABCDEFGH')

for very LARGE input.

Is this true or false? (If false, I can always do a COPY with INCLUDE and then do my SORT FIELDS)

Thank you in advance.

Groete / Greetings
Elardus Engelbrecht