Re: DFSORT statement sequence

2009-02-09 Thread Elardus Engelbrecht
Frank Yaeger wrote:
>It's false.  The INCLUDE statement is processed before the SORT statement
>regardless of the order in which you specify them.

Thanks very much.

>For a figure showing the order of processing of DFSORT control statements,
>see:

Thanks for supplying that figure. Who said a pic is worth 1000 words? ;)

>Note that OUTFIL INCLUDE is processed after SORT so that would be less
>efficient than using an INCLUDE statement which is processed before SORT.

Thanks for the note about OUTFIL INCLUDE.

It is really kind of you to help out here.

Groete / Greetings
Elardus Engelbrecht

--
For IBM-MAIN subscribe / signoff / archive access instructions,
send email to lists...@bama.ua.edu with the message: GET IBM-MAIN INFO
Search the archives at http://bama.ua.edu/archives/ibm-main.html


Re: DFSORT statement sequence

2009-02-09 Thread Frank Yaeger
Elardus Engelbrecht wrote on 02/09/2009 04:52:51 AM:

> Where is it documented that the sequence of sort control statements does
> play a role in performance? My searches on Google, IBM-MAIN and DFSORT
> couldn't produce something 'hard and fast'...
>
> My gut feeling is that
>
> INCLUDE COND=(1,8,CH,EQ,C'ABCDEFGH')
> SORT FIELDS=(1,8,CH,A)
>
> is better than this
>
> SORT FIELDS=(1,8,CH,A)
> INCLUDE COND=(1,8,CH,EQ,C'ABCDEFGH')
>
> for very LARGE input.
>
> Is this true or false? (If false, I can always do a COPY with
> INCLUDE and then
> do my SORT FIELDS)

It's false.  The INCLUDE statement is processed before the SORT
statement regardless of the order in which you specify them.

For a figure showing the order of processing of DFSORT control
statements, see:

http://publibz.boulder.ibm.com/cgi-bin/bookmgr_OS390/BOOKS/ICE1CA30/FIGSTMTSEQ?SHELF=&DT=20080528171007&CASE=&ScrollTOP=FIGSTMTSEQ#FIGSTMTSEQ

Note that OUTFIL INCLUDE is processed after SORT so that would be
less efficient than using an INCLUDE statement which is processed
before SORT.
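
To make the contrast concrete, here is a minimal sketch of the two forms as
two job steps. The data set names, space values, and the OUT1 ddname are
hypothetical placeholders, not taken from this thread; only the control
statements reflect what is discussed above. In the first step the INCLUDE
statement drops non-matching records before they reach the sort; in the
second step every record is sorted and OUTFIL INCLUDE then filters the
sorted output.

```jcl
//* Step A: INCLUDE statement, records are filtered before the sort
//STEPA    EXEC PGM=SORT
//SYSOUT   DD SYSOUT=*
//SORTIN   DD DSN=YOUR.INPUT.DATA,DISP=SHR
//SORTOUT  DD DSN=YOUR.OUTPUT.A,DISP=(NEW,CATLG,DELETE),
//            UNIT=SYSDA,SPACE=(CYL,(10,10),RLSE)
//SYSIN    DD *
  INCLUDE COND=(1,8,CH,EQ,C'ABCDEFGH')
  SORT FIELDS=(1,8,CH,A)
/*
//* Step B: OUTFIL INCLUDE, all records are sorted, then filtered
//STEPB    EXEC PGM=SORT
//SYSOUT   DD SYSOUT=*
//SORTIN   DD DSN=YOUR.INPUT.DATA,DISP=SHR
//OUT1     DD DSN=YOUR.OUTPUT.B,DISP=(NEW,CATLG,DELETE),
//            UNIT=SYSDA,SPACE=(CYL,(10,10),RLSE)
//SYSIN    DD *
  SORT FIELDS=(1,8,CH,A)
  OUTFIL FNAMES=OUT1,INCLUDE=(1,8,CH,EQ,C'ABCDEFGH')
/*
```

Both steps should produce the same records; the difference is only in how
many records pass through the sort. Swapping the two statements in Step A
would not change that, since the order of specification does not matter.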

Frank Yaeger - DFSORT Development Team (IBM) - yae...@us.ibm.com
Specialties: FINDREP, WHEN=GROUP, DATASORT, ICETOOL, Symbols, Migration

 => DFSORT/MVS is on the Web at http://www.ibm.com/storage/dfsort/


Re: DFSORT statement sequence

2009-02-09 Thread Elardus Engelbrecht
David Betten wrote:

>Correct.  As each record is processed, the INCLUDE criteria is evaluated.
>If a record does not meet the INCLUDE criteria, it is discarded and not
>included in the sort.

Great! Thanks for your excellent answer. That answered my question!

>This is one of those "it depends" types of situations.  If a large
>percentage of the records are likely to be included, I think you are fine
>with just doing 1 sort with the include statements.  If a small percentage
>are likely to be included, you may be better off doing the copy with
>INCLUDE followed by the sort.

Ok. I'll sort out my DFSORT jobs and see what I get.

>The reason I say this is that when sorting with INCLUDE, DFSORT has no way
>of knowing how many records will match the criteria. Therefore, we need to
>allocate resources (main storage, work space, etc.) under the assumption
>that all records will be included. So you may have some wasted resources if
>only a small percentage of the records are actually included.

Interesting you mentioned wasted resources. I will look into this too.

Thanks for your help. I appreciate it very much!

Back to my table to sort things out.  (pun intended... ;D )

Groete / Greetings
Elardus Engelbrecht



Re: DFSORT statement sequence

2009-02-09 Thread David Betten
> Thanks for your kind answer. Does that mean that AFTER reading all the
> matching input records, that subset is THEN sorted?

Correct.  As each record is processed, the INCLUDE criteria is evaluated.
If a record does not meet the INCLUDE criteria, it is discarded and not
included in the sort.


> The reason is I have to select a wide variety of different inputs to
> produce a subset and THEN sort that lot for reporting. My goal: to have
> as few bytes as possible to sort after selecting what I need with one
> pass of the input dataset.
>
> I wonder, if I do my COPY using INCLUDE statements and then use this
> two-stage method:
>
> COPY FROM(INPUT) TO(TEMP1) USING()
> COPY FROM(TEMP1) TO(TEMP2) USING()
>
> Will this work better for large input datasets?


This is one of those "it depends" types of situations.  If a large
percentage of the records are likely to be included, I think you are fine
with just doing 1 sort with the include statements.  If a small percentage
are likely to be included, you may be better off doing the copy with
INCLUDE followed by the sort.

The reason I say this is that when sorting with INCLUDE, DFSORT has no way
of knowing how many records will match the criteria.  Therefore, we need to
allocate resources (main storage, work space, etc.) under the assumption
that all records will be included.  So you may have some wasted resources
if only a small percentage of the records are actually included.
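
For the case where only a small percentage of records is expected to match,
the copy-then-sort approach could look something like the following ICETOOL
sketch. This is an illustration under assumptions, not a tuned job: the data
set names, space values, the OUTPUT ddname, and the SEL1/SRT1 prefixes are
hypothetical. The first operator copies only the matching records to a
temporary data set, so the second operator sorts a data set whose size
reflects the records actually selected.

```jcl
//* Two-stage method: COPY with INCLUDE, then SORT the smaller subset
//S1       EXEC PGM=ICETOOL
//TOOLMSG  DD SYSOUT=*
//DFSMSG   DD SYSOUT=*
//INPUT    DD DSN=YOUR.INPUT.DATA,DISP=SHR
//TEMP1    DD DSN=&&TEMP1,DISP=(,PASS),
//            UNIT=SYSDA,SPACE=(CYL,(50,50),RLSE)
//OUTPUT   DD DSN=YOUR.SORTED.SUBSET,DISP=(NEW,CATLG,DELETE),
//            UNIT=SYSDA,SPACE=(CYL,(50,50),RLSE)
//TOOLIN   DD *
  COPY FROM(INPUT) TO(TEMP1) USING(SEL1)
  SORT FROM(TEMP1) TO(OUTPUT) USING(SRT1)
/*
//* Control statements for the COPY operator (USING(SEL1))
//SEL1CNTL DD *
  INCLUDE COND=(1,8,CH,EQ,C'ABCDEFGH')
/*
//* Control statements for the SORT operator (USING(SRT1))
//SRT1CNTL DD *
  SORT FIELDS=(1,8,CH,A)
/*
```

The trade-off is an extra pass over the selected records to write and
re-read TEMP1, which is why the single sort with INCLUDE is likely fine
when most records are kept.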


Re: DFSORT statement sequence

2009-02-09 Thread Elardus Engelbrecht
David Betten wrote:

>It is FALSE.  All of the control statements are read at initialization and
>the include fields are evaluated during the input phase of the sort to
>select only the records that meet the criteria.

Thanks for your kind answer. Does that mean that AFTER reading all the
matching input records, that subset is THEN sorted?

The reason is I have to select a wide variety of different inputs to produce a
subset and THEN sort that lot for reporting. My goal: to have as few bytes as
possible to sort after selecting what I need with one pass of the input dataset.

I wonder, if I do my COPY using INCLUDE statements and then use this
two-stage method:

COPY FROM(INPUT) TO(TEMP1) USING()
COPY FROM(TEMP1) TO(TEMP2) USING()

Will this work better for large input datasets? 

>Have a nice day,

The same to you! ;-D

Groete / Greetings
Elardus Engelbrecht



Re: DFSORT statement sequence

2009-02-09 Thread David Betten
It is FALSE.  All of the control statements are read at initialization and
the include fields are evaluated during the input phase of the sort to
select only the records that meet the criteria.

Have a nice day,
Dave Betten
DFSORT Development, Performance Lead
IBM Corporation
email:  bet...@us.ibm.com
DFSORT/MVS on the web at http://www.ibm.com/storage/dfsort/

IBM Mainframe Discussion List wrote on 02/09/2009 07:52:51 AM:

> Good day
>
> Where is it documented that the sequence of sort control statements does
> play a role in performance? My searches on Google, IBM-MAIN and DFSORT
> couldn't produce something 'hard and fast'...
>
> My gut feeling is that
>
> INCLUDE COND=(1,8,CH,EQ,C'ABCDEFGH')
> SORT FIELDS=(1,8,CH,A)
>
> is better than this
>
> SORT FIELDS=(1,8,CH,A)
> INCLUDE COND=(1,8,CH,EQ,C'ABCDEFGH')
>
> for very LARGE input.
>
> Is this true or false? (If false, I can always do a COPY with
> INCLUDE and then
> do my SORT FIELDS)
>
> Thank you in advance.
>
> Groete / Greetings
> Elardus Engelbrecht



DFSORT statement sequence

2009-02-09 Thread Elardus Engelbrecht
Good day

Where is it documented that the sequence of sort control statements does
play a role in performance? My searches on Google, IBM-MAIN and DFSORT 
couldn't produce something 'hard and fast'...

My gut feeling is that 

INCLUDE COND=(1,8,CH,EQ,C'ABCDEFGH')
SORT FIELDS=(1,8,CH,A)

is better than this

SORT FIELDS=(1,8,CH,A)
INCLUDE COND=(1,8,CH,EQ,C'ABCDEFGH')

for very LARGE input.

Is this true or false? (If false, I can always do a COPY with INCLUDE and then 
do my SORT FIELDS)

Thank you in advance.

Groete / Greetings
Elardus Engelbrecht
