Re: PIPElines vs. *nix

2018-06-13 Thread Paul Gilmartin
On Wed, 13 Jun 2018 19:00:19 -0400, Hobart Spitz wrote:

>Gil, I know you mean well.
>
>Your first two points are addressed by one answer:  The goal in the test is
>to compare the efficiency of record-oriented piping versus
>character-by-character piping.
> 
Sorry; I missed that emphasis.  COUNT WORDS is probably a fairer comparison
to "wc", since both must inspect every character.

>In the first case, bypassing the *nix piping mechanism defeats the goal of
>the test.
>
But you could amplify the sensitivity with:
cat m170file.data | cat | cat | ... | cat >/dev/null

Or Pipe into HOLE to eliminate the overhead of COUNT or wc.
Also see whether the "-u" option for "cat" makes a difference.
Is there a Pipelines stage comparably neutral to "cat" you could use?
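
A minimal sketch of building such a chain, assuming a POSIX shell;
cat_chain is a hypothetical helper name, not something from the original
test:

```shell
# cat_chain N: copy stdin to stdout through N "cat" stages.
# Hypothetical helper; each extra stage adds one more trip through a pipe.
cat_chain() {
    n=$1
    cmd="cat"
    i=1
    # Append "| cat" until the pipeline holds N stages in total.
    while [ "$i" -lt "$n" ]; do
        cmd="$cmd | cat"
        i=$((i + 1))
    done
    # Run the generated pipeline; it inherits this function's stdin/stdout.
    eval "$cmd"
}
```

For example, "cat_chain 20 < m170file.data >/dev/null" pushes the data
through 20 stages; comparing timings for a few values of N should give a
rough per-stage pipe cost.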

>In the second case, cat can't finish until wc starts reading the last 4k
>(or so) characters at the very end of the data.  With 170 M of data, that
>means that the timing would be off by very much less than 0.01%, which
>greatly exceeds normal variation.  
>
"exceeds"?  "is exceeded by"
I believe the z/OS UNIX kernel buffers are 131k, still an inconsequential
fraction of 170 M.
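
For scale, a quick check of both fractions, assuming 4k means 4096 bytes,
131k means 131072 bytes, and 170 M means 170 * 1048576 bytes (the thread
does not pin down the units); pct_of is a hypothetical helper:

```shell
# pct_of PART TOTAL: print PART as a percentage of TOTAL, four decimals.
# Hypothetical helper; the byte counts passed in are assumptions about units.
pct_of() {
    awk -v p="$1" -v t="$2" 'BEGIN { printf "%.4f\n", 100 * p / t }'
}
```

"pct_of 4096 178257920" gives about 0.0023 percent and
"pct_of 131072 178257920" about 0.0735 percent, so either buffer is a
small fraction of the file.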

>I've heard of PIPElines with 10K-15K (mostly generated) stages.  Depending
>on the topology, such a thing would be impossibly slow with
>character-by-character piping.  And that's assuming you could even do
>deterministic multi-stream piping the way PIPElines does.
>
Deterministic multi-stream is valuable, and upstream propagation of
termination is precious.  Multi-streams are possible in C; impractical
in shell.

>Think of the difference between *nix piping and PIPElines piping this way:
>You have a dinner party with a dozen guests.
>
>Under *nix piping, you pass each and every string bean, corn kernel, and
>pea to the plate of the person next to you, one-at-a-time, and only one
>vegetable can move at a time.  If you even finished such a dinner, you
>would never get anyone to show up at another.  And we haven't even
>considered the cache-flooding implications.
>
>Under PIPElines piping, whole plates of vegetables get passed around in the
>familiar way.  The serving utensil is the "cache loader".  Vegetables that you
>don't want just stay in the serving bowl (slow memory or disk).
> 
I don't think that's fair.  I'd expect a well-designed "cat" to issue a
nonblocking read() into an application buffer, which may be smaller than
the kernel buffer, and the kernel to move the data with MVCL, not an
IC; STC loop.
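
One way to see the gap between the two pictures on a *nix system is to let
dd choose the read() size.  This is only a sketch under assumptions:
count_bytes_blocked is a made-up name, and it relies on dd issuing one
read()/write() pair per block, so bs=1 approximates the one-vegetable-at-
a-time case while a larger bs approximates a buffered "cat".

```shell
# count_bytes_blocked FILE BS: push FILE through a pipe in BS-byte blocks
# and count the bytes that arrive.  Hypothetical helper for timing only.
count_bytes_blocked() {
    # dd reads and writes BS bytes per call; its statistics go to stderr.
    # tr strips the leading blanks some wc implementations print.
    dd if="$1" bs="$2" 2>/dev/null | wc -c | tr -d ' '
}
```

Timing "count_bytes_blocked m170file.data 1" against
"count_bytes_blocked m170file.data 65536" should show how much of the cost
comes from the transfer size rather than from piping as such.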

-- gil

--
For IBM-MAIN subscribe / signoff / archive access instructions,
send email to lists...@listserv.ua.edu with the message: INFO IBM-MAIN


Re: PIPElines vs. *nix

2018-06-13 Thread Hobart Spitz
Gil, I know you mean well.

Your first two points are addressed by one answer:  The goal in the test is
to compare the efficiency of record-oriented piping versus
character-by-character piping.

In the first case, bypassing the *nix piping mechanism defeats the goal of
the test.

In the second case, cat can't finish until wc starts reading the last 4k
(or so) characters at the very end of the data.  With 170 M of data, that
means that the timing would be off by very much less than 0.01%, which
greatly exceeds normal variation.  Further, the timings were reported to me
in more detail than I believe time reports.  Therefore, it was probably not
the time output that gave 75 secs.  I just chose to use the only total
time that was reported to me.

Finally, I got confused and assumed that the original PIPElines case was
abbreviated or there was a typo.  I incorrectly added the $ cms and
quotes.  That is entirely on me.  It should have just read:

PIPE < /home/../m170file.data | count bytes | cons


I've heard of PIPElines with 10K-15K (mostly generated) stages.  Depending
on the topology, such a thing would be impossibly slow with
character-by-character piping.  And that's assuming you could even do
deterministic multi-stream piping the way PIPElines does.

Think of the difference between *nix piping and PIPElines piping this way:
You have a dinner party with a dozen guests.

Under *nix piping, you pass each and every string bean, corn kernel, and
pea to the plate of the person next to you, one-at-a-time, and only one
vegetable can move at a time.  If you even finished such a dinner, you
would never get anyone to show up at another.  And we haven't even
considered the cache-flooding implications.

Under PIPElines piping, whole plates of vegetables get passed around in the
familiar way.  The serving utensil is the "cache loader".  Vegetables that you
don't want just stay in the serving bowl (slow memory or disk).

Bon appetit.  :-)

I hope this helps.

OREXXMan
JCL is the buggy whip of 21st century computing.  Stabilize it.
Put Pipelines in the z/OS base.  Would you rather process data one
character at a time (Unix/C style), or one record at a time?
IBM has been looking for an HLL for program products; REXX is that language.

On Wed, Jun 13, 2018 at 10:07 AM, Paul Gilmartin <
000433f07816-dmarc-requ...@listserv.ua.edu> wrote:

> On 2018-06-13, at 07:18:40, Hobart Spitz wrote:
>
> > Cross posted to CMSTSO Pipelines and IBM-MAIN
> >
> > Someone shared with me a performance comparison between Pipelines vs.
> > native *nix commands, both on OPENVM.
> >
> > Under the OPENVM shell, this command ran 75 secs. with a 170M file in
> BFS:
> >
> > $ time cat m170file.data | wc -b
> >
> The "cat" is superfluous.  Why not just:
> > $ time   wc -b  m170file.data
>
> In fact, you were timing "cat", not "wc".
>
> > This command, also under OPENVM shell with the same file, ran 9 secs.:
> >
> > $ cms 'PIPE < /home/../m170file.data | count bytes | cons'
> >
> Won't Pipelines respect a path relative to current working directory?
> If not, shame on Pipelines.
>
> -- gil



Re: PIPElines vs. *nix

2018-06-13 Thread Paul Gilmartin
On 2018-06-13, at 07:18:40, Hobart Spitz wrote:

> Cross posted to CMSTSO Pipelines and IBM-MAIN
> 
> Someone shared with me a performance comparison between Pipelines vs.
> native *nix commands, both on OPENVM.
> 
> Under the OPENVM shell, this command ran 75 secs. with a 170M file in BFS:
> 
> $ time cat m170file.data | wc -b
>  
The "cat" is superfluous.  Why not just:
> $ time   wc -b  m170file.data

In fact, you were timing "cat", not "wc".

> This command, also under OPENVM shell with the same file, ran 9 secs.:
> 
> $ cms 'PIPE < /home/../m170file.data | count bytes | cons'
>  
Won't Pipelines respect a path relative to current working directory?
If not, shame on Pipelines.

-- gil



PIPElines vs. *nix

2018-06-13 Thread Hobart Spitz
Cross posted to CMSTSO Pipelines and IBM-MAIN

Someone shared with me a performance comparison between Pipelines vs.
native *nix commands, both on OPENVM.

Under the OPENVM shell, this command ran 75 secs. with a 170M file in BFS:

$ time cat m170file.data | wc -b


This command, also under OPENVM shell with the same file, ran 9 secs.:

$ cms 'PIPE < /home/../m170file.data | count bytes | cons'

Unfortunately, the person who sent this to me wasn't in a position to spend
any more time or resources on this, so I invite anyone inclined to run a
similar comparison and post the results.

You may need something like this to avoid an abend depending on your system:

$ cms 'pipe filedesc 0 | count bytes | cons' < m170.data


Under OPENMVS, e.g., try something like:

$ tso 'PIPE < /home/../m170file.data | count bytes | cons'

(Caution, I have not used OPENMVS/USS, so the syntax could be wrong.)

MVSers who don't have PIPElines, and VMers who want to, can try comparing
*nix equivalents to REXX using LINEIN() if you have it, or EXECIO * if you
don't.  This will tell us if Pipelines' design is a bigger contributor to
efficiency, or if it is the superiority of record-orientation vs.
character-orientation.
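
A rough *nix counterpart of a REXX LINEIN() loop is a shell read loop,
which handles one record per iteration.  A minimal sketch, assuming a
POSIX shell; count_lines_loop is a hypothetical name:

```shell
# count_lines_loop FILE: count records in FILE one line at a time.
# Hypothetical helper, loosely analogous to looping over LINEIN() in REXX.
count_lines_loop() {
    n=0
    # The "|| [ -n ... ]" clause also counts a final line with no newline.
    while IFS= read -r line || [ -n "$line" ]; do
        n=$((n + 1))
    done < "$1"
    echo "$n"
}
```

Timing this against "wc -l" on the same file gives one more data point,
though shell loop overhead will likely dominate the result.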

I recommend using RITA to get stage level statistics.  I suspect that
scanning for CR/LF is more expensive than counting bytes in PIPElines,
while the cost might be similar in *nix.

Some variations you may want to try:

   - Count lines and/or words.
   - Try a different mainframe *nix version.
   - Add more filters.
   - Add filters that drop and/or add records.
   - Add some filters that change records.
   - Use file(s) already in the CMS/MVS file systems.  The