Ole,

It'd be great if parallel either used bash or emulated it by supporting <(...) 
and >(...) on all platforms.  Emulation might also let parallel clean up the 
named pipes it creates at the end of a run, which bash is shy about.  Start 
thinking of parallel as a shell of sorts.  I am guessing the relevant bash and 
ksh93 features could be dynamically linked into anything.  One thing that ksh 
did, but bash didn't, is make <(...) and >(...) a 'word', so it has implicit 
whitespace around it; you cannot concatenate it into a sed script string after 
a 'w' command without passing it through a subroutine or the like to strip the 
whitespace.


$ echo X | sed 'w '>(sed 's/^/b/'>&2)'
s/^/a/' ; sleep 1
aX
bX
$


With <(...) and >(...) shell scripting, you can create a complex tree of 
pipeline-parallel processing with no temp files and minimal latency.  As core 
counts blossom, we need such strategies to turn that resource into fast, 
low-latency processing.
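A minimal sketch of such a tree (my own illustration, not from the original mail): two branches are sorted concurrently inside process substitutions, and `sort -m` merges the presorted streams in a single pass, with no temp files and each branch streaming.

```shell
#!/bin/bash
# Two independent branches run in parallel via <(...); 'sort -m'
# merges their already-sorted output in one streaming pass.
sort -m <(printf 'b\na\n' | sort) <(printf 'd\nc\n' | sort)
# prints a, b, c, d on separate lines
```

Each `<(...)` becomes a /dev/fd/N (or named-pipe) path, so `sort -m` sees ordinary file-name arguments while the data never touches disk.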

On nice UNIXes, one can give '/dev/stdin' as the input file name, so you can 
present the standard input as a file without an extra 'cat'.  I recall fixing 
a 32-bit app, running under sh, that read a file which grew past 4 GB: I let 
the (large-file-ready) shell open it with < and told the app to read 
/dev/stdin.  I am not sure whether parallel could emulate this somehow on the 
other OSes.  Ditto for stdout and stderr.  (This raises the question of 
managing precise, time-annotated stderr and stdout logs that keep each 
parallel run separate, or log lines together.  Alas, too many apps think 
stdout is good for logging, while others treat it as a data stream and keep 
logging on stderr.  Occasionally I use stderr for data, on OSes and shells 
without better ways to have a second output stream.)
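To make the /dev/stdin trick concrete (my own example; `wc -c` stands in for the hypothetical app that insists on a file-name argument), the shell opens the file and the app just reads "/dev/stdin":

```shell
#!/bin/bash
# The shell does the open with <, so the app never needs large-file
# support of its own; it reads whatever is on fd 0 via /dev/stdin.
tmp=$(mktemp)
printf 'hello\n' > "$tmp"
wc -c < "$tmp"               # plain redirection: bytes only
wc -c /dev/stdin < "$tmp"    # same bytes, but through a "file name"
rm -f "$tmp"
```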

Of course, some apps do seeks, so you need to make a temp file to satisfy such 
apps.  The temp file could be made auto-deleting: once it is already opened by 
the shell or by parallel, delete it and pass it as /dev/fd/#.  I guess if your 
OS has no /dev/fd/ or the like, you need a more complex temp-file deletion 
strategy; not all /tmp directories are cleaned periodically or at reboot.  Can 
someone write a buffered/recording pipe that accepts seeks (data on a 64-bit 
heap or in tmpfile())?
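The auto-delete idea can be sketched like this (assuming a /dev/fd-capable OS such as Linux or the BSDs; elsewhere you need the more complex cleanup strategy mentioned above):

```shell
#!/bin/bash
# Open the temp file on fd 3, unlink it immediately, then hand the
# app /dev/fd/3. The name is gone at once, and the data disappears
# when fd 3 closes, however the run ends.
tmp=$(mktemp)
printf 'seekable data\n' > "$tmp"
exec 3< "$tmp"
rm -f "$tmp"          # name removed; data survives via the open fd
cat /dev/fd/3         # a seeking app could lseek(2) on this path
exec 3<&-
```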

Best,

David

 

-----Original Message-----
From: parallel-request <parallel-requ...@gnu.org>
To: parallel <parallel@gnu.org>
Sent: Sun, Mar 23, 2014 12:00 pm
Subject: Parallel Digest, Vol 47, Issue 8


Send Parallel mailing list submissions to
        parallel@gnu.org

To subscribe or unsubscribe via the World Wide Web, visit
        https://lists.gnu.org/mailman/listinfo/parallel
or, via email, send a message with subject or body 'help' to
        parallel-requ...@gnu.org

You can reach the person managing the list at
        parallel-ow...@gnu.org

When replying, please edit your Subject line so it is more specific
than "Re: Contents of Parallel digest..."


Today's Topics:

   1. --cat and --fifo (Ole Tange)
   2. Re: Recommendations for getting Parallel-like ::: behavior
      using Bash (Rhys Ulerich)


----------------------------------------------------------------------

Message: 1
Date: Sun, 23 Mar 2014 01:41:33 +0100
From: Ole Tange <ta...@gnu.org>
To: "parallel@gnu.org" <parallel@gnu.org>
Subject: --cat and --fifo
Message-ID:
        <CA+4vN7wve3XrsfdbDaPGwpuJU-D+0q6vqHAst1R6LQ=yjr0...@mail.gmail.com>
Content-Type: text/plain; charset=ISO-8859-1

Sometimes I meet commands that cannot read from stdin, but only from a
file or a fifo. You may be lucky that you can do:

    parallel --pipe 'command <(cat)'

But you may have to wrap that kind of command to make it work with
'parallel --pipe':

    parallel --pipe 'cat > {#}; command {#}; _EXIT=$?; rm {#}; exit $_EXIT'
    parallel --pipe 'mkfifo {#}; (command {#}) & _PID=$!; cat > {#};
wait $_PID;  _EXIT=$?; rm {#}; exit $_EXIT'

Not really elegant, and if the file {#} already exists, it will be
overwritten. So I have implemented --cat and --fifo:

    parallel --pipe --cat 'command {}'
    parallel --pipe --fifo 'command {}'

They do the same as above, except the filename is a tmpfile, so the
chance of overwriting is 0 if run locally, and close to 0 if run
remotely.

--cat and --fifo do not make sense without --pipe, and I am thinking
that I could probably autodetect this: if the command contains {}, it
means '--pipe --cat'. But it might surprise the user that including {}
in the command runs slower (as the cat will first save stdin to a
tmpfile).

--cat and --fifo could also just imply --pipe.

What do you think? How would you like them to work? Do you have more
describing names?

Test --cat and --fifo by:

    git clone git://git.savannah.gnu.org/parallel.git


/Ole



------------------------------

Message: 2
Date: Sat, 22 Mar 2014 23:59:10 -0500
From: Rhys Ulerich <rhys.uler...@gmail.com>
To: GNU Parallel <parallel@gnu.org>
Subject: Re: Recommendations for getting Parallel-like ::: behavior
        using Bash
Message-ID:
        <CAKDqugTBdCFeU0e+kQ=RxQ=3uuy_d2k6nvx9w-4bb5h-1_d...@mail.gmail.com>
Content-Type: text/plain; charset=ISO-8859-1

> I like how GNU Parallel does its ::: magic ...
>
> Has anyone implemented something similar in pure bash?

My quick version of :::-like processing for bash looks like the following:

declare -a cmd
while [ $# -gt 0 ]; do
    [ "$1" = ":::" ] && break
    cmd+=("$1")
    shift
done
if [ "$1" = ":::" ]; then
    while shift && [ $# -gt 0 ]; do
        echo "${cmd[@]}" "$1"
    done
else
    while read -r line; do
        echo "${cmd[@]}" "$line"
    done
fi
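Wrapped in a function (the name `expand3` is invented here, purely for illustration), the same logic can be exercised inline:

```shell
#!/bin/bash
# Same logic as the snippet above, as a callable function.
expand3() {
    declare -a cmd
    # Collect the command words up to the first ':::'.
    while [ $# -gt 0 ]; do
        [ "$1" = ":::" ] && break
        cmd+=("$1")
        shift
    done
    if [ "$1" = ":::" ]; then
        # One output line per argument after ':::'.
        while shift && [ $# -gt 0 ]; do
            echo "${cmd[@]}" "$1"
        done
    else
        # No ':::' given: read arguments from stdin instead.
        while read -r line; do
            echo "${cmd[@]}" "$line"
        done
    fi
}

expand3 echo hi ::: a b
# prints:
# echo hi a
# echo hi b
```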

This breaks on multiple ::: and totally ignores ::::.

An "...Only experts do this on purpose...." homage might go in that
final else clause.

- Rhys



------------------------------

_______________________________________________
Parallel mailing list
Parallel@gnu.org
https://lists.gnu.org/mailman/listinfo/parallel


End of Parallel Digest, Vol 47, Issue 8
***************************************

 
