Re: Proposal: Checked subprocess calls

2016-11-06 Thread Chris Angelico
On Mon, Nov 7, 2016 at 7:15 AM, Peter Bortas wrote:
> Chris: What you have seems generally useful, but it lies in a namespace
> that will get a bit busy if we implement all the special cases as we
> think of them. I have a similar function, not checked in, that would
> confuse users if we both committed. Not least because I find
> exceptions useless for most smaller scripts unless I just plan on not
> catching anything. Which means my scripts are full of very similar
> code, but where the functions return 0 on failure instead of throwing,
> and dump the failure pretty-printed to the console. Something I've also
> been planning to add to Process for a while, but I have not come up with
> a set of functions that doesn't make it confusing for users to choose
> among all the stuff.

Since this is of interest, I've rebased the branch onto current 8.1,
so we have a clean starting point for discussion.

> As I see it there are a few things that should happen in regards to
> external process spawning:
>
> 1. Can we come up with an almost as easy to use API as Process.run
> that plays better with memory and latency?

My first thought along those lines is to have something that returns a
pipe or buffer for stdout. Maybe a specially-enhanced one that also
deals with the return value?

Ideally, it should have a single return value that can be used
directly for the most obvious usage, which is receiving stdout.
Receiving stderr and/or the return value would ideally be possible,
but if that isn't possible, I don't mind a convenience function that
you have to set aside when you want more control.

More of a brainstorm or stream-of-consciousness than an actual theory,
but that's my thinking.

ChrisA


Re: Proposal: Checked subprocess calls

2016-11-06 Thread Peter Bortas
We discussed this a bit during the Pike Conference. These are my thoughts on it:

On Mon, Oct 17, 2016 at 10:07 PM, Stephen R. van den Berg wrote:
> Chris Angelico wrote:
>>> // This leaves stdin and stdout and stderr unaltered
>>> Process.pipe.run("fgrep -e test").run("sort").run("wc");
>
>>If Pike were a shell language, this would make sense. But I would much
>
> It would make sense, for any programming language, not only for
> shell languages.

The pipe syntax seems interesting, though it probably has some corner
cases where you could create objects in which several parts of the
pipe try to run at the same time.

>>prefer this:
>
>>sizeof(Process.check_output("fgrep -e test") / "\n");

That's one of those places where we would benefit from an easier to
use interface for Process.Process. Process.run sucks up memory for
problems that don't really need it. The output should preferably be
streamed and/or auto-iterated, more in the manner of:

count( "\n", Process.check_output("fgrep -e test") );

where this in effect becomes a line or character iterator and only
keeps at most one line in memory at a time.
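
A minimal sketch of that streaming idea, written in Python only because
the thread cites Python's subprocess module as prior art. The name
count_lines is hypothetical, and nothing here is Pike's API. It iterates
over the child's stdout instead of buffering the whole output, so only
the current line (plus a small read buffer) is held at any time:

```python
import subprocess
import sys


def count_lines(cmd):
    """Count the lines a command prints without buffering its whole output.

    Hypothetical helper, for illustration only.
    """
    count = 0
    # Popen as a context manager closes the pipe and waits for the child.
    with subprocess.Popen(cmd, stdout=subprocess.PIPE) as proc:
        for _line in proc.stdout:  # yields one line at a time
            count += 1
    # Same zero == success convention as the checked calls discussed here.
    if proc.returncode != 0:
        raise subprocess.CalledProcessError(proc.returncode, cmd)
    return count


# Stand-in child process, so the sketch does not depend on fgrep.
demo_cmd = [sys.executable, "-c", "print('a'); print('b'); print('c')"]
print(count_lines(demo_cmd))
```

The exit code is only known once the stream has been drained, so a
failure is reported after the lines have already been counted.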

>>Pike isn't primarily about invoking subprocesses; it has a rich set of
>>text processing primitives built-in, so trying to make subprocess
>>chaining smoother is usually a waste of effort.
>
> If it would be a *lot* of effort, I'd agree.  But I'm guessing that
> it actually is easier to implement than you might expect; and then it
> makes for a very clutterfree and straightforward way to start one
> or more (piped) processes.

The pipe syntax and Chris's easy-to-understand convenience function
seem orthogonal, with Chris's API being the easier to grasp. So I'd be
glad to see versions of both eventually go in. I'm not overly enthused
about continuing the tradition I introduced with Process.run of just
buffering everything in memory if it can be avoided, though.

Chris: What you have seems generally useful, but it lies in a namespace
that will get a bit busy if we implement all the special cases as we
think of them. I have a similar function, not checked in, that would
confuse users if we both committed. Not least because I find
exceptions useless for most smaller scripts unless I just plan on not
catching anything. Which means my scripts are full of very similar
code, but where the functions return 0 on failure instead of throwing,
and dump the failure pretty-printed to the console. Something I've also
been planning to add to Process for a while, but I have not come up with
a set of functions that doesn't make it confusing for users to choose
among all the stuff.

As I see it there are a few things that should happen in regards to
external process spawning:

1. Can we come up with an almost as easy to use API as Process.run
that plays better with memory and latency?

The stupidest version of that would be a call where you specify the
number of bytes or lines to read, and which then hands back the
result together with a function to call for the next portion. That's
real stupid, but still preferable to playing with Process() directly
and risking a lock-up because you didn't handle your pipes perfectly.

2. Make more convenience-APIs

I count both Stephen's pipes and Chris's easy-call APIs under this.
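
The "stupidest version" described under (1) could look roughly like
this. It is a sketch in Python (the prior art this thread refers to),
not a proposed Pike signature. The names read_chunked and next_portion
are made up for illustration. Each call hands back one portion of
stdout together with a function that fetches the next portion, and that
function is None once the output is exhausted:

```python
import subprocess
import sys


def read_chunked(cmd, nbytes):
    """Start cmd and return (first_portion, next_fn).

    Hypothetical API, for illustration only. next_fn() returns the next
    (portion, next_fn) pair; next_fn is None when there is nothing left.
    """
    proc = subprocess.Popen(cmd, stdout=subprocess.PIPE)

    def next_portion():
        chunk = proc.stdout.read(nbytes)  # blocks until nbytes or EOF
        if chunk:
            return chunk, next_portion
        # EOF: clean up and apply the zero == success convention.
        proc.stdout.close()
        if proc.wait() != 0:
            raise subprocess.CalledProcessError(proc.returncode, cmd)
        return b"", None

    return next_portion()


# Stand-in child process that writes ten bytes and exits 0.
demo_cmd = [sys.executable, "-c", "import sys; sys.stdout.write('abcdefghij')"]

chunks = []
chunk, more = read_chunked(demo_cmd, 4)
while more is not None:
    chunks.append(chunk)
    chunk, more = more()
print(chunks)
```

The caller never touches the pipe itself, which is the point: the
lock-up risk of handling pipes by hand stays inside the helper.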


(1) has to be thought about before we do (2), because depending on
the solution for (1) we could otherwise end up with duplicate or
triplicate APIs for the same thing. I promise to pour some real energy
into thinking about this in a few weeks when I have time again, but
meanwhile, everyone, please jump in with your ideas about what
problems you are trying to solve and how you imagine them solved.

Regards,
-- 
Peter Bortas, NSC


Re: Proposal: Checked subprocess calls

2016-10-17 Thread Stephen R. van den Berg
Chris Angelico wrote:
>> // This leaves stdin and stdout and stderr unaltered
>> Process.pipe.run("fgrep -e test").run("sort").run("wc");

>If Pike were a shell language, this would make sense. But I would much

It would make sense, for any programming language, not only for
shell languages.

>prefer this:

>sizeof(Process.check_output("fgrep -e test") / "\n");

>Pike isn't primarily about invoking subprocesses; it has a rich set of
>text processing primitives built-in, so trying to make subprocess
>chaining smoother is usually a waste of effort.

If it would be a *lot* of effort, I'd agree.  But I'm guessing that
it actually is easier to implement than you might expect; and then it
makes for a very clutterfree and straightforward way to start one
or more (piped) processes.
-- 
Stephen.


Re: Proposal: Checked subprocess calls

2016-10-16 Thread Chris Angelico
On Sun, Oct 16, 2016 at 8:12 PM, Stephen R. van den Berg wrote:
> Well, ok, fair enough.  But then, try to improve on the interface and
> preferably make it work like this:
> (unless there already is an easy Pike-API for this, I'm not intimately
> familiar with the Process-group)
>
> // This leaves stdin and stdout and stderr unaltered
> Process.pipe.run("fgrep -e test").run("sort").run("wc");
>

If Pike were a shell language, this would make sense. But I would much
prefer this:

sizeof(Process.check_output("fgrep -e test") / "\n");

Pike isn't primarily about invoking subprocesses; it has a rich set of
text processing primitives built-in, so trying to make subprocess
chaining smoother is usually a waste of effort.

ChrisA


Re: Proposal: Checked subprocess calls

2016-10-16 Thread Stephen R. van den Berg
Chris Angelico wrote:
>There's a strong convention that zero == success, nonzero == failure,
>so this would apply as-is to a lot of programs. Obviously this
>shouldn't be the one and only way to run a subprocess (this is NOT a
>proposed change to Process.run, it's a separate function), so if you
>want to invoke grep(1) and accept 0 (lines found) and 1 (no lines
>found) but not 2 (error), then you'd make an app-specific function;
>but 0 vs other is common enough that I think this fits the standard
>library.

>Example of prior art: Python's subprocess.check_call and check_output
>functions raise an exception on non-zero return value:

Well, ok, fair enough.  But then, try to improve on the interface and
preferably make it work like this:
(unless there already is an easy Pike-API for this, I'm not intimately
familiar with the Process-group)

// This leaves stdin and stdout and stderr unaltered
Process.pipe.run("fgrep -e test").run("sort").run("wc");

string s = "Your.input.text";

// This leaves stdout and stderr unaltered
Process.pipe.stdin(s).run("fgrep -e test").run("sort").run("wc");

Stdio.File in = Stdio.File("foo.bar.file");
Stdio.File out = Stdio.FakeFile();

// This leaves stderr unaltered
Process.pipe.stdin(in).run("fgrep -e test").run("sort")
  .run("wc").stdout(out);
string output = out->read();

Stdio.File in = Stdio.File("foo.bar.file");
Stdio.File out = Stdio.FakeFile();
Stdio.File devnull = Stdio.File("/dev/null","w");

// This leaves stderr unaltered, except for the "sort" run, we silence it there
Process.pipe.stdin(in).run("fgrep -e test").stderr(devnull).run("sort")
  .run("wc").stdout(out);
string output = out->read();

Stdio.File in = Stdio.File("foo.bar.file");
Stdio.File out = Stdio.FakeFile();
Stdio.File devnull = Stdio.File("/dev/null","w");
Stdio.File sortout = Stdio.FakeFile();

// This leaves stderr unaltered, except for the "sort" run, we silence it there
// The stdout of sort is copied to both the sortout file, and to the wc
// process (compare "man 1 tee")
Process.pipe.stdin(in).run("fgrep -e test").stderr(devnull).run("sort")
  .tee(sortout).run("wc").stdout(out);
string sortoutput = sortout->read();
string output = out->read();


In all this, yes, throw exceptions if any of the processes returns a
non-zero exit code.
In essence the above interface would allow you to run arbitrarily complex
(shell-like) pipes, basically supporting everything bash does too.
Maybe the only thing missing here would be the ability to ignore certain
signals per remainder of the process train, e.g.:

Process.pipe.stdin(in).blocksignal(SIGHUP).run("fgrep -e test")
 .stderr(devnull).run("sort").tee(sortout)
 .unblocksignal(SIGHUP).run("wc").stdout(out);

Which would tie SIGHUP to SIG_IGN for fgrep and sort, and allow it through
again for wc.

As an extra convenience function, I could imagine this:

string output = Process.pipe.stdin(in).run("fgrep -e test").run("sort")
  .run("wc").stdoutstring;

Which would generate that output string you are after in one-go.
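
For comparison, a rough Python sketch of the simplest form of such a
chain: each stage's stdout feeds the next stage's stdin, the last
stage's output is returned as one string of bytes, and any non-zero
exit code raises. The function run_pipeline is hypothetical. It does
not model the stdin(), stderr(), tee() or signal modifiers above, and it
is not the proposed Process.pipe API:

```python
import subprocess
import sys


def run_pipeline(commands):
    """Run commands as a pipe chain and return the last stage's stdout.

    Hypothetical helper, for illustration only. The first stage inherits
    the caller's stdin; stderr is left unaltered for every stage.
    """
    if not commands:
        raise ValueError("need at least one command")
    procs = []
    upstream = None  # None means: inherit our own stdin
    for cmd in commands:
        proc = subprocess.Popen(cmd, stdin=upstream, stdout=subprocess.PIPE)
        if procs:
            # Drop our copy of the previous read end; the child has its own.
            procs[-1].stdout.close()
        procs.append(proc)
        upstream = proc.stdout
    output = procs[-1].stdout.read()
    procs[-1].stdout.close()
    # Throw if any stage failed, as suggested above.
    for cmd, proc in zip(commands, procs):
        if proc.wait() != 0:
            raise subprocess.CalledProcessError(proc.returncode, cmd)
    return output


# Stand-in stages so the sketch does not depend on fgrep, sort or wc.
emit = [sys.executable, "-c", "print('b'); print('a'); print('c')"]
sort = [sys.executable, "-c",
        "import sys; sys.stdout.writelines(sorted(sys.stdin))"]
count = [sys.executable, "-c",
         "import sys; print(len(sys.stdin.readlines()))"]

print(run_pipeline([emit, sort, count]).strip())
```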
-- 
Stephen.


Re: Proposal: Checked subprocess calls

2016-10-15 Thread Chris Angelico
On Sat, Oct 15, 2016 at 8:12 PM, Stephen R. van den Berg wrote:
> Chris Angelico wrote:
>>Two features added to the Process module. Firstly, a simple wrapper
>>Process.check_run that calls Process.run and throws an error if the
>>exit code isn't 0;
>
> This is a bit overkill, I'd say.  It's not generic enough to put in the
> lib.  What if you want to check for a certain range of exitcodes instead?
> If you want this, I'd say you inherit the class and add your own convenience
> functions to it, but at application level.

There's a strong convention that zero == success, nonzero == failure,
so this would apply as-is to a lot of programs. Obviously this
shouldn't be the one and only way to run a subprocess (this is NOT a
proposed change to Process.run, it's a separate function), so if you
want to invoke grep(1) and accept 0 (lines found) and 1 (no lines
found) but not 2 (error), then you'd make an app-specific function;
but 0 vs other is common enough that I think this fits the standard
library.
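
Such an app-specific function is short. Here is a sketch in Python, the
language of the prior art, with a made-up name, run_accepting; it is not
part of the proposed Pike change. It takes the set of exit codes to
treat as success, so a grep-style caller would pass (0, 1):

```python
import subprocess
import sys


def run_accepting(cmd, ok_codes=(0,)):
    """Run cmd, capture stdout, and raise unless the exit code is in ok_codes.

    Hypothetical app-level helper, for illustration only.
    """
    result = subprocess.run(cmd, stdout=subprocess.PIPE, text=True)
    if result.returncode not in ok_codes:
        raise subprocess.CalledProcessError(
            result.returncode, cmd, output=result.stdout)
    return result


# Stand-in for "grep found no lines": prints nothing and exits 1.
no_match = [sys.executable, "-c", "import sys; sys.exit(1)"]
result = run_accepting(no_match, ok_codes=(0, 1))
print(result.returncode)
```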

Example of prior art: Python's subprocess.check_call and check_output
functions raise an exception on non-zero return value:

https://docs.python.org/3/library/subprocess.html#subprocess.CalledProcessError
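
A short demonstration of that Python behaviour, for reference. The
wrapper name checked_output is only for this example; the calls
themselves are standard subprocess functions:

```python
import subprocess
import sys


def checked_output(cmd):
    """Return cmd's stdout as text; raise CalledProcessError on non-zero exit."""
    return subprocess.check_output(cmd, text=True)


# A child that exits 0: its output is returned.
ok_cmd = [sys.executable, "-c", "print('hello')"]
print(checked_output(ok_cmd), end="")

# A child that exits 3: check_output raises instead of returning.
bad_cmd = [sys.executable, "-c", "import sys; sys.exit(3)"]
try:
    checked_output(bad_cmd)
except subprocess.CalledProcessError as exc:
    print("failed with exit code", exc.returncode)
```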

>> and secondly, a means for Process.run() to leave
>>stderr attached to the console. The intention is for this to be used
>
> That seems useful, so I'd welcome that.
> But then support it for stdout too.
> Then again, maybe this is better off generalised, into requiring/allowing
> you to specify things like:
>
> ({"stdout":Stdio.stdout, "stderr":Stdio.stderr})
>
> instead of the magic "-".
>
> This would support adding different objects too, in order to redirect
> directly into a file or some other pipe.

If you don't want to redirect either, or if you want to redirect them
both to files, use Process.Process or Process.create_process directly.
The advantage of Process.run is that it captures the output. I suppose
you might conceivably want to capture stderr but leave stdout attached
to the console, but it's a lot less common than "run program, give it
input, retrieve output, but if it displays an error, let that be
seen". I can't think of any use-cases for the converse. Process.run()
is great, but there are a lot of times when I have to either replicate
half of its code, or use run() and risk squashing an unexpected error.
With check_run, I'd be able to have virtually the same API, but with
the declaration that a program error is an exception.

Consider a simple way to get audio file information:

string info = Process.run(({"soxi", "audio_01.wav"}))->stdout;
//proceed to parse the given info, eg:
sscanf(info, "%*[\n]%{%s: %s\n%}", array lines);
mapping fields = (mapping)lines;

If soxi is not available, this will raise an immediate exception,
rather than charging on blindly; but if audio_01.wav isn't found,
there's no indication of the actual problem - you just don't get any
useful output. Using check_run causes an instant failure, saying that
the process exited 1; and keeping stderr attached to the console would
let the user see the message from soxi:

string info = Process.check_run(({"soxi", "audio_01.wav"}),
(["stderr": "-"]))->stdout;

In fact, this usage could itself be wrapped up another level, if
desired. I've tossed another commit onto the branch to add a
check_output function, but this one is less significant (it's just a
one-liner). With check_output, omitting the modifiers mapping gives
the natural and obvious behaviour of the above line of code.
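
The Python prior art behaves the same way by default, which may help
picture the intent: stdout is captured, stderr is not redirected and so
stays attached to the console, and a non-zero exit raises. This is a
sketch with a hypothetical wrapper name, checked_stdout. It is not the
Pike code on the branch:

```python
import subprocess
import sys


def checked_stdout(cmd):
    """Capture stdout, leave stderr on the console, raise on non-zero exit.

    Hypothetical helper, for illustration only.
    """
    # stderr is not passed, so the child inherits ours and the user sees it.
    result = subprocess.run(cmd, stdout=subprocess.PIPE, text=True, check=True)
    return result.stdout


# Stand-in child: a warning on stderr, the real data on stdout.
demo_cmd = [sys.executable, "-c",
            "import sys; sys.stderr.write('warning\\n'); print('data')"]
print(checked_stdout(demo_cmd), end="")
```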

ChrisA


Re: Proposal: Checked subprocess calls

2016-10-15 Thread Stephen R. van den Berg
Chris Angelico wrote:
>Two features added to the Process module. Firstly, a simple wrapper
>Process.check_run that calls Process.run and throws an error if the
>exit code isn't 0;

This is a bit overkill, I'd say.  It's not generic enough to put in the
lib.  What if you want to check for a certain range of exitcodes instead?
If you want this, I'd say you inherit the class and add your own convenience
functions to it, but at application level.

> and secondly, a means for Process.run() to leave
>stderr attached to the console. The intention is for this to be used

That seems useful, so I'd welcome that.
But then support it for stdout too.
Then again, maybe this is better off generalised, into requiring/allowing
you to specify things like:

({"stdout":Stdio.stdout, "stderr":Stdio.stderr})

instead of the magic "-".

This would support adding different objects too, in order to redirect
directly into a file or some other pipe.
-- 
Stephen.