Re: [tup] Execute :-rules in bash with `set -o pipefail`

Erik Fri, 11 Dec 2015 11:32:44 -0800

I probably use Tup a little differently than most people. Instead of using 
it to build software (which I do sometimes), I use it for repeated data 
analysis and graph generation. One of my tup rules looks something like

: input.json |> < %f simulator | jq -f extract_something.jq > %o |> output.json

where the simulator produces a lot of output, but in this case I only care 
about one aspect. Without pipefail, if the simulator fails, jq gets empty 
input, which it accepts without complaint, and the result is an empty output 
file. The next rule that uses output.json then fails because the file is 
empty, even though the real problem was in the previous rule.
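
To make that failure mode concrete, here is a minimal sketch that can be run outside of tup. It assumes `false` as a stand-in for a failing simulator and `cat` as a stand-in for jq; neither is the real tool from my rule.

```shell
#!/bin/sh
# Stand-ins (assumptions): `false` plays the failing simulator, `cat` plays jq.
out=$(mktemp)

# Run the pipeline the way tup does, via `sh -e -c`.
status=0
/bin/sh -e -c 'false | cat > "$1"' sh "$out" || status=$?

# The producer failed, yet the shell reports success and leaves an empty file.
if [ -s "$out" ]; then empty=no; else empty=yes; fi
echo "exit status: $status, output empty: $empty"
rm -f "$out"
```

The shell reports success because only the exit status of the last command in the pipeline is checked.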

In general, I find it hard to believe that there's ever a time, especially 
with tup, when you wouldn't want pipefail to be active, but maybe I'm 
unique in that feeling. One interesting thing that I didn't realize is that 
sh is run with the -e flag, so a failed command fails the rule, but in a 
pipeline only the last command's exit status counts. E.g. currently in tup 
the rule

: |> false; echo foo |>

will fail, but the rule

: |> false | echo foo |>

will succeed.
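
The same behavior can be checked directly from a terminal, without tup. This is only a sketch comparing the three invocations discussed in this thread; it assumes `bash` is on the PATH, and the variable names are mine.

```shell
#!/bin/sh
# Sequential commands: -e aborts at the failing `false`, so the status is non-zero.
seq_status=0
sh -e -c 'false; echo foo' >/dev/null 2>&1 || seq_status=$?

# Pipeline under plain sh: only the last command's status counts, so this is 0.
pipe_status=0
sh -e -c 'false | echo foo' >/dev/null 2>&1 || pipe_status=$?

# Same pipeline under bash with pipefail: the failing `false` fails the pipeline.
pipefail_status=0
bash -e -o pipefail -c 'false | echo foo' >/dev/null 2>&1 || pipefail_status=$?

echo "sequential=$seq_status pipeline=$pipe_status pipefail=$pipefail_status"
```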

Thanks for pointing me to the appropriate place in tup for where commands 
were executed. With it I was able to write an interposer that does what I 
want. Essentially it looks for execle calls that look like '/bin/sh -e -c 
cmd' and turns them into execve calls that look like '/bin/bash -e -o 
pipefail -c cmd'.

I like your suggestion about the ^p^ flag, although it seems a little 
strange that a flag would also change execution to bash. However, it feels 
like a reasonable way to accomplish this without requiring pipefail to be 
active for all :-rules. After looking at the code though, another option 
that seems easy would be to have a tup config option that specifies the 
command that executes all :-rules. This may be somewhat frowned upon due 
to "changing any of these options should not affect the end result of a 
successful build" (*), but having the default:

updater.command = /bin/sh -e -c

be there, but have the ability to change it to

updater.command = /bin/bash -e -o pipefail -c

would be very convenient, and would provide a lot of flexibility with how 
tup is run. E.g. a user could just specify bash to get access to the more 
advanced bash syntax. The downside is that it has the potential to violate 
the rules (*) of a tup config as stated above. A potentially absurd 
example would be setting

updater.command = /usr/bin/env python -c

which would execute all :-rules in python.
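
As a rough illustration of the idea, the sketch below shows how a configured command prefix would be combined with a rule's command string. Everything in it is hypothetical: `updater.command` is only the option proposed in this message, and `run_rule` is an illustrative helper, not anything that exists in tup.

```shell
#!/bin/sh
# Hypothetical: the value a user might put in the proposed updater.command option.
updater_command='/bin/bash -e -o pipefail -c'

# run_rule is an illustrative helper, not a tup function. The prefix is left
# unquoted on purpose so the shell splits it into the program and its flags.
run_rule() {
    $updater_command "$1"
}

rule_status=0
run_rule 'false | echo foo' >/dev/null || rule_status=$?
echo "rule exit status: $rule_status"
```

With the default prefix `/bin/sh -e -c` the same rule would report success, which is exactly the difference I'm after.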

I'm curious about everyone's thoughts about this. I have my hacky solution, 
which means I'm fine for the time being, but I'd be up for submitting a 
pull request of a more reasonable implementation of one of these ideas if 
it seemed worthwhile.

Erik

On Tuesday, December 8, 2015 at 11:37:02 AM UTC-5, [email protected] wrote:
>
> On Mon, Dec 7, 2015 at 4:47 PM, Erik <[email protected]> wrote:
>
>> I generally want my tup :-rules to be run in bash with `set -o pipefail` 
>> enabled. I could add this to every single rule but that's verbose and error 
>> prone. Someone on stackoverflow recommended I try doing this with shared 
>> library interposition, which seemed like it might work. My hypothesis at 
>> the time was that the commands were executed using glibc's `system` call. 
>> That seems to be false, but spending some time perusing the codebase, I 
>> couldn't actually find where the commands were executed. It doesn't help 
>> that I'm not very familiar with fuse or pthread. This leaves me with a few 
>> questions:
>>
>>    - Is shared library interposition a viable route to accomplish this? 
>>    If the commands are executed almost directly by a call to some shared 
>>    library this seems like it could work, wouldn't be too hard, and wouldn't 
>>    be mandatory, or require me (or anyone else) to patch and recompile tup 
>>    source.
>>    - If it is viable can someone point me to the appropriate place to 
>>    look for where to do the intercept? In addition to reading relevant 
>>    pieces of the code (or so I thought), I also tried tracing command 
>>    execution with ltrace, but it seemed to interfere with fuse, and I 
>>    couldn't manage to get fuse running with ltrace active.
>>    - If it's not viable, is there a better method to accomplish this 
>>    that is relatively trivial (on the order of writing a simple interposer)?
>>    - If none of these, is there a potentially better way to improve tup 
>>    to allow this? I'm open to contributing a pull request if there's a 
>>    relevant piece to write. It seems like maybe with the lua api you could 
>>    have a function that arbitrarily modifies every command before it's 
>>    executed, but then you'd have to fetch back to the lua api from main tup 
>>    which seems like a pain and potentially slow. There could be a specific 
>>    flag to do specifically this (e.g. wrap every command in "bash -c 
>>    '<command with escaped apostrophes>'") but that seems a little hard 
>>    coded and less portable. I'm open to other suggestions.
>>
> That sounds like it may be a good thing for tup to support. What kinds of 
> commands are you running that you'd want this set universally?
>
> The code is a little wonky to follow because you can't fork a process from 
> a process running fuse without race conditions. The sub-processes are 
> executed in master_fork.c using execle() - 
> https://github.com/gittup/tup/blob/master/src/tup/server/master_fork.c#L654
>
> Note that tup currently just runs '/bin/sh -e -c [cmd]', and it doesn't 
> look like the stock bourne shell supports pipefail as far as I can tell. So 
> I guess we'd need some way to set a flag or something in the rules, and 
> then run bash if using pipefail?
>
> Also, I'm not sure of the best way to communicate to tup that a certain 
> rule should use pipefail. Currently all of the information about a rule is 
> in the :-rule line itself, so we'd probably want to use a ^-flag or 
> something:
>
> : |> ^p^ cmd1 | cmd2 |>
>
> So it would run 'cmd1 | cmd2' with pipefail enabled. However, since you'd 
> have to add the ^p flag for each rule that uses it, I guess all that really 
> buys you is fewer keystrokes vs specifying bash directly.
>
> Would others find this useful?
>
> -Mike
>

