On Sun, Jul 23, 2017 at 2:28 PM, Andrew Morgan <[email protected]> wrote:
> On 07/20/2017 08:27 AM, Jean-Philippe Ouellet wrote:
>> On Thu, Jul 20, 2017 at 11:22 AM, Jean-Philippe Ouellet 
>> <[email protected]> wrote:
>>> On Thu, Jul 20, 2017 at 1:42 AM, Andrew Morgan 
>>> <[email protected]> wrote:
>>>> Also did a test with moving in an enormous folder, daemon took up 16%
>>>> CPU for a second in htop then right back to 0%, so seems pretty well
>>>> optimized for now. inotify finds all the files and folders in way under
>>>> a few hundred milliseconds, so we may need to scale our period for
>>>> calling qvm-file-trust with a list of files down a bit (unless python
>>>> can take in 10K+ full filepaths as arguments).
>>>
>>> During exec(2), the kernel places arguments somewhere at the top of
>>> the stack, along with your environment variables and some other stuff.
>>> Thus, the limit is likely actually some number of total bytes (also
>>> dependent on other things like the total size of your current
>>> environment), rather than the limit being only a fixed number of
>>> arguments. This means you would have to check not just the number of
>>> arguments, but the sum of the lengths of each.
>>>
>>> If you find yourself running into problems with too much data in argv
>>> for a single exec, you may wish to consider letting xargs handle
>>> splitting the paths into an appropriate number of separate execs of
>>> your python script. This is one of the reasons it exists. If you do
>>> this, be sure to split the paths with '\0' and use xargs -0.
>>>
>>> Consider this example:
>>> $ cat argc.c
>>> #include <stdio.h>
>>> int main(int argc, char *argv[]) { printf("%d\n", argc); return 0; }
>>>
>>> $ make argc
>>> cc     argc.c   -o argc
>>>
>>> $ yes AAAA | head -$((1024*100)) | xargs ./argc
>>> 26214
>>> 26214
>>> 26214
>>> 23762
>>>
>>> $ yes AAAAAAAAAAAA | head -$((1024*100)) | xargs ./argc
>>> 10082
>>> 10082
>>> 10082
>>> 10082
>>> 10082
>>> 10082
>>> 10082
>>> 10082
>>> 10082
>>> 10082
>>> 1591
>>>
>>> You may also wish to set an artificially small max length
>>
>> Either with xargs -s, or in your own script if you don't use xargs.
>> The same concern exists either way.
>>
>> ISTM that being extra cautious at the expense of a few extra execs is
>> a good trade-off. If performance really mattered you wouldn't be
>> execing in the first place.
>>
>>> to guard
>>> against any potential edge cases which xargs itself may have or may
>>> develop in the future which may cause final arguments to get dropped
>>> or truncated, as such bugs may be unlikely to be found and may have
>>> very bad consequences (files not being marked as untrusted).
>>>
>>> Cheers,
>>> Jean-Philippe
>>
>
> So the exec* family of C functions separates char pointers by spaces,

Err... not sure what you mean by that. Perhaps you are confusing
exec's behavior with echo's?
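
To illustrate (a minimal Python sketch, not anything taken from your
script): exec receives an array of separate strings, and nothing on the
way to the new process splits or joins them on spaces. The child program
and the helper name child_argv below are made up for this example.

```python
import json
import subprocess
import sys

# Hypothetical child program: prints its own arguments (minus argv[0]) as JSON.
CHILD = "import json, sys; print(json.dumps(sys.argv[1:]))"


def child_argv(args):
    """Run a child process with `args` and return the arguments it saw."""
    # Passing a list means no shell is involved: each element becomes exactly
    # one argv entry of the new process, whatever characters it contains.
    out = subprocess.run(
        [sys.executable, "-c", CHILD, *args],
        check=True,
        capture_output=True,
        text=True,
    ).stdout
    return json.loads(out)


if __name__ == "__main__":
    print(child_argv(["hello there", "this", "is a test"]))
```

The child sees three arguments with their spaces intact. Word splitting
only enters the picture when a shell, or xargs in its default mode,
parses a line of text into arguments.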

> and it doesn't seem to be configurable, thus I may have to keep the
> space separation but escape spaces in the argument list.
>
> user@dev$ echo "hello there" this is a test for many words and xargs in one go | xargs -s 24 ./argc
> 5
> 5
> 4
> 3
> 2
> user@dev$ echo "hello\ there" this is a test for many words and xargs in one go | xargs -s 24 ./argc
> 4
> 5
> 4
> 3
> 2
>
> I'll note it _only_ works if there is a preceding backslash and the
> words are surrounded by double-quotes.

That's why I suggested using xargs -0 and splitting filenames with '\0'.

Consider the following:

$ cat dumpargs.c
#include <stdio.h>

int
main(int argc, char *argv[])
{
    int i;
    printf("%d args:\n", argc);
    for (i = 0; i < argc; i++)
        printf("\targv[%d]: %s\n", i, argv[i]);
    return 0;
}

$ make dumpargs
cc     dumpargs.c   -o dumpargs

$ split0() { for x in "$@"; do printf "%s\0" "$x"; done }

$ split0 "hello there" this is a test for many words in xargs in one go | hexdump -C
00000000  68 65 6c 6c 6f 20 74 68  65 72 65 00 74 68 69 73  |hello there.this|
00000010  00 69 73 00 61 00 74 65  73 74 00 66 6f 72 00 6d  |.is.a.test.for.m|
00000020  61 6e 79 00 77 6f 72 64  73 00 69 6e 00 78 61 72  |any.words.in.xar|
00000030  67 73 00 69 6e 00 6f 6e  65 00 67 6f 00           |gs.in.one.go.|
0000003d

$ split0 "hello there" this is a test for many words in xargs in one go | xargs -0 -s24 ./dumpargs
2 args:
    argv[0]: ./dumpargs
    argv[1]: hello there
4 args:
    argv[0]: ./dumpargs
    argv[1]: this
    argv[2]: is
    argv[3]: a
3 args:
    argv[0]: ./dumpargs
    argv[1]: test
    argv[2]: for
3 args:
    argv[0]: ./dumpargs
    argv[1]: many
    argv[2]: words
4 args:
    argv[0]: ./dumpargs
    argv[1]: in
    argv[2]: xargs
    argv[3]: in
3 args:
    argv[0]: ./dumpargs
    argv[1]: one
    argv[2]: go
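
On the producing side, a rough Python sketch of what the split0 shell
function above does might look like the following. The function names
encode_paths and decode_paths are hypothetical, and I am assuming the
daemon already has its paths in a list.

```python
import os


def encode_paths(paths):
    """Serialize paths as NUL-terminated byte strings (what xargs -0 reads)."""
    out = bytearray()
    for path in paths:
        raw = os.fsencode(path)
        if b"\x00" in raw:
            # A real path cannot contain NUL; guard against bad input anyway.
            raise ValueError("path contains a NUL byte")
        out += raw + b"\x00"
    return bytes(out)


def decode_paths(data):
    """Inverse of encode_paths: split on NUL and drop the final empty piece."""
    pieces = data.split(b"\x00")
    if pieces and pieces[-1] == b"":
        pieces.pop()
    return [os.fsdecode(piece) for piece in pieces]
```

The bytes returned by encode_paths could be written to the standard input
of xargs -0. Since NUL is the one byte that cannot appear in a path, no
escaping is needed, and spaces, quotes, and newlines all pass through
unchanged.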

Trying to guarantee different parsers escape things & interpret
escapes the same way is a battle best avoided entirely.
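
Going back to the earlier point about capping the total argument size in
your own script instead of using xargs -s: a minimal sketch of batching
by bytes rather than by count could look like this. The name
batch_by_bytes and the max_bytes parameter are made up for illustration,
and the accounting is deliberately simplified (it ignores the
environment and per-argument pointer overhead), so the budget should be
set well below the real limit.

```python
import os


def batch_by_bytes(paths, max_bytes):
    """Yield lists of paths whose encoded size stays within max_bytes each.

    Every path is charged its encoded length plus one byte for the
    terminating NUL it occupies in the new process's argument area.
    """
    batch, used = [], 0
    for path in paths:
        cost = len(os.fsencode(path)) + 1
        if cost > max_bytes:
            # Fail loudly rather than silently dropping a file.
            raise ValueError(f"single path exceeds budget: {path!r}")
        if batch and used + cost > max_bytes:
            yield batch
            batch, used = [], 0
        batch.append(path)
        used += cost
    if batch:
        yield batch
```

Each yielded batch would then go to one exec of the script. On Linux the
system-wide limit can be queried with os.sysconf("SC_ARG_MAX"), but as
noted above I would pick a much smaller number and accept a few extra
execs.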
