-----BEGIN PGP SIGNED MESSAGE-----
Hash: SHA256

On Sun, Jul 23, 2017 at 11:28:52AM -0700, Andrew Morgan wrote:
> On 07/20/2017 08:27 AM, Jean-Philippe Ouellet wrote:
> > On Thu, Jul 20, 2017 at 11:22 AM, Jean-Philippe Ouellet 
> > <[email protected]> wrote:
> >> On Thu, Jul 20, 2017 at 1:42 AM, Andrew Morgan 
> >> <[email protected]> wrote:
> >>> Also did a test with moving in an enormous folder, daemon took up 16%
> >>> CPU for a second in htop then right back to 0%, so seems pretty well
> >>> optimized for now. inotify finds all the files and folders in well under
> >>> a few hundred milliseconds, so we may need to scale our period for
> >>> calling qvm-file-trust with a list of files down a bit (unless python
> >>> can take in 10K+ full filepaths as arguments).
> >>
> >> During exec(2), the kernel places arguments somewhere at the top of
> >> the stack, along with your environment variables and some other stuff.
> >> Thus, the limit is likely actually some number of total bytes (also
> >> dependent on other things like the total size of your current
> >> environment), rather than the limit being only a fixed number of
> >> arguments. This means you would have to check not just the number of
> >> arguments, but the sum of the lengths of each.
> >>
> >> If you find yourself running into problems with too much data in argv
> >> for a single exec, you may wish to consider letting xargs handle
> >> splitting the paths into an appropriate number of separate execs of
> >> your python script. This is one of the reasons it exists. If you do
> >> this, be sure to split the paths with '\0' and use xargs -0.
> >>
> >> Consider this example:
> >> $ cat argc.c
> >> #include <stdio.h>
> >> int main(int argc, char **argv) { (void)argv; printf("%d\n", argc); return 0; }
> >>
> >> $ make argc
> >> cc     argc.c   -o argc
> >>
> >> $ yes AAAA | head -$((1024*100)) | xargs ./argc
> >> 26214
> >> 26214
> >> 26214
> >> 23762
> >>
> >> $ yes AAAAAAAAAAAA | head -$((1024*100)) | xargs ./argc
> >> 10082
> >> 10082
> >> 10082
> >> 10082
> >> 10082
> >> 10082
> >> 10082
> >> 10082
> >> 10082
> >> 10082
> >> 1591
> >>
> >> You may also wish to set an artificially small max length
> > 
> > Either with xargs -s, or in your own script if you don't use xargs.
> > The same concern exists either way.
> > 
> > ISTM that being extra cautious at the expense of a few extra execs is
> > a good trade-off. If performance really mattered you wouldn't be
> > execing in the first place.
> > 
> >> to guard
> >> against any potential edge cases which xargs itself may have or may
> >> develop in the future which may cause final arguments to get dropped
> >> or truncated, as such bugs may be unlikely to be found and may have
> >> very bad consequences (files not being marked as untrusted).
> >>
> >> Cheers,
> >> Jean-Philippe
> > 
> 
> So the exec* family of C functions separates char pointers by spaces,

No, you're observing "xargs" behaviour.

> and it doesn't seem to be configurable, thus I may have to keep the
> space separation but escape spaces in the argument list.
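
To illustrate that exec itself does no splitting: each element of the argument array reaches the child intact, spaces included. A minimal sketch in Python, where the inline child program and the helper name count_args are just for illustration:

```python
import subprocess
import sys

# Child program: print how many arguments it received
# (like ./argc above, but not counting argv[0]).
CHILD = "import sys; print(len(sys.argv) - 1)"


def count_args(args):
    """Run the child with the given argument list and return the count it reports."""
    result = subprocess.run(
        [sys.executable, "-c", CHILD, *args],
        check=True,
        capture_output=True,
        text=True,
    )
    return int(result.stdout)


if __name__ == "__main__":
    # No shell and no xargs in between: "hello there" stays one argument.
    print(count_args(["hello there", "this", "is"]))
```

The list is handed to exec as-is, so no escaping of spaces is needed on this path. Splitting on whitespace only happens when something like xargs (without -0) or a shell parses a flat string.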



> user@dev$ echo "hello there" this is a test for many words and xargs in one go | xargs -s 24 ./argc
> 5
> 5
> 4
> 3
> 2
> user@dev$ echo "hello\ there" this is a test for many words and xargs in one go | xargs -s 24 ./argc
> 4
> 5
> 4
> 3
> 2
> 
> I'll note it _only_ works if there is a preceding backslash and the
> words are surrounded by double-quotes.

As Jean-Philippe pointed out before, use \0 to separate arguments
(-0, --null), _instead_ of spaces:

$ echo -e '"hello there"\0this\0is\0a\0test\0for\0many\0words\0and\0xargs\0in\0one\0go' | xargs -0s 26 ./argc
3
6
3
5

$ echo -e 'hello there\0this\0is\0a\0test\0for\0many\0words\0and\0xargs\0in\0one\0go' | xargs -0s 26 ./argc
3
6
3
5
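
If the daemon ends up doing the splitting itself instead of going through xargs, the same kind of byte-based cap can be applied in Python. This is only a rough sketch: batch_paths and max_bytes are made-up names, and the accounting (encoded length plus one terminating NUL per path) is a simplification that does not reproduce what xargs -s counts. In practice max_bytes should sit well below the real limit (on Linux, os.sysconf("SC_ARG_MAX") reports it), since the environment also takes up space.

```python
import os


def batch_paths(paths, max_bytes):
    """Split paths into batches whose encoded size stays within max_bytes.

    Each path costs its encoded length plus one byte for the terminating NUL.
    """
    batches, current, used = [], [], 0
    for path in paths:
        cost = len(os.fsencode(path)) + 1
        if cost > max_bytes:
            # A single path that can never fit must not be silently dropped.
            raise ValueError("path too long for one batch: %r" % (path,))
        if current and used + cost > max_bytes:
            # Current batch is full: close it and start a new one.
            batches.append(current)
            current, used = [], 0
        current.append(path)
        used += cost
    if current:
        batches.append(current)
    return batches
```

Each returned batch can then be passed to a single exec of the script. Raising on an oversized path rather than skipping it is deliberate, given that a dropped file here means a file not marked as untrusted.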


> Again I'm not entirely sure if a workaround for the large amount of
> arguments I'm handing python is needed, but one strong benefit of using
> xargs (or a similar method) is the ability to split up the list and
> parallel calls. Since the script simply marks each file as untrusted
> when called from the daemon, it should be fine to parallelize.

I wouldn't worry about parallel execution now. Let's get it working
first. But yes, using xargs makes that easy too.

> Another blocker is that the script currently calls sys.exit on an error.
> This behavior is undesirable because, depending on which file errors out,
> none of the subsequent files will be marked. In this case, is it best to
> catch the error, store the error number, then return it at the end of
> processing all files, regardless of what it may be?

Probably, something like:

    retcode = 0
    for filename in ...:
        try:
            do something
        except ...:
            ... log exception
            retcode = 1

    sys.exit(retcode)
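
Spelled out a bit more, under some assumptions: process_all and the action callback are hypothetical names standing in for whatever the script really does per file, and OSError is only a guess at which exception the real code would raise.

```python
import logging


def process_all(paths, action):
    """Apply action to every path; return 0 if all succeeded, 1 otherwise."""
    retcode = 0
    for path in paths:
        try:
            action(path)
        except OSError:
            # Log and carry on, so one bad file does not stop the rest
            # from being processed.
            logging.exception("failed to process %s", path)
            retcode = 1
    return retcode
```

The script's entry point would then finish with sys.exit(process_all(filenames, some_action)), so the caller still sees a non-zero status if anything failed.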


> As long as that number is only overridden by erratic behavior, then any
> script calling it should detect the non-zero return code and act
> accordingly. The only issue would be if the calling script attempted to
> act based on the specific error number which may be overridden during
> execution with a later error.

It should be easy to define exit codes, like "exit code X means _any_ of
the files is Y". Anyway, IMO it's fine to allow multiple filenames only
when changing attributes.
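
For the exit code idea, a possible shape, purely as an illustration: the constant names, their numeric values and the is_untrusted callback are all invented here, not something the script defines today.

```python
# Hypothetical exit codes; higher values take precedence.
EXIT_ALL_TRUSTED = 0
EXIT_ANY_UNTRUSTED = 1
EXIT_ERROR = 2


def trust_exit_code(paths, is_untrusted):
    """Return a single exit code summarising the whole list of paths."""
    code = EXIT_ALL_TRUSTED
    for path in paths:
        try:
            if is_untrusted(path):
                code = max(code, EXIT_ANY_UNTRUSTED)
        except OSError:
            # An error on any file outranks the other outcomes.
            code = max(code, EXIT_ERROR)
    return code
```

Because a code is only ever raised and never lowered, a later file cannot overwrite a more severe result from an earlier one.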

- -- 
Best Regards,
Marek Marczykowski-Górecki
Invisible Things Lab
A: Because it messes up the order in which people normally read text.
Q: Why is top-posting such a bad thing?
-----BEGIN PGP SIGNATURE-----
Version: GnuPG v2

iQEcBAEBCAAGBQJZdQNlAAoJENuP0xzK19cssDYH/2Zt/ib1hN7koqbHn63SNJmL
RWXD+Ozmv7gX1IIB89ENuBz43fy2QiyHxZCHBehhsLmN8hZEshZL1LQ2+5/MCo4F
/I9e3nfekz+q3aH+Fz88TBwWuXuXP9VNNr8LYOkXOhLRdrgvpsr1oUReeIdvNtid
5aCnCS8rE40dAhvG/zjUdhV4k8csP7TevxquuFCLbHsM4znj1cGovDP/CcEPaB47
Jt4SnhsOa0m58huqsNGyLFQXSRsD5NkICUKKpzy5RPB8cZgdG47CtozXzgenbSlM
N4NOiPuq9BXUlyl7chPdHw/Ve84K5dPpSMboYsPdLbnlUyrzQDDwe0nBXsk02uk=
=ie4u
-----END PGP SIGNATURE-----
