On 07/23/2017 11:59 PM, Marek Marczykowski-Górecki wrote:
> On Sun, Jul 23, 2017 at 01:29:17PM -0700, Andrew Morgan wrote:
>> On 07/23/2017 12:58 PM, Jean-Philippe Ouellet wrote:
>>> On Sun, Jul 23, 2017 at 2:28 PM, Andrew Morgan 
>>> <anoa-1ww2luigxekrvmdpamj...@public.gmane.org> wrote:
>>>> On 07/20/2017 08:27 AM, Jean-Philippe Ouellet wrote:
>>>>> On Thu, Jul 20, 2017 at 11:22 AM, Jean-Philippe Ouellet 
>>>>> <jpo-pjaqau27...@public.gmane.org> wrote:
>>>>>> On Thu, Jul 20, 2017 at 1:42 AM, Andrew Morgan 
>>>>>> <anoa-1ww2luigxekrvmdpamj...@public.gmane.org> wrote:
>>>>>>> Also did a test with moving in an enormous folder, daemon took up 16%
>>>>>>> CPU for a second in htop then right back to 0%, so seems pretty well
>>>>>>> optimized for now. inotify finds all the files and folders in way until
>>>>>>> a few hundred milli-seconds, so we may need to scale our period for
>>>>>>> calling qvm-file-trust with a list of files down a bit (unless python
>>>>>>> can take in 10K+ full filepaths as arguments).
>>>>>>
>>>>>> During exec(2), the kernel places arguments somewhere at the top of
>>>>>> the stack, along with your environment variables and some other stuff.
>>>>>> Thus, the limit is likely actually some number of total bytes (also
>>>>>> dependent on other things like the total size of your current
>>>>>> environment), rather than the limit being only a fixed number of
>>>>>> arguments. This means you would have to check not just the number of
>>>>>> arguments, but the sum of the lengths of each.
>>>>>>
>>>>>> If you find yourself running into problems with to much data in argv
>>>>>> for a single exec, you may wish to consider letting xargs handle
>>>>>> splitting the paths into an appropriate number of separate execs of
>>>>>> your python script. This is one of the reasons it exists. If you do
>>>>>> this, be sure to split the paths with '\0' and use xargs -0.
>>>>>>
>>>>>> Consider this example:
>>>>>> $ cat argc.c
>>>>>> #include <stdio.h>
>>>>>> int main(int argc) { printf("%d\n", argc); }
>>>>>>
>>>>>> $ make argc
>>>>>> cc     argc.c   -o argc
>>>>>>
>>>>>> $ yes AAAA | head -$((1024*100)) | xargs ./argc
>>>>>> 26214
>>>>>> 26214
>>>>>> 26214
>>>>>> 23762
>>>>>>
>>>>>> $ yes AAAAAAAAAAAA | head -$((1024*100)) | xargs ./argc
>>>>>> 10082
>>>>>> 10082
>>>>>> 10082
>>>>>> 10082
>>>>>> 10082
>>>>>> 10082
>>>>>> 10082
>>>>>> 10082
>>>>>> 10082
>>>>>> 10082
>>>>>> 1591
>>>>>>
>>>>>> You may also wish to set an artificially small max length
>>>>>
>>>>> Either with xargs -s, or in your own script if you don't use xargs.
>>>>> The same concern exists either way.
>>>>>
>>>>> ISTM that being extra cautious at the expense of a few extra execs is
>>>>> a good trade-off. If performance really mattered you wouldn't be
>>>>> execing in the first place.
>>>>>
>>>>>> to guard
>>>>>> against any potential edge cases which xargs itself may have or may
>>>>>> develop in the future which may cause final arguments to get dropped
>>>>>> or truncated, as such bugs may be unlikely to be found and may have
>>>>>> very bad consequences (files not being marked as untrusted).
>>>>>>
>>>>>> Cheers,
>>>>>> Jean-Philippe
>>>>>
>>>>
>>>> So the exec* family of C functions separates char pointers by spaces,
>>>
>>> Err... not sure what you mean by that. Perhaps you are confusing
>>> exec's behavior with echo's?
>>>
>>>> and it doesn't seem to be configurable, thus I may have to keep the
>>>> space separation but escape spaces in the argument list.
>>>>
>>>> user@dev$ echo "hello there" this is a test for many words and xargs in
>>>> one go | xargs -s 24 ./argc
>>>> 5
>>>> 5
>>>> 4
>>>> 3
>>>> 2
>>>> user@dev$ echo "hello\ there" this is a test for many words and xargs in
>>>> one go | xargs -s 24 ./argc
>>>> 4
>>>> 5
>>>> 4
>>>> 3
>>>> 2
>>>>
>>>> I'll note it _only_ works if there is a preceding backslash and the
>>>> words are surrounded by double-quotes.
>>>
>>> That's why I suggested using xargs -0 and splitting filenames with '\0'.
> 
>> Er, sorry. I'm referring to the following lines of C/C++ code I have in
>> qubes-trust-daemon:
> 
>>      // Fork and attempt to call qvm-file-trust
>>      switch (child_pid=fork()) {
>>          case 0:
>>              // We're the child, call qvm-file-trust
>>              execv("/usr/bin/qvm-file-trust", (char **) qvm_argv);
> 
>>              // Unreachable if no error
>>              perror("execl qvm-file-trust");
>>              exit(1);
>> ...
> 
>> Filenames are contained within qvm_argv and are passed to qvm-file-trust
>> as arguments separated by space. 
> 
> And here you are wrong. Arguments are passed exactly as you given them
> in qvm_argv, regardless of spaces inside.
> Also, remember that argv[0] should be a program name...
> 
> 

Yes, a few lines before argv[0] is set to the program name and any
arguments are included before file paths are pumped into the array :)

As for the arguments situation, if I understand correctly execv launches
a program and hands it its *argv array as if taken from stdin. If that
is so, would it then be correct to simply instead use execl in the
following manner?

    execl("/usr/bin/xargs", "-0", "file name 1\0filename 2\0filename3\0")

Essentially just one long string of all arguments, the final string
being built through a loop? That should work if I'm understanding it
correctly (correct me if I'm not), though not sure if an enormous string
of many, many filenames is desirable.

Alternatively you could simply use execv and insert a null character at
the end of each string (presumably there's one already there though).

While toying around with above solutions, I in the mean time built a
simple splitter in C++ that would handle running a separate instance of
qvm-file-trust after a configurable amount of arguments had been passed.
You can find that implementation here:

https://github.com/anoadragon453/qubes-mime-types/blob/master/qubes-trust-daemon.cpp#L115

I'm not entirely sure if that is enough or if a solution such as xargs
that adjusts to system conditions is necessary, but for now I can say
that the splitter is enough to actually now have qubes-trust-daemon
function properly. It now watches folders and subfolders moved into
watched folders, as well as marking any file it comes across as
untrusted along the way.

I also changed the daemon to mark all files in untrusted directories as
untrusted upon first run, to ensure that if the user added a new
untrusted directory while the daemon was offline, then the files within
would still be marked appropriately.

Andrew Morgan

-- 
You received this message because you are subscribed to the Google Groups 
"qubes-devel" group.
To unsubscribe from this group and stop receiving emails from it, send an email 
to qubes-devel+unsubscr...@googlegroups.com.
To post to this group, send email to qubes-devel@googlegroups.com.
To view this discussion on the web visit 
https://groups.google.com/d/msgid/qubes-devel/ol47me%24j3k%241%40blaine.gmane.org.
For more options, visit https://groups.google.com/d/optout.

Attachment: signature.asc
Description: OpenPGP digital signature

Reply via email to