> What does it not fix?
The $892 mentions environ varibales with spaces or empty one. And command line parsing is done in loader.cc to get loader options, and in commands.cc, to get argv/argc for app, and to implement runscript. This one only touched loader.cc options, and ensured that commands.cc can get argv[i] with spaces. It didn't fix runcript. Also, when commands.cc re-assembles and re-parses cmdline, the initial "aa bb" argv[i] get incorrectly split on space.

> everything else in the command line is glued together again into one string
In https://github.com/cloudius-systems/osv/blob/master/loader.cc#L319
    // concatenate everything
    for (auto i = 0; i < ac; i++) {
        line += std::string(av[i]) + " ";
   }
It doesn't expect av[i] to contain space. Also " or ' or \ inside would not work, I guess.

> above parse_cmdline() there is an old (3 year old) comment saying it cannot use std::string.
> I guess you checked and this is incorrect?
I have not checked. I just blindly tried to use some existing parser, and not invent/write my own.
So this code might be broken and problematic.
PS:
I also used not only debug_early, but also printf, and even std::cout, and it worked - for me. On the other side, Gregor complained in https://github.com/cloudius-systems/osv/issues/892#issuecomment-323910628 about std::cout not working (inside parse_cmdline, I think). So it might depend on gcc/boost/something version.

> Why do we even need this argc/argv? We don't, really....
And neither we don't really need this patch ? ;) I vote to abandon it.

I was looking if there is way to get from boost::tokenizer both token and its start/end position in original string. Then, loader would consume first few tokes (the loader options, they all start with '-') from argv array. The remaining part would be consumed by commands.cc - end position of last loader option would be passed as commandline to it.

--env=AAA="aa bb"
is problematic case here, as it contains spaces.
So we need to support quoting and spaces and escaping.
Nothing from C++/std/boost can be used.
strtok from C doesn't support quoting/escaping.
Are you aware of any C alternative to strtok with quoting/escaping support? Just asking before trying to do it myself.

Justin

On 08/22/2017 01:09 PM, Nadav Har'El wrote:

On Mon, Aug 21, 2017 at 7:01 PM, Justin Cinkelj <[email protected] <mailto:[email protected]>> wrote:

    The strtok_r doesn't allow escaping the input parameters, which means
    each space is considered a separator. The boost::tokenizer supports
    escaping. This commit replaces strtok_r with boost::tokenizer. The
    final __argv is still continus in memory, as expected by some
    programs.

    Spaces can be quoted with " or with ', and " or ' can be escaped
    with \.

    (partially) fixes #892


What does it not fix?

Wow, I didn't realize how screwed up our boot-time command line parsing is...

OSv takes a string from the boot image, and then (in parse_cmdline() called in arch_setup_free_memory()) this is split arbitrarily on on whitespace to create argc, argv. Why do we even need this argc/argv? We don't, really.... this is *not* the command line of the actual application, just of the OSv kernel. This split-up argc/argv is then used to parse the kernel's command line options (hence the "--env" bug) and after those options are parsed, everything else in the command line is glued together again into one string - and then passed to parse_command_line() which splits it up again, but using a different mechanism (also support quoting).

What an unholy mess...

A few more comments below.


    Signed-off-by: Justin Cinkelj <[email protected]
    <mailto:[email protected]>>
    ---
     core/commands.cc | 52
    ++++++++++++++++++++++++++++++++++++++++++----------
     1 file changed, 42 insertions(+), 10 deletions(-)

    diff --git a/core/commands.cc b/core/commands.cc
    index 6287d76..86b25c2 100644
    --- a/core/commands.cc
    +++ b/core/commands.cc
    @@ -19,6 +19,8 @@
     #include <sys/types.h>
     #include <sys/stat.h>
     #include <fcntl.h>
    +#include <string>
    +#include <boost/tokenizer.hpp>

     namespace qi = boost::spirit::qi;
     namespace ascii = boost::spirit::ascii;
    @@ -320,13 +322,8 @@ std::string getcmdline()

     int parse_cmdline(const char *p)
     {
    -    char* save;
    -
         if (args.size()) {
    -        // From the strtok manpage, we see that: "The first call
    to strtok()
    -        // sets this pointer to point to the first byte of the
    string." It
    -        // follows from this that the first argument contains the
    address we
    -        // should use to free the memory allocated for the string
    +        // free date returend by malloc, and stored into args_data
             free(args[0]);
         }

    @@ -336,16 +333,51 @@ int parse_cmdline(const char *p)
         }
         osv_cmdline = strdup(p);

    -    char* cmdline = strdup(p);
    +    std::string escSep("\\"); // escape character


above parse_cmdline() there is an old (3 year old) comment saying it cannot use std::string.
I guess you checked and this is incorrect?

    +    std::string delim(" \t\n"); // split on spaces, tab, newline
    +    std::string quote("\"'"); // allow quoted arguments
    +    boost::escaped_list_separator<char> esc(escSep, delim, quote);
    +    typedef boost::tokenizer<boost::escaped_list_separator<char>>
    tokenizer;
    +
    +    // Split cmdline into tokens.
    +    std::string cmdline = p;
    +    tokenizer token{cmdline, esc};
    +    // args2 has same content as __argv, but strings in the later
    are also
    +    // consecutive in memory.
    +    std::vector<char*> args2;
    +    for (const auto &tt : token) {
    +        // strdup will create copy of temporal variable tt.
    +        args2.push_back(strdup(tt.c_str()));
    +    }

    -    while ((p = strtok_r(cmdline, " \t\n", &save)) != nullptr) {
    -        args.push_back(const_cast<char *>(p));
    -        cmdline = nullptr;
    +    // Copy content from args2 to one large array args_data -
    string with multiple '\0' inside.
    +    // args than contains pointers into args_data.


I doubt we actually need this step - unless I'm missing something, the application never sees this split up argv, rather, it is all glued back together to a single string, and split again by a similarly named but ver different function parse_command_line().

    +    size_t sz = 0;
    +    for (const auto arg: args2) {
    +        sz += strlen(arg) + 1;
    +    }
    +    char* args_data = (char*)malloc(sz);
    +    size_t pos = 0;
    +    for (const auto arg: args2) {
    +        memcpy(args_data + pos, arg, strlen(arg) + 1);
    +        args.push_back(args_data + pos);
    +        pos += strlen(arg) + 1;
    +        free(arg);
         }
    +    assert(pos == sz);
         args.push_back(nullptr);
    +
         __argv = args.data();
         __argc = args.size() - 1;

    +#if 1


This should of course go before being committed.

    +    int ii = 0;
    +    char** ch;
    +    for (ch = __argv; *ch != nullptr; ch++, ii++) {
    +        printf("DBG av[%d] = '%s'\n", ii, *ch);
    +    }
    +#endif
    +
         return 0;
     }

    --
    2.9.4

    --
    You received this message because you are subscribed to the Google
    Groups "OSv Development" group.
    To unsubscribe from this group and stop receiving emails from it,
    send an email to [email protected]
    <mailto:osv-dev%[email protected]>.
    For more options, visit https://groups.google.com/d/optout
    <https://groups.google.com/d/optout>.



--
You received this message because you are subscribed to the Google Groups "OSv 
Development" group.
To unsubscribe from this group and stop receiving emails from it, send an email 
to [email protected].
For more options, visit https://groups.google.com/d/optout.

Reply via email to