On Mon, Aug 21, 2017 at 7:01 PM, Justin Cinkelj <[email protected]>
wrote:

> The strtok_r doesn't allow escaping the input parameters, which means
> each space is considered a separator. The boost::tokenizer supports
> escaping. This commit replaces strtok_r with boost::tokenizer. The
> final __argv is still continus in memory, as expected by some programs.
>
> Spaces can be quoted with " or with ', and " or ' can be escaped with \.
>
> (partially) fixes #892
>

What does it not fix?

Wow, I didn't realize how screwed up our boot-time command line parsing
is...

OSv takes a string from the boot image, and then (in parse_cmdline() called
in arch_setup_free_memory())  this is split arbitrarily on on whitespace to
create argc, argv.
Why do we even need this argc/argv? We don't, really.... this is *not* the
command line of the actual application, just of the OSv kernel.
This split-up argc/argv is then used to parse the kernel's command line
options (hence the "--env" bug) and after those options are
parsed, everything else in the command line is glued together again into
one string - and then passed to parse_command_line() which splits it up
again, but using a different mechanism (also support quoting).

What an unholy mess...

A few more comments below.


>
> Signed-off-by: Justin Cinkelj <[email protected]>
> ---
>  core/commands.cc | 52 ++++++++++++++++++++++++++++++
> ++++++++++++----------
>  1 file changed, 42 insertions(+), 10 deletions(-)
>
> diff --git a/core/commands.cc b/core/commands.cc
> index 6287d76..86b25c2 100644
> --- a/core/commands.cc
> +++ b/core/commands.cc
> @@ -19,6 +19,8 @@
>  #include <sys/types.h>
>  #include <sys/stat.h>
>  #include <fcntl.h>
> +#include <string>
> +#include <boost/tokenizer.hpp>
>
>  namespace qi = boost::spirit::qi;
>  namespace ascii = boost::spirit::ascii;
> @@ -320,13 +322,8 @@ std::string getcmdline()
>
>  int parse_cmdline(const char *p)
>  {
> -    char* save;
> -
>      if (args.size()) {
> -        // From the strtok manpage, we see that: "The first call to
> strtok()
> -        // sets this pointer to point to the first byte of the string." It
> -        // follows from this that the first argument contains the address
> we
> -        // should use to free the memory allocated for the string
> +        // free date returend by malloc, and stored into args_data
>          free(args[0]);
>      }
>
> @@ -336,16 +333,51 @@ int parse_cmdline(const char *p)
>      }
>      osv_cmdline = strdup(p);
>
> -    char* cmdline = strdup(p);
> +    std::string escSep("\\"); // escape character
>

above parse_cmdline() there is an old (3 year old) comment saying it cannot
use std::string.
I guess you checked and this is incorrect?

+    std::string delim(" \t\n"); // split on spaces, tab, newline
> +    std::string quote("\"'"); // allow quoted arguments
> +    boost::escaped_list_separator<char> esc(escSep, delim, quote);
> +    typedef boost::tokenizer<boost::escaped_list_separator<char>>
> tokenizer;
> +
> +    // Split cmdline into tokens.
> +    std::string cmdline = p;
> +    tokenizer token{cmdline, esc};
> +    // args2 has same content as __argv, but strings in the later are also
> +    // consecutive in memory.
> +    std::vector<char*> args2;
> +    for (const auto &tt : token) {
> +        // strdup will create copy of temporal variable tt.
> +        args2.push_back(strdup(tt.c_str()));
> +    }
>
> -    while ((p = strtok_r(cmdline, " \t\n", &save)) != nullptr) {
> -        args.push_back(const_cast<char *>(p));
> -        cmdline = nullptr;
> +    // Copy content from args2 to one large array args_data - string with
> multiple '\0' inside.
> +    // args than contains pointers into args_data.
>

I doubt we actually need this step - unless I'm missing something, the
application never sees this split up argv, rather, it is all glued back
together to a single string, and split again by a similarly named but ver
different function parse_command_line().

+    size_t sz = 0;
> +    for (const auto arg: args2) {
> +        sz += strlen(arg) + 1;
> +    }
> +    char* args_data = (char*)malloc(sz);
> +    size_t pos = 0;
> +    for (const auto arg: args2) {
> +        memcpy(args_data + pos, arg, strlen(arg) + 1);
> +        args.push_back(args_data + pos);
> +        pos += strlen(arg) + 1;
> +        free(arg);
>      }
> +    assert(pos == sz);
>      args.push_back(nullptr);
> +
>      __argv = args.data();
>      __argc = args.size() - 1;
>
> +#if 1
>

This should of course go before being committed.

+    int ii = 0;
> +    char** ch;
> +    for (ch = __argv; *ch != nullptr; ch++, ii++) {
> +        printf("DBG av[%d] = '%s'\n", ii, *ch);
> +    }
> +#endif
> +
>      return 0;
>  }
>
> --
> 2.9.4
>
> --
> You received this message because you are subscribed to the Google Groups
> "OSv Development" group.
> To unsubscribe from this group and stop receiving emails from it, send an
> email to [email protected].
> For more options, visit https://groups.google.com/d/optout.
>

-- 
You received this message because you are subscribed to the Google Groups "OSv 
Development" group.
To unsubscribe from this group and stop receiving emails from it, send an email 
to [email protected].
For more options, visit https://groups.google.com/d/optout.

Reply via email to