On Mon, Aug 21, 2017 at 7:01 PM, Justin Cinkelj <[email protected]> wrote:
> The strtok_r doesn't allow escaping the input parameters, which means > each space is considered a separator. The boost::tokenizer supports > escaping. This commit replaces strtok_r with boost::tokenizer. The > final __argv is still continus in memory, as expected by some programs. > > Spaces can be quoted with " or with ', and " or ' can be escaped with \. > > (partially) fixes #892 > What does it not fix? Wow, I didn't realize how screwed up our boot-time command line parsing is... OSv takes a string from the boot image, and then (in parse_cmdline() called in arch_setup_free_memory()) this is split arbitrarily on on whitespace to create argc, argv. Why do we even need this argc/argv? We don't, really.... this is *not* the command line of the actual application, just of the OSv kernel. This split-up argc/argv is then used to parse the kernel's command line options (hence the "--env" bug) and after those options are parsed, everything else in the command line is glued together again into one string - and then passed to parse_command_line() which splits it up again, but using a different mechanism (also support quoting). What an unholy mess... A few more comments below. > > Signed-off-by: Justin Cinkelj <[email protected]> > --- > core/commands.cc | 52 ++++++++++++++++++++++++++++++ > ++++++++++++---------- > 1 file changed, 42 insertions(+), 10 deletions(-) > > diff --git a/core/commands.cc b/core/commands.cc > index 6287d76..86b25c2 100644 > --- a/core/commands.cc > +++ b/core/commands.cc > @@ -19,6 +19,8 @@ > #include <sys/types.h> > #include <sys/stat.h> > #include <fcntl.h> > +#include <string> > +#include <boost/tokenizer.hpp> > > namespace qi = boost::spirit::qi; > namespace ascii = boost::spirit::ascii; > @@ -320,13 +322,8 @@ std::string getcmdline() > > int parse_cmdline(const char *p) > { > - char* save; > - > if (args.size()) { > - // From the strtok manpage, we see that: "The first call to > strtok() > - // sets this pointer to point to the first byte of the string." It > - // follows from this that the first argument contains the address > we > - // should use to free the memory allocated for the string > + // free date returend by malloc, and stored into args_data > free(args[0]); > } > > @@ -336,16 +333,51 @@ int parse_cmdline(const char *p) > } > osv_cmdline = strdup(p); > > - char* cmdline = strdup(p); > + std::string escSep("\\"); // escape character > above parse_cmdline() there is an old (3 year old) comment saying it cannot use std::string. I guess you checked and this is incorrect? + std::string delim(" \t\n"); // split on spaces, tab, newline > + std::string quote("\"'"); // allow quoted arguments > + boost::escaped_list_separator<char> esc(escSep, delim, quote); > + typedef boost::tokenizer<boost::escaped_list_separator<char>> > tokenizer; > + > + // Split cmdline into tokens. > + std::string cmdline = p; > + tokenizer token{cmdline, esc}; > + // args2 has same content as __argv, but strings in the later are also > + // consecutive in memory. > + std::vector<char*> args2; > + for (const auto &tt : token) { > + // strdup will create copy of temporal variable tt. > + args2.push_back(strdup(tt.c_str())); > + } > > - while ((p = strtok_r(cmdline, " \t\n", &save)) != nullptr) { > - args.push_back(const_cast<char *>(p)); > - cmdline = nullptr; > + // Copy content from args2 to one large array args_data - string with > multiple '\0' inside. > + // args than contains pointers into args_data. > I doubt we actually need this step - unless I'm missing something, the application never sees this split up argv, rather, it is all glued back together to a single string, and split again by a similarly named but ver different function parse_command_line(). + size_t sz = 0; > + for (const auto arg: args2) { > + sz += strlen(arg) + 1; > + } > + char* args_data = (char*)malloc(sz); > + size_t pos = 0; > + for (const auto arg: args2) { > + memcpy(args_data + pos, arg, strlen(arg) + 1); > + args.push_back(args_data + pos); > + pos += strlen(arg) + 1; > + free(arg); > } > + assert(pos == sz); > args.push_back(nullptr); > + > __argv = args.data(); > __argc = args.size() - 1; > > +#if 1 > This should of course go before being committed. + int ii = 0; > + char** ch; > + for (ch = __argv; *ch != nullptr; ch++, ii++) { > + printf("DBG av[%d] = '%s'\n", ii, *ch); > + } > +#endif > + > return 0; > } > > -- > 2.9.4 > > -- > You received this message because you are subscribed to the Google Groups > "OSv Development" group. > To unsubscribe from this group and stop receiving emails from it, send an > email to [email protected]. > For more options, visit https://groups.google.com/d/optout. > -- You received this message because you are subscribed to the Google Groups "OSv Development" group. To unsubscribe from this group and stop receiving emails from it, send an email to [email protected]. For more options, visit https://groups.google.com/d/optout.
