> What does it not fix?
The $892 mentions environ varibales with spaces or empty one. And
command line parsing is done in loader.cc to get loader options, and in
commands.cc, to get argv/argc for app, and to implement runscript. This
one only touched loader.cc options, and ensured that commands.cc can get
argv[i] with spaces. It didn't fix runcript. Also, when commands.cc
re-assembles and re-parses cmdline, the initial "aa bb" argv[i] get
incorrectly split on space.
> everything else in the command line is glued together again into one
string
In https://github.com/cloudius-systems/osv/blob/master/loader.cc#L319
// concatenate everything
for (auto i = 0; i < ac; i++) {
line += std::string(av[i]) + " ";
}
It doesn't expect av[i] to contain space. Also " or ' or \ inside would
not work, I guess.
> above parse_cmdline() there is an old (3 year old) comment saying it
cannot use std::string.
> I guess you checked and this is incorrect?
I have not checked. I just blindly tried to use some existing parser,
and not invent/write my own.
So this code might be broken and problematic.
PS:
I also used not only debug_early, but also printf, and even std::cout,
and it worked - for me.
On the other side, Gregor complained in
https://github.com/cloudius-systems/osv/issues/892#issuecomment-323910628
about
std::cout not working (inside parse_cmdline, I think). So it might
depend on gcc/boost/something version.
> Why do we even need this argc/argv? We don't, really....
And neither we don't really need this patch ? ;) I vote to abandon it.
I was looking if there is way to get from boost::tokenizer both token
and its start/end position in original string.
Then, loader would consume first few tokes (the loader options, they all
start with '-') from argv array.
The remaining part would be consumed by commands.cc - end position of
last loader option would be passed as commandline to it.
--env=AAA="aa bb"
is problematic case here, as it contains spaces.
So we need to support quoting and spaces and escaping.
Nothing from C++/std/boost can be used.
strtok from C doesn't support quoting/escaping.
Are you aware of any C alternative to strtok with quoting/escaping
support? Just asking before trying to do it myself.
Justin
On 08/22/2017 01:09 PM, Nadav Har'El wrote:
On Mon, Aug 21, 2017 at 7:01 PM, Justin Cinkelj
<[email protected] <mailto:[email protected]>> wrote:
The strtok_r doesn't allow escaping the input parameters, which means
each space is considered a separator. The boost::tokenizer supports
escaping. This commit replaces strtok_r with boost::tokenizer. The
final __argv is still continus in memory, as expected by some
programs.
Spaces can be quoted with " or with ', and " or ' can be escaped
with \.
(partially) fixes #892
What does it not fix?
Wow, I didn't realize how screwed up our boot-time command line
parsing is...
OSv takes a string from the boot image, and then (in parse_cmdline()
called in arch_setup_free_memory()) this is split arbitrarily on on
whitespace to create argc, argv.
Why do we even need this argc/argv? We don't, really.... this is *not*
the command line of the actual application, just of the OSv kernel.
This split-up argc/argv is then used to parse the kernel's command
line options (hence the "--env" bug) and after those options are
parsed, everything else in the command line is glued together again
into one string - and then passed to parse_command_line() which splits
it up again, but using a different mechanism (also support quoting).
What an unholy mess...
A few more comments below.
Signed-off-by: Justin Cinkelj <[email protected]
<mailto:[email protected]>>
---
core/commands.cc | 52
++++++++++++++++++++++++++++++++++++++++++----------
1 file changed, 42 insertions(+), 10 deletions(-)
diff --git a/core/commands.cc b/core/commands.cc
index 6287d76..86b25c2 100644
--- a/core/commands.cc
+++ b/core/commands.cc
@@ -19,6 +19,8 @@
#include <sys/types.h>
#include <sys/stat.h>
#include <fcntl.h>
+#include <string>
+#include <boost/tokenizer.hpp>
namespace qi = boost::spirit::qi;
namespace ascii = boost::spirit::ascii;
@@ -320,13 +322,8 @@ std::string getcmdline()
int parse_cmdline(const char *p)
{
- char* save;
-
if (args.size()) {
- // From the strtok manpage, we see that: "The first call
to strtok()
- // sets this pointer to point to the first byte of the
string." It
- // follows from this that the first argument contains the
address we
- // should use to free the memory allocated for the string
+ // free date returend by malloc, and stored into args_data
free(args[0]);
}
@@ -336,16 +333,51 @@ int parse_cmdline(const char *p)
}
osv_cmdline = strdup(p);
- char* cmdline = strdup(p);
+ std::string escSep("\\"); // escape character
above parse_cmdline() there is an old (3 year old) comment saying it
cannot use std::string.
I guess you checked and this is incorrect?
+ std::string delim(" \t\n"); // split on spaces, tab, newline
+ std::string quote("\"'"); // allow quoted arguments
+ boost::escaped_list_separator<char> esc(escSep, delim, quote);
+ typedef boost::tokenizer<boost::escaped_list_separator<char>>
tokenizer;
+
+ // Split cmdline into tokens.
+ std::string cmdline = p;
+ tokenizer token{cmdline, esc};
+ // args2 has same content as __argv, but strings in the later
are also
+ // consecutive in memory.
+ std::vector<char*> args2;
+ for (const auto &tt : token) {
+ // strdup will create copy of temporal variable tt.
+ args2.push_back(strdup(tt.c_str()));
+ }
- while ((p = strtok_r(cmdline, " \t\n", &save)) != nullptr) {
- args.push_back(const_cast<char *>(p));
- cmdline = nullptr;
+ // Copy content from args2 to one large array args_data -
string with multiple '\0' inside.
+ // args than contains pointers into args_data.
I doubt we actually need this step - unless I'm missing something, the
application never sees this split up argv, rather, it is all glued
back together to a single string, and split again by a similarly named
but ver different function parse_command_line().
+ size_t sz = 0;
+ for (const auto arg: args2) {
+ sz += strlen(arg) + 1;
+ }
+ char* args_data = (char*)malloc(sz);
+ size_t pos = 0;
+ for (const auto arg: args2) {
+ memcpy(args_data + pos, arg, strlen(arg) + 1);
+ args.push_back(args_data + pos);
+ pos += strlen(arg) + 1;
+ free(arg);
}
+ assert(pos == sz);
args.push_back(nullptr);
+
__argv = args.data();
__argc = args.size() - 1;
+#if 1
This should of course go before being committed.
+ int ii = 0;
+ char** ch;
+ for (ch = __argv; *ch != nullptr; ch++, ii++) {
+ printf("DBG av[%d] = '%s'\n", ii, *ch);
+ }
+#endif
+
return 0;
}
--
2.9.4
--
You received this message because you are subscribed to the Google
Groups "OSv Development" group.
To unsubscribe from this group and stop receiving emails from it,
send an email to [email protected]
<mailto:osv-dev%[email protected]>.
For more options, visit https://groups.google.com/d/optout
<https://groups.google.com/d/optout>.
--
You received this message because you are subscribed to the Google Groups "OSv
Development" group.
To unsubscribe from this group and stop receiving emails from it, send an email
to [email protected].
For more options, visit https://groups.google.com/d/optout.