[jira] [Updated] (YARN-6623) Add support to turn off launching privileged containers in the container-executor

Varun Vasudev (JIRA) Thu, 24 Aug 2017 12:06:22 -0700

     [ 
https://issues.apache.org/jira/browse/YARN-6623?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]


Varun Vasudev updated YARN-6623:
--------------------------------
    Attachment: YARN-6623.006.patch

Thank you for the review [~ebadger]!
----
bq. Should we fix the no newline at the end of file warnings? The apply tool 
complains about them.

Fixed.
----
{noformat}
DockerCommand:getCommandArguments()

public Map<String, List<String>> getCommandArguments() {
  return Collections.unmodifiableMap(commandArguments);
}

This will return the command as well as the arguments. Unless we are 
considering the /usr/docker to be the actual command and inspect to be one of 
the arguments. If that’s what we’re expecting to happen here, then the name is 
a little bit misleading. This might be more of a problem with how 
commandArguments is named than how this function is named.
{noformat}

Renamed function to getCommandWithArguments. Is that ok?
----
{noformat}
container-executor.c:construct_docker_command()

+char *construct_docker_command(const char *command_file) {
+  int ret = 0;
+  char *buffer = (char *) malloc(EXECUTOR_PATH_MAX * sizeof(char));

This should use _SC_ARG_MAX as we did in YARN-6988
size_t command_size = MIN(sysconf(_SC_ARG_MAX), 128*1024);
Also, why not use calloc() instead of malloc() and then memset()?
container-executor.c:run_docker()
{noformat}

Fixed; changed to use the alloc_and_clear function.
----
{noformat}
+  docker_command = construct_docker_command(command_file);
+  docker_binary = get_docker_binary(&CFG);

I don’t see these getting freed. Am I missing this invocation somewhere?
container-executor.c:run_docker()

{noformat}

We call the execvp function, the running program will get replaced by the 
docker invocation.
----
{noformat}
+  memset(docker_command_with_binary, 0, EXECUTOR_PATH_MAX);

Is this necessary? We allocate the memory with calloc() which already clears 
all of the memory upon allocation.
{noformat}

Yep. Fixed.
----
{noformat}
{
container-executor.h

// get the executable's filename
char* get_executable(char *argv0);

Do we need this declaration (in container-executor.h) since we have moved that 
declaration into get_executable.h? We should also add get_executable.h in the 
appropriate places. Looks like main.c and test-container-executor.c both call 
get_executable.
{noformat}

You're correct; fixed
----
{noformat}
main.c:assert_valid_setup()

-    fprintf(ERRORFILE,"realpath of executable: %s\n",
-      errno != 0 ? strerror(errno) : "unknown");
-    flush_and_close_log_files();
-    exit(-1);
+    fprintf(ERRORFILE, "realpath of executable: %s\n",
+            errno != 0 ? strerror(errno) : "unknown");
+    exit(INVALID_CONFIG_FILE);

Why don’t we want to flush the log files anymore?
{noformat}

Fixed.
----
{noformat}
util.c:alloc_and_clear_memory()

+void* alloc_and_clear_memory(size_t num, size_t size) {
+  void *ret = calloc(num, size);
+  if (ret == NULL) {
+    exit(OUT_OF_MEMORY);
+  }
+  return ret;
+}

Should we inline this? Also, if we’re going to use this function, then all 
calloc calls should be replaced with this (like the ones I mentioned above)
util.h
{noformat}

Fixed(made function inline and replaced calloc invocations with alloc_and_clear)
----
{noformat}

// DOCKER_CONTAINER_NAME_INVALID = 41,

Should add (NOT USED) to follow convention
docker-util.c:read_and_verify_command_file()

{noformat}

Fixed.
----
{noformat}
if (command == NULL || (strcmp(command, docker_command) != 0)) {
  ret = INCORRECT_COMMAND;
}

Is command guaranteed to be null-terminated? It comes from the configuration 
file, which is a Java creation and I don’t think Java null-terminates. This 
comment is probably relevant for quite a few other places that are doing string 
operations. We need to be very safe about this or else we risk possibly 
overrunning into random regions of the heap. A safe alternative would be to use 
the “n” version of all the string operations. This patch uses a mixed bag of 
the regular versions and their accompanying “n” versions. I don’t quite 
understand the reasoning behind the usage of each. If we guarantee that the 
string is null terminated (and always null terminated) then we don’t need the 
“n” versions. But even if we guarantee that the input string is null 
terminated, it may accidentally have the null character chopped off by an off 
by 1 error in a strdup or something like that. So my preference here would be 
to use the “n” versions of all of the string functions. Thoughts?
{noformat}

command is guaranteed to be null terminated by the confiugration functions. If 
we use the 'n' versions of the function we end up doing a 'begins with' match 
instead of an exact match which could cause problems(e.g. "inspect" would match 
"inspectcommand")
----
{noformat}
docker-util.c:read_and_verify_command_file()

+  if (current_len + string_len < bufflen - 1) {
+    strncpy(buff + current_len, string, bufflen - current_len);
+    buff[current_len + string_len] = '\0';
+    return 0;
+  }

Here it is copying over bufflen - current_len bytes, when we really only have 
string_len bytes to copy. It’s not going to overflow buff, but we might copy 
some extra garbage past the end of string if it’s not null terminated.
{noformat}

Good point, fixed.
----
{noformat}
docker-util.c: add_docker_config_param()

+static int add_docker_config_param(const struct configuration *command_config, 
char *out, const size_t outlen) {
+  int ret = 0;
+  size_t tmp_buffer_size = 4096;
+  char *tmp_buffer;
+  char *config = get_configuration_value("docker-config", 
DOCKER_COMMAND_FILE_SECTION, command_config);
+  if (config != NULL) {
+    tmp_buffer = (char *) alloc_and_clear_memory(tmp_buffer_size, 
sizeof(char));
+    quote_and_append_arg(&tmp_buffer, &tmp_buffer_size, "--config=", config);
+    ret = add_to_buffer(out, outlen, tmp_buffer);
+    free(tmp_buffer);
+    free(config);
+  }
+  return ret;

Is this function necessary? Can’t it be done in a call to add_param_to_command?
{noformat}

You're correct again :). Fixed.
----
{noformat}
docker-util.c: get_docker_load_command()

+  image_name = get_configuration_value("image", DOCKER_COMMAND_FILE_SECTION, 
&command_config);
+  if (image_name == NULL) {
+    return INVALID_DOCKER_IMAGE_NAME;
+  }
+
+  memset(out, 0, outlen);

Maybe being paranoid is a good thing, but get_docker_load_command and any of 
the other commands from get_docker_command are always passed in a buffer that 
has already been nulled out, so the memset here is redundant. I imagine it will 
be similar in the other get_*_comand functions. Maybe we should put a comment 
saying that it is the caller’s responsibility to null out the buffer that they 
pass in and get rid of these memsets? Either that or keep the memset here and 
get rid of the resets higher up in get_docker_command?
{noformat}

I've gotten rid of the memset in the higher calls and left the memset in the 
get_docker_* functions. I just feel more comfortable having the memset there.
----
{noformat}
docker-util.c: get_docker_load_command()

+  image_name = get_configuration_value("image", DOCKER_COMMAND_FILE_SECTION, 
&command_config);
+  if (image_name == NULL) {
+    return INVALID_DOCKER_IMAGE_NAME;
+  }

We should validate the image name here as we do in get_docker_pull_command 
below.
{noformat}

You can pass tar files to docker load so we can't do much validation here. 
Docker save lets you save the image to any file you want and then you load it 
using docker load.
----
{noformat}
docker-util.c: get_docker_rm_command()

+  container_name = get_configuration_value("name", 
DOCKER_COMMAND_FILE_SECTION, &command_config);
+  if (container_name == NULL) {
+    return INVALID_DOCKER_CONTAINER_NAME;
+  }

We validate the container_id name in docker inspect, but not in docker rm, 
stop, or run
{noformat}

Fixed.
----
{noformat}
docker-util.c

+static int add_param_to_command(const struct configuration *command_config, 
const char *key, const char *param,
+                                 const int with_argument, char *out, const 
size_t outlen) {
+  size_t tmp_buffer_size = 1024;

What’s the reasoning behind the buffer size here? It’s hardcoded to 1024 in a 
bunch of places. This doesn’t follow the system path limit, the 
EXECUTOR_PATH_MAX or the system argument length.
{noformat}

It's just a temporary buffer. The size doesn't matter too much because 
quote_and_append_arg will resize the buffer if it's too small.
----
{noformat}
docker-util.c:set_network()

+  } else if (value == NULL) {
+    ret = 0;
+    goto free_and_exit;

Since we didn’t find any network in the configuration, we didn’t set a network. 
That seems like it should print out an error and/or fail. Right now it will 
just silently do nothing.
{noformat}

You can launch docker containers with no network specified. It'll just pick up 
the default network.
----
{noformat}
docker-util.c:set_network()

+  if (permitted_values != NULL) {
+    free(permitted_values[0]);
+    free(permitted_values);
+  }

We need to loop through permitted_values and free everything or else we will 
leak.
{noformat}
{noformat}
docker-util.c:normalize_mounts()

+  char *alloc_ptr = mounts[0];
+  for (i = 0; mounts[i] != NULL; ++i) {
+    tmp = normalize_mount(mounts[i]);
+    if (tmp == NULL) {
+      return -1;
+    }
+    mounts[i] = tmp;
+  }
+  free(alloc_ptr);

Why are we freeing mounts[0] or anything at all here?
{noformat}

{noformat}
docker-util.c:add_mounts()

+  if(permitted_ro_mounts != NULL) {
+    free(permitted_ro_mounts[0]);
+    free(permitted_ro_mounts);
+  }
+  if (permitted_rw_mounts != NULL) {
+    for (i = 0; permitted_rw_mounts[i] != NULL; ++i) {
+      free(permitted_rw_mounts[i]);
+    }
+    free(permitted_rw_mounts);
+  }
+  if (values != NULL) {
+    free(values[0]);
+    free(values);
+  }

You’ve done this above too, so I might be missing something. But I don’t see 
why we only want to free the 0th element of permitted_ro_mounts. Is there 
something that guarantees that permitted_ro_mounts will only have the first 
element of the pointer array populated? The pointer arrays are getting created 
by get_configuration_values_delimiter(), which ends up doing a strdup inside of 
a while loop via split_delimiter().
{noformat}

{noformat}
docker-util.c:set_capabilities()

+  if(values != NULL) {
+    free(values[0]);
+    free(values);
+  }
+  if(permitted_values != NULL) {
+    free(permitted_values[0]);
+    free(permitted_values);
+  }

Same issue with not looping when freeing
{noformat}

{noformat}
docker-util.c:set_devices()

+  if (permitted_devices != NULL) {
+    free(permitted_devices[0]);
+    free(permitted_devices);
+  }
+  if (values != NULL) {
+    free(values[0]);
+    free(values);

Same issue with not looping when freeing
{noformat}
{noformat}
docker-util.c:get_docker_run_command()

+        if (ret != 0) {
+          free(launch_command[0]);
+          free(launch_command);
+          free(tmp_buffer);

Same issue with not looping when freeing for launch_command
{noformat}
This was because I didn't realise the configuration behaviour had changed. 
Earlier it used to return the result of the strtok_r which got changed in a 
recent patch to the strdup. Thanks for catching it!
----
{noformat}
docker-util.c:check_mount_permitted()

+    // directory check
+    permitted_mount_len = strlen(permitted_mounts[i]);
+    if (permitted_mount_len > 0
+        && permitted_mounts[i][permitted_mount_len - 1] == '/') {
+      if (strncmp(requested, permitted_mounts[i], permitted_mount_len) == 0) {
+        return 1;

This looks potentially dangerous. Though we are normalizing the paths in all of 
the current invocations, someone in the future may add in another invocation 
that doesn’t normalize. I think at a bare minimum we need a comment saying that 
this function requires all symlinks to be resolved in the path that is passed 
to this function. Because if we allow symlinks to be followed then we will get 
different behavior because of how docker treats symlinks. A possible fix for 
this could be to combine get_src_mount() and get_real_src_mount() into a single 
function that parses the mount string to get the source and then resolves it.
{noformat}

I've moved the normalize_mount call into the check_mount_permitted function. 
Does that address your concern?
----
{noformat}
docker-util.c:get_real_src_mount()

+  char *tmp = strdup(src_mount);
+  if (tmp == NULL) {
+    fprintf(ERRORFILE, "Could not allocate memory\n");
+    exit(OUT_OF_MEMORY);
+  }

We should be consistent on our error handling. Some calls to strdup we exit 
with OOM and some we let it return error. This is inconsistent between 
get_src_mount() and get_real_src_mount(). I’m not sure I have a preference on 
whether to exit here or not, but we should make sure all places are consistent.
On another note, is this function necessary given that we have 
normalize_mount()?
{noformat}

Removed get_real_src_mount because as you pointed out, we call realpath in 
normalize_mount
----
{noformat}
docker-util.c:add_mounts()

+  if (real_src_mount != NULL) {
+    free(real_src_mount);
+  }
+  if (src_mount != NULL) {
+    free(src_mount);
+  }
+  if(normalized_src_mount != NULL) {
+    free(normalized_src_mount);
+  }
+  if(container_executor_cfg_path != NULL) {
+    free((char *) container_executor_cfg_path);
+  }

We don’t need to check for null. free will do nothing if a null pointer is 
passed, so it’s safe and makes the check unnecessary.
{noformat}

Fixed.
----
{noformat}
docker-util.c:add_mounts()

+      free(real_src_mount);
+      free(src_mount);
+      free(normalized_src_mount);
+      src_mount = NULL;
+      real_src_mount = NULL;
+      normalized_src_mount = NULL;

Is the clear necessary here? The next time they are used in the loop they will 
be written instead of read. And the convention of the code in this file is to 
not clear after free (though maybe it should be?)
{noformat}
This was so that the I can call free in the free_and_exit label without 
worrying about calling free on the same pointer.
----
{noformat}
docker-util.c:add_mounts()

+      real_src_mount = get_real_src_mount(src_mount);
+      if (real_src_mount == NULL) {
+        ret = INVALID_DOCKER_MOUNT;
+        goto free_and_exit;
+      }
+      normalized_src_mount = normalize_mount(real_src_mount);
+      if(normalized_src_mount == NULL) {
+        ret = INVALID_DOCKER_MOUNT;
+        goto  free_and_exit;
+      }

Do we need to call both get_real_src_mount() and normalize_mount()? As I 
mentioned in an earlier comment, get_real_src_mount() seems to be a subset of 
the functionality of normalize_mount(). They both call realpath(), but 
normalize_mount() does some extra directory stuff.
{noformat}

You are correct. I removed the get_real_src_mount function.
----
{noformat}
docker-util.c:set_privileged()

+  if (value != NULL) {
+    free(value);
+  }
+  if (privileged_container_enabled != NULL) {
+    free(privileged_container_enabled);
+  }

Don’t need to do the null check before free
{noformat}

Fixed.
----
{noformat}
docker-util.c:set_capabilities()

+  if (values != NULL) {
+    if(permitted_values != NULL) {
+      for (i = 0; values[i] != NULL; ++i) {
+        memset(tmp_buffer, 0, tmp_buffer_size);
+        permitted = 0;
+        for (j = 0; permitted_values[j] != NULL; ++j) {
+          if (strcmp(values[i], permitted_values[j]) == 0) {
+            permitted = 1;
+            break;
+          }
+        }

I think this is the 3rd time I’ve seen this structure. I think it would be nice 
if we had a single function that took in a double array for permitted_values, 
and the value to be checked to see whether that value is permitted. That way we 
don’t have to duplicate all of the looping and can minimize the amount of code 
that we need to maintain. We could use this for networks, capabilities, mounts, 
devices, etc.
{noformat}

Fixed. I added a new function called add_param_to_command_if_allowed which if 
used for network, capabilities and devices. Mounts has it's own function due to 
the read-write behaviour.
----
{noformat}
docker-util.c:get_docker_run_command()

+  memset(out, 0, outlen);

Is there a reason we need to memset out here but not in the other get_*_command 
functions?
{noformat}

Not needed, fixed.
----
{noformat}
docker-util.c:get_docker_run_command()

+    quote_and_append_arg(&tmp_buffer, &tmp_buffer_size, "", image);
+    ret = add_to_buffer(out, outlen, tmp_buffer);
+    if (ret != 0) {
+      return BUFFER_TOO_SMALL;
+    }
+    memset(tmp_buffer, 0, tmp_buffer_size);
+
+    launch_command = get_configuration_values_delimiter("launch-command", 
DOCKER_COMMAND_FILE_SECTION, &command_config,
+                                                        ",");
+    if (launch_command != NULL) {
+      for (i = 0; launch_command[i] != NULL; ++i) {
+        memset(tmp_buffer, 0, tmp_buffer_size);

Don’t think we need the first memset here since the first thing we do in the 
for loop is memset.
{noformat}

Fixed.
----
{noformat}
docker-util.c:get_docker_run_command()

+  ret = add_to_buffer(out, outlen, DOCKER_RUN_COMMAND);
+  if (ret == 0) {
…
+  }
+  return BUFFER_TOO_SMALL;

Rather than put this whole block in an if statement, we could invert the if 
statement and have an early return on failure, similar to what we do a few 
lines up with add_docker_config_param()
{noformat}

Fixed.
----
{quote}
As a general comment for docker-util.c, it would be nice if we could 
consolidate some of the code in the get_*_command functions and in the set_* 
functions. There is a lot of redundant code in there. I’m worried that someone 
in the future will change part of the flow in one function (possibly a bug fix) 
that will then not get propagated to the rest of the similar code. This 
wouldn't necessarily have to be a part of this patch, but it might get kicked 
down the line and never get in if we don't do it now.
{quote}

I think I've addressed some of this as part of the latest patch. Can you take a 
look and let me know?

Thank you the comprehensive review. Very much appreciated.


> Add support to turn off launching privileged containers in the 
> container-executor
> ---------------------------------------------------------------------------------
>
>                 Key: YARN-6623
>                 URL: https://issues.apache.org/jira/browse/YARN-6623
>             Project: Hadoop YARN
>          Issue Type: Sub-task
>          Components: nodemanager
>            Reporter: Varun Vasudev
>            Assignee: Varun Vasudev
>            Priority: Blocker
>         Attachments: YARN-6623.001.patch, YARN-6623.002.patch, 
> YARN-6623.003.patch, YARN-6623.004.patch, YARN-6623.005.patch, 
> YARN-6623.006.patch
>
>
> Currently, launching privileged containers is controlled by the NM. We should 
> add a flag to the container-executor.cfg allowing admins to disable launching 
> privileged containers at the container-executor level.



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)

---------------------------------------------------------------------
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]

[jira] [Updated] (YARN-6623) Add support to turn off launching privileged containers in the container-executor

Reply via email to