[
https://issues.apache.org/jira/browse/YARN-7197?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16240684#comment-16240684
]
Eric Yang commented on YARN-7197:
---------------------------------
[~jlowe]
{quote}
Yes, by "explode" I mean the OS will fail the container creation, and that's
where the problem lies in some cases. Here's an example scenario:
# Admin has configured /etc in the whitelist and /etc/shadow in the blacklist.
# User runs a container that bind-mounts /etc but not for malicious reasons
# YARN needs to bind-mount over /etc/shadow to prevent the visibility of that
file on the host from within the container
# Attempting to bind-mount a directory over a file fails, and the container
does not launch
The current patch doesn't handle this case properly because it misses the need
to bind-mount to a blacklisted path when a parent of a blacklisted path is
mounted. Even if it did catch that case, the blacklisted path on the host needs
to be a directory or bind-mounting the empty directory onto the host file is
going to fail, preventing the container from launching. That's what I meant by
stating we need to support bind-mounting an empty file or we're effectively
only supporting blacklisted paths that are directories.
{quote}
I see that I missed your points earlier. You would like the container to start
as much as possible rather than fail early with denied of entry. However, we
don't know for certain that mounting a empty file would not cause eruption to
the container later. There is some risk involved by allowing empty file to be
in place of black listed file that we may not fully understood the risk
involved. For example, {{/etc/shadow}} is mounted from outside empty file, we
end up strange behavior:
{code}
[root@fa79f56310e6 etc]# useradd eyang
useradd: failure while writing changes to /etc/shadow
{code}
Even if it worked, we could be leaking private container information to
outside. For example, if someone mount a random number generator socket for
encryption seed, and admin blacklisted random number generator from mounting.
The result all end up with 0s. There is no one know about a mistake have been
made. It could have serious consequence to security.
The current implementation is more brute force to denied of access, and black
listed directory is read-only, hence it notifies the user early about the
problems rather than failing with silent behavior. Let's say {{/etc}} is white
list, and {{/etc/hadoop}}, {{/etc/passwd}}, {{/etc/group}} are required for
container to have Hadoop configuration and user group information. User can
explicitly specify each item, and admin only required to define {{/etc}} once
in white list. If user attempt to mount {{/etc/shadow}} or {{/etc/}} blindly,
then it will end with denied system call. I think the current proposal is more
safe in terms of usability while preserve required policy in place.
{quote}
The size of tmp_buffer_2 is computed as the lengths of three strings sans their
NUL terminators. Those same three strings are copied into tmp_buffer_2 along
with the constant portion of the format string. sprintf is not allocating a new
destination buffer here. tmp_buffer_2 needs to be big enough to hold not only
the contents of empty_dir, mount_target, and bind_mount but also
type=bind,source= and ,target= and ,readonly along with a terminating NUL. All
of those constant characters in the format string are not part of the size
calculation for tmp_buffer_2 which means the sprintf is going to blow past the
end of the allocated buffer and corrupt the heap. In addition the call to
quote_and_append_arg could blow past the end of tmp_buffer since tmp_buffer is
only a 1K buffer and the paths involved could be quite long.
{quote}
{{bind_mount}} is {{type-bind,source=}}, {{target=}}, and {{readonly}} and two
{{%s}}. The allocated length is 38 char instead of the required 34 in the
alloc_and_clear_memory. With sprintf appending null terminator, it will end up
with 35 characters. The required length for {{empty_dir}} and {{mount_target}}
are already factored in the calculation. Hence, the proper size have already
been accounted. Let me know if I am missing the point. Thanks
> Add support for a volume blacklist for docker containers
> --------------------------------------------------------
>
> Key: YARN-7197
> URL: https://issues.apache.org/jira/browse/YARN-7197
> Project: Hadoop YARN
> Issue Type: Sub-task
> Components: yarn
> Reporter: Shane Kumpf
> Assignee: Eric Yang
> Attachments: YARN-7197.001.patch, YARN-7197.002.patch,
> YARN-7197.003.patch, YARN-7197.004.patch, YARN-7197.005.patch
>
>
> Docker supports bind mounting host directories into containers. Work is
> underway to allow admins to configure a whilelist of volume mounts. While
> this is a much needed and useful feature, it opens the door for
> misconfiguration that may lead to users being able to compromise or crash the
> system.
> One example would be allowing users to mount /run from a host running
> systemd, and then running systemd in that container, rendering the host
> mostly unusable.
> This issue is to add support for a default blacklist. The default blacklist
> would be where we put files and directories that if mounted into a container,
> are likely to have negative consequences. Users are encouraged not to remove
> items from the default blacklist, but may do so if necessary.
--
This message was sent by Atlassian JIRA
(v6.4.14#64029)
---------------------------------------------------------------------
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]