[Non-DoD Source] RE: [BUG]kernel softlockup due to sidtab_search_context run for long time because of too many sidtab context node

2017-12-18 Thread yangjihong

Re: [BUG]kernel softlockup due to sidtab_search_context run for long time because of too many sidtab context node

2017-12-15 Thread Daniel Walsh


Re: [BUG]kernel softlockup due to sidtab_search_context run for long time because of too many sidtab context node

2017-12-15 Thread Stephen Smalley

[Non-DoD Source] RE: [BUG]kernel softlockup due to sidtab_search_context run for long time because of too many sidtab context node

2017-12-15 Thread yangjihong

Re: [BUG]kernel softlockup due to sidtab_search_context run for long time because of too many sidtab context node

2017-12-14 Thread Daniel Walsh

On 12/13/2017 7:18 AM, Stephen Smalley wrote:

So, does docker just keep allocating a unique category set for every
new container, never reusing them even if the container is destroyed?
That would be a bug in docker IMHO.  Or are you creating an unbounded
number of containers and never destroying the older ones?

You can't reuse the security context. A process in ContainerA sends
a labeled packet to MachineB. ContainerA goes away and its context
is recycled in ContainerC. MachineB responds some time later, again
with a labeled packet. ContainerC gets information intended for
ContainerA, and uses the information to take over the Elbonian
government.

Docker isn't using labeled networking (nor is anything else by
default; it is only enabled if explicitly configured).

If labeled networking weren't an issue we'd have full security
module stacking by now. Yes, it's an edge case. If you want to
use labeled NFS or a local filesystem that gets mounted in each
container (don't tell me that nobody would do that) you've got
the same problem.

Even if someone were to configure labeled networking, Docker is not
presently relying on that or SELinux network enforcement for any
security properties, so it really doesn't matter.

True enough. I can imagine a use case, but as you point out, it
would be a very complex configuration and coordination exercise
using SELinux.

And if they wanted to do that, they'd have to coordinate category
assignments across all systems involved, for which no

Re: [BUG]kernel softlockup due to sidtab_search_context run for long time because of too many sidtab context node

2017-12-14 Thread Casey Schaufler

Re: [BUG]kernel softlockup due to sidtab_search_context run for long time because of too many sidtab context node

2017-12-14 Thread Stephen Smalley

Re: [BUG]kernel softlockup due to sidtab_search_context run for long time because of too many sidtab context node

2017-12-14 Thread Casey Schaufler
On 12/14/2017 8:42 AM, Stephen Smalley wrote:
>> You can't reuse the security context. A process in ContainerA sends
>> a labeled packet to MachineB. ContainerA goes away and its context
>> is recycled in ContainerC. MachineB responds some time later, again
>> with a labeled packet. ContainerC gets information intended for
>> ContainerA, and uses the information to take over the Elbonian
>> government.
> Docker isn't using labeled networking (nor is anything else by default;
> it is only enabled if explicitly configured).

If labeled networking weren't an issue we'd have full security
module stacking by now. Yes, it's an edge case. If you want to
use labeled NFS or a local filesystem that gets mounted in each
container (don't tell me that nobody would do that) you've got
the same problem.



Re: [BUG]kernel softlockup due to sidtab_search_context run for long time because of too many sidtab context node

2017-12-14 Thread Stephen Smalley
On Thu, 2017-12-14 at 08:18 -0800, Casey Schaufler wrote:
> You can't reuse the security context. A process in ContainerA sends
> a labeled packet to MachineB. ContainerA goes away and its context
> is recycled in ContainerC. MachineB responds some time later, again
> with a labeled packet. ContainerC gets information intended for
> ContainerA, and uses the information to take over the Elbonian
> government.

Docker isn't using labeled networking (nor is anything else by default;
it is only enabled if explicitly configured).


Re: [BUG]kernel softlockup due to sidtab_search_context run for long time because of too many sidtab context node

2017-12-14 Thread Casey Schaufler
On 12/13/2017 7:18 AM, Stephen Smalley wrote:
> So, does docker just keep allocating a unique category set for every
> new container, never reusing them even if the container is destroyed?
> That would be a bug in docker IMHO.  Or are you creating an unbounded
> number of containers and never destroying the older ones?

You can't reuse the security context. A process in ContainerA sends
a labeled packet to MachineB. ContainerA goes away and its context
is recycled in ContainerC. MachineB responds some time later, again
with a labeled packet. ContainerC gets information intended for
ContainerA, and uses the information to take over the Elbonian
government.

> On the selinux userspace side, we'd also like to eliminate the use of
> /sys/fs/selinux/user (sel_write_user -> security_get_user_sids)
> entirely, which is what triggered this for you.
>
> We cannot currently delete a sidtab node because we have no way of
> knowing if there are any lingering references to the SID.  Fixing that
> would require reference-counted SIDs, which goes beyond just SELinux
> since SIDs/secids are returned by LSM hooks and cached in other kernel
> data structures.

You could delete a sidtab node. The code already deals with unfindable
SIDs. The issue is that eventually you run out of SIDs. Then you are
forced to recycle SIDs, which leads to the overthrow of the Elbonian
government.
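
For scale, a back-of-envelope sketch of the two pools involved, assuming
the two-categories-from-c0..c1023 MCS scheme visible in the mounts shown
in the original report (s0:c414,c873 and so on):

#include <stdio.h>

int main(void)
{
        /* Unordered pairs of distinct categories out of 1024: the pool
         * of s0:cX,cY labels a container engine can hand out. */
        unsigned long mcs_pairs = 1024UL * 1023UL / 2;  /* 523,776 */

        /* SIDs are 32-bit values, so the SID space is far larger, but
         * at 300,000+ live sidtab nodes the linear reverse lookup
         * already hurts long before either pool is exhausted. */
        printf("MCS label pool: %lu, SID space: %lu\n",
               mcs_pairs, 0xffffffffUL);
        return 0;
}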


Re: [BUG]kernel softlockup due to sidtab_search_context run for long time because of too many sidtab context node

2017-12-14 Thread Stephen Smalley
On Thu, 2017-12-14 at 03:19 +, yangjihong wrote:
> Hello,
> 
> >  So, does docker just keep allocating a unique category set for
> > every new container, never reusing them even if the container is
> > destroyed? 
> >  That would be a bug in docker IMHO.  Or are you creating an
> > unbounded number of containers and never destroying the older ones?
> 
> I create a container, then destroy it, then create a second one,
> destroy it, and so on.
> When a container is created, Docker mounts an overlay fs; because
> every container has a different selinux context, a new sidtab node is
> generated and inserted into the sidtab list.
> When a container is destroyed, Docker unmounts the overlay fs, but
> the umount operation does not seem to be hooked up to any "delete the
> node" function, resulting in a longer and longer sidtab list.
> I think the selinux context will never be reused after the umount, so
> the sidtab node is useless and it would be best to delete it.

The "selinux context will never reuse" is IMHO a bug in docker; if you
truly destroy the container (i.e. don't just stop its execution, but
delete it entirely), then the context should be reusable.

> >  sidtab_search_context() could no doubt be optimized for the
> > negative case; there was an earlier optimization for the positive
> > case by adding a cache to sidtab_context_to_sid() prior to calling
> > it.  It's a reverse lookup in the sidtab.
> 
> I think adding a cache may not be very useful, because every
> container has a different selinux context, so when a container is
> created it searches the whole sidtab list, comparing every node down
> to the last one before inserting the new node.
> As long as nodes are never deleted, the list keeps growing and the
> search takes longer and longer, eventually leading to the softlockup.
> 
> 
> Is there any solution to this problem?

On the kernel side, we could certainly implement a reverse lookup hash
table.  And there could be a faster way to quickly check whether a
given category set has ever been used if we wanted to specialize in
that manner.  But that won't fix the fact that docker is allocating
unbounded security contexts.
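
As an illustration of the reverse lookup hash table idea (a sketch only:
the rhtable/rnext fields, the context_hash() helper and the
SIDTAB_RHASH_SIZE constant are hypothetical, nothing like this exists in
the 3.10 sidtab):

static u32 sidtab_reverse_lookup(struct sidtab *s, struct context *context)
{
        /* Key a second table by a hash of the context itself, so both
         * hits and misses touch one short chain instead of scanning
         * every node in the table. */
        u32 bucket = context_hash(context) % SIDTAB_RHASH_SIZE;
        struct sidtab_node *cur;

        for (cur = s->rhtable[bucket]; cur; cur = cur->rnext)
                if (context_cmp(&cur->context, context))
                        return cur->sid;        /* context already known */

        return 0;       /* genuinely new context: the caller allocates a
                           SID and inserts the node into both tables */
}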



[Non-DoD Source] RE: [BUG]kernel softlockup due to sidtab_search_context run for long time because of too many sidtab context node

2017-12-14 Thread yangjihong
Hello,

>  So, does docker just keep allocating a unique category set for every new 
> container, never reusing them even if the container is destroyed? 
>  That would be a bug in docker IMHO.  Or are you creating an unbounded number 
> of containers and never destroying the older ones?
I create a container, then destroy it, then create a second one, destroy it, 
and so on.
When a container is created, Docker mounts an overlay fs; because every 
container has a different selinux context, a new sidtab node is generated and 
inserted into the sidtab list.
When a container is destroyed, Docker unmounts the overlay fs, but the umount 
operation does not seem to be hooked up to any "delete the node" function, 
resulting in a longer and longer sidtab list.
I think the selinux context will never be reused after the umount, so the 
sidtab node is useless and it would be best to delete it.


>  sidtab_search_context() could no doubt be optimized for the negative case; 
> there was an earlier optimization for the positive case by adding a cache to 
> sidtab_context_to_sid() prior to calling it.  It's a reverse lookup in the 
> sidtab.
I think adding a cache may not be very useful, because every container has a 
different selinux context, so when a container is created it searches the 
whole sidtab list, comparing every node down to the last one before inserting 
the new node.
As long as nodes are never deleted, the list keeps growing and the search 
takes longer and longer, eventually leading to the softlockup.


Is there any solution to this problem?
Thanks for reading and looking forward to your reply.

Best wishes!

-----Original Message-----
From: Stephen Smalley [mailto:s...@tycho.nsa.gov] 
Sent: December 13, 2017, 23:18
To: yangjihong ; p...@paul-moore.com; epa...@parisplace.org; 
selinux@tycho.nsa.gov; Daniel J Walsh ; Lukas Vrabec ; Petr Lautrbach 
Cc: linux-ker...@vger.kernel.org
Subject: Re: [BUG]kernel softlockup due to sidtab_search_context run for long 
time because of too many sidtab context node


Re: [BUG]kernel softlockup due to sidtab_search_context run for long time because of too many sidtab context node

2017-12-13 Thread Stephen Smalley
On Wed, 2017-12-13 at 09:25 +, yangjihong wrote:
> Is this an SELinux bug? When a filesystem is unmounted, why is its
> context node not deleted?  I cannot find a function in sidtab.c that
> deletes nodes.
> 
> Thanks for reading and looking forward to your reply.

So, does docker just keep allocating a unique category set for every
new container, never reusing them even if the container is destroyed? 
That would be a bug in docker IMHO.  Or are you creating an unbounded
number of containers and never destroying the older ones?

On the selinux userspace side, we'd also like to eliminate the use of
/sys/fs/selinux/user (sel_write_user -> security_get_user_sids)
entirely, which is what triggered this for you.

We cannot currently delete a sidtab node because we have no way of
knowing if there are any lingering references to the SID.  Fixing that
would require reference-counted SIDs, which goes beyond just SELinux
since SIDs/secids are returned by LSM hooks and cached in other kernel
data structures.

sidtab_search_context() could no doubt be optimized for the negative
case; there was an earlier optimization for the positive case by adding
a cache to sidtab_context_to_sid() prior to calling it.  It's a reverse
lookup in the sidtab.
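
For reference, the earlier positive-case optimization amounts to a small
most-recently-used cache consulted before the full scan; roughly (a
simplified sketch, not the verbatim kernel source):

static u32 sidtab_search_cache(struct sidtab *s, struct context *context)
{
        int i;

        /* A handful of recently resolved nodes (SIDTAB_CACHE_LEN is
         * small).  Repeated lookups of a hot context hit here, but a
         * brand-new context (the negative case, as with each fresh
         * container label) always misses and falls through. */
        for (i = 0; i < SIDTAB_CACHE_LEN; i++) {
                struct sidtab_node *node = s->cache[i];

                if (!node)
                        break;
                if (context_cmp(&node->context, context))
                        return node->sid;
        }
        return 0;       /* miss: do the full linear reverse lookup */
}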



[BUG]kernel softlockup due to sidtab_search_context run for long time because of too many sidtab context node

2017-12-13 Thread yangjihong
Hello, 

I am doing stress testing on the 3.10 kernel (CentOS 7.4), constantly 
starting numbers of Docker containers with SELinux enabled, and after about 2 
days the kernel panics with a softlockup:
   [] sched_show_task+0xb8/0x120
 [] show_lock_info+0x20f/0x3a0
 [] watchdog_timer_fn+0x1da/0x2f0
 [] ? watchdog_enable_all_cpus.part.4+0x40/0x40
 [] __hrtimer_run_queues+0xd2/0x260
 [] hrtimer_interrupt+0xb0/0x1e0
 [] local_apic_timer_interrupt+0x37/0x60
 [] smp_apic_timer_interrupt+0x50/0x140
 [] apic_timer_interrupt+0x6d/0x80
   [] ? sidtab_context_to_sid+0xb3/0x480
 [] ? sidtab_context_to_sid+0x110/0x480
 [] ? mls_setup_user_range+0x145/0x250
 [] security_get_user_sids+0x3f7/0x550
 [] sel_write_user+0x12b/0x210
 [] ? sel_write_member+0x200/0x200
 [] selinux_transaction_write+0x48/0x80
 [] vfs_write+0xbd/0x1e0
 [] SyS_write+0x7f/0xe0
 [] system_call_fastpath+0x16/0x1b

My opinion:
when a Docker container starts, it mounts an overlay filesystem with a 
different SELinux context; the mount points look like:
overlay on /var/lib/docker/overlay2/be3ef517730d92fc4530e0e952eae4f6cb0f07b4bc326cb07495ca08fc9ddb66/merged type overlay (rw,relatime,context="system_u:object_r:svirt_sandbox_file_t:s0:c414,c873",lowerdir=/var/lib/docker/overlay2/l/Z4U7WY6ASNV5CFWLADPARHHWY7:/var/lib/docker/overlay2/l/V2S3HOKEFEOQLHBVAL5WLA3YLS:/var/lib/docker/overlay2/l/46YGYO474KLOULZGDSZDW2JPRI,upperdir=/var/lib/docker/overlay2/be3ef517730d92fc4530e0e952eae4f6cb0f07b4bc326cb07495ca08fc9ddb66/diff,workdir=/var/lib/docker/overlay2/be3ef517730d92fc4530e0e952eae4f6cb0f07b4bc326cb07495ca08fc9ddb66/work)
shm on /var/lib/docker/containers/9fd65e177d2132011d7b422755793449c91327ca577b8f5d9d6a4adf218d4876/shm type tmpfs (rw,nosuid,nodev,noexec,relatime,context="system_u:object_r:svirt_sandbox_file_t:s0:c414,c873",size=65536k)
overlay on /var/lib/docker/overlay2/38d1544d080145c7d76150530d0255991dfb7258cbca14ff6d165b94353eefab/merged type overlay (rw,relatime,context="system_u:object_r:svirt_sandbox_file_t:s0:c431,c651",lowerdir=/var/lib/docker/overlay2/l/3MQQXB4UCLFB7ANVRHPAVRCRSS:/var/lib/docker/overlay2/l/46YGYO474KLOULZGDSZDW2JPRI,upperdir=/var/lib/docker/overlay2/38d1544d080145c7d76150530d0255991dfb7258cbca14ff6d165b94353eefab/diff,workdir=/var/lib/docker/overlay2/38d1544d080145c7d76150530d0255991dfb7258cbca14ff6d165b94353eefab/work)
shm on /var/lib/docker/containers/662e7f798fc08b09eae0f0f944537a4bcedc1dcf05a65866458523ffd4a71614/shm type tmpfs (rw,nosuid,nodev,noexec,relatime,context="system_u:object_r:svirt_sandbox_file_t:s0:c431,c651",size=65536k)

sidtab_search_context() checks whether the context is already in the sidtab 
list; if it is not found, a new node is generated and inserted into the list. 
As the number of containers increases, so does the number of context nodes; 
in our testing the final number of nodes reached 300,000+, at which point a 
single sidtab_context_to_sid() call takes 100-200ms, which leads to the 
system softlockup.
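
Roughly, that reverse lookup looks like this (a simplified sketch of the
3.10-era security/selinux/ss/sidtab.c, not the verbatim source):

static u32 sidtab_search_context(struct sidtab *s, struct context *context)
{
        struct sidtab_node *cur;
        int i;

        /* The sidtab is hashed by SID, not by context, so mapping a
         * context back to a SID has to walk every chain of every
         * bucket: O(number of contexts ever seen). */
        for (i = 0; i < SIDTAB_SIZE; i++)
                for (cur = s->htable[i]; cur; cur = cur->next)
                        if (context_cmp(&cur->context, context))
                                return cur->sid;
        return 0;       /* not found: caller allocates a new node and SID */
}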

Is this an SELinux bug? When a filesystem is unmounted, why is its context 
node not deleted?  I cannot find a function in sidtab.c that deletes nodes.

Thanks for reading and looking forward to your reply.