[Devel] [PATCH RH8] ksm: react on changing "sleep_millisecs" parameter faster

2021-05-31 Thread Andrey Zhadchenko
From: Kirill Tkhai. The ksm thread unconditionally sleeps in ksm_scan_thread() after each iteration: schedule_timeout_interruptible(msecs_to_jiffies(ksm_thread_sleep_millisecs)). The timeout is configured in /sys/kernel/mm/ksm/sleep_millisecs. If a user writes a big

[Devel] [PATCH RH8] net: export "net.netfilter.nf_conntrack_helper" sysctl for Container

2021-05-31 Thread Vasily Averin
firewalld honors it and the sysctl data is stored on "net", it's safe to provide it. https://jira.sw.ru/browse/PSBM-99791 Signed-off-by: Konstantin Khorenko (cherry picked from vz7 commit cfbc2e31e7fb6f6c31fc89ae1a6fbaa73b93c1fd) Signed-off-by: Vasily Averin ---

[Devel] [PATCH RH8 2/5] socket: fix unused-function warning

2021-05-31 Thread Vasily Averin
When procfs is disabled, the fdinfo code causes a harmless warning: net/socket.c:1000:13: error: 'sock_show_fdinfo' defined but not used [-Werror=unused-function] static void sock_show_fdinfo(struct seq_file *m, struct file *f) Move the function definition up so we can use a single #ifdef
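
The "single #ifdef" idea can be sketched in plain C as follows. This is a minimal illustration, not the patch itself: CONFIG_PROC_FS_DEMO, struct demo_file_ops, and has_fdinfo_hook() are hypothetical names, and the callback signature is simplified to take an int.

```c
#include <stddef.h>

/* Hypothetical, simplified stand-in for a file_operations-style table. */
struct demo_file_ops {
    void (*show_fdinfo)(int fd);
};

/*
 * One #ifdef covers both the definition and its use: when the feature is
 * compiled out, the name expands to NULL, so no unused static function is
 * left behind to trigger -Wunused-function.
 */
#ifdef CONFIG_PROC_FS_DEMO
static void sock_show_fdinfo(int fd)
{
    (void)fd; /* a real hook would print protocol-specific data here */
}
#else
#define sock_show_fdinfo NULL
#endif

static const struct demo_file_ops socket_file_ops = {
    .show_fdinfo = sock_show_fdinfo,
};

/* Returns 1 when a hook is installed, 0 when the feature is compiled out. */
static int has_fdinfo_hook(void)
{
    return socket_file_ops.show_fdinfo != NULL;
}
```

Building without -DCONFIG_PROC_FS_DEMO leaves the table entry NULL, and the function body is never compiled.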

[Devel] [PATCH RH8 4/5] unix: define and set show_fdinfo only if procfs is enabled

2021-05-31 Thread Vasily Averin
Follow the pattern used with other *_show_fdinfo functions and only define unix_show_fdinfo and set it in proto_ops if CONFIG_PROC_FS is set. Fixes: 3c32da19a858 ("unix: Show number of pending scm files of receive queue in fdinfo") Signed-off-by: Tobias Klauser Reviewed-by: Kirill Tkhai

[Devel] [PATCH RH8 3/5] unix: Show number of pending scm files of receive queue in fdinfo

2021-05-31 Thread Vasily Averin
Unix sockets are like a black box. You never know what is stored there: there may be a file descriptor holding a mount or a block device, or there may be whole universes with namespaces, sockets with receive queues full of sockets etc. The patch adds a little debug and accounts the number of files (not

[Devel] [PATCH RH8 1/5] net: Allow to show socket-specific information in /proc/[pid]/fdinfo/[fd]

2021-05-31 Thread Vasily Averin
This adds .show_fdinfo to socket_file_ops, so protocols will be able to print their specific data in fdinfo. Signed-off-by: Kirill Tkhai Signed-off-by: David S. Miller (cherry picked from commit b4653342b1514cb11f25b727c689451aff02996d) VvS: taken from vz7 commit

[Devel] [PATCH RH8 5/5] unix: uses an atomic type for scm files accounting

2021-05-31 Thread Vasily Averin
So the scm_stat_{add,del} helper can be invoked with no additional lock held. This cleans up the code a bit and will make the next patch easier. Signed-off-by: Paolo Abeni Reviewed-by: Kirill Tkhai Signed-off-by: David S. Miller (cherry picked from commit

Re: [Devel] [PATCH RH8 2/2] cbt: endless loop on rollback in blk_cbt_snap_create()

2021-05-31 Thread Kirill Tkhai
On 29.05.2021 14:52, Vasily Averin wrote: > taken from vz7 commit faed6a011b > ("cbt: endless loop on rollback in blk_cbt_map_copy_once") > > found by smatch: > block/blk-cbt.c:359 blk_cbt_map_copy_once() warn: > always true condition '(--i >= 0) => (0-u64max >= 0)' > > It leads to
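
The smatch warning quoted above describes a general C pitfall: a countdown condition of the form `--i >= 0` can never be false when `i` is unsigned. The sketch below shows one common safe form of a rollback loop. It is an illustration only; rollback() and the slots array are hypothetical, and the actual fix in the patch is not shown in this snippet.

```c
#include <stddef.h>

/*
 * Broken form (for reference only): with an unsigned index the condition
 * is always true, so the loop never ends and the index wraps around:
 *
 *     while (--i >= 0)
 *         slots[i] = 0;
 */

/*
 * Safe form: compare first, then decrement. Undoes slots
 * [0, failed_at) in reverse order and returns how many were undone.
 */
static size_t rollback(int *slots, size_t failed_at)
{
    size_t undone = 0;
    size_t i = failed_at;

    while (i-- > 0) {
        slots[i] = 0; /* hypothetical per-slot cleanup */
        undone++;
    }
    return undone;
}
```

Calling rollback(slots, 0) performs no iterations, which is the case the always-true condition gets wrong.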

[Devel] [PATCH RH8] ve/net/bridge: make net.bridge.* sysctl visible in Containers (r/o)

2021-05-31 Thread Vasily Averin
Kubernetes does some prechecks before running, in particular it requires "net.bridge.bridge-nf-call-ip[6]tables" sysctls to be enabled. Thus let's make all "net.bridge.*" sysctls visible in Containers but (as they are not virtualized) in read-only mode. The implementation is not minimal to gain the

[Devel] [PATCH RH8] ve/procfs: make /proc/config.gz visible inside Containers

2021-05-31 Thread Vasily Averin
Kubernetes reads this file to check kernel version at the moment and potentially to check other options in the future. https://jira.sw.ru/browse/PSBM-92107 Signed-off-by: Konstantin Khorenko (cherry picked from vz7 commit 076550cd53c5bdbb50282259d3eab88277f8) Signed-off-by: Vasily Averin

[Devel] [PATCH RH8] net: export "net/*/neigh/*/*" sysctls for Container

2021-05-31 Thread Vasily Averin
Weave Kubernetes plugin requires tuning of /proc/sys/net/ipv4/neigh/weave/base_reachable_time in particular, so let's export neighbour sysctls as well. https://jira.sw.ru/browse/PSBM-92107 Signed-off-by: Konstantin Khorenko (cherry picked from vz7 commit

[Devel] [PATCH RH8] ve/bridge: handle netlink messages AF_BRIDGE / RTM_[GSD]ETLINK sent from inside a Container

2021-05-31 Thread Vasily Averin
The Weave network plugin for Kubernetes configures the bridge via netlink, so we need to allow the appropriate netlink messages when sent from inside a Container. https://jira.sw.ru/browse/PSBM-92107 Signed-off-by: Konstantin Khorenko (cherry picked from vz7 commit e7c862d58164c1b3376c8c568099cde3a540853d)

[Devel] [PATCH RH8] ve/proc/block: show /proc/diskstats inside a Container

2021-05-31 Thread Vasily Averin
The proc file is virtualized, so it contains stats for only those block devices which are allowed by device cgroup related to the Container. https://jira.sw.ru/browse/PSBM-90491 https://jira.sw.ru/browse/PSBM-92107 Signed-off-by: Konstantin Khorenko (cherry picked from vz7 commit

[Devel] [PATCH RH8] ve/nfsd: don't disable UMH client tracker globally due to single Container misconfiguration

2021-05-31 Thread Vasily Averin
If the UMH client tracker fails to init in a single Container due to, for example, a corrupted "/sbin/nfsdcltrack" binary, the UMH client tracker is currently disabled globally on the node as it's not virtualized. Let's print a ratelimited warning instead, but don't disable the UMH tracker. Fixes: vz8:

[Devel] [PATCH RH8] openvswitch: allow to create ovs bridges inside Containers

2021-05-31 Thread Vasily Averin
openvswitch bridges are used by the Weave net plugin for Kubernetes. https://jira.sw.ru/browse/PSBM-92107 Signed-off-by: Konstantin Khorenko (cherry picked from vz7 commit 8ed1b4ae93bc7ee7752a55040b00647d8d07afb1) Signed-off-by: Vasily Averin --- net/openvswitch/vport-internal_dev.c | 3 ++- 1

[Devel] [PATCH RH8] ms/VFS: use synchronize_rcu_expedited() in namespace_unlock()

2021-05-31 Thread Vasily Averin
The synchronize_rcu() in namespace_unlock() is called every time a filesystem is unmounted. If a great many filesystems are mounted, this can cause a noticeable slow-down in, for example, system shutdown. The sequence: mkdir -p /tmp/Mtest/{0..5000} time for i in /tmp/Mtest/*; do mount -t

[Devel] [PATCH RH8] ms/nfsd: memory corruption in nfsd4_lock()

2021-05-31 Thread Vasily Averin
The new struct nfsd4_blocked_lock allocated in find_or_allocate_block() does not initialize nbl_list and nbl_lru. If the conflock allocation fails, the rollback can call list_del_init(), access uninitialized fields, and corrupt memory. v2: just initialize nbl_list and nbl_lru right after nbl allocation.
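
Why the early initialization matters can be shown with a small userspace sketch. Everything here is a simplified stand-in, not the kernel code: struct blocked_lock, alloc_blocked_lock(), init_list_head(), and list_is_self_linked() are hypothetical, and list_del_init() is a minimal reimplementation of the kernel helper's pointer updates.

```c
#include <stddef.h>
#include <stdlib.h>

/* Minimal doubly linked list head, modeled on the kernel's struct list_head. */
struct list_head {
    struct list_head *next, *prev;
};

static void init_list_head(struct list_head *h)
{
    h->next = h;
    h->prev = h;
}

/*
 * Unlinks the entry and leaves it self-linked. It dereferences both
 * e->next and e->prev, so calling it on uninitialized memory writes
 * through garbage pointers.
 */
static void list_del_init(struct list_head *e)
{
    e->next->prev = e->prev;
    e->prev->next = e->next;
    init_list_head(e);
}

/* Hypothetical, reduced version of the structure named in the patch. */
struct blocked_lock {
    struct list_head nbl_list;
    struct list_head nbl_lru;
};

/* Initialize both list heads right after allocation, before any failure path. */
static struct blocked_lock *alloc_blocked_lock(void)
{
    struct blocked_lock *nbl = malloc(sizeof(*nbl));

    if (!nbl)
        return NULL;
    init_list_head(&nbl->nbl_list);
    init_list_head(&nbl->nbl_lru);
    return nbl;
}

static int list_is_self_linked(const struct list_head *h)
{
    return h->next == h && h->prev == h;
}
```

With both heads self-linked from the start, a rollback path that calls list_del_init() on a never-queued entry only touches the entry itself.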

[Devel] [PATCH RH8] mm/memcontrol.c: allocate shrinker_map on appropriate NUMA node

2021-05-31 Thread Vasily Averin
The shrinker_map may be touched from any cpu (e.g., a bit there may be set by a task running anywhere) but kswapd is always bound to a specific node. So allocate shrinker_map from the related NUMA node to respect its NUMA locality. Also, this follows the generic way we use for allocation of memcg's