Re: [PATCH v2 0/6] kernfs: proposed locking and concurrency improvement

2020-06-23 Thread Tejun Heo
Hello, Rick. On Mon, Jun 22, 2020 at 02:22:34PM -0700, Rick Lindsley wrote: > > I don't know. The above highlights the absurdity of the approach itself to > > me. You seem to be aware of it too in writing: 250,000 "devices". > > Just because it is absurd doesn't mean it wasn't built that way :)

Re: [PATCH v2 0/6] kernfs: proposed locking and concurrency improvement

2020-06-22 Thread Tejun Heo
On Fri, Jun 19, 2020 at 07:44:29PM -0700, Rick Lindsley wrote: > echo 0 > /sys/devices//system/memory/memory10374/online > > and boom - you've taken memory chunk 10374 offline. > > These changes are not just a whim. I used lockstat to measure contention > during boot. The addition of 250,000

Re: [PATCH v2 0/6] kernfs: proposed locking and concurrency improvement

2020-06-22 Thread Tejun Heo
Hello, Ian. On Sun, Jun 21, 2020 at 12:55:33PM +0800, Ian Kent wrote: > > > They are used for hotplugging and partitioning memory. The size of > > > the > > > segments (and thus the number of them) is dictated by the > > > underlying > > > hardware. > > > > This sounds so bad. There gotta be a

Re: [PATCH v2 0/6] kernfs: proposed locking and concurrency improvement

2020-06-19 Thread Tejun Heo
On Fri, Jun 19, 2020 at 01:41:39PM -0700, Rick Lindsley wrote: > On 6/19/20 8:38 AM, Tejun Heo wrote: > > > I don't have strong objections to the series but the rationales don't seem > > particularly strong. It's solving a suspected problem but only half way. It > > isn'

Re: [PATCH v2 0/6] kernfs: proposed locking and concurrency improvement

2020-06-19 Thread Tejun Heo
Hello, Ian. On Wed, Jun 17, 2020 at 03:37:43PM +0800, Ian Kent wrote: > The series here tries to reduce the locking needed during path walks > based on the assumption that there are many path walks with a fairly > large portion of those for non-existent paths, as described above. > > That was

[GIT PULL] workqueue changes for v5.8-rc1

2020-06-05 Thread Tejun Heo
Hello, Linus. Mostly cleanups and other trivial changes. The only interesting change is Sebastian's rcuwait conversion for RT which was already discussed. Thanks. The following changes since commit 47cf1b422e6093aee2a3e55d5e162112a2c69870: Merge branch 'for-linus' of

[GIT PULL] cgroup changes for v5.8-rc1

2020-06-05 Thread Tejun Heo
Hello, Linus. Just two patches. One to add system-level cpu.stat to the root cgroup for convenience and a trivial comment update. Thanks. The following changes since commit eec8fd0277e37cf447b88c6be181e81df867bcf1: device_cgroup: Cleanup cgroup eBPF device filter code (2020-04-13 14:41:54

Re: [PATCH 4/4] workqueue: slash half memory usage in 32bit system

2020-06-02 Thread Tejun Heo
Hello, On Tue, Jun 02, 2020 at 08:08:10AM +0800, Lai Jiangshan wrote: > It is not noticable from the "free" command. > By counting the number of allocated pwq (mainly percpu pwq), > it saves 20k in my simple kvm guest (4cpu). > I guess it highly various in different boxes with various > kernel

Re: [PATCH] workqueue: ensure all flush_work() completed when being destoryed

2020-06-02 Thread Tejun Heo
Hello, Lai. On Tue, Jun 02, 2020 at 01:49:14PM +, Lai Jiangshan wrote: > +static void dec_nr_in_flight_flush_work(struct workqueue_struct *wq) > +{ > + if (atomic_dec_and_test(&wq->nr_flush_work)) Do you think it'd make sense to put this in pwq so that it can be synchronized with the pool

Re: [PATCH] blk-cgroup: don't account iostat for root cgroup

2020-06-01 Thread Tejun Heo
On Mon, Jun 01, 2020 at 04:11:41PM -0700, Boris Burkov wrote: > This data is never flushed by rstat, so it is never used. We shouldn't > bother collecting it. We can access global disk stats to compute io > statistics for the root cgroup. > > Signed-off-by: Boris Burkov Acke

Re: [PATCH 2/2 blk-cgroup/for-5.8] blk-cgroup: show global disk stats in root cgroup io.stat

2020-06-01 Thread Tejun Heo
; to external so that it can be used from blk-cgroup.c to iterate over > disks. > > Signed-off-by: Boris Burkov > Suggested-by: Tejun Heo Acked-by: Tejun Heo Thanks. -- tejun

Re: [PATCH 1/2 blk-cgroup/for-5.8] blk-cgroup: make iostat functions visible to stat printing

2020-06-01 Thread Tejun Heo
cts directly. Since declaring static functions ahead does not > seem like common practice in this file, simply move the iostat functions > up. We only plan to use blkg_iostat_set, but it seems better to keep them > all together. > > Signed-off-by: Boris Burkov Acked-by: Tejun Heo Thanks. -- tejun

Re: [PATCH blk-cgroup/for-5.8] blk-cgroup: show global disk stats in root cgroup io.stat

2020-06-01 Thread Tejun Heo
ling it > out directly from global disk stats. The result is a root cgroup io.stat > file consistent with both /proc/diskstats and io.stat. > > Signed-off-by: Boris Burkov > Suggested-by: Tejun Heo ... > +static void blkg_iostat_set(struct blkg_iostat *dst, struct blkg_iostat

Re: [PATCH] workqueue: ensure all flush_work() completed when being destoryed

2020-06-01 Thread Tejun Heo
Hello, Lai. On Mon, Jun 01, 2020 at 06:08:02AM +, Lai Jiangshan wrote: > +static void flush_no_color(struct workqueue_struct *wq) > +{ ... I'm not too sure about using the colored flushing system for this. Given that the requirements are a lot simpler, I'd prefer keep it separate and dumb

Re: [PATCH 2/4] workqueue: use BUILD_BUG_ON() for compile time test instead of WARN_ON()

2020-06-01 Thread Tejun Heo
On Mon, Jun 01, 2020 at 08:44:40AM +, Lai Jiangshan wrote: > Any runtime WARN_ON() has to be fixed, and BUILD_BUG_ON() can > help you nitice it earlier. > > Signed-off-by: Lai Jiangshan Applied 1-2 to wq/for-5.8. Thanks. -- tejun

Re: [PATCH 4/4] workqueue: slash half memory usage in 32bit system

2020-06-01 Thread Tejun Heo
On Mon, Jun 01, 2020 at 08:44:42AM +, Lai Jiangshan wrote: > The major memory ussage in workqueue is on the pool_workqueue. > The pool_workqueue has alignment requirement which often leads > to padding. > > Reducing the memory usage for the pool_workqueue is valuable. > > And 32bit system

Re: [PATCH 2/4] workqueue: don't check wq->rescuer in rescuer

2020-05-29 Thread Tejun Heo
Hello, On Fri, May 29, 2020 at 10:58:46PM +0800, Lai Jiangshan wrote: > I'm not sure I understood your words. And I'm not > sure which function may use freed object in "use-after-free". > Is it "send_mayday() may use a freed rescuer"? > > This patch relies on > def98c84b6 ("workqueue: Fix

Re: [PATCH 4/4] workqueue: remove useless unlock() and lock() in series

2020-05-29 Thread Tejun Heo
On Fri, May 29, 2020 at 06:59:02AM +, Lai Jiangshan wrote: > This is no point to unlock() and then lock() the same mutex > back to back. > > Signed-off-by: Lai Jiangshan Applied to wq/for-5.8. Thanks. -- tejun

Re: [PATCH 3/4] workqueue: free wq->unbound_attrs earlier

2020-05-29 Thread Tejun Heo
On Fri, May 29, 2020 at 06:59:01AM +, Lai Jiangshan wrote: > wq->unbound_attrs is never accessed in rcu read site, so that > it can be freed earlier and relieves memory pressure earlier, > although slightly. > > Signed-off-by: Lai Jiangshan I don't think this is gonna make a material

Re: [PATCH 1/4] workqueue: void unneeded requeuing the pwq in rescuer thread

2020-05-29 Thread Tejun Heo
On Fri, May 29, 2020 at 06:58:59AM +, Lai Jiangshan wrote: > 008847f66c3 ("workqueue: allow rescuer thread to do more work.") made > the rescuer worker requeue the pwq immediately if there may be more > work items which need rescuing instead of waiting for the next mayday > timer expiration.

Re: [PATCH 2/4] workqueue: don't check wq->rescuer in rescuer

2020-05-29 Thread Tejun Heo
On Fri, May 29, 2020 at 06:59:00AM +, Lai Jiangshan wrote: > Now rescuer checks pwq->nr_active before requeues the pwq, > it is a more robust check and the rescuer must be still valid. > > Signed-off-by: Lai Jiangshan > --- > kernel/workqueue.c | 23 +-- > 1 file

Re: [PATCH v2 2/2] workqueue: Convert the pool::lock and wq_mayday_lock to raw_spinlock_t

2020-05-29 Thread Tejun Heo
On Wed, May 27, 2020 at 09:46:33PM +0200, Sebastian Andrzej Siewior wrote: > The workqueue code has it's internal spinlocks (pool::lock), which > are acquired on most workqueue operations. These spinlocks are > converted to 'sleeping' spinlocks on a RT-kernel. > > Workqueue functions can be

Re: [PATCH 1/2] workqueue: pin the pool while it is managing

2020-05-28 Thread Tejun Heo
Hello, On Thu, May 28, 2020 at 03:06:55AM +, Lai Jiangshan wrote: > @@ -2129,10 +2128,21 @@ __acquires(&pool->lock) > static bool manage_workers(struct worker *worker) > { > struct worker_pool *pool = worker->pool; > + struct work_struct *work = list_first_entry(&pool->worklist, > +

Re: [PATCH v2 cgroup/for-5.8] cgroup: add cpu.stat file to root cgroup

2020-05-28 Thread Tejun Heo
em, while improving the usability of cgroups stats. > We avoid the second problem by computing the contents of cpu.stat from > existing data collected for /proc/stat anyway. > > Signed-off-by: Boris Burkov > Suggested-by: Tejun Heo Applied to cgroup/for-5.8. Thanks. -- tejun

Re: 回复: [PATCH v5] workqueue: Remove unnecessary kfree() call in rcu_free_wq()

2020-05-28 Thread Tejun Heo
Hello, On Thu, May 28, 2020 at 09:27:03PM +0800, Lai Jiangshan wrote: > wq owns the ultimate or permanent references to itself by > owning references to wq->numa_pwq_tbl[node], wq->dfl_pwq. > The pwq's references keep the pwq in wq->pwqs. Yeah, regardless of who puts a wq the last time, the base

Re: 回复: [PATCH v5] workqueue: Remove unnecessary kfree() call in rcu_free_wq()

2020-05-28 Thread Tejun Heo
On Thu, May 28, 2020 at 12:57:03PM +0300, Dan Carpenter wrote: > Guys, the patch is wrong. The kfree is harmless when this is called > from destroy_workqueue() and required when it's called from > pwq_unbound_release_workfn(). Lai Jiangshan already explained this > already. Why are we still

[GIT PULL] cgroup fixes for v5.7-rc7

2020-05-27 Thread Tejun Heo
): xattr: fix uninitialized out-param Odin Ugedal (1): device_cgroup: Cleanup cgroup eBPF device filter code Tejun Heo (1): Revert "cgroup: Add memory barriers to plug cgroup_rstat_updated() race window" drivers/gpu/drm/amd/amdkfd/kfd_priv.h | 2 +-

Re: [PATCH v5] workqueue: Remove unnecessary kfree() call in rcu_free_wq()

2020-05-27 Thread Tejun Heo
On Wed, May 27, 2020 at 03:57:15PM +0800, qiang.zh...@windriver.com wrote: > From: Zhang Qiang > > The data structure member "wq->rescuer" was reset to a null pointer > in one if branch. It was passed to a call of the function "kfree" > in the callback function "rcu_free_wq" (which was

Re: [PATCH cgroup/for-5.8] cgroup: add cpu.stat file to root cgroup

2020-05-26 Thread Tejun Heo
> first problem, while improving the usability of cgroups stats. > We avoid the second problem by computing the contents of cpu.stat from > existing data collected for /proc/stat anyway. > > Signed-off-by: Boris Burkov > Suggested-by: Tejun Heo If there are any objections, please h

Re: [PATCH 0/3] workqueue: Make the workqueue code PREEMPT_RT safe

2020-05-26 Thread Tejun Heo
Hello, On Wed, May 13, 2020 at 06:27:29PM +0200, Sebastian Andrzej Siewior wrote: > The series changes `wq_manager_wait' from waitqueues to simple > waitqueues and its internal locking (pool::lock and wq_mayday_lock) to > raw spinlocks so that workqueues can be used on PREEMPT_RT from truly >

Re: [PATCH RFC 2/5] mm: memcg/percpu: account percpu memory to memory cgroups

2020-05-26 Thread Tejun Heo
On Tue, May 19, 2020 at 01:18:03PM -0700, Roman Gushchin wrote: > Percpu memory is becoming more and more widely used by various > subsystems, and the total amount of memory controlled by the percpu > allocator can make a good part of the total memory. > > As an example, bpf maps can consume a

Re: [PATCH] blkcg:Fix memory leaks in blkg_conf_prep()

2020-05-26 Thread Tejun Heo
ion blkg_conf_prep(). > > Suggested-by: Markus Elfring > Signed-off-by: Wu Bo Acked-by: Tejun Heo Thanks. -- tejun

Re: [PATCH] cgroup: Remove stale comments

2020-05-26 Thread Tejun Heo
On Wed, May 13, 2020 at 10:13:11AM +0800, Zefan Li wrote: > - The default root is where we can create v2 cgroups. > - The __DEVEL__sane_behavior mount option has been removed long long ago. > > Signed-off-by: Li Zefan Applied to cgroup/for-5.8. Thanks. -- tejun

Re: [PATCH block/for-linus] iocost: don't let vrate run wild while there's no saturation signal

2020-05-14 Thread Tejun Heo
On Mon, Oct 14, 2019 at 05:18:11PM -0700, Tejun Heo wrote: > When the QoS targets are met and nothing is being throttled, there's > no way to tell how saturated the underlying device is - it could be > almost entirely idle, at the cusp of saturation or anywhere inbetween. > Given

Re: [RFC v4 02/12] kthread: Add kthread_(un)block_work_queuing() and kthread_work_queuable()

2020-05-11 Thread Tejun Heo
On Fri, May 08, 2020 at 04:46:52PM -0400, Lyude Paul wrote: > Add some simple wrappers around incrementing/decrementing > kthread_work.cancelling under lock, along with checking whether queuing > is currently allowed on a given kthread_work, which we'll use want to > implement work cancelling with

Re: [RFC v4 01/12] kthread: Add kthread_queue_flush_work()

2020-05-11 Thread Tejun Heo
Hello, On Fri, May 08, 2020 at 04:46:51PM -0400, Lyude Paul wrote: > +bool kthread_queue_flush_work(struct kthread_work *work, > + struct kthread_flush_work *fwork); > +void __kthread_flush_work_fn(struct kthread_work *work); As an exposed interface, this doesn't seem

Re: [PATCH] workqueue: Fix an use after free in init_rescuer()

2020-05-11 Thread Tejun Heo
On Fri, May 08, 2020 at 06:07:40PM +0300, Dan Carpenter wrote: > We need to preserve error code before freeing "rescuer". > > Fixes: f187b6974f6df ("workqueue: Use IS_ERR and PTR_ERR instead of > PTR_ERR_OR_ZERO.") > Signed-off-by: Dan Carpenter Applied to wq/for-5.8. Thank you. -- tejun
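
The ordering this fix describes can be sketched in userspace C. This is only an illustration, not the kernel code: `struct rescuer_sketch`, `alloc_rescuer_sketch()` and `init_rescuer_sketch()` are hypothetical stand-ins, and `free()` takes the place of `kfree()`. The point is that the error code is copied into a local before the object that holds it is released.

```c
#include <assert.h>
#include <errno.h>
#include <stdlib.h>

/* Hypothetical stand-in for the rescuer; only the field the sketch needs. */
struct rescuer_sketch {
    int task_err;   /* 0 on success, negative errno if thread creation failed */
};

/* Hypothetical helper: simulates a failed thread creation when fail != 0. */
static struct rescuer_sketch *alloc_rescuer_sketch(int fail)
{
    struct rescuer_sketch *r = calloc(1, sizeof(*r));

    if (r && fail)
        r->task_err = -ENOMEM;
    return r;
}

static int init_rescuer_sketch(int fail)
{
    struct rescuer_sketch *r = alloc_rescuer_sketch(fail);
    int ret;

    if (!r)
        return -ENOMEM;

    if (r->task_err) {
        ret = r->task_err;  /* copy the error code first ... */
        assert(ret < 0);    /* the error path must report a negative errno */
        free(r);            /* ... then release the object that held it */
        return ret;         /* never read r->task_err after free() */
    }

    free(r);                /* the sketch owns nothing past this point */
    return 0;
}
```

Reading `r->task_err` after `free(r)` would be the use after free the patch title refers to; keeping the value in `ret` avoids touching freed memory.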

Re: [PATCH] trace: Remove duplicate semicolons at the end of line

2020-05-11 Thread Tejun Heo
On Mon, May 11, 2020 at 07:21:02PM +0800, Xiaoming Ni wrote: > Remove duplicate semicolons at the end of line in > include/trace/events/iocost.h > > Signed-off-by: Xiaoming Ni Acked-by: Tejun Heo iocost changes normally go through the block tree. Can you please resend wit

Re: [PATCH v2] netprio_cgroup: Fix unlimited memory leak of v2 cgroups

2020-05-09 Thread Tejun Heo
he mode switch when someone writes to the ifpriomap cgroup > control file. The easiest fix is to also do the switch when a task is attached > to a new cgroup. > > Fixes: bd1060a1d671("sock, cgroup: add sock->sk_cgroup") > Reported-by: Yang Yingliang > Tested-by: Yang Yingliang > Signed-off-by: Zefan Li Acked-by: Tejun Heo Thanks. -- tejun

Re: [PATCH 0/4] allow multiple kthreadd's

2020-05-06 Thread Tejun Heo
Hello, Bruce. On Wed, May 06, 2020 at 11:36:58AM -0400, J. Bruce Fields wrote: > On Tue, May 05, 2020 at 05:25:27PM -0400, J. Bruce Fields wrote: > > On Tue, May 05, 2020 at 05:09:56PM -0400, Tejun Heo wrote: > > > It's not the end of the world but a bit hacky. I wonder

Re: [PATCH 0/4] allow multiple kthreadd's

2020-05-05 Thread Tejun Heo
Hello, On Tue, May 05, 2020 at 05:01:18PM -0400, J. Bruce Fields wrote: > On Mon, May 04, 2020 at 10:15:14PM -0400, J. Bruce Fields wrote: > > Though now I'm feeling greedy: it would be nice to have both some kind > > of global flag, *and* keep kthread->data pointing to svc_rqst (as that > >

Re: cgroup pointed by sock is leaked on mode switch

2020-05-05 Thread Tejun Heo
Hello, Yang. On Sat, May 02, 2020 at 06:27:21PM +0800, Yang Yingliang wrote: > I find the number nr_dying_descendants is increasing: > linux-dVpNUK:~ # find /sys/fs/cgroup/ -name cgroup.stat -exec grep > '^nr_dying_descendants [^0]'  {} + > /sys/fs/cgroup/unified/cgroup.stat:nr_dying_descendants

Re: [PATCH v3 19/19] tools/cgroup: add memcg_slabinfo.py tool

2020-05-05 Thread Tejun Heo
n, but SLAB can be trivially > added later. ... > Signed-off-by: Roman Gushchin > Cc: Waiman Long > Cc: Tobin C. Harding > Cc: Tejun Heo Acked-by: Tejun Heo Thanks. -- tejun

Re: [PATCH v2] workqueue: Use IS_ERR and PTR_ERR instead of PTR_ERR_OR_ZERO.

2020-05-05 Thread Tejun Heo
On Wed, Apr 29, 2020 at 12:04:13PM +0800, Sean Fu wrote: > Replace inline function PTR_ERR_OR_ZERO with IS_ERR and PTR_ERR to > remove redundant parameter definitions and checks. > Reduce code size. > Before: >text data bss dec hex filename > 47510 5979 840

Re: [PATCH 0/4] allow multiple kthreadd's

2020-05-05 Thread Tejun Heo
Hello, Bruce. On Mon, May 04, 2020 at 10:15:14PM -0400, J. Bruce Fields wrote: > We're currently using it to pass the struct svc_rqst that a new nfsd > thread needs. But once the new thread has gotten that, I guess it could > set kthread->data to some global value that it uses to say "I'm a

Re: [PATCH v5 0/4] Charge loop device i/o to issuing cgroup

2020-05-05 Thread Tejun Heo
Hello, Dave. On Tue, May 05, 2020 at 04:41:14PM +1000, Dave Chinner wrote: > > OTOH I don't have a great idea how the generic infrastructure should look > > like... > > I haven't given it any thought - it's not something I have any > bandwidth to spend time on. I'll happily review a unified >

[block/for-5.7] iocost: protect iocg->abs_vdebt with iocg->waitq.lock

2020-05-04 Thread Tejun Heo
ndling with iocg->waitq.lock. Signed-off-by: Tejun Heo Reported-by: Vlad Dmitriev Cc: sta...@vger.kernel.org # v5.4+ Fixes: e1518f63f246 ("blk-iocost: Don't let merges push vtime into the future") --- block/blk-iocost.c | 117

Re: [PATCH 0/4] allow multiple kthreadd's

2020-05-01 Thread Tejun Heo
Hello, On Fri, May 01, 2020 at 10:59:24AM -0700, Linus Torvalds wrote: > Which kind of makes me want to point a finger at Tejun. But it's been > mostly PeterZ touching this file lately.. Looks fine to me too. I don't quite understand the usecase tho. It looks like all it's being used for is to

Re: [PATCH] percpu: make pcpu_alloc() aware of current gfp context

2020-04-30 Thread Tejun Heo
is discouraged since memalloc_[nofs|noio]_save() > were introduced. Therefore this change makes pcpu_alloc() look up into > an existing nofs/noio context before deciding whether it is in an atomic > context or not. > > Signed-off-by: Filipe Manana Acked-by: Tejun Heo Thanks. -- tejun

Re: [PATCH v5 0/4] Charge loop device i/o to issuing cgroup

2020-04-29 Thread Tejun Heo
Hello, On Wed, Apr 29, 2020 at 12:25:40PM +0200, Jan Kara wrote: > Yeah, I was thinking about the same when reading the patch series > description. We already have some cgroup workarounds for btrfs kthreads if > I remember correctly, we have cgroup handling for flush workers, now we are > adding

Re: [PATCH] net: fix sk_page_frag() recursion from memory reclaim

2019-10-19 Thread Tejun Heo
Hello, On Sat, Oct 19, 2019 at 11:15:28AM -0700, Eric Dumazet wrote: > It seems compiler generates better code with : > > diff --git a/include/net/sock.h b/include/net/sock.h > index > ab905c4b1f0efd42ebdcae333b3f0a2c7c1b2248..56de6ac99f0952bd0bc003353c094ce3a5a852f4 > 100644 > ---

[PATCH] net: fix sk_page_frag() recursion from memory reclaim

2019-10-19 Thread Tejun Heo
From f0335a5d14d3596d36e3ffddb2fd4fa0dc6ca9c2 Mon Sep 17 00:00:00 2001 From: Tejun Heo Date: Sat, 19 Oct 2019 09:10:57 -0700 sk_page_frag() optimizes skb_frag allocations by using per-task skb_frag cache when it knows it's the only user. The condition is determined by seeing whether the soc

Re: [PATCH] cgroup: pids: use {READ,WRITE}_ONCE for pids->limit operations

2019-10-16 Thread Tejun Heo
On Thu, Oct 17, 2019 at 02:35:20AM +1100, Aleksa Sarai wrote: > > Sure, I will switch it to use atomic64_read() and atomic64_set() instead > > if that's what you'd prefer. Though I will mention that on quite a few > > architectures atomic64_read() is defined as: > > > > #define atomic64_read(v)

Re: [PATCH] cgroup: pids: use {READ,WRITE}_ONCE for pids->limit operations

2019-10-16 Thread Tejun Heo
Hello, On Thu, Oct 17, 2019 at 02:29:46AM +1100, Aleksa Sarai wrote: > > Hah, where is it saying that? > > Isn't that what this says: > > > Therefore, if you find yourself only using the Non-RMW operations of > > atomic_t, you do not in fact need atomic_t at all and are doing it > > wrong. > >

Re: [PATCH] cgroup: pids: use {READ,WRITE}_ONCE for pids->limit operations

2019-10-16 Thread Tejun Heo
Hello, Aleksa. On Wed, Oct 16, 2019 at 07:32:19PM +1100, Aleksa Sarai wrote: > Maybe I'm misunderstanding Documentation/atomic_t.txt, but it looks to > me like it's explicitly saying that I shouldn't use atomic64_t if I'm > just using it for fetching and assignment. Hah, where is it saying that?

[PATCH block/for-linus] blkcg: Fix multiple bugs in blkcg_activate_policy()

2019-10-15 Thread Tejun Heo
parate. Init takes place only after all allocs succeeded and on failure all allocated pds are freed. * Unifying and fixing the cleanup of the remaining pd_prealloc. Signed-off-by: Tejun Heo Fixes: cf09a8ee19ad ("blkcg: pass @q and @blkcg into blkcg_pol_alloc_pd_fn()") -

Re: [RFC] writeback: add elastic bdi in cgwb bdp

2019-10-15 Thread Tejun Heo
Hello, Hillf. Do you have a test case which can demonstrate the problem you're seeing in the existing code? Thanks. -- tejun

Re: [PATCH] cgroup: pids: use {READ,WRITE}_ONCE for pids->limit operations

2019-10-14 Thread Tejun Heo
Hello, Aleksa. On Tue, Oct 15, 2019 at 02:59:31AM +1100, Aleksa Sarai wrote: > On 2019-10-14, Tejun Heo wrote: > > On Sat, Oct 12, 2019 at 12:05:39PM +1100, Aleksa Sarai wrote: > > > Because pids->limit can be changed concurrently (but we don't want to > > >

Re: [PATCH] cgroup: pids: use {READ,WRITE}_ONCE for pids->limit operations

2019-10-14 Thread Tejun Heo
On Sat, Oct 12, 2019 at 12:05:39PM +1100, Aleksa Sarai wrote: > Because pids->limit can be changed concurrently (but we don't want to > take a lock because it would be needlessly expensive), use the > appropriate memory barriers. I can't quite tell what problem it's fixing. Can you elaborate a

Re: [PATCH] cgroup: freezer: call cgroup_enter_frozen() with preemption disabled in ptrace_stop()

2019-10-11 Thread Tejun Heo
On Wed, Oct 09, 2019 at 05:02:30PM +0200, Oleg Nesterov wrote: > ptrace_stop() does preempt_enable_no_resched() to avoid the preemption, > but after that cgroup_enter_frozen() does spin_lock/unlock and this adds > another preemption point. > > Reported-and-tested-by: Bruce Ashfield > Fixes:

Re: [PATCH 2/9] perf/core: Add PERF_SAMPLE_CGROUP feature

2019-10-07 Thread Tejun Heo
Hello, On Wed, Oct 02, 2019 at 03:28:00PM +0900, Namhyung Kim wrote: > On Sat, Sep 21, 2019 at 6:04 AM Tejun Heo wrote: > > > > On Fri, Sep 20, 2019 at 05:47:45PM +0900, Namhyung Kim wrote: > > > Thanks for the sharing information! For 32-bit, while the ino itself is &

Re: [PATCH 0/5] Optimize single thread migration

2019-10-07 Thread Tejun Heo
On Fri, Oct 04, 2019 at 12:57:38PM +0200, Michal Koutný wrote: > Hello. > > The important part is the patch 02 where the reasoning is. > > The rest is mostly auxiliar and split out into separate commits for > better readability. > > The patches are based on v5.3. This is great. Applied to

Re: [PATCH] cgroup: short-circuit current_cgns_cgroup_from_root() on the default hierarchy

2019-10-07 Thread Tejun Heo
On Sun, Sep 29, 2019 at 04:06:58PM +0800, Miaohe Lin wrote: > Like commit 13d82fb77abb ("cgroup: short-circuit cset_cgroup_from_root() on > the default hierarchy"), short-circuit current_cgns_cgroup_from_root() on > the default hierarchy. > > Signed-off-by: Miaohe Lin Applied to cgroup/for-5.5.

[PATCH wq/for-5.2-fixes] workqueue: Fix pwq ref leak in rescuer_thread()

2019-10-04 Thread Tejun Heo
From e66b39af00f426b3356b96433d620cb3367ba1ff Mon Sep 17 00:00:00 2001 From: Tejun Heo Date: Wed, 25 Sep 2019 06:59:15 -0700 Subject: [PATCH 2/2] workqueue: Fix pwq ref leak in rescuer_thread() 008847f66c3 ("workqueue: allow rescuer thread to do more work.") made the rescuer w

[PATCH wq/for-5.4-fixes] workqueue: more destroy_workqueue() fixes

2019-10-04 Thread Tejun Heo
From c29eb85386880750130a01aabf288408a6614d65 Mon Sep 17 00:00:00 2001 From: Tejun Heo Date: Mon, 23 Sep 2019 11:08:58 -0700 Subject: [PATCH] workqueue: more destroy_workqueue() fixes destroy_workqueue() warnings still, at a lower frequency, trigger spuriously. The problem se

Re: [PATCH 08/10] blkcg: implement blk-iocost

2019-10-03 Thread Tejun Heo
Hello, On Thu, Oct 03, 2019 at 04:51:06PM +0200, Michal Koutný wrote: > > Initially, I put them under block device sysfs but it was too clumsy > > with different config file formats and all. > Do you have any more details on that? In the end, it all boils down to a > daemon/setup utility writing

Re: workqueue: PF_MEMALLOC task 14771(cc1plus) is flushing !WQ_MEM_RECLAIM events:gen6_pm_rps_work

2019-10-03 Thread Tejun Heo
Hello, On Thu, Sep 26, 2019 at 09:06:58AM +0200, Zdenek Sojka wrote: > I've hit the following dmesg with a 5.3.1 kernel; it looks similar to > https://lkml.org/lkml/2019/8/28/754 , which should have been fixed as noted > in https://lkml.org/lkml/2019/8/28/763 (if the patch is in the 5.3

Re: System hangs if NVMe/SSD is removed during suspend

2019-10-03 Thread Tejun Heo
Hello, Mika. On Wed, Oct 02, 2019 at 03:21:36PM +0300, Mika Westerberg wrote: > but from that discussion I don't see more generic solution to be > implemented. > > Any ideas we should fix this properly? Yeah, the only fix I can think of is not using freezable wq. It's just not a good idea and

[PATCH] writeback: fix use-after-free in finish_writeback_work()

2019-09-23 Thread Tejun Heo
0001606e0 Call Trace: __wake_up_common_lock+0x63/0xc0 wb_workfn+0xd2/0x3e0 process_one_work+0x1f5/0x3f0 worker_thread+0x2d/0x3d0 kthread+0x111/0x130 ret_from_fork+0x1f/0x30 Fix it by reading and caching @done->waitq before decrementing @done->cnt. Signed-off-by: Tejun Heo Debugged-by: Chri
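
The fix described here, reading and caching `@done->waitq` before decrementing `@done->cnt`, can be sketched in userspace C11. All names below (`waitq_sketch`, `done_sketch`, `wake_up_sketch()`, `finish_work_sketch()`) are hypothetical, and C11 atomics stand in for the kernel's atomic helpers. The assumption is that once the count reaches zero the waiter may free the completion object, so nothing inside it may be dereferenced after the decrement.

```c
#include <assert.h>
#include <stdatomic.h>

/* Hypothetical wait queue: the sketch only counts wakeups. */
struct waitq_sketch {
    int wakeups;
};

/* Hypothetical completion object, assumed to be owned by the waiter. */
struct done_sketch {
    atomic_int cnt;
    struct waitq_sketch *waitq;
};

static void wake_up_sketch(struct waitq_sketch *wq)
{
    wq->wakeups++;
}

/* Returns 1 if this call dropped the last count and woke the waiter. */
static int finish_work_sketch(struct done_sketch *done)
{
    /* Read the pointer while @done is still guaranteed to be alive. */
    struct waitq_sketch *waitq = done->waitq;

    assert(waitq);

    /* atomic_fetch_sub() returns the old value; 1 means we reached zero. */
    if (atomic_fetch_sub(&done->cnt, 1) == 1) {
        /* @done may already be gone here; use only the cached pointer. */
        wake_up_sketch(waitq);
        return 1;
    }
    return 0;
}
```

Dereferencing `done->waitq` inside the `if` branch instead would reintroduce the race, because the decrement is the last point at which the object is known to exist.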

Re: [PATCH v2] mm: implement write-behind policy for sequential file writes

2019-09-23 Thread Tejun Heo
Hello, On Mon, Sep 23, 2019 at 06:06:46PM +0300, Konstantin Khlebnikov wrote: > On 23/09/2019 17.52, Tejun Heo wrote: > >Hello, Konstantin. > > > >On Fri, Sep 20, 2019 at 10:39:33AM +0300, Konstantin Khlebnikov wrote: > >>With vm.dirty_write_behind 1 or 2

Re: [PATCH v2] mm: implement write-behind policy for sequential file writes

2019-09-23 Thread Tejun Heo
Hello, Konstantin. On Fri, Sep 20, 2019 at 10:39:33AM +0300, Konstantin Khlebnikov wrote: > With vm.dirty_write_behind 1 or 2 files are written even faster and Is the faster speed reproducible? I don't quite understand why this would be. > during copying amount of dirty memory always stays

[PATCH] FUSE: fix beyond-end-of-page access in fuse_parse_cache()

2019-09-22 Thread Tejun Heo
g a PF. This is caused by dirent->namelen being accessed before ensuring that there's enough bytes in the page for the dirent. Fix it by pushing down reclen calculation. Signed-off-by: Tejun Heo Fixes: 5d7bc7e8680c ("fuse: allow using readdir cache") Cc: sta...@vger.kernel.org # v4.20+ ---
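
The ordering of checks described here can be sketched in userspace C. The record layout below is a simplified, hypothetical one (a 4-byte length followed by the name bytes), not the real FUSE dirent format, and `count_records_sketch()` is an illustrative name. What it shows is the rule from the message: confirm that the header fits in the remaining bytes before reading `namelen`, and only then compute the record length.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical record header; the name bytes follow it directly. */
struct dirent_sketch {
    uint32_t namelen;
};

#define HDR_SIZE sizeof(struct dirent_sketch)

/* Count the complete records stored in buf[0..size). */
static size_t count_records_sketch(const unsigned char *buf, size_t size)
{
    size_t offset = 0, n = 0;

    /* Only look at a header once all of its bytes are known to be present. */
    while (size - offset >= HDR_SIZE) {
        struct dirent_sketch hdr;

        assert(offset <= size);
        memcpy(&hdr, buf + offset, HDR_SIZE);

        if (hdr.namelen == 0)
            break;      /* end marker */
        if (hdr.namelen > size - offset - HDR_SIZE)
            break;      /* the name would run past the end of the buffer */

        /* The record length is computed only after both checks passed. */
        offset += HDR_SIZE + hdr.namelen;
        n++;
    }
    return n;
}
```

Comparing `namelen` against the bytes left after the header, rather than adding first and comparing afterwards, also keeps the arithmetic from overflowing on platforms with a 32-bit `size_t`.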

[PATCH] workqueue: Minor follow-ups to the rescuer destruction change

2019-09-20 Thread Tejun Heo
From 30ae2fc0a75eb5f1a38bbee7355d8e3bc823a6e1 Mon Sep 17 00:00:00 2001 From: Tejun Heo Date: Fri, 20 Sep 2019 14:09:14 -0700 * Now that wq->rescuer may be cleared while rescuer is still there, switch show_pwq() debug printout to test worker->rescue_wq to identify rescuers instead o

Re: [PATCH] workqueue: Fix spurious sanity check failures in destroy_workqueue()

2019-09-20 Thread Tejun Heo
Hello, On Thu, Sep 19, 2019 at 10:49:04AM +0800, Lai Jiangshan wrote: > Looks good to me. > > There is one test in show_pwq() > """ > worker == pwq->wq->rescuer ? "(RESCUER)" : "", > """ > I'm wondering if it needs to be updated to > """ > worker->rescue_wq ? "(RESCUER)" : "", > """ Hmm...

Re: [PATCH 2/9] perf/core: Add PERF_SAMPLE_CGROUP feature

2019-09-20 Thread Tejun Heo
On Fri, Sep 20, 2019 at 05:47:45PM +0900, Namhyung Kim wrote: > Thanks for the sharing information! For 32-bit, while the ino itself is not > monotonic, gen << 32 + ino is monotonic right? I think we can use the It's not. gen gets incremented on every allocation, so it has not high but still

[PATCH wq/for-5.4-fixes] workqueue: Fix missing kfree(rescuer) in destroy_workqueue()

2019-09-20 Thread Tejun Heo
From 8efe1223d73c218ce7e8b2e0e9aadb974b582d7f Mon Sep 17 00:00:00 2001 From: Tejun Heo Date: Fri, 20 Sep 2019 13:39:57 -0700 Signed-off-by: Tejun Heo Reported-by: Qian Cai Fixes: def98c84b6cd ("workqueue: Fix spurious sanity check failures in destroy_workqueue()") --- Applied

[PATCH] workqueue: Fix spurious sanity check failures in destroy_workqueue()

2019-09-18 Thread Tejun Heo
of the workqueue. This patch fixes the above two by * If a workqueue has a rescuer, disable and kill the rescuer before sanity checks. Disabling and killing is guaranteed to flush the existing mayday list. * Remove sysfs interface before sanity checks. Signed-off-by: Tejun Heo Reported-by: Marcin

[GIT PULL] cgroup changes for v5.4-rc1

2019-09-16 Thread Tejun Heo
Hello, Linus. Three minor cleanup patches. Thanks. The following changes since commit ad5e427e0f6b702e52c11d1f7b2b7be3bac7de82: Merge branch 'parisc-5.3-3' of git://git.kernel.org/pub/scm/linux/kernel/git/deller/parisc-linux (2019-07-23 15:34:59 -0700) are available in the Git repository

Re: [PATCH 2/9] perf/core: Add PERF_SAMPLE_CGROUP feature

2019-09-16 Thread Tejun Heo
Hello, Song. On Sat, Sep 14, 2019 at 03:02:51PM +0100, Song Liu wrote: > I think we don't need a perfect identifier in this case. IIUC, the goal of I really don't want different versions of imperfect identifiers proliferating. > this patchset is to map each sample with a cgroup name (or full

[GIT PULL] cgroup fixes for v5.3-rc8

2019-09-12 Thread Tejun Heo
Hello, Linus. Roman found and fixed a bug in the cgroup2 freezer which allows new child cgroup to escape frozen state. Thanks. The following changes since commit 505a8ec7e11ae5236c4a154a1e24ef49a8349600: Revert "drm/i915/userptr: Acquire the page lock around set_page_dirty()" (2019-09-12

Re: [PATCH 2/2] cgroup: freezer: fix frozen state inheritance

2019-09-12 Thread Tejun Heo
Applied 1-2 to cgroup/for-5.3-fixes. Thanks. -- tejun

Re: [PATCH 08/10] blkcg: implement blk-iocost

2019-09-11 Thread Tejun Heo
Hey, On Wed, Sep 11, 2019 at 07:16:30AM -0700, Tejun Heo wrote: > > > I can implement > > > the switching if so. > > > > That would be perfect. > > Whichever way it gets decided, this is easy enough. I'll prep a > patch. Here's the patch. * It disab

Re: [PATCH] cgroup: use kv(malloc|free) instead of pidlist_(allocate|free)

2019-09-10 Thread Tejun Heo
On Sun, Sep 08, 2019 at 05:40:41PM +0300, Ivan Safonov wrote: > Resolve TODO: > > The following two functions "fix" the issue where there are more pids > > than kmalloc will give memory for; in such cases, we use vmalloc/vfree. > > TODO: replace with a kernel-wide solution to this problem > >

Re: [PATCH 2/9] perf/core: Add PERF_SAMPLE_CGROUP feature

2019-09-05 Thread Tejun Heo
Hello, Namhyung. On Tue, Sep 03, 2019 at 10:13:08AM +0800, Namhyung Kim wrote: > So is my understanding below correct? > > * currently kernfs ino+gen is different than inode's ino+gen They're the same. It's just that cgroup has other less useful IDs too. > * but it'd be better to make them

Re: [PATCH 2/9] perf/core: Add PERF_SAMPLE_CGROUP feature

2019-08-30 Thread Tejun Heo
Hello, On Sat, Aug 31, 2019 at 12:03:26PM +0900, Namhyung Kim wrote: > Hmm.. it looks hard to use fhandle as the identifier since perf > sampling is done in NMI context. AFAICS the encode_fh part seems ok > but getting dentry/inode from a kernfs_node seems not. > > I assume kernfs_node_id's ino

[PATCH block/for-next] blkcg: add missing NULL check in ioc_cpd_alloc()

2019-08-30 Thread Tejun Heo
ioc_cpd_alloc() forgot to check NULL return from kzalloc(). Add it. Signed-off-by: Tejun Heo Reported-by: kbuild test robot --- block/blk-iocost.c |4 +++- 1 file changed, 3 insertions(+), 1 deletion(-) diff --git a/block/blk-iocost.c b/block/blk-iocost.c index 9c8046ac5925..2aae8ec391ef

Re: linux-next: build warning after merge of the block tree

2019-08-29 Thread Tejun Heo
On Thu, Aug 29, 2019 at 02:08:28PM +1000, Stephen Rothwell wrote: > From: Stephen Rothwell > Date: Thu, 29 Aug 2019 14:03:43 +1000 > Subject: [PATCH] blkcg: blk-iocost: predeclare used structs > > Signed-off-by: Stephen Rothwell Acked-by: Tejun Heo Thanks. > --- >

[PATCH 05/10] block/rq_qos: implement rq_qos_ops->queue_depth_changed()

2019-08-28 Thread Tejun Heo
wbt already gets queue depth changed notification through wbt_set_queue_depth(). Generalize it into rq_qos_ops->queue_depth_changed() so that other rq_qos policies can easily hook into the events too. Signed-off-by: Tejun Heo --- block/blk-rq-qos.c | 9 + block/blk-rq-qos.h |

[PATCH 04/10] block/rq_qos: add rq_qos_merge()

2019-08-28 Thread Tejun Heo
Add a merge hook for rq_qos. This will be used by io.weight. Signed-off-by: Tejun Heo --- block/blk-core.c | 4 block/blk-rq-qos.c | 9 + block/blk-rq-qos.h | 9 + 3 files changed, 22 insertions(+) diff --git a/block/blk-core.c b/block/blk-core.c index 77807a5d7f9e

[PATCH 02/10] blkcg: make ->cpd_init_fn() optional

2019-08-28 Thread Tejun Heo
For policies which can do enough initialization from ->cpd_alloc_fn(), make ->cpd_init_fn() optional. Signed-off-by: Tejun Heo --- block/blk-cgroup.c | 3 ++- 1 file changed, 2 insertions(+), 1 deletion(-) diff --git a/block/blk-cgroup.c b/block/blk-cgroup.c index 6a82ca3fb5cf..78ccbd

[PATCH 01/10] blkcg: pass @q and @blkcg into blkcg_pol_alloc_pd_fn()

2019-08-28 Thread Tejun Heo
Instead of @node, pass in @q and @blkcg so that the alloc function has more context. This doesn't cause any behavior change and will be used by io.weight implementation. Signed-off-by: Tejun Heo --- block/bfq-cgroup.c | 5 +++-- block/blk-cgroup.c | 6 +++--- block/blk

[PATCH 10/10] blkcg: add tools/cgroup/iocost_coef_gen.py

2019-08-28 Thread Tejun Heo
Add a script which can be used to generate device-specific iocost linear model coefficients. Signed-off-by: Tejun Heo --- Documentation/admin-guide/cgroup-v2.rst | 3 + block/blk-iocost.c | 3 + tools/cgroup/iocost_coef_gen.py | 178 3

[PATCH 08/10] blkcg: implement blk-iocost

2019-08-28 Thread Tejun Heo
in Rik's fix for a divide-by-zero bug in current_hweight() triggered by zero inuse_sum. Signed-off-by: Tejun Heo Cc: Andy Newell Cc: Josef Bacik Cc: Rik van Riel --- Documentation/admin-guide/cgroup-v2.rst | 94 + block/Kconfig | 10 + block/Makefile

Re: [PATCH 2/9] perf/core: Add PERF_SAMPLE_CGROUP feature

2019-08-28 Thread Tejun Heo
On Wed, Aug 28, 2019 at 04:31:23PM +0900, Namhyung Kim wrote: > @@ -958,6 +958,7 @@ struct perf_sample_data { > u64 stack_user_size; > > u64 phys_addr; > + u64 cgroup; Ditto, please use fhandle

Re: [PATCH 1/9] perf/core: Add PERF_RECORD_CGROUP event

2019-08-28 Thread Tejun Heo
Hello, Namhyung. On Wed, Aug 28, 2019 at 04:31:22PM +0900, Namhyung Kim wrote: > + * struct { > + * struct perf_event_header header; > + * u64 ino; > + * u64 path_len; > + * char

[PATCHSET v2] writeback, memcg: Implement foreign inode flushing

2019-08-15 Thread Tejun Heo
Hello, Changes from v1[1]: * More comments explaining the parameters. * 0003-writeback-Separate-out-wb_get_lookup-from-wb_get_create.patch added and avoid spuriously creating missing wbs for foreign flushing. There's an inherent mismatch between memcg and writeback. The former tracks

Re: [PATCH 3/4] writeback, memcg: Implement cgroup_writeback_by_id()

2019-08-15 Thread Tejun Heo
On Thu, Aug 15, 2019 at 04:54:21PM +0200, Jan Kara wrote: > > + /* and find the associated wb */ > > + wb = wb_get_create(bdi, memcg_css, GFP_NOWAIT | __GFP_NOWARN); > > + if (!wb) { > > + ret = -ENOMEM; > > + goto out_css_put; > > + } > > One more thought: You don't

Re: [PATCH] kernfs: fix memleak in kernel_ops_readdir()

2019-08-07 Thread Tejun Heo
Hello, On Wed, Aug 07, 2019 at 06:29:28AM -0700, Tony Lindgren wrote: > Hi, > > * Tejun Heo [691231 23:00]: > > From: Andrea Arcangeli > > > > If getdents64 is killed or hits on segfault, it'll leave cgroups > > directories in sysfs pinned leaking memory

Re: [PATCH] Use kvmalloc in cgroups-v1

2019-08-07 Thread Tejun Heo
On Tue, Aug 06, 2019 at 03:24:12PM +0200, Marc Koderer wrote: > Instead of using its own logic for k-/vmalloc rely on > kvmalloc which is actually doing quite the same. > > Signed-off-by: Marc Koderer Applied to cgroup/for-5.4. Thanks. -- tejun

[PATCH] kernfs: fix memleak in kernel_ops_readdir()

2019-08-05 Thread Tejun Heo
mkdir $i; done # rmdir * # for i in `seq 1000`; do while :; do ls $i/ >/dev/null; done & done # while :; do killall ls; done kernfs_node_cache in /proc/slabinfo keeps going up as expected. Signed-off-by: Andrea Arcangeli Signed-off-by: Tejun Heo Cc: sta...@vger.kernel.org # goes w
