Re: [PATCH 0/4 v2] BDI lifetime fix

2017-02-07 Thread Tejun Heo
Hello, On Tue, Feb 07, 2017 at 01:33:31PM +0100, Jan Kara wrote: > > We can see above that inode->i_wb is in r31, and the machine crashed at > > 0xc03799a0 so it was trying to dereference wb and crashed. > > r31 is NULL in the crash information. > > Thanks for report and the analysis.

Re: [PATCH V3 1/1] percpu-refcount: fix reference leak during percpu-atomic transition

2017-01-28 Thread Tejun Heo
On Sat, Jan 28, 2017 at 06:42:20AM -0600, Douglas Miller wrote: > percpu_ref_tryget() and percpu_ref_tryget_live() should return > "true" IFF they acquire a reference. But the return value from > atomic_long_inc_not_zero() is a long and may have high bits set, > e.g. PERCPU_COUNT_BIAS, and the

Re: [PATCH 1/1] percpu-refcount: fix reference leak during percpu-atomic transition

2017-01-27 Thread Tejun Heo
Hello, Douglas. On Fri, Jan 27, 2017 at 02:59:08PM -0600, Douglas Miller wrote: > @@ -212,7 +212,7 @@ static inline bool percpu_ref_tryget(struct percpu_ref > *ref) > this_cpu_inc(*percpu_count); > ret = true; > } else { > - ret =

Re: [PATCH V2 0/1] percpu-refcount: fix reference leak during percpu-atomic transition

2017-01-27 Thread Tejun Heo
On Fri, Jan 27, 2017 at 05:32:15PM -0600, Douglas Miller wrote: > Removed extraneous parentheses. Did not switch to "bool" as that would > necessitate > more testing and raises the question whether other platforms might have a > different > definition for "bool" that would not fix this problem.

Re: [PATCH 02/10] block: Unhash also block device inode for the whole device

2017-02-11 Thread Tejun Heo
; > Signed-off-by: Jan Kara <j...@suse.cz> Acked-by: Tejun Heo <t...@kernel.org> Thanks. -- tejun

Re: [PATCH 08/10] block: Fix oops in locked_inode_to_wb_and_lock_list()

2017-02-11 Thread Tejun Heo
Hello, Jan. On Thu, Feb 09, 2017 at 01:44:31PM +0100, Jan Kara wrote: > When block device is closed, we call inode_detach_wb() in __blkdev_put() > which sets inode->i_wb to NULL. That is contrary to expectations that > inode->i_wb stays valid once set during the whole inode's lifetime and > leads

Re: [PATCH 09/10] kobject: Export kobject_get_unless_zero()

2017-02-11 Thread Tejun Heo
On Thu, Feb 09, 2017 at 01:44:32PM +0100, Jan Kara wrote: > Make the function available for outside use and fortify it against NULL > kobject. > > Signed-off-by: Jan Kara <j...@suse.cz> Acked-by: Tejun Heo <t...@kernel.org> Cc Greg? Thanks. -- tejun

Re: [PATCH 04/10] block: Move bdi_unregister() to del_gendisk()

2017-02-11 Thread Tejun Heo
k(). > > [1] http://marc.info/?l=linux-block=148554717109098=2 > > Signed-off-by: Jan Kara <j...@suse.cz> Acked-by: Tejun Heo <t...@kernel.org> Thanks. -- tejun

Re: [PATCH 05/10] writeback: Generalize and standardize I_SYNC waiting function

2017-02-11 Thread Tejun Heo
_wait_on_bit() unnecessarily. Switch it to wait_on_bit() to remove > some code. > > Signed-off-by: Jan Kara <j...@suse.cz> Acked-by: Tejun Heo <t...@kernel.org> Thanks. -- tejun

Re: [PATCH 10/10] block: Fix oops scsi_disk_get()

2017-02-11 Thread Tejun Heo
gt; already 0 refcount and is already on its way to be freed. Indeed we've > got a warning from kobject_get() about 0 refcount shortly before the > oops. > > We fix the problem by using kobject_get_unless_zero() in get_disk() so > that get_disk() cannot get reference on a disk that is already being > freed. > > Signed-off-by: Jan Kara <j...@suse.cz> Acked-by: Tejun Heo <t...@kernel.org> Thanks. -- tejun

Re: [PATCH 05/13] bdi: Mark congested->bdi as internal

2017-02-28 Thread Tejun Heo
be slightly ugly to special-case bdi->wb.congested to > avoid effectively a cyclic reference of bdi to itself and the reference > gets cleared from bdi_unregister() making it impossible to reference > a freed bdi. > > Signed-off-by: Jan Kara <j...@suse.cz> Acked-by: Tejun Heo <t...@kernel.org> Thanks. -- tejun

Re: [PATCH 06/13] bdi: Make wb->bdi a proper reference

2017-02-28 Thread Tejun Heo
ff-by: Jan Kara <j...@suse.cz> Acked-by: Tejun Heo <t...@kernel.org> Thanks. -- tejun

Re: [PATCH 10/13] bdi: Rename cgwb_bdi_destroy() to cgwb_bdi_unregister()

2017-02-28 Thread Tejun Heo
..@suse.cz> Acked-by: Tejun Heo <t...@kernel.org> Thanks. -- tejun

Re: [PATCH 09/13] bdi: Do not wait for cgwbs release in bdi_unregister()

2017-02-28 Thread Tejun Heo
Hello, On Tue, Feb 21, 2017 at 06:09:54PM +0100, Jan Kara wrote: > @@ -726,14 +718,6 @@ static void cgwb_bdi_destroy(struct backing_dev_info > *bdi) > } > > spin_unlock_irq(_lock); > - > - /* > - * All cgwb's and their congested states must be shutdown and > - *

Re: [PATCH 08/13] bdi: Shutdown writeback on all cgwbs in cgwb_bdi_destroy()

2017-02-28 Thread Tejun Heo
arily long time. It's not a correctness issue tho and if it becomes a problem we can deal with it by splitting wb_shutdown() into two pieces - the unregistration / issuing of the work items, and the flushing of those. Other than that, looks good to me. Acked-by: Tejun Heo <t...@kernel.org> Thanks. -- tejun

Re: [PATCH 0/13 v2] block: Fix block device shutdown related races

2017-02-28 Thread Tejun Heo
Hello, It generally looks good to me. The only worry I have is around wb_shutdown() synchronization and if that is actually an issue it shouldn't be too difficult to fix. The other thing which came to mind is that the congested->__bdi sever semantics. IIRC, that one was also to support the

Re: [PATCH] blkcg: allocate struct blkcg_gq outside request queue spinlock

2017-02-28 Thread Tejun Heo
Hello, Overall, the approach looks good to me but please see below. On Mon, Feb 27, 2017 at 06:49:57PM -0800, Tahsin Erdogan wrote: > @@ -806,44 +807,99 @@ int blkg_conf_prep(struct blkcg *blkcg, const struct > blkcg_policy *pol, > if (!disk) > return -ENODEV; > if

Re: [PATCH 0/13 v2] block: Fix block device shutdown related races

2017-03-01 Thread Tejun Heo
Hello, Jan. On Wed, Mar 01, 2017 at 04:37:00PM +0100, Jan Kara wrote: > > The other thing which came to mind is that the congested->__bdi sever > > semantics. IIRC, that one was also to support the "bdi must go away now" > > behavior. As bdi is refcnted now, I think we can probably just let

Re: [PATCH] blkcg: allocate struct blkcg_gq outside request queue spinlock

2017-03-01 Thread Tejun Heo
Hello, On Tue, Feb 28, 2017 at 03:51:27PM -0800, Tahsin Erdogan wrote: > On Tue, Feb 28, 2017 at 2:47 PM, Tejun Heo <t...@kernel.org> wrote: > >> + if (!blkcg_policy_enabled(q, pol)) { > >> + ret = -EOPNOTSUPP; > >> +

Re: [PATCH 07/13] bdi: Move removal from bdi->wb_list into wb_shutdown()

2017-02-28 Thread Tejun Heo
ack works among bdi_writeback structures). It will also allow us > to simplify writeback shutdown in cgwb_bdi_destroy(). > > Signed-off-by: Jan Kara <j...@suse.cz> Acked-by: Tejun Heo <t...@kernel.org> Thanks. -- tejun

Re: [PATCH] bdev: fix NULL pointer dereference in sync()/close() race

2016-08-29 Thread Tejun Heo
Hello, On Sat, Aug 27, 2016 at 11:30:22AM +0200, Vegard Nossum wrote: > > Don't know what's the right fix, but I posted a slightly different one > > for the same crash some months ago: > > https://patchwork.kernel.org/patch/8556941/ > > > > Ah, I'm sorry, I didn't see that. > > Your patch is

Re: [PATCH] bdev: fix NULL pointer dereference in sync()/close() race

2016-08-29 Thread Tejun Heo
Hello, On Mon, Aug 29, 2016 at 11:33:41PM +0200, Vegard Nossum wrote: > On 08/29/2016 09:55 PM, Tejun Heo wrote: > > I think the right thing to do there is doing blkdev_get() / > > blkdev_put() around func() invocation in iterate_bdevs() rather than > > holding bd_mutex acro

Re: [PATCH v2] blkcg: Unlock blkcg_pol_mutex once if cpd == NULL

2016-09-30 Thread Tejun Heo
k-cgroup.c:1378: blkcg_policy_register() error: double unlock > 'mutex:_pol_mutex' > > Fixes: 06b285bd1125 ("blkcg: fix blkcg_policy_data allocation bug") > Signed-off-by: Bart Van Assche <bart.vanass...@sandisk.com> > Cc: Tejun Heo <t...@kernel.org> > Cc: <sta...

Re: [PATCH V3 00/11] block-throttle: add .high limit

2016-10-05 Thread Tejun Heo
Hello, Paolo. On Wed, Oct 05, 2016 at 02:37:00PM +0200, Paolo Valente wrote: > In this respect, for your generic, unpredictable scenario to make > sense, there must exist at least one real system that meets the > requirements of such a scenario. Or, if such a real system does not > yet exist, it

Re: [PATCH V3 00/11] block-throttle: add .high limit

2016-10-04 Thread Tejun Heo
Hello, Paolo. On Tue, Oct 04, 2016 at 09:02:47PM +0200, Paolo Valente wrote: > That's exactly what BFQ has succeeded in doing in all the tests > devised so far. Can you give me a concrete example for which I can > try with BFQ and with any other mechanism you deem better. If > you are right,

Re: [PATCH V3 00/11] block-throttle: add .high limit

2016-10-04 Thread Tejun Heo
Hello, Paolo. On Tue, Oct 04, 2016 at 09:29:48PM +0200, Paolo Valente wrote: > > Hmm... I think we already discussed this but here's a really simple > > case. There are three unknown workloads A, B and C and we want to > > give A certain best-effort guarantees (let's say around 80% of the > >

Re: [PATCH V3 00/11] block-throttle: add .high limit

2016-10-04 Thread Tejun Heo
Hello, Vivek. On Tue, Oct 04, 2016 at 02:12:45PM -0400, Vivek Goyal wrote: > Agreed that we don't have a good basic unit to measure IO cost. I was > thinking of measuring cost in terms of sectors as that's simple and > gets more accurate on faster devices with almost no seek penalty. And If this

Re: [PATCH 1/3] block: Add iocontext priority to request

2016-10-04 Thread Tejun Heo
Hello, Adam. On Tue, Oct 04, 2016 at 08:49:18AM -0700, Adam Manzanares wrote: > > I wonder whether the right thing to do is adding bio->bi_ioprio which > > is initialized on bio submission and carried through req->ioprio. > > I looked around and thought about this and I'm not sure if this will

Re: [PATCH V3 00/11] block-throttle: add .high limit

2016-10-04 Thread Tejun Heo
Hello, Vivek. On Tue, Oct 04, 2016 at 09:28:05AM -0400, Vivek Goyal wrote: > On Mon, Oct 03, 2016 at 02:20:19PM -0700, Shaohua Li wrote: > > Hi, > > > > The background is we don't have an ioscheduler for blk-mq yet, so we can't > > prioritize processes/cgroups. > > So this is an interim

Re: [PATCH V3 00/11] block-throttle: add .high limit

2016-10-04 Thread Tejun Heo
Hello, On Tue, Oct 04, 2016 at 06:22:28PM +0200, Paolo Valente wrote: > Could you please elaborate more on this point? BFQ uses sectors > served to measure service, and, on the all the fast devices on which > we have tested it, it accurately distributes > bandwidth as desired, redistributes

Re: [PATCH] blkcg: Annotate blkg_hint correctly

2016-09-23 Thread Tejun Heo
On Fri, Sep 23, 2016 at 09:07:56AM -0700, Bart Van Assche wrote: > Avoid that sparse complains about blkg_hint manipulations. > > Fixes: a637120e4902 ("blkcg: use radix tree to index blkgs from blkcg") > Signed-off-by: Bart Van Assche <bart.vanass...@sandisk.com> >

Re: [PATCH 1/3] block: Add iocontext priority to request

2016-09-29 Thread Tejun Heo
Hello, On Tue, Sep 27, 2016 at 11:14:54AM -0700, Adam Manzanares wrote: > Patch adds association between iocontext and a request. > > Signed-off-by: Adam Manzanares Can you please describe how this may impact existing usages? Thanks. -- tejun -- To unsubscribe from

Re: [PATCH 1/3] block: Add iocontext priority to request

2016-10-02 Thread Tejun Heo
Hello, Adam. On Fri, Sep 30, 2016 at 09:02:17AM -0700, Adam Manzanares wrote: > I'll start with the changes I made and work my way through a grep of > > ioprio. Please add or correct any of the assumptions I have made. > Well, it looks like you're the one who's most

Re: [PATCH v6 1/3] block: Add iocontext priority to request

2016-10-19 Thread Tejun Heo
Hello, On Mon, Oct 17, 2016 at 11:27:28AM -0700, Adam Manzanares wrote: > Patch adds an association between iocontext ioprio and the ioprio of a > request. This is done to enable request based drivers the ability to > act on priority information stored in the request. An example being > ATA

Re: [PATCH v6 3/3] ata: ATA Command Priority Disabled By Default

2016-10-19 Thread Tejun Heo
this feature is turned off. This patch depends on ata: Enabling ATA Command Priorities tj: Renamed ncq_prio_on to ncq_prio_enable and removed trivial ata_ncq_prio_on() and open-coded the test. Signed-off-by: Adam Manzanares <adam.manzana...@hgst.com> Signed-off-by: Tejun Heo <t...@kernel.

Re: [PATCH v6 3/3] ata: ATA Command Priority Disabled By Default

2016-10-19 Thread Tejun Heo
On Mon, Oct 17, 2016 at 11:27:30AM -0700, Adam Manzanares wrote: > Add a sysfs entry to turn on priority information being passed > to a ATA device. By default this feature is turned off. > > This patch depends on ata: Enabling ATA Command Priorities > > Signed-off-by: Adam Manzanares

Re: [PATCH V4 07/15] blk-throttle: make throtl_slice tunable

2016-11-23 Thread Tejun Heo
Hello, On Tue, Nov 22, 2016 at 03:18:24PM -0800, Shaohua Li wrote: > Hmm, it's not a real 'time slice'. The name is a bit confusion. Maybe rename > it > to 'throtl_interval' or 'throtl_sampling_time'? not sure. bandwidth and iops > are always in terms of a time interval we measure them. We can't

Re: [PATCH V4 07/15] blk-throttle: make throtl_slice tunable

2016-11-22 Thread Tejun Heo
Hello, On Mon, Nov 14, 2016 at 02:22:14PM -0800, Shaohua Li wrote: > throtl_slice is important for blk-throttling. A lot of stuffes depend on > it, for example, throughput measurement. It has 100ms default value, > which is not appropriate for all disks. For example, for SSD we might > use a

Re: [PATCH V4 05/15] blk-throttle: add downgrade logic

2016-11-22 Thread Tejun Heo
Hello, On Tue, Nov 22, 2016 at 04:21:21PM -0500, Tejun Heo wrote: > 1. A cgroup and its high and max limits don't have much to do with >other cgroups and their limits. I don't get how the choice between >high and max limits can be a td-wide state. Ah, okay, this combines

Re: [PATCH V4 05/15] blk-throttle: add downgrade logic

2016-11-22 Thread Tejun Heo
Hello, On Mon, Nov 14, 2016 at 02:22:12PM -0800, Shaohua Li wrote: > +static unsigned long tg_last_high_overflow_time(struct throtl_grp *tg) > +{ > + struct throtl_service_queue *parent_sq; > + struct throtl_grp *parent = tg; > + unsigned long ret = __tg_last_high_overflow_time(tg); >

Re: [PATCH V4 02/15] blk-throttle: add .high interface

2016-11-22 Thread Tejun Heo
Hello, Shaohua. Sorry about the delay. On Mon, Nov 14, 2016 at 02:22:09PM -0800, Shaohua Li wrote: > @@ -1376,11 +1414,37 @@ static ssize_t tg_set_max(struct kernfs_open_file *of, > goto out_finish; > } > > - tg->bps[READ][LIMIT_MAX] = v[0]; > -

Re: [PATCH V4 11/15] blk-throttle: add interface to configure think time threshold

2016-11-28 Thread Tejun Heo
Hello, Shaohua. On Wed, Nov 23, 2016 at 05:06:30PM -0800, Shaohua Li wrote: > > Shouldn't this be a per-cgroup setting along with latency target? > > These two are the parameters which define how the cgroup should be > > treated time-wise. > > It should be easy to make it per-cgroup. Just not

Re: [PATCH V4 10/15] blk-throttle: add a simple idle detection

2016-11-28 Thread Tejun Heo
Hello, Shaohua. On Wed, Nov 23, 2016 at 05:15:18PM -0800, Shaohua Li wrote: > > Hmm... I'm not sure thinktime is the best measure here. Think time is > > used by cfq mainly to tell the likely future behavior of a workload so > > that cfq can take speculative actions on the prediction. However,

Re: [PATCH V4 11/15] blk-throttle: add interface to configure think time threshold

2016-11-23 Thread Tejun Heo
On Mon, Nov 14, 2016 at 02:22:18PM -0800, Shaohua Li wrote: > Add interface to configure the threshold > > Signed-off-by: Shaohua Li > --- > block/blk-sysfs.c| 7 +++ > block/blk-throttle.c | 25 + > block/blk.h | 4 > 3 files

Re: [PATCH V4 10/15] blk-throttle: add a simple idle detection

2016-11-23 Thread Tejun Heo
Hello, Shaohua. On Mon, Nov 14, 2016 at 02:22:17PM -0800, Shaohua Li wrote: > Unfortunately it's very hard to determine if a cgroup is real idle. This > patch uses the 'think time check' idea from CFQ for the purpose. Please > note, the idea doesn't work for all workloads. For example, a workload

Re: [PATCH v5 4/4] ata: ATA Command Priority Disabled By Default

2016-10-13 Thread Tejun Heo
Hello, Adam. Sorry about late reply. Was on vacation. On Thu, Oct 13, 2016 at 04:00:31PM -0700, Adam Manzanares wrote: > Add a sysfs entry to turn on priority information being passed > to a ATA device. By default this feature is turned off. > > This patch depends on ata: Enabling ATA Command

Re: Fwd: [PATCH V3 00/11] block-throttle: add .high limit

2016-10-14 Thread Tejun Heo
Hello, Kyle. On Sat, Oct 08, 2016 at 06:15:14PM -0700, Kyle Sanderson wrote: > How is this even a discussion when hard numbers, and trying any > reproduction case easily reproduce the issues that CFQ causes. Reading > this thread, and many others only grows not only my disappointment, > but

Re: [PATCH V3 00/11] block-throttle: add .high limit

2016-10-14 Thread Tejun Heo
Hello, Paolo. On Fri, Oct 14, 2016 at 07:13:41PM +0200, Paolo Valente wrote: > That said, your 'thus' seems a little too strong: "bfq does not yet > handle fast SSDs, thus we need something else". What about the > millions of devices (and people) still within 10-20 K IOPS, and > experiencing

Re: [PATCH V4 13/15] blk-throttle: add a mechanism to estimate IO latency

2016-11-29 Thread Tejun Heo
Hello, On Tue, Nov 29, 2016 at 10:30:44AM -0800, Shaohua Li wrote: > > As discussed separately, it might make more sense to just use the avg > > of the closest bucket instead of trying to line-fit the buckets, but > > it's an implementation detail and whatever which works is fine. > > that is

Re: [PATCH V4 15/15] blk-throttle: add latency target support

2016-11-29 Thread Tejun Heo
Hello, On Tue, Nov 29, 2016 at 10:14:03AM -0800, Shaohua Li wrote: > What the patches do doesn't conflict what you are talking about. We need a way > to detect if cgroups are idle or active. I think the problem is how to define > 'active' and 'idle'. We must quantify the state. We could use: > 1.

Re: [RFC 00/10] implement alternative and much simpler id allocator

2016-12-09 Thread Tejun Heo
Hello, Rasmus. On Thu, Dec 08, 2016 at 02:22:55AM +0100, Rasmus Villemoes wrote: > TL;DR: these patches save 250 KB of memory, with more low-hanging > fruit ready to pick. > > While browsing through the lib/idr.c code, I noticed that the code at > the end of ida_get_new_above() probably doesn't

Re: [PATCH v7 0/4] Enabling ATA Command Priorities

2016-12-06 Thread Tejun Heo
Adam, On Tue, Dec 06, 2016 at 09:18:01AM -0800, Adam Manzanares wrote: > From: Adam Manzanares > > This patch builds ATA commands with high priority if the iocontext of a > process > is set to real time. The goal of the patch is to improve tail latencies of >

Re: [RFC 00/10] implement alternative and much simpler id allocator

2016-12-12 Thread Tejun Heo
Hello, On Fri, Dec 09, 2016 at 02:01:40PM -0800, Andrew Morton wrote: > On Thu, 8 Dec 2016 02:22:55 +0100 Rasmus Villemoes > wrote: > > > TL;DR: these patches save 250 KB of memory, with more low-hanging > > fruit ready to pick. > > > > While browsing through the

Re: [RFC 00/10] implement alternative and much simpler id allocator

2016-12-12 Thread Tejun Heo
Hello, Matthew. On Mon, Dec 12, 2016 at 05:35:17PM +, Matthew Wilcox wrote: > I know the preload followed by preload_end looks wrong. I don't > think it's broken though. If we get preempted, then the worst > situation is that we'll end up with the memory we preallocated being > allocated to

Re: [PATCH V5 13/17] blk-throttle: ignore idle cgroup limit

2017-01-09 Thread Tejun Heo
On Thu, Dec 15, 2016 at 12:33:04PM -0800, Shaohua Li wrote: > Last patch introduces a way to detect idle cgroup. We use it to make > upgrade/downgrade decision. And the new algorithm can detect completely > idle cgroup too, so we can delete the corresponding code. Ah, okay, the slice based idle

Re: [PATCH V5 09/17] blk-throttle: detect completed idle cgroup

2017-01-09 Thread Tejun Heo
Hello, On Thu, Dec 15, 2016 at 12:33:00PM -0800, Shaohua Li wrote: > @@ -1660,6 +1671,11 @@ static bool throtl_tg_can_downgrade(struct throtl_grp > *tg) > struct throtl_data *td = tg->td; > unsigned long now = jiffies; > > + if (time_after_eq(now, tg->last_dispatch_time[READ] +

Re: [PATCH V5 16/17] blk-throttle: add a mechanism to estimate IO latency

2017-01-09 Thread Tejun Heo
Hello, On Thu, Dec 15, 2016 at 12:33:07PM -0800, Shaohua Li wrote: > User configures latency target, but the latency threshold for each > request size isn't fixed. For a SSD, the IO latency highly depends on > request size. To calculate latency threshold, we sample some data, eg, > average

Re: [PATCH V5 15/17] block: track request size in blk_issue_stat

2017-01-09 Thread Tejun Heo
On Thu, Dec 15, 2016 at 12:33:06PM -0800, Shaohua Li wrote: > Currently there is no way to know the request size when the request is > finished. Next patch will need this info, so add to blk_issue_stat. With > this, we will have 49bits to track time, which still is very long time. Not necessarily

Re: [PATCH V5 05/17] blk-throttle: add upgrade logic for LIMIT_LOW state

2017-01-09 Thread Tejun Heo
Hello, again. On Mon, Jan 09, 2017 at 01:40:53PM -0500, Tejun Heo wrote: > I think it'd be great to explain the above. It was a bit difficult > for me to follow. It's also interesting because we're tying state > transitions for both read and write together. blk-throtl has been > ha

Re: [PATCH V5 10/17] blk-throttle: make bandwidth change smooth

2017-01-09 Thread Tejun Heo
Hello, On Thu, Dec 15, 2016 at 12:33:01PM -0800, Shaohua Li wrote: > static uint64_t tg_bps_limit(struct throtl_grp *tg, int rw) > { > struct blkcg_gq *blkg = tg_to_blkg(tg); > + struct throtl_data *td; > uint64_t ret; > > if (cgroup_subsys_on_dfl(io_cgrp_subsys) &&

Re: [PATCH V5 12/17] blk-throttle: add interface to configure idle time threshold

2017-01-09 Thread Tejun Heo
On Thu, Dec 15, 2016 at 12:33:03PM -0800, Shaohua Li wrote: > @@ -180,6 +180,8 @@ struct throtl_data > unsigned int limit_index; > bool limit_valid[LIMIT_CNT]; > > + u64 dft_idle_ttime_threshold; BTW, wouldn't idle_time be a better name for these? Currently, it's "idle

Re: [PATCH V5 08/17] blk-throttle: make throtl_slice tunable

2017-01-09 Thread Tejun Heo
Hello, On Thu, Dec 15, 2016 at 12:32:59PM -0800, Shaohua Li wrote: > throtl_slice is important for blk-throttling. It's called slice > internally but it really is a time window blk-throttling samples data. > blk-throttling will make decision based on the samplings. An example is > bandwidth

Re: [PATCH V5 11/17] blk-throttle: add a simple idle detection

2017-01-09 Thread Tejun Heo
On Thu, Dec 15, 2016 at 12:33:02PM -0800, Shaohua Li wrote: > /* Throttling is performed over 100ms slice and after that slice is renewed > */ > #define DFL_THROTL_SLICE (HZ / 10) > #define MAX_THROTL_SLICE (HZ / 5) > +#define DFL_IDLE_THRESHOLD_SSD (50 * 1000) /* 50 us */ > +#define

Re: [PATCH V4 13/15] blk-throttle: add a mechanism to estimate IO latency

2016-11-29 Thread Tejun Heo
Hello, Shaohua. On Mon, Nov 14, 2016 at 02:22:20PM -0800, Shaohua Li wrote: > To do this, we sample some data, eg, average latency for request size > 4k, 8k, 16k, 32k, 64k. We then use an equation f(x) = a * x + b to fit > the data (x is request size in KB, f(x) is the latency). Then we can use >

Re: [PATCH V4 15/15] blk-throttle: add latency target support

2016-11-29 Thread Tejun Heo
Hello, On Mon, Nov 14, 2016 at 02:22:22PM -0800, Shaohua Li wrote: > One hard problem adding .high limit is to detect idle cgroup. If one > cgroup doesn't dispatch enough IO against its high limit, we must have a > mechanism to determine if other cgroups dispatch more IO. We added the > think

Re: [PATCH V5 03/17] blk-throttle: add .low interface

2017-01-09 Thread Tejun Heo
Happy new year, Shaohua. Sorry about the long delay. On Thu, Dec 15, 2016 at 12:32:54PM -0800, Shaohua Li wrote: > Add low limit for cgroup and corresponding cgroup interface. It'd be nice to explain why we're adding separate _conf fields. > +static void blk_throtl_update_valid_limit(struct

Re: [PATCH V5 04/17] blk-throttle: configure bps/iops limit for cgroup in low limit

2017-01-09 Thread Tejun Heo
Hello, On Thu, Dec 15, 2016 at 12:32:55PM -0800, Shaohua Li wrote: > each queue will have a state machine. Initially queue is in LIMIT_LOW > state, which means all cgroups will be throttled according to their low > limit. After all cgroups with low limit cross the limit, the queue state > gets

Re: [PATCH V5 05/17] blk-throttle: add upgrade logic for LIMIT_LOW state

2017-01-09 Thread Tejun Heo
Hello, Shaohua. On Thu, Dec 15, 2016 at 12:32:56PM -0800, Shaohua Li wrote: > For a cgroup hierarchy, there are two cases. Children has lower low > limit than parent. Parent's low limit is meaningless. If children's > bps/iops cross low limit, we can upgrade queue state. The other case is >

Re: [PATCH V5 07/17] blk-throttle: make sure expire time isn't too big

2017-01-09 Thread Tejun Heo
On Thu, Dec 15, 2016 at 12:32:58PM -0800, Shaohua Li wrote: > cgroup could be throttled to a limit but when all cgroups cross high > limit, queue enters a higher state and so the group should be throttled > to a higher limit. It's possible the cgroup is sleeping because of > throttle and other

Re: [PATCH V5 14/17] blk-throttle: add interface for per-cgroup target latency

2017-01-09 Thread Tejun Heo
Hello, On Thu, Dec 15, 2016 at 12:33:05PM -0800, Shaohua Li wrote: > @@ -438,6 +439,11 @@ static struct blkg_policy_data *throtl_pd_alloc(gfp_t > gfp, int node) > } > tg->idle_ttime_threshold = U64_MAX; > > + /* > + * target latency default 0, eg, latency threshold is 0,

Re: support ranges TRIM for libata

2017-03-21 Thread Tejun Heo
Hello, On Mon, Mar 20, 2017 at 04:43:12PM -0400, Christoph Hellwig wrote: > This series implements rangeѕ discard for ATA SSDs. Compared to the > initial NVMe support there are two things that complicate the ATA > support: > > - ATA only suports 16-bit long ranges > - the whole mess of

Re: [PATCH 02/11] block: Fix race of bdev open with gendisk shutdown

2017-03-23 Thread Tejun Heo
Hello, Jan. On Thu, Mar 23, 2017 at 01:21:05AM +0100, Jan Kara wrote: > > * A bdi is the object which knows which request_queue it's associated > > and when that association dies. > > So bdi is associated with a request_queue but it does not have any > direct pointer or reference to it. It is

Re: support ranges TRIM for libata

2017-03-23 Thread Tejun Heo
Hello, Christoph. On Thu, Mar 23, 2017 at 03:43:30PM +0100, Christoph Hellwig wrote: > > That's up to you ... from the point of view of code documenting itself, > > forming the ATA_16 TRIM in sd and not doing any satl transformation is > > easier for others to follow, but if it's going to cause

Re: [PATCH] blkcg: allocate struct blkcg_gq outside request queue spinlock

2017-03-28 Thread Tejun Heo
nisms to handle non-atomic allocations. This drop lock / alloc / relock / check invariants in the main path and preallocation logic used in the init path. Right now, both proposed implementations aren't that satisfactory. Personally, I'd prefer superficial ugliness to structural duplications, but, ideally, we shouldn't have to make this choice. idk, it's a bug fix. We can always clean things up later. Acked-by: Tejun Heo <t...@kernel.org> Thanks. -- tejun

Re: [PATCH V3 02/16] block, bfq: add full hierarchical scheduling and cgroups support

2017-04-11 Thread Tejun Heo
Hello, On Tue, Apr 11, 2017 at 03:43:01PM +0200, Paolo Valente wrote: > From: Arianna Avanzini > > Add complete support for full hierarchical scheduling, with a cgroups > interface. Full hierarchical scheduling is implemented through the > 'entity' abstraction: both

Re: [PATCH V3 02/16] block, bfq: add full hierarchical scheduling and cgroups support

2017-04-18 Thread Tejun Heo
Hello, Paolo. On Wed, Apr 12, 2017 at 07:22:03AM +0200, Paolo Valente wrote: > could you elaborate a bit more on this? I mean, cgroups support has > been in BFQ (and CFQ) for almost ten years, perfectly working as far > as I know. Of course it is perfectly working in terms of I/O and not > of

Re: [PATCH 1/4] block: Allow bdi re-registration

2017-03-09 Thread Tejun Heo
Tested-by: Omar Sandoval <osan...@fb.com> > Signed-off-by: Jan Kara <j...@suse.cz> Acked-by: Tejun Heo <t...@kernel.org> Thanks! -- tejun

Re: [PATCH v2] blkcg: allocate struct blkcg_gq outside request queue spinlock

2017-03-03 Thread Tejun Heo
Hello, Tahsin. On Thu, Mar 02, 2017 at 02:33:11PM -0800, Tahsin Erdogan wrote: > > And let blkg_create() verify these conditions after releasing and > > regrabbing the lock. > > > > This also means that the init path can simply pass in GFP_KERNEL. > > I tried that approach, but I encountered two

Re: [PATCH] block: use put_io_context_active() to disassociate bio from a task

2017-03-03 Thread Tejun Heo
implement bio_associate_current()") Cc: sta...@vger.kernel.org # v3.5+ I think the failure mode isn't severe. We'll be carrying around ioc's longer than necessary but that's about it. I wonder whether the logic can be simplified in general. Anyways, please feel free to add Acked-by

Re: [PATCH v3] blkcg: allocate struct blkcg_gq outside request queue spinlock

2017-03-04 Thread Tejun Heo
Hello, Tahsin. Generally looks good. Please see below. > @@ -184,24 +185,48 @@ static struct blkcg_gq *blkg_create(struct blkcg *blkcg, > + if (unlikely(!wb_congested)) { > ret = -ENOMEM; > goto err_put_css; > + } else if (unlikely(!blkg)) { > +

Re: [PATCH 2/4] bdi: Fix use-after-free in wb_congested_put()

2017-03-08 Thread Tejun Heo
i regardless whether it ever got registered > or not. > > Fixes: 165a5e22fafb127ecb5914e12e8c32a1f0d3f820 > Signed-off-by: Jan Kara <j...@suse.cz> Acked-by: Tejun Heo <t...@kernel.org> Thanks. -- tejun

Re: [PATCH 3/4] block: Make del_gendisk() safer for disks without queues

2017-03-08 Thread Tejun Heo
blk_unregister_queue() warns in that case, this change will make it oops > instead. Return to the old more robust behavior of just warning when > del_gendisk() gets called for gendisk with disk->queue being NULL. > > Reported-by: Dan Carpenter <dan.carpen...@oracle.com> > Signed-

Re: [PATCH 1/4] block: Allow bdi re-registration

2017-03-08 Thread Tejun Heo
Hello, On Wed, Mar 08, 2017 at 05:48:31PM +0100, Jan Kara wrote: > @@ -710,6 +710,11 @@ static void cgwb_bdi_destroy(struct backing_dev_info > *bdi) >*/ > atomic_dec(>usage_cnt); > wait_event(cgwb_release_wait, !atomic_read(>usage_cnt)); > + /* > + * Grab back our

Re: [PATCH 01/11] block: Fix bdi assignment to bdev inode when racing with disk delete

2017-03-06 Thread Tejun Heo
node will be pointing to a stale bdi. Fix the problem by setting > bdev->bd_bdi later when __blkdev_get() is already guaranteed to succeed. > > Signed-off-by: Jan Kara <j...@suse.cz> Acked-by: Tejun Heo <t...@kernel.org> Thanks. -- tejun

Re: [PATCH 06/11] bdi: Shutdown writeback on all cgwbs in cgwb_bdi_destroy()

2017-03-06 Thread Tejun Heo
Jan Kara <j...@suse.cz> Acked-by: Tejun Heo <t...@kernel.org> Thanks. -- tejun

Re: [PATCH 07/11] bdi: Do not wait for cgwbs release in bdi_unregister()

2017-03-06 Thread Tejun Heo
efore all cgwbs are released) and when cgwb_bdi_destroy() > shuts down writeback directly. > > Signed-off-by: Jan Kara <j...@suse.cz> Acked-by: Tejun Heo <t...@kernel.org> Thanks. -- tejun

Re: [PATCH 02/11] block: Fix race of bdev open with gendisk shutdown

2017-03-06 Thread Tejun Heo
Hello, On Mon, Mar 06, 2017 at 05:33:55PM +0100, Jan Kara wrote: > + disk->flags &= ~GENHD_FL_UP; > + /* > + * Make sure __blkdev_open() sees the disk is going away before > + * starting to unhash bdev inodes. > + */ > + smp_wmb(); But which rmb is this paired with?

Re: [PATCH v2] blkcg: allocate struct blkcg_gq outside request queue spinlock

2017-03-02 Thread Tejun Heo
Hello, Tahsin. On Wed, Mar 01, 2017 at 03:43:19PM -0800, Tahsin Erdogan wrote: > @@ -258,18 +258,22 @@ static struct blkcg_gq *blkg_create(struct blkcg *blkcg, > struct blkcg_gq *blkg_lookup_create(struct blkcg *blkcg, > - struct request_queue *q) > +

Re: [PATCH 11/13] block: Fix oops in locked_inode_to_wb_and_lock_list()

2017-02-28 Thread Tejun Heo
postpone calling inode_detach_wb() to that > moment. > > Also add a warning to catch if someone uses inode_detach_wb() in a > dangerous way. > > Reported-by: Thiago Jung Bauermann <bauer...@linux.vnet.ibm.com> > Signed-off-by: Jan Kara <j...@suse.cz> Acked-by: Tejun Heo <t...@kernel.org> Thanks. -- tejun

Re: [PATCH 02/11] block: Fix race of bdev open with gendisk shutdown

2017-03-07 Thread Tejun Heo
Hello, On Tue, Mar 07, 2017 at 12:14:20PM +0100, Jan Kara wrote: > > Given that this isn't a super hot path, I think it'd be far better to > > stick to a simpler synchronization mechanism. > > I'd be happy to do so however struct gendisk does not have any lock in it > and introducing a special

Re: [PATCH 02/11] block: Fix race of bdev open with gendisk shutdown

2017-03-21 Thread Tejun Heo
Hello, Jan. On Mon, Mar 13, 2017 at 04:14:01PM +0100, Jan Kara wrote: > blkdev_open() may race with gendisk shutdown in two different ways. > Either del_gendisk() has already unhashed block device inode (and thus > bd_acquire() will end up creating new block device inode) however > gen_gendisk()

Re: [PATCH 05/11] bdi: Unify bdi->wb_list handling for root wb_writeback

2017-03-21 Thread Tejun Heo
for us to handle shutdown of all > wb_writeback structures in bdi_unregister(). > > Signed-off-by: Jan Kara <j...@suse.cz> Acked-by: Tejun Heo <t...@kernel.org> Thanks. -- tejun

Re: Lots of new warnings with gcc-7.1.1

2017-07-15 Thread Tejun Heo
Hello, On Wed, Jul 12, 2017 at 03:31:02PM +0200, Arnd Bergmann wrote: > > We also have about a bazillion > > > > warning: ‘*’ in boolean context, suggest ‘&&’ instead > > > > warnings in drivers/ata/libata-core.c, all due to a single macro that > > uses a pattern that gcc-7.1.1 doesn't like.

Re: [PATCH V4 03/12] kernfs: add an API to get kernfs node from inode number

2017-06-28 Thread Tejun Heo
t as > possible. To make the API lock free, kernfs node is freed in RCU > context. And we depend on kernfs_node count/ino number to filter out > stale kernfs nodes. > > Signed-off-by: Shaohua Li <s...@fb.com> Acked-by: Tejun Heo <t...@kernel.org> Thanks. -- tejun

Re: [PATCH] sd: add support for TCG OPAL self encrypting disks

2017-06-28 Thread Tejun Heo
On Mon, Jun 19, 2017 at 02:26:46PM +0200, Christoph Hellwig wrote: > Just wire up the generic TCG OPAL infrastructure to the SCSI disk driver > and the Security In/Out commands. > > Note that I don't know of any actual SCSI disks that do support TCG OPAL, > but this is required to support ATA

Re: [PATCH V4 00/12] blktrace: output cgroup info

2017-06-28 Thread Tejun Heo
Hello, On Wed, Jun 28, 2017 at 10:54:28AM -0600, Jens Axboe wrote: > >> Series looks fine to me. I don't know how you want to split or funnel it, > >> since it touches multiple different parts. Would it make sense to split > >> this > >> series into two - one for the kernfs changes, and then a

Re: [PATCH V4 07/12] cgroup: export fhandle info for a cgroup

2017-06-28 Thread Tejun Heo
Signed-off-by: Shaohua Li <s...@fb.com> Acked-by: Tejun Heo <t...@kernel.org> Thanks. -- tejun

Re: [PATCH V4 00/12] blktrace: output cgroup info

2017-06-28 Thread Tejun Heo
Hello, On Wed, Jun 28, 2017 at 02:57:38PM -0600, Jens Axboe wrote: > Personally I don't care that much, but the risk of conflicts is much > higher on the block side, than on the kernfs side. So might be the > path of less resistance to pull it through the block tree. And I'd be > happy to do

Re: [PATCH 1/2] powerpc/workqueue: update list of possible CPUs

2017-08-22 Thread Tejun Heo
Hello, Michael. On Tue, Aug 22, 2017 at 11:41:41AM +1000, Michael Ellerman wrote: > > This is something powerpc needs to fix. > > There is no way for us to fix it. I don't think that's true. The CPU id used in kernel doesn't have to match the physical one and arch code should be able to

Re: [PATCH 1/2] powerpc/workqueue: update list of possible CPUs

2017-08-23 Thread Tejun Heo
Hello, Michael. On Wed, Aug 23, 2017 at 09:00:39PM +1000, Michael Ellerman wrote: > > I don't think that's true. The CPU id used in kernel doesn't have to > > match the physical one and arch code should be able to pre-map CPU IDs > > to nodes and use the matching one when hotplugging CPUs. I'm

  1   2   3   >