Re: [PATCH v2 02/26] sysfs: export sysfs_remove_file_self()

2018-05-18 Thread Tejun Heo
On Fri, May 18, 2018 at 03:03:49PM +0200, Roman Pen wrote: > Function is going to be used in transport over RDMA module > in subsequent patches. > > Signed-off-by: Roman Pen > Cc: Tejun Heo > Cc: linux-ker...@vger.kernel.org Acked-by: Tejun Heo Please feel free to apply

Re: [PATCH] bdi: Fix another oops in wb_workfn()

2018-05-26 Thread Tejun Heo
On Sun, May 27, 2018 at 11:21:25AM +0900, Tetsuo Handa wrote: > From 8a8222698163d1fe180258566e9a3ff43f54fcd9 Mon Sep 17 00:00:00 2001 > From: Tetsuo Handa > Date: Sun, 27 May 2018 11:08:20 +0900 > Subject: [PATCH] bdi: Fix another oops in wb_workfn() > > syzbot is still hitting NULL pointer dere

Re: [PATCH] bdi: Fix another oops in wb_workfn()

2018-05-29 Thread Tejun Heo
On Sun, May 27, 2018 at 01:43:45PM +0900, Tetsuo Handa wrote: > Tejun Heo wrote: > > On Sun, May 27, 2018 at 11:21:25AM +0900, Tetsuo Handa wrote: > > > syzbot is still hitting NULL pointer dereference at wb_workfn() [1]. > > > This might be because we overlooked that d

Re: [GIT PULL] Block changes for 4.18-rc

2018-06-04 Thread Tejun Heo
Hey, Linus. On Mon, Jun 04, 2018 at 09:24:28AM -0700, Linus Torvalds wrote: > Tejun, the code in question is mddev_put() in drivers/md/md.c, and it > basically does > > INIT_WORK(&mddev->del_work, mddev_delayed_delete); > queue_work(md_misc_wq, &mddev->del_work); >

Re: [PATCH] bdi: Fix another oops in wb_workfn()

2018-06-11 Thread Tejun Heo
Hello, On Mon, Jun 11, 2018 at 11:12:48AM +0200, Jan Kara wrote: > However this is wrong and so is the patch. The problem is in > cgwb_bdi_unregister() which does cgwb_kill() and thus drops bdi's > reference to wb structures before going through the list of wbs again and > calling wb_shutdown() on

Re: [PATCH] bdi: Fix another oops in wb_workfn()

2018-06-18 Thread Tejun Heo
way we also no longer need synchronization using WB_shutting_down as the > mutex provides it for CONFIG_CGROUP_WRITEBACK case and without > CONFIG_CGROUP_WRITEBACK wb_shutdown() can be called only once from > bdi_unregister(). > > Reported-by: syzbot > Signed-off-by: Jan Kara Acked-by: Tejun Heo Thanks. -- tejun

Re: [PATCH v3 4/9] percpu-refcount: Introduce percpu_ref_read()

2018-08-02 Thread Tejun Heo
Hello, On Thu, Aug 02, 2018 at 11:29:39AM -0700, Bart Van Assche wrote: > Introduce a function that allows to read the value of a per-cpu counter. > This function will be used in the next patch to check whether a per-cpu > counter in atomic mode has the value one. I'm not a big fan of exposing th

Re: [PATCH v3 4/9] percpu-refcount: Introduce percpu_ref_read()

2018-08-06 Thread Tejun Heo
Hello, Bart. On Thu, Aug 02, 2018 at 01:04:43PM -0700, Bart Van Assche wrote: > As you probably know one of the long term goals for the block layer > is to switch to blk-mq and to drop the legacy block layer. Hence > this patch series that adds support for runtime power management to > blk-mq beca

Re: [PATCH v5 4/9] percpu-refcount: Introduce percpu_ref_is_in_use()

2018-08-08 Thread Tejun Heo
Hello, On Tue, Aug 07, 2018 at 03:51:28PM -0700, Bart Van Assche wrote: > Introduce a function that allows to determine whether a per-cpu refcount > is in use. This function will be used in a later patch to determine > whether or not any block layer requests are being executed. I thought about it

Re: [PATCH v5 1/3] blkcg: Introduce blkg_root_lookup()

2018-08-09 Thread Tejun Heo
Hello, On Thu, Aug 09, 2018 at 07:53:36AM -0700, Bart Van Assche wrote: > +/** > + * blkg_lookup - look up blkg for the specified request queue > + * @q: request_queue of interest > + * > + * Lookup blkg for @q at the root level. See also blkg_lookup(). > + */ > +static inline struct blkcg_gq *blk

Re: [PATCH v10 5/8] percpu-refcount: Introduce percpu_ref_resurrect()

2018-09-24 Thread Tejun Heo
rcu grace period. Can you please note that in the function comment? Provided that, Acked-by: Tejun Heo Thanks. -- tejun

Re: [PATCH v10 5/8] percpu-refcount: Introduce percpu_ref_resurrect()

2018-09-26 Thread Tejun Heo
Hello, Bart. On Mon, Sep 24, 2018 at 01:43:35PM -0700, Bart Van Assche wrote: > Thanks for the review. But could you please clarify what "after ->percpu_ref > is called" refers to? Oops, sorry, I meant the confirm_kill callback of percpu_ref_kill_and_confirm(). Thanks. -- tejun

Re: [PATCH] blk-throttle: verify format of bandwidth limit and detect overflows

2019-03-05 Thread Tejun Heo
Hello, Konstantin. On Wed, Feb 27, 2019 at 11:05:44AM +0300, Konstantin Khlebnikov wrote: > Unlike to memory cgroup blkio throttler does not support value suffixes. > > It silently ignores everything after last digit. For example this command > will set rate limit 1 byte per second rather than 1

Re: [PATCH V3 02/16] block, bfq: add full hierarchical scheduling and cgroups support

2017-04-11 Thread Tejun Heo
Hello, On Tue, Apr 11, 2017 at 03:43:01PM +0200, Paolo Valente wrote: > From: Arianna Avanzini > > Add complete support for full hierarchical scheduling, with a cgroups > interface. Full hierarchical scheduling is implemented through the > 'entity' abstraction: both bfq_queues, i.e., the interna

Re: [PATCH V3 02/16] block, bfq: add full hierarchical scheduling and cgroups support

2017-04-18 Thread Tejun Heo
Hello, Paolo. On Wed, Apr 12, 2017 at 07:22:03AM +0200, Paolo Valente wrote: > could you elaborate a bit more on this? I mean, cgroups support has > been in BFQ (and CFQ) for almost ten years, perfectly working as far > as I know. Of course it is perfectly working in terms of I/O and not > of CP

Re: races between blk-cgroup operations and I/O scheds in blk-mq (?)

2017-05-17 Thread Tejun Heo
Hello, On Mon, May 15, 2017 at 09:49:13PM +0200, Paolo Valente wrote: > So, unless you tell me that there are other races I haven't seen, or, > even worse, that I'm just talking nonsense, I have thought of a simple > solution to address this issue without resorting to the request_queue > lock: fur

Re: [PATCH 0/4] blk-throttle: fix a few issues

2017-05-17 Thread Tejun Heo
On Wed, May 17, 2017 at 01:07:23PM -0700, Shaohua Li wrote: > Hi, > > The patchset fix a few issues for io.low limit. The title of the patches > explain the issues pretty well. Looks good to me. Acked-by: Tejun Heo Thanks. -- tejun

Re: [PATCH BUGFIX] block, bfq: access and cache blkg data only when safe

2017-05-19 Thread Tejun Heo
Hello, Paolo. On Fri, May 19, 2017 at 10:39:08AM +0200, Paolo Valente wrote: > Operations on blkg objects in blk-cgroup are protected with the > request_queue lock, which is no more the lock that protects > I/O-scheduler operations in blk-mq. The latter are now protected with > finer-grained per-s

Re: [PATCH BUGFIX] block, bfq: access and cache blkg data only when safe

2017-05-23 Thread Tejun Heo
Hello, Paolo. On Sat, May 20, 2017 at 09:27:33AM +0200, Paolo Valente wrote: > Consider a process or a group that is moved from a given source group > to a different group, or simply removed from a group (although I > didn't yet succeed in just removing a process from a group :) ). The > pointer

Re: [PATCH BUGFIX] block, bfq: access and cache blkg data only when safe

2017-05-24 Thread Tejun Heo
Hello, Paolo. On Wed, May 24, 2017 at 12:53:26PM +0100, Paolo Valente wrote: > Exact, but even after all blkgs, as well as the cfq_group and pd, are > gone, the children cfq_queues of the gone cfq_group continue to point > to unexisting objects, until new cfq_set_requests are executed for > those

Re: [PATCH 09/31] block: Avoid that blk_exit_rl() triggers a use-after-free

2017-05-24 Thread Tejun Heo
__blkg_release_rcu+0x59/0x170 > rcu_process_callbacks+0x260/0x4e0 > __do_softirq+0x116/0x250 > smpboot_thread_fn+0x123/0x1e0 > kthread+0x109/0x140 > ret_from_fork+0x31/0x40 > > Fixes: commit e9c787e65c0c ("scsi: allocate scsi_cmnd structures as part of > struct

Re: [PATCH BUGFIX] block, bfq: access and cache blkg data only when safe

2017-05-24 Thread Tejun Heo
Hello, On Wed, May 24, 2017 at 05:43:18PM +0100, Paolo Valente wrote: > > so none of the above objects can be destroyed before the request is > > done. > > ... the issue seems just to move to a more subtle position: cfq is ok, > because it protects itself with rq lock, but blk-mq schedulers don't

Re: [PATCH BUGFIX] block, bfq: access and cache blkg data only when safe

2017-05-25 Thread Tejun Heo
Hello, Paolo. On Thu, May 25, 2017 at 09:10:59AM +0200, Paolo Valente wrote: > Ok. So, just to better understand: as of now, i.e., before you make > the changes you are proposing, the address returned by blkg_lookup can > be used safely only if one both invokes blkg_lookup and gets a > reference,

Re: TCG Opal support for libata

2017-06-05 Thread Tejun Heo
On Sun, Jun 04, 2017 at 02:42:19PM +0200, Christoph Hellwig wrote: > Hi all, > > this series adds support for using our new generic TCG OPAL code with > SATA disks, and as side effect for SCSI disks (although so far it doesn't > seem like none of those actually exist). Applied 1-5 to libata/for-4

Re: [PATCH 02/11] kernfs: use idr instead of ida to manage inode number

2017-06-12 Thread Tejun Heo
Hello, On Fri, Jun 02, 2017 at 02:53:55PM -0700, Shaohua Li wrote: > @@ -630,7 +633,11 @@ static struct kernfs_node *__kernfs_new_node(struct > kernfs_root *root, > if (!kn) > goto err_out1; > > - ret = ida_simple_get(&root->ino_ida, 1, 0, GFP_KERNEL); > + idr_preloa

Re: [PATCH 03/11] kernfs: add an API to get kernfs node from inode number

2017-06-12 Thread Tejun Heo
Hello, On Fri, Jun 02, 2017 at 02:53:56PM -0700, Shaohua Li wrote: > --- a/fs/kernfs/dir.c > +++ b/fs/kernfs/dir.c > @@ -643,6 +643,7 @@ static struct kernfs_node *__kernfs_new_node(struct > kernfs_root *root, > kn->ino = ret; > kn->generation = atomic_inc_return(&root->next_generatio

Re: [PATCH 04/11] kernfs: don't set dentry->d_fsdata

2017-06-12 Thread Tejun Heo
Hello, On Fri, Jun 02, 2017 at 02:53:57PM -0700, Shaohua Li wrote: > @@ -570,7 +570,8 @@ static int kernfs_dop_revalidate(struct dentry *dentry, > unsigned int flags) > goto out_bad; > > /* The kernfs node has been moved? */ > - if (dentry->d_parent->d_fsdata != kn->pare

Re: [PATCH 03/11] kernfs: add an API to get kernfs node from inode number

2017-06-12 Thread Tejun Heo
Ooh, one more thing. On Mon, Jun 12, 2017 at 02:20:28PM -0400, Tejun Heo wrote: > > +struct kernfs_node *kernfs_get_node_by_ino(struct kernfs_root *root, > > + unsigned int ino) Can we name this kernfs_find_and_get_by_ino() for consistency?

Re: [PATCH 06/11] cgroup: export fhandle info for a cgroup

2017-06-12 Thread Tejun Heo
On Fri, Jun 02, 2017 at 02:53:59PM -0700, Shaohua Li wrote: > From: Shaohua Li > > Add an API to export cgroup fhandle info. We don't export a full 'struct > file_handle', there are unrequired info. Sepcifically, cgroup is always > a directory, so we don't need a 'FILEID_INO32_GEN_PARENT' type fh

Re: [PATCH 08/11] block: always attach cgroup info into bio

2017-06-12 Thread Tejun Heo
On Fri, Jun 02, 2017 at 02:54:01PM -0700, Shaohua Li wrote: > @@ -691,6 +691,8 @@ static inline bool blkcg_bio_issue_check(struct > request_queue *q, > rcu_read_lock(); > blkcg = bio_blkcg(bio); > > + bio_associate_blkcg(bio, &blkcg->css); > + Let's please note that this only es

Re: [PATCH V3 02/12] kernfs: implement i_generation

2017-06-19 Thread Tejun Heo
really is a problem, we should > find better data structure. > > Signed-off-by: Shaohua Li Acked-by: Tejun Heo Thanks. -- tejun

Re: [PATCH V3 01/12] kernfs: use idr instead of ida to manage inode number

2017-06-19 Thread Tejun Heo
number. > > Signed-off-by: Shaohua Li Acked-by: Tejun Heo Thanks. -- tejun

Re: [PATCH V3 03/12] kernfs: add an API to get kernfs node from inode number

2017-06-19 Thread Tejun Heo
Hello, On Thu, Jun 15, 2017 at 11:17:11AM -0700, Shaohua Li wrote: > diff --git a/fs/kernfs/dir.c b/fs/kernfs/dir.c > index 33f711f..7a4f327 100644 > --- a/fs/kernfs/dir.c > +++ b/fs/kernfs/dir.c > @@ -508,6 +508,10 @@ void kernfs_put(struct kernfs_node *kn) > struct kernfs_node *parent; >

Re: [PATCH V3 04/12] kernfs: don't set dentry->d_fsdata

2017-06-19 Thread Tejun Heo
Look at the kernfs code > closely, there is no point to set dentry->d_fsdata. inode->i_private > already points to kernfs_node, and we can get inode from a dentry. So > this patch just delete the d_fsdata usage. > > Signed-off-by: Shaohua Li Acked-by: Tejun Heo Thanks. -- tejun

Re: [PATCH V3 05/12] kernfs: introduce kernfs_node_id

2017-06-19 Thread Tejun Heo
Hello, On Thu, Jun 15, 2017 at 11:17:13AM -0700, Shaohua Li wrote: > +/* represent a kernfs node */ > +struct kernfs_node_id { > + u32 ino; > + u32 generation; > +} __attribute__((packed)); Can we just make it a u64? kernfs cares about the details

Re: [PATCH V3 06/12] kernfs: add exportfs operations

2017-06-19 Thread Tejun Heo
Hello, On Thu, Jun 15, 2017 at 11:17:14AM -0700, Shaohua Li wrote: > -static int kernfs_fill_super(struct super_block *sb, unsigned long magic) > +static int kernfs_fill_super(struct super_block *sb, unsigned long magic, > + bool enable_expop) Hmm... can't we make this a

Re: [PATCH V3 09/12] block: always attach cgroup info into bio

2017-06-19 Thread Tejun Heo
e cgroup info into bio at > blkcg_bio_issue_check. This also makes blktrace outputs correct cgroup > info. > > Signed-off-by: Shaohua Li Acked-by: Tejun Heo Thanks. -- tejun

Re: [PATCH V3 11/12] blktrace: add an option to allow displying cgroup path

2017-06-19 Thread Tejun Heo
Hello, On Thu, Jun 15, 2017 at 11:17:19AM -0700, Shaohua Li wrote: > From: Shaohua Li > > By default we output cgroup id in blktrace. This adds an option to > display cgroup path. Since get cgroup path is a relativly heavy > operation, we don't enable it by default. > > with the option enabled,

Re: [PATCH] sd: add support for TCG OPAL self encrypting disks

2017-06-26 Thread Tejun Heo
On Mon, Jun 26, 2017 at 12:43:27PM -0400, Martin K. Petersen wrote: > > Christoph, > > > ping? > > Looks good to me. I'll queue it up for 4.13 as soon as Linus has pulled > in the ata bits. I can route it through libata tree w/ your ack if that's more convenient. Thanks. -- tejun

Re: [PATCH V4 03/12] kernfs: add an API to get kernfs node from inode number

2017-06-28 Thread Tejun Heo
To make the API lock free, kernfs node is freed in RCU > context. And we depend on kernfs_node count/ino number to filter out > stale kernfs nodes. > > Signed-off-by: Shaohua Li Acked-by: Tejun Heo Thanks. -- tejun

Re: [PATCH V4 00/12] blktrace: output cgroup info

2017-06-28 Thread Tejun Heo
Hello, On Wed, Jun 28, 2017 at 10:54:28AM -0600, Jens Axboe wrote: > >> Series looks fine to me. I don't know how you want to split or funnel it, > >> since it touches multiple different parts. Would it make sense to split > >> this > >> series into two - one for the kernfs changes, and then a su

Re: [PATCH V4 07/12] cgroup: export fhandle info for a cgroup

2017-06-28 Thread Tejun Heo
ed a 'FILEID_INO32_GEN_PARENT' type fhandle, > we only need export the inode number and generation number just like > what generic_fh_to_dentry does. And we can avoid the overhead of getting > an inode too, since kernfs_node_id (ino and generation) has all the info > requi

Re: [PATCH] sd: add support for TCG OPAL self encrypting disks

2017-06-28 Thread Tejun Heo
On Mon, Jun 19, 2017 at 02:26:46PM +0200, Christoph Hellwig wrote: > Just wire up the generic TCG OPAL infrastructure to the SCSI disk driver > and the Security In/Out commands. > > Note that I don't know of any actual SCSI disks that do support TCG OPAL, > but this is required to support ATA disk

Re: [PATCH V4 00/12] blktrace: output cgroup info

2017-06-28 Thread Tejun Heo
Hello, On Wed, Jun 28, 2017 at 02:57:38PM -0600, Jens Axboe wrote: > Personally I don't care that much, but the risk of conflicts is much > higher on the block side, than on the kernfs side. So might be the > path of less resistance to pull it through the block tree. And I'd be > happy to do that,

Re: [PATCH V2 00/10] unify the interface of the proportional-share policy in blkio/io

2018-11-20 Thread Tejun Heo
Hello, Paolo. On Mon, Nov 19, 2018 at 11:34:14AM +0100, Paolo Valente wrote: > - if all entities produce the same output, the this common output is > shown only once; > - if the outputs differ, then every per-entity output is shown, > followed by the name of the entity that produced that output.

Re: [PATCH 04/13] blkcg: introduce common blkg association logic

2018-11-29 Thread Tejun Heo
are associating bsaed on the current css. If there is already a blkg > associated, the css will be reused and association will be redone as the > request_queue may have changed. > > Signed-off-by: Dennis Zhou Acked-by: Tejun Heo A minor nit below. > +/** > + * bio_associate_b

Re: [PATCH 05/13] blkcg: associate blkg when associating a device

2018-11-29 Thread Tejun Heo
On Mon, Nov 26, 2018 at 04:19:38PM -0500, Dennis Zhou wrote: > diff --git a/include/linux/bio.h b/include/linux/bio.h > index 62715a5a4f32..8bc9d9b29fd3 100644 > --- a/include/linux/bio.h > +++ b/include/linux/bio.h > @@ -486,6 +486,12 @@ extern unsigned int bvec_nr_vecs(unsigned short idx); > ext

Re: [PATCH 11/13] blkcg: remove bio_disassociate_task()

2018-11-29 Thread Tejun Heo
by: Dennis Zhou Acked-by: Tejun Heo Thanks. -- tejun

Re: [PATCH V2 00/10] unify the interface of the proportional-share policy in blkio/io

2018-11-30 Thread Tejun Heo
Hello, Paolo. On Fri, Nov 30, 2018 at 07:23:24PM +0100, Paolo Valente wrote: > > Then we understood that exactly the same happens with throttling, in > > case the latter is activated on different devices w.r.t. bfq. > > > > In addition, the same may happen, in the near future, with the > > bandwi

Re: [PATCH V2 00/10] unify the interface of the proportional-share policy in blkio/io

2018-12-18 Thread Tejun Heo
Hello, Paolo. On Tue, Dec 18, 2018 at 08:48:10AM +0100, Paolo Valente wrote: > If Tejun cannot see any solution to his concern, then can we just > switch to this extension, considering that > - for non-shared names the interface is *identical* to the current > one; > - by using this new interfac

Re: [PATCH V2 00/10] unify the interface of the proportional-share policy in blkio/io

2018-12-27 Thread Tejun Heo
Hello, Paolo. On Sun, Dec 23, 2018 at 12:00:14PM +0100, Paolo Valente wrote: > 4.21 is coming ... and the legacy proportional share interface will > be gone with cfq. This will break legacy code using the > proportional-share interface on top of bfq. This code may just fail > when trying to use

Re: [PATCH V2 00/10] unify the interface of the proportional-share policy in blkio/io

2019-01-02 Thread Tejun Heo
Hello, Paolo. On Sun, Dec 30, 2018 at 11:25:25AM +0100, Paolo Valente wrote: > What's the benefit of throwing away months of work, on which we agreed > before starting it, and that solves a problem already acknowledged by > interested parties? Showing multiple conflicting numbers definitely isn't

Re: [PATCH] block: fix use-after-free in seq file

2016-07-29 Thread Tejun Heo
oc() // fails > - .seq_stop() >- class_dev_iter_exit(seqf->private) // boom! old pointer > > As the comment in disk_seqf_stop() says, stop is called even if start > failed, so we need to reinitialise the private pointer to NULL when seq > iteration stops. > > An

Re: Regarding AHCI_MAX_SG and (ATA_HORKAGE_MAX_SEC_1024)

2016-08-09 Thread Tejun Heo
Hello, Tom. On Sun, Aug 07, 2016 at 10:10:17PM +0800, Tom Yan wrote: > So the (not so) recent bump of BLK_DEF_MAX_SECTORS from 1024 to 2560 > (commit d2be537c3ba3) seemed to have caused trouble to some of the ATA > devices, which were then worked around with ATA_HORKAGE_MAX_SEC_1024. > > However,

Re: [PATCH v2 1/1] blk-mq: fix hang caused by freeze/unfreeze sequence

2016-08-09 Thread Tejun Heo
Hello, On Mon, Aug 08, 2016 at 01:39:08PM +0200, Roman Pen wrote: > Long time ago there was a similar fix proposed by Akinobu Mita[1], > but it seems that time everyone decided to fix this subtle race in > percpu-refcount and Tejun Heo[2] did an attempt (as I can see that > patc

Re: [PATCH v2 2/2] libata-core: do not set dev->max_sectors for LBA48 devices

2016-08-09 Thread Tejun Heo
Hello, On Tue, Aug 09, 2016 at 10:45:47PM +0800, tom.t...@gmail.com wrote: > From: Tom Yan > > Currently block layer limit max_hw_sectors is set to > ATA_MAX_SECTORS_LBA48 (65535), for devices with LBA48 support. > > However, block layer limit max_sectors (which is the effective > one; also adj

Re: [PATCH v2 1/1] blk-mq: fix hang caused by freeze/unfreeze sequence

2016-08-10 Thread Tejun Heo
On Wed, Aug 10, 2016 at 10:42:09AM +0200, Roman Penyaev wrote: > Hi, > > On Wed, Aug 10, 2016 at 5:55 AM, Tejun Heo wrote: > > Hello, > > > > On Mon, Aug 08, 2016 at 01:39:08PM +0200, Roman Pen wrote: > >> Long time ago there was a similar fix proposed by Aki

Re: [PATCH v2 2/2] libata-core: do not set dev->max_sectors for LBA48 devices

2016-08-10 Thread Tejun Heo
Hello, Tom. On Wed, Aug 10, 2016 at 04:32:39PM +0800, Tom Yan wrote: > I have to admit that libata may not be the right place to deal with my > concern over the current BLK_DEF_MAX_SECTORS, which seems non-sensical > to me. In the original commit message: > > (d2be537c3ba3, "block: bump BLK_DEF_M

Re: Regarding AHCI_MAX_SG and (ATA_HORKAGE_MAX_SEC_1024)

2016-08-10 Thread Tejun Heo
Hello, Tom. On Wed, Aug 10, 2016 at 06:04:10PM +0800, Tom Yan wrote: > On 10 August 2016 at 11:26, Tejun Heo wrote: > > Hmmm.. why not? The hardware limit is 64k and the driver is using a > > Is that referring to the maximum number of entries allowed in the > PRDT, Physical

Re: Time to make dynamically allocated devt the default for scsi disks?

2016-08-13 Thread Tejun Heo
Hello, Dan. On Fri, Aug 12, 2016 at 02:29:30PM -0700, Dan Williams wrote: > Before spending effort trying to flush the destruction of old bdi > instances before new ones are registered, is it rather time to > complete the conversion of sd to only use dynamically allocated devt? I think that proba

Re: [PATCH] bdev: fix NULL pointer dereference in sync()/close() race

2016-08-29 Thread Tejun Heo
Hello, On Sat, Aug 27, 2016 at 11:30:22AM +0200, Vegard Nossum wrote: > > Don't know what's the right fix, but I posted a slightly different one > > for the same crash some months ago: > > https://patchwork.kernel.org/patch/8556941/ > > > > Ah, I'm sorry, I didn't see that. > > Your patch is 10

Re: [PATCH] bdev: fix NULL pointer dereference in sync()/close() race

2016-08-29 Thread Tejun Heo
Hello, On Mon, Aug 29, 2016 at 10:49:57PM +0200, Vegard Nossum wrote: > That didn't work at all. I guess bd_acquire() would just do a bdgrab() > and not touch ->bd_holders, whereas blkdev_get() would increment Yeah, bdev has two different refs - one for bdev struct itself and the other for the ac

Re: [PATCH] bdev: fix NULL pointer dereference in sync()/close() race

2016-08-29 Thread Tejun Heo
Hello, On Mon, Aug 29, 2016 at 11:33:41PM +0200, Vegard Nossum wrote: > On 08/29/2016 09:55 PM, Tejun Heo wrote: > > I think the right thing to do there is doing blkdev_get() / > > blkdev_put() around func() invocation in iterate_bdevs() rather than > > holding bd_mutex acro

Re: [PATCH] blkcg: Annotate blkg_hint correctly

2016-09-23 Thread Tejun Heo
On Fri, Sep 23, 2016 at 09:07:56AM -0700, Bart Van Assche wrote: > Avoid that sparse complains about blkg_hint manipulations. > > Fixes: a637120e4902 ("blkcg: use radix tree to index blkgs from blkcg") > Signed-off-by: Bart Van Assche > Cc: Tejun Heo Acked-by: Tejun

Re: [PATCH 1/3] block: Add iocontext priority to request

2016-09-29 Thread Tejun Heo
Hello, On Tue, Sep 27, 2016 at 11:14:54AM -0700, Adam Manzanares wrote: > Patch adds association between iocontext and a request. > > Signed-off-by: Adam Manzanares Can you please describe how this may impact existing usages? Thanks. -- tejun -- To unsubscribe from this list: send the line "

Re: [PATCH 2/3] ata: Enabling ATA Command Priorities

2016-09-29 Thread Tejun Heo
Hello, On Tue, Sep 27, 2016 at 11:14:55AM -0700, Adam Manzanares wrote: > +/** > + * ata_ncq_prio_enabled - Test whether NCQ prio is enabled > + * @dev: ATA device to test for > + * > + * LOCKING: > + * spin_lock_irqsave(host lock) > + * > + * RETURNS: > + * 1 if NCQ prio is enabled fo

Re: [PATCH] blkcg: Unlock blkcg_pol_mutex once if cpd == NULL

2016-09-29 Thread Tejun Heo
k-cgroup.c:1378: blkcg_policy_register() error: double unlock > 'mutex:&blkcg_pol_mutex' > > Signed-off-by: Bart Van Assche > Cc: Tejun Heo > Cc: > --- > block/blk-cgroup.c | 6 -- > 1 file changed, 4 insertions(+), 2 deletions(-) > > diff --gi

Re: [PATCH v2] blkcg: Unlock blkcg_pol_mutex once if cpd == NULL

2016-09-30 Thread Tejun Heo
k-cgroup.c:1378: blkcg_policy_register() error: double unlock > 'mutex:&blkcg_pol_mutex' > > Fixes: 06b285bd1125 ("blkcg: fix blkcg_policy_data allocation bug") > Signed-off-by: Bart Van Assche > Cc: Tejun Heo > Cc: Applied to cgroup/for-4.9. We're right

Re: [PATCH 1/3] block: Add iocontext priority to request

2016-10-02 Thread Tejun Heo
Hello, Adam. On Fri, Sep 30, 2016 at 09:02:17AM -0700, Adam Manzanares wrote: > I'll start with the changes I made and work my way through a grep of > > ioprio. Please add or correct any of the assumptions I have made. > Well, it looks like you're the one who's most f

Re: [PATCH V3 00/11] block-throttle: add .high limit

2016-10-04 Thread Tejun Heo
Hello, Vivek. On Tue, Oct 04, 2016 at 09:28:05AM -0400, Vivek Goyal wrote: > On Mon, Oct 03, 2016 at 02:20:19PM -0700, Shaohua Li wrote: > > Hi, > > > > The background is we don't have an ioscheduler for blk-mq yet, so we can't > > prioritize processes/cgroups. > > So this is an interim solution

Re: [PATCH V3 00/11] block-throttle: add .high limit

2016-10-04 Thread Tejun Heo
Hello, On Tue, Oct 04, 2016 at 06:22:28PM +0200, Paolo Valente wrote: > Could you please elaborate more on this point? BFQ uses sectors > served to measure service, and, on the all the fast devices on which > we have tested it, it accurately distributes > bandwidth as desired, redistributes exces

Re: [PATCH V3 00/11] block-throttle: add .high limit

2016-10-04 Thread Tejun Heo
Hello, Vivek. On Tue, Oct 04, 2016 at 02:12:45PM -0400, Vivek Goyal wrote: > Agreed that we don't have a good basic unit to measure IO cost. I was > thinking of measuring cost in terms of sectors as that's simple and > gets more accurate on faster devices with almost no seek penalty. And If this

Re: [PATCH V3 00/11] block-throttle: add .high limit

2016-10-04 Thread Tejun Heo
Hello, Paolo. On Tue, Oct 04, 2016 at 07:43:48PM +0200, Paolo Valente wrote: > > I don't think IO bandwidth does not matter. The problem is bandwidth can't > > measure IO cost. For example, you can't say 8k IO costs 2x IO resource than > > 4k > > IO. > > For what goal do you need to be able to s

Re: [PATCH V3 00/11] block-throttle: add .high limit

2016-10-04 Thread Tejun Heo
Hello, Paolo. On Tue, Oct 04, 2016 at 09:02:47PM +0200, Paolo Valente wrote: > That's exactly what BFQ has succeeded in doing in all the tests > devised so far. Can you give me a concrete example for which I can > try with BFQ and with any other mechanism you deem better. If > you are right, num

Re: [PATCH V3 00/11] block-throttle: add .high limit

2016-10-04 Thread Tejun Heo
Hello, Paolo. On Tue, Oct 04, 2016 at 09:29:48PM +0200, Paolo Valente wrote: > > Hmm... I think we already discussed this but here's a really simple > > case. There are three unknown workloads A, B and C and we want to > > give A certain best-effort guarantees (let's say around 80% of the > > und

Re: [PATCH 1/3] block: Add iocontext priority to request

2016-10-04 Thread Tejun Heo
Hello, Adam. On Tue, Oct 04, 2016 at 08:49:18AM -0700, Adam Manzanares wrote: > > I wonder whether the right thing to do is adding bio->bi_ioprio which > > is initialized on bio submission and carried through req->ioprio. > > I looked around and thought about this and I'm not sure if this will he

Re: [PATCH V3 00/11] block-throttle: add .high limit

2016-10-05 Thread Tejun Heo
Hello, Paolo. On Wed, Oct 05, 2016 at 02:37:00PM +0200, Paolo Valente wrote: > In this respect, for your generic, unpredictable scenario to make > sense, there must exist at least one real system that meets the > requirements of such a scenario. Or, if such a real system does not > yet exist, it

Re: [PATCH v5 3/4] ata: Enabling ATA Command Priorities

2016-10-13 Thread Tejun Heo
Hello, On Thu, Oct 13, 2016 at 04:00:30PM -0700, Adam Manzanares wrote: > This patch checks to see if an ATA device supports NCQ command priorities. > If so and the user has specified an iocontext that indicates > IO_PRIO_CLASS_RT then we build a tf with a high priority command. > > This is done

Re: [PATCH v5 4/4] ata: ATA Command Priority Disabled By Default

2016-10-13 Thread Tejun Heo
Hello, Adam. Sorry about late reply. Was on vacation. On Thu, Oct 13, 2016 at 04:00:31PM -0700, Adam Manzanares wrote: > Add a sysfs entry to turn on priority information being passed > to a ATA device. By default this feature is turned off. > > This patch depends on ata: Enabling ATA Command P

Re: Fwd: [PATCH V3 00/11] block-throttle: add .high limit

2016-10-14 Thread Tejun Heo
Hello, Kyle. On Sat, Oct 08, 2016 at 06:15:14PM -0700, Kyle Sanderson wrote: > How is this even a discussion when hard numbers, and trying any > reproduction case easily reproduce the issues that CFQ causes. Reading > this thread, and many others only grows not only my disappointment, > but whenev

Re: [PATCH V3 00/11] block-throttle: add .high limit

2016-10-14 Thread Tejun Heo
Hello, Paolo. On Fri, Oct 14, 2016 at 07:13:41PM +0200, Paolo Valente wrote: > That said, your 'thus' seems a little too strong: "bfq does not yet > handle fast SSDs, thus we need something else". What about the > millions of devices (and people) still within 10-20 K IOPS, and > experiencing awfu

Re: [PATCH v6 3/3] ata: ATA Command Priority Disabled By Default

2016-10-19 Thread Tejun Heo
On Mon, Oct 17, 2016 at 11:27:30AM -0700, Adam Manzanares wrote: > Add a sysfs entry to turn on priority information being passed > to a ATA device. By default this feature is turned off. > > This patch depends on ata: Enabling ATA Command Priorities > > Signed-off-by: Adam Manzanares > --- > d

Re: [PATCH v6 1/3] block: Add iocontext priority to request

2016-10-19 Thread Tejun Heo
Hello, On Mon, Oct 17, 2016 at 11:27:28AM -0700, Adam Manzanares wrote: > Patch adds an association between iocontext ioprio and the ioprio of a > request. This is done to enable request based drivers the ability to > act on priority information stored in the request. An example being > ATA device

Re: [PATCH v6 2/3] ata: Enabling ATA Command Priorities

2016-10-19 Thread Tejun Heo
s done to improve the tail latency of commands that are high priority by passing priority to the device. tj: Removed trivial ata_ncq_prio_enabled() and open-coded the test. Signed-off-by: Adam Manzanares Signed-off-by: Tejun Heo --- drivers/ata/libata-core.

Re: [PATCH v6 0/3] Enabling ATA Command Priorities

2016-10-19 Thread Tejun Heo
On Mon, Oct 17, 2016 at 11:27:27AM -0700, Adam Manzanares wrote: > This patch builds ATA commands with high priority if the iocontext of a > process > is set to real time. The goal of the patch is to improve tail latencies of > workloads that use higher queue depths. This requires setting the ioc

Re: [PATCH v6 3/3] ata: ATA Command Priority Disabled By Default

2016-10-19 Thread Tejun Heo
This patch depends on ata: Enabling ATA Command Priorities tj: Renamed ncq_prio_on to ncq_prio_enable and removed trivial ata_ncq_prio_on() and open-coded the test. Signed-off-by: Adam Manzanares Signed-off-by: Tejun Heo --- drivers/ata/libahci.c | 1 + drivers/ata/libata-core.c | 3 ++-

[PATCH] block: cfq_cpd_alloc() should use @gfp

2016-11-10 Thread Tejun Heo
fied through the @gfp parameter. This currently doesn't cause any actual issues because all current callers specify GFP_KERNEL. Fix it. Signed-off-by: Tejun Heo Reported-by: Dan Carpenter Fixes: e4a9bde9589f ("blkcg: replace blkcg_policy->cpd_size with ->cpd_alloc/free_fn() method

Re: [PATCH V4 02/15] blk-throttle: add .high interface

2016-11-22 Thread Tejun Heo
Hello, Shaohua. Sorry about the delay. On Mon, Nov 14, 2016 at 02:22:09PM -0800, Shaohua Li wrote: > @@ -1376,11 +1414,37 @@ static ssize_t tg_set_max(struct kernfs_open_file *of, > goto out_finish; > } > > - tg->bps[READ][LIMIT_MAX] = v[0]; > - tg->bps[WRITE

Re: [PATCH V4 03/15] blk-throttle: configure bps/iops limit for cgroup in high limit

2016-11-22 Thread Tejun Heo
On Mon, Nov 14, 2016 at 02:22:10PM -0800, Shaohua Li wrote: > each queue will have a state machine. Initially queue is in LIMIT_HIGH > state, which means all cgroups will be throttled according to their high > limit. After all cgroups with high limit cross the limit, the queue state > gets upgraded

Re: [PATCH V4 05/15] blk-throttle: add downgrade logic

2016-11-22 Thread Tejun Heo
Hello, On Mon, Nov 14, 2016 at 02:22:12PM -0800, Shaohua Li wrote: > +static unsigned long tg_last_high_overflow_time(struct throtl_grp *tg) > +{ > + struct throtl_service_queue *parent_sq; > + struct throtl_grp *parent = tg; > + unsigned long ret = __tg_last_high_overflow_time(tg); >

Re: [PATCH V4 07/15] blk-throttle: make throtl_slice tunable

2016-11-22 Thread Tejun Heo
Hello, On Mon, Nov 14, 2016 at 02:22:14PM -0800, Shaohua Li wrote: > throtl_slice is important for blk-throttling. A lot of stuffes depend on > it, for example, throughput measurement. It has 100ms default value, > which is not appropriate for all disks. For example, for SSD we might > use a small

Re: [PATCH V4 05/15] blk-throttle: add downgrade logic

2016-11-22 Thread Tejun Heo
Hello, On Tue, Nov 22, 2016 at 04:21:21PM -0500, Tejun Heo wrote: > 1. A cgroup and its high and max limits don't have much to do with >other cgroups and their limits. I don't get how the choice between >high and max limits can be a td-wide state. Ah, okay, this

Re: [PATCH V4 02/15] blk-throttle: add .high interface

2016-11-23 Thread Tejun Heo
Hello, On Tue, Nov 22, 2016 at 03:08:36PM -0800, Shaohua Li wrote: > Yep, the limit could be high or max. It's doable moving the restrictions on > input, but will increase trouble using the limits. If high is bigger than max, > can I set high to max? if not, I'd prefer to keep the restrictions. Y

Re: [PATCH V4 07/15] blk-throttle: make throtl_slice tunable

2016-11-23 Thread Tejun Heo
Hello, On Tue, Nov 22, 2016 at 03:18:24PM -0800, Shaohua Li wrote: > Hmm, it's not a real 'time slice'. The name is a bit confusion. Maybe rename > it > to 'throtl_interval' or 'throtl_sampling_time'? not sure. bandwidth and iops > are always in terms of a time interval we measure them. We can't

Re: [PATCH V4 09/15] blk-throttle: make bandwidth change smooth

2016-11-23 Thread Tejun Heo
Hello, On Mon, Nov 14, 2016 at 02:22:16PM -0800, Shaohua Li wrote: > cg1/cg2 bps: 10/80 -> 15/105 -> 20/100 -> 25/95 -> 30/90 -> 35/85 -> 40/80 > -> 45/75 -> 10/80 I wonder whether it'd make sense to make the clamping down gradual too (way faster than the ramping up but still gradual). The contr

Re: [PATCH V4 11/15] blk-throttle: add interface to configure think time threshold

2016-11-23 Thread Tejun Heo
On Mon, Nov 14, 2016 at 02:22:18PM -0800, Shaohua Li wrote: > Add interface to configure the threshold > > Signed-off-by: Shaohua Li > --- > block/blk-sysfs.c| 7 +++ > block/blk-throttle.c | 25 + > block/blk.h | 4 > 3 files changed, 36 insertion

Re: [PATCH V4 10/15] blk-throttle: add a simple idle detection

2016-11-23 Thread Tejun Heo
Hello, Shaohua. On Mon, Nov 14, 2016 at 02:22:17PM -0800, Shaohua Li wrote: > Unfortunately it's very hard to determine if a cgroup is real idle. This > patch uses the 'think time check' idea from CFQ for the purpose. Please > note, the idea doesn't work for all workloads. For example, a workload

Re: [PATCH V4 11/15] blk-throttle: add interface to configure think time threshold

2016-11-28 Thread Tejun Heo
Hello, Shaohua. On Wed, Nov 23, 2016 at 05:06:30PM -0800, Shaohua Li wrote: > > Shouldn't this be a per-cgroup setting along with latency target? > > These two are the parameters which define how the cgroup should be > > treated time-wise. > > It should be easy to make it per-cgroup. Just not sur

Re: [PATCH V4 10/15] blk-throttle: add a simple idle detection

2016-11-28 Thread Tejun Heo
Hello, Shaohua. On Wed, Nov 23, 2016 at 05:15:18PM -0800, Shaohua Li wrote: > > Hmm... I'm not sure thinktime is the best measure here. Think time is > > used by cfq mainly to tell the likely future behavior of a workload so > > that cfq can take speculative actions on the prediction. However, >

Re: [PATCH V4 10/15] blk-throttle: add a simple idle detection

2016-11-29 Thread Tejun Heo
Hello, Shaohua. On Mon, Nov 28, 2016 at 03:10:18PM -0800, Shaohua Li wrote: > > But we can increase sharing by upping the target latency. That should > > be the main knob - if low, the user wants stricter service guarantee > > at the cost of lower overall utilization; if high, the workload can >

  1   2   3   4   5   >