Re: [LSF/MM TOPIC] De-clustered RAID with MD

2018-01-31 Thread David Brown
On 31/01/18 15:27, Wols Lists wrote: > On 31/01/18 09:58, David Brown wrote: >> I would also be interested in how the data and parities are distributed >> across cabinets and disk controllers. When you manually build from >> smaller raid sets, you can ensure that in set the data disks and the >>

Re: [LSF/MM TOPIC] De-clustered RAID with MD

2018-01-31 Thread Wols Lists
On 31/01/18 09:58, David Brown wrote: > I would also be interested in how the data and parities are distributed > across cabinets and disk controllers. When you manually build from > smaller raid sets, you can ensure that in set the data disks and the > parity are all in different cabinets - that

Re: [LSF/MM TOPIC] Killing reliance on struct page->mapping

2018-01-31 Thread Al Viro
On Wed, Jan 31, 2018 at 12:42:45PM -0500, Jerome Glisse wrote: > For block devices the idea is to use struct page and buffer_head (first one of > a page) as a key to find mapping (struct address_space) back. Details, please...

Re: [LSF/MM TOPIC] Killing reliance on struct page->mapping

2018-01-31 Thread Jerome Glisse
On Wed, Jan 31, 2018 at 07:09:48PM +0200, Igor Stoppa wrote: > On 30/01/18 02:43, Jerome Glisse wrote: > > [...] > > > Maybe we can kill page->mapping altogether as a result of this. However > > this is > > not my motivation at this time. > > We had a discussion some time ago > >

Re: [LSF/MM TOPIC] Killing reliance on struct page->mapping

2018-01-31 Thread Jerome Glisse
On Wed, Jan 31, 2018 at 04:56:46PM +0000, Al Viro wrote: > On Mon, Jan 29, 2018 at 07:43:48PM -0500, Jerome Glisse wrote: > > I started a patchset about $TOPIC a while ago, right now i am working on > > other > > thing but i hope to have an RFC for $TOPIC before LSF/MM and thus would > > like a

Re: [LSF/MM TOPIC] Killing reliance on struct page->mapping

2018-01-31 Thread Al Viro
On Mon, Jan 29, 2018 at 07:43:48PM -0500, Jerome Glisse wrote: > I started a patchset about $TOPIC a while ago, right now i am working on other > thing but i hope to have an RFC for $TOPIC before LSF/MM and thus would like a > slot during common track to talk about it as it impacts FS, BLOCK and

Re: WARNING: CPU: 2 PID: 207 at drivers/nvme/host/core.c:527 nvme_setup_cmd+0x3d3

2018-01-31 Thread Jens Axboe
On 1/30/18 9:25 PM, jianchao.wang wrote: > Hi Jens > > On 01/30/2018 11:57 PM, Jens Axboe wrote: >> On 1/30/18 8:41 AM, Jens Axboe wrote: >>> Hi, >>> >>> I just hit this on 4.15+ on the laptop, it's running Linus' git >>> as of yesterday, right after the block tree merge: >>> >>> commit

Re: [LSF/MM TOPIC] Killing reliance on struct page->mapping

2018-01-31 Thread Igor Stoppa
On 30/01/18 02:43, Jerome Glisse wrote: [...] > Maybe we can kill page->mapping altogether as a result of this. However this > is > not my motivation at this time. We had a discussion some time ago http://www.openwall.com/lists/kernel-hardening/2017/07/07/7 where you advised to use it for

Re: [LSF/MM TOPIC] KPTI effect on IO performance

2018-01-31 Thread Scotty Bauer
On 2018-01-31 01:23, Ming Lei wrote: Hi All, After KPTI is merged, there is extra load introduced to context switch between user space and kernel space. It is observed on my laptop that one syscall takes extra ~0.15us[1] compared with 'nopti'. IO performance is affected too, it is observed

[PATCH] block: Fix a race between the throttling code and request queue initialization

2018-01-31 Thread Bart Van Assche
Initialize the request queue lock earlier such that the following race can no longer occur: blk_init_queue_node blkcg_print_blkgs blk_alloc_queue_node (1) q->queue_lock = &q->__queue_lock (2) blkcg_init_queue(q) (3)

Re: [LSF/MM TOPIC] Killing reliance on struct page->mapping

2018-01-31 Thread Jerome Glisse
On Wed, Jan 31, 2018 at 05:55:58PM +0000, Al Viro wrote: > On Wed, Jan 31, 2018 at 12:42:45PM -0500, Jerome Glisse wrote: > > > For block devices the idea is to use struct page and buffer_head (first one > > of > > a page) as a key to find mapping (struct address_space) back. > > Details,

Re: [PATCH] block: Fix a race between the throttling code and request queue initialization

2018-01-31 Thread Jens Axboe
On 1/31/18 12:13 PM, Bart Van Assche wrote: > Initialize the request queue lock earlier such that the following > race can no longer occur: > > blk_init_queue_node blkcg_print_blkgs > blk_alloc_queue_node (1) > q->queue_lock = &q->__queue_lock (2) > blkcg_init_queue(q) (3)

Re: [PATCH] Use bio_endio instead of bio_put in error path of blk_rq_append_bio

2018-01-31 Thread Jiri Palecek
On 1/31/18 6:24 AM, Ming Lei wrote: On Tue, Jan 30, 2018 at 04:24:14PM +0100, Jiri Palecek wrote: On 1/30/18 1:53 PM, Ming Lei wrote: On Thu, Jan 25, 2018 at 9:58 PM, Jiří Paleček wrote: Avoids page leak from bounced requests --- block/blk-map.c | 3 ++- 1 file

Re: [PATCH v2 2/2] block: Fix a race between the throttling code and request queue initialization

2018-01-31 Thread Joseph Qi
Hi Bart, On 18/2/1 07:53, Bart Van Assche wrote: > Initialize the request queue lock earlier such that the following > race can no longer occur: > > blk_init_queue_node blkcg_print_blkgs > blk_alloc_queue_node (1) > q->queue_lock = &q->__queue_lock (2) >

Re: WARNING: CPU: 2 PID: 207 at drivers/nvme/host/core.c:527 nvme_setup_cmd+0x3d3

2018-01-31 Thread Keith Busch
On Wed, Jan 31, 2018 at 08:29:37AM -0700, Jens Axboe wrote: > > How about something like the below? > > > diff --git a/block/blk-merge.c b/block/blk-merge.c > index 8452fc7164cc..cee102fb060e 100644 > --- a/block/blk-merge.c > +++ b/block/blk-merge.c > @@ -574,8 +574,13 @@ static int

[PATCH v2 1/2] block: Add a third argument to blk_alloc_queue_node()

2018-01-31 Thread Bart Van Assche
This patch does not change any functionality. Signed-off-by: Bart Van Assche Cc: Christoph Hellwig Cc: Joseph Qi Cc: Philipp Reisner Cc: Ulf Hansson Cc: Kees Cook

[PATCH v2 0/2] block: Fix a race between the throttling code and request queue initialization

2018-01-31 Thread Bart Van Assche
Hello Jens, The two patches in this series fix a recently reported race between the throttling code and request queue initialization. It would be appreciated if you could have a look at this patch series. Thanks, Bart. Changes between v1 and v2: - Split a single patch into two patches. -

[PATCH v2 2/2] block: Fix a race between the throttling code and request queue initialization

2018-01-31 Thread Bart Van Assche
Initialize the request queue lock earlier such that the following race can no longer occur: blk_init_queue_node blkcg_print_blkgs blk_alloc_queue_node (1) q->queue_lock = &q->__queue_lock (2) blkcg_init_queue(q) (3)
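
For readers skimming the index, here is a rough user-space sketch of the idea in the patch description above: make the lock pointer valid at allocation time, before any concurrent reader can look at the queue. This is not the kernel code. struct toy_queue, toy_lock and all toy_* helpers are hypothetical stand-ins, and passing the driver lock into the allocation step is an assumption about how "initialize earlier" might be arranged.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stddef.h>

/* Hypothetical spinlock stand-in built on a C11 atomic_flag. */
struct toy_lock { atomic_flag held; };

static void toy_lock_init(struct toy_lock *l)    { atomic_flag_clear(&l->held); }
static void toy_lock_acquire(struct toy_lock *l) { while (atomic_flag_test_and_set(&l->held)) { } }
static void toy_lock_release(struct toy_lock *l) { atomic_flag_clear(&l->held); }

/* Toy stand-in for a request queue; NOT the kernel's struct request_queue. */
struct toy_queue {
    struct toy_lock internal_lock;  /* plays the role of q->__queue_lock */
    struct toy_lock *queue_lock;    /* plays the role of q->queue_lock */
    int visible;                    /* set once other code may find the queue */
};

/*
 * Allocation step. The final lock is chosen here, so queue_lock never
 * changes after the queue becomes visible (assumed shape of the fix).
 */
static void toy_alloc_queue(struct toy_queue *q, struct toy_lock *driver_lock)
{
    toy_lock_init(&q->internal_lock);
    q->queue_lock = driver_lock ? driver_lock : &q->internal_lock;
    q->visible = 1;
}

/* Reader side, standing in for a concurrent path such as blkcg_print_blkgs. */
static int toy_reader(struct toy_queue *q)
{
    int seen;

    assert(q->queue_lock != NULL);
    toy_lock_acquire(q->queue_lock);
    seen = q->visible;
    toy_lock_release(q->queue_lock);
    return seen;
}
```

In the racy ordering from the diagram, the pointer would be switched to another lock after the reader had already taken the first one; in this sketch the reader always locks the same object the queue owner uses.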

Re: [PATCH 1/2] lightnvm: remove mlc pairs structure

2018-01-31 Thread Javier González
> On 31 Jan 2018, at 16.35, Matias Bjørling wrote: > > On 01/31/2018 03:00 AM, Javier González wrote: >>> On 30 Jan 2018, at 21.26, Matias Bjørling wrote: >>> >>> The known implementations of the 1.2 specification, and upcoming 2.0 >>> implementation all

[PATCH v2 2/2] bcache: fix high CPU occupancy during journal

2018-01-31 Thread tang . junhui
From: Tang Junhui After long time small writing I/O running, we found the occupancy of CPU is very high and I/O performance has been reduced by about half: [root@ceph151 internal]# top top - 15:51:05 up 1 day,2:43, 4 users, load average: 16.89, 15.15, 16.53 Tasks: 2063

Re: [PATCH 5/5] lightnvm: pblk: refactor bad block identification

2018-01-31 Thread Matias Bjørling
On 01/31/2018 10:13 AM, Javier Gonzalez wrote: On 31 Jan 2018, at 16.51, Matias Bjørling wrote: On 01/31/2018 03:06 AM, Javier González wrote: In preparation for the OCSSD 2.0 spec. bad block identification, refactor the current code to generalize bad block get/set

Re: WARNING: CPU: 2 PID: 207 at drivers/nvme/host/core.c:527 nvme_setup_cmd+0x3d3

2018-01-31 Thread Keith Busch
On Wed, Jan 31, 2018 at 08:07:41PM -0700, Jens Axboe wrote: > if (total_phys_segments > queue_max_segments(q)) > - return 0; > + return 0; This perhaps unintended change happens to point out another problem: queue_max_segments is the wrong limit for discards,
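
A minimal sketch of the point being made here, assuming a queue carries a separate segment limit for discards. struct toy_limits and the toy_* helpers are hypothetical and only model the comparison quoted above; they are not the block layer's structures or the patch under discussion.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical per-queue limits; field names are illustrative only. */
struct toy_limits {
    unsigned int max_segments;          /* limit for ordinary reads/writes */
    unsigned int max_discard_segments;  /* assumed separate limit for discards */
};

/* Pick the limit that matches the kind of request being merged. */
static unsigned int toy_max_segments_for(const struct toy_limits *lim, bool is_discard)
{
    assert(lim != NULL);
    return is_discard ? lim->max_discard_segments : lim->max_segments;
}

/* Mirrors the quoted check: refuse the merge if the total exceeds the limit. */
static bool toy_merge_fits(const struct toy_limits *lim, bool is_discard,
                           unsigned int segs_a, unsigned int segs_b)
{
    unsigned int total = segs_a + segs_b;

    return total <= toy_max_segments_for(lim, is_discard);
}
```

With a generous ordinary limit and a discard limit of 1, checking a discard merge against the ordinary limit would let through a request the device cannot accept, which is the kind of mismatch the reply points at.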

Re: [PATCH 2/2] bcache: fix high CPU occupancy during journal

2018-01-31 Thread Michael Lyle
LGTM on first read-- I'll read it again and test in test branch. Reviewed-by: Michael Lyle On 01/26/2018 12:24 AM, tang.jun...@zte.com.cn wrote: > From: Tang Junhui > > After long time small writing I/O running, we found the occupancy of CPU > is very

Re: [LSF/MM TOPIC] KPTI effect on IO performance

2018-01-31 Thread Ming Lei
Hi Scotty, On Wed, Jan 31, 2018 at 11:43:33AM -0700, Scotty Bauer wrote: > On 2018-01-31 01:23, Ming Lei wrote: > > Hi All, > > > > After KPTI is merged, there is extra load introduced to context switch > > between user space and kernel space. It is observed on my laptop that > > one > > syscall

Re: [LSF/MM TOPIC] KPTI effect on IO performance

2018-01-31 Thread Ming Lei
On Wed, Jan 31, 2018 at 11:43:33AM -0700, Scotty Bauer wrote: > On 2018-01-31 01:23, Ming Lei wrote: > > Hi All, > > > > After KPTI is merged, there is extra load introduced to context switch > > between user space and kernel space. It is observed on my laptop that > > one > > syscall takes extra

Re: WARNING: CPU: 2 PID: 207 at drivers/nvme/host/core.c:527 nvme_setup_cmd+0x3d3

2018-01-31 Thread Jens Axboe
On 1/31/18 8:33 PM, jianchao.wang wrote: > Sorry, Jens, I think I didn't get the point. > Do I miss anything ? > > On 02/01/2018 11:07 AM, Jens Axboe wrote: >> Yeah I agree, and my last patch missed that we do care about segments for >> discards. Below should be better... >> >> diff --git

Re: [PATCH 1/2] bcache: add journal statistic

2018-01-31 Thread Michael Lyle
LGTM except for formatting / an extra newline (will fix) -- in my test branch for possible 4.16 Reviewed-by: Michael Lyle On 01/26/2018 12:23 AM, tang.jun...@zte.com.cn wrote: > From: Tang Junhui > > Sometimes, Journal takes up a lot of CPU, we need

Re: WARNING: CPU: 2 PID: 207 at drivers/nvme/host/core.c:527 nvme_setup_cmd+0x3d3

2018-01-31 Thread Jens Axboe
On 1/31/18 4:33 PM, Keith Busch wrote: > On Wed, Jan 31, 2018 at 08:29:37AM -0700, Jens Axboe wrote: >> >> How about something like the below? >> >> >> diff --git a/block/blk-merge.c b/block/blk-merge.c >> index 8452fc7164cc..cee102fb060e 100644 >> --- a/block/blk-merge.c >> +++

Re: WARNING: CPU: 2 PID: 207 at drivers/nvme/host/core.c:527 nvme_setup_cmd+0x3d3

2018-01-31 Thread jianchao.wang
Hi Jens On 01/31/2018 11:29 PM, Jens Axboe wrote: > How about something like the below? > > > diff --git a/block/blk-merge.c b/block/blk-merge.c > index 8452fc7164cc..cee102fb060e 100644 > --- a/block/blk-merge.c > +++ b/block/blk-merge.c > @@ -574,8 +574,13 @@ static int

Re: [PATCH 2/2] bcache: fix high CPU occupancy during journal

2018-01-31 Thread Michael Lyle
Unfortunately, this doesn't build because of nonexistent call heap_empty (I assume some changes to util.h got left out). I really need clean patches that build and are formatted properly. Mike On 01/26/2018 12:24 AM, tang.jun...@zte.com.cn wrote: > From: Tang Junhui >

Re: WARNING: CPU: 2 PID: 207 at drivers/nvme/host/core.c:527 nvme_setup_cmd+0x3d3

2018-01-31 Thread Jens Axboe
On 1/31/18 8:03 PM, jianchao.wang wrote: > Hi Jens > > > On 01/31/2018 11:29 PM, Jens Axboe wrote: >> How about something like the below? >> >> >> diff --git a/block/blk-merge.c b/block/blk-merge.c >> index 8452fc7164cc..cee102fb060e 100644 >> --- a/block/blk-merge.c >> +++ b/block/blk-merge.c

Re: [PATCH 2/2] bcache: fix high CPU occupancy during journal

2018-01-31 Thread tang . junhui
From: Tang Junhui > Unfortunately, this doesn't build because of nonexistent call heap_empty > (I assume some changes to util.h got left out). I really need clean > patches that build and are formatted properly. > > Mike Oh, I am so sorry for that. A new version patch

Re: WARNING: CPU: 2 PID: 207 at drivers/nvme/host/core.c:527 nvme_setup_cmd+0x3d3

2018-01-31 Thread jianchao.wang
Sorry, Jens, I think I didn't get the point. Do I miss anything ? On 02/01/2018 11:07 AM, Jens Axboe wrote: > Yeah I agree, and my last patch missed that we do care about segments for > discards. Below should be better... > > diff --git a/block/blk-merge.c b/block/blk-merge.c > index

Re: [PATCH 4/5] lightnvm: pblk: add padding distribution sysfs attribute

2018-01-31 Thread Matias Bjørling
On 01/31/2018 03:06 AM, Javier González wrote: From: Hans Holmberg When pblk receives a sync, all data up to that point in the write buffer must be committed to persistent storage, and as flash memory comes with a minimal write size there is a significant cost
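
The padding cost mentioned in the description can be illustrated with a small arithmetic sketch. It assumes the buffered data and the minimal write size are counted in the same unit (for example sectors); toy_sync_padding is a hypothetical helper, not a pblk function.

```c
#include <assert.h>
#include <stddef.h>

/*
 * Number of padding units needed so that 'buffered' units of data can be
 * flushed as whole writes of 'min_write' units each. Returns 0 when the
 * buffered amount is already a multiple of the minimal write size.
 */
static size_t toy_sync_padding(size_t buffered, size_t min_write)
{
    assert(min_write > 0);
    return (min_write - buffered % min_write) % min_write;
}
```

For instance, with a minimal write size of 8 units, a sync arriving with 5 units buffered forces 3 units of padding, while a sync at exactly 8 units costs nothing.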

Re: [LSF/MM TOPIC] De-clustered RAID with MD

2018-01-31 Thread David Brown
On 30/01/18 10:40, Johannes Thumshirn wrote: > Wols Lists writes: > >> On 29/01/18 15:23, Johannes Thumshirn wrote: >>> Hi linux-raid, lsf-pc >>> >>> (If you've received this mail multiple times, I'm sorry, I'm having >>> trouble with the mail setup). >> >> My immediate

Re: [PATCH 5/5] lightnvm: pblk: refactor bad block identification

2018-01-31 Thread Matias Bjørling
On 01/31/2018 03:06 AM, Javier González wrote: In preparation for the OCSSD 2.0 spec. bad block identification, refactor the current code to generalize bad block get/set functions and structures. Signed-off-by: Javier González --- drivers/lightnvm/pblk-init.c | 213

[LSF/MM TOPIC] KPTI effect on IO performance

2018-01-31 Thread Ming Lei
Hi All, After KPTI is merged, there is extra load introduced to context switch between user space and kernel space. It is observed on my laptop that one syscall takes extra ~0.15us[1] compared with 'nopti'. IO performance is affected too, it is observed that IOPS drops by 32% in my test[2] on
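
One rough way to see a per-syscall cost of this order is to time a cheap syscall in a loop and compare the result between a default boot and a 'nopti' boot. The sketch below is Linux-specific and is not the measurement referenced as [1] in the message; toy_ns_per_syscall is a hypothetical helper, and getppid is simply an arbitrary cheap syscall.

```c
#define _GNU_SOURCE
#include <assert.h>
#include <sys/syscall.h>
#include <time.h>
#include <unistd.h>

/*
 * Average wall-clock nanoseconds per syscall over 'iterations' calls.
 * syscall(SYS_getppid) is used so that each iteration really enters the kernel.
 */
static double toy_ns_per_syscall(long iterations)
{
    struct timespec start, end;
    double elapsed_ns;

    assert(iterations > 0);
    clock_gettime(CLOCK_MONOTONIC, &start);
    for (long i = 0; i < iterations; i++)
        syscall(SYS_getppid);
    clock_gettime(CLOCK_MONOTONIC, &end);

    elapsed_ns = (double)(end.tv_sec - start.tv_sec) * 1e9
               + (double)(end.tv_nsec - start.tv_nsec);
    return elapsed_ns / (double)iterations;
}
```

Calling toy_ns_per_syscall(1000000) on the same machine with and without 'nopti' and subtracting the two averages gives a crude estimate of the extra entry/exit cost; loop overhead and frequency scaling add noise, so the figure is indicative only.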

Re: [PATCH 1/2] lightnvm: remove mlc pairs structure

2018-01-31 Thread Matias Bjørling
On 01/31/2018 03:00 AM, Javier González wrote: On 30 Jan 2018, at 21.26, Matias Bjørling wrote: The known implementations of the 1.2 specification, and upcoming 2.0 implementation all expose a sequential list of pages to write. Remove the data structure, as it is no longer

Re: [PATCH 5/5] lightnvm: pblk: refactor bad block identification

2018-01-31 Thread Javier Gonzalez
> On 31 Jan 2018, at 16.51, Matias Bjørling wrote: > >> On 01/31/2018 03:06 AM, Javier González wrote: >> In preparation for the OCSSD 2.0 spec. bad block identification, >> refactor the current code to generalize bad block get/set functions and >> structures. >>

Re: [LSF/MM TOPIC] De-clustered RAID with MD

2018-01-31 Thread Johannes Thumshirn
David Brown writes: > That sounds smart. I don't see that you need anything particularly > complicated for how you distribute your data and parity drives across > the 100 disks - you just need a fairly even spread. Exactly. > I would be more concerned with how you
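
As a toy illustration of a "fairly even spread", the sketch below picks the disks for each stripe with a per-stripe pseudo-random partial shuffle. It is only one possible placement scheme and not a proposal from this thread; the stripe width, the xorshift generator, the seed constant and every toy_* name are assumptions made for the example.

```c
#include <assert.h>
#include <stdint.h>

#define TOY_MAX_DISKS 256

/* Small xorshift32 generator; the state must be non-zero. */
static uint32_t toy_next(uint32_t *state)
{
    uint32_t x = *state;

    x ^= x << 13;
    x ^= x >> 17;
    x ^= x << 5;
    *state = x;
    return x;
}

/*
 * Choose 'width' distinct disks out of 'n_disks' for the given stripe using
 * a partial Fisher-Yates shuffle seeded by the stripe number. The same
 * stripe number always maps to the same set of disks.
 */
static void toy_place_stripe(uint32_t stripe, unsigned int n_disks,
                             unsigned int width, unsigned int *out)
{
    unsigned int disks[TOY_MAX_DISKS];
    uint32_t state = stripe * 2654435761u + 1u;

    assert(width <= n_disks && n_disks <= TOY_MAX_DISKS);
    if (state == 0)
        state = 1;  /* xorshift must not start at zero */

    for (unsigned int i = 0; i < n_disks; i++)
        disks[i] = i;

    for (unsigned int i = 0; i < width; i++) {
        unsigned int j = i + toy_next(&state) % (n_disks - i);
        unsigned int tmp = disks[i];

        disks[i] = disks[j];
        disks[j] = tmp;
        out[i] = disks[i];
    }
}
```

With 100 disks and a hypothetical stripe width of 10, each stripe lands on 10 distinct disks, and different stripes land on different sets, so a rebuild reads from many disks rather than from one fixed group. How even the spread is in practice depends on the generator, and the sketch ignores the cabinet and controller placement raised elsewhere in the thread.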

Re: [LSF/MM TOPIC] De-clustered RAID with MD

2018-01-31 Thread David Brown
On 29/01/18 22:50, NeilBrown wrote: > On Mon, Jan 29 2018, Wols Lists wrote: > >> On 29/01/18 15:23, Johannes Thumshirn wrote: >>> Hi linux-raid, lsf-pc >>> >>> (If you've received this mail multiple times, I'm sorry, I'm having >>> trouble with the mail setup). >> >> My immediate reactions as a