Re: [PATCH] blk-mq: Avoid race condition with uninitialized requests

2014-09-22 Thread Jens Axboe

On 2014-09-22 08:15, Christian Borntraeger wrote:

On 09/18/2014 11:04 AM, David Hildenbrand wrote:

This patch should fix the bug reported in https://lkml.org/lkml/2014/9/11/249.

We have to initialize at least the atomic_flags and the cmd_flags when
allocating storage for the requests.

Otherwise blk_mq_timeout_check() might dereference uninitialized pointers when
racing with the creation of a request.

Also move the reset of cmd_flags from the initialization code to the point where a
request is freed. That way we never end up with pending flush request indicators
that might trigger dereferences of invalid pointers in blk_mq_timeout_check().

Cc: sta...@vger.kernel.org
Signed-off-by: David Hildenbrand d...@linux.vnet.ibm.com


Acked-by: Christian Borntraeger borntrae...@de.ibm.com

Can you please add
Reported-by: Paulo De Rezende Pinatti ppina...@linux.vnet.ibm.com
Tested-by: Paulo De Rezende Pinatti ppina...@linux.vnet.ibm.com

as Paulo did the testing work?

We think this patch is fine and should go upstream.


I might have to pick'n rebase the series, in which case I'll add it. But 
I already queued it up last week, so if I don't, then I can't easily add 
it. I wish git notes weren't such a horrible and unusable hack, so we 
had a chance to annotate commits without having to rewrite history...


--
Jens Axboe



Re: blk-mq crash under KVM in multiqueue block code (with virtio-blk and ext4)

2014-09-17 Thread Jens Axboe

On 2014-09-17 07:52, Ming Lei wrote:

On Wed, 17 Sep 2014 14:00:34 +0200
David Hildenbrand d...@linux.vnet.ibm.com wrote:


Does anyone have an idea?
The request itself is completely filled with cc


That is very weird: the 'rq' is retrieved from hctx->tags, and rq should be
valid, and rq->q shouldn't have been changed even if it was
double freed or double allocated.


I am currently asking myself if blk_mq_map_request should protect against 
softirq here, but I can't say for sure, as I have never looked into that code 
before.


No, it doesn't need that protection.

Thanks,








Digging through the code, I think I found a possible cause:

tags->rqs[..] is not zero-initialized (it is allocated via alloc_pages_node in
blk-mq.c:blk_mq_init_rq_map()).


Yes, it may cause a problem when the request is allocated for the 1st time:
the timeout handler may come just after the allocation and before its
initialization, and then an oops is triggered because of garbage data in the request.

--
 From ffd0824b7b686074c2d5d70bc4e6bba3ba56a30c Mon Sep 17 00:00:00 2001
From: Ming Lei ming@canonical.com
Date: Wed, 17 Sep 2014 21:00:34 +0800
Subject: [PATCH] blk-mq: initialize request before the 1st allocation

Otherwise the request can be accessed from the timeout handler
just after its 1st allocation from the tag pool and before its
initialization in blk_mq_rq_ctx_init(), causing an oops since
the request is filled with garbage data.

Signed-off-by: Ming Lei ming@canonical.com
---
  block/blk-mq.c |   10 ++
  1 file changed, 10 insertions(+)

diff --git a/block/blk-mq.c b/block/blk-mq.c
index 4aac826..d24673f 100644
--- a/block/blk-mq.c
+++ b/block/blk-mq.c
@@ -514,6 +514,10 @@ struct request *blk_mq_tag_to_rq(struct blk_mq_tags *tags, unsigned int tag)
 {
 	struct request *rq = tags->rqs[tag];
 
+	/* uninitialized request */
+	if (!rq->q || rq->tag == -1)
+		return rq;
+
 	if (!is_flush_request(rq, tag))
 		return rq;

@@ -1401,6 +1405,12 @@ static struct blk_mq_tags *blk_mq_init_rq_map(struct blk_mq_tag_set *set,
 		left -= to_do * rq_size;
 		for (j = 0; j < to_do; j++) {
 			tags->rqs[i] = p;
+
+			/* Avoiding early access from timeout handler */
+			tags->rqs[i]->tag = -1;
+			tags->rqs[i]->q = NULL;
+			tags->rqs[i]->cmd_flags = 0;
+
 			if (set->ops->init_request) {
 				if (set->ops->init_request(set->driver_data,
 						tags->rqs[i], hctx_idx, i,


Another way would be to ensure that the timeout handler doesn't touch 
hw_ctx or tag_sets that aren't fully initialized yet. But I think this 
is safer/cleaner.


--
Jens Axboe



Re: blk-mq crash under KVM in multiqueue block code (with virtio-blk and ext4)

2014-09-17 Thread Jens Axboe
On 09/17/2014 01:09 PM, David Hildenbrand wrote:
 0. That should already be sufficient to keep blk_mq_tag_to_rq() and the
 calling code from doing the wrong thing.

 Yes, clearing rq->cmd_flags should be enough.

 And it looks better to move the rq initialization to __blk_mq_free_request()
 too, otherwise the timeout handler may still see the old cmd_flags and rq->q
 before the rq's new initialization.
 
 Yes, __blk_mq_free_request() should also reset at least rq->cmd_flags, and I
 think we can remove the initialization from __blk_mq_alloc_request().

And then we come full circle, that's how the code originally started out
(and it is the saner way to do things). So yes, I'd greatly applaud that.


-- 
Jens Axboe



Re: virtio-rng only returns zeros with CONFIG_HW_RANDOM=m

2013-05-30 Thread Jens Axboe
On Thu, May 30 2013, Rusty Russell wrote:
 Jens Axboe ax...@kernel.dk writes:
  On Wed, Feb 27 2013, Rusty Russell wrote:
  Aurelien Jarno aurel...@aurel32.net writes:
   Hi,
  
   I have noticed that virtio-rng only returns zero for kernels >= 2.6.33
   built with CONFIG_HW_RANDOM=m. This is a bit too predictable for a
   random generator ;-).
  
  Wow.  Fortunately, all of SLES, RHEL, Ubuntu and Fedora set
  CONFIG_HW_RANDOM=y.  What do they know that we don't?
  
  Oops, looks like Debian testing: config-3.2.0-4-amd64:CONFIG_HW_RANDOM=m
  
   The reason for that is that virtio expects guest real addresses, while
   rng_core.ko (i.e. when built as a module) is passing a vmalloc'ed buffer 
   to the virtio-rng read function, declared as such:
  
 	static u8 rng_buffer[SMP_CACHE_BYTES < 32 ? 32 : SMP_CACHE_BYTES]
 		__cacheline_aligned;
  
  Yuck...  It would be nice if this had oopsed.  Jens, what about this patch?
  
  Cheers,
  Rusty.
  
  Subject: scatterlist: sg_set_buf() argument must be in linear mapping.
  
  Signed-off-by: Rusty Russell ru...@rustcorp.com.au
  
  diff --git a/include/linux/scatterlist.h b/include/linux/scatterlist.h
  index 4bd6c06..9365375 100644
  --- a/include/linux/scatterlist.h
  +++ b/include/linux/scatterlist.h
  @@ -111,6 +111,9 @@ static inline struct page *sg_page(struct scatterlist *sg)
   static inline void sg_set_buf(struct scatterlist *sg, const void *buf,
   			      unsigned int buflen)
   {
  +#ifdef CONFIG_DEBUG_SG
  +	BUG_ON(!virt_addr_valid(buf));
  +#endif
  	sg_set_page(sg, virt_to_page(buf), buflen, offset_in_page(buf));
   }
 
  Looks good to me, in lieu of being able to return an error. Want me to
  queue it up?
 
  -- 
  Jens Axboe
 
 Ping?  I haven't seen this go into Linus' tree...

I forget if I didn't get a reply, or whether it just got lost! Queued up
now, at least.

-- 
Jens Axboe



Re: virtio-rng only returns zeros with CONFIG_HW_RANDOM=m

2013-02-27 Thread Jens Axboe
On Wed, Feb 27 2013, Rusty Russell wrote:
 Aurelien Jarno aurel...@aurel32.net writes:
  Hi,
 
  I have noticed that virtio-rng only returns zero for kernels >= 2.6.33
  built with CONFIG_HW_RANDOM=m. This is a bit too predictable for a
  random generator ;-).
 
 Wow.  Fortunately, all of SLES, RHEL, Ubuntu and Fedora set
 CONFIG_HW_RANDOM=y.  What do they know that we don't?
 
 Oops, looks like Debian testing: config-3.2.0-4-amd64:CONFIG_HW_RANDOM=m
 
  The reason for that is that virtio expects guest real addresses, while
  rng_core.ko (i.e. when built as a module) is passing a vmalloc'ed buffer 
  to the virtio-rng read function, declared as such:
 
	static u8 rng_buffer[SMP_CACHE_BYTES < 32 ? 32 : SMP_CACHE_BYTES]
		__cacheline_aligned;
 
 Yuck...  It would be nice if this had oopsed.  Jens, what about this patch?
 
 Cheers,
 Rusty.
 
 Subject: scatterlist: sg_set_buf() argument must be in linear mapping.
 
 Signed-off-by: Rusty Russell ru...@rustcorp.com.au
 
 diff --git a/include/linux/scatterlist.h b/include/linux/scatterlist.h
 index 4bd6c06..9365375 100644
 --- a/include/linux/scatterlist.h
 +++ b/include/linux/scatterlist.h
 @@ -111,6 +111,9 @@ static inline struct page *sg_page(struct scatterlist *sg)
  static inline void sg_set_buf(struct scatterlist *sg, const void *buf,
 unsigned int buflen)
  {
 +#ifdef CONFIG_DEBUG_SG
 + BUG_ON(!virt_addr_valid(buf));
 +#endif
   sg_set_page(sg, virt_to_page(buf), buflen, offset_in_page(buf));
  }

Looks good to me, in lieu of being able to return an error. Want me to
queue it up?

-- 
Jens Axboe



Re: [RFC PATCH 0/8] virtio: new API for addition of buffers, scatterlist changes

2013-02-08 Thread Jens Axboe
On Fri, Feb 08 2013, Rusty Russell wrote:
 Paolo Bonzini pbonz...@redhat.com writes:
  The virtqueue_add_buf function has two limitations:
 
  1) it requires the caller to provide all the buffers in a single call;
 
  2) it does not support chained scatterlists: the buffers must be
  provided as an array of struct scatterlist.
 
  Because of these limitations, virtio-scsi has to copy each request into
  a scatterlist internal to the driver.  It cannot just use the one that
  was prepared by the upper SCSI layers.
 
 Hi Paolo,
 
 Note that you've defined your problem in terms of your solution
 here.  For clarity:
 
 The problem: we want to prepend and append to a scatterlist.  We can't
 append, because the chained scatterlist implementation requires
 an element to be appended to join two scatterlists together.
 
 The solution: fix scatterlists by introducing struct sg_ring:
 struct sg_ring {
 	struct list_head ring;
 	unsigned int nents;
 	unsigned int orig_nents;	/* do we want to replace sg_table? */
 	struct scatterlist *sg;
 };

This would definitely be more flexible than the current chaining.
However:

 The workaround: make virtio accept multiple scatterlists for a single
 buffer.
 
 There's nothing wrong with your workaround, but if other subsystems have
 the same problem we do, perhaps we should consider a broader solution?

Do other use cases actually exist? I don't think I've come across this
requirement in the time since chaining was introduced (6 years ago, from a
cursory look at the git logs!).
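For reference, the append limitation looks like this with the existing
chained-scatterlist API: joining two lists consumes the final entry of the
first list as the chain link, so that spare entry has to be reserved when the
first list is sized. A quick sketch:

	struct scatterlist head[4], tail[8];

	sg_init_table(head, 4);		/* only 3 entries hold data... */
	sg_init_table(tail, 8);
	sg_chain(head, 4, tail);	/* ...the last entry becomes the link */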

-- 
Jens Axboe



Re: [RFC PATCH 5/8] scatterlist: introduce sg_unmark_end

2013-02-07 Thread Jens Axboe
On Thu, Feb 07 2013, Paolo Bonzini wrote:
 This is useful in places that recycle the same scatterlist multiple
 times, and do not want to incur the cost of sg_init_table every
 time in hot paths.

Looks fine to me.

Acked-by: Jens Axboe ax...@kernel.dk

-- 
Jens Axboe



Re: [PATCH V5 0/4] Improve virtio-blk performance

2012-08-02 Thread Jens Axboe
On 08/02/2012 08:25 AM, Asias He wrote:
 Hi folks,
 
 This version added REQ_FLUSH and REQ_FUA support as suggested by Christoph and
 is rebased against the latest Linus tree.
 
 Jens, could you please consider picking up the dependencies 1/4 and
 2/4 in your tree. Thanks!

Picked up, thanks for getting that done!


-- 
Jens Axboe



Re: [PATCH RFC 1/2] block: Add blk_bio_map_sg() helper

2012-06-14 Thread Jens Axboe
On 2012-06-14 04:31, Tejun Heo wrote:
 On Wed, Jun 13, 2012 at 03:41:46PM +0800, Asias He wrote:
 Add a helper to map a bio to a scatterlist, modelled after
 blk_rq_map_sg.

 This helper is useful for any driver that wants to create
 a scatterlist from its -make_request_fn method.
 
 This may not be possible, but I really wanna avoid having two copies of
 that complex logic.  Any chance blk_rq_map_bio() can be implemented in
 a way that allows blk_rq_map_sg() to be built on top of it?  Also,

Was thinking the same thing, definitely code we don't want to have
duplicated. We've had mapping bugs in the past.

Asias, this should be trivial to do, except that blk_rq_map_sg()
potentially maps across bio's as well. The tracking of the prev bio_vec
does not care about cross bio boundaries.

-- 
Jens Axboe



Re: [PATCH] virtio-blk: Call revalidate_disk() upon online disk resize

2012-03-29 Thread Jens Axboe
On 03/28/2012 10:32 PM, Vivek Goyal wrote:
 If a virtio disk is open in the guest and a disk resize operation is done
 (virsh blockresize), the new size is not visible to tools like fdisk -l.
 This seems to be happening because we update only part->nr_sects and not the
 bdev->bd_inode size.
 
 Call revalidate_disk(), which should take care of it. I tested growing the disk
 size of an already open disk and it works for me.
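The change is essentially a revalidate_disk() call from virtio-blk's
capacity/config-change handling, roughly (sketch, not the exact hunk):

	/* after reading the new capacity from the device config space */
	set_capacity(vblk->disk, capacity);
	revalidate_disk(vblk->disk);	/* also updates bdev->bd_inode's size */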

Thanks, obviously right. Applied.

-- 
Jens Axboe



Re: [PATCH] virtio_blk: add helper function to support mass of disks naming

2012-03-29 Thread Jens Axboe
On 03/29/2012 07:36 AM, Ren Mingxin wrote:
  On 03/28/2012 06:58 PM, Michael S. Tsirkin wrote:
 On Wed, Mar 28, 2012 at 02:54:43PM +0800, Ren Mingxin wrote:
   Hi,

 The current virtblk naming algorithm supports only 26^3
 disks. If there is a mass of virtblks (exceeding 26^3), there
 will be disks with the same name.

 Following sd_format_disk_name(), I add a function
 virtblk_name_format() for virtblk to support a mass of
 disks.

 Signed-off-by: Ren Mingxin re...@cn.fujitsu.com
 Nod. This is basically what 3e1a7ff8a0a7b948f2684930166954f9e8e776fe
 did. Except, maybe we should move this function into block core
 instead of duplicating it? Where would be a good place to put it?
 
 Yes, this was also what I thought.
 
 How about placing the sd_format_disk_name()
 as disk_name_format() into block/genhd.c
 (include/linux/genhd.h)?

Yes, that sounds like the appropriate solution (and name, and where to
put it).
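For reference, the sd_format_disk_name() scheme being generalized is the
base-26 "vda ... vdz, vdaa, vdab ..." expansion; roughly (sketch of the
existing sd logic, not the proposed patch):

	static int format_disk_name(char *prefix, int index, char *buf, int buflen)
	{
		const int base = 'z' - 'a' + 1;
		char *begin = buf + strlen(prefix);
		char *end = buf + buflen;
		char *p = end - 1;

		*p = '\0';
		do {
			if (p == begin)
				return -EINVAL;
			/* emit least-significant letter, then shift the index right */
			*--p = 'a' + (index % base);
			index = (index / base) - 1;
		} while (index >= 0);

		memmove(begin, p, end - p);
		memcpy(buf, prefix, strlen(prefix));
		return 0;
	}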

-- 
Jens Axboe



Re: [RFT PATCH] blkio: alloc per cpu data from worker thread context( Re: kvm deadlock)

2011-12-20 Thread Jens Axboe
On 2011-12-19 19:27, Vivek Goyal wrote:
 I think reverting the previous series is not going to be simple either. It
 had 13 patches.
 
 https://lkml.org/lkml/2011/5/19/560
 
 By making stats per cpu, I was able to reduce contention on request
 queue lock. Now we shall have to bring the lock back. 

That's not going to happen, having that lock in there was a disaster for
small IO on fast devices. It's essentially the limiting factor on
benchmark runs on the RHEL kernels that have it included and enabled...


-- 
Jens Axboe



Re: kvm deadlock

2011-12-14 Thread Jens Axboe
On 2011-12-14 14:43, Avi Kivity wrote:
 On 12/14/2011 02:25 PM, Marcelo Tosatti wrote:
 On Mon, Dec 05, 2011 at 04:48:16PM -0600, Nate Custer wrote:
 Hello,

 I am struggling with repeatable full hardware locks when running 8-12 KVM 
 vms. At some point before the hard lock I get an inconsistent lock state 
 warning. An example of this can be found here:

 http://pastebin.com/8wKhgE2C

 After that the server continues to run for a while and then starts its 
 death spiral. When it reaches that point it fails to log anything further 
 to the disk, but by attaching a console I have been able to get a stack 
 trace documenting the final implosion:

 http://pastebin.com/PbcN76bd

 All of the cores end up hung and the server stops responding to all input, 
 including SysRq commands. 

 I have seen this behavior on two machines (dual E5606 running Fedora 16) 
 both passed cpuburnin testing and memtest86 scans without error. 

 I have reproduced the crash and stack traces from a Fedora debugging kernel 
 - 3.1.2-1 and with a vanilla 3.1.4 kernel.

 Busted hardware, apparently. Can you reproduce these issues with the
 same workload on different hardware?
 
 I don't think it's hardware related.  The second trace (in the first
 paste) is called during swap, so GFP_FS is set.  The first one is not,
 so GFP_FS is clear.  Lockdep is worried about the following scenario:
 
   acpi_early_init() is called
     calls pcpu_alloc(), which takes pcpu_alloc_mutex
       eventually, calls kmalloc(), or some other allocation function
         no memory, so swap
           call try_to_free_pages()
             submit_bio()
               blk_throtl_bio()
                 blkio_alloc_blkg_stats()
                   alloc_percpu()
                     pcpu_alloc(), which takes pcpu_alloc_mutex
                       deadlock
 
 It's a little unlikely that acpi_early_init() will OOM, but lockdep
 doesn't know that.  Other callers of pcpu_alloc() could trigger the same
 thing.
 
 When lockdep says
 
 [ 5839.924953] other info that might help us debug this:
 [ 5839.925396]  Possible unsafe locking scenario:
 [ 5839.925397]
[ 5839.925840]        CPU0
[ 5839.926063]        ----
[ 5839.926287]   lock(pcpu_alloc_mutex);
[ 5839.926533]   <Interrupt>
[ 5839.926756]     lock(pcpu_alloc_mutex);
 [ 5839.926986]
 
 It really means
 
   <swap, set GFP_FS>
 
 GFP_FS simply marks the beginning of a nested, unrelated context that
 uses the same thread, just like an interrupt.  Kudos to lockdep for
 catching that.
 
 I think the allocation in blkio_alloc_blkg_stats() should be moved out
 of the I/O path into some init function. Copying Jens.

That's completely buggy, basically you end up with a GFP_KERNEL
allocation from the IO submit path. Vivek, per_cpu data needs to be set
up at init time. You can't allocate it dynamically off the IO path.

-- 
Jens Axboe



Re: kvm deadlock

2011-12-14 Thread Jens Axboe
On 2011-12-14 18:03, Vivek Goyal wrote:
 I think the allocation in blkio_alloc_blkg_stats() should be moved out
 of the I/O path into some init function. Copying Jens.

 That's completely buggy, basically you end up with a GFP_KERNEL
 allocation from the IO submit path. Vivek, per_cpu data needs to be set
 up at init time. You can't allocate it dynamically off the IO path.
 
 Hi Jens,
 
 I am wondering how CFQ gets away with blocking cfqq allocation in the
 IO submit path. I see that blk_queue_bio() will do the following:
 
   blk_queue_bio()
     get_request_wait()
       get_request(.., .., GFP_NOIO)
         blk_alloc_request()
           elv_set_request()
             cfq_set_request()
               ---> can sleep and do memory allocation in the IO submit
                    path, as GFP_NOIO has __GFP_WAIT.
 
 So that means a sleeping allocation from the IO submit path is not
 necessarily a problem?

__GFP_WAIT isn't the problem, you can block in the IO path. You cannot,
however, recurse back into IO submission. That's why CFQ is using
GFP_NOIO, implying that waiting is OK, but submitting new IO to satisfy
the allocation is not.
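Put in terms of the allocation flags, the distinction is roughly:

	/* from an I/O submission path: */
	p = kmalloc(size, GFP_KERNEL);	/* may sleep and may recurse back into
					 * the block layer via reclaim: unsafe */
	p = kmalloc(size, GFP_NOIO);	/* may sleep, but reclaim must not issue
					 * new I/O to satisfy it: acceptable */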

 But in the case of per-cpu data allocation, we might already be holding
 pcpu_alloc_mutex at the time of calling the pcpu allocation
 again, and that might lead to deadlock (as Avi mentioned). If so,
 then it is a problem.

It is both a problem and somewhat of a mess...

 Right now allocation of root group and associated stats happens at
 queue initialization time. For non-root cgroups, group allocation and
 associated per cpu stats allocation happens dynamically when the IO is
 submitted. So in this case maybe we are creating a new blkio cgroup
 and then doing IO, which leads to this warning.
 
 I am not sure how to move this allocation to the init path. These stats
 are per group, and groups are created dynamically as IO happens in
 them. The only init path seems to be cgroup creation time. blkg is an
 object contained in a parent object, and at that time the parent
 object is not available; it is created dynamically at IO time
 (cfq_group, blkio_group, etc.).
 
 Though it is a little hackish, can we just delay the allocation of
 stats if pcpu_alloc_mutex is held? We would have to make
 pcpu_alloc_mutex non-static though. Delaying just means not capturing the
 stats for some time; sooner or later we will get regular IO with
 pcpu_alloc_mutex not held, and we can do the per-cpu allocation at that
 time. I will write a test patch.
 
 Or maybe there is a safer version of pcpu alloc which will return
 without allocating if pcpu_alloc_mutex is already locked.

Neither of these proposed solutions is really pretty. It would be
cleaner, and have better runtime for the fast path, if you could alloc all
of these when the non-root groups are set up. Why isn't it done that way
right now?

-- 
Jens Axboe



Re: [PATCHv4] virtio-blk: use ida to allocate disk index

2011-10-31 Thread Jens Axboe
On 2011-10-30 20:29, Michael S. Tsirkin wrote:
 Based on a patch by Mark Wu d...@redhat.com
 
 Current index allocation in virtio-blk is based on a monotonically
 increasing variable index. This means we'll run out of numbers
 after a while.  It also could cause confusion about the disk
 name in the case of hot-plugging disks.
 Change virtio-blk to use ida to allocate index, instead.
 
 Signed-off-by: Michael S. Tsirkin m...@redhat.com
 ---
 
 Changes from Mark's versions:
  use the new ida_simple_get
  error handling cleanup
 fix use after free
 
 Works fine for me.
 
 Jens, could you merge this for 3.2?
 That is, unless Rusty complains shortly ...

Yep, tentatively added.
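For reference, the ida_simple_*() pattern the patch moves to looks roughly
like this (usage sketch; the name vd_index_ida is just illustrative):

	static DEFINE_IDA(vd_index_ida);

	/* probe: grab the lowest free index (may return -ENOSPC/-ENOMEM) */
	index = ida_simple_get(&vd_index_ida, 0, 0, GFP_KERNEL);
	if (index < 0)
		return index;

	/* remove or error path: give the index back */
	ida_simple_remove(&vd_index_ida, index);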

-- 
Jens Axboe



Re: [PATCH 1/1] [virt] virtio-blk: Use ida to allocate disk index

2011-10-24 Thread Jens Axboe
On 2011-10-24 12:02, Michael S. Tsirkin wrote:
 On Wed, Oct 19, 2011 at 12:12:20PM +0200, Michael S. Tsirkin wrote:
 On Thu, Jun 09, 2011 at 06:41:56AM -0400, Mark Wu wrote:
 On 06/09/2011 05:14 AM, Tejun Heo wrote:
 Hello,

 On Thu, Jun 09, 2011 at 08:51:05AM +0930, Rusty Russell wrote:
 On Wed, 08 Jun 2011 09:08:29 -0400, Mark Wu d...@redhat.com wrote:
 Hi Rusty,
 Yes, I can't figure out an instance of disk probing in parallel either,
 but as per the following commit, I think we still need to use a lock for
 safety. What's your opinion?

 commit 4034cc68157bfa0b6622efe368488d3d3e20f4e6
 Author: Tejun Heo t...@kernel.org
 Date:   Sat Feb 21 11:04:45 2009 +0900

 [SCSI] sd: revive sd_index_lock

 Commit f27bac2761cab5a2e212dea602d22457a9aa6943 which converted sd to
 use ida instead of idr incorrectly removed sd_index_lock around id
 allocation and free.  idr/ida do have internal locks but they protect
 their free object lists not the allocation itself.  The caller is
 responsible for that.  This missing synchronization led to the same 
 id
 being assigned to multiple devices leading to oops.

 I'm confused.  Tejun, Greg, anyone: can probes happen in parallel?

 If so, I'll have to review all my drivers.

 Unless async is explicitly used, probe happens sequentially.  IOW, if
 there's no async_schedule() call, things won't happen in parallel.
 That said, I think it wouldn't be such a bad idea to protect the ida with
 a spinlock regardless, unless the probe code explicitly requires
 serialization.

 Thanks.

 Since the virtio-blk driver doesn't use async probe, it needn't use a spinlock to 
 protect the ida.
 So remove the lock from the patch.

 From fbb396df9dbf8023f1b268be01b43529a3993d57 Mon Sep 17 00:00:00 2001
 From: Mark Wu d...@redhat.com
 Date: Thu, 9 Jun 2011 06:34:07 -0400
 Subject: [PATCH 1/1] [virt] virtio-blk: Use ida to allocate disk index

 Current index allocation in virtio-blk is based on a monotonically
 increasing variable index. It could cause some confusion about disk
 name in the case of hot-plugging disks. And it's impossible to find the
 lowest available index by just maintaining a simple index. So it's
 changed to use ida to allocate the index, following the index
 allocation in the SCSI disk driver.

 Signed-off-by: Mark Wu d...@redhat.com

 Acked-by: Michael S. Tsirkin m...@redhat.com

 This got lost in the noise and missed 3.1, which is unfortunate.
 How about we apply this as is and look at cleanups as a next step?
 
 Rusty, any opinion on merging this for 3.2?
 I expect the merge window will open right after the summit,

I can toss it into for-3.2/drivers, if there's consensus to do that now.

-- 
Jens Axboe



Re: [PATCH 5/5] virtio-blk: implement -make_request

2011-10-06 Thread Jens Axboe
On Wed, Oct 05 2011, Christoph Hellwig wrote:
 Add an alternate I/O path that implements -make_request for virtio-blk.
 This is required for high IOPs devices which get slowed down to 1/5th of
 the native speed by all the locking, memory allocation and other overhead
 in the request based I/O path.

We definitely have some performance fruit hanging rather low in that
path, but a factor 5 performance difference sounds insanely excessive. I
haven't looked at virtio_blk in detail, but I've done 500K+ IOPS on
request based drivers before (yes, on real hardware). So it could be
that virtio_blk is just doing things rather suboptimally in some places,
and that it would be possible to claim most of that speedup there too.

-- 
Jens Axboe



Re: [PATCH resend] compat_ioctl: fix warning caused by qemu

2011-07-01 Thread Jens Axboe
On 2011-07-01 16:46, Arnd Bergmann wrote:
 Yes, that should be fine, unless Jens would like to see a different
 solution for the struct definitions, e.g. moving all of the floppy
 compat ioctl numbers to fd.h. I'm fine with it either way.

Looks OK to me, I've queued it up for 3.1 with your ack. Thanks
Johannes.

-- 
Jens Axboe



Re: [PATCH RFC] virtio_blk: Use blk-iopoll for host-guest notify

2010-05-18 Thread Jens Axboe
On Tue, May 18 2010, Stefan Hajnoczi wrote:
 On Fri, May 14, 2010 at 05:30:56PM -0500, Brian Jackson wrote:
  Any preliminary numbers? latency, throughput, cpu use? What about comparing 
  different weights?
 
 I am running benchmarks and will report results when they are in.

I'm very interested as well, I have been hoping for some more adoption
of this. I have mptsas and mpt2sas patches pending as well.

I have not done a full and exhaustive weight analysis, so note me
down as wanting such an analysis on virtio_blk as well.

-- 
Jens Axboe



Re: [PATCH] virtio-spec: document block CMD and FLUSH

2010-05-04 Thread Jens Axboe
On Tue, May 04 2010, Rusty Russell wrote:
 On Fri, 19 Feb 2010 08:52:20 am Michael S. Tsirkin wrote:
  I took a stab at documenting CMD and FLUSH request types in virtio
  block.  Christoph, could you look over this please?
  
  I note that the interface seems full of warts to me;
  this might be a first step to cleaning them up.
 
 ISTR Christoph had withdrawn some patches in this area, and was waiting
 for him to resubmit?
 
 I've given up on figuring out the block device.  What seem to me to be sane
 semantics along the lines of memory barriers are foreign to disk people: they
 want (and depend on) flushing everywhere.
 
 For example, tdb transactions do not require a flush, they only require what
 I would call a barrier: that prior data be written out before any future data.
 Surely that would be more efficient in general than a flush!  In fact, TDB
 wants only writes to *that file* (and metadata) written out first; it has no
 ordering issues with other I/O on the same device.
 
 A generic I/O interface would allow you to specify "this request depends on
 these outstanding requests" and leave it at that.  It might have some sync flush
 command for dumb applications and OSes.  The userspace API might not be as
 precise and only allow such a barrier against all prior writes on this fd.
 
 ISTR someone mentioning a desire for such an API years ago, so CC'ing the
 usual I/O suspects...

It would be nice to have a fuller API for this, but the reality is
that only the flush approach is really workable. Even just strict
ordering of requests could only be supported on SCSI, and even there the
kernel still lacks proper guarantees on error handling to prevent
reordering.

-- 
Jens Axboe



Re: [PATCH] Add serial number support for virtio_blk, V4a

2009-06-09 Thread Jens Axboe
On Tue, Jun 09 2009, Rusty Russell wrote:
 From: john cooper john.coo...@redhat.com
 
 This patch extracts the opaque data from pci i/o
 region 0 via the added VIRTIO_BLK_F_IDENTIFY
 field.  By convention this data takes the form of
 that returned by an ATA IDENTIFY DEVICE command,
 however the driver (except for structure size)
 makes no interpretation of the data.  The structure
 data is copied wholesale to userspace via a
 HDIO_GET_IDENTITY ioctl command (eg: hdparm -i dev).

Added!

-- 
Jens Axboe



Re: [PATCH 2/2] Add serial number support for virtio_blk, V4

2009-06-03 Thread Jens Axboe
On Mon, Jun 01 2009, Rusty Russell wrote:
 On Fri, 29 May 2009 01:45:27 pm john cooper wrote:
  virtio_blk-serial-4.patch
 
 Hate to ask dumb questions, but is there a scsi equivalent of this?  It'd be 
 nice if we could avoid being ATA-specific in the long run...

SCSI has mode pages, where ATA pretty much stuffs everything into the
identify data.

 Also, why u16?

The identify page is word based, so u16 makes sense.

 
 Thanks,
 Rusty.
 
  +/* return ATA identify data
  + */
  +static int virtblk_identify(struct gendisk *disk, void *argp)
  +{
  +	struct virtio_blk *vblk = disk->private_data;
  +	u16 *id;
  +	int err = -ENOMEM;
  +
  +	id = kmalloc(VIRTIO_BLK_ID_BYTES, GFP_KERNEL);
  +	if (!id)
  +		goto out;
  +
  +	err = virtio_config_buf(vblk->vdev, VIRTIO_BLK_F_IDENTIFY,
  +				offsetof(struct virtio_blk_config, identify), id,
  +				VIRTIO_BLK_ID_BYTES);
 

-- 
Jens Axboe



Re: [Fwd: [PATCH]: Fix crash in virtio_blk during modprobe ; rmmod ; modprobe]

2008-05-22 Thread Jens Axboe
On Thu, May 22 2008, Rusty Russell wrote:
 On Wednesday 21 May 2008 23:12:39 Chris Lalancette wrote:
  diff --git a/drivers/block/virtio_blk.c b/drivers/block/virtio_blk.c
  index 4962e62..c678ac5 100644
  --- a/drivers/block/virtio_blk.c
  +++ b/drivers/block/virtio_blk.c
  @@ -294,6 +294,7 @@ static void virtblk_remove(struct virtio_device *vdev)
  	vdev->config->reset(vdev);
   
  	blk_cleanup_queue(vblk->disk->queue);
  +	del_gendisk(vblk->disk);
  	put_disk(vblk->disk);
  	unregister_blkdev(major, "virtblk");
  	mempool_destroy(vblk->pool);
 
 Thanks Chris, it seems reasonable and I'm sure it works (kinda hard to
 test here, it's my root block dev).  Other drivers seem to do
 blk_cleanup_queue() *after* del_gendisk: does it matter?
 
 Jens CC'd: he's gentle with my dumb questions...  Rusty.

del_gendisk() can generate IO, so it would seem safer to do that
_before_ putting the queue reference :-)
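I.e. the safer ordering in virtblk_remove() would be roughly (sketch):

	del_gendisk(vblk->disk);		/* may still issue I/O */
	blk_cleanup_queue(vblk->disk->queue);	/* now tear down the queue */
	put_disk(vblk->disk);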

-- 
Jens Axboe
