Re: Recent kernel "mount" slow

2012-11-26 Thread Jens Axboe
On 2012-11-27 06:57, Jeff Chua wrote:
> On Sun, Nov 25, 2012 at 7:23 AM, Jeff Chua  wrote:
>> On Sun, Nov 25, 2012 at 5:09 AM, Mikulas Patocka  wrote:
>>> So it's better to slow down mount.
>>
>> I am quite proud of the linux boot time pitting against other OS. Even
>> with 10 partitions. Linux can boot up in just a few seconds, but now
>> you're saying that we need to do this semaphore check at boot up. By
>> doing so, it's inducing additional 4 seconds during boot up.
> 
> By the way, I'm using a pretty fast SSD (Samsung PM830) and fast CPU
> (2.8GHz). I wonder if those on slower hard disk or slower CPU, what
> kind of degradation would this cause or just the same?

It'd likely be the same slow down time wise, but as a percentage it
would appear smaller on a slower disk.

Could you please test Mikulas' suggestion of changing
synchronize_sched() in include/linux/percpu-rwsem.h to
synchronize_sched_expedited()?

linux-next also has a re-write of the per-cpu rw sems, out of Andrew's
tree. It would be a good data point if you could test that, too.

In any case, the slow down definitely isn't acceptable. Fixing an
obscure issue like block sizes changing while O_DIRECT is in flight
definitely does NOT warrant a mount slow down.

-- 
Jens Axboe

--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: Recent kernel "mount" slow

2012-11-26 Thread Jens Axboe
On 2012-11-27 08:38, Jens Axboe wrote:
> On 2012-11-27 06:57, Jeff Chua wrote:
>> On Sun, Nov 25, 2012 at 7:23 AM, Jeff Chua  wrote:
>>> On Sun, Nov 25, 2012 at 5:09 AM, Mikulas Patocka  
>>> wrote:
>>>> So it's better to slow down mount.
>>>
>>> I am quite proud of the linux boot time pitting against other OS. Even
>>> with 10 partitions. Linux can boot up in just a few seconds, but now
>>> you're saying that we need to do this semaphore check at boot up. By
>>> doing so, it's inducing additional 4 seconds during boot up.
>>
>> By the way, I'm using a pretty fast SSD (Samsung PM830) and fast CPU
>> (2.8GHz). I wonder if those on slower hard disk or slower CPU, what
>> kind of degradation would this cause or just the same?
> 
> It'd likely be the same slow down time wise, but as a percentage it
> would appear smaller on a slower disk.
> 
> Could you please test Mikulas' suggestion of changing
> synchronize_sched() in include/linux/percpu-rwsem.h to
> synchronize_sched_expedited()?
> 
> linux-next also has a re-write of the per-cpu rw sems, out of Andrew's
> tree. It would be a good data point if you could test that, too.
> 
> In any case, the slow down definitely isn't acceptable. Fixing an
> obscure issue like block sizes changing while O_DIRECT is in flight
> definitely does NOT warrant a mount slow down.

Here's Oleg's patch; it might be easier for you than switching to
linux-next. Please try that.

From: Oleg Nesterov 
Subject: percpu_rw_semaphore: reimplement to not block the readers unnecessarily

Currently the writer does msleep() plus synchronize_sched() 3 times to
acquire/release the semaphore, and during this time the readers are
blocked completely.  Even if the "write" section was not actually started
or if it was already finished.

With this patch down_write/up_write does synchronize_sched() twice and
down_read/up_read are still possible during this time, just they use the
slow path.

percpu_down_write() first forces the readers to use rw_semaphore and
increment the "slow" counter to take the lock for reading, then it
takes that rw_semaphore for writing and blocks the readers.

Also.  With this patch the code relies on the documented behaviour of
synchronize_sched(), it doesn't try to pair synchronize_sched() with
barrier.
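As a reading aid for the description above, here is a single-threaded userspace model of the reader/writer bookkeeping only. The struct and function names, NR_CPUS and the explicit cpu argument are assumptions for illustration; there is no real concurrency, synchronize_sched() is reduced to a comment, and "the reader would sleep on rw_sem" is reported as a false return. Per-CPU counts are unsigned and may wrap when a reader takes the lock on one CPU and releases it on another; only their sum is meaningful.

```c
#include <assert.h>
#include <stdbool.h>

#define NR_CPUS 4	/* hypothetical CPU count for the model */

struct model_rwsem {
	unsigned int fast_read_ctr[NR_CPUS];	/* per-CPU fast-path reader counts */
	bool writer_pending;			/* stands in for writer_mutex being held */
	bool rw_sem_write_held;			/* writer owns rw_sem for writing */
	int slow_read_ctr;			/* readers accounted on the slow path */
};

/* Fast path: only allowed while no writer is pending. */
static bool update_fast_ctr(struct model_rwsem *s, int cpu, int val)
{
	if (s->writer_pending)
		return false;
	/* May wrap if a reader migrates between CPUs; only the sum matters. */
	s->fast_read_ctr[cpu] += (unsigned int)val;
	return true;
}

/* Returns false where the real reader would sleep in down_read(&rw_sem). */
static bool model_down_read(struct model_rwsem *s, int cpu)
{
	if (update_fast_ctr(s, cpu, +1))
		return true;
	if (s->rw_sem_write_held)
		return false;
	s->slow_read_ctr++;
	return true;
}

static void model_up_read(struct model_rwsem *s, int cpu)
{
	if (update_fast_ctr(s, cpu, -1))
		return;
	assert(s->slow_read_ctr > 0);
	s->slow_read_ctr--;	/* the real code wakes the writer when this reaches 0 */
}

/* Move all fast-path counts into the slow counter. */
static unsigned int clear_fast_ctr(struct model_rwsem *s)
{
	unsigned int sum = 0;

	for (int cpu = 0; cpu < NR_CPUS; cpu++) {
		sum += s->fast_read_ctr[cpu];
		s->fast_read_ctr[cpu] = 0;
	}
	return sum;
}

static void model_writer_begin(struct model_rwsem *s)
{
	s->writer_pending = true;	/* mutex_lock(&writer_mutex) */
	/* synchronize_sched(): every reader now sees writer_pending */
	s->slow_read_ctr += (int)clear_fast_ctr(s);
	s->rw_sem_write_held = true;	/* down_write(&rw_sem) blocks new readers */
}

/* The condition the writer waits for before entering. */
static bool model_writer_can_enter(const struct model_rwsem *s)
{
	return s->slow_read_ctr == 0;
}

static void model_writer_end(struct model_rwsem *s)
{
	s->rw_sem_write_held = false;	/* up_write(&rw_sem) */
	/* synchronize_sched() */
	s->writer_pending = false;	/* mutex_unlock(&writer_mutex) */
}
```

Readers that were already inside when the writer arrived finish on the slow path and drain slow_read_ctr; new readers are held off until the writer is done.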

Signed-off-by: Oleg Nesterov 
Reviewed-by: Paul E. McKenney 
Cc: Linus Torvalds 
Cc: Mikulas Patocka 
Cc: Peter Zijlstra 
Cc: Ingo Molnar 
Cc: Srikar Dronamraju 
Cc: Ananth N Mavinakayanahalli 
Cc: Anton Arapov 
Cc: Jens Axboe 
Signed-off-by: Andrew Morton 
---

 include/linux/percpu-rwsem.h |  85 +++---
 lib/Makefile                 |   2 
 lib/percpu-rwsem.c           | 123 +
 3 files changed, 138 insertions(+), 72 deletions(-)

diff -puN include/linux/percpu-rwsem.h~percpu_rw_semaphore-reimplement-to-not-block-the-readers-unnecessarily include/linux/percpu-rwsem.h
--- a/include/linux/percpu-rwsem.h~percpu_rw_semaphore-reimplement-to-not-block-the-readers-unnecessarily
+++ a/include/linux/percpu-rwsem.h
@@ -2,82 +2,25 @@
 #define _LINUX_PERCPU_RWSEM_H
 
 #include 
+#include 
 #include 
-#include 
-#include 
+#include 
 
 struct percpu_rw_semaphore {
-	unsigned __percpu *counters;
-	bool locked;
-	struct mutex mtx;
+	unsigned int __percpu	*fast_read_ctr;
+	struct mutex		writer_mutex;
+	struct rw_semaphore	rw_sem;
+	atomic_t		slow_read_ctr;
+	wait_queue_head_t	write_waitq;
 };
 
-#define light_mb() barrier()
-#define heavy_mb() synchronize_sched()
+extern void percpu_down_read(struct percpu_rw_semaphore *);
+extern void percpu_up_read(struct percpu_rw_semaphore *);
 
-static inline void percpu_down_read(struct percpu_rw_semaphore *p)
-{
-	rcu_read_lock_sched();
-	if (unlikely(p->locked)) {
-		rcu_read_unlock_sched();
-		mutex_lock(&p->mtx);
-		this_cpu_inc(*p->counters);
-		mutex_unlock(&p->mtx);
-		return;
-	}
-	this_cpu_inc(*p->counters);
-	rcu_read_unlock_sched();
-	light_mb(); /* A, between read of p->locked and read of data, paired with D */
-}
-
-static inline void percpu_up_read(struct percpu_rw_semaphore *p)
-{
-	light_mb(); /* B, between read of the data and write to p->counter, paired with C */
-	this_cpu_dec(*p->counters);
-}
-
-static inline unsigned __percpu_count(unsigned __percpu *counters)
-{
-	unsigned total = 0;
-	int cpu;
-
-	for_each_possible_cpu(cpu)
-		total += ACCESS_ONCE(*per_cpu_ptr(counters, cpu));
-
-	return total;
-}
-
-static inline void percpu_down_write(struct percpu_rw_semaphore *p)
-{
-	mutex_lock(&a

Re: Alignment Issue with Direct IO to NVMe Drive

2012-11-27 Thread Jens Axboe
On 2012-11-27 01:35, Laine Walker-Avina wrote:
> Hi all,
> 
> We are experiencing an issue with doing direct IO to a NVMe device I'm
> helping to develop. Every so often, the physical address given by
> sg_dma_address() is aligned to 0x800 instead of 0x1000 as specified by
> blk_queue_dma_alignment(queue, 4095) when the queue is initialized.
> The request is also split over multiple segments to make up for the
> missing space (eg: for a 4k IO it's split into two segments 2k in
> size, and for an 8k IO it's split into 3 segments--2k,4k,2k). Our
> design requires the physical segments given to the device be aligned
> to 4k boundaries and be multiples of 4k in size. When not doing direct
> IO the physical addresses appear to always be 4k aligned as expected.
> One possible issue is the kernel we're primarily testing against is
> 2.6.32-220 from CentOS, but we have observed similar behavior from a
> vanilla 3.3 kernel as well. Any help would be greatly appreciated.

I'm assuming you set the hardware sector size to 4k as well?
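For reference, the requirement in the report (every segment starting on a 4 KiB boundary and being a multiple of 4 KiB long) can be checked with a simple mask test. struct seg, seg_ok() and segs_ok() are hypothetical helpers, not block-layer API; the mask 4095 is the value the report passes to blk_queue_dma_alignment().

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

#define DEV_ALIGN_MASK 4095u	/* 4 KiB - 1, the mask from the report */

struct seg {
	uint64_t addr;	/* e.g. the value from sg_dma_address() */
	uint32_t len;	/* e.g. the value from sg_dma_len() */
};

/* Aligned start and a length that is a whole number of 4 KiB units. */
static bool seg_ok(const struct seg *s)
{
	return (s->addr & DEV_ALIGN_MASK) == 0 && (s->len & DEV_ALIGN_MASK) == 0;
}

static bool segs_ok(const struct seg *segs, size_t n)
{
	for (size_t i = 0; i < n; i++)
		if (!seg_ok(&segs[i]))
			return false;
	return true;
}
```

A 4k I/O that arrives as two 2k segments, the first starting at an address ending in 0x800, fails both the address and the length test.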

-- 
Jens Axboe



Re: Recent kernel "mount" slow

2012-11-27 Thread Jens Axboe
On 2012-11-27 11:06, Jeff Chua wrote:
> On Tue, Nov 27, 2012 at 3:38 PM, Jens Axboe  wrote:
>> On 2012-11-27 06:57, Jeff Chua wrote:
>>> On Sun, Nov 25, 2012 at 7:23 AM, Jeff Chua  
>>> wrote:
>>>> On Sun, Nov 25, 2012 at 5:09 AM, Mikulas Patocka  
>>>> wrote:
>>>>> So it's better to slow down mount.
>>>>
>>>> I am quite proud of the linux boot time pitting against other OS. Even
>>>> with 10 partitions. Linux can boot up in just a few seconds, but now
>>>> you're saying that we need to do this semaphore check at boot up. By
>>>> doing so, it's inducing additional 4 seconds during boot up.
>>>
>>> By the way, I'm using a pretty fast SSD (Samsung PM830) and fast CPU
>>> (2.8GHz). I wonder if those on slower hard disk or slower CPU, what
>>> kind of degradation would this cause or just the same?
>>
>> It'd likely be the same slow down time wise, but as a percentage it
>> would appear smaller on a slower disk.
>>
>> Could you please test Mikulas' suggestion of changing
>> synchronize_sched() in include/linux/percpu-rwsem.h to
>> synchronize_sched_expedited()?
> 
> Tested. It seems as fast as before, but may be a "tick" slower. Just
> perception. I was getting pretty much 0.012s with everything reverted.
> With synchronize_sched_expedited(), it seems to be 0.012s ~ 0.013s.
> So, it's good.

Excellent.

>> linux-next also has a re-write of the per-cpu rw sems, out of Andrew's
>> tree. It would be a good data point if you could test that, too.
> 
> Tested. It's slower. 0.350s. But still faster than 0.500s without the patch.

Makes sense, it's 2 synchronize_sched() instead of 3. So it doesn't fix
the real issue, which is having to do synchronize_sched() in the first
place.

> # time mount /dev/sda1 /mnt; sync; sync; umount /mnt
> 
> 
> So, here's the comparison ...
> 
> 0.500s 3.7.0-rc7
> 0.168s 3.7.0-rc2
> 0.012s 3.6.0
> 0.013s 3.7.0-rc7 + synchronize_sched_expedited()
> 0.350s 3.7.0-rc7 + Oleg's patch.

I wonder how many of them are due to changing to the same block size.
Does the below patch make a difference?

diff --git a/fs/block_dev.c b/fs/block_dev.c
index 1a1e5e3..f041c56 100644
--- a/fs/block_dev.c
+++ b/fs/block_dev.c
@@ -126,29 +126,28 @@ int set_blocksize(struct block_device *bdev, int size)
 	if (size < bdev_logical_block_size(bdev))
 		return -EINVAL;
 
-	/* Prevent starting I/O or mapping the device */
-	percpu_down_write(&bdev->bd_block_size_semaphore);
-
 	/* Check that the block device is not memory mapped */
 	mapping = bdev->bd_inode->i_mapping;
 	mutex_lock(&mapping->i_mmap_mutex);
 	if (mapping_mapped(mapping)) {
 		mutex_unlock(&mapping->i_mmap_mutex);
-		percpu_up_write(&bdev->bd_block_size_semaphore);
 		return -EBUSY;
 	}
 	mutex_unlock(&mapping->i_mmap_mutex);
 
 	/* Don't change the size if it is same as current */
 	if (bdev->bd_block_size != size) {
-		sync_blockdev(bdev);
-		bdev->bd_block_size = size;
-		bdev->bd_inode->i_blkbits = blksize_bits(size);
-		kill_bdev(bdev);
+		/* Prevent starting I/O */
+		percpu_down_write(&bdev->bd_block_size_semaphore);
+		if (bdev->bd_block_size != size) {
+			sync_blockdev(bdev);
+			bdev->bd_block_size = size;
+			bdev->bd_inode->i_blkbits = blksize_bits(size);
+			kill_bdev(bdev);
+		}
+		percpu_up_write(&bdev->bd_block_size_semaphore);
 	}
 
-	percpu_up_write(&bdev->bd_block_size_semaphore);
-
 	return 0;
 }
 
@@ -1649,14 +1648,12 @@ EXPORT_SYMBOL_GPL(blkdev_aio_write);
 
 static int blkdev_mmap(struct file *file, struct vm_area_struct *vma)
 {
+	struct address_space *mapping = file->f_mapping;
 	int ret;
-	struct block_device *bdev = I_BDEV(file->f_mapping->host);
-
-	percpu_down_read(&bdev->bd_block_size_semaphore);
 
+	mutex_lock(&mapping->i_mmap_mutex);
 	ret = generic_file_mmap(file, vma);
-
-	percpu_up_read(&bdev->bd_block_size_semaphore);
+	mutex_unlock(&mapping->i_mmap_mutex);
 
 	return ret;
 }
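The set_blocksize() change above follows a check, lock, re-check shape: the expensive write lock is only taken when the size actually differs, and the comparison is repeated under the lock. A minimal userspace sketch, assuming a pthread mutex as a stand-in for percpu_down_write() (whose real cost is the synchronize_sched() inside it); struct model_bdev and model_set_blocksize() are hypothetical.

```c
#include <assert.h>
#include <pthread.h>
#include <stdatomic.h>

struct model_bdev {
	pthread_mutex_t size_lock;	/* stands in for bd_block_size_semaphore */
	atomic_int block_size;		/* read without the lock on the fast path */
	unsigned int expensive_locks;	/* times the write lock was actually taken */
};

static void model_set_blocksize(struct model_bdev *b, int size)
{
	/* The real function validates size earlier (power of two, 512..PAGE_SIZE). */
	assert(size >= 512 && (size & (size - 1)) == 0);

	/* Fast check: same size as current means nothing to do. */
	if (atomic_load(&b->block_size) == size)
		return;

	pthread_mutex_lock(&b->size_lock);	/* percpu_down_write() */
	b->expensive_locks++;
	/* Re-check: a concurrent caller may already have made the change. */
	if (atomic_load(&b->block_size) != size) {
		/* sync_blockdev() and kill_bdev() would run here */
		atomic_store(&b->block_size, size);
	}
	pthread_mutex_unlock(&b->size_lock);	/* percpu_up_write() */
}
```

Setting the current size again costs no lock acquisition; each real change still pays for one.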

-- 
Jens Axboe



Re: Recent kernel "mount" slow

2012-11-28 Thread Jens Axboe
On 2012-11-28 04:57, Mikulas Patocka wrote:
> 
> 
> On Tue, 27 Nov 2012, Jens Axboe wrote:
> 
>> On 2012-11-27 11:06, Jeff Chua wrote:
>>> On Tue, Nov 27, 2012 at 3:38 PM, Jens Axboe  wrote:
>>>> On 2012-11-27 06:57, Jeff Chua wrote:
>>>>> On Sun, Nov 25, 2012 at 7:23 AM, Jeff Chua  
>>>>> wrote:
>>>>>> On Sun, Nov 25, 2012 at 5:09 AM, Mikulas Patocka  
>>>>>> wrote:
>>>>>>> So it's better to slow down mount.
>>>>>>
>>>>>> I am quite proud of the linux boot time pitting against other OS. Even
>>>>>> with 10 partitions. Linux can boot up in just a few seconds, but now
>>>>>> you're saying that we need to do this semaphore check at boot up. By
>>>>>> doing so, it's inducing additional 4 seconds during boot up.
>>>>>
>>>>> By the way, I'm using a pretty fast SSD (Samsung PM830) and fast CPU
>>>>> (2.8GHz). I wonder if those on slower hard disk or slower CPU, what
>>>>> kind of degradation would this cause or just the same?
>>>>
>>>> It'd likely be the same slow down time wise, but as a percentage it
>>>> would appear smaller on a slower disk.
>>>>
>>>> Could you please test Mikulas' suggestion of changing
>>>> synchronize_sched() in include/linux/percpu-rwsem.h to
>>>> synchronize_sched_expedited()?
>>>
>>> Tested. It seems as fast as before, but may be a "tick" slower. Just
>>> perception. I was getting pretty much 0.012s with everything reverted.
>>> With synchronize_sched_expedited(), it seems to be 0.012s ~ 0.013s.
>>> So, it's good.
>>
>> Excellent
>>
>>>> linux-next also has a re-write of the per-cpu rw sems, out of Andrew's
>>>> tree. It would be a good data point if you could test that, too.
>>>
>>> Tested. It's slower. 0.350s. But still faster than 0.500s without the patch.
>>
>> Makes sense, it's 2 synchronize_sched() instead of 3. So it doesn't fix
>> the real issue, which is having to do synchronize_sched() in the first
>> place.
>>
>>> # time mount /dev/sda1 /mnt; sync; sync; umount /mnt
>>>
>>>
>>> So, here's the comparison ...
>>>
>>> 0.500s 3.7.0-rc7
>>> 0.168s 3.7.0-rc2
>>> 0.012s 3.6.0
>>> 0.013s 3.7.0-rc7 + synchronize_sched_expedited()
>>> 0.350s 3.7.0-rc7 + Oleg's patch.
>>
>> I wonder how many of them are due to changing to the same block size.
>> Does the below patch make a difference?
> 
> This patch is wrong because you must check if the device is mapped while 
> holding bdev->bd_block_size_semaphore (because 
> bdev->bd_block_size_semaphore prevents new mappings from being created)

No, it doesn't. If you read the patch, that protection was moved to i_mmap_mutex.

> I'm sending another patch that has the same effect.
> 
> 
> Note that ext[234] filesystems set blocksize to 1024 temporarily during 
> mount, so it doesn't help much (it only helps for other filesystems, such 
> as jfs). For ext[234], you have a device with default block size 4096, the 
> filesystem sets block size to 1024 during mount, reads the super block and 
> sets it back to 4096.

That is true, which is why I doubted it would actually help. In any
case, basically any block device will go through at least one block size
transition when it is mounted for the first time. I wonder if we shouldn't
just default to a 4KB soft block size to avoid that one, though that would
only be working around the issue to some degree.

-- 
Jens Axboe



Re: [PATCH 07/62] block: convert to idr_alloc()

2013-02-04 Thread Jens Axboe
On Sat, Feb 02 2013, Tejun Heo wrote:
> Convert to the much saner new idr interface.  Both bsg and genhd
> protect idr w/ mutex making preloading unnecessary.
> 
> Signed-off-by: Tejun Heo 
> Cc: Jens Axboe 
> ---
> This patch depends on an earlier idr changes and I think it would be
> best to route these together through -mm.  Please holler if there's
> any objection.  Thanks.

Acked-by: Jens Axboe 

-- 
Jens Axboe



Re: [PATCH 08/62] block/loop: convert to idr_alloc()

2013-02-04 Thread Jens Axboe
On Sat, Feb 02 2013, Tejun Heo wrote:
> Convert to the much saner new idr interface.
> 
> Signed-off-by: Tejun Heo 
> Cc: Jens Axboe 
> ---
> This patch depends on an earlier idr changes and I think it would be
> best to route these together through -mm.  Please holler if there's
> any objection.  Thanks.

Acked-by: Jens Axboe 

-- 
Jens Axboe



Re: [PATCHv2 1/1] block: IBM RamSan 70/80 device driver.

2013-02-05 Thread Jens Axboe
On Fri, Feb 01 2013, Philip J. Kelleher wrote:
> From: Joshua H Morris 
>   Philip J Kelleher 
> 
> This patch includes the device driver for the IBM RamSan family
> of PCI SSD flash storage cards. This driver will include support for the
> RamSan 70 and 80. The driver presents a block device for device I/O.
> 
> Signed-off-by: Philip J Kelleher 
> ---
> This update addresses all issues raised thus far. Changes include:
> o Moved blk_queue_max_discard_sectors into the 'discard supported' if
>   statement.
> o Changed email address to linux.vnet.ibm.com

Tentatively applied for 3.9.

-- 
Jens Axboe



Re: [GIT PULL] drbd build fix in case CONFIG_CRYPTO_HMAC is not set

2013-02-05 Thread Jens Axboe
On Mon, Feb 04 2013, Philipp Reisner wrote:
> The following changes since commit d88c3ab963d4cce09b25ef661b871bd7af6dad0d:
> 
>   drbd: only fail empty flushes if no good data is reachable (2013-01-30 10:40:33 +0100)
> 
> are available in the git repository at:
> 
>   git://git.drbd.org/linux-drbd.git for-jens-3.8-fix
> 
> for you to fetch changes up to 78aa7987a223e8542f2735dace439690c6171ac5:
> 
>   drbd: Fix build error when CONFIG_CRYPTO_HMAC is not set (2013-02-04 18:14:03 +0100)

NOT pulled. Base the branch off my for-linus. When I pull the below into
that, I don't get a small patch, I get all changes from 3.7 to 3.8-rc5.

-- 
Jens Axboe



Re: For the condition "file->f_mode", when it failed, it should return EACCES rather than EBADF.

2013-02-05 Thread Jens Axboe
On Sun, Feb 03 2013, majianpeng wrote:
> Hi all,
>   When I wanted to do discard operations,but i set the  openflag was 
> O_RDONLY,it returned a EBADF rather than EACCES or EPERM.
> I searched the code and found:
> >case BLKDISCARD:
> >case BLKSECDISCARD: {
> > uint64_t range[2];
> 
> > if (!(mode & FMODE_WRITE))
> > return -EBADF;
> Initial i thought there was error.But i searched all code of kernel and found 
> some places like this.
> 
> The description of EBADF is "Bad file numbe". There are some places where 
> returned EBADF like,
> >if (!f.file)
> > return -EBADF;
> 
> So i think for checking file->f_mode when failed, it should return EACCESS.

But that would break the ABI at this point. I agree with you, though,
EBADF is not the right error for this case.

-- 
Jens Axboe



Re: Re: For the condition "file->f_mode", when it failed, it should return EACCES rather than EBADF.

2013-02-05 Thread Jens Axboe
On Tue, Feb 05 2013, majianpeng wrote:
> >On Sun, Feb 03 2013, majianpeng wrote:
> >> Hi all,
> >>When I wanted to do discard operations,but i set the  openflag was 
> >> O_RDONLY,it returned a EBADF rather than EACCES or EPERM.
> >> I searched the code and found:
> >> >case BLKDISCARD:
> >> >case BLKSECDISCARD: {
> >> >  uint64_t range[2];
> >> 
> >> >  if (!(mode & FMODE_WRITE))
> >> >  return -EBADF;
> >> Initial i thought there was error.But i searched all code of kernel and 
> >> found some places like this.
> >> 
> >> The description of EBADF is "Bad file numbe". There are some places where 
> >> returned EBADF like,
> >> >if (!f.file)
> >> >  return -EBADF;
> >> 
> >> So i think for checking file->f_mode when failed, it should return EACCESS.
> >
> >But that would break the ABI at this point. I agree with you, though,
> >EBADF is not the right error for this case.
> >
> >-- 
> >Jens Axboe
> >
> Sorry, can you explain in detail? Why can it break the ABI ?

Applications may already depend on EBADF being returned for an attempt
to discard on a file descriptor not opened for writing. Granted, it's a
slim possibility, but it exists.
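To make the ABI concern concrete, a caller of this kind might issue BLKDISCARD and treat EBADF as "descriptor not opened for writing". try_discard() and discard_needs_writable_fd() are hypothetical; BLKDISCARD and its two-element uint64_t {start, length} argument come from <linux/fs.h>. If the kernel started returning EACCES instead, the second helper would quietly stop matching.

```c
#include <errno.h>
#include <stdint.h>
#include <sys/ioctl.h>
#include <linux/fs.h>	/* BLKDISCARD */

/* Returns 0 on success or the errno value from the failed ioctl(). */
static int try_discard(int fd, uint64_t start, uint64_t len)
{
	uint64_t range[2] = { start, len };	/* byte offset and length */

	if (ioctl(fd, BLKDISCARD, range) == 0)
		return 0;
	return errno;
}

/* The kind of check that would change meaning if EBADF became EACCES. */
static int discard_needs_writable_fd(int err)
{
	return err == EBADF;
}
```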

-- 
Jens Axboe



Re: linux-next: build failure after merge of the block tree

2013-02-06 Thread Jens Axboe
On Wed, Feb 06 2013, Stephen Rothwell wrote:
> Hi Jens,
> 
> After merging the block tree, today's linux-next build (x86_64
> allmodconfig) failed like this:
> 
> drivers/block/rsxx/core.c:322:22: error: expected '=', ',', ';', 'asm' or '__attribute__' before 'rsxx_pci_probe'
> drivers/block/rsxx/core.c:513:23: error: expected '=', ',', ';', 'asm' or '__attribute__' before 'rsxx_pci_remove'
> drivers/block/rsxx/core.c:610:12: error: 'rsxx_pci_probe' undeclared here (not in a function)
> drivers/block/rsxx/core.c:611:2: error: implicit declaration of function '__devexit_p' [-Werror=implicit-function-declaration]
> drivers/block/rsxx/core.c:611:25: error: 'rsxx_pci_remove' undeclared here (not in a function)
> drivers/block/rsxx/core.c:51:8: warning: 'rsxx_disk_ida' defined but not used [-Wunused-variable]
> drivers/block/rsxx/core.c:52:8: warning: 'rsxx_ida_lock' defined but not used [-Wunused-variable]
> drivers/block/rsxx/core.c:219:13: warning: 'card_event_handler' defined but not used [-Wunused-function]
> drivers/block/rsxx/core.c:311:12: warning: 'rsxx_compatibility_check' defined but not used [-Wunused-function]
> 
> Caused by commit 8722ff8cdbfa ("block: IBM RamSan 70/80 device driver")
> interacting with commit 54b956b90360 ("Remove __dev* markings from
> init.h") from Linus' tree (as of v3.8-rc4).
> 
> I added the following merge fix patch:

Thanks Stephen, I merged it into the for-3.9/drivers branch and updated
for-next.

-- 
Jens Axboe



Re: [PATCH 06/62] block: fix synchronization and limit check in blk_alloc_devt()

2013-02-06 Thread Jens Axboe
On Sat, Feb 02 2013, Tejun Heo wrote:
> idr allocation in blk_alloc_devt() wasn't synchronized against lookup
> and removal, and its limit check was off by one - 1 << MINORBITS is
> the number of minors allowed, not the maximum allowed minor.
> 
> Add locking and rename MAX_EXT_DEVT to NR_EXT_DEVT and fix limit
> checking.
> 
> Signed-off-by: Tejun Heo 
> Cc: Jens Axboe 
> Cc: sta...@vger.kernel.org

Acked-by: Jens Axboe 
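The off-by-one is easiest to see with the numbers plugged in. MINORBITS is 20, so 1 << MINORBITS is 1048576 minors, numbered 0 to 1048575. The two helpers below are hypothetical and only contrast a bound that treats the count as the largest allowed index with one that treats it as a count; they are not the actual blk_alloc_devt() code.

```c
#include <assert.h>
#include <stdbool.h>

#define MINORBITS	20			/* as in include/linux/kdev_t.h */
#define NR_EXT_DEVT	(1 << MINORBITS)	/* a count of minors, not a maximum */

static_assert(NR_EXT_DEVT == 1048576, "2^20 minors");

/* Buggy style: treats the count as the largest allowed index. */
static bool idx_ok_as_max(int idx)
{
	return idx >= 0 && idx <= NR_EXT_DEVT;
}

/* Fixed style: valid indices are 0 .. NR_EXT_DEVT - 1. */
static bool idx_ok_as_count(int idx)
{
	return idx >= 0 && idx < NR_EXT_DEVT;
}
```

The first helper accepts index 1048576, which does not fit in 20 minor bits.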

-- 
Jens Axboe



[GIT PULL] Block bits for 3.8 final

2013-02-06 Thread Jens Axboe
Hi Linus,

I've got a few bits pending for 3.8 final that I'd better get sent out.
It's all been sitting for a while; I consider it safe.

It contains:

- Two bug fixes for mtip32xx, fixing a driver hang and a crash.

- A few-liner protocol error fix for drbd.

- A few fixes for the xen block front/back driver, fixing a potential
  data corruption issue.

- A race fix for disk_clear_events(), causing spurious warnings. Out of
  the Chrome OS base.

- A deadlock fix for disk_clear_events(), moving it to an unfreezable
  workqueue. Also from the Chrome OS base.

Please pull!

  git://git.kernel.dk/linux-block.git for-linus


Asai Thambi S P (2):
  mtip32xx: fix for driver hang after a command timeout
  mtip32xx: fix for crash when the device surprise removed during rebuild

Derek Basehore (2):
  block: remove deadlock in disk_clear_events
  block: prevent race/cleanup

Jens Axboe (2):
  Merge branch 'stable/for-jens-3.8' of git://git.kernel.org/.../konrad/xen into for-linus
  Merge branch 'for-jens' of git://git.drbd.org/linux-drbd into for-linus

Lars Ellenberg (1):
  drbd: fix potential protocol error and resulting disconnect/reconnect

Roger Pau Monne (3):
  xen-blkback: implement safe iterator for the list of persistent grants
  llist/xen-blkfront: implement safe version of llist_for_each_entry
  xen-blkfront: handle bvecs with partial data

 block/genhd.c                       | 42 -
 drivers/block/drbd/drbd_req.c       |  2 +-
 drivers/block/drbd/drbd_req.h       |  1 +
 drivers/block/drbd/drbd_state.c     |  7 +++
 drivers/block/mtip32xx/mtip32xx.c   | 24 +++--
 drivers/block/xen-blkback/blkback.c | 18 +---
 drivers/block/xen-blkfront.c        | 10 +
 include/linux/llist.h               | 25 ++
 8 files changed, 101 insertions(+), 28 deletions(-)

-- 
Jens Axboe



Re: [PATCH 04/15] drivers/block/mtip32xx: add missing GENERIC_HARDIRQS dependency

2013-02-06 Thread Jens Axboe
On Wed, Feb 06 2013, Heiko Carstens wrote:
> The MTIP32XX driver calls devm_request_irq() and therefore needs a
> GENERIC_HARDIRQS dependency to prevent building it on s390.

I'll queue this up for 3.9, thanks Heiko.

-- 
Jens Axboe



Re: [RFC PATCH 5/8] scatterlist: introduce sg_unmark_end

2013-02-07 Thread Jens Axboe
On Thu, Feb 07 2013, Paolo Bonzini wrote:
> This is useful in places that recycle the same scatterlist multiple
> times, and do not want to incur the cost of sg_init_table every
> time in hot paths.

Looks fine to me.

Acked-by: Jens Axboe 
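In the kernel the end of a scatterlist is flagged by bit 0x02 in sg->page_link (bit 0x01 marks a chain entry), and sg_init_table() clears the whole table before setting that bit on the last entry. The userspace model below, with hypothetical model_* names, illustrates why clearing only the old marker is enough to reuse a table at a different length without re-initialising all of it.

```c
#include <assert.h>
#include <stdbool.h>
#include <string.h>

#define SG_CHAIN_BIT	0x01UL	/* entry links to another table */
#define SG_END_BIT	0x02UL	/* entry is the last one */

struct model_sg {
	unsigned long page_link;	/* low bits carry the chain/end flags */
	unsigned int length;
};

static void model_mark_end(struct model_sg *sg)
{
	sg->page_link |= SG_END_BIT;
	sg->page_link &= ~SG_CHAIN_BIT;
}

static void model_unmark_end(struct model_sg *sg)
{
	sg->page_link &= ~SG_END_BIT;
}

/* Like sg_init_table(): zero every entry, then mark the last one. */
static void model_init_table(struct model_sg *sgl, unsigned int nents)
{
	assert(nents > 0);
	memset(sgl, 0, sizeof(*sgl) * nents);
	model_mark_end(&sgl[nents - 1]);
}

/* Walk until the end marker (this model has no chaining). */
static unsigned int model_nents(const struct model_sg *sgl, unsigned int max)
{
	for (unsigned int i = 0; i < max; i++)
		if (sgl[i].page_link & SG_END_BIT)
			return i + 1;
	return max;
}
```

In a hot path the table is initialised once; later uses move the end marker with an unmark/mark pair instead of a full memset.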

-- 
Jens Axboe



Re: [RFC PATCH 0/8] virtio: new API for addition of buffers, scatterlist changes

2013-02-08 Thread Jens Axboe
On Fri, Feb 08 2013, Rusty Russell wrote:
> Paolo Bonzini  writes:
> > The virtqueue_add_buf function has two limitations:
> >
> > 1) it requires the caller to provide all the buffers in a single call;
> >
> > 2) it does not support chained scatterlists: the buffers must be
> > provided as an array of struct scatterlist.
> >
> > Because of these limitations, virtio-scsi has to copy each request into
> > a scatterlist internal to the driver.  It cannot just use the one that
> > was prepared by the upper SCSI layers.
> 
> Hi Paulo,
> 
> Note that you've defined your problem in terms of your solution
> here.  For clarity:
> 
> The problem: we want to prepend and append to a scatterlist.  We can't
> append, because the chained scatterlist implementation requires
> an element to be appended to join two scatterlists together.
> 
> The solution: fix scatterlists by introducing struct sg_ring:
> struct sg_ring {
> struct list_head ring;
>   unsigned int nents;
>   unsigned int orig_nents; /* do we want to replace sg_table? */
> struct scatterlist *sg;
> };

This would definitely be more flexible than the current chaining.
However:

> The workaround: make virtio accept multiple scatterlists for a single
> buffer.
> 
> There's nothing wrong with your workaround, but if other subsystems have
> the same problem we do, perhaps we should consider a broader solution?

Do other use cases actually exist? I don't think I've come across this
requirement before, since it was introduced (6 years ago, from a cursory
look at the git logs!).
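A userspace sketch of the sg_ring idea quoted above, under simplifying assumptions: a separate list head anchors the ring, and struct seg stands in for struct scatterlist with only a length. All names are hypothetical. Because each chunk is just a node on a circular doubly linked list, a header chunk can be linked in front of, and a trailer behind, the caller's chunk without copying or modifying the caller's array.

```c
#include <stddef.h>

struct list_node {
	struct list_node *prev, *next;
};

struct seg {
	unsigned int len;	/* stand-in for a struct scatterlist entry */
};

struct sg_ring_model {
	struct list_node ring;	/* links this chunk into the ring */
	unsigned int nents;	/* entries in use in sg[] */
	struct seg *sg;
};

static void ring_init(struct list_node *head)
{
	head->prev = head->next = head;
}

static void ring_link(struct list_node *node, struct list_node *prev,
		      struct list_node *next)
{
	node->prev = prev;
	node->next = next;
	prev->next = node;
	next->prev = node;
}

/* Prepend: link the chunk directly after the head. */
static void ring_prepend(struct list_node *head, struct sg_ring_model *r)
{
	ring_link(&r->ring, head, head->next);
}

/* Append: link the chunk directly before the head. */
static void ring_append(struct list_node *head, struct sg_ring_model *r)
{
	ring_link(&r->ring, head->prev, head);
}

/* Sum the lengths of all entries, walking the chunks in ring order. */
static unsigned long ring_total_len(const struct list_node *head)
{
	unsigned long total = 0;

	for (const struct list_node *n = head->next; n != head; n = n->next) {
		const struct sg_ring_model *r = (const struct sg_ring_model *)
			((const char *)n - offsetof(struct sg_ring_model, ring));

		for (unsigned int i = 0; i < r->nents; i++)
			total += r->sg[i].len;
	}
	return total;
}
```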

-- 
Jens Axboe



Re: [PATCH 23/32] Generic dynamic per cpu refcounting

2013-02-08 Thread Jens Axboe
On Fri, Feb 08 2013, Tejun Heo wrote:
> (cc'ing Andrew)
> 
> On Wed, Dec 26, 2012 at 06:00:02PM -0800, Kent Overstreet wrote:
> > This implements a refcount with similar semantics to
> > atomic_get()/atomic_dec_and_test(), that starts out as just an atomic_t
> > but dynamically switches to per cpu refcounting when the rate of
> > gets/puts becomes too high.
> > 
> > It also implements two stage shutdown, as we need it to tear down the
> > percpu counts. Before dropping the initial refcount, you must call
> > percpu_ref_kill(); this puts the refcount in "shutting down mode" and
> > switches back to a single atomic refcount with the appropriate barriers
> > (synchronize_rcu()).
> > 
> > It's also legal to call percpu_ref_kill() multiple times - it only
> > returns true once, so callers don't have to reimplement shutdown
> > synchronization.
> > 
> > For the sake of simplicity/efficiency, the heuristic is pretty simple -
> > it just switches to percpu refcounting if there are more than x gets
> > in one second (completely arbitrarily, 4096).
> > 
> > It'd be more correct to count the number of cache misses or something
> > else more profile driven, but doing so would require accessing the
> > shared ref twice per get - by just counting the number of gets(), we can
> > stick that counter in the high bits of the refcount and increment both
> > with a single atomic64_add(). But I expect this'll be good enough in
> > practice.
> > 
> > Signed-off-by: Kent Overstreet 
> 
> What's the status of this series?  The percpu-refcnt part is still
> going through review and the merge window is opening up pretty soon.
> Kent, Andrew?

I'd feel a lot better deferring the whole aio/dio performance series for
one merge window. There's very little point in rushing it, and I don't
think it's been reviewed/tested enough yet.
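The counting trick in the quoted description (keep the get count in the high bits so that one atomic64_add() updates both) can be sketched in userspace with C11 atomics. The 32/32 bit split, the names, and the omission of any per-second reset of the get counter are assumptions for illustration, not the code from the series.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>
#include <stdint.h>

#define REF_COUNT_MASK	0xffffffffULL	/* low 32 bits: live references */
#define REF_GET_ONE	(1ULL << 32)	/* high 32 bits: gets counted so far */

static void ref_get(_Atomic uint64_t *ref)
{
	/* A single atomic add bumps both the refcount and the get counter. */
	atomic_fetch_add(ref, REF_GET_ONE + 1);
}

/* Returns true when the last reference was dropped. */
static bool ref_put(_Atomic uint64_t *ref)
{
	uint64_t old = atomic_fetch_sub(ref, 1);

	/* A put with no live references would borrow from the get counter. */
	assert((old & REF_COUNT_MASK) != 0);
	return (old & REF_COUNT_MASK) == 1;
}

static uint32_t ref_count(uint64_t v) { return (uint32_t)(v & REF_COUNT_MASK); }
static uint32_t ref_gets(uint64_t v)  { return (uint32_t)(v >> 32); }

/* The quoted heuristic: go per-CPU past 4096 gets within one second. */
static bool ref_should_go_percpu(uint32_t gets_this_second)
{
	return gets_this_second > 4096;
}
```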

-- 
Jens Axboe



Re: Loopback device hung [was Re: xfs deadlock on 3.9-rc5 running xfstests case #78]

2013-04-02 Thread Jens Axboe
On Tue, Apr 02 2013, Dave Chinner wrote:
> [Added jens Axboe to CC]
> 
> On Tue, Apr 02, 2013 at 02:08:49AM -0400, CAI Qian wrote:
> > Saw on almost all the servers range from x64, ppc64 and s390x with kernel
> > 3.9-rc5 and xfsprogs-3.1.10. Never caught this in 3.9-rc4, so looks like
> > something new broke this. Log is here with sysrq debug info.
> > http://people.redhat.com/qcai/stable/log

CAI Qian, can you try and back the below out and test again?


commit 8761a3dc1f07b163414e2215a2cadbb4cfe2a107
Author: Phillip Susi 
Date:   Fri Mar 22 12:21:53 2013 -0600

    loop: cleanup partitions when detaching loop device

    Any partitions added by user space to the loop device were being
    left in place after detaching the loop device.  This was because
    the detach path issued a BLKRRPART to clean up partitions if
    LO_FLAGS_PARTSCAN was set, meaning that the partitions were auto
    scanned on attach.  Replace this BLKRRPART with code that
    unconditionally cleans up partitions on detach instead.

    Signed-off-by: Phillip Susi 

    Modified by Jens to export delete_partition().

    Signed-off-by: Jens Axboe 

diff --git a/block/partition-generic.c b/block/partition-generic.c
index 789cdea..ae95ee6 100644
--- a/block/partition-generic.c
+++ b/block/partition-generic.c
@@ -257,6 +257,7 @@ void delete_partition(struct gendisk *disk, int partno)
 
 	hd_struct_put(part);
 }
+EXPORT_SYMBOL(delete_partition);
 
 static ssize_t whole_disk_show(struct device *dev,
 			       struct device_attribute *attr, char *buf)
diff --git a/drivers/block/loop.c b/drivers/block/loop.c
index ee13a82..fe5f640 100644
--- a/drivers/block/loop.c
+++ b/drivers/block/loop.c
@@ -1044,12 +1044,29 @@ static int loop_clr_fd(struct loop_device *lo)
 	lo->lo_state = Lo_unbound;
 	/* This is safe: open() is still holding a reference. */
 	module_put(THIS_MODULE);
-	if (lo->lo_flags & LO_FLAGS_PARTSCAN && bdev)
-		ioctl_by_bdev(bdev, BLKRRPART, 0);
 	lo->lo_flags = 0;
 	if (!part_shift)
 		lo->lo_disk->flags |= GENHD_FL_NO_PART_SCAN;
 	mutex_unlock(&lo->lo_ctl_mutex);
+
+	/*
+	 * Remove all partitions, since BLKRRPART won't remove user
+	 * added partitions when max_part=0
+	 */
+	if (bdev) {
+		struct disk_part_iter piter;
+		struct hd_struct *part;
+
+		mutex_lock_nested(&bdev->bd_mutex, 1);
+		invalidate_partition(bdev->bd_disk, 0);
+		disk_part_iter_init(&piter, bdev->bd_disk,
+					DISK_PITER_INCL_EMPTY);
+		while ((part = disk_part_iter_next(&piter)))
+			delete_partition(bdev->bd_disk, part->partno);
+		disk_part_iter_exit(&piter);
+		mutex_unlock(&bdev->bd_mutex);
+	}
+
 	/*
 	 * Need not hold lo_ctl_mutex to fput backing file.
 	 * Calling fput holding lo_ctl_mutex triggers a circular

-- 
Jens Axboe



Re: Loopback device hung [was Re: xfs deadlock on 3.9-rc5 running xfstests case #78]

2013-04-02 Thread Jens Axboe
On Tue, Apr 02 2013, Jens Axboe wrote:
> On Tue, Apr 02 2013, Dave Chinner wrote:
> > [Added jens Axboe to CC]
> > 
> > On Tue, Apr 02, 2013 at 02:08:49AM -0400, CAI Qian wrote:
> > > Saw on almost all the servers range from x64, ppc64 and s390x with kernel
> > > 3.9-rc5 and xfsprogs-3.1.10. Never caught this in 3.9-rc4, so looks like
> > > something new broke this. Log is here with sysrq debug info.
> > > http://people.redhat.com/qcai/stable/log
> 
> CAI Qian, can you try and back the below out and test again?

Never mind, it's clearly that one. The below should improve the
situation, but it's not pretty. A better fix would be to allow
auto-deletion even if PART_NO_SCAN is set.

diff --git a/drivers/block/loop.c b/drivers/block/loop.c
index fe5f640..d6c5764 100644
--- a/drivers/block/loop.c
+++ b/drivers/block/loop.c
@@ -1057,14 +1057,15 @@ static int loop_clr_fd(struct loop_device *lo)
 		struct disk_part_iter piter;
 		struct hd_struct *part;
 
-		mutex_lock_nested(&bdev->bd_mutex, 1);
-		invalidate_partition(bdev->bd_disk, 0);
-		disk_part_iter_init(&piter, bdev->bd_disk,
-					DISK_PITER_INCL_EMPTY);
-		while ((part = disk_part_iter_next(&piter)))
-			delete_partition(bdev->bd_disk, part->partno);
-		disk_part_iter_exit(&piter);
-		mutex_unlock(&bdev->bd_mutex);
+		if (mutex_trylock(&bdev->bd_mutex, 1))
+			invalidate_partition(bdev->bd_disk, 0);
+			disk_part_iter_init(&piter, bdev->bd_disk,
+						DISK_PITER_INCL_EMPTY);
+			while ((part = disk_part_iter_next(&piter)))
+				delete_partition(bdev->bd_disk, part->partno);
+			disk_part_iter_exit(&piter);
+			mutex_unlock(&bdev->bd_mutex);
+		}
 	}
 
 	/*

-- 
Jens Axboe



Re: Loopback device hung [was Re: xfs deadlock on 3.9-rc5 running xfstests case #78]

2013-04-02 Thread Jens Axboe
On Tue, Apr 02 2013, CAI Qian wrote:
> 
> 
> - Original Message -
> > From: "Jens Axboe" 
> > To: "Dave Chinner" 
> > Cc: "CAI Qian" , x...@oss.sgi.com, "LKML" 
> > 
> > Sent: Tuesday, April 2, 2013 3:30:35 PM
> > Subject: Re: Loopback device hung [was Re: xfs deadlock on 3.9-rc5 running xfstests case #78]
> > 
> > On Tue, Apr 02 2013, Jens Axboe wrote:
> > > On Tue, Apr 02 2013, Dave Chinner wrote:
> > > > [Added jens Axboe to CC]
> > > > 
> > > > On Tue, Apr 02, 2013 at 02:08:49AM -0400, CAI Qian wrote:
> > > > > Saw on almost all the servers range from x64, ppc64 and s390x with kernel
> > > > > 3.9-rc5 and xfsprogs-3.1.10. Never caught this in 3.9-rc4, so looks like
> > > > > something new broke this. Log is here with sysrq debug info.
> > > > > http://people.redhat.com/qcai/stable/log
> > > 
> > > CAI Qian, can you try and back the below out and test again?
> > 
> > Nevermind, it's clearly that one. The below should improve the
> > situation, but it's not pretty. A better fix would be to allow
> > auto-deletion even if PART_NO_SCAN is set.
> Jens, when compiled the mainline (up to fefcdbe) with this patch,
> it error-ed out,

Looks like I sent the wrong one, updated below.

diff --git a/drivers/block/loop.c b/drivers/block/loop.c
index fe5f640..faa3afa 100644
--- a/drivers/block/loop.c
+++ b/drivers/block/loop.c
@@ -1057,14 +1057,15 @@ static int loop_clr_fd(struct loop_device *lo)
 		struct disk_part_iter piter;
 		struct hd_struct *part;
 
-		mutex_lock_nested(&bdev->bd_mutex, 1);
-		invalidate_partition(bdev->bd_disk, 0);
-		disk_part_iter_init(&piter, bdev->bd_disk,
-					DISK_PITER_INCL_EMPTY);
-		while ((part = disk_part_iter_next(&piter)))
-			delete_partition(bdev->bd_disk, part->partno);
-		disk_part_iter_exit(&piter);
-		mutex_unlock(&bdev->bd_mutex);
+		if (mutex_trylock(&bdev->bd_mutex)) {
+			invalidate_partition(bdev->bd_disk, 0);
+			disk_part_iter_init(&piter, bdev->bd_disk,
+						DISK_PITER_INCL_EMPTY);
+			while ((part = disk_part_iter_next(&piter)))
+				delete_partition(bdev->bd_disk, part->partno);
+			disk_part_iter_exit(&piter);
+			mutex_unlock(&bdev->bd_mutex);
+		}
 	}
 
 	/*
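The corrected patch swaps the blocking mutex_lock_nested() for mutex_trylock(): if bd_mutex is already held (for instance higher up the same release path), partition removal is skipped instead of deadlocking. A userspace sketch of that pattern with POSIX mutexes follows. `struct fake_bdev`, `fake_bdev_init()`, `clear_partitions()` and the partition counter are hypothetical stand-ins, not kernel structures. Note the inverted return convention: the kernel's mutex_trylock() returns 1 on success, while pthread_mutex_trylock() returns 0 on success and EBUSY when the mutex is held.

```c
#include <pthread.h>
#include <stdbool.h>
#include <stddef.h>

struct fake_bdev {
	pthread_mutex_t bd_mutex;
	int nr_parts;           /* stand-in for the registered partitions */
};

static int fake_bdev_init(struct fake_bdev *bdev, int nr_parts)
{
	bdev->nr_parts = nr_parts;
	return pthread_mutex_init(&bdev->bd_mutex, NULL);  /* default attributes */
}

/*
 * Drop all partitions only if the lock is free. Returns true if they were
 * removed, false if the lock was busy and the cleanup was skipped.
 */
static bool clear_partitions(struct fake_bdev *bdev)
{
	if (pthread_mutex_trylock(&bdev->bd_mutex) != 0)
		return false;   /* already held (maybe by our caller): don't block */

	bdev->nr_parts = 0;     /* invalidate + delete_partition() loop goes here */
	pthread_mutex_unlock(&bdev->bd_mutex);
	return true;
}
```

The cost of this pattern is that a contended lock silently leaves the partitions in place, which is presumably part of why the fix is "not pretty".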

-- 
Jens Axboe



Re: Loopback device hung [was Re: xfs deadlock on 3.9-rc5 running xfstests case #78]

2013-04-02 Thread Jens Axboe
On Tue, Apr 02 2013, CAI Qian wrote:
> 
> 
> - Original Message -
> > From: "Jens Axboe" 
> > To: "CAI Qian" 
> > Cc: "Dave Chinner" , x...@oss.sgi.com, "LKML" 
> > 
> > Sent: Tuesday, April 2, 2013 5:00:47 PM
> > Subject: Re: Loopback device hung [was Re: xfs deadlock on 3.9-rc5 running xfstests case #78]
> > 
> > On Tue, Apr 02 2013, CAI Qian wrote:
> > > 
> > > 
> > > - Original Message -
> > > > From: "Jens Axboe" 
> > > > To: "Dave Chinner" 
> > > > Cc: "CAI Qian" , x...@oss.sgi.com, "LKML"
> > > > 
> > > > Sent: Tuesday, April 2, 2013 3:30:35 PM
> > > > Subject: Re: Loopback device hung [was Re: xfs deadlock on 3.9-rc5 running xfstests case #78]
> > > > 
> > > > On Tue, Apr 02 2013, Jens Axboe wrote:
> > > > > On Tue, Apr 02 2013, Dave Chinner wrote:
> > > > > > [Added jens Axboe to CC]
> > > > > > 
> > > > > > On Tue, Apr 02, 2013 at 02:08:49AM -0400, CAI Qian wrote:
> > > > > > > Saw on almost all the servers range from x64, ppc64 and s390x with kernel
> > > > > > > 3.9-rc5 and xfsprogs-3.1.10. Never caught this in 3.9-rc4, so looks like
> > > > > > > something new broke this. Log is here with sysrq debug info.
> > > > > > > http://people.redhat.com/qcai/stable/log
> > > > > 
> > > > > CAI Qian, can you try and back the below out and test again?
> > > > 
> > > > Nevermind, it's clearly that one. The below should improve the
> > > > situation, but it's not pretty. A better fix would be to allow
> > > > auto-deletion even if PART_NO_SCAN is set.
> > > Jens, when compiled the mainline (up to fefcdbe) with this patch,
> > > it error-ed out,
> > 
> > Looks like I sent the wrong one, updated below.
> The patch works well. Thanks!

Thanks for testing! I don't particularly like this stuff in loop,
though. It's quite nasty and depends on other behaviour. It would be
cleaner to have rescan_partitions() do the right thing: when
NO_PART_SCAN is set, only drop the partitions and don't rescan.

Something like the below, which drops the loop change and implements
that behaviour in the core code instead. Phillip, can you check whether
this does the right thing for your bug too?

diff --git a/block/ioctl.c b/block/ioctl.c
index a31d91d..8b78b5a 100644
--- a/block/ioctl.c
+++ b/block/ioctl.c
@@ -155,7 +155,7 @@ static int blkdev_reread_part(struct block_device *bdev)
 	struct gendisk *disk = bdev->bd_disk;
 	int res;
 
-	if (!disk_part_scan_enabled(disk) || bdev != bdev->bd_contains)
+	if (bdev != bdev->bd_contains)
 		return -EINVAL;
 	if (!capable(CAP_SYS_ADMIN))
 		return -EACCES;
diff --git a/block/partition-generic.c b/block/partition-generic.c
index ae95ee6..bf4bb60 100644
--- a/block/partition-generic.c
+++ b/block/partition-generic.c
@@ -431,6 +431,15 @@ rescan:
 		disk->fops->revalidate_disk(disk);
 	check_disk_size_change(disk, bdev);
 	bdev->bd_invalidated = 0;
+
+	/*
+	 * If partition scanning is disabled, we are done.
+	 */
+	if (!disk_part_scan_enabled(disk)) {
+		kobject_uevent(&disk_to_dev(disk)->kobj, KOBJ_CHANGE);
+		return 0;
+	}
+
 	if (!get_capacity(disk) || !(state = check_partition(disk, bdev)))
 		return 0;
 	if (IS_ERR(state)) {
diff --git a/drivers/block/loop.c b/drivers/block/loop.c
index 2c127f9..8b6df76 100644
--- a/drivers/block/loop.c
+++ b/drivers/block/loop.c
@@ -1057,24 +1057,6 @@ static int loop_clr_fd(struct loop_device *lo)
 	mutex_unlock(&lo->lo_ctl_mutex);
 
 	/*
-	 * Remove all partitions, since BLKRRPART won't remove user
-	 * added partitions when max_part=0
-	 */
-	if (bdev) {
-		struct disk_part_iter piter;
-		struct hd_struct *part;
-
-		mutex_lock_nested(&bdev->bd_mutex, 1);
-		invalidate_partition(bdev->bd_disk, 0);
-		disk_part_iter_init(&piter, bdev->bd_disk,
-					DISK_PITER_INCL_EMPTY);
-		while ((part = disk_part_iter_next(&piter)))
-			delete_partition(bdev->bd_disk, part->partno);
-		disk_part_iter_exit(&piter);
-		mutex_unlock(&bdev->bd_mutex);
-	}
-
-	/*
 	 * Need not hold lo_ctl_mutex to fput backing file.
 	 * Calling fput holding lo_ctl_mutex triggers a circular
 	 * lock dependency possibility warning as fput can take
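The intended behaviour of the core change (existing partitions are always dropped on a re-read, but new ones are only probed when scanning is enabled) can be modelled in a few lines of userspace C. This is a hypothetical sketch: `struct fake_disk`, `scan_enabled`, `probe_partitions()` and the `uevents` counter are illustrative stand-ins for the gendisk state, disk_part_scan_enabled(), check_partition() and the KOBJ_CHANGE uevent, not kernel code.

```c
#include <stdbool.h>

struct fake_disk {
	bool scan_enabled;      /* models disk_part_scan_enabled() */
	int nr_parts;           /* partitions currently registered */
	int on_disk_parts;      /* what a partition-table probe would find */
	int uevents;            /* change events sent by the scan-disabled path */
};

/* Stand-in for check_partition(): report what the on-disk table holds. */
static int probe_partitions(const struct fake_disk *disk)
{
	return disk->on_disk_parts;
}

static int rescan(struct fake_disk *disk)
{
	/* Existing partitions, including user-added ones, are always dropped. */
	disk->nr_parts = 0;

	/* With scanning disabled, announce the change and stop here. */
	if (!disk->scan_enabled) {
		disk->uevents++;
		return 0;
	}

	disk->nr_parts = probe_partitions(disk);
	return 0;
}
```

With the disk_part_scan_enabled() test removed from blkdev_reread_part(), a BLKRRPART on a scan-disabled disk reaches this path and clears its partitions instead of failing with -EINVAL.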

-- 
Jens Axboe



Re: [GIT PULL] writeback: convert writeback to unbound workqueue

2013-04-02 Thread Jens Axboe
On Mon, Apr 01 2013, Tejun Heo wrote:
> Hello, Jens.
> 
> This is the pull request for the earlier patchset[1] with the same
> name.  It's only three patches (the first one was committed to
> workqueue tree) but the merge strategy is a bit involved due to the
> dependencies.
> 
> * Because the conversion needs features from wq/for-3.10,
>   block/for-3.10/core is based on rc3, and wq/for-3.10 has conflicts
>   with rc3, I pulled mainline (rc5) into wq/for-3.10 to prevent those
>   workqueue conflicts from flaring up in block tree.
> 
> * Resolving the issue that Jan and Dave raised about debugging
>   requires arch-wide changes.  The patchset is being worked on[2] but
>   it'll have to go through -mm after these changes show up in -next,
>   and not included in this pull request.
> 
> The three commits are located in the following git branch.
> 
>   git://git.kernel.org/pub/scm/linux/kernel/git/tj/wq.git writeback-workqueue
> 
> Pulling it into block/for-3.10/core produces a conflict in
> drivers/md/raid5.c between the following two commits.
> 
>   e3620a3ad5 ("MD RAID5: Avoid accessing gendisk or queue structs when not available")
>   2f6db2a707 ("raid5: use bio_reset()")
> 
> The conflict is trivial - one removes an "if ()" conditional while the
> other removes "rbi->bi_next = NULL" right above it.  We just need to
> remove both.  The merged branch is available at
> 
>   git://git.kernel.org/pub/scm/linux/kernel/git/tj/wq.git block-test-merge
> 
> so that you can use it for verification.  The test merge commit has
> proper merge description.
> 
> While these changes are a bit of pain to route, they make code simpler
> and even have, while minute, measureable performance gain[3] even on a
> workload which isn't particularly favorable to showing the benefits of
> this conversion.

Thanks, pulled in for testing. We'll need the debug change in before
sending this upstream, though. I agree with Jan/Dave that this is
required functionality, for debugging purposes.

-- 
Jens Axboe



Re: Loopback device hung [was Re: xfs deadlock on 3.9-rc5 running xfstests case #78]

2013-04-03 Thread Jens Axboe
On Tue, Apr 02 2013, Jens Axboe wrote:
> On Tue, Apr 02 2013, CAI Qian wrote:
> > 
> > 
> > - Original Message -----
> > > From: "Jens Axboe" 
> > > To: "CAI Qian" 
> > > Cc: "Dave Chinner" , x...@oss.sgi.com, "LKML" 
> > > 
> > > Sent: Tuesday, April 2, 2013 5:00:47 PM
> > > Subject: Re: Loopback device hung [was Re: xfs deadlock on 3.9-rc5 running xfstests case #78]
> > > 
> > > On Tue, Apr 02 2013, CAI Qian wrote:
> > > > 
> > > > 
> > > > - Original Message -
> > > > > From: "Jens Axboe" 
> > > > > To: "Dave Chinner" 
> > > > > Cc: "CAI Qian" , x...@oss.sgi.com, "LKML"
> > > > > 
> > > > > Sent: Tuesday, April 2, 2013 3:30:35 PM
> > > > > Subject: Re: Loopback device hung [was Re: xfs deadlock on 3.9-rc5 running xfstests case #78]
> > > > > 
> > > > > On Tue, Apr 02 2013, Jens Axboe wrote:
> > > > > > On Tue, Apr 02 2013, Dave Chinner wrote:
> > > > > > > [Added jens Axboe to CC]
> > > > > > > 
> > > > > > > On Tue, Apr 02, 2013 at 02:08:49AM -0400, CAI Qian wrote:
> > > > > > > > Saw on almost all the servers range from x64, ppc64 and s390x with kernel
> > > > > > > > 3.9-rc5 and xfsprogs-3.1.10. Never caught this in 3.9-rc4, so looks like
> > > > > > > > something new broke this. Log is here with sysrq debug info.
> > > > > > > > http://people.redhat.com/qcai/stable/log
> > > > > > 
> > > > > > CAI Qian, can you try and back the below out and test again?
> > > > > 
> > > > > Nevermind, it's clearly that one. The below should improve the
> > > > > situation, but it's not pretty. A better fix would be to allow
> > > > > auto-deletion even if PART_NO_SCAN is set.
> > > > Jens, when compiled the mainline (up to fefcdbe) with this patch,
> > > > it error-ed out,
> > > 
> > > Looks like I sent the wrong one, updated below.
> > The patch works well. Thanks!
> 
> Thanks for testing! I don't particularly like this stuff in loop,
> though. It's quite nasty and depends on other behaviour. It would be
> prettier if we just had rescan_partitions() do the right thing, and only
> drop partitions and not rescan if NO_PART_SCAN is set.
> 
> Ala the below, dropping the loop change and implementing that change in
> the core code. Phillip, can you check whether this does the right thing
> for your bug too?

Phillip? I'm going to revert the loop change asap, so if you want this
fixed for 3.10, now is the time to test it.

-- 
Jens Axboe



Re: [GIT PULL] writeback: convert writeback to unbound workqueue

2013-04-03 Thread Jens Axboe
On Tue, Apr 02 2013, Tejun Heo wrote:
> Hello, Jens.
> 
> On Tue, Apr 2, 2013 at 2:53 AM, Jens Axboe  wrote:
> > Thanks, pulled in for testing. We'll need the debug change in before
> > sending this upstream, though. I agree with Jan/Dave that this is
> > required functionality, for debugging purposes.
> 
> Yeah, sure thing. The debug changes are targeting the coming merge
> window but they'll have to go through -mm instead of block or
> workqueue trees due to dependencies on arch changes.  So, AFAICS,
> while the routing is rather complicated, things should fall in place
> in the same merge window.

OK, that will work. Thanks, Tejun.

-- 
Jens Axboe



Re: [PATCH] block: avoid using uninitialized value in from queue_var_store

2013-04-03 Thread Jens Axboe
On Wed, Apr 03 2013, Arnd Bergmann wrote:
> As found by gcc-4.8, the QUEUE_SYSFS_BIT_FNS macro creates functions
> that use a value generated by queue_var_store independent of whether
> that value was set or not.
> 
> block/blk-sysfs.c: In function 'queue_store_nonrot':
> block/blk-sysfs.c:244:385: warning: 'val' may be used uninitialized in this function [-Wmaybe-uninitialized]
> 
> Unlike most other such warnings, this one is not a false positive,
> writing any non-number string into the sysfs files indeed has
> an undefined result, rather than returning an error.

Huh indeed, thanks Arnd. Queued up.
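The warning points at a general rule for sysfs store handlers: the parsed value must not be used unless parsing succeeded. A userspace sketch of that shape follows. `parse_sysfs_ulong()` and `store_bit()` are hypothetical names, and this is not the patch Arnd posted. In kernel code the parsing step would normally be kstrtoul(), which returns an error for non-numeric input and, like the helper below, tolerates a single trailing newline.

```c
#include <errno.h>
#include <stdbool.h>
#include <stdlib.h>
#include <sys/types.h>

/*
 * Parse an unsigned decimal value from a sysfs-style buffer.
 * Returns 0 and sets *val on success, -EINVAL if the text is not a number.
 * *val is left untouched on failure, so callers must check the return value.
 */
static int parse_sysfs_ulong(const char *page, unsigned long *val)
{
	char *end;
	unsigned long v;

	errno = 0;
	v = strtoul(page, &end, 10);
	if (end == page || errno == ERANGE)
		return -EINVAL;
	if (*end == '\n')       /* allow one trailing newline, as `echo` writes */
		end++;
	if (*end != '\0')
		return -EINVAL;
	*val = v;
	return 0;
}

/* Store handler for a boolean flag: only touch *flag if parsing succeeded. */
static ssize_t store_bit(bool *flag, const char *page, size_t count)
{
	unsigned long val;
	int err = parse_sysfs_ulong(page, &val);

	if (err)
		return err;          /* propagate instead of using garbage */
	*flag = val != 0;
	return (ssize_t)count;
}
```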

-- 
Jens Axboe



Re: [PATCH 0/3] mtip32xx: recovery improvement and fix for a crash

2013-04-03 Thread Jens Axboe
On Wed, Apr 03 2013, Asai Thambi S P wrote:
> Hi Jens,
> 
> This patchset includes the following. It was generated against your for-3.9/drivers
> 
> * improved recovery for command timeout
> * fix for a crash during rmmod
> * add new debugfs entry 'device_status'
> 
> 
> Asai Thambi S P (3):
>   mtip32xx: recovery from command timeout
>   mtip32xx: return 0 from pci probe in case of rebuild
>   mtip32xx: Add debugfs entry device_status

Thanks Asai, applied for the current series.

-- 
Jens Axboe



[GIT PULL] Block IO core bits for 3.9

2013-02-28 Thread Jens Axboe
Hi Linus,

Below are the core block IO bits for 3.9. This was delayed a few days
because my workstation kept crashing every 2-8 hours after I pulled it
into current -git, but it turns out that is a bug in the new pstate
code (a divide by zero, which I will report separately). In any case,
it contains:

- The big cfq/blkcg update from Tejun and Vivek.

- Additional block and writeback tracepoints from Tejun.

- Improvement of the should_sort logic in plug flushing, which decides
  whether the plugged requests need to be sorted by queue.

- _io() variants of the wait_for_completion() interface, which use
  io_schedule() instead of schedule() so the wait is properly accounted
  as I/O wait.

- Various little fixes.
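Sorting a plug list by queue groups requests so each queue lock is taken once per flush rather than once per queue switch. A userspace sketch of that grouping follows. `struct fake_rq`, `plug_cmp()`, `sort_plug()` and `count_lock_rounds()` are hypothetical; the in-kernel comparator in block/blk-core.c works on `struct request` and is believed to order by queue and then by start sector, which is what this model assumes.

```c
#include <stdlib.h>

struct fake_rq {
	int queue_id;               /* which request_queue the request targets */
	unsigned long long sector;  /* start sector */
};

/* Order by queue first, then by start sector within a queue. */
static int plug_cmp(const void *a, const void *b)
{
	const struct fake_rq *x = a, *y = b;

	if (x->queue_id != y->queue_id)
		return x->queue_id < y->queue_id ? -1 : 1;
	if (x->sector != y->sector)
		return x->sector < y->sector ? -1 : 1;
	return 0;
}

static void sort_plug(struct fake_rq *rqs, size_t n)
{
	qsort(rqs, n, sizeof(*rqs), plug_cmp);
}

/* Count how many times a flush would have to take a (new) queue lock. */
static int count_lock_rounds(const struct fake_rq *rqs, size_t n)
{
	int rounds = 0;

	for (size_t i = 0; i < n; i++)
		if (i == 0 || rqs[i].queue_id != rqs[i - 1].queue_id)
			rounds++;
	return rounds;
}
```

For example, five requests alternating between two queues need five lock rounds unsorted, but only two once sorted.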

You'll get two trivial merge conflicts, which should be easy enough to
fix up.

Please pull!

  git://git.kernel.dk/linux-block.git for-3.9/core




Cong Ding (1):
  drivers/block/swim3.c: fix null pointer dereference

Glauber Costa (1):
  cfq: fix lock imbalance with failed allocations

Guo Chao (2):
  block: use i_size_write() in bd_set_size()
  block: remove redundant check to bd_openers()

Jens Axboe (1):
  Merge branch 'blkcg-cfq-hierarchy' of git://git.kernel.org/.../tj/cgroup into for-3.9/core

Jianpeng Ma (1):
  block: Remove should_sort judgement when flush blk_plug

Mikulas Patocka (1):
  block: don't select PERCPU_RWSEM

Sasha Levin (1):
  block,elevator: use new hashtable implementation

Tejun Heo (24):
  blkcg: fix minor bug in blkg_alloc()
  blkcg: reorganize blkg_lookup_create() and friends
  blkcg: cosmetic updates to blkg_create()
  blkcg: make blkcg_gq's hierarchical
  cfq-iosched: add leaf_weight
  cfq-iosched: implement cfq_group->nr_active and ->children_weight
  cfq-iosched: implement hierarchy-ready cfq_group charge scaling
  cfq-iosched: convert cfq_group_slice() to use cfqg->vfraction
  cfq-iosched: enable full blkcg hierarchy support
  blkcg: add blkg_policy_data->plid
  blkcg: implement blkcg_policy->on/offline_pd_fn() and blkcg_gq->online
  blkcg: s/blkg_rwstat_sum()/blkg_rwstat_total()/
  blkcg: export __blkg_prfill_rwstat()
  blkcg: implement blkg_[rw]stat_recursive_sum() and blkg_[rw]stat_merge()
  block: RCU free request_queue
  blkcg: make blkcg_print_blkgs() grab q locks instead of blkcg lock
  cfq-iosched: separate out cfqg_stats_reset() from cfq_pd_reset_stats()
  cfq-iosched: collect stats from dead cfqgs
  cfq-iosched: add hierarchical cfq_group statistics
  block: add missing block_bio_complete() tracepoint
  block: add @req to bio_{front|back}_merge tracepoints
  buffer: make touch_buffer() an exported function
  block: add block_{touch|dirty}_buffer tracepoint
  writeback: add more tracepoints

Vivek Goyal (6):
  cfq-iosched: Properly name all references to IO class
  cfq-iosched: More renaming to better represent wl_class and wl_type
  cfq-iosched: Rename "service_tree" to "st" at some places
  cfq-iosched: Rename few functions related to selecting workload
  cfq-iosched: Get rid of unnecessary local variable
  cfq-iosched: Print sync-noidle information in blktrace messages

Vladimir Davydov (2):
  sched: add wait_for_completion_io[_timeout]
  block: account iowait time when waiting for completion of IO request

 Documentation/block/cfq-iosched.txt        |  58 +++
 Documentation/cgroups/blkio-controller.txt |  35 +-
 block/Kconfig                              |   1 -
 block/blk-cgroup.c                         | 277 +++--
 block/blk-cgroup.h                         |  68 +++-
 block/blk-core.c                           |  18 +-
 block/blk-exec.c                           |   4 +-
 block/blk-flush.c                          |   2 +-
 block/blk-lib.c                            |   6 +-
 block/blk-sysfs.c                          |   9 +-
 block/blk.h                                |   2 +-
 block/cfq-iosched.c                        | 629 +++--
 block/elevator.c                           |  23 +-
 drivers/block/swim3.c                      |   5 +-
 drivers/md/dm.c                            |   1 -
 drivers/md/raid5.c                         |  11 +-
 fs/bio.c                                   |   2 +
 fs/block_dev.c                             |   6 +-
 fs/buffer.c                                |  10 +
 fs/fs-writeback.c                          |  16 +-
 include/linux/blkdev.h                     |   3 +-
 include/linux/blktrace_api.h               |   1 +
 include/linux/buffer_head.h                |   2 +-
 include/linux/completion.h                 |   3 +
 include/linux/elevator.h                   |   5 +-
 include/trace/events/block.h               | 104 -
 include/trace/events/writeback.h           | 116 ++
 kernel/sched/core.c                        |  57 ++-
 kernel/trace/blktrace.c                    |  28 +-
 mm/page-writeback.c                        |   2 +
 30 file

[GIT PULL] Block driver bits for 3.9

2013-02-28 Thread Jens Axboe
Hi Linus,

After the block IO core bits are in, please grab the driver updates from
below as well. It contains:

- Fix ancient regression in dac960. Nobody must be using that anymore...

- Some good fixes from Guo Chao for loop, fixing both potential oopses
  and deadlocks.

- Improve mtip32xx for NUMA systems, by being a bit more clever in
  distributing work.

- Add IBM RamSan 70/80 driver. A second round of fixes for it is
  pending and will come in through for-linus during the 3.9 cycle, as
  usual.

- A few xen-blk{back,front} fixes from Konrad and Roger.

- Other minor fixes and improvements.


Please pull!

  git://git.kernel.dk/linux-block.git for-3.9/drivers



Asai Thambi S P (2):
  mtip32xx: Add workqueue and NUMA support
  mtip32xx: add trim support

Dan Carpenter (1):
  dac960: return success instead of -ENOTTY

Fengguang Wu (2):
  drivers/block/mtip32xx/mtip32xx.c:4029:1: sparse: symbol 'mtip_workq_sdbf0' was not declared. Should it be static?
  drivers/block/mtip32xx/mtip32xx.c:1726:5: sparse: symbol 'mtip_send_trim' was not declared. Should it be static?

Guo Chao (5):
  loopdev: fix a deadlock
  loopdev: update block device size in loop_set_status()
  loopdev: move common code into loop_figure_size()
  loopdev: remove an user triggerable oops
  loopdev: ignore negative offset when calculate loop device size

Heiko Carstens (1):
  drivers/block/mtip32xx: add missing GENERIC_HARDIRQS dependency

Jan Beulich (1):
  xen-blkback: do not leak mode property

Jens Axboe (3):
  rsxx: add slab.h include to dma.c
  Merge branch 'delete-xt-disk' of git://git.kernel.org/.../paulg/linux into for-3.9/drivers
  Merge branch 'stable/for-jens-3.9' of git://git.kernel.org/.../konrad/xen into for-3.9/drivers

Konrad Rzeszutek Wilk (2):
  xen/blkback: Don't trust the handle from the frontend.
  xen-blkfront: drop the use of llist_for_each_entry_safe

Paul Gortmaker (1):
  block: delete super ancient PC-XT driver for 1980's hardware

Philip J Kelleher (1):
  block: IBM RamSan 70/80 driver fixes

Roger Pau Monne (1):
  xen-blkback: use balloon pages for persistent grants

Stephen Rothwell (1):
  block: remove new __devinit/exit annotations on ramsam driver

josh.h.mor...@us.ibm.com (1):
  block: IBM RamSan 70/80 device driver

 MAINTAINERS                         |    6 +
 drivers/block/DAC960.c              |    1 +
 drivers/block/Kconfig               |   23 +-
 drivers/block/Makefile              |    3 +-
 drivers/block/loop.c                |   61 +-
 drivers/block/mtip32xx/Kconfig      |    2 +-
 drivers/block/mtip32xx/mtip32xx.c   |  431 +++---
 drivers/block/mtip32xx/mtip32xx.h   |   48 +-
 drivers/block/rsxx/Makefile         |    2 +
 drivers/block/rsxx/config.c         |  213 +++
 drivers/block/rsxx/core.c           |  649
 drivers/block/rsxx/cregs.c          |  758 +++
 drivers/block/rsxx/dev.c            |  367
 drivers/block/rsxx/dma.c            |  998 +++
 drivers/block/rsxx/rsxx.h           |   45 ++
 drivers/block/rsxx/rsxx_cfg.h       |   72 +++
 drivers/block/rsxx/rsxx_priv.h      |  399 +
 drivers/block/xd.c                  | 1123 ---
 drivers/block/xd.h                  |  134 -
 drivers/block/xen-blkback/blkback.c |    7 +-
 drivers/block/xen-blkback/xenbus.c  |   49 +-
 drivers/block/xen-blkfront.c        |   13 +-
 include/linux/llist.h               |   25 -
 23 files changed, 3986 insertions(+), 1443 deletions(-)
 create mode 100644 drivers/block/rsxx/Makefile
 create mode 100644 drivers/block/rsxx/config.c
 create mode 100644 drivers/block/rsxx/core.c
 create mode 100644 drivers/block/rsxx/cregs.c
 create mode 100644 drivers/block/rsxx/dev.c
 create mode 100644 drivers/block/rsxx/dma.c
 create mode 100644 drivers/block/rsxx/rsxx.h
 create mode 100644 drivers/block/rsxx/rsxx_cfg.h
 create mode 100644 drivers/block/rsxx/rsxx_priv.h
 delete mode 100644 drivers/block/xd.c
 delete mode 100644 drivers/block/xd.h

-- 
Jens Axboe



Re: [GIT PULL] (xen) stable/for-jens-3.9

2013-02-19 Thread Jens Axboe
On Tue, Feb 19 2013, Konrad Rzeszutek Wilk wrote:
> Hey Jens,
> 
> Please git pull the following branch:
> 
>  git://git.kernel.org/pub/scm/linux/kernel/git/konrad/xen.git stable/for-jens-3.9
> 
> which has bug-fixes that did not make it in v3.8. They all are marked as
> material for the stable tree as well. There are two bug-fixes for
> the code that has been in there for some time (that is the Jan's fix
> and one of mine). And there are two bug-fixes for the persistent grant
> feature that debuted in v3.8 for xen blk[back|front]end.

Konrad, the tree is against 3.8, so I'm pulling it into my for-linus
branch which will be pushed a bit after the initial for-3.9/drivers goes
out. I can't pull it into the latter without getting a whole lot of
extra changes too.

-- 
Jens Axboe



Re: [PATCHSET] writeback: convert writeback to unbound workqueue

2013-03-12 Thread Jens Axboe
On Thu, Mar 07 2013, Tejun Heo wrote:
> Hello,
> 
> There's no reason for writeback to implement its own worker pool when
> using workqueue is much simpler and more efficient.  This patchset
> replaces writeback's custom worker pool with unbound workqueue and
> also exports it to userland using WQ_SYSFS so that it can be tuned
> from userland as requested a couple releases ago.
> 
> This patchset contains the following four patches.
> 
>  0001-implement-current_is_workqueue_rescuer.patch
>  0002-writeback-remove-unused-bdi_pending_list.patch
>  0003-writeback-replace-custom-worker-pool-implementation-.patch
>  0004-writeback-expose-the-bdi_wq-workqueue.patch
> 
> 0001-0002 are prep patches.  0003 does the conversion.  0004 makes
> bdi_wq visible to userland.
> 
> This patchset is on top of v3.9-rc1 + "workqueue: implement workqueue
> with custom worker attributes" patchset[1] and available in the
> following git branch.
> 
>  git://git.kernel.org/pub/scm/linux/kernel/git/tj/wq.git review-writeback-conversion

I like it, diffstat looks nice too :-)

Have you done any performance testing, or just functional verification?

-- 
Jens Axboe



Re: [Patch 01/02v2] block: IBM FlashSystem 70/80 EEH Support.

2013-03-16 Thread Jens Axboe
On Fri, Mar 15 2013, Philip J. Kelleher wrote:
> From: Philip J Kelleher 
> 
> Adding in EEH support to the IBM FlashSystem 70/80 device driver.
> 
> Signed-off-by: Philip J Kelleher 
> ---
> Changes in v2 include:
> o Fixed spelling of guarantee.
> o Fixed potential memory leak if slot reset fails out.
> o Changed list_for_each_entry_safe with list_for_each_entry.

Applied, thanks Philip.

-- 
Jens Axboe



Re: [PATCH] block: Fix bad range check in bio_sector_offset

2012-08-29 Thread Jens Axboe
On 2012-08-28 11:03, Martin K. Petersen wrote:
> 
> DM would occasionally end up splitting data integrity-enabled requests
> incorrectly. The culprit was a bad range check in bio_sector_offset.
> 
> Signed-off-by: Martin K. Petersen 
> Cc: 
> 
> diff --git a/fs/bio.c b/fs/bio.c
> index 9bfade8..b9a6744 100644
> --- a/fs/bio.c
> +++ b/fs/bio.c
> @@ -1552,8 +1552,8 @@ sector_t bio_sector_offset(struct bio *bio, unsigned short index,
>  	sector_sz = queue_logical_block_size(bio->bi_bdev->bd_disk->queue);
>  	sectors = 0;
>  
> -	if (index >= bio->bi_idx)
> -		index = bio->bi_vcnt - 1;
> +	if (index > bio->bi_vcnt)
> +		return 0;
>  
>  	__bio_for_each_segment(bv, bio, i, 0) {
>  		if (i == index) {

Good catch, merged.
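The check in Martin's fix is easier to see in a userspace model of the function. The sketch below is a simplification, not the fs/bio.c code: `struct fake_bvec` stands in for `struct bio_vec`, the bio is reduced to a flat array plus a segment count, and the sector size is passed in rather than read from the queue. It walks the segments summing whole sectors until it reaches segment `index`; with the fix, an index past `bi_vcnt` returns 0 instead of being clamped. Under the old test, any `index >= bi_idx` was rewritten to the last segment, so for a bio with `bi_idx == 0` the caller's index was always ignored.

```c
struct fake_bvec {
	unsigned int bv_len;     /* bytes in this segment */
	unsigned int bv_offset;  /* offset of the data within its page */
};

/*
 * Sectors preceding segment @index, plus the whole sectors of @offset
 * beyond that segment's bv_offset. Out-of-range indices return 0.
 */
static unsigned long long sector_offset(const struct fake_bvec *bv,
					unsigned short vcnt,
					unsigned short index,
					unsigned int offset,
					unsigned int sector_sz)
{
	unsigned long long sectors = 0;

	if (index > vcnt)        /* the bounds check from the fix */
		return 0;

	for (unsigned short i = 0; i < vcnt; i++) {
		if (i == index) {
			if (offset > bv[i].bv_offset)
				sectors += (offset - bv[i].bv_offset) / sector_sz;
			break;
		}
		sectors += bv[i].bv_len / sector_sz;
	}
	return sectors;
}
```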

-- 
Jens Axboe



Re: [PATCH] mtip32xx: fix user_buffer check in exec_drive_command

2012-09-12 Thread Jens Axboe
On 2012-09-12 21:06, David Milburn wrote:
> Current user_buffer check is incorrect and causes hdparm to fail
> 
> # hdparm -I /dev/rssda
>  HDIO_DRIVE_CMD(identify) failed: Input/output error
> 
> /dev/rssda:
> 
> With linux-3.6-rc5 patched, hdparm works as expected:
> 
> # hdparm -I /dev/rssda
> /dev/rssda:
> 
> ATA device, with non-removable media
>   Model Number:   DELL_P320h-MTFDGAL350SAH
>   Serial Number:  121302025F01
>   Firmware Revision:  B1442808
> 
> 
> Reported-by: Tomas Henzl 
> Signed-off-by: David Milburn 
> ---
>  drivers/block/mtip32xx/mtip32xx.c |2 +-
>  1 files changed, 1 insertions(+), 1 deletions(-)
> 
> diff --git a/drivers/block/mtip32xx/mtip32xx.c b/drivers/block/mtip32xx/mtip32xx.c
> index a8fddeb..b24efe3 100644
> --- a/drivers/block/mtip32xx/mtip32xx.c
> +++ b/drivers/block/mtip32xx/mtip32xx.c
> @@ -1900,7 +1900,7 @@ static int exec_drive_command(struct mtip_port *port, u8 *command,
>   int rv = 0, xfer_sz = command[3];
>  
>   if (xfer_sz) {
> - if (user_buffer)
> + if (!user_buffer)
>   return -EFAULT;
>  
>   buf = dmam_alloc_coherent(&port->dd->pdev->dev,

Thanks, that's clearly a bug. Applied.
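The bug is an inverted NULL test: the old code returned -EFAULT precisely when the caller *had* supplied a buffer, which is why `hdparm -I` failed. A minimal userspace sketch of the corrected logic, assuming (as in the quoted hunk) that a non-zero `command[3]` means data will be transferred; `exec_cmd_sketch()` is hypothetical and skips the DMA allocation and copy-out.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>

/*
 * Hypothetical model of the fixed check in exec_drive_command():
 * a command that transfers data needs a destination buffer.
 */
static int exec_cmd_sketch(const unsigned char *command, void *user_buffer)
{
	int xfer_sz;

	assert(command != NULL);
	xfer_sz = command[3];

	if (xfer_sz) {
		if (!user_buffer)	/* the buggy version tested user_buffer */
			return -EFAULT;
		/* ... allocate a DMA buffer, issue the command, copy out ... */
	}
	return 0;
}
```

Commands with no data phase never touch the buffer, so a NULL pointer is fine for them.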

-- 
Jens Axboe



Re: [PATCH 2/9] block/blk-tag.c: Remove useless kfree

2012-09-12 Thread Jens Axboe
On 2012-09-12 17:06, Peter Senna Tschudin wrote:
> From: Peter Senna Tschudin 
> 
> Remove useless kfree() and clean up code related to the removal.

Thanks, nice cleanup! Applied to for-3.7/core

-- 
Jens Axboe



Re: [2.6 patch] unexport add_disk_randomness

2008-01-30 Thread Jens Axboe
On Wed, Jan 30 2008, Adrian Bunk wrote:
> This patch removes the no longer used EXPORT_SYMBOL(add_disk_randomness).
> 
> Signed-off-by: Adrian Bunk <[EMAIL PROTECTED]>

Thanks, applied.

-- 
Jens Axboe



Re: [BUG] io-scheduler : spinlock deadlock

2008-01-31 Thread Jens Axboe
On Thu, Jan 31 2008, Dave Young wrote:
> While building the kernel, lockdep detected a spinlock deadlock, and after
> a while the system hung.
> 
> Attached is the screenshot.

Fixed by commit 149a051f82d2b3860fe32fa182dbc83a66274894 yesterday, just
update your kernel.

-- 
Jens Axboe



Re: [PATCH retry] bluetooth : add conn add/del workqueues to avoid connection fail

2008-01-31 Thread Jens Axboe
On Wed, Jan 30 2008, Dave Young wrote:
> 
> The bluetooth hci_conn sysfs add/del work is executed in the default workqueue.
> If del_conn for an old connection runs after add_conn for a new one with the
> same target, add_conn will fail with a "same kobject name" warning.
> 
> This adds btaddconn and btdelconn workqueues, and flushes the btdelconn
> workqueue in add_conn() to avoid the issue.
> 
> Signed-off-by: Dave Young <[EMAIL PROTECTED]> 
> 
> ---
> diff -upr a/net/bluetooth/hci_sysfs.c b/net/bluetooth/hci_sysfs.c
> --- a/net/bluetooth/hci_sysfs.c   2008-01-30 10:14:27.0 +0800
> +++ b/net/bluetooth/hci_sysfs.c   2008-01-30 10:14:14.0 +0800
> @@ -12,6 +12,8 @@
>  #undef  BT_DBG
>  #define BT_DBG(D...)
>  #endif
> +static struct workqueue_struct *btaddconn;
> +static struct workqueue_struct *btdelconn;
>  
>  static inline char *typetostr(int type)
>  {
> @@ -279,6 +281,7 @@ static void add_conn(struct work_struct 
>   struct hci_conn *conn = container_of(work, struct hci_conn, work);
>   int i;
>  
> + flush_workqueue(btdelconn);
>   if (device_add(&conn->dev) < 0) {
>   BT_ERR("Failed to register connection device");
>   return;
> @@ -313,6 +316,7 @@ void hci_conn_add_sysfs(struct hci_conn 
>  
>   INIT_WORK(&conn->work, add_conn);
>  
> + queue_work(btaddconn, &conn->work);
>   schedule_work(&conn->work);
>  }

So you queue &conn->work on both btaddconn and keventd_wq?
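The ordering the patch description relies on can be modelled in userspace: if deletions sit on their own queue and the add path flushes that queue first, a pending delete of the old name always runs before the same name is registered again. Everything below (the name registry standing in for sysfs, the single-threaded queue, `add_conn_work()`) is hypothetical and only mimics the idea; it does not model real concurrency, and it puts each work item on exactly one queue, which is the point of the question above.

```c
#include <assert.h>
#include <string.h>

#define MAXQ   8
#define MAXDEV 4

/* Hypothetical stand-in for sysfs: names currently registered. */
static char registry[MAXDEV][16];

/* Register a name; fails on a duplicate, like device_add() would. */
static int dev_add(const char *name)
{
	int free_slot = -1;

	for (int i = 0; i < MAXDEV; i++) {
		if (registry[i][0] == '\0') {
			if (free_slot < 0)
				free_slot = i;
		} else if (strcmp(registry[i], name) == 0) {
			return -1;	/* "same kobject name" */
		}
	}
	if (free_slot < 0)
		return -1;
	strncpy(registry[free_slot], name, sizeof(registry[0]) - 1);
	return 0;
}

static void dev_del(const char *name)
{
	for (int i = 0; i < MAXDEV; i++)
		if (strcmp(registry[i], name) == 0)
			registry[i][0] = '\0';
}

/* Hypothetical single-threaded work queue: items run only when flushed. */
struct model_wq {
	void (*fn[MAXQ])(const char *);
	const char *arg[MAXQ];
	int n;
};

static void model_queue(struct model_wq *q, void (*fn)(const char *),
			const char *arg)
{
	assert(q->n < MAXQ);
	q->fn[q->n] = fn;
	q->arg[q->n] = arg;
	q->n++;
}

static void model_flush(struct model_wq *q)
{
	for (int i = 0; i < q->n; i++)
		q->fn[i](q->arg[i]);
	q->n = 0;
}

static struct model_wq delq;	/* plays the role of btdelconn */
static int last_add;

/* Like the patched add_conn(): drain pending deletions, then add. */
static void add_conn_work(const char *name)
{
	model_flush(&delq);
	last_add = dev_add(name);
}
```

Without the `model_flush()` call, re-adding a name whose deletion is still queued would return -1, which is the failure described in the patch.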

-- 
Jens Axboe



Re: kernel BUG at ide-cd.c:1726 in 2.6.24-03863-g0ba6c33 && -g8561b089

2008-01-31 Thread Jens Axboe
On Thu, Jan 31 2008, Nai Xia wrote:
> My dmesg relevant info is quite similar:
> 
> [6.875041] Freeing unused kernel memory: 320k freed
> [8.143120] ide-cd: rq still having bio: dev hdc: type=2, flags=114c8
> [8.144439]
> [8.144439] sector 10824201199534213, nr/cnr 0/0
> [8.144439] bio cf029280, biotail cf029280, buffer , data
> , len 158
> [8.144439] cdb: 12 00 00 00 fe 00 00 00 00 00 00 00 00 00 00 00
> [8.144439] backup: data_len=158  bi_size=158
> [8.160756] ide-cd: rq still having bio: dev hdc: type=2, flags=114c8
> [8.160756]
> [8.160756] sector 2669858, nr/cnr 0/0
> [8.160756] bio cf029300, biotail cf029300, buffer , data
> , len 158
> [8.160756] cdb: 12 01 00 00 fe 00 00 00 00 00 00 00 00 00 00 00
> [8.160756] backup: data_len=158  bi_size=158
> [   14.851101] eth0: link up
> [   27.121883] eth0: no IPv6 routers present
> 
> 
> And by the way, Kiyoshi,
> This can be reproduced in a typical VMware Workstation 6.02 setup with
> a virtual IDE cdrom, in case you want to catch it with your own eyes. :-)
> Thanks for trying hard to correct this annoying bug.

The fix below should be enough. It's perfectly legal to have leftover
byte counts when the drive signals completion; that happens all the time
for, e.g., user-issued commands where you don't know the exact byte count.

diff --git a/drivers/ide/ide-cd.c b/drivers/ide/ide-cd.c
index 74c6087..bee05a3 100644
--- a/drivers/ide/ide-cd.c
+++ b/drivers/ide/ide-cd.c
@@ -1722,7 +1722,7 @@ static ide_startstop_t cdrom_newpc_intr(ide_drive_t *drive)
 */
if ((stat & DRQ_STAT) == 0) {
spin_lock_irqsave(&ide_lock, flags);
-   if (__blk_end_request(rq, 0, 0))
+   if (__blk_end_request(rq, 0, rq->data_len))
BUG();
    HWGROUP(drive)->rq = NULL;
spin_unlock_irqrestore(&ide_lock, flags);
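A toy model of why passing 0 tripped the `BUG()`: assume, as with `__blk_end_request()` in this era of the block layer, that the end-request call returns non-zero while bytes are still pending on the request. With 158 leftover bytes (the `data_len=158` in the dmesg output), completing 0 bytes leaves the request unfinished, while completing `rq->data_len` finishes it. `struct toy_rq` and `toy_end_request()` are hypothetical simplifications, not the block layer's API.

```c
#include <assert.h>

/* Hypothetical toy request: only tracks bytes not yet completed. */
struct toy_rq {
	unsigned int data_len;	/* bytes still outstanding */
};

/*
 * Toy analogue of __blk_end_request(): complete nr_bytes and return
 * non-zero if the request still has bytes pending (i.e. is not done).
 */
static int toy_end_request(struct toy_rq *rq, int error, unsigned int nr_bytes)
{
	(void)error;	/* error propagation is not modelled here */
	assert(nr_bytes <= rq->data_len);
	rq->data_len -= nr_bytes;
	return rq->data_len != 0;
}
```

In the driver, the non-zero return is what fed `BUG()`.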

-- 
Jens Axboe



Re: kernel BUG at ide-cd.c:1726 in 2.6.24-03863-g0ba6c33 && -g8561b089

2008-01-31 Thread Jens Axboe
On Thu, Jan 31 2008, Florian Lohoff wrote:
> On Thu, Jan 31, 2008 at 02:05:58PM +0100, Jens Axboe wrote:
> > The fix below should be enough. It's perfectly legal to have leftover
> > byte counts when the drive signals completion; that happens all the time
> > for, e.g., user-issued commands where you don't know the exact byte count.
> > 
> > diff --git a/drivers/ide/ide-cd.c b/drivers/ide/ide-cd.c
> > index 74c6087..bee05a3 100644
> > --- a/drivers/ide/ide-cd.c
> > +++ b/drivers/ide/ide-cd.c
> > @@ -1722,7 +1722,7 @@ static ide_startstop_t cdrom_newpc_intr(ide_drive_t *drive)
> >  */
> > if ((stat & DRQ_STAT) == 0) {
> > spin_lock_irqsave(&ide_lock, flags);
> > -   if (__blk_end_request(rq, 0, 0))
> > +   if (__blk_end_request(rq, 0, rq->data_len))
> > BUG();
> > HWGROUP(drive)->rq = NULL;
> > spin_unlock_irqrestore(&ide_lock, flags);
> > 
> 
> Fixes the crash on boot for me ...

Great, thanks for confirming that. I'll make sure the patch goes
upstream today, if Linus is available.

-- 
Jens Axboe



Re: kernel BUG at ide-cd.c:1726 in 2.6.24-03863-g0ba6c33 && -g8561b089

2008-01-31 Thread Jens Axboe



On 31/01/2008, at 18.04, Kiyoshi Ueda <[EMAIL PROTECTED]> wrote:

> Hi Jens,
>
> On Thu, 31 Jan 2008 14:05:58 +0100, Jens Axboe wrote:
>> On Thu, Jan 31 2008, Nai Xia wrote:
>>> My dmesg relevant info is quite similar:
>>>
>>> [6.875041] Freeing unused kernel memory: 320k freed
>>> [8.143120] ide-cd: rq still having bio: dev hdc: type=2, flags=114c8
>>> [8.144439]
>>> [8.144439] sector 10824201199534213, nr/cnr 0/0
>>> [8.144439] bio cf029280, biotail cf029280, buffer , data
>>> , len 158
>>> [8.144439] cdb: 12 00 00 00 fe 00 00 00 00 00 00 00 00 00 00 00
>>> [8.144439] backup: data_len=158  bi_size=158
>>> [8.160756] ide-cd: rq still having bio: dev hdc: type=2, flags=114c8
>>> [8.160756]
>>> [8.160756] sector 2669858, nr/cnr 0/0
>>> [8.160756] bio cf029300, biotail cf029300, buffer , data
>>> , len 158
>>> [8.160756] cdb: 12 01 00 00 fe 00 00 00 00 00 00 00 00 00 00 00
>>> [8.160756] backup: data_len=158  bi_size=158
>>> [   14.851101] eth0: link up
>>> [   27.121883] eth0: no IPv6 routers present
>>>
>>>
>>> And by the way, Kiyoshi,
>>> This can be reproduced in a typical VMware Workstation 6.02 setup with
>>> a virtual IDE cdrom, in case you want to catch it with your own eyes. :-)
>>> Thanks for trying hard to correct this annoying bug.
>>
>> The fix below should be enough. It's perfectly legal to have leftover
>> byte counts when the drive signals completion; that happens all the time
>> for, e.g., user-issued commands where you don't know the exact byte count.
>>
>> diff --git a/drivers/ide/ide-cd.c b/drivers/ide/ide-cd.c
>> index 74c6087..bee05a3 100644
>> --- a/drivers/ide/ide-cd.c
>> +++ b/drivers/ide/ide-cd.c
>> @@ -1722,7 +1722,7 @@ static ide_startstop_t cdrom_newpc_intr(ide_drive_t *drive)
>> */
>> if ((stat & DRQ_STAT) == 0) {
>> spin_lock_irqsave(&ide_lock, flags);
>> -   if (__blk_end_request(rq, 0, 0))
>> +   if (__blk_end_request(rq, 0, rq->data_len))
>> BUG();
>> HWGROUP(drive)->rq = NULL;
>> spin_unlock_irqrestore(&ide_lock, flags);
>
> OK, I understand the leftover is legal.
>
> By the way, is it safe to always return success if there is a
> leftover? I thought we might have to complete the rq with -EIO in
> such a case.

data_len being non-zero should pass the residual count back to the
issuer.


Re: [bug] as_merged_requests(): possible recursive locking detected

2008-02-01 Thread Jens Axboe
On Thu, Jan 31 2008, Ingo Molnar wrote:
> 
> 
> Jens,
> 
> AS still has some locking issues - see the lockdep warning below that 
> the x86 test-rig just triggered. Config attached. Never saw this one 
> before. Can send more info if needed.
> 
>   Ingo
> 
> -->
> udev: renamed network interface eth0_rename to eth1
> 
> =
> [ INFO: possible recursive locking detected ]
> 2.6.24 #183
> -
> vol_id/1769 is trying to acquire lock:
>  (&ret->lock#2){.+..}, at: [] as_merged_requests+0xa7/0x110
> 
> but task is already holding lock:
>  (&ret->lock#2){.+..}, at: [] as_merged_requests+0x9f/0x110
> 
> other info that might help us debug this:
> 2 locks held by vol_id/1769:
>  #0:  (&q->__queue_lock){.+..}, at: [] __make_request+0x5f/0x3fd
>  #1:  (&ret->lock#2){.+..}, at: [] as_merged_requests+0x9f/0x110
> 
> stack backtrace:
> Pid: 1769, comm: vol_id Not tainted 2.6.24 #183
> 
> Call Trace:
>  [] print_deadlock_bug+0xcb/0xd6
>  [] check_deadlock+0x50/0x60
>  [] validate_chain+0x1ed/0x289
>  [] __lock_acquire+0x547/0x608
>  [] ? as_merged_requests+0xa7/0x110
>  [] lock_acquire+0x99/0xc6
>  [] ? as_merged_requests+0xa7/0x110
>  [] _spin_lock+0x34/0x41
>  [] as_merged_requests+0xa7/0x110
>  [] elv_merge_requests+0x28/0x51
>  [] attempt_merge+0xf5/0x14b
>  [] attempt_back_merge+0x27/0x30
>  [] __make_request+0x180/0x3fd
>  [] generic_make_request+0x355/0x390
>  [] ? create_empty_buffers+0xa0/0xa9
>  [] submit_bio+0xfe/0x107
>  [] submit_bh+0xe7/0x10b
>  [] block_read_full_page+0x289/0x2a5
>  [] ? blkdev_get_block+0x0/0x4c
>  [] ? add_to_page_cache+0xa1/0xd3
>  [] blkdev_readpage+0x13/0x15
>  [] read_pages+0x81/0xa1
>  [] __do_page_cache_readahead+0x195/0x1b8
>  [] ? find_get_page+0x58/0x64
>  [] ondemand_readahead+0xa1/0x155
>  [] page_cache_sync_readahead+0x17/0x19
>  [] do_generic_mapping_read+0xa8/0x372
>  [] ? file_read_actor+0x0/0x1ac
>  [] generic_file_aio_read+0x125/0x164
>  [] do_sync_read+0xeb/0x132
>  [] ? mark_held_locks+0x59/0x75
>  [] ? autoremove_wake_function+0x0/0x38
>  [] ? __lock_release+0x5b/0x64
>  [] ? mutex_unlock+0x9/0xb
>  [] ? __mutex_unlock_slowpath+0x10e/0x139
>  [] ? trace_hardirqs_on+0xfe/0x128
>  [] vfs_read+0xa4/0xe3
>  [] sys_read+0x47/0x6f
>  [] system_call_after_swapgs+0x8a/0x8f

Are you sure this triggered with the as fixup in place? It looks like
the same bug.

-- 
Jens Axboe



Re: [bug] as_merged_requests(): possible recursive locking detected

2008-02-01 Thread Jens Axboe
On Fri, Feb 01 2008, Ingo Molnar wrote:
> 
> * Jens Axboe <[EMAIL PROTECTED]> wrote:
> 
> > Are you sure this triggered with the as fixup in place? It looks like 
> > the same bug.
> 
> most definitely a separate bug.

yeah, I didn't read it carefully enough. Nikanth found the reason.

-- 
Jens Axboe



Re: [bug] as_merged_requests(): possible recursive locking detected

2008-02-01 Thread Jens Axboe
On Fri, Feb 01 2008, Nikanth Karthikesan wrote:
> On Thu, 2008-01-31 at 23:14 +0100, Ingo Molnar wrote:
> > 
> > Jens,
> > 
> > AS still has some locking issues - see the lockdep warning below that 
> > the x86 test-rig just triggered. Config attached. Never saw this one 
> > before. Can send more info if needed.
> > 
> 
> The io_contexts are swapped. And while swapping, the locks were also
> getting swapped, which will change the order of locking after that. This
> may be the cause of these warnings. I am not sure whether not swapping
> the locks is the right way to fix this. Using a field of spinlock_t
> itself to order locking might be better, instead of the address of the
> container.
> 
> Now while adding a new member to io_context, one should not forget to
> add it here. Also copying whole io_context and then restoring the locks
> might have a window where this warning could be triggered.

Oops, the locks should definitely be left alone. It's not just the
locking order, but also it would confuse lockdep.
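For context, the code under discussion takes the two io_context locks in an order fixed by the address of their containing structures (`double_spin_lock(&rioc->lock, &nioc->lock, rioc < nioc)`). A userspace sketch of that address-ordering idea, using POSIX mutexes in place of spinlocks; `lock_pair()` and `unlock_pair()` are hypothetical names. The scheme only holds if each lock stays inside its own structure, which is one reason copying the locks along with the rest of the io_context is a problem.

```c
#include <assert.h>
#include <pthread.h>
#include <stdint.h>

/*
 * Take two locks in a fixed global order (lower address first). Two
 * threads locking the same pair with the arguments in opposite order
 * still acquire them in the same order, so they cannot ABBA-deadlock.
 */
static void lock_pair(pthread_mutex_t *a, pthread_mutex_t *b)
{
	assert(a != b);	/* taking the same lock twice would self-deadlock */
	if ((uintptr_t)a < (uintptr_t)b) {
		pthread_mutex_lock(a);
		pthread_mutex_lock(b);
	} else {
		pthread_mutex_lock(b);
		pthread_mutex_lock(a);
	}
}

/* Release order does not matter for deadlock avoidance. */
static void unlock_pair(pthread_mutex_t *a, pthread_mutex_t *b)
{
	pthread_mutex_unlock(a);
	pthread_mutex_unlock(b);
}
```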

> diff --git a/block/blk-ioc.c b/block/blk-ioc.c
> index 6d16755..b9c6e39 100644
> --- a/block/blk-ioc.c
> +++ b/block/blk-ioc.c
> @@ -179,9 +179,32 @@ EXPORT_SYMBOL(copy_io_context);
>  void swap_io_context(struct io_context **ioc1, struct io_context **ioc2)
>  {
>   struct io_context *temp;
> +
> + /*
> +  * Do not swap the locks to preserve locking order
> +  */
> +
>   temp = *ioc1;
> - *ioc1 = *ioc2;
> - *ioc2 = temp;
> +
> + (*ioc1)->refcount = (*ioc2)->refcount;
> + (*ioc1)->nr_tasks = (*ioc2)->nr_tasks;
> + (*ioc1)->ioprio = (*ioc2)->ioprio;
> + (*ioc1)->ioprio_changed = (*ioc2)->ioprio_changed;
> + (*ioc1)->last_waited = (*ioc2)->last_waited;
> + (*ioc1)->nr_batch_requests = (*ioc2)->nr_batch_requests;
> + (*ioc1)->aic = (*ioc2)->aic;
> + (*ioc1)->radix_root = (*ioc2)->radix_root;
> + (*ioc1)->ioc_data = (*ioc2)->ioc_data;
> +
> + (*ioc2)->refcount = (temp)->refcount;
> + (*ioc2)->nr_tasks = (temp)->nr_tasks;
> + (*ioc2)->ioprio = (temp)->ioprio;
> + (*ioc2)->ioprio_changed = (temp)->ioprio_changed;
> + (*ioc2)->last_waited = (temp)->last_waited;
> + (*ioc2)->nr_batch_requests = (temp)->nr_batch_requests;
> + (*ioc2)->aic = (temp)->aic;
> + (*ioc2)->radix_root = (temp)->radix_root;
> + (*ioc2)->ioc_data = (temp)->ioc_data;
>  }
>  EXPORT_SYMBOL(swap_io_context);

Ugh, that's pretty horrible. How about moving the lock first in the
struct and just doing memcpy()? Still ugly, but better.

I think the right solution is to remove swap_io_context() and fix the io
context referencing in as-iosched.c instead.
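The "lock first, then memcpy()" suggestion can be sketched as follows: with the lock as the first member, everything from the next member to the end of the struct is one contiguous byte range that can be swapped without touching either lock. The struct below is a hypothetical, cut-down stand-in for `struct io_context` with a pthread mutex in place of a spinlock. Nikanth's caveat still applies in a different form: any field placed before the lock would be silently skipped.

```c
#include <assert.h>
#include <pthread.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical, cut-down io_context-like struct with the lock first. */
struct model_ioc {
	pthread_mutex_t lock;	/* stays with its struct; never swapped */
	int refcount;
	int nr_tasks;
	unsigned short ioprio;
	unsigned long last_waited;
};

/* Byte range from the first field after the lock to the end of the struct. */
#define IOC_SWAP_OFF offsetof(struct model_ioc, refcount)
#define IOC_SWAP_LEN (sizeof(struct model_ioc) - IOC_SWAP_OFF)

/* Swap every field except the lock between two contexts. */
static void swap_ioc_payload(struct model_ioc *a, struct model_ioc *b)
{
	unsigned char tmp[IOC_SWAP_LEN];

	assert(a != b);
	memcpy(tmp, (unsigned char *)a + IOC_SWAP_OFF, IOC_SWAP_LEN);
	memcpy((unsigned char *)a + IOC_SWAP_OFF,
	       (unsigned char *)b + IOC_SWAP_OFF, IOC_SWAP_LEN);
	memcpy((unsigned char *)b + IOC_SWAP_OFF, tmp, IOC_SWAP_LEN);
}
```

A held lock stays held by the same structure after the swap, which is what the field-by-field version in the patch was trying to guarantee.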

-- 
Jens Axboe



Re: [bug] as_merged_requests(): possible recursive locking detected

2008-02-01 Thread Jens Axboe
On Fri, Feb 01 2008, Jens Axboe wrote:
> On Fri, Feb 01 2008, Nikanth Karthikesan wrote:
> > On Thu, 2008-01-31 at 23:14 +0100, Ingo Molnar wrote:
> > > 
> > > Jens,
> > > 
> > > AS still has some locking issues - see the lockdep warning below that 
> > > the x86 test-rig just triggered. Config attached. Never saw this one 
> > > before. Can send more info if needed.
> > > 
> > 
> > The io_contexts are swapped. And while swapping, the locks were also
> > getting swapped, which will change the order of locking after that. This
> > may be the cause of these warnings. I am not sure whether not swapping
> > the locks is the right way to fix this. Using a field of spinlock_t
> > itself to order locking might be better, instead of the address of the
> > container.
> > 
> > Now while adding a new member to io_context, one should not forget to
> > add it here. Also copying whole io_context and then restoring the locks
> > might have a window where this warning could be triggered.
> 
> Oops, the locks should definitely be left alone. It's not just the
> locking order, but also it would confuse lockdep.
> 
> > diff --git a/block/blk-ioc.c b/block/blk-ioc.c
> > index 6d16755..b9c6e39 100644
> > --- a/block/blk-ioc.c
> > +++ b/block/blk-ioc.c
> > @@ -179,9 +179,32 @@ EXPORT_SYMBOL(copy_io_context);
> >  void swap_io_context(struct io_context **ioc1, struct io_context **ioc2)
> >  {
> > struct io_context *temp;
> > +
> > +   /*
> > +* Do not swap the locks to preserve locking order
> > +*/
> > +
> > temp = *ioc1;
> > -   *ioc1 = *ioc2;
> > -   *ioc2 = temp;
> > +
> > +   (*ioc1)->refcount = (*ioc2)->refcount;
> > +   (*ioc1)->nr_tasks = (*ioc2)->nr_tasks;
> > +   (*ioc1)->ioprio = (*ioc2)->ioprio;
> > +   (*ioc1)->ioprio_changed = (*ioc2)->ioprio_changed;
> > +   (*ioc1)->last_waited = (*ioc2)->last_waited;
> > +   (*ioc1)->nr_batch_requests = (*ioc2)->nr_batch_requests;
> > +   (*ioc1)->aic = (*ioc2)->aic;
> > +   (*ioc1)->radix_root = (*ioc2)->radix_root;
> > +   (*ioc1)->ioc_data = (*ioc2)->ioc_data;
> > +
> > +   (*ioc2)->refcount = (temp)->refcount;
> > +   (*ioc2)->nr_tasks = (temp)->nr_tasks;
> > +   (*ioc2)->ioprio = (temp)->ioprio;
> > +   (*ioc2)->ioprio_changed = (temp)->ioprio_changed;
> > +   (*ioc2)->last_waited = (temp)->last_waited;
> > +   (*ioc2)->nr_batch_requests = (temp)->nr_batch_requests;
> > +   (*ioc2)->aic = (temp)->aic;
> > +   (*ioc2)->radix_root = (temp)->radix_root;
> > +   (*ioc2)->ioc_data = (temp)->ioc_data;
> >  }
> >  EXPORT_SYMBOL(swap_io_context);
> 
> Ugh, that's pretty horrible. How about moving the lock first in the
> struct and just doing memcpy()? Still ugly, but better.
> 
> I think the right solution is to remove swap_io_context() and fix the io
> context referencing in as-iosched.c instead.

IOW, the patch below. I don't know why Nick originally wanted to swap io
contexts for a rq <-> rq merge, there seems little (if any) benefit to
doing so.

diff --git a/block/as-iosched.c b/block/as-iosched.c
index 9603684..852803e 100644
--- a/block/as-iosched.c
+++ b/block/as-iosched.c
@@ -1266,22 +1266,8 @@ static void as_merged_requests(struct request_queue *q, struct request *req,
 */
if (!list_empty(&req->queuelist) && !list_empty(&next->queuelist)) {
if (time_before(rq_fifo_time(next), rq_fifo_time(req))) {
-   struct io_context *rioc = RQ_IOC(req);
-   struct io_context *nioc = RQ_IOC(next);
-
list_move(&req->queuelist, &next->queuelist);
rq_set_fifo_time(req, rq_fifo_time(next));
-   /*
-* Don't copy here but swap, because when anext is
-* removed below, it must contain the unused context
-*/
-   if (rioc != nioc) {
-   double_spin_lock(&rioc->lock, &nioc->lock,
-   rioc < nioc);
-   swap_io_context(&rioc, &nioc);
-   double_spin_unlock(&rioc->lock, &nioc->lock,
-   rioc < nioc);
-   }
}
}
 
diff --git a/block/blk-ioc.c b/block/blk-ioc.c
index 6d16755..80245dc 100644
--- a/block/blk-ioc.c
+++ b/block/

Re: [bug] as_merged_requests(): possible recursive locking detected

2008-02-01 Thread Jens Axboe
On Fri, Feb 01 2008, Ingo Molnar wrote:
> 
> * Jens Axboe <[EMAIL PROTECTED]> wrote:
> 
> > On Fri, Feb 01 2008, Ingo Molnar wrote:
> > > 
> > > * Jens Axboe <[EMAIL PROTECTED]> wrote:
> > > 
> > > > Are you sure this triggered with the as fixup in place? It looks like 
> > > > the same bug.
> > > 
> > > most definitely a separate bug.
> > 
> > yeah, I didn't read it carefully enough. Nikanth found the reason.
> 
> /me processes his mbox some more and sees the mails
> 
> am i right that lockdep complained about real lockup potential here? 
> (i.e. it caught a real bug) So there's no need to change anything on the 
> lockdep side, right?

Right, no bug in lockdep, the locking code and swap_io_context() are
just screwed up.

-- 
Jens Axboe



Re: [rfc] direct IO submission and completion scalability issues

2008-02-04 Thread Jens Axboe
On Sun, Feb 03 2008, Nick Piggin wrote:
> On Fri, Jul 27, 2007 at 06:21:28PM -0700, Suresh B wrote:
> > 
> > The second experiment we did was migrating the IO submission to the
> > IO completion cpu. Instead of submitting the IO on the same cpu where the
> > request arrived, in this experiment the IO submission gets migrated to the
> > cpu that is processing IO completions (interrupts). This minimizes the
> > access to remote cachelines (that happens in timers, slab, scsi layers). The
> > IO submission request is forwarded to the kblockd thread on the cpu
> > receiving the interrupts. As part of this, we also made the kblockd thread
> > on each cpu the highest priority thread, so that IO gets submitted as soon
> > as possible on the interrupt cpu without any delay. This resulted in a 2%
> > performance improvement on a 16-core x86_64 SMP platform and a 3.3%
> > improvement on a two-node ia64 platform.
> > 
> > A quick and dirty prototype patch (not meant for inclusion) for this io
> > migration experiment is appended to this e-mail.
> > 
> > Observation #1 mentioned above also applies to this experiment. CPUs
> > processing interrupts now also have to handle the IO
> > submission/processing load.
> > 
> > Observation #2: This introduces some migration overhead during IO
> > submission. With the current prototype, every incoming IO request results
> > in an IPI and a context switch (to the kblockd thread) on the interrupt
> > processing cpu. This issue needs to be addressed, and the main challenge
> > is finding an efficient mechanism for this IO migration (how much batching
> > to do, and when to send the migrate request?), so that we don't delay the
> > IO much and, at the same time, don't cause much overhead during migration.
> 
> Hi guys,
> 
> Just thought of another way we might do this. Migrate the completions out to
> the submitting CPUs rather than migrate submission into the completing
> CPU.
> 
> I've got a basic patch that passes some stress testing. It seems fairly
> simple to do at the block layer, and the bulk of the patch involves
> introducing a scalable smp_call_function for it.
> 
> Now it could be optimised more by looking at batching up IPIs or
> optimising the call function path or even migrating the completion event
> at a different level...
> 
> However, this is a first cut. It actually seems like it might be taking
> slightly more CPU to process block IO (~0.2%)... however, this is on my
> dual core system that shares an llc, which means that there are very few
> cache benefits to the migration, but non-zero overhead. So on multisocket
> systems hopefully it might get to positive territory.

That's pretty funny, I did pretty much the exact same thing last week!
The primary difference between yours and mine is that I used a more
private interface to signal a softirq raise on another CPU, instead of
allocating call data and exposing a generic interface. That put the
locking in blk-core instead: blk_cpu_done became a structure with a lock
and a list_head instead of just a list head, and the completion was
intercepted at blk_complete_request() time instead of waiting for an
already raised softirq on that CPU.

Didn't get around to any performance testing yet, though. Will try and
clean it up a bit and do that.
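The data structure described above (a per-CPU completion queue holding a lock plus a list head, with the completion steered at blk_complete_request() time to the CPU that submitted the request) can be modelled in userspace. This is a sketch under stated assumptions, not the patch: pthread mutexes replace spinlocks, a `softirq_pending` flag stands in for raising a softirq or sending an IPI to the target CPU, and all names are hypothetical.

```c
#include <assert.h>
#include <pthread.h>
#include <stddef.h>

#define NR_CPUS_MODEL 4

/* Hypothetical request: remembers which CPU submitted it. */
struct model_rq {
	int submit_cpu;
	int completed_on;	/* CPU that ran the completion, -1 if none */
	struct model_rq *next;
};

/* Per-CPU completion queue: a lock plus a list head. */
struct cpu_done {
	pthread_mutex_t lock;
	struct model_rq *head, *tail;
	int softirq_pending;	/* stands in for raising a softirq/IPI */
};

static struct cpu_done done_q[NR_CPUS_MODEL];

static void done_init(void)
{
	for (int i = 0; i < NR_CPUS_MODEL; i++) {
		pthread_mutex_init(&done_q[i].lock, NULL);
		done_q[i].head = done_q[i].tail = NULL;
		done_q[i].softirq_pending = 0;
	}
}

/* Called on whichever CPU took the interrupt: queue on the submitter. */
static void model_complete_request(struct model_rq *rq)
{
	struct cpu_done *q;

	assert(rq->submit_cpu >= 0 && rq->submit_cpu < NR_CPUS_MODEL);
	q = &done_q[rq->submit_cpu];

	pthread_mutex_lock(&q->lock);
	rq->next = NULL;
	if (q->tail)
		q->tail->next = rq;
	else
		q->head = rq;
	q->tail = rq;
	q->softirq_pending = 1;
	pthread_mutex_unlock(&q->lock);
}

/* "Softirq" on 'cpu': detach the list under the lock, then run it unlocked. */
static int model_run_done(int cpu)
{
	struct cpu_done *q = &done_q[cpu];
	struct model_rq *rq;
	int n = 0;

	pthread_mutex_lock(&q->lock);
	rq = q->head;
	q->head = q->tail = NULL;
	q->softirq_pending = 0;
	pthread_mutex_unlock(&q->lock);

	while (rq) {
		struct model_rq *next = rq->next;

		rq->completed_on = cpu;
		n++;
		rq = next;
	}
	return n;
}
```

Detaching the whole list under the lock keeps the critical section short; this is the spinlock cost Nick is wary of adding to the non-migration path.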

-- 
Jens Axboe



Re: [rfc] direct IO submission and completion scalability issues

2008-02-04 Thread Jens Axboe
On Mon, Feb 04 2008, Nick Piggin wrote:
> On Mon, Feb 04, 2008 at 11:12:44AM +0100, Jens Axboe wrote:
> > On Sun, Feb 03 2008, Nick Piggin wrote:
> > > On Fri, Jul 27, 2007 at 06:21:28PM -0700, Suresh B wrote:
> > > 
> > > Hi guys,
> > > 
> > > Just thought of another way we might do this. Migrate the completions out to
> > > the submitting CPUs rather than migrate submission into the completing
> > > CPU.
> > > 
> > > I've got a basic patch that passes some stress testing. It seems fairly
> > > simple to do at the block layer, and the bulk of the patch involves
> > > introducing a scalable smp_call_function for it.
> > > 
> > > Now it could be optimised more by looking at batching up IPIs or
> > > optimising the call function path or even migrating the completion event
> > > at a different level...
> > > 
> > > However, this is a first cut. It actually seems like it might be taking
> > > slightly more CPU to process block IO (~0.2%)... however, this is on my
> > > dual core system that shares an llc, which means that there are very few
> > > cache benefits to the migration, but non-zero overhead. So on multisocket
> > > systems hopefully it might get to positive territory.
> > 
> > That's pretty funny, I did pretty much the exact same thing last week!
> 
> Oh nice ;)
> 
> 
> > The primary difference between yours and mine is that I used a more
> > private interface to signal a softirq raise on another CPU, instead of
> > allocating call data and exposing a generic interface. That put the
> > locking in blk-core instead: blk_cpu_done became a structure with a lock
> > and a list_head instead of just a list head, and the completion was
> > intercepted at blk_complete_request() time instead of waiting for an
> > already raised softirq on that CPU.
> 
> Yeah, I was looking at that... didn't really want to add the spinlock
> overhead to the non-migration case. Anyway, I guess those sorts of
> fine implementation details are going to have to be sorted out with
> results.

As Andi mentions, we can look into making that lockless. For the initial
implementation I didn't really care, just wanted something to play with
that would nicely allow me to control both the submit and complete side
of the affinity issue.
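One common lockless shape for such a per-CPU completion list, assuming C11 atomics: producers push with compare-and-swap, and the consumer detaches the whole list with a single atomic exchange. This is the same pattern as the kernel's later llist API (`llist_add()` / `llist_del_all()`), though nothing in the thread says this is what was used; `lpush()` and `ldel_all()` are hypothetical names. Entries come out newest first, so a consumer that cares about order would reverse the list.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stddef.h>

struct lnode {
	struct lnode *next;
	int id;
};

struct lhead {
	_Atomic(struct lnode *) first;
};

/*
 * Push from any thread. Returns 1 if the list was empty beforehand,
 * i.e. the consumer may need a kick (softirq/IPI in the kernel case).
 */
static int lpush(struct lhead *h, struct lnode *n)
{
	struct lnode *old = atomic_load(&h->first);

	assert(n != NULL);
	do {
		n->next = old;	/* 'old' is refreshed by a failed CAS */
	} while (!atomic_compare_exchange_weak(&h->first, &old, n));
	return old == NULL;
}

/* Single consumer takes everything at once; newest entry first. */
static struct lnode *ldel_all(struct lhead *h)
{
	return atomic_exchange(&h->first, NULL);
}
```

Because consumers only ever take the entire list, the push-only/take-all combination avoids the ABA problems of a general lock-free stack with single-element pops.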

-- 
Jens Axboe



Re: soft lock failure on FC8

2008-02-04 Thread Jens Axboe
On Mon, Feb 04 2008, Andrew Morton wrote:
> On Mon, 4 Feb 2008 17:52:16 +0530 "Dragon kumar" <[EMAIL PROTECTED]> wrote:
> 
> > Hi Andrew,
> > 
> > I am not able to boot the 2.6.24-mm1 kernel on an x86_64 machine with
> > FC8. I am attaching the config file and call trace to this mail.
> > 
> > 
> > [  921.273592] BUG: soft lockup - CPU#0 stuck for 61s! [scsi_scan_0:473]
> > [  921.273601] CPU 0:
> > [  921.273601] Modules linked in: scsi_wait_scan tg3 shpchp
> > pci_hotplug aacraid sd_mod scsi_mod ext3 jbd mbcache uhci_hcd ohci_hcd
> > ssb ehci_hcd usbcore
> > [  921.273601] Pid: 473, comm: scsi_scan_0 Not tainted 2.6.24-mm1-autokern1 
> > #1
> > [  921.273601] RIP: 0010:[]  []
> > radix_tree_gang_lookup+0xe1/0x139
> > [  921.273601] RSP: 0018:81007d99bda8  EFLAGS: 0293
> > [  921.273601] RAX:  RBX: 81007d99bde0 RCX: 
> > 
> > [  921.273601] RDX:  RSI: 81007d8f2690 RDI: 
> > 000c
> > [  921.273601] RBP: 0001 R08: 81007d8dce0c R09: 
> > 0001
> > [  921.273601] R10: 81007d90caf0 R11: 000d R12: 
> > 81000106ac00
> > [  921.273601] R13: 8100808ed000 R14: 81007d99a000 R15: 
> > 8077eef0
> > [  921.273601] FS:  () GS:8057e000()
> > knlGS:
> > [  921.273601] CS:  0010 DS: 0018 ES: 0018 CR0: 8005003b
> > [  921.273601] CR2: 006167d0 CR3: 00201000 CR4: 
> > 06e0
> > [  921.273601] DR0:  DR1:  DR2: 
> > 
> > [  921.273601] DR3:  DR6: 0ff0 DR7: 
> > 0400
> > [  921.273601]
> > [  921.273601] Call Trace:
> > [  921.273601]  [] ? call_for_each_cic+0x72/0x104
> > [  921.273601]  [] ? cfq_exit_single_io_context+0x0/0x4e
> > [  921.273601]  [] ? cfq_exit_io_context+0x18/0x1a
> > [  921.273601]  [] ? exit_io_context+0x101/0x111
> > [  921.273601]  [] ? do_exit+0x794/0x7c1
> > [  921.273601]  [] ? child_rip+0x11/0x12
> > [  921.273601]  [] ? restore_args+0x0/0x30
> > [  921.273601]  [] ? kthreadd+0x17d/0x1a2
> > [  921.273601]  [] ? kthread+0x0/0x77
> > [  921.273601]  [] ? child_rip+0x0/0x12
> > [  921.273601]
> > 
> 
> At a guess I'd say that call_for_each_cic() is failing to advance across
> the radix-tree and got stuck.

I'd say that's a good guess, I don't see how else it could get stuck
looping forever.

> 
> Could you please apply this debug patch and retest?
> 
> Thanks.
> 
> diff -puN block/cfq-iosched.c~a block/cfq-iosched.c
> --- a/block/cfq-iosched.c~a
> +++ a/block/cfq-iosched.c
> @@ -1159,6 +1159,7 @@ call_for_each_cic(struct io_context *ioc
>  
>   do {
>   int i;
> + unsigned long next_index;
>  
>   /*
>* Perhaps there's a better way - this just gang lookups from
> @@ -1171,8 +1172,13 @@ call_for_each_cic(struct io_context *ioc
>   break;
>  
>   called += nr;
> - index = 1 + (unsigned long) cics[nr - 1]->key;
> -
> + next_index = 1 + (unsigned long) cics[nr - 1]->key;
> + if (next_index <= index) {
> + printk("next_index=%lu, index=%lu\n",
> + next_index, index);
> + dump_stack();
> + }
> + index = next_index;
>   for (i = 0; i < nr; i++)
>   func(ioc, cics[i]);
>   } while (nr == CIC_GANG_NR);
> _
> 
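The loop being debugged fetches cics in batches and restarts the lookup just past the last key seen; if that start index ever fails to move forward, the loop can spin forever, which would match the soft lockup. A userspace model of that batched iteration with the same forward-progress check the debug patch adds. A sorted array stands in for the radix tree, and `gang_lookup()` and `visit_all()` are hypothetical, not `radix_tree_gang_lookup()`. The model returns -1 instead of printing a stack trace; one way to stop advancing, shown in the usage, is a key of ULONG_MAX, whose +1 wraps to 0. That is only an illustration, not a diagnosis of this report.

```c
#include <assert.h>
#include <stddef.h>

#define GANG_NR 2	/* small batch so the demo takes several rounds */

/* Toy "tree": sorted keys; return up to max keys that are >= start. */
static size_t gang_lookup(const unsigned long *keys, size_t nkeys,
			  unsigned long start, unsigned long *out, size_t max)
{
	size_t n = 0;

	for (size_t i = 0; i < nkeys && n < max; i++)
		if (keys[i] >= start)
			out[n++] = keys[i];
	return n;
}

/*
 * Visit every key in batches, restarting just past the last key seen.
 * Returns the number visited, or -1 if the index failed to advance.
 */
static long visit_all(const unsigned long *keys, size_t nkeys)
{
	unsigned long batch[GANG_NR];
	unsigned long index = 0;
	long called = 0;
	size_t nr;

	assert(keys != NULL || nkeys == 0);
	do {
		nr = gang_lookup(keys, nkeys, index, batch, GANG_NR);
		if (!nr)
			break;
		called += (long)nr;

		unsigned long next_index = 1 + batch[nr - 1];
		if (next_index <= index)
			return -1;	/* no forward progress: would loop forever */
		index = next_index;
	} while (nr == GANG_NR);
	return called;
}
```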

-- 
Jens Axboe



Re: [rfc] direct IO submission and completion scalability issues

2008-02-04 Thread Jens Axboe
On Mon, Feb 04 2008, Zach Brown wrote:
> [ ugh, still jet lagged. ]
> 
> > Hi Nick,
> > 
> > When Matthew was describing this work at an LCA presentation (not
> > sure whether you were at that presentation or not), Zach came up
> > with the idea that allowing the submitting application to control the
> > CPU on which IO completion processing occurs would be a good
> > approach to try.  That is, we submit a "completion cookie" with the
> > bio that indicates where we want completion to run, rather than
> > dictating that completion runs on the submission CPU.
> > 
> > The reasoning is that only the higher level context really knows
> > what is optimal, and that changes from application to application.
> > The "complete on the submission CPU" policy _may_ be more optimal
> > for database workloads, but it is definitely suboptimal for XFS and
> > transaction I/O completion handling because it simply drags a bunch
> > of global filesystem state around between all the CPUs running
> > completions. In that case, we really only want a single CPU to be
> > handling the completions.
> > 
> > (Zach - please correct me if I've missed anything)
> 
> Yeah, I think Nick's patch (and Jens' approach, presumably) is just the
> sort of thing we were hoping for when discussing this during Matthew's talk.
> 
> I was imagining the patch a little bit differently (per-cpu tasks, do a
> wake_up from the driver instead of cpu nr testing up in blk, work
> queues, whatever), but we know how to iron out these kinds of details ;).

per-cpu tasks/wq's might be better; it's a little awkward to jump
through hoops.

> > Looking at your patch - if you turn it around so that the
> > "submission CPU" field can be specified as the "completion cpu" then
> > I think the patch will expose the policy knobs needed to do the
> > above.
> 
> Yeah, that seems pretty straightforward.
> 
> We might need some logic for noticing that the desired cpu has been
> hot-plugged away while the IO was in flight, it occurs to me.

The softirq completion code already handles cpus going away; at least
with my patch that works fine (with a dead flag added).
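A minimal sketch of the "completion cookie" idea with the hot-plug fallback discussed above. Everything here is hypothetical and simulated in user space: struct sim_request, the sim_cpu_online[] table, CPU_ANY, and the choice to fall back to the CPU that took the interrupt are assumptions for illustration, not the block layer's API or necessarily what the softirq patch does.

```c
#include <stdbool.h>

#define SIM_NR_CPUS 8 /* hypothetical CPU count for the sketch */
#define CPU_ANY (-1)  /* hypothetical "no completion preference" value */

/* Hypothetical per-CPU online state; the kernel tracks this itself. */
static bool sim_cpu_online[SIM_NR_CPUS];

/* Hypothetical request carrying a "completion cookie": the CPU on which
 * the submitter would like completion processing to run. */
struct sim_request {
    int completion_cpu;
};

/* Decide where completion runs for a request whose IRQ fired on irq_cpu. */
static int pick_completion_cpu(const struct sim_request *rq, int irq_cpu)
{
    int want = rq->completion_cpu;

    if (want == CPU_ANY)
        return irq_cpu; /* no preference: complete where the IRQ landed */
    if (want >= 0 && want < SIM_NR_CPUS && sim_cpu_online[want])
        return want;    /* honour the cookie */
    return irq_cpu;     /* target CPU went away while the I/O was in flight */
}
```

Under this model, "complete on the submission CPU" is just completion_cpu set to the submitting CPU, while the XFS-style policy would point every request at the same CPU.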

-- 
Jens Axboe



Re: [rfc] direct IO submission and completion scalability issues

2008-02-05 Thread Jens Axboe
On Mon, Feb 04 2008, Arjan van de Ven wrote:
> Jens Axboe wrote:
> >>I was imagining the patch a little bit differently (per-cpu tasks, do a
> >>wake_up from the driver instead of cpu nr testing up in blk, work
> >>queues, whatever), but we know how to iron out these kinds of details ;).
> >
> >per-cpu tasks/wq's might be better, it's a little awkward to jump
> >through hoops
> >
> 
> one caveat btw; when the multiqueue storage hw becomes available for Linux,
> we need to figure out how to deal with the preference thing; since there
> honoring a "non-logical" preference would be quite expensive (it means

non-local?

> you can't make the local submit queues lockless etc etc), so before we
> go down the road of having widespread APIs for this stuff.. we need to
> make sure we're not going to do something that's going to be really
> stupid 6 to 18 months down the road.

As far as I'm concerned, so far this is just playing around with
affinity (and to some extent taking it too far, on purpose). For
instance, my current patch can move submissions and completions
independently, with a set mask or by 'binding' a request to a CPU. Most
of that doesn't make sense. 'complete on the same CPU, if possible'
makes sense and would fit fine with multi-queue hw.

Moving submissions at the block layer to a defined set of CPUs is a bit
silly imho; it's pretty costly, and it's a lot saner to simply bind the
submitters instead. So if you can set irq affinity, then just make the
submitters follow that.
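A minimal sketch of "make the submitters follow the irq affinity" from user space, assuming the mask is read from a /proc/irq/<N>/smp_affinity style hex string (comma-separated 32-bit groups, lowest CPUs rightmost). parse_affinity_mask() and bind_submitter() are hypothetical helpers; sched_setaffinity() and the CPU_* macros are the standard glibc interfaces.

```c
#define _GNU_SOURCE
#include <ctype.h>
#include <sched.h>
#include <string.h>

/* Parse a hex CPU mask such as "f0" or "00000000,0000000f" into *set.
 * Returns 0 on success, -1 on a malformed mask. */
static int parse_affinity_mask(const char *mask, cpu_set_t *set)
{
    int bit = 0;              /* CPU number of the current digit's lowest bit */
    size_t i = strlen(mask);

    CPU_ZERO(set);
    while (i-- > 0) {         /* walk from the least significant digit */
        unsigned char c = (unsigned char)mask[i];
        int v, b;

        if (c == ',' || c == '\n')
            continue;         /* group separators and trailing newline */
        if (!isxdigit(c))
            return -1;
        v = isdigit(c) ? c - '0' : tolower(c) - 'a' + 10;
        for (b = 0; b < 4; b++)
            if ((v & (1 << b)) && bit + b < CPU_SETSIZE)
                CPU_SET(bit + b, set);
        bit += 4;
    }
    return 0;
}

/* Bind the calling thread to the CPUs named in mask. */
static int bind_submitter(const char *mask)
{
    cpu_set_t set;

    if (parse_affinity_mask(mask, &set) != 0)
        return -1;
    return sched_setaffinity(0, sizeof(set), &set);
}
```

Binding in the submitter keeps the block layer out of the business of moving submissions, which is the cost being objected to above.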

-- 
Jens Axboe



Re: [PATCH 0/7] blk_end_request: full I/O completion handler

2008-02-05 Thread Jens Axboe
On Tue, Feb 05 2008, S, Chandrakala (STSD) wrote:
> Hello,
> 
> We would like to know in which kernel version these patches are
> available.  

They were merged after 2.6.24 was released, so they will show up in the
2.6.25 kernel.

-- 
Jens Axboe



Re: [GIT PULL] drbd-8.3 updates

2012-10-03 Thread Jens Axboe
On 2012-10-02 15:02, Philipp Reisner wrote:
> Hi Jens,
> 
> Please consider pulling these changes for the 3.7 merge window.
> 
> Best,
>  Phil
> 
> The following changes since commit a0d271cbfed1dd50278c6b06bead3d00ba0a88f9:
> 
>   Linux 3.6 (2012-09-30 16:47:46 -0700)
> 
> are available in the git repository at:
> 
>   git://git.drbd.org/linux-drbd.git for-jens
> 
> for you to fetch changes up to a783d564a1badbb87b3f96aa8df581ed4167a9c9:
> 
>   drbd: log request sector offset and size for IO errors (2012-10-02 14:52:24 +0200)

Not pulled. Two reasons:

- It's late (in the merge window)
- and it's not based off my for-3.7/drivers branch, hence I get a ton of
  unrelated changes with a pull into that branch.

-- 
Jens Axboe



Re: [GIT PULL] drbd-8.3 updates

2012-10-03 Thread Jens Axboe
On 2012-10-03 12:07, Philipp Reisner wrote:
> On Wednesday, 3 October 2012, 11:24:09, you wrote:
>>> Not pulled. Two reasons:
>>>
>>> - It's late (in the merge window)
>>> - and it's not based off my for-3.7/drivers branch, hence I get a ton of
>>>   unrelated changes with a pull into that branch.
>>
>> Hi Jens,
>>
>> I can rebase it for you in a few hours. Would this influence your decision?
>>
> 
> Hi Jens,
> 
> Is there a convenient way for me to find out when it is the right time
> to send pull requests your way? (i.e. a notification when you create your
> for-3.x/drivers branch)

The right time is anytime between -rc1 and -rcN for the previous
release, where N is the last released -rc for that series. IOW, I should
have it before the next merge window opens, not days into that window.

> Rebasing it on your drivers tree was trivial, here is the updated pull
> request:
> 
> The following changes since commit fab74e7a8f0f8d0af2356c28aa60d55f9e6f5f8b:
> 
>   loop: Make explicit loop device destruction lazy (2012-09-28 10:42:23 +0200)
> 
> are available in the git repository at:
> 
>   git://git.drbd.org/linux-drbd.git for-jens
> 
> for you to fetch changes up to 61e8114a682b0e868696f8363ed03e5fd4c750d1:
> 
>   drbd: log request sector offset and size for IO errors (2012-10-03 11:54:45 +0200)

Thanks, one question before this is pulled in:

> Philipp Reisner (6):
>   drbd: Add a drbd directory to sysfs
>   drbd: expose the data generation identifiers via sysfs

What are these? It's sitting in /sys/block//drbd/, I don't see any
documentation or justification for that.

Why isn't it off in debugfs or similar instead?

-- 
Jens Axboe



Re: [GIT PULL] drbd-8.3 updates

2012-10-03 Thread Jens Axboe
On 2012-10-03 15:49, Philipp Reisner wrote:
>> Thanks, one question before this is pulled in:
>>> Philipp Reisner (6):
>>>   drbd: Add a drbd directory to sysfs
>>>   drbd: expose the data generation identifiers via sysfs
>>
>> What are these? It's sitting in /sys/block//drbd/, I don't see any
>> documentation or justification for that.
>>
>> Why isn't it off in debugfs or similar instead?
> 
> The long-time goal is to get rid of the /proc/drbd virtual file, and
> present the information that was there in a more structured way in /sys.
> 
> This patch is a very first step in that direction. Later we intend to
> put things like the connection state, device roles, and statistics
> counters there.
> 
> When coming up with the layout we used the sysfs presence of software raid
> as an example.

Software raid is different, though, in that it's a class/type of device.
So for drbd, I'd still recommend you look outside of the regular
/sys/block hierarchy for adding something like this.

> I have removed it from this pull-request, so that there is more time for
> consideration before the next merge window.

Thanks, pulled.

-- 
Jens Axboe



Re: [patch/rfc/rft] sd: allocate request_queue on device's local numa node

2012-10-23 Thread Jens Axboe
On 2012-10-23 19:42, Bart Van Assche wrote:
> On 10/23/12 18:52, Jeff Moyer wrote:
>> Bart Van Assche  writes:
>>> Please keep in mind that a
>>> single PCIe bus may have a minimal distance to more than one NUMA
>>> node. See e.g. the diagram at the top of page 8 in
>>> http://bizsupport1.austin.hp.com/bc/docs/support/SupportManual/c03261871/c03261871.pdf
>>> for a system diagram of a NUMA system where each PCIe bus has a
>>> minimal distance to two different NUMA nodes.
>>
>> That's an interesting configuration.  I wonder what the numa_node sysfs
>> file contains for such systems--do you know?  I'm not sure how we could
>> allow this to be user-controlled at probe time.  Did you have a specific
>> mechanism in mind?  Module parameters?  Something else?
> 
> As far as I can see in drivers/pci/pci-sysfs.c the numa_node sysfs 
> attribute contains a single number, even for a topology like the one 
> described above.

This is an artifact of how ACPI works, it's not possible to have it be a
mask of nodes. But obviously that is how most Intel-based systems from
the last few years work, so the kernel parts should be updated to at
least allow it to be a mask. How to get this information is a separate
problem.
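A minimal sketch of what consuming a node mask (rather than the single numa_node number) could look like when choosing where to allocate per-device structures such as the request_queue. The bitmask representation, node_in_mask(), pick_alloc_node() and the "prefer the caller's node, else the lowest close node" policy are all hypothetical; -1 plays the role of "no preference" in the spirit of NUMA_NO_NODE.

```c
#include <stdbool.h>

/* Hypothetical: bit n set means NUMA node n is "close" to the device. */
typedef unsigned long nodemask_bits;

#define MASK_BITS ((int)(8 * sizeof(nodemask_bits)))

static bool node_in_mask(nodemask_bits mask, int node)
{
    return node >= 0 && node < MASK_BITS && ((mask >> node) & 1UL);
}

/* Prefer the caller's own node if it is close to the device, else the
 * lowest-numbered close node, else -1 ("no preference"). */
static int pick_alloc_node(nodemask_bits dev_nodes, int local_node)
{
    if (node_in_mask(dev_nodes, local_node))
        return local_node;
    for (int n = 0; n < MASK_BITS; n++)
        if ((dev_nodes >> n) & 1UL)
            return n;
    return -1;
}
```

With a topology like the one in the HP diagram (one PCIe bus close to two nodes), a probe running on either of those nodes would then allocate locally instead of always using the one node ACPI reports.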

-- 
Jens Axboe



Re: [PATCH 033/193] block: remove CONFIG_EXPERIMENTAL

2012-10-23 Thread Jens Axboe
On 2012-10-23 22:01, Kees Cook wrote:
> This config item has not carried much meaning for a while now and is
> almost always enabled by default. As agreed during the Linux kernel
> summit, remove it.
> 
> CC: Jens Axboe 
> Signed-off-by: Kees Cook 
> ---
>  block/Kconfig |2 +-
>  1 file changed, 1 insertion(+), 1 deletion(-)
> 
> diff --git a/block/Kconfig b/block/Kconfig
> index 09acf1b..a7e40a7 100644
> --- a/block/Kconfig
> +++ b/block/Kconfig
> @@ -89,7 +89,7 @@ config BLK_DEV_INTEGRITY
>  
>  config BLK_DEV_THROTTLING
>   bool "Block layer bio throttling support"
> - depends on BLK_CGROUP=y && EXPERIMENTAL
> + depends on BLK_CGROUP=y
>   default n
>   ---help---
>   Block layer bio throttling support. It can be used to limit

No worries on my end, EXPERIMENTAL seems to mean very little these days.
I have applied 33 and 38.

-- 
Jens Axboe



Re: [PATCH 1/2] block: move blk_queue_bypass_{start,end} to include/linux/blkdev.h

2012-10-25 Thread Jens Axboe
On 2012-10-25 11:41, Jun'ichi Nomura wrote:
> [PATCH] block: move blk_queue_bypass_{start,end} to include/linux/blkdev.h
> 
> dm wants to use those functions to control the bypass status of
> half-initialized device.
> 
> This patch is a preparation for:
>   [PATCH] dm: stay in blk_queue_bypass until queue becomes initialized

Applied. The previous state, with the declarations private to blk.h but
the symbols exported, was wrong too.

-- 
Jens Axboe



Re: linux-next: manual merge of the block tree with Linus' tree

2012-10-31 Thread Jens Axboe
On 2012-10-31 14:55, Ben Hutchings wrote:
> On Wed, 2012-10-31 at 13:04 +1100, Stephen Rothwell wrote:
>> Hi Jens,
>>
>> Today's linux-next merge of the block tree got a conflict in
>> drivers/block/floppy.c between a set of common patches from Linus' tree
>> and commit b33d002f4b6b ("genhd: Make put_disk() safe for disks that have
>> not been registered") from the block tree.
>>
>> I fixed it up (by using the block tree version) and can carry the fix as
>> necessary (no action is required).
> 
> commit b33d002f4b6b ("genhd: Make put_disk() safe for disks that have
> not been registered") was supposed to be dropped or reverted; I don't
> know why it's hanging around.

It's my for-next branch that wasn't updated yet. Will do that now.

-- 
Jens Axboe



Re: [GIT PULL] 4 fixes for drbd

2012-11-01 Thread Jens Axboe
On Mon, Oct 29 2012, Philipp Reisner wrote:
> Hi Jens,
> 
> please consider pulling in these fixes. It is based on your
> for-3.7/drivers branch.
> 
> Best regards,
>  Phil
> 
> 
> The following changes since commit 34a73dd594699dc3834167297a74c43948bb6e41:
> 
>   Revert "memstick: add support for legacy memorysticks" (2012-10-10 16:13:26 -0600)
> 
> are available in the git repository at:
> 
>   git://git.drbd.org/linux-drbd.git for-jens
> 
> for you to fetch changes up to 731d4596a6f3ba41418a0b11018c453456c51d92:
> 
>   drbd: check return of kmalloc in receive_uuids (2012-10-29 13:18:13 +0100)

Can you rebase this against for-3.8/drivers? Thanks.

-- 
Jens Axboe



Re: [GIT PULL] 4 fixes for drbd

2012-11-05 Thread Jens Axboe
On 2012-11-05 15:35, Philipp Reisner wrote:
>>
>> Can you rebase this against for-3.8/drivers? Thanks.
> 
> Hi Jens,
> 
> One of these changes fixes a regression that was introduced in the last
> merge window. Please consider pulling this single commit.
> (I will rebase the other 3 commits on for-3.8/drivers)
> 
> 
> The following changes since commit a13c29ddf73d3be4fbb2b1bbced64014986cd87a:
>   Jens Axboe (1):
>         Merge branch 'for-jens' of git://git.kernel.org/pub/scm/linux/kernel/git/jikos/linux-block into for-3.7/drivers
> 
> are available in the git repository at:
> 
>   git://git.drbd.org/linux-drbd.git for-jens
> 
> Lars Ellenberg (1):
>   drbd: fix regression: potential NULL pointer dereference
> 
>  drivers/block/drbd/drbd_int.h  |5 +
>  drivers/block/drbd/drbd_main.c |8 ++--
>  2 files changed, 11 insertions(+), 2 deletions(-)

Sure, of course I'll take a regression fix. But the fix does not apply
to current tree:

axboe@nelson:/src/linux-block $ patch -p1 --dry-run < ~/1.txt 
patching file drivers/block/drbd/drbd_int.h
Hunk #1 FAILED at 2545.
1 out of 1 hunk FAILED -- saving rejects to file drivers/block/drbd/drbd_int.h.rej
patching file drivers/block/drbd/drbd_main.c
Hunk #1 FAILED at 4232.
1 out of 1 hunk FAILED -- saving rejects to file drivers/block/drbd/drbd_main

Regression fixes for the current tree should be based on the current
tree; it looks like you used an outdated for-3.7/drivers branch, and this
fix depends on other fixes.

-- 
Jens Axboe



Re: drbd-8.3 fixes

2012-07-30 Thread Jens Axboe
On 07/24/2012 04:24 PM, Philipp Reisner wrote:
> Hi Jens,
> 
> Please consider pulling these changes for the 3.6 merge window.
> I did not find a for-3.6/drivers branch, so I based the patches
> on the 3.5 release.
> 
> Best,
>  Phil
> 
> 
> The following changes since commit 28a33cbc24e4256c143dce96c7d93bf423229f92:
> 
>   Linux 3.5 (2012-07-21 13:58:29 -0700)
> 
> are available in the git repository at:
> 
>   git://git.drbd.org/linux-drbd.git for-jens
> 
> for you to fetch changes up to a73ff3231df59a4b92ccd0dd4e73897c5822489b:
> 
>   drbd: announce FLUSH/FUA capability to upper layers (2012-07-24 15:14:28 +0200)
> 
> 
> Lars Ellenberg (10):
>   drbd: cleanup, remove two unused global flags
>   drbd: differentiate between normal and forced detach
>   drbd: report congestion if we are waiting for some userland callback
>   drbd: reset congestion information before reporting it in /proc/drbd
>   drbd: do not reset rs_pending_cnt too early
>   drbd: call local-io-error handler early
>   drbd: fix potential access after free
>   drbd: flush drbd work queue before invalidate/invalidate remote
>   drbd: fix max_bio_size to be unsigned
>   drbd: announce FLUSH/FUA capability to upper layers
> 
>  drivers/block/drbd/drbd_actlog.c   |8 +++--
>  drivers/block/drbd/drbd_bitmap.c   |4 +--
>  drivers/block/drbd/drbd_int.h  |   44 
>  drivers/block/drbd/drbd_main.c |   65 +++-
>  drivers/block/drbd/drbd_nl.c   |   36 +++-
>  drivers/block/drbd/drbd_proc.c |3 ++
>  drivers/block/drbd/drbd_receiver.c |   38 +++--
>  drivers/block/drbd/drbd_req.c  |    9 +++--
>  drivers/block/drbd/drbd_worker.c   |   12 ++-
>  9 files changed, 153 insertions(+), 66 deletions(-)

Pulled, thanks.

-- 
Jens Axboe



Re: [PATCH 1/1] block/nbd: micro-optimization in nbd request completion

2012-07-30 Thread Jens Axboe
On 07/30/2012 05:02 PM, chetan loke wrote:
> On Wed, Jun 6, 2012 at 1:20 PM, Paul Clements wrote:
>> Makes sense. Looks good to me.
>>
>> Acked-by: paul.cleme...@steeleye.com
>>
>> On Wed, Jun 6, 2012 at 10:15 AM, Chetan Loke  wrote:
>>>
>>> Add in-flight cmds to the tail. That way while searching (during request
>>> completion), we will always get a hit on the first element.
>>>
> 
> Paul/Jens,
> 
> Looks like this one isn't merged yet.

I've merged it now.
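A minimal user-space sketch of why tail insertion helps the completion search, assuming the server replies in submission order. The tiny intrusive list below only loosely imitates the kernel's list_head; its helpers (add_head, add_tail, find, del) are local to the sketch, not <linux/list.h>.

```c
#include <stddef.h>

/* Minimal circular doubly linked list with a sentinel head node. */
struct node {
    struct node *prev, *next;
    unsigned long handle; /* stands in for the request's handle */
};

static void list_init(struct node *head) { head->prev = head->next = head; }

static void insert_between(struct node *n, struct node *prev, struct node *next)
{
    n->prev = prev;
    n->next = next;
    prev->next = n;
    next->prev = n;
}

static void add_head(struct node *n, struct node *head) { insert_between(n, head, head->next); }
static void add_tail(struct node *n, struct node *head) { insert_between(n, head->prev, head); }

static void del(struct node *n)
{
    n->prev->next = n->next;
    n->next->prev = n->prev;
}

/* Search from the head for a handle; *steps counts comparisons made. */
static struct node *find(struct node *head, unsigned long handle, int *steps)
{
    *steps = 0;
    for (struct node *p = head->next; p != head; p = p->next) {
        (*steps)++;
        if (p->handle == handle)
            return p;
    }
    return NULL;
}
```

With add_tail the oldest in-flight request sits at the head, so in-order replies match on the first comparison; with add_head the oldest is at the far end and every match walks the whole queue.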

-- 
Jens Axboe



Re: [GIT] floppy

2012-07-31 Thread Jens Axboe
On 07/31/2012 11:46 AM, Jiri Kosina wrote:
> On Wed, 18 Jul 2012, Jiri Kosina wrote:
> 
>> please pull from
>>
>>   git://git.kernel.org/pub/scm/linux/kernel/git/jikos/floppy.git upstream
>>
>> to receive the patch below that should go into 3.6. Thanks.
>>
>> Andi Kleen (1):
>>   floppy: Run floppy initialization asynchronous
>>
>>  drivers/block/floppy.c |   21 -
>>  1 files changed, 20 insertions(+), 1 deletions(-)
> 
> Jens,
> 
> as you don't seem to have pulled yet, I am sending an updated request
> with minor cleanup from Fengguang.
> 
> Please pull from
> 
>   git://git.kernel.org/pub/scm/linux/kernel/git/jikos/floppy.git upstream
> 
> to receive the floppy patches below.
> 
> Andi Kleen (1):
>   floppy: Run floppy initialization asynchronous
> 
> Fengguang Wu (1):
>   floppy: remove duplicated flag FD_RAW_NEED_DISK

I have pulled it; it has been in my for-3.6/drivers branch since I sent
my reply. Updated now with Wu's patch too.

-- 
Jens Axboe



Re: [PATCH v5 1/3] block: Add test-iosched scheduler

2012-07-31 Thread Jens Axboe
On 07/31/2012 04:36 PM, me...@codeaurora.org wrote:
> Hi Jens,
> 
> Do you have comments on this patch?
> Can we push it to kernel 3.6 version?

I have questions - what is this good for? In other words, explain to me
why this is useful code. And in particular why this cannot be done from
userspace with bsg and block tracing?

IOW, I'm dubious as to whether a new _io scheduler_ is the correct
solution to the problem you have at hand.

-- 
Jens Axboe



Re: [PATCH 01/14] userns: Convert loop to use kuid_t instead of uid_t

2012-09-20 Thread Jens Axboe
On 2012-09-21 02:28, Eric W. Biederman wrote:
> From: "Eric W. Biederman" 
> 
> Cc: Signed-off-by: Jens Axboe 
> Acked-by: Serge Hallyn 
> Signed-off-by: Eric W. Biederman 
> ---
>  drivers/block/loop.c |4 ++--
>  include/linux/loop.h |2 +-
>  init/Kconfig |1 -
>  3 files changed, 3 insertions(+), 4 deletions(-)

Thanks Eric, queued for 3.7.

-- 
Jens Axboe



Re: [PATCH 01/14] userns: Convert loop to use kuid_t instead of uid_t

2012-09-21 Thread Jens Axboe
On 2012-09-21 09:07, Eric W. Biederman wrote:
> Jens Axboe  writes:
> 
>> On 2012-09-21 02:28, Eric W. Biederman wrote:
>>> From: "Eric W. Biederman" 
>>>
>>> Cc: Signed-off-by: Jens Axboe 
>>> Acked-by: Serge Hallyn 
>>> Signed-off-by: Eric W. Biederman 
>>> ---
>>>  drivers/block/loop.c |4 ++--
>>>  include/linux/loop.h |2 +-
>>>  init/Kconfig |1 -
>>>  3 files changed, 3 insertions(+), 4 deletions(-)
>>
>> Thanks Eric, queued for 3.7.
> 
> Would it be a problem if I also merged it through my user-namespace.git tree?
> 
> That was the original plan and it is handy to keep all of the patches
> together.
> 
> Not that it matters much in this case.

Not at all, go ahead. It'll be trivial to resolve any merge conflict due
to this.

BTW, this:

Cc: Signed-off-by: Jens Axboe 

in your original patch needs fixing up. I'm assuming it's a copy-paste
error and meant to be a Cc, since I haven't signed-off on it.

-- 
Jens Axboe



Re: [PATCH 1/2] block: lift the initial queue bypass mode on blk_register_queue() instead of blk_init_allocated_queue()

2012-09-21 Thread Jens Axboe
On 09/20/2012 11:08 PM, Tejun Heo wrote:
> b82d4b197c ("blkcg: make request_queue bypassing on allocation") made
> request_queues bypassed on allocation to avoid switching on and off
> bypass mode on a queue being initialized.  Some drivers allocate and
> then destroy a lot of queues without fully initializing them and
> incurring bypass latency overhead on each of them could add up to
> significant overhead.
> 
> Unfortunately, blk_init_allocated_queue() is never used by queues of
> bio-based drivers, which means that all bio-based driver queues are in
> bypass mode even after initialization and registration complete
> successfully.
> 
> Due to the limited way request_queues are used by bio drivers, this
> problem is hidden pretty well but it shows up when blk-throttle is
> used in combination with a bio-based driver.  Trying to configure
> (echoing to cgroupfs file) blk-throttle for a bio-based driver hangs
> indefinitely in blkg_conf_prep() waiting for bypass mode to end.
> 
> This patch moves the initial blk_queue_bypass_end() call from
> blk_init_allocated_queue() to blk_register_queue() which is called for
> any userland-visible queue regardless of its type.
> 
> I believe this is correct because I don't think there is any block
> driver which needs or wants working elevator and blk-cgroup on a queue
> which isn't visible to userland.  If there are such users, we need a
> different solution.
> 
> Signed-off-by: Tejun Heo 
> Reported-by: Joseph Glanville 
> Cc: Vivek Goyal 
> Cc: sta...@vger.kernel.org
> ---
> Jens, while these are fixes, I think it isn't extremely urgent and
> routing these through 3.7-rc1 should be enough.

Agree, I'll shove them into for-3.7/core
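A toy model of the bug and fix described in the patch: queues start bypassed on allocation, and the question is which call ends that initial bypass. struct toy_queue and the toy_* functions are hypothetical stand-ins for illustration; this is not the block layer's code.

```c
#include <stdbool.h>

/* Toy model of a request_queue's bypass state. */
struct toy_queue {
    int bypass_depth;   /* > 0 means the queue is in bypass mode */
    bool request_based; /* only request-based drivers call init_allocated */
};

static void toy_alloc(struct toy_queue *q, bool request_based)
{
    q->request_based = request_based;
    q->bypass_depth = 1; /* queues start out bypassed on allocation */
}

/* Driver setup followed by registration. Old scheme: the initial bypass
 * ends in blk_init_allocated_queue(), which bio-based drivers never call.
 * New scheme: it ends in blk_register_queue(), which every
 * userland-visible queue goes through. */
static void toy_setup_and_register(struct toy_queue *q, bool new_scheme)
{
    if (q->request_based && !new_scheme)
        q->bypass_depth--; /* old: blk_init_allocated_queue() */
    if (new_scheme)
        q->bypass_depth--; /* new: blk_register_queue() */
}

static bool toy_bypassing(const struct toy_queue *q)
{
    return q->bypass_depth > 0;
}
```

In the old scheme a registered bio-based queue is still bypassing, which is what blkg_conf_prep() ends up waiting on indefinitely.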

-- 
Jens Axboe



Re: Query of zram/zsmalloc promotion

2012-09-21 Thread Jens Axboe
On 2012-09-21 18:41, Konrad Rzeszutek Wilk wrote:
> On Wed, Sep 12, 2012 at 11:39:14AM +0900, Minchan Kim wrote:
>> Hi all,
>>
>> I would like to promote zram/zsmalloc from staging tree.
>> I already tried it https://lkml.org/lkml/2012/8/8/37 but I didn't get
>> any response from you guys.
>>
>> I think zram/zsmalloc's code quality is good and they
>> have been used by many embedded vendors for a long time.
>> So it's the proper time to promote them.
>>
>> zram should go under drivers/block/. I think that's not
>> arguable, but the issue is which directory we should keep *zsmalloc* in.
>>
>> Nitin wants to keep it with zram, so it would be in drivers/block/zram/.
>> But I don't like it because zsmalloc touches several fields of struct page
>> freely (and AFAIRC, Andrew had the same concern), so I want to put
>> it under mm/.
> 
> I like the idea of keeping it in /lib or /mm. Actually 'lib' sounds more
> appropriate since it is dealing with storing a bunch of pages in a nice
> layout for great density purposes.
>>
>> In addition, zcache now uses it too, so it's rather awkward to put it
>> under drivers/block/zram/.
>>
>> So questions.
>>
>> To Andrew:
>> Is it okay to put it under mm/ ? Or /lib?
>>
>> To Jens:
>> Is it okay to put zram under drivers/block/? If you are okay, I will start
>> sending the patchset after I sort out zsmalloc's location issue.
> 
> I would think it would be OK.

We can certainly put it in drivers/block, I have no issue with that.

-- 
Jens Axboe



Re: [PATCH] block: export trace_block_unplug

2012-09-25 Thread Jens Axboe
On 09/25/2012 01:57 AM, NeilBrown wrote:
> 
> Hi Jens,
>  is there any chance this can be in the next merge window?  I'm
> adding block tracing to md and found I need another export.

No problem, applied.

-- 
Jens Axboe



Re: [PATCH] block: makes bio_split support bio without data

2012-09-25 Thread Jens Axboe
On 09/24/2012 06:56 AM, NeilBrown wrote:
> 
> Hi Jens,
>  this patch has been sitting in my -next tree for a little while and I was
>  hoping for it to go in for the next merge window.
>  It simply allows bio_split() to be used on bios without a payload, such as
>  'discard'.
>  Are you happy with it going in through my 'md' tree, or would you rather take
>  it through your 'block' tree?

It should go through my tree, especially since we've got conflicts with
other changes. In other words, your patch does not apply to for-3.7/core
as-is...

Shaohua, could you resend an updated variant?

-- 
Jens Axboe



Re: [PATCH 0/4] Fix a crash when block device is read and block size is changed at the same time

2012-09-25 Thread Jens Axboe
On 2012-09-25 19:49, Jeff Moyer wrote:
> Jeff Moyer  writes:
> 
>> Mikulas Patocka  writes:
>>
>>> Hi Jeff
>>>
>>> Thanks for testing.
>>>
>>> It would be interesting ... what happens if you take the patch 3, leave 
>>> "struct percpu_rw_semaphore bd_block_size_semaphore" in "struct 
>>> block_device", but remove any use of the semaphore from fs/block_dev.c? - 
>>> will the performance be like unpatched kernel or like patch 3? It could be 
>>> that the change in the alignment affects performance on your CPU too, just 
>>> differently than on my CPU.
>>
>> It turns out to be exactly the same performance as with the 3rd patch
>> applied, so I guess it does have something to do with cache alignment.
>> Here is the patch (against vanilla) I ended up testing.  Let me know if
>> I've botched it somehow.
>>
>> So next up, I'll play similar tricks to what you did (padding struct
>> block_device in all kernels) to eliminate the differences due to
>> structure alignment and provide a clear picture of what the locking
>> effects are.
> 
> After trying again with the same padding you used in the struct
> bdev_inode, I see no performance differences between any of the
> patches.  I tried bumping up the number of threads to saturate the
> number of cpus on a single NUMA node on my hardware, but that resulted
> in lower IOPS to the device, and hence consumption of less CPU time.
> So, I believe my results to be inconclusive.
> 
> After talking with Vivek about the problem, he had mentioned that it
> might be worth investigating whether bd_block_size could be protected
> using SRCU.  I looked into it, and the one thing I couldn't reconcile is
> updating both the bd_block_size and the inode->i_blkbits at the same
> time.  It would involve (afaiui) adding fields to both the inode and the
> block_device data structures and using rcu_assign_pointer  and
> rcu_dereference to modify and access the fields, and both fields would
> need to be protected by the same struct srcu_struct.  I'm not sure whether
> that's a desirable approach.  When I started to implement it, it got
> ugly pretty quickly.  What do others think?
> 
> For now, my preference is to get the full patch set in.  I will continue
> to investigate the performance impact of the data structure size changes
> that I've been seeing.
> 
> So, for the four patches:
> 
> Acked-by: Jeff Moyer 
> 
> Jens, can you have a look at the patch set?  We are seeing problem
> reports of this in the wild[1][2].

I'll queue it up for 3.7. I can run my regular testing on the 8-way; it
has a knack for showing scaling problems very nicely in aio/dio. As long
as we're not adding per-inode cache line dirtying per IO (and the
per-cpu rw sem looks OK), then I don't think there's too much to worry
about.
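A simplified user-space model of the idea behind a per-cpu reader count: each (hypothetical) CPU slot is padded to its own cache line, assumed here to be 64 bytes, so readers on different CPUs never dirty a shared line, which is the per-IO cache-line concern above. Only the counting half is shown; the writer-side handshake (where the kernel's percpu_rw_semaphore uses synchronize_sched()) is deliberately left out, so this is not a lock. NR_SLOTS and all function names are assumptions for illustration.

```c
#include <stdatomic.h>
#include <stddef.h>

#define CACHELINE 64 /* assumed cache-line size */
#define NR_SLOTS 8   /* hypothetical number of CPUs */

/* One reader counter per CPU, each on its own cache line. */
struct reader_slot {
    _Alignas(CACHELINE) atomic_long count;
};

struct distributed_readers {
    struct reader_slot slot[NR_SLOTS];
};

/* Fast path: a reader only touches its own CPU's line. */
static void reader_enter(struct distributed_readers *r, int cpu)
{
    atomic_fetch_add_explicit(&r->slot[cpu].count, 1, memory_order_acquire);
}

/* A reader may exit on a different CPU than it entered on; individual
 * slots can then go negative, but the sum stays correct. */
static void reader_exit(struct distributed_readers *r, int cpu)
{
    atomic_fetch_sub_explicit(&r->slot[cpu].count, 1, memory_order_release);
}

/* Slow path for a writer: sum every slot (touches all lines). */
static long readers_active(struct distributed_readers *r)
{
    long sum = 0;

    for (int i = 0; i < NR_SLOTS; i++)
        sum += atomic_load_explicit(&r->slot[i].count, memory_order_acquire);
    return sum;
}
```

The padding is also the kind of structure-size change that the alignment experiments in this thread showed can shift benchmark numbers on its own.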

-- 
Jens Axboe



Re: [PATCH 0/4] Fix a crash when block device is read and block size is changed at the same time

2012-09-25 Thread Jens Axboe
On 2012-09-25 19:59, Jens Axboe wrote:
> On 2012-09-25 19:49, Jeff Moyer wrote:
>> Jeff Moyer  writes:
>>
>>> Mikulas Patocka  writes:
>>>
>>>> Hi Jeff
>>>>
>>>> Thanks for testing.
>>>>
>>>> It would be interesting ... what happens if you take the patch 3, leave 
>>>> "struct percpu_rw_semaphore bd_block_size_semaphore" in "struct 
>>>> block_device", but remove any use of the semaphore from fs/block_dev.c? - 
>>>> will the performance be like unpatched kernel or like patch 3? It could be 
>>>> that the change in the alignment affects performance on your CPU too, just 
>>>> differently than on my CPU.
>>>
>>> It turns out to be exactly the same performance as with the 3rd patch
>>> applied, so I guess it does have something to do with cache alignment.
>>> Here is the patch (against vanilla) I ended up testing.  Let me know if
>>> I've botched it somehow.
>>>
>>> So, next up I'll play similar tricks to what you did (padding struct
>>> block_device in all kernels) to eliminate the differences due to
>>> structure alignment and provide a clear picture of what the locking
>>> effects are.
>>
>> After trying again with the same padding you used in the struct
>> bdev_inode, I see no performance differences between any of the
>> patches.  I tried bumping up the number of threads to saturate the
>> number of cpus on a single NUMA node on my hardware, but that resulted
>> in lower IOPS to the device, and hence consumption of less CPU time.
>> So, I believe my results to be inconclusive.
>>
>> After talking with Vivek about the problem, he had mentioned that it
>> might be worth investigating whether bd_block_size could be protected
>> using SRCU.  I looked into it, and the one thing I couldn't reconcile is
>> updating both the bd_block_size and the inode->i_blkbits at the same
>> time.  It would involve (afaiui) adding fields to both the inode and the
>> block_device data structures and using rcu_assign_pointer  and
>> rcu_dereference to modify and access the fields, and both fields would
>> need to be protected by the same struct srcu_struct.  I'm not sure whether
>> that's a desirable approach.  When I started to implement it, it got
>> ugly pretty quickly.  What do others think?
>>
>> For now, my preference is to get the full patch set in.  I will continue
>> to investigate the performance impact of the data structure size changes
>> that I've been seeing.
>>
>> So, for the four patches:
>>
>> Acked-by: Jeff Moyer 
>>
>> Jens, can you have a look at the patch set?  We are seeing problem
>> reports of this in the wild[1][2].
> 
> I'll queue it up for 3.7. I can run my regular testing on the 8-way, it
> has a knack for showing scaling problems very nicely in aio/dio. As long
> as we're not adding per-inode cache line dirtying per IO (and the
> per-cpu rw sem looks OK), then I don't think there's too much to worry
> about.

I take that back. The series doesn't apply to my current tree. Not too
unexpected, since it's some weeks old. But more importantly, please send
this as a "real" patch series. I don't want to see two implementations
of rw semaphores. I think it's perfectly fine to first do a regular rw
sem, then a last patch adding the cache friendly variant from Eric and
converting to that.

In other words, get rid of 3/4.

-- 
Jens Axboe



Re: [PATCH 1/2] Fix a crash when block device is read and block size is changed at the same time

2012-09-25 Thread Jens Axboe
On 2012-09-26 00:49, Mikulas Patocka wrote:
> Here I'm resending it as two patches. The first one uses existing 
> semaphore, the second converts it to RCU-based percpu semaphore.

Thanks, applied. In the future, please send new patch 'series' as a new
thread instead of replying to some email in the middle of an existing
thread. It all becomes very messy pretty quickly. Patch #2 had a botched
subject line, so that didn't help either :-)

-- 
Jens Axboe



Re: [GIT PULL] (xen) stable/for-jens-3.7

2012-09-26 Thread Jens Axboe
On 2012-09-26 15:58, Konrad Rzeszutek Wilk wrote:
> Hey Jens,
> 
> I've one more patch (a small change) that I was hoping you could
> pull in your v3.7 branch.
> 
> The branch is:
>  git://git.kernel.org/pub/scm/linux/kernel/git/konrad/xen.git stable/for-jens-3.7
> 
> and has this tiny patch:
> 
> Oliver Chick (1):
>   xen/blkback: Change xen_vbd's flush_support and discard_secure to have type unsigned int, rather than bool

Pulled.

-- 
Jens Axboe



[GIT PULL] block driver bits for 3.7

2012-10-29 Thread Jens Axboe
Hi Linus,

Later than I would have liked, but I was traveling around the merge
window and then a few fixes snuck in past that and I wanted to ensure that
everything was fully baked again. There was a particular issue around a
floppy regression due to the genhd change, which has now gotten
reverted. So without further ado, here are the driver updates queued up
for 3.7. It contains:

- Updates to the Micron mtip32xx real-ssd card from Micron.
- A few fixes for loop, from Dave Chinner and Eric Biederman.
- A round of floppy fixes, and a subsequent revert of genhd
  part since it caused regressions.
- The usual round of drbd updates/fixes.
- A single xen-blkback fixup.
- A few other minor fixes and/or code cleanups.
- memstick driver and later revert, this isn't well reviewed
  enough yet to go directly in.

Please pull!

  git://git.kernel.dk/linux-block.git for-3.7/drivers

for you to fetch changes up to a13c29ddf73d3be4fbb2b1bbced64014986cd87a:

  Merge branch 'for-jens' of git://git.kernel.org/pub/scm/linux/kernel/git/jikos/linux-block into for-3.7/drivers (2012-10-29 19:20:25 +0100)



Akinobu Mita (1):
  cciss: select CONFIG_CHECK_SIGNATURE

Asai Thambi S P (6):
  mtip32xx: Add support for new devices
  mtip32xx: Handle NCQ commands during the security locked state
  mtip32xx: Increase timeout for standby command
  mtip32xx: Proper reporting of write protect status on big-endian
  mtip32xx: Change printk to pr_
  mtip32xx: Remove dead code

Ben Hutchings (1):
  genhd: Make put_disk() safe for disks that have not been registered

Dave Chinner (1):
  loop: Make explicit loop device destruction lazy

David Milburn (1):
  mtip32xx: fix user_buffer check in exec_drive_command

Eric W. Biederman (1):
  userns: Convert loop to use kuid_t instead of uid_t

Herton Ronaldo Krzesinski (5):
  floppy: don't call alloc_ordered_workqueue inside the alloc_disk loop
  floppy: do put_disk on current dr if blk_init_queue fails
  floppy: properly handle failure on add_disk loop
  floppy: use common function to check if floppies can be registered
  floppy: remove dr, reuse drive on do_floppy_init

Jens Axboe (6):
  Merge branch 'stable/for-jens-3.7' of git://git.kernel.org/.../konrad/xen into for-3.7/drivers
  Merge branch 'for-jens' of git://git.kernel.org/.../jikos/linux-block into for-3.7/drivers
  Merge branch 'stable/for-jens-3.7' of git://git.kernel.org/.../konrad/xen into for-3.7/drivers
  Merge branch 'for-3.7/core' into for-3.7/drivers
  Revert "memstick: add support for legacy memorysticks"
  Merge branch 'for-jens' of git://git.kernel.org/.../jikos/linux-block into for-3.7/drivers

Jiri Kosina (3):
  pktcdvd: update MAINTAINERS
  Merge branches 'floppy' and 'pktcdvd' into for-jens
  Revert "genhd: Make put_disk() safe for disks that have not been registered"

Konrad Rzeszutek Wilk (1):
  xen/blkback: Fix compile warning

Lars Ellenberg (8):
  drbd: introduce stop-sector to online verify
  drbd: panic on delayed completion of aborted requests
  drbd: fix potential deadlock during bitmap (re-)allocation
  drbd: a few more GFP_KERNEL -> GFP_NOIO
  drbd: wait for meta data IO completion even with failed disk, unless force-detached
  drbd: prepare for more than 32 bit flags
  drbd: always write bitmap on detach
  drbd: log request sector offset and size for IO errors

Maxim Levitsky (1):
  memstick: add support for legacy memorysticks

Oliver Chick (1):
  xen/blkback: Change xen_vbd's flush_support and discard_secure to have type unsigned int, rather than bool

Philipp Reisner (4):
  drbd: Protect accesses to the uuid set with a spinlock
  drbd: Fix a potential issue with the DISCARD_CONCURRENT flag
  drbd: Avoid NetworkFailure state during disconnect
  drbd: Remove dead code

Selvan Mani (1):
  mtip32xx:Added appropriate timeout value for secure erase

Wei Yongjun (2):
  xen/blkback: use kmem_cache_zalloc instead of kmem_cache_alloc/memset
  cciss: remove unneeded memset()

 MAINTAINERS                        |    2 +-
 drivers/block/Kconfig              |    1 +
 drivers/block/cciss.c              |    1 -
 drivers/block/drbd/drbd_actlog.c   |   19 ++--
 drivers/block/drbd/drbd_bitmap.c   |   24 ++---
 drivers/block/drbd/drbd_int.h      |  108 +--
 drivers/block/drbd/drbd_main.c     |  170
 drivers/block/drbd/drbd_nl.c       |   74 +---
 drivers/block/drbd/drbd_proc.c     |   14 ++-
 drivers/block/drbd/drbd_receiver.c |  147 ---
 drivers/block/drbd/drbd_req.c      |   43 ++---
 drivers/block/drbd/drbd_worker.c   |   87 ++
 drivers/block/f

Re: [GIT PULL] block driver bits for 3.7

2012-10-29 Thread Jens Axboe
On 2012-10-29 19:35, Linus Torvalds wrote:
> On Mon, Oct 29, 2012 at 11:21 AM, Jens Axboe  wrote:
>>
>> Later than I would have liked, but I was traveling around the merge
>> window and then a few fixes snuck in past that and I wanted to ensure that
>> everything was fully baked again.
> 
> Didn't we agree last time that drbd just wasn't important enough to
> *constantly* cause these kinds of "lots of late fixes"?
> 
> In fact, they aren't even fixes. There's a single small commit during
> the merge window since v3.6 touching that driver, and it wasn't even
> drbd-specific.
> 
> In other words, I see absolutely no reason to pull this stuff at this
> time. It's so far out of the merge window that it's not funny, and it
> has absolutely nothing about it that says "fixes".
> 
> So mind sending just the actual *fixes* for actual regressions, and
> then you can re-push this for the next merge window?

It was actually the floppy part that caused a delay on my end. But sure,
I can distill this down given the time. I'll send out a new one
tomorrow.

-- 
Jens Axboe



[GIT PULL] block driver bits for 3.7

2012-10-30 Thread Jens Axboe
Hi Linus,

Distilled down variant, the rest will pass over to 3.8. I pulled it into
the for-linus branch I had waiting for a pull request as well, in case
you are wondering why there are new entries in here too. This also got
rid of two reverts and of the mtip32xx patches that already went in
later in the 3.6 cycle, so the series looks a bit cleaner.

Please pull!

  git://git.kernel.dk/linux-block.git for-linus

Akinobu Mita (1):
  cciss: select CONFIG_CHECK_SIGNATURE

Anna Leuschner (1):
  vfs: fix: don't increase bio_slab_max if krealloc() fails

Dave Chinner (1):
  loop: Make explicit loop device destruction lazy

Herton Ronaldo Krzesinski (5):
  floppy: don't call alloc_ordered_workqueue inside the alloc_disk loop
  floppy: do put_disk on current dr if blk_init_queue fails
  floppy: properly handle failure on add_disk loop
  floppy: use common function to check if floppies can be registered
  floppy: remove dr, reuse drive on do_floppy_init

Jianpeng Ma (1):
  block: Add blk_rq_pos(rq) to sort rq when plushing

Jiri Kosina (1):
  pktcdvd: update MAINTAINERS

Jun'ichi Nomura (2):
  blkcg: Fix use-after-free of q->root_blkg and q->root_rl.blkg
  blkcg: stop iteration early if root_rl is the only request list

Kees Cook (2):
  block: remove CONFIG_EXPERIMENTAL
  drivers/block: remove CONFIG_EXPERIMENTAL

Konrad Rzeszutek Wilk (1):
  xen/blkback: Fix compile warning

Oliver Chick (1):
  xen/blkback: Change xen_vbd's flush_support and discard_secure to have type unsigned int, rather than bool

Selvan Mani (1):
  mtip32xx:Added appropriate timeout value for secure erase

Wei Yongjun (2):
  xen/blkback: use kmem_cache_zalloc instead of kmem_cache_alloc/memset
  cciss: remove unneeded memset()

 MAINTAINERS                        |  2 +-
 block/Kconfig                      |  2 +-
 block/blk-cgroup.c                 | 10 +
 block/blk-core.c                   |  3 +-
 drivers/block/Kconfig              | 15 ---
 drivers/block/cciss.c              |  1 -
 drivers/block/floppy.c             | 90 --
 drivers/block/loop.c               | 17 ++-
 drivers/block/mtip32xx/mtip32xx.c  | 19 ++--
 drivers/block/mtip32xx/mtip32xx.h  |  3 ++
 drivers/block/xen-blkback/common.h |  4 +-
 drivers/block/xen-blkback/xenbus.c |  9 ++--
 fs/bio.c                           |  6 ++-
 13 files changed, 113 insertions(+), 68 deletions(-)

-- 
Jens Axboe



Re: [PATCH] percpu_counter: fix irq restore in __percpu_counter_add

2013-10-03 Thread Jens Axboe
On 10/03/2013 04:15 AM, Christoph Hellwig wrote:
> The current version of "percpu_counter: make APIs irq safe" in the blk-mq
> tree doesn't properly restore irqs, thus causing the boot of the blk-multiqueue
> tree to fail with various irqs_disabled() BUGs and related issues.
>  
> Signed-off-by: Christoph Hellwig 

Thanks Christoph, merge issue I'm assuming. I'll fix it up.


-- 
Jens Axboe



Re: [PATCH 0/5] blk-mq fixes and improvements

2013-10-04 Thread Jens Axboe
On 10/04/2013 07:49 AM, Christoph Hellwig wrote:
> The first two patches make blk-mq work fine again for me in a KVM VM,
> the others make sure that consumers don't have to care if the underlying queue
> uses blk-mq or not and remove some code at the same time.

Thanks Christoph, applied.

-- 
Jens Axboe



Re: [PATCH 0/5] blk-mq updates

2013-10-06 Thread Jens Axboe
On Sun, Oct 06 2013, Christoph Hellwig wrote:
> Patch 1 makes sure bios are completed more carefully, fixing the regression
> with the earlier patch from Mike as well as a few issues found during code
> inspection.  Patches 2 and 3 are a split up and better documented version
> of "blk-mq: blk-mq should free bios in pass through case", and the last two
> are minor cleanups.

Looks good, applied! Thanks Christoph.

-- 
Jens Axboe



Re: [PATCH] blk-mq: fix blk_mq_start_stopped_hw_queues from irq context

2013-10-06 Thread Jens Axboe
On Sun, Oct 06 2013, Christoph Hellwig wrote:
> The only caller of blk_mq_start_stopped_hw_queues is in irq context,
> leading to lockdep splat when it actually gets called.  Fix this by
> deferring the hw queue run to workqueue context.
> 
> Signed-off-by: Christoph Hellwig 
> 
> diff --git a/block/blk-mq.c b/block/blk-mq.c
> index 2b85029..923e9e1 100644
> --- a/block/blk-mq.c
> +++ b/block/blk-mq.c
> @@ -686,7 +686,8 @@ void blk_mq_start_stopped_hw_queues(struct request_queue *q)
>   if (!test_bit(BLK_MQ_S_STOPPED, &hctx->state))
>   continue;
>  
> - blk_mq_start_hw_queue(hctx);
> + clear_bit(BLK_MQ_S_STOPPED, &hctx->state);
> + blk_mq_run_hw_queue(hctx, true);
>   }
>  }
>  EXPORT_SYMBOL(blk_mq_start_stopped_hw_queues);

Thanks, applied. Might not be a bad idea to just mimic the run queue
API, and provide a blk_mq_start_hw_queue(hctx, is_async) instead.
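Jens suggests mirroring the run-queue API so the start helper itself takes an async flag, in the way blk_mq_run_hw_queue(hctx, true) is used in the patch above. A userspace mock of that API shape only. `struct mock_hctx`, `MOCK_S_STOPPED`, `mock_run_hw_queue()` and `mock_start_hw_queue()` are hypothetical, and "deferring to a workqueue" is modelled as a counter.

```c
#include <assert.h>
#include <stdbool.h>

#define MOCK_S_STOPPED 0x1u	/* hypothetical stand-in for BLK_MQ_S_STOPPED */

struct mock_hctx {
	unsigned int state;
	int ran_inline;		/* times the queue ran in the caller's context */
	int deferred;		/* times a run was punted to "workqueue" context */
};

/* Mirrors the shape of blk_mq_run_hw_queue(hctx, async). */
static void mock_run_hw_queue(struct mock_hctx *h, bool async)
{
	if (h->state & MOCK_S_STOPPED)
		return;
	if (async)
		h->deferred++;		/* would queue work: safe from irq context */
	else
		h->ran_inline++;	/* would dispatch requests directly */
}

/* Proposed start helper: clear the stopped bit, then run the queue
 * synchronously or asynchronously as the caller requests. The kernel
 * would use an atomic clear_bit(); a plain mask is enough for the mock. */
static void mock_start_hw_queue(struct mock_hctx *h, bool async)
{
	h->state &= ~MOCK_S_STOPPED;
	mock_run_hw_queue(h, async);
}
```

Keeping the async flag in the caller's hands lets irq-context callers defer, while process-context callers can still run the queue directly.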

-- 
Jens Axboe



Re: [PATCH, RFC] block: use a separate plug list for blk-mq requests

2013-10-06 Thread Jens Axboe
On Sun, Oct 06 2013, Christoph Hellwig wrote:
> blk_flush_plug_list became a bit of a mess with the introduction of blk-mq,
> so I started looking into separating the blk-mq handling from it.  Turns
> out that by doing this we can streamline the blk-mq submission path a lot.
> 
> If we branch out to a blk-mq specific code path early we can do the list sort
> based on the hw ctx instead of the queue and thus avoid the later improvised
> loop to sort again.  In addition we can also remove the hw irq disabling in
> the submission path entirely and collapse a couple of functions in blk-mq.c,
> all at the cost of an additional list_head in struct blk_plug which can go
> away again as soon as we remove old-school request_fn based drivers.

Thanks, I'll take a look at this. The plugging was done in a mostly hacky
way when it was implemented, and it was meant to be revisited. So there is
definitely room for improvement there.
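Christoph's key idea is to sort plugged blk-mq requests by hardware context rather than by request queue, so each hardware queue's requests form one contiguous run that can be dispatched as a batch without a second sorting pass. A userspace sketch with qsort(). `struct mock_rq`, `mock_rq_cmp()` and `mock_flush_plug()` are hypothetical, and the secondary key on start sector is an assumption added for illustration.

```c
#include <assert.h>
#include <stdlib.h>

/* Hypothetical plugged request: its hw context and where it starts. */
struct mock_rq {
	unsigned int hctx;		/* hardware context index */
	unsigned long long sector;	/* start sector, like blk_rq_pos() */
};

/* Order by hardware context first, then by start sector. */
static int mock_rq_cmp(const void *a, const void *b)
{
	const struct mock_rq *x = a, *y = b;

	if (x->hctx != y->hctx)
		return x->hctx < y->hctx ? -1 : 1;
	if (x->sector != y->sector)
		return x->sector < y->sector ? -1 : 1;
	return 0;
}

/* Sort once, then hand each run of same-hctx requests to a dispatch
 * callback. Returns the number of batches (one per distinct hctx). */
static int mock_flush_plug(struct mock_rq *rqs, size_t n,
			   void (*dispatch)(unsigned int hctx,
					    struct mock_rq *batch, size_t len))
{
	size_t start = 0;
	int batches = 0;

	qsort(rqs, n, sizeof(*rqs), mock_rq_cmp);
	while (start < n) {
		size_t end = start + 1;

		while (end < n && rqs[end].hctx == rqs[start].hctx)
			end++;
		if (dispatch)
			dispatch(rqs[start].hctx, &rqs[start], end - start);
		batches++;
		start = end;
	}
	return batches;
}
```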

-- 
Jens Axboe



[GIT PULL] Block IO core bits for 3.11

2013-07-08 Thread Jens Axboe
Hi Linus,

Here are the core IO block bits for 3.11. It contains:

- A tweak to the reserved tag logic from Jan, for weirdo devices with
  just 3 free tags. For those, it substantially improves random writes.

- Periodic writeback fix from Jan. Marked for stable as well.

- Fix for a race condition in IO scheduler switching from Jianpeng.

- The hierarchical blk-cgroup support from Tejun. This is the bulk of
  the series.

- blk-throttle fix from Vivek.
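Jan's tag tweak concerns how many queue tags async requests may occupy when some are held back for sync I/O. A sketch of the rule the commit title describes ("reserve only one queue tag for sync IO if only 3 tags are available"). The function name is hypothetical, and the handling of other depths (no reservation at depth 1, one tag at depth 2, two tags otherwise) is an assumption for illustration, not a quote of block/blk-tag.c.

```c
#include <assert.h>

/* Hypothetical helper: how many tags async requests may occupy, given
 * the queue's total tag depth. The remainder is left for sync I/O. */
static unsigned int async_tag_limit(unsigned int max_depth)
{
	switch (max_depth) {
	case 0:
	case 1:
		return max_depth;	/* too few tags to reserve any */
	case 2:
		return 1;		/* keep one tag for sync */
	case 3:
		return 2;		/* the tweak: keep one, not two */
	default:
		return max_depth - 2;	/* keep two tags for sync */
	}
}
```

With only three tags, reserving two would leave async writes a single tag; reserving one doubles their concurrency while still keeping a tag free for sync requests.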

Just a note that I'm in the middle of a relocation; the whole family is
flying out tomorrow. Hence I will be AWOL for the remainder of this week,
but back at work again on Monday the 15th. CC'ing Tejun, since any
potential "surprises" will most likely be from the blk-cgroup work. But
it's been brewing for a while and sitting in my tree and linux-next for
a long time, so it should be solid.

Please pull!

  git://git.kernel.dk/linux-block.git for-3.11/core

Jan Kara (2):
  writeback: Fix periodic writeback after fs mount
  block: Reserve only one queue tag for sync IO if only 3 tags are available

Jianpeng Ma (1):
  elevator: Fix a race in elevator switching

Tejun Heo (32):
  blkcg: fix error return path in blkg_create()
  blkcg: move blkg_for_each_descendant_pre() to block/blk-cgroup.h
  blkcg: implement blkg_for_each_descendant_post()
  blkcg: invoke blkcg_policy->pd_init() after parent is linked
  blkcg: move bulk of blkcg_gq release operations to the RCU callback
  blk-throttle: remove spurious throtl_enqueue_tg() call from throtl_select_dispatch()
  blk-throttle: remove deferred config application mechanism
  blk-throttle: collapse throtl_dispatch() into the work function
  blk-throttle: relocate throtl_schedule_delayed_work()
  blk-throttle: remove pointless throtl_nr_queued() optimizations
  blk-throttle: rename throtl_rb_root to throtl_service_queue
  blk-throttle: simplify throtl_grp flag handling
  blk-throttle: add backlink pointer from throtl_grp to throtl_data
  blk-throttle: pass around throtl_service_queue instead of throtl_data
  blk-throttle: reorganize throtl_service_queue passed around as argument
  blk-throttle: add throtl_grp->service_queue
  blk-throttle: move bio_lists[] and friends to throtl_service_queue
  blk-throttle: dispatch to throtl_data->service_queue.bio_lists[]
  blk-throttle: generalize update_disptime optimization in blk_throtl_bio()
  blk-throttle: add throtl_service_queue->parent_sq
  blk-throttle: implement sq_to_tg(), sq_to_td() and throtl_log()
  blk-throttle: set REQ_THROTTLED from throtl_charge_bio() and gate stats update with it
  blk-throttle: separate out throtl_service_queue->pending_timer from throtl_data->dispatch_work
  blk-throttle: implement dispatch looping
  blk-throttle: dispatch from throtl_pending_timer_fn()
  blk-throttle: make blk_throtl_drain() ready for hierarchy
  blk-throttle: make blk_throtl_bio() ready for hierarchy
  blk-throttle: make tg_dispatch_one_bio() ready for hierarchy
  blk-throttle: make throtl_pending_timer_fn() ready for hierarchy
  blk-throttle: add throtl_qnode for dispatch fairness
  blk-throttle: implement throtl_grp->has_rules[]
  blk-throttle: implement proper hierarchy support

Vivek Goyal (1):
  blk-throttle: Account for child group's start time in parent while bio climbs up

 Documentation/cgroups/blkio-controller.txt |   29 +-
 block/blk-cgroup.c                         |  105 ++-
 block/blk-cgroup.h                         |   38 +-
 block/blk-tag.c                            |   11 +-
 block/blk-throttle.c                       | 1064 +++-
 block/cfq-iosched.c                        |   17 +-
 block/deadline-iosched.c                   |   16 +-
 block/elevator.c                           |   25 +-
 block/noop-iosched.c                       |   17 +-
 fs/block_dev.c                             |    9 +-
 include/linux/cgroup.h                     |    2 +
 include/linux/elevator.h                   |    6 +-
 12 files changed, 905 insertions(+), 434 deletions(-)


-- 
Jens Axboe



Re: [PATCH] block: trace all devices plug operation.

2013-09-15 Thread Jens Axboe
On Mon, Sep 02 2013, majianpeng wrote:
> Hi axboe:
>   How about this patch?

Applied.

-- 
Jens Axboe



Re: BLK_TN_PROCESS events not delivered for all devices

2013-09-17 Thread Jens Axboe
On 09/16/2013 03:23 AM, Jan Kara wrote:
>   Hi,
> 
>   I've been looking into a problem where BLK_TN_PROCESS events are not
> delivered to all devices which are being traced. This results in process
> name being (null) when trace for a single device is parsed.
> 
> The reason for this problem is that trace_note_tsk() is called only if
> tsk->btrace_seq != blktrace_seq and it updates tsk->btrace_seq to
> blktrace_seq. Thus after a trace for another device is started
> BLK_TN_PROCESS event is sent only on behalf of the first device with which
> the task interacts. That isn't necessarily the new device thus traces for
> some devices accumulate several BLK_TN_PROCESS events for one task while
> other have none. Is this a known problem and is this intended to work
> better?
> 
> I was thinking how to fix that for a while and it doesn't seem to be
> possible without tracking with each block trace which tasks it has been
> notified about. And that is relatively expensive...

It is unfortunately a known issue... I have not come up with a good way
to fix it either, while keeping it cheap. So if you think of something,
do let me know.

-- 
Jens Axboe



Re: BLK_TN_PROCESS events not delivered for all devices

2013-09-17 Thread Jens Axboe
On 09/17/2013 11:10 AM, Jan Kara wrote:
> On Tue 17-09-13 08:29:07, Jens Axboe wrote:
>> On 09/16/2013 03:23 AM, Jan Kara wrote:
>>>   Hi,
>>>
>>>   I've been looking into a problem where BLK_TN_PROCESS events are not
>>> delivered to all devices which are being traced. This results in process
>>> name being (null) when trace for a single device is parsed.
>>>
>>> The reason for this problem is that trace_note_tsk() is called only if
>>> tsk->btrace_seq != blktrace_seq and it updates tsk->btrace_seq to
>>> blktrace_seq. Thus after a trace for another device is started
>>> BLK_TN_PROCESS event is sent only on behalf of the first device with which
>>> the task interacts. That isn't necessarily the new device thus traces for
>>> some devices accumulate several BLK_TN_PROCESS events for one task while
>>> other have none. Is this a known problem and is this intended to work
>>> better?
>>>
>>> I was thinking how to fix that for a while and it doesn't seem to be
>>> possible without tracking with each block trace which tasks it has been
>>> notified about. And that is relatively expensive...
>>
>> It is unfortunately a known issue... I have not come up with a good way
>> to fix it either, while keeping it cheap. So if you think of something,
>> do let me know.
>   Hum... How about linking all running block traces (struct blk_trace) in a
> linked list and sending BLK_TN_PROCESS to all the traces? Sure we will be
> spamming with BLK_TN_PROCESS a bit but starting a trace isn't such a common
> thing so it shouldn't be too bad. What do you think?

That might be good enough. I'm not worried about start/stop type
expenses, those things generally don't matter. And the list won't add any
fast path overhead when tracing.
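Jan's proposal changes who receives the process note: instead of only the trace the task happens to touch first after a new trace starts, every running trace gets it, at the cost of some duplicate notes. A userspace model of that idea. `struct mock_trace`, `struct mock_task`, `mock_trace_start()` and `mock_note_task()` are hypothetical simplifications of the btrace_seq/blktrace_seq logic described above, with locking and trace removal left out.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical per-device trace; it only counts the process notes it gets. */
struct mock_trace {
	struct mock_trace *next;	/* link in the list of running traces */
	int process_notes;		/* BLK_TN_PROCESS-style notes received */
};

struct mock_task {
	unsigned int btrace_seq;	/* global sequence the task was last noted at */
};

static struct mock_trace *running_traces;	/* all currently running traces */
static unsigned int blktrace_seq = 1;		/* bumped whenever a trace starts */

static void mock_trace_start(struct mock_trace *t)
{
	t->next = running_traces;
	running_traces = t;
	blktrace_seq++;		/* forces every task to be noted again */
}

/* Called on a traced I/O event. Rather than writing the note only to the
 * trace of the device being accessed, send it to every running trace. */
static void mock_note_task(struct mock_task *tsk)
{
	if (tsk->btrace_seq == blktrace_seq)
		return;		/* fast path: a single comparison */
	for (struct mock_trace *t = running_traces; t; t = t->next)
		t->process_notes++;
	tsk->btrace_seq = blktrace_seq;
}
```

As Jens notes, the extra work happens only when the sequence number changes, so the per-I/O fast path is unchanged.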

-- 
Jens Axboe



Re: [PATCH] mtip32xx: dynamically allocate buffer in debugfs functions

2013-09-17 Thread Jens Axboe
On 09/17/2013 12:18 PM, Asai Thambi S P wrote:
> On 07/12/2013 3:59 PM, Asai Thambi S P wrote:
>> On 5/23/2013 2:23 PM, David Milburn wrote:
>>
>>> Dynamically allocate buf to prevent warnings:
>>>
>>> drivers/block/mtip32xx/mtip32xx.c: In function
>>> ‘mtip_hw_read_device_status’:
>>> drivers/block/mtip32xx/mtip32xx.c:2823: warning: the frame size of
>>> 1056 bytes is larger than 1024 bytes
>>> drivers/block/mtip32xx/mtip32xx.c: In function ‘mtip_hw_read_registers’:
>>> drivers/block/mtip32xx/mtip32xx.c:2894: warning: the frame size of
>>> 1056 bytes is larger than 1024 bytes
>>> drivers/block/mtip32xx/mtip32xx.c: In function ‘mtip_hw_read_flags’:
>>> drivers/block/mtip32xx/mtip32xx.c:2917: warning: the frame size of
>>> 1056 bytes is larger than 1024 bytes
>>>
>>> Signed-off-by: David Milburn 
>>
>> Acked-by: Asai Thambi S P 
>>
> Jens,
> 
> Please include this patch too for 3.12.

Queued up.
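The mtip32xx warnings are about stack frames of roughly 1 KiB in the debugfs read helpers, and the patch title says the fix is to allocate the buffer dynamically. A userspace sketch of that pattern, with malloc()/free() standing in for kmalloc()/kfree(). `format_status()` is hypothetical, and the 1024-byte buffer size is an assumption.

```c
#include <assert.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

#define STATUS_BUF_SIZE 1024	/* assumption: about the size behind the warning */

/* Before: "char buf[STATUS_BUF_SIZE];" on the stack, which together with
 * the other locals presumably pushed the frame past the 1024-byte limit.
 * After: the buffer lives on the heap and only a pointer is on the stack. */
static int format_status(char *out, size_t outlen, unsigned int flags)
{
	char *buf = malloc(STATUS_BUF_SIZE);	/* kmalloc(..., GFP_KERNEL) in-kernel */
	int len;

	if (!buf)
		return -1;			/* -ENOMEM in-kernel */
	len = snprintf(buf, STATUS_BUF_SIZE, "flags: 0x%08x\n", flags);
	if (len < 0 || (size_t)len >= outlen) {
		free(buf);			/* every exit path must free it */
		return -1;
	}
	memcpy(out, buf, (size_t)len + 1);	/* include the terminating NUL */
	free(buf);
	return len;
}
```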

-- 
Jens Axboe



Re: [PATCH] bio-integrity: Fix use of bs->bio_integrity_pool after free

2013-09-17 Thread Jens Axboe
On 09/16/2013 07:40 AM, Bjorn Helgaas wrote:
> On Thu, Jun 13, 2013 at 12:33 PM, Kent Overstreet
>  wrote:
>> On Thu, Jun 13, 2013 at 12:14:54PM -0600, Bjorn Helgaas wrote:
>>> On Wed, May 29, 2013 at 4:29 PM, Bjorn Helgaas  wrote:
>>>> This fixes a copy and paste error introduced by 9f060e2231
>>>> ("block: Convert integrity to bvec_alloc_bs()").
>>>>
>>>> Found by Coverity (CID 1020654).
>>>>
>>>> Signed-off-by: Bjorn Helgaas 
>>>> ---
>>>>  fs/bio-integrity.c |2 +-
>>>>  1 file changed, 1 insertion(+), 1 deletion(-)
>>>>
>>>> diff --git a/fs/bio-integrity.c b/fs/bio-integrity.c
>>>> index 8fb4291..45e944f 100644
>>>> --- a/fs/bio-integrity.c
>>>> +++ b/fs/bio-integrity.c
>>>> @@ -734,7 +734,7 @@ void bioset_integrity_free(struct bio_set *bs)
>>>> mempool_destroy(bs->bio_integrity_pool);
>>>>
>>>> if (bs->bvec_integrity_pool)
>>>> -   mempool_destroy(bs->bio_integrity_pool);
>>>> +   mempool_destroy(bs->bvec_integrity_pool);
>>>>  }
>>>>  EXPORT_SYMBOL(bioset_integrity_free);
>>>
>>> Kent, do you want to chime in on this?  Looks like an obvious error to
>>> me, but maybe I'm missing something and we should teach Coverity to
>>> shut up about it.
>>
>> Sorry - no, this is definitely a bug:
>>
>> Acked-by: Kent Overstreet 
> 
> From my v3.12-rc1 reminder list: what's going on with this patch?
> 
> It's been acked, Gu asked whether he could include it in some
> patchset, I see a Sep 11 2013 patch from Gu upstream already
> (bc5c8f078), but this patch (from May 29 2013) still hasn't gone
> anywhere.  Why is this so hard?

Queued up.

-- 
Jens Axboe



Re: [PATCH] block: Device driver for sTec's PCIe Kronos Card.

2013-09-17 Thread Jens Axboe
On 09/17/2013 01:04 PM, Olof Johansson wrote:
> On Thu, Sep 5, 2013 at 7:12 AM, Jens Axboe  wrote:
>> On 09/05/2013 06:00 AM, Jeff Moyer wrote:
>>> OS Engineering  writes:
>>>
>>>> Hi Jeff,
>>>>
>>>> Thank you for reviewing the patch.
>>>
>>> No problem.  Jens, any objection to queueing this up for 3.12?
>>
>> I'll give it a look-over, but usually I'm pretty lax when it comes to
>> new drivers. So no, I'd be surprised if we can't queue this up for 3.12.
> 
> I came across this driver because it was spewing a lot of really
> trivial and easy to fix compiler warnings. Silly stuff such as
> printing u32 with %lu.
> 
> From a quick look at the code, several things are immediately apparent:
> 
> First, checkpatch says, on the currently existing file in -next:
> 
> total: 3 errors, 61 warnings, 5817 lines checked
> 
> Code like this looks _really_ confused:
> 
> barrier();
> val = readl(skdev->mem_map[1] + offset);
> barrier();
> 
> There are also some crazy long functions that should be refactored,
> such as skd_request_fn().
> 
> So, it looks like this driver needs a bunch of work before it's ready
> to go in. Or, maybe it's better to submit it with a TODO list for the
> staging tree instead?

Not disagreeing with you, it definitely needs a bit of cleaning. And so
far sTec has not been very responsive in fixing those issues.
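Olof's example of "printing u32 with %lu" is a classic portability bug: %lu expects an unsigned long, which is 64 bits on x86-64 but 32 bits on i386, so the conversion must match the argument's actual type. In the kernel a u32 is printed with %u; the userspace equivalent for uint32_t is the PRIu32 macro from <inttypes.h>. `describe_len()` is a hypothetical helper.

```c
#include <assert.h>
#include <inttypes.h>
#include <stdio.h>
#include <string.h>

/* Format a 32-bit length portably: PRIu32 expands to the right
 * conversion for uint32_t on every ABI, unlike a hard-coded %lu. */
static int describe_len(char *buf, size_t n, uint32_t len)
{
	return snprintf(buf, n, "len=%" PRIu32, len);
}
```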

-- 
Jens Axboe



Re: [block:for-next 5/6] drivers/block/skd_main.c:441:3: error: implicit declaration of function 'readq'

2013-09-18 Thread Jens Axboe
On Wed, Sep 18 2013, Akhil Bhansali wrote:
> Hi Jens,
> 
> Please accept this patch that takes care of warnings related to i386 
> compilation.
> 1. Implicit function declaration of readq and writeq.
> 2. Format related warnings for VPRINTK.

You should get rid of the VPRINTK() as well, we do have debug facilities
in place you can use for printing.

And as mentioned a week or so ago, readq() isn't even used.

-- 
Jens Axboe



Re: [PATCH] block: Device driver for sTec's PCIe Kronos Card.

2013-09-05 Thread Jens Axboe
On 09/05/2013 06:00 AM, Jeff Moyer wrote:
> OS Engineering  writes:
> 
>> Hi Jeff,
>>
>> Thank you for reviewing the patch.
> 
> No problem.  Jens, any objection to queueing this up for 3.12?

I'll give it a look-over, but usually I'm pretty lax when it comes to
new drivers. So no, I'd be surprised if we can't queue this up for 3.12.

-- 
Jens Axboe


Re: [GIT PULL] (xen) stable/for-jens-3.12 for Jens Axboe

2013-09-06 Thread Jens Axboe
On 09/06/2013 02:25 PM, Konrad Rzeszutek Wilk wrote:
> Hey Jens,
> 
> I sent you a git pull a couple of weeks ago but I am not sure if
> you pulled it. It does not look like it, so here it is again along
> with an extra bug-fix.
> 
> Please git pull:
> 
>  git pull git://git.kernel.org/pub/scm/linux/kernel/git/xen/tip.git stable/for-jens-3.12
> 
> which will give you bug-fixes to Xen blkfront and backend driver:

Thanks, I'll get the 3.12 branch set up and pulled in. I'm behind on a
lot of other drivers, too.

-- 
Jens Axboe


Re: [PATCH] block: Convert kmalloc_node(...GFP_ZERO...) to kzalloc_node(...)

2013-09-11 Thread Jens Axboe
On Thu, Aug 29 2013, Joe Perches wrote:
> Use the helper function instead of __GFP_ZERO.

Applied.
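For context, `kzalloc_node()` is a thin wrapper that calls `kmalloc_node()` with `__GFP_ZERO` OR-ed into the flags, so the conversion only improves readability. The same helper-over-flag pattern, sketched in userspace on top of `malloc()`; `alloc_flags()`, `zalloc()` and `ALLOC_ZERO` are hypothetical names:

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

#define ALLOC_ZERO 0x1u	/* hypothetical analogue of __GFP_ZERO */

/* Flag-driven allocator: zero the memory only if asked to. */
void *alloc_flags(size_t size, unsigned int flags)
{
	void *p = malloc(size);

	if (p && (flags & ALLOC_ZERO))
		memset(p, 0, size);
	return p;
}

/* Named helper, mirroring kzalloc_node() == kmalloc_node(..., flags | __GFP_ZERO, ...). */
void *zalloc(size_t size, unsigned int flags)
{
	return alloc_flags(size, flags | ALLOC_ZERO);
}
```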

-- 
Jens Axboe


Re: [PATCH 0/3] blk-mq: Avoid effects of a weird queue depth

2013-09-12 Thread Jens Axboe
On 09/12/2013 01:42 AM, Alexander Gordeev wrote:
> Hi Jens,
> 
> Could you consider patches 4 and 5, please?

Added. They really should be folded into the earlier ones, since the previous
patches were broken, but I'll be re-folding parts of the series at some point anyway.

-- 
Jens Axboe


Re: [block:for-next 5/6] drivers/block/skd_main.c:441:3: error: implicit declaration of function 'readq'

2013-09-13 Thread Jens Axboe
On 09/13/2013 06:59 AM, Akhil Bhansali wrote:
> This patch takes care of warnings related to
> 1. Implicit function declaration for readq / writeq.
> 2. Warnings related to -Wformat.

Please do it on top of what was already fixed up. The readq, for
instance, is not even used in the driver.

-- 
Jens Axboe


Re: [patch 1/4] skd: use regular version of copy_from_user()

2013-09-13 Thread Jens Axboe
On 09/13/2013 01:55 AM, Dan Carpenter wrote:
> The other __copy_from_user() calls have access_ok() checks so they are
> fine, but these two don't have the check.

Thanks Dan, applied this one and the other two. Did you miss sending one
out? I don't see a 3/4 patch.
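Background on the distinction Dan points out: `copy_from_user()` validates the user address range (via `access_ok()`) before copying, while `__copy_from_user()` assumes the caller already did that check. A simplified userspace model, assuming a hypothetical `struct user_window` that describes the readable range. Like `copy_from_user()`, `checked_copy()` returns the number of bytes not copied, but unlike the kernel version it does not zero-fill the destination on failure.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical description of the range the caller may legitimately read. */
struct user_window {
	const unsigned char *base;
	size_t len;
};

/* access_ok()-style check: [p, p + n) must lie entirely inside the window.
 * Written with subtractions so that large n cannot overflow. */
static int range_ok(const struct user_window *w, const void *p, size_t n)
{
	uintptr_t start = (uintptr_t)w->base;
	uintptr_t addr = (uintptr_t)p;

	if (addr < start || addr - start > w->len)
		return 0;
	return n <= w->len - (addr - start);
}

/* Returns bytes NOT copied (0 on success). An unchecked variant in the
 * spirit of __copy_from_user() would skip range_ok() entirely. */
size_t checked_copy(const struct user_window *w, void *dst,
		    const void *src, size_t n)
{
	if (!range_ok(w, src, n))
		return n;
	memcpy(dst, src, n);
	return 0;
}
```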

-- 
Jens Axboe


Re: [PATCH RFC 0/2] Convert from bio-based to blk-mq

2013-10-08 Thread Jens Axboe
On Tue, Oct 08 2013, Matthew Wilcox wrote:
> On Tue, Oct 08, 2013 at 11:34:20AM +0200, Matias Bjørling wrote:
> > The nvme driver implements itself as a bio-based driver. This is primarily
> > because of high lock congestion for high-performance nvm devices. To remove
> > the congestion, a multi-queue block layer is being implemented.
> 
> Um, no.  You'll crater performance by adding another memory allocation
> (of the struct request).  multi-queue is not the solution.

That's a rather "jump to conclusions" statement to make. As Matias
mentioned, there are no extra fast path allocations. Once the tagging is
converted as well, I'd be surprised if it performs worse than before.
And that on top of a net reduction in code.

blk-mq might not be perfect as it stands, but it's a helluva lot better
than a bunch of flash based drivers with lots of duplicated code and
mechanisms. We need to move away from that.
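On the allocation question: the point here is that a request-based driver does not need to allocate in the fast path if requests are preallocated per hardware queue and indexed by tag. A single-threaded userspace sketch of that idea follows. `struct hw_queue`, `hwq_get()`, `hwq_put()` and `QUEUE_DEPTH` are hypothetical, the code relies on the GCC/Clang builtin `__builtin_ctzll()`, and real blk-mq tagging must also handle concurrency, which this sketch omits.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define QUEUE_DEPTH 32	/* hypothetical queue depth; must stay below 64 here */
_Static_assert(QUEUE_DEPTH > 0 && QUEUE_DEPTH < 64, "free_map is one 64-bit word");

struct request {		/* hypothetical, heavily simplified */
	unsigned int tag;	/* fixed index into the preallocated array */
	uint64_t sector;	/* per-I/O payload, filled in by the submitter */
};

struct hw_queue {
	struct request rqs[QUEUE_DEPTH];	/* allocated once, with the queue */
	uint64_t free_map;			/* bit n set => tag n is free */
};

void hwq_init(struct hw_queue *q)
{
	for (unsigned int i = 0; i < QUEUE_DEPTH; i++)
		q->rqs[i].tag = i;
	q->free_map = (1ULL << QUEUE_DEPTH) - 1;
}

/* Fast path: claim the lowest free tag and return its preallocated
 * request. No malloc(); NULL means every tag is in flight. */
struct request *hwq_get(struct hw_queue *q)
{
	unsigned int tag;

	if (q->free_map == 0)
		return NULL;
	tag = (unsigned int)__builtin_ctzll(q->free_map);
	q->free_map &= ~(1ULL << tag);
	return &q->rqs[tag];
}

/* Completion: give the tag (and with it the request) back. */
void hwq_put(struct hw_queue *q, struct request *rq)
{
	assert(rq->tag < QUEUE_DEPTH);
	assert(!(q->free_map & (1ULL << rq->tag)));	/* catch a double put */
	q->free_map |= 1ULL << rq->tag;
}
```

The request memory lives and dies with the queue, so the per-I/O cost is a bit search and two bit flips rather than an allocator call.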

-- 
Jens Axboe


Re: [PATCH] Fix for bcache regression

2013-10-10 Thread Jens Axboe
On Thu, Oct 10 2013, Kent Overstreet wrote:
> Linus, please apply - the last fix in the bugfix series I sent you had an
> embarrassing screwup...
> 
> For 3.13, shall I start sending you pull requests directly?

Sorry I dropped the ball on that one; it's the first time I've ever
missed a deadline. Things are ramping up again, and I'll have plenty of
kernel time again shortly.

-- 
Jens Axboe


Re: [PATCH] bcache: Fix a shrinker deadlock

2013-08-30 Thread Jens Axboe
On Fri, Aug 30 2013, Kent Overstreet wrote:
> GFP_NOIO means we could be getting called recursively - mca_alloc() ->
> mca_data_alloc() - definitely can't use mutex_lock(bucket_lock) then.
> Whoops.

Kent, can you provide an updated repo with the pending patches? There's
been some churn here lately (and good fixes), and I'd like to ensure I
don't miss anything.
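The deadlock Kent describes arises because the shrinker can be entered from an allocation made while `bucket_lock` is already held. A common way out is to block on the lock only when the allocation context permits I/O (`__GFP_IO` set), and otherwise merely try the lock and return -1 ("can't do anything right now") if it is busy. The patch body is not quoted here, so the following is only a userspace sketch of that general pattern with pthreads; `struct cache`, `cache_shrink()` and the `may_block` flag are hypothetical.

```c
#include <assert.h>
#include <pthread.h>
#include <stdbool.h>

/* Hypothetical cache guarded by one non-recursive lock. */
struct cache {
	pthread_mutex_t bucket_lock;
	int nr_cached;
};

/* Reclaim callback. may_block stands in for "__GFP_IO is set": when it is
 * false we may have been called from a path that already holds
 * bucket_lock, so blocking on it again would deadlock. */
int cache_shrink(struct cache *c, int nr, bool may_block)
{
	int freed;

	if (may_block)
		pthread_mutex_lock(&c->bucket_lock);
	else if (pthread_mutex_trylock(&c->bucket_lock) != 0)
		return -1;	/* lock busy: skip this round of reclaim */

	freed = nr < c->nr_cached ? nr : c->nr_cached;
	c->nr_cached -= freed;
	pthread_mutex_unlock(&c->bucket_lock);
	return freed;
}
```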

-- 
Jens Axboe
