Re: [dm-devel] dm-verity OOPSes on kernel 4.12rc
On 05/18/2017 08:55 AM, Gilad Ben-Yossef wrote: > On Thu, May 18, 2017 at 9:37 AM, Milan Broz wrote: >> Hi Gilad, >> >> seems this OOPs is caused by async crypto hash changes in 4.12 for dm-verity. >> > > Oy, that is not good. > >> Could you please check if it is some known problem? >> > > I am not aware of any problem but of course I will try to reproduce an debug. > >> Fedora rawhide x86_64 (with 4.12rc patches) crashes always, >> running verity-compat-test from cryptsetup testsuite is enough to trigger >> this. >> >> I am not able to reproduce it on other distro but I guess it is just >> some kernel debugging switch that is enabled in Fedora by default. >> > > hmm... looking at the trace - there seems to be buffer io errors *before* the > BUG assert. That is surprising for a KVM run. > > Also the code path triggering this I believe is when we read hash blocks. > Possibly this is triggered by an IO request failing and than we continue > with trying to hash the block. This is just a guess... I will take a look. Thanks! FYI, I am able to reproduce it on 32bit kernel as well, git from Linus' tree updated today. All I need to add is some kernel debug instrumentation, these options: +CONFIG_DEBUG_LIST=y +CONFIG_DEBUG_PI_LIST=y +CONFIG_DEBUG_SG=y then it crashes: : kernel BUG at ./include/linux/scatterlist.h:140! : invalid opcode: [#1] PREEMPT SMP : Modules linked in: dm_verity reed_solomon dm_bufio loop dm_mod crc32_pclmul crc32c_intel pcbc aesni_intel aes_i586 crypto_simd cryptd ata_piix : CPU: 3 PID: 3331 Comm: kworker/u8:6 Not tainted 4.12.0-rc1+ #187 : Hardware name: VMware, Inc. VMware Virtual Platform/440BX Desktop Reference Platform, BIOS 6.00 07/02/2015 : Workqueue: kverityd verity_work [dm_verity] : task: d89a2380 task.stack: d8a26000 : EIP: sg_init_one+0x80/0xa0 : EFLAGS: 00010246 CPU: 3 : EAX: EBX: ECX: EDX: d8a27d7c : ESI: d8a27d7c EDI: EBP: d8a27d70 ESP: d8a27d60 : DS: 007b ES: 007b FS: 00d8 GS: 00e0 SS: 0068 : CR0: 80050033 CR2: bfd71bcc CR3: 18d33000 CR4: 001406d0 : Call Trace: : verity_hash_update.isra.1+0x28/0xb0 [dm_verity] : verity_hash_init+0x84/0xe0 [dm_verity] : verity_hash+0x2e/0x70 [dm_verity] : verity_verify_level+0x122/0x1b0 [dm_verity] : verity_hash_for_block+0xab/0xf0 [dm_verity] : verity_work+0x6d/0x1b1 [dm_verity] : ? sched_clock_cpu+0x19/0x130 : process_one_work+0x1c7/0x5c0 : worker_thread+0x39/0x380 : kthread+0xd6/0x110 : ? process_one_work+0x5c0/0x5c0 : ? kthread_worker_fn+0x100/0x100 : ret_from_fork+0x21/0x2c : Code: c3 81 3e 21 43 65 87 75 2a 83 e1 01 75 2d 8b 45 f0 09 d3 89 7e 0c 89 5e 04 89 46 08 83 c4 04 5b 5e 5f 5d c3 8d b4 26 00 00 00 00 <0f> 0b 8d b6 00 00 00 00 0f 0b 8d b6 00 00 00 00 0f 0b 8d b4 26 : EIP: sg_init_one+0x80/0xa0 SS:ESP: 0068:d8a27d60 : ---[ end trace 9b16cafae4c60e01 ]--- >>> kernel: >>> >>> 4.12.0-0.rc1.git1.1.fc27.x86_64 >>> - latest 4.12 build in rawhide >>> >>> cryptsetup commit: >>> >>> d9a528922b5d1a15c72936ea2e5e87ce2d31bc3d >>> - HEAD of wip-luks2 branch as of 2017-05-16 >>> >>> ... >>> [ 1057.722305] buffer_io_error: 146 callbacks suppressed >>> [ 1057.723514] Buffer I/O error on dev dm-2, logical block 128, async >>> page read >>> [ 1058.364790] [ cut here ] >>> [ 1058.366235] kernel BUG at ./include/linux/scatterlist.h:140! >>> [ 1058.367567] invalid opcode: [#1] SMP >>> [ 1058.368545] Modules linked in: wp512 cast5_generic cast_common >>> des3_ede_x86_64 des_generic blowfish_generic blowfish_x86_64 >>> blowfish_common xfs libcrc32c vfat fat twofish_generic >>> twofish_x86_64_3way twofish_x86_64 twofish_common serpent_sse2_x86_64 >>> serpent_generic ablk_helper dm_verity reed_solomon dm_crypt loop >>> crct10dif_pclmul crc32_pclmul crc32c_intel ghash_clmulni_intel ppdev >>> virtio_net joydev virtio_balloon acpi_cpufreq tpm_tis tpm_tis_core >>> parport_pc tpm i2c_piix4 parport virtio_blk cirrus drm_kms_helper ttm >>> serio_raw drm virtio_pci virtio_ring virtio ata_generic pata_acpi [last >>> unloaded: scsi_debug] >>> [ 1058.368545] CPU: 0 PID: 61 Comm: kworker/u4:1 Not tainted >>> 4.12.0-0.rc1.git1.1.fc27.x86_64 #1 >>> [ 1058.368545] Hardware name: Red Hat KVM, BIOS 0.5.1 01/01/2007 >>> [ 1058.368545] Workqueue: kverityd verity_work [dm_verity] >>> [ 1058.368545] task: 97ba361cb100 task.stack: a5b240738000 >>> [ 1058.368545] RIP: 0010:sg_init_one+0x8c/0xa0 >>> [ 1058.368545] RSP: 0018:a5b24073bb30 EFLAGS: 00010246 >>> [ 1058.368545] RAX: RBX: RCX: >>> a5b24073bb80 >>> [ 1058.368545] RDX: 6846 RSI: 87654321 RDI: >>> 8000 >>> [ 1058.368545] RBP: a5b24073bb48 R08: a5b24073bb58 R09: >>> >>> [ 1058.368545] R10: a5b24073bb58 R11: R12: >>> >>> [ 1058.368545] R13: a5b24073bb58 R14: a5b24073bbe8 R15: >>> 97ba743fe400 >>> [ 1058.368545] FS: () GS:97ba7780() >>> knl
[dm-devel] kernel BUG at lib/percpu-refcount.c:192
When running `lvm lvcreate --type raid5 --size 4194304B --name meta_r5 LVMTEST12279nqymhamb_vg --yes` there's a warning followed by the bug. Also at [ 192.140142] it looks like there is an uninitialized variable somewhere. env: rawhide, kvm, 2 (V)CPUs lvm2: upstream kernel: 4.12.0-0.rc1.git1.1.fc27.x86_64 [ 192.131076] mdX: bitmap file is out of date, doing full recovery [ 192.140142] percpu ref ( (null)) <= 0 (-9223372036854775807) after switching to atomic [ 192.140180] [ cut here ] [ 192.143734] WARNING: CPU: 0 PID: 10 at lib/percpu-refcount.c:155 percpu_ref_switch_to_atomic_rcu+0x1cd/0x1f0 [ 192.146353] Modules linked in: dm_raid raid456 async_raid6_recov async_memcpy async_pq async_xor async_tx xor raid6_pq ...|dm_thin_pool dm_cache_smq dm_cache dm_persistent_data dm_bio_prison libcrc32c loop crct10dif_pclmul crc32_pclmul crc32c_intel ...|ppdev ghash_clmulni_intel acpi_cpufreq virtio_net virtio_balloon tpm_tis joydev tpm_tis_core parport_pc parport tpm i2c_piix4 ...|virtio_blk cirrus drm_kms_helper ttm drm serio_raw virtio_pci virtio_ring virtio ata_generic pata_acpi [ 192.154276] CPU: 0 PID: 10 Comm: rcuos/0 Not tainted 4.12.0-0.rc1.git1.1.fc27.x86_64 #1 [ 192.155684] Hardware name: Red Hat KVM, BIOS 0.5.1 01/01/2007 [ 192.156741] task: 8d5bb6e73100 task.stack: a9a3c036 [ 192.157830] RIP: 0010:percpu_ref_switch_to_atomic_rcu+0x1cd/0x1f0 [ 192.158956] RSP: 0018:a9a3c0363e50 EFLAGS: 00010282 [ 192.159899] RAX: 0053 RBX: 8000 RCX: [ 192.161192] RDX: RSI: 8d5bb6e73d90 RDI: 0247 [ 192.162483] RBP: a9a3c0363e68 R08: R09: [ 192.163762] R10: a9a3c0363e00 R11: R12: [ 192.165144] R13: 8d5bb5252580 R14: 8d5bb5252580 R15: [ 192.166437] FS: () GS:8d5bb780() knlGS: [ 192.169392] CS: 0010 DS: ES: CR0: 80050033 [ 192.171893] CR2: 7ffc83d87d98 CR3: 13e11000 CR4: 06f0 [ 192.174692] Call Trace: [ 192.176627] ? percpu_ref_reinit+0x140/0x140 [ 192.178875] rcu_nocb_kthread+0x1ae/0x580 [ 192.181174] kthread+0x133/0x150 [ 192.183305] ? rcu_eqs_enter_common.constprop.68+0x1c0/0x1c0 [ 192.185813] ? kthread_create_on_node+0x70/0x70 [ 192.188110] ret_from_fork+0x31/0x40 [ 192.190338] Code: ff ff ff 80 3d a6 71 b9 00 00 0f 85 c1 fe ff ff 49 8b 55 d8 49 8b 75 e8 48 c7 c7 f0 2b ce 8b c6 05 8a 71 b9 00 ...|01 e8 89 95 d4 ff <0f> ff e9 9f fe ff ff f0 49 83 6d d8 01 0f 85 05 ff ff ff 48 89 [ 192.196842] ---[ end trace fbf926294c454ffa ]--- [ 192.199425] [ cut here ] [ 192.200346] kernel BUG at lib/percpu-refcount.c:192! [ 192.200346] invalid opcode: [#1] SMP [ 192.200346] Modules linked in: dm_raid raid456 async_raid6_recov async_memcpy async_pq async_xor async_tx xor raid6_pq ...|dm_thin_pool dm_cache_smq dm_cache dm_persistent_data dm_bio_prison libcrc32c loop crct10dif_pclmul crc32_pclmul crc32c_intel ...|ppdev ghash_clmulni_intel acpi_cpufreq virtio_net virtio_balloon tpm_tis joydev tpm_tis_core parport_pc parport tpm i2c_piix4 ...|virtio_blk cirrus drm_kms_helper ttm drm serio_raw virtio_pci virtio_ring virtio ata_generic pata_acpi [ 192.211647] CPU: 0 PID: 13767 Comm: mdX_raid5 Tainted: GW 4.12.0-0.rc1.git1.1.fc27.x86_64 #1 [ 192.211647] Hardware name: Red Hat KVM, BIOS 0.5.1 01/01/2007 [ 192.211647] task: 8d5bb15f3100 task.stack: a9a3c0d1 [ 192.211647] RIP: 0010:__percpu_ref_switch_mode+0x278/0x280 [ 192.211647] RSP: 0018:a9a3c0d13ce8 EFLAGS: 00010046 [ 192.211647] RAX: 0001 RBX: 8d5bb5252558 RCX: [ 192.211647] RDX: 0001 RSI: 8c05a098 RDI: 0046 [ 192.211647] RBP: a9a3c0d13d30 R08: 0001 R09: [ 192.211647] R10: a9a3c0d13ca0 R11: 8c05a098 R12: [ 192.211647] R13: R14: 8d5bb15f3100 R15: 8d5bb508f000 [ 192.211647] FS: () GS:8d5bb780() knlGS: [ 192.211647] CS: 0010 DS: ES: CR0: 80050033 [ 192.211647] CR2: 7ffc83d87d98 CR3: 13e11000 CR4: 06f0 [ 192.211647] Call Trace: [ 192.211647] percpu_ref_switch_to_percpu+0x27/0x40 [ 192.211647] set_in_sync+0xd4/0xe0 [ 192.211647] md_check_recovery+0x1f5/0x4e0 [ 192.211647] raid5d+0x56/0x6a0 [raid456] [ 192.211647] ? finish_wait+0x72/0x90 [ 192.211647] ? trace_hardirqs_on_caller+0xf4/0x190 [ 192.211647] ? trace_hardirqs_on+0xd/0x10 [ 192.211647] md_thread+0x138/0x180 [ 192.211647] ? md_thread+0x138/0x180 [ 192.211647] ? finish_wait+0x90/0x90 [ 192.211647] kthread+0x133/0x150 [ 192.211647] ? find_pers+0x70/0x70 [ 192.211647] ? kthread_create_on
Re: [dm-devel] dm-verity OOPSes on kernel 4.12rc
On Thu, May 18, 2017 at 10:18 AM, Milan Broz wrote: > On 05/18/2017 08:55 AM, Gilad Ben-Yossef wrote: >> On Thu, May 18, 2017 at 9:37 AM, Milan Broz wrote: >>> Hi Gilad, >>> >>> seems this OOPs is caused by async crypto hash changes in 4.12 for >>> dm-verity. >>> >> >> Oy, that is not good. >> >>> Could you please check if it is some known problem? >>> >> >> I am not aware of any problem but of course I will try to reproduce an debug. >> >>> Fedora rawhide x86_64 (with 4.12rc patches) crashes always, >>> running verity-compat-test from cryptsetup testsuite is enough to trigger >>> this. >>> >>> I am not able to reproduce it on other distro but I guess it is just >>> some kernel debugging switch that is enabled in Fedora by default. >>> >> >> hmm... looking at the trace - there seems to be buffer io errors *before* the >> BUG assert. That is surprising for a KVM run. >> >> Also the code path triggering this I believe is when we read hash blocks. >> Possibly this is triggered by an IO request failing and than we continue >> with trying to hash the block. This is just a guess... I will take a look. > > Thanks! > > FYI, I am able to reproduce it on 32bit kernel as well, > git from Linus' tree updated today. OK, I know what the problem is. It's triggered by having no salt value. Testing fix... Thanks all, Gilad -- Gilad Ben-Yossef Chief Coffee Drinker "If you take a class in large-scale robotics, can you end up in a situation where the homework eats your dog?" -- Jean-Baptiste Queru -- dm-devel mailing list dm-devel@redhat.com https://www.redhat.com/mailman/listinfo/dm-devel
[dm-devel] [PATCH] dm verity: fix no salt used use case
DM-Verity has an (undocumented) mode where no salt is used. This was never handled directly by the DM-Verity code, instead working due to the fact that calling crypto_shash_update() with a zero length data is an implicit noop. This is no longer the case now that we have switched to crypto_ahash_update(). Fix the issue by introducing an explicit handling of the no salt use case to DM-Verity. Signed-off-by: Gilad Ben-Yossef Reported-by: Marian Csontos Fixes: d1ac3ff ("dm verity: switch to using asynchronous hash crypto API") CC: Milan Broz --- drivers/md/dm-verity-target.c | 4 ++-- 1 file changed, 2 insertions(+), 2 deletions(-) diff --git a/drivers/md/dm-verity-target.c b/drivers/md/dm-verity-target.c index 97de961..1ec9b2c 100644 --- a/drivers/md/dm-verity-target.c +++ b/drivers/md/dm-verity-target.c @@ -166,7 +166,7 @@ static int verity_hash_init(struct dm_verity *v, struct ahash_request *req, return r; } - if (likely(v->version >= 1)) + if (likely(v->salt_size && (v->version >= 1))) r = verity_hash_update(v, req, v->salt, v->salt_size, res); return r; @@ -177,7 +177,7 @@ static int verity_hash_final(struct dm_verity *v, struct ahash_request *req, { int r; - if (unlikely(!v->version)) { + if (unlikely(v->salt_size && (!v->version))) { r = verity_hash_update(v, req, v->salt, v->salt_size, res); if (r < 0) { -- 2.1.4 -- dm-devel mailing list dm-devel@redhat.com https://www.redhat.com/mailman/listinfo/dm-devel
Re: [dm-devel] [PATCH] dm verity: fix no salt used use case
On 05/18/2017 12:47 PM, Gilad Ben-Yossef wrote: > DM-Verity has an (undocumented) mode where no salt is used. > This was never handled directly by the DM-Verity code, instead working due > to the fact that calling crypto_shash_update() with a zero length data is > an implicit noop. > > This is no longer the case now that we have switched to > crypto_ahash_update(). Fix the issue by introducing an explicit handling > of the no salt use case to DM-Verity. > > Signed-off-by: Gilad Ben-Yossef > Reported-by: Marian Csontos > Fixes: d1ac3ff ("dm verity: switch to using asynchronous hash crypto API") > CC: Milan Broz Tested-by: Milan Broz Thanks for quick fix! Mike: this must go to 4.12rc (only). m. > --- > drivers/md/dm-verity-target.c | 4 ++-- > 1 file changed, 2 insertions(+), 2 deletions(-) > > diff --git a/drivers/md/dm-verity-target.c b/drivers/md/dm-verity-target.c > index 97de961..1ec9b2c 100644 > --- a/drivers/md/dm-verity-target.c > +++ b/drivers/md/dm-verity-target.c > @@ -166,7 +166,7 @@ static int verity_hash_init(struct dm_verity *v, struct > ahash_request *req, > return r; > } > > - if (likely(v->version >= 1)) > + if (likely(v->salt_size && (v->version >= 1))) > r = verity_hash_update(v, req, v->salt, v->salt_size, res); > > return r; > @@ -177,7 +177,7 @@ static int verity_hash_final(struct dm_verity *v, struct > ahash_request *req, > { > int r; > > - if (unlikely(!v->version)) { > + if (unlikely(v->salt_size && (!v->version))) { > r = verity_hash_update(v, req, v->salt, v->salt_size, res); > > if (r < 0) { > -- dm-devel mailing list dm-devel@redhat.com https://www.redhat.com/mailman/listinfo/dm-devel
[dm-devel] [PATCH 04/15] dm: fix REQ_RAHEAD handling
A few (but not all) dm targets use a special EWOULDBLOCK error code for failing REQ_RAHEAD requests that fail due to a lack of available resources. But no one else knows about this magic code, and lower level drivers also don't generate it when failing read-ahead requests for similar reasons. So remove this special casing and ignore all additional error handling for REQ_RAHEAD - if this was a real underlying error we'd get a normal read once the real read comes in. Signed-off-by: Christoph Hellwig --- drivers/md/dm-raid1.c | 4 ++-- drivers/md/dm-stripe.c | 2 +- 2 files changed, 3 insertions(+), 3 deletions(-) diff --git a/drivers/md/dm-raid1.c b/drivers/md/dm-raid1.c index a95cbb80fb34..5e30b08b91d9 100644 --- a/drivers/md/dm-raid1.c +++ b/drivers/md/dm-raid1.c @@ -1214,7 +1214,7 @@ static int mirror_map(struct dm_target *ti, struct bio *bio) */ if (!r || (r == -EWOULDBLOCK)) { if (bio->bi_opf & REQ_RAHEAD) - return -EWOULDBLOCK; + return -EIO; queue_bio(ms, bio, rw); return DM_MAPIO_SUBMITTED; @@ -1258,7 +1258,7 @@ static int mirror_end_io(struct dm_target *ti, struct bio *bio, int error) if (error == -EOPNOTSUPP) return error; - if ((error == -EWOULDBLOCK) && (bio->bi_opf & REQ_RAHEAD)) + if (bio->bi_opf & REQ_RAHEAD) return error; if (unlikely(error)) { diff --git a/drivers/md/dm-stripe.c b/drivers/md/dm-stripe.c index 75152482f3ad..780e95889a7c 100644 --- a/drivers/md/dm-stripe.c +++ b/drivers/md/dm-stripe.c @@ -384,7 +384,7 @@ static int stripe_end_io(struct dm_target *ti, struct bio *bio, int error) if (!error) return 0; /* I/O complete */ - if ((error == -EWOULDBLOCK) && (bio->bi_opf & REQ_RAHEAD)) + if (bio->bi_opf & REQ_RAHEAD) return error; if (error == -EOPNOTSUPP) -- 2.11.0 -- dm-devel mailing list dm-devel@redhat.com https://www.redhat.com/mailman/listinfo/dm-devel
[dm-devel] [PATCH 07/15] block_dev: propagate bio_iov_iter_get_pages error in __blkdev_direct_IO
Once we move the block layer to its own status code we'll still want to propagate the bio_iov_iter_get_pages, so restructure __blkdev_direct_IO to take ret into account when returning the errno. --- fs/block_dev.c | 7 --- 1 file changed, 4 insertions(+), 3 deletions(-) diff --git a/fs/block_dev.c b/fs/block_dev.c index 51959936..c1dc393ad6b9 100644 --- a/fs/block_dev.c +++ b/fs/block_dev.c @@ -334,7 +334,7 @@ __blkdev_direct_IO(struct kiocb *iocb, struct iov_iter *iter, int nr_pages) bool is_read = (iov_iter_rw(iter) == READ), is_sync; loff_t pos = iocb->ki_pos; blk_qc_t qc = BLK_QC_T_NONE; - int ret; + int ret = 0; if ((pos | iov_iter_alignment(iter)) & (bdev_logical_block_size(bdev) - 1)) @@ -363,7 +363,7 @@ __blkdev_direct_IO(struct kiocb *iocb, struct iov_iter *iter, int nr_pages) ret = bio_iov_iter_get_pages(bio, iter); if (unlikely(ret)) { - bio->bi_error = ret; + bio->bi_error = -EIO; bio_endio(bio); break; } @@ -412,7 +412,8 @@ __blkdev_direct_IO(struct kiocb *iocb, struct iov_iter *iter, int nr_pages) } __set_current_state(TASK_RUNNING); - ret = dio->bio.bi_error; + if (!ret) + ret = dio->bio.bi_error; if (likely(!ret)) ret = dio->size; -- 2.11.0 -- dm-devel mailing list dm-devel@redhat.com https://www.redhat.com/mailman/listinfo/dm-devel
[dm-devel] [PATCH 02/15] scsi/osd: don't save block errors into req_results
We will only have sense data if the command exectured and got a SCSI result, so this is pointless. Signed-off-by: Christoph Hellwig --- drivers/scsi/osd/osd_initiator.c | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/drivers/scsi/osd/osd_initiator.c b/drivers/scsi/osd/osd_initiator.c index 8a1b94816419..14785177ce7b 100644 --- a/drivers/scsi/osd/osd_initiator.c +++ b/drivers/scsi/osd/osd_initiator.c @@ -477,7 +477,7 @@ static void _set_error_resid(struct osd_request *or, struct request *req, int error) { or->async_error = error; - or->req_errors = scsi_req(req)->result ? : error; + or->req_errors = scsi_req(req)->result; or->sense_len = scsi_req(req)->sense_len; if (or->sense_len) memcpy(or->sense, scsi_req(req)->sense, or->sense_len); -- 2.11.0 -- dm-devel mailing list dm-devel@redhat.com https://www.redhat.com/mailman/listinfo/dm-devel
[dm-devel] [PATCH 03/15] gfs2: remove the unused sd_log_error field
Signed-off-by: Christoph Hellwig --- fs/gfs2/incore.h | 1 - fs/gfs2/lops.c | 4 +--- 2 files changed, 1 insertion(+), 4 deletions(-) diff --git a/fs/gfs2/incore.h b/fs/gfs2/incore.h index b7cf65d13561..aa3d44527fa2 100644 --- a/fs/gfs2/incore.h +++ b/fs/gfs2/incore.h @@ -815,7 +815,6 @@ struct gfs2_sbd { atomic_t sd_log_in_flight; struct bio *sd_log_bio; wait_queue_head_t sd_log_flush_wait; - int sd_log_error; atomic_t sd_reserving_log; wait_queue_head_t sd_reserving_log_wait; diff --git a/fs/gfs2/lops.c b/fs/gfs2/lops.c index b1f9144b42c7..13ebf15a4db0 100644 --- a/fs/gfs2/lops.c +++ b/fs/gfs2/lops.c @@ -209,10 +209,8 @@ static void gfs2_end_log_write(struct bio *bio) struct page *page; int i; - if (bio->bi_error) { - sdp->sd_log_error = bio->bi_error; + if (bio->bi_error) fs_err(sdp, "Error %d writing to log\n", bio->bi_error); - } bio_for_each_segment_all(bvec, bio, i) { page = bvec->bv_page; -- 2.11.0 -- dm-devel mailing list dm-devel@redhat.com https://www.redhat.com/mailman/listinfo/dm-devel
[dm-devel] [PATCH 14/15] blk-mq: switch ->queue_rq return value to blk_status_t
Use the same values for use for request completion errors as the return value from ->queue_rq. BLK_STS_RESOURCE is special cased to cause a requeue, and all the others are completed as-is. Signed-off-by: Christoph Hellwig --- block/blk-mq.c| 37 -- drivers/block/loop.c | 6 +++--- drivers/block/mtip32xx/mtip32xx.c | 17 drivers/block/nbd.c | 12 --- drivers/block/null_blk.c | 4 ++-- drivers/block/rbd.c | 4 ++-- drivers/block/virtio_blk.c| 10 +- drivers/block/xen-blkfront.c | 8 drivers/md/dm-rq.c| 8 drivers/mtd/ubi/block.c | 6 +++--- drivers/nvme/host/core.c | 14 ++--- drivers/nvme/host/fc.c| 23 +++-- drivers/nvme/host/nvme.h | 2 +- drivers/nvme/host/pci.c | 42 +++ drivers/nvme/host/rdma.c | 26 +--- drivers/nvme/target/loop.c| 17 drivers/scsi/scsi_lib.c | 30 ++-- include/linux/blk-mq.h| 7 ++- 18 files changed, 131 insertions(+), 142 deletions(-) diff --git a/block/blk-mq.c b/block/blk-mq.c index 1430939ecc0e..fdaad9fbebf6 100644 --- a/block/blk-mq.c +++ b/block/blk-mq.c @@ -987,7 +987,7 @@ bool blk_mq_dispatch_rq_list(struct request_queue *q, struct list_head *list) { struct blk_mq_hw_ctx *hctx; struct request *rq; - int errors, queued, ret = BLK_MQ_RQ_QUEUE_OK; + int errors, queued; if (list_empty(list)) return false; @@ -998,6 +998,7 @@ bool blk_mq_dispatch_rq_list(struct request_queue *q, struct list_head *list) errors = queued = 0; do { struct blk_mq_queue_data bd; + blk_status_t ret; rq = list_first_entry(list, struct request, queuelist); if (!blk_mq_get_driver_tag(rq, &hctx, false)) { @@ -1038,25 +1039,20 @@ bool blk_mq_dispatch_rq_list(struct request_queue *q, struct list_head *list) } ret = q->mq_ops->queue_rq(hctx, &bd); - switch (ret) { - case BLK_MQ_RQ_QUEUE_OK: - queued++; - break; - case BLK_MQ_RQ_QUEUE_BUSY: + if (ret == BLK_STS_RESOURCE) { blk_mq_put_driver_tag_hctx(hctx, rq); list_add(&rq->queuelist, list); __blk_mq_requeue_request(rq); break; - default: - pr_err("blk-mq: bad return on queue: %d\n", ret); - case BLK_MQ_RQ_QUEUE_ERROR: + } + + if (unlikely(ret != BLK_STS_OK)) { errors++; blk_mq_end_request(rq, BLK_STS_IOERR); - break; + continue; } - if (ret == BLK_MQ_RQ_QUEUE_BUSY) - break; + queued++; } while (!list_empty(list)); hctx->dispatched[queued_to_index(queued)]++; @@ -1094,7 +1090,7 @@ bool blk_mq_dispatch_rq_list(struct request_queue *q, struct list_head *list) * - blk_mq_run_hw_queue() checks whether or not a queue has * been stopped before rerunning a queue. * - Some but not all block drivers stop a queue before -* returning BLK_MQ_RQ_QUEUE_BUSY. Two exceptions are scsi-mq +* returning BLK_STS_RESOURCE. Two exceptions are scsi-mq * and dm-rq. */ if (!blk_mq_sched_needs_restart(hctx) && @@ -1490,7 +1486,7 @@ static void __blk_mq_try_issue_directly(struct request *rq, blk_qc_t *cookie, }; struct blk_mq_hw_ctx *hctx; blk_qc_t new_cookie; - int ret; + blk_status_t ret; if (q->elevator) goto insert; @@ -1506,18 +1502,19 @@ static void __blk_mq_try_issue_directly(struct request *rq, blk_qc_t *cookie, * would have done */ ret = q->mq_ops->queue_rq(hctx, &bd); - if (ret == BLK_MQ_RQ_QUEUE_OK) { + switch (ret) { + case BLK_STS_OK: *cookie = new_cookie; return; - } - - if (ret == BLK_MQ_RQ_QUEUE_ERROR) { + case BLK_STS_RESOURCE: + __blk_mq_requeue_request(rq); + goto insert; + default: *cookie = BLK_QC_T_NONE; - blk_mq_end_request(rq, BLK_STS_IOERR); + blk_mq_end_request(rq, ret); return; } - __blk_mq_requeue_request(rq); insert: blk_mq_sched_insert_request(rq, false, true, false, may_sleep); } diff --git a/drivers/block/loop.c b/drivers/block/loop.
[dm-devel] [PATCH 13/15] block: introduce new block status code type
Currently we use nornal Linux errno values in the block layer, and while we accept any error a few have overloaded magic meanings. This patch instead introduces a new blk_status_t value that holds block layer specific status codes and explicitly explains their meaning. Helpers to convert from and to the previous special meanings are provided for now, but I suspect we want to get rid of them in the long run - those drivers that have a errno input (e.g. networking) usually get errnos that don't know about the special block layer overloads, and similarly returning them to userspace will usually return somethings that strictly speaking isn't correct for file system operations, but that's left as an exercise for later. For now the set of errors is a very limited set that closely corresponds to the previous overloaded errno values, but there is some low hanging fruite to improve it. blk_status_t (ab)uses the sparse __bitwise annotations to allow for sparse typechecking, so that we can easily catch places passing the wrong values. Signed-off-by: Christoph Hellwig --- arch/s390/include/asm/eadm.h| 6 +- arch/um/drivers/ubd_kern.c | 2 +- block/blk-core.c| 156 +--- block/blk-exec.c| 4 +- block/blk-flush.c | 8 +- block/blk-mq.c | 10 +-- block/bsg-lib.c | 4 +- block/bsg.c | 6 +- drivers/block/DAC960.c | 2 +- drivers/block/amiflop.c | 10 +-- drivers/block/aoe/aoecmd.c | 2 +- drivers/block/ataflop.c | 16 ++-- drivers/block/cciss.c | 3 +- drivers/block/floppy.c | 4 +- drivers/block/loop.c| 2 +- drivers/block/mtip32xx/mtip32xx.c | 16 ++-- drivers/block/mtip32xx/mtip32xx.h | 2 +- drivers/block/nbd.c | 14 ++-- drivers/block/null_blk.c| 9 ++- drivers/block/paride/pcd.c | 8 +- drivers/block/paride/pd.c | 2 +- drivers/block/paride/pf.c | 18 ++--- drivers/block/ps3disk.c | 11 +-- drivers/block/rbd.c | 8 +- drivers/block/skd_main.c| 31 --- drivers/block/sunvdc.c | 4 +- drivers/block/swim.c| 6 +- drivers/block/swim3.c | 26 +++--- drivers/block/sx8.c | 20 ++--- drivers/block/virtio_blk.c | 10 +-- drivers/block/xen-blkfront.c| 16 ++-- drivers/block/xsysace.c | 8 +- drivers/block/z2ram.c | 4 +- drivers/cdrom/gdrom.c | 9 ++- drivers/ide/ide-atapi.c | 9 ++- drivers/ide/ide-cd.c| 10 +-- drivers/ide/ide-dma.c | 2 +- drivers/ide/ide-eh.c| 16 ++-- drivers/ide/ide-floppy.c| 6 +- drivers/ide/ide-io.c| 10 +-- drivers/ide/ide-pm.c| 6 +- drivers/ide/ide-tape.c | 2 +- drivers/ide/ide-taskfile.c | 6 +- drivers/ide/siimage.c | 6 +- drivers/md/dm-mpath.c | 27 +++ drivers/md/dm-rq.c | 20 ++--- drivers/md/dm-rq.h | 2 +- drivers/memstick/core/ms_block.c| 7 +- drivers/memstick/core/mspro_block.c | 8 +- drivers/mmc/core/block.c| 37 + drivers/mmc/core/queue.c| 2 +- drivers/mtd/mtd_blkdevs.c | 30 --- drivers/mtd/ubi/block.c | 2 +- drivers/nvme/host/core.c| 29 +++ drivers/nvme/host/lightnvm.c| 2 +- drivers/nvme/host/pci.c | 8 +- drivers/s390/block/dasd.c | 36 + drivers/s390/block/scm_blk.c| 8 +- drivers/s390/block/scm_blk.h| 4 +- drivers/s390/cio/eadm_sch.c | 6 +- drivers/s390/cio/scm.c | 2 +- drivers/sbus/char/jsflash.c | 4 +- drivers/scsi/osd/osd_initiator.c| 20 ++--- drivers/scsi/osst.c | 2 +- drivers/scsi/scsi_error.c | 2 +- drivers/scsi/scsi_lib.c | 51 drivers/scsi/scsi_transport_sas.c | 2 +- drivers/scsi/sg.c | 6 +- drivers/scsi/st.c | 2 +- drivers/target/target_core_pscsi.c | 4 +- include/linux/bio.h | 16 include/linux/blk-mq.h | 4 +- include/linux/blkdev.h | 21 ++--- include/linux/device-mapper.h | 2 +- include/linux/ide.h | 6 +- include/scsi/osd_initiator.h| 2 +- 76 files changed, 475 insertions(+), 429 deletions(-) diff --git a/arch/s390/include/asm/eadm.h b/arch/s390/include/asm/eadm.h index 67026300c88e..144809a3f4f6 100644 --- a/arch/s390/include/asm/eadm.h +++ b/arch/s390/include/asm/eadm.h @@ -3,6 +3,7 @@ #include #include +#include struct a
[dm-devel] [PATCH 10/15] dm: change ->end_io calling convention
Turn the error paramter into a pointer so that target drivers can change the value, and make sure only DM_ENDIO_* values are returned from the methods. Signed-off-by: Christoph Hellwig --- drivers/md/dm-cache-target.c | 4 ++-- drivers/md/dm-flakey.c| 8 drivers/md/dm-log-writes.c| 4 ++-- drivers/md/dm-mpath.c | 11 ++- drivers/md/dm-raid1.c | 14 +++--- drivers/md/dm-snap.c | 4 ++-- drivers/md/dm-stripe.c| 14 +++--- drivers/md/dm-thin.c | 4 ++-- drivers/md/dm.c | 36 ++-- include/linux/device-mapper.h | 2 +- 10 files changed, 51 insertions(+), 50 deletions(-) diff --git a/drivers/md/dm-cache-target.c b/drivers/md/dm-cache-target.c index d682a0511381..c48612e6d525 100644 --- a/drivers/md/dm-cache-target.c +++ b/drivers/md/dm-cache-target.c @@ -2820,7 +2820,7 @@ static int cache_map(struct dm_target *ti, struct bio *bio) return r; } -static int cache_end_io(struct dm_target *ti, struct bio *bio, int error) +static int cache_end_io(struct dm_target *ti, struct bio *bio, int *error) { struct cache *cache = ti->private; unsigned long flags; @@ -2838,7 +2838,7 @@ static int cache_end_io(struct dm_target *ti, struct bio *bio, int error) bio_drop_shared_lock(cache, bio); accounted_complete(cache, bio); - return 0; + return DM_ENDIO_DONE; } static int write_dirty_bitset(struct cache *cache) diff --git a/drivers/md/dm-flakey.c b/drivers/md/dm-flakey.c index e8f093b323ce..c9539917a59b 100644 --- a/drivers/md/dm-flakey.c +++ b/drivers/md/dm-flakey.c @@ -358,12 +358,12 @@ static int flakey_map(struct dm_target *ti, struct bio *bio) return DM_MAPIO_REMAPPED; } -static int flakey_end_io(struct dm_target *ti, struct bio *bio, int error) +static int flakey_end_io(struct dm_target *ti, struct bio *bio, int *error) { struct flakey_c *fc = ti->private; struct per_bio_data *pb = dm_per_bio_data(bio, sizeof(struct per_bio_data)); - if (!error && pb->bio_submitted && (bio_data_dir(bio) == READ)) { + if (!*error && pb->bio_submitted && (bio_data_dir(bio) == READ)) { if (fc->corrupt_bio_byte && (fc->corrupt_bio_rw == READ) && all_corrupt_bio_flags_match(bio, fc)) { /* @@ -377,11 +377,11 @@ static int flakey_end_io(struct dm_target *ti, struct bio *bio, int error) * Error read during the down_interval if drop_writes * and error_writes were not configured. */ - return -EIO; + *error = -EIO; } } - return error; + return DM_ENDIO_DONE; } static void flakey_status(struct dm_target *ti, status_type_t type, diff --git a/drivers/md/dm-log-writes.c b/drivers/md/dm-log-writes.c index e42264706c59..cc57c7fa1268 100644 --- a/drivers/md/dm-log-writes.c +++ b/drivers/md/dm-log-writes.c @@ -664,7 +664,7 @@ static int log_writes_map(struct dm_target *ti, struct bio *bio) return DM_MAPIO_REMAPPED; } -static int normal_end_io(struct dm_target *ti, struct bio *bio, int error) +static int normal_end_io(struct dm_target *ti, struct bio *bio, int *error) { struct log_writes_c *lc = ti->private; struct per_bio_data *pb = dm_per_bio_data(bio, sizeof(struct per_bio_data)); @@ -686,7 +686,7 @@ static int normal_end_io(struct dm_target *ti, struct bio *bio, int error) spin_unlock_irqrestore(&lc->blocks_lock, flags); } - return error; + return DM_ENDIO_DONE; } /* diff --git a/drivers/md/dm-mpath.c b/drivers/md/dm-mpath.c index 94b1aa917fcb..14f5b0d4ecce 100644 --- a/drivers/md/dm-mpath.c +++ b/drivers/md/dm-mpath.c @@ -1517,16 +1517,17 @@ static int multipath_end_io(struct dm_target *ti, struct request *clone, return r; } -static int multipath_end_io_bio(struct dm_target *ti, struct bio *clone, int error) +static int multipath_end_io_bio(struct dm_target *ti, struct bio *clone, int *error) { struct multipath *m = ti->private; struct dm_mpath_io *mpio = get_mpio_from_bio(clone); struct pgpath *pgpath = mpio->pgpath; unsigned long flags; + int r = DM_ENDIO_DONE; BUG_ON(!mpio); - if (!error || noretry_error(error)) + if (!*error || noretry_error(*error)) goto done; if (pgpath) @@ -1535,7 +1536,7 @@ static int multipath_end_io_bio(struct dm_target *ti, struct bio *clone, int err if (atomic_read(&m->nr_valid_paths) == 0 && !test_bit(MPATHF_QUEUE_IF_NO_PATH, &m->flags)) { dm_report_EIO(m); - error = -EIO; + *error = -EIO; goto done; } @@ -1548,7 +1549,7 @@ static int multipath_end_io_bio(struct dm_target *ti, struct bio *clone, in
[dm-devel] [PATCH 05/15] fs: remove the unused error argument to dio_end_io()
Signed-off-by: Christoph Hellwig --- fs/btrfs/inode.c | 6 +++--- fs/direct-io.c | 3 +-- include/linux/fs.h | 2 +- 3 files changed, 5 insertions(+), 6 deletions(-) diff --git a/fs/btrfs/inode.c b/fs/btrfs/inode.c index 17cbe9306faf..758b2666885e 100644 --- a/fs/btrfs/inode.c +++ b/fs/btrfs/inode.c @@ -8244,7 +8244,7 @@ static void btrfs_endio_direct_read(struct bio *bio) kfree(dip); dio_bio->bi_error = bio->bi_error; - dio_end_io(dio_bio, bio->bi_error); + dio_end_io(dio_bio); if (io_bio->end_io) io_bio->end_io(io_bio, err); @@ -8304,7 +8304,7 @@ static void btrfs_endio_direct_write(struct bio *bio) kfree(dip); dio_bio->bi_error = bio->bi_error; - dio_end_io(dio_bio, bio->bi_error); + dio_end_io(dio_bio); bio_put(bio); } @@ -8673,7 +8673,7 @@ static void btrfs_submit_direct(struct bio *dio_bio, struct inode *inode, * Releases and cleans up our dio_bio, no need to bio_put() * nor bio_endio()/bio_io_error() against dio_bio. */ - dio_end_io(dio_bio, ret); + dio_end_io(dio_bio); } if (io_bio) bio_put(io_bio); diff --git a/fs/direct-io.c b/fs/direct-io.c index a04ebea77de8..04247a6c3f73 100644 --- a/fs/direct-io.c +++ b/fs/direct-io.c @@ -348,13 +348,12 @@ static void dio_bio_end_io(struct bio *bio) /** * dio_end_io - handle the end io action for the given bio * @bio: The direct io bio thats being completed - * @error: Error if there was one * * This is meant to be called by any filesystem that uses their own dio_submit_t * so that the DIO specific endio actions are dealt with after the filesystem * has done it's completion work. */ -void dio_end_io(struct bio *bio, int error) +void dio_end_io(struct bio *bio) { struct dio *dio = bio->bi_private; diff --git a/include/linux/fs.h b/include/linux/fs.h index 803e5a9b2654..4388ab58843d 100644 --- a/include/linux/fs.h +++ b/include/linux/fs.h @@ -2843,7 +2843,7 @@ enum { DIO_SKIP_DIO_COUNT = 0x08, }; -void dio_end_io(struct bio *bio, int error); +void dio_end_io(struct bio *bio); ssize_t __blockdev_direct_IO(struct kiocb *iocb, struct inode *inode, struct block_device *bdev, struct iov_iter *iter, -- 2.11.0 -- dm-devel mailing list dm-devel@redhat.com https://www.redhat.com/mailman/listinfo/dm-devel
[dm-devel] [PATCH 08/15] dm mpath: merge do_end_io_bio into multipath_end_io_bio
This simplifies the code and especially the error passing a bit and will help with the next patch. Signed-off-by: Christoph Hellwig --- drivers/md/dm-mpath.c | 42 -- 1 file changed, 16 insertions(+), 26 deletions(-) diff --git a/drivers/md/dm-mpath.c b/drivers/md/dm-mpath.c index 3df056b73b66..b1cb0273b081 100644 --- a/drivers/md/dm-mpath.c +++ b/drivers/md/dm-mpath.c @@ -1510,24 +1510,26 @@ static int multipath_end_io(struct dm_target *ti, struct request *clone, return r; } -static int do_end_io_bio(struct multipath *m, struct bio *clone, -int error, struct dm_mpath_io *mpio) +static int multipath_end_io_bio(struct dm_target *ti, struct bio *clone, int error) { + struct multipath *m = ti->private; + struct dm_mpath_io *mpio = get_mpio_from_bio(clone); + struct pgpath *pgpath = mpio->pgpath; unsigned long flags; - if (!error) - return 0; /* I/O complete */ + BUG_ON(!mpio); - if (noretry_error(error)) - return error; + if (!error || noretry_error(error)) + goto done; - if (mpio->pgpath) - fail_path(mpio->pgpath); + if (pgpath) + fail_path(pgpath); if (atomic_read(&m->nr_valid_paths) == 0 && !test_bit(MPATHF_QUEUE_IF_NO_PATH, &m->flags)) { dm_report_EIO(m); - return -EIO; + error = -EIO; + goto done; } /* Queue for the daemon to resubmit */ @@ -1539,28 +1541,16 @@ static int do_end_io_bio(struct multipath *m, struct bio *clone, if (!test_bit(MPATHF_QUEUE_IO, &m->flags)) queue_work(kmultipathd, &m->process_queued_bios); - return DM_ENDIO_INCOMPLETE; -} - -static int multipath_end_io_bio(struct dm_target *ti, struct bio *clone, int error) -{ - struct multipath *m = ti->private; - struct dm_mpath_io *mpio = get_mpio_from_bio(clone); - struct pgpath *pgpath; - struct path_selector *ps; - int r; - - BUG_ON(!mpio); - - r = do_end_io_bio(m, clone, error, mpio); - pgpath = mpio->pgpath; + error = DM_ENDIO_INCOMPLETE; +done: if (pgpath) { - ps = &pgpath->pg->ps; + struct path_selector *ps = &pgpath->pg->ps; + if (ps->type->end_io) ps->type->end_io(ps, &pgpath->path, mpio->nr_bytes); } - return r; + return error; } /* -- 2.11.0 -- dm-devel mailing list dm-devel@redhat.com https://www.redhat.com/mailman/listinfo/dm-devel
[dm-devel] [PATCH 11/15] fs-writeback: move wbc_to_write_flags out of line
This way writeback.h doesn't need to pull in blk_types.h. Signed-off-by: Christoph Hellwig --- fs/fs-writeback.c | 13 + include/linux/writeback.h | 11 +-- 2 files changed, 14 insertions(+), 10 deletions(-) diff --git a/fs/fs-writeback.c b/fs/fs-writeback.c index 63ee2940775c..51b65591afe1 100644 --- a/fs/fs-writeback.c +++ b/fs/fs-writeback.c @@ -59,6 +59,19 @@ struct wb_writeback_work { struct wb_completion *done; /* set if the caller waits */ }; +#ifdef CONFIG_BLOCK +int wbc_to_write_flags(struct writeback_control *wbc) +{ + if (wbc->sync_mode == WB_SYNC_ALL) + return REQ_SYNC; + else if (wbc->for_kupdate || wbc->for_background) + return REQ_BACKGROUND; + + return 0; +} +EXPORT_SYMBOL_GPL(wbc_to_write_flags); +#endif + /* * If one wants to wait for one or more wb_writeback_works, each work's * ->done should be set to a wb_completion defined using the following diff --git a/include/linux/writeback.h b/include/linux/writeback.h index d5815794416c..c5ae6ef9b317 100644 --- a/include/linux/writeback.h +++ b/include/linux/writeback.h @@ -9,7 +9,6 @@ #include #include #include -#include struct bio; @@ -103,15 +102,7 @@ struct writeback_control { #endif }; -static inline int wbc_to_write_flags(struct writeback_control *wbc) -{ - if (wbc->sync_mode == WB_SYNC_ALL) - return REQ_SYNC; - else if (wbc->for_kupdate || wbc->for_background) - return REQ_BACKGROUND; - - return 0; -} +int wbc_to_write_flags(struct writeback_control *wbc); /* * A wb_domain represents a domain that wb's (bdi_writeback's) belong to -- 2.11.0 -- dm-devel mailing list dm-devel@redhat.com https://www.redhat.com/mailman/listinfo/dm-devel
[dm-devel] [PATCH 01/15] nvme-lightnvm: use blk_execute_rq in nvme_nvm_submit_user_cmd
Instead of reinventing it poorly. Signed-off-by: Christoph Hellwig Reviewed-by: Javier González --- drivers/nvme/host/lightnvm.c | 12 +--- 1 file changed, 1 insertion(+), 11 deletions(-) diff --git a/drivers/nvme/host/lightnvm.c b/drivers/nvme/host/lightnvm.c index f5df78ed1e10..f3885b5e56bd 100644 --- a/drivers/nvme/host/lightnvm.c +++ b/drivers/nvme/host/lightnvm.c @@ -571,13 +571,6 @@ static struct nvm_dev_ops nvme_nvm_dev_ops = { .max_phys_sect = 64, }; -static void nvme_nvm_end_user_vio(struct request *rq, int error) -{ - struct completion *waiting = rq->end_io_data; - - complete(waiting); -} - static int nvme_nvm_submit_user_cmd(struct request_queue *q, struct nvme_ns *ns, struct nvme_nvm_command *vcmd, @@ -608,7 +601,6 @@ static int nvme_nvm_submit_user_cmd(struct request_queue *q, rq->timeout = timeout ? timeout : ADMIN_TIMEOUT; rq->cmd_flags &= ~REQ_FAILFAST_DRIVER; - rq->end_io_data = &wait; if (ppa_buf && ppa_len) { ppa_list = dma_pool_alloc(dev->dma_pool, GFP_KERNEL, &ppa_dma); @@ -662,9 +654,7 @@ static int nvme_nvm_submit_user_cmd(struct request_queue *q, } submit: - blk_execute_rq_nowait(q, NULL, rq, 0, nvme_nvm_end_user_vio); - - wait_for_completion_io(&wait); + blk_execute_rq(q, NULL, rq, 0); if (nvme_req(rq)->flags & NVME_REQ_CANCELLED) ret = -EINTR; -- 2.11.0 -- dm-devel mailing list dm-devel@redhat.com https://www.redhat.com/mailman/listinfo/dm-devel
[dm-devel] dedicated error codes for the block layer
This series introduces a new blk_status_t error code type for the block layer so that we can have tigher control and explicit semantics for block layer errors. All but the last three patches are cleanups that lead to the new type. The series it mostly limited to the block layer and drivers, and touching file systems a little bit. The only major exception is btrfs, which does funny things with bios and thus sees a larger amount of propagation of the new blk_status_t. -- dm-devel mailing list dm-devel@redhat.com https://www.redhat.com/mailman/listinfo/dm-devel
[dm-devel] [PATCH 09/15] dm: don't return errnos from ->map
Instead use the special DM_MAPIO_KILL return value to return -EIO just like we do for the request based path. Note that dm-log-writes returned -ENOMEM in a few places, which now becomes -EIO instead. No consumer treats -ENOMEM special so this shouldn't be an issue (and it should use a mempool to start with to make guaranteed progress). Signed-off-by: Christoph Hellwig --- drivers/md/dm-crypt.c | 4 ++-- drivers/md/dm-flakey.c| 4 ++-- drivers/md/dm-integrity.c | 12 ++-- drivers/md/dm-log-writes.c| 4 ++-- drivers/md/dm-mpath.c | 13 ++--- drivers/md/dm-raid1.c | 6 +++--- drivers/md/dm-snap.c | 8 drivers/md/dm-target.c| 2 +- drivers/md/dm-verity-target.c | 6 +++--- drivers/md/dm-zero.c | 4 ++-- drivers/md/dm.c | 16 +++- 11 files changed, 46 insertions(+), 33 deletions(-) diff --git a/drivers/md/dm-crypt.c b/drivers/md/dm-crypt.c index ebf9e72d479b..f4b51809db21 100644 --- a/drivers/md/dm-crypt.c +++ b/drivers/md/dm-crypt.c @@ -2795,10 +2795,10 @@ static int crypt_map(struct dm_target *ti, struct bio *bio) * and is aligned to this size as defined in IO hints. */ if (unlikely((bio->bi_iter.bi_sector & ((cc->sector_size >> SECTOR_SHIFT) - 1)) != 0)) - return -EIO; + return DM_MAPIO_KILL; if (unlikely(bio->bi_iter.bi_size & (cc->sector_size - 1))) - return -EIO; + return DM_MAPIO_KILL; io = dm_per_bio_data(bio, cc->per_bio_data_size); crypt_io_init(io, cc, bio, dm_target_offset(ti, bio->bi_iter.bi_sector)); diff --git a/drivers/md/dm-flakey.c b/drivers/md/dm-flakey.c index 13305a182611..e8f093b323ce 100644 --- a/drivers/md/dm-flakey.c +++ b/drivers/md/dm-flakey.c @@ -321,7 +321,7 @@ static int flakey_map(struct dm_target *ti, struct bio *bio) if (bio_data_dir(bio) == READ) { if (!fc->corrupt_bio_byte && !test_bit(DROP_WRITES, &fc->flags) && !test_bit(ERROR_WRITES, &fc->flags)) - return -EIO; + return DM_MAPIO_KILL; goto map_bio; } @@ -349,7 +349,7 @@ static int flakey_map(struct dm_target *ti, struct bio *bio) /* * By default, error all I/O. */ - return -EIO; + return DM_MAPIO_KILL; } map_bio: diff --git a/drivers/md/dm-integrity.c b/drivers/md/dm-integrity.c index c7f7c8d76576..ee78fb471229 100644 --- a/drivers/md/dm-integrity.c +++ b/drivers/md/dm-integrity.c @@ -1352,13 +1352,13 @@ static int dm_integrity_map(struct dm_target *ti, struct bio *bio) DMERR("Too big sector number: 0x%llx + 0x%x > 0x%llx", (unsigned long long)dio->range.logical_sector, bio_sectors(bio), (unsigned long long)ic->provided_data_sectors); - return -EIO; + return DM_MAPIO_KILL; } if (unlikely((dio->range.logical_sector | bio_sectors(bio)) & (unsigned)(ic->sectors_per_block - 1))) { DMERR("Bio not aligned on %u sectors: 0x%llx, 0x%x", ic->sectors_per_block, (unsigned long long)dio->range.logical_sector, bio_sectors(bio)); - return -EIO; + return DM_MAPIO_KILL; } if (ic->sectors_per_block > 1) { @@ -1368,7 +1368,7 @@ static int dm_integrity_map(struct dm_target *ti, struct bio *bio) if (unlikely((bv.bv_offset | bv.bv_len) & ((ic->sectors_per_block << SECTOR_SHIFT) - 1))) { DMERR("Bio vector (%u,%u) is not aligned on %u-sector boundary", bv.bv_offset, bv.bv_len, ic->sectors_per_block); - return -EIO; + return DM_MAPIO_KILL; } } } @@ -1383,18 +1383,18 @@ static int dm_integrity_map(struct dm_target *ti, struct bio *bio) wanted_tag_size *= ic->tag_size; if (unlikely(wanted_tag_size != bip->bip_iter.bi_size)) { DMERR("Invalid integrity data size %u, expected %u", bip->bip_iter.bi_size, wanted_tag_size); - return -EIO; + return DM_MAPIO_KILL; } } } else { if (unlikely(bip != NULL)) { DMERR("Unexpected integrity data when using internal hash"); - return -EIO; + return DM_MAPIO_KILL; } } if (unlikely(ic->mode == 'R') && unlikely(dio->write)) - return -EIO; + return DM_MAPIO_KILL;
[dm-devel] [PATCH 12/15] block: merge blk_types.h into bio.h
We've cleaned up our headers sufficiently that we don't need this split anymore. Signed-off-by: Christoph Hellwig --- block/blk-wbt.c| 2 +- drivers/target/target_core_pscsi.c | 2 +- include/linux/bio.h| 307 +++- include/linux/blk_types.h | 315 - include/linux/swap.h | 2 +- 5 files changed, 308 insertions(+), 320 deletions(-) delete mode 100644 include/linux/blk_types.h diff --git a/block/blk-wbt.c b/block/blk-wbt.c index 17676f4d7fd1..ad355fcda96c 100644 --- a/block/blk-wbt.c +++ b/block/blk-wbt.c @@ -19,7 +19,7 @@ * */ #include -#include +#include #include #include #include diff --git a/drivers/target/target_core_pscsi.c b/drivers/target/target_core_pscsi.c index 3e4abb13f8ea..1fd365ed83d6 100644 --- a/drivers/target/target_core_pscsi.c +++ b/drivers/target/target_core_pscsi.c @@ -27,7 +27,7 @@ #include #include #include -#include +#include #include #include #include diff --git a/include/linux/bio.h b/include/linux/bio.h index d1b04b0e99cf..f6a7c9e65ea0 100644 --- a/include/linux/bio.h +++ b/include/linux/bio.h @@ -25,10 +25,18 @@ #ifdef CONFIG_BLOCK +#include +#include #include -/* struct bio, bio_vec and BIO_* flags are defined in blk_types.h */ -#include +struct bio_set; +struct bio; +struct bio_integrity_payload; +struct page; +struct block_device; +struct io_context; +struct cgroup_subsys_state; +typedef void (bio_end_io_t) (struct bio *); #define BIO_DEBUG @@ -40,6 +48,301 @@ #define BIO_MAX_PAGES 256 +struct blk_issue_stat { + u64 stat; +}; + +/* + * main unit of I/O for the block layer and lower layers (ie drivers and + * stacking drivers) + */ +struct bio { + struct bio *bi_next; /* request queue link */ + struct block_device *bi_bdev; + int bi_error; + unsigned intbi_opf; /* bottom bits req flags, +* top bits REQ_OP. Use +* accessors. +*/ + unsigned short bi_flags; /* status, etc and bvec pool number */ + unsigned short bi_ioprio; + + struct bvec_iterbi_iter; + + /* Number of segments in this BIO after +* physical address coalescing is performed. +*/ + unsigned intbi_phys_segments; + + /* +* To keep track of the max segment size, we account for the +* sizes of the first and last mergeable segments in this bio. +*/ + unsigned intbi_seg_front_size; + unsigned intbi_seg_back_size; + + atomic_t__bi_remaining; + + bio_end_io_t*bi_end_io; + + void*bi_private; +#ifdef CONFIG_BLK_CGROUP + /* +* Optional ioc and css associated with this bio. Put on bio +* release. Read comment on top of bio_associate_current(). +*/ + struct io_context *bi_ioc; + struct cgroup_subsys_state *bi_css; +#ifdef CONFIG_BLK_DEV_THROTTLING_LOW + void*bi_cg_private; + struct blk_issue_stat bi_issue_stat; +#endif +#endif + union { +#if defined(CONFIG_BLK_DEV_INTEGRITY) + struct bio_integrity_payload *bi_integrity; /* data integrity */ +#endif + }; + + unsigned short bi_vcnt;/* how many bio_vec's */ + + /* +* Everything starting with bi_max_vecs will be preserved by bio_reset() +*/ + + unsigned short bi_max_vecs;/* max bvl_vecs we can hold */ + + atomic_t__bi_cnt; /* pin count */ + + struct bio_vec *bi_io_vec; /* the actual vec list */ + + struct bio_set *bi_pool; + + /* +* We can inline a number of vecs at the end of the bio, to avoid +* double allocations for a small number of bio_vecs. This member +* MUST obviously be kept at the very end of the bio. +*/ + struct bio_vec bi_inline_vecs[0]; +}; + +#define BIO_RESET_BYTESoffsetof(struct bio, bi_max_vecs) + +/* + * bio flags + */ +#define BIO_SEG_VALID 1 /* bi_phys_segments valid */ +#define BIO_CLONED 2 /* doesn't own data */ +#define BIO_BOUNCED3 /* bio is a bounce bio */ +#define BIO_USER_MAPPED 4 /* contains user pages */ +#define BIO_NULL_MAPPED 5 /* contains invalid user pages */ +#define BIO_QUIET 6 /* Make BIO Quiet */ +#define BIO_CHAIN 7 /* chained bio, ->bi_remaining in effect */ +#define BIO_REFFED 8 /* bio has elevated ->bi_cnt */ +#define BIO_THROTTLED 9 /* This bio has already been subjected to +* throttling rule
[dm-devel] [PATCH 06/15] fs: simplify dio_bio_complete
Only read bio->bi_error once in the common path. Signed-off-by: Christoph Hellwig --- fs/direct-io.c | 6 ++ 1 file changed, 2 insertions(+), 4 deletions(-) diff --git a/fs/direct-io.c b/fs/direct-io.c index 04247a6c3f73..bb711e4b86c2 100644 --- a/fs/direct-io.c +++ b/fs/direct-io.c @@ -477,13 +477,12 @@ static int dio_bio_complete(struct dio *dio, struct bio *bio) { struct bio_vec *bvec; unsigned i; - int err; + int err = bio->bi_error; - if (bio->bi_error) + if (err) dio->io_error = -EIO; if (dio->is_async && dio->op == REQ_OP_READ && dio->should_dirty) { - err = bio->bi_error; bio_check_pages_dirty(bio); /* transfers ownership */ } else { bio_for_each_segment_all(bvec, bio, i) { @@ -494,7 +493,6 @@ static int dio_bio_complete(struct dio *dio, struct bio *bio) set_page_dirty_lock(page); put_page(page); } - err = bio->bi_error; bio_put(bio); } return err; -- 2.11.0 -- dm-devel mailing list dm-devel@redhat.com https://www.redhat.com/mailman/listinfo/dm-devel
Re: [dm-devel] dm-crypt IV generation (summary)
2017-04-07 8:12 GMT+02:00 Herbert Xu : > On Fri, Mar 10, 2017 at 02:44:26PM +0100, Ondrej Mosnacek wrote: >> >> ISSUES: >> a) The 'keycount' parameter. >> In order to support multi-key modes from Loop-AES, >> dm-crypt accepts a keycount parameter which, if it != 1, causes >> consecutive sectors to be encrypted with a different key. This >> parameter can be specified with any of the cipher modes, which makes >> porting the whole scale of modes supported by dm-crypt to crypto API >> rather messy, since a lot of dm-crypt specific stuff needs to be moved >> into the crypto drivers. > > Actually I think this one can probably easily handled in the crypto > layer. All we need is to add a multikey template that sits on top > of an underlying IV generator. The multikey instance can then accept > a key length that is a multiple of the underlying key length. I thought of that, too, but unfortunately with TCW the key structure is: | KEY 1 | KEY 2 | ... | KEY n | IV SEED (size = IV size) | WHITENING (size = 16 bytes) | So part of the key needs to be processed before it is split into multiple keys. Also with the LMK mode, there is a similar optional key appendix, which is of the same size as the other subkeys. >> b) New AEAD functionality; random IV generator. >> The soon-to-be-added AEAD functionality in dm-crypt >> introduces a new IV generator that generates IVs randomly and stores >> them as sector metadata. This means IV generation cannot be handled >> solely in the driver. Also, additional AEAD implementation of IV >> generators would be eventually needed. > > Again I don't see the problem here. IV generators are supposed > to return the IV to the caller so that it can be transmitted. > > For example, the IV generated in IPsec is explicitly transmitted. > Here we just need to store the IV. My point is that it doesn't make much sense to have a crypto API alg that calls get_random_bytes() as part of its implementation. IMHO, that might tempt HW drivers to replace it with some crappy alternatives, which wouldn't be good... Also, how would we test such alg with testmgr? I would say the best solution in this case would be to treat the 'random' IV mode as a special case in dm-crypt: the IVs would be generated in dm-crypt and passed to a special template (which would have a similar interface as the other IV generation templates) that would just pass the generated IVs to underlying skciphers instead of generating them from the sequence number. This template would need to provide some way to pass the IVs, e.g. by prepending them to the plaintext/ciphertext. Cheers, Ondrej -- dm-devel mailing list dm-devel@redhat.com https://www.redhat.com/mailman/listinfo/dm-devel
Re: [dm-devel] [PATCH 12/15] block: merge blk_types.h into bio.h
On Thu, 2017-05-18 at 15:18 +0200, Christoph Hellwig wrote: > We've cleaned up our headers sufficiently that we don't need this split > anymore. Hello Christoph, Request-based drivers need the structure definitions from and the type definitions from but do not need the definition of struct bio. Have you considered to remove #include from file include/linux/blkdev.h? Do you think that would help to reduce the kernel build time? Thanks, Bart. -- dm-devel mailing list dm-devel@redhat.com https://www.redhat.com/mailman/listinfo/dm-devel
Re: [dm-devel] [PATCH v3 00/10] dm: zoned block device support
From: Damien Le Moal [ Note: I'm resending this e-mail on Damien's behalf because Damien's ] [ e-mails apparently do not reach the dm-devel mailing list - Bart. ] Mike, On 5/18/17 03:54, Mike Snitzer wrote: > On Tue, May 16 2017 at 4:03pm -0400, > Mike Snitzer wrote: > >> I see quite a few issues with this patchset (only gotten through patches >> 1 - 6). I'll work through it in more detail and share my >> feedback/revisions tomorrow. Mostly just cleanups, renames, etc. But >> "the fun" is obviously once I get to the last patch. > > FYI, couldn't get to this like I planned. And I'm taking some time off, > won't get back to this until next Tuesday (5/23). To be clear, the > things I noticed in the preliminary patches were very benign, but do > need cleaning up. Thank you for the review. Let me know the changes you would like to see and I will send an updated series. > I have every intention of getting this reviewed and staged for 4.13. That's great. Thanks. > But would be useful to understand: > 1) who will be regression testing this target once it is merged? Myself, Bart, and all other members of my team will be involved in maintaining and testing this. It is critical for us as an SMR disk vendor that those disk are supported correctly in Linux. So we will maintain and regression test all aspects of the zoned block device support constantly. > 2) what is needed to test it? (I assume SMR drives?) Yes, SMR drives, but not necessarily physical ones. We are working on adding ZBC support to the SCSI target (that is missing). With that, we are planning to create a tcm (or tcmu) driver to emulate a host-aware or host-managed disk for testing, with a regular disk or file as back-end storage. This was also requested by file system maintainers (BtrFS) to allow testing of zoned block device support even without physical SMR disks available. Since zoned block device support spans the entire block I/O stack (from block layer API down to LLD, with device mapper and SCSI/libata in the middle) we are also starting to design new test cases for the newly released blktests infrastructure. This will allow automated testing, including device mapper targets that supports zoned block devices. Ideally, we will try to release everything for inclusion in 4.13, together with the device mapper support. But all the test parts may get spread over one or two release cycles. But again, the goal is to have a comprehensive automated test suite for zoned block device, similar to what is available for regular block devices. Best regards. -- Damien Le Moal, Western Digital Research -- dm-devel mailing list dm-devel@redhat.com https://www.redhat.com/mailman/listinfo/dm-devel
[dm-devel] [PATCH v7 2/2] dm ioctl: add a device mapper ioctl function.
Add a dm_ioctl_cmd to issue the equivalent of a DM ioctl call in kernel. Signed-off-by: Enric Balletbo i Serra --- drivers/md/dm-ioctl.c | 45 +++ include/linux/device-mapper.h | 6 ++ 2 files changed, 51 insertions(+) diff --git a/drivers/md/dm-ioctl.c b/drivers/md/dm-ioctl.c index 4951bf9..d79a41a 100644 --- a/drivers/md/dm-ioctl.c +++ b/drivers/md/dm-ioctl.c @@ -1954,3 +1954,48 @@ int dm_copy_name_and_uuid(struct mapped_device *md, char *name, char *uuid) return r; } + +/** + * dm_ioctl_cmd - Device mapper ioctl's + * @command: ioctl command + * @param: Pointer to device mapped ioctl struct + */ +int dm_ioctl_cmd(uint command, struct dm_ioctl *param) +{ + int r = 0; + int ioctl_flags; + unsigned int cmd; + ioctl_fn fn = NULL; + size_t input_param_size; + + if (_IOC_TYPE(command) != DM_IOCTL) + return -ENOTTY; + + cmd = _IOC_NR(command); + + /* +* Nothing more to do for the version command. +*/ + if (cmd == DM_VERSION_CMD) + return 0; + + fn = lookup_ioctl(cmd, &ioctl_flags); + if (!fn) { + DMWARN("dm_ioctl: unknown command 0x%x", command); + return -ENOTTY; + } + + input_param_size = param->data_size; + r = validate_params(cmd, param); + if (r) + return r; + + param->data_size = sizeof(*param); + r = fn(param, input_param_size); + + if (unlikely(param->flags & DM_BUFFER_FULL_FLAG) && + unlikely(ioctl_flags & IOCTL_FLAGS_NO_PARAMS)) + DMERR("ioctl %d tried to output some data but has IOCTL_FLAGS_NO_PARAMS set", cmd); + + return r; +} diff --git a/include/linux/device-mapper.h b/include/linux/device-mapper.h index 81272c8..efa83ff 100644 --- a/include/linux/device-mapper.h +++ b/include/linux/device-mapper.h @@ -12,6 +12,7 @@ #include #include #include +#include struct dm_dev; struct dm_target; @@ -444,6 +445,11 @@ int dm_noflush_suspending(struct dm_target *ti); void dm_accept_partial_bio(struct bio *bio, unsigned n_sectors); union map_info *dm_get_rq_mapinfo(struct request *rq); +/* + * Device mapper ioctl function. + */ +int dm_ioctl_cmd(unsigned int command, struct dm_ioctl *param); + struct queue_limits *dm_get_queue_limits(struct mapped_device *md); /* -- 2.9.3 -- dm-devel mailing list dm-devel@redhat.com https://www.redhat.com/mailman/listinfo/dm-devel
[dm-devel] [PATCH v7 1/2] init: add support to directly boot to a mapped device
From: Will Drewry Add a dm= kernel parameter modeled after the md= parameter from do_mounts_md. It allows for device-mapper targets to be configured at boot time for use early in the boot process (as the root device or otherwise). Signed-off-by: Will Drewry Signed-off-by: Kees Cook [rework to use dm_ioctl calls] Signed-off-by: Enric Balletbo i Serra --- Documentation/admin-guide/kernel-parameters.rst | 1 + Documentation/admin-guide/kernel-parameters.txt | 3 + Documentation/device-mapper/dm-boot.txt | 65 init/Makefile | 1 + init/do_mounts.c| 1 + init/do_mounts.h| 10 + init/do_mounts_dm.c | 459 7 files changed, 540 insertions(+) create mode 100644 Documentation/device-mapper/dm-boot.txt create mode 100644 init/do_mounts_dm.c diff --git a/Documentation/admin-guide/kernel-parameters.rst b/Documentation/admin-guide/kernel-parameters.rst index 33b90e27..9de2984 100644 --- a/Documentation/admin-guide/kernel-parameters.rst +++ b/Documentation/admin-guide/kernel-parameters.rst @@ -93,6 +93,7 @@ parameter is applicable:: BLACKFIN Blackfin architecture is enabled. CLK Common clock infrastructure is enabled. CMA Contiguous Memory Area support is enabled. + DM Device mapper support is enabled. DRM Direct Rendering Management support is enabled. DYNAMIC_DEBUG Build in debug messages and enable them at runtime EDD BIOS Enhanced Disk Drive Services (EDD) is enabled diff --git a/Documentation/admin-guide/kernel-parameters.txt b/Documentation/admin-guide/kernel-parameters.txt index 33a3b54..4415254 100644 --- a/Documentation/admin-guide/kernel-parameters.txt +++ b/Documentation/admin-guide/kernel-parameters.txt @@ -830,6 +830,9 @@ dis_ucode_ldr [X86] Disable the microcode loader. + dm= [DM] Allows early creation of a device-mapper device. + See Documentation/device-mapper/boot.txt. + dma_debug=off If the kernel is compiled with DMA_API_DEBUG support, this option disables the debugging code at boot. diff --git a/Documentation/device-mapper/dm-boot.txt b/Documentation/device-mapper/dm-boot.txt new file mode 100644 index 000..50f08ec --- /dev/null +++ b/Documentation/device-mapper/dm-boot.txt @@ -0,0 +1,65 @@ +Boot time creation of mapped devices + + +It is possible to configure a device mapper device to act as the root +device for your system in two ways. + +The first is to build an initial ramdisk which boots to a minimal +userspace which configures the device, then pivot_root(8) in to it. + +The second is to possible when the device-mapper and any targets are +compiled into the kernel (not a module), one or more device-mappers may +be created and used as the root device at boot time with the parameters +given with the boot line dm=... + +The format is specified as a simple string of data separated by commas and +optionally semi-colons, where: + - a comma is used to separate fields like name, uuid, flags and table (specifies + one device) + - a semi-colon is used to separate devices. + +So the format will look like this: + + dm=,,,[,+][;,,,[,+]]+ + +Where, + ::= The device name. + ::= ---- | "" +::= "ro" | "rw" +::= + ::= "verity" | "bootcache" | ... + +The dm line may be as normal when using the dmsetup tool when using the +--bootformat argument. + +Unless renamed by udev, the device node created will be dm-0 as the +first minor number for the device-mapper is used during early creation. + +Examples + +An example of booting to a linear array made up of user-mode linux block +devices: + + dm="lroot,,rw, 0 4096 linear 98:16 0, 4096 4096 linear 98:32 0" \ + root=/dev/dm-0 + +This will boot to a rw dm-linear target of 8192 sectors split across two +block devices identified by their major:minor numbers. After boot, udev +will rename this target to /dev/mapper/lroot (depending on the rules). +No uuid was assigned. + +An example of multiple device-mappers, with the dm="..." contents shown +here split on multiple lines for readability: + +vboot,,ro, + 0 1768000 bootcache +aa55b119-2a47-8c45-946a-5ac57765011f+1 +76e9be054b15884a9fa85973e9cb274c93afadb6 +1768000 10 23 2; +vroot,,ro, + 0 1740800 verity 254:0 254:0 1740800 sha1 +76e9be054b15884a9fa85973e9cb274c93afadb6 +5b3549d54d6c7a3837b9b81ed72e49463a64c03680c47835bef94d768e5646fe; +vram,,rw, + 0 32768 linear 1:0 0, + 32768 32768 linear 1:1 0 diff --git a/init/Makefile b/init/Makefile index c4fb455..30424d7 100644 --- a/init/Makefile +++ b/init/Makefile @@ -20,6 +20,7 @@ mounts-y
[dm-devel] [PATCH v7 0/2] dm: boot a mapped device without an initramfs
Dear all, So here is a new version of the patches to be reviewed, this time as suggested by Alasdair the patches are reworked to match with the new dmsetup bootformat feature [1]. These patches are not reviewed yet but the format was discussed in the IRC and was suggested to send the kernel patches in parallel. Changes since v6: - Add a new function to issue the equivalent of a DM ioctl programatically. - Use the new ioctl interface to create the devices. - Use a comma-delimited and semi-colon delimited dmsetup-like commands. Changes since v5: - https://www.redhat.com/archives/dm-devel/2016-February/msg00112.html [1] https://www.redhat.com/archives/linux-lvm/2017-May/msg00047.html Wating for your feedback, Enric Balletbo i Serra (1): dm ioctl: add a device mapper ioctl function. Will Drewry (1): init: add support to directly boot to a mapped device Documentation/admin-guide/kernel-parameters.rst | 1 + Documentation/admin-guide/kernel-parameters.txt | 3 + Documentation/device-mapper/dm-boot.txt | 65 drivers/md/dm-ioctl.c | 45 +++ include/linux/device-mapper.h | 6 + init/Makefile | 1 + init/do_mounts.c| 1 + init/do_mounts.h| 10 + init/do_mounts_dm.c | 459 9 files changed, 591 insertions(+) create mode 100644 Documentation/device-mapper/dm-boot.txt create mode 100644 init/do_mounts_dm.c -- 2.9.3 -- dm-devel mailing list dm-devel@redhat.com https://www.redhat.com/mailman/listinfo/dm-devel
Re: [dm-devel] [PATCH] multipath-tools: enable libdmmp installation in alternative directory
On Mon, May 15, 2017 at 10:33:25PM +0200, Martin Wilck wrote: Fine by me. -Ben > Introduce a new Makefile variable, usr_prefix, to be used for > libdmmp and the associated pkgconfig file. > > Some distributions install those libraries which are necessary > for booting (and mounting /usr) in a different location (/lib or > /lib64) than other libraries (/usr/lib or /usr/lib64). On such > distributions, installation to the different paths can be achieved > by setting "usr_prefix=/usr". This will affect only libdmmp at this > time, as all other libaries in the multipath-tools package may be > relevant for booting. > > For distributions on which /lib and /lib64 are just symlinks to their > /usr counterparts, nothing changes. > > Comments are welcome. > > Signed-off-by: Martin Wilck > --- > Makefile.inc | 4 +++- > libdmmp/Makefile | 11 ++- > 2 files changed, 9 insertions(+), 6 deletions(-) > > diff --git a/Makefile.inc b/Makefile.inc > index ad55aa10..ad08f307 100644 > --- a/Makefile.inc > +++ b/Makefile.inc > @@ -47,6 +47,7 @@ endif > > prefix = > exec_prefix = $(prefix) > +usr_prefix = $(prefix) > bindir = $(exec_prefix)/sbin > libudevdir = $(prefix)/$(SYSTEMDPATH)/udev > udevrulesdir = $(libudevdir)/rules.d > @@ -55,6 +56,7 @@ man8dir = $(prefix)/usr/share/man/man8 > man5dir = $(prefix)/usr/share/man/man5 > man3dir = $(prefix)/usr/share/man/man3 > syslibdir= $(prefix)/$(LIB) > +usrlibdir= $(usr_prefix)/$(LIB) > libdir = $(prefix)/$(LIB)/multipath > unitdir = $(prefix)/$(SYSTEMDPATH)/systemd/system > mpathpersistdir = $(TOPDIR)/libmpathpersist > @@ -62,7 +64,7 @@ mpathcmddir = $(TOPDIR)/libmpathcmd > thirdpartydir= $(TOPDIR)/third-party > libdmmpdir = $(TOPDIR)/libdmmp > includedir = $(prefix)/usr/include > -pkgconfdir = $(prefix)/$(LIB)/pkgconfig > +pkgconfdir = $(usrlibdir)/pkgconfig > > GZIP = gzip -9 -c > RM = rm -f > diff --git a/libdmmp/Makefile b/libdmmp/Makefile > index 1c5329aa..908e294b 100644 > --- a/libdmmp/Makefile > +++ b/libdmmp/Makefile > @@ -27,15 +27,16 @@ $(LIBS): $(OBJS) > $(LN) $@ $(DEVLIB) > > install: > - $(INSTALL_PROGRAM) -m 755 $(LIBS) $(DESTDIR)$(syslibdir)/$(LIBS) > + mkdir -p $(DESTDIR)$(usrlibdir) > + $(INSTALL_PROGRAM) -m 755 $(LIBS) $(DESTDIR)$(usrlibdir)/$(LIBS) > $(INSTALL_PROGRAM) -m 644 -D \ > $(HEADERS) $(DESTDIR)$(includedir)/$(HEADERS) > - $(LN) $(LIBS) $(DESTDIR)$(syslibdir)/$(DEVLIB) > + $(LN) $(LIBS) $(DESTDIR)$(usrlibdir)/$(DEVLIB) > $(INSTALL_PROGRAM) -m 644 -D \ > $(PKGFILE).in $(DESTDIR)$(pkgconfdir)/$(PKGFILE) > perl -i -pe 's|__VERSION__|$(LIBDMMP_VERSION)|g' \ > $(DESTDIR)$(pkgconfdir)/$(PKGFILE) > - perl -i -pe 's|__LIBDIR__|$(syslibdir)|g' \ > + perl -i -pe 's|__LIBDIR__|$(usrlibdir)|g' \ > $(DESTDIR)$(pkgconfdir)/$(PKGFILE) > perl -i -pe 's|__INCLUDEDIR__|$(includedir)|g' \ > $(DESTDIR)$(pkgconfdir)/$(PKGFILE) > @@ -46,9 +47,9 @@ install: > done > > uninstall: > - $(RM) $(DESTDIR)$(syslibdir)/$(LIBS) > + $(RM) $(DESTDIR)$(usrlibdir)/$(LIBS) > $(RM) $(DESTDIR)$(includedir)/$(HEADERS) > - $(RM) $(DESTDIR)$(syslibdir)/$(DEVLIB) > + $(RM) $(DESTDIR)$(usrlibdir)/$(DEVLIB) > @for file in $(DESTDIR)$(man3dir)/dmmp_*; do \ > $(RM) $$file; \ > done > -- > 2.12.2 -- dm-devel mailing list dm-devel@redhat.com https://www.redhat.com/mailman/listinfo/dm-devel
Re: [dm-devel] [PATCH 5/7] Move the declaration of mpath_mx_alloc_len to a header file
On Wed, May 17, 2017 at 08:43:07AM -0700, Bart Van Assche wrote: > This allows the compiler to verify consistency of declaration and > definition. I get why you did this, but I'm not totally happy with where the extern declaration is. libmpathpersist is actually a public library, unlike libmultipath, and mpath_persist.h is that header for that library. As such, I'd rather not add things to it that users aren't supposed to mess with. So, is your intention to let users set the max alloc length. If so, we should probably give them a function, or at the very least some comments telling them what this variable does. If we only want the mpathpersist program to be able to change this, then we probably should probably put the declaration in a different header file, perhaps mpath_pr_ioctl.c. Now that I'm looking at this, I see that this option (-l) isn't even documented in the mpathpersist manpage. So right now this seems likely a completely unused feature. I guess since it exists, I'd lean towards actually documenting it and putting it in the libmpathpersist API in a more useful way. -Ben > > Signed-off-by: Bart Van Assche > --- > libmpathpersist/mpath_persist.h | 3 +++ > mpathpersist/main.c | 1 - > multipathd/main.c | 2 -- > 3 files changed, 3 insertions(+), 3 deletions(-) > > diff --git a/libmpathpersist/mpath_persist.h b/libmpathpersist/mpath_persist.h > index 79de5b5b..3faa8c9e 100644 > --- a/libmpathpersist/mpath_persist.h > +++ b/libmpathpersist/mpath_persist.h > @@ -81,6 +81,9 @@ extern "C" { > > > > +extern unsigned int mpath_mx_alloc_len; > + > + > > struct prin_readdescr > { > diff --git a/mpathpersist/main.c b/mpathpersist/main.c > index 2e0aba3c..9de52b98 100644 > --- a/mpathpersist/main.c > +++ b/mpathpersist/main.c > @@ -40,7 +40,6 @@ void mpath_print_transport_id(struct prin_fulldescr *fdesc); > int construct_transportid(const char * inp, struct transportid transid[], > int num_transportids); > > int logsink; > -unsigned int mpath_mx_alloc_len; > struct config *multipath_conf; > > struct config *get_multipath_config(void) > diff --git a/multipathd/main.c b/multipathd/main.c > index b167cb4c..81c76cab 100644 > --- a/multipathd/main.c > +++ b/multipathd/main.c > @@ -104,8 +104,6 @@ struct mpath_event_param > struct multipath *mpp; > }; > > -unsigned int mpath_mx_alloc_len; > - > int logsink; > int verbosity; > int bindings_read_only; > -- > 2.12.2 > > -- > dm-devel mailing list > dm-devel@redhat.com > https://www.redhat.com/mailman/listinfo/dm-devel -- dm-devel mailing list dm-devel@redhat.com https://www.redhat.com/mailman/listinfo/dm-devel
Re: [dm-devel] [PATCH 5/7] Move the declaration of mpath_mx_alloc_len to a header file
On Thu, 2017-05-18 at 15:06 -0500, Benjamin Marzinski wrote: > On Wed, May 17, 2017 at 08:43:07AM -0700, Bart Van Assche wrote: > > This allows the compiler to verify consistency of declaration and > > definition. > > Signed-off-by: Bart Van Assche > > --- > > libmpathpersist/mpath_persist.h | 3 +++ > > mpathpersist/main.c | 1 - > > multipathd/main.c | 2 -- > > 3 files changed, 3 insertions(+), 3 deletions(-) > > > > diff --git a/libmpathpersist/mpath_persist.h > > b/libmpathpersist/mpath_persist.h > > index 79de5b5b..3faa8c9e 100644 > > --- a/libmpathpersist/mpath_persist.h > > +++ b/libmpathpersist/mpath_persist.h > > @@ -81,6 +81,9 @@ extern "C" { > > > > > > > > +extern unsigned int mpath_mx_alloc_len; > > + > > + > > > > struct prin_readdescr > > { > > diff --git a/mpathpersist/main.c b/mpathpersist/main.c > > index 2e0aba3c..9de52b98 100644 > > --- a/mpathpersist/main.c > > +++ b/mpathpersist/main.c > > @@ -40,7 +40,6 @@ void mpath_print_transport_id(struct prin_fulldescr > > *fdesc); > > int construct_transportid(const char * inp, struct transportid transid[], > > int num_transportids); > > > > int logsink; > > -unsigned int mpath_mx_alloc_len; > > struct config *multipath_conf; > > > > struct config *get_multipath_config(void) > > diff --git a/multipathd/main.c b/multipathd/main.c > > index b167cb4c..81c76cab 100644 > > --- a/multipathd/main.c > > +++ b/multipathd/main.c > > @@ -104,8 +104,6 @@ struct mpath_event_param > > struct multipath *mpp; > > }; > > > > -unsigned int mpath_mx_alloc_len; > > - > > int logsink; > > int verbosity; > > int bindings_read_only; > > I get why you did this, but I'm not totally happy with where the extern > declaration is. libmpathpersist is actually a public library, unlike > libmultipath, and mpath_persist.h is that header for that library. As > such, I'd rather not add things to it that users aren't supposed to mess > with. So, is your intention to let users set the max alloc length. If > so, we should probably give them a function, or at the very least some > comments telling them what this variable does. If we only want the > mpathpersist program to be able to change this, then we probably > should probably put the declaration in a different header file, perhaps > mpath_pr_ioctl.c. Now that I'm looking at this, I see that this option > (-l) isn't even documented in the mpathpersist manpage. So right now > this seems likely a completely unused feature. I guess since it exists, > I'd lean towards actually documenting it and putting it in the > libmpathpersist API in a more useful way. (changed top-posting into bottom-posting) Hello Ben, It was not my intention to make the mpath_mx_alloc_len variable accessible to users of the libmpathpersist library. From what I can see in libmpathpersist/Makefile the only public header file of the libmpathpersist library is mpath_persist.h. Would you agree to move that declaration into mpathpr.h? Bart. -- dm-devel mailing list dm-devel@redhat.com https://www.redhat.com/mailman/listinfo/dm-devel
Re: [dm-devel] [PATCH 5/7] Move the declaration of mpath_mx_alloc_len to a header file
On Thu, May 18, 2017 at 08:36:10PM +, Bart Van Assche wrote: > On Thu, 2017-05-18 at 15:06 -0500, Benjamin Marzinski wrote: > > On Wed, May 17, 2017 at 08:43:07AM -0700, Bart Van Assche wrote: > > > This allows the compiler to verify consistency of declaration and > > > definition. > > > Signed-off-by: Bart Van Assche > > > --- > > > libmpathpersist/mpath_persist.h | 3 +++ > > > mpathpersist/main.c | 1 - > > > multipathd/main.c | 2 -- > > > 3 files changed, 3 insertions(+), 3 deletions(-) > > > > > > diff --git a/libmpathpersist/mpath_persist.h > > > b/libmpathpersist/mpath_persist.h > > > index 79de5b5b..3faa8c9e 100644 > > > --- a/libmpathpersist/mpath_persist.h > > > +++ b/libmpathpersist/mpath_persist.h > > > @@ -81,6 +81,9 @@ extern "C" { > > > > > > > > > > > > +extern unsigned int mpath_mx_alloc_len; > > > + > > > + > > > > > > struct prin_readdescr > > > { > > > diff --git a/mpathpersist/main.c b/mpathpersist/main.c > > > index 2e0aba3c..9de52b98 100644 > > > --- a/mpathpersist/main.c > > > +++ b/mpathpersist/main.c > > > @@ -40,7 +40,6 @@ void mpath_print_transport_id(struct prin_fulldescr > > > *fdesc); > > > int construct_transportid(const char * inp, struct transportid > > > transid[], int num_transportids); > > > > > > int logsink; > > > -unsigned int mpath_mx_alloc_len; > > > struct config *multipath_conf; > > > > > > struct config *get_multipath_config(void) > > > diff --git a/multipathd/main.c b/multipathd/main.c > > > index b167cb4c..81c76cab 100644 > > > --- a/multipathd/main.c > > > +++ b/multipathd/main.c > > > @@ -104,8 +104,6 @@ struct mpath_event_param > > > struct multipath *mpp; > > > }; > > > > > > -unsigned int mpath_mx_alloc_len; > > > - > > > int logsink; > > > int verbosity; > > > int bindings_read_only; > > > > I get why you did this, but I'm not totally happy with where the extern > > declaration is. libmpathpersist is actually a public library, unlike > > libmultipath, and mpath_persist.h is that header for that library. As > > such, I'd rather not add things to it that users aren't supposed to mess > > with. So, is your intention to let users set the max alloc length. If > > so, we should probably give them a function, or at the very least some > > comments telling them what this variable does. If we only want the > > mpathpersist program to be able to change this, then we probably > > should probably put the declaration in a different header file, perhaps > > mpath_pr_ioctl.c. Now that I'm looking at this, I see that this option > > (-l) isn't even documented in the mpathpersist manpage. So right now > > this seems likely a completely unused feature. I guess since it exists, > > I'd lean towards actually documenting it and putting it in the > > libmpathpersist API in a more useful way. > > (changed top-posting into bottom-posting) > > Hello Ben, > > It was not my intention to make the mpath_mx_alloc_len variable accessible > to users of the libmpathpersist library. From what I can see in > libmpathpersist/Makefile the only public header file of the libmpathpersist > library is mpath_persist.h. Would you agree to move that declaration into > mpathpr.h? Sure -Ben > > Bart. -- dm-devel mailing list dm-devel@redhat.com https://www.redhat.com/mailman/listinfo/dm-devel
[dm-devel] [PATCH] libmultipath: add comment about resuming
The reason for the second resume in my commit "libmultipath: fix suspended devs from failed reloads" is not obvious from the multipath code, so add a comment. Signed-off-by: Benjamin Marzinski --- libmultipath/devmapper.c | 3 +++ 1 file changed, 3 insertions(+) diff --git a/libmultipath/devmapper.c b/libmultipath/devmapper.c index 69b634b..ee83e0f 100644 --- a/libmultipath/devmapper.c +++ b/libmultipath/devmapper.c @@ -399,6 +399,9 @@ int dm_addmap_reload(struct multipath *mpp, char *params, int flush) if (r) return r; + /* If the resume failed, dm will leave the device suspended, and +* drop the new table, so doing a second resume will try using +* the original table */ if (dm_is_suspended(mpp->alias)) dm_simplecmd(DM_DEVICE_RESUME, mpp->alias, !flush, 1, udev_flags, 0); -- 1.8.3.1 -- dm-devel mailing list dm-devel@redhat.com https://www.redhat.com/mailman/listinfo/dm-devel
[dm-devel] [PATCH] multipath-tools: replace 64bit archs macros by __LP64__
Rationale: https://gcc.gnu.org/onlinedocs/cpp/Common-Predefined-Macros.html __LP64__ _LP64 These macros are defined, with value 1, if (and only if) the compilation is for a target where long int and pointer both use 64-bits and int uses 32-bit. Cc: Christophe Varoqui Cc: device-mapper development Signed-off-by: Xose Vazquez Perez --- kpartx/lopart.c | 7 +++ 1 file changed, 3 insertions(+), 4 deletions(-) diff --git a/kpartx/lopart.c b/kpartx/lopart.c index 4e6dab4..3c16eb0 100644 --- a/kpartx/lopart.c +++ b/kpartx/lopart.c @@ -38,11 +38,10 @@ #define LOOP_CTL_GET_FREE 0x4C82 #endif -#if !defined (__alpha__) && !defined (__ia64__) && !defined (__x86_64__) \ - && !defined (__s390x__) -#define int2ptr(x) ((void *) ((int) x)) -#else +#if defined (__LP64__) #define int2ptr(x) ((void *) ((long) x)) +#else +#define int2ptr(x) ((void *) ((int) x)) #endif static char * -- 2.13.0 -- dm-devel mailing list dm-devel@redhat.com https://www.redhat.com/mailman/listinfo/dm-devel
Re: [dm-devel] [PATCH v3 00/10] dm: zoned block device support
Mike, (resending using outlook as I am still having troubles reaching @redhat.com email domain with any other email client. My apologies if multiple copies of this email show up) On 5/18/17 03:54, Mike Snitzer wrote: > On Tue, May 16 2017 at 4:03pm -0400, > Mike Snitzer wrote: > >> I see quite a few issues with this patchset (only gotten through patches >> 1 - 6). I'll work through it in more detail and share my >> feedback/revisions tomorrow. Mostly just cleanups, renames, etc. But >> "the fun" is obviously once I get to the last patch. > > FYI, couldn't get to this like I planned. And I'm taking some time off, > won't get back to this until next Tuesday (5/23). To be clear, the > things I noticed in the preliminary patches were very benign, but do > need cleaning up. Thank you for the review. Let me know the changes you would like to see and I will send an updated series. > I have every intention of getting this reviewed and staged for 4.13. That's great. Thanks. > But would be useful to understand: > 1) who will be regression testing this target once it is merged? Myself, Bart, and all other members of my team will be involved in maintaining and testing this. It is critical for us as an SMR disk vendor that those disks are supported correctly in Linux. So we will maintain and regression test all aspects of the zoned block device support constantly. > 2) what is needed to test it? (I assume SMR drives?) Yes, SMR drives, but not necessarily physical ones. We are working on adding ZBC support to the SCSI target (that is missing). With that, we are planning to create a tcm (or tcmu) driver to emulate a host-aware or host-managed disk for testing, with a regular disk or file as back-end storage. This was also requested by file system maintainers (BtrFS) to allow testing of zoned block device support even without physical SMR disks available. Since zoned block device support spans the entire block I/O stack (from block layer API down to LLD, with device mapper and SCSI/libata in the middle) we are also starting to design new test cases for the newly released blktests infrastructure. This will allow automated testing, including device mapper targets that supports zoned block devices. Ideally, we will try to release everything for inclusion in 4.13, together with the device mapper support. But all the test parts may get spread over one or two release cycles. But again, the goal is to have a comprehensive automated test suite for zoned block device, similar to what is available for regular block devices. Best regards. -- Damien Le Moal, Western Digital Research Western Digital Corporation (and its subsidiaries) E-mail Confidentiality Notice & Disclaimer: This e-mail and any files transmitted with it may contain confidential or legally privileged information of WDC and/or its affiliates, and are intended solely for the use of the individual or entity to which they are addressed. If you are not the intended recipient, any disclosure, copying, distribution or any action taken or omitted to be taken in reliance on it, is prohibited. If you have received this e-mail in error, please notify the sender immediately and delete the e-mail in its entirety from your system. -- dm-devel mailing list dm-devel@redhat.com https://www.redhat.com/mailman/listinfo/dm-devel
Re: [dm-devel] [PATCH] multipath-tools: replace 64bit archs macros by __LP64__
On Fri, 2017-05-19 at 03:38 +0200, Xose Vazquez Perez wrote: > Rationale: https://gcc.gnu.org/onlinedocs/cpp/Common-Predefined-Macros.html > > __LP64__ > _LP64 > These macros are defined, with value 1, if (and only if) the compilation is > for a target where long int and pointer both use 64-bits and int uses 32-bit. > > > Cc: Christophe Varoqui > Cc: device-mapper development > Signed-off-by: Xose Vazquez Perez > --- > kpartx/lopart.c | 7 +++ > 1 file changed, 3 insertions(+), 4 deletions(-) > > diff --git a/kpartx/lopart.c b/kpartx/lopart.c > index 4e6dab4..3c16eb0 100644 > --- a/kpartx/lopart.c > +++ b/kpartx/lopart.c > @@ -38,11 +38,10 @@ > #define LOOP_CTL_GET_FREE 0x4C82 > #endif > > -#if !defined (__alpha__) && !defined (__ia64__) && !defined (__x86_64__) \ > - && !defined (__s390x__) > -#define int2ptr(x) ((void *) ((int) x)) > -#else > +#if defined (__LP64__) > #define int2ptr(x) ((void *) ((long) x)) > +#else > +#define int2ptr(x) ((void *) ((int) x)) > #endif > > static char * Hello Xose, Since there is only one user of int2ptr(), please remove the macro definitions and use the following construct: (void *)(uintptr_t)(...). Thanks, Bart. -- dm-devel mailing list dm-devel@redhat.com https://www.redhat.com/mailman/listinfo/dm-devel
Re: [dm-devel] dedicated error codes for the block layer
On Thu, May 18, 2017 at 03:17:57PM +0200, Christoph Hellwig wrote: > This series introduces a new blk_status_t error code type for the block > layer so that we can have tigher control and explicit semantics for > block layer errors. > > All but the last three patches are cleanups that lead to the new type. > > The series it mostly limited to the block layer and drivers, and touching > file systems a little bit. The only major exception is btrfs, which > does funny things with bios and thus sees a larger amount of propagation > of the new blk_status_t. JFYI, the patches 13 and 15 are missing, not in linux-btrfs mailbox and patchwork web. Does not look like a delay, maybe vger refused them completely. -- dm-devel mailing list dm-devel@redhat.com https://www.redhat.com/mailman/listinfo/dm-devel