Re: [dm-devel] [PATCH 6/6] mm: Add memalloc_nowait

2020-06-30 Thread Michal Hocko
On Wed 01-07-20 05:12:03, Matthew Wilcox wrote:
> On Tue, Jun 30, 2020 at 08:34:36AM +0200, Michal Hocko wrote:
> > On Mon 29-06-20 22:28:30, Matthew Wilcox wrote:
> > [...]
> > > The documentation is hard to add a new case to, so I rewrote it.  What
> > > do you think?  (Obviously I'll split this out differently for submission;
> > > this is just what I have in my tree right now).
> > 
> > I am fine with your changes. Few notes below.
> 
> Thanks!
> 
> > > -It turned out though that above approach has led to
> > > -abuses when the restricted gfp mask is used "just in case" without a
> > > -deeper consideration which leads to problems because an excessive use
> > > -of GFP_NOFS/GFP_NOIO can lead to memory over-reclaim or other memory
> > > -reclaim issues.
> > 
> > I believe this is an important part because it shows that new people
> > coming to the existing code shouldn't take it as correct and rather
> > question it. It also gives a clear indication that overuse causes real
> > problems that might not be immediately visible to subsystems outside
> > of MM.
> 
> It seemed to say a lot of the same things as this paragraph:
> 
> +You may notice that quite a few allocations in the existing code specify
> +``GFP_NOIO`` or ``GFP_NOFS``. Historically, they were used to prevent
> +recursion deadlocks caused by direct memory reclaim calling back into
> +the FS or IO paths and blocking on already held resources. Since 4.12
> +the preferred way to address this issue is to use the new scope APIs
> +described below.
> 
> Since this is in core-api/ rather than vm/, I felt that discussion of
> the problems that it causes to the mm was a bit too much detail for the
> people who would be reading this document.  Maybe I could move that
> information into a new Documentation/vm/reclaim.rst file?

Hmm, my experience is that at least some users of NOFS/NOIO use these
flags just to be sure they do not do something wrong, without realizing
that this might have a very negative effect on the operation of the
whole system. That was the main motivation for having an explicit note
there. I am not sure that having it in the MM-internal documentation
will make it stand out for a general reader.

But I will not insist of course.

> Let's see if Our Grumpy Editor has time to give us his advice on this.
> 
> > > -FS/IO code then simply calls the appropriate save function before
> > > -any critical section with respect to the reclaim is started - e.g.
> > > -lock shared with the reclaim context or when a transaction context
> > > -nesting would be possible via reclaim.  
> > 
> > [...]
> > 
> > > +These functions should be called at the point where any memory allocation
> > > +would start to cause problems.  That is, do not simply wrap individual
> > > +memory allocation calls which currently use ``GFP_NOFS`` with a pair
> > > +of calls to memalloc_nofs_save() and memalloc_nofs_restore().  Instead,
> > > +find the lock which is taken that would cause problems if memory reclaim
> > > +reentered the filesystem, place a call to memalloc_nofs_save() before it
> > > +is acquired and a call to memalloc_nofs_restore() after it is released.
> > > +Ideally also add a comment explaining why this lock will be problematic.
> > 
> > The above text has mentioned the transaction context nesting as well and
> > that was a hint by Dave IIRC. It is imho good to have an example of
> > reentry points other than just locks. I believe another useful example
> > would be something like the loop device, which mixes the IO and FS
> > layers, but I am not familiar with all the details to give you a
> > useful text.
> 
> I'll let Mikulas & Dave finish fighting about that before I write any
> text mentioning the loop driver.  How about this for mentioning the
> filesystem transaction possibility?
> 
> @@ -103,12 +103,16 @@ flags specified by any particular call to allocate memory.
>  
>  These functions should be called at the point where any memory allocation
>  would start to cause problems.  That is, do not simply wrap individual
> -memory allocation calls which currently use ``GFP_NOFS`` with a pair
> -of calls to memalloc_nofs_save() and memalloc_nofs_restore().  Instead,
> -find the lock which is taken that would cause problems if memory reclaim
> +memory allocation calls which currently use ``GFP_NOFS`` with a pair of
> +calls to memalloc_nofs_save() and memalloc_nofs_restore().  Instead, find
> +the resource which is acquired that would cause problems if memory reclaim
>  reentered the filesystem, place a call to memalloc_nofs_save() before it
>  is acquired and a call to memalloc_nofs_restore() after it is released.
>  Ideally also add a comment explaining why this lock will be problematic.
> +A resource might be a lock which would need to be acquired by an attempt
> +to reclaim memory, or it might be starting a transaction that should not
> +nest over a memory reclaim transaction.  Deep knowledge of the filesystem
> +or driver is often needed to place memory scoping calls 

[dm-devel] [PATCH 2/6] crypto: algapi - use common mechanism for inheriting flags

2020-06-30 Thread Eric Biggers
From: Eric Biggers 

The flag CRYPTO_ALG_ASYNC is "inherited" in the sense that when a
template is instantiated, the template will have CRYPTO_ALG_ASYNC set if
any of the algorithms it uses has CRYPTO_ALG_ASYNC set.

We'd like to add a second flag (CRYPTO_ALG_ALLOCATES_MEMORY) that gets
"inherited" in the same way.  This is difficult because the handling of
CRYPTO_ALG_ASYNC is hardcoded everywhere.  Address this by:

  - Add CRYPTO_ALG_INHERITED_FLAGS, which contains the set of flags that
have these inheritance semantics.

  - Add crypto_algt_inherited_mask(), for use by template ->create()
methods.  It returns any of these flags that the user asked to be
unset and thus must be passed in the 'mask' to crypto_grab_*().

  - Make crypto_grab_*() propagate these flags to the template instance
being created so that templates don't have to do this themselves.

Make crypto/simd.c propagate these flags too, since it "wraps" another
algorithm, similar to a template.

Originally-from: Mikulas Patocka 
Signed-off-by: Eric Biggers 
---
 crypto/adiantum.c |  4 +--
 crypto/algapi.c   |  2 ++
 crypto/authenc.c  |  4 +--
 crypto/authencesn.c   |  4 +--
 crypto/ccm.c  | 23 +--
 crypto/chacha20poly1305.c |  4 +--
 crypto/cmac.c | 15 +++---
 crypto/cryptd.c   | 59 ---
 crypto/ctr.c  |  8 ++
 crypto/cts.c  |  3 +-
 crypto/essiv.c| 11 ++--
 crypto/gcm.c  | 10 ++-
 crypto/geniv.c|  4 +--
 crypto/hmac.c | 15 +++---
 crypto/lrw.c  |  3 +-
 crypto/pcrypt.c   | 14 --
 crypto/rsa-pkcs1pad.c |  3 +-
 crypto/simd.c |  6 ++--
 crypto/skcipher.c |  3 +-
 crypto/vmac.c | 15 +++---
 crypto/xcbc.c | 15 +++---
 crypto/xts.c  |  3 +-
 include/crypto/algapi.h   | 21 ++
 23 files changed, 140 insertions(+), 109 deletions(-)

diff --git a/crypto/adiantum.c b/crypto/adiantum.c
index cf2b9f4103dd..6e9aa611992c 100644
--- a/crypto/adiantum.c
+++ b/crypto/adiantum.c
@@ -507,7 +507,7 @@ static int adiantum_create(struct crypto_template *tmpl, struct rtattr **tb)
if ((algt->type ^ CRYPTO_ALG_TYPE_SKCIPHER) & algt->mask)
return -EINVAL;
 
-   mask = crypto_requires_sync(algt->type, algt->mask);
+   mask = crypto_algt_inherited_mask(algt);
 
inst = kzalloc(sizeof(*inst) + sizeof(*ictx), GFP_KERNEL);
if (!inst)
@@ -565,8 +565,6 @@ static int adiantum_create(struct crypto_template *tmpl, struct rtattr **tb)
 hash_alg->base.cra_driver_name) >= CRYPTO_MAX_ALG_NAME)
goto err_free_inst;
 
-   inst->alg.base.cra_flags = streamcipher_alg->base.cra_flags &
-  CRYPTO_ALG_ASYNC;
inst->alg.base.cra_blocksize = BLOCKCIPHER_BLOCK_SIZE;
inst->alg.base.cra_ctxsize = sizeof(struct adiantum_tfm_ctx);
inst->alg.base.cra_alignmask = streamcipher_alg->base.cra_alignmask |
diff --git a/crypto/algapi.c b/crypto/algapi.c
index 92abdf675992..24a56279ca80 100644
--- a/crypto/algapi.c
+++ b/crypto/algapi.c
@@ -690,6 +690,8 @@ int crypto_grab_spawn(struct crypto_spawn *spawn, struct crypto_instance *inst,
spawn->mask = mask;
spawn->next = inst->spawns;
inst->spawns = spawn;
+   inst->alg.cra_flags |=
+   (alg->cra_flags & CRYPTO_ALG_INHERITED_FLAGS);
err = 0;
}
	up_write(&crypto_alg_sem);
diff --git a/crypto/authenc.c b/crypto/authenc.c
index 775e7138fd10..ae66561d1af2 100644
--- a/crypto/authenc.c
+++ b/crypto/authenc.c
@@ -388,7 +388,7 @@ static int crypto_authenc_create(struct crypto_template *tmpl,
if ((algt->type ^ CRYPTO_ALG_TYPE_AEAD) & algt->mask)
return -EINVAL;
 
-   mask = crypto_requires_sync(algt->type, algt->mask);
+   mask = crypto_algt_inherited_mask(algt);
 
inst = kzalloc(sizeof(*inst) + sizeof(*ctx), GFP_KERNEL);
if (!inst)
@@ -423,8 +423,6 @@ static int crypto_authenc_create(struct crypto_template *tmpl,
 enc->base.cra_driver_name) >= CRYPTO_MAX_ALG_NAME)
goto err_free_inst;
 
-   inst->alg.base.cra_flags = (auth_base->cra_flags |
-   enc->base.cra_flags) & CRYPTO_ALG_ASYNC;
inst->alg.base.cra_priority = enc->base.cra_priority * 10 +
  auth_base->cra_priority;
inst->alg.base.cra_blocksize = enc->base.cra_blocksize;
diff --git a/crypto/authencesn.c b/crypto/authencesn.c
index 149b70df2a91..9847339aa4ef 100644
--- a/crypto/authencesn.c
+++ b/crypto/authencesn.c
@@ -406,7 +406,7 @@ static int crypto_authenc_esn_create(struct crypto_template *tmpl,
if ((algt->type ^ CRYPTO_ALG_TYPE_AEAD) & algt->mask)

[dm-devel] [PATCH 3/6] crypto: algapi - introduce the flag CRYPTO_ALG_ALLOCATES_MEMORY

2020-06-30 Thread Eric Biggers
From: Eric Biggers 

Introduce a new algorithm flag CRYPTO_ALG_ALLOCATES_MEMORY.  If this
flag is set, then the driver allocates memory in its request routine.
Such drivers are not suitable for disk encryption because GFP_ATOMIC
allocation can fail anytime (causing random I/O errors) and GFP_KERNEL
allocation can recurse into the block layer, causing a deadlock.

For now, this flag is only implemented for some algorithm types.  We
also assume some usage constraints for it to be meaningful, since there
are lots of edge cases the crypto API allows (e.g., misaligned or
fragmented scatterlists) that mean that nearly any crypto algorithm can
allocate memory in some case.  See the comment for details.

Also add this flag to CRYPTO_ALG_INHERITED_FLAGS so that when a template
is instantiated, this flag is set on the template instance if it is set
on any algorithm the instance uses.

Originally-from: Mikulas Patocka 
Signed-off-by: Eric Biggers 
---
 include/crypto/algapi.h |  3 ++-
 include/linux/crypto.h  | 32 
 2 files changed, 34 insertions(+), 1 deletion(-)

diff --git a/include/crypto/algapi.h b/include/crypto/algapi.h
index 5385443dcf9b..dce57c89cf98 100644
--- a/include/crypto/algapi.h
+++ b/include/crypto/algapi.h
@@ -246,7 +246,8 @@ static inline u32 crypto_requires_off(u32 type, u32 mask, u32 off)
  * algorithm if any "inner" algorithm has them set.  In some cases other flags
  * are inherited too; these are just the flags that are *always* inherited.
  */
-#define CRYPTO_ALG_INHERITED_FLAGS CRYPTO_ALG_ASYNC
+#define CRYPTO_ALG_INHERITED_FLAGS \
+   (CRYPTO_ALG_ASYNC | CRYPTO_ALG_ALLOCATES_MEMORY)
 
 /*
  * Given the type and mask that specify the flags restrictions on a template
diff --git a/include/linux/crypto.h b/include/linux/crypto.h
index 763863dbc079..5e7c04391be8 100644
--- a/include/linux/crypto.h
+++ b/include/linux/crypto.h
@@ -101,6 +101,38 @@
  */
 #define CRYPTO_NOLOAD  0x8000
 
+/*
+ * The algorithm may allocate memory during request processing, i.e. during
+ * encryption, decryption, or hashing.  Users can request an algorithm with this
+ * flag unset if they can't handle memory allocation failures.
+ *
+ * This flag is currently only implemented for algorithms of type "skcipher",
+ * "aead", "ahash", "shash", and "cipher".  Algorithms of other types might not
+ * have this flag set even if they allocate memory.
+ *
+ * In some edge cases, algorithms can allocate memory regardless of this flag.
+ * To avoid these cases, users must obey the following usage constraints:
+ *skcipher:
+ * - The IV buffer and all scatterlist elements must be aligned to the
+ *   algorithm's alignmask.
+ * - If the data were to be divided into chunks of size
+ *   crypto_skcipher_walksize() (with any remainder going at the end), no
+ *   chunk can cross a page boundary or a scatterlist element boundary.
+ *aead:
+ * - The IV buffer and all scatterlist elements must be aligned to the
+ *   algorithm's alignmask.
+ * - The first scatterlist element must contain all the associated data,
+ *   and its pages must be !PageHighMem.
+ * - If the plaintext/ciphertext were to be divided into chunks of size
+ *   crypto_aead_walksize() (with the remainder going at the end), no chunk
+ *   can cross a page boundary or a scatterlist element boundary.
+ *ahash:
+ * - The result buffer must be aligned to the algorithm's alignmask.
+ * - crypto_ahash_finup() must not be used unless the algorithm implements
+ *   ->finup() natively.
+ */
+#define CRYPTO_ALG_ALLOCATES_MEMORY0x0001
+
 /*
  * Transform masks and values (for crt_flags).
  */
-- 
2.27.0


--
dm-devel mailing list
dm-devel@redhat.com
https://www.redhat.com/mailman/listinfo/dm-devel



[dm-devel] [PATCH 0/6] crypto: add CRYPTO_ALG_ALLOCATES_MEMORY

2020-06-30 Thread Eric Biggers
This series introduces a flag that algorithms can set to indicate that
they allocate memory during processing of typical inputs, and thus
shouldn't be used in cases like dm-crypt where memory allocation
failures aren't acceptable.

Compared to Mikulas's patches, I've made the following improvements:

- Tried to clearly document the semantics of
  CRYPTO_ALG_ALLOCATES_MEMORY.  This includes documenting the usage
  constraints, since there are actually lots of cases that were
  overlooked where algorithms can still allocate memory in some edge
  cases where inputs are misaligned, fragmented, etc.  E.g. see
  crypto/skcipher.c and crypto/ahash.c.  Mikulas, please let me know if
  there are any concerns for dm-crypt.

- Moved the common mechanism for inheriting flags to its own patch.

- crypto_grab_spawn() now handles propagating CRYPTO_ALG_INHERITED_FLAGS
  to the new template instance.

- Inherit the flags in various places that were missed.

- Other cleanups.

Note: Mikulas's patch "crypto: set the flag CRYPTO_ALG_ALLOCATES_MEMORY"
still needs to be checked for cases where the flag no longer needs to be
set due to the usage constraints I documented.

Eric Biggers (4):
  crypto: geniv - remove unneeded arguments from aead_geniv_alloc()
  crypto: algapi - use common mechanism for inheriting flags
  crypto: algapi - introduce the flag CRYPTO_ALG_ALLOCATES_MEMORY
  crypto: algapi - remove crypto_check_attr_type()

Mikulas Patocka (2):
  crypto: set the flag CRYPTO_ALG_ALLOCATES_MEMORY
  dm-crypt: don't use drivers that have CRYPTO_ALG_ALLOCATES_MEMORY

 crypto/adiantum.c |   4 +-
 crypto/algapi.c   |  17 +--
 crypto/authenc.c  |   4 +-
 crypto/authencesn.c   |   4 +-
 crypto/ccm.c  |  23 ++--
 crypto/chacha20poly1305.c |   4 +-
 crypto/cmac.c |  15 ++-
 crypto/cryptd.c   |  59 -
 crypto/ctr.c  |   8 +-
 crypto/cts.c  |   3 +-
 crypto/echainiv.c |   2 +-
 crypto/essiv.c|  11 +-
 crypto/gcm.c  |  10 +-
 crypto/geniv.c|   9 +-
 crypto/hmac.c |  15 ++-
 crypto/lrw.c  |   3 +-
 crypto/pcrypt.c   |  14 +--
 crypto/rsa-pkcs1pad.c |   3 +-
 crypto/seqiv.c|   2 +-
 crypto/simd.c |   6 +-
 crypto/skcipher.c |   3 +-
 crypto/vmac.c |  15 ++-
 crypto/xcbc.c |  15 ++-
 crypto/xts.c  |   3 +-
 .../crypto/allwinner/sun8i-ce/sun8i-ce-core.c |  12 +-
 .../crypto/allwinner/sun8i-ss/sun8i-ss-core.c |  12 +-
 drivers/crypto/amlogic/amlogic-gxl-core.c |   6 +-
 drivers/crypto/axis/artpec6_crypto.c  |  20 ++-
 drivers/crypto/bcm/cipher.c   |  72 ---
 drivers/crypto/caam/caamalg.c |   6 +-
 drivers/crypto/caam/caamalg_qi.c  |   6 +-
 drivers/crypto/caam/caamalg_qi2.c |   8 +-
 drivers/crypto/caam/caamhash.c|   2 +-
 drivers/crypto/cavium/cpt/cptvf_algs.c|  18 ++-
 drivers/crypto/cavium/nitrox/nitrox_aead.c|   4 +-
 .../crypto/cavium/nitrox/nitrox_skcipher.c|  16 +--
 drivers/crypto/ccp/ccp-crypto-aes-cmac.c  |   1 +
 drivers/crypto/ccp/ccp-crypto-aes-galois.c|   1 +
 drivers/crypto/ccp/ccp-crypto-aes-xts.c   |   1 +
 drivers/crypto/ccp/ccp-crypto-aes.c   |   2 +
 drivers/crypto/ccp/ccp-crypto-des3.c  |   1 +
 drivers/crypto/ccp/ccp-crypto-sha.c   |   1 +
 drivers/crypto/chelsio/chcr_algo.c|   7 +-
 drivers/crypto/hisilicon/sec/sec_algs.c   |  24 ++--
 drivers/crypto/hisilicon/sec2/sec_crypto.c|   4 +-
 .../crypto/inside-secure/safexcel_cipher.c|  47 +++
 drivers/crypto/inside-secure/safexcel_hash.c  |  18 +++
 drivers/crypto/ixp4xx_crypto.c|   6 +-
 drivers/crypto/marvell/cesa/cipher.c  |  18 ++-
 drivers/crypto/marvell/cesa/hash.c|   6 +
 .../crypto/marvell/octeontx/otx_cptvf_algs.c  |  30 ++---
 drivers/crypto/n2_core.c  |   3 +-
 drivers/crypto/picoxcell_crypto.c |  17 ++-
 drivers/crypto/qat/qat_common/qat_algs.c  |  12 +-
 drivers/crypto/qce/skcipher.c |   1 +
 drivers/crypto/talitos.c  | 117 --
 drivers/crypto/virtio/virtio_crypto_algs.c|   3 +-
 drivers/crypto/xilinx/zynqmp-aes-gcm.c|   1 +
 drivers/md/dm-crypt.c |  17 ++-
 include/crypto/algapi.h   |  23 +++-
 

[dm-devel] [PATCH 5/6] crypto: set the flag CRYPTO_ALG_ALLOCATES_MEMORY

2020-06-30 Thread Eric Biggers
From: Mikulas Patocka 

Set the flag CRYPTO_ALG_ALLOCATES_MEMORY in the crypto drivers that
allocate memory.

drivers/crypto/allwinner/sun8i-ce/sun8i-ce-core.c: sun8i_ce_cipher
drivers/crypto/allwinner/sun8i-ss/sun8i-ss-core.c: sun8i_ss_cipher
drivers/crypto/amlogic/amlogic-gxl-core.c: meson_cipher
drivers/crypto/axis/artpec6_crypto.c: artpec6_crypto_common_init
drivers/crypto/bcm/cipher.c: spu_skcipher_rx_sg_create
drivers/crypto/caam/caamalg.c: aead_edesc_alloc
drivers/crypto/caam/caamalg_qi.c: aead_edesc_alloc
drivers/crypto/caam/caamalg_qi2.c: aead_edesc_alloc
drivers/crypto/caam/caamhash.c: hash_digest_key
drivers/crypto/cavium/cpt/cptvf_algs.c: process_request
drivers/crypto/cavium/nitrox/nitrox_aead.c: nitrox_process_se_request
drivers/crypto/cavium/nitrox/nitrox_skcipher.c: nitrox_process_se_request
drivers/crypto/ccp/ccp-crypto-aes-cmac.c: ccp_do_cmac_update
drivers/crypto/ccp/ccp-crypto-aes-galois.c: ccp_crypto_enqueue_request
drivers/crypto/ccp/ccp-crypto-aes-xts.c: ccp_crypto_enqueue_request
drivers/crypto/ccp/ccp-crypto-aes.c: ccp_crypto_enqueue_request
drivers/crypto/ccp/ccp-crypto-des3.c: ccp_crypto_enqueue_request
drivers/crypto/ccp/ccp-crypto-sha.c: ccp_crypto_enqueue_request
drivers/crypto/chelsio/chcr_algo.c: create_cipher_wr
drivers/crypto/hisilicon/sec/sec_algs.c: sec_alloc_and_fill_hw_sgl
drivers/crypto/hisilicon/sec2/sec_crypto.c: sec_alloc_req_id
drivers/crypto/inside-secure/safexcel_cipher.c: safexcel_queue_req
drivers/crypto/inside-secure/safexcel_hash.c: safexcel_ahash_enqueue
drivers/crypto/ixp4xx_crypto.c: ablk_perform
drivers/crypto/marvell/cesa/cipher.c: mv_cesa_skcipher_dma_req_init
drivers/crypto/marvell/cesa/hash.c: mv_cesa_ahash_dma_req_init
drivers/crypto/marvell/octeontx/otx_cptvf_algs.c: create_ctx_hdr
drivers/crypto/n2_core.c: n2_compute_chunks
drivers/crypto/picoxcell_crypto.c: spacc_sg_to_ddt
drivers/crypto/qat/qat_common/qat_algs.c: qat_alg_skcipher_encrypt
drivers/crypto/qce/skcipher.c: qce_skcipher_async_req_handle
drivers/crypto/talitos.c : talitos_edesc_alloc
drivers/crypto/virtio/virtio_crypto_algs.c: __virtio_crypto_skcipher_do_req
drivers/crypto/xilinx/zynqmp-aes-gcm.c: zynqmp_aes_aead_cipher

Signed-off-by: Mikulas Patocka 
[EB: avoid overly-long lines]
Signed-off-by: Eric Biggers 
---
 .../crypto/allwinner/sun8i-ce/sun8i-ce-core.c |  12 +-
 .../crypto/allwinner/sun8i-ss/sun8i-ss-core.c |  12 +-
 drivers/crypto/amlogic/amlogic-gxl-core.c |   6 +-
 drivers/crypto/axis/artpec6_crypto.c  |  20 ++-
 drivers/crypto/bcm/cipher.c   |  72 ---
 drivers/crypto/caam/caamalg.c |   6 +-
 drivers/crypto/caam/caamalg_qi.c  |   6 +-
 drivers/crypto/caam/caamalg_qi2.c |   8 +-
 drivers/crypto/caam/caamhash.c|   2 +-
 drivers/crypto/cavium/cpt/cptvf_algs.c|  18 ++-
 drivers/crypto/cavium/nitrox/nitrox_aead.c|   4 +-
 .../crypto/cavium/nitrox/nitrox_skcipher.c|  16 +--
 drivers/crypto/ccp/ccp-crypto-aes-cmac.c  |   1 +
 drivers/crypto/ccp/ccp-crypto-aes-galois.c|   1 +
 drivers/crypto/ccp/ccp-crypto-aes-xts.c   |   1 +
 drivers/crypto/ccp/ccp-crypto-aes.c   |   2 +
 drivers/crypto/ccp/ccp-crypto-des3.c  |   1 +
 drivers/crypto/ccp/ccp-crypto-sha.c   |   1 +
 drivers/crypto/chelsio/chcr_algo.c|   7 +-
 drivers/crypto/hisilicon/sec/sec_algs.c   |  24 ++--
 drivers/crypto/hisilicon/sec2/sec_crypto.c|   4 +-
 .../crypto/inside-secure/safexcel_cipher.c|  47 +++
 drivers/crypto/inside-secure/safexcel_hash.c  |  18 +++
 drivers/crypto/ixp4xx_crypto.c|   6 +-
 drivers/crypto/marvell/cesa/cipher.c  |  18 ++-
 drivers/crypto/marvell/cesa/hash.c|   6 +
 .../crypto/marvell/octeontx/otx_cptvf_algs.c  |  30 ++---
 drivers/crypto/n2_core.c  |   3 +-
 drivers/crypto/picoxcell_crypto.c |  17 ++-
 drivers/crypto/qat/qat_common/qat_algs.c  |  12 +-
 drivers/crypto/qce/skcipher.c |   1 +
 drivers/crypto/talitos.c  | 117 --
 drivers/crypto/virtio/virtio_crypto_algs.c|   3 +-
 drivers/crypto/xilinx/zynqmp-aes-gcm.c|   1 +
 34 files changed, 360 insertions(+), 143 deletions(-)

diff --git a/drivers/crypto/allwinner/sun8i-ce/sun8i-ce-core.c b/drivers/crypto/allwinner/sun8i-ce/sun8i-ce-core.c
index b957061424a1..138759dc8190 100644
--- a/drivers/crypto/allwinner/sun8i-ce/sun8i-ce-core.c
+++ b/drivers/crypto/allwinner/sun8i-ce/sun8i-ce-core.c
@@ -185,7 +185,8 @@ static struct sun8i_ce_alg_template ce_algs[] = {
.cra_priority = 400,
.cra_blocksize = AES_BLOCK_SIZE,
.cra_flags = CRYPTO_ALG_TYPE_SKCIPHER |
-   CRYPTO_ALG_ASYNC | CRYPTO_ALG_NEED_FALLBACK,
+   CRYPTO_ALG_ASYNC | CRYPTO_ALG_ALLOCATES_MEMORY |
+   CRYPTO_ALG_NEED_FALLBACK,

[dm-devel] [PATCH 1/6] crypto: geniv - remove unneeded arguments from aead_geniv_alloc()

2020-06-30 Thread Eric Biggers
From: Eric Biggers 

The type and mask arguments to aead_geniv_alloc() are always 0, so
remove them.

Signed-off-by: Eric Biggers 
---
 crypto/echainiv.c   | 2 +-
 crypto/geniv.c  | 7 ---
 crypto/seqiv.c  | 2 +-
 include/crypto/internal/geniv.h | 2 +-
 4 files changed, 7 insertions(+), 6 deletions(-)

diff --git a/crypto/echainiv.c b/crypto/echainiv.c
index 4a2f02baba14..69686668625e 100644
--- a/crypto/echainiv.c
+++ b/crypto/echainiv.c
@@ -115,7 +115,7 @@ static int echainiv_aead_create(struct crypto_template *tmpl,
struct aead_instance *inst;
int err;
 
-   inst = aead_geniv_alloc(tmpl, tb, 0, 0);
+   inst = aead_geniv_alloc(tmpl, tb);
 
if (IS_ERR(inst))
return PTR_ERR(inst);
diff --git a/crypto/geniv.c b/crypto/geniv.c
index 6a90c52d49ad..07496c8af0ab 100644
--- a/crypto/geniv.c
+++ b/crypto/geniv.c
@@ -39,7 +39,7 @@ static void aead_geniv_free(struct aead_instance *inst)
 }
 
 struct aead_instance *aead_geniv_alloc(struct crypto_template *tmpl,
-  struct rtattr **tb, u32 type, u32 mask)
+  struct rtattr **tb)
 {
struct crypto_aead_spawn *spawn;
struct crypto_attr_type *algt;
@@ -47,6 +47,7 @@ struct aead_instance *aead_geniv_alloc(struct crypto_template *tmpl,
struct aead_alg *alg;
unsigned int ivsize;
unsigned int maxauthsize;
+   u32 mask;
int err;
 
algt = crypto_get_attr_type(tb);
@@ -63,10 +64,10 @@ struct aead_instance *aead_geniv_alloc(struct crypto_template *tmpl,
spawn = aead_instance_ctx(inst);
 
/* Ignore async algorithms if necessary. */
-   mask |= crypto_requires_sync(algt->type, algt->mask);
+   mask = crypto_requires_sync(algt->type, algt->mask);
 
err = crypto_grab_aead(spawn, aead_crypto_instance(inst),
-  crypto_attr_alg_name(tb[1]), type, mask);
+  crypto_attr_alg_name(tb[1]), 0, mask);
if (err)
goto err_free_inst;
 
diff --git a/crypto/seqiv.c b/crypto/seqiv.c
index f124b9b54e15..e48f875a7aac 100644
--- a/crypto/seqiv.c
+++ b/crypto/seqiv.c
@@ -138,7 +138,7 @@ static int seqiv_aead_create(struct crypto_template *tmpl, struct rtattr **tb)
struct aead_instance *inst;
int err;
 
-   inst = aead_geniv_alloc(tmpl, tb, 0, 0);
+   inst = aead_geniv_alloc(tmpl, tb);
 
if (IS_ERR(inst))
return PTR_ERR(inst);
diff --git a/include/crypto/internal/geniv.h b/include/crypto/internal/geniv.h
index 229d37681a9d..7fd7126f593a 100644
--- a/include/crypto/internal/geniv.h
+++ b/include/crypto/internal/geniv.h
@@ -20,7 +20,7 @@ struct aead_geniv_ctx {
 };
 
 struct aead_instance *aead_geniv_alloc(struct crypto_template *tmpl,
-  struct rtattr **tb, u32 type, u32 mask);
+  struct rtattr **tb);
 int aead_init_geniv(struct crypto_aead *tfm);
 void aead_exit_geniv(struct crypto_aead *tfm);
 
-- 
2.27.0





[dm-devel] [PATCH 4/6] crypto: algapi - remove crypto_check_attr_type()

2020-06-30 Thread Eric Biggers
From: Eric Biggers 

Remove crypto_check_attr_type(), since it's no longer used.

Signed-off-by: Eric Biggers 
---
 crypto/algapi.c | 15 ---
 include/crypto/algapi.h |  1 -
 2 files changed, 16 deletions(-)

diff --git a/crypto/algapi.c b/crypto/algapi.c
index 24a56279ca80..58558fae12bd 100644
--- a/crypto/algapi.c
+++ b/crypto/algapi.c
@@ -818,21 +818,6 @@ struct crypto_attr_type *crypto_get_attr_type(struct rtattr **tb)
 }
 EXPORT_SYMBOL_GPL(crypto_get_attr_type);
 
-int crypto_check_attr_type(struct rtattr **tb, u32 type)
-{
-   struct crypto_attr_type *algt;
-
-   algt = crypto_get_attr_type(tb);
-   if (IS_ERR(algt))
-   return PTR_ERR(algt);
-
-   if ((algt->type ^ type) & algt->mask)
-   return -EINVAL;
-
-   return 0;
-}
-EXPORT_SYMBOL_GPL(crypto_check_attr_type);
-
 const char *crypto_attr_alg_name(struct rtattr *rta)
 {
struct crypto_attr_alg *alga;
diff --git a/include/crypto/algapi.h b/include/crypto/algapi.h
index dce57c89cf98..78902563512b 100644
--- a/include/crypto/algapi.h
+++ b/include/crypto/algapi.h
@@ -116,7 +116,6 @@ struct crypto_tfm *crypto_spawn_tfm(struct crypto_spawn *spawn, u32 type,
 void *crypto_spawn_tfm2(struct crypto_spawn *spawn);
 
 struct crypto_attr_type *crypto_get_attr_type(struct rtattr **tb);
-int crypto_check_attr_type(struct rtattr **tb, u32 type);
 const char *crypto_attr_alg_name(struct rtattr *rta);
 int crypto_attr_u32(struct rtattr *rta, u32 *num);
 int crypto_inst_setname(struct crypto_instance *inst, const char *name,
-- 
2.27.0





[dm-devel] [PATCH 6/6] dm-crypt: don't use drivers that have CRYPTO_ALG_ALLOCATES_MEMORY

2020-06-30 Thread Eric Biggers
From: Mikulas Patocka 

Don't use crypto drivers that have the flag CRYPTO_ALG_ALLOCATES_MEMORY
set. These drivers allocate memory and thus they are unsuitable for block
I/O processing.

Signed-off-by: Mikulas Patocka 
---
 drivers/md/dm-crypt.c | 17 +++--
 1 file changed, 11 insertions(+), 6 deletions(-)

diff --git a/drivers/md/dm-crypt.c b/drivers/md/dm-crypt.c
index 000ddfab5ba0..7268faacbdf3 100644
--- a/drivers/md/dm-crypt.c
+++ b/drivers/md/dm-crypt.c
@@ -421,7 +421,8 @@ static int crypt_iv_lmk_ctr(struct crypt_config *cc, struct dm_target *ti,
return -EINVAL;
}
 
-   lmk->hash_tfm = crypto_alloc_shash("md5", 0, 0);
+   lmk->hash_tfm = crypto_alloc_shash("md5", 0,
+  CRYPTO_ALG_ALLOCATES_MEMORY);
if (IS_ERR(lmk->hash_tfm)) {
ti->error = "Error initializing LMK hash";
return PTR_ERR(lmk->hash_tfm);
@@ -583,7 +584,8 @@ static int crypt_iv_tcw_ctr(struct crypt_config *cc, struct dm_target *ti,
return -EINVAL;
}
 
-   tcw->crc32_tfm = crypto_alloc_shash("crc32", 0, 0);
+   tcw->crc32_tfm = crypto_alloc_shash("crc32", 0,
+   CRYPTO_ALG_ALLOCATES_MEMORY);
if (IS_ERR(tcw->crc32_tfm)) {
ti->error = "Error initializing CRC32 in TCW";
return PTR_ERR(tcw->crc32_tfm);
@@ -770,7 +772,8 @@ static int crypt_iv_elephant_ctr(struct crypt_config *cc, struct dm_target *ti,
	struct iv_elephant_private *elephant = &cc->iv_gen_private.elephant;
int r;
 
-   elephant->tfm = crypto_alloc_skcipher("ecb(aes)", 0, 0);
+   elephant->tfm = crypto_alloc_skcipher("ecb(aes)", 0,
+ CRYPTO_ALG_ALLOCATES_MEMORY);
if (IS_ERR(elephant->tfm)) {
r = PTR_ERR(elephant->tfm);
elephant->tfm = NULL;
@@ -2090,7 +2093,8 @@ static int crypt_alloc_tfms_skcipher(struct crypt_config *cc, char *ciphermode)
return -ENOMEM;
 
for (i = 0; i < cc->tfms_count; i++) {
-   cc->cipher_tfm.tfms[i] = crypto_alloc_skcipher(ciphermode, 0, 0);
+   cc->cipher_tfm.tfms[i] = crypto_alloc_skcipher(ciphermode, 0,
+   CRYPTO_ALG_ALLOCATES_MEMORY);
if (IS_ERR(cc->cipher_tfm.tfms[i])) {
err = PTR_ERR(cc->cipher_tfm.tfms[i]);
crypt_free_tfms(cc);
@@ -2116,7 +2120,8 @@ static int crypt_alloc_tfms_aead(struct crypt_config *cc, char *ciphermode)
if (!cc->cipher_tfm.tfms)
return -ENOMEM;
 
-   cc->cipher_tfm.tfms_aead[0] = crypto_alloc_aead(ciphermode, 0, 0);
+   cc->cipher_tfm.tfms_aead[0] = crypto_alloc_aead(ciphermode, 0,
+   CRYPTO_ALG_ALLOCATES_MEMORY);
if (IS_ERR(cc->cipher_tfm.tfms_aead[0])) {
err = PTR_ERR(cc->cipher_tfm.tfms_aead[0]);
crypt_free_tfms(cc);
@@ -2603,7 +2608,7 @@ static int crypt_ctr_auth_cipher(struct crypt_config *cc, char *cipher_api)
return -ENOMEM;
strncpy(mac_alg, start, end - start);
 
-   mac = crypto_alloc_ahash(mac_alg, 0, 0);
+   mac = crypto_alloc_ahash(mac_alg, 0, CRYPTO_ALG_ALLOCATES_MEMORY);
kfree(mac_alg);
 
if (IS_ERR(mac))
-- 
2.27.0





Re: [dm-devel] [PATCH 10/11] block: use block_bio class for getrq and sleeprq

2020-06-30 Thread Chaitanya Kulkarni
On 6/29/20 10:13 PM, Christoph Hellwig wrote:
> On Mon, Jun 29, 2020 at 04:43:13PM -0700, Chaitanya Kulkarni wrote:
>> The only difference between block_get_rq and block_bio was the last
>> param passed: __entry->nr_sector and bio->bi_iter.bi_size respectively.
>> Since that is not the case anymore, replace the block_get_rq class with
>> block_bio for the block_getrq and block_sleeprq events, and also adjust
>> the code to handle the null bio case in block_bio.
> To me it seems like keeping the NULL bio case separate actually is a
> little simpler..
> 
> 

Keeping it separate would mean an extra event class and related
event(s) just for handling the null bio case.

Also, the block_get_rq class uses 4 comparisons with ?:.
This patch reduces that to only one comparison in the fast path.

With the above explanation, does it make sense to get rid of
block_get_rq?









Re: [dm-devel] [PATCH 09/11] block: remove unsed param in blk_fill_rwbs()

2020-06-30 Thread Chaitanya Kulkarni
On 6/29/20 10:12 PM, Christoph Hellwig wrote:
>> +extern void blk_fill_rwbs(char *rwbs, unsigned int op);
> You might want to drop the extern while you're at it.
> 

Good point.






Re: [dm-devel] [PATCH 07/11] block: get rid of blk_trace_request_get_cgid()

2020-06-30 Thread Chaitanya Kulkarni
On 6/29/20 10:12 PM, Christoph Hellwig wrote:
> On Mon, Jun 29, 2020 at 04:43:10PM -0700, Chaitanya Kulkarni wrote:
>> Now that we have done the cleanup, we can safely get rid of the
>> blk_trace_request_get_cgid() and replace it with
>> blk_trace_bio_get_cgid().
> To me the helper actually looks useful compared to open coding the
> check in a bunch of callers.
> 

The helper adds an additional function call which can easily be
avoided, since blk_trace_bio_get_cgid() is written cleanly and open
coding the check keeps the code readable.

Unless the cost of the function call doesn't matter, and as long as
readability is not lost, can we please not use the helper?






Re: [dm-devel] [PATCH 02/11] block: rename block_bio_merge class to block_bio

2020-06-30 Thread Chaitanya Kulkarni
On 6/29/20 10:10 PM, Christoph Hellwig wrote:
> On Mon, Jun 29, 2020 at 04:43:05PM -0700, Chaitanya Kulkarni wrote:
>> There are identical TRACE_EVENTs present which can now take
>> advantage of the block_bio_merge trace event class.
>>
>> This is a prep patch which renames block_bio_merge to block_bio so
>> that the next patches in this series will be able to reuse it.
> The changes look good, but I'd merged it with the patches adding
> actual new users (which also look good to me).
> 
Okay, will merge it.






Re: [dm-devel] [PATCH 05/11] block: get rid of the trace rq insert wrapper

2020-06-30 Thread Chaitanya Kulkarni
On 6/29/20 10:11 PM, Christoph Hellwig wrote:
> On Mon, Jun 29, 2020 at 04:43:08PM -0700, Chaitanya Kulkarni wrote:
>> Get rid of the wrapper for trace_block_rq_insert() and call the function
>> directly.
> I'd mention blk_mq_sched_request_inserted instead of the tracepoint
> in the subject and commit message.  Otherwise this looks fine.
> 
Okay, will change the message.






Re: [dm-devel] [PATCH 01/11] block: blktrace framework cleanup

2020-06-30 Thread Chaitanya Kulkarni
On 6/29/20 10:10 PM, Christoph Hellwig wrote:
> On Mon, Jun 29, 2020 at 04:43:04PM -0700, Chaitanya Kulkarni wrote:
>> This patch removes the extra variables from the trace events and
>> overall kernel blktrace framework. The removed members can easily be
>> extracted from the remaining argument which reduces the code
>> significantly and allows us to optimize the several tracepoints like
>> the next patch in the series.
> The subject sounds a litle strange, I'd rather say:
> 
> "block: remove superflous arguments from tracepoints"
> 
> The actual changes look good to me.
Okay, will change that.
> 
>> +trace_block_rq_insert(rq);
>>  }
>>   
>>  spin_lock(>lock);
>> @@ -2111,7 +2111,7 @@ blk_qc_t blk_mq_make_request(struct request_queue *q, 
>> struct bio *bio)
>>  goto queue_exit;
>>  }
>>   
>> -trace_block_getrq(q, bio, bio->bi_opf);
>> +trace_block_getrq(bio, bio->bi_opf);
> But now that you remove more than the q argument in some spots you
> might remove the op one here as well.  Or limit the first patch to
> just the queue argument..
> 
> 

Yeah, that is where I had difficulty: when I tried to do it in one
patch, it got complicated to review. I'll limit the first patch to the
q argument only and add patch(es) for the remaining arguments as
needed; it is easier to review that way.







Re: [dm-devel] [PATCH 6/6] mm: Add memalloc_nowait

2020-06-30 Thread Matthew Wilcox
On Tue, Jun 30, 2020 at 08:34:36AM +0200, Michal Hocko wrote:
> On Mon 29-06-20 22:28:30, Matthew Wilcox wrote:
> [...]
> > The documentation is hard to add a new case to, so I rewrote it.  What
> > do you think?  (Obviously I'll split this out differently for submission;
> > this is just what I have in my tree right now).
> 
> I am fine with your changes. Few notes below.

Thanks!

> > -It turned out though that above approach has led to
> > -abuses when the restricted gfp mask is used "just in case" without a
> > -deeper consideration which leads to problems because an excessive use
> > -of GFP_NOFS/GFP_NOIO can lead to memory over-reclaim or other memory
> > -reclaim issues.
> 
> I believe this is an important part because it shows that new people
> coming to the existing code shouldn't take it as correct and rather
> question it. Also having a clear indication that overuse is causing real
> problems that might be not immediately visible to subsystems outside of
> MM.

It seemed to say a lot of the same things as this paragraph:

+You may notice that quite a few allocations in the existing code specify
+``GFP_NOIO`` or ``GFP_NOFS``. Historically, they were used to prevent
+recursion deadlocks caused by direct memory reclaim calling back into
+the FS or IO paths and blocking on already held resources. Since 4.12
+the preferred way to address this issue is to use the new scope APIs
+described below.

Since this is in core-api/ rather than vm/, I felt that discussion of
the problems that it causes to the mm was a bit too much detail for the
people who would be reading this document.  Maybe I could move that
information into a new Documentation/vm/reclaim.rst file?

Let's see if Our Grumpy Editor has time to give us his advice on this.

> > -FS/IO code then simply calls the appropriate save function before
> > -any critical section with respect to the reclaim is started - e.g.
> > -lock shared with the reclaim context or when a transaction context
> > -nesting would be possible via reclaim.  
> 
> [...]
> 
> > +These functions should be called at the point where any memory allocation
> > +would start to cause problems.  That is, do not simply wrap individual
> > +memory allocation calls which currently use ``GFP_NOFS`` with a pair
> > +of calls to memalloc_nofs_save() and memalloc_nofs_restore().  Instead,
> > +find the lock which is taken that would cause problems if memory reclaim
> > +reentered the filesystem, place a call to memalloc_nofs_save() before it
> > +is acquired and a call to memalloc_nofs_restore() after it is released.
> > +Ideally also add a comment explaining why this lock will be problematic.
> 
> The above text has mentioned the transaction context nesting as well and
> that was a hint by Dave IIRC. It is imho good to have an example of
> other reentrant points than just locks. I believe another useful example
> would be something like loop device which is mixing IO and FS layers but
> I am not familiar with all the details to give you an useful text.

I'll let Mikulas & Dave finish fighting about that before I write any
text mentioning the loop driver.  How about this for mentioning the
filesystem transaction possibility?

@@ -103,12 +103,16 @@ flags specified by any particular call to allocate memory.
 
 These functions should be called at the point where any memory allocation
 would start to cause problems.  That is, do not simply wrap individual
-memory allocation calls which currently use ``GFP_NOFS`` with a pair
-of calls to memalloc_nofs_save() and memalloc_nofs_restore().  Instead,
-find the lock which is taken that would cause problems if memory reclaim
+memory allocation calls which currently use ``GFP_NOFS`` with a pair of
+calls to memalloc_nofs_save() and memalloc_nofs_restore().  Instead, find
+the resource which is acquired that would cause problems if memory reclaim
 reentered the filesystem, place a call to memalloc_nofs_save() before it
 is acquired and a call to memalloc_nofs_restore() after it is released.
 Ideally also add a comment explaining why this lock will be problematic.
+A resource might be a lock which would need to be acquired by an attempt
+to reclaim memory, or it might be starting a transaction that should not
+nest over a memory reclaim transaction.  Deep knowledge of the filesystem
+or driver is often needed to place memory scoping calls correctly.
 
 Please note that the proper pairing of save/restore functions
 allows nesting so it is safe to call memalloc_noio_save() and

> > @@ -104,16 +134,19 @@ ARCH_KMALLOC_MINALIGN bytes.  For sizes which are a 
> > power of two, the
> >  alignment is also guaranteed to be at least the respective size.
> >  
> >  For large allocations you can use vmalloc() and vzalloc(), or directly
> > -request pages from the page allocator. The memory allocated by `vmalloc`
> > -and related functions is not physically contiguous.
> > +request pages from the page allocator.  The memory allocated by `vmalloc`
> > +and related 

Re: [dm-devel] [PATCH v2] dm crypt: add flags to optionally bypass dm-crypt workqueues

2020-06-30 Thread Damien Le Moal
On 2020/06/30 18:35, Ignat Korchagin wrote:
[...]
>>> diff --git a/drivers/md/dm-crypt.c b/drivers/md/dm-crypt.c
>>> index 000ddfab5ba0..6924eb49b1df 100644
>>> --- a/drivers/md/dm-crypt.c
>>> +++ b/drivers/md/dm-crypt.c
>>> @@ -69,6 +69,7 @@ struct dm_crypt_io {
>>>   u8 *integrity_metadata;
>>>   bool integrity_metadata_from_pool;
>>>   struct work_struct work;
>>> + struct tasklet_struct tasklet;
>>>
>>>   struct convert_context ctx;
>>>
>>> @@ -127,7 +128,8 @@ struct iv_elephant_private {
>>>   * and encrypts / decrypts at the same time.
>>>   */
>>>  enum flags { DM_CRYPT_SUSPENDED, DM_CRYPT_KEY_VALID,
>>> -  DM_CRYPT_SAME_CPU, DM_CRYPT_NO_OFFLOAD };
>>> +  DM_CRYPT_SAME_CPU, DM_CRYPT_NO_OFFLOAD,
>>> +  DM_CRYPT_NO_READ_WORKQUEUE, DM_CRYPT_NO_WRITE_WORKQUEUE };
>>
>> I liked the "INLINE" naming. What about DM_CRYPT_READ_INLINE and
>> DM_CRYPT_WRITE_INLINE ? Shorter too :)
>>
>> But from the changes below, it looks like your change is now less about being
>> purely inline or synchronous but about bypassing the workqueue.
>> Is this correct ?
> 
> Yes, from the test with the NULL cipher it is clearly visible that
> workqueues are the main cause of the performance degradation. The
> previous patch actually did the same thing with the addition of a
> custom xts-proxy synchronous module, which achieved "full inline"
> processing. But it is clear now, that inline/non-inline Crypto API
> does not change much from a performance point of view.

OK. Understood. So the names DM_CRYPT_NO_READ_WORKQUEUE and
DM_CRYPT_NO_WRITE_WORKQUEUE make sense. They are indeed very descriptive.
I was just wondering whether better names could avoid confusion with the
DM_CRYPT_NO_OFFLOAD flag for writes. But I do not have better ideas :)

> 
>>>
>>>  enum cipher_flags {
>>>   CRYPT_MODE_INTEGRITY_AEAD,  /* Use authenticated mode for cihper 
>>> */
>>> @@ -1449,7 +1451,7 @@ static void kcryptd_async_done(struct 
>>> crypto_async_request *async_req,
>>>  int error);
>>>
>>>  static void crypt_alloc_req_skcipher(struct crypt_config *cc,
>>> -  struct convert_context *ctx)
>>> +  struct convert_context *ctx, bool 
>>> nobacklog)
>>>  {
>>>   unsigned key_index = ctx->cc_sector & (cc->tfms_count - 1);
>>>
>>> @@ -1463,12 +1465,12 @@ static void crypt_alloc_req_skcipher(struct 
>>> crypt_config *cc,
>>>* requests if driver request queue is full.
>>>*/
>>>   skcipher_request_set_callback(ctx->r.req,
>>> - CRYPTO_TFM_REQ_MAY_BACKLOG,
>>> + nobacklog ? 0 : CRYPTO_TFM_REQ_MAY_BACKLOG,
>>>   kcryptd_async_done, dmreq_of_req(cc, ctx->r.req));
>>
>> Will not specifying CRYPTO_TFM_REQ_MAY_BACKLOG always cause the crypto API to
>> return -EBUSY ? From the comment above the skcipher_request_set_callback(), 
>> it
>> seems that this will be the case only if the skcipher diver queue is full. 
>> So in
>> other word, keeping the kcryptd_async_done() callback and executing the 
>> skcipher
>> request through crypt_convert() and crypt_convert_block_skcipher() may still 
>> end
>> up being an asynchronous operation. Can you confirm this and is it what you
>> intended to implement ?
> 
> Yes, so far these flags should bypass dm-crypt workqueues only. I had
> a quick look around CRYPTO_TFM_REQ_MAY_BACKLOG and it seems that both
> generic xts as well as aesni implementations (and other crypto
> involved in disk encryption) do not have any logic related to the
> flag, so we may as well leave it as is.

OK. Sounds good. Less changes :)

>> From my understanding of the crypto API, and from what Eric commented, a 
>> truly
>> synchronous/inline execution of the skcypher needs a call like:
>>
>> crypto_wait_req(crypto_skcipher_encrypt(req), &wait);
>>
>> For SMR use case were we must absolutely preserve the write requests order, 
>> the
>> above change will probably be needed. Will check again.
> 
I think this is not an "inline" execution, but rather blocking the
current thread and waiting for the potentially asynchronous crypto
thread to finish its operation.

Well, if we block waiting for the crypto execution, crypto use becomes "inline"
in the context of the BIO submitter, so the write request order is preserved.
More a serialization than pure inlining, sure. But in the end, exactly what is
needed for zoned block device writes.

> It seems we have different use-cases here. By bypassing workqueues we
> just want to improve performance, but otherwise do not really care
> about the order of requests.

Yes. Understood. Not using the current workqueue mechanism for writes to zoned
devices is necessary because of write ordering. The performance aspect of that
is the cherry on top of the SMR support cake :)

Waiting for crypto to complete synchronously may actually decrease
performance, but it is required to preserve the order in some cases.
Should this be yet another flag?

Yes, 

Re: [dm-devel] [PATCH 1/3 v6] crypto: introduce the flag CRYPTO_ALG_ALLOCATES_MEMORY

2020-06-30 Thread Herbert Xu
On Tue, Jun 30, 2020 at 02:15:55PM -0400, Mikulas Patocka wrote:
>
> Index: linux-2.6/crypto/pcrypt.c
> ===
> --- linux-2.6.orig/crypto/pcrypt.c2020-06-29 16:03:07.346417000 +0200
> +++ linux-2.6/crypto/pcrypt.c 2020-06-30 20:11:56.636417000 +0200
> @@ -225,19 +225,21 @@ static int pcrypt_init_instance(struct c
>   return 0;
>  }
>  
> -static int pcrypt_create_aead(struct crypto_template *tmpl, struct rtattr 
> **tb,
> -   u32 type, u32 mask)
> +static int pcrypt_create_aead(struct crypto_template *tmpl, struct rtattr 
> **tb)
>  {

Rather than removing these two arguments, I think you should pass
along algt instead.

>   struct pcrypt_instance_ctx *ctx;
>   struct crypto_attr_type *algt;
>   struct aead_instance *inst;
>   struct aead_alg *alg;
> + u32 mask;
>   int err;
>  
>   algt = crypto_get_attr_type(tb);
>   if (IS_ERR(algt))
>   return PTR_ERR(algt);

Then we could remove this bit.

Cheers,
-- 
Email: Herbert Xu 
Home Page: http://gondor.apana.org.au/~herbert/
PGP Key: http://gondor.apana.org.au/~herbert/pubkey.txt




Re: [dm-devel] rename ->make_request_fn and move it to the block_device_operations

2020-06-30 Thread Jens Axboe
On 6/30/20 12:21 PM, Jens Axboe wrote:
> On 6/30/20 12:19 PM, Christoph Hellwig wrote:
>> On Tue, Jun 30, 2020 at 09:43:31AM -0600, Jens Axboe wrote:
>>> On 6/30/20 7:57 AM, Jens Axboe wrote:
 On 6/29/20 1:39 PM, Christoph Hellwig wrote:
> Hi Jens,
>
> this series moves the make_request_fn method into block_device_operations
> with the much more descriptive ->submit_bio name.  It then also gives
> generic_make_request a more descriptive name, and further optimize the
> path to issue to blk-mq, removing the need for the direct_make_request
> bypass.

 Looks good to me, and it's a nice cleanup as well. Applied.
>>>
>>> Dropped, insta-crashes with dm:
>>
>> Hmm.  Can you send me what is at "submit_bio_noacct+0x1f6" from gdb?
>> Or your .config?
> 
> I'd have to apply and compile again. But it's a bad RIP, so I'm guessing
> it's ->submit_bio == NULL. Let me know if you really need it, and I can
> re-generate the OOPS and have the vmlinux too.

Here's the .config

-- 
Jens Axboe

#
# Automatically generated file; DO NOT EDIT.
# Linux/x86 5.8.0-rc1 Kernel Configuration
#
CONFIG_CC_VERSION_TEXT="gcc (Ubuntu 10.1.0-2ubuntu1~18.04) 10.1.0"
CONFIG_CC_IS_GCC=y
CONFIG_GCC_VERSION=100100
CONFIG_LD_VERSION=23000
CONFIG_CLANG_VERSION=0
CONFIG_CC_CAN_LINK=y
CONFIG_CC_CAN_LINK_STATIC=y
CONFIG_CC_HAS_ASM_GOTO=y
CONFIG_CC_HAS_ASM_INLINE=y
CONFIG_IRQ_WORK=y
CONFIG_BUILDTIME_TABLE_SORT=y
CONFIG_THREAD_INFO_IN_TASK=y

#
# General setup
#
CONFIG_INIT_ENV_ARG_LIMIT=32
# CONFIG_COMPILE_TEST is not set
CONFIG_LOCALVERSION=""
# CONFIG_LOCALVERSION_AUTO is not set
CONFIG_BUILD_SALT=""
CONFIG_HAVE_KERNEL_GZIP=y
CONFIG_HAVE_KERNEL_BZIP2=y
CONFIG_HAVE_KERNEL_LZMA=y
CONFIG_HAVE_KERNEL_XZ=y
CONFIG_HAVE_KERNEL_LZO=y
CONFIG_HAVE_KERNEL_LZ4=y
CONFIG_KERNEL_GZIP=y
# CONFIG_KERNEL_BZIP2 is not set
# CONFIG_KERNEL_LZMA is not set
# CONFIG_KERNEL_XZ is not set
# CONFIG_KERNEL_LZO is not set
# CONFIG_KERNEL_LZ4 is not set
CONFIG_DEFAULT_INIT=""
CONFIG_DEFAULT_HOSTNAME="(none)"
CONFIG_SWAP=y
CONFIG_SYSVIPC=y
CONFIG_SYSVIPC_SYSCTL=y
CONFIG_POSIX_MQUEUE=y
CONFIG_POSIX_MQUEUE_SYSCTL=y
# CONFIG_WATCH_QUEUE is not set
CONFIG_CROSS_MEMORY_ATTACH=y
# CONFIG_USELIB is not set
# CONFIG_AUDIT is not set
CONFIG_HAVE_ARCH_AUDITSYSCALL=y

#
# IRQ subsystem
#
CONFIG_GENERIC_IRQ_PROBE=y
CONFIG_GENERIC_IRQ_SHOW=y
CONFIG_GENERIC_IRQ_EFFECTIVE_AFF_MASK=y
CONFIG_GENERIC_PENDING_IRQ=y
CONFIG_GENERIC_IRQ_MIGRATION=y
CONFIG_HARDIRQS_SW_RESEND=y
CONFIG_IRQ_DOMAIN=y
CONFIG_IRQ_DOMAIN_HIERARCHY=y
CONFIG_GENERIC_MSI_IRQ=y
CONFIG_GENERIC_MSI_IRQ_DOMAIN=y
CONFIG_GENERIC_IRQ_MATRIX_ALLOCATOR=y
CONFIG_GENERIC_IRQ_RESERVATION_MODE=y
CONFIG_IRQ_FORCED_THREADING=y
CONFIG_SPARSE_IRQ=y
# CONFIG_GENERIC_IRQ_DEBUGFS is not set
# end of IRQ subsystem

CONFIG_CLOCKSOURCE_WATCHDOG=y
CONFIG_ARCH_CLOCKSOURCE_INIT=y
CONFIG_CLOCKSOURCE_VALIDATE_LAST_CYCLE=y
CONFIG_GENERIC_TIME_VSYSCALL=y
CONFIG_GENERIC_CLOCKEVENTS=y
CONFIG_GENERIC_CLOCKEVENTS_BROADCAST=y
CONFIG_GENERIC_CLOCKEVENTS_MIN_ADJUST=y
CONFIG_GENERIC_CMOS_UPDATE=y

#
# Timers subsystem
#
CONFIG_TICK_ONESHOT=y
CONFIG_NO_HZ_COMMON=y
# CONFIG_HZ_PERIODIC is not set
CONFIG_NO_HZ_IDLE=y
# CONFIG_NO_HZ_FULL is not set
# CONFIG_NO_HZ is not set
CONFIG_HIGH_RES_TIMERS=y
# end of Timers subsystem

# CONFIG_PREEMPT_NONE is not set
# CONFIG_PREEMPT_VOLUNTARY is not set
CONFIG_PREEMPT=y
CONFIG_PREEMPT_COUNT=y
CONFIG_PREEMPTION=y

#
# CPU/Task time and stats accounting
#
CONFIG_TICK_CPU_ACCOUNTING=y
# CONFIG_VIRT_CPU_ACCOUNTING_GEN is not set
# CONFIG_IRQ_TIME_ACCOUNTING is not set
# CONFIG_SCHED_THERMAL_PRESSURE is not set
CONFIG_BSD_PROCESS_ACCT=y
CONFIG_BSD_PROCESS_ACCT_V3=y
CONFIG_TASKSTATS=y
CONFIG_TASK_DELAY_ACCT=y
CONFIG_TASK_XACCT=y
CONFIG_TASK_IO_ACCOUNTING=y
# CONFIG_PSI is not set
# end of CPU/Task time and stats accounting

# CONFIG_CPU_ISOLATION is not set

#
# RCU Subsystem
#
CONFIG_TREE_RCU=y
CONFIG_PREEMPT_RCU=y
# CONFIG_RCU_EXPERT is not set
CONFIG_SRCU=y
CONFIG_TREE_SRCU=y
CONFIG_TASKS_RCU_GENERIC=y
CONFIG_TASKS_RCU=y
CONFIG_TASKS_RUDE_RCU=y
CONFIG_RCU_STALL_COMMON=y
CONFIG_RCU_NEED_SEGCBLIST=y
# end of RCU Subsystem

CONFIG_IKCONFIG=y
CONFIG_IKCONFIG_PROC=y
# CONFIG_IKHEADERS is not set
CONFIG_LOG_BUF_SHIFT=18
CONFIG_LOG_CPU_MAX_BUF_SHIFT=12
CONFIG_PRINTK_SAFE_LOG_BUF_SHIFT=13
CONFIG_HAVE_UNSTABLE_SCHED_CLOCK=y

#
# Scheduler features
#
# CONFIG_UCLAMP_TASK is not set
# end of Scheduler features

CONFIG_ARCH_SUPPORTS_NUMA_BALANCING=y
CONFIG_ARCH_WANT_BATCHED_UNMAP_TLB_FLUSH=y
CONFIG_CC_HAS_INT128=y
CONFIG_ARCH_SUPPORTS_INT128=y
CONFIG_NUMA_BALANCING=y
CONFIG_NUMA_BALANCING_DEFAULT_ENABLED=y
CONFIG_CGROUPS=y
# CONFIG_MEMCG is not set
CONFIG_BLK_CGROUP=y
CONFIG_CGROUP_SCHED=y
CONFIG_FAIR_GROUP_SCHED=y
# CONFIG_CFS_BANDWIDTH is not set
# CONFIG_RT_GROUP_SCHED is not set
# CONFIG_CGROUP_PIDS is not set
# CONFIG_CGROUP_RDMA is not set
# CONFIG_CGROUP_FREEZER is not set
# CONFIG_CGROUP_HUGETLB is not set
# CONFIG_CPUSETS is not set
# CONFIG_CGROUP_DEVICE is not set
# CONFIG_CGROUP_CPUACCT 

Re: [dm-devel] [PATCH v2] dm crypt: add flags to optionally bypass dm-crypt workqueues

2020-06-30 Thread Ignat Korchagin
On Tue, Jun 30, 2020 at 3:51 AM Damien Le Moal  wrote:
>
> On 2020/06/27 6:03, Ignat Korchagin wrote:
> > This is a follow up from [1]. Consider the following script:
> >
> > sudo modprobe brd rd_nr=1 rd_size=4194304
> >
> > echo '0 8388608 crypt capi:ecb(cipher_null) - 0 /dev/ram0 0' | \
> > sudo dmsetup create eram0
> >
> > echo '0 8388608 crypt capi:ecb(cipher_null) - 0 /dev/ram0 0 1 
> > no_write_workqueue' | \
> > sudo dmsetup create eram0-inline-write
> >
> > echo '0 8388608 crypt capi:ecb(cipher_null) - 0 /dev/ram0 0 1 
> > no_read_workqueue' | \
> > sudo dmsetup create eram0-inline-read
> >
> > devices="/dev/ram0 /dev/mapper/eram0 /dev/mapper/eram0-inline-read "
> > devices+="/dev/mapper/eram0-inline-write"
> >
> > for dev in $devices; do
> >   echo "reading from $dev"
> >   sudo fio --filename=$dev --readwrite=read --bs=4k --direct=1 \
> >   --loops=100 --runtime=3m --name=plain | grep READ
> > done
> >
> > for dev in $devices; do
> >   echo "writing to $dev"
> >   sudo fio --filename=$dev --readwrite=write --bs=4k --direct=1 \
> >   --loops=100 --runtime=3m --name=plain | grep WRITE
> > done
> >
> > This script creates a ramdisk (to eliminate hardware bias in the benchmark) 
> > and
> > three dm-crypt instances on top. All dm-crypt instances use the NULL cipher
> > to eliminate potentially expensive crypto bias (the NULL cipher just uses 
> > memcpy
> > for "encyption"). The first instance is the current dm-crypt implementation 
> > from
> > 5.8-rc2, the two others have new optional flags enabled, which bypass 
> > kcryptd
> > workqueues for reads and writes respectively and write sorting for writes. 
> > On
> > my VM (Debian in VirtualBox with 4 cores on 2.8 GHz Quad-Core Intel Core 
> > i7) I
> > get the following output (formatted for better readability):
> >
> > reading from /dev/ram0
> >READ: bw=508MiB/s (533MB/s), 508MiB/s-508MiB/s (533MB/s-533MB/s), 
> > io=89.3GiB (95.9GB), run=18-18msec
> >
> > reading from /dev/mapper/eram0
> >READ: bw=80.6MiB/s (84.5MB/s), 80.6MiB/s-80.6MiB/s (84.5MB/s-84.5MB/s), 
> > io=14.2GiB (15.2GB), run=18-18msec
> >
> > reading from /dev/mapper/eram0-inline-read
> >READ: bw=295MiB/s (309MB/s), 295MiB/s-295MiB/s (309MB/s-309MB/s), 
> > io=51.8GiB (55.6GB), run=18-18msec
> >
> > reading from /dev/mapper/eram0-inline-write
> >READ: bw=114MiB/s (120MB/s), 114MiB/s-114MiB/s (120MB/s-120MB/s), 
> > io=20.1GiB (21.5GB), run=18-18msec
> >
> > writing to /dev/ram0
> >   WRITE: bw=516MiB/s (541MB/s), 516MiB/s-516MiB/s (541MB/s-541MB/s), 
> > io=90.7GiB (97.4GB), run=180001-180001msec
> >
> > writing to /dev/mapper/eram0
> >   WRITE: bw=40.4MiB/s (42.4MB/s), 40.4MiB/s-40.4MiB/s (42.4MB/s-42.4MB/s), 
> > io=7271MiB (7624MB), run=180001-180001msec
> >
> > writing to /dev/mapper/eram0-inline-read
> >   WRITE: bw=38.9MiB/s (40.8MB/s), 38.9MiB/s-38.9MiB/s (40.8MB/s-40.8MB/s), 
> > io=7000MiB (7340MB), run=180001-180001msec
> >
> > writing to /dev/mapper/eram0-inline-write
> >   WRITE: bw=277MiB/s (290MB/s), 277MiB/s-277MiB/s (290MB/s-290MB/s), 
> > io=48.6GiB (52.2GB), run=18-18msec
> >
> > Current dm-crypt implementation creates a significant IO performance 
> > overhead
> > (at least on small IO block sizes) for both latency and throughput. We 
> > suspect
> > offloading IO request processing into workqueues and async threads is more
> > harmful these days with the modern fast storage. I also did some digging 
> > into
> > the dm-crypt git history and much of this async processing is not needed
> > anymore, because the reasons it was added are mostly gone from the kernel. 
> > More
> > details can be found in [2] (see "Git archeology" section).
> >
> > This change adds no_(read|write)_workqueue flags separately for read and 
> > write
> > BIOs, which direct dm-crypt not to offload crypto operations into kcryptd
> > workqueues and process everything inline. In addition, writes are not 
> > buffered
> > to be sorted in the dm-crypt red-black tree, but dispatched immediately. For
> > cases, where crypto operations cannot happen inline (hard interrupt context,
> > for example the read path of some NVME drivers), we offload the work to a
> > tasklet rather than a workqueue.
> >
> > These flags ensure inline BIO processing in the dm-crypt module only. It is
> > worth noting that some Crypto API implementations may offload encryption 
> > into
> > their own workqueues, which are independent of the dm-crypt and its
> > configuration. However upon enabling no_(read|write)_workqueue flags 
> > dm-crypt
> > will instruct Crypto API not to backlog crypto requests.
> >
> > [1]: https://www.spinics.net/lists/dm-crypt/msg07516.html
> > [2]: https://blog.cloudflare.com/speeding-up-linux-disk-encryption/
> >
> > Signed-off-by: Ignat Korchagin 
> > ---
> >  drivers/md/dm-crypt.c | 68 +--
> >  1 file changed, 52 insertions(+), 16 deletions(-)
> >
> > diff --git a/drivers/md/dm-crypt.c 

Re: [dm-devel] rename ->make_request_fn and move it to the block_device_operations

2020-06-30 Thread Jens Axboe
On 6/30/20 12:19 PM, Christoph Hellwig wrote:
> On Tue, Jun 30, 2020 at 09:43:31AM -0600, Jens Axboe wrote:
>> On 6/30/20 7:57 AM, Jens Axboe wrote:
>>> On 6/29/20 1:39 PM, Christoph Hellwig wrote:
 Hi Jens,

 this series moves the make_request_fn method into block_device_operations
 with the much more descriptive ->submit_bio name.  It then also gives
 generic_make_request a more descriptive name, and further optimize the
 path to issue to blk-mq, removing the need for the direct_make_request
 bypass.
>>>
>>> Looks good to me, and it's a nice cleanup as well. Applied.
>>
>> Dropped, insta-crashes with dm:
> 
> Hmm.  Can you send me what is at "submit_bio_noacct+0x1f6" from gdb?
> Or your .config?

I'd have to apply and compile again. But it's a bad RIP, so I'm guessing
it's ->submit_bio == NULL. Let me know if you really need it, and I can
re-generate the OOPS and have the vmlinux too.

-- 
Jens Axboe




Re: [dm-devel] rename ->make_request_fn and move it to the block_device_operations

2020-06-30 Thread Christoph Hellwig
On Tue, Jun 30, 2020 at 09:43:31AM -0600, Jens Axboe wrote:
> On 6/30/20 7:57 AM, Jens Axboe wrote:
> > On 6/29/20 1:39 PM, Christoph Hellwig wrote:
> >> Hi Jens,
> >>
> >> this series moves the make_request_fn method into block_device_operations
> >> with the much more descriptive ->submit_bio name.  It then also gives
> >> generic_make_request a more descriptive name, and further optimize the
> >> path to issue to blk-mq, removing the need for the direct_make_request
> >> bypass.
> > 
> > Looks good to me, and it's a nice cleanup as well. Applied.
> 
> Dropped, insta-crashes with dm:

Hmm.  Can you send me what is at "submit_bio_noacct+0x1f6" from gdb?
Or your .config?




[dm-devel] [PATCH 1/3 v6] crypto: introduce the flag CRYPTO_ALG_ALLOCATES_MEMORY

2020-06-30 Thread Mikulas Patocka
Introduce a new flag CRYPTO_ALG_ALLOCATES_MEMORY and pass it down the
crypto stack.

If the flag is set, then the crypto driver allocates memory in its request
routine. Such drivers are not suitable for disk encryption because
GFP_ATOMIC allocation can fail anytime (causing random I/O errors) and
GFP_KERNEL allocation can recurse into the block layer, causing a
deadlock.

Pass the flag CRYPTO_ALG_ALLOCATES_MEMORY down through the crypto API.

Signed-off-by: Mikulas Patocka 

---
 crypto/adiantum.c |8 +---
 crypto/authenc.c  |7 ---
 crypto/authencesn.c   |7 ---
 crypto/ccm.c  |8 
 crypto/chacha20poly1305.c |7 ---
 crypto/cryptd.c   |   16 +++-
 crypto/ctr.c  |4 ++--
 crypto/cts.c  |4 ++--
 crypto/essiv.c|4 ++--
 crypto/gcm.c  |   15 ---
 crypto/geniv.c|4 ++--
 crypto/lrw.c  |4 ++--
 crypto/pcrypt.c   |   14 +-
 crypto/rsa-pkcs1pad.c |4 ++--
 crypto/seqiv.c|2 ++
 crypto/xts.c  |4 ++--
 include/crypto/algapi.h   |   10 ++
 include/linux/crypto.h|   15 +++
 18 files changed, 86 insertions(+), 51 deletions(-)

Index: linux-2.6/include/linux/crypto.h
===
--- linux-2.6.orig/include/linux/crypto.h   2020-06-29 16:03:07.346417000 
+0200
+++ linux-2.6/include/linux/crypto.h2020-06-29 16:03:07.336417000 +0200
@@ -102,6 +102,21 @@
 #define CRYPTO_NOLOAD  0x00008000
 
 /*
+ * The driver may allocate memory during request processing, so it shouldn't be
+ * used in cases where memory allocation failures aren't acceptable, such as
+ * during block device encryption.
+ */
+#define CRYPTO_ALG_ALLOCATES_MEMORY 0x00010000
+
+/*
+ * When an algorithm uses another algorithm (e.g., if it's an instance of a
+ * template), these are the flags that always get set on the "outer" algorithm
+ * if any "inner" algorithm has them set.  In some cases other flags are
+ * inherited too; these are just the flags that are *always* inherited.
+ */
+#define CRYPTO_ALG_INHERITED_FLAGS (CRYPTO_ALG_ASYNC | CRYPTO_ALG_ALLOCATES_MEMORY)
+
+/*
  * Transform masks and values (for crt_flags).
  */
 #define CRYPTO_TFM_NEED_KEY 0x00000001
Index: linux-2.6/crypto/authenc.c
===
--- linux-2.6.orig/crypto/authenc.c 2020-06-29 16:03:07.346417000 +0200
+++ linux-2.6/crypto/authenc.c  2020-06-30 15:47:56.516417000 +0200
@@ -388,7 +388,7 @@ static int crypto_authenc_create(struct
if ((algt->type ^ CRYPTO_ALG_TYPE_AEAD) & algt->mask)
return -EINVAL;
 
-   mask = crypto_requires_sync(algt->type, algt->mask);
+   mask = crypto_alg_inherited_mask(algt->type, algt->mask);
 
inst = kzalloc(sizeof(*inst) + sizeof(*ctx), GFP_KERNEL);
if (!inst)
@@ -423,8 +423,9 @@ static int crypto_authenc_create(struct
 enc->base.cra_driver_name) >= CRYPTO_MAX_ALG_NAME)
goto err_free_inst;
 
-   inst->alg.base.cra_flags = (auth_base->cra_flags |
-   enc->base.cra_flags) & CRYPTO_ALG_ASYNC;
+   inst->alg.base.cra_flags =
+   (auth_base->cra_flags | enc->base.cra_flags) &
+   CRYPTO_ALG_INHERITED_FLAGS;
inst->alg.base.cra_priority = enc->base.cra_priority * 10 +
  auth_base->cra_priority;
inst->alg.base.cra_blocksize = enc->base.cra_blocksize;
Index: linux-2.6/crypto/authencesn.c
===
--- linux-2.6.orig/crypto/authencesn.c  2020-06-29 16:03:07.346417000 +0200
+++ linux-2.6/crypto/authencesn.c   2020-06-30 15:48:11.996417000 +0200
@@ -406,7 +406,7 @@ static int crypto_authenc_esn_create(str
if ((algt->type ^ CRYPTO_ALG_TYPE_AEAD) & algt->mask)
return -EINVAL;
 
-   mask = crypto_requires_sync(algt->type, algt->mask);
+   mask = crypto_alg_inherited_mask(algt->type, algt->mask);
 
inst = kzalloc(sizeof(*inst) + sizeof(*ctx), GFP_KERNEL);
if (!inst)
@@ -437,8 +437,9 @@ static int crypto_authenc_esn_create(str
 enc->base.cra_driver_name) >= CRYPTO_MAX_ALG_NAME)
goto err_free_inst;
 
-   inst->alg.base.cra_flags = (auth_base->cra_flags |
-   enc->base.cra_flags) & CRYPTO_ALG_ASYNC;
+   inst->alg.base.cra_flags =
+   (auth_base->cra_flags | enc->base.cra_flags) &
+   CRYPTO_ALG_INHERITED_FLAGS;
inst->alg.base.cra_priority = enc->base.cra_priority * 10 +
  auth_base->cra_priority;
inst->alg.base.cra_blocksize = enc->base.cra_blocksize;
Index: linux-2.6/crypto/ccm.c

Re: [dm-devel] [PATCH 1/3 v4] crypto: introduce the flag CRYPTO_ALG_ALLOCATES_MEMORY

2020-06-30 Thread Mikulas Patocka



On Tue, 30 Jun 2020, Eric Biggers wrote:

> On Tue, Jun 30, 2020 at 01:01:16PM -0400, Mikulas Patocka wrote:
> > > diff --git a/crypto/pcrypt.c b/crypto/pcrypt.c
> > > index 7240e8bbdebb..643f7f1cc91c 100644
> > > --- a/crypto/pcrypt.c
> > > +++ b/crypto/pcrypt.c
> > > @@ -232,12 +232,15 @@ static int pcrypt_create_aead(struct 
> > > crypto_template *tmpl, struct rtattr **tb,
> > >   struct crypto_attr_type *algt;
> > >   struct aead_instance *inst;
> > >   struct aead_alg *alg;
> > > + u32 mask;
> > >   int err;
> > >  
> > >   algt = crypto_get_attr_type(tb);
> > >   if (IS_ERR(algt))
> > >   return PTR_ERR(algt);
> > >  
> > > + mask = crypto_alg_inherited_mask(algt->type, algt->mask);
> > > +
> > >   inst = kzalloc(sizeof(*inst) + sizeof(*ctx), GFP_KERNEL);
> > >   if (!inst)
> > >   return -ENOMEM;
> > > @@ -254,7 +257,7 @@ static int pcrypt_create_aead(struct crypto_template 
> > > *tmpl, struct rtattr **tb,
> > >   goto err_free_inst;
> > >  
> > >   err = crypto_grab_aead(&ctx->spawn, aead_crypto_instance(inst),
> > > -crypto_attr_alg_name(tb[1]), 0, 0);
> > > +crypto_attr_alg_name(tb[1]), 0, mask);
> > >   if (err)
> > >   goto err_free_inst;
> > >  
> > 
> > I added "mask" there - but there is still a "mask" argument that is 
> > unused - is it a bug to have two "mask" variables?
> 
> Right, I didn't see that algt->type and algt->mask are already being passed 
> to 
> pcrypt_create_aead().  It's redundant because pcrypt_create_aead() has access 
> to
> those via crypto_get_attr_type() anyway.
> 
> How about just removing those two arguments for now?
> 
> - Eric

Yes.

Mikulas




Re: [dm-devel] [PATCH 1/3 v4] crypto: introduce the flag CRYPTO_ALG_ALLOCATES_MEMORY

2020-06-30 Thread Eric Biggers
On Tue, Jun 30, 2020 at 01:01:16PM -0400, Mikulas Patocka wrote:
> > diff --git a/crypto/pcrypt.c b/crypto/pcrypt.c
> > index 7240e8bbdebb..643f7f1cc91c 100644
> > --- a/crypto/pcrypt.c
> > +++ b/crypto/pcrypt.c
> > @@ -232,12 +232,15 @@ static int pcrypt_create_aead(struct crypto_template 
> > *tmpl, struct rtattr **tb,
> > struct crypto_attr_type *algt;
> > struct aead_instance *inst;
> > struct aead_alg *alg;
> > +   u32 mask;
> > int err;
> >  
> > algt = crypto_get_attr_type(tb);
> > if (IS_ERR(algt))
> > return PTR_ERR(algt);
> >  
> > +   mask = crypto_alg_inherited_mask(algt->type, algt->mask);
> > +
> > inst = kzalloc(sizeof(*inst) + sizeof(*ctx), GFP_KERNEL);
> > if (!inst)
> > return -ENOMEM;
> > @@ -254,7 +257,7 @@ static int pcrypt_create_aead(struct crypto_template 
> > *tmpl, struct rtattr **tb,
> > goto err_free_inst;
> >  
> > err = crypto_grab_aead(&ctx->spawn, aead_crypto_instance(inst),
> > -  crypto_attr_alg_name(tb[1]), 0, 0);
> > +  crypto_attr_alg_name(tb[1]), 0, mask);
> > if (err)
> > goto err_free_inst;
> >  
> 
> I added "mask" there - but there is still a "mask" argument that is 
> unused - is it a bug to have two "mask" variables?

Right, I didn't see that algt->type and algt->mask are already being passed to 
pcrypt_create_aead().  It's redundant because pcrypt_create_aead() has access to
those via crypto_get_attr_type() anyway.

How about just removing those two arguments for now?

- Eric




Re: [dm-devel] [PATCH 1/3 v4] crypto: introduce the flag CRYPTO_ALG_ALLOCATES_MEMORY

2020-06-30 Thread Mikulas Patocka



On Tue, 30 Jun 2020, Eric Biggers wrote:

> On Tue, Jun 30, 2020 at 09:56:22AM -0400, Mikulas Patocka wrote:
> > Index: linux-2.6/crypto/cryptd.c
> > ===
> > --- linux-2.6.orig/crypto/cryptd.c  2020-06-29 16:03:07.346417000 +0200
> > +++ linux-2.6/crypto/cryptd.c   2020-06-30 15:49:04.206417000 +0200
> > @@ -202,6 +202,7 @@ static inline void cryptd_check_internal
> >  
> > *type |= algt->type & CRYPTO_ALG_INTERNAL;
> > *mask |= algt->mask & CRYPTO_ALG_INTERNAL;
> > +   *mask |= algt->mask & CRYPTO_ALG_INHERITED_FLAGS;
> >  }
> 
> This needs to use the correct logic (crypto_alg_inherited_mask) to decide
> which inherited flags to set in 'mask'.

I added "*mask |= crypto_alg_inherited_mask(algt->type, algt->mask);" 
there.


> > --- linux-2.6.orig/crypto/adiantum.c2020-06-29 16:03:07.346417000 
> > +0200
> > +++ linux-2.6/crypto/adiantum.c 2020-06-29 16:03:07.346417000 +0200
> > @@ -507,7 +507,7 @@ static int adiantum_create(struct crypto
> > if ((algt->type ^ CRYPTO_ALG_TYPE_SKCIPHER) & algt->mask)
> > return -EINVAL;
> >  
> > -   mask = crypto_requires_sync(algt->type, algt->mask);
> > +   mask = crypto_alg_inherited_mask(algt->type, algt->mask);
> >  
> > inst = kzalloc(sizeof(*inst) + sizeof(*ictx), GFP_KERNEL);
> > if (!inst)
> 
> This is still missing setting the flags correctly on the template instance
> being constructed.  The flags need to be inherited from all "inner" algorithms:
> 
>   inst->alg.base.cra_flags = (streamcipher_alg->base.cra_flags |
>   blockcipher_alg->cra_flags |
>   hash_alg->base.cra_flags) &
>  CRYPTO_ALG_INHERITED_FLAGS;
> 
> If we don't do that, the template instance may allocate memory but not have
> CRYPTO_ALG_ALLOCATES_MEMORY set.

OK


> > Index: linux-2.6/crypto/pcrypt.c
> > ===
> > --- linux-2.6.orig/crypto/pcrypt.c  2020-06-29 16:03:07.346417000 +0200
> > +++ linux-2.6/crypto/pcrypt.c   2020-06-30 15:47:32.776417000 +0200
> > @@ -263,7 +263,9 @@ static int pcrypt_create_aead(struct cry
> > if (err)
> > goto err_free_inst;
> >  
> > -   inst->alg.base.cra_flags = CRYPTO_ALG_ASYNC;
> > +   inst->alg.base.cra_flags =
> > +   CRYPTO_ALG_ASYNC |
> > +   (alg->base.cra_flags & CRYPTO_ALG_INHERITED_FLAGS);
> >  
> > inst->alg.ivsize = crypto_aead_alg_ivsize(alg);
> > inst->alg.maxauthsize = crypto_aead_alg_maxauthsize(alg);

What's wrong with this?


> And this is still missing setting the mask:
> 
> diff --git a/crypto/pcrypt.c b/crypto/pcrypt.c
> index 7240e8bbdebb..643f7f1cc91c 100644
> --- a/crypto/pcrypt.c
> +++ b/crypto/pcrypt.c
> @@ -232,12 +232,15 @@ static int pcrypt_create_aead(struct crypto_template 
> *tmpl, struct rtattr **tb,
>   struct crypto_attr_type *algt;
>   struct aead_instance *inst;
>   struct aead_alg *alg;
> + u32 mask;
>   int err;
>  
>   algt = crypto_get_attr_type(tb);
>   if (IS_ERR(algt))
>   return PTR_ERR(algt);
>  
> + mask = crypto_alg_inherited_mask(algt->type, algt->mask);
> +
>   inst = kzalloc(sizeof(*inst) + sizeof(*ctx), GFP_KERNEL);
>   if (!inst)
>   return -ENOMEM;
> @@ -254,7 +257,7 @@ static int pcrypt_create_aead(struct crypto_template 
> *tmpl, struct rtattr **tb,
>   goto err_free_inst;
>  
>   err = crypto_grab_aead(&ctx->spawn, aead_crypto_instance(inst),
> -crypto_attr_alg_name(tb[1]), 0, 0);
> +crypto_attr_alg_name(tb[1]), 0, mask);
>   if (err)
>   goto err_free_inst;
>  

I added "mask" there - but there is still a "mask" argument that is 
unused - is it a bug to have two "mask" variables?


> Can you please think logically about what you're trying to accomplish?

I am not an expert in crypto code.

> In particular, if someone requests an algorithm that doesn't allocate memory
> (which is indicated by type=0, mask=CRYPTO_ALG_ALLOCATES_MEMORY), that request
> needs to be honored.
> 
> - Eric

Mikulas




[dm-devel] [PATCH 1/3 v5] crypto: introduce the flag CRYPTO_ALG_ALLOCATES_MEMORY

2020-06-30 Thread Mikulas Patocka
Introduce a new flag CRYPTO_ALG_ALLOCATES_MEMORY and pass it down the
crypto stack.

If the flag is set, then the crypto driver allocates memory in its request
routine. Such drivers are not suitable for disk encryption because
GFP_ATOMIC allocation can fail anytime (causing random I/O errors) and
GFP_KERNEL allocation can recurse into the block layer, causing a
deadlock.

Pass the flag CRYPTO_ALG_ALLOCATES_MEMORY down through the crypto API.

Signed-off-by: Mikulas Patocka 

---
 crypto/adiantum.c |8 +---
 crypto/authenc.c  |7 ---
 crypto/authencesn.c   |7 ---
 crypto/ccm.c  |8 
 crypto/chacha20poly1305.c |7 ---
 crypto/cryptd.c   |   16 +++-
 crypto/ctr.c  |4 ++--
 crypto/cts.c  |4 ++--
 crypto/essiv.c|4 ++--
 crypto/gcm.c  |   15 ---
 crypto/geniv.c|4 ++--
 crypto/lrw.c  |4 ++--
 crypto/pcrypt.c   |9 +++--
 crypto/rsa-pkcs1pad.c |4 ++--
 crypto/seqiv.c|2 ++
 crypto/xts.c  |4 ++--
 include/crypto/algapi.h   |   10 ++
 include/linux/crypto.h|   15 +++
 18 files changed, 84 insertions(+), 48 deletions(-)

Index: linux-2.6/include/linux/crypto.h
===
--- linux-2.6.orig/include/linux/crypto.h   2020-06-29 16:03:07.346417000 
+0200
+++ linux-2.6/include/linux/crypto.h2020-06-29 16:03:07.336417000 +0200
@@ -102,6 +102,21 @@
#define CRYPTO_NOLOAD  0x00008000
 
 /*
+ * The driver may allocate memory during request processing, so it shouldn't be
+ * used in cases where memory allocation failures aren't acceptable, such as
+ * during block device encryption.
+ */
+#define CRYPTO_ALG_ALLOCATES_MEMORY	0x00010000
+
+/*
+ * When an algorithm uses another algorithm (e.g., if it's an instance of a
+ * template), these are the flags that always get set on the "outer" algorithm
+ * if any "inner" algorithm has them set.  In some cases other flags are
+ * inherited too; these are just the flags that are *always* inherited.
+ */
+#define CRYPTO_ALG_INHERITED_FLAGS	(CRYPTO_ALG_ASYNC | CRYPTO_ALG_ALLOCATES_MEMORY)
+
+/*
  * Transform masks and values (for crt_flags).
  */
#define CRYPTO_TFM_NEED_KEY	0x00000001
Index: linux-2.6/crypto/authenc.c
===
--- linux-2.6.orig/crypto/authenc.c 2020-06-29 16:03:07.346417000 +0200
+++ linux-2.6/crypto/authenc.c  2020-06-30 15:47:56.516417000 +0200
@@ -388,7 +388,7 @@ static int crypto_authenc_create(struct
if ((algt->type ^ CRYPTO_ALG_TYPE_AEAD) & algt->mask)
return -EINVAL;
 
-   mask = crypto_requires_sync(algt->type, algt->mask);
+   mask = crypto_alg_inherited_mask(algt->type, algt->mask);
 
inst = kzalloc(sizeof(*inst) + sizeof(*ctx), GFP_KERNEL);
if (!inst)
@@ -423,8 +423,9 @@ static int crypto_authenc_create(struct
 enc->base.cra_driver_name) >= CRYPTO_MAX_ALG_NAME)
goto err_free_inst;
 
-   inst->alg.base.cra_flags = (auth_base->cra_flags |
-   enc->base.cra_flags) & CRYPTO_ALG_ASYNC;
+   inst->alg.base.cra_flags =
+   (auth_base->cra_flags | enc->base.cra_flags) &
+   CRYPTO_ALG_INHERITED_FLAGS;
inst->alg.base.cra_priority = enc->base.cra_priority * 10 +
  auth_base->cra_priority;
inst->alg.base.cra_blocksize = enc->base.cra_blocksize;
Index: linux-2.6/crypto/authencesn.c
===
--- linux-2.6.orig/crypto/authencesn.c  2020-06-29 16:03:07.346417000 +0200
+++ linux-2.6/crypto/authencesn.c   2020-06-30 15:48:11.996417000 +0200
@@ -406,7 +406,7 @@ static int crypto_authenc_esn_create(str
if ((algt->type ^ CRYPTO_ALG_TYPE_AEAD) & algt->mask)
return -EINVAL;
 
-   mask = crypto_requires_sync(algt->type, algt->mask);
+   mask = crypto_alg_inherited_mask(algt->type, algt->mask);
 
inst = kzalloc(sizeof(*inst) + sizeof(*ctx), GFP_KERNEL);
if (!inst)
@@ -437,8 +437,9 @@ static int crypto_authenc_esn_create(str
 enc->base.cra_driver_name) >= CRYPTO_MAX_ALG_NAME)
goto err_free_inst;
 
-   inst->alg.base.cra_flags = (auth_base->cra_flags |
-   enc->base.cra_flags) & CRYPTO_ALG_ASYNC;
+   inst->alg.base.cra_flags =
+   (auth_base->cra_flags | enc->base.cra_flags) &
+   CRYPTO_ALG_INHERITED_FLAGS;
inst->alg.base.cra_priority = enc->base.cra_priority * 10 +
  auth_base->cra_priority;
inst->alg.base.cra_blocksize = enc->base.cra_blocksize;
Index: linux-2.6/crypto/ccm.c

Re: [dm-devel] [PATCH 1/3 v4] crypto: introduce the flag CRYPTO_ALG_ALLOCATES_MEMORY

2020-06-30 Thread Eric Biggers
On Tue, Jun 30, 2020 at 09:56:22AM -0400, Mikulas Patocka wrote:
> Index: linux-2.6/crypto/cryptd.c
> ===
> --- linux-2.6.orig/crypto/cryptd.c2020-06-29 16:03:07.346417000 +0200
> +++ linux-2.6/crypto/cryptd.c 2020-06-30 15:49:04.206417000 +0200
> @@ -202,6 +202,7 @@ static inline void cryptd_check_internal
>  
>   *type |= algt->type & CRYPTO_ALG_INTERNAL;
>   *mask |= algt->mask & CRYPTO_ALG_INTERNAL;
> + *mask |= algt->mask & CRYPTO_ALG_INHERITED_FLAGS;
>  }

This needs to use the correct logic (crypto_alg_inherited_mask) to decide which
inherited flags to set in 'mask'.

> --- linux-2.6.orig/crypto/adiantum.c  2020-06-29 16:03:07.346417000 +0200
> +++ linux-2.6/crypto/adiantum.c   2020-06-29 16:03:07.346417000 +0200
> @@ -507,7 +507,7 @@ static int adiantum_create(struct crypto
>   if ((algt->type ^ CRYPTO_ALG_TYPE_SKCIPHER) & algt->mask)
>   return -EINVAL;
>  
> - mask = crypto_requires_sync(algt->type, algt->mask);
> + mask = crypto_alg_inherited_mask(algt->type, algt->mask);
>  
>   inst = kzalloc(sizeof(*inst) + sizeof(*ictx), GFP_KERNEL);
>   if (!inst)

This is still missing setting the flags correctly on the template instance being
constructed.  The flags need to be inherited from all "inner" algorithms:

inst->alg.base.cra_flags = (streamcipher_alg->base.cra_flags |
blockcipher_alg->cra_flags |
hash_alg->base.cra_flags) &
   CRYPTO_ALG_INHERITED_FLAGS;

If we don't do that, the template instance may allocate memory but not have
CRYPTO_ALG_ALLOCATES_MEMORY set.

> Index: linux-2.6/crypto/pcrypt.c
> ===
> --- linux-2.6.orig/crypto/pcrypt.c2020-06-29 16:03:07.346417000 +0200
> +++ linux-2.6/crypto/pcrypt.c 2020-06-30 15:47:32.776417000 +0200
> @@ -263,7 +263,9 @@ static int pcrypt_create_aead(struct cry
>   if (err)
>   goto err_free_inst;
>  
> - inst->alg.base.cra_flags = CRYPTO_ALG_ASYNC;
> + inst->alg.base.cra_flags =
> + CRYPTO_ALG_ASYNC |
> + (alg->base.cra_flags & CRYPTO_ALG_INHERITED_FLAGS);
>  
>   inst->alg.ivsize = crypto_aead_alg_ivsize(alg);
>   inst->alg.maxauthsize = crypto_aead_alg_maxauthsize(alg);

And this is still missing setting the mask:

diff --git a/crypto/pcrypt.c b/crypto/pcrypt.c
index 7240e8bbdebb..643f7f1cc91c 100644
--- a/crypto/pcrypt.c
+++ b/crypto/pcrypt.c
@@ -232,12 +232,15 @@ static int pcrypt_create_aead(struct crypto_template 
*tmpl, struct rtattr **tb,
struct crypto_attr_type *algt;
struct aead_instance *inst;
struct aead_alg *alg;
+   u32 mask;
int err;
 
algt = crypto_get_attr_type(tb);
if (IS_ERR(algt))
return PTR_ERR(algt);
 
+   mask = crypto_alg_inherited_mask(algt->type, algt->mask);
+
inst = kzalloc(sizeof(*inst) + sizeof(*ctx), GFP_KERNEL);
if (!inst)
return -ENOMEM;
@@ -254,7 +257,7 @@ static int pcrypt_create_aead(struct crypto_template *tmpl, 
struct rtattr **tb,
goto err_free_inst;
 
	err = crypto_grab_aead(&ctx->spawn, aead_crypto_instance(inst),
-  crypto_attr_alg_name(tb[1]), 0, 0);
+  crypto_attr_alg_name(tb[1]), 0, mask);
if (err)
goto err_free_inst;
 

Can you please think logically about what you're trying to accomplish?

In particular, if someone requests an algorithm that doesn't allocate memory
(which is indicated by type=0, mask=CRYPTO_ALG_ALLOCATES_MEMORY), that request
needs to be honored.

- Eric




Re: [dm-devel] [PATCH v3] dm writecache: reject asynchronous pmem.

2020-06-30 Thread Mikulas Patocka



On Tue, 30 Jun 2020, Mikulas Patocka wrote:

> 
> 
> On Tue, 30 Jun 2020, Michal Suchanek wrote:
> 
> > The writecache driver does not handle asynchronous pmem. Reject it when
> > supplied as cache.
> > 
> > Link: https://lore.kernel.org/linux-nvdimm/87lfk5hahc@linux.ibm.com/
> > Fixes: 6e84200c0a29 ("virtio-pmem: Add virtio pmem driver")
> > 
> > Signed-off-by: Michal Suchanek 
> 
> Acked-by: Mikulas Patocka 

BTW, we should also add

Cc: stable@vger.kernel.org  # v5.3+

> > ---
> >  drivers/md/dm-writecache.c | 6 ++
> >  1 file changed, 6 insertions(+)
> > 
> > diff --git a/drivers/md/dm-writecache.c b/drivers/md/dm-writecache.c
> > index 30505d70f423..5358894bb9fd 100644
> > --- a/drivers/md/dm-writecache.c
> > +++ b/drivers/md/dm-writecache.c
> > @@ -2266,6 +2266,12 @@ static int writecache_ctr(struct dm_target *ti, 
> > unsigned argc, char **argv)
> > }
> >  
> > if (WC_MODE_PMEM(wc)) {
> > +   if (!dax_synchronous(wc->ssd_dev->dax_dev)) {
> > +   r = -EOPNOTSUPP;
> > +   ti->error = "Asynchronous persistent memory not 
> > supported as pmem cache";
> > +   goto bad;
> > +   }
> > +
> > r = persistent_memory_claim(wc);
> > if (r) {
> > ti->error = "Unable to map persistent memory for cache";
> > -- 
> > 2.26.2
> > 
> 




Re: [dm-devel] [PATCH v3] dm writecache: reject asynchronous pmem.

2020-06-30 Thread Mikulas Patocka



On Tue, 30 Jun 2020, Michal Suchanek wrote:

> The writecache driver does not handle asynchronous pmem. Reject it when
> supplied as cache.
> 
> Link: https://lore.kernel.org/linux-nvdimm/87lfk5hahc@linux.ibm.com/
> Fixes: 6e84200c0a29 ("virtio-pmem: Add virtio pmem driver")
> 
> Signed-off-by: Michal Suchanek 

Acked-by: Mikulas Patocka 

> ---
>  drivers/md/dm-writecache.c | 6 ++
>  1 file changed, 6 insertions(+)
> 
> diff --git a/drivers/md/dm-writecache.c b/drivers/md/dm-writecache.c
> index 30505d70f423..5358894bb9fd 100644
> --- a/drivers/md/dm-writecache.c
> +++ b/drivers/md/dm-writecache.c
> @@ -2266,6 +2266,12 @@ static int writecache_ctr(struct dm_target *ti, 
> unsigned argc, char **argv)
>   }
>  
>   if (WC_MODE_PMEM(wc)) {
> + if (!dax_synchronous(wc->ssd_dev->dax_dev)) {
> + r = -EOPNOTSUPP;
> + ti->error = "Asynchronous persistent memory not 
> supported as pmem cache";
> + goto bad;
> + }
> +
>   r = persistent_memory_claim(wc);
>   if (r) {
>   ti->error = "Unable to map persistent memory for cache";
> -- 
> 2.26.2
> 




[dm-devel] [PATCH v3] dm writecache: reject asynchronous pmem.

2020-06-30 Thread Michal Suchanek
The writecache driver does not handle asynchronous pmem. Reject it when
supplied as cache.

Link: https://lore.kernel.org/linux-nvdimm/87lfk5hahc@linux.ibm.com/
Fixes: 6e84200c0a29 ("virtio-pmem: Add virtio pmem driver")

Signed-off-by: Michal Suchanek 
---
 drivers/md/dm-writecache.c | 6 ++
 1 file changed, 6 insertions(+)

diff --git a/drivers/md/dm-writecache.c b/drivers/md/dm-writecache.c
index 30505d70f423..5358894bb9fd 100644
--- a/drivers/md/dm-writecache.c
+++ b/drivers/md/dm-writecache.c
@@ -2266,6 +2266,12 @@ static int writecache_ctr(struct dm_target *ti, unsigned 
argc, char **argv)
}
 
if (WC_MODE_PMEM(wc)) {
+   if (!dax_synchronous(wc->ssd_dev->dax_dev)) {
+   r = -EOPNOTSUPP;
+   ti->error = "Asynchronous persistent memory not 
supported as pmem cache";
+   goto bad;
+   }
+
r = persistent_memory_claim(wc);
if (r) {
ti->error = "Unable to map persistent memory for cache";
-- 
2.26.2





Re: [dm-devel] rename ->make_request_fn and move it to the block_device_operations

2020-06-30 Thread Jens Axboe
On 6/30/20 7:57 AM, Jens Axboe wrote:
> On 6/29/20 1:39 PM, Christoph Hellwig wrote:
>> Hi Jens,
>>
>> this series moves the make_request_fn method into block_device_operations
>> with the much more descriptive ->submit_bio name.  It then also gives
>> generic_make_request a more descriptive name, and further optimize the
>> path to issue to blk-mq, removing the need for the direct_make_request
>> bypass.
> 
> Looks good to me, and it's a nice cleanup as well. Applied.

Dropped, insta-crashes with dm:

[   10.240134] BUG: kernel NULL pointer dereference, address: 
[   10.241000] #PF: supervisor instruction fetch in kernel mode
[   10.241666] #PF: error_code(0x0010) - not-present page
[   10.242280] PGD 0 P4D 0 
[   10.242600] Oops: 0010 [#1] PREEMPT SMP
[   10.243073] CPU: 1 PID: 2110 Comm: systemd-udevd Not tainted 5.8.0-rc3+ #6655
[   10.243939] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS 
1.13.0-1ubuntu1 04/01/2014
[   10.245012] RIP: 0010:0x0
[   10.245322] Code: Bad RIP value.
[   10.245695] RSP: 0018:c92f7af8 EFLAGS: 00010246
[   10.246333] RAX: 81c83520 RBX: 8881b805dea8 RCX: 88819e844070
[   10.247227] RDX:  RSI:  RDI: 88819e844070
[   10.248112] RBP: c92f7b48 R08: 8881b6f38800 R09: 88818ff0ea58
[   10.248994] R10:  R11: 88818ff0ea58 R12: 88819e844070
[   10.250077] R13:  R14:  R15: 888107812948
[   10.251168] FS:  7f5c3ed66a80() GS:8881b9c8() 
knlGS:
[   10.252161] CS:  0010 DS:  ES:  CR0: 80050033
[   10.253189] CR2: ffd6 CR3: 0001b2953003 CR4: 001606e0
[   10.254157] DR0:  DR1:  DR2: 
[   10.255279] DR3:  DR6: fffe0ff0 DR7: 0400
[   10.256365] Call Trace:
[   10.256781]  submit_bio_noacct+0x1f6/0x3d0
[   10.257297]  submit_bio+0x37/0x130
[   10.257780]  ? guard_bio_eod+0x2e/0x70
[   10.258418]  mpage_readahead+0x13c/0x180
[   10.259096]  ? blkdev_direct_IO+0x490/0x490
[   10.259654]  read_pages+0x68/0x2d0
[   10.260051]  page_cache_readahead_unbounded+0x1b7/0x220
[   10.260818]  generic_file_buffered_read+0x865/0xc80
[   10.261587]  ? _copy_to_user+0x6d/0x80
[   10.262171]  ? cp_new_stat+0x119/0x130
[   10.262680]  new_sync_read+0xfe/0x170
[   10.263155]  vfs_read+0xc8/0x180
[   10.263647]  ksys_read+0x53/0xc0
[   10.264209]  do_syscall_64+0x3c/0x70
[   10.264759]  entry_SYSCALL_64_after_hwframe+0x44/0xa9
[   10.265200] RIP: 0033:0x7f5c3fcc9ab2
[   10.265510] Code: Bad RIP value.
[   10.265775] RSP: 002b:7ffc8e0cf9c8 EFLAGS: 0246 ORIG_RAX: 

[   10.266426] RAX: ffda RBX: 55d5eca76c68 RCX: 7f5c3fcc9ab2
[   10.267012] RDX: 0040 RSI: 55d5eca76c78 RDI: 0006
[   10.267591] RBP: 55d5eca44890 R08: 55d5eca76c50 R09: 7f5c3fd99a40
[   10.268168] R10: 0008 R11: 0246 R12: 3bd9
[   10.268744] R13: 0040 R14: 55d5eca76c50 R15: 55d5eca448e0
[   10.269319] Modules linked in:
[   10.269562] CR2: 
[   10.269845] ---[ end trace f09b8963e5a3593b ]---

-- 
Jens Axboe




Re: [dm-devel] [PATCH v2] dm writecache: reject asynchronous pmem.

2020-06-30 Thread Mikulas Patocka



On Tue, 30 Jun 2020, Michal Suchanek wrote:

> The writecache driver does not handle asynchronous pmem. Reject it when
> supplied as cache.
> 
> Link: https://lore.kernel.org/linux-nvdimm/87lfk5hahc@linux.ibm.com/
> Fixes: 6e84200c0a29 ("virtio-pmem: Add virtio pmem driver")
> 
> Signed-off-by: Michal Suchanek 

OK. I suggest moving this test before persistent_memory_claim() (so that
you don't try to map the whole device).

Mikulas

> ---
>  drivers/md/dm-writecache.c | 6 ++
>  1 file changed, 6 insertions(+)
> 
> diff --git a/drivers/md/dm-writecache.c b/drivers/md/dm-writecache.c
> index 30505d70f423..1e4f37249e28 100644
> --- a/drivers/md/dm-writecache.c
> +++ b/drivers/md/dm-writecache.c
> @@ -2271,6 +2271,12 @@ static int writecache_ctr(struct dm_target *ti, 
> unsigned argc, char **argv)
>   ti->error = "Unable to map persistent memory for cache";
>   goto bad;
>   }
> +
> + if (!dax_synchronous(wc->ssd_dev->dax_dev)) {
> + r = -EOPNOTSUPP;
> + ti->error = "Asynchronous persistent memory not 
> supported as pmem cache";
> + goto bad;
> + }
>   } else {
>   size_t n_blocks, n_metadata_blocks;
>   uint64_t n_bitmap_bits;
> -- 
> 2.26.2
> 




[dm-devel] [PATCH v2] dm writecache: reject asynchronous pmem.

2020-06-30 Thread Michal Suchanek
The writecache driver does not handle asynchronous pmem. Reject it when
supplied as cache.

Link: https://lore.kernel.org/linux-nvdimm/87lfk5hahc@linux.ibm.com/
Fixes: 6e84200c0a29 ("virtio-pmem: Add virtio pmem driver")

Signed-off-by: Michal Suchanek 
---
 drivers/md/dm-writecache.c | 6 ++
 1 file changed, 6 insertions(+)

diff --git a/drivers/md/dm-writecache.c b/drivers/md/dm-writecache.c
index 30505d70f423..1e4f37249e28 100644
--- a/drivers/md/dm-writecache.c
+++ b/drivers/md/dm-writecache.c
@@ -2271,6 +2271,12 @@ static int writecache_ctr(struct dm_target *ti, unsigned 
argc, char **argv)
ti->error = "Unable to map persistent memory for cache";
goto bad;
}
+
+   if (!dax_synchronous(wc->ssd_dev->dax_dev)) {
+   r = -EOPNOTSUPP;
+   ti->error = "Asynchronous persistent memory not 
supported as pmem cache";
+   goto bad;
+   }
} else {
size_t n_blocks, n_metadata_blocks;
uint64_t n_bitmap_bits;
-- 
2.26.2





[dm-devel] [PATCH 04/11] block: use block_bio event class for bio_bounce

2020-06-30 Thread Chaitanya Kulkarni
Remove block_bio_bounce trace event which shares the code with block_bio.

Signed-off-by: Chaitanya Kulkarni 
---
 include/trace/events/block.h | 56 
 1 file changed, 18 insertions(+), 38 deletions(-)

diff --git a/include/trace/events/block.h b/include/trace/events/block.h
index 5e9f46469f50..d7289576f1fd 100644
--- a/include/trace/events/block.h
+++ b/include/trace/events/block.h
@@ -221,44 +221,6 @@ DEFINE_EVENT(block_rq, block_rq_merge,
TP_ARGS(rq)
 );
 
-/**
- * block_bio_bounce - used bounce buffer when processing block operation
- * @bio: block operation
- *
- * A bounce buffer was used to handle the block operation @bio in queue.
- * This occurs when hardware limitations prevent a direct transfer of
- * data between the @bio data memory area and the IO device.  Use of a
- * bounce buffer requires extra copying of data and decreases
- * performance.
- */
-TRACE_EVENT(block_bio_bounce,
-
-   TP_PROTO(struct bio *bio),
-
-   TP_ARGS(bio),
-
-   TP_STRUCT__entry(
-   __field( dev_t, dev )
-   __field( sector_t,  sector  )
-   __field( unsigned int,  nr_sector   )
-   __array( char,  rwbs,   RWBS_LEN)
-   __array( char,  comm,   TASK_COMM_LEN   )
-   ),
-
-   TP_fast_assign(
-   __entry->dev= bio_dev(bio);
-   __entry->sector = bio->bi_iter.bi_sector;
-   __entry->nr_sector  = bio_sectors(bio);
-   blk_fill_rwbs(__entry->rwbs, bio->bi_opf, bio->bi_iter.bi_size);
-   memcpy(__entry->comm, current->comm, TASK_COMM_LEN);
-   ),
-
-   TP_printk("%d,%d %s %llu + %u [%s]",
- MAJOR(__entry->dev), MINOR(__entry->dev), __entry->rwbs,
- (unsigned long long)__entry->sector,
- __entry->nr_sector, __entry->comm)
-);
-
 /**
  * block_bio_complete - completed all work on the block operation
  * @q: queue holding the block operation
@@ -351,6 +313,24 @@ DEFINE_EVENT(block_bio, block_bio_frontmerge,
TP_ARGS(bio)
 );
 
+/**
+ * block_bio_bounce - used bounce buffer when processing block operation
+ * @bio: block operation
+ *
+ * A bounce buffer was used to handle the block operation @bio in queue.
+ * This occurs when hardware limitations prevent a direct transfer of
+ * data between the @bio data memory area and the IO device.  Use of a
+ * bounce buffer requires extra copying of data and decreases
+ * performance.
+ */
+
+DEFINE_EVENT(block_bio, block_bio_bounce,
+
+   TP_PROTO(struct bio *bio),
+
+   TP_ARGS(bio)
+);
+
 /**
  * block_bio_queue - putting new block IO operation in queue
  * @bio: new block operation
-- 
2.26.0




[dm-devel] [PATCH 07/11] block: get rid of blk_trace_request_get_cgid()

2020-06-30 Thread Chaitanya Kulkarni
Now that we have done cleanup we can safely get rid of the
blk_trace_request_get_cgid() and replace it with
blk_trace_bio_get_cgid().

Signed-off-by: Chaitanya Kulkarni 
---
 kernel/trace/blktrace.c | 26 +-
 1 file changed, 9 insertions(+), 17 deletions(-)

diff --git a/kernel/trace/blktrace.c b/kernel/trace/blktrace.c
index 1d36e6153ab8..bb864a50307f 100644
--- a/kernel/trace/blktrace.c
+++ b/kernel/trace/blktrace.c
@@ -804,15 +804,6 @@ u64 blk_trace_bio_get_cgid(struct request_queue *q, struct 
bio *bio)
 }
 #endif
 
-static u64
-blk_trace_request_get_cgid(struct request_queue *q, struct request *rq)
-{
-   if (!rq->bio)
-   return 0;
-   /* Use the first bio */
-   return blk_trace_bio_get_cgid(q, rq->bio);
-}
-
 /*
  * blktrace probes
  */
@@ -854,32 +845,32 @@ static void blk_add_trace_rq(struct request *rq, int 
error,
 static void blk_add_trace_rq_insert(void *ignore, struct request *rq)
 {
blk_add_trace_rq(rq, 0, blk_rq_bytes(rq), BLK_TA_INSERT,
-blk_trace_request_get_cgid(rq->q, rq));
+rq->bio ? blk_trace_bio_get_cgid(rq->q, rq->bio) : 0);
 }
 
 static void blk_add_trace_rq_issue(void *ignore, struct request *rq)
 {
blk_add_trace_rq(rq, 0, blk_rq_bytes(rq), BLK_TA_ISSUE,
-blk_trace_request_get_cgid(rq->q, rq));
+rq->bio ? blk_trace_bio_get_cgid(rq->q, rq->bio) : 0);
 }
 
 static void blk_add_trace_rq_merge(void *ignore, struct request *rq)
 {
blk_add_trace_rq(rq, 0, blk_rq_bytes(rq), BLK_TA_BACKMERGE,
-blk_trace_request_get_cgid(rq->q, rq));
+rq->bio ? blk_trace_bio_get_cgid(rq->q, rq->bio) : 0);
 }
 
 static void blk_add_trace_rq_requeue(void *ignore, struct request *rq)
 {
blk_add_trace_rq(rq, 0, blk_rq_bytes(rq), BLK_TA_REQUEUE,
-blk_trace_request_get_cgid(rq->q, rq));
+rq->bio ? blk_trace_bio_get_cgid(rq->q, rq->bio) : 0);
 }
 
 static void blk_add_trace_rq_complete(void *ignore, struct request *rq,
int error, unsigned int nr_bytes)
 {
blk_add_trace_rq(rq, error, nr_bytes, BLK_TA_COMPLETE,
-blk_trace_request_get_cgid(rq->q, rq));
+rq->bio ? blk_trace_bio_get_cgid(rq->q, rq->bio) : 0);
 }
 
 /**
@@ -1105,7 +1096,8 @@ static void blk_add_trace_rq_remap(void *ignore,
 
__blk_add_trace(bt, blk_rq_pos(rq), blk_rq_bytes(rq),
rq_data_dir(rq), 0, BLK_TA_REMAP, 0,
			sizeof(r), &r,
+   sizeof(r), ,
+   rq->bio ? blk_trace_bio_get_cgid(q, rq->bio) : 0);
rcu_read_unlock();
 }
 
@@ -1134,8 +1126,8 @@ void blk_add_driver_data(struct request_queue *q,
}
 
__blk_add_trace(bt, blk_rq_trace_sector(rq), blk_rq_bytes(rq), 0, 0,
-   BLK_TA_DRV_DATA, 0, len, data,
-   blk_trace_request_get_cgid(q, rq));
+   BLK_TA_DRV_DATA, 0, len, data,
+   rq->bio ? blk_trace_bio_get_cgid(q, rq->bio) : 0);
rcu_read_unlock();
 }
 EXPORT_SYMBOL_GPL(blk_add_driver_data);
-- 
2.26.0




Re: [dm-devel] [PATCH] dm writecache: reject asynchronous pmem.

2020-06-30 Thread Michal Suchánek
On Tue, Jun 30, 2020 at 09:32:01AM -0400, Mikulas Patocka wrote:
> 
> 
> On Tue, 30 Jun 2020, Michal Suchanek wrote:
> 
> > The writecache driver does not handle asynchronous pmem. Reject it when
> > supplied as cache.
> > 
> > Link: https://lore.kernel.org/linux-nvdimm/87lfk5hahc@linux.ibm.com/
> > Fixes: 6e84200c0a29 ("virtio-pmem: Add virtio pmem driver")
> > 
> > Signed-off-by: Michal Suchanek 
> > ---
> >  drivers/md/dm-writecache.c | 6 ++
> >  1 file changed, 6 insertions(+)
> > 
> > diff --git a/drivers/md/dm-writecache.c b/drivers/md/dm-writecache.c
> > index 30505d70f423..57b0a972f6fd 100644
> > --- a/drivers/md/dm-writecache.c
> > +++ b/drivers/md/dm-writecache.c
> > @@ -2277,6 +2277,12 @@ static int writecache_ctr(struct dm_target *ti, 
> > unsigned argc, char **argv)
> >  
> > wc->memory_map_size -= (uint64_t)wc->start_sector << 
> > SECTOR_SHIFT;
> >  
> > +   if (!dax_synchronous(wc->ssd_dev->dax_dev)) {
> > +   r = -EOPNOTSUPP;
> > +   ti->error = "Asynchronous persistent memory not 
> > supported as pmem cache";
> > +   goto bad;
> > +   }
> > +
> > bio_list_init(>flush_list);
> > wc->flush_thread = kthread_create(writecache_flush_thread, wc, 
> > "dm_writecache_flush");
> > if (IS_ERR(wc->flush_thread)) {
> > -- 
> 
> Hi
> 
> Shouldn't this be in the "if (WC_MODE_PMEM(wc))" block?
That should always be the case at this point.
> 
> WC_MODE_PMEM(wc) retrurns true if we are using persistent memory as a 
> cache device, otherwise we are using generic block device as a cache 
> device.
This is to prevent the situation where we have WC_MODE_PMEM(wc) but
cannot guarantee consistency because the async flush is not handled.

Thanks

Michal




[dm-devel] [PATCH 11/11] block: remove block_get_rq event class

2020-06-30 Thread Chaitanya Kulkarni
Now that there are no users for the block_get_rq event class remove the
event class.

Signed-off-by: Chaitanya Kulkarni 
---
 include/trace/events/block.h | 28 
 1 file changed, 28 deletions(-)

diff --git a/include/trace/events/block.h b/include/trace/events/block.h
index 21f1daaf012b..dc1834250baa 100644
--- a/include/trace/events/block.h
+++ b/include/trace/events/block.h
@@ -353,34 +353,6 @@ DEFINE_EVENT(block_bio, block_bio_queue,
TP_ARGS(bio)
 );
 
-DECLARE_EVENT_CLASS(block_get_rq,
-
-   TP_PROTO(struct bio *bio),
-
-   TP_ARGS(bio),
-
-   TP_STRUCT__entry(
-   __field( dev_t, dev )
-   __field( sector_t,  sector  )
-   __field( unsigned int,  nr_sector   )
-   __array( char,  rwbs,   RWBS_LEN)
-   __array( char,  comm,   TASK_COMM_LEN   )
-),
-
-   TP_fast_assign(
-   __entry->dev= bio ? bio_dev(bio) : 0;
-   __entry->sector = bio ? bio->bi_iter.bi_sector : 0;
-   __entry->nr_sector  = bio ? bio_sectors(bio) : 0;
-   blk_fill_rwbs(__entry->rwbs, bio ? bio->bi_opf : 0);
-   memcpy(__entry->comm, current->comm, TASK_COMM_LEN);
-),
-
-   TP_printk("%d,%d %s %llu + %u [%s]",
- MAJOR(__entry->dev), MINOR(__entry->dev), __entry->rwbs,
- (unsigned long long)__entry->sector,
- __entry->nr_sector, __entry->comm)
-);
-
 /**
  * block_getrq - get a free request entry in queue for block IO operations
  * @bio: pending block IO operation (can be %NULL)
-- 
2.26.0




[dm-devel] [PATCH 03/11] block: use block_bio event class for bio_queue

2020-06-30 Thread Chaitanya Kulkarni
Remove block_bio_queue trace event which shares the code with block_bio.

Signed-off-by: Chaitanya Kulkarni 
---
 include/trace/events/block.h | 26 +++---
 1 file changed, 3 insertions(+), 23 deletions(-)

diff --git a/include/trace/events/block.h b/include/trace/events/block.h
index b5be387c4115..5e9f46469f50 100644
--- a/include/trace/events/block.h
+++ b/include/trace/events/block.h
@@ -357,32 +357,12 @@ DEFINE_EVENT(block_bio, block_bio_frontmerge,
  *
  * About to place the block IO operation @bio into queue.
  */
-TRACE_EVENT(block_bio_queue,
 
-   TP_PROTO(struct bio *bio),
-
-   TP_ARGS(bio),
-
-   TP_STRUCT__entry(
-   __field( dev_t, dev )
-   __field( sector_t,  sector  )
-   __field( unsigned int,  nr_sector   )
-   __array( char,  rwbs,   RWBS_LEN)
-   __array( char,  comm,   TASK_COMM_LEN   )
-   ),
+DEFINE_EVENT(block_bio, block_bio_queue,
 
-   TP_fast_assign(
-   __entry->dev= bio_dev(bio);
-   __entry->sector = bio->bi_iter.bi_sector;
-   __entry->nr_sector  = bio_sectors(bio);
-   blk_fill_rwbs(__entry->rwbs, bio->bi_opf, bio->bi_iter.bi_size);
-   memcpy(__entry->comm, current->comm, TASK_COMM_LEN);
-   ),
+   TP_PROTO(struct bio *bio),
 
-   TP_printk("%d,%d %s %llu + %u [%s]",
- MAJOR(__entry->dev), MINOR(__entry->dev), __entry->rwbs,
- (unsigned long long)__entry->sector,
- __entry->nr_sector, __entry->comm)
+   TP_ARGS(bio)
 );
 
 DECLARE_EVENT_CLASS(block_get_rq,
-- 
2.26.0




[dm-devel] [PATCH 00/10] block: blktrace framework cleanup

2020-06-30 Thread Chaitanya Kulkarni
Hi,

There are many places where trace API accepts the struct request_queue*
parameter which can be derived from other function parameters.

This patch-series removes the struct request queue parameter from the
blktrace framework and adjusts the tracepoints definition and usage
along with the tracing API itself.

With the queue parameter removed, we can also merge some of the trace
events by creating a generic event class for bio-based trace events.

Finally, a few more cleanups: remove the trace_block_rq_insert()
wrapper blk_mq_sched_request_inserted(), get rid of the extra parameter
for trace_block_getrq(), remove blk_trace_request_get_cgid(), fix the
tracepoint comment headers, and clean up blk_fill_rwbs() and its
callers.

I added patches after the first cleanup as I scanned the code for more
opportunities. The patch distribution could be better, but it would be
great to get some feedback since the amount of cleanup has grown.

Any comments are welcome.

Regards,
Chaitanya

Chaitanya Kulkarni (11):
  block: blktrace framework cleanup
  block: rename block_bio_merge class to block_bio
  block: use block_bio event class for bio_queue
  block: use block_bio event class for bio_bounce
  block: get rid of the trace rq insert wrapper
  block: remove extra param for trace_block_getrq()
  block: get rid of blk_trace_request_get_cgid()
  block: fix the comments in the trace event block.h
  block: remove unused param in blk_fill_rwbs()
  block: use block_bio class for getrq and sleeprq
  block: remove block_get_rq event class

 block/bfq-iosched.c   |   4 +-
 block/blk-core.c  |   6 +-
 block/blk-merge.c |   4 +-
 block/blk-mq-sched.c  |   6 -
 block/blk-mq-sched.h  |   1 -
 block/blk-mq.c|  10 +-
 block/bounce.c|   2 +-
 block/kyber-iosched.c |   4 +-
 block/mq-deadline.c   |   4 +-
 drivers/md/dm.c   |   4 +-
 include/linux/blktrace_api.h  |   2 +-
 include/trace/events/bcache.h |  10 +-
 include/trace/events/block.h  | 226 +++---
 kernel/trace/blktrace.c   | 129 +--
 14 files changed, 161 insertions(+), 251 deletions(-)

-- 
2.26.0




Re: [dm-devel] dm writecache: reject asynchronous pmem.

2020-06-30 Thread Michal Suchánek
On Tue, Jun 30, 2020 at 09:36:33AM -0400, Mike Snitzer wrote:
> On Tue, Jun 30 2020 at 10:10am -0400,
> Michal Suchánek  wrote:
> 
> > On Tue, Jun 30, 2020 at 09:32:01AM -0400, Mikulas Patocka wrote:
> > > 
> > > 
> > > On Tue, 30 Jun 2020, Michal Suchanek wrote:
> > > 
> > > > The writecache driver does not handle asynchronous pmem. Reject it when
> > > > supplied as cache.
> > > > 
> > > > Link: https://lore.kernel.org/linux-nvdimm/87lfk5hahc@linux.ibm.com/
> > > > Fixes: 6e84200c0a29 ("virtio-pmem: Add virtio pmem driver")
> > > > 
> > > > Signed-off-by: Michal Suchanek 
> > > > ---
> > > >  drivers/md/dm-writecache.c | 6 ++
> > > >  1 file changed, 6 insertions(+)
> > > > 
> > > > diff --git a/drivers/md/dm-writecache.c b/drivers/md/dm-writecache.c
> > > > index 30505d70f423..57b0a972f6fd 100644
> > > > --- a/drivers/md/dm-writecache.c
> > > > +++ b/drivers/md/dm-writecache.c
> > > > @@ -2277,6 +2277,12 @@ static int writecache_ctr(struct dm_target *ti, 
> > > > unsigned argc, char **argv)
> > > >  
> > > > wc->memory_map_size -= (uint64_t)wc->start_sector << 
> > > > SECTOR_SHIFT;
> > > >  
> > > > +   if (!dax_synchronous(wc->ssd_dev->dax_dev)) {
> > > > +   r = -EOPNOTSUPP;
> > > > +   ti->error = "Asynchronous persistent memory not 
> > > > supported as pmem cache";
> > > > +   goto bad;
> > > > +   }
> > > > +
> > > > bio_list_init(&wc->flush_list);
> > > > wc->flush_thread = 
> > > > kthread_create(writecache_flush_thread, wc, "dm_writecache_flush");
> > > > if (IS_ERR(wc->flush_thread)) {
> > > > -- 
> > > 
> > > Hi
> > > 
> > > Shouldn't this be in the "if (WC_MODE_PMEM(wc))" block?
> > That should be always the case at this point.
> > > 
> > > WC_MODE_PMEM(wc) returns true if we are using persistent memory as a 
> > > cache device, otherwise we are using generic block device as a cache 
> > > device.
> >
> > This is to prevent the situation where we have WC_MODE_PMEM(wc) but
> > cannot guarantee consistency because the async flush is not handled.
> 
> The writecache operates in 2 modes.  SSD or PMEM.  Mikulas is saying
> your dax_synchronous() check should go within a WC_MODE_PMEM(wc) block
> because it doesn't make sense to do the check when in SSD mode.

Indeed, it's in the wrong if/else branch.

Thanks

Michal





[dm-devel] [PATCH 09/11] block: remove unused param in blk_fill_rwbs()

2020-06-30 Thread Chaitanya Kulkarni
The last parameter of blk_fill_rwbs() was added in commit
5782138e47 ("tracing/events: convert block trace points to
TRACE_EVENT()") to signal a read request. Its use was replaced by a
switch case on REQ_OP_READ in commit 1b9a9ab78b0 ("blktrace: use op
accessors"), but the parameter itself was never removed.

This patch removes the unused parameter, which also allows us to merge
existing trace points in the following patch.

Signed-off-by: Chaitanya Kulkarni 
---
 include/linux/blktrace_api.h  |  2 +-
 include/trace/events/bcache.h | 10 +-
 include/trace/events/block.h  | 19 +--
 kernel/trace/blktrace.c   |  2 +-
 4 files changed, 16 insertions(+), 17 deletions(-)

diff --git a/include/linux/blktrace_api.h b/include/linux/blktrace_api.h
index 3b6ff5902edc..80123eebf005 100644
--- a/include/linux/blktrace_api.h
+++ b/include/linux/blktrace_api.h
@@ -120,7 +120,7 @@ struct compat_blk_user_trace_setup {
 
 #endif
 
-extern void blk_fill_rwbs(char *rwbs, unsigned int op, int bytes);
+extern void blk_fill_rwbs(char *rwbs, unsigned int op);
 
 static inline sector_t blk_rq_trace_sector(struct request *rq)
 {
diff --git a/include/trace/events/bcache.h b/include/trace/events/bcache.h
index 0bddea663b3b..41415637e92c 100644
--- a/include/trace/events/bcache.h
+++ b/include/trace/events/bcache.h
@@ -28,7 +28,7 @@ DECLARE_EVENT_CLASS(bcache_request,
__entry->sector = bio->bi_iter.bi_sector;
__entry->orig_sector= bio->bi_iter.bi_sector - 16;
__entry->nr_sector  = bio->bi_iter.bi_size >> 9;
-   blk_fill_rwbs(__entry->rwbs, bio->bi_opf, bio->bi_iter.bi_size);
+   blk_fill_rwbs(__entry->rwbs, bio->bi_opf);
),
 
TP_printk("%d,%d %s %llu + %u (from %d,%d @ %llu)",
@@ -102,7 +102,7 @@ DECLARE_EVENT_CLASS(bcache_bio,
__entry->dev= bio_dev(bio);
__entry->sector = bio->bi_iter.bi_sector;
__entry->nr_sector  = bio->bi_iter.bi_size >> 9;
-   blk_fill_rwbs(__entry->rwbs, bio->bi_opf, bio->bi_iter.bi_size);
+   blk_fill_rwbs(__entry->rwbs, bio->bi_opf);
),
 
TP_printk("%d,%d  %s %llu + %u",
@@ -137,7 +137,7 @@ TRACE_EVENT(bcache_read,
__entry->dev= bio_dev(bio);
__entry->sector = bio->bi_iter.bi_sector;
__entry->nr_sector  = bio->bi_iter.bi_size >> 9;
-   blk_fill_rwbs(__entry->rwbs, bio->bi_opf, bio->bi_iter.bi_size);
+   blk_fill_rwbs(__entry->rwbs, bio->bi_opf);
__entry->cache_hit = hit;
__entry->bypass = bypass;
),
@@ -168,7 +168,7 @@ TRACE_EVENT(bcache_write,
__entry->inode  = inode;
__entry->sector = bio->bi_iter.bi_sector;
__entry->nr_sector  = bio->bi_iter.bi_size >> 9;
-   blk_fill_rwbs(__entry->rwbs, bio->bi_opf, bio->bi_iter.bi_size);
+   blk_fill_rwbs(__entry->rwbs, bio->bi_opf);
__entry->writeback = writeback;
__entry->bypass = bypass;
),
@@ -238,7 +238,7 @@ TRACE_EVENT(bcache_journal_write,
__entry->sector = bio->bi_iter.bi_sector;
__entry->nr_sector  = bio->bi_iter.bi_size >> 9;
__entry->nr_keys= keys;
-   blk_fill_rwbs(__entry->rwbs, bio->bi_opf, bio->bi_iter.bi_size);
+   blk_fill_rwbs(__entry->rwbs, bio->bi_opf);
),
 
TP_printk("%d,%d  %s %llu + %u keys %u",
diff --git a/include/trace/events/block.h b/include/trace/events/block.h
index 6a74fb9f4dba..d191d2cd1070 100644
--- a/include/trace/events/block.h
+++ b/include/trace/events/block.h
@@ -89,7 +89,7 @@ TRACE_EVENT(block_rq_requeue,
__entry->sector= blk_rq_trace_sector(rq);
__entry->nr_sector = blk_rq_trace_nr_sectors(rq);
 
-   blk_fill_rwbs(__entry->rwbs, rq->cmd_flags, blk_rq_bytes(rq));
+   blk_fill_rwbs(__entry->rwbs, rq->cmd_flags);
__get_str(cmd)[0] = '\0';
),
 
@@ -132,7 +132,7 @@ TRACE_EVENT(block_rq_complete,
__entry->nr_sector = nr_bytes >> 9;
__entry->error = error;
 
-   blk_fill_rwbs(__entry->rwbs, rq->cmd_flags, nr_bytes);
+   blk_fill_rwbs(__entry->rwbs, rq->cmd_flags);
__get_str(cmd)[0] = '\0';
),
 
@@ -165,7 +165,7 @@ DECLARE_EVENT_CLASS(block_rq,
__entry->nr_sector = blk_rq_trace_nr_sectors(rq);
__entry->bytes = blk_rq_bytes(rq);
 
-   blk_fill_rwbs(__entry->rwbs, rq->cmd_flags, blk_rq_bytes(rq));
+   blk_fill_rwbs(__entry->rwbs, rq->cmd_flags);
__get_str(cmd)[0] = '\0';
memcpy(__entry->comm, current->comm, TASK_COMM_LEN);
),
@@ -248,7 +248,7 @@ 

[dm-devel] [PATCH 02/11] block: rename block_bio_merge class to block_bio

2020-06-30 Thread Chaitanya Kulkarni
There are identical TRACE_EVENTs present which can now take advantage
of the block_bio_merge trace event class.

This is a prep patch which renames block_bio_merge to block_bio so
that the next patches in this series will be able to reuse it.

Signed-off-by: Chaitanya Kulkarni 
---
 include/trace/events/block.h | 6 +++---
 1 file changed, 3 insertions(+), 3 deletions(-)

diff --git a/include/trace/events/block.h b/include/trace/events/block.h
index 237d40a48429..b5be387c4115 100644
--- a/include/trace/events/block.h
+++ b/include/trace/events/block.h
@@ -295,7 +295,7 @@ TRACE_EVENT(block_bio_complete,
  __entry->nr_sector, __entry->error)
 );
 
-DECLARE_EVENT_CLASS(block_bio_merge,
+DECLARE_EVENT_CLASS(block_bio,
 
TP_PROTO(struct bio *bio),
 
@@ -330,7 +330,7 @@ DECLARE_EVENT_CLASS(block_bio_merge,
  * Merging block request @bio to the end of an existing block request
  * in queue.
  */
-DEFINE_EVENT(block_bio_merge, block_bio_backmerge,
+DEFINE_EVENT(block_bio, block_bio_backmerge,
 
TP_PROTO(struct bio *bio),
 
@@ -344,7 +344,7 @@ DEFINE_EVENT(block_bio_merge, block_bio_backmerge,
  * Merging block IO operation @bio to the beginning of an existing block
  * operation in queue.
  */
-DEFINE_EVENT(block_bio_merge, block_bio_frontmerge,
+DEFINE_EVENT(block_bio, block_bio_frontmerge,
 
TP_PROTO(struct bio *bio),
 
-- 
2.26.0




[dm-devel] [PATCH 06/11] block: remove extra param for trace_block_getrq()

2020-06-30 Thread Chaitanya Kulkarni
Remove the extra parameter for trace_block_getrq() since the I/O
direction can be derived from bio->bi_opf.

Signed-off-by: Chaitanya Kulkarni 
---
 block/blk-mq.c   |  2 +-
 include/trace/events/block.h | 14 ++
 kernel/trace/blktrace.c  | 13 ++---
 3 files changed, 13 insertions(+), 16 deletions(-)

diff --git a/block/blk-mq.c b/block/blk-mq.c
index dbb98b2bc868..d66bb299d4ae 100644
--- a/block/blk-mq.c
+++ b/block/blk-mq.c
@@ -2111,7 +2111,7 @@ blk_qc_t blk_mq_make_request(struct request_queue *q, 
struct bio *bio)
goto queue_exit;
}
 
-   trace_block_getrq(bio, bio->bi_opf);
+   trace_block_getrq(bio);
 
rq_qos_track(q, rq, bio);
 
diff --git a/include/trace/events/block.h b/include/trace/events/block.h
index d7289576f1fd..3d8923062fc4 100644
--- a/include/trace/events/block.h
+++ b/include/trace/events/block.h
@@ -347,9 +347,9 @@ DEFINE_EVENT(block_bio, block_bio_queue,
 
 DECLARE_EVENT_CLASS(block_get_rq,
 
-   TP_PROTO(struct bio *bio, int rw),
+   TP_PROTO(struct bio *bio),
 
-   TP_ARGS(bio, rw),
+   TP_ARGS(bio),
 
TP_STRUCT__entry(
__field( dev_t, dev )
@@ -377,22 +377,20 @@ DECLARE_EVENT_CLASS(block_get_rq,
 /**
  * block_getrq - get a free request entry in queue for block IO operations
  * @bio: pending block IO operation (can be %NULL)
- * @rw: low bit indicates a read (%0) or a write (%1)
  *
  * A request struct for queue has been allocated to handle the
  * block IO operation @bio.
  */
 DEFINE_EVENT(block_get_rq, block_getrq,
 
-   TP_PROTO(struct bio *bio, int rw),
+   TP_PROTO(struct bio *bio),
 
-   TP_ARGS(bio, rw)
+   TP_ARGS(bio)
 );
 
 /**
  * block_sleeprq - waiting to get a free request entry in queue for block IO 
operation
  * @bio: pending block IO operation (can be %NULL)
- * @rw: low bit indicates a read (%0) or a write (%1)
  *
  * In the case where a request struct cannot be provided for queue
  * the process needs to wait for an request struct to become
@@ -401,9 +399,9 @@ DEFINE_EVENT(block_get_rq, block_getrq,
  */
 DEFINE_EVENT(block_get_rq, block_sleeprq,
 
-   TP_PROTO(struct bio *bio, int rw),
+   TP_PROTO(struct bio *bio),
 
-   TP_ARGS(bio, rw)
+   TP_ARGS(bio)
 );
 
 /**
diff --git a/kernel/trace/blktrace.c b/kernel/trace/blktrace.c
index 7b72781a591d..1d36e6153ab8 100644
--- a/kernel/trace/blktrace.c
+++ b/kernel/trace/blktrace.c
@@ -949,8 +949,7 @@ static void blk_add_trace_bio_queue(void *ignore, struct 
bio *bio)
blk_add_trace_bio(bio, BLK_TA_QUEUE, 0);
 }
 
-static void blk_add_trace_getrq(void *ignore,
-   struct bio *bio, int rw)
+static void blk_add_trace_getrq(void *ignore, struct bio *bio)
 {
if (bio)
blk_add_trace_bio(bio, BLK_TA_GETRQ, 0);
@@ -960,14 +959,14 @@ static void blk_add_trace_getrq(void *ignore,
rcu_read_lock();
bt = rcu_dereference(bio_q(bio)->blk_trace);
if (bt)
-   __blk_add_trace(bt, 0, 0, rw, 0, BLK_TA_GETRQ, 0, 0,
-   NULL, 0);
+   __blk_add_trace(bt, 0, 0, bio->bi_opf, 0,
+   BLK_TA_GETRQ, 0, 0, NULL, 0);
rcu_read_unlock();
}
 }
 
 
-static void blk_add_trace_sleeprq(void *ignore, struct bio *bio, int rw)
+static void blk_add_trace_sleeprq(void *ignore, struct bio *bio)
 {
if (bio)
blk_add_trace_bio(bio, BLK_TA_SLEEPRQ, 0);
@@ -977,8 +976,8 @@ static void blk_add_trace_sleeprq(void *ignore, struct bio 
*bio, int rw)
rcu_read_lock();
bt = rcu_dereference(bio_q(bio)->blk_trace);
if (bt)
-   __blk_add_trace(bt, 0, 0, rw, 0, BLK_TA_SLEEPRQ,
-   0, 0, NULL, 0);
+   __blk_add_trace(bt, 0, 0, bio->bi_opf, 0,
+   BLK_TA_SLEEPRQ, 0, 0, NULL, 0);
rcu_read_unlock();
}
 }
-- 
2.26.0




[dm-devel] [PATCH 01/11] block: blktrace framework cleanup

2020-06-30 Thread Chaitanya Kulkarni
This patch removes the extra variables from the trace events and the
overall kernel blktrace framework. The removed members can easily be
derived from the remaining arguments, which reduces the code
significantly and allows us to optimize several tracepoints, as the
next patches in the series do.

Signed-off-by: Chaitanya Kulkarni 
---
 block/blk-core.c |  6 +--
 block/blk-merge.c|  4 +-
 block/blk-mq-sched.c |  2 +-
 block/blk-mq.c   | 10 ++--
 block/bounce.c   |  2 +-
 drivers/md/dm.c  |  4 +-
 include/trace/events/block.h | 87 +---
 kernel/trace/blktrace.c  | 98 ++--
 8 files changed, 98 insertions(+), 115 deletions(-)

diff --git a/block/blk-core.c b/block/blk-core.c
index 72b102a389a5..6d4c57ef4533 100644
--- a/block/blk-core.c
+++ b/block/blk-core.c
@@ -674,7 +674,7 @@ bool bio_attempt_back_merge(struct request *req, struct bio 
*bio,
if (!ll_back_merge_fn(req, bio, nr_segs))
return false;
 
-   trace_block_bio_backmerge(req->q, req, bio);
+   trace_block_bio_backmerge(bio);
rq_qos_merge(req->q, req, bio);
 
if ((req->cmd_flags & REQ_FAILFAST_MASK) != ff)
@@ -698,7 +698,7 @@ bool bio_attempt_front_merge(struct request *req, struct 
bio *bio,
if (!ll_front_merge_fn(req, bio, nr_segs))
return false;
 
-   trace_block_bio_frontmerge(req->q, req, bio);
+   trace_block_bio_frontmerge(bio);
rq_qos_merge(req->q, req, bio);
 
if ((req->cmd_flags & REQ_FAILFAST_MASK) != ff)
@@ -1082,7 +1082,7 @@ generic_make_request_checks(struct bio *bio)
return false;
 
if (!bio_flagged(bio, BIO_TRACE_COMPLETION)) {
-   trace_block_bio_queue(q, bio);
+   trace_block_bio_queue(bio);
/* Now that enqueuing has been traced, we need to trace
 * completion as well.
 */
diff --git a/block/blk-merge.c b/block/blk-merge.c
index 9c9fb21584b6..8333ccd60ee1 100644
--- a/block/blk-merge.c
+++ b/block/blk-merge.c
@@ -337,7 +337,7 @@ void __blk_queue_split(struct request_queue *q, struct bio 
**bio,
split->bi_opf |= REQ_NOMERGE;
 
bio_chain(split, *bio);
-   trace_block_split(q, split, (*bio)->bi_iter.bi_sector);
+   trace_block_split(split, (*bio)->bi_iter.bi_sector);
generic_make_request(*bio);
*bio = split;
}
@@ -793,7 +793,7 @@ static struct request *attempt_merge(struct request_queue 
*q,
 */
blk_account_io_merge_request(next);
 
-   trace_block_rq_merge(q, next);
+   trace_block_rq_merge(next);
 
/*
 * ownership of bio passed from next to req, return 'next' for
diff --git a/block/blk-mq-sched.c b/block/blk-mq-sched.c
index fdcc2c1dd178..a3cade16ef80 100644
--- a/block/blk-mq-sched.c
+++ b/block/blk-mq-sched.c
@@ -409,7 +409,7 @@ EXPORT_SYMBOL_GPL(blk_mq_sched_try_insert_merge);
 
 void blk_mq_sched_request_inserted(struct request *rq)
 {
-   trace_block_rq_insert(rq->q, rq);
+   trace_block_rq_insert(rq);
 }
 EXPORT_SYMBOL_GPL(blk_mq_sched_request_inserted);
 
diff --git a/block/blk-mq.c b/block/blk-mq.c
index b8738b3c6d06..dbb98b2bc868 100644
--- a/block/blk-mq.c
+++ b/block/blk-mq.c
@@ -742,7 +742,7 @@ void blk_mq_start_request(struct request *rq)
 {
struct request_queue *q = rq->q;
 
-   trace_block_rq_issue(q, rq);
+   trace_block_rq_issue(rq);
 
if (test_bit(QUEUE_FLAG_STATS, &q->queue_flags)) {
rq->io_start_time_ns = ktime_get_ns();
@@ -769,7 +769,7 @@ static void __blk_mq_requeue_request(struct request *rq)
 
blk_mq_put_driver_tag(rq);
 
-   trace_block_rq_requeue(q, rq);
+   trace_block_rq_requeue(rq);
rq_qos_requeue(q, rq);
 
if (blk_mq_request_started(rq)) {
@@ -1758,7 +1758,7 @@ static inline void __blk_mq_insert_req_list(struct 
blk_mq_hw_ctx *hctx,
 
lockdep_assert_held(&ctx->lock);
 
-   trace_block_rq_insert(hctx->queue, rq);
+   trace_block_rq_insert(rq);
 
if (at_head)
list_add(&rq->queuelist, &ctx->rq_lists[type]);
@@ -1814,7 +1814,7 @@ void blk_mq_insert_requests(struct blk_mq_hw_ctx *hctx, 
struct blk_mq_ctx *ctx,
 */
list_for_each_entry(rq, list, queuelist) {
BUG_ON(rq->mq_ctx != ctx);
-   trace_block_rq_insert(hctx->queue, rq);
+   trace_block_rq_insert(rq);
}
 
spin_lock(&ctx->lock);
@@ -2111,7 +2111,7 @@ blk_qc_t blk_mq_make_request(struct request_queue *q, 
struct bio *bio)
goto queue_exit;
}
 
-   trace_block_getrq(q, bio, bio->bi_opf);
+   trace_block_getrq(bio, bio->bi_opf);
 
rq_qos_track(q, rq, bio);
 
diff --git a/block/bounce.c b/block/bounce.c
index c3aaed070124..9550640b1f86 100644
--- a/block/bounce.c
+++ b/block/bounce.c
@@ -341,7 +341,7 

[dm-devel] [PATCH 05/11] block: get rid of the trace rq insert wrapper

2020-06-30 Thread Chaitanya Kulkarni
Get rid of the wrapper for trace_block_rq_insert() and call the function
directly.

Signed-off-by: Chaitanya Kulkarni 
---
 block/bfq-iosched.c   | 4 +++-
 block/blk-mq-sched.c  | 6 --
 block/blk-mq-sched.h  | 1 -
 block/kyber-iosched.c | 4 +++-
 block/mq-deadline.c   | 4 +++-
 5 files changed, 9 insertions(+), 10 deletions(-)

diff --git a/block/bfq-iosched.c b/block/bfq-iosched.c
index 50c8f034c01c..e2b9b700ed34 100644
--- a/block/bfq-iosched.c
+++ b/block/bfq-iosched.c
@@ -125,6 +125,8 @@
 #include 
 #include 
 
+#include 
+
 #include "blk.h"
 #include "blk-mq.h"
 #include "blk-mq-tag.h"
@@ -5507,7 +5509,7 @@ static void bfq_insert_request(struct blk_mq_hw_ctx 
*hctx, struct request *rq,
 
spin_unlock_irq(&bfqd->lock);
 
-   blk_mq_sched_request_inserted(rq);
+   trace_block_rq_insert(rq);
 
spin_lock_irq(&bfqd->lock);
bfqq = bfq_init_rq(rq);
diff --git a/block/blk-mq-sched.c b/block/blk-mq-sched.c
index a3cade16ef80..20b6a59fbd5a 100644
--- a/block/blk-mq-sched.c
+++ b/block/blk-mq-sched.c
@@ -407,12 +407,6 @@ bool blk_mq_sched_try_insert_merge(struct request_queue 
*q, struct request *rq)
 }
 EXPORT_SYMBOL_GPL(blk_mq_sched_try_insert_merge);
 
-void blk_mq_sched_request_inserted(struct request *rq)
-{
-   trace_block_rq_insert(rq);
-}
-EXPORT_SYMBOL_GPL(blk_mq_sched_request_inserted);
-
 static bool blk_mq_sched_bypass_insert(struct blk_mq_hw_ctx *hctx,
   bool has_sched,
   struct request *rq)
diff --git a/block/blk-mq-sched.h b/block/blk-mq-sched.h
index 126021fc3a11..04c40c695bf0 100644
--- a/block/blk-mq-sched.h
+++ b/block/blk-mq-sched.h
@@ -10,7 +10,6 @@ void blk_mq_sched_free_hctx_data(struct request_queue *q,
 
 void blk_mq_sched_assign_ioc(struct request *rq);
 
-void blk_mq_sched_request_inserted(struct request *rq);
 bool blk_mq_sched_try_merge(struct request_queue *q, struct bio *bio,
unsigned int nr_segs, struct request **merged_request);
 bool __blk_mq_sched_bio_merge(struct request_queue *q, struct bio *bio,
diff --git a/block/kyber-iosched.c b/block/kyber-iosched.c
index a38c5ab103d1..e42d78deee90 100644
--- a/block/kyber-iosched.c
+++ b/block/kyber-iosched.c
@@ -13,6 +13,8 @@
 #include 
 #include 
 
+#include 
+
 #include "blk.h"
 #include "blk-mq.h"
 #include "blk-mq-debugfs.h"
@@ -602,7 +604,7 @@ static void kyber_insert_requests(struct blk_mq_hw_ctx 
*hctx,
list_move_tail(&rq->queuelist, head);
sbitmap_set_bit(&khd->kcq_map[sched_domain],
rq->mq_ctx->index_hw[hctx->type]);
-   blk_mq_sched_request_inserted(rq);
+   trace_block_rq_insert(rq);
spin_unlock(&kcq->lock);
}
 }
diff --git a/block/mq-deadline.c b/block/mq-deadline.c
index b57470e154c8..f3631a287466 100644
--- a/block/mq-deadline.c
+++ b/block/mq-deadline.c
@@ -18,6 +18,8 @@
 #include 
 #include 
 
+#include 
+
 #include "blk.h"
 #include "blk-mq.h"
 #include "blk-mq-debugfs.h"
@@ -496,7 +498,7 @@ static void dd_insert_request(struct blk_mq_hw_ctx *hctx, 
struct request *rq,
if (blk_mq_sched_try_insert_merge(q, rq))
return;
 
-   blk_mq_sched_request_inserted(rq);
+   trace_block_rq_insert(rq);
 
if (at_head || blk_rq_is_passthrough(rq)) {
if (at_head)
-- 
2.26.0




[dm-devel] [PATCH 10/11] block: use block_bio class for getrq and sleeprq

2020-06-30 Thread Chaitanya Kulkarni
The only difference between the block_get_rq and block_bio event
classes was the last parameter passed to blk_fill_rwbs():
__entry->nr_sector and bio->bi_iter.bi_size respectively. Since that is
no longer the case, replace the block_get_rq class with block_bio for
the block_getrq and block_sleeprq events, and adjust the code to handle
the NULL bio case in block_bio.

Signed-off-by: Chaitanya Kulkarni 
---
 include/trace/events/block.h | 22 +++---
 1 file changed, 15 insertions(+), 7 deletions(-)

diff --git a/include/trace/events/block.h b/include/trace/events/block.h
index d191d2cd1070..21f1daaf012b 100644
--- a/include/trace/events/block.h
+++ b/include/trace/events/block.h
@@ -272,11 +272,19 @@ DECLARE_EVENT_CLASS(block_bio,
),
 
TP_fast_assign(
-   __entry->dev= bio_dev(bio);
-   __entry->sector = bio->bi_iter.bi_sector;
-   __entry->nr_sector  = bio_sectors(bio);
-   blk_fill_rwbs(__entry->rwbs, bio->bi_opf);
-   memcpy(__entry->comm, current->comm, TASK_COMM_LEN);
+   if (bio) {
+   __entry->dev= bio_dev(bio);
+   __entry->sector = bio->bi_iter.bi_sector;
+   __entry->nr_sector  = bio_sectors(bio);
+   blk_fill_rwbs(__entry->rwbs, bio->bi_opf);
+   memcpy(__entry->comm, current->comm, TASK_COMM_LEN);
+   } else {
+   __entry->dev= 0;
+   __entry->sector = 0;
+   __entry->nr_sector  = 0;
+   blk_fill_rwbs(__entry->rwbs, 0);
+   memcpy(__entry->comm, current->comm, TASK_COMM_LEN);
+   }
),
 
TP_printk("%d,%d %s %llu + %u [%s]",
@@ -380,7 +388,7 @@ DECLARE_EVENT_CLASS(block_get_rq,
  * A request struct for queue has been allocated to handle the
  * block IO operation @bio.
  */
-DEFINE_EVENT(block_get_rq, block_getrq,
+DEFINE_EVENT(block_bio, block_getrq,
 
TP_PROTO(struct bio *bio),
 
@@ -396,7 +404,7 @@ DEFINE_EVENT(block_get_rq, block_getrq,
  * available.  This tracepoint event is generated each time the
  * process goes to sleep waiting for request struct become available.
  */
-DEFINE_EVENT(block_get_rq, block_sleeprq,
+DEFINE_EVENT(block_bio, block_sleeprq,
 
TP_PROTO(struct bio *bio),
 
-- 
2.26.0




Re: [dm-devel] [PATCH 6/6] mm: Add memalloc_nowait

2020-06-30 Thread Matthew Wilcox
On Mon, Jun 29, 2020 at 04:45:14PM +0300, Mike Rapoport wrote:
> 
> 
> On June 29, 2020 3:52:31 PM GMT+03:00, Michal Hocko  wrote:
> >On Mon 29-06-20 13:18:16, Matthew Wilcox wrote:
> >> On Mon, Jun 29, 2020 at 08:08:51AM +0300, Mike Rapoport wrote:
> >> > > @@ -886,8 +868,12 @@ static struct dm_buffer
> >*__alloc_buffer_wait_no_callback(struct dm_bufio_client
> >> > >return NULL;
> >> > >  
> >> > >if (dm_bufio_cache_size_latch != 1 && 
> >> > > !tried_noio_alloc) {
> >> > > +  unsigned noio_flag;
> >> > > +
> >> > >dm_bufio_unlock(c);
> >> > > -  b = alloc_buffer(c, GFP_NOIO | __GFP_NORETRY |
> >__GFP_NOMEMALLOC | __GFP_NOWARN);
> >> > > +  noio_flag = memalloc_noio_save();
> >> > 
> >> > I've read the series twice and I'm still missing the definition of
> >> > memalloc_noio_save().
> >> > 
> >> > And also it would be nice to have a paragraph about it in
> >> > Documentation/core-api/memory-allocation.rst
> >> 
> >>
> >Documentation/core-api/gfp_mask-from-fs-io.rst:``memalloc_nofs_save``,
> >``memalloc_nofs_restore`` respectively ``memalloc_noio_save``,
> >> Documentation/core-api/gfp_mask-from-fs-io.rst:   :functions:
> >memalloc_noio_save memalloc_noio_restore
> >> Documentation/core-api/gfp_mask-from-fs-io.rst:allows nesting so it
> >is safe to call ``memalloc_noio_save`` or
> > 
> >The patch is adding memalloc_nowait* and I suspect Mike had that in
> >mind, which would be a fair request. 
> 
> Right, sorry misprinted that.
> 
> > Btw. we are missing
> >memalloc_nocma*
> >documentation either - I was just reminded of its existence today..

Heh.  Oops, not sure how those got left out.  I hadn't touched the
documentation either, so -1 points to me.

The documentation is hard to add a new case to, so I rewrote it.  What
do you think?  (Obviously I'll split this out differently for submission;
this is just what I have in my tree right now).

diff --git a/Documentation/core-api/gfp_mask-from-fs-io.rst 
b/Documentation/core-api/gfp_mask-from-fs-io.rst
deleted file mode 100644
index e7c32a8de126..
--- a/Documentation/core-api/gfp_mask-from-fs-io.rst
+++ /dev/null
@@ -1,68 +0,0 @@
-.. _gfp_mask_from_fs_io:
-
-=================================
-GFP masks used from FS/IO context
-=================================
-
-:Date: May, 2018
-:Author: Michal Hocko 
-
-Introduction
-============
-
-Code paths in the filesystem and IO stacks must be careful when
-allocating memory to prevent recursion deadlocks caused by direct
-memory reclaim calling back into the FS or IO paths and blocking on
-already held resources (e.g. locks - most commonly those used for the
-transaction context).
-
-The traditional way to avoid this deadlock problem is to clear __GFP_FS
-respectively __GFP_IO (note the latter implies clearing the first as well) in
-the gfp mask when calling an allocator. GFP_NOFS respectively GFP_NOIO can be
-used as shortcut. It turned out though that above approach has led to
-abuses when the restricted gfp mask is used "just in case" without a
-deeper consideration which leads to problems because an excessive use
-of GFP_NOFS/GFP_NOIO can lead to memory over-reclaim or other memory
-reclaim issues.
-
-New API
-=======
-
-Since 4.12 we do have a generic scope API for both NOFS and NOIO context
-``memalloc_nofs_save``, ``memalloc_nofs_restore`` respectively 
``memalloc_noio_save``,
-``memalloc_noio_restore`` which allow to mark a scope to be a critical
-section from a filesystem or I/O point of view. Any allocation from that
-scope will inherently drop __GFP_FS respectively __GFP_IO from the given
-mask so no memory allocation can recurse back in the FS/IO.
-
-.. kernel-doc:: include/linux/sched/mm.h
-   :functions: memalloc_nofs_save memalloc_nofs_restore
-.. kernel-doc:: include/linux/sched/mm.h
-   :functions: memalloc_noio_save memalloc_noio_restore
-
-FS/IO code then simply calls the appropriate save function before
-any critical section with respect to the reclaim is started - e.g.
-lock shared with the reclaim context or when a transaction context
-nesting would be possible via reclaim. The restore function should be
-called when the critical section ends. All that ideally along with an
-explanation what is the reclaim context for easier maintenance.
-
-Please note that the proper pairing of save/restore functions
-allows nesting so it is safe to call ``memalloc_noio_save`` or
-``memalloc_noio_restore`` respectively from an existing NOIO or NOFS
-scope.
-
-What about __vmalloc(GFP_NOFS)
-==============================
-
-vmalloc doesn't support GFP_NOFS semantic because there are hardcoded
-GFP_KERNEL allocations deep inside the allocator which are quite non-trivial
-to fix up. That means that calling ``vmalloc`` with GFP_NOFS/GFP_NOIO is
-almost always a bug. The good news is that the NOFS/NOIO semantic can be
-achieved by the scope API.
-
-In the ideal world, upper 

[dm-devel] [PATCH] dm writecache: reject asynchronous pmem.

2020-06-30 Thread Michal Suchanek
The writecache driver does not handle asynchronous pmem. Reject it when
supplied as cache.

Link: https://lore.kernel.org/linux-nvdimm/87lfk5hahc@linux.ibm.com/
Fixes: 6e84200c0a29 ("virtio-pmem: Add virtio pmem driver")

Signed-off-by: Michal Suchanek 
---
 drivers/md/dm-writecache.c | 6 ++
 1 file changed, 6 insertions(+)

diff --git a/drivers/md/dm-writecache.c b/drivers/md/dm-writecache.c
index 30505d70f423..57b0a972f6fd 100644
--- a/drivers/md/dm-writecache.c
+++ b/drivers/md/dm-writecache.c
@@ -2277,6 +2277,12 @@ static int writecache_ctr(struct dm_target *ti, unsigned 
argc, char **argv)
 
wc->memory_map_size -= (uint64_t)wc->start_sector << 
SECTOR_SHIFT;
 
+   if (!dax_synchronous(wc->ssd_dev->dax_dev)) {
+   r = -EOPNOTSUPP;
+   ti->error = "Asynchronous persistent memory not 
supported as pmem cache";
+   goto bad;
+   }
+
bio_list_init(&wc->flush_list);
wc->flush_thread = kthread_create(writecache_flush_thread, wc, 
"dm_writecache_flush");
if (IS_ERR(wc->flush_thread)) {
-- 
2.26.2





[dm-devel] [PATCH 08/11] block: fix the comments in the trace event block.h

2020-06-30 Thread Chaitanya Kulkarni
This is purely a cleanup patch which fixes the comments in the trace
event header for the block_rq_issue() and block_rq_merge() events.

Signed-off-by: Chaitanya Kulkarni 
---
 include/trace/events/block.h | 4 ++--
 1 file changed, 2 insertions(+), 2 deletions(-)

diff --git a/include/trace/events/block.h b/include/trace/events/block.h
index 3d8923062fc4..6a74fb9f4dba 100644
--- a/include/trace/events/block.h
+++ b/include/trace/events/block.h
@@ -195,7 +195,7 @@ DEFINE_EVENT(block_rq, block_rq_insert,
 
 /**
  * block_rq_issue - issue pending block IO request operation to device driver
- * @rq: block IO operation operation request
+ * @rq: block IO operation request
  *
  * Called when block operation request @rq from queue @q is sent to a
  * device driver for processing.
@@ -209,7 +209,7 @@ DEFINE_EVENT(block_rq, block_rq_issue,
 
 /**
  * block_rq_merge - merge request with another one in the elevator
- * @rq: block IO operation operation request
+ * @rq: block IO operation request
  *
  * Called when block operation request @rq from queuei is merged to another
  * request queued in the elevator.
-- 
2.26.0




Re: [dm-devel] dm writecache: reject asynchronous pmem.

2020-06-30 Thread Mike Snitzer
On Tue, Jun 30 2020 at 10:10am -0400,
Michal Suchánek  wrote:

> On Tue, Jun 30, 2020 at 09:32:01AM -0400, Mikulas Patocka wrote:
> > 
> > 
> > On Tue, 30 Jun 2020, Michal Suchanek wrote:
> > 
> > > The writecache driver does not handle asynchronous pmem. Reject it when
> > > supplied as cache.
> > > 
> > > Link: https://lore.kernel.org/linux-nvdimm/87lfk5hahc@linux.ibm.com/
> > > Fixes: 6e84200c0a29 ("virtio-pmem: Add virtio pmem driver")
> > > 
> > > Signed-off-by: Michal Suchanek 
> > > ---
> > >  drivers/md/dm-writecache.c | 6 ++
> > >  1 file changed, 6 insertions(+)
> > > 
> > > diff --git a/drivers/md/dm-writecache.c b/drivers/md/dm-writecache.c
> > > index 30505d70f423..57b0a972f6fd 100644
> > > --- a/drivers/md/dm-writecache.c
> > > +++ b/drivers/md/dm-writecache.c
> > > @@ -2277,6 +2277,12 @@ static int writecache_ctr(struct dm_target *ti, unsigned argc, char **argv)
> > >  
> > >  	wc->memory_map_size -= (uint64_t)wc->start_sector << SECTOR_SHIFT;
> > >  
> > > +	if (!dax_synchronous(wc->ssd_dev->dax_dev)) {
> > > +		r = -EOPNOTSUPP;
> > > +		ti->error = "Asynchronous persistent memory not supported as pmem cache";
> > > +		goto bad;
> > > +	}
> > > +
> > >  	bio_list_init(&wc->flush_list);
> > >  	wc->flush_thread = kthread_create(writecache_flush_thread, wc, "dm_writecache_flush");
> > >  	if (IS_ERR(wc->flush_thread)) {
> > > -- 
> > 
> > Hi
> > 
> > Shouldn't this be in the "if (WC_MODE_PMEM(wc))" block?
> That should be always the case at this point.
> > 
> > WC_MODE_PMEM(wc) returns true if we are using persistent memory as a 
> > cache device, otherwise we are using a generic block device as a cache 
> > device.
>
> This is to prevent the situation where we have WC_MODE_PMEM(wc) but
> cannot guarantee consistency because the async flush is not handled.

The writecache operates in 2 modes.  SSD or PMEM.  Mikulas is saying
your dax_synchronous() check should go within a WC_MODE_PMEM(wc) block
because it doesn't make sense to do the check when in SSD mode.

Mike




Re: [dm-devel] [PATCH 1/3 v2] crypto: introduce the flag CRYPTO_ALG_ALLOCATES_MEMORY

2020-06-30 Thread Mikulas Patocka



On Sun, 28 Jun 2020, Eric Biggers wrote:

> On Sun, Jun 28, 2020 at 03:07:49PM -0400, Mikulas Patocka wrote:
> > 
> > > Also, "seqiv" instances can be created without 
> > > CRYPTO_ALG_ALLOCATES_MEMORY set,
> > > despite seqiv_aead_encrypt() allocating memory.
> > > 
> 
> This comment wasn't addressed.
> 
> - Eric

I've sent version 4 of the patch that adds CRYPTO_ALG_ALLOCATES_MEMORY to 
seqiv.

Mikulas




[dm-devel] [PATCH 1/3 v4] crypto: introduce the flag CRYPTO_ALG_ALLOCATES_MEMORY

2020-06-30 Thread Mikulas Patocka
Introduce a new flag CRYPTO_ALG_ALLOCATES_MEMORY and pass it down the
crypto stack.

If the flag is set, then the crypto driver allocates memory in its request
routine. Such drivers are not suitable for disk encryption because
GFP_ATOMIC allocation can fail anytime (causing random I/O errors) and
GFP_KERNEL allocation can recurse into the block layer, causing a
deadlock.

Pass the flag CRYPTO_ALG_ALLOCATES_MEMORY down through the crypto API.

Signed-off-by: Mikulas Patocka 

---
 crypto/adiantum.c |2 +-
 crypto/authenc.c  |7 ---
 crypto/authencesn.c   |7 ---
 crypto/ccm.c  |8 
 crypto/chacha20poly1305.c |7 ---
 crypto/cryptd.c   |   16 +++-
 crypto/ctr.c  |4 ++--
 crypto/cts.c  |4 ++--
 crypto/essiv.c|4 ++--
 crypto/gcm.c  |   15 ---
 crypto/geniv.c|4 ++--
 crypto/lrw.c  |4 ++--
 crypto/pcrypt.c   |4 +++-
 crypto/rsa-pkcs1pad.c |4 ++--
 crypto/seqiv.c|2 ++
 crypto/xts.c  |4 ++--
 include/crypto/algapi.h   |   10 ++
 include/linux/crypto.h|   15 +++
 18 files changed, 76 insertions(+), 45 deletions(-)

Index: linux-2.6/include/linux/crypto.h
===
--- linux-2.6.orig/include/linux/crypto.h	2020-06-29 16:03:07.346417000 +0200
+++ linux-2.6/include/linux/crypto.h	2020-06-29 16:03:07.336417000 +0200
@@ -102,6 +102,21 @@
 #define CRYPTO_NOLOAD  0x8000
 
 /*
+ * The driver may allocate memory during request processing, so it shouldn't be
+ * used in cases where memory allocation failures aren't acceptable, such as
+ * during block device encryption.
+ */
+#define CRYPTO_ALG_ALLOCATES_MEMORY	0x00010000
+
+/*
+ * When an algorithm uses another algorithm (e.g., if it's an instance of a
+ * template), these are the flags that always get set on the "outer" algorithm
+ * if any "inner" algorithm has them set.  In some cases other flags are
+ * inherited too; these are just the flags that are *always* inherited.
+ */
+#define CRYPTO_ALG_INHERITED_FLAGS	(CRYPTO_ALG_ASYNC | CRYPTO_ALG_ALLOCATES_MEMORY)
+
+/*
  * Transform masks and values (for crt_flags).
  */
#define CRYPTO_TFM_NEED_KEY	0x00000001
Index: linux-2.6/crypto/authenc.c
===
--- linux-2.6.orig/crypto/authenc.c 2020-06-29 16:03:07.346417000 +0200
+++ linux-2.6/crypto/authenc.c  2020-06-30 15:47:56.516417000 +0200
@@ -388,7 +388,7 @@ static int crypto_authenc_create(struct
if ((algt->type ^ CRYPTO_ALG_TYPE_AEAD) & algt->mask)
return -EINVAL;
 
-   mask = crypto_requires_sync(algt->type, algt->mask);
+   mask = crypto_alg_inherited_mask(algt->type, algt->mask);
 
inst = kzalloc(sizeof(*inst) + sizeof(*ctx), GFP_KERNEL);
if (!inst)
@@ -423,8 +423,9 @@ static int crypto_authenc_create(struct
 enc->base.cra_driver_name) >= CRYPTO_MAX_ALG_NAME)
goto err_free_inst;
 
-   inst->alg.base.cra_flags = (auth_base->cra_flags |
-   enc->base.cra_flags) & CRYPTO_ALG_ASYNC;
+   inst->alg.base.cra_flags =
+   (auth_base->cra_flags | enc->base.cra_flags) &
+   CRYPTO_ALG_INHERITED_FLAGS;
inst->alg.base.cra_priority = enc->base.cra_priority * 10 +
  auth_base->cra_priority;
inst->alg.base.cra_blocksize = enc->base.cra_blocksize;
Index: linux-2.6/crypto/authencesn.c
===
--- linux-2.6.orig/crypto/authencesn.c  2020-06-29 16:03:07.346417000 +0200
+++ linux-2.6/crypto/authencesn.c   2020-06-30 15:48:11.996417000 +0200
@@ -406,7 +406,7 @@ static int crypto_authenc_esn_create(str
if ((algt->type ^ CRYPTO_ALG_TYPE_AEAD) & algt->mask)
return -EINVAL;
 
-   mask = crypto_requires_sync(algt->type, algt->mask);
+   mask = crypto_alg_inherited_mask(algt->type, algt->mask);
 
inst = kzalloc(sizeof(*inst) + sizeof(*ctx), GFP_KERNEL);
if (!inst)
@@ -437,8 +437,9 @@ static int crypto_authenc_esn_create(str
 enc->base.cra_driver_name) >= CRYPTO_MAX_ALG_NAME)
goto err_free_inst;
 
-   inst->alg.base.cra_flags = (auth_base->cra_flags |
-   enc->base.cra_flags) & CRYPTO_ALG_ASYNC;
+   inst->alg.base.cra_flags =
+   (auth_base->cra_flags | enc->base.cra_flags) &
+   CRYPTO_ALG_INHERITED_FLAGS;
inst->alg.base.cra_priority = enc->base.cra_priority * 10 +
  auth_base->cra_priority;
inst->alg.base.cra_blocksize = enc->base.cra_blocksize;
Index: linux-2.6/crypto/ccm.c

Re: [dm-devel] rename ->make_request_fn and move it to the block_device_operations

2020-06-30 Thread Jens Axboe
On 6/29/20 1:39 PM, Christoph Hellwig wrote:
> Hi Jens,
> 
> this series moves the make_request_fn method into block_device_operations
> with the much more descriptive ->submit_bio name.  It then also gives
> generic_make_request a more descriptive name, and further optimize the
> path to issue to blk-mq, removing the need for the direct_make_request
> bypass.

Looks good to me, and it's a nice cleanup as well. Applied.

-- 
Jens Axboe




Re: [dm-devel] [PATCH 11/20] fs: remove a weird comment in submit_bh_wbc

2020-06-30 Thread Jens Axboe
On 6/29/20 1:39 PM, Christoph Hellwig wrote:
> All bios can get remapped if submitted to partitions.  No need to
> comment on that.

I'm pretty sure that comment is from me, dating back to when the bio
code was introduced in 2001. The point wasn't the remapping, just
that from here on down the IO was purely bio based, not buffer_heads.
Anyway, totally agree that it should just die, it's not that
interesting or useful anymore.

-- 
Jens Axboe




Re: [dm-devel] [PATCH 1/3 v2] crypto: introduce the flag CRYPTO_ALG_ALLOCATES_MEMORY

2020-06-30 Thread Mikulas Patocka



On Mon, 29 Jun 2020, Mikulas Patocka wrote:

> On Sun, 28 Jun 2020, Eric Biggers wrote:
> 
> > On Sun, Jun 28, 2020 at 03:07:49PM -0400, Mikulas Patocka wrote:
> > > > 
> > > > cryptd_create_skcipher(), cryptd_create_hash(), cryptd_create_aead(), 
> > > > and
> > > > crypto_rfc4309_create() are also missing setting the mask.
> > > > 
> > > > pcrypt_create_aead() is missing both setting the mask and inheriting 
> > > > the flags.

pcrypt_create_aead doesn't use the "mask" and "type" arguments at all.

Mikulas




Re: [dm-devel] [PATCH] dm writecache: reject asynchronous pmem.

2020-06-30 Thread Mikulas Patocka



On Tue, 30 Jun 2020, Michal Suchanek wrote:

> The writecache driver does not handle asynchronous pmem. Reject it when
> supplied as cache.
> 
> Link: https://lore.kernel.org/linux-nvdimm/87lfk5hahc@linux.ibm.com/
> Fixes: 6e84200c0a29 ("virtio-pmem: Add virtio pmem driver")
> 
> Signed-off-by: Michal Suchanek 
> ---
>  drivers/md/dm-writecache.c | 6 ++
>  1 file changed, 6 insertions(+)
> 
> diff --git a/drivers/md/dm-writecache.c b/drivers/md/dm-writecache.c
> index 30505d70f423..57b0a972f6fd 100644
> --- a/drivers/md/dm-writecache.c
> +++ b/drivers/md/dm-writecache.c
> @@ -2277,6 +2277,12 @@ static int writecache_ctr(struct dm_target *ti, unsigned argc, char **argv)
>  
>  	wc->memory_map_size -= (uint64_t)wc->start_sector << SECTOR_SHIFT;
>  
> +	if (!dax_synchronous(wc->ssd_dev->dax_dev)) {
> +		r = -EOPNOTSUPP;
> +		ti->error = "Asynchronous persistent memory not supported as pmem cache";
> +		goto bad;
> +	}
> +
>  	bio_list_init(&wc->flush_list);
>  	wc->flush_thread = kthread_create(writecache_flush_thread, wc, "dm_writecache_flush");
>  	if (IS_ERR(wc->flush_thread)) {
> -- 

Hi

Shouldn't this be in the "if (WC_MODE_PMEM(wc))" block?

WC_MODE_PMEM(wc) returns true if we are using persistent memory as a 
cache device, otherwise we are using a generic block device as a cache 
device.

Mikulas




Re: [dm-devel] dm-crypt hard lockup

2020-06-30 Thread Artur Paszkiewicz
On 6/28/20 6:50 PM, Bart Van Assche wrote:
> Since considerable time I'm doing kernel builds (make -j8) on an
> openSUSE Tumbleweed system on top of dm-crypt and an NVMe SSD but I have
> not yet encountered any kind of lockup. Maybe another driver, e.g. an
> I/O scheduler, is responsible for the lockups?

It could be another driver. I tried with a different filesystem (xfs),
without any I/O scheduler and even with the sources on a different drive
without dm-crypt, and the results were more or less the same - lockups
involving dm-crypt. Maybe I'm just abusing the system - I'm actually
running "make -j", so the number of tasks is unlimited. With a limited
number it does not hang. I know it's not a good idea to do this but it
used to work, and I think it shouldn't cause the kernel to die like
this.

Artur




Re: [dm-devel] [PATCH 6/6] mm: Add memalloc_nowait

2020-06-30 Thread Michal Hocko
On Mon 29-06-20 22:28:30, Matthew Wilcox wrote:
[...]
> The documentation is hard to add a new case to, so I rewrote it.  What
> do you think?  (Obviously I'll split this out differently for submission;
> this is just what I have in my tree right now).

I am fine with your changes. Few notes below.

> -It turned out though that above approach has led to
> -abuses when the restricted gfp mask is used "just in case" without a
> -deeper consideration which leads to problems because an excessive use
> -of GFP_NOFS/GFP_NOIO can lead to memory over-reclaim or other memory
> -reclaim issues.

I believe this is an important part because it shows that new people
coming to the existing code shouldn't take it as correct and rather
question it. Also having a clear indication that overuse is causing real
problems that might be not immediately visible to subsystems outside of
MM.

> -FS/IO code then simply calls the appropriate save function before
> -any critical section with respect to the reclaim is started - e.g.
> -lock shared with the reclaim context or when a transaction context
> -nesting would be possible via reclaim.  

[...]

> +These functions should be called at the point where any memory allocation
> +would start to cause problems.  That is, do not simply wrap individual
> +memory allocation calls which currently use ``GFP_NOFS`` with a pair
> +of calls to memalloc_nofs_save() and memalloc_nofs_restore().  Instead,
> +find the lock which is taken that would cause problems if memory reclaim
> +reentered the filesystem, place a call to memalloc_nofs_save() before it
> +is acquired and a call to memalloc_nofs_restore() after it is released.
> +Ideally also add a comment explaining why this lock will be problematic.

The above text also mentioned transaction context nesting, and that was
a hint by Dave IIRC. It is imho good to have an example of reentrant
points other than just locks. I believe another useful example would be
something like the loop device, which mixes the IO and FS layers, but
I am not familiar with all the details to give you a useful text.

[...]
> @@ -104,16 +134,19 @@ ARCH_KMALLOC_MINALIGN bytes.  For sizes which are a 
> power of two, the
>  alignment is also guaranteed to be at least the respective size.
>  
>  For large allocations you can use vmalloc() and vzalloc(), or directly
> -request pages from the page allocator. The memory allocated by `vmalloc`
> -and related functions is not physically contiguous.
> +request pages from the page allocator.  The memory allocated by `vmalloc`
> +and related functions is not physically contiguous.  The `vmalloc`
> +family of functions don't support the old ``GFP_NOFS`` or ``GFP_NOIO``
> +flags because there are hardcoded ``GFP_KERNEL`` allocations deep inside
> +the allocator which are hard to remove.  However, the scope APIs described
> +above can be used to limit the `vmalloc` functions.

I would reiterate "Do not just wrap vmalloc in the scope API but rather
rely on the real scope for the NOFS/NOIO context". Maybe we want to
stress that once a scope is defined it is sticky to _all_
allocations and all allocators within that scope. The text already says
that, but maybe we want to make it explicit and make it stand out.

[...]
> diff --git a/include/linux/sched/mm.h b/include/linux/sched/mm.h
> index 6484569f50df..9fc091274d1d 100644
> --- a/include/linux/sched/mm.h
> +++ b/include/linux/sched/mm.h
> @@ -186,9 +186,10 @@ static inline gfp_t current_gfp_context(gfp_t flags)
>* them.  noio implies neither IO nor FS and it is a weaker
>* context so always make sure it takes precedence.
>*/
> - if (current->memalloc_nowait)
> + if (current->memalloc_nowait) {
>   flags &= ~__GFP_DIRECT_RECLAIM;
> - else if (current->memalloc_noio)
> + flags |= __GFP_NOWARN;

I dunno. I wouldn't make nowait implicitly NOWARN as well. At least not
with the initial implementation. Maybe we will learn later that there is
just too much unhelpful noise in the kernel log and will reconsider but
I wouldn't just start with that. Also we might learn that there will be
other modifiers for atomic (or should I say non-sleeping) scopes to be
defined, e.g. access to memory reserves, but let's just wait for real
use cases.


Thanks a lot Matthew!
-- 
Michal Hocko
SUSE Labs
