Re: [PATCH 1/2] Block layer filter - second version

2020-10-24 Thread Greg KH
On Wed, Oct 21, 2020 at 12:04:08PM +0300, Sergei Shtepa wrote:
> Signed-off-by: Sergei Shtepa 

I know I don't take patches without any changelog text.

Maybe some maintainers are more lax...

Also, "second version" doesn't belong in the subject line, the
documentation shows how to properly version patch series, please do
that.

thanks,

greg k-h

> ---
>  block/Kconfig   |  11 ++
>  block/Makefile  |   1 +
>  block/blk-core.c|  52 +--
>  block/blk-filter-internal.h |  29 
>  block/blk-filter.c  | 286 
>  block/partitions/core.c |  14 +-
>  fs/block_dev.c  |   6 +-
>  fs/direct-io.c  |   2 +-
>  fs/iomap/direct-io.c|   2 +-
>  include/linux/bio.h |   4 +-
>  include/linux/blk-filter.h  |  76 ++
>  include/linux/genhd.h   |   8 +-
>  kernel/power/swap.c |   2 +-
>  mm/page_io.c|   4 +-
>  14 files changed, 471 insertions(+), 26 deletions(-)
>  create mode 100644 block/blk-filter-internal.h
>  create mode 100644 block/blk-filter.c
>  create mode 100644 include/linux/blk-filter.h
> 
> diff --git a/block/Kconfig b/block/Kconfig
> index bbad5e8bbffe..a308801b4376 100644
> --- a/block/Kconfig
> +++ b/block/Kconfig
> @@ -204,6 +204,17 @@ config BLK_INLINE_ENCRYPTION_FALLBACK
> by falling back to the kernel crypto API when inline
> encryption hardware is not present.
>  
> +config BLK_FILTER
> + bool "Enable support for block layer filters"
> + default y
> + depends on MODULES
> + help
> +   Enabling this lets third-party kernel modules intercept
> +   bio requests for any block device. This allows them to implement
> +   changed block tracking and snapshots without any reconfiguration of
> +   the existing setup. For example, this option allows snapshotting of
> +   a block device without adding it to LVM.
> +
>  menu "Partition Types"
>  
>  source "block/partitions/Kconfig"
> diff --git a/block/Makefile b/block/Makefile
> index 8d841f5f986f..b8ee50b8e031 100644
> --- a/block/Makefile
> +++ b/block/Makefile
> @@ -38,3 +38,4 @@ obj-$(CONFIG_BLK_SED_OPAL)  += sed-opal.o
>  obj-$(CONFIG_BLK_PM) += blk-pm.o
>  obj-$(CONFIG_BLK_INLINE_ENCRYPTION)  += keyslot-manager.o blk-crypto.o
>  obj-$(CONFIG_BLK_INLINE_ENCRYPTION_FALLBACK) += blk-crypto-fallback.o
> +obj-$(CONFIG_BLK_FILTER) += blk-filter.o
> diff --git a/block/blk-core.c b/block/blk-core.c
> index 10c08ac50697..cc06402af695 100644
> --- a/block/blk-core.c
> +++ b/block/blk-core.c
> @@ -1216,23 +1216,20 @@ blk_qc_t submit_bio_noacct(struct bio *bio)
>  EXPORT_SYMBOL(submit_bio_noacct);
>  
>  /**
> - * submit_bio - submit a bio to the block device layer for I/O
> - * @bio: The  bio which describes the I/O
> - *
> - * submit_bio() is used to submit I/O requests to block devices.  It is 
> passed a
> - * fully set up  bio that describes the I/O that needs to be done.  
> The
> - * bio will be send to the device described by the bi_disk and bi_partno 
> fields.
> + * submit_bio_direct - submit a bio to the block device layer for I/O
> + * bypass filter.
> + * @bio:  The bio describing the location in memory and on the device.
>   *
> - * The success/failure status of the request, along with notification of
> - * completion, is delivered asynchronously through the ->bi_end_io() callback
> - * in @bio.  The bio must NOT be touched by thecaller until ->bi_end_io() has
> - * been called.
> + * Description:
> + *This is a version of submit_bio() that shall only be used for I/O
> + *that cannot be intercepted by block layer filters.
> + *All file systems and other upper level users of the block layer
> + *should use submit_bio() instead.
> + *Use this function to access the swap partition and directly access
> + *the block device file.
>   */
> -blk_qc_t submit_bio(struct bio *bio)
> +blk_qc_t submit_bio_direct(struct bio *bio)
>  {
> - if (blkcg_punt_bio_submit(bio))
> - return BLK_QC_T_NONE;
> -
>   /*
>* If it's a regular read/write or a barrier with data attached,
>* go through the normal accounting stuff before submission.
> @@ -1282,8 +1279,35 @@ blk_qc_t submit_bio(struct bio *bio)
>  
>   return submit_bio_noacct(bio);
>  }
> +EXPORT_SYMBOL(submit_bio_direct);
> +
> +/**
> + * submit_bio - submit a bio to the block device layer for I/O
> + * @bio: The  bio which describes the I/O
> + *
> + * submit_bio() is used to submit I/O requests to block devices.  It is 
> passed a
> + * fully set up  bio that describes the I/O that needs to be done.  
> The
> + * bio will be send to the device described by the bi_disk and bi_partno 
> fields.
> + *
> + * The success/failure status of the request, along with notification of
> + * completion, is delivered asynchronously through the ->bi_end_io() callback
> + * in @bio.  The bio must NOT be touched by thecaller until ->bi_end_io() has

Re: [PATCH 1/2] Block layer filter - second version

2020-10-21 Thread Randy Dunlap
On 10/21/20 2:04 AM, Sergei Shtepa wrote:
> diff --git a/block/Kconfig b/block/Kconfig
> index bbad5e8bbffe..a308801b4376 100644
> --- a/block/Kconfig
> +++ b/block/Kconfig
> @@ -204,6 +204,17 @@ config BLK_INLINE_ENCRYPTION_FALLBACK
> by falling back to the kernel crypto API when inline
> encryption hardware is not present.
>  
> +config BLK_FILTER
> + bool "Enable support for block layer filters"
> + default y

Drop the default y. We don't add modules to a default build without
some tough justification.

> + depends on MODULES
> + help
> +   Enabling this lets third-party kernel modules intercept

lets loadable kernel modules intercept

> +   bio requests for any block device. This allows them to implement
> +   changed block tracking and snapshots without any reconfiguration of
> +   the existing setup. For example, this option allows snapshotting of
> +   a block device without adding it to LVM.


-- 
~Randy



Re: [PATCH 1/2] Block layer filter - second version

2020-10-21 Thread Sergei Shtepa
The 10/21/2020 16:07, Matthew Wilcox wrote:
> On Wed, Oct 21, 2020 at 03:55:55PM +0300, Sergei Shtepa wrote:
> > The 10/21/2020 14:44, Matthew Wilcox wrote:
> > > I don't understand why O_DIRECT gets to bypass the block filter.  Nor do
> > > I understand why anybody would place a block filter on the swap device.
> > > But if somebody did place a filter on the swap device, why should swap
> > > be able to bypass the filter?
> > 
> > Yes, intercepting the swap partition is absurd. But we can't guarantee
> > that the filter won't intercept swap.
> > 
> > Swap operation is related to the memory allocation logic. If a swap on
> > the block device are accessed during memory allocation from filter,
> > a deadlock occurs. We can allow filters to occasionally shoot off their
> > feet, especially under high load. But I think it's better not to do it.
> 
> We already have logic to prevent this in Linux.  Filters need to
> call memalloc_noio_save() while they might cause swap to happen and
> memalloc_noio_restore() once it's safe for them to cause swap again.

Yes, I looked at this function, it can really be useful for the filter.
Then I don't need to enter the submit_bio_direct() function and the wait
loop associated with the queue polling function blk_mq_poll() will have
to be rewritten.

> 
> > "directly access" - it is not O_DIRECT. This means (I think) direct
> > reading from the device file, like "dd if=/dev/sda1".
> > As for intercepting direct reading, I don't know how to do the right thing.
> > 
> > The problem here is that in fs/block_dev.c in function __blkdev_direct_IO()
> > uses the qc - value returned by the submit_bio() function.
> > This value is used below when calling 
> > blk_poll(bdev_get_queue(dev), qc, true).
> > The filter cannot return a meaningful value of the blk_qc_t type when
> > intercepting a request, because at that time it does not know which queue
> > the request will fall into.
> > 
> > If function submit_bio() will always return BLK_QC_T_NONE - I think the
> > algorithm of the __blk dev_direct_IO() will not work correctly.
> > If we need to intercept direct access to a block device, we need to at
> > least redo the __blkdev_direct_IO function, getting rid of blk_pool.
> > I'm not sure it's necessary yet.
> 
> This isn't part of the block layer that I'm familiar with, so I can't
> help solve this problem, but allowing O_DIRECT to bypass the block filter
> is a hole that needs to be fixed before these patches can be considered.

I think there is no such problem, but I will check, of course.

-- 
Sergei Shtepa
Veeam Software developer.


Re: [PATCH 1/2] Block layer filter - second version

2020-10-21 Thread Matthew Wilcox
On Wed, Oct 21, 2020 at 03:55:55PM +0300, Sergei Shtepa wrote:
> The 10/21/2020 14:44, Matthew Wilcox wrote:
> > I don't understand why O_DIRECT gets to bypass the block filter.  Nor do
> > I understand why anybody would place a block filter on the swap device.
> > But if somebody did place a filter on the swap device, why should swap
> > be able to bypass the filter?
> 
> Yes, intercepting the swap partition is absurd. But we can't guarantee
> that the filter won't intercept swap.
> 
> Swap operation is related to the memory allocation logic. If a swap on
> the block device are accessed during memory allocation from filter,
> a deadlock occurs. We can allow filters to occasionally shoot off their
> feet, especially under high load. But I think it's better not to do it.

We already have logic to prevent this in Linux.  Filters need to
call memalloc_noio_save() while they might cause swap to happen and
memalloc_noio_restore() once it's safe for them to cause swap again.

> "directly access" - it is not O_DIRECT. This means (I think) direct
> reading from the device file, like "dd if=/dev/sda1".
> As for intercepting direct reading, I don't know how to do the right thing.
> 
> The problem here is that in fs/block_dev.c in function __blkdev_direct_IO()
> uses the qc - value returned by the submit_bio() function.
> This value is used below when calling 
> blk_poll(bdev_get_queue(dev), qc, true).
> The filter cannot return a meaningful value of the blk_qc_t type when
> intercepting a request, because at that time it does not know which queue
> the request will fall into.
> 
> If function submit_bio() will always return BLK_QC_T_NONE - I think the
> algorithm of the __blk dev_direct_IO() will not work correctly.
> If we need to intercept direct access to a block device, we need to at
> least redo the __blkdev_direct_IO function, getting rid of blk_pool.
> I'm not sure it's necessary yet.

This isn't part of the block layer that I'm familiar with, so I can't
help solve this problem, but allowing O_DIRECT to bypass the block filter
is a hole that needs to be fixed before these patches can be considered.


Re: [PATCH 1/2] Block layer filter - second version

2020-10-21 Thread Sergei Shtepa
The 10/21/2020 14:44, Matthew Wilcox wrote:
> On Wed, Oct 21, 2020 at 09:21:36AM +, Damien Le Moal wrote:
> > > + * submit_bio_direct - submit a bio to the block device layer for I/O
> > > + * bypass filter.
> > > + * @bio:  The bio describing the location in memory and on the device.
> > >   *
> > > + * Description:
> 
> You don't need this line.
> 
> > > + *This is a version of submit_bio() that shall only be used for I/O
> > > + *that cannot be intercepted by block layer filters.
> > > + *All file systems and other upper level users of the block layer
> > > + *should use submit_bio() instead.
> > > + *Use this function to access the swap partition and directly access
> > > + *the block device file.
> 
> I don't understand why O_DIRECT gets to bypass the block filter.  Nor do
> I understand why anybody would place a block filter on the swap device.
> But if somebody did place a filter on the swap device, why should swap
> be able to bypass the filter?
> 

I am very happy to hear such a question. You are really trying to
understand the algorithm.

Yes, intercepting the swap partition is absurd. But we can't guarantee
that the filter won't intercept swap.

Swap operation is related to the memory allocation logic. If a swap on
the block device are accessed during memory allocation from filter,
a deadlock occurs. We can allow filters to occasionally shoot off their
feet, especially under high load. But I think it's better not to do it.

"directly access" - it is not O_DIRECT. This means (I think) direct
reading from the device file, like "dd if=/dev/sda1".
As for intercepting direct reading, I don't know how to do the right thing.

The problem here is that in fs/block_dev.c in function __blkdev_direct_IO()
uses the qc - value returned by the submit_bio() function.
This value is used below when calling 
blk_poll(bdev_get_queue(dev), qc, true).
The filter cannot return a meaningful value of the blk_qc_t type when
intercepting a request, because at that time it does not know which queue
the request will fall into.

If function submit_bio() will always return BLK_QC_T_NONE - I think the
algorithm of the __blk dev_direct_IO() will not work correctly.
If we need to intercept direct access to a block device, we need to at
least redo the __blkdev_direct_IO function, getting rid of blk_pool.
I'm not sure it's necessary yet.

-- 
Sergei Shtepa
Veeam Software developer.


Re: [PATCH 1/2] Block layer filter - second version

2020-10-21 Thread Matthew Wilcox
On Wed, Oct 21, 2020 at 09:21:36AM +, Damien Le Moal wrote:
> > + * submit_bio_direct - submit a bio to the block device layer for I/O
> > + * bypass filter.
> > + * @bio:  The bio describing the location in memory and on the device.
> >   *
> > + * Description:

You don't need this line.

> > + *This is a version of submit_bio() that shall only be used for I/O
> > + *that cannot be intercepted by block layer filters.
> > + *All file systems and other upper level users of the block layer
> > + *should use submit_bio() instead.
> > + *Use this function to access the swap partition and directly access
> > + *the block device file.

I don't understand why O_DIRECT gets to bypass the block filter.  Nor do
I understand why anybody would place a block filter on the swap device.
But if somebody did place a filter on the swap device, why should swap
be able to bypass the filter?



Re: [PATCH 1/2] Block layer filter - second version

2020-10-21 Thread Sergei Shtepa
EXPORT_SYMBOL_GPL - ok.

#ifdef CONFIG_BLK_FILTER or IS_ENABLED() - It's a matter of habit.

> double blank line
Ok, I did.
Looks like a candidate for ./scripts/checkpatch.pl.

> Separate into multiple patches: one that introduces the filter
> functions/ops code and another that changes the block layer where needed.

I'll think about it. Personally, it seems to me that this separation
does not make it easier to understand the code. 
It is important for me to know immediately where the function is called,
and this determines its behavior.

-- 
Sergei Shtepa
Veeam Software developer.


Re: [PATCH 1/2] Block layer filter - second version

2020-10-21 Thread Sergei Shtepa
The 10/21/2020 12:14, Johannes Thumshirn wrote:
> On 21/10/2020 11:04, Sergei Shtepa wrote:
> > +   help
> > + Enabling this lets third-party kernel modules intercept
> > + bio requests for any block device. This allows them to implement
> 
> The "third-party kernel modules" part sounds a bit worrisome to me. Especially
> as this functionality is based on EXPORT_SYMBOL()s without the GPL suffix.
> 
> I read it as a "allow a proprietary module to mess with bios", which is a big 
> no-no to me.
> 
> Not providing any sort of changelog doesn't help much either.
> 
> Thanks,
>   Johannes
> 

I think the words "third-party" are is not necessary.
In my opinion, creating proprietary kernel modules for Linux is an empty idea.

EXPORT_SYMBOL() -> EXPORT_SYMBOL_GPL() - no problem.

-- 
Sergei Shtepa
Veeam Software developer.


Re: [PATCH 1/2] Block layer filter - second version

2020-10-21 Thread Damien Le Moal
On 2020/10/21 18:04, Sergei Shtepa wrote:
> Signed-off-by: Sergei Shtepa 
> ---
>  block/Kconfig   |  11 ++
>  block/Makefile  |   1 +
>  block/blk-core.c|  52 +--
>  block/blk-filter-internal.h |  29 
>  block/blk-filter.c  | 286 
>  block/partitions/core.c |  14 +-
>  fs/block_dev.c  |   6 +-
>  fs/direct-io.c  |   2 +-
>  fs/iomap/direct-io.c|   2 +-
>  include/linux/bio.h |   4 +-
>  include/linux/blk-filter.h  |  76 ++
>  include/linux/genhd.h   |   8 +-
>  kernel/power/swap.c |   2 +-
>  mm/page_io.c|   4 +-
>  14 files changed, 471 insertions(+), 26 deletions(-)
>  create mode 100644 block/blk-filter-internal.h
>  create mode 100644 block/blk-filter.c
>  create mode 100644 include/linux/blk-filter.h
> 
> diff --git a/block/Kconfig b/block/Kconfig
> index bbad5e8bbffe..a308801b4376 100644
> --- a/block/Kconfig
> +++ b/block/Kconfig
> @@ -204,6 +204,17 @@ config BLK_INLINE_ENCRYPTION_FALLBACK
> by falling back to the kernel crypto API when inline
> encryption hardware is not present.
>  
> +config BLK_FILTER
> + bool "Enable support for block layer filters"
> + default y
> + depends on MODULES
> + help
> +   Enabling this lets third-party kernel modules intercept
> +   bio requests for any block device. This allows them to implement
> +   changed block tracking and snapshots without any reconfiguration of
> +   the existing setup. For example, this option allows snapshotting of
> +   a block device without adding it to LVM.
> +
>  menu "Partition Types"
>  
>  source "block/partitions/Kconfig"
> diff --git a/block/Makefile b/block/Makefile
> index 8d841f5f986f..b8ee50b8e031 100644
> --- a/block/Makefile
> +++ b/block/Makefile
> @@ -38,3 +38,4 @@ obj-$(CONFIG_BLK_SED_OPAL)  += sed-opal.o
>  obj-$(CONFIG_BLK_PM) += blk-pm.o
>  obj-$(CONFIG_BLK_INLINE_ENCRYPTION)  += keyslot-manager.o blk-crypto.o
>  obj-$(CONFIG_BLK_INLINE_ENCRYPTION_FALLBACK) += blk-crypto-fallback.o
> +obj-$(CONFIG_BLK_FILTER) += blk-filter.o
> diff --git a/block/blk-core.c b/block/blk-core.c
> index 10c08ac50697..cc06402af695 100644
> --- a/block/blk-core.c
> +++ b/block/blk-core.c
> @@ -1216,23 +1216,20 @@ blk_qc_t submit_bio_noacct(struct bio *bio)
>  EXPORT_SYMBOL(submit_bio_noacct);

Probably best to have this as its own patch last in the series.

>  
>  /**
> - * submit_bio - submit a bio to the block device layer for I/O
> - * @bio: The  bio which describes the I/O
> - *
> - * submit_bio() is used to submit I/O requests to block devices.  It is 
> passed a
> - * fully set up  bio that describes the I/O that needs to be done.  
> The
> - * bio will be send to the device described by the bi_disk and bi_partno 
> fields.
> + * submit_bio_direct - submit a bio to the block device layer for I/O
> + * bypass filter.
> + * @bio:  The bio describing the location in memory and on the device.
>   *
> - * The success/failure status of the request, along with notification of
> - * completion, is delivered asynchronously through the ->bi_end_io() callback
> - * in @bio.  The bio must NOT be touched by thecaller until ->bi_end_io() has
> - * been called.
> + * Description:
> + *This is a version of submit_bio() that shall only be used for I/O
> + *that cannot be intercepted by block layer filters.
> + *All file systems and other upper level users of the block layer
> + *should use submit_bio() instead.
> + *Use this function to access the swap partition and directly access
> + *the block device file.
>   */
> -blk_qc_t submit_bio(struct bio *bio)
> +blk_qc_t submit_bio_direct(struct bio *bio)
>  {
> - if (blkcg_punt_bio_submit(bio))
> - return BLK_QC_T_NONE;
> -
>   /*
>* If it's a regular read/write or a barrier with data attached,
>* go through the normal accounting stuff before submission.
> @@ -1282,8 +1279,35 @@ blk_qc_t submit_bio(struct bio *bio)
>  
>   return submit_bio_noacct(bio);
>  }
> +EXPORT_SYMBOL(submit_bio_direct);

EXPORT_SYMBOL_GPL

> +
> +/**
> + * submit_bio - submit a bio to the block device layer for I/O
> + * @bio: The  bio which describes the I/O
> + *
> + * submit_bio() is used to submit I/O requests to block devices.  It is 
> passed a
> + * fully set up  bio that describes the I/O that needs to be done.  
> The
> + * bio will be send to the device described by the bi_disk and bi_partno 
> fields.
> + *
> + * The success/failure status of the request, along with notification of
> + * completion, is delivered asynchronously through the ->bi_end_io() callback
> + * in @bio.  The bio must NOT be touched by thecaller until ->bi_end_io() has
> + * been called.
> + */
> +void submit_bio(struct bio *bio)
> +{
> + if (blkcg_punt_bio_submit(bio))
> + return;
> +
> +#ifdef CONFIG_BLK_FILTER> +  

Re: [PATCH 1/2] Block layer filter - second version

2020-10-21 Thread Johannes Thumshirn
On 21/10/2020 11:04, Sergei Shtepa wrote:
> + help
> +   Enabling this lets third-party kernel modules intercept
> +   bio requests for any block device. This allows them to implement

The "third-party kernel modules" part sounds a bit worrisome to me. Especially
as this functionality is based on EXPORT_SYMBOL()s without the GPL suffix.

I read it as a "allow a proprietary module to mess with bios", which is a big 
no-no to me.

Not providing any sort of changelog doesn't help much either.

Thanks,
Johannes