Re: [PATCH 1/1] block: move the PAGE_SECTORS definition into <linux/blkdev.h>

2020-11-19 Thread John Dorminy
Greetings;

There are a lot of uses of PAGE_SIZE/SECTOR_SIZE scattered around, and
it seems like a medium-sized improvement to be able to refer to that
quantity as PAGE_SECTORS everywhere, instead of only inside dm, bcache,
and null_blk. Did this change make progress anywhere?
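
For concreteness, here is the consolidated definition from the patch
quoted below, plus a worked example of the arithmetic it replaces
(pages_to_sectors() is just a hypothetical illustration, not code from
any driver):

#ifndef SECTOR_SHIFT
#define SECTOR_SHIFT	9
#endif
#ifndef SECTOR_SIZE
#define SECTOR_SIZE	(1 << SECTOR_SHIFT)
#endif
#define PAGE_SECTORS	(PAGE_SIZE / SECTOR_SIZE)

/*
 * E.g. 4K pages / 512-byte sectors  -> PAGE_SECTORS == 8
 *      64K pages / 512-byte sectors -> PAGE_SECTORS == 128
 *
 * Hypothetical caller, standing in for the open-coded
 * "PAGE_SIZE / SECTOR_SIZE" arithmetic scattered around:
 */
static inline sector_t pages_to_sectors(unsigned long nr_pages)
{
	return (sector_t)nr_pages * PAGE_SECTORS;
}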

Thanks!

John Dorminy


On Mon, Sep 7, 2020 at 3:40 AM Leizhen (ThunderTown) wrote:
>
> Hi, Jens Axboe, Alasdair Kergon, Mike Snitzer:
>   What's your opinion?
>
>
> On 2020/8/21 15:05, Coly Li wrote:
> > On 2020/8/21 14:48, Leizhen (ThunderTown) wrote:
> >>
> >>
> >> On 8/21/2020 12:11 PM, Coly Li wrote:
> >>> On 2020/8/21 10:03, Zhen Lei wrote:
> >>>> There are too many PAGE_SECTORS definitions, and all of them are the
> >>>> same. It looks a bit of a mess. So why not move it into
> >>>> <linux/blkdev.h>, to achieve a single, shared definition.
> >>>>
> >>>> Signed-off-by: Zhen Lei 
> >>>
> >>>
> >>> A lazy question about page size > 4KB: currently the bcache code
> >>> assumes the sector size is 512 bytes. If the kernel page size is
> >>> larger than 4KB, is it possible that PAGE_SECTORS in bcache will be
> >>> a number > 8?
> >>
> >> Sorry, I don't fully understand your question. I know that the sector
> >> size can be 512 or 4K, and the PAGE_SIZE can be 4K or 64K. So even if
> >> the sector size is 4K, isn't PAGE_SECTORS greater than 8 for 64K pages?
> >>
> >> I'm not sure if the question you're asking is the one Matthew Wilcox has
> >> answered before:
> >> https://www.spinics.net/lists/raid/msg64345.html
> >
> > Thank you for the above information. Currently the bcache code assumes
> > the sector size is always 512 bytes; you can see how many "<< 9" and
> > ">> 9" shifts are used. Therefore I doubt whether the current code
> > works stably on e.g. 4Kn SSDs (this is only a suspicion, because I
> > don't have such an SSD).
> >
> > Anyway, your patch is fine by me. This change to bcache doesn't make
> > things worse or better; maybe it can help trigger my above suspicion
> > earlier if people do hit this kind of problem on 4Kn-sector SSDs.
> >
> > For the bcache part of this patch, you may add,
> > Acked-by: Coly Li 
> >
> > Thanks.
> >
> > Coly Li
> >
> >>>> ---
> >>>>  drivers/block/brd.c   | 1 -
> >>>>  drivers/block/null_blk_main.c | 1 -
> >>>>  drivers/md/bcache/util.h  | 2 --
> >>>>  include/linux/blkdev.h| 5 +++--
> >>>>  include/linux/device-mapper.h | 1 -
> >>>>  5 files changed, 3 insertions(+), 7 deletions(-)
> >>>>
> >>>
> >>> [snipped]
> >>>
> >>>> diff --git a/drivers/md/bcache/util.h b/drivers/md/bcache/util.h
> >>>> index c029f7443190805..55196e0f37c32c6 100644
> >>>> --- a/drivers/md/bcache/util.h
> >>>> +++ b/drivers/md/bcache/util.h
> >>>> @@ -15,8 +15,6 @@
> >>>>
> >>>>  #include "closure.h"
> >>>>
> >>>> -#define PAGE_SECTORS  (PAGE_SIZE / 512)
> >>>> -
> >>>>  struct closure;
> >>>>
> >>>>  #ifdef CONFIG_BCACHE_DEBUG
> >>>> diff --git a/include/linux/blkdev.h b/include/linux/blkdev.h
> >>>> index bb5636cc17b91a7..b068dfc5f2ef0ab 100644
> >>>> --- a/include/linux/blkdev.h
> >>>> +++ b/include/linux/blkdev.h
> >>>> @@ -949,11 +949,12 @@ static inline struct request_queue *bdev_get_queue(struct block_device *bdev)
> >>>>   * multiple of 512 bytes. Hence these two constants.
> >>>>   */
> >>>>  #ifndef SECTOR_SHIFT
> >>>> -#define SECTOR_SHIFT 9
> >>>> +#define SECTOR_SHIFT  9
> >>>>  #endif
> >>>>  #ifndef SECTOR_SIZE
> >>>> -#define SECTOR_SIZE (1 << SECTOR_SHIFT)
> >>>> +#define SECTOR_SIZE   (1 << SECTOR_SHIFT)
> >>>>  #endif
> >>>> +#define PAGE_SECTORS  (PAGE_SIZE / SECTOR_SIZE)
> >>>>
> >>>>  /*
> >>>>   * blk_rq_pos()   : the current sector
> >>> [snipped]
> >>>
> >>>
> >>
> >
> >
> >
>



Re: [PATCH 21/29] mm: remove the pgprot argument to __vmalloc

2020-04-30 Thread John Dorminy
>> On Tue, Apr 14, 2020 at 03:13:40PM +0200, Christoph Hellwig wrote:
>> > The pgprot argument to __vmalloc is always PROT_KERNEL now, so remove
>> > it.

Greetings;

I recently noticed this change via the linux-next tree.

It may not be possible to edit the changelog at this late date, but the
change description refers to PROT_KERNEL, a symbol that does not appear
to exist; perhaps PAGE_KERNEL was meant? The mismatch briefly confused
me and a couple of other folks until we decided it was supposed to be
PAGE_KERNEL. If it's not too late, it would be nice to edit the
description to say so.
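
For anyone else who lands here from a grep: the change removes the
third argument, so (roughly, with a GFP_KERNEL allocation as the
illustrative example) callers go from

	buf = __vmalloc(size, GFP_KERNEL | __GFP_ZERO, PAGE_KERNEL);

to

	buf = __vmalloc(size, GFP_KERNEL | __GFP_ZERO);

which is why the PROT_KERNEL/PAGE_KERNEL typo in the description is
only cosmetic.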

Many thanks.

John Dorminy



Re: block: be more careful about status in __bio_chain_endio

2019-02-22 Thread John Dorminy
I'm also worried about the other two versions, though:

memory-barriers.txt, line 1724:

 (*) The compiler is within its rights to invent stores to a variable,

i.e. the compiler is free to decide __bio_chain_endio looks like this:

static struct bio *__bio_chain_endio(struct bio *bio)
{
	struct bio *parent = bio->bi_private;
	blk_status_t tmp = parent->bi_status;

	/*
	 * Invented store: the compiler unconditionally clobbers
	 * parent->bi_status with the child's status, then restores
	 * the saved value if the child actually succeeded.
	 */
	parent->bi_status = bio->bi_status;
	if (!bio->bi_status)
		parent->bi_status = tmp;
	bio_put(bio);
	return parent;
}

In which case, the read and later store on the two different threads
may overlap in such a way that bio_endio sometimes sees success, even
if one child had an error.

As a result, I believe the store to parent->bi_status needs to be a
WRITE_ONCE() and the later reads need to be READ_ONCE() [although,
since the later reads happen in many different functions, perhaps some
other barrier to make sure all readers see the correct value is in
order].
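
To make that concrete, here is a sketch of the writer side I have in
mind (not a tested patch; I'm assuming the status check from the patch
under discussion, and the readers would need matching READ_ONCE()
treatment):

static struct bio *__bio_chain_endio(struct bio *bio)
{
	struct bio *parent = bio->bi_private;

	/* Publish the child's error in one store the compiler
	 * cannot tear or invent around. */
	if (bio->bi_status && !parent->bi_status)
		WRITE_ONCE(parent->bi_status, bio->bi_status);
	bio_put(bio);
	return parent;
}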


Re: block: be more careful about status in __bio_chain_endio

2019-02-22 Thread John Dorminy
I am perhaps not understanding the intricacies here, or not seeing a
barrier protecting it, so forgive me if I'm off base. I think reading
parent->bi_status here is unsafe.
Consider the following sequence of events on two threads.

Thread 0                                 Thread 1
--------                                 --------
In __bio_chain_endio:                    In __bio_chain_endio:
[A] Child 0 reads parent->bi_status,
    sees no error.
                                         Child bio 1 reads parent->bi_status,
                                         sees no error.
                                         It sets parent->bi_status to an error.
                                         It calls bio_put().
Child bio 0 calls bio_put().
[end __bio_chain_endio]                  [end __bio_chain_endio]
                                         In bio_chain_endio(), bio_endio(parent)
                                         is called, calling bio_remaining_done(),
                                         which decrements __bi_remaining to 1
                                         and returns false, so no further endio
                                         work is done.
In bio_chain_endio(), bio_endio(parent)
is called, calling bio_remaining_done(),
decrementing parent->__bi_remaining to
0, and continuing to finish parent.
Either for block tracing or for parent's
bi_end_io(), this thread tries to read
parent->bi_status again.

The compiler or the CPU may cache the read from [A], and since there
are no intervening barriers, thread 0 still believes parent->bi_status
is success. Thus the bio may falsely be believed to have completed
successfully, even though child 1 set an error in it.
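
In pseudo-C, the reader side would then need something like this
(hypothetical; in reality the reads are scattered across bio_endio(),
the trace points, and parent's bi_end_io()):

/* Somewhere in parent's completion path on thread 0: */
blk_status_t status = READ_ONCE(parent->bi_status);

/*
 * READ_ONCE() forbids the compiler from reusing the stale value it
 * loaded at [A], so the error stored by child 1 is actually reloaded
 * from memory here.
 */
if (status)
	/* propagate the error */;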

Am I missing a subtlety here?