Re: [PATCH v2] nvme-tcp: Check if request has started before processing it

2021-03-31 Thread Hannes Reinecke
On 3/31/21 1:28 AM, Keith Busch wrote:
> On Tue, Mar 30, 2021 at 10:34:25AM -0700, Sagi Grimberg wrote:
>>
>>>> It is, but in this situation, the controller is sending a second
>>>> completion that results in a use-after-free, which makes the
>>>> transport irrelevant. Unless there is some other flow (which is
>>>> unclear
>>>> to me) that causes this which is a bug that needs to be fixed rather
>>>> than hidden with a safeguard.
>>>>
>>>
>>> The kernel should not crash regardless of any network traffic that is
>>> sent to the system.  It should not be possible to either intentionally
>>> of mistakenly contruct packets that will deny service in this way.
>>
>> This is not specific to nvme-tcp. I can build an rdma or pci controller
>> that can trigger the same crash... I saw a similar patch from Hannes
>> implemented in the scsi level, and not the individual scsi transports..
> 
> If scsi wants this too, this could be made generic at the blk-mq level.
> We just need to make something like blk_mq_tag_to_rq(), but return NULL
> if the request isn't started.
>  
>> I would also mention, that a crash is not even the scariest issue that
>> we can see here, because if the request happened to be reused we are
>> in the silent data corruption realm...
> 
> If this does happen, I think we have to come up with some way to
> mitigate it. We're not utilizing the full 16 bits of the command_id, so
> maybe we can append something like a generation sequence number that can
> be checked for validity.
> 
... which will be near impossible.
We can protect against crashing on invalid frames.
We can _not_ protect against maliciously crafted packets referencing any
random _existing_ tag; that's what TLS is for.

What we can do, though, is checking the 'state' field in the tcp
request, and only allow completions for commands which are in a state
allowing for completions.

Let's see if I can whip up a patch.

Cheers,

Hannes
-- 
Dr. Hannes ReineckeKernel Storage Architect
h...@suse.de  +49 911 74053 688
SUSE Software Solutions Germany GmbH, Maxfeldstr. 5, 90409 Nürnberg
HRB 36809 (AG Nürnberg), GF: Felix Imendörffer


Re: [PATCH -next 1/5] block: add disk sequence number

2021-03-26 Thread Hannes Reinecke

On 3/25/21 6:29 PM, Matteo Croce wrote:

On Mon, Mar 15, 2021 at 10:05 PM Matthew Wilcox  wrote:


On Mon, Mar 15, 2021 at 08:18:24PM +, Matthew Wilcox wrote:

On Mon, Mar 15, 2021 at 09:02:38PM +0100, Matteo Croce wrote:

From: Matteo Croce 

Add a sequence number to the disk devices. This number is put in the
uevent so userspace can correlate events when a driver reuses a device,
like the loop one.


Should this be documented as monotonically increasing?  I think this
is actually a media identifier.  Consider (if you will) a floppy disc.
Back when such things were common, it was possible with personal computers
of the era to have multiple floppy discs "in play" and be prompted to
insert them as needed.  So shouldn't it be possible to support something
similar here -- you're really removing the media from the loop device.
With a monotonically increasing number, you're always destroying the
media when you remove it, but in principle, it should be possible to
reinsert the same media and have the same media identifier number.


So ... a lot of devices have UUIDs or similar.  eg:

$ cat /sys/block/nvme0n1/uuid
e8238fa6-bf53-0001-001b-448b49cec94f

https://linux.die.net/man/8/scsi_id (for scsi)



Hi,

I don't have uuid anywhere:

matteo@saturno:~$ ll /dev/sd?
brw-rw 1 root disk 8,  0 feb 16 13:24 /dev/sda
brw-rw 1 root disk 8, 16 feb 16 13:24 /dev/sdb
brw-rw 1 root disk 8, 32 feb 16 13:24 /dev/sdc
brw-rw 1 root disk 8, 48 feb 16 13:24 /dev/sdd
brw-rw 1 root disk 8, 64 mar  4 06:26 /dev/sde
brw-rw 1 root disk 8, 80 feb 16 13:24 /dev/sdf
matteo@saturno:~$ ll /sys/block/*/uuid
ls: cannot access '/sys/block/*/uuid': No such file or directory

mcroce@t490s:~$ ll /dev/nvme0n1
brw-rw. 1 root disk 259, 0 25 mar 14.22 /dev/nvme0n1
mcroce@t490s:~$ ll /sys/block/*/uuid
ls: cannot access '/sys/block/*/uuid': No such file or directory

I find it only on a mdraid array:

$ cat /sys/devices/virtual/block/md127/md/uuid
26117338-4f54-f14e-b5d4-93feb7fe825d

I'm using a vanilla 5.11 kernel.

The 'uuid' is optional for NVMe devices, and indeed not even present for 
other device types.
Use the 'wwid' attribute, which contains a unique identifier for all 
nvme devices:


# cat /sys/block/nvme*/wwid
nvme.8086-4356504436343735303034323430304e474e-564f303430304b45464a42-0001
nvme.8086-4356504436343735303034363430304e474e-564f303430304b45464a42-0001
uuid.3c6500ee-a775-4c89-b223-e9551f5a9f7a

and for SCSI the wwid is part of the SCSI device:
# cat /sys/block/sd*/device/wwid
naa.600508b1001ce2e648a35b6ec14a3996

Cheers,

Hannes
--
Dr. Hannes ReineckeKernel Storage Architect
h...@suse.de  +49 911 74053 688
SUSE Software Solutions GmbH, Maxfeldstr. 5, 90409 Nürnberg
HRB 36809 (AG Nürnberg), Geschäftsführer: Felix Imendörffer


Re: [PATCH] scsi: Fix a double free in myrs_cleanup

2021-03-11 Thread Hannes Reinecke
On 3/11/21 7:30 AM, Lv Yunlong wrote:
> In myrs_cleanup, cs->mmio_base will be freed twice by
> iounmap().
> 
> Fixes: 77266186397c6 ("scsi: myrs: Add Mylex RAID controller (SCSI 
> interface)")
> Signed-off-by: Lv Yunlong 
> ---
>  drivers/scsi/myrs.c | 2 +-
>  1 file changed, 1 insertion(+), 1 deletion(-)
> 
> diff --git a/drivers/scsi/myrs.c b/drivers/scsi/myrs.c
> index 4adf9ded296a..329fd025c718 100644
> --- a/drivers/scsi/myrs.c
> +++ b/drivers/scsi/myrs.c
> @@ -2273,12 +2273,12 @@ static void myrs_cleanup(struct myrs_hba *cs)
>   if (cs->mmio_base) {
>   cs->disable_intr(cs);
>   iounmap(cs->mmio_base);
> + cs->mmio_base = NULL;
>   }
>   if (cs->irq)
>   free_irq(cs->irq, cs);
>   if (cs->io_addr)
>   release_region(cs->io_addr, 0x80);
> - iounmap(cs->mmio_base);
>   pci_set_drvdata(pdev, NULL);
>   pci_disable_device(pdev);
>   scsi_host_put(cs->host);
> 
Reviewed-by: Hannes Reinecke 

Cheers,

Hannes
-- 
Dr. Hannes ReineckeKernel Storage Architect
h...@suse.de  +49 911 74053 688
SUSE Software Solutions Germany GmbH, Maxfeldstr. 5, 90409 Nürnberg
HRB 36809 (AG Nürnberg), GF: Felix Imendörffer


Re: [RFC PATCH v1 1/6] badblocks: add more helper structure and routines in badblocks.h

2021-03-03 Thread Hannes Reinecke
On 3/2/21 5:02 AM, Coly Li wrote:
> This patch adds the following helper structure and routines into
> badblocks.h,
> - struct bad_context
>   This structure is used in improved badblocks code for bad table
>   iteration.
> - BB_END()
>   The macro to culculate end LBA of a bad range record from bad
>   table.
> - badblocks_full() and badblocks_empty()
>   The inline routines to check whether bad table is full or empty.
> - set_changed() and clear_changed()
>   The inline routines to set and clear 'changed' tag from struct
>   badblocks.
> 
> These new helper structure and routines can help to make the code more
> clear, they will be used in the improved badblocks code in following
> patches.
> 
> Signed-off-by: Coly Li 
> ---
>  include/linux/badblocks.h | 32 
>  1 file changed, 32 insertions(+)
> 
> diff --git a/include/linux/badblocks.h b/include/linux/badblocks.h
> index 2426276b9bd3..166161842d1f 100644
> --- a/include/linux/badblocks.h
> +++ b/include/linux/badblocks.h
> @@ -15,6 +15,7 @@
>  #define BB_OFFSET(x) (((x) & BB_OFFSET_MASK) >> 9)
>  #define BB_LEN(x)(((x) & BB_LEN_MASK) + 1)
>  #define BB_ACK(x)(!!((x) & BB_ACK_MASK))
> +#define BB_END(x)(BB_OFFSET(x) + BB_LEN(x))
>  #define BB_MAKE(a, l, ack) (((a)<<9) | ((l)-1) | ((u64)(!!(ack)) << 63))
>  
>  /* Bad block numbers are stored sorted in a single page.
> @@ -41,6 +42,14 @@ struct badblocks {
>   sector_t size;  /* in sectors */
>  };
>  
> +struct bad_context {
> + sector_tstart;
> + sector_tlen;
> + int ack;
> + sector_t    orig_start;
> + sector_torig_len;
> +};
> +
Maybe rename it to 'badblocks_context'.
It's not the context which is bad ...

Cheers,

Hannes
-- 
Dr. Hannes ReineckeKernel Storage Architect
h...@suse.de  +49 911 74053 688
SUSE Software Solutions Germany GmbH, Maxfeldstr. 5, 90409 Nürnberg
HRB 36809 (AG Nürnberg), GF: Felix Imendörffer


Re: [PATCH] nvme-tcp: Check if request has started before processing it

2021-03-02 Thread Hannes Reinecke
On 3/1/21 9:59 PM, Keith Busch wrote:
> On Mon, Mar 01, 2021 at 05:53:25PM +0100, Hannes Reinecke wrote:
>> On 3/1/21 5:05 PM, Keith Busch wrote:
>>> On Mon, Mar 01, 2021 at 02:55:30PM +0100, Hannes Reinecke wrote:
>>>> On 3/1/21 2:26 PM, Daniel Wagner wrote:
>>>>> On Sat, Feb 27, 2021 at 02:19:01AM +0900, Keith Busch wrote:
>>>>>> Crashing is bad, silent data corruption is worse. Is there truly no
>>>>>> defense against that? If not, why should anyone rely on this?
>>>>>
>>>>> If we receive an response for which we don't have a started request, we
>>>>> know that something is wrong. Couldn't we in just reset the connection
>>>>> in this case? We don't have to pretend nothing has happened and
>>>>> continuing normally. This would avoid a host crash and would not create
>>>>> (more) data corruption. Or I am just too naive?
>>>>>
>>>> This is actually a sensible solution.
>>>> Please send a patch for that.
>>>
>>> Is a bad frame a problem that can be resolved with a reset?
>>>
>>> Even if so, the reset doesn't indicate to the user if previous commands
>>> completed with bad data, so it still seems unreliable.
>>>
>> We need to distinguish two cases here.
>> The one is use receiving a frame with an invalid tag, leading to a crash.
>> This can be easily resolved by issuing a reset, as clearly the command was
>> garbage and we need to invoke error handling (which is reset).
>>
>> The other case is us receiving a frame with a _duplicate_ tag, ie a tag
>> which is _currently_ valid. This is a case which will fail _even now_, as we
>> have simply no way of detecting this.
>>
>> So what again do we miss by fixing the first case?
>> Apart from a system which does _not_ crash?
> 
> I'm just saying each case is a symptom of the same problem. The only
> difference from observing one vs the other is a race with the host's
> dispatch. And since you're proposing this patch, it sounds like this
> condition does happen on tcp compared to other transports where we don't
> observe it. I just thought the implication that data corruption happens
> is a alarming.
> 
Oh yes, it is.
But sadly TCP inherently suffers from this, as literally anyone can
spoof frames on the network.
Other transports like RDMA or FC do not suffer to that extend as
spoofing frames there is far more elaborate, and not really possible
without dedicated hardware equipment.

That's why there is header and data digest; that will protect you
against accidental frame corruption (as this case clearly is; the
remainder of the frame is filled with zeroes).
It would not protect you against deliberate frame corruption; that's why
there is TPAR 8010 (TLS encryption for NVMe-TCP).

Be it as it may, none of these methods are in use here, and none of
these methods can be made mandatory. So we need to deal with the case at
hand.

And in my opinion crashing is the _worst_ options of all.
Tear the connection down, reset the thing, whatever.

But do not crash.
Customers tend to have a very dim view on crashing machines, and have a
very limited capacity for being susceptible to our reasoning in these cases.

Cheers,

Hannes
-- 
Dr. Hannes ReineckeKernel Storage Architect
h...@suse.de  +49 911 74053 688
SUSE Software Solutions Germany GmbH, Maxfeldstr. 5, 90409 Nürnberg
HRB 36809 (AG Nürnberg), GF: Felix Imendörffer


Re: [PATCH] nvme-tcp: Check if request has started before processing it

2021-03-01 Thread Hannes Reinecke

On 3/1/21 5:05 PM, Keith Busch wrote:

On Mon, Mar 01, 2021 at 02:55:30PM +0100, Hannes Reinecke wrote:

On 3/1/21 2:26 PM, Daniel Wagner wrote:

On Sat, Feb 27, 2021 at 02:19:01AM +0900, Keith Busch wrote:

Crashing is bad, silent data corruption is worse. Is there truly no
defense against that? If not, why should anyone rely on this?


If we receive an response for which we don't have a started request, we
know that something is wrong. Couldn't we in just reset the connection
in this case? We don't have to pretend nothing has happened and
continuing normally. This would avoid a host crash and would not create
(more) data corruption. Or I am just too naive?


This is actually a sensible solution.
Please send a patch for that.


Is a bad frame a problem that can be resolved with a reset?

Even if so, the reset doesn't indicate to the user if previous commands
completed with bad data, so it still seems unreliable.


We need to distinguish two cases here.
The one is use receiving a frame with an invalid tag, leading to a 
crash. This can be easily resolved by issuing a reset, as clearly the 
command was garbage and we need to invoke error handling (which is reset).


The other case is us receiving a frame with a _duplicate_ tag, ie a tag 
which is _currently_ valid. This is a case which will fail _even now_, 
as we have simply no way of detecting this.


So what again do we miss by fixing the first case?
Apart from a system which does _not_ crash?

Cheers,

Hannes
--
Dr. Hannes ReineckeKernel Storage Architect
h...@suse.de  +49 911 74053 688
SUSE Software Solutions GmbH, Maxfeldstr. 5, 90409 Nürnberg
HRB 36809 (AG Nürnberg), Geschäftsführer: Felix Imendörffer


Re: [PATCH] nvme-tcp: Check if request has started before processing it

2021-03-01 Thread Hannes Reinecke
On 3/1/21 2:26 PM, Daniel Wagner wrote:
> On Sat, Feb 27, 2021 at 02:19:01AM +0900, Keith Busch wrote:
>> Crashing is bad, silent data corruption is worse. Is there truly no
>> defense against that? If not, why should anyone rely on this?
> 
> If we receive an response for which we don't have a started request, we
> know that something is wrong. Couldn't we in just reset the connection
> in this case? We don't have to pretend nothing has happened and
> continuing normally. This would avoid a host crash and would not create
> (more) data corruption. Or I am just too naive?
> 
This is actually a sensible solution.
Please send a patch for that.

Cheers,

Hannes
-- 
Dr. Hannes ReineckeKernel Storage Architect
h...@suse.de  +49 911 74053 688
SUSE Software Solutions Germany GmbH, Maxfeldstr. 5, 90409 Nürnberg
HRB 36809 (AG Nürnberg), GF: Felix Imendörffer


Re: [PATCH] nvme-tcp: Check if request has started before processing it

2021-02-26 Thread Hannes Reinecke

On 2/26/21 5:13 PM, Keith Busch wrote:

On Fri, Feb 26, 2021 at 01:54:00PM +0100, Hannes Reinecke wrote:

On 2/26/21 1:35 PM, Daniel Wagner wrote:

On Mon, Feb 15, 2021 at 01:29:45PM -0800, Sagi Grimberg wrote:

Well, I think we should probably figure out why that is happening first.


I got my hands on a tcpdump trace. I've trimmed it to this:


[ .. ]

NVM Express Fabrics TCP
  Pdu Type: CapsuleResponse (5)
  Pdu Specific Flags: 0x00
   ...0 = PDU Header Digest: Not set
   ..0. = PDU Data Digest: Not set
   .0.. = PDU Data Last: Not set
   0... = PDU Data Success: Not set
  Pdu Header Length: 24
  Pdu Data Offset: 0
  Packet Length: 24
  Unknown Data: 020004001b001f00

  00 00 0c 9f f5 a8 b4 96 91 41 16 c0 08 00 45 00   .AE.
0010  00 4c 00 00 40 00 40 06 00 00 0a e4 26 af 0a e4   .L..@.@.&...
0020  c2 1e 11 44 88 4f b8 58 90 ec 8e 1b 32 ed 80 18   ...D.O.X2...
0030  01 01 fe d3 00 00 01 01 08 0a e6 ed ac be d6 a3   
0040  5d 0c 05 00 18 00 18 00 00 00 02 00 04 00 00 00   ]...
0050  00 00 1b 00 00 00 1f 00 00 00 ..


As I suspected, we did receive an invalid frame.
Data digest would have saved us, but then it's not enabled.

So we do need to check if the request is valid before processing it.


That's just addressing a symptom. You can't fully verify the request is
valid this way because the host could have started the same command ID
the very moment before the code checks it, incorrectly completing an
in-flight command and getting data corruption.


Oh, I am fully aware.

Bad frames are just that, bad frames.
We can only fully validate that when digests are enabled, but I gather 
that controllers sending out bad frames wouldn't want to enable digests, 
either. So relying on that is possibly not an option.


So really what I'm trying to avoid is the host crashing on a bad frame.
That kind of thing always resonates bad with customers.
And tripping over an uninitialized command is just too stupid IMO.

Cheers,

Hannes
--
Dr. Hannes ReineckeKernel Storage Architect
h...@suse.de  +49 911 74053 688
SUSE Software Solutions GmbH, Maxfeldstr. 5, 90409 Nürnberg
HRB 36809 (AG Nürnberg), Geschäftsführer: Felix Imendörffer


Re: [PATCH] nvme-tcp: Check if request has started before processing it

2021-02-26 Thread Hannes Reinecke
 number)]
 Acknowledgment Number: 1(relative ack number)
 Acknowledgment number (raw): 2384147181
 1000  = Header Length: 32 bytes (8)
 Flags: 0x018 (PSH, ACK)
 000.   = Reserved: Not set
 ...0   = Nonce: Not set
  0...  = Congestion Window Reduced (CWR): Not set
  .0..  = ECN-Echo: Not set
  ..0.  = Urgent: Not set
  ...1  = Acknowledgment: Set
   1... = Push: Set
   .0.. = Reset: Not set
   ..0. = Syn: Not set
   ...0 = Fin: Not set
 [TCP Flags: ···AP···]
 Window: 257
 [Calculated window size: 257]
 [Window size scaling factor: -1 (unknown)]
 Checksum: 0xfed3 [unverified]
 [Checksum Status: Unverified]
 Urgent Pointer: 0
 Options: (12 bytes), No-Operation (NOP), No-Operation (NOP), Timestamps
 TCP Option - No-Operation (NOP)
 Kind: No-Operation (1)
 TCP Option - No-Operation (NOP)
 Kind: No-Operation (1)
 TCP Option - Timestamps: TSval 3874335934, TSecr 3601030412
 Kind: Time Stamp Option (8)
 Length: 10
 Timestamp value: 3874335934
 Timestamp echo reply: 3601030412
 [SEQ/ACK analysis]
 [Bytes in flight: 24]
 [Bytes sent since last PSH flag: 24]
 [Timestamps]
 [Time since first frame in this TCP stream: 0.0 seconds]
 [Time since previous frame in this TCP stream: 0.0 seconds]
 TCP payload (24 bytes)
 [PDU Size: 24]
NVM Express Fabrics TCP
 Pdu Type: CapsuleResponse (5)
 Pdu Specific Flags: 0x00
  ...0 = PDU Header Digest: Not set
  ..0. = PDU Data Digest: Not set
  .0.. = PDU Data Last: Not set
  0... = PDU Data Success: Not set
 Pdu Header Length: 24
 Pdu Data Offset: 0
 Packet Length: 24
 Unknown Data: 020004001b001f00

  00 00 0c 9f f5 a8 b4 96 91 41 16 c0 08 00 45 00   .AE.
0010  00 4c 00 00 40 00 40 06 00 00 0a e4 26 af 0a e4   .L..@.@.&...
0020  c2 1e 11 44 88 4f b8 58 90 ec 8e 1b 32 ed 80 18   ...D.O.X2...
0030  01 01 fe d3 00 00 01 01 08 0a e6 ed ac be d6 a3   
0040  5d 0c 05 00 18 00 18 00 00 00 02 00 04 00 00 00   ]...
0050  00 00 1b 00 00 00 1f 00 00 00 ..


As I suspected, we did receive an invalid frame.
Data digest would have saved us, but then it's not enabled.

So we do need to check if the request is valid before processing it.

Cheers,

Hannes
--
Dr. Hannes ReineckeKernel Storage Architect
h...@suse.de  +49 911 74053 688
SUSE Software Solutions GmbH, Maxfeldstr. 5, 90409 Nürnberg
HRB 36809 (AG Nürnberg), Geschäftsführer: Felix Imendörffer


Re: [RFC/RFT PATCH] scsi: pm8001: Expose HW queues for pm80xx hw

2021-02-16 Thread Hannes Reinecke

On 2/15/21 6:31 PM, viswa...@microchip.com wrote:

Hi John,

We could test this patch and it works fine. Regarding the usage of 
request->tag, We have some challenges there.
Pm80xx driver need tag for internal command as well. Tag is controller wide and 
we need to assign unique tag
for internal command as well. If we use request->tag, how can we get tag for 
internal commands ? If driver
allocate that, how can we make sure it will not conflict with the request->tag ?

I have posted a patchset for internal tags some time ago; at the time it 
was blocked by the missing shared tagset functionality.
But seeing that we're having it now I guess I'll need to repost; that 
addresses precisely the issue you've described.


Cheers,

Hannes
--
Dr. Hannes ReineckeKernel Storage Architect
h...@suse.de  +49 911 74053 688
SUSE Software Solutions GmbH, Maxfeldstr. 5, 90409 Nürnberg
HRB 36809 (AG Nürnberg), Geschäftsführer: Felix Imendörffer


Re: [PATCH] nvme-tcp: Check if request has started before processing it

2021-02-16 Thread Hannes Reinecke

On 2/15/21 10:23 PM, Sagi Grimberg wrote:



blk_mq_tag_to_rq() will always return a request if the command_id is
in the valid range. Check if the request has been started. If we
blindly process the request we might double complete a request which
can be fatal.


How did you get to this one? did the controller send a completion for
a completed/bogus request?


If that is the case, then that must mean it's possible the driver could
have started the command id just before the bogus completion check. 
Data

iorruption, right?


Yes, which is why I don't think this check is very useful..


I actually view that as a valid protection against spoofed frames.
Without it it's easy to crash the machine by injecting fake 
completions with random command ids.


And this doesn't help because the command can have been easily reused
and started... What is this protecting against? Note that none of the
other transports checks that, why should tcp?


Because it's particularly easy to spoof packets on tcp.
All other nvme-of transports are layered on top of other transports 
which do some sanity checks already, so it becomes really hard to inject 
invalid NVMe-oF frames for those.
NVMe-TCP has none of these protections, making it really easy to inject 
faulty frames (or, heaven forbid, running a packet fuzzer).


And crashing the machine on invalid frames is always a bad idea; I would 
have expected NVMe-TCP to drop them.


Cheers,

Hannes
--
Dr. Hannes ReineckeKernel Storage Architect
h...@suse.de  +49 911 74053 688
SUSE Software Solutions GmbH, Maxfeldstr. 5, 90409 Nürnberg
HRB 36809 (AG Nürnberg), Geschäftsführer: Felix Imendörffer


Re: [PATCH] nvme-tcp: Check if request has started before processing it

2021-02-13 Thread Hannes Reinecke

On 2/12/21 10:49 PM, Sagi Grimberg wrote:



blk_mq_tag_to_rq() will always return a request if the command_id is
in the valid range. Check if the request has been started. If we
blindly process the request we might double complete a request which
can be fatal.


How did you get to this one? did the controller send a completion for
a completed/bogus request?


If that is the case, then that must mean it's possible the driver could
have started the command id just before the bogus completion check. Data
iorruption, right?


Yes, which is why I don't think this check is very useful..


I actually view that as a valid protection against spoofed frames.
Without it it's easy to crash the machine by injecting fake completions 
with random command ids.


Cheers,

Hannes
--
Dr. Hannes ReineckeKernel Storage Architect
h...@suse.de  +49 911 74053 688
SUSE Software Solutions GmbH, Maxfeldstr. 5, 90409 Nürnberg
HRB 36809 (AG Nürnberg), Geschäftsführer: Felix Imendörffer


Re: [PATCH] nvme-tcp: Check if request has started before processing it

2021-02-13 Thread Hannes Reinecke

On 2/12/21 7:17 PM, Daniel Wagner wrote:

blk_mq_tag_to_rq() will always return a request if the command_id is
in the valid range. Check if the request has been started. If we
blindly process the request we might double complete a request which
can be fatal.

Signed-off-by: Daniel Wagner 
---

This patch is against nvme-5.12.

There is one blk_mq_tag_to_rq() in nvme_tcp_recv_ddgst() which I
didn't update as I am not sure if it's also needed.

I guess it is; this patch is essentially a protection against invalid 
frames, and as such affects all places.


Cheers,

Hannes
--
Dr. Hannes ReineckeKernel Storage Architect
h...@suse.de  +49 911 74053 688
SUSE Software Solutions GmbH, Maxfeldstr. 5, 90409 Nürnberg
HRB 36809 (AG Nürnberg), Geschäftsführer: Felix Imendörffer


Re: [PATCH] nvme/hwmon: Return error code when registration fails

2021-02-12 Thread Hannes Reinecke

On 2/12/21 10:30 AM, Daniel Wagner wrote:

The hwmon pointer wont be NULL if the registration fails. Though the
exit code path will assign it to ctrl->hwmon_device. Later
nvme_hwmon_exit() will try to free the invalid pointer. Avoid this by
returning the error code from hwmon_device_register_with_info().

Fixes: ec420cdcfab4 ("nvme/hwmon: rework to avoid devm allocation")
Cc: Hannes Reinecke 
Signed-off-by: Daniel Wagner 
---

This patch is against linux-block/for-next.

  drivers/nvme/host/hwmon.c | 1 +
  1 file changed, 1 insertion(+)

diff --git a/drivers/nvme/host/hwmon.c b/drivers/nvme/host/hwmon.c
index 8f9e96986780..0a586d712920 100644
--- a/drivers/nvme/host/hwmon.c
+++ b/drivers/nvme/host/hwmon.c
@@ -248,6 +248,7 @@ int nvme_hwmon_init(struct nvme_ctrl *ctrl)
if (IS_ERR(hwmon)) {
dev_warn(dev, "Failed to instantiate hwmon device\n");
kfree(data);
+   return PTR_ERR(hwmon);
}
ctrl->hwmon_device = hwmon;
    return 0;


Reviewed-by: Hannes Reinecke 

Cheers,

Hannes
--
Dr. Hannes ReineckeKernel Storage Architect
h...@suse.de  +49 911 74053 688
SUSE Software Solutions GmbH, Maxfeldstr. 5, 90409 Nürnberg
HRB 36809 (AG Nürnberg), Geschäftsführer: Felix Imendörffer


Re: [PATCH v2] nvme-multipath: Early exit if no path is available

2021-02-01 Thread Hannes Reinecke

On 2/1/21 10:40 AM, Chao Leng wrote:



On 2021/2/1 16:57, Hannes Reinecke wrote:

On 2/1/21 9:47 AM, Chao Leng wrote:



On 2021/2/1 15:29, Hannes Reinecke wrote:[ .. ]

Urgh. Please, no. That is well impossible to debug.
Can you please open-code it to demonstrate where the difference to 
the current (and my fixed) versions is?

I'm still not clear where the problem is once we applied both patches.
For example assume the list has three path, and all path is not 
NVME_ANA_OPTIMIZED:

head->next = ns1;
ns1->next = ns2;
ns2->next = head;
old->next = ns2;


And this is where I have issues with.
Where does 'old' come from?
Clearly it was part of the list at one point; so what happened to it?

I explained this earlier.
In nvme_ns_remove, there is a hole between list_del_rcu and
nvme_mpath_clear_current_path. If head->current_path is the "old", and
the "old" is removing. The "old" is already removed from the list by
list_del_rcu, but head->current_path is not clear to NULL by
nvme_mpath_clear_current_path.
Find path is race with nvme_ns_remove, use the "old" pass to
nvme_round_robin_path to find path.


Ah. So this should be better:

@@ -202,10 +202,12 @@ static struct nvme_ns *__nvme_find_path(struct 
nvme_ns_head *head, int node)

 static struct nvme_ns *nvme_next_ns(struct nvme_ns_head *head,
struct nvme_ns *ns)
 {
-   ns = list_next_or_null_rcu(>list, >siblings, struct 
nvme_ns,

-   siblings);
-   if (ns)
-   return ns;
+   if (ns && !test_bit(NVME_NS_REMOVING, >flags)) {
+   ns = list_next_or_null_rcu(>list, >siblings,
+  struct nvme_ns, siblings);
+   if (ns)
+   return ns;
+   }
return list_first_or_null_rcu(>list, struct nvme_ns, 
siblings);

 }

The 'NVME_NS_REMOVING' bit is set before list_del_rcu() is called, so it 
should guard against the issue you mentioned.


Cheers,

Hannes
--
Dr. Hannes ReineckeKernel Storage Architect
h...@suse.de  +49 911 74053 688
SUSE Software Solutions GmbH, Maxfeldstr. 5, 90409 Nürnberg
HRB 36809 (AG Nürnberg), Geschäftsführer: Felix Imendörffer


Re: [PATCH v2] nvme-multipath: Early exit if no path is available

2021-02-01 Thread Hannes Reinecke

On 2/1/21 9:47 AM, Chao Leng wrote:



On 2021/2/1 15:29, Hannes Reinecke wrote:[ .. ]

Urgh. Please, no. That is well impossible to debug.
Can you please open-code it to demonstrate where the difference to the 
current (and my fixed) versions is?

I'm still not clear where the problem is once we applied both patches.
For example assume the list has three path, and all path is not 
NVME_ANA_OPTIMIZED:

head->next = ns1;
ns1->next = ns2;
ns2->next = head;
old->next = ns2;


And this is where I have issues with.
Where does 'old' come from?
Clearly it was part of the list at one point; so what happened to it?

Cheers,

Hannes
--
Dr. Hannes ReineckeKernel Storage Architect
h...@suse.de  +49 911 74053 688
SUSE Software Solutions GmbH, Maxfeldstr. 5, 90409 Nürnberg
HRB 36809 (AG Nürnberg), Geschäftsführer: Felix Imendörffer


Re: [PATCH v2] nvme-multipath: Early exit if no path is available

2021-01-31 Thread Hannes Reinecke

On 2/1/21 3:16 AM, Chao Leng wrote:



On 2021/1/29 17:20, Hannes Reinecke wrote:

On 1/29/21 9:46 AM, Chao Leng wrote:



On 2021/1/29 16:33, Hannes Reinecke wrote:

On 1/29/21 8:45 AM, Chao Leng wrote:



On 2021/1/29 15:06, Hannes Reinecke wrote:

On 1/29/21 4:07 AM, Chao Leng wrote:



On 2021/1/29 9:42, Sagi Grimberg wrote:


You can't see exactly where it dies but I followed the 
assembly to
nvme_round_robin_path(). Maybe it's not the initial 
nvme_next_ns(head,
old) which returns NULL but nvme_next_ns() is returning NULL 
eventually

(list_next_or_null_rcu()).

So there is other bug cause nvme_next_ns abormal.
I review the code about head->list and head->current_path, I 
find 2 bugs

may cause the bug:
First, I already send the patch. see:
https://lore.kernel.org/linux-nvme/20210128033351.22116-1-lengc...@huawei.com/ 


Second, in nvme_ns_remove, list_del_rcu is before
nvme_mpath_clear_current_path. This may cause "old" is deleted 
from the

"head", but still use "old". I'm not sure there's any other
consideration here, I will check it and try to fix it.


The reason why we first remove from head->list and only then clear
current_path is because the other way around there is no way
to guarantee that that the ns won't be assigned as current_path
again (because it is in head->list).

ok, I see.


nvme_ns_remove fences continue of deletion of the ns by 
synchronizing

the srcu such that for sure the current_path clearance is visible.

The list will be like this:
head->next = ns1;
ns1->next = head;
old->next = ns1;


Where does 'old' pointing to?


This may cause infinite loop in nvme_round_robin_path.
for (ns = nvme_next_ns(head, old);
 ns != old;
 ns = nvme_next_ns(head, ns))
The ns will always be ns1, and then infinite loop.


No. nvme_next_ns() will return NULL.

If there is just one path(the "old") and the "old" is deleted,
nvme_next_ns() will return NULL.
The list like this:
head->next = head;
old->next = head;
If there is two or more path and the "old" is deleted,
"for" will be infinite loop. because nvme_next_ns() will return
the path which in the list except the "old", check condition will
be true for ever.


But that will be caught by the statement above:

if (list_is_singular(>list))

no?

Two path just a sample example.
If there is just two path, will enter it, may cause no path but there is
actually one path. It is falsely assumed that the "old" must be not 
deleted.

If there is more than two path, will cause infinite loop.

So you mean we'll need something like this?

diff --git a/drivers/nvme/host/multipath.c 
b/drivers/nvme/host/multipath.c

index 71696819c228..8ffccaf9c19a 100644
--- a/drivers/nvme/host/multipath.c
+++ b/drivers/nvme/host/multipath.c
@@ -202,10 +202,12 @@ static struct nvme_ns *__nvme_find_path(struct 
nvme_ns_head *head, int node)

  static struct nvme_ns *nvme_next_ns(struct nvme_ns_head *head,
 struct nvme_ns *ns)
  {
-   ns = list_next_or_null_rcu(>list, >siblings, struct 
nvme_ns,

-   siblings);
-   if (ns)
-   return ns;
+   if (ns) {
+   ns = list_next_or_null_rcu(>list, >siblings,
+  struct nvme_ns, siblings);
+   if (ns)
+   return ns;
+   }

No, in the scenario, ns should not be NULL.


Why not? 'ns == NULL' is precisely the corner-case this is trying to fix...


May be we can do like this:

diff --git a/drivers/nvme/host/multipath.c b/drivers/nvme/host/multipath.c
index 282b7a4ea9a9..b895011a2cbd 100644
--- a/drivers/nvme/host/multipath.c
+++ b/drivers/nvme/host/multipath.c
@@ -199,30 +199,24 @@ static struct nvme_ns *__nvme_find_path(struct 
nvme_ns_head *head, int node)

     return found;
  }

-static struct nvme_ns *nvme_next_ns(struct nvme_ns_head *head,
-   struct nvme_ns *ns)
-{
-   ns = list_next_or_null_rcu(>list, >siblings, struct 
nvme_ns,

-   siblings);
-   if (ns)
-   return ns;
-   return list_first_or_null_rcu(>list, struct nvme_ns, 
siblings);

-}
+#define nvme_next_ns_condition(head, current, condition) \
+({ \
+   struct nvme_ns *__ptr = list_next_or_null_rcu(&(head)->list, \
+   &(current)->siblings, struct nvme_ns, siblings); \
+   __ptr ? __ptr : (condition) ? (condition) = false, \
+   list_first_or_null_rcu(&(head)->list, struct nvme_ns, \
+   siblings) : NULL; \
+})


Urgh. Please, no. That is well impossible to debug.
Can you please open-code it to demonstrate where the difference to the 
current (and my fixed) versions is?

I'm still not clear where the problem is once we applied both patches.

Cheers,

Hannes
--
Dr. Hannes ReineckeKernel Storage Architect
h...@suse.de  +49 911 74053 688
SUSE Software Solutions GmbH, Maxfeldstr. 5, 90409 Nürnberg
HRB 36809 (AG Nürnberg), Geschäftsführer: Felix Imendörffer


Re: [PATCH v2] nvme-multipath: Early exit if no path is available

2021-01-28 Thread Hannes Reinecke

On 1/29/21 4:07 AM, Chao Leng wrote:



On 2021/1/29 9:42, Sagi Grimberg wrote:



You can't see exactly where it dies but I followed the assembly to
nvme_round_robin_path(). Maybe it's not the initial nvme_next_ns(head,
old) which returns NULL but nvme_next_ns() is returning NULL eventually
(list_next_or_null_rcu()).

So there is other bug cause nvme_next_ns abormal.
I review the code about head->list and head->current_path, I find 2 bugs
may cause the bug:
First, I already send the patch. see:
https://lore.kernel.org/linux-nvme/20210128033351.22116-1-lengc...@huawei.com/ 


Second, in nvme_ns_remove, list_del_rcu is before
nvme_mpath_clear_current_path. This may cause "old" is deleted from the
"head", but still use "old". I'm not sure there's any other
consideration here, I will check it and try to fix it.


The reason why we first remove from head->list and only then clear
current_path is because the other way around there is no way
to guarantee that that the ns won't be assigned as current_path
again (because it is in head->list).

ok, I see.


nvme_ns_remove fences continue of deletion of the ns by synchronizing
the srcu such that for sure the current_path clearance is visible.

The list will be like this:
head->next = ns1;
ns1->next = head;
old->next = ns1;


Where does 'old' pointing to?


This may cause infinite loop in nvme_round_robin_path.
for (ns = nvme_next_ns(head, old);
 ns != old;
 ns = nvme_next_ns(head, ns))
The ns will always be ns1, and then infinite loop.


No. nvme_next_ns() will return NULL.

Cheers,

Hannes
--
Dr. Hannes ReineckeKernel Storage Architect
h...@suse.de  +49 911 74053 688
SUSE Software Solutions GmbH, Maxfeldstr. 5, 90409 Nürnberg
HRB 36809 (AG Nürnberg), Geschäftsführer: Felix Imendörffer


Re: [PATCH v2] nvme-multipath: Early exit if no path is available

2021-01-28 Thread Hannes Reinecke

On 1/28/21 10:18 AM, Chao Leng wrote:



On 2021/1/28 15:58, Daniel Wagner wrote:

On Thu, Jan 28, 2021 at 09:31:30AM +0800, Chao Leng wrote:

--- a/drivers/nvme/host/multipath.c
+++ b/drivers/nvme/host/multipath.c
@@ -221,7 +221,7 @@ static struct nvme_ns 
*nvme_round_robin_path(struct nvme_ns_head *head,

   }
   for (ns = nvme_next_ns(head, old);
- ns != old;
+ ns && ns != old;

nvme_round_robin_path just be called when !"old".
nvme_next_ns should not return NULL when !"old".
It seems unnecessary to add checking "ns".


The problem is when we enter nvme_round_robin_path() and there is no
path available. In this case the initialization ns = nvme_next_ns(head,
old) could return a NULL pointer."old" should not be NULL, so there is 
at least one path that is "old".

It is impossible to return NULL for nvme_next_ns(head, old).


No. list_next_or_null_rcu()/list_first_or_null_rcu() will return NULL 
when then end of the list is reached.


Cheers,

Hannes
--
Dr. Hannes ReineckeKernel Storage Architect
h...@suse.de  +49 911 74053 688
SUSE Software Solutions GmbH, Maxfeldstr. 5, 90409 Nürnberg
HRB 36809 (AG Nürnberg), Geschäftsführer: Felix Imendörffer


Re: [PATCH v2] nvme-multipath: Early exit if no path is available

2021-01-27 Thread Hannes Reinecke

On 1/27/21 11:30 AM, Daniel Wagner wrote:

nvme_round_robin_path() should test if the return ns pointer is
valid. nvme_next_ns() will return a NULL pointer if there is no path
left.

Fixes: 75c10e732724 ("nvme-multipath: round-robin I/O policy")
Cc: Hannes Reinecke 
Signed-off-by: Daniel Wagner 
---
v2:
   - moved NULL test into the if conditional statement
   - added Fixes tag

  drivers/nvme/host/multipath.c | 2 +-
  1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/drivers/nvme/host/multipath.c b/drivers/nvme/host/multipath.c
index 9ac762b28811..282b7a4ea9a9 100644
--- a/drivers/nvme/host/multipath.c
+++ b/drivers/nvme/host/multipath.c
@@ -221,7 +221,7 @@ static struct nvme_ns *nvme_round_robin_path(struct 
nvme_ns_head *head,
}
  
  	for (ns = nvme_next_ns(head, old);

-ns != old;
+ns && ns != old;
 ns = nvme_next_ns(head, ns)) {
if (nvme_path_is_disabled(ns))
continue;


Reviewed-by: Hannes Reinecke 

Cheers,

Hannes
--
Dr. Hannes ReineckeKernel Storage Architect
h...@suse.de  +49 911 74053 688
SUSE Software Solutions GmbH, Maxfeldstr. 5, 90409 Nürnberg
HRB 36809 (AG Nürnberg), Geschäftsführer: Felix Imendörffer


Re: [PATCH 2/3] blk-mq: Always complete remote completions requests in softirq

2021-01-25 Thread Hannes Reinecke

On 1/23/21 9:10 PM, Sebastian Andrzej Siewior wrote:

Controllers with multiple queues have their IRQ-handelers pinned to a
CPU. The core shouldn't need to complete the request on a remote CPU.

Remove this case and always raise the softirq to complete the request.

Signed-off-by: Sebastian Andrzej Siewior 
---
  block/blk-mq.c | 14 +-
  1 file changed, 1 insertion(+), 13 deletions(-)

diff --git a/block/blk-mq.c b/block/blk-mq.c
index f285a9123a8b0..90348ae518461 100644
--- a/block/blk-mq.c
+++ b/block/blk-mq.c
@@ -628,19 +628,7 @@ static void __blk_mq_complete_request_remote(void *data)
  {
struct request *rq = data;
  
-	/*

-* For most of single queue controllers, there is only one irq vector
-* for handling I/O completion, and the only irq's affinity is set
-* to all possible CPUs.  On most of ARCHs, this affinity means the irq
-* is handled on one specific CPU.
-*
-* So complete I/O requests in softirq context in case of single queue
-* devices to avoid degrading I/O performance due to irqsoff latency.
-*/
-   if (rq->q->nr_hw_queues == 1)
-   blk_mq_trigger_softirq(rq);
-   else
-   rq->q->mq_ops->complete(rq);
+   blk_mq_trigger_softirq(rq);
  }
  
  static inline bool blk_mq_complete_need_ipi(struct request *rq)



I don't get this.
This code is about _avoiding_ having to raise a softirq if the driver 
exports more than one hardware queue.

So where exactly does the remote CPU case come in here?

Cheers,

Hannes
--
Dr. Hannes ReineckeKernel Storage Architect
h...@suse.de  +49 911 74053 688
SUSE Software Solutions GmbH, Maxfeldstr. 5, 90409 Nürnberg
HRB 36809 (AG Nürnberg), Geschäftsführer: Felix Imendörffer


Re: [PATCH] nvme: hwmon: fix crash on device teardown

2021-01-11 Thread Hannes Reinecke

On 1/5/21 10:45 AM, Daniel Wagner wrote:

On Mon, Jan 04, 2021 at 06:06:10PM -0300, Enzo Matsumiya wrote:

@Daniel maybe try tweaking your tests to use a smaller controller
loss timeout (-l option)? I do this on my tests because the default
value kicks in about 30min after hot-removal -- i.e. you
have to actually wait for the timeout to expire to trigger the bug.


As far I can tell, the blktests test I am using will trigger the same
bug. The problem is that the lifetime of hwmon sysfs entry should be
aligned to the lifetime of the nvme sysfs entry. Currently, hwmon's
lifetime is bound to the lifetime of the ctl sysfs entry. When the nvme
entry goes away (and obviously also the underlying device), the hwmon
sysfs entry still references it.


Yeah, using the controller node for devm allocations is quite dodgy.
Does this one help?


diff --git a/drivers/nvme/host/hwmon.c b/drivers/nvme/host/hwmon.c
index 6fdd07fb3001..7260af028cf7 100644
--- a/drivers/nvme/host/hwmon.c
+++ b/drivers/nvme/host/hwmon.c
@@ -226,7 +226,7 @@ static const struct hwmon_chip_info

 int nvme_hwmon_init(struct nvme_ctrl *ctrl)
 {
-   struct device *dev = ctrl->dev;
+   struct device *dev = ctrl->device;
struct nvme_hwmon_data *data;
struct device *hwmon;
int err;
@@ -240,8 +240,7 @@ int nvme_hwmon_init(struct nvme_ctrl *ctrl)

err = nvme_hwmon_get_smart_log(data);
if (err) {
-   dev_warn(ctrl->device,
-   "Failed to read smart log (error %d)\n", err);
+   dev_warn(dev, "Failed to read smart log (error %d)\n", err);
devm_kfree(dev, data);
return err;
    }


Cheers,

Hannes
--
Dr. Hannes ReineckeKernel Storage Architect
h...@suse.de  +49 911 74053 688
SUSE Software Solutions GmbH, Maxfeldstr. 5, 90409 Nürnberg
HRB 36809 (AG Nürnberg), Geschäftsführer: Felix Imendörffer


Re: [PATCH] blk-mq-debugfs: Add decode for BLK_MQ_F_TAG_HCTX_SHARED

2021-01-10 Thread Hannes Reinecke

On 1/8/21 9:55 AM, John Garry wrote:

Showing the hctx flags for when BLK_MQ_F_TAG_HCTX_SHARED is set gives
something like:

root@debian:/home/john# more /sys/kernel/debug/block/sda/hctx0/flags
alloc_policy=FIFO SHOULD_MERGE|TAG_QUEUE_SHARED|3

Add the decoding for that flag.

Fixes: 32bc15afed04b ("blk-mq: Facilitate a shared sbitmap per tagset")
Signed-off-by: John Garry 

diff --git a/block/blk-mq-debugfs.c b/block/blk-mq-debugfs.c
index 3094542e12ae..ea4ab98e6b25 100644
--- a/block/blk-mq-debugfs.c
+++ b/block/blk-mq-debugfs.c
@@ -245,6 +245,7 @@ static const char *const hctx_flag_name[] = {
HCTX_FLAG_NAME(BLOCKING),
HCTX_FLAG_NAME(NO_SCHED),
HCTX_FLAG_NAME(STACKING),
+   HCTX_FLAG_NAME(TAG_HCTX_SHARED),
  };
  #undef HCTX_FLAG_NAME
  


Reviewed-by: Hannes Reinecke 

Cheers,

Hannes
--
Dr. Hannes ReineckeKernel Storage Architect
h...@suse.de  +49 911 74053 688
SUSE Software Solutions GmbH, Maxfeldstr. 5, 90409 Nürnberg
HRB 36809 (AG Nürnberg), Geschäftsführer: Felix Imendörffer


Re: DM's filesystem lookup in dm_get_dev_t() [was: Re: linux-next: manual merge of the device-mapper tree with Linus' tree]

2020-12-22 Thread Hannes Reinecke

On 12/22/20 3:53 PM, Mike Snitzer wrote:

[added linux-block and dm-devel, if someone replies to this email to
continue "proper discussion" _please_ at least drop sfr and linux-next
from Cc]

On Tue, Dec 22 2020 at  8:15am -0500,
Christoph Hellwig  wrote:


Mike, Hannes,

I think this patch is rather harmful.  Why does device mapper even
mix file system path with a dev_t and all the other weird forms
parsed by name_to_dev_t, which was supposed to be be for the early
init code where no file system is available.


OK, I'll need to revisit (unless someone beats me to it) because this
could've easily been a blind-spot for me when the dm-init code went in.
Any dm-init specific enabling interface shouldn't be used by more
traditional DM interfaces.  So Hannes' change might be treating symptom
rather than the core problem (which would be better treated by factoring
out dm-init requirements for a name_to_dev_t()-like interface?).

DM has supported passing maj:min and blockdev names on DM table lines
forever... so we'll need to be very specific about where/why things
regressed.



Ok. The problem from my perspective is that device-mapper needs to
a) ensure that the arbitrary string passed in with the table definition 
refers to a valid block device

and
b) the block device can be opened with O_EXCL, so that device-mapper can 
then use it.


Originally (ie prior to commit 644bda6f3460) dm_get_device() just 
converted the string to a 'dev_t' representation, and then the block 
device itself was checked and opened in dm_get_table_device().
'lookup_bdev' was just being used to convert the path if the string was 
not in the canonical major:minor format, as then it was assumed that it 
referred to a block device node, and then lookup_bdev kinda makes sense.


However, lookup_bdev() now always recurses into the filesystem, causing 
multipath to stall in an all-paths-down scenario.


So, the real issue is the table definiton; as it also accepts a device 
to be specified by the block device _node_ name, we need to have a way 
of converting that into a dev_t.


If lookup_bdev() is the wrong interface for that, by all means, please, 
do tell me. I'd be happy to draft up a patch.


Alternatively, if Mike says that only major:minor is the valid format 
for a table definition we can kill that code completely. But clearly _I_ 
cannot make the call here.


Cheers,

Hannes
--
Dr. Hannes ReineckeKernel Storage Architect
h...@suse.de  +49 911 74053 688
SUSE Software Solutions GmbH, Maxfeldstr. 5, 90409 Nürnberg
HRB 36809 (AG Nürnberg), Geschäftsführer: Felix Imendörffer


Re: [LSFMMBPF 2021] A status update

2020-12-15 Thread Hannes Reinecke

On 12/15/20 1:23 PM, Michal Hocko wrote:

On Sat 12-12-20 17:29:57, Matthew Wilcox wrote:

On Fri, Dec 04, 2020 at 10:48:53AM -0500, Josef Bacik wrote:

We on the program committee hope everybody has been able to stay safe and
healthy during this challenging time, and look forward to being able to see
all of you in person again when it is safe.

The current plans for LSFMMBPF 2021 are to schedule an in person conference
in H2 (after June) of 2021.  The tentative plan is to use the same hotel
that we had planned to use for 2020, as we still have contracts with them.
However clearly that is not set in stone.  The Linux Foundation has done a
wonderful job of working with us to formulate a plan and figure out the
logistics that will work the best for everybody, I really can't thank them
enough for their help.


Thank you all for doing your best in the face of this disruption.  I
really appreciate all the work you're putting in, and I can't wait to
see you all again in person.

I hosted a Zoom call yesterday on the topic of Page Folios, and uploaded
the video.


Thanks for organizing this. I couldn't attent directly but I have
watched the video. I think this was a useful meeting.


There was interest expressed in the call on doing a follow-up
call on the topic of GUP (get_user_pages and friends).  It would probably
also be good to have meetings on other topics.


I hope I will have time to join this one.


I don't want this to be seen in any way as taking away from LSFMMBPF.
I see Zoom calls as an interim solution to not having face-to-face
meetings.


Agreed!


I'd like to solicit feedback from this group on:

  - Time of day.  There is no good time that suits everyone around
the world.  With developers in basically every inhabited time zone, the
call will definitely take place in the middle of somebody's night, and
during somebody else's normal family time.  Publishing the recordings
helps ameliorate some of this, but I feel we should shift the time
around.  Having it at the same time of day helps people fit it into
their schedule of other meetings (and meals), but I think the benefits
of allowing more people to participate live outweighs the costs.


Hard question without any good answer. You can rotate preferred timezone
which should spread the suffering.

What I found useful is to have a fixed schedule for the actual talk, but 
keep the recording online for an extended time (say 24h). And require 
the presenters to be available in a break-out room for any questions for 
an extended time, too.
That way any participant can choose which time would suit him best to 
watch the presentation, _and_ being reasonably sure that he can fire up 
questions later on.



  - Schedule.  Friday's probably a bad day to have it, as it ends up
being Saturday for some people.  It can move around the week too.
Also, probably wise to not have it over Christmas as most developers
have that period as family time.


Yes, Friday tends to be not great. I think mid week should work better
as the overalap


  - Topics.  I'm sure there's no shortage of things to discuss!  I'm
happy to organise meetings for people even on topics I have no direct
interest in.


Thanks for organizing this. I am pretty sure poeple will land on topics
either in the call or over email.


And most urgently, when should we have the GUP meeting? On the call,
I suggested Friday the 8th of January, but I'm happy to set something
up for next week if we'd like to talk more urgently.


I am unlikely to be able to join before the end of year so if you ask
me.

Thanks again and fingers crossed we can actually have a face to face
meeting sometimes during next year.

I'd side with Michal here; having something this year will be 
challenging, as quite some ppl (myself included) will be on holiday.
Rather move it to start of next year; mid-February would be a good time, 
giving enough room to organize etc.


Cheers,

Hannes
--
Dr. Hannes ReineckeKernel Storage Architect
h...@suse.de  +49 911 74053 688
SUSE Software Solutions GmbH, Maxfeldstr. 5, 90409 Nürnberg
HRB 36809 (AG Nürnberg), Geschäftsführer: Felix Imendörffer


Re: [PATCH 0/3] block: blk_interposer - Block Layer Interposer

2020-12-14 Thread Hannes Reinecke

On 12/15/20 7:51 AM, Bob Liu wrote:

Hi Folks,

On 12/12/20 12:56 AM, Hannes Reinecke wrote:

On 12/11/20 5:33 PM, Jens Axboe wrote:

On 12/11/20 9:30 AM, Mike Snitzer wrote:

While I still think there needs to be a proper _upstream_ consumer of
blk_interposer as a condition of it going in.. I'll let others make the
call.


That's an unequivocal rule.


As such, I'll defer to Jens, Christoph and others on whether your
minimalist blk_interposer hook is acceptable in the near-term.


I don't think so, we don't do short term bandaids just to plan on
ripping that out when the real functionality is there. IMHO, the dm
approach is the way to go - it provides exactly the functionality that
is needed in an appropriate way, instead of hacking some "interposer"
into the core block layer.


Which is my plan, too.

I'll be working with the Veeam folks to present a joint patchset (including the 
DM bits) for the next round.



Besides the dm approach, do you think Veeam's original requirement is a good
use case of "block/bpf: add eBPF based block layer IO filtering"?
https://lwn.net/ml/bpf/20200812163305.545447-1-leah.ruman...@gmail.com/


That would actually a really cool use-case.
You could also consider a XDP-like functionality for eBPF, to move 
individual requests from one queue to the other; DM on steroids :-)


Should I try to include that patchset?

Cheers,

Hannes
--
Dr. Hannes ReineckeKernel Storage Architect
h...@suse.de  +49 911 74053 688
SUSE Software Solutions GmbH, Maxfeldstr. 5, 90409 Nürnberg
HRB 36809 (AG Nürnberg), Geschäftsführer: Felix Imendörffer


Re: [PATCH 0/3] block: blk_interposer - Block Layer Interposer

2020-12-11 Thread Hannes Reinecke

On 12/11/20 6:04 PM, Jens Axboe wrote:

On 12/11/20 9:56 AM, Hannes Reinecke wrote:

On 12/11/20 5:33 PM, Jens Axboe wrote:

On 12/11/20 9:30 AM, Mike Snitzer wrote:

While I still think there needs to be a proper _upstream_ consumer of
blk_interposer as a condition of it going in.. I'll let others make the
call.


That's an unequivocal rule.


As such, I'll defer to Jens, Christoph and others on whether your
minimalist blk_interposer hook is acceptable in the near-term.


I don't think so, we don't do short term bandaids just to plan on
ripping that out when the real functionality is there. IMHO, the dm
approach is the way to go - it provides exactly the functionality that
is needed in an appropriate way, instead of hacking some "interposer"
into the core block layer.


Which is my plan, too.

I'll be working with the Veeam folks to present a joint patchset
(including the DM bits) for the next round.


Just to be clear, core block additions for something that dm will
consume is obviously fine. Adding this as block layer functionality than
then exposes an application API for setting it up, tearing down, etc -
that is definitely NOT


That was never my intention.
Any consumers of this thing would need to be in-kernel.
If that was your concern.

Cheers,

Hannes
--
Dr. Hannes ReineckeKernel Storage Architect
h...@suse.de  +49 911 74053 688
SUSE Software Solutions GmbH, Maxfeldstr. 5, 90409 Nürnberg
HRB 36809 (AG Nürnberg), Geschäftsführer: Felix Imendörffer


Re: [PATCH 0/3] block: blk_interposer - Block Layer Interposer

2020-12-11 Thread Hannes Reinecke

On 12/11/20 5:33 PM, Jens Axboe wrote:

On 12/11/20 9:30 AM, Mike Snitzer wrote:

While I still think there needs to be a proper _upstream_ consumer of
blk_interposer as a condition of it going in.. I'll let others make the
call.


That's an unequivocal rule.


As such, I'll defer to Jens, Christoph and others on whether your
minimalist blk_interposer hook is acceptable in the near-term.


I don't think so, we don't do short term bandaids just to plan on
ripping that out when the real functionality is there. IMHO, the dm
approach is the way to go - it provides exactly the functionality that
is needed in an appropriate way, instead of hacking some "interposer"
into the core block layer.


Which is my plan, too.

I'll be working with the Veeam folks to present a joint patchset 
(including the DM bits) for the next round.


Cheers,

Hannes
--
Dr. Hannes ReineckeKernel Storage Architect
h...@suse.de  +49 911 74053 688
SUSE Software Solutions GmbH, Maxfeldstr. 5, 90409 Nürnberg
HRB 36809 (AG Nürnberg), Geschäftsführer: Felix Imendörffer


Re: [PATCH] nvme: hwmon: fix crash on device teardown

2020-12-11 Thread Hannes Reinecke

On 12/9/20 10:32 PM, Enzo Matsumiya wrote:

Fix a possible NULL pointer dereference when trying to read
hwmon sysfs entries associated to NVMe-oF devices that were
hot-removed or disconnected.

Unregister the NVMe hwmon device upon controller teardown
(nvme_stop_ctrl()).

Signed-off-by: Enzo Matsumiya 
---
  drivers/nvme/host/core.c  | 1 +
  drivers/nvme/host/hwmon.c | 8 
  drivers/nvme/host/nvme.h  | 2 ++
  3 files changed, 11 insertions(+)

diff --git a/drivers/nvme/host/core.c b/drivers/nvme/host/core.c
index 9a270e49df17..becc80a0c3b8 100644
--- a/drivers/nvme/host/core.c
+++ b/drivers/nvme/host/core.c
@@ -4344,6 +4344,7 @@ EXPORT_SYMBOL_GPL(nvme_complete_async_event);
  
  void nvme_stop_ctrl(struct nvme_ctrl *ctrl)

  {
+   nvme_hwmon_exit(ctrl);
nvme_mpath_stop(ctrl);
nvme_stop_keep_alive(ctrl);
flush_work(>async_event_work);
diff --git a/drivers/nvme/host/hwmon.c b/drivers/nvme/host/hwmon.c
index 552dbc04567b..7f62cca4c577 100644
--- a/drivers/nvme/host/hwmon.c
+++ b/drivers/nvme/host/hwmon.c
@@ -71,6 +71,9 @@ static int nvme_hwmon_read(struct device *dev, enum 
hwmon_sensor_types type,
int temp;
int err;
  
+	if (data->ctrl->state != NVME_CTRL_LIVE)

+   return -EAGAIN;
+
/*
 * First handle attributes which don't require us to read
 * the smart log.
@@ -253,3 +256,8 @@ int nvme_hwmon_init(struct nvme_ctrl *ctrl)
  
  	return 0;

  }
+
+void nvme_hwmon_exit(struct nvme_ctrl *ctrl)
+{
+   hwmon_device_unregister(ctrl->dev);
+}
diff --git a/drivers/nvme/host/nvme.h b/drivers/nvme/host/nvme.h
index 567f7ad18a91..621e9b1575f6 100644
--- a/drivers/nvme/host/nvme.h
+++ b/drivers/nvme/host/nvme.h
@@ -807,11 +807,13 @@ static inline struct nvme_ns *nvme_get_ns_from_dev(struct 
device *dev)
  
  #ifdef CONFIG_NVME_HWMON

  int nvme_hwmon_init(struct nvme_ctrl *ctrl);
+void nvme_hwmon_exit(struct nvme_ctrl *ctrl);
  #else
  static inline int nvme_hwmon_init(struct nvme_ctrl *ctrl)
  {
return 0;
  }
+static inline void nvme_hwmon_exit(struct nvme_ctrl *ctrl) { }
  #endif
  
  u32 nvme_command_effects(struct nvme_ctrl *ctrl, struct nvme_ns *ns,



Something's fishy here.
The hwmon attributes should have been created only once for the lifetime 
of the controller (creation is protected by '!initialized').
And the hwmon lifetime should've been controlled by the lifetime of the 
controller (which to my knowledge was the idea behind the devm_* thingies).

So why do we have to deallocate the hwmon attributes?
And why on reset? And who's re-creating them after reset, seeing that 
'initialized' should be true?

Hmm?

Cheers,

Hannes
--
Dr. Hannes ReineckeKernel Storage Architect
h...@suse.de  +49 911 74053 688
SUSE Software Solutions GmbH, Maxfeldstr. 5, 90409 Nürnberg
HRB 36809 (AG Nürnberg), Geschäftsführer: Felix Imendörffer


Re: [RFC PATCH v2 0/2] add simple copy support

2020-12-07 Thread Hannes Reinecke

On 12/7/20 11:12 PM, Douglas Gilbert wrote:

On 2020-12-07 9:56 a.m., Hannes Reinecke wrote:

On 12/7/20 3:11 PM, Christoph Hellwig wrote:

So, I'm really worried about:

  a) a good use case.  GC in f2fs or btrfs seem like good use cases, as
 does accelating dm-kcopyd.  I agree with Damien that lifting 
dm-kcopyd
 to common code would also be really nice.  I'm not 100% sure it 
should

 be a requirement, but it sure would be nice to have
 I don't think just adding an ioctl is enough of a use case for 
complex

 kernel infrastructure.
  b) We had a bunch of different attempts at SCSI XCOPY support form 
IIRC

 Martin, Bart and Mikulas.  I think we need to pull them into this
 discussion, and make sure whatever we do covers the SCSI needs.

And we shouldn't forget that the main issue which killed all previous 
implementations was a missing QoS guarantee.
It's nice to have simply copy, but if the implementation is _slower_ 
than doing it by hand from the OS there is very little point in even 
attempting to do so.
I can't see any provisions for that in the TPAR, leading me to the 
assumption that NVMe simple copy will suffer from the same issue.


So if we can't address this I guess this attempt will fail, too.


I have been doing quite a lot of work and testing in my sg driver rewrite
in the copy and compare area. The baselines for performance are dd and
io_uring-cp (in liburing). There are lots of ways to improve on them. Here
are some:
    - the user data need never pass through the user space (could
  mmap it out during the READ if there is a good reason). Only the
  metadata (e.g. NVMe or SCSI commands) needs to come from the user
  space and errors, if any, reported back to the user space.
    - break a large copy (or compare) into segments, with each segment
  a "comfortable" size for the OS to handle, say 256 KB
    - there is one constraint: the READ in each segment must complete
  before its paired WRITE can commence
  - extra constraint for some zoned disks: WRITEs must be
    issued in order (assuming they are applied in that order, if
    not, need to wait until each WRITE completes)
    - arrange for READ WRITE pair in each segment to share the same bio
    - have multiple slots each holding a segment (i.e. a bio and
  metadata to process a READ-WRITE pair)
    - re-use each slot's bio for the following READ-WRITE pair
    - issue the READs in each slot asynchronously and do an interleaved
  (io)poll for completion. Then issue the paired WRITE
  asynchronously
    - the above "slot" algorithm runs in one thread, so there can be
  multiple threads doing the same algorithm. Segment manager needs
  to be locked (or use an atomics) so that each segment (identified
  by its starting LBAs) is issued once and only once when the
  next thread wants a segment to copy

Running multiple threads gives diminishing or even worsening returns.
Runtime metrics on lock contention and storage bus capacity may help
choosing the number of threads. A simpler approach might be add more
threads until the combined throughput increase is less than 10% say.


The 'compare' that I mention is based on the SCSI VERIFY(BYTCHK=1) command
(or NVMe NVM Compare command). Using dd logic, a disk to disk compare can
be implemented with not much more work than changing the WRITE to a VERIFY
command. This is a different approach to the Linux cmp utility which
READs in both sides and does a memcmp() type operation. Using ramdisks
(from the scsi_debug driver) the compare operation (max ~ 10 GB/s) was
actually faster than the copy (max ~ 7 GB/s). I put this down to WRITE
operations taking a write lock over the store while the VERIFY only
needs a read lock so many VERIFY operations can co-exist on the same
store. Unfortunately on real SAS and NVMe SSDs that I tested the
performance of the VERIFY and NVM Compare commands is underwhelming.
For comparison, using scsi_debug ramdisks, dd copy throughput was
< 1 GB/s and io_uring-cp was around 2-3 GB/s. The system was Ryzen
3600 based.


Which is precisely my concern.
Simple copy might be efficient for one particular implementation, but it 
might be completely off the board for others.
But both will be claiming to support it, and us having no idea whether 
choosing simple copy will speed up matters or not.
Without having a programmatic way to figure out the speed of the 
implementation we have to detect the performance ourselves, like the 
benchmarking loop RAID5 does.
I was hoping to avoid that, and just ask the device/controller; but that 
turned out to be in vain.


Cheers,

Hannes
--
Dr. Hannes ReineckeKernel Storage Architect
h...@suse.de  +49 911 74053 688
SUSE Software Solutions GmbH, Maxfeldstr. 5, 90409 Nürnberg
HRB 36809 (AG Nürnberg), Geschäftsführer: Felix Imendörffer


Re: [RFC PATCH v2 0/2] add simple copy support

2020-12-07 Thread Hannes Reinecke

On 12/7/20 3:11 PM, Christoph Hellwig wrote:

So, I'm really worried about:

  a) a good use case.  GC in f2fs or btrfs seem like good use cases, as
 does accelating dm-kcopyd.  I agree with Damien that lifting dm-kcopyd
 to common code would also be really nice.  I'm not 100% sure it should
 be a requirement, but it sure would be nice to have
 I don't think just adding an ioctl is enough of a use case for complex
 kernel infrastructure.
  b) We had a bunch of different attempts at SCSI XCOPY support form IIRC
 Martin, Bart and Mikulas.  I think we need to pull them into this
 discussion, and make sure whatever we do covers the SCSI needs.

And we shouldn't forget that the main issue which killed all previous 
implementations was a missing QoS guarantee.
It's nice to have simply copy, but if the implementation is _slower_ 
than doing it by hand from the OS there is very little point in even 
attempting to do so.
I can't see any provisions for that in the TPAR, leading me to the 
assumption that NVMe simple copy will suffer from the same issue.


So if we can't address this I guess this attempt will fail, too.

Cheers,

Hannes
--
Dr. Hannes ReineckeKernel Storage Architect
h...@suse.de  +49 911 74053 688
SUSE Software Solutions GmbH, Maxfeldstr. 5, 90409 Nürnberg
HRB 36809 (AG Nürnberg), Geschäftsführer: Felix Imendörffer


Re: [PATCH v2 01/17] ibmvfc: add vhost fields and defaults for MQ enablement

2020-12-07 Thread Hannes Reinecke

On 12/4/20 3:26 PM, Brian King wrote:

On 12/2/20 11:27 AM, Tyrel Datwyler wrote:

On 12/2/20 7:14 AM, Brian King wrote:

On 12/1/20 6:53 PM, Tyrel Datwyler wrote:

Introduce several new vhost fields for managing MQ state of the adapter
as well as initial defaults for MQ enablement.

Signed-off-by: Tyrel Datwyler 
---
  drivers/scsi/ibmvscsi/ibmvfc.c |  9 -
  drivers/scsi/ibmvscsi/ibmvfc.h | 13 +++--
  2 files changed, 19 insertions(+), 3 deletions(-)

diff --git a/drivers/scsi/ibmvscsi/ibmvfc.c b/drivers/scsi/ibmvscsi/ibmvfc.c
index 42e4d35e0d35..f1d677a7423d 100644
--- a/drivers/scsi/ibmvscsi/ibmvfc.c
+++ b/drivers/scsi/ibmvscsi/ibmvfc.c
@@ -5161,12 +5161,13 @@ static int ibmvfc_probe(struct vio_dev *vdev, const 
struct vio_device_id *id)
}
  
  	shost->transportt = ibmvfc_transport_template;

-   shost->can_queue = max_requests;
+   shost->can_queue = (max_requests / IBMVFC_SCSI_HW_QUEUES);


This doesn't look right. can_queue is the SCSI host queue depth, not the MQ 
queue depth.


Our max_requests is the total number commands allowed across all queues. From
what I understand is can_queue is the total number of commands in flight allowed
for each hw queue.

 /*
  * In scsi-mq mode, the number of hardware queues supported by the LLD.
  *
  * Note: it is assumed that each hardware queue has a queue depth of
  * can_queue. In other words, the total queue depth per host
  * is nr_hw_queues * can_queue. However, for when host_tagset is set,
  * the total queue depth is can_queue.
  */

We currently don't use the host wide shared tagset.


Ok. I missed that bit... In that case, since we allocate by default only 100
event structs. If we slice that across IBMVFC_SCSI_HW_QUEUES (16) queues, then
we end up with only about 6 commands that can be outstanding per queue,
which is going to really hurt performance... I'd suggest bumping up
IBMVFC_MAX_REQUESTS_DEFAULT from 100 to 1000 as a starting point.


Before doing that I'd rather use the host-wide shared tagset.
Increasing the number of requests will increase the memory footprint of 
the driver (as each request will be statically allocated).


Cheers,

Hannes
--
Dr. Hannes ReineckeKernel Storage Architect
h...@suse.de  +49 911 74053 688
SUSE Software Solutions GmbH, Maxfeldstr. 5, 90409 Nürnberg
HRB 36809 (AG Nürnberg), Geschäftsführer: Felix Imendörffer


Re: [PATCH 00/13] ibmvfc: initial MQ development

2020-12-02 Thread Hannes Reinecke

On 11/26/20 2:48 AM, Tyrel Datwyler wrote:

Recent updates in pHyp Firmware and VIOS releases provide new infrastructure
towards enabling Subordinate Command Response Queues (Sub-CRQs) such that each
Sub-CRQ is a channel backed by an actual hardware queue in the FC stack on the
partner VIOS. Sub-CRQs are registered with the firmware via hypercalls and then
negotiated with the VIOS via new Management Datagrams (MADs) for channel setup.

This initial implementation adds the necessary Sub-CRQ framework and implements
the new MADs for negotiating and assigning a set of Sub-CRQs to associated VIOS
HW backed channels. The event pool and locking still leverages the legacy single
queue implementation, and as such lock contention is problematic when increasing
the number of queues. However, this initial work demonstrates a 1.2x factor
increase in IOPs when configured with two HW queues despite lock contention.


Why do you still hold the hold lock during submission?
An initial check on the submission code path didn't reveal anything 
obvious, so it _should_ be possible to drop the host lock there.
Or at least move it into the submission function itself to avoid lock 
contention. Hmm?


Cheers,

Hannes
--
Dr. Hannes ReineckeKernel Storage Architect
h...@suse.de  +49 911 74053 688
SUSE Software Solutions GmbH, Maxfeldstr. 5, 90409 Nürnberg
HRB 36809 (AG Nürnberg), Geschäftsführer: Felix Imendörffer


Re: [PATCH V1] block: Fix use-after-free while iterating over requests

2020-11-29 Thread Hannes Reinecke

On 11/26/20 5:49 PM, John Garry wrote:

On 26/11/2020 16:27, Bart Van Assche wrote:

On 11/26/20 7:02 AM, Pradeep P V K wrote:

Observes below crash while accessing (use-after-free) request queue
member of struct request.

191.784789:   <2> Unable to handle kernel paging request at virtual
address ff81429a4440
...
191.786174:   <2> CPU: 3 PID: 213 Comm: kworker/3:1H Tainted: G S
O  5.4.61-qgki-debug-ge45de39 #1
...
191.786226:   <2> Workqueue: kblockd blk_mq_timeout_work
191.786242:   <2> pstate: 20c5 (nzCv daif +PAN +UAO)
191.786261:   <2> pc : bt_for_each+0x114/0x1a4
191.786274:   <2> lr : bt_for_each+0xe0/0x1a4
...
191.786494:   <2> Call trace:
191.786507:   <2>  bt_for_each+0x114/0x1a4
191.786519:   <2>  blk_mq_queue_tag_busy_iter+0x60/0xd4
191.786532:   <2>  blk_mq_timeout_work+0x54/0xe8
191.786549:   <2>  process_one_work+0x2cc/0x568
191.786562:   <2>  worker_thread+0x28c/0x518
191.786577:   <2>  kthread+0x160/0x170
191.786594:   <2>  ret_from_fork+0x10/0x18
191.786615:   <2> Code: 0b080148 f9404929 f8685921 b4fffe01 (f9400028)
191.786630:   <2> ---[ end trace 0f1f51d79ab3f955 ]---
191.786643:   <2> Kernel panic - not syncing: Fatal exception

Fix this by updating the freed request with NULL.
This could avoid accessing the already free request from other
contexts while iterating over the requests.

Signed-off-by: Pradeep P V K 
---
  block/blk-mq.c | 1 +
  block/blk-mq.h | 1 +
  2 files changed, 2 insertions(+)

diff --git a/block/blk-mq.c b/block/blk-mq.c
index 55bcee5..9996cb1 100644
--- a/block/blk-mq.c
+++ b/block/blk-mq.c
@@ -492,6 +492,7 @@ static void __blk_mq_free_request(struct request 
*rq)

  blk_crypto_free_request(rq);
  blk_pm_mark_last_busy(rq);
+    hctx->tags->rqs[rq->tag] = NULL;
  rq->mq_hctx = NULL;
  if (rq->tag != BLK_MQ_NO_TAG)
  blk_mq_put_tag(hctx->tags, ctx, rq->tag);
diff --git a/block/blk-mq.h b/block/blk-mq.h
index a52703c..8747bf1 100644
--- a/block/blk-mq.h
+++ b/block/blk-mq.h
@@ -224,6 +224,7 @@ static inline int __blk_mq_active_requests(struct 
blk_mq_hw_ctx *hctx)

  static inline void __blk_mq_put_driver_tag(struct blk_mq_hw_ctx *hctx,
 struct request *rq)
  {
+    hctx->tags->rqs[rq->tag] = NULL;
  blk_mq_put_tag(hctx->tags, rq->mq_ctx, rq->tag);
  rq->tag = BLK_MQ_NO_TAG;


Is this perhaps a block driver bug instead of a block layer core bug? If
this would be a block layer core bug, it would have been reported before.


Isn't this the same issue which as been reported many times:

https://lore.kernel.org/linux-block/20200820180335.3109216-1-ming@redhat.com/ 



https://lore.kernel.org/linux-block/8376443a-ec1b-0cef-8244-ed584b96f...@huawei.com/ 



But I never saw a crash, just kasan report.

And if that above were a concern, I would have thought one would need to 
use a WRITE_ONCE() here; otherwise we might have a race condition where 
other CPUs still see the old value, no?


Cheers,

Hannes
--
Dr. Hannes ReineckeKernel Storage Architect
h...@suse.de  +49 911 74053 688
SUSE Software Solutions GmbH, Maxfeldstr. 5, 90409 Nürnberg
HRB 36809 (AG Nürnberg), Geschäftsführer: Felix Imendörffer


Re: [PATCH 0/2] block layer filter and block device snapshot module

2020-10-23 Thread Hannes Reinecke

On 10/23/20 11:13 AM, h...@infradead.org wrote:

On Thu, Oct 22, 2020 at 01:54:16PM -0400, Mike Snitzer wrote:

On Thu, Oct 22, 2020 at 11:14 AM Darrick J. Wong

Stupid question: Why don't you change the block layer to make it
possible to insert device mapper devices after the blockdev has been set
up?


Not a stupid question.  Definitely something that us DM developers
have wanted to do for a while.  Devil is in the details but it is the
right way forward.



Yes, I think that is the right thing to do.  And I don't think it should
be all that hard.  All we'd need in the I/O path is something like the
pseudo-patch below, which will allow the interposer driver to resubmit
bios using submit_bio_noacct as long as the driver sets BIO_INTERPOSED.

diff --git a/block/blk-core.c b/block/blk-core.c
index ac00d2fa4eb48d..3f6f1eb565e0a8 100644
--- a/block/blk-core.c
+++ b/block/blk-core.c
@@ -1051,6 +1051,9 @@ blk_qc_t submit_bio_noacct(struct bio *bio)
return BLK_QC_T_NONE;
}
  
+	if (blk_has_interposer(bio->bi_disk) &&

+   !(bio->bi_flags & BIO_INTERPOSED))
+   return __submit_bio_interposed(bio);
if (!bio->bi_disk->fops->submit_bio)
return __submit_bio_noacct_mq(bio);
return __submit_bio_noacct(bio);

My thoughts went more into the direction of hooking into ->submit_bio, 
seeing that it's a NULL pointer for most (all?) block drivers.


But sure, I'll check how the interposer approach would turn out.

Cheers,

Hannes
--
Dr. Hannes ReineckeKernel Storage Architect
h...@suse.de  +49 911 74053 688
SUSE Software Solutions GmbH, Maxfeldstr. 5, 90409 Nürnberg
HRB 36809 (AG Nürnberg), Geschäftsführer: Felix Imendörffer


Re: [RFC] synchronous readpage for buffer_heads

2020-10-23 Thread Hannes Reinecke

On 10/22/20 5:22 PM, Matthew Wilcox wrote:

I'm working on making readpage synchronous so that it can actually return
errors instead of futilely setting PageError.  Something that's common
between most of the block based filesystems is the need to submit N
I/Os and wait for them to all complete (some filesystems don't support
sub-page block size, so they don't have this problem).

I ended up coming up with a fairly nice data structure which I've called
the blk_completion.  It waits for 'n' events to happen, then wakes the
task that cares, unless the task has got bored and wandered off to do
something else.

block_read_full_page() then uses this data structure to submit 'n' buffer
heads and wait for them to all complete.  The fscrypt code doesn't work
in this scheme, so I bailed on that for now.  I have ideas for fixing it,
but they can wait.

Obviously this all needs documentation, but I'd like feedback on the
idea before I do that.  I have given it some light testing, but there
aren't too many filesystems left that use block_read_full_page() so I
haven't done a proper xfstests run.

Hmm. You are aware, of course, that hch et al are working on replacing 
bhs with iomap, right?
So wouldn't it be more useful to concentrate on the iomap code, and 
ensure that _that_ is working correctly?


Cheers,

Hannes
--
Dr. Hannes ReineckeKernel Storage Architect
h...@suse.de  +49 911 74053 688
SUSE Software Solutions GmbH, Maxfeldstr. 5, 90409 Nürnberg
HRB 36809 (AG Nürnberg), Geschäftsführer: Felix Imendörffer


Re: [PATCH 0/2] block layer filter and block device snapshot module

2020-10-21 Thread Hannes Reinecke

On 10/21/20 4:10 PM, Sergei Shtepa wrote:

The 10/21/2020 16:31, Hannes Reinecke wrote:

I do understand where you are coming from, but then we already have a
dm-snap which does exactly what you want to achieve.
Of course, that would require a reconfiguration of the storage stack on
the machine, which is not always possible (or desired).


Yes, reconfiguring the storage stack on a machine is almost impossible.



What I _could_ imagine would be a 'dm-intercept' thingie, which
redirects the current submit_bio() function for any block device, and
re-routes that to a linear device-mapper device pointing back to the
original block device.

That way you could attach it to basically any block device, _and_ can
use the existing device-mapper functionality to do fancy stuff once the
submit_io() callback has been re-routed.

And it also would help in other scenarios, too; with such a
functionality we could seamlessly clone devices without having to move
the whole setup to device-mapper first.


Hm...
Did I understand correctly that the filter itself can be left approximately
as it is, but the blk-snap module can be replaced with 'dm-intercept',
which would use the re-route mechanism from the dm?
I think I may be able to implement it, if you describe your idea in more
detail.


Actually, once we have an dm-intercept, why do you need the block-layer 
filter at all?
From you initial description the block-layer filter was implemented 
such that blk-snap could work; but if we have dm-intercept (and with it 
the ability to use device-mapper functionality even for normal block 
devices) there wouldn't be any need for the block-layer filter, no?


Cheers,

Hannes
--
Dr. Hannes ReineckeKernel Storage Architect
h...@suse.de  +49 911 74053 688
SUSE Software Solutions GmbH, Maxfeldstr. 5, 90409 Nürnberg
HRB 36809 (AG Nürnberg), Geschäftsführer: Felix Imendörffer


Re: [PATCH 0/2] block layer filter and block device snapshot module

2020-10-21 Thread Hannes Reinecke
/linux/blk-filter.h

--
2.20.1

I do understand where you are coming from, but then we already have a 
dm-snap which does exactly what you want to achieve.
Of course, that would require a reconfiguration of the storage stack on 
the machine, which is not always possible (or desired).


What I _could_ imagine would be a 'dm-intercept' thingie, which 
redirects the current submit_bio() function for any block device, and 
re-routes that to a linear device-mapper device pointing back to the 
original block device.


That way you could attach it to basically any block device, _and_ can 
use the existing device-mapper functionality to do fancy stuff once the 
submit_io() callback has been re-routed.


And it also would help in other scenarios, too; with such a 
functionality we could seamlessly clone devices without having to move 
the whole setup to device-mapper first.


Cheers,

Hannes
--
Dr. Hannes ReineckeKernel Storage Architect
h...@suse.de  +49 911 74053 688
SUSE Software Solutions Germany GmbH, Maxfeldstr. 5, 90409 Nürnberg
HRB 36809 (AG Nürnberg), GF: Felix Imendörffer


Re: [PATCH] block: switch to pr_warn() in __device_add_disk()

2020-10-11 Thread Hannes Reinecke

On 10/11/20 3:03 PM, Rustam Kovhaev wrote:

syzbot triggered a warning while fuzzing with failslab fault injection
enabled
let's convert WARN_ON() to pr_warn()

Reported-and-tested-by: syzbot+f41893bb8c45cd18c...@syzkaller.appspotmail.com
Link: https://syzkaller.appspot.com/bug?extid=f41893bb8c45cd18cf08
Signed-off-by: Rustam Kovhaev 
---
  block/genhd.c | 3 ++-
  1 file changed, 2 insertions(+), 1 deletion(-)

diff --git a/block/genhd.c b/block/genhd.c
index 99c64641c314..be9ce35cf0fe 100644
--- a/block/genhd.c
+++ b/block/genhd.c
@@ -822,7 +822,8 @@ static void __device_add_disk(struct device *parent, struct 
gendisk *disk,
/* Register BDI before referencing it from bdev */
dev->devt = devt;
ret = bdi_register(bdi, "%u:%u", MAJOR(devt), MINOR(devt));
-   WARN_ON(ret);
+   if (ret)
+   pr_warn("%s: failed to register backing dev info\n", 
disk->disk_name);
bdi_set_owner(bdi, dev);
blk_register_region(disk_devt(disk), disk->minors, NULL,
exact_match, exact_lock, disk);


Please, don't. Where is the point in continuing here?
I'd rather have it fixed up properly, either by having a return value to 
__device_add_disk() or by allowing the caller to check (eg by checking 
GENHD_FL_UP) if the call succeeded.


Cheers,

Hannes
--
Dr. Hannes ReineckeKernel Storage Architect
h...@suse.de  +49 911 74053 688
SUSE Software Solutions GmbH, Maxfeldstr. 5, 90409 Nürnberg
HRB 36809 (AG Nürnberg), Geschäftsführer: Felix Imendörffer


Re: [v5 01/12] struct device: Add function callback durable_name

2020-10-07 Thread Hannes Reinecke

On 10/7/20 10:10 PM, Tony Asleson wrote:

On 10/1/20 6:48 AM, Greg Kroah-Hartman wrote:

On Wed, Sep 30, 2020 at 09:35:52AM -0500, Tony Asleson wrote:

On 9/30/20 2:38 AM, Greg Kroah-Hartman wrote:

On Tue, Sep 29, 2020 at 05:04:32PM -0500, Tony Asleson wrote:

I'm trying to figure out a way to positively identify which storage
device an error belongs to over time.


"over time" is not the kernel's responsibility.

This comes up every 5 years or so. The kernel provides you, at runtime,
a mapping between a hardware device and a "logical" device.  It can
provide information to userspace about this mapping, but once that
device goes away, the kernel is free to reuse that logical device again.

If you want to track what logical devices match up to what physical
device, then do it in userspace, by parsing the log files.


I don't understand why people think it's acceptable to ask user space to
parse text that is subject to change.


What text is changing? The format of of the prefix of dev_*() is well
known and has been stable for 15+ years now, right?  What is difficult
in parsing it?


Many of the storage layer messages are using printk, not dev_printk.


So that would be the immediate angle of attack ...


Thank you for supplying some feedback and asking questions.  I've been
asking for suggestions and would very much like to have a discussion on
how this issue is best solved.  I'm not attached to what I've provided.
I'm just trying to get towards a solution.


Again, solve this in userspace, you have the information there at
runtime, why not use it?


We usually don't have the needed information if you remove the
expectation that user space should parse the human readable portion of
the error message.


I don't expect that userspace should have to parse any human readable
portion, if they don't want to.  But if you do want it to, it is pretty
trivial to parse what you have today:

scsi 2:0:0:0: Direct-Access Generic  STORAGE DEVICE   1531 PQ: 0 
ANSI: 6

If you really have a unique identifier, then great, parse it today:

usb 4-1.3.1: Product: USB3.0 Card Reader
usb 4-1.3.1: Manufacturer: Generic
usb 4-1.3.1: SerialNumber: 1531

What's keeping that from working now?


I believe these examples are using dev_printk.  With dev_printk we don't
need to parse the text, we can use the meta data.
So it looks as most of your usecase would be solved by moving to 

dev_printk().
Why not work on that instead?
I do presume this will have immediate benefits for everybody, and will 
have approval from everyone.


Cheers,

Hannes
--
Dr. Hannes ReineckeKernel Storage Architect
h...@suse.de  +49 911 74053 688
SUSE Software Solutions GmbH, Maxfeldstr. 5, 90409 Nürnberg
HRB 36809 (AG Nürnberg), Geschäftsführer: Felix Imendörffer


Re: [PATCH 19/19] block: remove check_disk_change

2020-09-09 Thread Hannes Reinecke
On 9/8/20 4:53 PM, Christoph Hellwig wrote:
> Remove the now unused check_disk_change helper.
> 
> Signed-off-by: Christoph Hellwig 
> Reviewed-by: Johannes Thumshirn 
> ---
>  fs/block_dev.c| 20 
>  include/linux/genhd.h |  1 -
>  2 files changed, 21 deletions(-)
> 
Reviewed-by: Hannes Reinecke 

Cheers,

Hannes
-- 
Dr. Hannes ReineckeKernel Storage Architect
h...@suse.de  +49 911 74053 688
SUSE Software Solutions Germany GmbH, Maxfeldstr. 5, 90409 Nürnberg
HRB 36809 (AG Nürnberg), GF: Felix Imendörffer


Re: [PATCH 18/19] sr: simplify sr_block_revalidate_disk

2020-09-09 Thread Hannes Reinecke
On 9/8/20 4:53 PM, Christoph Hellwig wrote:
> Both callers have a valid CD struture available, so rely on that instead
> of getting another reference.  Also move the function to avoid a forward
> declaration.
> 
> Signed-off-by: Christoph Hellwig 
> Reviewed-by: Johannes Thumshirn 
> ---
>  drivers/scsi/sr.c | 36 +---
>  1 file changed, 13 insertions(+), 23 deletions(-)
> 
Reviewed-by: Hannes Reinecke 

Cheers,

Hannes
-- 
Dr. Hannes ReineckeKernel Storage Architect
h...@suse.de  +49 911 74053 688
SUSE Software Solutions Germany GmbH, Maxfeldstr. 5, 90409 Nürnberg
HRB 36809 (AG Nürnberg), GF: Felix Imendörffer


Re: [PATCH 17/19] sr: use bdev_check_media_change

2020-09-09 Thread Hannes Reinecke
On 9/8/20 4:53 PM, Christoph Hellwig wrote:
> Switch to use bdev_check_media_change instead of check_disk_change and
> call sr_block_revalidate_disk manually.  Also add an explicit call to
> sr_block_revalidate_disk just before disk_add() to ensure we always
> read check for a ready unit and read the TOC and then stop wiring up
> ->revalidate_disk.
> 
> Signed-off-by: Christoph Hellwig 
> Reviewed-by: Johannes Thumshirn 
> ---
>  drivers/scsi/sr.c | 6 --
>  1 file changed, 4 insertions(+), 2 deletions(-)
> 
Reviewed-by: Hannes Reinecke 

Re: [PATCH 16/19] sd: use bdev_check_media_change

2020-09-09 Thread Hannes Reinecke
On 9/8/20 4:53 PM, Christoph Hellwig wrote:
> Switch to use bdev_check_media_change instead of check_disk_change and
> call sd_revalidate_disk manually.  As sd also calls sd_revalidate_disk
> manually during probe and open, , the extra call into ->revalidate_disk
> from bdev_disk_changed is not required either, so stop wiring up the
> method.
> 
> Signed-off-by: Christoph Hellwig 
> Reviewed-by: Johannes Thumshirn 
> ---
>  drivers/scsi/sd.c | 7 ---
>  1 file changed, 4 insertions(+), 3 deletions(-)
> 
Reviewed-by: Hannes Reinecke 

Cheers,

Hannes
-- 
Dr. Hannes ReineckeKernel Storage Architect
h...@suse.de  +49 911 74053 688
SUSE Software Solutions Germany GmbH, Maxfeldstr. 5, 90409 Nürnberg
HRB 36809 (AG Nürnberg), GF: Felix Imendörffer


Re: [PATCH 15/19] md: use bdev_check_media_change

2020-09-09 Thread Hannes Reinecke
On 9/8/20 4:53 PM, Christoph Hellwig wrote:
> The md driver does not have a ->revalidate_disk method, so it can just
> use bdev_check_media_change without any additional changes.
> 
> Signed-off-by: Christoph Hellwig 
> Reviewed-by: Johannes Thumshirn 
> ---
>  drivers/md/md.c | 2 +-
>  1 file changed, 1 insertion(+), 1 deletion(-)
> 
> diff --git a/drivers/md/md.c b/drivers/md/md.c
> index 9562ef598ae1f4..27ed61197014ef 100644
> --- a/drivers/md/md.c
> +++ b/drivers/md/md.c
> @@ -7848,7 +7848,7 @@ static int md_open(struct block_device *bdev, fmode_t 
> mode)
>   atomic_inc(>openers);
>   mutex_unlock(>open_mutex);
>  
> - check_disk_change(bdev);
> + bdev_check_media_change(bdev);
>   out:
>   if (err)
>   mddev_put(mddev);
> 
Reviewed-by: Hannes Reinecke 

Cheers,

Hannes
-- 
Dr. Hannes ReineckeKernel Storage Architect
h...@suse.de  +49 911 74053 688
SUSE Software Solutions Germany GmbH, Maxfeldstr. 5, 90409 Nürnberg
HRB 36809 (AG Nürnberg), GF: Felix Imendörffer


Re: [PATCH 14/19] ide-gd: stop using the disk events mechanism

2020-09-09 Thread Hannes Reinecke
On 9/8/20 4:53 PM, Christoph Hellwig wrote:
> ide-gd is only using the disk events mechanism to be able to force an
> invalidation and partition scan on opening removable media.  Just open
> code the logic without invoving the block layer.
> 
> Signed-off-by: Christoph Hellwig 
> ---
>  drivers/ide/ide-disk.c   |  5 +
>  drivers/ide/ide-floppy.c |  2 --
>  drivers/ide/ide-gd.c | 48 +---
>  include/linux/ide.h  |  2 --
>  4 files changed, 7 insertions(+), 50 deletions(-)
> 
Reviewed-by: Hannes Reinecke 

Cheers,

Hannes
-- 
Dr. Hannes ReineckeKernel Storage Architect
h...@suse.de  +49 911 74053 688
SUSE Software Solutions Germany GmbH, Maxfeldstr. 5, 90409 Nürnberg
HRB 36809 (AG Nürnberg), GF: Felix Imendörffer


Re: [PATCH 13/19] ide-cd: remove idecd_revalidate_disk

2020-09-09 Thread Hannes Reinecke
On 9/8/20 4:53 PM, Christoph Hellwig wrote:
> Just merge the trivial function into its only caller.
> 
> Signed-off-by: Christoph Hellwig 
> Reviewed-by: Johannes Thumshirn 
> ---
>  drivers/ide/ide-cd.c | 17 +
>  1 file changed, 5 insertions(+), 12 deletions(-)
> 
Reviewed-by: Hannes Reinecke 

Cheers,

Hannes
-- 
Dr. Hannes ReineckeKernel Storage Architect
h...@suse.de  +49 911 74053 688
SUSE Software Solutions Germany GmbH, Maxfeldstr. 5, 90409 Nürnberg
HRB 36809 (AG Nürnberg), GF: Felix Imendörffer


Re: [PATCH 11/19] gdrom: use bdev_check_media_change

2020-09-09 Thread Hannes Reinecke
On 9/8/20 4:53 PM, Christoph Hellwig wrote:
> The Sega GD-ROM driver does not have a ->revalidate_disk method, so it
> can just use bdev_check_media_change without any additional changes.
> 
> Signed-off-by: Christoph Hellwig 
> Reviewed-by: Johannes Thumshirn 
> ---
>  drivers/cdrom/gdrom.c | 2 +-
>  1 file changed, 1 insertion(+), 1 deletion(-)
> 
> diff --git a/drivers/cdrom/gdrom.c b/drivers/cdrom/gdrom.c
> index 09b0cd292720fa..9874fc1c815b53 100644
> --- a/drivers/cdrom/gdrom.c
> +++ b/drivers/cdrom/gdrom.c
> @@ -479,7 +479,7 @@ static int gdrom_bdops_open(struct block_device *bdev, 
> fmode_t mode)
>  {
>   int ret;
>  
> - check_disk_change(bdev);
> + bdev_check_media_change(bdev);
>  
>   mutex_lock(_mutex);
>   ret = cdrom_open(gd.cd_info, bdev, mode);
> 
Reviewed-by: Hannes Reinecke 

Cheers,

Hannes
-- 
Dr. Hannes ReineckeKernel Storage Architect
h...@suse.de  +49 911 74053 688
SUSE Software Solutions Germany GmbH, Maxfeldstr. 5, 90409 Nürnberg
HRB 36809 (AG Nürnberg), GF: Felix Imendörffer


Re: [PATCH 12/19] ide-cd: use bdev_check_media_changed

2020-09-09 Thread Hannes Reinecke
On 9/8/20 4:53 PM, Christoph Hellwig wrote:
> Switch to use bdev_check_media_changed instead of check_disk_change and
> call idecd_revalidate_disk manually.  Given that idecd_revalidate_disk
> only re-reads the TOC, and we already do the same at probe time, the
> extra call into ->revalidate_disk from bdev_disk_changed is not required
> either, so stop wiring up the method.
> 
> Signed-off-by: Christoph Hellwig 
> Reviewed-by: Johannes Thumshirn 
> ---
>  drivers/ide/ide-cd.c | 5 +++--
>  1 file changed, 3 insertions(+), 2 deletions(-)
> 
Reviewed-by: Hannes Reinecke 

Cheers,

Hannes
-- 
Dr. Hannes ReineckeKernel Storage Architect
h...@suse.de  +49 911 74053 688
SUSE Software Solutions Germany GmbH, Maxfeldstr. 5, 90409 Nürnberg
HRB 36809 (AG Nürnberg), GF: Felix Imendörffer


Re: [PATCH 10/19] paride/pcd: use bdev_check_media_change

2020-09-09 Thread Hannes Reinecke
On 9/8/20 4:53 PM, Christoph Hellwig wrote:
> The pcd driver does not have a ->revalidate_disk method, so it can just
> use bdev_check_media_change without any additional changes.
> 
> Signed-off-by: Christoph Hellwig 
> Reviewed-by: Johannes Thumshirn 
> ---
>  drivers/block/paride/pcd.c | 2 +-
>  1 file changed, 1 insertion(+), 1 deletion(-)
> 
> diff --git a/drivers/block/paride/pcd.c b/drivers/block/paride/pcd.c
> index 5124eca90e8337..70da8b86ce587d 100644
> --- a/drivers/block/paride/pcd.c
> +++ b/drivers/block/paride/pcd.c
> @@ -233,7 +233,7 @@ static int pcd_block_open(struct block_device *bdev, 
> fmode_t mode)
>   struct pcd_unit *cd = bdev->bd_disk->private_data;
>   int ret;
>  
> - check_disk_change(bdev);
> + bdev_check_media_change(bdev);
>  
>   mutex_lock(_mutex);
>   ret = cdrom_open(>info, bdev, mode);
> 
Reviewed-by: Hannes Reinecke 

Cheers,

Hannes
-- 
Dr. Hannes ReineckeKernel Storage Architect
h...@suse.de  +49 911 74053 688
SUSE Software Solutions Germany GmbH, Maxfeldstr. 5, 90409 Nürnberg
HRB 36809 (AG Nürnberg), GF: Felix Imendörffer


Re: [PATCH 09/19] xsysace: simplify media change handling

2020-09-09 Thread Hannes Reinecke
On 9/8/20 4:53 PM, Christoph Hellwig wrote:
> Pass a struct ace_device to ace_revalidate_disk, move the media changed
> check into the one caller that needs it, and give the routine a better
> name.
> 
> Signed-off-by: Christoph Hellwig 
> ---
>  drivers/block/xsysace.c | 26 ++
>  1 file changed, 10 insertions(+), 16 deletions(-)
> 
Reviewed-by: Hannes Reinecke 

Cheers,

Hannes
-- 
Dr. Hannes ReineckeKernel Storage Architect
h...@suse.de  +49 911 74053 688
SUSE Software Solutions Germany GmbH, Maxfeldstr. 5, 90409 Nürnberg
HRB 36809 (AG Nürnberg), GF: Felix Imendörffer


Re: [PATCH 08/19] xsysace: use bdev_check_media_change

2020-09-09 Thread Hannes Reinecke
On 9/8/20 4:53 PM, Christoph Hellwig wrote:
> Switch to use bdev_check_media_change instead of check_disk_change and
> call ace_revalidate_disk manually.  Given that ace_revalidate_disk only
> deals with media change events, the extra call into ->revalidate_disk
> from bdev_disk_changed is not required either, so stop wiring up the
> method.
> 
> Signed-off-by: Christoph Hellwig 
> Reviewed-by: Johannes Thumshirn 
> ---
>  drivers/block/xsysace.c | 4 ++--
>  1 file changed, 2 insertions(+), 2 deletions(-)
> 
Reviewed-by: Hannes Reinecke 

Cheers,

Hannes
-- 
Dr. Hannes ReineckeKernel Storage Architect
h...@suse.de  +49 911 74053 688
SUSE Software Solutions Germany GmbH, Maxfeldstr. 5, 90409 Nürnberg
HRB 36809 (AG Nürnberg), GF: Felix Imendörffer


Re: [PATCH 07/19] swim3: use bdev_check_media_changed

2020-09-09 Thread Hannes Reinecke
On 9/8/20 4:53 PM, Christoph Hellwig wrote:
> Switch to use bdev_check_media_changed instead of check_disk_change and
> call floppy_revalidate manually.  Given that floppy_revalidate only
> deals with media change events, the extra call into ->revalidate_disk
> from bdev_disk_changed is not required either, so stop wiring up the
> method.
> 
> Signed-off-by: Christoph Hellwig 
> Reviewed-by: Johannes Thumshirn 
> ---
>  drivers/block/swim3.c | 4 ++--
>  1 file changed, 2 insertions(+), 2 deletions(-)
> 
Reviewed-by: Hannes Reinecke 

Cheers,

Hannes
-- 
Dr. Hannes ReineckeKernel Storage Architect
h...@suse.de  +49 911 74053 688
SUSE Software Solutions Germany GmbH, Maxfeldstr. 5, 90409 Nürnberg
HRB 36809 (AG Nürnberg), GF: Felix Imendörffer


Re: [PATCH 06/19] swim: simplify media change handling

2020-09-09 Thread Hannes Reinecke
On 9/8/20 4:53 PM, Christoph Hellwig wrote:
> floppy_revalidate mostly duplicates work already done in floppy_open
> despite only beeing called from floppy_open.  Remove the function and
> just clear the ->ejected flag directly under the right condition.
> 
> Signed-off-by: Christoph Hellwig 
> Reviewed-by: Johannes Thumshirn 
> ---
>  drivers/block/swim.c | 24 ++--
>  1 file changed, 2 insertions(+), 22 deletions(-)
> 
What a convoluted driver.

Reviewed-by: Hannes Reinecke 

Cheers,

Hannes
-- 
Dr. Hannes ReineckeKernel Storage Architect
h...@suse.de  +49 911 74053 688
SUSE Software Solutions Germany GmbH, Maxfeldstr. 5, 90409 Nürnberg
HRB 36809 (AG Nürnberg), GF: Felix Imendörffer


Re: [PATCH 03/19] ataflop: use bdev_check_media_change

2020-09-09 Thread Hannes Reinecke
On 9/8/20 4:53 PM, Christoph Hellwig wrote:
> Switch to use bdev_check_media_change instead of check_disk_change and
> call floppy_revalidate manually.  Given that floppy_revalidate only
> deals with media change events, the extra call into ->revalidate_disk
> from bdev_disk_changed is not required either, so stop wiring up the
> method.
> 
> Signed-off-by: Christoph Hellwig 
> Reviewed-by: Johannes Thumshirn 
> ---
>  drivers/block/ataflop.c | 7 ---
>  1 file changed, 4 insertions(+), 3 deletions(-)
> 
Reviewed-by: Hannes Reinecke 

Cheers,

Hannes
-- 
Dr. Hannes ReineckeKernel Storage Architect
h...@suse.de  +49 911 74053 688
SUSE Software Solutions Germany GmbH, Maxfeldstr. 5, 90409 Nürnberg
HRB 36809 (AG Nürnberg), GF: Felix Imendörffer


Re: [PATCH 01/19] block: add a bdev_check_media_change helper

2020-09-09 Thread Hannes Reinecke
On 9/8/20 4:53 PM, Christoph Hellwig wrote:
> Like check_disk_changed, except that it does not call ->revalidate_disk
> but leaves that to the caller.
> 
> Signed-off-by: Christoph Hellwig 
> Reviewed-by: Johannes Thumshirn 
> ---
>  block/genhd.c | 29 -
>  fs/block_dev.c| 17 +++--
>  include/linux/genhd.h |  2 +-
>  3 files changed, 32 insertions(+), 16 deletions(-)
> 
Reviewed-by: Hannes Reinecke 

Cheers,

Hannes
-- 
Dr. Hannes ReineckeKernel Storage Architect
h...@suse.de  +49 911 74053 688
SUSE Software Solutions Germany GmbH, Maxfeldstr. 5, 90409 Nürnberg
HRB 36809 (AG Nürnberg), GF: Felix Imendörffer


Re: [PATCH 05/19] swim: use bdev_check_media_change

2020-09-09 Thread Hannes Reinecke
On 9/8/20 4:53 PM, Christoph Hellwig wrote:
> Switch to use bdev_check_media_change instead of check_disk_change and
> call floppy_revalidate manually.  Given that floppy_revalidate only
> deals with media change events, the extra call into ->revalidate_disk
> from bdev_disk_changed is not required either, so stop wiring up the
> method.
> 
> Signed-off-by: Christoph Hellwig 
> Reviewed-by: Johannes Thumshirn 
> ---
>  drivers/block/swim.c | 6 --
>  1 file changed, 4 insertions(+), 2 deletions(-)
> 
Reviewed-by: Hannes Reinecke 

Cheers,

Hannes
-- 
Dr. Hannes ReineckeKernel Storage Architect
h...@suse.de  +49 911 74053 688
SUSE Software Solutions Germany GmbH, Maxfeldstr. 5, 90409 Nürnberg
HRB 36809 (AG Nürnberg), GF: Felix Imendörffer


Re: [PATCH 04/19] floppy: use bdev_check_media_change

2020-09-09 Thread Hannes Reinecke
On 9/8/20 4:53 PM, Christoph Hellwig wrote:
> Switch to use bdev_check_media_change instead of check_disk_change and
> call floppy_revalidate manually.  Given that floppy_revalidate only
> deals with media change events, the extra call into ->revalidate_disk
> from bdev_disk_changed is not required either, so stop wiring up the
> method.
> 
> Signed-off-by: Christoph Hellwig 
> ---
>  drivers/block/floppy.c | 8 +---
>  1 file changed, 5 insertions(+), 3 deletions(-)
> 
Reviewed-by: Hannes Reinecke 

Cheers,

Hannes
-- 
Dr. Hannes ReineckeKernel Storage Architect
h...@suse.de  +49 911 74053 688
SUSE Software Solutions Germany GmbH, Maxfeldstr. 5, 90409 Nürnberg
HRB 36809 (AG Nürnberg), GF: Felix Imendörffer


Re: [PATCH 02/19] amiflop: use bdev_check_media_change

2020-09-09 Thread Hannes Reinecke
On 9/8/20 4:53 PM, Christoph Hellwig wrote:
> The Amiga floppy driver does not have a ->revalidate_disk method, so it
> can just use bdev_check_media_change without any additional changes.
> 
> Signed-off-by: Christoph Hellwig 
> Reviewed-by: Johannes Thumshirn 
> ---
>  drivers/block/amiflop.c | 2 +-
>  1 file changed, 1 insertion(+), 1 deletion(-)
> 
> diff --git a/drivers/block/amiflop.c b/drivers/block/amiflop.c
> index 226219da3da6a7..71c2b156455860 100644
> --- a/drivers/block/amiflop.c
> +++ b/drivers/block/amiflop.c
> @@ -1670,7 +1670,7 @@ static int floppy_open(struct block_device *bdev, 
> fmode_t mode)
>   }
>  
>   if (mode & (FMODE_READ|FMODE_WRITE)) {
> - check_disk_change(bdev);
> + bdev_check_media_change(bdev);
>   if (mode & FMODE_WRITE) {
>   int wrprot;
>  
> 
Reviewed-by: Hannes Reinecke 

Cheers,

Hannes
-- 
Dr. Hannes ReineckeKernel Storage Architect
h...@suse.de  +49 911 74053 688
SUSE Software Solutions Germany GmbH, Maxfeldstr. 5, 90409 Nürnberg
HRB 36809 (AG Nürnberg), GF: Felix Imendörffer


Re: [PATCH v8 00/18] blk-mq/scsi: Provide hostwide shared tags for SCSI HBAs

2020-09-08 Thread Hannes Reinecke
On 8/19/20 5:20 PM, John Garry wrote:
> Hi all,
> 
> Here is v8 of the patchset.
> 
> In this version of the series, we keep the shared sbitmap for driver tags,
> and introduce changes to fix up the tag budgeting across request queues.
> We also have a change to count requests per-hctx for when an elevator is
> enabled, as an optimisation. I also dropped the debugfs changes - more on
> that below.
> 
> Some performance figures:
> 
> Using 12x SAS SSDs on hisi_sas v3 hw. mq-deadline results are included,
> but it is not always an appropriate scheduler to use.
> 
> Tag depth 4000 (default)  260**
> 
> Baseline (v5.9-rc1):
> none sched:   2094K IOPS  513K
> mq-deadline sched:2145K IOPS  1336K
> 
> Final, host_tagset=0 in LLDD *, ***:
> none sched:   2120K IOPS  550K
> mq-deadline sched:2121K IOPS  1309K
> 
> Final ***:
> none sched:   2132K IOPS  1185
> mq-deadline sched:2145K IOPS  2097
> 
> * this is relevant as this is the performance in supporting but not
>   enabling the feature
> ** depth=260 is relevant as some point where we are regularly waiting for
>tags to be available. Figures were are a bit unstable here.
> *** Included "[PATCH V4] scsi: core: only re-run queue in
> scsi_end_request() if device queue is busy"
> 
> A copy of the patches can be found here:
> https://github.com/hisilicon/kernel-dev/tree/private-topic-blk-mq-shared-tags-v8
> 
> The hpsa patch depends on:
> https://lore.kernel.org/linux-scsi/20200430131904.5847-1-h...@suse.de/
> 
> And the smartpqi patch is not to be accepted.
> 
> Comments (and testing) welcome, thanks!
> 
> Differences to v7:
> - Add null_blk and scsi_debug support
> - Drop debugfs tags patch - it's too difficult to be the same between
> hostwide and non-hostwide, as discussed:
> https://lore.kernel.org/linux-scsi/1591810159-240929-1-git-send-email-john.ga...@huawei.com/T/#mb3eb462d8be40273718505989abd12f8228c15fd
> And from commit 6bf0eb550452 ("sbitmap: Consider cleared bits in
> sbitmap_bitmap_show()"), I guess not many used this anyway...
> - Add elevator per-hctx request count for optimisation
> - Break up "blk-mq: rename blk_mq_update_tag_set_depth()" into 2x patches
> - Pass flags for avoid per-hq queue tags init/free for hostwide tags
> - Add Don's reviewed-tag and tested-by tags to appropiate patches
>   - (@Don, please let me know if issue with how I did this)
> - Add "scsi: core: Show nr_hw_queues in sysfs"
> - Rework megaraid SAS patch to have module param (Kashyap)
> - rebase
> 
> V7 is here for more info:
> https://lore.kernel.org/linux-scsi/1591810159-240929-1-git-send-email-john.ga...@huawei.com/T/#t
> 
> Hannes Reinecke (5):
>   blk-mq: Rename blk_mq_update_tag_set_depth()
>   blk-mq: Free tags in blk_mq_init_tags() upon error
>   scsi: Add host and host template flag 'host_tagset'
>   hpsa: enable host_tagset and switch to MQ
>   smartpqi: enable host tagset
> 
> John Garry (10):
>   blk-mq: Pass flags for tag init/free
>   blk-mq: Use pointers for blk_mq_tags bitmap tags
>   blk-mq: Facilitate a shared sbitmap per tagset
>   blk-mq: Relocate hctx_may_queue()
>   blk-mq: Record nr_active_requests per queue for when using shared
> sbitmap
>   blk-mq: Record active_queues_shared_sbitmap per tag_set for when using
> shared sbitmap
>   null_blk: Support shared tag bitmap
>   scsi: core: Show nr_hw_queues in sysfs
>   scsi: hisi_sas: Switch v3 hw to MQ
>   scsi: scsi_debug: Support host tagset
> 
> Kashyap Desai (2):
>   blk-mq, elevator: Count requests per hctx to improve performance
>   scsi: megaraid_sas: Added support for shared host tagset for
> cpuhotplug
> 
> Ming Lei (1):
>   blk-mq: Rename BLK_MQ_F_TAG_SHARED as BLK_MQ_F_TAG_QUEUE_SHARED
> 
Now that Jens merged the block bits in his tree, wouldn't it be better
to re-send the SCSI bits only, thereby avoiding a potential merge error
later on?

Cheers,

Hannes
-- 
Dr. Hannes ReineckeKernel Storage Architect
h...@suse.de  +49 911 74053 688
SUSE Software Solutions Germany GmbH, Maxfeldstr. 5, 90409 Nürnberg
HRB 36809 (AG Nürnberg), GF: Felix Imendörffer


Re: [PATCH 19/19] block: switch gendisk lookup to a simple xarray

2020-09-04 Thread Hannes Reinecke

On 9/3/20 10:01 AM, Christoph Hellwig wrote:

Now that bdev_map is only used for finding gendisks, we can use
a simple xarray instead of the regions tracking structure for it.

Signed-off-by: Christoph Hellwig 
Reviewed-by: Greg Kroah-Hartman 
---
  block/genhd.c | 208 --
  include/linux/genhd.h |   7 --
  2 files changed, 37 insertions(+), 178 deletions(-)


Reviewed-by: Hannes Reinecke 

Cheers,

Hannes
--
Dr. Hannes ReineckeKernel Storage Architect
h...@suse.de  +49 911 74053 688
SUSE Software Solutions GmbH, Maxfeldstr. 5, 90409 Nürnberg
HRB 36809 (AG Nürnberg), Geschäftsführer: Felix Imendörffer


Re: [PATCH 18/19] z2ram: use separate gendisk for the different modes

2020-09-04 Thread Hannes Reinecke

On 9/3/20 10:01 AM, Christoph Hellwig wrote:

Use separate gendisks (which share a tag_set) for the different operating
modes instead of redirecting the gendisk lookup using a probe callback.
This avoids potential problems with aliased block_device instances and
will eventually allow for removing the blk_register_region framework.

Signed-off-by: Christoph Hellwig 
---
  drivers/block/z2ram.c | 100 --
  1 file changed, 58 insertions(+), 42 deletions(-)


Reviewed-by: Hannes Reinecke 

Cheers,

Hannes
--
Dr. Hannes ReineckeKernel Storage Architect
h...@suse.de  +49 911 74053 688
SUSE Software Solutions GmbH, Maxfeldstr. 5, 90409 Nürnberg
HRB 36809 (AG Nürnberg), Geschäftsführer: Felix Imendörffer


Re: [PATCH 16/19] ataflop: use a separate gendisk for each media format

2020-09-04 Thread Hannes Reinecke

On 9/3/20 10:01 AM, Christoph Hellwig wrote:

The Atari floppy driver usually autodetects the media when used with the
ormal /dev/fd? devices, which also are the only nodes created by udev.
But it also supports various aliases that force a given media format.
That is currently supported using the blk_register_region framework
which finds the floppy gendisk even for a 'mismatched' dev_t.  The
problem with this (besides the code complexity) is that it creates
multiple struct block_device instances for the whole device of a
single gendisk, which can lead to interesting issues in code not
aware of that fact.

To fix this just create a separate gendisk for each of the aliases
if they are accessed.

Signed-off-by: Christoph Hellwig 
---
  drivers/block/ataflop.c | 135 +---
  1 file changed, 86 insertions(+), 49 deletions(-)


Reviewed-by: Hannes Reinecke 

Cheers,

Hannes
--
Dr. Hannes ReineckeKernel Storage Architect
h...@suse.de  +49 911 74053 688
SUSE Software Solutions GmbH, Maxfeldstr. 5, 90409 Nürnberg
HRB 36809 (AG Nürnberg), Geschäftsführer: Felix Imendörffer


Re: [PATCH 17/19] z2ram: reindent

2020-09-04 Thread Hannes Reinecke

On 9/3/20 10:01 AM, Christoph Hellwig wrote:

reindent the driver using Lident as the code style was far away from
normal Linux code.

Signed-off-by: Christoph Hellwig 
---
  drivers/block/z2ram.c | 493 --
  1 file changed, 236 insertions(+), 257 deletions(-)


Reviewed-by: Hannes Reinecke 

Cheers,

Hannes
--
Dr. Hannes ReineckeKernel Storage Architect
h...@suse.de  +49 911 74053 688
SUSE Software Solutions GmbH, Maxfeldstr. 5, 90409 Nürnberg
HRB 36809 (AG Nürnberg), Geschäftsführer: Felix Imendörffer


Re: [PATCH 15/19] amiflop: use separate gendisks for Amiga vs MS-DOS mode

2020-09-04 Thread Hannes Reinecke

On 9/3/20 10:01 AM, Christoph Hellwig wrote:

Use separate gendisks (which share a tag_set) for the native Amgiga vs
the MS-DOS mode instead of redirecting the gendisk lookup using a probe
callback.  This avoids potential problems with aliased block_device
instances and will eventually allow for removing the blk_register_region
framework.

Signed-off-by: Christoph Hellwig 
---
  drivers/block/amiflop.c | 98 +++--
  1 file changed, 55 insertions(+), 43 deletions(-)


Reviewed-by: Hannes Reinecke 

Cheers,

Hannes
--
Dr. Hannes ReineckeKernel Storage Architect
h...@suse.de  +49 911 74053 688
SUSE Software Solutions GmbH, Maxfeldstr. 5, 90409 Nürnberg
HRB 36809 (AG Nürnberg), Geschäftsführer: Felix Imendörffer


Re: [PATCH 14/19] floppy: use a separate gendisk for each media format

2020-09-04 Thread Hannes Reinecke

On 9/3/20 10:01 AM, Christoph Hellwig wrote:

The floppy driver usually autodetects the media when used with the
normal /dev/fd? devices, which also are the only nodes created by udev.
But it also supports various aliases that force a given media format.
That is currently supported using the blk_register_region framework
which finds the floppy gendisk even for a 'mismatched' dev_t.  The
problem with this (besides the code complexity) is that it creates
multiple struct block_device instances for the whole device of a
single gendisk, which can lead to interesting issues in code not
aware of that fact.

To fix this just create a separate gendisk for each of the aliases
if they are accessed.

Signed-off-by: Christoph Hellwig 
---
  drivers/block/floppy.c | 154 ++---
  1 file changed, 97 insertions(+), 57 deletions(-)


Reviewed-by: Hannes Reinecke 

Cheers,

Hannes
--
Dr. Hannes ReineckeKernel Storage Architect
h...@suse.de  +49 911 74053 688
SUSE Software Solutions GmbH, Maxfeldstr. 5, 90409 Nürnberg
HRB 36809 (AG Nürnberg), Geschäftsführer: Felix Imendörffer


Re: [PATCH 10/19] brd: use __register_blkdev to allocate devices on demand

2020-09-04 Thread Hannes Reinecke

On 9/3/20 10:01 AM, Christoph Hellwig wrote:

Use the simpler mechanism attached to major_name to allocate a brd device
when a currently unregistered minor is accessed.

Signed-off-by: Christoph Hellwig 
---
  drivers/block/brd.c | 39 +++
  1 file changed, 11 insertions(+), 28 deletions(-)
Reviewed-by: Hannes Reinecke 


Cheers,

Hannes
--
Dr. Hannes ReineckeKernel Storage Architect
h...@suse.de  +49 911 74053 688
SUSE Software Solutions GmbH, Maxfeldstr. 5, 90409 Nürnberg
HRB 36809 (AG Nürnberg), Geschäftsführer: Felix Imendörffer


Re: [PATCH 13/19] ide: switch to __register_blkdev for command set probing

2020-09-04 Thread Hannes Reinecke

On 9/3/20 10:01 AM, Christoph Hellwig wrote:

ide is the last user of the blk_register_region framework except for the
tracking of allocated gendisk.  Switch to __register_blkdev, even if that
doesn't allow us to trivially find out which command set to probe for.
That means we now always request all modules when a user tries to access
an unclaimed ide device node, but except for a few potentially loaded
modules for a fringe use case of a deprecated and soon to be removed
driver that doesn't make a difference.

Signed-off-by: Christoph Hellwig 
---
  drivers/ide/ide-probe.c | 34 ++
  1 file changed, 6 insertions(+), 28 deletions(-)


Ceterum censeo ...

Reviewed-by: Hannes Reinecke 

Cheers,

Hannes
--
Dr. Hannes ReineckeKernel Storage Architect
h...@suse.de  +49 911 74053 688
SUSE Software Solutions GmbH, Maxfeldstr. 5, 90409 Nürnberg
HRB 36809 (AG Nürnberg), Geschäftsführer: Felix Imendörffer


Re: [PATCH 12/19] md: use __register_blkdev to allocate devices on demand

2020-09-04 Thread Hannes Reinecke

On 9/3/20 10:01 AM, Christoph Hellwig wrote:

Use the simpler mechanism attached to major_name to allocate a brd device


md device? Maybe?


when a currently unregistered minor is accessed.

Signed-off-by: Christoph Hellwig 
Acked-by: Song Liu 
---
  drivers/md/md.c | 21 -
  1 file changed, 8 insertions(+), 13 deletions(-)


Reviewed-by: Hannes Reinecke 

Cheers,

Hannes
--
Dr. Hannes ReineckeKernel Storage Architect
h...@suse.de  +49 911 74053 688
SUSE Software Solutions GmbH, Maxfeldstr. 5, 90409 Nürnberg
HRB 36809 (AG Nürnberg), Geschäftsführer: Felix Imendörffer


Re: [PATCH 11/19] loop: use __register_blkdev to allocate devices on demand

2020-09-04 Thread Hannes Reinecke

On 9/3/20 10:01 AM, Christoph Hellwig wrote:

Use the simpler mechanism attached to major_name to allocate a brd device
when a currently unregistered minor is accessed.

Signed-off-by: Christoph Hellwig 
---
  drivers/block/loop.c | 30 --
  1 file changed, 8 insertions(+), 22 deletions(-)


Reviewed-by: Hannes Reinecke 

Cheers,

Hannes
--
Dr. Hannes ReineckeKernel Storage Architect
h...@suse.de  +49 911 74053 688
SUSE Software Solutions GmbH, Maxfeldstr. 5, 90409 Nürnberg
HRB 36809 (AG Nürnberg), Geschäftsführer: Felix Imendörffer


Re: [PATCH 09/19] sd: use __register_blkdev to avoid a modprobe for an unregistered dev_t

2020-09-04 Thread Hannes Reinecke

On 9/3/20 10:01 AM, Christoph Hellwig wrote:

Switch from using blk_register_region to the probe callback passed to
__register_blkdev to disable the request_module call for an unclaimed
dev_t in the SD majors.

Signed-off-by: Christoph Hellwig 
---
  drivers/scsi/sd.c | 19 +--
  1 file changed, 5 insertions(+), 14 deletions(-)


Reviewed-by: Hannes Reinecke 

Cheers,

Hannes
--
Dr. Hannes ReineckeKernel Storage Architect
h...@suse.de  +49 911 74053 688
SUSE Software Solutions GmbH, Maxfeldstr. 5, 90409 Nürnberg
HRB 36809 (AG Nürnberg), Geschäftsführer: Felix Imendörffer


Re: [PATCH 08/19] swim: don't call blk_register_region

2020-09-04 Thread Hannes Reinecke

On 9/3/20 10:01 AM, Christoph Hellwig wrote:

The swim driver (unlike various other floppy drivers) doesn't have
magic device nodes for certain modes, and already registers a gendisk
for each of the floppies supported by a device.  Thus the region
registered is a no-op and can be removed.

Signed-off-by: Christoph Hellwig 
---
  drivers/block/swim.c | 17 -
  1 file changed, 17 deletions(-)


Reviewed-by: Hannes Reinecke 

Cheers,

Hannes
--
Dr. Hannes ReineckeKernel Storage Architect
h...@suse.de  +49 911 74053 688
SUSE Software Solutions GmbH, Maxfeldstr. 5, 90409 Nürnberg
HRB 36809 (AG Nürnberg), Geschäftsführer: Felix Imendörffer


Re: [PATCH 06/19] block: add an optional probe callback to major_names

2020-09-04 Thread Hannes Reinecke

On 9/3/20 10:01 AM, Christoph Hellwig wrote:

Add a callback to the major_names array that allows a driver to override
how to probe for dev_t that doesn't currently have a gendisk registered.
This will help separating the lookup of the gendisk by dev_t vs probe
action for a not currently registered dev_t.

Signed-off-by: Christoph Hellwig 
---
  block/genhd.c | 21 ++---
  include/linux/genhd.h |  5 -
  2 files changed, 22 insertions(+), 4 deletions(-)
Reviewed-by: Hannes Reinecke 


Cheers,

Hannes
--
Dr. Hannes ReineckeKernel Storage Architect
h...@suse.de  +49 911 74053 688
SUSE Software Solutions GmbH, Maxfeldstr. 5, 90409 Nürnberg
HRB 36809 (AG Nürnberg), Geschäftsführer: Felix Imendörffer


Re: [PATCH 05/19] block: rework requesting modules for unclaimed devices

2020-09-04 Thread Hannes Reinecke

On 9/3/20 10:01 AM, Christoph Hellwig wrote:

Instead of reusing the ranges in bdev_map, add a new helper that is
called if no ranges was found.  This is a first step to unpeel and
eventually remove the complex ranges structure.

Signed-off-by: Christoph Hellwig 
---
  block/genhd.c | 25 +++--
  1 file changed, 15 insertions(+), 10 deletions(-)


Reviewed-by: Hannes Reinecke 

Cheers,

Hannes
--
Dr. Hannes ReineckeKernel Storage Architect
h...@suse.de  +49 911 74053 688
SUSE Software Solutions GmbH, Maxfeldstr. 5, 90409 Nürnberg
HRB 36809 (AG Nürnberg), Geschäftsführer: Felix Imendörffer


Re: [PATCH 04/19] block: split block_class_lock

2020-09-04 Thread Hannes Reinecke

On 9/3/20 10:01 AM, Christoph Hellwig wrote:

Split the block_class_lock mutex into one each to protect bdev_map
and major_names.

Signed-off-by: Christoph Hellwig 
---
  block/genhd.c | 29 +++--
  1 file changed, 15 insertions(+), 14 deletions(-)


Reviewed-by: Hannes Reinecke 

Cheers,

Hannes
--
Dr. Hannes ReineckeKernel Storage Architect
h...@suse.de  +49 911 74053 688
SUSE Software Solutions GmbH, Maxfeldstr. 5, 90409 Nürnberg
HRB 36809 (AG Nürnberg), Geschäftsführer: Felix Imendörffer


Re: [PATCH 03/19] block: cleanup del_gendisk a bit

2020-09-04 Thread Hannes Reinecke

On 9/3/20 10:01 AM, Christoph Hellwig wrote:

Merge three hidden gendisk checks into one.

Signed-off-by: Christoph Hellwig 
---
  block/genhd.c | 15 ---
  1 file changed, 8 insertions(+), 7 deletions(-)


Reviewed-by: Hannes Reinecke 

Cheers,

Hannes
--
Dr. Hannes ReineckeKernel Storage Architect
h...@suse.de  +49 911 74053 688
SUSE Software Solutions GmbH, Maxfeldstr. 5, 90409 Nürnberg
HRB 36809 (AG Nürnberg), Geschäftsführer: Felix Imendörffer


Re: [PATCH 02/19] block: merge drivers/base/map.c into block/genhd.c

2020-09-04 Thread Hannes Reinecke

On 9/3/20 10:01 AM, Christoph Hellwig wrote:

Now that there is just a single user of the kobj_map functionality left,
merge it into the user to prepare for additional simplications.

Signed-off-by: Christoph Hellwig 
Reviewed-by: Greg Kroah-Hartman 
---
  block/genhd.c| 130 +
  drivers/base/Makefile|   2 +-
  drivers/base/map.c   | 154 ---
  include/linux/kobj_map.h |  20 -
  4 files changed, 118 insertions(+), 188 deletions(-)
  delete mode 100644 drivers/base/map.c
  delete mode 100644 include/linux/kobj_map.h


Reviewed-by: Hannes Reinecke 

Cheers,

Hannes
--
Dr. Hannes ReineckeKernel Storage Architect
h...@suse.de  +49 911 74053 688
SUSE Software Solutions GmbH, Maxfeldstr. 5, 90409 Nürnberg
HRB 36809 (AG Nürnberg), Geschäftsführer: Felix Imendörffer


Re: [PATCH 01/19] char_dev: replace cdev_map with an xarray

2020-09-04 Thread Hannes Reinecke

On 9/3/20 10:01 AM, Christoph Hellwig wrote:

None of the complicated overlapping regions bits of the kobj_map are
required for the character device lookup, so just a trivial xarray
instead.

Signed-off-by: Christoph Hellwig 
Reviewed-by: Greg Kroah-Hartman 
---
  fs/char_dev.c | 94 +--
  fs/dcache.c   |  1 -
  fs/internal.h |  5 ---
  3 files changed, 47 insertions(+), 53 deletions(-)


Reviewed-by: Hannes Reinecke 

Cheers,

Hannes
--
Dr. Hannes ReineckeKernel Storage Architect
h...@suse.de  +49 911 74053 688
SUSE Software Solutions GmbH, Maxfeldstr. 5, 90409 Nürnberg
HRB 36809 (AG Nürnberg), Geschäftsführer: Felix Imendörffer


Re: rework check_disk_change()

2020-09-04 Thread Hannes Reinecke

On 9/2/20 5:38 PM, Douglas Gilbert wrote:

On 2020-09-02 10:11 a.m., Christoph Hellwig wrote:

Hi Jens,

this series replaced the not very nice check_disk_change() function with
a new bdev_media_changed that avoids having the ->revalidate_disk call
at its end.  As a result ->revalidate_disk can be removed from a lot of
drivers.



For over 20 years the sg driver has been carrying this snippet that hangs
off the completion callback:

    if (driver_stat & DRIVER_SENSE) {
     struct scsi_sense_hdr ssh;

     if (scsi_normalize_sense(sbp, sense_len, )) {
     if (!scsi_sense_is_deferred()) {
     if (ssh.sense_key == UNIT_ATTENTION) {
     if (sdp->device->removable)
     sdp->device->changed = 1;
     }
     }
     }
     }

Is it needed? The unit attention (UA) may not be associated with the
device changing. Shouldn't the SCSI mid-level monitor UAs if they
impact the state of a scsi_device object?


We do; check scsi_io_completion_action() in drivers/scsi/scsi_lib.c
So I don't think you'd need to keep it in sg.c.

Cheers,

Hannes
--
Dr. Hannes ReineckeKernel Storage Architect
h...@suse.de  +49 911 74053 688
SUSE Software Solutions GmbH, Maxfeldstr. 5, 90409 Nürnberg
HRB 36809 (AG Nürnberg), Geschäftsführer: Felix Imendörffer


Re: [PATCH 07/19] ide: remove ide_{,un}register_region

2020-08-27 Thread Hannes Reinecke
On 8/26/20 8:24 AM, Christoph Hellwig wrote:
> There is no need to ever register the fake gendisk used for ide-tape.
> 
> Signed-off-by: Christoph Hellwig 
> ---
>  drivers/ide/ide-probe.c | 32 
>  drivers/ide/ide-tape.c  |  2 --
>  include/linux/ide.h |  3 ---
>  3 files changed, 37 deletions(-)
> 
IDE-tape. Shudder.
Anything to get rid of them.

Reviewed-by: Hannes Reinecke 

Cheers,

Hannes
-- 
Dr. Hannes ReineckeKernel Storage Architect
h...@suse.de  +49 911 74053 688
SUSE Software Solutions Germany GmbH, Maxfeldstr. 5, 90409 Nürnberg
HRB 36809 (AG Nürnberg), GF: Felix Imendörffer


Re: [PATCH 01/19] char_dev: replace cdev_map with an xarray

2020-08-27 Thread Hannes Reinecke
On 8/26/20 8:24 AM, Christoph Hellwig wrote:
> None of the complicated overlapping regions bits of the kobj_map are
> required for the character device lookup, so just a trivial xarray
> instead.
> 
> Signed-off-by: Christoph Hellwig 
> ---
>  fs/char_dev.c | 94 +--
>  fs/dcache.c   |  1 -
>  fs/internal.h |  5 ---
>  3 files changed, 46 insertions(+), 54 deletions(-)
> 
> diff --git a/fs/char_dev.c b/fs/char_dev.c
> index ba0ded7842a779..6c4d6c4938f14b 100644
> --- a/fs/char_dev.c
> +++ b/fs/char_dev.c
> @@ -17,7 +17,6 @@
>  #include 
>  
>  #include 
> -#include 
>  #include 
>  #include 
>  #include 
> @@ -25,8 +24,7 @@
>  
>  #include "internal.h"
>  
> -static struct kobj_map *cdev_map;
> -
> +static DEFINE_XARRAY(cdev_map);
>  static DEFINE_MUTEX(chrdevs_lock);
>  
>  #define CHRDEV_MAJOR_HASH_SIZE 255
> @@ -367,6 +365,29 @@ void cdev_put(struct cdev *p)
>   }
>  }
>  
> +static struct cdev *cdev_lookup(dev_t dev)
> +{
> + struct cdev *cdev;
> +
> +retry:
> + mutex_lock(_lock);
> + cdev = xa_load(_map, dev);
> + if (!cdev) {
> + mutex_unlock(_lock);
> +
> + if (request_module("char-major-%d-%d",
> +MAJOR(dev), MINOR(dev)) > 0)
> + /* Make old-style 2.4 aliases work */
> + request_module("char-major-%d", MAJOR(dev));
> + goto retry;
> + }
> +
> + if (!cdev_get(cdev))
> + cdev = NULL;
> + mutex_unlock(_lock);
> + return cdev;
> +}
> +
>  /*
>   * Called every time a character special file is opened
>   */
> @@ -380,13 +401,10 @@ static int chrdev_open(struct inode *inode, struct file 
> *filp)
>   spin_lock(_lock);
>   p = inode->i_cdev;
>   if (!p) {
> - struct kobject *kobj;
> - int idx;
>   spin_unlock(_lock);
> - kobj = kobj_lookup(cdev_map, inode->i_rdev, );
> - if (!kobj)
> + new = cdev_lookup(inode->i_rdev);
> + if (!new)
>   return -ENXIO;
> - new = container_of(kobj, struct cdev, kobj);
>   spin_lock(_lock);
>   /* Check i_cdev again in case somebody beat us to it while
>  we dropped the lock. */
> @@ -454,18 +472,6 @@ const struct file_operations def_chr_fops = {
>   .llseek = noop_llseek,
>  };
>  
> -static struct kobject *exact_match(dev_t dev, int *part, void *data)
> -{
> - struct cdev *p = data;
> - return >kobj;
> -}
> -
> -static int exact_lock(dev_t dev, void *data)
> -{
> - struct cdev *p = data;
> - return cdev_get(p) ? 0 : -1;
> -}
> -
>  /**
>   * cdev_add() - add a char device to the system
>   * @p: the cdev structure for the device
> @@ -478,7 +484,7 @@ static int exact_lock(dev_t dev, void *data)
>   */
>  int cdev_add(struct cdev *p, dev_t dev, unsigned count)
>  {
> - int error;
> + int error, i;
>  
>   p->dev = dev;
>   p->count = count;
> @@ -486,14 +492,22 @@ int cdev_add(struct cdev *p, dev_t dev, unsigned count)
>   if (WARN_ON(dev == WHITEOUT_DEV))
>   return -EBUSY;
>  
> - error = kobj_map(cdev_map, dev, count, NULL,
> -  exact_match, exact_lock, p);
> - if (error)
> - return error;
> + mutex_lock(_lock);
> + for (i = 0; i < count; i++) {
> + error = xa_insert(_map, dev + i, p, GFP_KERNEL);
> + if (error)
> +     goto out_unwind;
> + }
> + mutex_unlock(_lock);
>  
>   kobject_get(p->kobj.parent);
> -
>   return 0;
> +
> +out_unwind:
> + while (--i >= 0)
> + xa_erase(_map, dev + i);
> + mutex_unlock(_lock);
> + return error;
>  }
>  
>  /**
Do you really need the mutex?
Wouldn't xa_store_range() be better and avoid the mutex?

Cheers,

Hannes
-- 
Dr. Hannes ReineckeKernel Storage Architect
h...@suse.de  +49 911 74053 688
SUSE Software Solutions Germany GmbH, Maxfeldstr. 5, 90409 Nürnberg
HRB 36809 (AG Nürnberg), GF: Felix Imendörffer


Re: [PATCH 3/3] nvme: don't call revalidate_disk from nvme_set_queue_dying

2020-08-24 Thread Hannes Reinecke
On 8/23/20 11:10 AM, Christoph Hellwig wrote:
> In nvme_set_queue_dying we really just want to ensure the disk and bdev
> sizes are set to zero.  Going through revalidate_disk leads to a somewhat
> arcance and complex callchain relying on special behavior in a few
> places.  Instead just lift the set_capacity directly to
> nvme_set_queue_dying, and rename and move the nvme_mpath_update_disk_size
> helper so that we can use it in nvme_set_queue_dying to propagate the
> size to the bdev without detours.
> 
> Signed-off-by: Christoph Hellwig 
> ---
>  drivers/nvme/host/core.c | 33 +++--
>  drivers/nvme/host/nvme.h | 13 -
>  2 files changed, 23 insertions(+), 23 deletions(-)
> 
YES!
I've been bitten by this far too often.

Reviewed-by: Hannes Reinecke 

Cheers,

Hannes
-- 
Dr. Hannes ReineckeKernel Storage Architect
h...@suse.de  +49 911 74053 688
SUSE Software Solutions Germany GmbH, Maxfeldstr. 5, 90409 Nürnberg
HRB 36809 (AG Nürnberg), GF: Felix Imendörffer


Re: [PATCH 2/3] block: fix locking for struct block_device size updates

2020-08-24 Thread Hannes Reinecke
On 8/23/20 11:10 AM, Christoph Hellwig wrote:
> Two different callers use two different mutexes for updating the
> block device size, which obviously doesn't help to actually protect
> against concurrent updates from the different callers.  In addition
> one of the locks, bd_mutex is rather prone to deadlocks with other
> parts of the block stack that use it for high level synchronization.
> 
> Switch to using a new spinlock protecting just the size updates, as
> that is all we need, and make sure everyone does the update through
> the proper helper.
> 
> This fixes a bug reported with the nvme revalidating disks during a
> hot removal operation, which can currently deadlock on bd_mutex.
> 
> Reported-by: Xianting Tian 
> Signed-off-by: Christoph Hellwig 
> ---
>  block/partitions/core.c |  4 ++--
>  drivers/block/aoe/aoecmd.c  |  4 +---
>  drivers/md/dm.c | 15 ++-
>  drivers/s390/block/dasd_ioctl.c |  9 ++---
>  fs/block_dev.c  | 25 ++---
>  include/linux/blk_types.h   |  1 +
>  6 files changed, 22 insertions(+), 36 deletions(-)
> Reviewed-by: Hannes Reinecke 

Cheers,

Hannes
-- 
Dr. Hannes ReineckeKernel Storage Architect
h...@suse.de  +49 911 74053 688
SUSE Software Solutions Germany GmbH, Maxfeldstr. 5, 90409 Nürnberg
HRB 36809 (AG Nürnberg), GF: Felix Imendörffer


Re: [PATCH 2/2] block: fix locking for struct block_device size updates

2020-08-21 Thread Hannes Reinecke
On 8/21/20 10:56 AM, Christoph Hellwig wrote:
> Two different callers use two different mutexes for updating the
> block device size, which obviously doesn't help to actually protect
> against concurrent updates from the different callers.  In addition
> one of the locks, bd_mutex is rather prone to deadlocks with other
> parts of the block stack that use it for high level synchronization.
> 
> Switch to using a new spinlock protecting just the size updates, as
> that is all we need, and make sure everyone does the update through
> the proper helper.
> 
> This fixeѕ a bug reported with the nvme revalidating disks during a
> hot removal operation.
> 
> Reported-by: Xianting Tian 
> Signed-off-by: Christoph Hellwig 
> ---
>  block/partitions/core.c |  4 ++--
>  drivers/block/aoe/aoecmd.c  |  4 +---
>  drivers/md/dm.c | 15 ++-
>  drivers/s390/block/dasd_ioctl.c |  9 ++---
>  fs/block_dev.c  | 18 +-
>  include/linux/blk_types.h   |  1 +
>  6 files changed, 17 insertions(+), 34 deletions(-)
> 
Reviewed-by: Hannes Reinecke 

Cheers,

Hannes
-- 
Dr. Hannes ReineckeKernel Storage Architect
h...@suse.de  +49 911 74053 688
SUSE Software Solutions Germany GmbH, Maxfeldstr. 5, 90409 Nürnberg
HRB 36809 (AG Nürnberg), GF: Felix Imendörffer


Re: [PATCH 1/2] block: replace bd_set_size with bd_set_nr_sectors

2020-08-21 Thread Hannes Reinecke
On 8/21/20 10:55 AM, Christoph Hellwig wrote:
> Replace bd_set_size with a version that takes the number of sectors
> instead, as that fits most of the current and future callers much better.
> 
> Signed-off-by: Christoph Hellwig 
> ---
>  drivers/block/loop.c |  4 ++--
>  drivers/block/nbd.c  |  7 ---
>  drivers/block/pktcdvd.c  |  2 +-
>  drivers/nvme/host/nvme.h |  2 +-
>  fs/block_dev.c   | 10 +-
>  include/linux/genhd.h|  2 +-
>  6 files changed, 14 insertions(+), 13 deletions(-)
> 
Reviewed-by: Hannes Reinecke 

Cheers,

Hannes
-- 
Dr. Hannes ReineckeKernel Storage Architect
h...@suse.de  +49 911 74053 688
SUSE Software Solutions Germany GmbH, Maxfeldstr. 5, 90409 Nürnberg
HRB 36809 (AG Nürnberg), GF: Felix Imendörffer


Re: [PATCH v3] block: check queue's limits.discard_granularity in __blkdev_issue_discard()

2020-08-05 Thread Hannes Reinecke

On 8/5/20 8:31 AM, Coly Li wrote:

If create a loop device with a backing NVMe SSD, current loop device
driver doesn't correctly set its  queue's limits.discard_granularity and
leaves it as 0. If a discard request at LBA 0 on this loop device, in
__blkdev_issue_discard() the calculated req_sects will be 0, and a zero
length discard request will trigger a BUG() panic in generic block layer
code at block/blk-mq.c:563.

[  955.565006][   C39] [ cut here ]
[  955.559660][   C39] invalid opcode:  [#1] SMP NOPTI
[  955.622171][   C39] CPU: 39 PID: 248 Comm: ksoftirqd/39 Tainted: G   
 E 5.8.0-default+ #40
[  955.622171][   C39] Hardware name: Lenovo ThinkSystem SR650 
-[7X05CTO1WW]-/-[7X05CTO1WW]-, BIOS -[IVE160M-2.70]- 07/17/2020
[  955.622175][   C39] RIP: 0010:blk_mq_end_request+0x107/0x110
[  955.622177][   C39] Code: 48 8b 03 e9 59 ff ff ff 48 89 df 5b 5d 41 5c e9 9f ed ff 
ff 48 8b 35 98 3c f4 00 48 83 c7 10 48 83 c6 19 e8 cb 56 c9 ff eb cb <0f> 0b 0f 
1f 80 00 00 00 00 0f 1f 44 00 00 55 48 89 e5 41 56 41 54
[  955.622179][   C39] RSP: 0018:b1288701fe28 EFLAGS: 00010202
[  955.749277][   C39] RAX: 0001 RBX: 956fffba5080 RCX: 
4003
[  955.749278][   C39] RDX: 0003 RSI:  RDI: 

[  955.749279][   C39] RBP:  R08:  R09: 

[  955.749279][   C39] R10: b1288701fd28 R11: 0001 R12: 
a8e05160
[  955.749280][   C39] R13: 0004 R14: 0004 R15: 
a7ad3a1e
[  955.749281][   C39] FS:  () GS:95bfbda0() 
knlGS:
[  955.749282][   C39] CS:  0010 DS:  ES:  CR0: 80050033
[  955.749282][   C39] CR2: 7f6f0ef766a8 CR3: 005a37012002 CR4: 
007606e0
[  955.749283][   C39] DR0:  DR1:  DR2: 

[  955.749284][   C39] DR3:  DR6: fffe0ff0 DR7: 
0400
[  955.749284][   C39] PKRU: 5554
[  955.749285][   C39] Call Trace:
[  955.749290][   C39]  blk_done_softirq+0x99/0xc0
[  957.550669][   C39]  __do_softirq+0xd3/0x45f
[  957.550677][   C39]  ? smpboot_thread_fn+0x2f/0x1e0
[  957.550679][   C39]  ? smpboot_thread_fn+0x74/0x1e0
[  957.550680][   C39]  ? smpboot_thread_fn+0x14e/0x1e0
[  957.550684][   C39]  run_ksoftirqd+0x30/0x60
[  957.550687][   C39]  smpboot_thread_fn+0x149/0x1e0
[  957.886225][   C39]  ? sort_range+0x20/0x20
[  957.886226][   C39]  kthread+0x137/0x160
[  957.886228][   C39]  ? kthread_park+0x90/0x90
[  957.886231][   C39]  ret_from_fork+0x22/0x30
[  959.117120][   C39] ---[ end trace 3dacdac97e2ed164 ]---

This is the procedure to reproduce the panic,
   # modprobe scsi_debug delay=0 dev_size_mb=2048 max_queue=1
   # losetup -f /dev/nvme0n1 --direct-io=on
   # blkdiscard /dev/loop0 -o 0 -l 0x200

This patch fixes the issue by checking q->limits.discard_granularity in
__blkdev_issue_discard() before composing the discard bio. If the value
is 0, then prints a warning oops information and returns -EOPNOTSUPP to
the caller to indicate that this buggy device driver doesn't support
discard request.

Fixes: 9b15d109a6b2 ("block: improve discard bio alignment in 
__blkdev_issue_discard()")
Fixes: c52abf563049 ("loop: Better discard support for block devices")
Reported-and-suggested-by: Ming Lei 
Signed-off-by: Coly Li 
Reviewed-by: Ming Lei 
Cc: Bart Van Assche 
Cc: Christoph Hellwig 
Cc: Enzo Matsumiya 
Cc: Evan Green 
Cc: Hannes Reinecke 
Cc: Jens Axboe 
Cc: Martin K. Petersen 
Cc: Xiao Ni 
---
Changelog:
v3: print device name assocated with the buggy driver.
v2: fix typo of the wrong return error code.
v1: first version.

  block/blk-lib.c | 9 +
  1 file changed, 9 insertions(+)

diff --git a/block/blk-lib.c b/block/blk-lib.c
index 019e09bb9c0e..d3bbb3d9fac3 100644
--- a/block/blk-lib.c
+++ b/block/blk-lib.c
@@ -47,6 +47,15 @@ int __blkdev_issue_discard(struct block_device *bdev, 
sector_t sector,
op = REQ_OP_DISCARD;
}
  
+	/* In case the discard granularity isn't set by buggy device driver */

+   if (WARN_ON_ONCE(!q->limits.discard_granularity)) {
+   char dev_name[BDEVNAME_SIZE];
+
+   bdevname(bdev, dev_name);
+   pr_err("%s: Error: discard_granularity is 0.\n", dev_name);
+   return -EOPNOTSUPP;
+   }
+
bs_mask = (bdev_logical_block_size(bdev) >> 9) - 1;
if ((sector | nr_sects) & bs_mask)
    return -EINVAL;


Reviewed-by: Hannes Reinecke 

Cheers,

Hannes
--
Dr. Hannes ReineckeTeamlead Storage & Networking
h...@suse.de   +49 911 74053 688
SUSE Software Solutions GmbH, Maxfeldstr. 5, 90409 Nürnberg
HRB 36809 (AG Nürnberg), Geschäftsführer: Felix Imendörffer


Re: [PATCH] block: Use non _rcu version of list functions for tag_set_list

2020-07-28 Thread Hannes Reinecke

On 7/28/20 3:29 PM, Daniel Wagner wrote:

tag_set_list is only accessed under the tag_set_lock lock. There is
no need for using the _rcu list functions.

The _rcu list function were introduced to allow read access to the
tag_set_list protected under RCU, see 705cda97ee3a ("blk-mq: Make it
safe to use RCU to iterate over blk_mq_tag_set.tag_list") and
05b79413946d ("Revert "blk-mq: don't handle TAG_SHARED in restart"").
Those changes got reverted later but the cleanup commit missed a
couple of places to undo the changes.

Fixes: 97889f9ac24f ("blk-mq: remove synchronize_rcu() from 
blk_mq_del_queue_tag_set()"
Cc: Ming Lei 
Signed-off-by: Daniel Wagner 
---
  block/blk-mq.c | 4 ++--
  1 file changed, 2 insertions(+), 2 deletions(-)

diff --git a/block/blk-mq.c b/block/blk-mq.c
index e32cb0217135..14ee7506f32f 100644
--- a/block/blk-mq.c
+++ b/block/blk-mq.c
@@ -2792,7 +2792,7 @@ static void blk_mq_del_queue_tag_set(struct request_queue 
*q)
struct blk_mq_tag_set *set = q->tag_set;
  
  	mutex_lock(>tag_list_lock);

-   list_del_rcu(>tag_set_list);
+   list_del(>tag_set_list);
if (list_is_singular(>tag_list)) {
/* just transitioned to unshared */
set->flags &= ~BLK_MQ_F_TAG_SHARED;
@@ -2819,7 +2819,7 @@ static void blk_mq_add_queue_tag_set(struct 
blk_mq_tag_set *set,
}
if (set->flags & BLK_MQ_F_TAG_SHARED)
queue_set_hctx_shared(q, true);
-   list_add_tail_rcu(>tag_set_list, >tag_list);
+   list_add_tail(>tag_set_list, >tag_list);
  
  	mutex_unlock(>tag_list_lock);

  }


Indeed.

Reviewed-by: Hannes Reinecke 

Cheers,

Hannes
--
Dr. Hannes ReineckeTeamlead Storage & Networking
h...@suse.de   +49 911 74053 688
SUSE Software Solutions GmbH, Maxfeldstr. 5, 90409 Nürnberg
HRB 36809 (AG Nürnberg), Geschäftsführer: Felix Imendörffer


Re: [PATCH v2 24/24] scsi: aic7xxx: aic79xx_osm: Remove set but unused variabes 'saved_scsiid' and 'saved_modes'

2020-07-21 Thread Hannes Reinecke

On 7/15/20 1:03 AM, James Bottomley wrote:

On Tue, 2020-07-14 at 22:39 +0100, Lee Jones wrote:

On Tue, 14 Jul 2020, James Bottomley wrote:


On Tue, 2020-07-14 at 09:46 +0200, Hannes Reinecke wrote:

On 7/13/20 10:00 AM, Lee Jones wrote:

Haven't been used since 2006.

Fixes the following W=1 kernel build warning(s):

  drivers/scsi/aic7xxx/aic79xx_osm.c: In function
‘ahd_linux_queue_abort_cmd’:
  drivers/scsi/aic7xxx/aic79xx_osm.c:2155:17: warning: variable
‘saved_modes’ set but not used [-Wunused-but-set-variable]
  drivers/scsi/aic7xxx/aic79xx_osm.c:2148:9: warning: variable
‘saved_scsiid’ set but not used [-Wunused-but-set-variable]

Cc: Hannes Reinecke 
Signed-off-by: Lee Jones 
---
  drivers/scsi/aic7xxx/aic79xx_osm.c | 4 
  1 file changed, 4 deletions(-)

diff --git a/drivers/scsi/aic7xxx/aic79xx_osm.c
b/drivers/scsi/aic7xxx/aic79xx_osm.c
index 3782a20d58885..140c4e74ddd7e 100644
--- a/drivers/scsi/aic7xxx/aic79xx_osm.c
+++ b/drivers/scsi/aic7xxx/aic79xx_osm.c
@@ -2141,14 +2141,12 @@ ahd_linux_queue_abort_cmd(struct
scsi_cmnd
*cmd)
u_int  saved_scbptr;
u_int  active_scbptr;
u_int  last_phase;
-   u_int  saved_scsiid;
u_int  cdb_byte;
intretval;
intwas_paused;
intpaused;
intwait;
intdisconnected;
-   ahd_mode_state saved_modes;
unsigned long flags;
  
  	pending_scb = NULL;

@@ -2239,7 +2237,6 @@ ahd_linux_queue_abort_cmd(struct
scsi_cmnd
*cmd)
goto done;
}
  
-	saved_modes = ahd_save_modes(ahd);

ahd_set_modes(ahd, AHD_MODE_SCSI, AHD_MODE_SCSI);
last_phase = ahd_inb(ahd, LASTPHASE);
saved_scbptr = ahd_get_scbptr(ahd);
@@ -2257,7 +2254,6 @@ ahd_linux_queue_abort_cmd(struct
scsi_cmnd
*cmd)
 * passed in command.  That command is currently
active on
the
 * bus or is in the disconnected state.
 */
-   saved_scsiid = ahd_inb(ahd, SAVED_SCSIID);
if (last_phase != P_BUSFREE
&& SCB_GET_TAG(pending_scb) == active_scbptr) {
  



Reviewed-by: Hannes Reinecke 


Hey, you don't get to do that ... I asked you to figure out why
we're missing an ahd_restore_modes().  Removing the
ahd_save_modes() is cosmetic: it gets rid of a warning but doesn't
fix the problem.  I'd rather keep the warning until the problem is
fixed and the problem is we need a mode save/restore around the
ahd_set_modes() which is only partially implemented in this
function.


I had a look.  Traced it back to the dawn of time (time == Git), then
delved even further back by downloading and trawling through ~10-15
tarballs.  It looks as though drivers/scsi/aic7xxx/aic79xx_osm.c was
upstreamed in v2.5.60, nearly 20 years ago.  'saved_modes' has been
unused since at least then.  If no one has complained in 2 decades,
I'd say it probably isn't an issue worth perusing.


Well, we have what looks like a fix now.  The reason it matters is that
  if the bus is not in AHD_MODE_SCSI when the routine is called, it ends
up being in a wrong state when the routine exits, so you abort one
command and screw up another.  Ultimately I bet escalation to bus reset
fixes it, so everything appears to work, but it might have worked a lot
better if the original mode were restored.

This is error handling code, so most installations run just fine
without ever invoking it.

And since you mention it, it might also explain why every time this 
routine got invoked it failed to fixup anything and got escalated to bus 
reset.
(And yes, I _did_ have some customer issues with aic79xx EH escalations 
over the years).


Testing will be tricky, though, as you'd have to inject timeouts onto 
the parallel bus or device.


Cheers,

Hannes
--
Dr. Hannes ReineckeTeamlead Storage & Networking
h...@suse.de   +49 911 74053 688
SUSE Software Solutions GmbH, Maxfeldstr. 5, 90409 Nürnberg
HRB 36809 (AG Nürnberg), Geschäftsführer: Felix Imendörffer


Re: [PATCH v2 24/24] scsi: aic7xxx: aic79xx_osm: Remove set but unused variabes 'saved_scsiid' and 'saved_modes'

2020-07-14 Thread Hannes Reinecke

On 7/14/20 11:39 PM, Lee Jones wrote:

On Tue, 14 Jul 2020, James Bottomley wrote:


On Tue, 2020-07-14 at 09:46 +0200, Hannes Reinecke wrote:

On 7/13/20 10:00 AM, Lee Jones wrote:

Haven't been used since 2006.

Fixes the following W=1 kernel build warning(s):

  drivers/scsi/aic7xxx/aic79xx_osm.c: In function
‘ahd_linux_queue_abort_cmd’:
  drivers/scsi/aic7xxx/aic79xx_osm.c:2155:17: warning: variable
‘saved_modes’ set but not used [-Wunused-but-set-variable]
  drivers/scsi/aic7xxx/aic79xx_osm.c:2148:9: warning: variable
‘saved_scsiid’ set but not used [-Wunused-but-set-variable]

Cc: Hannes Reinecke 
Signed-off-by: Lee Jones 
---
  drivers/scsi/aic7xxx/aic79xx_osm.c | 4 
  1 file changed, 4 deletions(-)

diff --git a/drivers/scsi/aic7xxx/aic79xx_osm.c
b/drivers/scsi/aic7xxx/aic79xx_osm.c
index 3782a20d58885..140c4e74ddd7e 100644
--- a/drivers/scsi/aic7xxx/aic79xx_osm.c
+++ b/drivers/scsi/aic7xxx/aic79xx_osm.c
@@ -2141,14 +2141,12 @@ ahd_linux_queue_abort_cmd(struct scsi_cmnd
*cmd)
u_int  saved_scbptr;
u_int  active_scbptr;
u_int  last_phase;
-   u_int  saved_scsiid;
u_int  cdb_byte;
intretval;
intwas_paused;
intpaused;
intwait;
intdisconnected;
-   ahd_mode_state saved_modes;
unsigned long flags;
  
  	pending_scb = NULL;

@@ -2239,7 +2237,6 @@ ahd_linux_queue_abort_cmd(struct scsi_cmnd
*cmd)
goto done;
}
  
-	saved_modes = ahd_save_modes(ahd);

ahd_set_modes(ahd, AHD_MODE_SCSI, AHD_MODE_SCSI);
last_phase = ahd_inb(ahd, LASTPHASE);
saved_scbptr = ahd_get_scbptr(ahd);
@@ -2257,7 +2254,6 @@ ahd_linux_queue_abort_cmd(struct scsi_cmnd
*cmd)
 * passed in command.  That command is currently active on
the
 * bus or is in the disconnected state.
 */
-   saved_scsiid = ahd_inb(ahd, SAVED_SCSIID);
if (last_phase != P_BUSFREE
&& SCB_GET_TAG(pending_scb) == active_scbptr) {
  



Reviewed-by: Hannes Reinecke 


Hey, you don't get to do that ... I asked you to figure out why we're
missing an ahd_restore_modes().  Removing the ahd_save_modes() is
cosmetic: it gets rid of a warning but doesn't fix the problem.  I'd
rather keep the warning until the problem is fixed and the problem is
we need a mode save/restore around the ahd_set_modes() which is only
partially implemented in this function.


I had a look.  Traced it back to the dawn of time (time == Git), then
delved even further back by downloading and trawling through ~10-15
tarballs.  It looks as though drivers/scsi/aic7xxx/aic79xx_osm.c was
upstreamed in v2.5.60, nearly 20 years ago.  'saved_modes' has been
unused since at least then.  If no one has complained in 2 decades,
I'd say it probably isn't an issue worth perusing.

That's not really the point; this function is the first stage of error 
recovery. And the only real way of exercising this is to inject a 
command timeout, which is nearly impossible without dedicated hardware.
So this function will have a very limited exposure, but nevertheless a 
quite crucial function.
Hence I'm not quite agree with your reasoning, and rather would have it 
fixed.
But as we're having an alternative fix now, it might be best if you 
could drop it from your patchset and we'll fix it separately.


Cheers,

Hannes
--
Dr. Hannes ReineckeTeamlead Storage & Networking
h...@suse.de   +49 911 74053 688
SUSE Software Solutions GmbH, Maxfeldstr. 5, 90409 Nürnberg
HRB 36809 (AG Nürnberg), Geschäftsführer: Felix Imendörffer


Re: [PATCH v2.1 05/29] scsi: fcoe: fcoe_ctlr: Fix a myriad of documentation issues

2020-07-14 Thread Hannes Reinecke

On 7/14/20 5:07 PM, Lee Jones wrote:

Mostly missing or incorrect (bitrotted) function parameters.

Fixes the following W=1 kernel build warning(s):

  drivers/scsi/fcoe/fcoe_ctlr.c:139: warning: Function parameter or member 
'mode' not described in 'fcoe_ctlr_init'
  drivers/scsi/fcoe/fcoe_ctlr.c:604: warning: Function parameter or member 
'lport' not described in 'fcoe_ctlr_encaps'
  drivers/scsi/fcoe/fcoe_ctlr.c:1312: warning: Function parameter or member 
'skb' not described in 'fcoe_ctlr_recv_clr_vlink'
  drivers/scsi/fcoe/fcoe_ctlr.c:1312: warning: Excess function parameter 'fh' 
description in 'fcoe_ctlr_recv_clr_vlink'
  drivers/scsi/fcoe/fcoe_ctlr.c:1781: warning: Function parameter or member 't' 
not described in 'fcoe_ctlr_timeout'
  drivers/scsi/fcoe/fcoe_ctlr.c:1781: warning: Excess function parameter 'arg' 
description in 'fcoe_ctlr_timeout'
  drivers/scsi/fcoe/fcoe_ctlr.c:1904: warning: Function parameter or member 
'lport' not described in 'fcoe_ctlr_recv_flogi'
  drivers/scsi/fcoe/fcoe_ctlr.c:2166: warning: Function parameter or member 
'lport' not described in 'fcoe_ctlr_disc_stop_locked'
  drivers/scsi/fcoe/fcoe_ctlr.c:2166: warning: Excess function parameter 'fip' 
description in 'fcoe_ctlr_disc_stop_locked'
  drivers/scsi/fcoe/fcoe_ctlr.c:2188: warning: Function parameter or member 
'lport' not described in 'fcoe_ctlr_disc_stop'
  drivers/scsi/fcoe/fcoe_ctlr.c:2188: warning: Excess function parameter 'fip' 
description in 'fcoe_ctlr_disc_stop'
  drivers/scsi/fcoe/fcoe_ctlr.c:2204: warning: Function parameter or member 
'lport' not described in 'fcoe_ctlr_disc_stop_final'
  drivers/scsi/fcoe/fcoe_ctlr.c:2204: warning: Excess function parameter 'fip' 
description in 'fcoe_ctlr_disc_stop_final'
  drivers/scsi/fcoe/fcoe_ctlr.c:2273: warning: Function parameter or member 
'frport' not described in 'fcoe_ctlr_vn_parse'
  drivers/scsi/fcoe/fcoe_ctlr.c:2273: warning: Excess function parameter 
'rdata' description in 'fcoe_ctlr_vn_parse'
  drivers/scsi/fcoe/fcoe_ctlr.c:2804: warning: Function parameter or member 
'frport' not described in 'fcoe_ctlr_vlan_parse'
  drivers/scsi/fcoe/fcoe_ctlr.c:2804: warning: Excess function parameter 
'rdata' description in 'fcoe_ctlr_vlan_parse'
  drivers/scsi/fcoe/fcoe_ctlr.c:2900: warning: Excess function parameter 
'min_len' description in 'fcoe_ctlr_vlan_send'
  drivers/scsi/fcoe/fcoe_ctlr.c:2977: warning: Function parameter or member 
'fip' not described in 'fcoe_ctlr_vlan_recv'
  drivers/scsi/fcoe/fcoe_ctlr.c:2977: warning: Function parameter or member 
'skb' not described in 'fcoe_ctlr_vlan_recv'
  drivers/scsi/fcoe/fcoe_ctlr.c:2977: warning: Excess function parameter 
'lport' description in 'fcoe_ctlr_vlan_recv'
  drivers/scsi/fcoe/fcoe_ctlr.c:2977: warning: Excess function parameter 'fp' 
description in 'fcoe_ctlr_vlan_recv'
  drivers/scsi/fcoe/fcoe_ctlr.c:3033: warning: Function parameter or member 
'callback' not described in 'fcoe_ctlr_disc_start'
  drivers/scsi/fcoe/fcoe_ctlr.c:3033: warning: Function parameter or member 
'lport' not described in 'fcoe_ctlr_disc_start'
  drivers/scsi/fcoe/fcoe_ctlr.c:3033: warning: Excess function parameter 'fip' 
description in 'fcoe_ctlr_disc_start'

Cc: Hannes Reinecke 
Signed-off-by: Lee Jones 
---
Changelog:

v2
  - Rename title s/fcoe_ctlr_disc_recv/fcoe_ctlr_disc_start/ while we're at it

  drivers/scsi/fcoe/fcoe_ctlr.c | 28 ++--
  1 file changed, 14 insertions(+), 14 deletions(-)


Reviewed-by: Hannes Reinecke 

Cheers,

Hannes
--
Dr. Hannes ReineckeTeamlead Storage & Networking
h...@suse.de   +49 911 74053 688
SUSE Software Solutions GmbH, Maxfeldstr. 5, 90409 Nürnberg
HRB 36809 (AG Nürnberg), Geschäftsführer: Felix Imendörffer


Re: [PATCH v2 04/29] scsi: fcoe: fcoe: Fix various kernel-doc infringements

2020-07-14 Thread Hannes Reinecke
On 7/14/20 9:58 AM, Lee Jones wrote:
> On Tue, 14 Jul 2020, Hannes Reinecke wrote:
> 
>> On 7/13/20 9:46 AM, Lee Jones wrote:
>>> A couple of headers make no attempt to document their associated function
>>> parameters.  Others looks as if they are suffering with a little bitrot.
>>>
>>> Fixes the following W=1 kernel build warning(s):
>>>
>>>   drivers/scsi/fcoe/fcoe.c:654: warning: Function parameter or member 
>>> 'lport' not described in 'fcoe_netdev_features_change'
>>>   drivers/scsi/fcoe/fcoe.c:654: warning: Function parameter or member 
>>> 'netdev' not described in 'fcoe_netdev_features_change'
>>>   drivers/scsi/fcoe/fcoe.c:2039: warning: Function parameter or member 
>>> 'ctlr_dev' not described in 'fcoe_ctlr_mode'
>>>   drivers/scsi/fcoe/fcoe.c:2039: warning: Excess function parameter 'cdev' 
>>> description in 'fcoe_ctlr_mode'
>>>   drivers/scsi/fcoe/fcoe.c:2144: warning: Function parameter or member 
>>> 'fcoe' not described in 'fcoe_dcb_create'
>>>   drivers/scsi/fcoe/fcoe.c:2144: warning: Excess function parameter 
>>> 'netdev' description in 'fcoe_dcb_create'
>>>   drivers/scsi/fcoe/fcoe.c:2144: warning: Excess function parameter 'port' 
>>> description in 'fcoe_dcb_create'
>>>   drivers/scsi/fcoe/fcoe.c:2627: warning: Function parameter or member 
>>> 'lport' not described in 'fcoe_elsct_send'
>>>   drivers/scsi/fcoe/fcoe.c:2627: warning: Function parameter or member 
>>> 'did' not described in 'fcoe_elsct_send'
>>>   drivers/scsi/fcoe/fcoe.c:2627: warning: Function parameter or member 'fp' 
>>> not described in 'fcoe_elsct_send'
>>>   drivers/scsi/fcoe/fcoe.c:2627: warning: Function parameter or member 'op' 
>>> not described in 'fcoe_elsct_send'
>>>   drivers/scsi/fcoe/fcoe.c:2627: warning: Function parameter or member 
>>> 'resp' not described in 'fcoe_elsct_send'
>>>   drivers/scsi/fcoe/fcoe.c:2627: warning: Function parameter or member 
>>> 'arg' not described in 'fcoe_elsct_send'
>>>   drivers/scsi/fcoe/fcoe.c:2627: warning: Function parameter or member 
>>> 'timeout' not described in 'fcoe_elsct_send'
>>>
>>> Cc: Hannes Reinecke 
>>> Signed-off-by: Lee Jones 
>>> ---
>>>   drivers/scsi/fcoe/fcoe.c | 10 --
>>>   1 file changed, 4 insertions(+), 6 deletions(-)
>>>
>>> diff --git a/drivers/scsi/fcoe/fcoe.c b/drivers/scsi/fcoe/fcoe.c
>>> index cb41d166e0c0f..0f9274960dc6b 100644
>>> --- a/drivers/scsi/fcoe/fcoe.c
>>> +++ b/drivers/scsi/fcoe/fcoe.c
>>> @@ -645,7 +645,7 @@ static int fcoe_lport_config(struct fc_lport *lport)
>>> return 0;
>>>   }
>>> -/**
>>> +/*
>>>* fcoe_netdev_features_change - Updates the lport's offload flags based
>>>* on the LLD netdev's FCoE feature flags
>>>*/
>>> @@ -2029,7 +2029,7 @@ static int fcoe_ctlr_enabled(struct fcoe_ctlr_device 
>>> *cdev)
>>>   /**
>>>* fcoe_ctlr_mode() - Switch FIP mode
>>> - * @cdev: The FCoE Controller that is being modified
>>> + * @ctlr_dev: The FCoE Controller that is being modified
>>>*
>>>* When the FIP mode has been changed we need to update
>>>* the multicast addresses to ensure we get the correct
>>> @@ -2136,9 +2136,7 @@ static bool fcoe_match(struct net_device *netdev)
>>>   /**
>>>* fcoe_dcb_create() - Initialize DCB attributes and hooks
>>> - * @netdev: The net_device object of the L2 link that should be queried
>>> - * @port: The fcoe_port to bind FCoE APP priority with
>>> - * @
>>> + * @fcoe:   The new FCoE interface
>>>*/
>>>   static void fcoe_dcb_create(struct fcoe_interface *fcoe)
>>>   {
>>> @@ -2609,7 +2607,7 @@ static void fcoe_logo_resp(struct fc_seq *seq, struct 
>>> fc_frame *fp, void *arg)
>>> fc_lport_logo_resp(seq, fp, lport);
>>>   }
>>> -/**
>>> +/*
>>>* fcoe_elsct_send - FCoE specific ELS handler
>>>*
>>>* This does special case handling of FIP encapsualted ELS exchanges for 
>>> FCoE,
>>>
>> I'd rather convert this and the fcoe_netdev_features_change to proper
>> kerneldocs:
>>
>> diff --git a/drivers/scsi/fcoe/fcoe.c b/drivers/scsi/fcoe/fcoe.c
>> index cb41d166e0c0..151fe4c53b07 100644
>> --- a/drivers/scsi/fcoe/fcoe.c
>> +++ b/drivers/scsi/fcoe/fcoe.c
>> @@ -646,8 +646,12 @@ static int fcoe_lport_config(struct fc_lport *lport)
>>  }
>>
>

Re: [PATCH v2 21/24] scsi: aic7xxx: aic79xx_osm: Remove unused variable 'ahd'

2020-07-14 Thread Hannes Reinecke
On 7/13/20 9:59 AM, Lee Jones wrote:
> Hasn't been used since 2005.
> 
> Fixes the following W=1 kernel build warning(s):
> 
>  drivers/scsi/aic7xxx/aic79xx_osm.c: In function ‘ahd_linux_slave_configure’:
>  drivers/scsi/aic7xxx/aic79xx_osm.c:703:20: warning: variable ‘ahd’ set but 
> not used [-Wunused-but-set-variable]
> 
> Cc: Hannes Reinecke 
> Signed-off-by: Lee Jones 
> ---
>  drivers/scsi/aic7xxx/aic79xx_osm.c | 3 ---
>  1 file changed, 3 deletions(-)
> 
> diff --git a/drivers/scsi/aic7xxx/aic79xx_osm.c 
> b/drivers/scsi/aic7xxx/aic79xx_osm.c
> index dc4fe334efd01..9235b6283c0b3 100644
> --- a/drivers/scsi/aic7xxx/aic79xx_osm.c
> +++ b/drivers/scsi/aic7xxx/aic79xx_osm.c
> @@ -700,9 +700,6 @@ ahd_linux_slave_alloc(struct scsi_device *sdev)
>  static int
>  ahd_linux_slave_configure(struct scsi_device *sdev)
>  {
> - struct  ahd_softc *ahd;
> -
> - ahd = *((struct ahd_softc **)sdev->host->hostdata);
>   if (bootverbose)
>   sdev_printk(KERN_INFO, sdev, "Slave Configure\n");
>  
> 
Reviewed-by: Hannes Reinecke 

Cheers,

Hannes
-- 
Dr. Hannes ReineckeKernel Storage Architect
h...@suse.de  +49 911 74053 688
SUSE Software Solutions Germany GmbH, Maxfeldstr. 5, 90409 Nürnberg
HRB 36809 (AG Nürnberg), GF: Felix Imendörffer


Re: [PATCH v2 22/24] scsi: aic7xxx: aic79xx_osm: Remove unused variables 'wait' and 'paused'

2020-07-14 Thread Hannes Reinecke
On 7/13/20 9:59 AM, Lee Jones wrote:
> It looks like they have never actually been used.
> 
> Fixes the following W=1 kernel build warning(s):
> 
>  drivers/scsi/aic7xxx/aic79xx_osm.c: In function ‘ahd_linux_dev_reset’:
>  drivers/scsi/aic7xxx/aic79xx_osm.c:782:9: warning: variable ‘wait’ set but 
> not used [-Wunused-but-set-variable]
>  drivers/scsi/aic7xxx/aic79xx_osm.c:781:9: warning: variable ‘paused’ set but 
> not used [-Wunused-but-set-variable]
> 
> Cc: Hannes Reinecke 
> Signed-off-by: Lee Jones 
> ---
>  drivers/scsi/aic7xxx/aic79xx_osm.c | 5 +
>  1 file changed, 1 insertion(+), 4 deletions(-)
> 
> diff --git a/drivers/scsi/aic7xxx/aic79xx_osm.c 
> b/drivers/scsi/aic7xxx/aic79xx_osm.c
> index 9235b6283c0b3..8e43ff86e0a60 100644
> --- a/drivers/scsi/aic7xxx/aic79xx_osm.c
> +++ b/drivers/scsi/aic7xxx/aic79xx_osm.c
> @@ -775,16 +775,13 @@ ahd_linux_dev_reset(struct scsi_cmnd *cmd)
>   struct scb *reset_scb;
>   u_int  cdb_byte;
>   intretval = SUCCESS;
> - intpaused;
> - intwait;
>   struct  ahd_initiator_tinfo *tinfo;
>   struct  ahd_tmode_tstate *tstate;
>   unsigned long flags;
>   DECLARE_COMPLETION_ONSTACK(done);
>  
>   reset_scb = NULL;
> - paused = FALSE;
> - wait = FALSE;
> +
>   ahd = *(struct ahd_softc **)cmd->device->host->hostdata;
>  
>   scmd_printk(KERN_INFO, cmd,
> 
Reviewed-by: Hannes Reinecke 

Cheers,

Hannes
-- 
Dr. Hannes ReineckeKernel Storage Architect
h...@suse.de  +49 911 74053 688
SUSE Software Solutions Germany GmbH, Maxfeldstr. 5, 90409 Nürnberg
HRB 36809 (AG Nürnberg), GF: Felix Imendörffer


Re: [PATCH v2 24/24] scsi: aic7xxx: aic79xx_osm: Remove set but unused variabes 'saved_scsiid' and 'saved_modes'

2020-07-14 Thread Hannes Reinecke
On 7/13/20 10:00 AM, Lee Jones wrote:
> Haven't been used since 2006.
> 
> Fixes the following W=1 kernel build warning(s):
> 
>  drivers/scsi/aic7xxx/aic79xx_osm.c: In function ‘ahd_linux_queue_abort_cmd’:
>  drivers/scsi/aic7xxx/aic79xx_osm.c:2155:17: warning: variable ‘saved_modes’ 
> set but not used [-Wunused-but-set-variable]
>  drivers/scsi/aic7xxx/aic79xx_osm.c:2148:9: warning: variable ‘saved_scsiid’ 
> set but not used [-Wunused-but-set-variable]
> 
> Cc: Hannes Reinecke 
> Signed-off-by: Lee Jones 
> ---
>  drivers/scsi/aic7xxx/aic79xx_osm.c | 4 
>  1 file changed, 4 deletions(-)
> 
> diff --git a/drivers/scsi/aic7xxx/aic79xx_osm.c 
> b/drivers/scsi/aic7xxx/aic79xx_osm.c
> index 3782a20d58885..140c4e74ddd7e 100644
> --- a/drivers/scsi/aic7xxx/aic79xx_osm.c
> +++ b/drivers/scsi/aic7xxx/aic79xx_osm.c
> @@ -2141,14 +2141,12 @@ ahd_linux_queue_abort_cmd(struct scsi_cmnd *cmd)
>   u_int  saved_scbptr;
>   u_int  active_scbptr;
>   u_int  last_phase;
> - u_int  saved_scsiid;
>   u_int  cdb_byte;
>   intretval;
>   intwas_paused;
>   intpaused;
>   intwait;
>   intdisconnected;
> - ahd_mode_state saved_modes;
>   unsigned long flags;
>  
>   pending_scb = NULL;
> @@ -2239,7 +2237,6 @@ ahd_linux_queue_abort_cmd(struct scsi_cmnd *cmd)
>   goto done;
>   }
>  
> - saved_modes = ahd_save_modes(ahd);
>   ahd_set_modes(ahd, AHD_MODE_SCSI, AHD_MODE_SCSI);
>   last_phase = ahd_inb(ahd, LASTPHASE);
>   saved_scbptr = ahd_get_scbptr(ahd);
> @@ -2257,7 +2254,6 @@ ahd_linux_queue_abort_cmd(struct scsi_cmnd *cmd)
>* passed in command.  That command is currently active on the
>* bus or is in the disconnected state.
>*/
> - saved_scsiid = ahd_inb(ahd, SAVED_SCSIID);
>   if (last_phase != P_BUSFREE
>   && SCB_GET_TAG(pending_scb) == active_scbptr) {
>  
> 
Reviewed-by: Hannes Reinecke 

Cheers,

Hannes
-- 
Dr. Hannes ReineckeKernel Storage Architect
h...@suse.de  +49 911 74053 688
SUSE Software Solutions Germany GmbH, Maxfeldstr. 5, 90409 Nürnberg
HRB 36809 (AG Nürnberg), GF: Felix Imendörffer


Re: [PATCH v2 23/24] scsi: aic7xxx: aic79xx_osm: Fix 'amount_xferred' set but not used issue

2020-07-14 Thread Hannes Reinecke
On 7/13/20 10:00 AM, Lee Jones wrote:
> 'amount_xferred' is used, but only in certain circumstances.  Place
> the same stipulations on the defining/allocating of 'amount_xferred'
> as is placed when using it.
> 
> We've been careful not to change any of the ordering semantics here.
> 
> Fixes the following W=1 kernel build warning(s):
> 
>  drivers/scsi/aic7xxx/aic79xx_osm.c: In function ‘ahd_done’:
>  drivers/scsi/aic7xxx/aic79xx_osm.c:1796:12: warning: variable 
> ‘amount_xferred’ set but not used [-Wunused-but-set-variable]
> 
> Cc: Hannes Reinecke 
> Signed-off-by: Lee Jones 
> ---
>  drivers/scsi/aic7xxx/aic79xx_osm.c | 2 ++
>  1 file changed, 2 insertions(+)
> 
> diff --git a/drivers/scsi/aic7xxx/aic79xx_osm.c 
> b/drivers/scsi/aic7xxx/aic79xx_osm.c
> index 8e43ff86e0a60..3782a20d58885 100644
> --- a/drivers/scsi/aic7xxx/aic79xx_osm.c
> +++ b/drivers/scsi/aic7xxx/aic79xx_osm.c
> @@ -1787,10 +1787,12 @@ ahd_done(struct ahd_softc *ahd, struct scb *scb)
>*/
>   cmd->sense_buffer[0] = 0;
>   if (ahd_get_transaction_status(scb) == CAM_REQ_INPROG) {
> +#ifdef AHD_REPORT_UNDERFLOWS
>   uint32_t amount_xferred;
>  
>   amount_xferred =
>   ahd_get_transfer_length(scb) - ahd_get_residual(scb);
> +#endif
>   if ((scb->flags & SCB_TRANSMISSION_ERROR) != 0) {
>  #ifdef AHD_DEBUG
>       if ((ahd_debug & AHD_SHOW_MISC) != 0) {
> 
Reviewed-by: Hannes Reinecke 

Cheers,

Hannes
-- 
Dr. Hannes ReineckeKernel Storage Architect
h...@suse.de  +49 911 74053 688
SUSE Software Solutions Germany GmbH, Maxfeldstr. 5, 90409 Nürnberg
HRB 36809 (AG Nürnberg), GF: Felix Imendörffer


Re: [PATCH v2 15/24] scsi: myrs: Demote obvious misuse of kerneldoc to standard comment blocks

2020-07-14 Thread Hannes Reinecke
n parameter or member 'status' not 
> described in 'myrs_err_status'
>  drivers/scsi/myrs.c:2349: warning: Function parameter or member 'parm0' not 
> described in 'myrs_err_status'
>  drivers/scsi/myrs.c:2349: warning: Function parameter or member 'parm1' not 
> described in 'myrs_err_status'
> 
> Cc: Hannes Reinecke 
> Cc: Linux GmbH 

Please, do fix your mailer/script.
This is my company e-mail address, but my name is actually the same even
when working for the company ...

> Cc: "Leonard N. Zubkoff" 
> Signed-off-by: Lee Jones 
> ---
>  drivers/scsi/myrs.c | 34 +-
>  1 file changed, 17 insertions(+), 17 deletions(-)
> 
I had been wanting to convert this to proper kernel-doc style, but never
found the time to actually do it.
So this will serve for now.

Reviewed-by: Hannes Reinecke 

Cheers,

Hannes
-- 
Dr. Hannes ReineckeKernel Storage Architect
h...@suse.de  +49 911 74053 688
SUSE Software Solutions Germany GmbH, Maxfeldstr. 5, 90409 Nürnberg
HRB 36809 (AG Nürnberg), GF: Felix Imendörffer


Re: [PATCH v2 21/29] scsi: aic7xxx: aic7xxx_osm: Fix 'amount_xferred' set but not used issue

2020-07-14 Thread Hannes Reinecke

On 7/13/20 9:46 AM, Lee Jones wrote:

'amount_xferred' is used, but only in certain circumstances.  Place
the same stipulations on the defining/allocating of 'amount_xferred'
as is placed when using it.

We've been careful not to change any of the ordering semantics here.

Fixes the following W=1 kernel build warning(s):

  drivers/scsi/aic7xxx/aic7xxx_osm.c: In function ‘ahc_done’:
  drivers/scsi/aic7xxx/aic7xxx_osm.c:1725:12: warning: variable 
‘amount_xferred’ set but not used [-Wunused-but-set-variable]
  1725 | uint32_t amount_xferred;
  | ^~

Cc: Hannes Reinecke 
Cc: "Daniel M. Eischen" 
Cc: Doug Ledford 
Signed-off-by: Lee Jones 
---
  drivers/scsi/aic7xxx/aic7xxx_osm.c | 2 ++
  1 file changed, 2 insertions(+)

diff --git a/drivers/scsi/aic7xxx/aic7xxx_osm.c 
b/drivers/scsi/aic7xxx/aic7xxx_osm.c
index ed437c16de881..e7ccb8b80fc19 100644
--- a/drivers/scsi/aic7xxx/aic7xxx_osm.c
+++ b/drivers/scsi/aic7xxx/aic7xxx_osm.c
@@ -1711,10 +1711,12 @@ ahc_done(struct ahc_softc *ahc, struct scb *scb)
 */
cmd->sense_buffer[0] = 0;
if (ahc_get_transaction_status(scb) == CAM_REQ_INPROG) {
+#ifdef AHC_REPORT_UNDERFLOWS
uint32_t amount_xferred;
  
  		amount_xferred =

ahc_get_transfer_length(scb) - ahc_get_residual(scb);
+#endif
if ((scb->flags & SCB_TRANSMISSION_ERROR) != 0) {
  #ifdef AHC_DEBUG
if ((ahc_debug & AHC_SHOW_MISC) != 0) {


Reviewed-by: Hannes Reinecke 

Cheers,

Hannes
--
Dr. Hannes ReineckeTeamlead Storage & Networking
h...@suse.de   +49 911 74053 688
SUSE Software Solutions GmbH, Maxfeldstr. 5, 90409 Nürnberg
HRB 36809 (AG Nürnberg), Geschäftsführer: Felix Imendörffer


Re: [PATCH v2 20/29] scsi: aic7xxx: aic7xxx_osm: Remove unused variable 'targ'

2020-07-14 Thread Hannes Reinecke

On 7/13/20 9:46 AM, Lee Jones wrote:

Looks like checking the 'targ' was removed in 2005.

Fixes the following W=1 kernel build warning(s):

  drivers/scsi/aic7xxx/aic7xxx_osm.c: In function ‘ahc_send_async’:
  drivers/scsi/aic7xxx/aic7xxx_osm.c:1604:28: warning: variable ‘targ’ set but 
not used [-Wunused-but-set-variable]
  1604 | struct ahc_linux_target *targ;
  | ^~~~

Cc: Hannes Reinecke 
Cc: "Daniel M. Eischen" 
Cc: Doug Ledford 
Signed-off-by: Lee Jones 
---
  drivers/scsi/aic7xxx/aic7xxx_osm.c | 2 --
  1 file changed, 2 deletions(-)

diff --git a/drivers/scsi/aic7xxx/aic7xxx_osm.c 
b/drivers/scsi/aic7xxx/aic7xxx_osm.c
index cc4c7b1781466..ed437c16de881 100644
--- a/drivers/scsi/aic7xxx/aic7xxx_osm.c
+++ b/drivers/scsi/aic7xxx/aic7xxx_osm.c
@@ -1592,7 +1592,6 @@ ahc_send_async(struct ahc_softc *ahc, char channel,
case AC_TRANSFER_NEG:
{
struct  scsi_target *starget;
-   struct  ahc_linux_target *targ;
struct  ahc_initiator_tinfo *tinfo;
struct  ahc_tmode_tstate *tstate;
int target_offset;
@@ -1626,7 +1625,6 @@ ahc_send_async(struct ahc_softc *ahc, char channel,
starget = ahc->platform_data->starget[target_offset];
if (starget == NULL)
break;
-   targ = scsi_transport_target_data(starget);
  
  		target_ppr_options =

(spi_dt(starget) ? MSG_EXT_PPR_DT_REQ : 0)


Reviewed-by: Hannes Reinecke 

Cheers,

Hannes
--
Dr. Hannes ReineckeTeamlead Storage & Networking
h...@suse.de   +49 911 74053 688
SUSE Software Solutions GmbH, Maxfeldstr. 5, 90409 Nürnberg
HRB 36809 (AG Nürnberg), Geschäftsführer: Felix Imendörffer


Re: [PATCH v2 19/29] scsi: aic7xxx: aic7xxx_osm: Remove unused variable 'ahc'

2020-07-14 Thread Hannes Reinecke

On 7/13/20 9:46 AM, Lee Jones wrote:

Looks as though 'ahc' hasn't been used since 2005.

Fixes the following W=1 kernel build warning(s):

  drivers/scsi/aic7xxx/aic7xxx_osm.c: In function ‘ahc_linux_slave_configure’:
  drivers/scsi/aic7xxx/aic7xxx_osm.c:674:20: warning: variable ‘ahc’ set but 
not used [-Wunused-but-set-variable]
  674 | struct ahc_softc *ahc;
  | ^~~

Cc: Hannes Reinecke 
Cc: "Daniel M. Eischen" 
Cc: Doug Ledford 
Signed-off-by: Lee Jones 
---
  drivers/scsi/aic7xxx/aic7xxx_osm.c | 4 
  1 file changed, 4 deletions(-)

diff --git a/drivers/scsi/aic7xxx/aic7xxx_osm.c 
b/drivers/scsi/aic7xxx/aic7xxx_osm.c
index 32bfe20d79cc1..cc4c7b1781466 100644
--- a/drivers/scsi/aic7xxx/aic7xxx_osm.c
+++ b/drivers/scsi/aic7xxx/aic7xxx_osm.c
@@ -666,10 +666,6 @@ ahc_linux_slave_alloc(struct scsi_device *sdev)
  static int
  ahc_linux_slave_configure(struct scsi_device *sdev)
  {
-   struct  ahc_softc *ahc;
-
-   ahc = *((struct ahc_softc **)sdev->host->hostdata);
-
if (bootverbose)
sdev_printk(KERN_INFO, sdev, "Slave Configure\n");
  


Reviewed-by: Hannes Reinecke 

Cheers,

Hannes
--
Dr. Hannes ReineckeTeamlead Storage & Networking
h...@suse.de   +49 911 74053 688
SUSE Software Solutions GmbH, Maxfeldstr. 5, 90409 Nürnberg
HRB 36809 (AG Nürnberg), Geschäftsführer: Felix Imendörffer


Re: [PATCH v2 12/29] scsi: libfc: fc_fcp: Provide missing and repair existing function documentation

2020-07-14 Thread Hannes Reinecke

On 7/13/20 9:46 AM, Lee Jones wrote:

Mostly due to descriptions not keeping up with API changes.

Fixes the following W=1 kernel build warning(s):

  drivers/scsi/libfc/fc_fcp.c:299: warning: Function parameter or member 
'status_code' not described in 'fc_fcp_retry_cmd'
  drivers/scsi/libfc/fc_fcp.c:595: warning: Function parameter or member 'seq' 
not described in 'fc_fcp_send_data'
  drivers/scsi/libfc/fc_fcp.c:595: warning: Excess function parameter 'sp' 
description in 'fc_fcp_send_data'
  drivers/scsi/libfc/fc_fcp.c:1289: warning: Function parameter or member 't' 
not described in 'fc_lun_reset_send'
  drivers/scsi/libfc/fc_fcp.c:1289: warning: Excess function parameter 'data' 
description in 'fc_lun_reset_send'
  drivers/scsi/libfc/fc_fcp.c:1422: warning: Function parameter or member 't' 
not described in 'fc_fcp_timeout'
  drivers/scsi/libfc/fc_fcp.c:1422: warning: Excess function parameter 'data' 
description in 'fc_fcp_timeout'
  drivers/scsi/libfc/fc_fcp.c:1696: warning: Function parameter or member 
'code' not described in 'fc_fcp_recovery'
  drivers/scsi/libfc/fc_fcp.c:1716: warning: Function parameter or member 
'offset' not described in 'fc_fcp_srr'
  drivers/scsi/libfc/fc_fcp.c:1859: warning: Function parameter or member 
'sc_cmd' not described in 'fc_queuecommand'
  drivers/scsi/libfc/fc_fcp.c:1859: warning: Excess function parameter 'cmd' 
description in 'fc_queuecommand'

Cc: Hannes Reinecke 
Signed-off-by: Lee Jones 
---
  drivers/scsi/libfc/fc_fcp.c | 11 +++
  1 file changed, 7 insertions(+), 4 deletions(-)

diff --git a/drivers/scsi/libfc/fc_fcp.c b/drivers/scsi/libfc/fc_fcp.c
index bf2cc9656e191..e11d4f002bd49 100644
--- a/drivers/scsi/libfc/fc_fcp.c
+++ b/drivers/scsi/libfc/fc_fcp.c
@@ -289,6 +289,7 @@ static int fc_fcp_send_abort(struct fc_fcp_pkt *fsp)
  /**
   * fc_fcp_retry_cmd() - Retry a fcp_pkt
   * @fsp: The FCP packet to be retried
+ * @status_code: The FCP status code to set
   *
   * Sets the status code to be FC_ERROR and then calls
   * fc_fcp_complete_locked() which in turn calls fc_io_compl().
@@ -580,7 +581,7 @@ static void fc_fcp_recv_data(struct fc_fcp_pkt *fsp, struct 
fc_frame *fp)
  /**
   * fc_fcp_send_data() - Send SCSI data to a target
   * @fsp:  The FCP packet the data is on
- * @sp:  The sequence the data is to be sent on
+ * @seq:  The sequence the data is to be sent on
   * @offset:   The starting offset for this data request
   * @seq_blen: The burst length for this data request
   *
@@ -1283,7 +1284,7 @@ static int fc_fcp_pkt_abort(struct fc_fcp_pkt *fsp)
  
  /**

   * fc_lun_reset_send() - Send LUN reset command
- * @data: The FCP packet that identifies the LUN to be reset
+ * @t: Timer context used to fetch the FSP packet
   */
  static void fc_lun_reset_send(struct timer_list *t)
  {
@@ -1409,7 +1410,7 @@ static void fc_fcp_cleanup(struct fc_lport *lport)
  
  /**

   * fc_fcp_timeout() - Handler for fcp_pkt timeouts
- * @data: The FCP packet that has timed out
+ * @t: Timer context used to fetch the FSP packet
   *
   * If REC is supported then just issue it and return. The REC exchange will
   * complete or time out and recovery can continue at that point. Otherwise,
@@ -1691,6 +1692,7 @@ static void fc_fcp_rec_error(struct fc_fcp_pkt *fsp, 
struct fc_frame *fp)
  /**
   * fc_fcp_recovery() - Handler for fcp_pkt recovery
   * @fsp: The FCP pkt that needs to be aborted
+ * @code: The FCP status code to set
   */
  static void fc_fcp_recovery(struct fc_fcp_pkt *fsp, u8 code)
  {
@@ -1709,6 +1711,7 @@ static void fc_fcp_recovery(struct fc_fcp_pkt *fsp, u8 
code)
   * fc_fcp_srr() - Send a SRR request (Sequence Retransmission Request)
   * @fsp:   The FCP packet the SRR is to be sent on
   * @r_ctl: The R_CTL field for the SRR request
+ * @offset: The SRR relative offset
   * This is called after receiving status but insufficient data, or
   * when expecting status but the request has timed out.
   */
@@ -1851,7 +1854,7 @@ static inline int fc_fcp_lport_queue_ready(struct 
fc_lport *lport)
  /**
   * fc_queuecommand() - The queuecommand function of the SCSI template
   * @shost: The Scsi_Host that the command was issued to
- * @cmd:   The scsi_cmnd to be executed
+ * @sc_cmd:   The scsi_cmnd to be executed
   *
   * This is the i/o strategy routine, called by the SCSI layer.
   */


Reviewed-by: Hannes Reinecke 

Cheers,

Hannes
--
Dr. Hannes ReineckeTeamlead Storage & Networking
h...@suse.de   +49 911 74053 688
SUSE Software Solutions GmbH, Maxfeldstr. 5, 90409 Nürnberg
HRB 36809 (AG Nürnberg), Geschäftsführer: Felix Imendörffer


  1   2   3   4   5   6   7   8   9   10   >