[Bug 199435] HPSA + P420i resetting logical Direct-Access never complete
https://bugzilla.kernel.org/show_bug.cgi?id=199435

--- Comment #13 from lober...@redhat.com ---
Apr 18 01:29:16 kernel: cmaidad D0 3442 1 0x
Apr 18 01:29:16 kernel: Call Trace:
Apr 18 01:29:16 kernel: __schedule+0x3b9/0x8f0
Apr 18 01:29:16 kernel: schedule+0x36/0x80
Apr 18 01:29:16 kernel: scsi_block_when_processing_errors+0xd5/0x110
Apr 18 01:29:16 kernel: ? wake_atomic_t_function+0x60/0x60
Apr 18 01:29:16 kernel: sg_open+0x14a/0x5c0

* Likely a pass-through from the cma* management daemons.

Can you try to reproduce with all the HP Health daemons disabled?

--
You are receiving this mail because:
You are the assignee for the bug.
RE: [Patch v2] Storvsc: Select channel based on available percentage of ring buffer to write
> -----Original Message-----
> From: linux-kernel-ow...@vger.kernel.org On Behalf Of Long Li
> Sent: Thursday, April 19, 2018 2:54 PM
> To: KY Srinivasan; Haiyang Zhang; Stephen Hemminger;
> James E . J . Bottomley; Martin K . Petersen;
> de...@linuxdriverproject.org; linux-s...@vger.kernel.org;
> linux-ker...@vger.kernel.org
> Cc: Long Li
> Subject: [Patch v2] Storvsc: Select channel based on available percentage of
> ring buffer to write
>
> From: Long Li
>
> This is a best effort for estimating on how busy the ring buffer is for
> that channel, based on available buffer to write in percentage. It is still
> possible that at the time of actual ring buffer write, the space may not be
> available due to other processes may be writing at the time.
>
> Selecting a channel based on how full it is can reduce the possibility that
> a ring buffer write will fail, and avoid the situation a channel is over
> busy.
>
> Now it's possible that storvsc can use a smaller ring buffer size
> (e.g. 40k bytes) to take advantage of cache locality.
>
> Changes.
> v2: Pre-allocate struct cpumask on the heap.
> Struct cpumask is a big structure (1k bytes) when CONFIG_NR_CPUS=8192 (default
> value when CONFIG_MAXSMP=y). Don't use kernel stack for it by pre-allocating
> them using kmalloc when channels are first initialized.
>
> Signed-off-by: Long Li
> ---
>  drivers/scsi/storvsc_drv.c | 90 --
>  1 file changed, 72 insertions(+), 18 deletions(-)
>
> diff --git a/drivers/scsi/storvsc_drv.c b/drivers/scsi/storvsc_drv.c
> index a2ec0bc9e9fa..2a9fff94dd1a 100644
> --- a/drivers/scsi/storvsc_drv.c
> +++ b/drivers/scsi/storvsc_drv.c
> @@ -395,6 +395,12 @@ MODULE_PARM_DESC(storvsc_ringbuffer_size, "Ring buffer size (bytes)");
>
>  module_param(storvsc_vcpus_per_sub_channel, int, S_IRUGO);
>  MODULE_PARM_DESC(storvsc_vcpus_per_sub_channel, "Ratio of VCPUs to subchannels");
> +
> +static int ring_avail_percent_lowater = 10;
> +module_param(ring_avail_percent_lowater, int, S_IRUGO);
> +MODULE_PARM_DESC(ring_avail_percent_lowater,
> +        "Select a channel if available ring size > this in percent");
> +
>  /*
>   * Timeout in seconds for all devices managed by this driver.
>   */
> @@ -468,6 +474,13 @@ struct storvsc_device {
>       * Mask of CPUs bound to subchannels.
>       */
>      struct cpumask alloced_cpus;
> +    /*
> +     * Pre-allocated struct cpumask for each hardware queue.
> +     * struct cpumask is used by selecting out-going channels. It is a
> +     * big structure, default to 1024k bytes when CONFIG_MAXSMP=y.

I think you mean "1024 bytes" or "1k bytes" in the above comment.

> +     * Pre-allocate it to avoid allocation on the kernel stack.
> +     */
> +    struct cpumask *cpumask_chns;
>      /* Used for vsc/vsp channel reset process */
>      struct storvsc_cmd_request init_request;
>      struct storvsc_cmd_request reset_request;
> @@ -872,6 +885,13 @@ static int storvsc_channel_init(struct hv_device *device, bool is_fc)
>      if (stor_device->stor_chns == NULL)
>          return -ENOMEM;
>
> +    stor_device->cpumask_chns = kcalloc(num_possible_cpus(),
> +            sizeof(struct cpumask), GFP_KERNEL);

Note that num_possible_cpus() is 240 for a Hyper-V 2016 guest unless overridden on the kernel boot line, so this is going to allocate 240 Kbytes for each synthetic SCSI controller.
On an Azure VM, which has two IDE and two SCSI controllers, this is nearly 1 Mbyte. It's unfortunate to have to allocate this much memory for what is essentially a temporary variable. Further down in these comments, I've proposed an alternate implementation of the code that avoids the need for the temporary variable, and hence avoids the need for this allocation.

> +    if (stor_device->cpumask_chns == NULL) {
> +        kfree(stor_device->stor_chns);
> +        return -ENOMEM;
> +    }
> +
>      stor_device->stor_chns[device->channel->target_cpu] = device->channel;
>      cpumask_set_cpu(device->channel->target_cpu,
>              &stor_device->alloced_cpus);
> @@ -1232,6 +1252,7 @@ static int storvsc_dev_remove(struct hv_device *device)
>      vmbus_close(device->channel);
>
>      kfree(stor_device->stor_chns);
> +    kfree(stor_device->cpumask_chns);
>      kfree(stor_device);
>      return 0;
>  }
> @@ -1241,7 +1262,7 @@ static struct vmbus_channel *get_og_chn(struct storvsc_device *stor_device,
>  {
>      u16 slot = 0;
>      u16 hash_qnum;
> -    struct cpumask alloced_mask;
> +    struct cpumask *alloced_mask = &stor_device->cpumask_chns[q_num];
>      int num_channels, tgt_cpu;
>
>      if (stor_device->num_sc == 0)
> @@ -1257,10 +1278,10 @@ static struct vmbus_channel *get_og_chn(struct storvsc_device *stor_device,
>       * III. Mapping is persistent.
>       */
>
> -    cpumask_and(&alloced_mask, &stor_device->alloced_cpus,
> +
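The allocation figures quoted in the review can be checked with a little arithmetic. The sketch below is plain user-space C, not driver code. The helper names cpumask_bytes() and prealloc_bytes() are made up for illustration, and it assumes a cpumask is a bitmap holding one bit per CPU id, rounded up to whole unsigned longs.

```c
#include <assert.h>
#include <limits.h>
#include <stddef.h>

/*
 * Size in bytes of a CPU bitmap that holds nr_cpus bits, rounded up
 * to a whole number of unsigned longs (assumed layout, see lead-in).
 */
size_t cpumask_bytes(size_t nr_cpus)
{
    size_t bits_per_long = sizeof(unsigned long) * CHAR_BIT;
    size_t longs;

    assert(nr_cpus > 0);
    longs = (nr_cpus + bits_per_long - 1) / bits_per_long;
    return longs * sizeof(unsigned long);
}

/*
 * Bytes pre-allocated for one controller when one mask is reserved
 * per possible CPU, as the kcalloc() call in the patch does.
 */
size_t prealloc_bytes(size_t possible_cpus, size_t nr_cpus)
{
    return possible_cpus * cpumask_bytes(nr_cpus);
}
```

With CONFIG_NR_CPUS=8192 one mask is 1024 bytes, so prealloc_bytes(240, 8192) gives 245760 bytes (240 Kbytes) per controller, and four controllers come to 983040 bytes, just under 1 Mbyte.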
[Bug 199435] HPSA + P420i resetting logical Direct-Access never complete
https://bugzilla.kernel.org/show_bug.cgi?id=199435

lober...@redhat.com changed:

           What    |Removed                     |Added
                 CC|                            |lober...@redhat.com

--- Comment #12 from lober...@redhat.com ---
We had a bunch of issues with the HPSA as already mentioned above.
The specific issue that we had to revert was this commit:
8b834bff1b73dce46f4e9f5e84af6f73fed8b0ef

I assume your array has a charged battery (capacitor) and the writeback-cache is enabled on the 420i.

Are you only seeing this when you have cmaeventd running? It can use pass-through commands and has been known to cause issues.

I am not running any of the HPE Proliant SPP daemons on my system.

I have not seen the load-related issue that you are seeing (without those daemons running) on my DL380G7 or DL380G8 here, so I will work on trying to reproduce and assist.

Thanks
Laurence

--
You are receiving this mail because:
You are the assignee for the bug.
[Bug 199435] HPSA + P420i resetting logical Direct-Access never complete
https://bugzilla.kernel.org/show_bug.cgi?id=199435

--- Comment #11 from Anthony Hausman (anthonyhaussm...@gmail.com) ---
The only patch that I'm sure I have is the "scsi: hpsa: fix selection of reply queue" one.
For the rest, I'm using an out-of-the-box 4.11 kernel, so I'm really not sure that the other patches are present.

Unfortunately, the module does not compile using the 4.11.0-14-generic headers.

# make -C /lib/modules/4.11.0-14-generic/build M=$(pwd) --makefile="/root/hpsa-3.4.20-136/hpsa-3.4.20/drivers/scsi/Makefile.alt"
make: Entering directory '/usr/src/linux-headers-4.11.0-14-generic'
make -C /lib/modules/4.4.0-96-generic/build M=/usr/src/linux-headers-4.11.0-14-generic EXTRA_CFLAGS+=-DKCLASS4A modules
make[1]: Entering directory '/usr/src/linux-headers-4.4.0-96-generic'
make[2]: *** No rule to make target 'kernel/bounds.c', needed by 'kernel/bounds.s'. Stop.
Makefile:1423: recipe for target '_module_/usr/src/linux-headers-4.11.0-14-generic' failed
make[1]: *** [_module_/usr/src/linux-headers-4.11.0-14-generic] Error 2
make[1]: Leaving directory '/usr/src/linux-headers-4.4.0-96-generic'
/root/hpsa-3.4.20-136/hpsa-3.4.20/drivers/scsi/Makefile.alt:96: recipe for target 'default' failed
make: *** [default] Error 2
make: Leaving directory '/usr/src/linux-headers-4.11.0-14-generic'

But if you tell me the principal problem is using the 4.11 kernel, I can upgrade to the 4.16.3 kernel.
If I do, must I use the out-of-box 3.4.20-136 hpsa driver, or your previous patch on the last 3.4.20-125?

--
You are receiving this mail because:
You are the assignee for the bug.
Re: [PATCH 15/39] acpi/battery: simplify procfs code
On Thu, Apr 19, 2018 at 2:41 PM, Christoph Hellwig wrote: > Use remove_proc_subtree to remove the whole subtree on cleanup, and > unwind the registration loop into individual calls. Switch to use > proc_create_seq where applicable. > > Signed-off-by: Christoph Hellwig It is OK AFAICS. Reviewed-by: Rafael J. Wysocki
Re: [PATCH] bsg referencing bus driver module
On Fri, 2018-04-20 at 16:44 -0600, Anatoliy Glagolev wrote:
> > This patch isn't applicable because your mailer has changed all the
> > tabs to spaces.
> >
> > I also think there's no need to do it this way. I think what we
> > need is for fc_bsg_remove() to wait until the bsg queue is
> > drained. It does look like the author thought this happened
> > otherwise the code wouldn't have the note. If we fix it that way
> > we can do the same thing in all the other transport classes that
> > use bsg (which all have a similar issue).
> >
> > James
> >
>
> Thanks, James. Sorry about the tabs; re-sending.
>
> On fc_bsg_remove()...: are you suggesting to implement the whole fix
> in scsi_transport_fc.c?

Yes, but it's not just scsi_transport_fc; scsi_transport_sas has the same issue. I think it's probably just the one-liner addition of blk_drain_queue() that fixes this. There should probably be a block primitive that does the correct queue reference dance and calls blk_cleanup_queue() and blk_drain_queue() in order.

> That would be nice, but I do not see how that
> is possible. Even with the queue drained bsg still holds a reference
> to the Scsi_Host via bsg_class_device; bsg_class_device itself is
> referenced on bsg_open and kept around while a user-mode process
> keeps a handle to bsg.

Once you've called bsg_unregister_queue(), the queue will be destroyed and the reference released once the last job is drained, meaning the user can keep the bsg device open, but it will just return errors because of the lack of a queue. This scenario allows removal to proceed without being held hostage by open devices.

> Even if we somehow implement the waiting the call may be stuck
> forever if the user-mode process keeps the handle.

No it won't: after blk_cleanup_queue(), the queue is in bypass mode: no requests queued after this do anything other than complete with error, so they never make it into SCSI.
> I think handling it via a reference to the module is more consistent
> with the way things are done in Linux. You suggested the approach
> yourself back in "Waiting for scsi_host_template release" discussion.

That was before I analyzed the code paths. Module release is tricky because module exit won't be called until the references drop to zero, so you have to be careful not to create a situation where module exit never gets called. The module exit code should force stuff to detach and wait for the forcing to complete, to make up for the reference circularity problem. If you do it purely by refcounting, the module actually may never release (that's why scsi_remove_host works the way it does, for instance).

James
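The fail-fast behaviour described in this reply can be pictured with a toy model. This is a user-space sketch only: struct toy_queue and the toy_*() helpers are hypothetical names invented here and are not the block layer API. It assumes just what the reply states, namely that once teardown has started, newly queued requests complete with an error, and that removal only has to wait for requests accepted earlier.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>

/* Toy stand-in for a request queue; not a kernel structure. */
struct toy_queue {
    bool dying;     /* set once teardown has started */
    int in_flight;  /* requests accepted but not yet completed */
};

/* Accept a request, or fail it at once if teardown has started. */
int toy_submit(struct toy_queue *q)
{
    if (q->dying)
        return -ENODEV;
    q->in_flight++;
    return 0;
}

/* Complete one previously accepted request. */
void toy_complete(struct toy_queue *q)
{
    assert(q->in_flight > 0);
    q->in_flight--;
}

/* Begin teardown: from here on, new submissions fail fast. */
void toy_start_teardown(struct toy_queue *q)
{
    q->dying = true;
}

/* Removal can proceed once nothing accepted earlier is pending. */
bool toy_drained(const struct toy_queue *q)
{
    return q->dying && q->in_flight == 0;
}
```

In this model a process that keeps its handle open cannot stall removal: its later toy_submit() calls return an error without raising in_flight, so toy_drained() becomes true as soon as the earlier requests complete.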