Re: [PATCH V3 01/14] blk-mq: add blk_mq_max_nr_hw_queues()

2023-08-09 Thread Ming Lei
On Thu, Aug 10, 2023 at 09:18:27AM +0800, Baoquan He wrote:
> On 08/10/23 at 08:09am, Ming Lei wrote:
> > On Wed, Aug 09, 2023 at 03:44:01PM +0200, Christoph Hellwig wrote:
> > > I'm starting to sound like a broken record, but we can't just do random
> > > is_kdump checks, and it's not going to get better by resending it again
> > > and again.  If kdump kernels limit the number of possible CPUs, it needs
> > > to be reflected in cpu_possible_map and we need to use that information.
> > > 
> > 
> > Can you look at the previous kdump/arch guys' comments about kdump usage &
> > num_possible_cpus?
> > 
> > 
> > https://lore.kernel.org/linux-block/caf+s44ruqswbosy9kmdx35crviqnxoeuvgnsue75bb0y2jg...@mail.gmail.com/
> > https://lore.kernel.org/linux-block/ZKz912KyFQ7q9qwL@MiWiFi-R3L-srv/
> > 
> > The point is that kdump kernels do not limit the number of possible CPUs.
> > 
> > 1) some archs support 'nr_cpus=1' for the kdump kernel, which is fine, since
> > num_possible_cpus becomes 1.
> 
> Yes, "nr_cpus=" is strongly suggested in kdump kernel because "nr_cpus="
> limits the possible cpu numbers, while "maxcpuss=" only limits the cpu
> number which can be brought up during bootup. We noticed this diference
> because a large number of possible cpus will cost more memory in kdump
> kernel. e.g percpu initialization, even though kdump kernel have set
> "maxcpus=1". 
> 
> Currently x86 and arm64 both support "nr_cpus=". Pingfan once spent much
> effort on patches to add "nr_cpus=" support to ppc64, but the ppc64
> developers and maintainers did not seem to care about it. In the end the
> patches were not accepted, and the work was not continued.
> 
> Now, I am wondering what the barrier is to adding "nr_cpus=" to the power
> arch. Can we reconsider adding 'nr_cpus=' to the power arch since a real
> issue has occurred in the kdump kernel?

If 'nr_cpus=' can be supported on ppc64, this patchset isn't needed.

> 
> As for this patchset, it could be accepted so that no failure is seen in the
> kdump kernel on ARCHes without "nr_cpus=" support? My personal opinion.

IMO 'nr_cpus=' support should be preferred, given it is annoying to
maintain two kinds of implementations for the kdump kernel from the driver
viewpoint. I guess the kdump side could be simplified too by supporting
'nr_cpus=' only.

thanks,
Ming



Re: [PATCH V3 01/14] blk-mq: add blk_mq_max_nr_hw_queues()

2023-08-09 Thread Ming Lei
On Wed, Aug 09, 2023 at 03:44:01PM +0200, Christoph Hellwig wrote:
> I'm starting to sound like a broken record, but we can't just do random
> is_kdump checks, and it's not going to get better by resending it again and
> again.  If kdump kernels limit the number of possible CPUs, it needs to be
> reflected in cpu_possible_map and we need to use that information.
> 

Can you look at the previous kdump/arch guys' comments about kdump usage &
num_possible_cpus?


https://lore.kernel.org/linux-block/caf+s44ruqswbosy9kmdx35crviqnxoeuvgnsue75bb0y2jg...@mail.gmail.com/
https://lore.kernel.org/linux-block/ZKz912KyFQ7q9qwL@MiWiFi-R3L-srv/

The point is that kdump kernels do not limit the number of possible CPUs.

1) some archs support 'nr_cpus=1' for the kdump kernel, which is fine, since
num_possible_cpus becomes 1.

2) some archs do not support 'nr_cpus=1', and have to rely on
'maxcpus=1', so num_possible_cpus isn't changed, and the kernel just boots
with a single online cpu. That causes trouble because blk-mq limits the
kdump kernel to a single queue.

Documentation/admin-guide/kdump/kdump.rst
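
For illustration, here is a minimal sketch (not taken from any real driver;
example_calc_nr_io_queues() is a hypothetical name) of why a queue count
derived from num_possible_cpus() behaves differently under the two
parameters:

#include <linux/cpumask.h>

/* Hypothetical helper: shows why "maxcpus=1" alone does not shrink queue
 * counts the way "nr_cpus=1" does. */
static unsigned int example_calc_nr_io_queues(void)
{
	/*
	 * "nr_cpus=1"  -> num_possible_cpus() == 1, so one io queue.
	 * "maxcpus=1"  -> num_possible_cpus() still returns the full count,
	 *                 so the driver asks for many queues even though only
	 *                 one CPU ever comes online in the kdump kernel.
	 */
	return num_possible_cpus();
}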

Thanks, 
Ming



Re: [PATCH 2/2] nvme-pci: use blk_mq_max_nr_hw_queues() to calculate io queues

2023-07-10 Thread Ming Lei
Hi Baoquan,

On Tue, Jul 11, 2023 at 11:35:50AM +0800, Baoquan He wrote:
> On 07/10/23 at 05:14pm, Ming Lei wrote:
> > On Mon, Jul 10, 2023 at 08:41:09AM +0200, Christoph Hellwig wrote:
> > > On Sat, Jul 08, 2023 at 10:02:59AM +0800, Ming Lei wrote:
> > > > Take blk-mq's knowledge into account for calculating io queues.
> > > > 
> > > > Fix wrong queue mapping in case of kdump kernel.
> > > > 
> > > > On arm and ppc64, 'maxcpus=1' is passed to kdump command line, see
> > > > `Documentation/admin-guide/kdump/kdump.rst`, so num_possible_cpus()
> > > > still returns all CPUs.
> > > 
> > > That's simply broken.  Please fix the arch code to make sure
> > > it does not return a bogus num_possible_cpus value for these
> > 
> > That is documented in Documentation/admin-guide/kdump/kdump.rst.
> > 
> > On arm and ppc64, 'maxcpus=1' is passed for the kdump kernel, and "maxcpus=1"
> > simply keeps one CPU core online and the others offline.
> 
> I don't know maxcpus on arm and ppc64 well. But maxcpus=1 or nr_cpus=1
> are the suggested parameters, because usually nr_cpus=1 is enough to let the
> kdump kernel capture a vmcore. However, the user is allowed to specify
> nr_cpus=n (n>1) if they think multiple cpus are needed in the kdump
> kernel. Your hard coding of the cpu number in the kdump kernel may not be
> so reasonable.

As I mentioned, for arm/ppc64, passing 'maxcpus=1' actually follows
Documentation/admin-guide/kdump/kdump.rst.

'nr_cpus=N' just works fine, so it is not related to this topic.

After 'maxcpus=1' is passed, the kernel only brings one CPU core online
during boot, and the others can still be brought online by userspace. This
causes IO timeouts on some storage devices which use managed irqs and
support multiple io queues.

Here the focus is whether passing 'maxcpus=1' is valid for the kdump
kernel; that is what we want to hear from our arch/kdump guys.

If yes, something needs to be fixed, such as what this patchset is
doing.
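
A rough sketch of the idea, not the actual patch (only the helper name
blk_mq_max_nr_hw_queues() comes from this series; its implementation and the
real nvme-pci change are not shown here): the driver clamps the io queue
count it asks for to what blk-mq will actually use.

/* Hypothetical example, assuming blk_mq_max_nr_hw_queues() returns an
 * unsigned int; not the real nvme-pci code. */
static unsigned int example_max_io_queues(void)
{
	unsigned int nr = num_possible_cpus();

	return min(nr, blk_mq_max_nr_hw_queues());
}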

> 
> Please cc the kexec mailing list when posting so that people can view the
> whole thread of discussion.

Already Cc'd the kexec, arm/powerpc & irq lists.


Thanks,
Ming



Re: [PATCH 2/2] nvme-pci: use blk_mq_max_nr_hw_queues() to calculate io queues

2023-07-10 Thread Ming Lei
On Mon, Jul 10, 2023 at 10:51:43AM -0600, Keith Busch wrote:
> On Mon, Jul 10, 2023 at 05:14:15PM +0800, Ming Lei wrote:
> > On Mon, Jul 10, 2023 at 08:41:09AM +0200, Christoph Hellwig wrote:
> > > On Sat, Jul 08, 2023 at 10:02:59AM +0800, Ming Lei wrote:
> > > > Take blk-mq's knowledge into account for calculating io queues.
> > > > 
> > > > Fix wrong queue mapping in case of kdump kernel.
> > > > 
> > > > On arm and ppc64, 'maxcpus=1' is passed to kdump command line, see
> > > > `Documentation/admin-guide/kdump/kdump.rst`, so num_possible_cpus()
> > > > still returns all CPUs.
> > > 
> > > That's simply broken.  Please fix the arch code to make sure
> > > it does not return a bogus num_possible_cpus value for these
> > 
> > That is documented in Documentation/admin-guide/kdump/kdump.rst.
> > 
> > On arm and ppc64, 'maxcpus=1' is passed for the kdump kernel, and "maxcpus=1"
> > simply keeps one CPU core online and the others offline.
> > 
> > So Cc'ing our arch (arm & ppc64) & kdump folks wrt passing 'maxcpus=1' for
> > the kdump kernel.
> > 
> > > setups, otherwise you'll have to paper over it in all kinds of
> > > drivers.
> > 
> > The issue is only triggered for drivers which use managed irq &
> > multiple hw queues.
> 
> Is the problem that the managed interrupt sets the effective irq
> affinity to an offline CPU? You mentioned observing timeouts; are you

Yes, the problem is that blk-mq only creates hctx0, so nvme-pci
translates it into hctx0's nvme_queue. This is actually wrong, because
blk-mq's view of the queue topology isn't the same as nvme's view.

> seeing the "completion polled" nvme message?

Yes, "completion polled" can be observed. Meantime the warning in
__irq_startup_managed() can be triggered from
nvme_timeout()->nvme_poll_irqdisable()->enable_irq().
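
For context, the kdump special-casing in blk_mq_alloc_tag_set() that leads to
the single hctx mentioned above looks roughly like this (quoted approximately
from memory, so treat it as illustrative rather than exact):

	/* in blk_mq_alloc_tag_set(), approximately: */
	if (is_kdump_kernel()) {
		set->nr_hw_queues = 1;
		set->nr_maps = 1;
		set->queue_depth = min(64U, set->queue_depth);
	}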


Thanks,
Ming



Re: [PATCH 2/2] nvme-pci: use blk_mq_max_nr_hw_queues() to calculate io queues

2023-07-10 Thread Ming Lei
On Mon, Jul 10, 2023 at 08:41:09AM +0200, Christoph Hellwig wrote:
> On Sat, Jul 08, 2023 at 10:02:59AM +0800, Ming Lei wrote:
> > Take blk-mq's knowledge into account for calculating io queues.
> > 
> > Fix wrong queue mapping in case of kdump kernel.
> > 
> > On arm and ppc64, 'maxcpus=1' is passed to kdump command line, see
> > `Documentation/admin-guide/kdump/kdump.rst`, so num_possible_cpus()
> > still returns all CPUs.
> 
> That's simply broken.  Please fix the arch code to make sure
> it does not return a bogus num_possible_cpus value for these

That is documented in Documentation/admin-guide/kdump/kdump.rst.

On arm and ppc64, 'maxcpus=1' is passed for the kdump kernel, and "maxcpus=1"
simply keeps one CPU core online and the others offline.

So Cc'ing our arch (arm & ppc64) & kdump folks wrt passing 'maxcpus=1' for
the kdump kernel.

> setups, otherwise you'll have to paper over it in all kinds of
> drivers.

The issue is only triggered for drivers which use managed irq &
multiple hw queues.
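
To make "managed irq" concrete, here is a hedged sketch of how such drivers
typically request their queue vectors (hypothetical names; the real nvme-pci
setup is more involved and uses pci_alloc_irq_vectors_affinity()):

#include <linux/pci.h>

/* PCI_IRQ_AFFINITY is what makes these interrupts "managed": the core
 * spreads them across all *possible* CPUs and shuts a vector down once
 * every CPU in its mask is offline, which is exactly what interacts badly
 * with the maxcpus=1 kdump setup discussed above. */
static int example_setup_queue_irqs(struct pci_dev *pdev,
				    unsigned int nr_queues)
{
	return pci_alloc_irq_vectors(pdev, 1, nr_queues,
				     PCI_IRQ_MSIX | PCI_IRQ_AFFINITY);
}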


Thanks,
Ming



Re: [next-20220225][Oops][ppc] lvm snapshot merge results kernel panics (throtl_pending_timer_fn)

2022-03-02 Thread Ming Lei
On Wed, Mar 02, 2022 at 01:31:39PM +0530, Abdul Haleem wrote:
> Greetings
> 
> Linux next kernel 5.17.0-rc5-next-20220225 crashed on my power 10 LPAR when
> merging an lvm snapshot on an nvme disk

Please try next-20220301, in which the "bad" patch 'block: cancel all
throttled bios in del_gendisk()' has been dropped.


Thanks,
Ming



Re: [PATCH v4 01/21] ibmvfc: add vhost fields and defaults for MQ enablement

2021-01-14 Thread Ming Lei
On Thu, Jan 14, 2021 at 11:24:35AM -0600, Brian King wrote:
> On 1/13/21 7:27 PM, Ming Lei wrote:
> > On Wed, Jan 13, 2021 at 11:13:07AM -0600, Brian King wrote:
> >> On 1/12/21 6:33 PM, Tyrel Datwyler wrote:
> >>> On 1/12/21 2:54 PM, Brian King wrote:
> >>>> On 1/11/21 5:12 PM, Tyrel Datwyler wrote:
> >>>>> Introduce several new vhost fields for managing MQ state of the adapter
> >>>>> as well as initial defaults for MQ enablement.
> >>>>>
> >>>>> Signed-off-by: Tyrel Datwyler 
> >>>>> ---
> >>>>>  drivers/scsi/ibmvscsi/ibmvfc.c | 8 
> >>>>>  drivers/scsi/ibmvscsi/ibmvfc.h | 9 +
> >>>>>  2 files changed, 17 insertions(+)
> >>>>>
> >>>>> diff --git a/drivers/scsi/ibmvscsi/ibmvfc.c 
> >>>>> b/drivers/scsi/ibmvscsi/ibmvfc.c
> >>>>> index ba95438a8912..9200fe49c57e 100644
> >>>>> --- a/drivers/scsi/ibmvscsi/ibmvfc.c
> >>>>> +++ b/drivers/scsi/ibmvscsi/ibmvfc.c
> >>>>> @@ -3302,6 +3302,7 @@ static struct scsi_host_template driver_template 
> >>>>> = {
> >>>>> .max_sectors = IBMVFC_MAX_SECTORS,
> >>>>> .shost_attrs = ibmvfc_attrs,
> >>>>> .track_queue_depth = 1,
> >>>>> +   .host_tagset = 1,
> >>>>
> >>>> This doesn't seem right. You are setting host_tagset, which means you 
> >>>> want a
> >>>> shared, host wide, tag set for commands. It also means that the total
> >>>> queue depth for the host is can_queue. However, it looks like you are 
> >>>> allocating
> >>>> max_requests events for each sub crq, which means you are over 
> >>>> allocating memory.
> >>>
> >>> With the shared tagset yes the queue depth for the host is can_queue, but 
> >>> this
> >>> also implies that the max queue depth for each hw queue is also 
> >>> can_queue. So,
> >>> in the worst case that all commands are queued down the same hw queue we 
> >>> need an
> >>> event pool with can_queue commands.
> >>>
> >>>>
> >>>> Looking at this closer, we might have bigger problems. There is a host 
> >>>> wide
> >>>> max number of commands that the VFC host supports, which gets returned on
> >>>> NPIV Login. This value can change across a live migration event.
> >>>
> >>> From what I understand the max commands can only become less.
> >>>
> >>>>
> >>>> The ibmvfc driver, which does the same thing the lpfc driver does, 
> >>>> modifies
> >>>> can_queue on the scsi_host *after* the tag set has been allocated. This 
> >>>> looks
> >>>> to be a concern with ibmvfc, not sure about lpfc, as it doesn't look like
> >>>> we look at can_queue once the tag set is setup, and I'm not seeing a 
> >>>> good way
> >>>> to dynamically change the host queue depth once the tag set is setup. 
> >>>>
> >>>> Unless I'm missing something, our best options appear to either be to 
> >>>> implement
> >>>> our own host wide busy reference counting, which doesn't sound very 
> >>>> good, or
> >>>> we need to add some API to block / scsi that allows us to dynamically 
> >>>> change
> >>>> can_queue.
> >>>
> >>> Changing can_queue won't do us any good with the shared tagset because
> >>> each queue still needs to be able to queue can_queue number of commands
> >>> in the worst case.
> >>
> >> The issue I'm trying to highlight here is the following scenario:
> >>
> >> 1. We set shost->can_queue, then call scsi_add_host, which allocates the 
> >> tag set.
> >>
> >> 2. On our NPIV login response from the VIOS, we might get a lower value 
> >> than we
> >> initially set in shost->can_queue, so we update it, but nobody ever looks 
> >> at it
> >> again, and we don't have any protection against sending too many commands 
> >> to the host.
> >>
> >>
> >> Basically, we no longer have any code that ensures we don't send more
> >> commands to the VIOS than we are told it supports. According to the 
> >> architectur

Re: [PATCH v4 01/21] ibmvfc: add vhost fields and defaults for MQ enablement

2021-01-13 Thread Ming Lei
On Wed, Jan 13, 2021 at 11:13:07AM -0600, Brian King wrote:
> On 1/12/21 6:33 PM, Tyrel Datwyler wrote:
> > On 1/12/21 2:54 PM, Brian King wrote:
> >> On 1/11/21 5:12 PM, Tyrel Datwyler wrote:
> >>> Introduce several new vhost fields for managing MQ state of the adapter
> >>> as well as initial defaults for MQ enablement.
> >>>
> >>> Signed-off-by: Tyrel Datwyler 
> >>> ---
> >>>  drivers/scsi/ibmvscsi/ibmvfc.c | 8 
> >>>  drivers/scsi/ibmvscsi/ibmvfc.h | 9 +
> >>>  2 files changed, 17 insertions(+)
> >>>
> >>> diff --git a/drivers/scsi/ibmvscsi/ibmvfc.c 
> >>> b/drivers/scsi/ibmvscsi/ibmvfc.c
> >>> index ba95438a8912..9200fe49c57e 100644
> >>> --- a/drivers/scsi/ibmvscsi/ibmvfc.c
> >>> +++ b/drivers/scsi/ibmvscsi/ibmvfc.c
> >>> @@ -3302,6 +3302,7 @@ static struct scsi_host_template driver_template = {
> >>>   .max_sectors = IBMVFC_MAX_SECTORS,
> >>>   .shost_attrs = ibmvfc_attrs,
> >>>   .track_queue_depth = 1,
> >>> + .host_tagset = 1,
> >>
> >> This doesn't seem right. You are setting host_tagset, which means you want 
> >> a
> >> shared, host wide, tag set for commands. It also means that the total
> >> queue depth for the host is can_queue. However, it looks like you are 
> >> allocating
> >> max_requests events for each sub crq, which means you are over allocating 
> >> memory.
> > 
> > With the shared tagset yes the queue depth for the host is can_queue, but 
> > this
> > also implies that the max queue depth for each hw queue is also can_queue. 
> > So,
> > in the worst case that all commands are queued down the same hw queue we 
> > need an
> > event pool with can_queue commands.
> > 
> >>
> >> Looking at this closer, we might have bigger problems. There is a host wide
> >> max number of commands that the VFC host supports, which gets returned on
> >> NPIV Login. This value can change across a live migration event.
> > 
> > From what I understand the max commands can only become less.
> > 
> >>
> >> The ibmvfc driver, which does the same thing the lpfc driver does, modifies
> >> can_queue on the scsi_host *after* the tag set has been allocated. This 
> >> looks
> >> to be a concern with ibmvfc, not sure about lpfc, as it doesn't look like
> >> we look at can_queue once the tag set is setup, and I'm not seeing a good 
> >> way
> >> to dynamically change the host queue depth once the tag set is setup. 
> >>
> >> Unless I'm missing something, our best options appear to either be to 
> >> implement
> >> our own host wide busy reference counting, which doesn't sound very good, 
> >> or
> >> we need to add some API to block / scsi that allows us to dynamically 
> >> change
> >> can_queue.
> > 
> > Changing can_queue won't do us any good with the shared tagset because each
> > queue still needs to be able to queue can_queue number of commands in the
> > worst case.
> 
> The issue I'm trying to highlight here is the following scenario:
> 
> 1. We set shost->can_queue, then call scsi_add_host, which allocates the tag 
> set.
> 
> 2. On our NPIV login response from the VIOS, we might get a lower value than 
> we
> initially set in shost->can_queue, so we update it, but nobody ever looks at 
> it
> again, and we don't have any protection against sending too many commands to 
> the host.
> 
> 
> Basically, we no longer have any code that ensures we don't send more
> commands to the VIOS than we are told it supports. According to the 
> architecture,
> if we actually do this, the VIOS will do an h_free_crq, which would be a bit
> of a bug on our part.
> 
> I don't think it was ever clearly defined in the API that a driver can
> change shost->can_queue after calling scsi_add_host, but up until
> commit 6eb045e092efefafc6687409a6fa6d1dabf0fb69, this worked and now
> it doesn't. 

Actually it isn't related to commit 6eb045e092ef, because
blk_mq_alloc_tag_set() uses .can_queue to create the driver tag sbitmap
and the request pool.

So even without 6eb045e092ef, the updated .can_queue can't work as
expected, because the max driver tag depth has already been fixed by
blk-mq.

What 6eb045e092ef does is just remove the double check on the max
host-wide allowed commands, because that is already respected by blk-mq
driver tag allocation.
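
A simplified sketch of that flow (not copied from scsi_mq_setup_tags(), but
the same shape; ops/cmd_size setup is omitted): can_queue is snapshotted into
the tag set once, and the tag sbitmap and request pool are sized from that
snapshot, so a later update of shost->can_queue is never seen.

#include <scsi/scsi_host.h>
#include <linux/blk-mq.h>

static int example_setup_tags(struct Scsi_Host *shost)
{
	struct blk_mq_tag_set *set = &shost->tag_set;

	set->nr_hw_queues = shost->nr_hw_queues ? : 1;
	set->queue_depth = shost->can_queue;	/* snapshot taken here */

	/* driver tags and requests are allocated now, from queue_depth */
	return blk_mq_alloc_tag_set(set);
}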

> 
> I started looking through drivers that do this, and so far, it looks like the
> following drivers do: ibmvfc, lpfc, aic94xx, libfc, BusLogic, and likely
> others...
> 
> We probably need an API that lets us change shost->can_queue dynamically.

I'd suggest confirming that changing .can_queue is a real use case first.


Thanks,
Ming



Re: [powerpc][next-20200701] Hung task timeouts during regression test runs

2020-07-02 Thread Ming Lei
On Thu, Jul 02, 2020 at 04:53:04PM +0530, Sachin Sant wrote:
> Starting with the linux-next 20200701 release I am observing automated
> regression tests taking longer to complete. A test which took 10 minutes with
> next-20200630 took more than 60 minutes against next-20200701.
> 
> Following hung task timeout messages were seen during these runs
> 
> [ 1718.848351]   Not tainted 5.8.0-rc3-next-20200701-autotest #1
> [ 1718.848356] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables 
> this message.
> [ 1718.848362] NetworkManager  D0  2626  1 0x00040080
> [ 1718.848367] Call Trace:
> [ 1718.848374] [c008b0f6b8f0] [c0c6d558] schedule+0x78/0x130 
> (unreliable)
> [ 1718.848382] [c008b0f6bad0] [c001b070] __switch_to+0x2e0/0x480
> [ 1718.848388] [c008b0f6bb30] [c0c6ce9c] __schedule+0x2cc/0x910
> [ 1718.848394] [c008b0f6bc10] [c0c6d558] schedule+0x78/0x130
> [ 1718.848401] [c008b0f6bc40] [c05d5a64] 
> jbd2_log_wait_commit+0xd4/0x1a0
> [ 1718.848408] [c008b0f6bcc0] [c055fb6c] 
> ext4_sync_file+0x1cc/0x480
> [ 1718.848415] [c008b0f6bd20] [c0493530] vfs_fsync_range+0x70/0xf0
> [ 1718.848421] [c008b0f6bd60] [c0493638] do_fsync+0x58/0xd0
> [ 1718.848427] [c008b0f6bda0] [c04936d8] sys_fsync+0x28/0x40
> [ 1718.848433] [c008b0f6bdc0] [c0035e28] 
> system_call_exception+0xf8/0x1c0
> [ 1718.848440] [c008b0f6be20] [c000ca70] 
> system_call_common+0xf0/0x278
> 
> Comparing next-20200630 with next-20200701 one possible candidate seems to
> be the following commit:
> 
> commit 37f4a24c2469a10a4c16c641671bd766e276cf9f
> blk-mq: centralise related handling into blk_mq_get_driver_tag
> 
> Reverting this commit allows the test to complete in 10 minutes.

Hello,

Thanks for the report.

Please try the following fix:

https://lore.kernel.org/linux-block/20200702062041.GC2452799@T590/raw


Thanks,
Ming



Re: remove a few uses of ->queuedata

2020-05-08 Thread Ming Lei
On Fri, May 08, 2020 at 06:15:02PM +0200, Christoph Hellwig wrote:
> Hi all,
> 
> various bio based drivers use queue->queuedata despite already having
> set up disk->private_data, which can be used just as easily.  This
> series cleans them up to only use a single private data pointer.
> 
> blk-mq based drivers that have code paths that can't easily get at
> the gendisk are unaffected by this series.

Yeah, before adding the disk, there may still be requests queued to the LLD
for blk-mq based drivers.

So is there a similar situation for these bio based drivers?
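
For reference, a hedged sketch of the kind of change the series makes in a
bio based driver (hypothetical example_dev/example_make_request names; the
exact make_request prototype depends on the kernel version in question):

#include <linux/blkdev.h>

struct example_dev {
	/* per-device driver state (hypothetical) */
};

static blk_qc_t example_make_request(struct request_queue *q, struct bio *bio)
{
	/* before the series: struct example_dev *dev = q->queuedata; */
	struct example_dev *dev = bio->bi_disk->private_data;

	(void)dev;		/* ... process the bio using dev ... */
	bio_endio(bio);
	return BLK_QC_T_NONE;
}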


Thanks,
Ming



Re: Oops in blk_mq_get_request() (was Re: ppc64le kernel panic on 5.2.9-rc1)

2019-08-16 Thread Ming Lei
 data->ctx->rq_dispatched[op_is_sync(op)]++;
>
> r10 = data->ctx->rq_dispatched[x]
> c067b230:   a8 00 49 e9 ld  r10,168(r9)   
>   <-  NIP
>
> x++
> c067b234:   01 00 4a 39 addir10,r10,1
>
> data->ctx->rq_dispatched[x] = r10
> c067b238:   a8 00 49 f9 std r10,168(r9)
>
> refcount_set(&rq->ref, 1);
> c067b23c:   d4 00 ff 90 stw r7,212(r31)
>
>
> So we're oopsing at data->ctx->rq_dispatched[op_is_sync(op)]++.
>
> data->ctx looks completely bogus, ie. 800a00066b9e7d80, that's not
> anything like a valid kernel address.
>
> And also op doesn't look like a valid op value, it's 0x23, which has no
> flag bits set, but also doesn't match any of the values in req_opf.
>
> So I suspect data is pointing somewhere bogus. Or possibly it used to
> point at a blk_mq_alloc_data but doesn't anymore.
>
> Why that's happened I have no idea. I can't see any obvious commits in
> mainline or stable that mention anything similar, maybe someone on
> linux-block recognises it?
>
> cheers

Please try:

https://git.kernel.org/pub/scm/linux/kernel/git/axboe/linux-block.git/commit/?h=for-5.4/block=556f36e90dbe7dded81f4fac084d2bc8a2458330

Strictly speaking, it is still a workaround, but it works in cases where
CPU hotplug isn't involved.

Thanks,
Ming Lei


Re: powerpc hugepage leak caused by 576ed913 "block: use bio_add_page in bio_iov_iter_get_pages"

2019-06-08 Thread Ming Lei
be in latest linux-next).
> >
> > I'll see if I can try that when I next get access to the machine.
>
> Ok, I've now had a chance to test the next-20190423 tree.
>
> I can still reproduce the problem: in fact it is substantially worse,
> and somewhat more consistent.
>
> Previously I usually lost 2-3 hugepages per run, though I'd
> occasionally seen other values between 0 and 8.  With the next tree, I
> lost 46 hugepages on most runs, though I also saw 45 and 48
> occasionally.

Hi David,

The following two patches should fix the issue, please test them:

https://lore.kernel.org/linux-block/20190608164853.10938-1-ming@redhat.com/T/#t

Thanks,
Ming Lei


Re: powerpc hugepage leak caused by 576ed913 "block: use bio_add_page in bio_iov_iter_get_pages"

2019-04-26 Thread Ming Lei
ould be in latest linux-next).
> >
> > I'll see if I can try that when I next get access to the machine.
>
> Ok, I've now had a chance to test the next-20190423 tree.
>
> I can still reproduce the problem: in fact it is substantially worse,
> and somewhat more consistent.
>
> Previously I usually lost 2-3 hugepages per run, though I'd
> occasionally seen other values between 0 and 8.  With the next tree, I
> lost 46 hugepages on most runs, though I also saw 45 and 48
> occasionally.

Please try applying the following three patches, and I guess you may
see the same behavior as the linus tree (v5.1-rc, or v4.18). The last one has
been applied to for-5.2/block, which should be in the latest next.

https://lore.kernel.org/linux-block/20190426104521.30602-2-ming@redhat.com/T/#u
https://lore.kernel.org/linux-block/20190426104521.30602-3-ming@redhat.com/T/#u

https://git.kernel.org/pub/scm/linux/kernel/git/axboe/linux-block.git/commit/?h=for-5.2/block=0257c0ed5ea3de3e32cb322852c4c40bc09d1b97

Thanks,
Ming Lei


Re: kernel BUG at drivers/scsi/scsi_lib.c:1096!

2015-11-23 Thread Ming Lei
On Mon, Nov 23, 2015 at 11:20 PM, Laurent Dufour
 wrote:
>>
>> Reverting the above commit on top of 4.4-rc1 seems to fix the problem for me.
>
> That's what I mentioned earlier ;)
>
> Now Ming sent an additional patch which seems to fix the bug introduced
> by commit bdced438acd8. When testing with this new patch I can't get the
> panic anymore, but Mark reported he is still hitting it.

Laurent, thanks for your test of the 1st patch. It looks like there are at
least two problems, and my 2nd patch sent just now should address
Mark's issue, which is caused by bdced438acd83a.

Once the 2nd one is tested OK, I will send the two out together.

Thanks,
Ming

Re: kernel BUG at drivers/scsi/scsi_lib.c:1096!

2015-11-23 Thread Ming Lei
On Mon, 23 Nov 2015 10:46:20 +0800
Ming Lei <ming@canonical.com> wrote:

> Hi Mark,
> 
> On Mon, Nov 23, 2015 at 9:50 AM, Mark Salter <msal...@redhat.com> wrote:
> > On Mon, 2015-11-23 at 08:36 +0800, Ming Lei wrote:
> >> On Mon, Nov 23, 2015 at 7:20 AM, Mark Salter <msal...@redhat.com> wrote:
> >> > On Sun, 2015-11-22 at 00:56 +0800, Ming Lei wrote:
> >> > > On Sat, 21 Nov 2015 12:30:14 +0100
> >> > > Laurent Dufour <lduf...@linux.vnet.ibm.com> wrote:
> >> > >
> >> > > > On 20/11/2015 13:10, Michael Ellerman wrote:
> >> > > > > On Thu, 2015-11-19 at 00:23 -0800, Christoph Hellwig wrote:
> >> > > > >
> >> > > > > > It's pretty much guaranteed a block layer bug, most likely in the
> >> > > > > > merge bios to request infrastructure where we don't obey the
> >> > > > > > merging
> >> > > > > > limits properly.
> >> > > > > >
> >> > > > > > Does either of you have a known good and first known bad kernel?
> >> > > > >
> >> > > > > Not me, I've only hit it one or two times. All I can say is I have 
> >> > > > > hit it in
> >> > > > > 4.4-rc1.
> >> > > > >
> >> > > > > Laurent, can you narrow it down at all?
> >> > > >
> >> > > > It seems that the panic is triggered by the commit bdced438acd8 
> >> > > > ("block:
> >> > > > setup bi_phys_segments after splitting") which has been pulled by the
> >> > > > merge d9734e0d1ccf ("Merge branch 'for-4.4/core' of
> >> > > > git://git.kernel.dk/linux-block").
> >> > > >
> >> > > > My system is panicing promptly when running a kernel built at
> >> > > > d9734e0d1ccf, while reverting the commit bdced438acd8, it can run 
> >> > > > hours
> >> > > > without panicing.
> >> > > >
> >> > > > This being said, I can't explain what's going wrong.
> >> > > >
> >> > > > May Ming shed some light here ?
> >> > >
> >> > > Laurent, looks there is one bug in blk_bio_segment_split(), would you
> >> > > mind testing the following patch to see if it fixes your issue?
> >> > >
> >> > > ---
> >> > > From 6fc701231dcc000bc8bc4b9105583380d9aa31f4 Mon Sep 17 00:00:00 2001
> >> > > From: Ming Lei <ming@canonical.com>
> >> > > Date: Sun, 22 Nov 2015 00:47:13 +0800
> >> > > Subject: [PATCH] block: fix segment split
> >> > >
> >> > > Inside blk_bio_segment_split(), previous bvec pointer('bvprvp')
> >> > > always points to the iterator local variable, which is obviously
> >> > > wrong, so fix it by pointing to the local variable of 'bvprv'.
> >> > >
> >> > > Signed-off-by: Ming Lei <ming@canonical.com>
> >> > > ---
> >> > >  block/blk-merge.c | 4 ++--
> >> > >  1 file changed, 2 insertions(+), 2 deletions(-)
> >> > >
> >> > > diff --git a/block/blk-merge.c b/block/blk-merge.c
> >> > > index de5716d8..f2efe8a 100644
> >> > > --- a/block/blk-merge.c
> >> > > +++ b/block/blk-merge.c
> >> > > @@ -98,7 +98,7 @@ static struct bio *blk_bio_segment_split(struct 
> >> > > request_queue *q,
> >> > >
> >> > >   seg_size += bv.bv_len;
> >> > >   bvprv = bv;
> >> > > - bvprvp = &bv;
> >> > > + bvprvp = &bvprv;
> >> > >   sectors += bv.bv_len >> 9;
> >> > >   continue;
> >> > >   }
> >> > > @@ -108,7 +108,7 @@ new_segment:
> >> > >
> >> > >   nsegs++;
> >> > >   bvprv = bv;
> >> > > - bvprvp = &bv;
> >> > > + bvprvp = &bvprv;
> >> > >   seg_size = bv.bv_len;
> >> > >   sectors += bv.bv_len >> 9;
> >> > >   }
> >> >
> >> > I'm still hitting the BUG even with this patch applied on top of 4.4-rc1.
> >>
> >> OK, looks there ar

Re: kernel BUG at drivers/scsi/scsi_lib.c:1096!

2015-11-22 Thread Ming Lei
Hi Mark,

On Mon, Nov 23, 2015 at 9:50 AM, Mark Salter <msal...@redhat.com> wrote:
> On Mon, 2015-11-23 at 08:36 +0800, Ming Lei wrote:
>> On Mon, Nov 23, 2015 at 7:20 AM, Mark Salter <msal...@redhat.com> wrote:
>> > On Sun, 2015-11-22 at 00:56 +0800, Ming Lei wrote:
>> > > On Sat, 21 Nov 2015 12:30:14 +0100
>> > > Laurent Dufour <lduf...@linux.vnet.ibm.com> wrote:
>> > >
>> > > > On 20/11/2015 13:10, Michael Ellerman wrote:
>> > > > > On Thu, 2015-11-19 at 00:23 -0800, Christoph Hellwig wrote:
>> > > > >
>> > > > > > It's pretty much guaranteed a block layer bug, most likely in the
>> > > > merge bios to request infrastructure where we don't obey the merging
>> > > > > > limits properly.
>> > > > > >
>> > > > > > Does either of you have a known good and first known bad kernel?
>> > > > >
>> > > > > Not me, I've only hit it one or two times. All I can say is I have 
>> > > > > hit it in
>> > > > > 4.4-rc1.
>> > > > >
>> > > > > Laurent, can you narrow it down at all?
>> > > >
>> > > > It seems that the panic is triggered by the commit bdced438acd8 
>> > > > ("block:
>> > > > setup bi_phys_segments after splitting") which has been pulled by the
>> > > > merge d9734e0d1ccf ("Merge branch 'for-4.4/core' of
>> > > > git://git.kernel.dk/linux-block").
>> > > >
>> > > > My system is panicing promptly when running a kernel built at
>> > > > d9734e0d1ccf, while reverting the commit bdced438acd8, it can run hours
>> > > > without panicing.
>> > > >
>> > > > This being said, I can't explain what's going wrong.
>> > > >
>> > > > May Ming shed some light here ?
>> > >
>> > > Laurent, looks there is one bug in blk_bio_segment_split(), would you
>> > > mind testing the following patch to see if it fixes your issue?
>> > >
>> > > ---
>> > > From 6fc701231dcc000bc8bc4b9105583380d9aa31f4 Mon Sep 17 00:00:00 2001
>> > > From: Ming Lei <ming@canonical.com>
>> > > Date: Sun, 22 Nov 2015 00:47:13 +0800
>> > > Subject: [PATCH] block: fix segment split
>> > >
>> > > Inside blk_bio_segment_split(), previous bvec pointer('bvprvp')
>> > > always points to the iterator local variable, which is obviously
>> > > wrong, so fix it by pointing to the local variable of 'bvprv'.
>> > >
>> > > Signed-off-by: Ming Lei <ming@canonical.com>
>> > > ---
>> > >  block/blk-merge.c | 4 ++--
>> > >  1 file changed, 2 insertions(+), 2 deletions(-)
>> > >
>> > > diff --git a/block/blk-merge.c b/block/blk-merge.c
>> > > index de5716d8..f2efe8a 100644
>> > > --- a/block/blk-merge.c
>> > > +++ b/block/blk-merge.c
>> > > @@ -98,7 +98,7 @@ static struct bio *blk_bio_segment_split(struct 
>> > > request_queue *q,
>> > >
>> > >   seg_size += bv.bv_len;
>> > >   bvprv = bv;
>> > > - bvprvp = &bv;
>> > > + bvprvp = &bvprv;
>> > >   sectors += bv.bv_len >> 9;
>> > >   continue;
>> > >   }
>> > > @@ -108,7 +108,7 @@ new_segment:
>> > >
>> > >   nsegs++;
>> > >   bvprv = bv;
>> > > - bvprvp = &bv;
>> > > + bvprvp = &bvprv;
>> > >   seg_size = bv.bv_len;
>> > >   sectors += bv.bv_len >> 9;
>> > >   }
>> >
>> > I'm still hitting the BUG even with this patch applied on top of 4.4-rc1.
>>
>> OK, looks there are still other bugs, care to share us how to reproduce
>> it on arm64?
>>
>> thanks,
>> Ming
>
> Unfortunately, the best reproducer I have is to boot the platform. I have
> seen the BUG a few times post-boot, but I don't have a consistent
> reproducer. I am using upstream 4.4-rc1 with this config:
>
>   http://people.redhat.com/msalter/fh_defconfig
>
> With 4.4-rc1 on an APM Mustang platform, I see the BUG about once every 6-7 
> boots.
> On an AMD Seattle platform, about every 9 boots.

Thanks for 

Re: kernel BUG at drivers/scsi/scsi_lib.c:1096!

2015-11-22 Thread Ming Lei
On Mon, Nov 23, 2015 at 7:20 AM, Mark Salter <msal...@redhat.com> wrote:
> On Sun, 2015-11-22 at 00:56 +0800, Ming Lei wrote:
>> On Sat, 21 Nov 2015 12:30:14 +0100
>> Laurent Dufour <lduf...@linux.vnet.ibm.com> wrote:
>>
>> > On 20/11/2015 13:10, Michael Ellerman wrote:
>> > > On Thu, 2015-11-19 at 00:23 -0800, Christoph Hellwig wrote:
>> > >
>> > > > It's pretty much guaranteed a block layer bug, most likely in the
>> > > > merge bios to request infrastructure where we don't obey the merging
>> > > > limits properly.
>> > > >
>> > > > Does either of you have a known good and first known bad kernel?
>> > >
>> > > Not me, I've only hit it one or two times. All I can say is I have hit 
>> > > it in
>> > > 4.4-rc1.
>> > >
>> > > Laurent, can you narrow it down at all?
>> >
>> > It seems that the panic is triggered by the commit bdced438acd8 ("block:
>> > setup bi_phys_segments after splitting") which has been pulled by the
>> > merge d9734e0d1ccf ("Merge branch 'for-4.4/core' of
>> > git://git.kernel.dk/linux-block").
>> >
>> > My system is panicing promptly when running a kernel built at
>> > d9734e0d1ccf, while reverting the commit bdced438acd8, it can run hours
>> > without panicing.
>> >
>> > This being said, I can't explain what's going wrong.
>> >
>> > May Ming shed some light here ?
>>
>> Laurent, looks there is one bug in blk_bio_segment_split(), would you
>> mind testing the following patch to see if it fixes your issue?
>>
>> ---
>> From 6fc701231dcc000bc8bc4b9105583380d9aa31f4 Mon Sep 17 00:00:00 2001
>> From: Ming Lei <ming@canonical.com>
>> Date: Sun, 22 Nov 2015 00:47:13 +0800
>> Subject: [PATCH] block: fix segment split
>>
>> Inside blk_bio_segment_split(), previous bvec pointer('bvprvp')
>> always points to the iterator local variable, which is obviously
>> wrong, so fix it by pointing to the local variable of 'bvprv'.
>>
>> Signed-off-by: Ming Lei <ming@canonical.com>
>> ---
>>  block/blk-merge.c | 4 ++--
>>  1 file changed, 2 insertions(+), 2 deletions(-)
>>
>> diff --git a/block/blk-merge.c b/block/blk-merge.c
>> index de5716d8..f2efe8a 100644
>> --- a/block/blk-merge.c
>> +++ b/block/blk-merge.c
>> @@ -98,7 +98,7 @@ static struct bio *blk_bio_segment_split(struct 
>> request_queue *q,
>>
>>   seg_size += bv.bv_len;
>>   bvprv = bv;
>> - bvprvp = &bv;
>> + bvprvp = &bvprv;
>>   sectors += bv.bv_len >> 9;
>>   continue;
>>   }
>> @@ -108,7 +108,7 @@ new_segment:
>>
>>   nsegs++;
>>   bvprv = bv;
>> - bvprvp = &bv;
>> + bvprvp = &bvprv;
>>   seg_size = bv.bv_len;
>>   sectors += bv.bv_len >> 9;
>>   }
>
> I'm still hitting the BUG even with this patch applied on top of 4.4-rc1.

OK, it looks like there are still other bugs. Care to share with us how to
reproduce it on arm64?

thanks,
Ming

Re: kernel BUG at drivers/scsi/scsi_lib.c:1096!

2015-11-21 Thread Ming Lei
On Sat, 21 Nov 2015 12:30:14 +0100
Laurent Dufour <lduf...@linux.vnet.ibm.com> wrote:

> On 20/11/2015 13:10, Michael Ellerman wrote:
> > On Thu, 2015-11-19 at 00:23 -0800, Christoph Hellwig wrote:
> > 
> >> It's pretty much guaranteed a block layer bug, most likely in the
> >> merge bios to request infrastructure where we don't obey the merging
> >> limits properly.
> >>
> >> Does either of you have a known good and first known bad kernel?
> > 
> > Not me, I've only hit it one or two times. All I can say is I have hit it in
> > 4.4-rc1.
> > 
> > Laurent, can you narrow it down at all?
> 
> It seems that the panic is triggered by the commit bdced438acd8 ("block:
> setup bi_phys_segments after splitting") which has been pulled by the
> merge d9734e0d1ccf ("Merge branch 'for-4.4/core' of
> git://git.kernel.dk/linux-block").
> 
> My system is panicing promptly when running a kernel built at
> d9734e0d1ccf, while reverting the commit bdced438acd8, it can run hours
> without panicing.
> 
> This being said, I can't explain what's going wrong.
> 
> May Ming shed some light here ?

Laurent, it looks like there is one bug in blk_bio_segment_split(); would you
mind testing the following patch to see if it fixes your issue?

---
From 6fc701231dcc000bc8bc4b9105583380d9aa31f4 Mon Sep 17 00:00:00 2001
From: Ming Lei <ming@canonical.com>
Date: Sun, 22 Nov 2015 00:47:13 +0800
Subject: [PATCH] block: fix segment split

Inside blk_bio_segment_split(), previous bvec pointer('bvprvp')
always points to the iterator local variable, which is obviously
wrong, so fix it by pointing to the local variable of 'bvprv'.

Signed-off-by: Ming Lei <ming@canonical.com>
---
 block/blk-merge.c | 4 ++--
 1 file changed, 2 insertions(+), 2 deletions(-)

diff --git a/block/blk-merge.c b/block/blk-merge.c
index de5716d8..f2efe8a 100644
--- a/block/blk-merge.c
+++ b/block/blk-merge.c
@@ -98,7 +98,7 @@ static struct bio *blk_bio_segment_split(struct request_queue *q,
 
seg_size += bv.bv_len;
bvprv = bv;
-   bvprvp = &bv;
+   bvprvp = &bvprv;
sectors += bv.bv_len >> 9;
continue;
}
@@ -108,7 +108,7 @@ new_segment:
 
nsegs++;
bvprv = bv;
-   bvprvp = &bv;
+   bvprvp = &bvprv;
seg_size = bv.bv_len;
sectors += bv.bv_len >> 9;
}
-- 
1.9.1



Thanks,
Ming



watchdog exception on 8548CDS during cpu_idle

2009-11-18 Thread Ming Lei

I used the vanilla Linux 2.6.30 compiled with mpc85xx_defconfig (with
CONFIG_BOOKE_WDT enabled), ran it on an 8548CDS, and soon after I saw the
prompt I hit this watchdog exception.
 
bash-2.04# PowerPC Book-E Watchdog Exception
NIP: c000b740 LR: c00088dc CTR: c000b6b0
REGS: cfffbf10 TRAP: 3202   Not tainted  (2.6.30)
MSR: 00029000 EE,ME,CE  CR: 28028048  XER: 2000
TASK = c04f4458[0] 'swapper' THREAD: c052c000
GPR00: c000b6b0 c052df90 c04f4458 0080 80804080 001d c053af48 00069000 
GPR08:   08954400  002167ee 7f652f31 0ffad800 0fff 
GPR16:      f30a620b 0ff50450  
GPR24:   c053506c c0534fa0 c0534fa0 c052c034 0008 c052c000 
NIP [c000b740] e500_idle+0x90/0x94
LR [c00088dc] cpu_idle+0x98/0xec
Call Trace:
[c052df90] [c000889c] cpu_idle+0x58/0xec (unreliable)
[c052dfb0] [c00023ec] rest_init+0x5c/0x70
[c052dfc0] [c04c16f4] start_kernel+0x22c/0x290
[c052dff0] [c398] skpinv+0x2b0/0x2ec
Instruction dump:
7c90faa6 548402ce 7c841b78 4c00012c 7c90fba6 4c00012c 7ce000a6 64e70004 
60e78000 7c0004ac 7ce00124 4c00012c 4800 812b00a0 912b0090 3960 

Has anyone seen this before? Why is the EE bit on in the stack trace? I put
show_regs() in the watchdog exception handler in traps.c and verified that EE
is off when entering the handler. Can I trust this stack trace?
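
For reference, a hedged sketch of the debugging change described above (the
handler lives in arch/powerpc/kernel/traps.c; the 2.6.30 details are quoted
from memory, so treat this as approximate):

#ifdef CONFIG_BOOKE_WDT
void WatchdogException(struct pt_regs *regs)
{
	printk(KERN_EMERG "PowerPC Book-E Watchdog Exception\n");
	show_regs(regs);	/* added for debugging, as mentioned above */
	WatchdogHandler(regs);
}
#endif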

Thanks
Ming



question on PPC64 relocatable kernel

2009-07-31 Thread Ming Lei
Hi,

I am researching the PPC64 code and trying to come up with a design for a
relocatable kernel for ppc32. I noticed that the current ppc64 implementation
only adjusts the entries in the RELA table, adding the offset from the
compile-time load address to the relocated address, but does not touch the
GOT table. Do GOT entries need the same adjustment as well?

The second question is: does the PPC64 relocatable kernel still need to copy
the whole kernel from its running address to address 0?
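
For the first question, here is a hedged C sketch of the kind of RELA fixup
being described (the real ppc64 code does this in assembly, in
arch/powerpc/kernel/reloc_64.S; this is only an illustration of the idea,
using userspace <elf.h> types):

#include <elf.h>

/* Illustrative only: patch every R_PPC64_RELATIVE entry so the stored
 * address points into the new load location. */
static void example_apply_rela(Elf64_Rela *rela, unsigned long nr,
			       unsigned long new_base)
{
	unsigned long i;

	for (i = 0; i < nr; i++) {
		if (ELF64_R_TYPE(rela[i].r_info) != R_PPC64_RELATIVE)
			continue;
		/* word at (new_base + r_offset) becomes (new_base + addend) */
		*(unsigned long *)(new_base + rela[i].r_offset) =
			new_base + rela[i].r_addend;
	}
}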

Thanks
Ming