Re: [PATCH 0/2] arm64: kexec_file_load vs memory reservations

2021-05-18 Thread Bhupesh Sharma
Hi Will,

On Tue, 18 May 2021 at 17:19, Will Deacon  wrote:
>
> [Fixing Bhupesh's email address]
>
> On Thu, Apr 29, 2021 at 02:35:31PM +0100, Marc Zyngier wrote:
> > It recently became apparent that using kexec with kexec_file_load() on
> > arm64 is pretty similar to playing Russian roulette.
> >
> > Depending on the amount of memory, the HW supported and the firmware
> > interface used, your secondary kernel may overwrite critical memory
> > regions without which the secondary kernel cannot boot (the GICv3 LPI
> > tables being a prime example of such reserved regions).
> >
> > It turns out that there are at least two ways for reserved memory
> > regions to be described to kexec: /proc/iomem for the userspace
> > implementation, and memblock.reserved for kexec_file. And of course,
> > our LPI tables are only reserved using the resource tree, leading to
> > the aforementioned stamping. Similar things could happen with ACPI
> > tables as well.
> >
> > On my 24xA53 system artificially limited to 256MB of RAM (yes, it
> > boots with that little memory), trying to kexec a secondary kernel
> > failed every time. I can only presume that this was mostly tested
> > using kdump, which preserves the entire kernel memory range.
> >
> > This small series aims at triggering a discussion on what are the
> > expectations for kexec_file, and whether we should unify the two
> > reservation mechanisms.
>
> Bhupesh, since you've been involved with kexec file on arm64 before, please
> could you take a look at these patches?

Thanks for adding me in Cc.
Yes, I will look at and test these patches ASAP.

Regards,
Bhupesh

___
kexec mailing list
kexec@lists.infradead.org
http://lists.infradead.org/mailman/listinfo/kexec


Re: [RFC PATCH 4/4] i40e: don't open i40iw client for kdump

2021-02-25 Thread Bhupesh SHARMA
Hello Coiby,

On Mon, Feb 22, 2021 at 12:40 PM Coiby Xu  wrote:
>
> i40iw consumes huge amounts of memory. For example, on an x86_64 machine,
> i40iw consumed 1.5GB for Intel Corporation Ethernet Connection X722 for
> 1GbE while "crashkernel=auto" only reserved 160M. With the module
> parameter "resource_profile=2", we can reduce the memory usage of i40iw
> to ~300M which is still too much for kdump.
>
> Disabling the client registration would spare us the client interface
> operation open, i.e., i40iw_open for the iwarp/uda device. Thus memory is
> saved for kdump.
>
> Signed-off-by: Coiby Xu 
> ---
>  drivers/net/ethernet/intel/i40e/i40e_client.c | 7 +++
>  1 file changed, 7 insertions(+)
>
> diff --git a/drivers/net/ethernet/intel/i40e/i40e_client.c b/drivers/net/ethernet/intel/i40e/i40e_client.c
> index a2dba32383f6..aafc2587f389 100644
> --- a/drivers/net/ethernet/intel/i40e/i40e_client.c
> +++ b/drivers/net/ethernet/intel/i40e/i40e_client.c
> @@ -4,6 +4,7 @@
>  #include 
>  #include 
>  #include 
> +#include 
>
>  #include "i40e.h"
>  #include "i40e_prototype.h"
> @@ -741,6 +742,12 @@ int i40e_register_client(struct i40e_client *client)
>  {
> int ret = 0;
>
> +   /* Don't open i40iw client for kdump because i40iw will consume huge
> +* amounts of memory.
> +*/
> +   if (is_kdump_kernel())
> +   return ret;
> +

Since the crashkernel size can be set manually on the command line, and
some users might be fine with the ~300M of memory used by the i40iw
client (with "resource_profile=2"), disabling the client in all kdump
cases seems too restrictive in my view.

We can probably check the allocated crash kernel size
($ cat /sys/kernel/kexec_crash_size) and then make a decision
accordingly, for example with something like:

 +   if (is_kdump_kernel() && kexec_crash_size < 512M)
 +   return ret;

What do you think?

Regards,
Bhupesh

> if (!client) {
> ret = -EIO;
> goto out;
> --
> 2.30.1
>
>



Re: [PATCH 3/3] arm64: support flipped VA and 52-bit kernel VA

2021-02-04 Thread Bhupesh SHARMA
Hi Kazu,

On Thu, Feb 4, 2021 at 11:26 AM HAGIO KAZUHITO(萩尾 一仁)
 wrote:
>
> Hi Pingfan, Bhupesh,
>
> -Original Message-
> > Except for a correction in the notes, the rest looks good to me.
> >
> > Reviewed-by: Pingfan Liu 
>
> Thank you for reviewing and testing this, applied the patch set.
> https://github.com/makedumpfile/makedumpfile/compare/7f185d2...a0216b6
>
> Bhupesh, thank you for your long term efforts.  I had to drop a few of
> your ideas mainly to maintain its functionality and compatibility, but
> the basis of your patch was very helpful.

Sure, no problem. I had some issues with gmail and the account got
locked for a couple of days.

Thanks for all your help.

Thanks to Pingfan as well for the quick reviews.

Regards,
Bhupesh



Re: [RFC PATCH 4/4] arm64: support flipped VA and 52-bit kernel VA

2021-01-14 Thread Bhupesh SHARMA
Hi Kazu,

On Thu, Jan 14, 2021 at 3:33 PM piliu  wrote:
>
>
>
> On 1/14/21 4:25 PM, kazuhito.ha...@gmail.com wrote:
> > From: Kazuhito Hagio 
> >
> > Based on Bhupesh's patch and contains Pingfan's idea.
> >
> > Signed-off-by: Bhupesh Sharma 
> > Signed-off-by: Kazuhito Hagio 
> > ---
> >   arch/arm64.c   | 95 
> > --
> >   makedumpfile.c |  2 ++
> >   makedumpfile.h |  1 +
> >   3 files changed, 83 insertions(+), 15 deletions(-)
> >
> > diff --git a/arch/arm64.c b/arch/arm64.c
> > index 61ec89a..4ece19d 100644
> > --- a/arch/arm64.c
> > +++ b/arch/arm64.c
> > @@ -47,6 +47,8 @@ typedef struct {
> >   static int lpa_52_bit_support_available;
> >   static int pgtable_level;
> >   static int va_bits;
> > +static int vabits_actual;
> > +static int flipped_va;
> >   static unsigned long kimage_voffset;
> >
> >   #define SZ_4K   4096
> > @@ -58,7 +60,6 @@ static unsigned long kimage_voffset;
> >   #define PAGE_OFFSET_42  ((0xUL) << 42)
> >   #define PAGE_OFFSET_47  ((0xUL) << 47)
> >   #define PAGE_OFFSET_48  ((0xUL) << 48)
> > -#define PAGE_OFFSET_52   ((0xUL) << 52)
> >
> >   #define pgd_val(x)  ((x).pgd)
> >   #define pud_val(x)  (pgd_val((x).pgd))
> > @@ -218,12 +219,20 @@ pmd_page_paddr(pmd_t pmd)
> >   #define pte_index(vaddr)(((vaddr) >> PAGESHIFT()) & 
> > (PTRS_PER_PTE - 1))
> >   #define pte_offset(dir, vaddr)  (pmd_page_paddr((*dir)) + 
> > pte_index(vaddr) * sizeof(pte_t))
> >
> > +/*
> > + * The linear kernel range starts at the bottom of the virtual address
> > + * space. Testing the top bit for the start of the region is a
> > + * sufficient check and avoids having to worry about the tag.
> > + */
> > +#define is_linear_addr(addr) (flipped_va ?   \
> > + (!((unsigned long)(addr) & (1UL << (vabits_actual - 1)))) : \
> > + (!!((unsigned long)(addr) & (1UL << (vabits_actual - 1)))))
> > +
> >   static unsigned long long
> >   __pa(unsigned long vaddr)
> >   {
> > - if (kimage_voffset == NOT_FOUND_NUMBER ||
> > - (vaddr >= PAGE_OFFSET))
> > - return (vaddr - PAGE_OFFSET + info->phys_base);
> > + if (kimage_voffset == NOT_FOUND_NUMBER || is_linear_addr(vaddr))
> > + return ((vaddr & ~PAGE_OFFSET) + info->phys_base);
> >   else
> >   return (vaddr - kimage_voffset);
> >   }
> > @@ -253,6 +262,7 @@ static int calculate_plat_config(void)
> >   (PAGESIZE() == SZ_64K && va_bits == 42)) {
> >   pgtable_level = 2;
> >   } else if ((PAGESIZE() == SZ_64K && va_bits == 48) ||
> > + (PAGESIZE() == SZ_64K && va_bits == 52) ||
> >   (PAGESIZE() == SZ_4K && va_bits == 39) ||
> >   (PAGESIZE() == SZ_16K && va_bits == 47)) {
> >   pgtable_level = 3;
> > @@ -263,6 +273,7 @@ static int calculate_plat_config(void)
> >   PAGESIZE(), va_bits);
> >   return FALSE;
> >   }
> > + DEBUG_MSG("pgtable_level: %d\n", pgtable_level);
> >
> >   return TRUE;
> >   }
> > @@ -383,22 +394,54 @@ get_va_bits_from_stext_arm64(void)
> >   return TRUE;
> >   }
> >
> > +static void
> > +get_page_offset_arm64(void)
> > +{
> > + ulong page_end;
> > + int vabits_min;
> > +
> > + /*
> > +  * See arch/arm64/include/asm/memory.h for more details of
> > +  * the PAGE_OFFSET calculation.
> > +  */
> > + vabits_min = (va_bits > 48) ? 48 : va_bits;
> > + page_end = -(1UL << (vabits_min - 1));
> > +
> > + if (SYMBOL(_stext) > page_end) {
> > + flipped_va = TRUE;
> > + info->page_offset = -(1UL << vabits_actual);
> > + } else {
> > + flipped_va = FALSE;
> > + info->page_offset = -(1UL << (vabits_actual - 1));
> > + }
> > +
> > + DEBUG_MSG("page_offset   : %lx (from page_end check)\n",
> > + info->page_offset);
> > +}
> > +
> >   int
> >   get_machdep_info_arm64(void)
> >   {
> &g

Re: [RFC PATCH 0/3] makedumpfile: about failing on arm64 with kernel > 5.4

2020-11-22 Thread Bhupesh SHARMA
Hi Alexander,

Thanks for the patchset.
I am not sure why this new patchset is needed for makedumpfile
upstream - if you need a separate patchset for Yocto please feel free
to submit it to the Yocto list and Cc us.

However for upstream makedumpfile project these are _probably_ not
required - I have tested my patch on several arm64 boards and it works
fine there.

I will send the next version of my patch once I am back from my
holidays later this week.
If you see any breakage with it, please feel free to report it here
with the relevant logs, and I can help further.

Thanks,
Bhupesh

On Mon, Nov 23, 2020 at 10:10 AM Alexander Kamensky
 wrote:
>
> Hi Kazu, Bhupesh,
>
> I am hitting the linear mapping swap issue with makedumpfile failing on
> arm64 Yocto Project qemuarm64 machine with 5.8 kernel as it was discussed
> several times on this mailing list:
>
> root@qemuarm64:~# makedumpfile -c -F /proc/vmcore > /dev/null
> readpage_elf: Attempt to read non-existent page at 0x0.
> readmem: type_addr: 1, addr:440, size:8
> vaddr_to_paddr_arm64: Can't read pmd
> readmem: Can't convert a virtual address(ffc01107f94c) to physical address.
> readmem: type_addr: 0, addr:ffc01107f94c, size:390
> check_release: Can't get the address of system_utsname.
>
> I have tried Bhupesh's remaining third patch [1] from the [2] series,
> it does help. But I am a bit hesitant to submit it to the Yocto Project,
> since Kazu pointed out [3] that this patch uses the current kernel version
> to decide how __pa is handled, which may not match the version where the
> vmcore was collected, and in that case it may not operate correctly.
>
> In this RFC series I have tried to implement Kazu's suggestion and use
> kernel version retrieved from OSRELEASE string from vmcoreinfo note. I
> wonder whether it will help to merge arm64 5.4+ makedumpfile fix? Is
> there anything else outstanding that prevents such merge?
>
> My RFC patches series does include Bhupesh's patch [1], and I posted
> my modifications on top of it as separate patch for readability.
>
> Thanks,
> Alexander
>
> [1] http://lists.infradead.org/pipermail/kexec/2020-September/021336.html
> [2] http://lists.infradead.org/pipermail/kexec/2020-September/021333.html
> [3] http://lists.infradead.org/pipermail/kexec/2020-September/021488.html
>
> Alexander Kamensky (2):
>   added way to determine kernel version that vmcore is from
>   arm64: use kernel version from OSRELEASE to determine linear mapping
> position
>
> Bhupesh Sharma (1):
>   makedumpfile/arm64: Add support for ARMv8.2-LVA (52-bit kernel VA
> support)
>
>  arch/arm64.c   | 229 ++---
>  common.h   |  10 +++
>  makedumpfile.c |  23 +
>  makedumpfile.h |   6 +-
>  4 files changed, 234 insertions(+), 34 deletions(-)
>
> --
> 2.26.2
>
>



Re: Facing issue with va to pa conversion in arm64/makedumpfile

2020-11-20 Thread Bhupesh Sharma
Hello Akshay,

As Kazu mentioned, please try the patch:
http://lists.infradead.org/pipermail/kexec/2020-September/021336.html
and let me know your observations.

I am still on holiday, but I will try to check my emails from today,
so I can help you with any further issues.

Thanks,
Bhupesh

On Thu, Nov 19, 2020 at 5:56 AM HAGIO KAZUHITO(萩尾 一仁)
 wrote:
>
> Hi Akshay,
> (Cc Bhupesh)
>
> -Original Message-
> > Hi Pratyush/ Kazuhito ,
> >
> > While attempting to create a dumpfile on arm64, I faced issues with
> > virtual-to-physical address conversion when KASLR is enabled.
> > The Linux kernel version is the 5.4.42 stable kernel.
>
> It seems that this would be due to the flipped VA space in the 5.4 arm64 kernel.
> Bhupesh has been addressing this for us, but it has not been merged yet.
> How does it work with this patch?
> http://lists.infradead.org/pipermail/kexec/2020-September/021336.html
>
> (Its 1/3 and 2/3 patches were already merged, so I think the 3/3 patch
> can be applied on top of the latest makedumpfile.)
>
> Thanks,
> Kazu
>
> >
> > The following patch seems to add KASLR support:
> >
> > [PATCH 10/10] arm64: fix memory layout as per changes in v4.6 kernel
> >
> > https://sourceforge.net/p/makedumpfile/code/ci/b6fe70c7ffef9affb540412407702b15d4a196e9
> >
> >
> > static unsigned long long
> > +__pa(unsigned long vaddr)
> > +{
> > +   if (kimage_voffset == NOT_FOUND_NUMBER ||
> > +   (vaddr >= PAGE_OFFSET))
> > +   return (vaddr - PAGE_OFFSET + info->phys_base);
> > +   else
> > +   return (vaddr - kimage_voffset);
> > +}
> >
> >
> > I see that even when kimage_voffset is available, it calculates the PA
> > using PAGE_OFFSET.
> > But as KASLR is enabled, it should ideally use kimage_voffset.
> >
> > Not sure what is the significance of the check (vaddr >= PAGE_OFFSET) if
> > kimage_voffset is available.
> >
> > I think this should be something like:
> >
> > +   if ((kimage_voffset == NOT_FOUND_NUMBER) && (vaddr >= PAGE_OFFSET))
> > +   return (vaddr - PAGE_OFFSET + info->phys_base);
> > +   else
> > +   return (vaddr - kimage_voffset);
> >
> >
> >
> > Because we want to check vaddr is greater than PAGE_OFFSET when we are
> > considering linear mapping without kaslr seed.
> >
> > Or maybe we should have a separate check that errors out for any vaddr
> > less than PAGE_OFFSET and kimage_voffset.
> >
> > +   if  (vaddr < PAGE_OFFSET)
> > +  return -EINVAL;
> > +   if (kimage_voffset == NOT_FOUND_NUMBER)
> > +   return (vaddr - PAGE_OFFSET + info->phys_base);
> > +   else
> > +   return (vaddr - kimage_voffset);
> >
> > --
> > The Qualcomm Innovation Center, Inc. is a member of the Code Aurora
> > Forum,
> > a Linux Foundation Collaborative Project
> >
>




Re: [PATCH v5 0/3] makedumpfile/arm64: Add support for ARMv8.2 extensions

2020-11-11 Thread Bhupesh Sharma
> > > 1. CPUs which don't support ARMv8.2 features, e.g. qualcomm-amberwing,
> > >    ampere-osprey.
> > > 2. Prototype models which support ARMv8.2 extensions (e.g. ARMv8 FVP
> > >    simulation model).
> > >
> > > Also a preparation patch has been added in this patchset which adds a
> > > common feature for archs (except arm64, for which similar support is
> > > added via subsequent patch) to retrieve 'MAX_PHYSMEM_BITS' from
> > > vmcoreinfo (if available).
> > >
> > > This patchset ensures backward compatibility for kernel versions in
> > > which 'TCR_EL1.T1SZ' and 'MAX_PHYSMEM_BITS' are not available in
> > > vmcoreinfo.
> > >
> > > In the newer kernels (>= 5.4.0) the following patches export these
> > > variables in the vmcoreinfo:
> > >  - 1d50e5d0c505 ("crash_core, vmcoreinfo: Append 'MAX_PHYSMEM_BITS' to 
> > > vmcoreinfo")
> > >  - bbdbc11804ff ("arm64/crash_core: Export TCR_EL1.T1SZ in vmcoreinfo")
> > >
> > > Cc: John Donnelly 
> > > Cc: Kazuhito Hagio 
> > > Cc: kexec@lists.infradead.org
> > >
> > > Bhupesh Sharma (3):
> > >   tree-wide: Retrieve 'MAX_PHYSMEM_BITS' from vmcoreinfo (if available)
> > >   makedumpfile/arm64: Add support for ARMv8.2-LPA (52-bit PA support)
> > >   makedumpfile/arm64: Add support for ARMv8.2-LVA (52-bit kernel VA
> > > support)
> > >
> > >  arch/arm.c |   8 +-
> > >  arch/arm64.c   | 520 ++---
> > >  arch/ia64.c|   7 +-
> > >  arch/ppc.c |   8 +-
> > >  arch/ppc64.c   |  49 +++--
> > >  arch/s390x.c   |  29 +--
> > >  arch/sparc64.c |   9 +-
> > >  arch/x86.c |  34 ++--
> > >  arch/x86_64.c  |  27 +--
> > >  common.h   |  10 +
> > >  makedumpfile.c |   4 +-
> > >  makedumpfile.h |   6 +-
> > >  12 files changed, 529 insertions(+), 182 deletions(-)
> > >
> > > --
> > > 2.26.2
> >
> >
>




Re: [PATCH v13 0/8] support reserving crashkernel above 4G on arm64 kdump

2020-11-11 Thread Bhupesh SHARMA
Hi Chen,

On Wed, Nov 11, 2020 at 7:05 PM chenzhou  wrote:
>
> Hi Baoquan, Bhupesh,
>
>
> On 2020/11/11 11:01, Baoquan He wrote:
> > Hi Zhou, Bhupesh
> >
> > On 10/31/20 at 03:44pm, Chen Zhou wrote:
> >> There are the following issues in arm64 kdump:
> >> 1. We use crashkernel=X to reserve crashkernel below 4G, which
> >> will fail when there is not enough low memory.
> >> 2. If crashkernel is reserved above 4G, the crash dump
> >> kernel will fail to boot because there is no low memory available
> >> for allocation.
> >> 3. Since commit 1a8e1cef7603 ("arm64: use both ZONE_DMA and ZONE_DMA32"),
> >> if the memory reserved for the crash dump kernel falls in ZONE_DMA32,
> >> devices in the crash dump kernel that need ZONE_DMA will fail to
> >> allocate memory.
> > I went through this patchset, mainly the x86 related and generic
> > changes, the changes look great and no risk. And I know Bhupesh is
> > following up this and helping review, thanks, both.
> >
> > So you have also tested crashkernel reservation on x86_64, with the
> > normal reservation, and high/low reservation, it is working well,
> > right? Asking this because I didn't see the test result description, and
> > just note it.
>
> Yeah, I also tested on x86_64 and it works well. I ran these basic tests before
> sending every new version.
> But Bhupesh may have some review comments (Bhupesh mentioned this one month ago).

Sorry for the late response. I was caught up in some other urgent
issues. I have just started reviewing
this series and will have more updates in a day or two. I am also
testing the same on x86_64 and arm64 machines and will share the test
observations soon as well.

Thanks for your patience.
Regards,
Bhupesh

> >> To solve these issues, change the behavior of crashkernel=X.
> >> crashkernel=X tries low allocation in DMA zone (or the DMA32 zone if
> >> CONFIG_ZONE_DMA is disabled), and fall back to high allocation if it fails.
> >>
> >> We can also use "crashkernel=X,high" to select a high region above
> >> DMA zone, which also tries to allocate at least 256M low memory in
> >> DMA zone automatically (or the DMA32 zone if CONFIG_ZONE_DMA is disabled).
> >> "crashkernel=Y,low" can be used to allocate specified size low memory.
> >>
> >> When reserving crashkernel in high memory, some low memory is reserved
> >> for crash dump kernel devices. So there may be two regions reserved for
> >> crash dump kernel.
> >> To distinguish it from the high region without affecting the use
> >> of existing kexec-tools, the low region is named "Crash kernel (low)",
> >> and it is passed by reusing the DT property
> >> "linux,usable-memory-range". We made the low memory region the last
> >> range of "linux,usable-memory-range" to keep compatibility with existing
> >> user-space and older kdump kernels.
> >>
> >> Besides, we need to modify kexec-tools:
> >> arm64: support more than one crash kernel regions(see [1])
> >>
> >> Another update is document about DT property 'linux,usable-memory-range':
> >> schemas: update 'linux,usable-memory-range' node schema(see [2])
> >>
> >> This patchset contains the following eight patches:
> >> 0001-x86-kdump-replace-the-hard-coded-alignment-with-macr.patch
> >> 0002-x86-kdump-make-the-lower-bound-of-crash-kernel-reser.patch
> >> 0003-x86-kdump-use-macro-CRASH_ADDR_LOW_MAX-in-functions-.patch
> >> 0004-x86-kdump-move-reserve_crashkernel-_low-into-crash_c.patch
> >> 0005-arm64-kdump-introduce-some-macroes-for-crash-kernel-.patch
> >> 0006-arm64-kdump-reimplement-crashkernel-X.patch
> >> 0007-arm64-kdump-add-memory-for-devices-by-DT-property-li.patch
> >> 0008-kdump-update-Documentation-about-crashkernel.patch
> >>
> >> 0001-0003 are some x86 cleanups which prepare for making
> >> functions reserve_crashkernel[_low]() generic.
> >> 0004 makes functions reserve_crashkernel[_low]() generic.
> >> 0005-0006 reimplements arm64 crashkernel=X.
> >> 0007 adds memory for devices by DT property linux,usable-memory-range.
> >> 0008 updates the doc.
> >>
> >> Changes since [v12]
> >> - Rebased on top of 5.10-rc1.
> >> - Keep CRASH_ALIGN as 16M suggested by Dave.
> >> - Drop patch "kdump: add threshold for the required memory".
> >> - Add Tested-by from John.
> >>
> >> Changes since [v11]
> >> - Rebased on top of 5.9-rc4.
> >> - Make the function reserve_crashkernel() of x86 generic.
> >> Suggested by Catalin, make the function reserve_crashkernel() of x86 generic
> >> and arm64 use the generic version to reimplement crashkernel=X.
> >>
> >> Changes since [v10]
> >> - Reimplement crashkernel=X suggested by Catalin, Many thanks to Catalin.
> >>
> >> Changes since [v9]
> >> - Patch 1 add Acked-by from Dave.
> >> - Update patch 5 according to Dave's comments.
> >> - Update chosen schema.
> >>
> >> Changes since [v8]
> >> - Reuse DT property "linux,usable-memory-range".
> >> Suggested by Rob, reuse DT property "linux,usable-memory-range" to pass the low
> >> memory region.
> >> - Fix kdump broken with ZONE_DMA 

Re: [MAKDUMPFILE PATCH] Add option to estimate the size of vmcore dump files

2020-10-13 Thread Bhupesh Sharma
Hello Julien,

On Tue, Oct 13, 2020 at 3:23 PM Julien Thierry  wrote:
>
> Hi Bhupesh,
>
> On 10/13/20 10:27 AM, Bhupesh Sharma wrote:
> > Hello Julien,
> >
> > Thanks for the patch. Some nitpicks inline:
> >
> > On Mon, Oct 12, 2020 at 12:39 PM Julien Thierry  wrote:
> >>
> >> A user might want to know how much space a vmcore file will take on
> >> the system and how much space on their disk should be available to
> >> save it during a crash.
> >>
> >> The option --vmcore-size does not create the vmcore file but provides
> >> an estimation of the size of the final vmcore file created with the
> >> same makedumpfile options.
> >>
> >> Signed-off-by: Julien Thierry 
> >> Cc: Kazuhito Hagio 
> >> ---
> >>   makedumpfile.c | 98 --
> >>   makedumpfile.h | 12 +++
> >>   print_info.c   |  4 +++
> >>   3 files changed, 111 insertions(+), 3 deletions(-)
> >
> > Please update 'makedumpfile.8' as well in v2, so that the man page can
> > document the newly added option and how to use it to determine the
> > vmcore-size.
> >
>
> Ah yes, I'll do that.
>
> >> diff --git a/makedumpfile.c b/makedumpfile.c
> >> index 4c4251e..0a2bfba 100644
> >> --- a/makedumpfile.c
> >> +++ b/makedumpfile.c
> >> @@ -26,6 +26,7 @@
> >>   #include 
> >>   #include 
> >>   #include 
> >> +#include 
> >
> > I know we don't follow alphabetical order for include files in
> > makedumpfile code, but it would be good to place the new - ones
> > accordingly. So  can go with  here.
> >
>
> Noted.
>
> >>   struct symbol_tablesymbol_table;
> >>   struct size_table  size_table;
> >> @@ -1366,7 +1367,25 @@ open_dump_file(void)
> >>  if (!info->flag_force)
> >>  open_flags |= O_EXCL;
> >>
> >> -   if (info->flag_flatten) {
> >> +   if (info->flag_vmcore_size) {
> >> +   char *namecpy;
> >> +   struct stat statbuf;
> >> +   int res;
> >> +
> >> +   namecpy = strdup(info->name_dumpfile ?
> >> +info->name_dumpfile : ".");
> >> +
> >> +   res = stat(dirname(namecpy), );
> >> +   free(namecpy);
> >> +   if (res != 0)
> >> +   return FALSE;
> >> +
> >> +   fd = -1;
> >> +   info->dumpsize_info.blksize = statbuf.st_blksize;
> >> +   info->dumpsize_info.block_buff_size = BASE_NUM_BLOCKS;
> >> +   info->dumpsize_info.block_info = calloc(BASE_NUM_BLOCKS, 1);
> >> +   info->dumpsize_info.non_hole_blocks = 0;
> >> +   } else if (info->flag_flatten) {
> >>  fd = STDOUT_FILENO;
> >>  info->name_dumpfile = filename_stdout;
> >>  } else if ((fd = open(info->name_dumpfile, open_flags,
> >> @@ -1384,6 +1403,9 @@ check_dump_file(const char *path)
> >>   {
> >>  char *err_str;
> >>
> >> +   if (info->flag_vmcore_size)
> >> +   return TRUE;
> >> +
> >>  if (access(path, F_OK) != 0)
> >>  return TRUE; /* File does not exist */
> >>  if (info->flag_force) {
> >> @@ -4622,6 +4644,47 @@ write_and_check_space(int fd, void *buf, size_t buf_size, char *file_name)
> >>  return TRUE;
> >>   }
> >>
> >> +static int
> >> +write_buffer_update_size_info(off_t offset, void *buf, size_t buf_size)
> >> +{
> >> +   struct dumpsize_info *dumpsize_info = >dumpsize_info;
> >> +   int blk_end_idx = (offset + buf_size - 1) / dumpsize_info->blksize;
> >> +   int i;
> >> +
> >> +   /* Need to grow the dumpsize block buffer? */
> >> +   if (blk_end_idx >= dumpsize_info->block_buff_size) {
> >> +   int alloc_size = MAX(blk_end_idx - dumpsize_info->block_buff_size, BASE_NUM_BLOCKS);
> >> +
> >> +   dumpsize_info->block_info = realloc(dumpsize_info->block_info,
> >> +                                       dumpsize_info->block_buff_size + alloc_size);
> >> 

Re: [MAKDUMPFILE PATCH] Add option to estimate the size of vmcore dump files

2020-10-13 Thread Bhupesh Sharma
Hello Julien,

Thanks for the patch. Some nitpicks inline:

On Mon, Oct 12, 2020 at 12:39 PM Julien Thierry  wrote:
>
> A user might want to know how much space a vmcore file will take on
> the system and how much space on their disk should be available to
> save it during a crash.
>
> The option --vmcore-size does not create the vmcore file but provides
> an estimation of the size of the final vmcore file created with the
> same makedumpfile options.
>
> Signed-off-by: Julien Thierry 
> Cc: Kazuhito Hagio 
> ---
>  makedumpfile.c | 98 --
>  makedumpfile.h | 12 +++
>  print_info.c   |  4 +++
>  3 files changed, 111 insertions(+), 3 deletions(-)

Please update 'makedumpfile.8' as well in v2, so that the man page can
document the newly added option and how to use it to determine the
vmcore-size.

> diff --git a/makedumpfile.c b/makedumpfile.c
> index 4c4251e..0a2bfba 100644
> --- a/makedumpfile.c
> +++ b/makedumpfile.c
> @@ -26,6 +26,7 @@
>  #include 
>  #include 
>  #include 
> +#include 

I know we don't follow alphabetical order for include files in
makedumpfile code, but it would be good to place the new - ones
accordingly. So  can go with  here.

>  struct symbol_tablesymbol_table;
>  struct size_table  size_table;
> @@ -1366,7 +1367,25 @@ open_dump_file(void)
> if (!info->flag_force)
> open_flags |= O_EXCL;
>
> -   if (info->flag_flatten) {
> +   if (info->flag_vmcore_size) {
> +   char *namecpy;
> +   struct stat statbuf;
> +   int res;
> +
> +   namecpy = strdup(info->name_dumpfile ?
> +info->name_dumpfile : ".");
> +
> +   res = stat(dirname(namecpy), );
> +   free(namecpy);
> +   if (res != 0)
> +   return FALSE;
> +
> +   fd = -1;
> +   info->dumpsize_info.blksize = statbuf.st_blksize;
> +   info->dumpsize_info.block_buff_size = BASE_NUM_BLOCKS;
> +   info->dumpsize_info.block_info = calloc(BASE_NUM_BLOCKS, 1);
> +   info->dumpsize_info.non_hole_blocks = 0;
> +   } else if (info->flag_flatten) {
> fd = STDOUT_FILENO;
> info->name_dumpfile = filename_stdout;
> } else if ((fd = open(info->name_dumpfile, open_flags,
> @@ -1384,6 +1403,9 @@ check_dump_file(const char *path)
>  {
> char *err_str;
>
> +   if (info->flag_vmcore_size)
> +   return TRUE;
> +
> if (access(path, F_OK) != 0)
> return TRUE; /* File does not exist */
> if (info->flag_force) {
> @@ -4622,6 +4644,47 @@ write_and_check_space(int fd, void *buf, size_t buf_size, char *file_name)
> return TRUE;
>  }
>
> +static int
> +write_buffer_update_size_info(off_t offset, void *buf, size_t buf_size)
> +{
> +   struct dumpsize_info *dumpsize_info = >dumpsize_info;
> +   int blk_end_idx = (offset + buf_size - 1) / dumpsize_info->blksize;
> +   int i;
> +
> +   /* Need to grow the dumpsize block buffer? */
> +   if (blk_end_idx >= dumpsize_info->block_buff_size) {
> +   int alloc_size = MAX(blk_end_idx - dumpsize_info->block_buff_size, BASE_NUM_BLOCKS);
> +
> +   dumpsize_info->block_info = realloc(dumpsize_info->block_info,
> +                                       dumpsize_info->block_buff_size + alloc_size);
> +   if (!dumpsize_info->block_info) {
> +   ERRMSG("Not enough memory\n");
> +   return FALSE;
> +   }
> +
> +   memset(dumpsize_info->block_info + dumpsize_info->block_buff_size,
> +  0, alloc_size);
> +   dumpsize_info->block_buff_size += alloc_size;
> +   }
> +
> +   for (i = 0; i < buf_size; ++i) {
> +   int blk_idx = (offset + i) / dumpsize_info->blksize;
> +
> +   if (dumpsize_info->block_info[blk_idx]) {
> +   i += dumpsize_info->blksize;
> +   i = i - (i % dumpsize_info->blksize) - 1;
> +   continue;
> +   }
> +
> +   if (((char *) buf)[i] != 0) {
> +   dumpsize_info->non_hole_blocks++;
> +   dumpsize_info->block_info[blk_idx] = 1;
> +   }
> +   }
> +
> +   return TRUE;
> +}
> +
>  int
>  write_buffer(int fd, off_t offset, void *buf, size_t buf_size, char *file_name)
>  {
> @@ -4643,6 +4706,8 @@ write_buffer(int fd, off_t offset, void *buf, size_t buf_size, char *file_name)
> }
> if (!write_and_check_space(fd, , sizeof(fdh), file_name))
> return FALSE;
> +   } else if (info->flag_vmcore_size && fd == info->fd_dumpfile) {
> +   return write_buffer_update_size_info(offset, buf, buf_size);
> } else {
> 

Re: [PATCH v12 0/9] support reserving crashkernel above 4G on arm64 kdump

2020-10-07 Thread Bhupesh Sharma
Hi Catalin,

On Tue, Oct 6, 2020 at 11:30 PM Catalin Marinas  wrote:
>
> On Mon, Oct 05, 2020 at 11:12:10PM +0530, Bhupesh Sharma wrote:
> > I think my earlier email with the test results on this series bounced
> > off the mailing list server (for some weird reason), but I still see
> > several issues with this patchset. I will add specific issues in the
> > review comments for each patch again, but overall, with a crashkernel
> > size of say 786M, I see the following issue:
> >
> > # cat /proc/cmdline
> > BOOT_IMAGE=(hd7,gpt2)/vmlinuz-5.9.0-rc7+ root=<..snip..> 
> > rd.lvm.lv=<..snip..> crashkernel=786M
> >
> > I see two regions of size 786M and 256M reserved in the high and low
> > regions respectively, so we reserve a total of 1042M of memory, which
> > is incorrect behaviour:
> >
> > # dmesg | grep -i crash
> > [0.00] Reserving 256MB of low memory at 2816MB for crashkernel 
> > (System low RAM: 768MB)
> > [0.00] Reserving 786MB of memory at 654158MB for crashkernel 
> > (System RAM: 130816MB)
> > [0.00] Kernel command line: 
> > BOOT_IMAGE=(hd2,gpt2)/vmlinuz-5.9.0-rc7+ 
> > root=/dev/mapper/rhel_ampere--hr330a--03-root ro 
> > rd.lvm.lv=rhel_ampere-hr330a-03/root rd.lvm.lv=rhel_ampere-hr330a-03/swap 
> > crashkernel=786M cma=1024M
> >
> > # cat /proc/iomem | grep -i crash
> >   b000-bfff : Crash kernel (low)
> >   bfcbe0-bffcff : Crash kernel
>
> As Chen said, that's the intended behaviour and how x86 works. The
> requested 768M goes in the high range if there's not enough low memory
> and an additional buffer for swiotlb is allocated, hence the low 256M.

I understand, but why 256M (as low) for arm64? x86_64 setups usually
have more system memory available as compared to several commercially
available arm64 setups. So is the intent, just to keep the behavior
similar between arm64 and x86_64?

Should we have a CONFIG option / bootarg to help one select the max
'low_size'? Currently the 'low_size' value is calculated as:

/*
 * two parts from kernel/dma/swiotlb.c:
 * -swiotlb size: user-specified with swiotlb= or default.
 *
 * -swiotlb overflow buffer: now hardcoded to 32k. We round it
 * to 8M for other buffers that may need to stay low too. Also
 * make sure we allocate enough extra low memory so that we
 * don't run out of DMA buffers for 32-bit devices.
 */
low_size = max(swiotlb_size_or_default() + (8UL << 20), 256UL << 20);

Since many arm64 boards ship with swiotlb=0 (turned off) via the kernel
bootargs, low_size still ends up being 256M in such cases, whereas
this 256M could be used for other purposes. So should we limit this
to 64M and (gracefully) fail the crash kernel allocation request
otherwise?

> We could (as an additional patch), subtract the 256M from the high
> allocation so that you'd get a low 256M and a high 512M, not sure it's
> worth it. Note that with a "crashkernel=768M,high" option, you still get
> the additional low 256M, otherwise the crashkernel won't be able to
> boot as there's no memory in ZONE_DMA. In the explicit ",high" request
> case, I'm not sure subtracting the 256M is more intuitive.
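The subtraction suggested above could look roughly like this. split_crashkernel() and struct crash_split are hypothetical names, and keeping the additive behaviour when the request is not larger than the low chunk is an assumption of this sketch, not something proposed in the thread.

```c
#include <assert.h>

#define MIB(x) ((unsigned long long)(x) << 20)

struct crash_split {
	unsigned long long low;		/* reserved below 4G (swiotlb/ZONE_DMA) */
	unsigned long long high;	/* remainder, placed anywhere in RAM */
};

/*
 * Hypothetical variant: carve the low chunk out of the requested size
 * instead of adding it on top. If the request is not larger than the
 * low chunk, fall back to reserving the full request as the high part.
 */
static struct crash_split split_crashkernel(unsigned long long requested,
					    unsigned long long low)
{
	struct crash_split s = { .low = low, .high = requested };

	if (requested > low)
		s.high = requested - low;
	return s;
}
```

With crashkernel=768M this yields a low 256M and a high 512M, matching the example in the reply above.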

> In 5.11, we also hope to fix the ZONE_DMA layout for non-RPi4 platforms
> to cover the entire 32-bit address space (i.e. identical to the current
> ZONE_DMA32).
>
> > IMO, we should test this feature more before including this in 5.11
>
> Definitely. That's one of the reasons we haven't queued it yet. So any
> help with testing here is appreciated.

Sure, I am running more checks on this series. I will be back soon
with more updates.

Regards,
Bhupesh


___
kexec mailing list
kexec@lists.infradead.org
http://lists.infradead.org/mailman/listinfo/kexec


Re: [PATCH v12 0/9] support reserving crashkernel above 4G on arm64 kdump

2020-10-05 Thread Bhupesh Sharma
Hi Catalin, Chen,

On Mon, Oct 5, 2020 at 10:39 PM Catalin Marinas  wrote:
>
> On Sat, Sep 12, 2020 at 06:44:29AM -0500, John Donnelly wrote:
> > On 9/7/20 8:47 AM, Chen Zhou wrote:
> > > Chen Zhou (9):
> > >x86: kdump: move CRASH_ALIGN to 2M
> > >x86: kdump: make the lower bound of crash kernel reservation
> > >  consistent
> > >x86: kdump: use macro CRASH_ADDR_LOW_MAX in functions
> > >  reserve_crashkernel[_low]()
> > >x86: kdump: move reserve_crashkernel[_low]() into crash_core.c
> > >arm64: kdump: introduce some macroes for crash kernel reservation
> > >arm64: kdump: reimplement crashkernel=X
> > >kdump: add threshold for the required memory
> > >arm64: kdump: add memory for devices by DT property
> > >  linux,usable-memory-range
> > >kdump: update Documentation about crashkernel
> [...]
> > I did a brief unit-test on 5.9-rc4.
> >
> > Please add:
> >
> > Tested-by:  John Donnelly 
>
> Thanks for testing.
>
> > This activity is over a year old. It needs to be accepted.
>
> It's getting there, hopefully in 5.11. There are some minor tweaks to
> address.

I think my earlier email with the test results on this series bounced
off the mailing list server (for some weird reason), but I still see
several issues with this patchset. I will add specific issues in the
review comments for each patch again, but overall, with a crashkernel
size of say 786M, I see the following issue:

# cat /proc/cmdline
BOOT_IMAGE=(hd7,gpt2)/vmlinuz-5.9.0-rc7+ root=<..snip..>
rd.lvm.lv=<..snip..> crashkernel=786M

I see two regions of size 256M and 786M reserved in the low and high
regions respectively. So we reserve a total of 1042M of memory, which
is incorrect behaviour:

# dmesg | grep -i crash
[0.00] Reserving 256MB of low memory at 2816MB for crashkernel (System low RAM: 768MB)
[0.00] Reserving 786MB of memory at 654158MB for crashkernel (System RAM: 130816MB)
[0.00] Kernel command line:
BOOT_IMAGE=(hd2,gpt2)/vmlinuz-5.9.0-rc7+
root=/dev/mapper/rhel_ampere--hr330a--03-root ro
rd.lvm.lv=rhel_ampere-hr330a-03/root
rd.lvm.lv=rhel_ampere-hr330a-03/swap crashkernel=786M cma=1024M

# cat /proc/iomem | grep -i crash
  b000-bfff : Crash kernel (low)
  bfcbe0-bffcff : Crash kernel

IMO, we should test this feature more before including this in 5.11

Thanks,
Bhupesh




Re: [PATCH v4] arm64: Add purgatory printing

2020-10-02 Thread Bhupesh Sharma
No valid console found for %s\n",
> +   __func__, device);
> +   return 0;
> +   }
> +
> +   ret = snprintf(mem, sizeof(mem), "%s%s", device, "/iomem_base");
> +   if (ret < 0 || ret >= sizeof(mem)) {
> +   fprintf(stderr, "snprintf failed: %s\n", strerror(errno));
> +   return 0;
> +   }
> +
> +   printf("console memory read from %s\n", mem);
> +
> +   fd = open(mem, O_RDONLY);
> +   if (fd < 0) {
> +   fprintf(stderr, "kexec: %s: No able to open %s\n",
> +   __func__, mem);
> +   return 0;
> +   }
> +
> +   memset(buffer, '\0', sizeof(buffer));
> +   ret = read(fd, buffer, sizeof(buffer));
> +   if (ret < 0) {
> +   fprintf(stderr, "kexec: %s: not able to read fd\n", __func__);
> +   close(fd);
> +   return 0;
> +   }
> +
> +   sscanf(buffer, "%lx", &iomem);
> +   printf("console memory is at %#lx\n", iomem);
> +
> +   close(fd);
> +   return iomem;
> +}
> +
>  /**
>   * struct dtb - Info about a binary device tree.
>   *
> @@ -637,6 +699,7 @@ int arm64_load_other_segments(struct kexec_info *info,
> unsigned long hole_min;
> unsigned long hole_max;
> unsigned long initrd_end;
> +   uint64_t purgatory_sink;
> char *initrd_buf = NULL;
> struct dtb dtb;
> char command_line[COMMAND_LINE_SIZE] = "";
> @@ -654,6 +717,11 @@ int arm64_load_other_segments(struct kexec_info *info,
> command_line[sizeof(command_line) - 1] = 0;
> }
>
> +   purgatory_sink = find_purgatory_sink(arm64_opts.console);
> +
> +   dbgprintf("%s:%d: purgatory sink: 0x%" PRIx64 "\n", __func__, __LINE__,
> +   purgatory_sink);
> +
> if (arm64_opts.dtb) {
> dtb.name = "dtb_user";
> dtb.buf = slurp_file(arm64_opts.dtb, &dtb.size);
> @@ -742,6 +810,9 @@ int arm64_load_other_segments(struct kexec_info *info,
>
> info->entry = (void *)elf_rel_get_addr(&info->rhdr, "purgatory_start");
>
> +   elf_rel_set_symbol(&info->rhdr, "arm64_sink", &purgatory_sink,
> +   sizeof(purgatory_sink));
> +
> elf_rel_set_symbol(&info->rhdr, "arm64_kernel_entry", &image_base,
> sizeof(image_base));
>
> diff --git a/purgatory/arch/arm64/purgatory-arm64.c b/purgatory/arch/arm64/purgatory-arm64.c
> index fe50fcf..b4d8578 100644
> --- a/purgatory/arch/arm64/purgatory-arm64.c
> +++ b/purgatory/arch/arm64/purgatory-arm64.c
> @@ -5,15 +5,30 @@
>  #include 
>  #include 
>
> +/* Symbols set by kexec. */
> +
> +uint8_t *arm64_sink __attribute__ ((section ("data")));
> +extern void (*arm64_kernel_entry)(uint64_t, uint64_t, uint64_t, uint64_t);
> +extern uint64_t arm64_dtb_addr;
> +
>  void putchar(int ch)
>  {
> -   /* Nothing for now */
> +   if (!arm64_sink)
> +   return;
> +
> +   *arm64_sink = ch;
> +
> +   if (ch == '\n')
> +   *arm64_sink = '\r';
>  }
>
>  void post_verification_setup_arch(void)
>  {
> +   printf("purgatory: booting kernel now\n");
>  }
>
>  void setup_arch(void)
>  {
> +   printf("purgatory: entry=%lx\n", (unsigned long)arm64_kernel_entry);
> +   printf("purgatory: dtb=%lx\n", arm64_dtb_addr);
>  }
> --
> 2.28.0

Looks good to me, so:
Acked-by: Bhupesh Sharma 

Thanks.




Re: [PATCH] kexec/arm64: Add support for ARMv8.2 (large space addressing) 52-bit VA extensions

2020-09-28 Thread Bhupesh Sharma
Hello Simon,

Thanks for your review. Please see my comments in-line:

On Fri, Sep 25, 2020 at 11:37 AM Simon Horman  wrote:
>
> Hi Bhupesh,
>
> thanks for your patch.
>
> ...
>
> > +static int get_vabits_actual_from_id_aa64mmfr2_el1(void)
> > +{
> > + int l_vabits_actual;
> > + unsigned long val;
> > +
> > + /* Check if ID_AA64MMFR2_EL1 CPU-ID register indicates
> > +  * ARMv8.2/LVA support:
> > +  * VARange, bits [19:16]
> > +  *   From ARMv8.2:
> > +  *   Indicates support for a larger virtual address.
> > +  *   Defined values are:
> > +  * 0b VMSAv8-64 supports 48-bit VAs.
> > +  * 0b0001 VMSAv8-64 supports 52-bit VAs when using the 64KB
> > +  *page size. The other translation granules support
> > +  *48-bit VAs.
> > +  *
> > +  * See ARMv8 ARM for more details.
> > +  */
> > + if (!(getauxval(AT_HWCAP) & HWCAP_CPUID)) {
>
> Likely my build environment wants updating, but the above does not seem
> to build in my (aged) environment.
>
> $ aarch64-linux-gnu-gcc --version
> aarch64-linux-gnu-gcc (Linaro GCC 7.2-2017.11) 7.2.1 20171011
> Copyright (C) 2017 Free Software Foundation, Inc.
> This is free software; see the source for copying conditions.  There is NO
> warranty; not even for MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.
>
> $ aarch64-linux-gnu-gcc -Wall -Wextra -Wpointer-arith -Wwrite-strings 
> -Wformat -O2 -fomit-frame-pointer -pipe -fno-strict-aliasing -Wall 
> -Wstrict-prototypes 
> -I/home/horms/local/opt/gcc-linaro-7.2.1-2017.11-x86_64_aarch64-linux-gnu/include
>  -I./include -I./util_lib/include -Iinclude/ -I ./kexec/ -I./kexec/libfdt 
> -I./kexec/arch/arm64/include  -c -MD -o kexec/arch/arm64/kexec-uImage-arm64.o 
> kexec/arch/arm64/kexec-uImage-arm64.c
> kexec/arch/arm64/common-arm64.c: In function 
> ‘get_vabits_actual_from_id_aa64mmfr2_el1’:
> kexec/arch/arm64/common-arm64.c:133:30: error: ‘HWCAP_CPUID’ undeclared 
> (first use in this function); did you mean ‘HWCAP_CRC32’?
>   if (!(getauxval(AT_HWCAP) & HWCAP_CPUID)) {
>
> ...

Ok, let me have a look at the compilation issue with the aarch64
gcc-7 toolchain. I will be back soon with updates.
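A sketch of one possible workaround for older headers, plus decoding of the VARange field described in the comment above. It assumes the arm64 UAPI value of HWCAP_CPUID, (1 << 11), and only defines it when the headers do not. The ID_AA64MMFR2_VARANGE_* macros and vabits_from_mmfr2() are illustrative names, not the patch's code. Reading the register itself (an mrs instruction that the kernel emulates for userspace when HWCAP_CPUID is advertised) is left out so the decoding can be checked on any host.

```c
#include <assert.h>

/*
 * Older toolchain headers (such as the GCC 7 sysroot above) may lack
 * HWCAP_CPUID; assume the arm64 UAPI value, bit 11 of AT_HWCAP.
 */
#ifndef HWCAP_CPUID
#define HWCAP_CPUID (1UL << 11)
#endif

/* Illustrative names for ID_AA64MMFR2_EL1.VARange, bits [19:16]. */
#define ID_AA64MMFR2_VARANGE_SHIFT 16
#define ID_AA64MMFR2_VARANGE_MASK  0xfUL

/* Map the VARange field to a VA width in bits, or -1 if unrecognised. */
static int vabits_from_mmfr2(unsigned long mmfr2)
{
	switch ((mmfr2 >> ID_AA64MMFR2_VARANGE_SHIFT) &
		ID_AA64MMFR2_VARANGE_MASK) {
	case 0x0:
		return 48;	/* 48-bit VAs */
	case 0x1:
		return 52;	/* 52-bit VAs, 64KB granule only */
	default:
		return -1;
	}
}
```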

Thanks,
Bhupesh




Re: [PATCH v2 1/1] kdump: append uts_namespace.name offset to VMCOREINFO

2020-09-24 Thread Bhupesh Sharma
Hi Alexander,

On Thu, Sep 24, 2020 at 6:18 PM Alexander Egorenkov
 wrote:
>
> The offset of the field 'init_uts_ns.name' has changed
> since
>
> commit 9a56493f6942c0e2df1579986128721da96e00d8
> Author: Kirill Tkhai 
> Date:   Mon Aug 3 13:16:21 2020 +0300
>
> uts: Use generic ns_common::count

A minor nitpick:
You can add the following line to the [alias] section of your .gitconfig:
one = show -s --pretty='format:%h (\"%s\")'

Running the command '$ git one ' will then give you an
abbreviated form to use when referring to existing git commits in
the log message. For example, in this case the output would be
something like:

$ git one 9a56493f6942c0e2df1579986128721da96e00d8
9a56493f6942 ("uts: Use generic ns_common::count")

Then you can use '9a56493f6942 ("uts: Use generic ns_common::count")'
to refer to an existing upstream patch in the log message.

But I think this can be fixed while applying the patch (if there are
no further revisions required).

> Link: 
> https://lore.kernel.org/r/159644978167.604812.1773586504374412107.stgit@localhost.localdomain
>
> Make the offset of the field 'uts_namespace.name' available
> in VMCOREINFO because tools like 'crash-utility' and
> 'makedumpfile' must be able to read it from crash dumps.
>
> Signed-off-by: Alexander Egorenkov 
> ---
>  kernel/crash_core.c | 1 +
>  1 file changed, 1 insertion(+)
>
> diff --git a/kernel/crash_core.c b/kernel/crash_core.c
> index 106e4500fd53..173fdc261882 100644
> --- a/kernel/crash_core.c
> +++ b/kernel/crash_core.c
> @@ -447,6 +447,7 @@ static int __init crash_save_vmcoreinfo_init(void)
> VMCOREINFO_PAGESIZE(PAGE_SIZE);
>
> VMCOREINFO_SYMBOL(init_uts_ns);
> +   VMCOREINFO_OFFSET(uts_namespace, name);
> VMCOREINFO_SYMBOL(node_online_map);
>  #ifdef CONFIG_MMU
> VMCOREINFO_SYMBOL_ARRAY(swapper_pg_dir);
> --
> 2.26.2

Thanks for making the changes we discussed in the v1 review. Otherwise
the patch looks fine to me, so:

Reviewed-by: Bhupesh Sharma 




[PATCH] kexec: Fix snprintf related compilation warnings

2020-09-23 Thread Bhupesh Sharma
This patch fixes the following snprintf related compilation warning
seen currently with gcc versions 7 and 8 when kexec is compiled with
-Wformat-truncation option:

kexec/fs2dt.c:673:34: warning: ‘stdout-path’ directive output may be 
truncated writing 11 bytes into a region of size between 1 and 1024 
[-Wformat-truncation=]
   snprintf(filename, MAXPATH, "%sstdout-path", pathname);
  ^~~
kexec/fs2dt.c:673:3: note: ‘snprintf’ output between 12 and 1035 bytes into 
a destination of size 1024
   snprintf(filename, MAXPATH, "%sstdout-path", pathname);
   ^~
kexec/fs2dt.c:676:35: warning: ‘linux,stdout-path’ directive output may be 
truncated writing 17 bytes into a region of size between 1 and 1024 
[-Wformat-truncation=]
snprintf(filename, MAXPATH, "%slinux,stdout-path", pathname);
   ^
kexec/fs2dt.c:676:4: note: ‘snprintf’ output between 18 and 1041 bytes into 
a destination of size 1024
snprintf(filename, MAXPATH, "%slinux,stdout-path", pathname);
^~~~

kexec/firmware_memmap.c:132:35: warning: ‘%s’ directive output may be 
truncated writing 5 bytes into a region of size between 0 and 4095 
[-Wformat-truncation=]
  snprintf(filename, PATH_MAX, "%s/%s", entry, "start");
   ^~  ~~~
kexec/firmware_memmap.c:132:2: note: ‘snprintf’ output between 7 and 4102 
bytes into a destination of size 4096
  snprintf(filename, PATH_MAX, "%s/%s", entry, "start");
  ^
kexec/firmware_memmap.c:142:35: warning: ‘%s’ directive output may be 
truncated writing 3 bytes into a region of size between 0 and 4095 
[-Wformat-truncation=]
  snprintf(filename, PATH_MAX, "%s/%s", entry, "end");
   ^~  ~
kexec/firmware_memmap.c:142:2: note: ‘snprintf’ output between 5 and 4100 
bytes into a destination of size 4096
  snprintf(filename, PATH_MAX, "%s/%s", entry, "end");
  ^~~
kexec/firmware_memmap.c:152:35: warning: ‘%s’ directive output may be 
truncated writing 4 bytes into a region of size between 0 and 4095 
[-Wformat-truncation=]
  snprintf(filename, PATH_MAX, "%s/%s", entry, "type");
   ^~  ~~
kexec/firmware_memmap.c:152:2: note: ‘snprintf’ output between 6 and 4101 
bytes into a destination of size 4096
  snprintf(filename, PATH_MAX, "%s/%s", entry, "type");
  ^~~~

The simplest way to address the gcc warnings and the possible
truncation is to check the value returned by snprintf, so this patch
does that. (There are other methods, such as using 'asnprintf' or
using the 'open_memstream' function to create a FILE object, but these
are more intrusive.)
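The check added at each call site can be factored into a helper; join_path() is a hypothetical name used only for this sketch. A negative return from snprintf() signals an encoding error, and a return value greater than or equal to the buffer size means the output was truncated. At GCC's default -Wformat-truncation=1 level, the warning is only issued for calls whose return value is unused, which is why checking it also silences the diagnostics above.

```c
#include <assert.h>
#include <stdio.h>

/*
 * Build "<dir>/<leaf>" into buf. Returns 0 on success, or -1 if
 * snprintf() failed or the result did not fit (i.e. was truncated).
 */
static int join_path(char *buf, size_t size, const char *dir, const char *leaf)
{
	int ret = snprintf(buf, size, "%s/%s", dir, leaf);

	if (ret < 0 || (size_t)ret >= size)
		return -1;
	return 0;
}
```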

Cc: Simon Horman 
Cc: Eric Biederman 
Cc: kexec@lists.infradead.org
Signed-off-by: Bhupesh Sharma 
---
 kexec/firmware_memmap.c | 22 +++---
 kexec/fs2dt.c   | 16 +---
 2 files changed, 32 insertions(+), 6 deletions(-)

diff --git a/kexec/firmware_memmap.c b/kexec/firmware_memmap.c
index 1ee214aa9316..457c3dc9a608 100644
--- a/kexec/firmware_memmap.c
+++ b/kexec/firmware_memmap.c
@@ -125,11 +125,17 @@ static int parse_memmap_entry(const char *entry, struct memory_range *range)
 {
char filename[PATH_MAX];
char *type;
+   int ret;
 
/*
 * entry/start
 */
-   snprintf(filename, PATH_MAX, "%s/%s", entry, "start");
+   ret = snprintf(filename, PATH_MAX, "%s/%s", entry, "start");
+   if (ret < 0 || ret >= PATH_MAX) {
+   fprintf(stderr, "snprintf failed: %s\n", strerror(errno));
+   return -1;
+   }
+
filename[PATH_MAX-1] = 0;
 
range->start = parse_numeric_sysfs(filename);
@@ -139,7 +145,12 @@ static int parse_memmap_entry(const char *entry, struct memory_range *range)
/*
 * entry/end
 */
-   snprintf(filename, PATH_MAX, "%s/%s", entry, "end");
+   ret = snprintf(filename, PATH_MAX, "%s/%s", entry, "end");
+   if (ret < 0 || ret >= PATH_MAX) {
+   fprintf(stderr, "snprintf failed: %s\n", strerror(errno));
+   return -1;
+   }
+
filename[PATH_MAX-1] = 0;
 
range->end = parse_numeric_sysfs(filename);
@@ -149,7 +160,12 @@ static int parse_memmap_entry(const char *entry, struct 
memory_range

[PATCH] vmcore-dmesg/man page: Update the vmcore-dmesg man page

2020-09-22 Thread Bhupesh Sharma
The vmcore-dmesg utility has been in use for several years
and is pretty stable now.

So it is now useful to update its man page to reflect that.
Also fix some minor formatting issues.

Cc: Simon Horman 
Cc: Eric Biederman 
Cc: kexec@lists.infradead.org
Signed-off-by: Bhupesh Sharma 
---
 vmcore-dmesg/vmcore-dmesg.8 | 8 
 1 file changed, 4 insertions(+), 4 deletions(-)

diff --git a/vmcore-dmesg/vmcore-dmesg.8 b/vmcore-dmesg/vmcore-dmesg.8
index d9e3c62ebcb6..ec594b048a61 100644
--- a/vmcore-dmesg/vmcore-dmesg.8
+++ b/vmcore-dmesg/vmcore-dmesg.8
@@ -2,7 +2,7 @@
 .\" First parameter, NAME, should be all caps
 .\" Second parameter, SECTION, should be 1-8, maybe w/ subsection
 .\" other parameters are allowed: see man(7), man(1)
-.TH VMCORE-DMESG 8 "Sep 7, 2010"
+.TH VMCORE-DMESG 8 "Sep 21, 2020"
 .\" Please adjust this date whenever revising the manpage.
 .\"
 .\" Some roff macros, for reference:
@@ -16,7 +16,7 @@
 .\" .sp insert n+1 empty lines
 .\" for manpage-specific macros, see man(7)
 .SH NAME
-vmcore-dmesg \- This is just a placeholder until real man page has been written
+vmcore-dmesg
 .SH SYNOPSIS
 .B vmcore-dmesg
 .RI " vmcore"
@@ -26,9 +26,9 @@ vmcore-dmesg \- This is just a placeholder until real man 
page has been written
 .\" \fI\fP escape sequences to invode bold face and italics,
 .\" respectively.
 \fBvmcore-dmesg\fP extracts the dmesg from a vmcore and write it to
-standard out.  \fBvmcore-dmesg\fP works against either
+standard out. \fBvmcore-dmesg\fP works against either
 \fB/proc/vmcore\fP in a crash dump capture context or a copy
-of \fB/proc/vmcore\fP that has been saved for later analysis.  A
+of \fB/proc/vmcore\fP that has been saved for later analysis. A
 single build of \fBvmcore-dmesg\fP should work against any linux
 vmcore written created on any architecture.
 
-- 
2.26.2




Re: [PATCH] arm64: Add purgatory printing

2020-09-18 Thread Bhupesh SHARMA
Hi Matthias,

On Fri, Sep 18, 2020 at 2:01 PM Matthias Brugger  wrote:
>
>
>
> On 18/09/2020 07:16, Bhupesh SHARMA wrote:
> > Hi Matthias,
> >
> > Thanks for the patch. Some nitpicks inline:
> >
> > On Fri, Sep 18, 2020 at 1:09 AM  wrote:
> >>
> >> From: Matthias Brugger 
> >>
> >> Add option to allow purgatory printing on arm64 hardware
> >> by passing the console name which should be used.
> >> Based on a patch by Geoff Levand.
> >>
> >> Cc: Geoff Levand 
> >> Signed-off-by: Matthias Brugger 
> >> ---
> >>   kexec/arch/arm64/include/arch/options.h |  6 ++-
> >>   kexec/arch/arm64/kexec-arm64.c  | 61 +
> >>   purgatory/arch/arm64/purgatory-arm64.c  | 17 ++-
> >>   3 files changed, 82 insertions(+), 2 deletions(-)
> >
> > Probably we also need to update the man page 'kexec/kexec.8' to add
> > documentation about the newly introduced argument.
> >
>
> I checked the documentation under "ARCHITECTURE OPTIONS" and it only
> documents a subset of the x86(_64) architecture specific commands. So I think
> there is more work to do to get the documentation right.
>
> But after having a look, I think we might want to use "--serial" as the option
> instead of "--console" to be in sync with x86. Although many architectures
> reinvent the options, if we can get it as close to x86 as possible, I think that
> would be a good thing. I'll do that in v2.

That's a good suggestion. I think the '--serial' option is a better
one as compared to '--console'.

> >> diff --git a/kexec/arch/arm64/include/arch/options.h 
> >> b/kexec/arch/arm64/include/arch/options.h
> >> index a17d933..be7d169 100644
> >> --- a/kexec/arch/arm64/include/arch/options.h
> >> +++ b/kexec/arch/arm64/include/arch/options.h
> >> @@ -5,7 +5,8 @@
> >>   #define OPT_DTB((OPT_MAX)+1)
> >>   #define OPT_INITRD ((OPT_MAX)+2)
> >>   #define OPT_REUSE_CMDLINE  ((OPT_MAX)+3)
> >> -#define OPT_ARCH_MAX   ((OPT_MAX)+4)
> >> +#define OPT_CONSOLE((OPT_MAX)+4)
> >> +#define OPT_ARCH_MAX   ((OPT_MAX)+5)
> >>
> >>   #define KEXEC_ARCH_OPTIONS \
> >>  KEXEC_OPTIONS \
> >> @@ -13,6 +14,7 @@
> >>  { "command-line",  1, NULL, OPT_APPEND }, \
> >>  { "dtb",   1, NULL, OPT_DTB }, \
> >>  { "initrd",1, NULL, OPT_INITRD }, \
> >> +   { "console",   1, NULL, OPT_CONSOLE }, \
> >>  { "ramdisk",   1, NULL, OPT_INITRD }, \
> >>  { "reuse-cmdline", 0, NULL, OPT_REUSE_CMDLINE }, \
> >>
> >> @@ -25,6 +27,7 @@ static const char arm64_opts_usage[] __attribute__ 
> >> ((unused)) =
> >>   " --command-line=STRING Set the kernel command line to STRING.\n"
> >>   " --dtb=FILEUse FILE as the device tree blob.\n"
> >>   " --initrd=FILE Use FILE as the kernel initial ramdisk.\n"
> >> +" --console=STRING  Console used for purgatory printing.\n"
> >>   " --ramdisk=FILEUse FILE as the kernel initial ramdisk.\n"
> >>   " --reuse-cmdline   Use kernel command line from running 
> >> system.\n";
> >
> > Just a thought... sometimes the console string is also available as a
> > part of the '--reuse-cmdline' command line argument passed to the
> > kdump kernel. Can we also try to extract the --console string from the
> > cmdline provided to the primary kernel itself?
> >
>
> Well, the problem is that there can be more than one console present, so which
> one would be the correct one?
> I see this more as a debug feature, which you use knowing which console you are
> looking at for the debug messages. In the end it only helps you to see whether kdump
> failed in the production system kernel or in the crash kernel.

Indeed, I think we need to add some more comments to explain this
option and distinguish it better from the serial port(s) mentioned in
the '--reuse-cmdline' option.

> >> @@ -32,6 +35,7 @@ struct arm64_opts {
> >>  const char *command_line;
> >>  const char *dtb;
> >>  const char *initrd;
> >> +   const char *console;
> >>   };
> >>
> >>   extern struct arm64_opts arm64_opts;
> >> diff --git a/kexec/arch/arm64/kexec-arm64.c 
> >> b/

Re: [PATCH] arm64: Add purgatory printing

2020-09-17 Thread Bhupesh SHARMA
Hi Matthias,

Thanks for the patch. Some nitpicks inline:

On Fri, Sep 18, 2020 at 1:09 AM  wrote:
>
> From: Matthias Brugger 
>
> Add option to allow purgatory printing on arm64 hardware
> by passing the console name which should be used.
> Based on a patch by Geoff Levand.
>
> Cc: Geoff Levand 
> Signed-off-by: Matthias Brugger 
> ---
>  kexec/arch/arm64/include/arch/options.h |  6 ++-
>  kexec/arch/arm64/kexec-arm64.c  | 61 +
>  purgatory/arch/arm64/purgatory-arm64.c  | 17 ++-
>  3 files changed, 82 insertions(+), 2 deletions(-)

Probably we also need to update the man page 'kexec/kexec.8' to add
documentation about the newly introduced argument.

> diff --git a/kexec/arch/arm64/include/arch/options.h 
> b/kexec/arch/arm64/include/arch/options.h
> index a17d933..be7d169 100644
> --- a/kexec/arch/arm64/include/arch/options.h
> +++ b/kexec/arch/arm64/include/arch/options.h
> @@ -5,7 +5,8 @@
>  #define OPT_DTB((OPT_MAX)+1)
>  #define OPT_INITRD ((OPT_MAX)+2)
>  #define OPT_REUSE_CMDLINE  ((OPT_MAX)+3)
> -#define OPT_ARCH_MAX   ((OPT_MAX)+4)
> +#define OPT_CONSOLE((OPT_MAX)+4)
> +#define OPT_ARCH_MAX   ((OPT_MAX)+5)
>
>  #define KEXEC_ARCH_OPTIONS \
> KEXEC_OPTIONS \
> @@ -13,6 +14,7 @@
> { "command-line",  1, NULL, OPT_APPEND }, \
> { "dtb",   1, NULL, OPT_DTB }, \
> { "initrd",1, NULL, OPT_INITRD }, \
> +   { "console",   1, NULL, OPT_CONSOLE }, \
> { "ramdisk",   1, NULL, OPT_INITRD }, \
> { "reuse-cmdline", 0, NULL, OPT_REUSE_CMDLINE }, \
>
> @@ -25,6 +27,7 @@ static const char arm64_opts_usage[] __attribute__ 
> ((unused)) =
>  " --command-line=STRING Set the kernel command line to STRING.\n"
>  " --dtb=FILEUse FILE as the device tree blob.\n"
>  " --initrd=FILE Use FILE as the kernel initial ramdisk.\n"
> +" --console=STRING  Console used for purgatory printing.\n"
>  " --ramdisk=FILEUse FILE as the kernel initial ramdisk.\n"
>  " --reuse-cmdline   Use kernel command line from running system.\n";

Just a thought... sometimes the console string is also available as a
part of the '--reuse-cmdline' command line argument passed to the
kdump kernel. Can we also try to extract the --console string from the
cmdline provided to the primary kernel itself?

> @@ -32,6 +35,7 @@ struct arm64_opts {
> const char *command_line;
> const char *dtb;
> const char *initrd;
> +   const char *console;
>  };
>
>  extern struct arm64_opts arm64_opts;
> diff --git a/kexec/arch/arm64/kexec-arm64.c b/kexec/arch/arm64/kexec-arm64.c
> index 45ebc54..44c9e6e 100644
> --- a/kexec/arch/arm64/kexec-arm64.c
> +++ b/kexec/arch/arm64/kexec-arm64.c
> @@ -165,6 +165,8 @@ int arch_process_options(int argc, char **argv)
> break;
> case OPT_KEXEC_FILE_SYSCALL:
> do_kexec_file_syscall = 1;
> +   case OPT_CONSOLE:
> +   arm64_opts.console = optarg;
> break;
> default:
> break; /* Ignore core and unknown options. */
> @@ -180,12 +182,62 @@ int arch_process_options(int argc, char **argv)
> dbgprintf("%s:%d: dtb: %s\n", __func__, __LINE__,
> (do_kexec_file_syscall && arm64_opts.dtb ? "(ignored)" :
> arm64_opts.dtb));
> +   dbgprintf("%s:%d: console: %s\n", __func__, __LINE__,
> +   arm64_opts.console);
> +
> if (do_kexec_file_syscall)
> arm64_opts.dtb = NULL;
>
> return 0;
>  }
>
> +/**
> + * find_purgatory_sink - Find a sink for purgatory output.
> + */
> +
> +static uint64_t find_purgatory_sink(const char *console)
> +{
> +   int fd, ret;
> +   char folder[255], device[255], mem[255];
> +   struct stat sb;
> +   char buffer[18];

Just trying to understand the reasoning behind keeping the buffer 18
chars long. Can the bytes read from the console exceed the array size
here (maybe a boundary check is required here to avoid overflows)?
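On the buffer question: if the value is read with read(fd, buffer, sizeof(buffer)), as in v4 of this patch, read() may fill all 18 bytes and leave no terminating NUL before the buffer reaches sscanf(). A minimal sketch of a bounded alternative, assuming the iomem_base attribute holds a single hexadecimal number such as 0x9000000. read_hex_fd() is a hypothetical helper, and strtoull() stands in for sscanf() so that a failed parse can be detected.

```c
#include <assert.h>
#include <errno.h>
#include <stdint.h>
#include <stdlib.h>
#include <unistd.h>

/*
 * Read a hex value such as "0x9000000\n" from an open fd. One byte of
 * the buffer is reserved for the terminator, so the string handed to
 * strtoull() is always NUL-terminated. Returns 0 on any failure, the
 * same "no sink" value the patch uses.
 */
static uint64_t read_hex_fd(int fd)
{
	char buf[32];
	char *end;
	ssize_t n;
	unsigned long long val;

	n = read(fd, buf, sizeof(buf) - 1);
	if (n <= 0)
		return 0;
	buf[n] = '\0';

	errno = 0;
	val = strtoull(buf, &end, 16);	/* base 16 accepts an optional 0x */
	if (errno || end == buf)
		return 0;
	return (uint64_t)val;
}
```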

> +   uint64_t iomem = 0x0;
> +
> +   if (!console)
> +   return 0;
> +
> +   sprintf(device, "/sys/class/tty/%s", console);
> +   if (!stat(folder, &sb) == 0 && S_ISDIR(sb.st_mode)) {
> +   fprintf(stderr, "kexec: %s: No valid console found for %s\n",
> +   __func__, device);
> +   return 0;
> +   }
> +
> +   sprintf(mem, "%s%s", device, "/iomem_base");
> +   printf("console memory read from %s\n", mem);
> +
> +   fd = open(mem, O_RDONLY);
> +   if (fd < 0) {
> +   fprintf(stderr, "kexec: %s: No able to open %s\n",
> +   __func__, mem);
> +   return 0;
> +   }
> +
> +   memset(buffer, '\0', sizeof(char) * 18);
> 

Re: [PATCH] kexec-tools: Add some missing free() calls

2020-09-16 Thread Bhupesh SHARMA
Hi Youling,

See some comments inline:

On Sat, Sep 12, 2020 at 7:11 AM Youling Tang  wrote:
>
> Add some missing free() calls.
>
> Signed-off-by: Youling Tang 
> ---
>  kexec/arch/i386/crashdump-x86.c| 22 +-
>  kexec/arch/mips/crashdump-mips.c   |  5 -
>  kexec/arch/ppc64/crashdump-ppc64.c |  8 ++--
>  3 files changed, 27 insertions(+), 8 deletions(-)

First, I think this is a step in the right direction. However, while
running 'valgrind' on an x86_64 kexec ELF earlier, I also saw the
following memory leaks reported:

==596886== 15,604 bytes in 1 blocks are indirectly lost in loss record 4 of 12
==596886==at 0x483A809: malloc (vg_replace_malloc.c:307)
==596886==by 0x40396D: xmalloc (kexec.c:101)
==596886==by 0x410D35: do_bzImage64_load (kexec-bzImage64.c:182)
==596886==by 0x410D35: bzImage64_load (kexec-bzImage64.c:391)
==596886==by 0x404410: my_load (kexec.c:774)
==596886==by 0x402D2D: main (kexec.c:1605)

==596886== 15,732 (128 direct, 15,604 indirect) bytes in 1 blocks are
definitely lost in loss record 5 of 12
==596886==at 0x483CCE8: realloc (vg_replace_malloc.c:834)
==596886==by 0x403FB9: xrealloc (kexec.c:112)
==596886==by 0x403FB9: add_segment_phys_virt (kexec.c:357)
==596886==by 0x40410F: add_buffer_phys_virt (kexec.c:392)
==596886==by 0x404153: add_buffer_virt (kexec.c:401)
==596886==by 0x40CD11: setup_linux_bootloader_parameters_high
(x86-linux-setup.c:80)
==596886==by 0x410E6A: do_bzImage64_load (kexec-bzImage64.c:214)
==596886==by 0x410E6A: bzImage64_load (kexec-bzImage64.c:391)
==596886==by 0x404410: my_load (kexec.c:774)
==596886==by 0x402D2D: main (kexec.c:1605)

==596886== 28,896 bytes in 1 blocks are indirectly lost in loss record 7 of 12
==596886==at 0x483A809: malloc (vg_replace_malloc.c:307)
==596886==by 0x40396D: xmalloc (kexec.c:101)
==596886==by 0x406781: elf_rel_load (kexec-elf-rel.c:254)
==596886==by 0x406EEA: elf_rel_build_load (kexec-elf-rel.c:432)
==596886==by 0x410CFE: do_bzImage64_load (kexec-bzImage64.c:173)
==596886==by 0x410CFE: bzImage64_load (kexec-bzImage64.c:391)
==596886==by 0x404410: my_load (kexec.c:774)
==596886==by 0x402D2D: main (kexec.c:1605)

==596886== 30,048 (1,152 direct, 28,896 indirect) bytes in 1 blocks
are definitely lost in loss record 8 of 12
==596886==at 0x483A809: malloc (vg_replace_malloc.c:307)
==596886==by 0x40396D: xmalloc (kexec.c:101)
==596886==by 0x405735: build_mem_shdrs (kexec-elf.c:618)
==596886==by 0x405735: build_elf_info (kexec-elf.c:774)
==596886==by 0x406EB9: build_elf_rel_info (kexec-elf-rel.c:142)
==596886==by 0x406EB9: elf_rel_build_load (kexec-elf-rel.c:427)
==596886==by 0x410CFE: do_bzImage64_load (kexec-bzImage64.c:173)
==596886==by 0x410CFE: bzImage64_load (kexec-bzImage64.c:391)
==596886==by 0x404410: my_load (kexec.c:774)
==596886==by 0x402D2D: main (kexec.c:1605)
==596886==

Note that valgrind highlighted 12 issues in total; I have removed the
zlib-related reports from the list above.

You can run 'valgrind' on the mips, i386 and ppc64 executables (as shown
below) to check whether all such issues are fixed by your patch:
$ sudo valgrind --leak-check=full --show-leak-kinds=all
--track-origins=yes --verbose --log-file=valgrind-out.txt ./kexec -l
/boot/vmlinuz-`uname -r` --initrd=/boot/initramfs-`uname -r`.img
--reuse-cmdline -d
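As an alternative to adding a free() before every early return, error paths are often funnelled through a single cleanup label. The sketch below is hypothetical (load_segments(), counted_malloc() and the allocation counter exist only for illustration) and shows every exit releasing both buffers. In kexec-tools proper, buffers that become segments on success have different ownership rules, which this sketch ignores.

```c
#include <assert.h>
#include <stdlib.h>

/* Counts live allocations so the example can show nothing leaks. */
static int live_allocs;

static void *counted_malloc(size_t n)
{
	void *p = malloc(n);

	if (p)
		live_allocs++;
	return p;
}

static void counted_free(void *p)
{
	if (p)
		live_allocs--;
	free(p);
}

/*
 * Hypothetical loader with two buffers; fail_step simulates a failure
 * after the first (1) or second (2) allocation. Every exit goes through
 * the same label, so a new error path cannot forget a free().
 */
static int load_segments(int fail_step)
{
	int ret = -1;
	char *memmap = NULL, *backup = NULL;

	memmap = counted_malloc(64);
	if (!memmap || fail_step == 1)
		goto out;

	backup = counted_malloc(64);
	if (!backup || fail_step == 2)
		goto out;

	ret = 0;	/* success; this sketch does not hand the buffers off */
out:
	counted_free(backup);
	counted_free(memmap);
	return ret;
}
```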

> diff --git a/kexec/arch/i386/crashdump-x86.c b/kexec/arch/i386/crashdump-x86.c
> index c79791f..d5b5b68 100644
> --- a/kexec/arch/i386/crashdump-x86.c
> +++ b/kexec/arch/i386/crashdump-x86.c
> @@ -913,8 +913,11 @@ int load_crashdump_segments(struct kexec_info *info, char* mod_cmdline,
> add_memmap(memmap_p, &nr_memmap, info->backup_src_start, info->backup_src_size, RANGE_RAM);
> for (i = 0; i < crash_reserved_mem_nr; i++) {
> sz = crash_reserved_mem[i].end - crash_reserved_mem[i].start +1;
> -   if (add_memmap(memmap_p, &nr_memmap, crash_reserved_mem[i].start, sz, RANGE_RAM) < 0)
> +   if (add_memmap(memmap_p, &nr_memmap, crash_reserved_mem[i].start,
> +   sz, RANGE_RAM) < 0) {
> +   free(memmap_p);
> return ENOCRASHKERNEL;
> +   }
> }
>
> /* Create a backup region segment to store backup data*/
> @@ -926,22 +929,29 @@ int load_crashdump_segments(struct kexec_info *info, char* mod_cmdline,
> 0, max_addr, -1);
> dbgprintf("Created backup segment at 0x%lx\n",
>   info->backup_start);
> -   if (delete_memmap(memmap_p, &nr_memmap, info->backup_start, sz) < 0)
> +   if (delete_memmap(memmap_p, &nr_memmap, info->backup_start, sz) < 0) {
> +   free(tmp);
> +   free(memmap_p);
> return EFAILED;
> +   }
>  

Re: [PATCH] kexec-tools: Fix a prompt message when crashkernel is not reserved

2020-09-16 Thread Bhupesh SHARMA
Hi Youling,

On Sat, Sep 12, 2020 at 7:10 AM Youling Tang  wrote:
>
> Where Y specifies how much memory to reserve for the dump-capture kernel
> and X specifies the beginning of this reserved memory. So Y should be
> placed before X.
>
> Signed-off-by: Youling Tang 
> ---
>  kexec/kexec.c | 2 +-
>  1 file changed, 1 insertion(+), 1 deletion(-)
>
> diff --git a/kexec/kexec.c b/kexec/kexec.c
> index bb88caa..fd7c8d2 100644
> --- a/kexec/kexec.c
> +++ b/kexec/kexec.c
> @@ -1530,7 +1530,7 @@ int main(int argc, char *argv[])
> !is_crashkernel_mem_reserved()) {
> die("Memory for crashkernel is not reserved\n"
> "Please reserve memory by passing"
> -   "\"crashkernel=X@Y\" parameter to kernel\n"
> +   "\"crashkernel=Y@X\" parameter to kernel\n"
>         "Then try to loading kdump kernel\n");
> }
>
> --
> 2.1.0
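To make the argument order concrete: in crashkernel=Y@X, Y is the size and X the start address. A minimal parser sketch, assuming only the size@offset form and K/M/G suffixes handled like the kernel's memparse(). parse_size() and parse_crashkernel() are hypothetical helpers, not kexec-tools code; the real parameter also accepts other forms (for example, a size without @offset), which this sketch rejects.

```c
#include <assert.h>
#include <stdlib.h>

/* Parse a number with an optional K/M/G suffix (base 0: 0x.. is hex). */
static unsigned long long parse_size(const char *s, char **end)
{
	unsigned long long v = strtoull(s, end, 0);

	if (*end == s)
		return 0;	/* no digits; the caller checks *end == s */

	switch (**end) {
	case 'G': case 'g':
		v <<= 10;
		/* fall through */
	case 'M': case 'm':
		v <<= 10;
		/* fall through */
	case 'K': case 'k':
		v <<= 10;
		(*end)++;
		break;
	}
	return v;
}

/*
 * "Y@X": Y is the size to reserve, X the start address.
 * Returns 0 on success, -1 if arg is not of that form.
 */
static int parse_crashkernel(const char *arg, unsigned long long *size,
			     unsigned long long *start)
{
	char *end;

	*size = parse_size(arg, &end);
	if (end == arg || *end != '@')
		return -1;

	arg = end + 1;
	*start = parse_size(arg, &end);
	if (end == arg || *end != '\0')
		return -1;
	return 0;
}
```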

Thanks for the patch. LGTM, so:
Reviewed-by: Bhupesh Sharma 

- Bhupesh



Re: [PATCH 1/1] Calculate offset to field 'init_uts_ns.name'

2020-09-16 Thread Bhupesh SHARMA
Hi Alexander,

Thanks for the patch. See some nitpicks inline:

On Wed, Sep 16, 2020 at 2:39 PM Alexander Egorenkov
 wrote:
>
> The offset has changed in linux-next (v5.9.0) from 4 to 0 because
> 'init_uts_ns' no longer has a 'kref' member variable at its beginning.
> The change was introduced with commit 9a56493f6942c0e2df1579986128721da96e00d8.
> To handle both cases correctly, calculate the offset at run time instead.
>
> Signed-off-by: Alexander Egorenkov 
> ---
>  makedumpfile.c | 6 --
>  makedumpfile.h | 4 
>  2 files changed, 8 insertions(+), 2 deletions(-)
>
> diff --git a/makedumpfile.c b/makedumpfile.c
> index 4c4251e..5114705 100644
> --- a/makedumpfile.c
> +++ b/makedumpfile.c
> @@ -1159,7 +1159,7 @@ check_release(void)
> if (SYMBOL(system_utsname) != NOT_FOUND_SYMBOL) {
> utsname = SYMBOL(system_utsname);
> } else if (SYMBOL(init_uts_ns) != NOT_FOUND_SYMBOL) {
> -   utsname = SYMBOL(init_uts_ns) + sizeof(int);
> +   utsname = SYMBOL(init_uts_ns) + OFFSET(init_uts_ns.name);

I am not sure if it is an issue with my mail-client or if the
indentation is a bit different from the original code (I see an
additional space before the statement) [and likewise below].

Please fix those.

> } else {
> ERRMSG("Can't get the symbol of system_utsname.\n");
> return FALSE;
> @@ -2077,7 +2077,7 @@ get_str_osrelease_from_vmlinux(void)
> if (SYMBOL(system_utsname) != NOT_FOUND_SYMBOL) {
> utsname = SYMBOL(system_utsname);
> } else if (SYMBOL(init_uts_ns) != NOT_FOUND_SYMBOL) {
> -   utsname = SYMBOL(init_uts_ns) + sizeof(int);
> +   utsname = SYMBOL(init_uts_ns) + OFFSET(init_uts_ns.name);
> } else {
> ERRMSG("Can't get the symbol of system_utsname.\n");
> return FALSE;
> @@ -2697,6 +2697,8 @@ read_vmcoreinfo(void)
> READ_MEMBER_OFFSET("log.text_len", printk_log.text_len);
> }
>
> +   READ_MEMBER_OFFSET("init_uts_ns.name", init_uts_ns.name);
> +
> READ_ARRAY_LENGTH("node_data", node_data);
> READ_ARRAY_LENGTH("pgdat_list", pgdat_list);
> READ_ARRAY_LENGTH("mem_section", mem_section);

Hmm, don't we need a similar addition inside 'write_vmcoreinfo_data()'?
Something like:
WRITE_MEMBER_OFFSET("init_uts_ns.name", init_uts_ns.name);

> diff --git a/makedumpfile.h b/makedumpfile.h
> index 03fb4ce..7d8c54d 100644
> --- a/makedumpfile.h
> +++ b/makedumpfile.h
> @@ -1880,6 +1880,10 @@ struct offset_table {
> struct cpu_spec_s {
> long    mmu_features;
> } cpu_spec;
> +
> +   struct init_uts_ns_s {
> +   long    name;
> +   } init_uts_ns;
>  };
>
>  /*
> --
> 2.26.2

Thanks,
Bhupesh
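The hard-coded sizeof(int) assumes a 4-byte 'kref' precedes 'name'. The sketch below uses simplified, hypothetical layouts (not the kernel's real struct uts_namespace, which has more members) to show that offset moving from 4 to 0; the patch itself obtains the offset at run time via READ_MEMBER_OFFSET() rather than offsetof():

```c
#include <stddef.h>

/* Simplified stand-in for struct new_utsname: six 65-byte strings. */
struct new_utsname_s {
	char sysname[65], nodename[65], release[65];
	char version[65], machine[65], domainname[65];
};

/* Before the change: an int-sized reference count precedes 'name'. */
struct uts_ns_old {
	int kref;		/* stand-in for struct kref */
	struct new_utsname_s name;
};

/* After the change: 'name' is the first member. */
struct uts_ns_new {
	struct new_utsname_s name;
};

/* Address of the utsname data from the symbol address and member offset. */
static unsigned long utsname_addr(unsigned long init_uts_ns, unsigned long name_off)
{
	return init_uts_ns + name_off;
}
```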

___
kexec mailing list
kexec@lists.infradead.org
http://lists.infradead.org/mailman/listinfo/kexec


[PATCH] kexec/arm64: Add support for ARMv8.2 (large space addressing) 52-bit VA extensions

2020-09-15 Thread Bhupesh Sharma
This patch adds support for ARMv8.2 52-bit VA (large space addressing)
extension in kexec-tools.

Arm64 hardware which supports the ARMv8.2-LVA architecture extension
can support up to 52-bit virtual addresses. This is especially useful
for having a 52-bit user-space virtual address space while the kernel
can still retain 48-bit/52-bit virtual addressing.

Since support for this extension is currently enabled in the kernel
via a CONFIG flag (CONFIG_ARM64_VA_BITS_52), there is no clear
mechanism for user-space to determine this CONFIG flag value and use
it to determine the kernel-space VA address range values.

'kexec-tools' can instead use the 'TCR_EL1.T1SZ' value from vmcoreinfo,
which indicates the size offset of the memory region addressed by
TTBR1_EL1 (and hence can be used for determining the
'vabits_actual' value).

Using the vmcoreinfo variable exported by kernel commit
bbdbc11804ff ("arm64/crash_core: Export TCR_EL1.T1SZ in vmcoreinfo"),
user-space can use the following computation for determining the
'vabits_actual' value:

   if (TCR_EL1.T1SZ is available in vmcoreinfo)
       vabits_actual = 64 - TCR_EL1.T1SZ;
   else {
       read_id_aa64mmfr2_el1();

       if (hardware supports 52-bit addressing)
           vabits_actual = 52;
       else
           vabits_actual = va_bits value calculated via _stext symbol;
   }
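The computation above, written out as a small C sketch. To keep it testable off-target, the ID_AA64MMFR2_EL1 value and the optional T1SZ are passed in as parameters instead of being read with mrs or parsed from vmcoreinfo; the function names and the "negative T1SZ means absent" convention are assumptions of this sketch:

```c
#include <stdint.h>

#define ID_AA64MMFR2_LVA_SHIFT	16
#define ID_AA64MMFR2_LVA_MASK	(0xfULL << ID_AA64MMFR2_LVA_SHIFT)

/* Nonzero if the LVA field (bits [19:16]) of ID_AA64MMFR2_EL1 is set. */
static int hw_supports_52bit_va(uint64_t mmfr2)
{
	return ((mmfr2 & ID_AA64MMFR2_LVA_MASK) >> ID_AA64MMFR2_LVA_SHIFT) != 0;
}

/*
 * t1sz:          TCR_EL1.T1SZ from vmcoreinfo, or negative if absent.
 * mmfr2:         ID_AA64MMFR2_EL1 contents (read with mrs on hardware).
 * stext_va_bits: fallback value derived from the _stext symbol.
 */
static int compute_vabits_actual(long t1sz, uint64_t mmfr2, int stext_va_bits)
{
	if (t1sz >= 0)
		return 64 - (int)t1sz;
	if (hw_supports_52bit_va(mmfr2))
		return 52;
	return stext_va_bits;
}
```

For example, a T1SZ of 16 gives 48 VA bits and a T1SZ of 12 gives 52.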

I have tested several combinations with both old and latest upstream
kernels with different VA values (39, 42, 48 and 52 bits) on at least
3 different boards, which include:
1. CPUs which don't support ARMv8.2 features, e.g. qualcomm-amberwing,
   ampere-osprey.
2. Prototype models which support ARMv8.2 extensions (e.g. ARMv8 FVP
   simulation model).

This patch is in accordance with the ARMv8 Architecture Reference Manual.

Cc: Simon Horman 
Cc: kexec@lists.infradead.org

Signed-off-by: Bhupesh Sharma 
---
 kexec/arch/arm64/Makefile  |   2 +
 kexec/arch/arm64/common-arm64.c| 332 +
 kexec/arch/arm64/common-arm64.h|   8 +
 kexec/arch/arm64/crashdump-arm64.c |  29 +--
 kexec/arch/arm64/kexec-arm64.c | 120 +--
 kexec/kexec.h  |  10 +
 util_lib/elf_info.c|  35 +++
 util_lib/include/elf_info.h|   2 +
 8 files changed, 397 insertions(+), 141 deletions(-)
 create mode 100644 kexec/arch/arm64/common-arm64.c
 create mode 100644 kexec/arch/arm64/common-arm64.h

diff --git a/kexec/arch/arm64/Makefile b/kexec/arch/arm64/Makefile
index d27c8ee1b5e7..4ae21c3b02e6 100644
--- a/kexec/arch/arm64/Makefile
+++ b/kexec/arch/arm64/Makefile
@@ -11,6 +11,7 @@ arm64_MEM_REGIONS = kexec/mem_regions.c
 arm64_CPPFLAGS += -I $(srcdir)/kexec/
 
 arm64_KEXEC_SRCS += \
+   kexec/arch/arm64/common-arm64.c \
kexec/arch/arm64/crashdump-arm64.c \
kexec/arch/arm64/kexec-arm64.c \
kexec/arch/arm64/kexec-elf-arm64.c \
@@ -27,6 +28,7 @@ arm64_PHYS_TO_VIRT =
 
 dist += $(arm64_KEXEC_SRCS) \
kexec/arch/arm64/include/arch/options.h \
+   kexec/arch/arm64/common-arm64.h \
kexec/arch/arm64/crashdump-arm64.h \
kexec/arch/arm64/image-header.h \
kexec/arch/arm64/iomem.h \
diff --git a/kexec/arch/arm64/common-arm64.c b/kexec/arch/arm64/common-arm64.c
new file mode 100644
index ..65942e8914e3
--- /dev/null
+++ b/kexec/arch/arm64/common-arm64.c
@@ -0,0 +1,332 @@
+/*
+ * ARM64 common parts for kexec and crash.
+ */
+
+#define _GNU_SOURCE
+
+#include 
+#include 
+#include 
+#include 
+#include 
+#include 
+#include 
+#include 
+#include 
+#include 
+#include 
+
+#include 
+#include 
+#include 
+#include 
+#include 
+
+#include "kexec.h"
+#include "kexec-arm64.h"
+#include "common-arm64.h"
+
+#define PAGE_OFFSET_36 ((0xUL) << 36)
+#define PAGE_OFFSET_39 ((0xUL) << 39)
+#define PAGE_OFFSET_42 ((0xUL) << 42)
+#define PAGE_OFFSET_47 ((0xUL) << 47)
+#define PAGE_OFFSET_48 ((0xUL) << 48)
+
+#define SZ_64K 65536
+
+/* ID_AA64MMFR2_EL1 related helpers: */
+#define ID_AA64MMFR2_LVA_SHIFT 16
+#define ID_AA64MMFR2_LVA_MASK  (0xf << ID_AA64MMFR2_LVA_SHIFT)
+
+/* CPU feature ID registers */
+#define get_cpu_ftr(id) ({ \
+   unsigned long __val; \
+   asm volatile("mrs %0, " __stringify(id) : "=r" (__val)); \
+   __val; \
+})
+
+/* Machine specific details. */
+static int va_bits;
+
+/* Global flag which indicates that we have tried reading
+ * TCR_EL1_T1SZ from 'kcore' already.
+ */
+static bool try_read_tcr_el1_t1sz_from_kcore = false;
+
+/**
+ * get_va_bits - Helper for getting VA_BITS
+ */
+
+static int get_va_bits(void)
+{
+  

Re: [PATCH] arm64 : fix makedumpfile failure on 5.4+ kernels

2020-09-10 Thread Bhupesh SHARMA
Hello Ioanna,

Thanks for the patch. I am partially to blame here (and also for
top-posting), as this failure is caused by the flipped VA address
space layout that arm64 now uses with newer kernels (>= 5.4.0),
following the addition of larger (52-bit) VA addressing support.

I had sent out a v4 series to fix this issue several months back (see
) and was supposed to send an update (v5), but the kernel patches
(which export the needed variables to vmcoreinfo) took longer to get
accepted upstream.

I shared the updated v5 earlier today (see
).
Can you please try it and share your test results?

Thanks,
Bhupesh
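A minimal sketch of the _stext-based derivation the quoted report describes, showing why a 48-bit kernel is detected as 47-bit after the VA flip. It assumes full _stext values of 0xffff000010081000 (before the flip) and 0xffff800010081000 (after); the archived report shows them truncated. The helper name is hypothetical:

```c
#include <stddef.h>
#include <stdint.h>

/* All bits from n upward set, like the PAGE_OFFSET_n masks in the patch. */
#define PAGE_OFFSET_MASK(n)	(~0ULL << (n))

/* Old heuristic: the first mask fully contained in _stext gives VA_BITS. */
static int va_bits_from_stext(uint64_t stext)
{
	static const int candidates[] = { 36, 39, 42, 47, 48 };

	for (size_t i = 0; i < sizeof(candidates) / sizeof(candidates[0]); i++) {
		uint64_t mask = PAGE_OFFSET_MASK(candidates[i]);

		if ((stext & mask) == mask)
			return candidates[i];
	}
	return -1;	/* no mask matched */
}
```

After the flip, bit 47 of _stext is set, so the 47-bit mask matches first even though the kernel is configured for 48 bits.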




On Thu, Sep 10, 2020 at 6:25 AM Ioanna Alifieraki wrote:
>
> Currently makedumpfile fails on arm64 for 5.4 and newer kernels with
> the following:
>
> [9.595476] kdump-tools[513]: Starting kdump-tools:
> [9.597988] kdump-tools[519]:  * running makedumpfile -c -d 31 /proc/vmcore /var/crash/202009011332/dump-incomplete
> [9.636915] kdump-tools[537]: calculate_plat_config: PAGE SIZE 0x1000 and VA Bits 47 not supported
> [9.639652] kdump-tools[537]: get_machdep_info_arm64: Can't determine platform config values
> [9.642085] kdump-tools[537]: makedumpfile Failed.
> [9.643064] kdump-tools[519]:  * kdump-tools: makedumpfile failed, falling back to 'cp'
>
> The problem starts in the get_versiondep_info_arm64() function,
> which uses _stext to calculate va_bits.
> Up to 5.3 kernels, _stext would be 10081000.
> After commit 14c127c957c1c607 ("arm64: mm: Flip kernel VA space"),
> _stext is 800010081000, which results in va_bits being set to 47,
> even though the kernel is configured for 48 bits.
> _stext now has bit 47 set (the leading 8) and so matches the 47-bit
> mask, while in the past that bit was 0 and it matched the 48-bit mask.
> The va_bits variable is already exported in vmcoreinfo, so this
> could be solved by reading the value from there
> (va_bits = NUMBER(VA_BITS)) instead of relying on _stext.
> However, if we do so, the page_offset is not calculated properly.
> Currently:
> info->page_offset = (0xUL) << (va_bits - 1);
> The page_offset still depends on the _stext value.
> So read va_bits from vmcoreinfo, and keep the old logic of deriving
> va_bits from _stext to calculate page_offset (the calc_page_offset
> variable).
>
> Signed-off-by: Ioanna Alifieraki 
> ---
>  arch/arm64.c | 18 +++---
>  1 file changed, 11 insertions(+), 7 deletions(-)
>
> diff --git a/arch/arm64.c b/arch/arm64.c
> index 54d60b4..2b0fca3 100644
> --- a/arch/arm64.c
> +++ b/arch/arm64.c
> @@ -290,6 +290,7 @@ int
>  get_versiondep_info_arm64(void)
>  {
> ulong _stext;
> +   int calc_page_offset;
>
> _stext = get_stext_symbol();
> if (!_stext) {
> @@ -297,25 +298,28 @@ get_versiondep_info_arm64(void)
> return FALSE;
> }
>
> +   va_bits = NUMBER(VA_BITS);
> +
> /* Derive va_bits as per arch/arm64/Kconfig */
> if ((_stext & PAGE_OFFSET_36) == PAGE_OFFSET_36) {
> -   va_bits = 36;
> +   calc_page_offset = 36;
> } else if ((_stext & PAGE_OFFSET_39) == PAGE_OFFSET_39) {
> -   va_bits = 39;
> +   calc_page_offset = 39;
> } else if ((_stext & PAGE_OFFSET_42) == PAGE_OFFSET_42) {
> -   va_bits = 42;
> +   calc_page_offset = 42;
> } else if ((_stext & PAGE_OFFSET_47) == PAGE_OFFSET_47) {
> -   va_bits = 47;
> +   calc_page_offset = 47;
> } else if ((_stext & PAGE_OFFSET_48) == PAGE_OFFSET_48) {
> -   va_bits = 48;
> +   calc_page_offset = 48;
> } else {
> -   ERRMSG("Cannot find a proper _stext for calculating VA_BITS\n");
> +   ERRMSG("Cannot find a proper _stext for calculating page_offset\n");
> return FALSE;
> }
>
> -   info->page_offset = (0xUL) << (va_bits - 1);
> +   info->page_offset = (0xUL) << (calc_page_offset - 1);
>
> DEBUG_MSG("va_bits  : %d\n", va_bits);
> +   DEBUG_MSG("calc_page_offset  : %d\n", calc_page_offset);
> DEBUG_MSG("page_offset  : %lx\n", info->page_offset);
>
> return TRUE;
> --
> 2.17.1
>
>
> ___
> kexec mailing list
> kexec@lists.infradead.org
> http://lists.infradead.org/mailman/listinfo/kexec

___
kexec mailing list
kexec@lists.infradead.org
http://lists.infradead.org/mailman/listinfo/kexec


Re: [PATCH] crash_dump: Remove the unused include statements

2020-09-10 Thread Bhupesh Sharma
Hi Tian,

On Thu, Sep 10, 2020 at 2:22 PM Tian Tao  wrote:
>
> linux/pgtable.h is included more than once; remove the one that isn't
> necessary.
>
> Signed-off-by: Tian Tao 
> ---
>  include/linux/crash_dump.h | 2 --
>  1 file changed, 2 deletions(-)
>
> diff --git a/include/linux/crash_dump.h b/include/linux/crash_dump.h
> index a5192b7..6bd8a33 100644
> --- a/include/linux/crash_dump.h
> +++ b/include/linux/crash_dump.h
> @@ -8,8 +8,6 @@
>  #include 
>  #include 
>
> -#include  /* for pgprot_t */
> -
>  #ifdef CONFIG_CRASH_DUMP
>  #define ELFCORE_ADDR_MAX   (-1ULL)
>  #define ELFCORE_ADDR_ERR   (-2ULL)
> --

LGTM, so:
Reviewed-by: Bhupesh Sharma 

Thanks.


___
kexec mailing list
kexec@lists.infradead.org
http://lists.infradead.org/mailman/listinfo/kexec


[PATCH v5 0/3] makedumpfile/arm64: Add support for ARMv8.2 extensions

2020-09-09 Thread Bhupesh Sharma
Changes since v4:

- v4 can be seen here:
  https://www.spinics.net/lists/kexec/msg23850.html
- Removed the patch (via [PATCH 4/4] in v3) which marked the '--mem-usage'
  option as unsupported for the arm64 architecture, as we now have a
  mechanism to read the 'vabits_actual' value from the 'id_aa64mmfr2_el1'
  arm64 system register. As per discussions with the arm64 and
  gcc/binutils maintainers, it turns out there is no standard ABI between
  the kernel and user-space to export this value early enough to be used
  for the page_offset calculation in the --mem-usage case. So the next
  best option is to have user-space read the system register to determine
  whether the underlying hardware supports larger (52-bit) addressing.

  This allows us to keep supporting '--mem-usage' option on arm64 even
  on newer kernels (with flipped VA space).

Changes since v3:

- v3 can be seen here:
  http://lists.infradead.org/pipermail/kexec/2019-March/022534.html
- Added a new patch (via [PATCH 4/4]) which marks the '--mem-usage' option
  as unsupported for the arm64 architecture. Since newer arm64 kernels
  support 48-bit/52-bit VA address spaces with a single binary, the
  address of kernel symbols like _stext, which could earlier be used to
  determine the VA_BITS value, can no longer be used to determine whether
  VA_BITS is set to 48 or 52 in kernel space. Hence, for now, it makes
  sense to mark the '--mem-usage' option as unsupported for arm64 until
  we have more clarity from the arm64 kernel maintainers on how to handle
  this in future kernel/makedumpfile versions.

Changes since v2:

- v2 can be seen here:
  http://lists.infradead.org/pipermail/kexec/2019-February/022456.html
- I missed some comments from Kazu sent on the LVA v1 patch when I sent
  out the v2. So, addressing them now in v3.
- Also added a patch that adds a tree-wide feature to read
  'MAX_PHYSMEM_BITS' from vmcoreinfo (if available).

Changes since v1:

- v1 was sent as two separate patches:
  http://lists.infradead.org/pipermail/kexec/2019-February/022424.html
  (ARMv8.2-LPA)
  http://lists.infradead.org/pipermail/kexec/2019-February/022425.html
  (ARMv8.2-LVA)
- v2 combined the two in a single patchset and also addressed Kazu's
  review comments.

This patchset adds support for ARMv8.2 extensions in makedumpfile code.
I cover the following cases with this patchset:
- Both old (<5.4) and new kernels (>= 5.4) work well.
- All VA and PA bit combinations currently supported via the kernel
  CONFIG options work well, including:
 - 48-bit kernel VA + 52-bit PA (LPA)
 - 52-bit kernel VA (LVA) + 52-bit PA (LPA)

This has been tested for the following use cases:
1. Analysing page information via '--mem-usage' option.
2. Creating a dumpfile using /proc/vmcore,
3. Creating a dumpfile using /proc/kcore, and
4. Post-processing a vmcore.

I have tested this patchset on the following platforms, with kernels
which support/do-not-support ARMv8.2 features:
1. CPUs which don't support ARMv8.2 features, e.g. qualcomm-amberwing,
   ampere-osprey.
2. Prototype models which support ARMv8.2 extensions (e.g. ARMv8 FVP
   simulation model).

Also a preparation patch has been added in this patchset which adds a
common feature for archs (except arm64, for which similar support is
added via subsequent patch) to retrieve 'MAX_PHYSMEM_BITS' from
vmcoreinfo (if available).

This patchset ensures backward compatibility for kernel versions in
which 'TCR_EL1.T1SZ' and 'MAX_PHYSMEM_BITS' are not available in
vmcoreinfo.

In the newer kernels (>= 5.4.0) the following patches export these
variables in the vmcoreinfo:
 - 1d50e5d0c505 ("crash_core, vmcoreinfo: Append 'MAX_PHYSMEM_BITS' to vmcoreinfo")
 - bbdbc11804ff ("arm64/crash_core: Export TCR_EL1.T1SZ in vmcoreinfo")

Cc: John Donnelly 
Cc: Kazuhito Hagio 
Cc: kexec@lists.infradead.org

Bhupesh Sharma (3):
  tree-wide: Retrieve 'MAX_PHYSMEM_BITS' from vmcoreinfo (if available)
  makedumpfile/arm64: Add support for ARMv8.2-LPA (52-bit PA support)
  makedumpfile/arm64: Add support for ARMv8.2-LVA (52-bit kernel VA
support)

 arch/arm.c |   8 +-
 arch/arm64.c   | 520 ++---
 arch/ia64.c|   7 +-
 arch/ppc.c |   8 +-
 arch/ppc64.c   |  49 +++--
 arch/s390x.c   |  29 +--
 arch/sparc64.c |   9 +-
 arch/x86.c |  34 ++--
 arch/x86_64.c  |  27 +--
 common.h   |  10 +
 makedumpfile.c |   4 +-
 makedumpfile.h |   6 +-
 12 files changed, 529 insertions(+), 182 deletions(-)

-- 
2.26.2


___
kexec mailing list
kexec@lists.infradead.org
http://lists.infradead.org/mailman/listinfo/kexec


[PATCH v5 1/3] tree-wide: Retrieve 'MAX_PHYSMEM_BITS' from vmcoreinfo (if available)

2020-09-09 Thread Bhupesh Sharma
This patch adds a common feature for archs (except arm64, for which
similar support is added via subsequent patch) to retrieve
'MAX_PHYSMEM_BITS' from vmcoreinfo (if available).

I recently posted a kernel patch (see [0]) which appends
'MAX_PHYSMEM_BITS' to vmcoreinfo in the core code itself rather than
in arch-specific code, so that user-space code can also benefit from
this addition to the vmcoreinfo and use it as a standard way of
determining 'SECTIONS_SHIFT' value in 'makedumpfile' utility.

This patch ensures backward compatibility for kernel versions in which
'MAX_PHYSMEM_BITS' is not available in vmcoreinfo.

[0]. http://lists.infradead.org/pipermail/kexec/2019-November/023960.html
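The "use vmcoreinfo if present, else fall back" pattern implemented by the arch hunks could look like this when applied to a raw vmcoreinfo text buffer. This assumes the kernel's "NUMBER(name)=value" line format; vmcoreinfo_number() is a hypothetical helper, not makedumpfile's NUMBER() macro, which reads an already-parsed table:

```c
#include <stdio.h>
#include <string.h>

/*
 * Look up "NUMBER(<name>)=<value>" in a vmcoreinfo text buffer.
 * Returns 'fallback' when the entry is missing or malformed.
 */
static long vmcoreinfo_number(const char *vmcoreinfo, const char *name,
			      long fallback)
{
	char key[64];
	const char *p;
	long val;

	snprintf(key, sizeof(key), "NUMBER(%s)=", name);
	p = strstr(vmcoreinfo, key);
	if (!p || sscanf(p + strlen(key), "%li", &val) != 1)
		return fallback;
	return val;
}
```

With MAX_PHYSMEM_BITS available, the kernel's SECTIONS_SHIFT is MAX_PHYSMEM_BITS - SECTION_SIZE_BITS, which is the value makedumpfile needs.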

Cc: Kazuhito Hagio 
Cc: John Donnelly 
Cc: kexec@lists.infradead.org
Signed-off-by: Bhupesh Sharma 
---
 arch/arm.c |  8 +++-
 arch/ia64.c|  7 ++-
 arch/ppc.c |  8 +++-
 arch/ppc64.c   | 49 -
 arch/s390x.c   | 29 ++---
 arch/sparc64.c |  9 +++--
 arch/x86.c | 34 --
 arch/x86_64.c  | 27 ---
 8 files changed, 109 insertions(+), 62 deletions(-)

diff --git a/arch/arm.c b/arch/arm.c
index af7442ac70bf..33536fc4dfc9 100644
--- a/arch/arm.c
+++ b/arch/arm.c
@@ -81,7 +81,13 @@ int
 get_machdep_info_arm(void)
 {
info->page_offset = SYMBOL(_stext) & 0xUL;
-   info->max_physmem_bits = _MAX_PHYSMEM_BITS;
+
+   /* Check if we can get MAX_PHYSMEM_BITS from vmcoreinfo */
+   if (NUMBER(MAX_PHYSMEM_BITS) != NOT_FOUND_NUMBER)
+   info->max_physmem_bits = NUMBER(MAX_PHYSMEM_BITS);
+   else
+   info->max_physmem_bits = _MAX_PHYSMEM_BITS;
+
info->kernel_start = SYMBOL(_stext);
info->section_size_bits = _SECTION_SIZE_BITS;
 
diff --git a/arch/ia64.c b/arch/ia64.c
index 6c33cc7c8288..fb44dda47172 100644
--- a/arch/ia64.c
+++ b/arch/ia64.c
@@ -85,7 +85,12 @@ get_machdep_info_ia64(void)
}
 
info->section_size_bits = _SECTION_SIZE_BITS;
-   info->max_physmem_bits  = _MAX_PHYSMEM_BITS;
+
+   /* Check if we can get MAX_PHYSMEM_BITS from vmcoreinfo */
+   if (NUMBER(MAX_PHYSMEM_BITS) != NOT_FOUND_NUMBER)
+   info->max_physmem_bits = NUMBER(MAX_PHYSMEM_BITS);
+   else
+   info->max_physmem_bits  = _MAX_PHYSMEM_BITS;
 
return TRUE;
 }
diff --git a/arch/ppc.c b/arch/ppc.c
index 37c6a3b60cd3..ed9447427a30 100644
--- a/arch/ppc.c
+++ b/arch/ppc.c
@@ -31,7 +31,13 @@ get_machdep_info_ppc(void)
unsigned long vmlist, vmap_area_list, vmalloc_start;
 
info->section_size_bits = _SECTION_SIZE_BITS;
-   info->max_physmem_bits  = _MAX_PHYSMEM_BITS;
+
+   /* Check if we can get MAX_PHYSMEM_BITS from vmcoreinfo */
+   if (NUMBER(MAX_PHYSMEM_BITS) != NOT_FOUND_NUMBER)
+   info->max_physmem_bits = NUMBER(MAX_PHYSMEM_BITS);
+   else
+   info->max_physmem_bits  = _MAX_PHYSMEM_BITS;
+
info->page_offset = __PAGE_OFFSET;
 
if (SYMBOL(_stext) != NOT_FOUND_SYMBOL)
diff --git a/arch/ppc64.c b/arch/ppc64.c
index 9d8f2525f608..a3984eebdced 100644
--- a/arch/ppc64.c
+++ b/arch/ppc64.c
@@ -466,30 +466,37 @@ int
 set_ppc64_max_physmem_bits(void)
 {
long array_len = ARRAY_LENGTH(mem_section);
-   /*
-* The older ppc64 kernels uses _MAX_PHYSMEM_BITS as 42 and the
-* newer kernels 3.7 onwards uses 46 bits.
-*/
-
-   info->max_physmem_bits  = _MAX_PHYSMEM_BITS_ORIG ;
-   if ((array_len == (NR_MEM_SECTIONS() / _SECTIONS_PER_ROOT_EXTREME()))
-   || (array_len == (NR_MEM_SECTIONS() / _SECTIONS_PER_ROOT())))
-   return TRUE;
-
-   info->max_physmem_bits  = _MAX_PHYSMEM_BITS_3_7;
-   if ((array_len == (NR_MEM_SECTIONS() / _SECTIONS_PER_ROOT_EXTREME()))
-   || (array_len == (NR_MEM_SECTIONS() / _SECTIONS_PER_ROOT())))
-   return TRUE;
 
-   info->max_physmem_bits  = _MAX_PHYSMEM_BITS_4_19;
-   if ((array_len == (NR_MEM_SECTIONS() / _SECTIONS_PER_ROOT_EXTREME()))
-   || (array_len == (NR_MEM_SECTIONS() / _SECTIONS_PER_ROOT())))
+   /* Check if we can get MAX_PHYSMEM_BITS from vmcoreinfo */
+   if (NUMBER(MAX_PHYSMEM_BITS) != NOT_FOUND_NUMBER) {
+   info->max_physmem_bits = NUMBER(MAX_PHYSMEM_BITS);
return TRUE;
+   } else {
+   /*
+* The older ppc64 kernels uses _MAX_PHYSMEM_BITS as 42 and the
+* newer kernels 3.7 onwards uses 46 bits.
+*/
 
-   info->max_physmem_bits  = _MAX_PHYSMEM_BITS_4_20;
-   if ((array_len == (NR_MEM_SECTIONS() / _SECTIONS_PER_ROOT_EXTREME()))
-   || (array_len == (NR_MEM_SECTIONS() / _SECTIONS_PER_ROOT())))
-   return TRUE;
+   info->max_physmem_bits  = _MAX_PHY

[PATCH v5 2/3] makedumpfile/arm64: Add support for ARMv8.2-LPA (52-bit PA support)

2020-09-09 Thread Bhupesh Sharma
ARMv8.2-LPA architecture extension (if available on underlying hardware)
can support 52-bit physical addresses, while the kernel virtual
addresses remain 48-bit.

Read the 52-bit PA capability from the 'MAX_PHYSMEM_BITS' variable
(if available in vmcoreinfo), and change the pte_to_phy() mask values
and the page-table walk accordingly.

Also make sure that this works well on existing 48-bit PA platforms,
and on environments which use newer kernels with 52-bit PA support
but hardware which is not ARMv8.2-LPA compliant.

Kernel commit 1d50e5d0c505 ("crash_core, vmcoreinfo: Append
'MAX_PHYSMEM_BITS' to vmcoreinfo") already supports adding
'MAX_PHYSMEM_BITS' variable to vmcoreinfo.

This patch is in accordance with the ARMv8 Architecture Reference Manual.
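For reference, a sketch of the PTE/physical-address conversion with 52-bit PAs, assuming the 64KB page granule that (as I understand it) CONFIG_ARM64_PA_BITS_52 required at the time: PA bits [51:48] are stored in PTE bits [15:12]. The macro and function names are illustrative, modeled on the kernel's __pte_to_phys() rather than copied from makedumpfile:

```c
#include <stdint.h>

#define PAGE_SHIFT_64K	16
/* PA bits [47:16] are stored in place in the PTE. */
#define PTE_ADDR_LOW	(((1ULL << (48 - PAGE_SHIFT_64K)) - 1) << PAGE_SHIFT_64K)
/* PA bits [51:48] are stored in PTE bits [15:12]. */
#define PTE_ADDR_HIGH	(0xfULL << 12)

/* Recover the 52-bit physical address from a PTE, dropping attribute bits. */
static uint64_t pte_to_phys(uint64_t pte)
{
	return (pte & PTE_ADDR_LOW) | ((pte & PTE_ADDR_HIGH) << 36);
}

/* Encode a 64KB-aligned 52-bit physical address into PTE address bits. */
static uint64_t phys_to_pte_bits(uint64_t pa)
{
	return (pa & PTE_ADDR_LOW) | ((pa >> 36) & PTE_ADDR_HIGH);
}
```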

Cc: Kazuhito Hagio 
Cc: John Donnelly 
Cc: kexec@lists.infradead.org
Signed-off-by: Bhupesh Sharma 
---
 arch/arm64.c | 291 ---
 1 file changed, 204 insertions(+), 87 deletions(-)

diff --git a/arch/arm64.c b/arch/arm64.c
index 54d60b440850..709e0a506916 100644
--- a/arch/arm64.c
+++ b/arch/arm64.c
@@ -39,72 +39,185 @@ typedef struct {
unsigned long pte;
 } pte_t;
 
+#define __pte(x)   ((pte_t) { (x) } )
+#define __pmd(x)   ((pmd_t) { (x) } )
+#define __pud(x)   ((pud_t) { (x) } )
+#define __pgd(x)   ((pgd_t) { (x) } )
+
+static int lpa_52_bit_support_available;
 static int pgtable_level;
 static int va_bits;
 static unsigned long kimage_voffset;
 
-#define SZ_4K  (4 * 1024)
-#define SZ_16K (16 * 1024)
-#define SZ_64K (64 * 1024)
-#define SZ_128M	(128 * 1024 * 1024)
+#define SZ_4K  4096
+#define SZ_16K 16384
+#define SZ_64K 65536
 
-#define PAGE_OFFSET_36 ((0xUL) << 36)
-#define PAGE_OFFSET_39 ((0xUL) << 39)
-#define PAGE_OFFSET_42 ((0xUL) << 42)
-#define PAGE_OFFSET_47 ((0xUL) << 47)
-#define PAGE_OFFSET_48 ((0xUL) << 48)
+#define PAGE_OFFSET_36 ((0xUL) << 36)
+#define PAGE_OFFSET_39 ((0xUL) << 39)
+#define PAGE_OFFSET_42 ((0xUL) << 42)
+#define PAGE_OFFSET_47 ((0xUL) << 47)
+#define PAGE_OFFSET_48 ((0xUL) << 48)
+#define PAGE_OFFSET_52 ((0xUL) << 52)
 
 #define pgd_val(x) ((x).pgd)
 #define pud_val(x) (pgd_val((x).pgd))
 #define pmd_val(x) (pud_val((x).pud))
 #define pte_val(x) ((x).pte)
 
-#define PAGE_MASK  (~(PAGESIZE() - 1))
-#define PGDIR_SHIFT	((PAGESHIFT() - 3) * pgtable_level + 3)
-#define PTRS_PER_PGD   (1 << (va_bits - PGDIR_SHIFT))
-#define PUD_SHIFT  get_pud_shift_arm64()
-#define PUD_SIZE   (1UL << PUD_SHIFT)
-#define PUD_MASK   (~(PUD_SIZE - 1))
-#define PTRS_PER_PTE   (1 << (PAGESHIFT() - 3))
-#define PTRS_PER_PUD   PTRS_PER_PTE
-#define PMD_SHIFT  ((PAGESHIFT() - 3) * 2 + 3)
-#define PMD_SIZE   (1UL << PMD_SHIFT)
-#define PMD_MASK   (~(PMD_SIZE - 1))
+/* See 'include/uapi/linux/const.h' for definitions below */
+#define __AC(X,Y)  (X##Y)
+#define _AC(X,Y)   __AC(X,Y)
+#define _AT(T,X)   ((T)(X))
+
+/* See 'include/asm/pgtable-types.h' for definitions below */
+typedef unsigned long pteval_t;
+typedef unsigned long pmdval_t;
+typedef unsigned long pudval_t;
+typedef unsigned long pgdval_t;
+
+#define PAGE_SHIFT PAGESHIFT()
+
+/* See 'arch/arm64/include/asm/pgtable-hwdef.h' for definitions below */
+
+#define ARM64_HW_PGTABLE_LEVEL_SHIFT(n)	((PAGE_SHIFT - 3) * (4 - (n)) + 3)
+
+#define PTRS_PER_PTE   (1 << (PAGE_SHIFT - 3))
+
+/*
+ * PMD_SHIFT determines the size a level 2 page table entry can map.
+ */
+#define PMD_SHIFT  ARM64_HW_PGTABLE_LEVEL_SHIFT(2)
+#define PMD_SIZE   (_AC(1, UL) << PMD_SHIFT)
+#define PMD_MASK   (~(PMD_SIZE-1))
 #define PTRS_PER_PMD   PTRS_PER_PTE
 
-#define PAGE_PRESENT   (1 << 0)
+/*
+ * PUD_SHIFT determines the size a level 1 page table entry can map.
+ */
+#define PUD_SHIFT  ARM64_HW_PGTABLE_LEVEL_SHIFT(1)
+#define PUD_SIZE   (_AC(1, UL) << PUD_SHIFT)
+#define PUD_MASK   (~(PUD_SIZE-1))
+#define PTRS_PER_PUD   PTRS_PER_PTE
+
+/*
+ * PGDIR_SHIFT determines the size a top-level page table entry can map
+ * (depending on the configuration, this level can be 0, 1 or 2).
+ */
+#define PGDIR_SHIFT	ARM64_HW_PGTABLE_LEVEL_SHIFT(4 - (pgtable_level))
+#define PGDIR_SIZE (_AC(1, UL) << PGDIR_SHIFT)
+#define PGDIR_MASK 

[PATCH v5 3/3] makedumpfile/arm64: Add support for ARMv8.2-LVA (52-bit kernel VA support)

2020-09-09 Thread Bhupesh Sharma
Arm64 hardware which supports the ARMv8.2-LVA architecture extension
can support up to 52-bit virtual addresses. This is especially useful
for having a 52-bit user-space virtual address space while the kernel
can still retain 48-bit/52-bit virtual addressing.

Since support for this extension is currently enabled in the kernel
via a CONFIG flag (CONFIG_ARM64_VA_BITS_52), there is no clear
mechanism for user-space to determine this CONFIG flag value and use
it to determine the kernel-space VA address range values.

'makedumpfile' can instead use the 'TCR_EL1.T1SZ' value from vmcoreinfo,
which indicates the size offset of the memory region addressed by
TTBR1_EL1 (and hence can be used for determining the
vabits_actual value).

Using the vmcoreinfo variable exported by kernel commit
bbdbc11804ff ("arm64/crash_core: Export TCR_EL1.T1SZ in vmcoreinfo"),
user-space can use the following computation for determining whether
an address lies in the linear map range (for newer kernels >= 5.4):

  #define __is_lm_address(addr) (!(((u64)addr) & BIT(vabits_actual - 1)))

Note that in the --mem-usage case, though, we need to calculate the
vabits_actual value before vmcoreinfo has been read, so we instead
read the architecture register ID_AA64MMFR2_EL1 directly to see if
the underlying hardware supports 52-bit addressing, and set
vabits_actual accordingly:

   read_id_aa64mmfr2_el1();
   if (hardware supports 52-bit addressing)
       vabits_actual = 52;
   else
       vabits_actual = va_bits value calculated via _stext symbol;

Also make sure that the page_offset, is_linear_addr(addr) and __pa()
calculations work both for older (< 5.4) and newer kernels (>= 5.4).
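The version-dependent linear-map test can be sketched as follows: before 5.4 the linear map sat in the upper half of the kernel VA range (top VA bit set), and from 5.4 it sits in the lower half (top VA bit clear). The function name and the explicit flipped_layout flag, which stands in for the kernel_version check, are assumptions of this sketch:

```c
#include <stdbool.h>
#include <stdint.h>

/* Is 'addr' in the linear map, given vabits_actual and the VA layout? */
static bool addr_in_linear_map(uint64_t addr, int vabits_actual,
			       bool flipped_layout)
{
	bool top_bit = (addr >> (vabits_actual - 1)) & 1;

	/* Pre-5.4: linear map has the top VA bit set; 5.4+: it is clear. */
	return flipped_layout ? !top_bit : top_bit;
}
```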

I have tested several combinations with both kernel categories
(e.g. with different VA (39, 42, 48 and 52-bit) and PA combinations
(48 and 52-bit)) on at least 3 different boards.

Unfortunately, this means that we need to call 'populate_kernel_version()'
earlier than 'get_page_offset_arm64()', as otherwise 'info->kernel_version'
remains uninitialized at its first use.

This patch is in accordance with the ARMv8 Architecture Reference Manual.

Cc: Kazuhito Hagio 
Cc: John Donnelly 
Cc: kexec@lists.infradead.org
Signed-off-by: Bhupesh Sharma 
---
 arch/arm64.c   | 233 ++---
 common.h   |  10 +++
 makedumpfile.c |   4 +-
 makedumpfile.h |   6 +-
 4 files changed, 218 insertions(+), 35 deletions(-)

diff --git a/arch/arm64.c b/arch/arm64.c
index 709e0a506916..ccaa8641ca66 100644
--- a/arch/arm64.c
+++ b/arch/arm64.c
@@ -19,10 +19,23 @@
 
 #ifdef __aarch64__
 
+#include 
+#include 
 #include "../elf_info.h"
 #include "../makedumpfile.h"
 #include "../print_info.h"
 
+/* ID_AA64MMFR2_EL1 related helpers: */
+#define ID_AA64MMFR2_LVA_SHIFT 16
+#define ID_AA64MMFR2_LVA_MASK  (0xf << ID_AA64MMFR2_LVA_SHIFT)
+
+/* CPU feature ID registers */
+#define get_cpu_ftr(id) ({ \
+   unsigned long __val; \
+   asm volatile("mrs %0, " __stringify(id) : "=r" (__val)); \
+   __val; \
+})
+
 typedef struct {
unsigned long pgd;
 } pgd_t;
@@ -47,6 +60,7 @@ typedef struct {
 static int lpa_52_bit_support_available;
 static int pgtable_level;
 static int va_bits;
+static int vabits_actual;
 static unsigned long kimage_voffset;
 
 #define SZ_4K  4096
@@ -58,7 +72,6 @@ static unsigned long kimage_voffset;
 #define PAGE_OFFSET_42 ((0xUL) << 42)
 #define PAGE_OFFSET_47 ((0xUL) << 47)
 #define PAGE_OFFSET_48 ((0xUL) << 48)
-#define PAGE_OFFSET_52 ((0xUL) << 52)
 
 #define pgd_val(x) ((x).pgd)
 #define pud_val(x) (pgd_val((x).pgd))
@@ -219,13 +232,25 @@ pmd_page_paddr(pmd_t pmd)
 #define pte_index(vaddr)   (((vaddr) >> PAGESHIFT()) & (PTRS_PER_PTE - 1))
 #define pte_offset(dir, vaddr) (pmd_page_paddr((*dir)) + pte_index(vaddr) * sizeof(pte_t))
 
+/*
+ * The linear kernel range starts at the bottom of the virtual address
+ * space. Testing the top bit for the start of the region is a
+ * sufficient check and avoids having to worry about the tag.
+ */
+#define is_linear_addr(addr)   ((info->kernel_version < KERNEL_VERSION(5, 4, 0)) ? \
+   (!!((unsigned long)(addr) & (1UL << (vabits_actual - 1)))) : \
+   (!((unsigned long)(addr) & (1UL << (vabits_actual - 1)))))
+
 static unsigned long long
 __pa(unsigned long vaddr)
 {
if (kimage_voffset == NOT_FOUND_NUMBER ||
-   (vaddr >= PAGE_OFFSET))
-   return (vaddr - PAGE_OFFSET + 

Re: [PATCH] kexec: remove the 2GB size limit on initrd file

2020-09-02 Thread Bhupesh Sharma
Hi Robi,

On Wed, Sep 2, 2020 at 1:05 PM Robi Buranyi  wrote:
>
> Enable loading initrd files exceeding the INT_MAX size. Remove the
> INT_MAX limit completely, and let any initrd load if it fits in the
> memory.
>
> Signed-off-by: Robi Buranyi 
> ---
>  kernel/kexec_file.c | 4 ++--
>  1 file changed, 2 insertions(+), 2 deletions(-)
>
> diff --git a/kernel/kexec_file.c b/kernel/kexec_file.c
> index ca40bef75a61..659a9d165198 100644
> --- a/kernel/kexec_file.c
> +++ b/kernel/kexec_file.c
> @@ -222,7 +222,7 @@ kimage_file_prepare_segments(struct kimage *image, int kernel_fd, int initrd_fd,
> loff_t size;
>
> ret = kernel_read_file_from_fd(kernel_fd, >kernel_buf,
> -  , INT_MAX, READING_KEXEC_IMAGE);
> +  , 0, READING_KEXEC_IMAGE);
> if (ret)
> return ret;
> image->kernel_buf_len = size;
> @@ -242,7 +242,7 @@ kimage_file_prepare_segments(struct kimage *image, int kernel_fd, int initrd_fd,
> /* It is possible that there no initramfs is being loaded */
> if (!(flags & KEXEC_FILE_NO_INITRAMFS)) {
> ret = kernel_read_file_from_fd(initrd_fd, >initrd_buf,
> -  , INT_MAX,
> +  , 0,
>READING_KEXEC_INITRAMFS);
> if (ret)
> goto out;
> --
> 2.28.0.402.g5ffc5be6b7-goog

Can you share some background about this fix? For example, why is it
needed, or what is failing at your end?
I think 2GB is a large enough initramfs size to accommodate when
loading it via kexec_file_load(). The initramfs to be loaded is
ultimately specified by user-space and handed to the kexec_file_load()
syscall (as a file descriptor), so we should be careful about the file
sizes we might be loading here.

I am just trying to understand what initramfs size limits you are
working with from a kexec_file_load() point of view.

Thanks,
Bhupesh
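As background for the question, the patch replaces the INT_MAX cap (just under 2 GiB) with 0. A minimal sketch of the resulting check, assuming (as the patch implies) that a max_size of 0 means no caller-imposed limit; the names are hypothetical:

```c
#include <limits.h>
#include <stdbool.h>
#include <stdint.h>

/* The cap the patch removes (hypothetical name for this sketch). */
#define OLD_INITRD_CAP ((int64_t)INT_MAX)

/*
 * max_size == 0: no caller-imposed cap (assumed semantics).
 * Otherwise a file larger than max_size is rejected.
 */
static bool initrd_size_allowed(int64_t file_size, int64_t max_size)
{
	if (file_size < 0)
		return false;
	return max_size == 0 || file_size <= max_size;
}
```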


___
kexec mailing list
kexec@lists.infradead.org
http://lists.infradead.org/mailman/listinfo/kexec


Re: [PATCH] arm64/defconfig: Enable CONFIG_KEXEC_FILE

2020-07-04 Thread Bhupesh Sharma
On Fri, Jul 3, 2020 at 10:09 PM Catalin Marinas  wrote:
>
> On Fri, Jul 03, 2020 at 12:55:03AM +0530, Bhupesh Sharma wrote:
> > On Fri, May 15, 2020 at 2:44 PM Bhupesh Sharma  wrote:
> > > On Thu, Apr 30, 2020 at 10:05 AM Bhupesh Sharma wrote:
> > > > On Tue, Apr 28, 2020 at 3:37 PM Catalin Marinas wrote:
> > > > >
> > > > > On Tue, Apr 28, 2020 at 01:55:58PM +0530, Bhupesh Sharma wrote:
> > > > > > On Wed, Apr 8, 2020 at 4:17 PM Mark Rutland wrote:
> > > > > > > On Tue, Apr 07, 2020 at 04:01:40AM +0530, Bhupesh Sharma wrote:
> > > > > > > >  arch/arm64/configs/defconfig | 1 +
> > > > > > > >  1 file changed, 1 insertion(+)
> > > > > > > >
> > > > > > > > diff --git a/arch/arm64/configs/defconfig 
> > > > > > > > b/arch/arm64/configs/defconfig
> > > > > > > > index 24e534d85045..fa122f4341a2 100644
> > > > > > > > --- a/arch/arm64/configs/defconfig
> > > > > > > > +++ b/arch/arm64/configs/defconfig
> > > > > > > > @@ -66,6 +66,7 @@ CONFIG_SCHED_SMT=y
> > > > > > > >  CONFIG_NUMA=y
> > > > > > > >  CONFIG_SECCOMP=y
> > > > > > > >  CONFIG_KEXEC=y
> > > > > > > > +CONFIG_KEXEC_FILE=y
> > > > > > > >  CONFIG_CRASH_DUMP=y
> > > > > > > >  CONFIG_XEN=y
> > > > > > > >  CONFIG_COMPAT=y
> > > > > > > > --
> > > > > > > > 2.7.4
> > > > > >
> > > > > > Thanks a lot  Mark.
> > > > > >
> > > > > > Hi Catalin, Will,
> > > > > >
> > > > > > Can you please help pick this patch in the arm tree. We have an
> > > > > > increasing number of user-cases from distro users
> > > > > > who want to use kexec_file_load() as the default interface for
> > > > > > kexec/kdump on arm64.
> > > > >
> > > > > We could pick it up if it doesn't conflict with the arm-soc tree. They
> > > > > tend to pick most of the defconfig changes these days (and could as 
> > > > > well
> > > > > pick this one).
> > > >
> > > > Thanks Catalin.
> > > > (+Cc Arnd)
> > > >
> > > > Hi Arnd,
> > > >
> > > > Can you please help pick this change via the arm-soc tree?
> > >
> > > Ping. Any updates on this defconfig patch.
> >
> > Ping. Seems there is no reply from Arnd on this patch.
> > Can you please help pull in this one as well. It has been pending for
> > quite some time now.
>
> I can queue it for 5.9.

Many thanks, Catalin.
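With CONFIG_KEXEC_FILE=y in the defconfig, user-space can call kexec_file_load() directly. This is a minimal sketch that assumes glibc's generic syscall() with SYS_kexec_file_load, since there is no dedicated glibc wrapper. load_with_kexec_file() is a hypothetical helper; kexec-tools does the equivalent when invoked as `kexec -s`. The fallback KEXEC_FILE_NO_INITRAMFS value is assumed to match the uapi header. Actually loading a kernel also requires CAP_SYS_BOOT.

```c
#define _GNU_SOURCE
#include <assert.h>
#include <errno.h>
#include <fcntl.h>
#include <string.h>
#include <sys/syscall.h>
#include <unistd.h>

/* Fallback value, assumed to match include/uapi/linux/kexec.h. */
#ifndef KEXEC_FILE_NO_INITRAMFS
#define KEXEC_FILE_NO_INITRAMFS 0x00000004
#endif

/*
 * Hypothetical helper: load a kernel image (and optionally an initramfs)
 * for a later kexec reboot. Returns 0 on success, or -1 with errno set.
 */
static int load_with_kexec_file(const char *kernel, const char *initrd,
				const char *cmdline)
{
	unsigned long flags = 0;
	int kernel_fd, initrd_fd = -1;
	long ret;
	int saved_errno;

	assert(kernel != NULL && cmdline != NULL);

	kernel_fd = open(kernel, O_RDONLY | O_CLOEXEC);
	if (kernel_fd < 0)
		return -1;

	if (initrd != NULL) {
		initrd_fd = open(initrd, O_RDONLY | O_CLOEXEC);
		if (initrd_fd < 0) {
			saved_errno = errno;
			close(kernel_fd);
			errno = saved_errno;
			return -1;
		}
	} else {
		/* initrd_fd is ignored when this flag is set. */
		flags |= KEXEC_FILE_NO_INITRAMFS;
	}

#ifdef SYS_kexec_file_load
	/* cmdline_len must count the terminating NUL byte. */
	ret = syscall(SYS_kexec_file_load, kernel_fd, initrd_fd,
		      (unsigned long)strlen(cmdline) + 1, cmdline, flags);
#else
	(void)flags;
	errno = ENOSYS;
	ret = -1;
#endif
	saved_errno = errno;
	close(kernel_fd);
	if (initrd_fd >= 0)
		close(initrd_fd);
	errno = saved_errno;
	return ret == 0 ? 0 : -1;
}
```

Opening the files in user-space keeps the syscall interface fd-based. This is why the kernel side reads the image and initramfs with kernel_read_file_from_fd().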

Regards,
Bhupesh




Re: [PATCH 2/2] arm64: Allocate crashkernel always in ZONE_DMA

2020-07-03 Thread Bhupesh Sharma
Hi Chen,

On Fri, Jul 3, 2020 at 10:54 AM chenzhou  wrote:
>
> Hi Bhupesh,
>
>
> On 2020/7/3 3:22, Bhupesh Sharma wrote:
> > Hi Will,
> >
> > On Thu, Jul 2, 2020 at 1:20 PM Will Deacon  wrote:
> >> On Thu, Jul 02, 2020 at 03:44:20AM +0530, Bhupesh Sharma wrote:
> >>> commit bff3b04460a8 ("arm64: mm: reserve CMA and crashkernel in
> >>> ZONE_DMA32") allocates crashkernel for arm64 in the ZONE_DMA32.
> >>>
> >>> However as reported by Prabhakar, this breaks kdump kernel booting in
> >>> ThunderX2 like arm64 systems. I have noticed this on another ampere
> >>> arm64 machine. The OOM log in the kdump kernel looks like this:
> >>>
> >>>   [0.240552] DMA: preallocated 128 KiB GFP_KERNEL pool for atomic 
> >>> allocations
> >>>   [0.247713] swapper/0: page allocation failure: order:1, 
> >>> mode:0xcc1(GFP_KERNEL|GFP_DMA), nodemask=(null),cpuset=/,mems_allowed=0
> >>>   <..snip..>
> >>>   [0.274706] Call trace:
> >>>   [0.277170]  dump_backtrace+0x0/0x208
> >>>   [0.280863]  show_stack+0x1c/0x28
> >>>   [0.284207]  dump_stack+0xc4/0x10c
> >>>   [0.287638]  warn_alloc+0x104/0x170
> >>>   [0.291156]  __alloc_pages_slowpath.constprop.106+0xb08/0xb48
> >>>   [0.296958]  __alloc_pages_nodemask+0x2ac/0x2f8
> >>>   [0.301530]  alloc_page_interleave+0x20/0x90
> >>>   [0.305839]  alloc_pages_current+0xdc/0xf8
> >>>   [0.309972]  atomic_pool_expand+0x60/0x210
> >>>   [0.314108]  __dma_atomic_pool_init+0x50/0xa4
> >>>   [0.318504]  dma_atomic_pool_init+0xac/0x158
> >>>   [0.322813]  do_one_initcall+0x50/0x218
> >>>   [0.326684]  kernel_init_freeable+0x22c/0x2d0
> >>>   [0.331083]  kernel_init+0x18/0x110
> >>>   [0.334600]  ret_from_fork+0x10/0x18
> >>>
> >>> This patch limits the crashkernel allocation to the first 1GB of
> >>> the RAM accessible (ZONE_DMA), as otherwise we might run into OOM
> >>> issues when crashkernel is executed, as it might have been originally
> >>> allocated from either a ZONE_DMA32 memory or mixture of memory chunks
> >>> belonging to both ZONE_DMA and ZONE_DMA32.
> >> How does this interact with this ongoing series:
> >>
> >> https://lore.kernel.org/r/20200628083458.40066-1-chenzho...@huawei.com
> >>
> >> (patch 4, in particular)
> > Many thanks for having a look at this patchset. I was not aware that
> > Chen had sent out a new version.
> > I had noted in the v9 review of the high/low range allocation
> > <https://lists.gt.net/linux/kernel/3726052#3726052> that I was working
> > on a generic solution (irrespective of the crashkernel, low and high
> > range allocation) which resulted in this patchset.
> >
> > The issue is two-fold: OOPs in memcfg layer (PATCH 1/2, which has been
> > Acked-by memcfg maintainer) and OOM in the kdump kernel due to
> > crashkernel allocation in ZONE_DMA32 regions(s) which is addressed by
> > this PATCH.
> >
> > I will have a closer look at the v10 patchset Chen shared, but seems
> > it needs some rework as per Dave's review comments which he shared
> > today.
> > IMO, in the meanwhile this patchset  can be used to fix the existing
> > kdump issue with upstream kernel.
> Thanks for your work.
> There is no progress on the issue for long time, so i sent my solution in v8 
> comments
> and sent v9 recently.

Thanks a lot for your inputs. I was working on the OOPS seen in the
cgroups layer even when the memory cgroup is disabled via the kdump
command line. As the cgroup maintainer also noted during the review of
PATCH 1/2 of this series, it's quite a corner case and hence hard to
debug, which explains the delay in sending out this series.

> I think direct limiting the crashkernel in ZONE_DMA isn't a good idea:
> 1. For parameter "crashkernel=Y", reserving crashkernel in first 1G memory 
> will increase
> the probability of memory allocation failure.
> Previous discuss from https://lkml.org/lkml/2019/10/21/725:
> "With ZONE_DMA=y, this config will fail to reserve 512M CMA on a server"

That is correct. However, we have limited options at the moment
anyway, hence the need for the crashkernel hi/low support series which
you are already working on. Unfortunately, as I noted in the review of
the v10 series today, it still needs rework to fix the OOM issue seen
on ThunderX2 and Ampere boards with the crashkernel=X format.

See <http://lists.inf

Re: [PATCH v10 0/5] support reserving crashkernel above 4G on arm64 kdump

2020-07-03 Thread Bhupesh Sharma
Hi Chen,

On Fri, Jul 3, 2020 at 9:24 AM Chen Zhou  wrote:
>
> This patch series enable reserving crashkernel above 4G in arm64.
>
> There are following issues in arm64 kdump:
> 1. We use crashkernel=X to reserve crashkernel below 4G, which will fail
> when there is no enough low memory.
> 2. Currently, crashkernel=Y@X can be used to reserve crashkernel above 4G,
> in this case, if swiotlb or DMA buffers are required, crash dump kernel
> will boot failure because there is no low memory available for allocation.
> 3. commit 1a8e1cef7603 ("arm64: use both ZONE_DMA and ZONE_DMA32") broken
> the arm64 kdump. If the memory reserved for crash dump kernel falled in
> ZONE_DMA32, the devices in crash dump kernel need to use ZONE_DMA will alloc
> fail.
>
> To solve these issues, introduce crashkernel=X,low to reserve specified
> size low memory.
> Crashkernel=X tries to reserve memory for the crash dump kernel under
> 4G. If crashkernel=Y,low is specified simultaneously, reserve spcified
> size low memory for crash kdump kernel devices firstly and then reserve
> memory above 4G.
>
> When crashkernel is reserved above 4G in memory and crashkernel=X,low
> is specified simultaneously, kernel should reserve specified size low memory
> for crash dump kernel devices. So there may be two crash kernel regions, one
> is below 4G, the other is above 4G.
> In order to distinct from the high region and make no effect to the use of
> kexec-tools, rename the low region as "Crash kernel (low)", and pass the
> low region by reusing DT property "linux,usable-memory-range". We made the low
> memory region as the last range of "linux,usable-memory-range" to keep
> compatibility with existing user-space and older kdump kernels.
>
> Besides, we need to modify kexec-tools:
> arm64: support more than one crash kernel regions(see [1])
>
> Another update is document about DT property 'linux,usable-memory-range':
> schemas: update 'linux,usable-memory-range' node schema(see [2])
>
> The previous changes and discussions can be retrieved from:
>
> Changes since [v9]
> - Patch 1 add Acked-by from Dave.
> - Update patch 5 according to Dave's comments.
> - Update chosen schema.
>
> Changes since [v8]
> - Reuse DT property "linux,usable-memory-range".
> Suggested by Rob, reuse DT property "linux,usable-memory-range" to pass the 
> low
> memory region.
> - Fix kdump broken with ZONE_DMA reintroduced.
> - Update chosen schema.
>
> Changes since [v7]
> - Move x86 CRASH_ALIGN to 2M
> Suggested by Dave and do some test, move x86 CRASH_ALIGN to 2M.
> - Update Documentation/devicetree/bindings/chosen.txt.
> Add corresponding documentation to 
> Documentation/devicetree/bindings/chosen.txt
> suggested by Arnd.
> - Add Tested-by from Jhon and pk.
>
> Changes since [v6]
> - Fix build errors reported by kbuild test robot.
>
> Changes since [v5]
> - Move reserve_crashkernel_low() into kernel/crash_core.c.
> - Delete crashkernel=X,high.
> - Modify crashkernel=X,low.
> If crashkernel=X,low is specified simultaneously, reserve spcified size low
> memory for crash kdump kernel devices firstly and then reserve memory above 
> 4G.
> In addition, rename crashk_low_res as "Crash kernel (low)" for arm64, and then
> pass to crash dump kernel by DT property "linux,low-memory-range".
> - Update Documentation/admin-guide/kdump/kdump.rst.
>
> Changes since [v4]
> - Reimplement memblock_cap_memory_ranges for multiple ranges by Mike.
>
> Changes since [v3]
> - Add memblock_cap_memory_ranges back for multiple ranges.
> - Fix some compiling warnings.
>
> Changes since [v2]
> - Split patch "arm64: kdump: support reserving crashkernel above 4G" as
> two. Put "move reserve_crashkernel_low() into kexec_core.c" in a separate
> patch.
>
> Changes since [v1]:
> - Move common reserve_crashkernel_low() code into kernel/kexec_core.c.
> - Remove memblock_cap_memory_ranges() i added in v1 and implement that
> in fdt_enforce_memory_region().
> There are at most two crash kernel regions, for two crash kernel regions
> case, we cap the memory range [min(regs[*].start), max(regs[*].end)]
> and then remove the memory range in the middle.
>
> [1]: http://lists.infradead.org/pipermail/kexec/2020-June/020737.html
> [2]: https://github.com/robherring/dt-schema/pull/19
> [v1]: https://lkml.org/lkml/2019/4/2/1174
> [v2]: https://lkml.org/lkml/2019/4/9/86
> [v3]: https://lkml.org/lkml/2019/4/9/306
> [v4]: https://lkml.org/lkml/2019/4/15/273
> [v5]: https://lkml.org/lkml/2019/5/6/1360
> [v6]: https://lkml.org/lkml/2019/8/30/142
> [v7]: https://lkml.org/lkml/2019/12/23/411
> [v8]: https://lkml.org/lkml/2020/5/21/213
> [v9]: https://lkml.org/lkml/2020/6/28/73
>
> Chen Zhou (5):
>   x86: kdump: move reserve_crashkernel_low() into crash_core.c
>   arm64: kdump: reserve crashkenel above 4G for crash dump kernel
>   arm64: kdump: add memory for devices by DT property
> linux,usable-memory-range
>   arm64: kdump: fix kdump broken with ZONE_DMA reintroduced
>   kdump: update Documentation about 

Re: [PATCH] arm64/defconfig: Enable CONFIG_KEXEC_FILE

2020-07-02 Thread Bhupesh Sharma
Hi Catalin,

On Fri, May 15, 2020 at 2:44 PM Bhupesh Sharma  wrote:
>
> Hi Arnd,
>
> On Thu, Apr 30, 2020 at 10:05 AM Bhupesh Sharma  wrote:
> >
> > On Tue, Apr 28, 2020 at 3:37 PM Catalin Marinas  
> > wrote:
> > >
> > > On Tue, Apr 28, 2020 at 01:55:58PM +0530, Bhupesh Sharma wrote:
> > > > On Wed, Apr 8, 2020 at 4:17 PM Mark Rutland  
> > > > wrote:
> > > > > On Tue, Apr 07, 2020 at 04:01:40AM +0530, Bhupesh Sharma wrote:
> > > > > >  arch/arm64/configs/defconfig | 1 +
> > > > > >  1 file changed, 1 insertion(+)
> > > > > >
> > > > > > diff --git a/arch/arm64/configs/defconfig 
> > > > > > b/arch/arm64/configs/defconfig
> > > > > > index 24e534d85045..fa122f4341a2 100644
> > > > > > --- a/arch/arm64/configs/defconfig
> > > > > > +++ b/arch/arm64/configs/defconfig
> > > > > > @@ -66,6 +66,7 @@ CONFIG_SCHED_SMT=y
> > > > > >  CONFIG_NUMA=y
> > > > > >  CONFIG_SECCOMP=y
> > > > > >  CONFIG_KEXEC=y
> > > > > > +CONFIG_KEXEC_FILE=y
> > > > > >  CONFIG_CRASH_DUMP=y
> > > > > >  CONFIG_XEN=y
> > > > > >  CONFIG_COMPAT=y
> > > > > > --
> > > > > > 2.7.4
> > > >
> > > > Thanks a lot  Mark.
> > > >
> > > > Hi Catalin, Will,
> > > >
> > > > Can you please help pick this patch in the arm tree. We have an
> > > > increasing number of user-cases from distro users
> > > > who want to use kexec_file_load() as the default interface for
> > > > kexec/kdump on arm64.
> > >
> > > We could pick it up if it doesn't conflict with the arm-soc tree. They
> > > tend to pick most of the defconfig changes these days (and could as well
> > > pick this one).
> >
> > Thanks Catalin.
> > (+Cc Arnd)
> >
> > Hi Arnd,
> >
> > Can you please help pick this change via the arm-soc tree?
>
> Ping. Any updates on this defconfig patch.

Ping. It seems there has been no reply from Arnd on this patch.
Could you please pull this one in as well? It has been pending for
quite some time now.

Thanks,
Bhupesh




Re: [PATCH 2/2] arm64: Allocate crashkernel always in ZONE_DMA

2020-07-02 Thread Bhupesh Sharma
Hi Will,

On Thu, Jul 2, 2020 at 1:20 PM Will Deacon  wrote:
>
> On Thu, Jul 02, 2020 at 03:44:20AM +0530, Bhupesh Sharma wrote:
> > commit bff3b04460a8 ("arm64: mm: reserve CMA and crashkernel in
> > ZONE_DMA32") allocates crashkernel for arm64 in the ZONE_DMA32.
> >
> > However as reported by Prabhakar, this breaks kdump kernel booting in
> > ThunderX2 like arm64 systems. I have noticed this on another ampere
> > arm64 machine. The OOM log in the kdump kernel looks like this:
> >
> >   [0.240552] DMA: preallocated 128 KiB GFP_KERNEL pool for atomic 
> > allocations
> >   [0.247713] swapper/0: page allocation failure: order:1, 
> > mode:0xcc1(GFP_KERNEL|GFP_DMA), nodemask=(null),cpuset=/,mems_allowed=0
> >   <..snip..>
> >   [0.274706] Call trace:
> >   [0.277170]  dump_backtrace+0x0/0x208
> >   [0.280863]  show_stack+0x1c/0x28
> >   [0.284207]  dump_stack+0xc4/0x10c
> >   [0.287638]  warn_alloc+0x104/0x170
> >   [0.291156]  __alloc_pages_slowpath.constprop.106+0xb08/0xb48
> >   [0.296958]  __alloc_pages_nodemask+0x2ac/0x2f8
> >   [0.301530]  alloc_page_interleave+0x20/0x90
> >   [0.305839]  alloc_pages_current+0xdc/0xf8
> >   [0.309972]  atomic_pool_expand+0x60/0x210
> >   [0.314108]  __dma_atomic_pool_init+0x50/0xa4
> >   [0.318504]  dma_atomic_pool_init+0xac/0x158
> >   [0.322813]  do_one_initcall+0x50/0x218
> >   [0.326684]  kernel_init_freeable+0x22c/0x2d0
> >   [0.331083]  kernel_init+0x18/0x110
> >   [0.334600]  ret_from_fork+0x10/0x18
> >
> > This patch limits the crashkernel allocation to the first 1GB of
> > the RAM accessible (ZONE_DMA), as otherwise we might run into OOM
> > issues when crashkernel is executed, as it might have been originally
> > allocated from either a ZONE_DMA32 memory or mixture of memory chunks
> > belonging to both ZONE_DMA and ZONE_DMA32.
>
> How does this interact with this ongoing series:
>
> https://lore.kernel.org/r/20200628083458.40066-1-chenzho...@huawei.com
>
> (patch 4, in particular)

Many thanks for having a look at this patchset. I was not aware that
Chen had sent out a new version.
I had noted in the v9 review of the high/low range allocation
<https://lists.gt.net/linux/kernel/3726052#3726052> that I was working
on a generic solution (irrespective of the crashkernel low and high
range allocation), which resulted in this patchset.

The issue is two-fold: an OOPS in the memcg layer (PATCH 1/2, which has
been acked by the memcg maintainer) and an OOM in the kdump kernel due
to crashkernel allocation in ZONE_DMA32 region(s), which is addressed by
this patch.

I will have a closer look at the v10 patchset Chen shared, but it
seems to need some rework as per the review comments Dave shared
today.
IMO, in the meantime this patchset can be used to fix the existing
kdump issue with the upstream kernel.

> > Fixes: bff3b04460a8 ("arm64: mm: reserve CMA and crashkernel in ZONE_DMA32")
> > Cc: Johannes Weiner 
> > Cc: Michal Hocko 
> > Cc: Vladimir Davydov 
> > Cc: James Morse 
> > Cc: Mark Rutland 
> > Cc: Will Deacon 
> > Cc: Catalin Marinas 
> > Cc: cgro...@vger.kernel.org
> > Cc: linux...@kvack.org
> > Cc: linux-arm-ker...@lists.infradead.org
> > Cc: linux-ker...@vger.kernel.org
> > Cc: kexec@lists.infradead.org
> > Reported-by: Prabhakar Kushwaha 
> > Signed-off-by: Bhupesh Sharma 
> > ---
> >  arch/arm64/mm/init.c | 16 ++--
> >  1 file changed, 14 insertions(+), 2 deletions(-)
> >
> > diff --git a/arch/arm64/mm/init.c b/arch/arm64/mm/init.c
> > index 1e93cfc7c47a..02ae4d623802 100644
> > --- a/arch/arm64/mm/init.c
> > +++ b/arch/arm64/mm/init.c
> > @@ -91,8 +91,15 @@ static void __init reserve_crashkernel(void)
> >   crash_size = PAGE_ALIGN(crash_size);
> >
> >   if (crash_base == 0) {
> > - /* Current arm64 boot protocol requires 2MB alignment */
> > - crash_base = memblock_find_in_range(0, arm64_dma32_phys_limit,
> > + /* Current arm64 boot protocol requires 2MB alignment.
> > +  * Also limit the crashkernel allocation to the first
> > +  * 1GB of the RAM accessible (ZONE_DMA), as otherwise we
> > +  * might run into OOM issues when crashkernel is executed,
> > +  * as it might have been originally allocated from
> > +  * either a ZONE_DMA32 memory or mixture of memory
> > +  * chunks belonging to both ZONE_DMA and ZONE_DMA32.
> > +  */
>
> This comment needs help. Why does putting the crashkernel in ZONE_DMA
> prevent "OOM issues"?

Sure, I can work on adding more details in the comment so that it
explains the potential OOM issue(s) better.
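The comment should explain the following condition. The kdump kernel can only satisfy GFP_DMA allocations, like the atomic DMA pool expansion in the log above, from the part of the crashkernel region that lies in ZONE_DMA. A region placed entirely in ZONE_DMA32 therefore leaves it with no ZONE_DMA pages at all. This is a minimal sketch that assumes fixed 1GB and 4GB zone limits for illustration; on arm64 the real limits are arm64_dma_phys_limit and arm64_dma32_phys_limit. crash_region_zone() is hypothetical.

```c
#include <assert.h>
#include <stdint.h>

/* Illustrative limits only; the real ones are computed at boot. */
#define DMA_LIMIT   (1ULL << 30)	/* assumed 1GB ZONE_DMA limit */
#define DMA32_LIMIT (1ULL << 32)	/* assumed 4GB ZONE_DMA32 limit */

enum crash_zone {
	CRASH_IN_DMA,		/* entirely below DMA_LIMIT */
	CRASH_MIXED_DMA_DMA32,	/* straddles DMA_LIMIT, ends below DMA32_LIMIT */
	CRASH_IN_DMA32,		/* entirely in [DMA_LIMIT, DMA32_LIMIT) */
	CRASH_ABOVE_DMA32,	/* extends past DMA32_LIMIT */
};

/* Classify the physical range [base, base + size) by zone. */
static enum crash_zone crash_region_zone(uint64_t base, uint64_t size)
{
	uint64_t end;

	assert(size > 0 && base <= UINT64_MAX - size);
	end = base + size;	/* exclusive end */

	if (end <= DMA_LIMIT)
		return CRASH_IN_DMA;
	if (end > DMA32_LIMIT)
		return CRASH_ABOVE_DMA32;
	if (base < DMA_LIMIT)
		return CRASH_MIXED_DMA_DMA32;
	return CRASH_IN_DMA32;
}
```

Only CRASH_IN_DMA guarantees that the whole reservation is usable for GFP_DMA allocations. CRASH_IN_DMA32 is the placement that leads to the OOM above.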

Thanks,
Bhupesh




Re: [PATCH 1/2] mm/memcontrol: Fix OOPS inside mem_cgroup_get_nr_swap_pages()

2020-07-02 Thread Bhupesh Sharma
Hi Michal,

On Thu, Jul 2, 2020 at 11:30 AM Michal Hocko  wrote:
>
> On Thu 02-07-20 03:44:19, Bhupesh Sharma wrote:
> > Prabhakar reported an OOPS inside mem_cgroup_get_nr_swap_pages()
> > function in a corner case seen on some arm64 boards when kdump kernel
> > runs with "cgroup_disable=memory" passed to the kdump kernel via
> > bootargs.
> >
> > The root-cause behind the same is that currently mem_cgroup_swap_init()
> > function is implemented as a subsys_initcall() call instead of a
> > core_initcall(), this means 'cgroup_memory_noswap' still
> > remains set to the default value (false) even when memcg is disabled via
> > "cgroup_disable=memory" boot parameter.
> >
> > This may result in premature OOPS inside mem_cgroup_get_nr_swap_pages()
> > function in corner cases:
> >
> >   [0.265617] Unable to handle kernel NULL pointer dereference at 
> > virtual address 0188
> >   [0.274495] Mem abort info:
> >   [0.277311]   ESR = 0x9606
> >   [0.280389]   EC = 0x25: DABT (current EL), IL = 32 bits
> >   [0.285751]   SET = 0, FnV = 0
> >   [0.288830]   EA = 0, S1PTW = 0
> >   [0.291995] Data abort info:
> >   [0.294897]   ISV = 0, ISS = 0x0006
> >   [0.298765]   CM = 0, WnR = 0
> >   [0.301757] [0188] user address but active_mm is swapper
> >   [0.308174] Internal error: Oops: 9606 [#1] SMP
> >   [0.313097] Modules linked in:
> >   <..snip..>
> >   [0.331384] pstate: 0049 (nzcv daif +PAN -UAO BTYPE=--)
> >   [0.337014] pc : mem_cgroup_get_nr_swap_pages+0x9c/0xf4
> >   [0.342289] lr : mem_cgroup_get_nr_swap_pages+0x68/0xf4
> >   [0.347564] sp : fe0012b6f800
> >   [0.350905] x29: fe0012b6f800 x28: fe00116b3000
> >   [0.356268] x27: fe0012b6fb00 x26: 0020
> >   [0.361631] x25:  x24: fc00723ffe28
> >   [0.366994] x23: fe0010d5b468 x22: fe00116bfa00
> >   [0.372357] x21: fe0010aabda8 x20: 
> >   [0.377720] x19:  x18: 0010
> >   [0.383082] x17: 43e612f2 x16: a9863ed7
> >   [0.388445] x15:  x14: 202c303d70617773
> >   [0.393808] x13: 6f6e5f79726f6d65 x12: 6d5f70756f726763
> >   [0.399170] x11: 2073656761705f70 x10: 6177735f726e5f74
> >   [0.404533] x9 : fe00100e9580 x8 : fe0010628160
> >   [0.409895] x7 : 00a8 x6 : fe00118f5e5e
> >   [0.415258] x5 : 0001 x4 : 
> >   [0.420621] x3 :  x2 : 
> >   [0.425983] x1 :  x0 : fc0060079000
> >   [0.431346] Call trace:
> >   [0.433809]  mem_cgroup_get_nr_swap_pages+0x9c/0xf4
> >   [0.438735]  shrink_lruvec+0x404/0x4f8
> >   [0.442516]  shrink_node+0x1a8/0x688
> >   [0.446121]  do_try_to_free_pages+0xe8/0x448
> >   [0.450429]  try_to_free_pages+0x110/0x230
> >   [0.454563]  __alloc_pages_slowpath.constprop.106+0x2b8/0xb48
> >   [0.460366]  __alloc_pages_nodemask+0x2ac/0x2f8
> >   [0.464938]  alloc_page_interleave+0x20/0x90
> >   [0.469246]  alloc_pages_current+0xdc/0xf8
> >   [0.473379]  atomic_pool_expand+0x60/0x210
> >   [0.477514]  __dma_atomic_pool_init+0x50/0xa4
> >   [0.481910]  dma_atomic_pool_init+0xac/0x158
> >   [0.486220]  do_one_initcall+0x50/0x218
> >   [0.490091]  kernel_init_freeable+0x22c/0x2d0
> >   [0.494489]  kernel_init+0x18/0x110
> >   [0.498007]  ret_from_fork+0x10/0x18
> >   [0.501614] Code: aa1403e3 91106000 97f82a27 1411 (f940c663)
> >   [0.507770] ---[ end trace 9795948475817de4 ]---
> >   [0.512429] Kernel panic - not syncing: Fatal exception
> >   [0.517705] Rebooting in 10 seconds..
> >
> > Cc: Johannes Weiner 
> > Cc: Michal Hocko 
> > Cc: Vladimir Davydov 
> > Cc: James Morse 
> > Cc: Mark Rutland 
> > Cc: Will Deacon 
> > Cc: Catalin Marinas 
> > Cc: cgro...@vger.kernel.org
> > Cc: linux...@kvack.org
> > Cc: linux-arm-ker...@lists.infradead.org
> > Cc: linux-ker...@vger.kernel.org
> > Cc: kexec@lists.infradead.org
>
> Fixes: eccb52e78809 ("mm: memcontrol: prepare swap controller setup for 
> integration")
>
> > Reported-by: Prabhakar Kushwaha 
> > Signed-off-by: Bhupesh Sharma 
>
> This is subtle as hell, I have to say. I find the ordering in the init
> calls very unintuitive and extremely hard to follow. The 

Re: [PATCH v6 0/2] Append new variables to vmcoreinfo (TCR_EL1.T1SZ for arm64 and MAX_PHYSMEM_BITS for all archs)

2020-07-02 Thread Bhupesh Sharma
On Thu, Jul 2, 2020 at 10:45 PM Catalin Marinas  wrote:
>
> On Thu, 14 May 2020 00:22:35 +0530, Bhupesh Sharma wrote:
> > Apologies for the delayed update. Its been quite some time since I
> > posted the last version (v5), but I have been really caught up in some
> > other critical issues.
> >
> > Changes since v5:
> > 
> > - v5 can be viewed here:
> >   http://lists.infradead.org/pipermail/kexec/2019-November/024055.html
> > - Addressed review comments from James Morse and Boris.
> > - Added Tested-by received from John on v5 patchset.
> > - Rebased against arm64 (for-next/ptr-auth) branch which has Amit's
> >   patchset for ARMv8.3-A Pointer Authentication feature vmcoreinfo
> >   applied.
> >
> > [...]
>
> Applied to arm64 (for-next/vmcoreinfo), thanks!
>
> [1/2] crash_core, vmcoreinfo: Append 'MAX_PHYSMEM_BITS' to vmcoreinfo
>   https://git.kernel.org/arm64/c/1d50e5d0c505
> [2/2] arm64/crash_core: Export TCR_EL1.T1SZ in vmcoreinfo
>   https://git.kernel.org/arm64/c/bbdbc11804ff

Thanks Catalin for pulling in the changes.

Dave and James, many thanks for reviewing the same as well.
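Here is a rough illustration of how a dump tool might consume the two new entries. The kernel VA size follows from TCR_EL1.T1SZ as 64 - T1SZ, and MAX_PHYSMEM_BITS bounds physical addresses. This is a minimal sketch that assumes vmcoreinfo lines of the form NUMBER(TCR_EL1_T1SZ)=<value> and NUMBER(MAX_PHYSMEM_BITS)=<value>. vmcoreinfo_number() and va_bits_from_t1sz() are hypothetical helpers, not makedumpfile or crash APIs.

```c
#include <assert.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

/*
 * Hypothetical helper: find "NUMBER(<name>)=<value>" in a vmcoreinfo
 * text blob (one entry per line). Stores the value and returns 0, or
 * returns -1 if the entry is absent.
 */
static int vmcoreinfo_number(const char *note, const char *name,
			     unsigned long long *val)
{
	char key[128];
	const char *p = note;
	int klen;

	assert(note != NULL && name != NULL && val != NULL);

	klen = snprintf(key, sizeof(key), "NUMBER(%s)=", name);
	if (klen < 0 || (size_t)klen >= sizeof(key))
		return -1;	/* name too long for this sketch */

	while (p != NULL && *p != '\0') {
		if (strncmp(p, key, (size_t)klen) == 0) {
			/* Base 0 accepts both decimal and 0x-prefixed hex. */
			*val = strtoull(p + klen, NULL, 0);
			return 0;
		}
		p = strchr(p, '\n');	/* advance to the next line */
		if (p != NULL)
			p++;
	}
	return -1;
}

/* The TTBR1 (kernel) region spans 2^(64 - T1SZ) bytes. */
static unsigned int va_bits_from_t1sz(unsigned long long t1sz)
{
	assert(t1sz < 64);
	return 64U - (unsigned int)t1sz;
}
```

For example, T1SZ = 0x10 corresponds to a 48-bit kernel VA space.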

Regards,
Bhupesh




[PATCH 2/2] arm64: Allocate crashkernel always in ZONE_DMA

2020-07-01 Thread Bhupesh Sharma
commit bff3b04460a8 ("arm64: mm: reserve CMA and crashkernel in
ZONE_DMA32") allocates crashkernel for arm64 in ZONE_DMA32.

However, as reported by Prabhakar, this breaks kdump kernel booting on
ThunderX2-like arm64 systems. I have also noticed this on an Ampere
arm64 machine. The OOM log in the kdump kernel looks like this:

  [0.240552] DMA: preallocated 128 KiB GFP_KERNEL pool for atomic allocations
  [0.247713] swapper/0: page allocation failure: order:1, mode:0xcc1(GFP_KERNEL|GFP_DMA), nodemask=(null),cpuset=/,mems_allowed=0
  <..snip..>
  [0.274706] Call trace:
  [0.277170]  dump_backtrace+0x0/0x208
  [0.280863]  show_stack+0x1c/0x28
  [0.284207]  dump_stack+0xc4/0x10c
  [0.287638]  warn_alloc+0x104/0x170
  [0.291156]  __alloc_pages_slowpath.constprop.106+0xb08/0xb48
  [0.296958]  __alloc_pages_nodemask+0x2ac/0x2f8
  [0.301530]  alloc_page_interleave+0x20/0x90
  [0.305839]  alloc_pages_current+0xdc/0xf8
  [0.309972]  atomic_pool_expand+0x60/0x210
  [0.314108]  __dma_atomic_pool_init+0x50/0xa4
  [0.318504]  dma_atomic_pool_init+0xac/0x158
  [0.322813]  do_one_initcall+0x50/0x218
  [0.326684]  kernel_init_freeable+0x22c/0x2d0
  [0.331083]  kernel_init+0x18/0x110
  [0.334600]  ret_from_fork+0x10/0x18

This patch limits the crashkernel allocation to the first 1GB of
accessible RAM (ZONE_DMA). Otherwise, the crashkernel region may have
been allocated from ZONE_DMA32 memory, or from a mixture of memory
chunks belonging to both ZONE_DMA and ZONE_DMA32. That leaves the kdump
kernel with too little (or no) ZONE_DMA memory, so GFP_DMA allocations
such as the atomic DMA pool expansion above fail with OOM.

Fixes: bff3b04460a8 ("arm64: mm: reserve CMA and crashkernel in ZONE_DMA32")
Cc: Johannes Weiner 
Cc: Michal Hocko 
Cc: Vladimir Davydov 
Cc: James Morse 
Cc: Mark Rutland 
Cc: Will Deacon 
Cc: Catalin Marinas 
Cc: cgro...@vger.kernel.org
Cc: linux...@kvack.org
Cc: linux-arm-ker...@lists.infradead.org
Cc: linux-ker...@vger.kernel.org
Cc: kexec@lists.infradead.org
Reported-by: Prabhakar Kushwaha 
Signed-off-by: Bhupesh Sharma 
---
 arch/arm64/mm/init.c | 16 ++--
 1 file changed, 14 insertions(+), 2 deletions(-)

diff --git a/arch/arm64/mm/init.c b/arch/arm64/mm/init.c
index 1e93cfc7c47a..02ae4d623802 100644
--- a/arch/arm64/mm/init.c
+++ b/arch/arm64/mm/init.c
@@ -91,8 +91,15 @@ static void __init reserve_crashkernel(void)
crash_size = PAGE_ALIGN(crash_size);
 
if (crash_base == 0) {
-   /* Current arm64 boot protocol requires 2MB alignment */
-   crash_base = memblock_find_in_range(0, arm64_dma32_phys_limit,
+   /* Current arm64 boot protocol requires 2MB alignment.
+* Also limit the crashkernel allocation to the first
+* 1GB of the RAM accessible (ZONE_DMA), as otherwise we
+* might run into OOM issues when crashkernel is executed,
+* as it might have been originally allocated from
+* either a ZONE_DMA32 memory or mixture of memory
+* chunks belonging to both ZONE_DMA and ZONE_DMA32.
+*/
+   crash_base = memblock_find_in_range(0, arm64_dma_phys_limit,
crash_size, SZ_2M);
if (crash_base == 0) {
pr_warn("cannot allocate crashkernel (size:0x%llx)\n",
@@ -101,6 +108,11 @@ static void __init reserve_crashkernel(void)
}
} else {
/* User specifies base address explicitly. */
+   if (crash_base + crash_size > arm64_dma_phys_limit) {
+   pr_warn("cannot reserve crashkernel: region is allocatable only in ZONE_DMA range\n");
+   return;
+   }
+
if (!memblock_is_region_memory(crash_base, crash_size)) {
pr_warn("cannot reserve crashkernel: region is not memory\n");
return;
-- 
2.7.4
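To make the new allocation call concrete: memblock_find_in_range(0, arm64_dma_phys_limit, crash_size, SZ_2M) looks for a 2MB-aligned free block of crash_size bytes that ends below the ZONE_DMA limit, and returns 0 on failure. The following is a simplified model, not the memblock code. Free memory is an array of ranges, and the search prefers the highest fitting address, on the assumption that this mirrors memblock's default top-down direction. struct range and find_aligned_top_down() are hypothetical.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

struct range {
	uint64_t start;	/* inclusive */
	uint64_t end;	/* exclusive */
};

/* Round x down to a multiple of align (align must be a power of two). */
static uint64_t align_down(uint64_t x, uint64_t align)
{
	return x & ~(align - 1);
}

/*
 * Return the highest align-aligned base such that [base, base + size)
 * fits in one free range and ends at or below limit, or 0 if none is
 * found (0 doubles as "failure", as with memblock_find_in_range()).
 */
static uint64_t find_aligned_top_down(const struct range *free, size_t n,
				      uint64_t limit, uint64_t size,
				      uint64_t align)
{
	uint64_t best = 0;
	size_t i;

	assert(align != 0 && (align & (align - 1)) == 0);

	for (i = 0; i < n; i++) {
		uint64_t end = free[i].end < limit ? free[i].end : limit;
		uint64_t base;

		if (end < size || end - size < free[i].start)
			continue;	/* block does not fit in this range */
		base = align_down(end - size, align);
		if (base >= free[i].start && base > best)
			best = base;
	}
	return best;
}
```

Take free ranges [0, 128MB), [256MB, 1023MB) and [2GB, 3GB). A 256MB request below 1GB lands at 0x2fe00000. The same request with a 4GB limit would land at 0xb0000000 in ZONE_DMA32, which is the placement this patch avoids.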




[PATCH 0/2] arm64/kdump: Fix OOPS and OOM issues in kdump kernel

2020-07-01 Thread Bhupesh Sharma
Prabhakar recently reported a kdump kernel boot failure on ThunderX2
arm64 platforms (which I was able to reproduce on Ampere arm64 machines
as well, see [1]). It is seen when a corner case is hit on some arm64
boards: the kdump kernel runs with "cgroup_disable=memory" passed to it
(via bootargs), and the crashkernel was originally allocated from either
ZONE_DMA32 memory or a mixture of memory chunks belonging to both the
ZONE_DMA and ZONE_DMA32 regions.

While [PATCH 1/2] fixes the OOPS inside the mem_cgroup_get_nr_swap_pages()
function, [PATCH 2/2] fixes the OOM seen inside the kdump kernel by
allocating the crashkernel inside the ZONE_DMA region only.

[1]. https://marc.info/?l=kexec=158954035710703=4

Cc: Johannes Weiner 
Cc: Michal Hocko 
Cc: Vladimir Davydov 
Cc: James Morse 
Cc: Mark Rutland 
Cc: Will Deacon 
Cc: Catalin Marinas 
Cc: cgro...@vger.kernel.org
Cc: linux...@kvack.org
Cc: linux-arm-ker...@lists.infradead.org
Cc: linux-ker...@vger.kernel.org
Cc: kexec@lists.infradead.org
Reported-by: Prabhakar Kushwaha 
Signed-off-by: Bhupesh Sharma 

Bhupesh Sharma (2):
  mm/memcontrol: Fix OOPS inside mem_cgroup_get_nr_swap_pages()
  arm64: Allocate crashkernel always in ZONE_DMA

 arch/arm64/mm/init.c | 16 ++--
 mm/memcontrol.c  |  9 -
 2 files changed, 22 insertions(+), 3 deletions(-)

-- 
2.7.4




[PATCH 1/2] mm/memcontrol: Fix OOPS inside mem_cgroup_get_nr_swap_pages()

2020-07-01 Thread Bhupesh Sharma
Prabhakar reported an OOPS inside the mem_cgroup_get_nr_swap_pages()
function in a corner case seen on some arm64 boards when the kdump
kernel runs with "cgroup_disable=memory" passed to it via bootargs.

The root cause is that mem_cgroup_swap_init() is currently registered
as a subsys_initcall() rather than a core_initcall(). This means
'cgroup_memory_noswap' still remains set to its default value (false)
even when memcg is disabled via the "cgroup_disable=memory" boot
parameter, until mem_cgroup_swap_init() finally runs. An allocation
made from an earlier initcall (dma_atomic_pool_init() in the trace
below) can enter direct reclaim and reach mem_cgroup_get_nr_swap_pages()
before that happens.

This may result in a premature OOPS inside mem_cgroup_get_nr_swap_pages()
in corner cases:

  [0.265617] Unable to handle kernel NULL pointer dereference at virtual address 0188
  [0.274495] Mem abort info:
  [0.277311]   ESR = 0x9606
  [0.280389]   EC = 0x25: DABT (current EL), IL = 32 bits
  [0.285751]   SET = 0, FnV = 0
  [0.288830]   EA = 0, S1PTW = 0
  [0.291995] Data abort info:
  [0.294897]   ISV = 0, ISS = 0x0006
  [0.298765]   CM = 0, WnR = 0
  [0.301757] [0188] user address but active_mm is swapper
  [0.308174] Internal error: Oops: 9606 [#1] SMP
  [0.313097] Modules linked in:
  <..snip..>
  [0.331384] pstate: 0049 (nzcv daif +PAN -UAO BTYPE=--)
  [0.337014] pc : mem_cgroup_get_nr_swap_pages+0x9c/0xf4
  [0.342289] lr : mem_cgroup_get_nr_swap_pages+0x68/0xf4
  [0.347564] sp : fe0012b6f800
  [0.350905] x29: fe0012b6f800 x28: fe00116b3000
  [0.356268] x27: fe0012b6fb00 x26: 0020
  [0.361631] x25:  x24: fc00723ffe28
  [0.366994] x23: fe0010d5b468 x22: fe00116bfa00
  [0.372357] x21: fe0010aabda8 x20: 
  [0.377720] x19:  x18: 0010
  [0.383082] x17: 43e612f2 x16: a9863ed7
  [0.388445] x15:  x14: 202c303d70617773
  [0.393808] x13: 6f6e5f79726f6d65 x12: 6d5f70756f726763
  [0.399170] x11: 2073656761705f70 x10: 6177735f726e5f74
  [0.404533] x9 : fe00100e9580 x8 : fe0010628160
  [0.409895] x7 : 00a8 x6 : fe00118f5e5e
  [0.415258] x5 : 0001 x4 : 
  [0.420621] x3 :  x2 : 
  [0.425983] x1 :  x0 : fc0060079000
  [0.431346] Call trace:
  [0.433809]  mem_cgroup_get_nr_swap_pages+0x9c/0xf4
  [0.438735]  shrink_lruvec+0x404/0x4f8
  [0.442516]  shrink_node+0x1a8/0x688
  [0.446121]  do_try_to_free_pages+0xe8/0x448
  [0.450429]  try_to_free_pages+0x110/0x230
  [0.454563]  __alloc_pages_slowpath.constprop.106+0x2b8/0xb48
  [0.460366]  __alloc_pages_nodemask+0x2ac/0x2f8
  [0.464938]  alloc_page_interleave+0x20/0x90
  [0.469246]  alloc_pages_current+0xdc/0xf8
  [0.473379]  atomic_pool_expand+0x60/0x210
  [0.477514]  __dma_atomic_pool_init+0x50/0xa4
  [0.481910]  dma_atomic_pool_init+0xac/0x158
  [0.486220]  do_one_initcall+0x50/0x218
  [0.490091]  kernel_init_freeable+0x22c/0x2d0
  [0.494489]  kernel_init+0x18/0x110
  [0.498007]  ret_from_fork+0x10/0x18
  [0.501614] Code: aa1403e3 91106000 97f82a27 1411 (f940c663)
  [0.507770] ---[ end trace 9795948475817de4 ]---
  [0.512429] Kernel panic - not syncing: Fatal exception
  [0.517705] Rebooting in 10 seconds..

Cc: Johannes Weiner 
Cc: Michal Hocko 
Cc: Vladimir Davydov 
Cc: James Morse 
Cc: Mark Rutland 
Cc: Will Deacon 
Cc: Catalin Marinas 
Cc: cgro...@vger.kernel.org
Cc: linux...@kvack.org
Cc: linux-arm-ker...@lists.infradead.org
Cc: linux-ker...@vger.kernel.org
Cc: kexec@lists.infradead.org
Reported-by: Prabhakar Kushwaha 
Signed-off-by: Bhupesh Sharma 
---
 mm/memcontrol.c | 9 ++++++++-
 1 file changed, 8 insertions(+), 1 deletion(-)

diff --git a/mm/memcontrol.c b/mm/memcontrol.c
index 19622328e4b5..8323e4b7b390 100644
--- a/mm/memcontrol.c
+++ b/mm/memcontrol.c
@@ -7186,6 +7186,13 @@ static struct cftype memsw_files[] = {
{ },/* terminate */
 };
 
+/*
+ * If mem_cgroup_swap_init() is implemented as a subsys_initcall()
+ * instead of a core_initcall(), cgroup_memory_noswap may still be
+ * false even when memcg is disabled via the "cgroup_disable=memory"
+ * boot parameter. In corner cases this can result in a premature Oops
+ * inside mem_cgroup_get_nr_swap_pages().
+ */
 static int __init mem_cgroup_swap_init(void)
 {
/* No memory control -> no swap control */
@@ -7200,6 +7207,6 @@ static int __init mem_cgroup_swap_init(void)
 
return 0;
 }
-subsys_initcall(mem_cgroup_swap_init);
+core_initcall(mem_cgroup_swap_init);
 
 #endif /* CONFIG_MEMCG_SWAP */
-- 
2.7.4


___
kexec mailing list
kexec@lists.infradead.org
http://lists.infradead.org/mailman/listinfo/kexec


Re: [PATCH] arm64: continue loading even if kaslr-seed is not wiped to zero

2020-06-17 Thread Bhupesh Sharma
Hi Xulin,

On Tue, Jun 16, 2020 at 9:03 AM Xulin Sun  wrote:
>
> The commit c3f043241a866a (arm64: Add support to supply 'kaslr-seed' to
> secondary kernel) added kaslr-seed support. It assumes that the primary
> kernel reads the 'kaslr-seed' and wipes it to 0. But when
> 'CONFIG_RANDOMIZE_BASE' is not set to y in the primary kernel and the ATF
> firmware has set the 'kaslr-seed' dtb property to a non-zero value, this
> returns an error.
>
> So in the above case, continue loading the segments without KASLR
> support.
>
> Signed-off-by: Xulin Sun 
> ---
>  kexec/arch/arm64/kexec-arm64.c | 4 ++--
>  1 file changed, 2 insertions(+), 2 deletions(-)
>
> diff --git a/kexec/arch/arm64/kexec-arm64.c b/kexec/arch/arm64/kexec-arm64.c
> index 2992bce..540f4d7 100644
> --- a/kexec/arch/arm64/kexec-arm64.c
> +++ b/kexec/arch/arm64/kexec-arm64.c
> @@ -506,8 +506,7 @@ static int setup_2nd_dtb(struct dtb *dtb, char *command_line, int on_crash)
> if (kaslr_seed != 0) {
> dbgprintf("%s: kaslr-seed is not wiped to 0.\n",
> __func__);
> -   result = -EINVAL;
> -   goto on_error;
> +   goto unable_kaslr;
> }
>
> /*
> @@ -550,6 +549,7 @@ static int setup_2nd_dtb(struct dtb *dtb, char *command_line, int on_crash)
> }
> }
>
> +unable_kaslr:
> if (on_crash) {
> /* add linux,elfcorehdr */
> nodeoffset = fdt_path_offset(new_buf, "/chosen");
> --
> 2.17.1

Sorry, but this seems like an ATF issue which you are trying to fix in
kexec-tools.
See 'Documentation/devicetree/bindings/chosen.txt' for the details of
the 'kaslr-seed' property:

kaslr-seed
---

This property is used when booting with CONFIG_RANDOMIZE_BASE as the
entropy used to randomize the kernel image base address location. Since
it is used directly, this value is intended only for KASLR, and should
not be used for other purposes (as it may leak information about KASLR
offsets). It is parsed as a u64 value, e.g.

/ {
chosen {
kaslr-seed = <0xfeedbeef 0xc0def00d>;
};
};

So, if CONFIG_RANDOMIZE_BASE (or KASLR) is disabled, this value should
not be present in the patched DTB read from the kernel; otherwise we
have a possible security vulnerability, as we would be leaking out the
kernel text address, which can be used by snooping applications to
inject malicious code into the kernel.

For example, on my Qualcomm arm64 platform, if CONFIG_RANDOMIZE_BASE is
set to n, user-space tools like 'dtc' are not able to find the
'kaslr-seed' property in the /chosen node (and the same is the case with
kexec-tools):

# dtc -I dtb -O dts /sys/firmware/fdt | grep -A 10 -i chosen
chosen {
   ... no 'kaslr-seed' property
}

You can also confirm that the kernel symbol base address is not
randomized across successive reboots (in this case) via:
# cat /proc/kallsyms

Whereas, if I boot the kernel with CONFIG_RANDOMIZE_BASE (or KASLR)
enabled, I can see that the 'kaslr-seed' property is wiped to '0' (as
expected by the kernel), as seen in the output of:

# dtc -I dtb -O dts /sys/firmware/fdt | grep -A 10 -i chosen
chosen {
kaslr-seed = <0x0 0x0>
}

Thanks,
Bhupesh



Re: Re: [RESEND PATCH v5 2/5] arm64/crash_core: Export TCR_EL1.T1SZ in vmcoreinfo

2020-06-16 Thread Bhupesh Sharma
Hello Bharat,

On Wed, Jun 10, 2020 at 10:17 PM Bharat Gooty  wrote:
>
> Hello Bhupesh,
> V6 patch set on Linux 5.7 did not help.
> I have applied the makedumpfile
> http://lists.infradead.org/pipermail/kexec/2019-November/023963.html changes
> also (makedumpfile-1.6.6). I tried to apply it on makedumpfile 1.6.7, but patch set_2
> failed. Would like to know if you have a V5 patch set for the makedumpfile
> changes. With makedumpfile 1.6.6, I am able to collect the vmcore file.
> I used the latest crash utility
> (https://www.redhat.com/archives/crash-utility/2019-November/msg00014.html
> changes are present)
> When I used crash utility, following is the error:
>
> Thanks,
> -Bharat
>
>
> -Original Message-
> From: Scott Branden [mailto:scott.bran...@broadcom.com]
> Sent: Thursday, April 30, 2020 4:34 AM
> To: Bhupesh Sharma; Amit Kachhap
> Cc: Mark Rutland; x...@kernel.org; Will Deacon; Linux Doc Mailing List;
> Catalin Marinas; Ard Biesheuvel; kexec mailing list; Linux Kernel Mailing
> List; Kazuhito Hagio; James Morse; Dave Anderson; bhupesh linux;
> linuxppc-...@lists.ozlabs.org; linux-arm-kernel; Steve Capper; Ray Jui;
> Bharat Gooty
> Subject: Re: Re: [RESEND PATCH v5 2/5] arm64/crash_core: Export TCR_EL1.T1SZ
> in vmcoreinfo
>
> Hi Bhupesh,
>
> On 2020-02-23 10:25 p.m., Bhupesh Sharma wrote:
> > Hi Amit,
> >
> > On Fri, Feb 21, 2020 at 2:36 PM Amit Kachhap  wrote:
> >> Hi Bhupesh,
> >>
> >> On 1/13/20 5:44 PM, Bhupesh Sharma wrote:
> >>> Hi James,
> >>>
> >>> On 01/11/2020 12:30 AM, Dave Anderson wrote:
> >>>> - Original Message -
> >>>>> Hi Bhupesh,
> >>>>>
> >>>>> On 25/12/2019 19:01, Bhupesh Sharma wrote:
> >>>>>> On 12/12/2019 04:02 PM, James Morse wrote:
> >>>>>>> On 29/11/2019 19:59, Bhupesh Sharma wrote:
> >>>>>>>> vabits_actual variable on arm64 indicates the actual VA space size,
> >>>>>>>> and allows a single binary to support both 48-bit and 52-bit VA
> >>>>>>>> spaces.
> >>>>>>>>
> >>>>>>>> If the ARMv8.2-LVA optional feature is present, and we are running
> >>>>>>>> with a 64KB page size; then it is possible to use 52-bits of
> >>>>>>>> address
> >>>>>>>> space for both userspace and kernel addresses. However, any kernel
> >>>>>>>> binary that supports 52-bit must also be able to fall back to
> >>>>>>>> 48-bit
> >>>>>>>> at early boot time if the hardware feature is not present.
> >>>>>>>>
> >>>>>>>> Since TCR_EL1.T1SZ indicates the size offset of the memory region
> >>>>>>>> addressed by TTBR1_EL1 (and hence can be used for determining the
> >>>>>>>> vabits_actual value) it makes more sense to export the same in
> >>>>>>>> vmcoreinfo rather than vabits_actual variable, as the name of the
> >>>>>>>> variable can change in future kernel versions, but the
> >>>>>>>> architectural
> >>>>>>>> constructs like TCR_EL1.T1SZ can be used better to indicate
> >>>>>>>> intended
> >>>>>>>> specific fields to user-space.
> >>>>>>>>
> >>>>>>>> User-space utilities like makedumpfile and crash-utility, need to
> >>>>>>>> read/write this value from/to vmcoreinfo
> >>>>>>> (write?)
> >>>>>> Yes, also write so that the vmcoreinfo from a (crashing) arm64
> >>>>>> system can
> >>>>>> be used for
> >>>>>> analysis of the root-cause of panic/crash on say an x86_64 host using
> >>>>>> utilities like
> >>>>>> crash-utility/gdb.
> >>>>> I read this as "User-space [...] needs to write to vmcoreinfo".
> >>> That's correct. But for writing to vmcore dump in the kdump kernel, we
> >>> need to read the symbols from the vmcoreinfo in the primary kernel.
> >>>
> >>>>>>>> for determining if a virtual address lies in the linear map range.
> >>>>>>> I think this is a fragile example. The debugger shouldn't need to
> >>>>>>> know
> >>>>>>> this.
> >>>>>> Well that the current 

Re: [PATCH v6 0/2] Append new variables to vmcoreinfo (TCR_EL1.T1SZ for arm64 and MAX_PHYSMEM_BITS for all archs)

2020-06-15 Thread Bhupesh Sharma
Hello Catalin, Will,

On Tue, Jun 2, 2020 at 10:54 AM Bhupesh Sharma  wrote:
>
> Hello,
>
> On Thu, May 14, 2020 at 12:22 AM Bhupesh Sharma  wrote:
> >
> > Apologies for the delayed update. It's been quite some time since I
> > posted the last version (v5), but I have been really caught up in some
> > other critical issues.
> >
> > Changes since v5:
> > 
> > - v5 can be viewed here:
> >   http://lists.infradead.org/pipermail/kexec/2019-November/024055.html
> > - Addressed review comments from James Morse and Boris.
> > - Added Tested-by received from John on v5 patchset.
> > - Rebased against arm64 (for-next/ptr-auth) branch which has Amit's
> >   patchset for ARMv8.3-A Pointer Authentication feature vmcoreinfo
> >   applied.
> >
> > Changes since v4:
> > 
> > - v4 can be seen here:
> >   http://lists.infradead.org/pipermail/kexec/2019-November/023961.html
> > - Addressed comments from Dave and added patches for documenting
> >   new variables appended to vmcoreinfo documentation.
> > - Added testing report shared by Akashi for PATCH 2/5.
> >
> > Changes since v3:
> > 
> > - v3 can be seen here:
> >   http://lists.infradead.org/pipermail/kexec/2019-March/022590.html
> > - Addressed comments from James and exported TCR_EL1.T1SZ in vmcoreinfo
> >   instead of PTRS_PER_PGD.
> > - Added a new patch (via [PATCH 3/3]), which fixes a simple typo in
> >   'Documentation/arm64/memory.rst'
> >
> > Changes since v2:
> > 
> > - v2 can be seen here:
> >   http://lists.infradead.org/pipermail/kexec/2019-March/022531.html
> > - Protected 'MAX_PHYSMEM_BITS' vmcoreinfo variable under CONFIG_SPARSEMEM
> >   ifdef sections, as suggested by Kazu.
> > - Updated vmcoreinfo documentation to add description about
> >   'MAX_PHYSMEM_BITS' variable (via [PATCH 3/3]).
> >
> > Changes since v1:
> > 
> > - v1 was sent out as a single patch which can be seen here:
> >   http://lists.infradead.org/pipermail/kexec/2019-February/022411.html
> >
> > - v2 breaks the single patch into two independent patches:
> >   [PATCH 1/2] appends 'PTRS_PER_PGD' to vmcoreinfo for arm64 arch, whereas
> >   [PATCH 2/2] appends 'MAX_PHYSMEM_BITS' to vmcoreinfo in core kernel code 
> > (all archs)
> >
> > This patchset primarily fixes the regression reported in user-space
> > utilities like 'makedumpfile' and 'crash-utility' on arm64 architecture
> > with the availability of 52-bit address space feature in underlying
> > kernel. These regressions have been reported both on CPUs which don't
> > support ARMv8.2 extensions (i.e. LVA, LPA) and are running newer kernels
> > and also on prototype platforms (like ARMv8 FVP simulator model) which
> > support ARMv8.2 extensions and are running newer kernels.
> >
> > The reason for these regressions is that right now user-space tools
> > have no direct access to these values (since these are not exported
> > from the kernel) and hence need to rely on a best-guess method of
> > determining value of 'vabits_actual' and 'MAX_PHYSMEM_BITS' supported
> > by underlying kernel.
> >
> > Exporting these values via vmcoreinfo will help user-land in such cases.
> > In addition, as per suggestion from makedumpfile maintainer (Kazu),
> > it makes more sense to append 'MAX_PHYSMEM_BITS' to
> > vmcoreinfo in the core code itself rather than in arm64 arch-specific
> > code, so that the user-space code for other archs can also benefit from
> > this addition to the vmcoreinfo and use it as a standard way of
> > determining 'SECTIONS_SHIFT' value in user-land.
> >
> > Cc: Boris Petkov 
> > Cc: Ingo Molnar 
> > Cc: Thomas Gleixner 
> > Cc: Jonathan Corbet 
> > Cc: James Morse 
> > Cc: Mark Rutland 
> > Cc: Will Deacon 
> > Cc: Steve Capper 
> > Cc: Catalin Marinas 
> > Cc: Ard Biesheuvel 
> > Cc: Michael Ellerman 
> > Cc: Paul Mackerras 
> > Cc: Benjamin Herrenschmidt 
> > Cc: Dave Anderson 
> > Cc: Kazuhito Hagio 
> > Cc: John Donnelly 
> > Cc: scott.bran...@broadcom.com
> > Cc: Amit Kachhap 
> > Cc: x...@kernel.org
> > Cc: linuxppc-...@lists.ozlabs.org
> > Cc: linux-arm-ker...@lists.infradead.org
> > Cc: linux-ker...@vger.kernel.org
> > Cc: linux-...@vger.kernel.org
> > Cc: kexec@lists.infradead.org
> >
> > Bhupesh Sharma (2):
> >   crash_core, vmcoreinfo: Append 'MAX_PHYSMEM_BITS' to vmcoreinfo
> >   arm64/crash_core: Export TCR_EL1.T1SZ in vmcoreinf

Re: [PATCH] kexec: dump kmessage before machine_kexec

2020-06-08 Thread Bhupesh Sharma
Hi Pavel,

On Sat, Jun 6, 2020 at 1:16 AM Pavel Tatashin  wrote:
>
> kmsg_dump(KMSG_DUMP_SHUTDOWN) is called before
> machine_restart(), machine_halt() and machine_power_off(); the only one that
> is missing is machine_kexec().
>
> The dmesg output that it contains can be used to study the shutdown
> performance of both kernel and systemd during kexec reboot.
>
> Here is example of dmesg data collected after kexec:
>
> root@dplat-cp22:~# cat /sys/fs/pstore/dmesg-ramoops-0 | tail
> ...
> <6>[   70.914592] psci: CPU3 killed (polled 0 ms)
> <5>[   70.915705] CPU4: shutdown
> <6>[   70.916643] psci: CPU4 killed (polled 4 ms)
> <5>[   70.917715] CPU5: shutdown
> <6>[   70.918725] psci: CPU5 killed (polled 0 ms)
> <5>[   70.919704] CPU6: shutdown
> <6>[   70.920726] psci: CPU6 killed (polled 4 ms)
> <5>[   70.921642] CPU7: shutdown
> <6>[   70.922650] psci: CPU7 killed (polled 0 ms)
>
> Signed-off-by: Pavel Tatashin 
> ---
>  kernel/kexec_core.c | 2 ++
>  1 file changed, 2 insertions(+)
>
> diff --git a/kernel/kexec_core.c b/kernel/kexec_core.c
> index c19c0dad1ebe..50027f759a97 100644
> --- a/kernel/kexec_core.c
> +++ b/kernel/kexec_core.c
> @@ -37,6 +37,7 @@
>  #include 
>  #include 
>  #include 
> +#include 
>
>  #include 
>  #include 
> @@ -1181,6 +1182,7 @@ int kernel_kexec(void)
> machine_shutdown();
> }
>
> +   kmsg_dump(KMSG_DUMP_SHUTDOWN);
> machine_kexec(kexec_image);
>
>  #ifdef CONFIG_KEXEC_JUMP
> --
> 2.25.1

LGTM, so:

Reviewed-by: Bhupesh Sharma 

Thanks.



Re: Re: [RESEND PATCH v5 5/5] Documentation/vmcoreinfo: Add documentation for 'TCR_EL1.T1SZ'

2020-06-03 Thread Bhupesh Sharma
Hello Scott,

On Thu, Jun 4, 2020 at 12:17 AM Scott Branden
 wrote:
>
> Hi Bhupesh,
>
> Would be great to get this patch series upstreamed?
>
> On 2019-12-25 10:49 a.m., Bhupesh Sharma wrote:
> > Hi James,
> >
> > On 12/12/2019 04:02 PM, James Morse wrote:
> >> Hi Bhupesh,
> >
> > I am sorry this review mail skipped my attention due to holidays and
> > focus on other urgent issues.
> >
> >> On 29/11/2019 19:59, Bhupesh Sharma wrote:
> >>> Add documentation for TCR_EL1.T1SZ variable being added to
> >>> vmcoreinfo.
> >>>
> >>> It indicates the size offset of the memory region addressed by
> >>> TTBR1_EL1
> >>
> >>> and hence can be used for determining the vabits_actual value.
> >>
> >> used for determining random-internal-kernel-variable, that might not
> >> exist tomorrow.
> >>
> >> Could you describe how this is useful/necessary if a debugger wants
> >> to walk the page
> >> tables from the core file? I think this is a better argument.
> >>
> >> Wouldn't the documentation be better as part of the patch that adds
> >> the export?
> >> (... unless these have to go via different trees? ..)
> >
> > Ok, will fix the same in v6 version.
> >
> >>> diff --git a/Documentation/admin-guide/kdump/vmcoreinfo.rst b/Documentation/admin-guide/kdump/vmcoreinfo.rst
> >>> index 447b64314f56..f9349f9d3345 100644
> >>> --- a/Documentation/admin-guide/kdump/vmcoreinfo.rst
> >>> +++ b/Documentation/admin-guide/kdump/vmcoreinfo.rst
> >>> @@ -398,6 +398,12 @@ KERNELOFFSET
> >>>   The kernel randomization offset. Used to compute the page offset. If
> >>>   KASLR is disabled, this value is zero.
> >>>   +TCR_EL1.T1SZ
> >>> +
> >>> +
> >>> +Indicates the size offset of the memory region addressed by TTBR1_EL1
> >>
> >>> +and hence can be used for determining the vabits_actual value.
> >>
> >> 'vabits_actual' may not exist when the next person comes to read this
> >> documentation (its
> >> going to rot really quickly).
> >>
> >> I think the first half of this text is enough to say what this is
> >> for. You should include
> >> words to the effect that its the hardware value that goes with
> >> swapper_pg_dir. You may
> >> want to point readers to the arm-arm for more details on what the
> >> value means.
> >
> > Ok, got it. Fixed this in v6, which should be on its way shortly.
> I can't seem to find v6?

Oops. I remember Cc'ing you on the v6 patchset (maybe my email client
messed up); anyway, here is the v6 patchset for your reference:
<http://lists.infradead.org/pipermail/kexec/2020-May/025095.html>

Do share your review/test comments on the same.

Thanks,
Bhupesh



Re: [PATCH v6 2/2] arm64/crash_core: Export TCR_EL1.T1SZ in vmcoreinfo

2020-06-03 Thread Bhupesh Sharma
Hi Kamlakant,

Many thanks for having a look at the patchset.

On Wed, Jun 3, 2020 at 4:50 PM Kamlakant Patel  wrote:
>
> Hi Bhupesh,
>
> > -Original Message-
> > From: kexec  On Behalf Of Bhupesh
> > Sharma
> > Sent: Thursday, May 14, 2020 12:23 AM
> > To: linux-arm-ker...@lists.infradead.org; x...@kernel.org
> > Cc: Mark Rutland ; Kazuhito Hagio  > ha...@ab.jp.nec.com>; Steve Capper ; Catalin
> > Marinas ; bhsha...@redhat.com; Ard Biesheuvel
> > ; kexec@lists.infradead.org; linux-
> > ker...@vger.kernel.org; James Morse ; Dave
> > Anderson ; bhupesh.li...@gmail.com; Will Deacon
> > 
> > Subject: [PATCH v6 2/2] arm64/crash_core: Export TCR_EL1.T1SZ in vmcoreinfo
> >
> > vabits_actual variable on arm64 indicates the actual VA space size, and allows a
> > single binary to support both 48-bit and 52-bit VA spaces.
> >
> > If the ARMv8.2-LVA optional feature is present, and we are running with a 64KB
> > page size; then it is possible to use 52-bits of address space for both userspace
> > and kernel addresses. However, any kernel binary that supports 52-bit must also
> > be able to fall back to 48-bit at early boot time if the hardware feature is not
> > present.
> >
> > Since TCR_EL1.T1SZ indicates the size offset of the memory region addressed by
> > TTBR1_EL1 (and hence can be used for determining the vabits_actual value) it
> > makes more sense to export the same in vmcoreinfo rather than vabits_actual
> > variable, as the name of the variable can change in future kernel versions, but
> > the architectural constructs like TCR_EL1.T1SZ can be used better to indicate
> > intended specific fields to user-space.
> >
> > User-space utilities like makedumpfile and crash-utility, need to read this value
> > from vmcoreinfo for determining if a virtual address lies in the linear map range.
> >
> > While at it also add documentation for TCR_EL1.T1SZ variable being added to
> > vmcoreinfo.
> >
> > It indicates the size offset of the memory region addressed by TTBR1_EL1
> >
> > Cc: James Morse 
> > Cc: Mark Rutland 
> > Cc: Will Deacon 
> > Cc: Steve Capper 
> > Cc: Catalin Marinas 
> > Cc: Ard Biesheuvel 
> > Cc: Dave Anderson 
> > Cc: Kazuhito Hagio 
> > Cc: linux-arm-ker...@lists.infradead.org
> > Cc: linux-ker...@vger.kernel.org
> > Cc: kexec@lists.infradead.org
> > Tested-by: John Donnelly 
> > Signed-off-by: Bhupesh Sharma 
> > ---
> >  Documentation/admin-guide/kdump/vmcoreinfo.rst | 11 +++
> >  arch/arm64/include/asm/pgtable-hwdef.h |  1 +
> >  arch/arm64/kernel/crash_core.c | 10 ++
> >  3 files changed, 22 insertions(+)
> >
> > diff --git a/Documentation/admin-guide/kdump/vmcoreinfo.rst b/Documentation/admin-guide/kdump/vmcoreinfo.rst
> > index 2a632020f809..2baad0bfb09d 100644
> > --- a/Documentation/admin-guide/kdump/vmcoreinfo.rst
> > +++ b/Documentation/admin-guide/kdump/vmcoreinfo.rst
> > @@ -404,6 +404,17 @@ KERNELPACMASK
> >  The mask to extract the Pointer Authentication Code from a kernel virtual
> > address.
> >
> > +TCR_EL1.T1SZ
> > +
> > +
> > +Indicates the size offset of the memory region addressed by TTBR1_EL1.
> > +The region size is 2^(64-T1SZ) bytes.
> > +
> > +TTBR1_EL1 is the table base address register specified by ARMv8-A
> > +architecture which is used to lookup the page-tables for the Virtual
> > +addresses in the higher VA range (refer to ARMv8 ARM document for more
> > +details).
> > +
> >  arm
> >  ===
> >
> > diff --git a/arch/arm64/include/asm/pgtable-hwdef.h b/arch/arm64/include/asm/pgtable-hwdef.h
> > index 6bf5e650da78..a1861af97ac9 100644
> > --- a/arch/arm64/include/asm/pgtable-hwdef.h
> > +++ b/arch/arm64/include/asm/pgtable-hwdef.h
> > @@ -216,6 +216,7 @@
> >  #define TCR_TxSZ(x)  (TCR_T0SZ(x) | TCR_T1SZ(x))
> >  #define TCR_TxSZ_WIDTH   6
> >  #define TCR_T0SZ_MASK	(((UL(1) << TCR_TxSZ_WIDTH) - 1) << TCR_T0SZ_OFFSET)
> > +#define TCR_T1SZ_MASK	(((UL(1) << TCR_TxSZ_WIDTH) - 1) << TCR_T1SZ_OFFSET)
> >
> >  #define TCR_EPD0_SHIFT   7
> >  #define TCR_EPD0_MASK	(UL(1) << TCR_EPD0_SHIFT)
> > diff --git a/arch/arm64/kernel/crash_core.c b/arch/arm64/kernel/crash_core.c
> > index 1f646b0

Re: [PATCH v8 0/5] support reserving crashkernel above 4G on arm64 kdump

2020-06-03 Thread Bhupesh Sharma
Hi All,

On Wed, Jun 3, 2020 at 9:03 PM John Donnelly  wrote:
>
>
>
> > On Jun 3, 2020, at 8:20 AM, chenzhou  wrote:
> >
> > Hi,
> >
> >
> > On 2020/6/3 19:47, Prabhakar Kushwaha wrote:
> >> Hi Chen,
> >>
> >> On Tue, Jun 2, 2020 at 8:12 PM John Donnelly  
> >> wrote:
> >>>
> >>>
> >>>> On Jun 2, 2020, at 12:38 AM, Prabhakar Kushwaha 
> >>>>  wrote:
> >>>>
> >>>> On Tue, Jun 2, 2020 at 3:29 AM John Donnelly 
> >>>>  wrote:
> >>>>> Hi .  See below !
> >>>>>
> >>>>>> On Jun 1, 2020, at 4:02 PM, Bhupesh Sharma  wrote:
> >>>>>>
> >>>>>> Hi John,
> >>>>>>
> >>>>>> On Tue, Jun 2, 2020 at 1:01 AM John Donnelly 
> >>>>>>  wrote:
> >>>>>>> Hi,
> >>>>>>>
> >>>>>>>
> >>>>>>> On 6/1/20 7:02 AM, Prabhakar Kushwaha wrote:
> >>>>>>>> Hi Chen,
> >>>>>>>>
> >>>>>>>> On Thu, May 21, 2020 at 3:05 PM Chen Zhou  
> >>>>>>>> wrote:
> >>>>>>>>> This patch series enables reserving crashkernel above 4G in arm64.
> >>>>>>>>>
> >>>>>>>>> There are the following issues in arm64 kdump:
> >>>>>>>>> 1. We use crashkernel=X to reserve crashkernel below 4G, which will fail
> >>>>>>>>> when there is not enough low memory.
> >>>>>>>>> 2. Currently, crashkernel=Y@X can be used to reserve crashkernel above 4G,
> >>>>>>>>> in this case, if swiotlb or DMA buffers are required, crash dump kernel
> >>>>>>>>> will fail to boot because there is no low memory available for allocation.
> >>>>>>>>>
> >>>>>>>> We are getting a "warn_alloc" [1] warning during boot of the kdump kernel
> >>>>>>>> with bootargs as [2] of the primary kernel.
> >>>>>>>> This error is observed on the ThunderX2 ARM64 platform.
> >>>>>>>>
> >>>>>>>> It is observed with the latest upstream tag (v5.7-rc3) with this patch set
> >>>>>>>> and https://lists.infradead.org/pipermail/kexec/2020-May/025128.html
> >>>>>>>> Also **without** this patch-set
> >>>>>>>> "https://www.spinics.net/lists/arm-kernel/msg806882.html"
> >>>>>>>>
> >>>>>>>> This issue comes whenever crashkernel memory is reserved after
> >>>>>>>> 0xc000_.
> >>>>>>>> More details were discussed earlier in
> >>>>>>>> https://www.spinics.net/lists/arm-kernel/msg806882.html without any
> >>>>>>>> solution.
> >>>>>>>>
> >>>>>>>> This patch-set is expected to solve a similar kind of issue,
> >>>>>>>> i.e. low memory is only targeted for DMA and swiotlb; so the above-mentioned
> >>>>>>>> observation should be considered/fixed.
> >>>>>>>>
> >>>>>>>> --pk
> >>>>>>>>
> >>>>>>>> [1]
> >>>>>>>> [   30.366695] DMI: Cavium Inc. Saber/Saber, BIOS
> >>>>>>>> TX2-FW-Release-3.1-build_01-2803-g74253a541a mm/dd/
> >>>>>>>> [   30.367696] NET: Registered protocol family 16
> >>>>>>>> [   30.369973] swapper/0: page allocation failure: order:6,
> >>>>>>>> mode:0x1(GFP_DMA), nodemask=(null),cpuset=/,mems_allowed=0
> >>>>>>>> [   30.369980] CPU: 0 PID: 1 Comm: swapper/0 Not tainted 5.7.0-rc

Re: [PATCH v6 0/2] Append new variables to vmcoreinfo (TCR_EL1.T1SZ for arm64 and MAX_PHYSMEM_BITS for all archs)

2020-06-01 Thread Bhupesh Sharma
Hello,

On Thu, May 14, 2020 at 12:22 AM Bhupesh Sharma  wrote:
>
> Apologies for the delayed update. It's been quite some time since I
> posted the last version (v5), but I have been really caught up in some
> other critical issues.
>
> Changes since v5:
> 
> - v5 can be viewed here:
>   http://lists.infradead.org/pipermail/kexec/2019-November/024055.html
> - Addressed review comments from James Morse and Boris.
> - Added Tested-by received from John on v5 patchset.
> - Rebased against arm64 (for-next/ptr-auth) branch which has Amit's
>   patchset for ARMv8.3-A Pointer Authentication feature vmcoreinfo
>   applied.
>
> Changes since v4:
> 
> - v4 can be seen here:
>   http://lists.infradead.org/pipermail/kexec/2019-November/023961.html
> - Addressed comments from Dave and added patches for documenting
>   new variables appended to vmcoreinfo documentation.
> - Added testing report shared by Akashi for PATCH 2/5.
>
> Changes since v3:
> 
> - v3 can be seen here:
>   http://lists.infradead.org/pipermail/kexec/2019-March/022590.html
> - Addressed comments from James and exported TCR_EL1.T1SZ in vmcoreinfo
>   instead of PTRS_PER_PGD.
> - Added a new patch (via [PATCH 3/3]), which fixes a simple typo in
>   'Documentation/arm64/memory.rst'
>
> Changes since v2:
> 
> - v2 can be seen here:
>   http://lists.infradead.org/pipermail/kexec/2019-March/022531.html
> - Protected 'MAX_PHYSMEM_BITS' vmcoreinfo variable under CONFIG_SPARSEMEM
>   ifdef sections, as suggested by Kazu.
> - Updated vmcoreinfo documentation to add description about
>   'MAX_PHYSMEM_BITS' variable (via [PATCH 3/3]).
>
> Changes since v1:
> 
> - v1 was sent out as a single patch which can be seen here:
>   http://lists.infradead.org/pipermail/kexec/2019-February/022411.html
>
> - v2 breaks the single patch into two independent patches:
>   [PATCH 1/2] appends 'PTRS_PER_PGD' to vmcoreinfo for arm64 arch, whereas
>   [PATCH 2/2] appends 'MAX_PHYSMEM_BITS' to vmcoreinfo in core kernel code 
> (all archs)
>
> This patchset primarily fixes the regression reported in user-space
> utilities like 'makedumpfile' and 'crash-utility' on arm64 architecture
> with the availability of 52-bit address space feature in underlying
> kernel. These regressions have been reported both on CPUs which don't
> support ARMv8.2 extensions (i.e. LVA, LPA) and are running newer kernels
> and also on prototype platforms (like ARMv8 FVP simulator model) which
> support ARMv8.2 extensions and are running newer kernels.
>
> The reason for these regressions is that right now user-space tools
> have no direct access to these values (since these are not exported
> from the kernel) and hence need to rely on a best-guess method of
> determining value of 'vabits_actual' and 'MAX_PHYSMEM_BITS' supported
> by underlying kernel.
>
> Exporting these values via vmcoreinfo will help user-land in such cases.
> In addition, as per suggestion from makedumpfile maintainer (Kazu),
> it makes more sense to append 'MAX_PHYSMEM_BITS' to
> vmcoreinfo in the core code itself rather than in arm64 arch-specific
> code, so that the user-space code for other archs can also benefit from
> this addition to the vmcoreinfo and use it as a standard way of
> determining 'SECTIONS_SHIFT' value in user-land.
>
> Cc: Boris Petkov 
> Cc: Ingo Molnar 
> Cc: Thomas Gleixner 
> Cc: Jonathan Corbet 
> Cc: James Morse 
> Cc: Mark Rutland 
> Cc: Will Deacon 
> Cc: Steve Capper 
> Cc: Catalin Marinas 
> Cc: Ard Biesheuvel 
> Cc: Michael Ellerman 
> Cc: Paul Mackerras 
> Cc: Benjamin Herrenschmidt 
> Cc: Dave Anderson 
> Cc: Kazuhito Hagio 
> Cc: John Donnelly 
> Cc: scott.bran...@broadcom.com
> Cc: Amit Kachhap 
> Cc: x...@kernel.org
> Cc: linuxppc-...@lists.ozlabs.org
> Cc: linux-arm-ker...@lists.infradead.org
> Cc: linux-ker...@vger.kernel.org
> Cc: linux-...@vger.kernel.org
> Cc: kexec@lists.infradead.org
>
> Bhupesh Sharma (2):
>   crash_core, vmcoreinfo: Append 'MAX_PHYSMEM_BITS' to vmcoreinfo
>   arm64/crash_core: Export TCR_EL1.T1SZ in vmcoreinfo
>
>  Documentation/admin-guide/kdump/vmcoreinfo.rst | 16 
>  arch/arm64/include/asm/pgtable-hwdef.h |  1 +
>  arch/arm64/kernel/crash_core.c | 10 ++
>  kernel/crash_core.c|  1 +
>  4 files changed, 28 insertions(+)

Ping. @James Morse , Others

Please share if you have some comments regarding this patchset.

Thanks,
Bhupesh



Re: [PATCH v8 0/5] support reserving crashkernel above 4G on arm64 kdump

2020-06-01 Thread Bhupesh Sharma
Hi John,

On Tue, Jun 2, 2020 at 1:01 AM John Donnelly  wrote:
>
> Hi,
>
>
> On 6/1/20 7:02 AM, Prabhakar Kushwaha wrote:
> > Hi Chen,
> >
> > On Thu, May 21, 2020 at 3:05 PM Chen Zhou  wrote:
> >> This patch series enables reserving crashkernel above 4G in arm64.
> >>
> >> There are the following issues in arm64 kdump:
> >> 1. We use crashkernel=X to reserve crashkernel below 4G, which will fail
> >> when there is not enough low memory.
> >> 2. Currently, crashkernel=Y@X can be used to reserve crashkernel above 4G,
> >> in this case, if swiotlb or DMA buffers are required, crash dump kernel
> >> will fail to boot because there is no low memory available for allocation.
> >>
> >> To solve these issues, introduce crashkernel=X,low to reserve specified
> >> size low memory.
> >> Crashkernel=X tries to reserve memory for the crash dump kernel under
> >> 4G. If crashkernel=Y,low is specified simultaneously, reserve specified
> >> size low memory for crash kdump kernel devices first and then reserve
> >> memory above 4G.
> >>
> >> When crashkernel is reserved above 4G in memory, that is, crashkernel=X,low
> >> is specified simultaneously, kernel should reserve specified size low memory
> >> for crash dump kernel devices. So there may be two crash kernel regions, one
> >> is below 4G, the other is above 4G.
> >> In order to distinguish it from the high region and not affect the use of
> >> kexec-tools, rename the low region as "Crash kernel (low)", and add DT property
> >> "linux,low-memory-range" to crash dump kernel's dtb to pass the low region.
> >>
> >> Besides, we need to modify kexec-tools:
> >> arm64: kdump: add another DT property to crash dump kernel's dtb(see [1])
> >>
> >> The previous changes and discussions can be retrieved from:
> >>
> >> Changes since [v7]
> >> - Move x86 CRASH_ALIGN to 2M
> >> As suggested by Dave, and after some testing, move x86 CRASH_ALIGN to 2M.
> >> - Update Documentation/devicetree/bindings/chosen.txt
> >> Add corresponding documentation to 
> >> Documentation/devicetree/bindings/chosen.txt suggested by Arnd.
> >> - Add Tested-by from John and pk
> >>
> >> Changes since [v6]
> >> - Fix build errors reported by kbuild test robot.
> >>
> >> Changes since [v5]
> >> - Move reserve_crashkernel_low() into kernel/crash_core.c.
> >> - Delete crashkernel=X,high.
> >> - Modify crashkernel=X,low.
> >> If crashkernel=X,low is specified simultaneously, reserve specified size low
> >> memory for crash kdump kernel devices first and then reserve memory above 4G.
> >> In addition, rename crashk_low_res as "Crash kernel (low)" for arm64, and then
> >> pass to crash dump kernel by DT property "linux,low-memory-range".
> >> - Update Documentation/admin-guide/kdump/kdump.rst.
> >>
> >> Changes since [v4]
> >> - Reimplement memblock_cap_memory_ranges for multiple ranges by Mike.
> >>
> >> Changes since [v3]
> >> - Add memblock_cap_memory_ranges back for multiple ranges.
> >> - Fix some compiling warnings.
> >>
> >> Changes since [v2]
> >> - Split patch "arm64: kdump: support reserving crashkernel above 4G" as
> >> two. Put "move reserve_crashkernel_low() into kexec_core.c" in a separate
> >> patch.
> >>
> >> Changes since [v1]:
> >> - Move common reserve_crashkernel_low() code into kernel/kexec_core.c.
> >> - Remove memblock_cap_memory_ranges() I added in v1 and implement that
> >> in fdt_enforce_memory_region().
> >> There are at most two crash kernel regions; in the two-region case,
> >> we cap the memory range [min(regs[*].start), max(regs[*].end)]
> >> and then remove the memory range in the middle.
> >>
> >> [1]: https://urldefense.com/v3/__http://lists.infradead.org/pipermail/kexec/2020-May/025128.html__;!!GqivPVa7Brio!LnTSARkCt0V0FozR0KmqooaH5ADtdXvs3mPdP3KRVqALmvSK2VmCkIPIhsaxbvpn1uM1$
> >> [v1]: https://urldefense.com/v3/__https://lkml.org/lkml/2019/4/2/1174__;!!GqivPVa7Brio!LnTSARkCt0V0FozR0KmqooaH5ADtdXvs3mPdP3KRVqALmvSK2VmCkIPIhsaxbt0xN9PE$
> >> [v2]: https://urldefense.com/v3/__https://lkml.org/lkml/2019/4/9/86__;!!GqivPVa7Brio!LnTSARkCt0V0FozR0KmqooaH5ADtdXvs3mPdP3KRVqALmvSK2VmCkIPIhsaxbub7yUQH$
> >> [v3]: https://urldefense.com/v3/__https://lkml.org/lkml/2019/4/9/306__;!!GqivPVa7Brio!LnTSARkCt0V0FozR0KmqooaH5ADtdXvs3mPdP3KRVqALmvSK2VmCkIPIhsaxbnc4zPPV$
> >> [v4]: https://urldefense.com/v3/__https://lkml.org/lkml/2019/4/15/273__;!!GqivPVa7Brio!LnTSARkCt0V0FozR0KmqooaH5ADtdXvs3mPdP3KRVqALmvSK2VmCkIPIhsaxbvsAsZLu$
> >> [v5]: https://urldefense.com/v3/__https://lkml.org/lkml/2019/5/6/1360__;!!GqivPVa7Brio!LnTSARkCt0V0FozR0KmqooaH5ADtdXvs3mPdP3KRVqALmvSK2VmCkIPIhsaxbl24n-79$
> >> [v6]: https://urldefense.com/v3/__https://lkml.org/lkml/2019/8/30/142__;!!GqivPVa7Brio!LnTSARkCt0V0FozR0KmqooaH5ADtdXvs3mPdP3KRVqALmvSK2VmCkIPIhsaxbs7r8G2a$
> >> [v7]: https://urldefense.com/v3/__https://lkml.org/lkml/2019/12/23/411__;!!GqivPVa7Brio!LnTSARkCt0V0FozR0KmqooaH5ADtdXvs3mPdP3KRVqALmvSK2VmCkIPIhsaxbiFUH90G$
> >>
> >> Chen 
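The two-region capping described in the v1 changelog above can be sketched in user-space C as follows. This is a minimal illustration under stated assumptions, not the code in fdt_enforce_memory_region(): the struct range type and the crash_cap_ranges() helper are hypothetical, ranges are treated as half-open [start, end), and the two regions are assumed not to overlap.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical half-open physical range [start, end). */
struct range {
	uint64_t start;
	uint64_t end;
};

/*
 * Hypothetical helper: given one or two non-overlapping crash kernel
 * regions, compute the outer range that memory is capped to and, in the
 * two-region case, the hole between them that is removed afterwards.
 * Returns 1 if *hole was filled in, 0 otherwise.
 */
static int crash_cap_ranges(const struct range *regs, int n,
			    struct range *cap, struct range *hole)
{
	const struct range *lo = &regs[0];
	const struct range *hi = &regs[0];

	if (n == 2) {
		/* Order the two regions by start address. */
		if (regs[1].start < regs[0].start)
			lo = &regs[1];
		else
			hi = &regs[1];
	}

	/* cap = [min(regs[*].start), max(regs[*].end)) */
	cap->start = lo->start;
	cap->end = hi->end > lo->end ? hi->end : lo->end;

	/* The middle part [lo->end, hi->start) is what gets removed. */
	if (n == 2 && lo->end < hi->start) {
		hole->start = lo->end;
		hole->end = hi->start;
		return 1;
	}
	return 0;
}
```

With a (made-up) low region at 0x30000000-0x40000000 and a high region at 0x880000000-0x8a0000000, memory is capped to 0x30000000-0x8a0000000 and the hole 0x40000000-0x880000000 is then removed, leaving only the two crash kernel regions usable.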

Re: [PATCH v7 0/4] support reserving crashkernel above 4G on arm64 kdump

2020-05-20 Thread Bhupesh Sharma
Hi John,

On Wed, May 20, 2020 at 1:53 AM John Donnelly
 wrote:
>
>
>
> > On May 19, 2020, at 5:21 AM, Arnd Bergmann  wrote:
> >
> > On Thu, Mar 26, 2020 at 4:10 AM Chen Zhou  wrote:
> >>
> >> Hi all,
> >>
> >> Friendly ping...
> >
> > I was asked about this patch series, and see that you last posted it in
> > December. I think you should rebase it to linux-5.7-rc6 and post the
> > entire series again to make progress, as it's unlikely that any maintainer
> > would pick up the patches from last year.
> >
> > For the contents, everything seems reasonable to me, but I noticed that
> > you are adding a property to the /chosen node without adding the
> > corresponding documentation to
> > Documentation/devicetree/bindings/chosen.txt
> >
> > Please add that, and Cc the devicetree maintainers on the updated
> > patch.
> >
> > Arnd
> >
> >> On 2019/12/23 23:23, Chen Zhou wrote:
> >>> This patch series enables reserving crashkernel above 4G in arm64.
> >>>
> >>> There are the following issues in arm64 kdump:
> >>> 1. We use crashkernel=X to reserve crashkernel below 4G, which will fail
> >>> when there is not enough low memory.
> >>> 2. Currently, crashkernel=Y@X can be used to reserve crashkernel above 4G;
> >>> in this case, if swiotlb or DMA buffers are required, the crash dump kernel
> >>> will fail to boot because there is no low memory available for allocation.
> >>>
> >>> The previous changes and discussions can be retrieved from:
> >>>
> >>> Changes since [v6]
> >>> - Fix build errors reported by kbuild test robot.
> > ...
>
>
>  Hi
>
> We found
>
> https://lkml.org/lkml/2020/4/30/1583
>
> Has cured our Out-Of-Memory kdump failures.
>
> From: Henry Willard
> Subject: [PATCH] mm: Limit boost_watermark on small zones.
>
> I am currently not on the linux-ker...@vger.kernel.org dlist, so I am
> posting this here for all to see. You may want to rebase and see if this
> cures your OOM issue, and share the results.

This is a very interesting finding. Thanks a lot for sharing the same.
I am working on further avoiding OOM issues with arm64 kdump kernels.
I will experiment more with this patch and get back with more details.

Regards,
Bhupesh


Re: [PATCH] arm64/defconfig: Enable CONFIG_KEXEC_FILE

2020-05-15 Thread Bhupesh Sharma
Hi Arnd,

On Thu, Apr 30, 2020 at 10:05 AM Bhupesh Sharma  wrote:
>
> On Tue, Apr 28, 2020 at 3:37 PM Catalin Marinas wrote:
> >
> > On Tue, Apr 28, 2020 at 01:55:58PM +0530, Bhupesh Sharma wrote:
> > > On Wed, Apr 8, 2020 at 4:17 PM Mark Rutland  wrote:
> > > > On Tue, Apr 07, 2020 at 04:01:40AM +0530, Bhupesh Sharma wrote:
> > > > >  arch/arm64/configs/defconfig | 1 +
> > > > >  1 file changed, 1 insertion(+)
> > > > >
> > > > > diff --git a/arch/arm64/configs/defconfig b/arch/arm64/configs/defconfig
> > > > > index 24e534d85045..fa122f4341a2 100644
> > > > > --- a/arch/arm64/configs/defconfig
> > > > > +++ b/arch/arm64/configs/defconfig
> > > > > @@ -66,6 +66,7 @@ CONFIG_SCHED_SMT=y
> > > > >  CONFIG_NUMA=y
> > > > >  CONFIG_SECCOMP=y
> > > > >  CONFIG_KEXEC=y
> > > > > +CONFIG_KEXEC_FILE=y
> > > > >  CONFIG_CRASH_DUMP=y
> > > > >  CONFIG_XEN=y
> > > > >  CONFIG_COMPAT=y
> > > > > --
> > > > > 2.7.4
> > >
> > > Thanks a lot  Mark.
> > >
> > > Hi Catalin, Will,
> > >
> > > Can you please help pick this patch in the arm tree. We have an
> > > increasing number of use cases from distro users
> > > who want to use kexec_file_load() as the default interface for
> > > kexec/kdump on arm64.
> >
> > We could pick it up if it doesn't conflict with the arm-soc tree. They
> > tend to pick most of the defconfig changes these days (and could as well
> > pick this one).
>
> Thanks Catalin.
> (+Cc Arnd)
>
> Hi Arnd,
>
> Can you please help pick this change via the arm-soc tree?

Ping. Any updates on this defconfig patch?

Thanks,
Bhupesh


[PATCH v6 1/2] crash_core, vmcoreinfo: Append 'MAX_PHYSMEM_BITS' to vmcoreinfo

2020-05-13 Thread Bhupesh Sharma
Right now user-space tools like 'makedumpfile' and 'crash' need to rely
on a best-guess method of determining the value of 'MAX_PHYSMEM_BITS'
supported by the underlying kernel.

This value is used in user-space code to calculate the bit-space
required to store a section for SPARSEMEM (similar to the existing
calculation method used in the kernel implementation):

  #define SECTIONS_SHIFT (MAX_PHYSMEM_BITS - SECTION_SIZE_BITS)

Now, regressions have been reported in user-space utilities
like 'makedumpfile' and 'crash' on arm64, with the recently added
kernel support for 52-bit physical address space, as there is
no clear method of determining this value in user-space
(other than reading kernel CONFIG flags).

As suggested by the makedumpfile maintainer (Kazu), it makes more
sense to append 'MAX_PHYSMEM_BITS' to vmcoreinfo in the core code itself
rather than in arch-specific code, so that the user-space code for other
archs can also benefit from this addition to the vmcoreinfo and use it
as a standard way of determining the 'SECTIONS_SHIFT' value in user-land.
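As a rough illustration of the user-space side, the sketch below looks up the line that VMCOREINFO_NUMBER(MAX_PHYSMEM_BITS) appends (of the form NUMBER(MAX_PHYSMEM_BITS)=<decimal value>) and derives SECTIONS_SHIFT from it. The helper names are hypothetical and this is not makedumpfile's implementation. SECTION_SIZE_BITS is architecture and config specific, so the value 30 used in the usage note is only an assumption for illustration.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/*
 * Hypothetical helper: find "NUMBER(<key>)=<value>" in a vmcoreinfo note
 * (one entry per line). Returns 0 and stores the value on success,
 * -1 if the key is missing or the value does not parse.
 */
static int vmcoreinfo_number(const char *note, const char *key, long *val)
{
	char pat[64];
	size_t len;
	const char *p = note;

	snprintf(pat, sizeof(pat), "NUMBER(%s)=", key);
	len = strlen(pat);

	while (p && *p) {
		if (strncmp(p, pat, len) == 0)
			return sscanf(p + len, "%li", val) == 1 ? 0 : -1;
		p = strchr(p, '\n');	/* advance to the start of the next line */
		if (p)
			p++;
	}
	return -1;
}

/* SECTIONS_SHIFT as defined above; section_size_bits is arch specific. */
static long sections_shift(long max_physmem_bits, long section_size_bits)
{
	return max_physmem_bits - section_size_bits;
}
```

For a note containing NUMBER(MAX_PHYSMEM_BITS)=52 and an assumed SECTION_SIZE_BITS of 30, sections_shift() yields 22, with no need to guess the kernel's CONFIG options.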

A reference 'makedumpfile' implementation which reads the
'MAX_PHYSMEM_BITS' value from vmcoreinfo in an arch-independent fashion
is available here:

While at it, also update the vmcoreinfo documentation for the
'MAX_PHYSMEM_BITS' variable being added to vmcoreinfo.

'MAX_PHYSMEM_BITS' defines the width, in bits, of the maximum supported
physical address space.

Cc: Boris Petkov 
Cc: Ingo Molnar 
Cc: Thomas Gleixner 
Cc: James Morse 
Cc: Mark Rutland 
Cc: Will Deacon 
Cc: Michael Ellerman 
Cc: Paul Mackerras 
Cc: Benjamin Herrenschmidt 
Cc: Dave Anderson 
Cc: Kazuhito Hagio 
Cc: x...@kernel.org
Cc: linuxppc-...@lists.ozlabs.org
Cc: linux-arm-ker...@lists.infradead.org
Cc: linux-ker...@vger.kernel.org
Cc: kexec@lists.infradead.org
Tested-by: John Donnelly 
Signed-off-by: Bhupesh Sharma 
---
 Documentation/admin-guide/kdump/vmcoreinfo.rst | 5 +
 kernel/crash_core.c| 1 +
 2 files changed, 6 insertions(+)

diff --git a/Documentation/admin-guide/kdump/vmcoreinfo.rst b/Documentation/admin-guide/kdump/vmcoreinfo.rst
index e4ee8b2db604..2a632020f809 100644
--- a/Documentation/admin-guide/kdump/vmcoreinfo.rst
+++ b/Documentation/admin-guide/kdump/vmcoreinfo.rst
@@ -93,6 +93,11 @@ It exists in the sparse memory mapping model, and it is also somewhat
 similar to the mem_map variable, both of them are used to translate an
 address.
 
+MAX_PHYSMEM_BITS
+----------------
+
+Defines the maximum supported physical address space memory.
+
 page
 ----
 
diff --git a/kernel/crash_core.c b/kernel/crash_core.c
index 9f1557b98468..18175687133a 100644
--- a/kernel/crash_core.c
+++ b/kernel/crash_core.c
@@ -413,6 +413,7 @@ static int __init crash_save_vmcoreinfo_init(void)
VMCOREINFO_LENGTH(mem_section, NR_SECTION_ROOTS);
VMCOREINFO_STRUCT_SIZE(mem_section);
VMCOREINFO_OFFSET(mem_section, section_mem_map);
+   VMCOREINFO_NUMBER(MAX_PHYSMEM_BITS);
 #endif
VMCOREINFO_STRUCT_SIZE(page);
VMCOREINFO_STRUCT_SIZE(pglist_data);
-- 
2.7.4


[PATCH v6 2/2] arm64/crash_core: Export TCR_EL1.T1SZ in vmcoreinfo

2020-05-13 Thread Bhupesh Sharma
The vabits_actual variable on arm64 indicates the actual VA space size,
and allows a single binary to support both 48-bit and 52-bit VA
spaces.

If the ARMv8.2-LVA optional feature is present and we are running
with a 64KB page size, then it is possible to use 52 bits of address
space for both userspace and kernel addresses. However, any kernel
binary that supports 52-bit must also be able to fall back to 48-bit
at early boot time if the hardware feature is not present.

Since TCR_EL1.T1SZ indicates the size offset of the memory region
addressed by TTBR1_EL1 (and hence can be used to determine the
vabits_actual value), it makes more sense to export it in vmcoreinfo
rather than the vabits_actual variable: the name of the variable can
change in future kernel versions, whereas architectural constructs like
TCR_EL1.T1SZ are a more stable way of conveying the intended fields to
user-space.

User-space utilities like makedumpfile and crash-utility need to
read this value from vmcoreinfo to determine whether a virtual
address lies in the linear map range.
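A hedged sketch of a first step such a tool might take: parse the NUMBER(TCR_EL1_T1SZ)=0x... line written by arch_crash_save_vmcoreinfo() in this patch, then check whether an address falls in the TTBR1 (kernel) half of a (64 - T1SZ)-bit VA space. The helper names are hypothetical. The actual linear-map bounds depend on the memory layout of the kernel version in question and are deliberately not modelled, and any tag or pointer-authentication bits are assumed to have been stripped already.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>
#include <stdio.h>

/* Parse the "NUMBER(TCR_EL1_T1SZ)=0x.." line; %llx accepts the 0x prefix. */
static int parse_t1sz(const char *line, unsigned long long *t1sz)
{
	return sscanf(line, "NUMBER(TCR_EL1_T1SZ)=%llx", t1sz) == 1 ? 0 : -1;
}

/*
 * True if va lies in the upper (TTBR1) half of a vabits-wide VA space,
 * i.e. bits [63:vabits] are all ones. Valid for 1 <= vabits <= 63.
 */
static bool va_in_ttbr1(uint64_t va, unsigned int vabits)
{
	uint64_t upper = ~UINT64_C(0) << vabits;

	return (va & upper) == upper;
}
```

With T1SZ = 0x10 (48-bit VAs), 0xffff000000000000 is a TTBR1 address while 0xfffe000000000000 is not; with 52-bit VAs the kernel half starts at 0xfff0000000000000.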

While at it, also add documentation for the TCR_EL1.T1SZ variable being
added to vmcoreinfo.

It indicates the size offset of the memory region addressed by TTBR1_EL1.

Cc: James Morse 
Cc: Mark Rutland 
Cc: Will Deacon 
Cc: Steve Capper 
Cc: Catalin Marinas 
Cc: Ard Biesheuvel 
Cc: Dave Anderson 
Cc: Kazuhito Hagio 
Cc: linux-arm-ker...@lists.infradead.org
Cc: linux-ker...@vger.kernel.org
Cc: kexec@lists.infradead.org
Tested-by: John Donnelly 
Signed-off-by: Bhupesh Sharma 
---
 Documentation/admin-guide/kdump/vmcoreinfo.rst | 11 +++
 arch/arm64/include/asm/pgtable-hwdef.h |  1 +
 arch/arm64/kernel/crash_core.c | 10 ++
 3 files changed, 22 insertions(+)

diff --git a/Documentation/admin-guide/kdump/vmcoreinfo.rst b/Documentation/admin-guide/kdump/vmcoreinfo.rst
index 2a632020f809..2baad0bfb09d 100644
--- a/Documentation/admin-guide/kdump/vmcoreinfo.rst
+++ b/Documentation/admin-guide/kdump/vmcoreinfo.rst
@@ -404,6 +404,17 @@ KERNELPACMASK
 The mask to extract the Pointer Authentication Code from a kernel virtual
 address.
 
+TCR_EL1.T1SZ
+------------
+
+Indicates the size offset of the memory region addressed by TTBR1_EL1.
+The region size is 2^(64-T1SZ) bytes.
+
+TTBR1_EL1 is the table base address register specified by ARMv8-A
+architecture which is used to lookup the page-tables for the Virtual
+addresses in the higher VA range (refer to ARMv8 ARM document for
+more details).
+
 arm
 ===
 
diff --git a/arch/arm64/include/asm/pgtable-hwdef.h b/arch/arm64/include/asm/pgtable-hwdef.h
index 6bf5e650da78..a1861af97ac9 100644
--- a/arch/arm64/include/asm/pgtable-hwdef.h
+++ b/arch/arm64/include/asm/pgtable-hwdef.h
@@ -216,6 +216,7 @@
 #define TCR_TxSZ(x) (TCR_T0SZ(x) | TCR_T1SZ(x))
 #define TCR_TxSZ_WIDTH 6
 #define TCR_T0SZ_MASK  (((UL(1) << TCR_TxSZ_WIDTH) - 1) << TCR_T0SZ_OFFSET)
+#define TCR_T1SZ_MASK  (((UL(1) << TCR_TxSZ_WIDTH) - 1) << TCR_T1SZ_OFFSET)
 
 #define TCR_EPD0_SHIFT 7
 #define TCR_EPD0_MASK  (UL(1) << TCR_EPD0_SHIFT)
diff --git a/arch/arm64/kernel/crash_core.c b/arch/arm64/kernel/crash_core.c
index 1f646b07e3e9..314391a156ee 100644
--- a/arch/arm64/kernel/crash_core.c
+++ b/arch/arm64/kernel/crash_core.c
@@ -7,6 +7,14 @@
 #include 
 #include 
 #include 
+#include 
+
+static inline u64 get_tcr_el1_t1sz(void);
+
+static inline u64 get_tcr_el1_t1sz(void)
+{
+   return (read_sysreg(tcr_el1) & TCR_T1SZ_MASK) >> TCR_T1SZ_OFFSET;
+}
 
 void arch_crash_save_vmcoreinfo(void)
 {
@@ -16,6 +24,8 @@ void arch_crash_save_vmcoreinfo(void)
kimage_voffset);
vmcoreinfo_append_str("NUMBER(PHYS_OFFSET)=0x%llx\n",
PHYS_OFFSET);
+   vmcoreinfo_append_str("NUMBER(TCR_EL1_T1SZ)=0x%llx\n",
+   get_tcr_el1_t1sz());
vmcoreinfo_append_str("KERNELOFFSET=%lx\n", kaslr_offset());
vmcoreinfo_append_str("NUMBER(KERNELPACMASK)=0x%llx\n",
system_supports_address_auth() ?
-- 
2.7.4


[PATCH v6 0/2] Append new variables to vmcoreinfo (TCR_EL1.T1SZ for arm64 and MAX_PHYSMEM_BITS for all archs)

2020-05-13 Thread Bhupesh Sharma
Apologies for the delayed update. It's been quite some time since I
posted the last version (v5), but I have been really caught up in some
other critical issues.

Changes since v5:

- v5 can be viewed here:
  http://lists.infradead.org/pipermail/kexec/2019-November/024055.html
- Addressed review comments from James Morse and Boris.
- Added Tested-by received from John on v5 patchset.
- Rebased against arm64 (for-next/ptr-auth) branch which has Amit's
  patchset for ARMv8.3-A Pointer Authentication feature vmcoreinfo
  applied.

Changes since v4:

- v4 can be seen here:
  http://lists.infradead.org/pipermail/kexec/2019-November/023961.html
- Addressed comments from Dave and added patches for documenting
  new variables appended to vmcoreinfo documentation.
- Added testing report shared by Akashi for PATCH 2/5.

Changes since v3:

- v3 can be seen here:
  http://lists.infradead.org/pipermail/kexec/2019-March/022590.html
- Addressed comments from James and exported TCR_EL1.T1SZ in vmcoreinfo
  instead of PTRS_PER_PGD.
- Added a new patch (via [PATCH 3/3]), which fixes a simple typo in
  'Documentation/arm64/memory.rst'

Changes since v2:

- v2 can be seen here:
  http://lists.infradead.org/pipermail/kexec/2019-March/022531.html
- Protected 'MAX_PHYSMEM_BITS' vmcoreinfo variable under CONFIG_SPARSEMEM
  ifdef sections, as suggested by Kazu.
- Updated vmcoreinfo documentation to add description about
  'MAX_PHYSMEM_BITS' variable (via [PATCH 3/3]).

Changes since v1:

- v1 was sent out as a single patch which can be seen here:
  http://lists.infradead.org/pipermail/kexec/2019-February/022411.html

- v2 breaks the single patch into two independent patches:
  [PATCH 1/2] appends 'PTRS_PER_PGD' to vmcoreinfo for arm64 arch, whereas
  [PATCH 2/2] appends 'MAX_PHYSMEM_BITS' to vmcoreinfo in core kernel code (all archs)

This patchset primarily fixes the regression reported in user-space
utilities like 'makedumpfile' and 'crash-utility' on the arm64 architecture
with the 52-bit address space feature available in the underlying
kernel. These regressions have been reported both on CPUs which don't
support the ARMv8.2 extensions (i.e. LVA, LPA) and are running newer kernels,
and on prototype platforms (like the ARMv8 FVP simulator model) which
support the ARMv8.2 extensions and are running newer kernels.

The reason for these regressions is that right now user-space tools
have no direct access to these values (since they are not exported
from the kernel) and hence need to rely on a best-guess method of
determining the value of 'vabits_actual' and 'MAX_PHYSMEM_BITS' supported
by the underlying kernel.

Exporting these values via vmcoreinfo will help user-land in such cases.
In addition, as suggested by the makedumpfile maintainer (Kazu),
it makes more sense to append 'MAX_PHYSMEM_BITS' to
vmcoreinfo in the core code itself rather than in arm64 arch-specific
code, so that the user-space code for other archs can also benefit from
this addition to the vmcoreinfo and use it as a standard way of
determining 'SECTIONS_SHIFT' value in user-land.

Cc: Boris Petkov 
Cc: Ingo Molnar 
Cc: Thomas Gleixner 
Cc: Jonathan Corbet 
Cc: James Morse 
Cc: Mark Rutland 
Cc: Will Deacon 
Cc: Steve Capper 
Cc: Catalin Marinas 
Cc: Ard Biesheuvel 
Cc: Michael Ellerman 
Cc: Paul Mackerras 
Cc: Benjamin Herrenschmidt 
Cc: Dave Anderson 
Cc: Kazuhito Hagio 
Cc: John Donnelly 
Cc: scott.bran...@broadcom.com
Cc: Amit Kachhap 
Cc: x...@kernel.org
Cc: linuxppc-...@lists.ozlabs.org
Cc: linux-arm-ker...@lists.infradead.org
Cc: linux-ker...@vger.kernel.org
Cc: linux-...@vger.kernel.org
Cc: kexec@lists.infradead.org

Bhupesh Sharma (2):
  crash_core, vmcoreinfo: Append 'MAX_PHYSMEM_BITS' to vmcoreinfo
  arm64/crash_core: Export TCR_EL1.T1SZ in vmcoreinfo

 Documentation/admin-guide/kdump/vmcoreinfo.rst | 16 
 arch/arm64/include/asm/pgtable-hwdef.h |  1 +
 arch/arm64/kernel/crash_core.c | 10 ++
 kernel/crash_core.c|  1 +
 4 files changed, 28 insertions(+)

-- 
2.7.4


[PATCH v2 2/2] net: qed: Disable SRIOV functionality inside kdump kernel

2020-05-11 Thread Bhupesh Sharma
Since kdump kernel(s) run under severe memory constraints, it makes
sense to disable the qed SRIOV functionality when running the kdump
kernel, as kdump configurations on several distributions don't
support SRIOV targets for saving the vmcore (see [1] for example).

Currently the qed SRIOV functionality ends up consuming memory in
the kdump kernel, even though it is not actually used there.

An example log from the kdump kernel with the SRIOV functionality
enabled is shown below (obtained via the memstrack tool, see [2]):
 dracut-pre-pivot[676]:  Report format module_summary:
 dracut-pre-pivot[676]: Module qed using 149.6MB (2394 pages), peak allocation 149.6MB (2394 pages)

This patch disables the SRIOV functionality inside the kdump kernel, and
with it applied the memory consumption goes down:
 dracut-pre-pivot[671]:  Report format module_summary:
 dracut-pre-pivot[671]: Module qed using 124.6MB (1993 pages), peak allocation 124.7MB (1995 pages)
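The effect of the patched macros can be checked in isolation with a user-space mock. In this sketch is_kdump_kernel() is a stub driven by a flag rather than the kernel function, and struct qed_dev_stub is a hypothetical cut-down stand-in for struct qed_dev carrying only b_is_vf. IS_PF_SRIOV() is omitted since it follows the same pattern.

```c
#include <assert.h>
#include <stdbool.h>

/* User-space stub standing in for the kernel's is_kdump_kernel(). */
static bool running_kdump;

static bool is_kdump_kernel(void)
{
	return running_kdump;
}

/* Hypothetical cut-down device struct with only the field the macros read. */
struct qed_dev_stub {
	bool b_is_vf;
};

/* Same shape as the patched CONFIG_QED_SRIOV macros in qed_sriov.h. */
#define IS_VF(cdev) (is_kdump_kernel() ? 0 : (cdev)->b_is_vf)
#define IS_PF(cdev) (is_kdump_kernel() ? 1 : !(cdev)->b_is_vf)
```

In a normal boot a VF device reports IS_VF() true; in the kdump kernel the same device is treated as a PF, so the VF/SRIOV-specific paths are not taken.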

[1]. https://access.redhat.com/documentation/en-us/red_hat_enterprise_linux/8/html/managing_monitoring_and_updating_the_kernel/installing-and-configuring-kdump_managing-monitoring-and-updating-the-kernel#supported-kdump-targets_supported-kdump-configurations-and-targets
[2]. Memstrack tool: https://github.com/ryncsn/memstrack

Cc: kexec@lists.infradead.org
Cc: linux-ker...@vger.kernel.org
Cc: Ariel Elior 
Cc: gr-everest-linux...@marvell.com
Cc: Manish Chopra 
Cc: David S. Miller 
Signed-off-by: Bhupesh Sharma 
---
 drivers/net/ethernet/qlogic/qed/qed_sriov.h  | 10 +++---
 drivers/net/ethernet/qlogic/qede/qede_main.c |  2 +-
 2 files changed, 8 insertions(+), 4 deletions(-)

diff --git a/drivers/net/ethernet/qlogic/qed/qed_sriov.h b/drivers/net/ethernet/qlogic/qed/qed_sriov.h
index 368e88565783..aabeaf03135e 100644
--- a/drivers/net/ethernet/qlogic/qed/qed_sriov.h
+++ b/drivers/net/ethernet/qlogic/qed/qed_sriov.h
@@ -32,6 +32,7 @@
 
 #ifndef _QED_SRIOV_H
 #define _QED_SRIOV_H
+#include 
 #include 
 #include "qed_vf.h"
 
@@ -40,9 +41,12 @@
 #define QED_VF_ARRAY_LENGTH (3)
 
 #ifdef CONFIG_QED_SRIOV
-#define IS_VF(cdev) ((cdev)->b_is_vf)
-#define IS_PF(cdev) (!((cdev)->b_is_vf))
-#define IS_PF_SRIOV(p_hwfn) (!!((p_hwfn)->cdev->p_iov_info))
+#define IS_VF(cdev) (is_kdump_kernel() ? \
+(0) : ((cdev)->b_is_vf))
+#define IS_PF(cdev) (is_kdump_kernel() ? \
+(1) : !((cdev)->b_is_vf))
+#define IS_PF_SRIOV(p_hwfn) (is_kdump_kernel() ? \
+(0) : !!((p_hwfn)->cdev->p_iov_info))
 #else
 #define IS_VF(cdev) (0)
 #define IS_PF(cdev) (1)
diff --git a/drivers/net/ethernet/qlogic/qede/qede_main.c b/drivers/net/ethernet/qlogic/qede/qede_main.c
index 1a83d1fd8ccd..28afa0c49fe8 100644
--- a/drivers/net/ethernet/qlogic/qede/qede_main.c
+++ b/drivers/net/ethernet/qlogic/qede/qede_main.c
@@ -1194,7 +1194,7 @@ static int qede_probe(struct pci_dev *pdev, const struct pci_device_id *id)
case QEDE_PRIVATE_VF:
if (debug & QED_LOG_VERBOSE_MASK)
dev_err(&pdev->dev, "Probing a VF\n");
-   is_vf = true;
+   is_vf = is_kdump_kernel() ? false : true;
break;
default:
if (debug & QED_LOG_VERBOSE_MASK)
-- 
2.7.4


[PATCH v2 1/2] net: qed*: Reduce RX and TX default ring count when running inside kdump kernel

2020-05-11 Thread Bhupesh Sharma
Normally kdump kernel(s) run under severe memory constraints, the
basic idea being to save the crashdump vmcore reliably when the primary
kernel panics/hangs.

Currently the qed* ethernet driver ends up consuming a lot of memory in
the kdump kernel, leading to kdump kernel panic when one tries to save
the vmcore via ssh/nfs (thus utilizing the services of the underlying
qed* network interfaces).

An example OOM log from the kdump kernel, with a crashkernel
reservation of 512M, is shown in [1].

Using tools like memstrack (see [2]), we can track the modules taking up
the bulk of memory in the kdump kernel and organize the memory usage
output as per 'highest allocator first'. An example log for the OOM case
indicates that the qed* modules end up allocating approximately 216M
memory, which is a large part of the total crashkernel size:

 dracut-pre-pivot[676]:  Report format module_summary:
 dracut-pre-pivot[676]: Module qed using 149.6MB (2394 pages), peak allocation 149.6MB (2394 pages)
 dracut-pre-pivot[676]: Module qede using 65.3MB (1045 pages), peak allocation 65.3MB (1045 pages)

This patch reduces the default RX and TX ring count from 1024 to 64
when running inside kdump kernel, which leads to a significant memory
saving.
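For reference, the ring-count selection added to qede_alloc_etherdev() can be exercised in user space as below. The macros copy the values from the qede.h hunk; struct ring_cfg and pick_ring_counts() are hypothetical wrappers, and the non-kdump branch is assumed to keep the previous NUM_RX_BDS_DEF/NUM_TX_BDS_DEF defaults (as in the form Igor suggested in review). Evaluated literally, the macros give 1023 RX and 8191 TX buffer descriptors by default (NUM_TX_BDS_DEF is NUM_TX_BDS_MAX), versus 63 each in the kdump kernel.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Values mirror the qede.h macros in this patch (BIT(n) == 1UL << n). */
#define BIT(n)			(1UL << (n))
#define NUM_RX_BDS_DEF		((uint16_t)BIT(10) - 1)
#define TX_RING_SIZE		((uint16_t)BIT(13))
#define NUM_TX_BDS_MAX		(TX_RING_SIZE - 1)
#define NUM_TX_BDS_DEF		NUM_TX_BDS_MAX
#define NUM_RX_BDS_KDUMP_MIN	63
#define NUM_TX_BDS_KDUMP_MIN	63

/* Hypothetical holder for the two values the driver sets. */
struct ring_cfg {
	uint16_t rx;
	uint16_t tx;
};

static struct ring_cfg pick_ring_counts(bool kdump)
{
	struct ring_cfg cfg;

	if (kdump) {
		cfg.rx = NUM_RX_BDS_KDUMP_MIN;
		cfg.tx = NUM_TX_BDS_KDUMP_MIN;
	} else {
		cfg.rx = NUM_RX_BDS_DEF;
		cfg.tx = NUM_TX_BDS_DEF;
	}
	return cfg;
}
```

Keeping the kdump branch as an explicit if/else, rather than folding is_kdump_kernel() into the defines, leaves room for further kdump-only tuning in the same place.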

An example log with the patch applied shows the reduced memory
allocation in the kdump kernel:
 dracut-pre-pivot[674]:  Report format module_summary:
 dracut-pre-pivot[674]: Module qed using 141.8MB (2268 pages), peak allocation 141.8MB (2268 pages)
 <..snip..>
 dracut-pre-pivot[674]: Module qede using 4.8MB (76 pages), peak allocation 4.9MB (78 pages)

Tested crashdump vmcore save via ssh/nfs protocol using underlying qed*
network interface after applying this patch.

[1] OOM log:


 kworker/0:6: page allocation failure: order:6,
 mode:0x60c0c0(GFP_KERNEL|__GFP_COMP|__GFP_ZERO), nodemask=(null)
 kworker/0:6 cpuset=/ mems_allowed=0
 CPU: 0 PID: 145 Comm: kworker/0:6 Not tainted 4.18.0-109.el8.aarch64 #1
 Hardware name: To be filled by O.E.M. Saber/Saber, BIOS 0ACKL025
 01/18/2019
 Workqueue: events work_for_cpu_fn
 Call trace:
  dump_backtrace+0x0/0x188
  show_stack+0x24/0x30
  dump_stack+0x90/0xb4
  warn_alloc+0xf4/0x178
  __alloc_pages_nodemask+0xcac/0xd58
  alloc_pages_current+0x8c/0xf8
  kmalloc_order_trace+0x38/0x108
  qed_iov_alloc+0x40/0x248 [qed]
  qed_resc_alloc+0x224/0x518 [qed]
  qed_slowpath_start+0x254/0x928 [qed]
  __qede_probe+0xf8/0x5e0 [qede]
  qede_probe+0x68/0xd8 [qede]
  local_pci_probe+0x44/0xa8
  work_for_cpu_fn+0x20/0x30
  process_one_work+0x1ac/0x3e8
  worker_thread+0x44/0x448
  kthread+0x130/0x138
  ret_from_fork+0x10/0x18
  Cannot start slowpath
  qede: probe of :05:00.1 failed with error -12

[2]. Memstrack tool: https://github.com/ryncsn/memstrack

Cc: kexec@lists.infradead.org
Cc: linux-ker...@vger.kernel.org
Cc: Ariel Elior 
Cc: gr-everest-linux...@marvell.com
Cc: Manish Chopra 
Cc: David S. Miller 
Signed-off-by: Bhupesh Sharma 
---
 drivers/net/ethernet/qlogic/qede/qede.h  |  2 ++
 drivers/net/ethernet/qlogic/qede/qede_main.c | 11 +--
 2 files changed, 11 insertions(+), 2 deletions(-)

diff --git a/drivers/net/ethernet/qlogic/qede/qede.h b/drivers/net/ethernet/qlogic/qede/qede.h
index 234c6f30effb..234c7e35ee1e 100644
--- a/drivers/net/ethernet/qlogic/qede/qede.h
+++ b/drivers/net/ethernet/qlogic/qede/qede.h
@@ -574,12 +574,14 @@ int qede_add_tc_flower_fltr(struct qede_dev *edev, __be16 proto,
 #define RX_RING_SIZE   ((u16)BIT(RX_RING_SIZE_POW))
 #define NUM_RX_BDS_MAX (RX_RING_SIZE - 1)
 #define NUM_RX_BDS_MIN 128
+#define NUM_RX_BDS_KDUMP_MIN   63
 #define NUM_RX_BDS_DEF ((u16)BIT(10) - 1)
 
 #define TX_RING_SIZE_POW   13
 #define TX_RING_SIZE   ((u16)BIT(TX_RING_SIZE_POW))
 #define NUM_TX_BDS_MAX (TX_RING_SIZE - 1)
 #define NUM_TX_BDS_MIN 128
+#define NUM_TX_BDS_KDUMP_MIN   63
 #define NUM_TX_BDS_DEF NUM_TX_BDS_MAX
 
 #define QEDE_MIN_PKT_LEN   64
diff --git a/drivers/net/ethernet/qlogic/qede/qede_main.c b/drivers/net/ethernet/qlogic/qede/qede_main.c
index 34fa3917eb33..1a83d1fd8ccd 100644
--- a/drivers/net/ethernet/qlogic/qede/qede_main.c
+++ b/drivers/net/ethernet/qlogic/qede/qede_main.c
@@ -29,6 +29,7 @@
  * CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE
  * SOFTWARE.
  */
+#include 
 #include 
 #include 
 #include 
@@ -707,8 +708,14 @@ static struct qede_dev *qede_alloc_etherdev(struct qed_dev *cdev,
edev->dp_module = dp_module;
edev->dp_level = dp_level;
edev->ops = qed_ops;
-   edev->q_num_rx_buffers = NUM_RX_BDS_DEF;
-   edev->q_num_tx_buffers = NUM_TX_BDS_DEF;
+
+   if (is_kdump_kernel()) {
+   edev->q_num_rx_buffers = NUM_RX_BDS_KDUMP_MIN;
+   edev->q_num_tx_buffers = NUM_TX_BDS_KDUMP_MIN;
+   } else {
+   edev->q_num_rx_buffers = NUM_RX

[PATCH v2 0/2] net: Optimize the qed* allocations inside kdump kernel

2020-05-11 Thread Bhupesh Sharma
Changes since v1:

- v1 can be seen here: http://lists.infradead.org/pipermail/kexec/2020-May/024935.html
- Addressed review comments received on v1:
  * Removed unnecessary parentheses.
  * Used a different macro for the minimum RX/TX ring count value in the
    kdump kernel.

Since kdump kernel(s) run under severe memory constraint with the
basic idea being to save the crashdump vmcore reliably when the primary
kernel panics/hangs, large memory allocations done by a network driver
can cause the crashkernel to panic with OOM.

The qed* drivers take up approximately 214MB of memory when run in the
kdump kernel with the default configuration settings presently used in
the driver. With a typical crashkernel size of 512M, this allocation
is almost half of the total crashkernel reservation.

Some logs obtained via the memstrack tool (see [1]) are shown below:
 dracut-pre-pivot[676]:  Report format module_summary:
 dracut-pre-pivot[676]: Module qed using 149.6MB (2394 pages), peak allocation 149.6MB (2394 pages)
 dracut-pre-pivot[676]: Module qede using 65.3MB (1045 pages), peak allocation 65.3MB (1045 pages)

This patchset tries to reduce the overall memory allocation profile of
the qed* drivers when they run in the kdump kernel. With these
optimizations we can see a saving of approx 85M in the kdump kernel:
 dracut-pre-pivot[671]:  Report format module_summary:
 dracut-pre-pivot[671]: Module qed using 124.6MB (1993 pages), peak allocation 124.7MB (1995 pages)
 <..snip..>
 dracut-pre-pivot[671]: Module qede using 4.6MB (73 pages), peak allocation 4.6MB (74 pages)

And the kdump kernel can save vmcore successfully via both ssh and nfs
interfaces.
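The "approx 85M" figure can be cross-checked from the page counts in the logs. 2394 pages reported as 149.6MB works out to 64 KiB per page, which fits an aarch64 kernel built with 64K pages; that page size is an inference from the numbers, not something the logs state. Under that assumption, the sketch below computes the saving in pages and MiB. The struct qed_usage type and both helpers are hypothetical.

```c
#include <assert.h>

/* memstrack page counts for the qed and qede modules. */
struct qed_usage {
	unsigned long qed_pages;
	unsigned long qede_pages;
};

/* Pages no longer used by qed* after the patchset. */
static unsigned long pages_saved(struct qed_usage before, struct qed_usage after)
{
	return (before.qed_pages + before.qede_pages) -
	       (after.qed_pages + after.qede_pages);
}

/* Convert pages to MiB, assuming a 64 KiB page size. */
static double pages_to_mib_64k(unsigned long pages)
{
	return pages * 64.0 / 1024.0;
}
```

With (2394 + 1045) pages before and (1993 + 73) pages after, 1373 pages are saved, i.e. about 85.8 MiB, which matches the approx 85M quoted above.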

This patchset contains two patches:
[PATCH 1/2] - Reduces the default TX and RX ring count in kdump kernel.
[PATCH 2/2] - Disables qed SRIOV feature in kdump kernel (as it is
  normally not a supported kdump target for saving
  vmcore).

[1]. Memstrack tool: https://github.com/ryncsn/memstrack

Bhupesh Sharma (2):
  net: qed*: Reduce RX and TX default ring count when running inside
kdump kernel
  net: qed: Disable SRIOV functionality inside kdump kernel

 drivers/net/ethernet/qlogic/qed/qed_sriov.h  | 10 +++---
 drivers/net/ethernet/qlogic/qede/qede.h  |  2 ++
 drivers/net/ethernet/qlogic/qede/qede_main.c | 13 ++---
 3 files changed, 19 insertions(+), 6 deletions(-)

-- 
2.7.4


Re: [EXT] [PATCH 1/2] net: qed*: Reduce RX and TX default ring count when running inside kdump kernel

2020-05-06 Thread Bhupesh Sharma
Hello Igor,

On Wed, May 6, 2020 at 12:21 PM Igor Russkikh  wrote:
>
>
>
> >  #include 
> > +#include 
> >  #include 
> >  #include 
> >  #include 
> > @@ -574,13 +575,13 @@ int qede_add_tc_flower_fltr(struct qede_dev *edev, __be16 proto,
> >  #define RX_RING_SIZE ((u16)BIT(RX_RING_SIZE_POW))
> >  #define NUM_RX_BDS_MAX   (RX_RING_SIZE - 1)
> >  #define NUM_RX_BDS_MIN   128
> > -#define NUM_RX_BDS_DEF   ((u16)BIT(10) - 1)
> > +#define NUM_RX_BDS_DEF   ((is_kdump_kernel()) ? ((u16)BIT(6) - 1) : ((u16)BIT(10) - 1))
> >
> >  #define TX_RING_SIZE_POW 13
> >  #define TX_RING_SIZE ((u16)BIT(TX_RING_SIZE_POW))
> >  #define NUM_TX_BDS_MAX   (TX_RING_SIZE - 1)
> >  #define NUM_TX_BDS_MIN   128
> > -#define NUM_TX_BDS_DEF   NUM_TX_BDS_MAX
> > +#define NUM_TX_BDS_DEF   ((is_kdump_kernel()) ? ((u16)BIT(6) - 1) : NUM_TX_BDS_MAX)
> >
>
> Hi Bhupesh,
>
> Thanks for looking into this. We are also analyzing how to reduce qed* memory
> usage even more.
>
> The patch is good, but may I suggest not introducing conditional logic into
> the defines, and instead just adding two new defines like NUM_[RT]X_BDS_MIN
> and checking for is_kdump_kernel() in the code explicitly?
>
> if (is_kdump_kernel()) {
> edev->q_num_rx_buffers = NUM_RX_BDS_MIN;
> edev->q_num_tx_buffers = NUM_TX_BDS_MIN;
> } else {
> edev->q_num_rx_buffers = NUM_RX_BDS_DEF;
> edev->q_num_tx_buffers = NUM_TX_BDS_DEF;
> }
>
> This may make the configuration logic more explicit. In future we may want
> to add more specific configs under this `if`.

Thanks for the review comments.
The suggestions seem fine to me. I will incorporate them in v2.

Regards,
Bhupesh


Re: [PATCH 1/2] net: qed*: Reduce RX and TX default ring count when running inside kdump kernel

2020-05-05 Thread Bhupesh Sharma
Hi David,

On Wed, May 6, 2020 at 2:54 AM David Miller  wrote:
>
> From: Bhupesh Sharma 
> Date: Wed,  6 May 2020 00:34:40 +0530
>
> > -#define NUM_RX_BDS_DEF   ((u16)BIT(10) - 1)
> > +#define NUM_RX_BDS_DEF   ((is_kdump_kernel()) ? ((u16)BIT(6) - 1) : ((u16)BIT(10) - 1))
>
> These parentheses are very excessive and unnecessary.  At the
> very least remove the parenthesis around is_kdump_kernel().

Thanks a lot for the review.
Sure, will fix this in the v2.

Regards,
Bhupesh


[PATCH 2/2] net: qed: Disable SRIOV functionality inside kdump kernel

2020-05-05 Thread Bhupesh Sharma
Since kdump kernel(s) run under severe memory constraints, it makes
sense to disable the qed SRIOV functionality when running the kdump
kernel, as kdump configurations on several distributions don't
support SRIOV targets for saving the vmcore (see [1] for example).

Currently the qed SRIOV functionality ends up consuming memory in
the kdump kernel, even though it is not actually used there.

An example log from the kdump kernel with the SRIOV functionality
enabled is shown below (obtained via the memstrack tool, see [2]):
 dracut-pre-pivot[676]:  Report format module_summary:
 dracut-pre-pivot[676]: Module qed using 149.6MB (2394 pages), peak allocation 149.6MB (2394 pages)

This patch disables the SRIOV functionality inside the kdump kernel; with
it applied, the memory consumption goes down:
 dracut-pre-pivot[671]:  Report format module_summary:
 dracut-pre-pivot[671]: Module qed using 124.6MB (1993 pages), peak allocation 124.7MB (1995 pages)

[1]. https://access.redhat.com/documentation/en-us/red_hat_enterprise_linux/8/html/managing_monitoring_and_updating_the_kernel/installing-and-configuring-kdump_managing-monitoring-and-updating-the-kernel#supported-kdump-targets_supported-kdump-configurations-and-targets
[2]. Memstrack tool: https://github.com/ryncsn/memstrack

Cc: kexec@lists.infradead.org
Cc: linux-ker...@vger.kernel.org
Cc: Ariel Elior 
Cc: gr-everest-linux...@marvell.com
Cc: Manish Chopra 
Cc: David S. Miller 
Signed-off-by: Bhupesh Sharma 
---
 drivers/net/ethernet/qlogic/qed/qed_sriov.h  | 10 +++---
 drivers/net/ethernet/qlogic/qede/qede_main.c |  2 +-
 2 files changed, 8 insertions(+), 4 deletions(-)

diff --git a/drivers/net/ethernet/qlogic/qed/qed_sriov.h b/drivers/net/ethernet/qlogic/qed/qed_sriov.h
index 368e88565783..f2ebd9a76e20 100644
--- a/drivers/net/ethernet/qlogic/qed/qed_sriov.h
+++ b/drivers/net/ethernet/qlogic/qed/qed_sriov.h
@@ -32,6 +32,7 @@
 
 #ifndef _QED_SRIOV_H
 #define _QED_SRIOV_H
+#include 
 #include 
 #include "qed_vf.h"
 
@@ -40,9 +41,12 @@
 #define QED_VF_ARRAY_LENGTH (3)
 
 #ifdef CONFIG_QED_SRIOV
-#define IS_VF(cdev) ((cdev)->b_is_vf)
-#define IS_PF(cdev) (!((cdev)->b_is_vf))
-#define IS_PF_SRIOV(p_hwfn) (!!((p_hwfn)->cdev->p_iov_info))
+#define IS_VF(cdev) ((is_kdump_kernel()) ? \
+(0) : ((cdev)->b_is_vf))
+#define IS_PF(cdev) ((is_kdump_kernel()) ? \
+(1) : !((cdev)->b_is_vf))
+#define IS_PF_SRIOV(p_hwfn) ((is_kdump_kernel()) ? \
+(0) : !!((p_hwfn)->cdev->p_iov_info))
 #else
 #define IS_VF(cdev) (0)
 #define IS_PF(cdev) (1)
diff --git a/drivers/net/ethernet/qlogic/qede/qede_main.c b/drivers/net/ethernet/qlogic/qede/qede_main.c
index 34fa3917eb33..f557ae90ce7c 100644
--- a/drivers/net/ethernet/qlogic/qede/qede_main.c
+++ b/drivers/net/ethernet/qlogic/qede/qede_main.c
@@ -1187,7 +1187,7 @@ static int qede_probe(struct pci_dev *pdev, const struct pci_device_id *id)
case QEDE_PRIVATE_VF:
if (debug & QED_LOG_VERBOSE_MASK)
dev_err(>dev, "Probing a VF\n");
-   is_vf = true;
+   is_vf = is_kdump_kernel() ? false : true;
break;
default:
if (debug & QED_LOG_VERBOSE_MASK)
-- 
2.7.4
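The effect of the `IS_VF()`/`IS_PF()`/`IS_PF_SRIOV()` change can be checked in a small userspace model. This is a sketch under stated assumptions: the `*_model` structs are hypothetical stand-ins for the qed structures, `is_kdump_kernel()` is stubbed with a flag, and the macros keep the patch's logic while dropping the redundant parentheses that David Miller objected to in the RX define of patch 1/2.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-ins for the qed structures used by the macros. */
struct qed_iov_info_model {
	int num_vfs;
};

struct qed_dev_model {
	bool b_is_vf;                           /* set when probed as a VF */
	struct qed_iov_info_model *p_iov_info;  /* non-NULL when SRIOV is set up */
};

struct qed_hwfn_model {
	struct qed_dev_model *cdev;
};

/* Userspace stand-in for is_kdump_kernel(); not the kernel function. */
static bool fake_kdump;
static bool is_kdump_kernel(void)
{
	return fake_kdump;
}

/* Same logic as the patch, with the extra parentheses removed. */
#define IS_VF(cdev)         (is_kdump_kernel() ? 0 : (cdev)->b_is_vf)
#define IS_PF(cdev)         (is_kdump_kernel() ? 1 : !(cdev)->b_is_vf)
#define IS_PF_SRIOV(p_hwfn) (is_kdump_kernel() ? 0 : !!(p_hwfn)->cdev->p_iov_info)
```

With the flag set, every device reports as a PF without SRIOV, regardless of `b_is_vf` or `p_iov_info`.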




[PATCH 0/2] net: Optimize the qed* allocations inside kdump kernel

2020-05-05 Thread Bhupesh Sharma
Since kdump kernels run under severe memory constraints, their basic
purpose being to save the crashdump vmcore reliably when the primary
kernel panics or hangs, large memory allocations done by a network
driver can cause the crashkernel to panic with OOM.

The qed* drivers take up approximately 214MB of memory when run in the
kdump kernel with the default configuration settings presently used in
the driver. With a typical crashkernel size of 512M, this allocation
amounts to almost half of the total crashkernel reservation.

See some logs obtained via memstrack tool (see [1]) below:
 dracut-pre-pivot[676]:  Report format module_summary: 
 dracut-pre-pivot[676]: Module qed using 149.6MB (2394 pages), peak allocation 149.6MB (2394 pages)
 dracut-pre-pivot[676]: Module qede using 65.3MB (1045 pages), peak allocation 65.3MB (1045 pages)

This patchset tries to reduce the overall memory allocation profile of
the qed* drivers when they run in the kdump kernel. With these
optimizations we see a saving of approximately 85MB in the kdump kernel:
 dracut-pre-pivot[671]:  Report format module_summary:
 dracut-pre-pivot[671]: Module qed using 124.6MB (1993 pages), peak allocation 124.7MB (1995 pages)
 <..snip..>
 dracut-pre-pivot[671]: Module qede using 4.6MB (73 pages), peak allocation 4.6MB (74 pages)

With these changes, the kdump kernel saves the vmcore successfully via
both the ssh and nfs interfaces.
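The memstrack figures above are consistent with a 64KiB base page size (2394 pages x 64KiB = 149.6MiB); that page size is an inference from the numbers, not something the logs state. Under that assumption, a short sketch reproduces the qed + qede totals and the saving:

```c
#include <assert.h>

/* Assumption: 64KiB base pages, inferred from the memstrack page counts. */
#define PAGE_SIZE_KIB 64UL

/* Convert a memstrack page count to MiB. */
static double pages_to_mib(unsigned long pages)
{
	return (double)(pages * PAGE_SIZE_KIB) / 1024.0;
}

/* Page counts taken from the memstrack logs above. */
static const unsigned long qed_before = 2394, qede_before = 1045;
static const unsigned long qed_after = 1993, qede_after = 73;

/* Combined qed + qede usage before and after the patchset, in MiB. */
static double total_before_mib(void)
{
	return pages_to_mib(qed_before + qede_before);
}

static double total_after_mib(void)
{
	return pages_to_mib(qed_after + qede_after);
}
```

This gives about 214.9MiB before and 129.1MiB after, i.e. a saving of roughly 85.8MiB, in line with the figures quoted above.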

This patchset contains two patches:
[PATCH 1/2] - Reduces the default TX and RX ring count in kdump kernel.
[PATCH 2/2] - Disables qed SRIOV feature in kdump kernel (as it is
  normally not a supported kdump target for saving
  vmcore).

[1]. Memstrack tool: https://github.com/ryncsn/memstrack

-
Bhupesh Sharma (2):
  net: qed*: Reduce RX and TX default ring count when running inside
kdump kernel
  net: qed: Disable SRIOV functionality inside kdump kernel

 drivers/net/ethernet/qlogic/qed/qed_sriov.h  | 10 +++---
 drivers/net/ethernet/qlogic/qede/qede.h  |  5 +++--
 drivers/net/ethernet/qlogic/qede/qede_main.c |  2 +-
 3 files changed, 11 insertions(+), 6 deletions(-)

-- 
2.7.4




[PATCH 1/2] net: qed*: Reduce RX and TX default ring count when running inside kdump kernel

2020-05-05 Thread Bhupesh Sharma
Normally kdump kernels run under severe memory constraints, their basic
purpose being to save the crashdump vmcore reliably when the primary
kernel panics or hangs.

Currently the qed* ethernet driver ends up consuming a lot of memory in
the kdump kernel, leading to a kdump kernel panic when one tries to save
the vmcore via ssh/nfs (thus utilizing the services of the underlying
qed* network interfaces).

An example OOM log from the kdump kernel, with a crashkernel
reservation of 512M, is shown in [1].

Using tools like memstrack (see [2]), we can track the modules taking up
the bulk of memory in the kdump kernel and order the memory usage
output as 'highest allocator first'. An example log for the OOM case
indicates that the qed* modules end up allocating approximately 215MB of
memory, which is a large part of the total crashkernel size:

 dracut-pre-pivot[676]:  Report format module_summary: 
 dracut-pre-pivot[676]: Module qed using 149.6MB (2394 pages), peak allocation 149.6MB (2394 pages)
 dracut-pre-pivot[676]: Module qede using 65.3MB (1045 pages), peak allocation 65.3MB (1045 pages)

This patch reduces the default RX ring count from 1023 to 63, and the
default TX ring count from 8191 (NUM_TX_BDS_MAX) to 63, when running
inside the kdump kernel, which leads to a significant memory saving.

An example log with the patch applied shows the reduced memory
allocation in the kdump kernel:
 dracut-pre-pivot[674]:  Report format module_summary: 
 dracut-pre-pivot[674]: Module qed using 141.8MB (2268 pages), peak allocation 141.8MB (2268 pages)
 <..snip..>
 dracut-pre-pivot[674]: Module qede using 4.8MB (76 pages), peak allocation 4.9MB (78 pages)

Tested saving the crashdump vmcore via the ssh/nfs protocols over the
underlying qed* network interface after applying this patch.

[1] OOM log:


 kworker/0:6: page allocation failure: order:6, mode:0x60c0c0(GFP_KERNEL|__GFP_COMP|__GFP_ZERO), nodemask=(null)
 kworker/0:6 cpuset=/ mems_allowed=0
 CPU: 0 PID: 145 Comm: kworker/0:6 Not tainted 4.18.0-109.el8.aarch64 #1
 Hardware name: To be filled by O.E.M. Saber/Saber, BIOS 0ACKL025 01/18/2019
 Workqueue: events work_for_cpu_fn
 Call trace:
  dump_backtrace+0x0/0x188
  show_stack+0x24/0x30
  dump_stack+0x90/0xb4
  warn_alloc+0xf4/0x178
  __alloc_pages_nodemask+0xcac/0xd58
  alloc_pages_current+0x8c/0xf8
  kmalloc_order_trace+0x38/0x108
  qed_iov_alloc+0x40/0x248 [qed]
  qed_resc_alloc+0x224/0x518 [qed]
  qed_slowpath_start+0x254/0x928 [qed]
  __qede_probe+0xf8/0x5e0 [qede]
  qede_probe+0x68/0xd8 [qede]
  local_pci_probe+0x44/0xa8
  work_for_cpu_fn+0x20/0x30
  process_one_work+0x1ac/0x3e8
  worker_thread+0x44/0x448
  kthread+0x130/0x138
  ret_from_fork+0x10/0x18
  Cannot start slowpath
  qede: probe of :05:00.1 failed with error -12
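The failing request in the trace above is an order-6 allocation, i.e. 2^6 physically contiguous pages, made from `qed_iov_alloc()`. A small sketch of the size arithmetic follows. The 64KiB page size is an assumption (it matches the memstrack page counts in this thread); on a 4KiB-page kernel the same order would be 256KiB. The final `error -12` is `-ENOMEM`.

```c
#include <assert.h>
#include <errno.h>

/* Bytes covered by a buddy-allocator request of the given order:
 * 2^order contiguous pages of page_size bytes each. */
static unsigned long order_to_bytes(unsigned int order, unsigned long page_size)
{
	return page_size << order;
}
```

So on a 64KiB-page kernel the driver needed a 4MiB physically contiguous chunk, which is likely hard to satisfy in a small crashkernel reservation.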

[2]. Memstrack tool: https://github.com/ryncsn/memstrack

Cc: kexec@lists.infradead.org
Cc: linux-ker...@vger.kernel.org
Cc: Ariel Elior 
Cc: gr-everest-linux...@marvell.com
Cc: Manish Chopra 
Cc: David S. Miller 
Signed-off-by: Bhupesh Sharma 
---
 drivers/net/ethernet/qlogic/qede/qede.h | 5 +++--
 1 file changed, 3 insertions(+), 2 deletions(-)

diff --git a/drivers/net/ethernet/qlogic/qede/qede.h b/drivers/net/ethernet/qlogic/qede/qede.h
index 234c6f30effb..b55ab32ef0b3 100644
--- a/drivers/net/ethernet/qlogic/qede/qede.h
+++ b/drivers/net/ethernet/qlogic/qede/qede.h
@@ -32,6 +32,7 @@
 #ifndef _QEDE_H_
 #define _QEDE_H_
 #include 
+#include 
 #include 
 #include 
 #include 
@@ -574,13 +575,13 @@ int qede_add_tc_flower_fltr(struct qede_dev *edev, __be16 proto,
 #define RX_RING_SIZE   ((u16)BIT(RX_RING_SIZE_POW))
 #define NUM_RX_BDS_MAX (RX_RING_SIZE - 1)
 #define NUM_RX_BDS_MIN 128
-#define NUM_RX_BDS_DEF ((u16)BIT(10) - 1)
+#define NUM_RX_BDS_DEF ((is_kdump_kernel()) ? ((u16)BIT(6) - 1) : ((u16)BIT(10) - 1))
 
 #define TX_RING_SIZE_POW   13
 #define TX_RING_SIZE   ((u16)BIT(TX_RING_SIZE_POW))
 #define NUM_TX_BDS_MAX (TX_RING_SIZE - 1)
 #define NUM_TX_BDS_MIN 128
-#define NUM_TX_BDS_DEF NUM_TX_BDS_MAX
+#define NUM_TX_BDS_DEF ((is_kdump_kernel()) ? ((u16)BIT(6) - 1) : NUM_TX_BDS_MAX)
 
 #define QEDE_MIN_PKT_LEN   64
 #define QEDE_RX_HDR_SIZE   256
-- 
2.7.4




Re: [PATCH] arm64/defconfig: Enable CONFIG_KEXEC_FILE

2020-04-29 Thread Bhupesh Sharma
On Tue, Apr 28, 2020 at 3:37 PM Catalin Marinas  wrote:
>
> On Tue, Apr 28, 2020 at 01:55:58PM +0530, Bhupesh Sharma wrote:
> > On Wed, Apr 8, 2020 at 4:17 PM Mark Rutland  wrote:
> > > On Tue, Apr 07, 2020 at 04:01:40AM +0530, Bhupesh Sharma wrote:
> > > >  arch/arm64/configs/defconfig | 1 +
> > > >  1 file changed, 1 insertion(+)
> > > >
> > > > diff --git a/arch/arm64/configs/defconfig b/arch/arm64/configs/defconfig
> > > > index 24e534d85045..fa122f4341a2 100644
> > > > --- a/arch/arm64/configs/defconfig
> > > > +++ b/arch/arm64/configs/defconfig
> > > > @@ -66,6 +66,7 @@ CONFIG_SCHED_SMT=y
> > > >  CONFIG_NUMA=y
> > > >  CONFIG_SECCOMP=y
> > > >  CONFIG_KEXEC=y
> > > > +CONFIG_KEXEC_FILE=y
> > > >  CONFIG_CRASH_DUMP=y
> > > >  CONFIG_XEN=y
> > > >  CONFIG_COMPAT=y
> > > > --
> > > > 2.7.4
> >
> > Thanks a lot Mark.
> >
> > Hi Catalin, Will,
> >
> > Can you please help pick this patch up via the arm64 tree? We have an
> > increasing number of use-cases from distro users
> > who want to use kexec_file_load() as the default interface for
> > kexec/kdump on arm64.
>
> We could pick it up if it doesn't conflict with the arm-soc tree. They
> tend to pick most of the defconfig changes these days (and could as well
> pick this one).

Thanks Catalin.
(+Cc Arnd)

Hi Arnd,

Can you please help pick this change via the arm-soc tree?

Thanks,
Bhupesh




Re: [PATCH] arm64/defconfig: Enable CONFIG_KEXEC_FILE

2020-04-28 Thread Bhupesh Sharma
On Wed, Apr 8, 2020 at 4:17 PM Mark Rutland  wrote:
>
> On Tue, Apr 07, 2020 at 04:01:40AM +0530, Bhupesh Sharma wrote:
> > The kexec_file_load() syscall interface is now supported on the
> > arm64 architecture as well, via commits:
> > 3751e728cef2 ("arm64: kexec_file: add crash dump support") and
> > 3ddd9992a590 ("arm64: enable KEXEC_FILE config").
> >
> > This patch enables config KEXEC_FILE by default in the
> > arm64 defconfig, so that user-space tools like kexec-tools
> > can use it as the default interface for kexec/kdump
> > on arm64.
> >
> > Cc: AKASHI Takahiro 
> > Cc: Catalin Marinas 
> > Cc: James Morse 
> > Cc: Mark Rutland 
> > Cc: Will Deacon 
> > Cc: kexec@lists.infradead.org
> >
> > Signed-off-by: Bhupesh Sharma 
>
> FWIW:
>
> Acked-by: Mark Rutland 
>
> Mark.
>
> > ---
> >  arch/arm64/configs/defconfig | 1 +
> >  1 file changed, 1 insertion(+)
> >
> > diff --git a/arch/arm64/configs/defconfig b/arch/arm64/configs/defconfig
> > index 24e534d85045..fa122f4341a2 100644
> > --- a/arch/arm64/configs/defconfig
> > +++ b/arch/arm64/configs/defconfig
> > @@ -66,6 +66,7 @@ CONFIG_SCHED_SMT=y
> >  CONFIG_NUMA=y
> >  CONFIG_SECCOMP=y
> >  CONFIG_KEXEC=y
> > +CONFIG_KEXEC_FILE=y
> >  CONFIG_CRASH_DUMP=y
> >  CONFIG_XEN=y
> >  CONFIG_COMPAT=y
> > --
> > 2.7.4
> >

Thanks a lot Mark.

Hi Catalin, Will,

Can you please help pick this patch up via the arm64 tree? We have an
increasing number of use-cases from distro users
who want to use kexec_file_load() as the default interface for
kexec/kdump on arm64.

Regards,
Bhupesh




Re: [PATCH 1/3] kexec: Prevent removal of memory in use by a loaded kexec image

2020-04-12 Thread Bhupesh SHARMA
On Mon, Apr 13, 2020 at 1:26 AM Eric W. Biederman  wrote:
>
>
> The only benefit of kexec_file_load is that it is simple enough from a
> kernel perspective that signatures can be checked.
>
> kexec_load in every other respect is the more capable and functional
> interface.  It makes no sense to get rid of it.
>
> It does make sense to reload with a loaded kernel on memory hotplug.
> That is simple and easy.  If we are going to handle something in the
> kernel it should be a simple automated unloading of the kernel on memory
> hotplug.
>
>
> I think it would be irresponsible to deprecate kexec_load on any
> platform.
>
> I also suspect that kexec_file_load could be taught to copy the dtb
> on arm32 if someone wants to deal with signatures.
>
> We definitely can not even think of deprecating kexec_load until
> architecture that supports it also supports kexec_file_load and everyone
> is happy with that interface.  That is Linus's no regression rule.

TBH, I have seen several active users of kexec_load on arm32
environments, and we have been trying to help them with kexec issues on
arm32 in the recent past as well.

So, I agree with Eric's view that deprecating this in favour of
kexec_file_load would probably break these existing environments.

I tried to do some work at the start of this year to add
kexec_file_load support for arm32 in my spare cycles, but I gave up as
the arm32 hardware had broken firmware and couldn't boot the latest
upstream kernel.

Maybe I can find some spare cycles in the coming days to do it.

But I think that, since kexec_load is an important interface on these
arm32 boards for supporting existing kexec-based bootloaders, we should
continue supporting it until kexec_file_load is supported/mature
enough for arm32.

Thanks,
Bhupesh



[PATCH] arm64/defconfig: Enable CONFIG_KEXEC_FILE

2020-04-06 Thread Bhupesh Sharma
The kexec_file_load() syscall interface is now supported on the
arm64 architecture as well, via commits:
3751e728cef2 ("arm64: kexec_file: add crash dump support") and
3ddd9992a590 ("arm64: enable KEXEC_FILE config").

This patch enables config KEXEC_FILE by default in the
arm64 defconfig, so that user-space tools like kexec-tools
can use it as the default interface for kexec/kdump
on arm64.

Cc: AKASHI Takahiro 
Cc: Catalin Marinas 
Cc: James Morse 
Cc: Mark Rutland 
Cc: Will Deacon 
Cc: kexec@lists.infradead.org

Signed-off-by: Bhupesh Sharma 
---
 arch/arm64/configs/defconfig | 1 +
 1 file changed, 1 insertion(+)

diff --git a/arch/arm64/configs/defconfig b/arch/arm64/configs/defconfig
index 24e534d85045..fa122f4341a2 100644
--- a/arch/arm64/configs/defconfig
+++ b/arch/arm64/configs/defconfig
@@ -66,6 +66,7 @@ CONFIG_SCHED_SMT=y
 CONFIG_NUMA=y
 CONFIG_SECCOMP=y
 CONFIG_KEXEC=y
+CONFIG_KEXEC_FILE=y
 CONFIG_CRASH_DUMP=y
 CONFIG_XEN=y
 CONFIG_COMPAT=y
-- 
2.7.4




Re: [PATCH] net: ena: Add PCI shutdown handler to allow safe kexec

2020-03-24 Thread Bhupesh Sharma

Hi Guilherme,

On 03/20/2020 06:25 PM, Guilherme G. Piccoli wrote:

Currently ENA only provides the PCI remove() handler, used during rmmod
for example. This is not called on shutdown/kexec path; we are potentially
creating a failure scenario on kexec:

(a) Kexec is triggered, no shutdown() / remove() handler is called for ENA;
instead pci_device_shutdown() clears the master bit of the PCI device,
stopping all DMA transactions;

(b) Kexec reboot happens and the device gets enabled again, likely having
its FW with that DMA transaction buffered; then it may trigger the (now
invalid) memory operation in the new kernel, corrupting kernel memory area.

This patch aims to prevent this, by implementing a shutdown() handler
quite similar to the remove() one - the difference being the handling
of the netdev, which is unregistered on remove(), but following the
convention observed in other drivers, it's only detached on shutdown().
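The control flow of the new helper can be modelled in userspace to check that RTNL is released on both paths. In this sketch the `netdev_model` struct, the `rtnl_depth` counter and the `model_*` functions are hypothetical stand-ins, not kernel APIs; the comments name the real calls from the patch that each step represents. `dev_close()` expects RTNL to be held, while `unregister_netdev()` takes RTNL itself, which is why the lock is dropped at different points in the two branches.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical model of the netdev state touched by __ena_shutoff(). */
struct netdev_model {
	bool registered;
	bool attached;
	bool up;
	bool freed;
};

static int rtnl_depth; /* stands in for rtnl_lock()/rtnl_unlock() */

static void model_rtnl_lock(void)
{
	rtnl_depth++;
}

static void model_rtnl_unlock(void)
{
	assert(rtnl_depth > 0);
	rtnl_depth--;
}

static void model_ena_shutoff(struct netdev_model *nd, bool shutdown)
{
	model_rtnl_lock(); /* released in both branches below */
	/* ena_destroy_device() runs here, under the lock */
	if (shutdown) {
		/* shutdown/kexec: keep the netdev registered, just quiesce it */
		nd->attached = false;   /* netif_device_detach() */
		nd->up = false;         /* dev_close(), needs RTNL held */
		model_rtnl_unlock();
	} else {
		/* remove/rmmod: unregister_netdev() takes RTNL itself */
		model_rtnl_unlock();
		nd->registered = false; /* unregister_netdev() */
		nd->freed = true;       /* free_netdev() */
	}
}
```

In both branches the lock depth returns to zero, and only the remove path tears down the netdev.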

This prevents an odd issue in AWS Nitro instances, in which after the 2nd
kexec the next one will fail with an initrd corruption, caused by a wild
DMA write to invalid kernel memory. The lspci output for the adapter
present in my instance is:

00:05.0 Ethernet controller [0200]: Amazon.com, Inc. Elastic Network Adapter (ENA) [1d0f:ec20]


Thanks for the patch.


Suggested-by: Gavin Shan 
Signed-off-by: Guilherme G. Piccoli 
---


The idea for this patch came from an informal conversation with my
friend Gavin Shan, based on his past experience with similar issues.
I'd like to thank him for the great suggestion!

As a test metric, I've performed 1000 kexecs with this patch, whereas
without this one, the 3rd kexec failed with initrd corruption. Also,
one test that I've done before writing the patch was just to rmmod
the driver before the kexecs, and it worked fine too.

I suggest we add this patch in stable releases as well.
Thanks in advance for reviews,


This patch fixes the repetitive kexec reboot issues that I had been facing
for some time on the AWS Nitro (t3) machines. Previously, kexec reboots
would not survive more than ~3 to 5 iterations on the machine.


Now, with this patch, I can run hundreds of repetitive nested kexec
reboots on the AWS Nitro machines without any failure.


So, I think this is a really good patch and it should be applied to stable
trees as well.


Please feel free to add:

Tested-and-Reviewed-by: Bhupesh Sharma 

Thanks,
Bhupesh


Guilherme


  drivers/net/ethernet/amazon/ena/ena_netdev.c | 51 
  1 file changed, 41 insertions(+), 10 deletions(-)

diff --git a/drivers/net/ethernet/amazon/ena/ena_netdev.c b/drivers/net/ethernet/amazon/ena/ena_netdev.c
index 0b2fd96b93d7..7a5c01ff2ee8 100644
--- a/drivers/net/ethernet/amazon/ena/ena_netdev.c
+++ b/drivers/net/ethernet/amazon/ena/ena_netdev.c
@@ -4325,13 +4325,15 @@ static int ena_probe(struct pci_dev *pdev, const struct pci_device_id *ent)
  
  /*/
  
-/* ena_remove - Device Removal Routine
+/* __ena_shutoff - Helper used in both PCI remove/shutdown routines
   * @pdev: PCI device information struct
+ * @shutdown: Is it a shutdown operation? If false, means it is a removal
   *
- * ena_remove is called by the PCI subsystem to alert the driver
- * that it should release a PCI device.
+ * __ena_shutoff is a helper routine that does the real work on shutdown and
+ * removal paths; the difference between those paths is with regards to whether
+ * dettach or unregister the netdevice.
   */
-static void ena_remove(struct pci_dev *pdev)
+static void __ena_shutoff(struct pci_dev *pdev, bool shutdown)
  {
struct ena_adapter *adapter = pci_get_drvdata(pdev);
struct ena_com_dev *ena_dev;
@@ -4350,13 +4352,17 @@ static void ena_remove(struct pci_dev *pdev)
  
  	cancel_work_sync(>reset_task);
  
-	rtnl_lock();
+   rtnl_lock(); /* lock released inside the below if-else block */
ena_destroy_device(adapter, true);
-   rtnl_unlock();
-
-   unregister_netdev(netdev);
-
-   free_netdev(netdev);
+   if (shutdown) {
+   netif_device_detach(netdev);
+   dev_close(netdev);
+   rtnl_unlock();
+   } else {
+   rtnl_unlock();
+   unregister_netdev(netdev);
+   free_netdev(netdev);
+   }
  
  	ena_com_rss_destroy(ena_dev);
  
@@ -4371,6 +4377,30 @@ static void ena_remove(struct pci_dev *pdev)
vfree(ena_dev);
  }
  
+/* ena_remove - Device Removal Routine
+ * @pdev: PCI device information struct
+ *
+ * ena_remove is called by the PCI subsystem to alert the driver
+ * that it should release a PCI device.
+ */
+
+static void ena_remove(struct pci_dev *pdev)
+{
+   __ena_shutoff(pdev, false);
+}
+
+/* ena_shutdown - Device Shutdown Routine
+ * @pdev: PCI device information struct
+ *
+ * ena_shutdown is called by the PCI subsystem to alert the driver that
+ * a shutdown/reboot (or kexec) is happening and device m

Re: About kexec issues in AWS nitro instances (RH bz 1758323)

2020-03-23 Thread Bhupesh Sharma
Hi Guilherme,

On Mon, Mar 23, 2020 at 8:16 PM Guilherme G. Piccoli
 wrote:
>
> On 22/03/2020 18:16, Bhupesh Sharma wrote:
> > Hello Guilherme,
> >
> > On Fri, Mar 20, 2020 at 9:10 PM Guilherme G. Piccoli
> >  wrote:
> >
> > Thanks for writing again. I was caught up in trying several other
> > suggestions/code-snippets to further debug this.
> > I tried several combinations - turning iommu off, turning off swiotlb
> > in the kexec kernel and testing various combinations with
> > retain_initrd added to the kexec kernel's bootargs.
> >
> > But nothing seems to fix the nested repetitive kexec reboot attempts
> > on the aws t3 machines I have. It just becomes better on a few instances
> > (i.e. the kexec reboots would survive around 10 nested repetitive
> > attempts), while on the other(s) the failure can be seen quite
> > frequently (after ~3 kexec reboot attempts).
>
> Hi Bhupesh, thanks for the tests! Indeed, this problem is difficult to
> prevent with those parameters, and it's quite interesting to see how it
> may vary among instances.

Indeed.

> > [...]
> > This is really good debugging, and a good resulting patch.
> > I ran ~60 repetitive kexec attempts last night and
> > repeated the same this morning, and
> > the issue seems to be fixed for me with upstream kernel 5.6.0-rc6+
> > with this patch.
> >
> > I am leaving a test running with RHEL kernel + this patch overnight
> > and will have more updates to share by tomorrow morning.
>
> Thanks a lot =)
> I couldn't fail to give due credit to my friend Gavin Shan for the great
> suggestion that resulted in the patch! Let me know your results with the
> patch Bhupesh, and your Tested-by on it is much appreciated.
>
>
> >
> >> Bhupesh, I've noticed that suddenly the Red Hat bugzilla got private -
> >
> > Oops. I will check.
> >
> >> is it okay to add me in CC list so I can see it?
> >
> > Sure. I tried doing it, but it seems Bugzilla is not happy, as it keeps
> > complaining that you are not registered on BZ.
> > I will try to find out internally how to get around the issue.
> >
>
> Great! If you need me to sign-up in Bugzilla, I can do it. Just let me
> know the steps and I'd be glad in doing that.

Yes, please. I checked internally. If you can sign up for Bugzilla, I
can directly add you to the Cc field of the Bugzilla work-item.

> >> Thanks for all the collaboration, I hope the issue was figured and solved!
> >
> > Sure. Thanks a lot for your inputs and trying the suggestions I posted
> > on the Bugzilla ticket.
> > I will soon share an update with RHEL/Fedora kernel kexec tests with
> > this patch applied and also reply with a Tested-by for the upstream
> > patch in the relevant thread.
> >
> > Thanks,
> > Bhupesh
> >
>
> Thank you, I appreciate the tests and collaboration =)
> Cheers,

No problem. The good news is that two runs of ~200 nested kexec
reboots each also worked with the RHEL/Fedora kernel + your patch on the
AWS t3 instance for me.

So, this looks like a really good patch to have upstream. Thanks a lot
for sharing and working on it.

I will go ahead and add my Tested-by for the upstream patch as well.

Thanks for all your help,
Bhupesh




Re: About kexec issues in AWS nitro instances (RH bz 1758323)

2020-03-22 Thread Bhupesh Sharma
Hello Guilherme,

On Fri, Mar 20, 2020 at 9:10 PM Guilherme G. Piccoli
 wrote:

Thanks for writing again. I was caught up in trying several other
suggestions/code-snippets to further debug this.
I tried several combinations - turning iommu off, turning off swiotlb
in the kexec kernel and testing various combinations with
retain_initrd added to the kexec kernel's bootargs.

But nothing seems to fix the nested repetitive kexec reboot attempts
on the aws t3 machines I have. It just becomes better on a few instances
(i.e. the kexec reboots would survive around 10 nested repetitive
attempts), while on the other(s) the failure can be seen quite
frequently (after ~3 kexec reboot attempts).

> Well, it seems we have some good results with this patch [0] - the idea
> behind the issue is that ena network driver has no PCI shutdown()
> handler, which would be called to gently quiesce the device before the
> kexec. The PCI stack in this case clears the master bit of the device
> configuration space, effectively stopping all the DMA transactions. But
> then, when the system boots the kexec'ed kernel, the network device
> firmware may send a memory write regarding that stopped DMA transaction
> (that is now invalid), corrupting some random kernel memory area.
>
> I've run 1000 kexec tests with mainline (5.6-rc5) + this patch and no
> failures were observed. Also, I'm running a test with Ubuntu 5.3 kernel
> + this patch and achieved > 450 runs now, with no failures (test is
> ongoing).
>
> I've tried to dump the initrd content (could be useful now to identify
> the corruption signature, maybe matching some ena admin queue periodic
> task) but I had trouble collecting the dmesg in case of failure. It gets
> huge and requires a big ramoops allocation, which unfortunately prevents
> the issue from happening (I guess the corruption ends-up happening in
> the ramoops reserved area, not initrd area anymore).

This is really good debugging, and a good resulting patch.
I ran ~60 repetitive kexec attempts last night and
repeated the same this morning, and
the issue seems to be fixed for me with upstream kernel 5.6.0-rc6+
with this patch.

I am leaving a test running with RHEL kernel + this patch overnight
and will have more updates to share by tomorrow morning.

> Bhupesh, I've noticed that suddenly the Red Hat bugzilla got private -

Oops. I will check.

> is it okay to add me in CC list so I can see it?

Sure. I tried doing it, but it seems Bugzilla is not happy, as it keeps
complaining that you are not registered on BZ.
I will try to find out internally how to get around the issue.

> Thanks for all the collaboration, I hope the issue was figured and solved!

Sure. Thanks a lot for your inputs and trying the suggestions I posted
on the Bugzilla ticket.
I will soon share an update with RHEL/Fedora kernel kexec tests with
this patch applied and also reply with a Tested-by for the upstream
patch in the relevant thread.

Thanks,
Bhupesh




Re: [PATCH makedumpfile] Align PMD_SECTION_MASK with PHYS_MASK

2020-03-17 Thread Bhupesh Sharma
On Wed, Mar 18, 2020 at 2:35 AM Michal Suchánek  wrote:
>
> On Wed, Mar 18, 2020 at 01:49:05AM +0530, Bhupesh Sharma wrote:
> > On Wed, Mar 18, 2020 at 1:05 AM Michal Suchánek  wrote:
> > >
> > > On Tue, Mar 17, 2020 at 02:14:22PM +, HAGIO KAZUHITO(萩尾 一仁) wrote:
> > > > Hi Michal,
> > > >
> > > > Thank you for the patch.
> > > >
> > > > > -Original Message-
> > > > > Reportedly on some arm64 systems makedumpfile loops forever exhausting
> > > > > all memory when filtering kernel core. It turns out the reason is it
> > > > > cannot resolve some addresses because the PMD mask is wrong. When
> > > > > physical address mask allows up to 48bits pmd mask should allow the
> > > > > same.
> > > > > I suppose you would need a system that needs physical addresses over 1TB
> > > > > to be able to reproduce this. This may be either because you have a lot
> > > > > of memory or because the firmware mapped some memory above 1TB for some
> > > > > reason.
> > > > >
> > > > > Signed-off-by: Michal Suchanek 
> > > > > ---
> > > > >  arch/arm64.c | 2 +-
> > > > >  1 file changed, 1 insertion(+), 1 deletion(-)
> > > > >
> > > > > diff --git a/arch/arm64.c b/arch/arm64.c
> > > > > index 43164ccc32d4..54d60b440850 100644
> > > > > --- a/arch/arm64.c
> > > > > +++ b/arch/arm64.c
> > > > > @@ -81,7 +81,7 @@ static unsigned long kimage_voffset;
> > > > >   * Remove the highest order bits that are not a part of the
> > > > >   * physical address in a section
> > > > >   */
> > > > > -#define PMD_SECTION_MASK   ((1UL << 40) - 1)
> > > > > +#define PMD_SECTION_MASK   ((1UL << PHYS_MASK_SHIFT) - 1)
> > > > >
> > > > >  #define PMD_TYPE_MASK  3
> > > > >  #define PMD_TYPE_SECT  1
> > > > > --
> > > > > 2.23.0
> > > > >
> > > >
> > > > Then I'd prefer to remove PMD_SECTION_MASK and use PHYS_MASK instead.
> > > > Is it OK?  Keeping PMD_SECTION_MASK looks a little confusing to me.
> > >
> > > This code will need to be changed for 52bit support. It remains to be
> > > seen if the mask will be still the same after that. I would go with just
> > > the minimal fix for now to not complicate things.
> >
> > Exactly. I am planning to send out the latest refresh of the kernel
> > and makedumpfile changes for 52-bit makedumpfile/crash support this
> > week.
> >
> > If we can wait for that, I think it would be better, as the code
> > changes/names would be more streamlined and closer to Linux
> > conventions.
> >
> > Please let me know if that makes sense.
>
> I think both are useful. This is a minimal patch that can be applied to
> historical versions of makedumpfile in distributions. This seems to have
> been broken for quite a while already.
>
> And while 52bit support is nice I don't have the hardware to test it so
> it is obviously not that useful for me and many other makedumpfile
> users.

Well, the 52-bit changes will still support older CPUs which don't
support the 52-bit ARMv8.2 extensions.
Also, as we discussed in the review of the last version, they will
support older kernel + makedumpfile combinations, as we need to support
them as well.

In fact, that would be one of the major changes in the latest respin.

However, if Kazu is OK with taking this fix, I have no issues with it
either.

Thanks,
Bhupesh
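The effect of the one-line makedumpfile change can be shown with a small sketch. It assumes a 48-bit physical address space (as in the patch description) and 2MiB sections (4KiB granule); `section_base()` is a hypothetical helper, not makedumpfile's code. A section descriptor pointing just above 1TB, with an upper attribute bit (bit 54) set, is resolved with the old 40-bit mask and with the `PHYS_MASK_SHIFT`-based one:

```c
#include <assert.h>
#include <stdint.h>

#define PHYS_MASK_SHIFT      48                            /* assumption: 48-bit PA */
#define OLD_PMD_SECTION_MASK ((1ULL << 40) - 1)            /* before the patch */
#define NEW_PMD_SECTION_MASK ((1ULL << PHYS_MASK_SHIFT) - 1)
#define SECTION_SHIFT        21                            /* assumption: 2MiB sections */

/* Hypothetical helper: strip upper attribute bits with the section mask,
 * then clear the low attribute/offset bits below the section size. */
static uint64_t section_base(uint64_t desc, uint64_t section_mask)
{
	uint64_t low_bits = (1ULL << SECTION_SHIFT) - 1;

	return desc & section_mask & ~low_bits;
}
```

With the old mask, the physical address silently wraps into the first 1TB (0x10000200000 becomes 0x200000), which is consistent with the unresolvable addresses described in the patch; the new mask keeps bit 40 while still clearing bit 54.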




Re: [PATCH makedumpfile] Align PMD_SECTION_MASK with PHYS_MASK

2020-03-17 Thread Bhupesh Sharma
On Wed, Mar 18, 2020 at 1:05 AM Michal Suchánek  wrote:
>
> On Tue, Mar 17, 2020 at 02:14:22PM +, HAGIO KAZUHITO(萩尾 一仁) wrote:
> > Hi Michal,
> >
> > Thank you for the patch.
> >
> > > -Original Message-
> > > Reportedly on some arm64 systems makedumpfile loops forever exhausting
> > > all memory when filtering kernel core. It turns out the reason is it
> > > cannot resolve some addresses because the PMD mask is wrong. When
> > > physical address mask allows up to 48bits pmd mask should allow the
> > > same.
> > > I suppose you would need a system that needs physical addresses over 1TB
> > > to be able to reproduce this. This may be either because you have a lot
> > > of memory or because the firmware mapped some memory above 1TB for some
> > > reason.
> > >
> > > Signed-off-by: Michal Suchanek 
> > > ---
> > >  arch/arm64.c | 2 +-
> > >  1 file changed, 1 insertion(+), 1 deletion(-)
> > >
> > > diff --git a/arch/arm64.c b/arch/arm64.c
> > > index 43164ccc32d4..54d60b440850 100644
> > > --- a/arch/arm64.c
> > > +++ b/arch/arm64.c
> > > @@ -81,7 +81,7 @@ static unsigned long kimage_voffset;
> > >   * Remove the highest order bits that are not a part of the
> > >   * physical address in a section
> > >   */
> > > -#define PMD_SECTION_MASK   ((1UL << 40) - 1)
> > > +#define PMD_SECTION_MASK   ((1UL << PHYS_MASK_SHIFT) - 1)
> > >
> > >  #define PMD_TYPE_MASK  3
> > >  #define PMD_TYPE_SECT  1
> > > --
> > > 2.23.0
> > >
> >
> > Then I'd prefer to remove PMD_SECTION_MASK and use PHYS_MASK instead.
> > Is it OK?  Keeping PMD_SECTION_MASK looks a little confusing to me.
>
> This code will need to be changed for 52bit support. It remains to be
> seen if the mask will be still the same after that. I would go with just
> the minimal fix for now to not complicate things.

Exactly. I am planning to send out the latest refresh of the kernel
and makedumpfile changes for 52-bit makedumpfile/crash support this
week.

If we can wait for that, I think it would be better, as the code
changes/names would be more streamlined and closer to the Linux
conventions.
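To illustrate why a 40-bit mask loses section addresses above 1TB, here is a minimal sketch of a section (block) PMD translation. It assumes a 48-bit PHYS_MASK_SHIFT and 2MB sections (4K pages); the helper section_paddr() and the sample PMD value are hypothetical, and makedumpfile's actual code in arch/arm64.c may differ in detail.

```c
#include <stdint.h>

/* Assumed values for illustration: 48-bit PA, 4K pages => 2MB sections. */
#define PHYS_MASK_SHIFT      48
#define SECTION_SHIFT        21
#define SECTION_SIZE         (1ULL << SECTION_SHIFT)

/* The original 40-bit mask and the PHYS_MASK_SHIFT-based one from the patch. */
#define PMD_SECTION_MASK_OLD ((1ULL << 40) - 1)
#define PMD_SECTION_MASK_NEW ((1ULL << PHYS_MASK_SHIFT) - 1)

/*
 * Hypothetical helper: translate vaddr through a section PMD entry.
 * pa_mask strips the upper attribute bits; ~(SECTION_SIZE - 1) strips
 * the low attribute bits so only the section base remains.
 */
static uint64_t section_paddr(uint64_t pmd, uint64_t vaddr, uint64_t pa_mask)
{
	uint64_t base = pmd & pa_mask & ~(SECTION_SIZE - 1);

	/* Add the offset of vaddr within the 2MB section. */
	return base + (vaddr & (SECTION_SIZE - 1));
}
```

For a sample entry ((1ULL << 54) | 0x12000000000ULL | 0x711, i.e. a section at physical 0x12000000000 with some attribute bits set) and vaddr 0xffff000000012345, the 40-bit mask also drops bit 40 and yields 0x2000012345, while the PHYS_MASK_SHIFT-based mask yields the expected 0x12000012345.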

Please let me know if that makes sense.

Thanks,
Bhupesh




Re: QUESTION : dracut debugging for kdump

2020-03-09 Thread Bhupesh Sharma
On Tue, Mar 10, 2020 at 1:18 AM John Donnelly
 wrote:
>
>
>
> > On Mar 9, 2020, at 1:51 PM, Bhupesh Sharma  wrote:
> >
> > Hi John,
> >
> > On Mon, Mar 9, 2020 at 10:38 PM John Donnelly
> >  wrote:
> >>
> >> Hi kexec/kdump  team.
> >>
> >> I am not sure if this is the appropriate dlist to ask .  If not ,  I 
> >> apologize .
> >>
> >>
> >> I am having difficulties mounting a iSCSI target under kdump .
> >
> > We have had several known issues with iSCSI targets under kdump which
> > have been fixed, so just wanted to check which kexec-tools version you
> > are using: upstream or fedora?
> >
> > Can you please share the output of:
> > $ kexec -v
>
> # kexec -v
> kexec-tools 2.0.15
> /usr/sbin/kexec
> # rpm -qf   `which kexec `
> kexec-tools-2.0.15-33.0.9.el7.x86_64

Ok, so it seems to me that you are using a RHEL-7 kexec-tools version
for x86_64.
I am not sure this list is the appropriate forum for that.

I would suggest trying a newer RHEL-7 kexec-tools version (which has
several iSCSI-related kexec-tools issues fixed), or searching the RH
Bugzilla for iSCSI-related issues and fixes (for example:
<https://bugzilla.redhat.com/show_bug.cgi?id=1566331>).

If you suspect that this could be a new issue or a regression, I would
suggest that you open a new BZ against the RHEL-7 kexec-tools
component, so that someone from RH can help you with it.

> >> The target is discovered ,
> >>
> >> [ 154.118729] sd 2:0:0:0: [sda] Attached SCSI disk
> >> [ OK ] Found device ZFS_Storage_7350 4.
> >> Starting File System Check on /dev/...a-5ef4-4838-b5e7-dff852dfc673...
> >> [ OK ] Started File System Check on 
> >> /dev/d...46a-5ef4-4838-b5e7-dff852dfc673.
> >> [ 275.766578] dracut-initqueue[420]: Warning: dracut-initqueue timeout - 
> >> startinging
> >>
> >>
> >> The fsck step never finishes, and dracut timeout eventually drops into the 
> >> rescue shell.
> >>
> >>
> >> I can see it is attempting to fsck root from the UUID :
> >>
> >>
> >> # blkid | grep b5
> >> /dev/sda4: UUID="fb01846a-5ef4-4838-b5e7-dff852dfc673" TYPE="xfs" 
> >> PARTUUID="f8831f2d-b2c6-4b27-97db-0513e4d3fe42"
> >> 3:02
> >> " /dev/sda4 30G 3.5G 27G 12% /
> >>
> >>
> >> And I mount root manually and look around.
> >>
> >> # kdump:/# mkdir /mnt
> >> # kdump:/# mount /dev/sda4 /mnt
> >> [ 542.856035] SGI XFS with ACLs, security attributes, realtime, scrub, 
> >> repair, no debug enabled
> >> [ 542.885016] XFS (sda4): Mounting V4 Filesystem
> >> [ 542.938507] XFS (sda4): Starting recovery (logdev: internal)
> >> [ 542.994720] XFS (sda4): Ending recovery (logdev: internal)
> >> [ 543.018918] xfs filesystem being mounted at /mnt supports timestamps 
> >> until 2038 (0x7fff)
> >> kdump:/#
> >> kdump:/#
> >> kdump:/# chroot /mnt/ /usr/bin/bash
> >> bash-4.2#
> >>
> >>   Now my iSCSI  target is mounted as /mnt
> >>
> >>
> >> Is there a way start dracut so it stops BEFORE the fsck step  ?  Not after 
> >> it fails ?
> >
> > I think you can try using 'rd.break' dracut option:
> >
> >   
> > rd.break={cmdline|pre-udev|pre-trigger|initqueue|pre-mount|mount|pre-pivot|cleanup}
> >   drop to a shell on defined breakpoint
> >
>
>   Thanks for the hint !  I will try these.

Ok.

Regards,
Bhupesh




Re: QUESTION : dracut debugging for kdump

2020-03-09 Thread Bhupesh Sharma
Hi John,

On Mon, Mar 9, 2020 at 10:38 PM John Donnelly
 wrote:
>
> Hi kexec/kdump  team.
>
> I am not sure if this is the appropriate dlist to ask .  If not ,  I 
> apologize .
>
>
> I am having difficulties mounting a iSCSI target under kdump .

We have had several known issues with iSCSI targets under kdump which
have since been fixed, so I just wanted to check which kexec-tools
version you are using: upstream or Fedora?

Can you please share the output of:
$ kexec -v

> The target is discovered ,
>
> [ 154.118729] sd 2:0:0:0: [sda] Attached SCSI disk
> [ OK ] Found device ZFS_Storage_7350 4.
> Starting File System Check on /dev/...a-5ef4-4838-b5e7-dff852dfc673...
> [ OK ] Started File System Check on /dev/d...46a-5ef4-4838-b5e7-dff852dfc673.
> [ 275.766578] dracut-initqueue[420]: Warning: dracut-initqueue timeout - 
> startinging
>
>
> The fsck step never finishes, and dracut timeout eventually drops into the 
> rescue shell.
>
>
> I can see it is attempting to fsck root from the UUID :
>
>
> # blkid | grep b5
> /dev/sda4: UUID="fb01846a-5ef4-4838-b5e7-dff852dfc673" TYPE="xfs" 
> PARTUUID="f8831f2d-b2c6-4b27-97db-0513e4d3fe42"
> 3:02
> " /dev/sda4 30G 3.5G 27G 12% /
>
>
> And I mount root manually and look around.
>
> # kdump:/# mkdir /mnt
> # kdump:/# mount /dev/sda4 /mnt
> [ 542.856035] SGI XFS with ACLs, security attributes, realtime, scrub, 
> repair, no debug enabled
> [ 542.885016] XFS (sda4): Mounting V4 Filesystem
> [ 542.938507] XFS (sda4): Starting recovery (logdev: internal)
> [ 542.994720] XFS (sda4): Ending recovery (logdev: internal)
> [ 543.018918] xfs filesystem being mounted at /mnt supports timestamps until 
> 2038 (0x7fff)
> kdump:/#
> kdump:/#
> kdump:/# chroot /mnt/ /usr/bin/bash
> bash-4.2#
>
>Now my iSCSI  target is mounted as /mnt
>
>
> Is there a way start dracut so it stops BEFORE the fsck step  ?  Not after it 
> fails ?

I think you can try the 'rd.break' dracut option:

   rd.break={cmdline|pre-udev|pre-trigger|initqueue|pre-mount|mount|pre-pivot|cleanup}
   drop to a shell on defined breakpoint

You can specify it in the kdump bootargs by modifying a system
configuration file (e.g. '/etc/sysconfig/kdump' on Fedora/RHEL
systems).

For example, you can use 'rd.break=cmdline' to drop to the dracut
shell and see whether you can stop it before the 'fsck' step.

Thanks,
Bhupesh




Re: About kexec issues in AWS nitro instances (RH bz 1758323)

2020-03-04 Thread Bhupesh Sharma
Hi,

On Mon, Mar 2, 2020 at 1:39 PM Dave Young  wrote:
>
> On 03/02/20 at 12:20am, Bhupesh Sharma wrote:
> > Hi Guilherme,
> >
> > On Sat, Feb 29, 2020 at 10:37 PM Guilherme G. Piccoli
> >  wrote:
> > >
> > > Hi Bhupesh and Dave (and everybody CC'ed here), I'm Guilherme Piccoli
> > > and I'm working in the same issue observed in RH bugzilla 1758323 [0] -
> > > or at least, it seems to be the the same heh
> >
> > Ok.
> >
> > > The reported issue in my case was that the 2nd kexec fails on Nitro
> > > instanced, and indeed it's reproducible. More than this, it shows as an
> > > initrd corruption. I've found 2 workarounds, using the "new" kexec
> > > syscall (by doing kexec -s -l) and keep the initrd memory "un-freed",
> > > using the kernel parameter "retain_initrd".
> >
> > I have a couple of questions:
> > - How do you conclude that you see an initrd corruption across kexec?
> > Do you print the initial hex contents of initrd across kexec?
>
> I'm also interested if any of you can dump the initrd memory in kernel
> printk log, and then save to somewhere to compare with the original
> initrd content.

I ran several overnight tests on the AWS machine and can confirm that
the kexec reboot failure (over multiple tries) can be seen even with
'retain_initrd' in the kernel bootargs or when using kexec_file_load
('kexec -s -l') instead of plain kexec_load ('kexec -l').

- Here are my observations:

1. Adding 'retain_initrd' to the bootargs helps delay the kexec
reboot failure (when successive kexec reboots are executed), but the
(possible?) initrd corruption is still seen (as per the panic logs
from the kexec kernel).

2. I printed the first 4M of the initrd file via kernel code (in both the
primary and the kexec kernel, see
<https://bugzilla.redhat.com/attachment.cgi?id=1667523> and
<https://bugzilla.redhat.com/attachment.cgi?id=1667521>) and,
interestingly, the first 4M of contents are _exactly_ identical for the
primary and kexec kernels, even though we see a (possible?) initrd
corruption. See the logs below from the kexec kernel in case of panic:

[4.229170] Call Trace:
[4.234379]  dump_stack+0x5c/0x80
[4.239840]  panic+0xe7/0x2a9
[4.245291]  do_exit.cold.22+0x59/0x81
[4.251025]  do_group_exit+0x3a/0xa0
[4.256784]  __x64_sys_exit_group+0x14/0x20
[4.262905]  do_syscall_64+0x5b/0x1a0
[4.268537]  entry_SYSCALL_64_after_hwframe+0x65/0xca
[4.275784] RIP: 0033:0x7ff749106e2e
[4.281469] Code: Bad RIP value.
[4.286981] RSP: 002b:7fffb6d707f8 EFLAGS: 0206 ORIG_RAX:
00e7
[4.298381] RAX: ffda RBX: 7ff74910f528 RCX: 7ff749106e2e
[4.305616] RDX: 007f RSI: 003c RDI: 007f
[4.313064] RBP: 7ff749306000 R08: 00e7 R09: 7fffb6d70708
[4.320369] R10:  R11: 0206 R12: 
[4.327671] R13: 0022 R14: 7ff749306148 R15: 7ff749306030
[4.335396] Kernel Offset: 0x2a40 from 0x8100
(relocation range: 0x8000-0xbfff)
[4.348002] ---[ end Kernel panic - not syncing: Attempted to kill
init! exitcode=0x7f00
[4.348002]  ]---
2020-03-03T09:01:27+00:00

3. So the root cause seems to be something else. I will do some more
debugging to pin it down.

4. I added two scripts (via
<https://bugzilla.redhat.com/attachment.cgi?id=1667561> and
<https://bugzilla.redhat.com/attachment.cgi?id=1667560>) which provide
an automated reproducer.

This reproducer can be run on the host machine and launches repeated
kexec reboots on the AWS machine.

Normally, approximately 5-12 runs of the master script (i.e. kexec
reboots) lead to a panic in the kexec kernel, which indicates a
(possible?) initrd corruption.
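Point 2 relies on comparing hex dumps of the first 4M of the initrd. As an alternative sketch (an assumption, not what was done here), each kernel could log a single checksum of the same region and the two values could be compared. FNV-1a is swapped in purely for illustration, the kernel-side hook that exposes the initrd buffer is not shown, and matching checksums are strong evidence, not proof, that the bytes are identical.

```c
#include <stddef.h>
#include <stdint.h>

/* 64-bit FNV-1a over a byte buffer (e.g. the first N bytes of the initrd). */
static uint64_t fnv1a64(const void *buf, size_t len)
{
	const unsigned char *p = buf;
	uint64_t h = 0xcbf29ce484222325ULL;	/* FNV-1a 64-bit offset basis */

	for (size_t i = 0; i < len; i++) {
		h ^= p[i];			/* fold in one byte */
		h *= 0x100000001b3ULL;		/* FNV-1a 64-bit prime */
	}
	return h;
}
```

Logging fnv1a64(initrd_start, 4 << 20) (initrd_start being a hypothetical pointer to the mapped initrd) in both kernels reduces the comparison to two 64-bit values, and the same approach scales to the full initrd size rather than just the first 4M.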

@Guilherme: Can you please help verify these observations on your setup
(with both the Amazon and upstream kernels) using the automated test
script? Thanks.

Regards,
Bhupesh




Re: About kexec issues in AWS nitro instances (RH bz 1758323)

2020-03-01 Thread Bhupesh Sharma
Hi Guilherme,

On Sat, Feb 29, 2020 at 10:37 PM Guilherme G. Piccoli
 wrote:
>
> Hi Bhupesh and Dave (and everybody CC'ed here), I'm Guilherme Piccoli
> and I'm working in the same issue observed in RH bugzilla 1758323 [0] -
> or at least, it seems to be the the same heh

Ok.

> The reported issue in my case was that the 2nd kexec fails on Nitro
> instanced, and indeed it's reproducible. More than this, it shows as an
> initrd corruption. I've found 2 workarounds, using the "new" kexec
> syscall (by doing kexec -s -l) and keep the initrd memory "un-freed",
> using the kernel parameter "retain_initrd".

I have a couple of questions:
- How do you conclude that you see an initrd corruption across kexec?
Do you print the initial hex contents of initrd across kexec?
- Also do you try repeated/nested kexec and see initrd corruption
after several kexec reboot attempts?

I have the following observations on my Nitro instance:
- With the upstream kernel (5.6.0-rc3), I am seeing that repeated
kexec attempts, even with 'kexec -s -l' and with 'retain_initrd' in
the kernel bootargs, can lead to kexec reboot failures, although the
frequency of the failures goes down drastically compared to a vanilla
'kexec -s' invocation.

Here are the AWS console logs on the Nitro console with kernel
5.6.0-rc3+ on an x86_64 instance when 'kexec -s -l' or 'kexec -l'
with 'retain_initrd' fails:

login: [   80.077578] Unregister pv shared memory for cpu 1
[   80.081755] Unregister pv shared memory for cpu 0
[   80.209953] kexec_core: Starting new kernel
2020-02-29T19:20:16+00:00
<.. no console logs after this (even after adding earlycon) ..>

- Note that there are no further console logs from the kexec kernel in
the failure case, so I am not sure whether this was caused by some
other issue or only by the initrd corruption.

- With the above, one needs to execute kexec reboots repeatedly;
normally a kexec reboot failure shows up around the 11th-15th kexec
reboot.

> I've noticed that your interesting investigation in the BZ led to
> SWIOTLB as a potential culprit, but trying with "swiotlb=noforce" or
> even "iommu=off" didn't help me.
> Also, worth notice a weird behavior: seems Amazon Linux 2 (based on
> kernel 4.14) sometimes works, or better saying, in some instances it
> works. I have 2x t3.large instances, in one of them I can make the
> Amazon Linux works (and to isolate potential out-of-tree patches, I've
> used Amazon Linux 2 config file and built a mainline 4.14, which also
> works in that particular instance).

That's good news. I am not sure about Amazon Linux (I don't know
whether its source is available without buying a license).

I can share that "swiotlb=noforce" worked for me on one instance, but
the same was not reproducible on other Nitro instances, so I think the
underlying issue is initrd corruption, but I have not been able to
pinpoint the root cause of the corruption yet.

BTW, have you been able to try the following kexec-tools fix as well
(see [1]), to check whether it fixes the initrd corruption with
'kexec -s -l' and 'kexec -l' (i.e. without using the 'retain_initrd'
bootarg)?

[1]. http://lists.infradead.org/pipermail/kexec/2020-February/024531.html

> The reason for this email is to ask if you managed to figure the issue
> root-cause, or have some leads. I continue the debug here, but it's a
> bit difficult without access to AWS hypervisor (and it seems like a
> hypervisor issue for me). The fact that preserving the initrd memory
> prevents the problem seems to indicate that after freeing such
> high-address memory, the hypervisor somewhat manages to use that
> regardless if some other code is using that...ending up corrupting the
> initrd.
>
> I've also looped the kexec list in order to grow the audience, maybe
> somebody already faced that kind of issues and have some ideas.
> A collaboration in this debug would be greatly appreciate by me, it's a
> quite interesting issue and I'm looking forward to understand what's
> going on.
>
> Thanks in advance,

Thanks a lot for your email.
Let's continue discussing and hopefully we will have a fix for the issue soon.

Regards,
Bhupesh




Re: [PATCH v2] kexec: support parsing the string "Reserved" to get the correct e820 reserved region

2020-02-23 Thread Bhupesh Sharma
Hi Lianbo,

On Mon, Feb 24, 2020 at 12:07 PM Lianbo Jiang  wrote:
>
> When loading kernel and initramfs for kexec, kexec-tools could get the
> e820 reserved region from "/proc/iomem" in order to rebuild the e820
> ranges for kexec kernel, but there may be the string "Reserved" in the
> "/proc/iomem", which caused the failure of parsing. For example:
>
>  #cat /proc/iomem|grep -i reserved
> -0fff : Reserved
> 7f338000-7f34dfff : Reserved
> 7f3cd000-8fff : Reserved
> f17f-f17f1fff : Reserved
> fe00- : Reserved
>
> Currently, kexec-tools can not handle the above case because the memcmp()
> is case sensitive when comparing the string.
>
> So, let's fix this corner and make sure that the string "reserved" and
> "Reserved" in the "/proc/iomem" are both parsed appropriately.
>
> Signed-off-by: Lianbo Jiang 
> ---
> Note:
> Please follow up this commit below about kdump fix.
> 1ac3e4a57000 ("kdump: fix an error that can not parse the e820 reserved region")
>
> Changes since v1:
> [1] use strncasecmp() instead of introducing another 'else-if'(
> suggested by Bhupesh)
>
>  kexec/arch/i386/kexec-x86-common.c | 2 +-
>  1 file changed, 1 insertion(+), 1 deletion(-)
>
> diff --git a/kexec/arch/i386/kexec-x86-common.c b/kexec/arch/i386/kexec-x86-common.c
> index 61ea19380ab2..9303704a0714 100644
> --- a/kexec/arch/i386/kexec-x86-common.c
> +++ b/kexec/arch/i386/kexec-x86-common.c
> @@ -90,7 +90,7 @@ static int get_memory_ranges_proc_iomem(struct memory_range **range, int *ranges
> if (memcmp(str, "System RAM\n", 11) == 0) {
> type = RANGE_RAM;
> }
> -   else if (memcmp(str, "reserved\n", 9) == 0) {
> +   else if (strncasecmp(str, "reserved\n", 9) == 0) {
> type = RANGE_RESERVED;
> }
> else if (memcmp(str, "ACPI Tables\n", 12) == 0) {
> --
> 2.17.1
>

Thanks for the changes. V2 seems fine to me, so:

Acked-by: Bhupesh Sharma 




Re: [RESEND PATCH v5 2/5] arm64/crash_core: Export TCR_EL1.T1SZ in vmcoreinfo

2020-02-23 Thread Bhupesh Sharma
Hi Amit,

On Fri, Feb 21, 2020 at 2:36 PM Amit Kachhap  wrote:
>
> Hi Bhupesh,
>
> On 1/13/20 5:44 PM, Bhupesh Sharma wrote:
> > Hi James,
> >
> > On 01/11/2020 12:30 AM, Dave Anderson wrote:
> >>
> >> - Original Message -
> >>> Hi Bhupesh,
> >>>
> >>> On 25/12/2019 19:01, Bhupesh Sharma wrote:
> >>>> On 12/12/2019 04:02 PM, James Morse wrote:
> >>>>> On 29/11/2019 19:59, Bhupesh Sharma wrote:
> >>>>>> vabits_actual variable on arm64 indicates the actual VA space size,
> >>>>>> and allows a single binary to support both 48-bit and 52-bit VA
> >>>>>> spaces.
> >>>>>>
> >>>>>> If the ARMv8.2-LVA optional feature is present, and we are running
> >>>>>> with a 64KB page size; then it is possible to use 52-bits of address
> >>>>>> space for both userspace and kernel addresses. However, any kernel
> >>>>>> binary that supports 52-bit must also be able to fall back to 48-bit
> >>>>>> at early boot time if the hardware feature is not present.
> >>>>>>
> >>>>>> Since TCR_EL1.T1SZ indicates the size offset of the memory region
> >>>>>> addressed by TTBR1_EL1 (and hence can be used for determining the
> >>>>>> vabits_actual value) it makes more sense to export the same in
> >>>>>> vmcoreinfo rather than vabits_actual variable, as the name of the
> >>>>>> variable can change in future kernel versions, but the architectural
> >>>>>> constructs like TCR_EL1.T1SZ can be used better to indicate intended
> >>>>>> specific fields to user-space.
> >>>>>>
> >>>>>> User-space utilities like makedumpfile and crash-utility, need to
> >>>>>> read/write this value from/to vmcoreinfo
> >>>>>
> >>>>> (write?)
> >>>>
> >>>> Yes, also write so that the vmcoreinfo from an (crashing) arm64
> >>>> system can
> >>>> be used for
> >>>> analysis of the root-cause of panic/crash on say an x86_64 host using
> >>>> utilities like
> >>>> crash-utility/gdb.
> >>>
> >>> I read this as as "User-space [...] needs to write to vmcoreinfo".
> >
> > That's correct. But for writing to vmcore dump in the kdump kernel, we
> > need to read the symbols from the vmcoreinfo in the primary kernel.
> >
> >>>>>> for determining if a virtual address lies in the linear map range.
> >>>>>
> >>>>> I think this is a fragile example. The debugger shouldn't need to know
> >>>>> this.
> >>>>
> >>>> Well that the current user-space utility design, so I am not sure we
> >>>> can
> >>>> tweak that too much.
> >>>>
> >>>>>> The user-space computation for determining whether an address lies in
> >>>>>> the linear map range is the same as we have in kernel-space:
> >>>>>>
> > >>>>>> #define __is_lm_address(addr) (!(((u64)addr) & BIT(vabits_actual - 1)))
> >>>>>
> >>>>> This was changed with 14c127c957c1 ("arm64: mm: Flip kernel VA
> >>>>> space"). If
> >>>>> user-space
> >>>>> tools rely on 'knowing' the kernel memory layout, they must have to
> >>>>> constantly be fixed
> >>>>> and updated. This is a poor argument for adding this to something that
> >>>>> ends up as ABI.
> >>>>
> >>>> See above. The user-space has to rely on some ABI/guaranteed
> >>>> hardware-symbols which can be
> >>>> used for 'determining' the kernel memory layout.
> >>>
> >>> I disagree. Everything and anything in the kernel will change. The
> >>> ABI rules apply to
> >>> stuff exposed via syscalls and kernel filesystems. It does not apply
> >>> to kernel internals,
> >>> like the memory layout we used yesterday. 14c127c957c1 is a case in
> >>> point.
> >>>
> >>> A debugger trying to rely on this sort of thing would have to play
> >>> catchup whenever it
> >>> changes.
> >>
> >> Exactly.  That's the whole point.
&

Re: [PATCH] kexec: support parsing the string "Reserved" to get the correct e820 reserved region

2020-02-12 Thread Bhupesh Sharma
Hi Lianbo,

Thanks for the patch.

On Wed, Feb 12, 2020 at 6:27 PM Lianbo Jiang  wrote:
>
> When loading kernel and initramfs for kexec, kexec-tools could get the
> e820 reserved region from "/proc/iomem" in order to rebuild the e820
> ranges for kexec kernel, but there may be the string "Reserved" in the
> "/proc/iomem", which caused the failure of parsing. For example:
>
>  #cat /proc/iomem|grep -i reserved
> -0fff : Reserved
> 7f338000-7f34dfff : Reserved
> 7f3cd000-8fff : Reserved
> f17f-f17f1fff : Reserved
> fe00- : Reserved
>
> Currently, kexec-tools can not handle the above case because the memcmp()
> is case sensitive when comparing the string.
>
> So, let's fix this corner and make sure that the string "reserved" and
> "Reserved" in the "/proc/iomem" are both parsed appropriately.
>
> Signed-off-by: Lianbo Jiang 
> ---
> Note:
> Please follow up this commit below about kdump fix.
> 1ac3e4a57000 ("kdump: fix an error that can not parse the e820 reserved region")
>
>  kexec/arch/i386/kexec-x86-common.c | 3 +++
>  1 file changed, 3 insertions(+)
>
> diff --git a/kexec/arch/i386/kexec-x86-common.c b/kexec/arch/i386/kexec-x86-common.c
> index 61ea19380ab2..86bcc8c0677e 100644
> --- a/kexec/arch/i386/kexec-x86-common.c
> +++ b/kexec/arch/i386/kexec-x86-common.c
> @@ -93,6 +93,9 @@ static int get_memory_ranges_proc_iomem(struct memory_range **range, int *ranges
> else if (memcmp(str, "reserved\n", 9) == 0) {
> type = RANGE_RESERVED;
> }
> +   else if (memcmp(str, "Reserved\n", 9) == 0) {
> +   type = RANGE_RESERVED;
> +   }

Instead of introducing another 'else-if' case here, can we use
strncasecmp()?

It compares the two input strings (say s1 and s2), ignoring the case
of the characters. It also compares at most the first n bytes (so the
calling convention is the same as memcmp()).

This way, we can future-proof the kexec-tools check against whatever
case future kernels use to denote the "Reserved" string.
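A minimal sketch of the suggested classification, with illustrative type names (the real get_memory_ranges_proc_iomem() in kexec/arch/i386/kexec-x86-common.c uses RANGE_RAM, RANGE_RESERVED, RANGE_ACPI and handles more cases). One deliberate deviation from the patch: strncmp() is used instead of memcmp() for the exact-case matches, so that a fixed length never reads past the end of a short input string.

```c
#include <string.h>
#include <strings.h>	/* strncasecmp() (POSIX) */

/* Illustrative stand-ins for the kexec-tools RANGE_* types. */
enum range_type { TYPE_UNKNOWN, TYPE_RAM, TYPE_RESERVED, TYPE_ACPI };

/* Classify the name part of a /proc/iomem line ("start-end : <name>\n"). */
static enum range_type classify_iomem_name(const char *str)
{
	if (strncmp(str, "System RAM\n", 11) == 0)
		return TYPE_RAM;
	/* Case-insensitive: matches both "reserved\n" and "Reserved\n". */
	if (strncasecmp(str, "reserved\n", 9) == 0)
		return TYPE_RESERVED;
	if (strncmp(str, "ACPI Tables\n", 12) == 0)
		return TYPE_ACPI;
	return TYPE_UNKNOWN;
}
```

Because the trailing '\n' is part of the compared prefix, a longer name that merely starts with "Reserved" (e.g. "Reserved region\n") is still not classified as reserved, which keeps the exact-match behaviour of the original memcmp() checks.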

What's your view on this?

Regards,
Bhupesh

> else if (memcmp(str, "ACPI Tables\n", 12) == 0) {
> type = RANGE_ACPI;
> }
> --
> 2.17.1
>




Re: [RFC] printing the final constructed kernel command line

2020-01-13 Thread Bhupesh Sharma

On 01/07/2020 03:40 AM, Eric DeVolder wrote:

Bhupesh,
I'm finding myself slammed with other tasks, if you'd like to kick start 
this patch, then please feel free!

eric


Ok Eric,

Let me try to send patches for this.

Thanks,
Bhupesh



On 12/23/19 12:50 AM, Bhupesh Sharma wrote:
On Thu, Dec 19, 2019 at 11:27 PM Eric DeVolder 
 wrote:


Bhupesh,
Thank you. For the formal patch, would you be ok with a two phase
approach, first where we add in the dbgprintf(),


Sure Eric, I think you can send the patch with the dbgprintf() right
away. It seems a straightforward change, and I believe it should be
acceptable to other reviewers.


and followed later by a consolidation of the --command-line, --append,
--reuse-cmdline option code?


Actually, I did some work on this consolidation a few months ago (at
the request of an arm32 kexec-tools user), but I never got the time to
complete it.

I will try to find some time this week to consolidate these
features and send an RFC patch. I will Cc you on it. Hopefully
that should do the trick.

Thanks,
Bhupesh


On 12/19/19 7:34 AM, Bhupesh Sharma wrote:

Hi Eric,

On 12/19/2019 12:30 AM, Eric DeVolder wrote:

Thanks Bhupesh for the feedback, responses below!
eric

On 12/17/19 1:59 PM, Bhupesh Sharma wrote:

Hi Eric,

On 12/17/2019 02:02 AM, Eric DeVolder wrote:
The --command-line, --append, and --reuse-cmdline options to kexec can
be used in combination to craft a kernel command line for a kernel
loaded via kexec. In addition, the kexec tool may also manipulate
further the command line, eg.  elfcorehdr addition.


Thanks for proposing this change. I have some comments/queries
(see below).



To aid in debugging kdump/kexec related issues, it would be helpful
for kexec to print the final constructed kernel command line argument.


For example, the following simple change (for i386/x86_64):

diff --git a/kexec/arch/i386/x86-linux-setup.c b/kexec/arch/i386/x86-linux-setup.c
index 057ee14..6dc4adc 100644
--- a/kexec/arch/i386/x86-linux-setup.c
+++ b/kexec/arch/i386/x86-linux-setup.c
@@ -57,6 +57,8 @@ void setup_linux_bootloader_parameters_high(
   char *cmdline_ptr;
   unsigned long initrd_base, initrd_addr_max;

+printf("Final kernel cmdline: '%s'\n", cmdline);
+


If we want to add this for debugging purposes, it's better to use
dbgprintf() instead of printf() here. This will make sure that the
debug message is printed only when the '-d' flag is specified while
calling the kexec utility from the command line.
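As a minimal sketch of the gating that dbgprintf() provides, assuming only that debug output is controlled by a global flag set from '-d' (debug_enabled and debug_printf() are hypothetical stand-ins, not the kexec-tools definitions):

```c
#include <stdarg.h>
#include <stdio.h>

/* Hypothetical flag, assumed to be set when kexec is invoked with -d. */
static int debug_enabled;

/* Print to stderr only when debugging is enabled; returns chars written, or 0 if suppressed. */
static int debug_printf(const char *fmt, ...)
{
	va_list ap;
	int n;

	if (!debug_enabled)
		return 0;
	va_start(ap, fmt);
	n = vfprintf(stderr, fmt, ap);
	va_end(ap);
	return n;
}
```

With this shape, the proposed call becomes debug_printf("Final kernel cmdline: '%s'\n", cmdline); and stays silent in normal (non '-d') runs, so kdumpctl output is unchanged unless debugging is requested.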


Yes! I used printf() merely to provide an example of what is possible.


Ok.


   /* Say I'm a boot loader */
   real_mode->loader_type = LOADER_TYPE_KEXEC << 4;

results in the following on a systemd-based system (formatted to fit
in 70 char lines):

% systemctl status -l kdump.service
● kdump.service - Crash recovery kernel arming
 Loaded: loaded (/usr/lib/systemd/system/kdump.service; enabled;
  vendor preset: enabled)
 Active: active (exited) since Mon 2019-12-16 14:59:21 EST;
  2min 53s ago
Process: 14058 ExecStop=/usr/bin/kdumpctl stop (code=exited,
 status=0/SUCCESS)
Process: 14073 ExecStart=/usr/bin/kdumpctl start (code=exited,
 status=0/SUCCESS)
   Main PID: 14073 (code=exited, status=0/SUCCESS)

Dec 16 14:59:18 vm364 kdumpctl[14058]: Stopping kdump: [OK]
Dec 16 14:59:18 vm364 systemd[1]: Stopped Crash recovery kernel arming.
Dec 16 14:59:18 vm364 systemd[1]: Starting Crash recovery kernel arming...
Dec 16 14:59:21 vm364 kdumpctl[14073]: Final kernel cmdline: 'BOOT_IMAGE=
   /vmlinuz-4.14.35-1902.7.3.1.el7uek.x86_64 ro rhgb quiet LANG=en_US.UTF-8
   irqpoll nr_cpus=1 reset_devices cgroup_disable=memory mce=off numa=off
   udev.children-max=2 panic=10 rootflags=nofail acpi_no_memhotplug
   transparent_hugepage=never nokaslr novmcoredd disable_cpu_apicid=0
   elfcorehdr=901492K'
Dec 16 14:59:21 vm364 systemd[1]: Started Crash recovery kernel arming.
Dec 16 14:59:21 vm364 kdumpctl[14073]: kexec: loaded kdump kernel
Dec 16 14:59:21 vm364 kdumpctl[14073]: Starting kdump: [OK]

and the output is also available in /var/log/messages.


Since kdumpctl is a distribution-specific script (used by Fedora/RHEL)
that invokes 'kexec' under the hood, let's discuss the features
supported by 'kexec' here rather than the distribution-specific scripts
(discussions about those are probably better suited to the Fedora kexec
list: ke...@lists.fedoraproject.org)


Agreed, this RFC is for a change to kexec, noting that wrapper
scripts such as kdumpctl are insufficient to provide the functionality
requested.




There might also be an opportunity to consolidate handling of the
kernel command line, as most arch targets have the --command-line,
--append, and --reuse-cmdline options, though each arch independently
codes the support for these options.


This seems like a good idea; more on this below...

Note: Simply printing the cmdline in scripts such as kdump

Re: [RESEND PATCH v5 2/5] arm64/crash_core: Export TCR_EL1.T1SZ in vmcoreinfo

2020-01-13 Thread Bhupesh Sharma

Hi James,

On 01/11/2020 12:30 AM, Dave Anderson wrote:


- Original Message -

Hi Bhupesh,

On 25/12/2019 19:01, Bhupesh Sharma wrote:

On 12/12/2019 04:02 PM, James Morse wrote:

On 29/11/2019 19:59, Bhupesh Sharma wrote:

vabits_actual variable on arm64 indicates the actual VA space size,
and allows a single binary to support both 48-bit and 52-bit VA
spaces.

If the ARMv8.2-LVA optional feature is present, and we are running
with a 64KB page size; then it is possible to use 52-bits of address
space for both userspace and kernel addresses. However, any kernel
binary that supports 52-bit must also be able to fall back to 48-bit
at early boot time if the hardware feature is not present.

Since TCR_EL1.T1SZ indicates the size offset of the memory region
addressed by TTBR1_EL1 (and hence can be used for determining the
vabits_actual value) it makes more sense to export the same in
vmcoreinfo rather than vabits_actual variable, as the name of the
variable can change in future kernel versions, but the architectural
constructs like TCR_EL1.T1SZ can be used better to indicate intended
specific fields to user-space.

User-space utilities like makedumpfile and crash-utility, need to
read/write this value from/to vmcoreinfo


(write?)


Yes, also write so that the vmcoreinfo from an (crashing) arm64 system can
be used for
analysis of the root-cause of panic/crash on say an x86_64 host using
utilities like
crash-utility/gdb.


I read this as as "User-space [...] needs to write to vmcoreinfo".


That's correct. But for writing to vmcore dump in the kdump kernel, we 
need to read the symbols from the vmcoreinfo in the primary kernel.



for determining if a virtual address lies in the linear map range.


I think this is a fragile example. The debugger shouldn't need to know
this.


Well that the current user-space utility design, so I am not sure we can
tweak that too much.


The user-space computation for determining whether an address lies in
the linear map range is the same as we have in kernel-space:

#define __is_lm_address(addr) (!(((u64)addr) & BIT(vabits_actual - 1)))
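As an illustrative sketch of the user-space computation described above: T1SZ occupies bits [21:16] of TCR_EL1, and the region addressed by TTBR1_EL1 spans 2^(64 - T1SZ) bytes, so vabits_actual = 64 - T1SZ. The linear-map check mirrors the __is_lm_address() definition quoted above, which, as noted in the reply, only matches kernels with the flipped VA layout (14c127c957c1) and may change again. The function names are hypothetical.

```c
#include <stdbool.h>
#include <stdint.h>

/* TCR_EL1.T1SZ is the 6-bit field at bits [21:16]. */
static unsigned int t1sz_from_tcr(uint64_t tcr_el1)
{
	return (unsigned int)((tcr_el1 >> 16) & 0x3f);
}

/* TTBR1_EL1 covers 2^(64 - T1SZ) bytes, so the VA size in bits is 64 - T1SZ. */
static unsigned int vabits_from_t1sz(unsigned int t1sz)
{
	return 64u - t1sz;
}

/*
 * Mirrors the quoted __is_lm_address() for the flipped layout: the linear
 * map is the half of the kernel VA range with bit (vabits_actual - 1) clear.
 * Assumes 1 <= vabits_actual <= 64.
 */
static bool is_lm_address(uint64_t addr, unsigned int vabits_actual)
{
	return !(addr & (1ULL << (vabits_actual - 1)));
}
```

For example, T1SZ = 16 gives 48 VA bits and T1SZ = 12 gives 52; with 48 bits, 0xffff000012345678 falls in the linear map while 0xffff800010000000 does not.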


This was changed with 14c127c957c1 ("arm64: mm: Flip kernel VA space"). If
user-space
tools rely on 'knowing' the kernel memory layout, they must have to
constantly be fixed
and updated. This is a poor argument for adding this to something that
ends up as ABI.


See above. The user-space has to rely on some ABI/guaranteed
hardware-symbols which can be
used for 'determining' the kernel memory layout.


I disagree. Everything and anything in the kernel will change. The ABI rules 
apply to
stuff exposed via syscalls and kernel filesystems. It does not apply to kernel 
internals,
like the memory layout we used yesterday. 14c127c957c1 is a case in point.

A debugger trying to rely on this sort of thing would have to play catchup 
whenever it
changes.


Exactly.  That's the whole point.

The crash utility and makedumpfile are not in the same league as other 
user-space tools.
They have always had to "play catchup" precisely because they depend upon 
kernel internals,
which constantly change.


I agree with you and DaveA here. Software user-space debuggers depend
on kernel internals (which can change from time to time) and will have
to play catch-up (which has been the case since the very start).


Unfortunately, we don't have any clear ABI for software debugging
tools; maybe that is something to look into in the future.


A case in point is gdb/kgdb, which still needs to run with KASLR
turned off (nokaslr) for debugging, because KASLR confuses gdb, which
resolves kernel symbol addresses from the symbol table of vmlinux. But we
can work around this in makedumpfile/crash by reading the 'kaslr_offset'
value. Several users now tell me they cannot use gdb on KASLR-enabled
kernels to debug panics, but can use the makedumpfile + crash
combination to achieve the same.
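A minimal sketch of that workaround follows. It assumes the offset is read from the KERNELOFFSET hex entry of the vmcoreinfo text and is simply added to the link-time address taken from vmlinux. The helper names are hypothetical and do not come from makedumpfile or crash.

```c
#include <stdint.h>
#include <stdlib.h>
#include <string.h>

/*
 * Find a "KEY=value" line in a NUL-terminated vmcoreinfo text blob and
 * parse value as hex. Returns 0 on success, -1 if the key is absent.
 */
static int vmcoreinfo_get_hex(const char *info, const char *key, uint64_t *val)
{
	size_t klen = strlen(key);
	const char *p = info;

	while (p && *p) {
		/* Require an exact key match followed by '='. */
		if (strncmp(p, key, klen) == 0 && p[klen] == '=') {
			*val = strtoull(p + klen + 1, NULL, 16);
			return 0;
		}
		p = strchr(p, '\n');	/* advance to the next line */
		if (p)
			p++;
	}
	return -1;
}

/* Rebase a link-time vmlinux symbol address by the KASLR offset. */
static uint64_t kaslr_adjust(const char *info, uint64_t vmlinux_addr)
{
	uint64_t off;

	/* Assumption: no KERNELOFFSET entry means KASLR is off (offset 0). */
	if (vmcoreinfo_get_hex(info, "KERNELOFFSET", &off) != 0)
		off = 0;
	return vmlinux_addr + off;
}
```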


So, we should be looking to fix these utilities, which have been broken since
the 52-bit changes for arm64. Accordingly, I will try to send v6 soon,
incorporating the comments posted on v5.

Thanks,
Bhupesh





___
kexec mailing list
kexec@lists.infradead.org
http://lists.infradead.org/mailman/listinfo/kexec


Re: [RESEND PATCH v5 2/5] arm64/crash_core: Export TCR_EL1.T1SZ in vmcoreinfo

2019-12-25 Thread Bhupesh Sharma

Hi James,

On 12/12/2019 04:02 PM, James Morse wrote:

Hi Bhupesh,

On 29/11/2019 19:59, Bhupesh Sharma wrote:

vabits_actual variable on arm64 indicates the actual VA space size,
and allows a single binary to support both 48-bit and 52-bit VA
spaces.

If the ARMv8.2-LVA optional feature is present, and we are running
with a 64KB page size, then it is possible to use 52 bits of address
space for both userspace and kernel addresses. However, any kernel
binary that supports 52-bit must also be able to fall back to 48-bit
at early boot time if the hardware feature is not present.

Since TCR_EL1.T1SZ indicates the size offset of the memory region
addressed by TTBR1_EL1 (and hence can be used for determining the
vabits_actual value), it makes more sense to export it in vmcoreinfo
rather than the vabits_actual variable: the name of the variable can
change in future kernel versions, whereas architectural constructs like
TCR_EL1.T1SZ are a more stable way to convey these specific fields to
user-space.
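The relationship relied on here is VA bits = 64 - T1SZ, since TTBR1_EL1 covers a region of 2^(64 - T1SZ) bytes. The following is a minimal sketch. The field position (bits [21:16] of TCR_EL1) is restated locally instead of being taken from kernel headers, and the function names are hypothetical.

```c
#include <stdint.h>

/* TCR_EL1.T1SZ is a 6-bit field at bits [21:16]; redefined locally so the
 * sketch does not depend on kernel headers. */
#define T1SZ_OFFSET 16
#define T1SZ_MASK   (UINT64_C(0x3f) << T1SZ_OFFSET)

/* Extract T1SZ from a raw TCR_EL1 value. */
static unsigned int tcr_el1_t1sz(uint64_t tcr)
{
	return (unsigned int)((tcr & T1SZ_MASK) >> T1SZ_OFFSET);
}

/* TTBR1_EL1 maps 2^(64 - T1SZ) bytes, so the kernel VA size is 64 - T1SZ. */
static unsigned int vabits_from_t1sz(unsigned int t1sz)
{
	return 64 - t1sz;
}
```

A T1SZ of 16 corresponds to 48-bit kernel VAs and a T1SZ of 12 to 52-bit kernel VAs.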

User-space utilities like makedumpfile and crash-utility, need to
read/write this value from/to vmcoreinfo


(write?)


Yes, also write, so that the vmcoreinfo from a (crashing) arm64 system
can be used to analyse the root cause of a panic/crash on, say, an
x86_64 host using utilities like crash-utility/gdb.



for determining if a virtual address lies in the linear map range.


I think this is a fragile example. The debugger shouldn't need to know this.


Well, that's the current user-space utility design, so I am not sure we can
tweak that too much.



The user-space computation for determining whether an address lies in
the linear map range is the same as we have in kernel-space:

   #define __is_lm_address(addr) (!(((u64)addr) & BIT(vabits_actual - 1)))


This was changed with 14c127c957c1 ("arm64: mm: Flip kernel VA space"). If
user-space tools rely on 'knowing' the kernel memory layout, they will
constantly have to be fixed and updated. This is a poor argument for adding
this to something that ends up as ABI.


See above. User-space has to rely on some ABI/guaranteed hardware
symbols which can be used for 'determining' the kernel memory
layout.



I think a better argument is walking the kernel page tables from the core dump.
Core code's vmcoreinfo exports the location of the kernel page tables, but in
the example above you can't walk them without knowing how T1SZ was configured.
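The following sketch makes the page-table argument concrete. The geometry of the top-level table, and therefore how a tool indexes swapper_pg_dir, depends on the VA size. It assumes the same levels formula as the kernel's ARM64_HW_PGTABLE_LEVELS() and ignores any kernel-specific adjustments. The helper names are hypothetical.

```c
#include <stdint.h>

/* Each level resolves (page_shift - 3) bits; same formula as the kernel's
 * ARM64_HW_PGTABLE_LEVELS(). */
static unsigned int pgtable_levels(unsigned int va_bits, unsigned int page_shift)
{
	return (va_bits - 4) / (page_shift - 3);
}

/* Shift of the top-level (PGD) index within a virtual address. */
static unsigned int pgdir_shift(unsigned int va_bits, unsigned int page_shift)
{
	return (page_shift - 3) * pgtable_levels(va_bits, page_shift) + 3;
}

/* Number of entries in the top-level table. */
static uint64_t ptrs_per_pgd(unsigned int va_bits, unsigned int page_shift)
{
	return UINT64_C(1) << (va_bits - pgdir_shift(va_bits, page_shift));
}

/* Index into the top-level table for a given virtual address. */
static uint64_t pgd_index(uint64_t va, unsigned int va_bits, unsigned int page_shift)
{
	return (va >> pgdir_shift(va_bits, page_shift)) &
	       (ptrs_per_pgd(va_bits, page_shift) - 1);
}
```

With 64KB pages (page_shift 16), the same VA 0xffff000000000000 falls into PGD slot 0 with 48-bit VAs but slot 960 with 52-bit VAs. A walk using the wrong value therefore reads the wrong entry.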


Sure, both makedumpfile and crash-utility (which walk the kernel page
tables from the core dump) currently use this (and similar) information
in user-space.



On older kernels, user-space that needs this would have to assume the value it
computes from VA_BITS (also in vmcoreinfo) is the value in use.
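The fallback described above can be sketched as follows. The code prefers an exported T1SZ value and, when it is absent, assumes VA_BITS is the value in use. The key "NUMBER(TCR_EL1_T1SZ)" is an assumed placeholder because the patch line that defines the real key is truncated in this thread. "NUMBER(VA_BITS)" is the form produced by VMCOREINFO_NUMBER(VA_BITS). The helper names are hypothetical.

```c
#include <stdint.h>
#include <stdlib.h>
#include <string.h>

/* Find "key=value" at the start of a line; strtoull base 0 accepts both
 * decimal ("48") and hex ("0x10") values. Returns -1 if absent. */
static int vmcoreinfo_get_num(const char *info, const char *key, uint64_t *val)
{
	size_t klen = strlen(key);
	const char *p = info;

	while (p && *p) {
		if (strncmp(p, key, klen) == 0 && p[klen] == '=') {
			*val = strtoull(p + klen + 1, NULL, 0);
			return 0;
		}
		p = strchr(p, '\n');
		if (p)
			p++;
	}
	return -1;
}

/* Prefer the hardware value (64 - T1SZ); on older kernels without it,
 * assume the configured VA_BITS is in use. */
static int get_vabits_actual(const char *info, unsigned int *vabits)
{
	uint64_t v;

	/* Hypothetical key name; see the lead-in. */
	if (vmcoreinfo_get_num(info, "NUMBER(TCR_EL1_T1SZ)", &v) == 0) {
		*vabits = 64 - (unsigned int)v;
		return 0;
	}
	if (vmcoreinfo_get_num(info, "NUMBER(VA_BITS)", &v) == 0) {
		*vabits = (unsigned int)v;
		return 0;
	}
	return -1;
}
```

For example, a 52-bit kernel that fell back to 48-bit VAs would report VA_BITS=52 but T1SZ=16, and this sketch would return 48.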


Yes, backward compatibility has been handled in the user-space already.


---%<---

I have sent out user-space patches for makedumpfile and crash-utility
to obtain the vabits_actual value from TCR_EL1.T1SZ (see
[0] and [1]).

Akashi reported that he was able to use this patchset and the user-space
changes to get user-space working fine with the 52-bit kernel VA
changes (see [2]).

[0]. http://lists.infradead.org/pipermail/kexec/2019-November/023966.html
[1]. http://lists.infradead.org/pipermail/kexec/2019-November/024006.html
[2]. http://lists.infradead.org/pipermail/kexec/2019-November/023992.html

---%<---

This probably belongs in the cover letter instead of the commit log.


Ok.


(From memory: one of vmcore/kcore is virtually addressed, the other physically.
Does this fix your problem in both cases?)



diff --git a/arch/arm64/kernel/crash_core.c b/arch/arm64/kernel/crash_core.c
index ca4c3e12d8c5..f78310ba65ea 100644
--- a/arch/arm64/kernel/crash_core.c
+++ b/arch/arm64/kernel/crash_core.c
@@ -7,6 +7,13 @@
  #include 
  #include 


You need to include asm/sysreg.h for read_sysreg(), and asm/pgtable-hwdef.h for
the macros you added.


Ok, will check. I did not get any compilation errors without them,
and the build-bot also did not flag the missing include files.



+static inline u64 get_tcr_el1_t1sz(void);



Why do you need to do this?


Without this I was getting a missing declaration error, while compiling 
the code.



+static inline u64 get_tcr_el1_t1sz(void)
+{
+   return (read_sysreg(tcr_el1) & TCR_T1SZ_MASK) >> TCR_T1SZ_OFFSET;
+}


(We don't modify this one, and it's always the same on every CPU, so this is
fine. This function is only called once when the stringy vmcoreinfo elf_note
is created...)


Right.


  void arch_crash_save_vmcoreinfo(void)
  {
VMCOREINFO_NUMBER(VA_BITS);
@@ -15,5 +22,7 @@ void arch_crash_save_vmcoreinfo(void)
kimage_voffset);
vmcoreinfo_append_str("NUMBER(PHYS_OFFSET)=0x%llx\n",
PHYS_OFFSET);
+   vmcoreinfo_a

Re: [RESEND PATCH v5 5/5] Documentation/vmcoreinfo: Add documentation for 'TCR_EL1.T1SZ'

2019-12-25 Thread Bhupesh Sharma

Hi James,

On 12/12/2019 04:02 PM, James Morse wrote:

Hi Bhupesh,


I am sorry this review mail escaped my attention due to the holidays and
my focus on other urgent issues.



On 29/11/2019 19:59, Bhupesh Sharma wrote:

Add documentation for TCR_EL1.T1SZ variable being added to
vmcoreinfo.

It indicates the size offset of the memory region addressed by TTBR1_EL1
and hence can be used for determining the vabits_actual value.


used for determining random-internal-kernel-variable, that might not exist tomorrow.

Could you describe how this is useful/necessary if a debugger wants to walk the
page tables from the core file? I think this is a better argument.

Wouldn't the documentation be better as part of the patch that adds the export?
(... unless these have to go via different trees? ..)


Ok, will fix this in v6.


diff --git a/Documentation/admin-guide/kdump/vmcoreinfo.rst b/Documentation/admin-guide/kdump/vmcoreinfo.rst
index 447b64314f56..f9349f9d3345 100644
--- a/Documentation/admin-guide/kdump/vmcoreinfo.rst
+++ b/Documentation/admin-guide/kdump/vmcoreinfo.rst
@@ -398,6 +398,12 @@ KERNELOFFSET
  The kernel randomization offset. Used to compute the page offset. If
  KASLR is disabled, this value is zero.
  
+TCR_EL1.T1SZ

+
+
+Indicates the size offset of the memory region addressed by TTBR1_EL1
+and hence can be used for determining the vabits_actual value.


'vabits_actual' may not exist when the next person comes to read this
documentation (it's going to rot really quickly).

I think the first half of this text is enough to say what this is for. You
should include words to the effect that it's the hardware value that goes
with swapper_pg_dir. You may want to point readers to the arm-arm for more
details on what the value means.


Ok, got it. Fixed this in v6, which should be on its way shortly.

Thanks,
Bhupesh




Re: [PATCH v3 0/3] arm64: handle "reserved" entries in /proc/iomem

2019-12-22 Thread Bhupesh Sharma
Thanks for the patches, Masa,

On Wed, Dec 18, 2019 at 10:13 PM Masayoshi Mizuma  wrote:
>
> In recent arm64 kernels, /proc/iomem has an extended file format like:
>
>  4000-5871 : System RAM
>4180-426a : Kernel code
>426b-42aa : reserved
>42ab-42c64fff : Kernel data
>5440-583f : Crash kernel
>5859-585e : reserved
>5870-5871 : reserved
>  5872-58b5 : reserved
>  58b6-5be3 : System RAM
>58b61000-58b61fff : reserved
>
> where "reserved" entries can be an ACPI table, UEFI related code or
> data. They can be corrupted and result in early failure in booting
> a new kernel. As an actual example, the LPI pending table and LPI property
> table, which are pointed to by UEFI data, are sometimes destroyed.
>
> They are expected to be preserved across kexec'ing.
>
> Changelog:
> v3: - Re-based to the latest commit (bd07796).
> - Added Tested-by tag from Bhupesh and Masayoshi
> - Added an error handling in case
>   mem_regions_alloc_and_exclude() fails (0002 patch).
>
> AKASHI Takahiro (3):
>   kexec: add variant helper functions for handling memory regions
>   arm64: kexec: allocate memory space avoiding reserved regions
>   arm64: kdump: deal with a lot of resource entries in /proc/iomem
>
>  kexec/arch/arm64/crashdump-arm64.c |  25 ++---
>  kexec/arch/arm64/kexec-arm64.c | 153 ++---
>  kexec/mem_regions.c|  42 
>  kexec/mem_regions.h|   7 ++
>  4 files changed, 153 insertions(+), 74 deletions(-)
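As a rough illustration of the format above, each line can be parsed as "start-end : name". Everything except RAM and the kernel image/crash-kernel ranges is then treated as off-limits. This simplified sketch is modelled on the to_be_excluded() logic in AKASHI's patch. The struct, the function names and the exact name strings are assumptions, not kexec-tools code.

```c
#include <stdbool.h>
#include <stdio.h>
#include <string.h>

/* One parsed /proc/iomem line, e.g. "  58b61000-58b61fff : reserved". */
struct iomem_entry {
	unsigned long long start, end;
	const char *name;	/* points into the caller's line buffer */
};

/* Parse "start-end : name"; %llx skips the indentation used for nesting.
 * Returns false for lines that do not match the format. */
static bool parse_iomem_line(char *line, struct iomem_entry *e)
{
	int consumed = 0;

	if (sscanf(line, "%llx-%llx : %n", &e->start, &e->end, &consumed) != 2 ||
	    consumed == 0)
		return false;
	e->name = line + consumed;
	line[strcspn(line, "\n")] = '\0';	/* drop the trailing newline */
	return true;
}

/* Anything that is not RAM, kernel image or crash kernel is excluded. */
static bool iomem_excluded(const char *name)
{
	static const char *const keep[] = {
		"System RAM", "Kernel code", "Kernel data", "Crash kernel",
	};

	for (size_t i = 0; i < sizeof(keep) / sizeof(keep[0]); i++)
		if (strncmp(name, keep[i], strlen(keep[i])) == 0)
			return false;
	return true;
}
```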

The changes look fine to me (the patches already have my Tested-by).

Hi Simon,

Can you please help pick up these changes for upstream kexec-tools? arm64
kexec is broken on a few machines in the absence of these changes.
The corresponding linux change (see [0]) has already been accepted in
the linux-next tree.

[0]. ab0eb16205b4 ("efi/memreserve: Register reservations as
'reserved' in /proc/iomem")

Thanks,
Bhupesh




Re: [RFC] printing the final constructed kernel command line

2019-12-22 Thread Bhupesh Sharma
On Thu, Dec 19, 2019 at 11:27 PM Eric DeVolder  wrote:
>
> Bhupesh,
> Thank you. For the formal patch, would you be ok with a two-phase approach,
> first where we add in the dbgprintf(),

Sure Eric, I think you can send the patch with the dbgprintf() right
away. It seems a straightforward change and I believe it should be
acceptable to other reviewers.

> and followed later by a consolidation of the --command-line, --append,
> --reuse-cmdline option code?

Actually, I did some work a few months ago (at the request of an arm32
kexec-tools user) on this consolidation, but I never got the time to
complete it.

I will try to find some time this week to consolidate these
features and send an RFC patch. I will Cc you on it. Hopefully
that should do the trick.

Thanks,
Bhupesh

> On 12/19/19 7:34 AM, Bhupesh Sharma wrote:
> > Hi Eric,
> >
> > On 12/19/2019 12:30 AM, Eric DeVolder wrote:
> >> Thanks Bhupesh for the feedback, responses below!
> >> eric
> >>
> >> On 12/17/19 1:59 PM, Bhupesh Sharma wrote:
> >>> Hi Eric,
> >>>
> >>> On 12/17/2019 02:02 AM, Eric DeVolder wrote:
> >>>> The --command-line, --append, and --reuse-cmdline options to kexec can
> >>>> be used in combination to craft a kernel command line for a kernel
> >>>> loaded via kexec. In addition, the kexec tool may also manipulate
> >>>> further the command line, eg.  elfcorehdr addition.
> >>>
> >>> Thanks for proposing this change. I have some comments/queries (see 
> >>> below).
> >>>
> >>>> To aid in debugging kdump/kexec related issues, it would be helpful
> >>>> for kexec to print the final constructed kernel command line argument.
> >>>>
> >>>> For example, the following simple change (for i386/x86_64):
> >>>>
> >>>> diff --git a/kexec/arch/i386/x86-linux-setup.c b/kexec/arch/i386/x86-linux-setup.c
> >>>> index 057ee14..6dc4adc 100644
> >>>> --- a/kexec/arch/i386/x86-linux-setup.c
> >>>> +++ b/kexec/arch/i386/x86-linux-setup.c
> >>>> @@ -57,6 +57,8 @@ void setup_linux_bootloader_parameters_high(
> >>>>   char *cmdline_ptr;
> >>>>   unsigned long initrd_base, initrd_addr_max;
> >>>>
> >>>> +printf("Final kernel cmdline: '%s'\n", cmdline);
> >>>> +
> >>>
> >>> If we want to add this for debugging purposes, its better to have 
> >>> dbgprintf() instead of printf()
> >>> here. This will make sure that the debug message is printed only when 
> >>> '-d' flag is specified
> >>> while calling kexec utility from command-line.
> >>
> >> Yes! I used printf() merely to provide an example of what is possible.
> >
> > Ok.
> >
> >>>>   /* Say I'm a boot loader */
> >>>>   real_mode->loader_type = LOADER_TYPE_KEXEC << 4;
> >>>>
> >>>> results in the following on a systemd-based system (formatted to fit
> >>>> in 70 char lines):
> >>>>
> >>>> % systemctl status -l kdump.service
> >>>> ● kdump.service - Crash recovery kernel arming
> >>>> Loaded: loaded (/usr/lib/systemd/system/kdump.service; enabled;
> >>>>  vendor preset: enabled)
> >>>> Active: active (exited) since Mon 2019-12-16 14:59:21 EST;
> >>>>  2min 53s ago
> >>>>Process: 14058 ExecStop=/usr/bin/kdumpctl stop (code=exited,
> >>>> status=0/SUCCESS)
> >>>>Process: 14073 ExecStart=/usr/bin/kdumpctl start (code=exited,
> >>>> status=0/SUCCESS)
> >>>>   Main PID: 14073 (code=exited, status=0/SUCCESS)
> >>>>
> >>>> Dec 16 14:59:18 vm364 kdumpctl[14058]: Stopping kdump: [OK]
> >>>> Dec 16 14:59:18 vm364 systemd[1]: Stopped Crash recovery kernel arming.
> >>>> Dec 16 14:59:18 vm364 systemd[1]: Starting Crash recovery kernel 
> >>>> arming...
> >>>> Dec 16 14:59:21 vm364 kdumpctl[14073]: Final kernel cmdline: 'BOOT_IMAGE=
> >>>>   /vmlinuz-4.14.35-1902.7.3.1.el7uek.x86_64 ro rhgb quiet 
> >>>> LANG=en_US.UTF-8
> >>>>   irqpoll nr_cpus=1 reset_devices cgroup_disable=memory mce=off numa=off
> >>>>   udev.children-max=2 panic=10 rootflags=nofail acpi_no_memhotplug
> >>>>   transparent_hugepage=never nokaslr novmcoredd disable_cpu_apici

Re: [RFC] printing the final constructed kernel command line

2019-12-19 Thread Bhupesh Sharma

On 12/19/2019 12:46 AM, John Donnelly wrote:




On Dec 18, 2019, at 1:00 PM, Eric DeVolder  wrote:

Thanks Bhupesh for the feedback, responses below!
eric

On 12/17/19 1:59 PM, Bhupesh Sharma wrote:

Hi Eric,
On 12/17/2019 02:02 AM, Eric DeVolder wrote:

The --command-line, --append, and --reuse-cmdline options to kexec can
be used in combination to craft a kernel command line for a kernel
loaded via kexec. In addition, the kexec tool may also further manipulate
the command line, e.g. by adding elfcorehdr.

Thanks for proposing this change. I have some comments/queries (see below).

To aid in debugging kdump/kexec related issues, it would be helpful
for kexec to print the final constructed kernel command line argument.

For example, the following simple change (for i386/x86_64):

diff --git a/kexec/arch/i386/x86-linux-setup.c b/kexec/arch/i386/x86-linux-setup.c
index 057ee14..6dc4adc 100644
--- a/kexec/arch/i386/x86-linux-setup.c
+++ b/kexec/arch/i386/x86-linux-setup.c
@@ -57,6 +57,8 @@ void setup_linux_bootloader_parameters_high(
   char *cmdline_ptr;
   unsigned long initrd_base, initrd_addr_max;

+printf("Final kernel cmdline: '%s'\n", cmdline);
+

If we want to add this for debugging purposes, it's better to use dbgprintf()
instead of printf() here. This will make sure that the debug message is printed
only when the '-d' flag is specified while calling the kexec utility from the command line.
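The gating idea can be sketched as follows. kexec-tools has its own dbgprintf(). The names below (debug_enabled, dbg_to) are hypothetical stand-ins that only show the pattern of printing solely when -d was given; they are not the kexec-tools definition.

```c
#include <stdarg.h>
#include <stdio.h>

/* Hypothetical flag, set when the user passes -d. */
static int debug_enabled;

/*
 * Print to 'out' only when debugging is enabled. Returns the number of
 * characters written, or 0 when debugging is off.
 */
static int dbg_to(FILE *out, const char *fmt, ...)
{
	va_list ap;
	int n;

	if (!debug_enabled)
		return 0;
	va_start(ap, fmt);
	n = vfprintf(out, fmt, ap);
	va_end(ap);
	return n;
}
```

The call site from the RFC would then become `dbg_to(stderr, "Final kernel cmdline: '%s'\n", cmdline);`, which stays silent in normal runs.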


Yes! I used printf() merely to provide an example of what is possible.


   /* Say I'm a boot loader */
   real_mode->loader_type = LOADER_TYPE_KEXEC << 4;

results in the following on a systemd-based system (formatted to fit
in 70 char lines):

% systemctl status -l kdump.service
● kdump.service - Crash recovery kernel arming
 Loaded: loaded (/usr/lib/systemd/system/kdump.service; enabled;
  vendor preset: enabled)
 Active: active (exited) since Mon 2019-12-16 14:59:21 EST;
  2min 53s ago
Process: 14058 ExecStop=/usr/bin/kdumpctl stop (code=exited,
 status=0/SUCCESS)
Process: 14073 ExecStart=/usr/bin/kdumpctl start (code=exited,
 status=0/SUCCESS)
   Main PID: 14073 (code=exited, status=0/SUCCESS)

Dec 16 14:59:18 vm364 kdumpctl[14058]: Stopping kdump: [OK]
Dec 16 14:59:18 vm364 systemd[1]: Stopped Crash recovery kernel arming.
Dec 16 14:59:18 vm364 systemd[1]: Starting Crash recovery kernel arming...
Dec 16 14:59:21 vm364 kdumpctl[14073]: Final kernel cmdline: 'BOOT_IMAGE=
   /vmlinuz-4.14.35-1902.7.3.1.el7uek.x86_64 ro rhgb quiet LANG=en_US.UTF-8
   irqpoll nr_cpus=1 reset_devices cgroup_disable=memory mce=off numa=off
   udev.children-max=2 panic=10 rootflags=nofail acpi_no_memhotplug
   transparent_hugepage=never nokaslr novmcoredd disable_cpu_apicid=0
   elfcorehdr=901492K'
Dec 16 14:59:21 vm364 systemd[1]: Started Crash recovery kernel arming.
Dec 16 14:59:21 vm364 kdumpctl[14073]: kexec: loaded kdump kernel
Dec 16 14:59:21 vm364 kdumpctl[14073]: Starting kdump: [OK]

and the output is also available in /var/log/messages.

Since kdumpctl is a distribution-specific script (used by Fedora/RHEL) which
invokes 'kexec' under the hood, we can discuss the features supported by
'kexec' rather than the distribution-specific scripts (discussions of which
are probably better suited to the Fedora kexec list:
ke...@lists.fedoraproject.org)


Agreed, this RFC is for a change to kexec, noting that wrapper scripts such as 
kdumpctl are insufficient to provide the functionality requested.


There might also be an opportunity to consolidate handling of the
kernel command line, as most arch targets have the --command-line,
--append, and --reuse-cmdline options, though each arch independently
codes the support for these options.

This seems like a good idea, more on the same below ...

Note: Simply printing the cmdline in scripts such as kdumpctl may not
result in the same ordering, and will omit any addition made internally
by kexec, such as the elfcorehdr.

I propose the addition of an option to kexec, --print-kcl (to mirror
--print-ckr), that would control such printing, as well as the needed
per arch conditional print statements similar to the above to print the
final constructed kernel command line.

... I am not sure I understand the above point. Looking at the latest 
kexec-tools man page (see: 
git://git.kernel.org/pub/scm/utils/kernel/kexec/kexec-tools.git), I couldn't 
find '--print-ckr' option:
--print-ckr-size: Print crash kernel region size, if available.
Can you please elaborate more on the '--print-ckr' and '--print-kcl' options?


You proposed using dbgprintf() in conjunction with the -d option, and that
makes great sense; I had not considered that in my eagerness to produce this RFC.

Instead, I proposed another option --print-kcl (for print kernel command line) 
to conditionally print the information. I was using --print-ckr as an example 
of similar option used to print information (in this case, the crash kernel 
region). Other than a simila

Re: [RFC] printing the final constructed kernel command line

2019-12-19 Thread Bhupesh Sharma

Hi Eric,

On 12/19/2019 12:30 AM, Eric DeVolder wrote:

Thanks Bhupesh for the feedback, responses below!
eric

On 12/17/19 1:59 PM, Bhupesh Sharma wrote:

Hi Eric,

On 12/17/2019 02:02 AM, Eric DeVolder wrote:

The --command-line, --append, and --reuse-cmdline options to kexec can
be used in combination to craft a kernel command line for a kernel
loaded via kexec. In addition, the kexec tool may also further manipulate
the command line, e.g. by adding elfcorehdr.


Thanks for proposing this change. I have some comments/queries (see 
below).



To aid in debugging kdump/kexec related issues, it would be helpful
for kexec to print the final constructed kernel command line argument.

For example, the following simple change (for i386/x86_64):

diff --git a/kexec/arch/i386/x86-linux-setup.c b/kexec/arch/i386/x86-linux-setup.c
index 057ee14..6dc4adc 100644
--- a/kexec/arch/i386/x86-linux-setup.c
+++ b/kexec/arch/i386/x86-linux-setup.c
@@ -57,6 +57,8 @@ void setup_linux_bootloader_parameters_high(
  char *cmdline_ptr;
  unsigned long initrd_base, initrd_addr_max;

+printf("Final kernel cmdline: '%s'\n", cmdline);
+


If we want to add this for debugging purposes, it's better to use
dbgprintf() instead of printf() here. This will make sure that the
debug message is printed only when the '-d' flag is specified while
calling the kexec utility from the command line.


Yes! I used printf() merely to provide an example of what is possible.


Ok.


  /* Say I'm a boot loader */
  real_mode->loader_type = LOADER_TYPE_KEXEC << 4;

results in the following on a systemd-based system (formatted to fit
in 70 char lines):

% systemctl status -l kdump.service
● kdump.service - Crash recovery kernel arming
Loaded: loaded (/usr/lib/systemd/system/kdump.service; enabled;
 vendor preset: enabled)
Active: active (exited) since Mon 2019-12-16 14:59:21 EST;
 2min 53s ago
   Process: 14058 ExecStop=/usr/bin/kdumpctl stop (code=exited,
status=0/SUCCESS)
   Process: 14073 ExecStart=/usr/bin/kdumpctl start (code=exited,
status=0/SUCCESS)
  Main PID: 14073 (code=exited, status=0/SUCCESS)

Dec 16 14:59:18 vm364 kdumpctl[14058]: Stopping kdump: [OK]
Dec 16 14:59:18 vm364 systemd[1]: Stopped Crash recovery kernel arming.
Dec 16 14:59:18 vm364 systemd[1]: Starting Crash recovery kernel arming...
Dec 16 14:59:21 vm364 kdumpctl[14073]: Final kernel cmdline: 'BOOT_IMAGE=
  /vmlinuz-4.14.35-1902.7.3.1.el7uek.x86_64 ro rhgb quiet LANG=en_US.UTF-8
  irqpoll nr_cpus=1 reset_devices cgroup_disable=memory mce=off numa=off
  udev.children-max=2 panic=10 rootflags=nofail acpi_no_memhotplug
  transparent_hugepage=never nokaslr novmcoredd disable_cpu_apicid=0
  elfcorehdr=901492K'
Dec 16 14:59:21 vm364 systemd[1]: Started Crash recovery kernel arming.
Dec 16 14:59:21 vm364 kdumpctl[14073]: kexec: loaded kdump kernel
Dec 16 14:59:21 vm364 kdumpctl[14073]: Starting kdump: [OK]

and the output is also available in /var/log/messages.


Since kdumpctl is a distribution-specific script (used by
Fedora/RHEL) which invokes 'kexec' under the hood, we can discuss the
features supported by 'kexec' rather than the distribution-specific
scripts (discussions of which are probably better suited to the
Fedora kexec list: ke...@lists.fedoraproject.org)


Agreed, this RFC is for a change to kexec, noting that wrapper scripts 
such as kdumpctl are insufficient to provide the functionality requested.





There might also be an opportunity to consolidate handling of the
kernel command line, as most arch targets have the --command-line,
--append, and --reuse-cmdline options, though each arch independently
codes the support for these options.
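One possible shape for such a consolidation is a single helper shared by all arches. The sketch below is hypothetical. build_cmdline() and its precedence rule (an explicit --command-line wins over --reuse-cmdline, and --append text is added last) are assumptions for illustration, not a description of how kexec-tools currently behaves. The caller is assumed to have already read the running kernel's command line for --reuse-cmdline.

```c
#include <stdio.h>
#include <string.h>

/*
 * Hypothetical arch-independent builder. 'command_line' comes from
 * --command-line, 'reuse' is the running kernel's command line when
 * --reuse-cmdline was given, and 'append' comes from --append; any of
 * them may be NULL. Returns 0 on success, -1 if 'out' is too small.
 */
static int build_cmdline(char *out, size_t outlen, const char *command_line,
			 const char *reuse, const char *append)
{
	const char *base = command_line ? command_line : (reuse ? reuse : "");
	int n;

	if (append && *append)
		/* Separate base and appended text with a single space. */
		n = snprintf(out, outlen, "%s%s%s", base, *base ? " " : "", append);
	else
		n = snprintf(out, outlen, "%s", base);
	return (n < 0 || (size_t)n >= outlen) ? -1 : 0;
}
```

A shared helper like this would also give one obvious place to emit the final command line under -d, including anything kexec adds internally afterwards, such as elfcorehdr.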


This seems like a good idea, more on the same below ...


Note: Simply printing the cmdline in scripts such as kdumpctl may not
result in the same ordering, and will omit any addition made internally
by kexec, such as the elfcorehdr.

I propose the addition of an option to kexec, --print-kcl (to mirror
--print-ckr), that would control such printing, as well as the needed
per arch conditional print statements similar to the above to print the
final constructed kernel command line.


... I am not sure I understand the above point. Looking at the latest 
kexec-tools man page (see: 
git://git.kernel.org/pub/scm/utils/kernel/kexec/kexec-tools.git), I 
couldn't find '--print-ckr' option:


--print-ckr-size: Print crash kernel region size, if available.

Can you please elaborate more on the '--print-ckr' and '--print-kcl' options?


You proposed using dbgprintf() in conjunction with the -d option, and
that makes great sense; I had not considered that in my eagerness to produce this RFC.


Ok, no problem.

Instead, I proposed another option --print-kcl (for print kernel command 
line) to conditionally print the information. I was using --print-ckr as 
an example of similar option used to print information (in this case, 
the crash kernel region). Other than a similar naming convention, there

Re: [RFC] printing the final constructed kernel command line

2019-12-17 Thread Bhupesh Sharma

Hi Eric,

On 12/17/2019 02:02 AM, Eric DeVolder wrote:

The --command-line, --append, and --reuse-cmdline options to kexec can
be used in combination to craft a kernel command line for a kernel
loaded via kexec. In addition, the kexec tool may also further manipulate
the command line, e.g. by adding elfcorehdr.


Thanks for proposing this change. I have some comments/queries (see below).


To aid in debugging kdump/kexec related issues, it would be helpful
for kexec to print the final constructed kernel command line argument.

For example, the following simple change (for i386/x86_64):

diff --git a/kexec/arch/i386/x86-linux-setup.c b/kexec/arch/i386/x86-linux-setup.c
index 057ee14..6dc4adc 100644
--- a/kexec/arch/i386/x86-linux-setup.c
+++ b/kexec/arch/i386/x86-linux-setup.c
@@ -57,6 +57,8 @@ void setup_linux_bootloader_parameters_high(
  char *cmdline_ptr;
  unsigned long initrd_base, initrd_addr_max;

+printf("Final kernel cmdline: '%s'\n", cmdline);
+


If we want to add this for debugging purposes, it's better to use
dbgprintf() instead of printf() here. This will make sure that the debug
message is printed only when the '-d' flag is specified while calling the
kexec utility from the command line.



  /* Say I'm a boot loader */
  real_mode->loader_type = LOADER_TYPE_KEXEC << 4;

results in the following on a systemd-based system (formatted to fit
in 70 char lines):

% systemctl status -l kdump.service
● kdump.service - Crash recovery kernel arming
Loaded: loaded (/usr/lib/systemd/system/kdump.service; enabled;
 vendor preset: enabled)
Active: active (exited) since Mon 2019-12-16 14:59:21 EST;
 2min 53s ago
   Process: 14058 ExecStop=/usr/bin/kdumpctl stop (code=exited,
status=0/SUCCESS)
   Process: 14073 ExecStart=/usr/bin/kdumpctl start (code=exited,
status=0/SUCCESS)
  Main PID: 14073 (code=exited, status=0/SUCCESS)

Dec 16 14:59:18 vm364 kdumpctl[14058]: Stopping kdump: [OK]
Dec 16 14:59:18 vm364 systemd[1]: Stopped Crash recovery kernel arming.
Dec 16 14:59:18 vm364 systemd[1]: Starting Crash recovery kernel arming...
Dec 16 14:59:21 vm364 kdumpctl[14073]: Final kernel cmdline: 'BOOT_IMAGE=
  /vmlinuz-4.14.35-1902.7.3.1.el7uek.x86_64 ro rhgb quiet LANG=en_US.UTF-8
  irqpoll nr_cpus=1 reset_devices cgroup_disable=memory mce=off numa=off
  udev.children-max=2 panic=10 rootflags=nofail acpi_no_memhotplug
  transparent_hugepage=never nokaslr novmcoredd disable_cpu_apicid=0
  elfcorehdr=901492K'
Dec 16 14:59:21 vm364 systemd[1]: Started Crash recovery kernel arming.
Dec 16 14:59:21 vm364 kdumpctl[14073]: kexec: loaded kdump kernel
Dec 16 14:59:21 vm364 kdumpctl[14073]: Starting kdump: [OK]

and the output is also available in /var/log/messages.


Since kdumpctl is a distribution-specific script (used by Fedora/RHEL)
which invokes 'kexec' under the hood, we can discuss the features
supported by 'kexec' rather than the distribution-specific scripts
(discussions of which are probably better suited to the Fedora kexec
list: ke...@lists.fedoraproject.org)



There might also be an opportunity to consolidate handling of the
kernel command line, as most arch targets have the --command-line,
--append, and --reuse-cmdline options, though each arch independently
codes the support for these options.


This seems like a good idea, more on the same below ...


Note: Simply printing the cmdline in scripts such as kdumpctl may not
result in the same ordering, and will omit any addition made internally
by kexec, such as the elfcorehdr.

I propose the addition of an option to kexec, --print-kcl (to mirror
--print-ckr), that would control such printing, as well as the needed
per arch conditional print statements similar to the above to print the
final constructed kernel command line.


... I am not sure I understand the above point. Looking at the latest 
kexec-tools man page (see: 
git://git.kernel.org/pub/scm/utils/kernel/kexec/kexec-tools.git), I 
couldn't find '--print-ckr' option:


--print-ckr-size: Print crash kernel region size, if available.

Can you please elaborate more on the '--print-ckr' and '--print-kcl' options?

Thanks,
Bhupesh




Re: [PATCH v5 0/5] Append new variables to vmcoreinfo (TCR_EL1.T1SZ for arm64 and MAX_PHYSMEM_BITS for all archs)

2019-12-15 Thread Bhupesh Sharma
Hi Boris,

On Sat, Dec 14, 2019 at 5:57 PM Borislav Petkov  wrote:
>
> On Fri, Nov 29, 2019 at 01:53:36AM +0530, Bhupesh Sharma wrote:
> > Bhupesh Sharma (5):
> >   crash_core, vmcoreinfo: Append 'MAX_PHYSMEM_BITS' to vmcoreinfo
> >   arm64/crash_core: Export TCR_EL1.T1SZ in vmcoreinfo
> >   Documentation/arm64: Fix a simple typo in memory.rst
> >   Documentation/vmcoreinfo: Add documentation for 'MAX_PHYSMEM_BITS'
> >   Documentation/vmcoreinfo: Add documentation for 'TCR_EL1.T1SZ'
>
> why are those last two separate patches and not part of the patches
> which export the respective variable/define?

I remember there was a suggestion during the review of an earlier
version to keep them as separate patches so that the documentation
text is easier to review, but I have no strong preference either way.

I can merge the documentation patches with the respective patches
(which export the variables/defines to vmcoreinfo) in v6, unless other
maintainers have any objections.

Thanks,
Bhupesh




Re: [PATCH v2 2/3] arm64: kexec: allocate memory space avoiding reserved regions

2019-12-15 Thread Bhupesh Sharma
Thanks Masa,

On Sat, Dec 14, 2019 at 1:34 AM Masayoshi Mizuma  wrote:
>
> some nits as below:
>
> On Fri, Jan 11, 2019 at 06:59:45PM +0900, AKASHI Takahiro wrote:
> > On UEFI/ACPI-only system, some memory regions, including but not limited
> > to UEFI memory map and ACPI tables, must be preserved across kexec'ing.
> > Otherwise, they can be corrupted and result in early failure in booting
> > a new kernel.
> >
> > In recent kernels, /proc/iomem now has an extended file format like:
> >   4000-5871 : System RAM
> > 4180-426a : Kernel code
> > 426b-42aa : reserved
> > 42ab-42c64fff : Kernel data
> > 5440-583f : Crash kernel
> > 5859-585e : reserved
> > 5870-5871 : reserved
> >   5872-58b5 : reserved
> >   58b6-5be3 : System RAM
> > 58b61000-58b61fff : reserved
> > 59a77000-59a77fff : reserved
> >   5be4-5bec : reserved
> >   5bed-5bed : System RAM
> >   5bee-5bff : reserved
> >   5c00-5fff : System RAM
> > 5da0-5e9f : reserved
> > 5ec0-5edf : reserved
> > 5ef6a000-5ef6afff : reserved
> > 5ef6b000-5efcafff : reserved
> > 5efcd000-5efc : reserved
> > 5efd-5eff : reserved
> > 5f00-5fff : reserved
> >
> > where the "reserved" entries at the top level or under System RAM (and
> > its descendant resources) are ones of such kind and should not be regarded
> > as usable memory ranges where several free spaces for loading kexec data
> > will be allocated.
> >
> > With this patch, get_memory_ranges() will handle this format of file
> > correctly. Note that, for safety, unknown regions, in addition to
> > "reserved" ones, will also be excluded.
> >
> > Signed-off-by: AKASHI Takahiro 
> > ---
> >  kexec/arch/arm64/kexec-arm64.c | 146 -
> >  1 file changed, 87 insertions(+), 59 deletions(-)
> >
> > diff --git a/kexec/arch/arm64/kexec-arm64.c b/kexec/arch/arm64/kexec-arm64.c
> > index 1cde75d1a771..2e923b54f5b1 100644
> > --- a/kexec/arch/arm64/kexec-arm64.c
> > +++ b/kexec/arch/arm64/kexec-arm64.c
> > @@ -10,7 +10,9 @@
> >  #include 
> >  #include 
> >  #include 
> > +#include 
> >  #include 
> > +#include 
> >  #include 
> >  #include 
> >  #include 
> > @@ -29,6 +31,7 @@
> >  #include "fs2dt.h"
> >  #include "iomem.h"
> >  #include "kexec-syscall.h"
> > +#include "mem_regions.h"
> >  #include "arch/options.h"
> >
> >  #define ROOT_NODE_ADDR_CELLS_DEFAULT 1
> > @@ -899,19 +902,33 @@ int get_phys_base_from_pt_load(unsigned long *phys_offset)
> >   return 0;
> >  }
> >
> > +static bool to_be_excluded(char *str)
> > +{
> > + if (!strncmp(str, SYSTEM_RAM, strlen(SYSTEM_RAM)) ||
> > + !strncmp(str, KERNEL_CODE, strlen(KERNEL_CODE)) ||
> > + !strncmp(str, KERNEL_DATA, strlen(KERNEL_DATA)) ||
> > + !strncmp(str, CRASH_KERNEL, strlen(CRASH_KERNEL)))
> > + return false;
> > + else
> > + return true;
> > +}
> > +
> >  /**
> > - * get_memory_ranges_iomem_cb - Helper for get_memory_ranges_iomem.
> > + * get_memory_ranges - Try to get the memory ranges from
> > + * /proc/iomem.
> >   */
> > -
> > -static int get_memory_ranges_iomem_cb(void *data, int nr, char *str,
> > - unsigned long long base, unsigned long long length)
> > +int get_memory_ranges(struct memory_range **range, int *ranges,
> > + unsigned long kexec_flags)
> >  {
> > - int ret;
> >   unsigned long phys_offset = UINT64_MAX;
> > - struct memory_range *r;
> > -
> > - if (nr >= KEXEC_SEGMENT_MAX)
> > - return -1;
> > + FILE *fp;
> > + const char *iomem = proc_iomem();
> > + char line[MAX_LINE], *str;
> > + unsigned long long start, end;
> > + int n, consumed;
> > + struct memory_ranges memranges;
> > + struct memory_range *last, excl_range;
> > + int ret;
> >
> >   if (!try_read_phys_offset_from_kcore) {
> >   /* Since kernel version 4.19, 'kcore' contains
> > @@ -945,17 +962,65 @@ static int get_memory_ranges_iomem_cb(void *data, int nr, char *str,
> >   try_read_phys_offset_from_kcore = true;
> >   }
> >
> > - r = (struct memory_range *)data + nr;
> > + fp = fopen(iomem, "r");
> > + if (!fp)
> > + die("Cannot open %s\n", iomem);
> > +
> > + memranges.ranges = NULL;
> > + memranges.size = memranges.max_size  = 0;
> > +
> > + while (fgets(line, sizeof(line), fp) != 0) {
> > + n = sscanf(line, "%llx-%llx : %n", &start, &end, &consumed);
> > + if (n != 2)
> > + continue;
> > + str = line + consumed;
> > +
> > + if (!strncmp(str, SYSTEM_RAM, strlen(SYSTEM_RAM))) {
> > + ret = mem_regions_alloc_and_add(&memranges,
> > + start, end - start + 1, RANGE_RAM);
> > +
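A minimal sketch of the line classification this patch performs, assuming the usual /proc/iomem layout of `start-end : name` with two spaces of indentation per nesting level. The helpers `parse_iomem_line()` and `name_is_usable()` are hypothetical; the allow-list is assumed to match the SYSTEM_RAM, KERNEL_CODE, KERNEL_DATA and CRASH_KERNEL strings that `to_be_excluded()` checks, so "reserved" and unknown names are excluded.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdio.h>
#include <string.h>

/* Names treated as usable memory (assumed to match the SYSTEM_RAM,
 * KERNEL_CODE, KERNEL_DATA and CRASH_KERNEL strings). Everything else,
 * including "reserved" and any unknown name, is excluded. */
static const char *const usable_names[] = {
	"System RAM", "Kernel code", "Kernel data", "Crash kernel",
};

static bool name_is_usable(const char *name)
{
	for (size_t i = 0; i < sizeof(usable_names) / sizeof(usable_names[0]); i++)
		if (!strncmp(name, usable_names[i], strlen(usable_names[i])))
			return true;
	return false;
}

/* Hypothetical helper: parse one /proc/iomem line such as
 * "  830f0000-830fffff : reserved". The nesting depth is derived from
 * the leading indentation (two spaces per level). Returns 1 on success,
 * 0 if the line does not look like a resource entry. */
static int parse_iomem_line(const char *line, unsigned long long *start,
			    unsigned long long *end, int *depth,
			    const char **name)
{
	int indent = 0, consumed = 0;

	while (line[indent] == ' ')
		indent++;
	/* %n does not count towards sscanf()'s return value. */
	if (sscanf(line + indent, "%llx-%llx : %n", start, end, &consumed) != 2)
		return 0;
	*depth = indent / 2;
	*name = line + indent + consumed;
	return 1;
}
```

A caller would add a range only when `name_is_usable()` is true and, as the commit message argues, still carve out any "reserved" child nested under a usable parent.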

Re: [PATCH v4 4/4] makedumpfile: Mark --mem-usage option unsupported for arm64

2019-12-05 Thread Bhupesh Sharma
Hi Kazu,

On Wed, Dec 4, 2019 at 11:20 PM Kazuhito Hagio  wrote:
>
> > -Original Message-
> > This patch marks '--mem-usage' option as unsupported for arm64
> > architecture.
> >
> > With the newer arm64 kernels supporting 48-bit/52-bit VA address spaces
> > while keeping a single binary for both, the address of kernel symbols
> > like _stext, which could earlier be used to determine the VA_BITS value,
> > can no longer be used to determine whether VA_BITS is set to 48
> > or 52 in the kernel space.
>
> The --mem-usage option works with older arm64 kernels, so we should not
> mark it unsupported for all arm64 kernels.
>
> (If we use ELF note vmcoreinfo in kcore, is it possible to support the
> option?  Let's think about it later..)

Ok, I am in the process of discussing this in detail with the arm64
maintainers, as the _stext symbol address can no longer be used to
separate 48-bit vs 52-bit kernel VA space configurations.

Other user-space utilities like 'kexec-tools' also face a similar
problem with the 52-bit change (as vmcore-dmesg stops working).

I am currently caught up with another high priority issue. Will come
back with more thoughts on this in a couple of days.

Thanks,
Bhupesh

> > Hence for now, it makes sense to mark '--mem-usage' option as
> > unsupported for arm64 architecture until we have more clarity from arm64
> > kernel maintainers on how to manage the same in future
> > kernel/makedumpfile versions.
> >
> > Cc: John Donnelly 
> > Cc: Kazuhito Hagio 
> > Cc: kexec@lists.infradead.org
> > Signed-off-by: Bhupesh Sharma 
> > ---
> >  makedumpfile.c | 5 +
> >  1 file changed, 5 insertions(+)
> >
> > diff --git a/makedumpfile.c b/makedumpfile.c
> > index baf559e4d74e..ae60466a1e9c 100644
> > --- a/makedumpfile.c
> > +++ b/makedumpfile.c
> > @@ -11564,6 +11564,11 @@ main(int argc, char *argv[])
> >   MSG("\n");
> >   MSG("The dmesg log is saved to %s.\n", info->name_dumpfile);
> >   } else if (info->flag_mem_usage) {
> > +#ifdef __aarch64__
> > + MSG("mem-usage not supported for arm64 architecture.\n");
> > + goto out;
> > +#endif
> > +
> >   if (!check_param_for_creating_dumpfile(argc, argv)) {
> >   MSG("Commandline parameter is invalid.\n");
> >   MSG("Try `makedumpfile --help' for more 
> > information.\n");
> > --
> > 2.7.4
> >


Re: [PATCH v4 2/4] makedumpfile/arm64: Add support for ARMv8.2-LPA (52-bit PA support)

2019-12-05 Thread Bhupesh Sharma
Hi Kazu,

On Wed, Dec 4, 2019 at 11:07 PM Kazuhito Hagio  wrote:
>
> > -Original Message-
> > ARMv8.2-LPA architecture extension (if available on underlying hardware)
> > can support 52-bit physical addresses, while the kernel virtual
> > addresses remain 48-bit.
> >
> > Read the 52-bit PA capability from the 'MAX_PHYSMEM_BITS' variable
> > (if available in vmcoreinfo), and change the pte_to_phy() mask values
> > and the page-table walk accordingly.
> >
> > Also make sure that it works on existing 48-bit PA platforms, as well
> > as in environments that run newer kernels with 52-bit PA support on
> > hardware that is not ARMv8.2-LPA compliant.
> >
> > I have sent a kernel patch upstream to add 'MAX_PHYSMEM_BITS' to
> > vmcoreinfo for arm64 (see [0]).
> >
> > This patch is in accordance with ARMv8 Architecture Reference Manual
> > version D.a
> >
> > [0]. http://lists.infradead.org/pipermail/kexec/2019-November/023960.html
> >
> > Cc: Kazuhito Hagio 
> > Cc: John Donnelly 
> > Cc: kexec@lists.infradead.org
> > Signed-off-by: Bhupesh Sharma 
> > ---
> >  arch/arm64.c | 292 
> > +--
> >  1 file changed, 204 insertions(+), 88 deletions(-)
> >
> > diff --git a/arch/arm64.c b/arch/arm64.c
> > index 3516b340adfd..ecb19139e178 100644
> > --- a/arch/arm64.c
> > +++ b/arch/arm64.c
> > @@ -39,72 +39,184 @@ typedef struct {
> >   unsigned long pte;
> >  } pte_t;
> >
>
> > +#define __pte(x) ((pte_t) { (x) } )
> > +#define __pmd(x) ((pmd_t) { (x) } )
> > +#define __pud(x) ((pud_t) { (x) } )
> > +#define __pgd(x) ((pgd_t) { (x) } )
>
> Is it possible to remove these macros?

Ok, will fix in v5.

> > +
> > +static int lpa_52_bit_support_available;
> >  static int pgtable_level;
> >  static int va_bits;
> >  static unsigned long kimage_voffset;
> >
> > -#define SZ_4K(4 * 1024)
> > -#define SZ_16K   (16 * 1024)
> > -#define SZ_64K   (64 * 1024)
> > -#define SZ_128M  (128 * 1024 * 1024)
> > +#define SZ_4K4096
> > +#define SZ_16K   16384
> > +#define SZ_64K   65536
> >
> > -#define PAGE_OFFSET_36 ((0xUL) << 36)
> > -#define PAGE_OFFSET_39 ((0xUL) << 39)
> > -#define PAGE_OFFSET_42 ((0xUL) << 42)
> > -#define PAGE_OFFSET_47 ((0xUL) << 47)
> > -#define PAGE_OFFSET_48 ((0xUL) << 48)
> > +#define PAGE_OFFSET_36   ((0xUL) << 36)
> > +#define PAGE_OFFSET_39   ((0xUL) << 39)
> > +#define PAGE_OFFSET_42   ((0xUL) << 42)
> > +#define PAGE_OFFSET_47   ((0xUL) << 47)
> > +#define PAGE_OFFSET_48   ((0xUL) << 48)
> > +#define PAGE_OFFSET_52   ((0xUL) << 52)
> >
> >  #define pgd_val(x)   ((x).pgd)
> >  #define pud_val(x)   (pgd_val((x).pgd))
> >  #define pmd_val(x)   (pud_val((x).pud))
> >  #define pte_val(x)   ((x).pte)
> >
> > -#define PAGE_MASK(~(PAGESIZE() - 1))
> > -#define PGDIR_SHIFT  ((PAGESHIFT() - 3) * pgtable_level + 3)
> > -#define PTRS_PER_PGD (1 << (va_bits - PGDIR_SHIFT))
> > -#define PUD_SHIFTget_pud_shift_arm64()
> > -#define PUD_SIZE (1UL << PUD_SHIFT)
> > -#define PUD_MASK (~(PUD_SIZE - 1))
> > -#define PTRS_PER_PTE (1 << (PAGESHIFT() - 3))
> > -#define PTRS_PER_PUD PTRS_PER_PTE
> > -#define PMD_SHIFT((PAGESHIFT() - 3) * 2 + 3)
> > -#define PMD_SIZE (1UL << PMD_SHIFT)
> > -#define PMD_MASK (~(PMD_SIZE - 1))
>
> > +/* See 'include/uapi/linux/const.h' for definitions below */
> > +#define __AC(X,Y)(X##Y)
> > +#define _AC(X,Y) __AC(X,Y)
> > +#define _AT(T,X) ((T)(X))
> > +
> > +/* See 'include/asm/pgtable-types.h' for definitions below */
> > +typedef unsigned long pteval_t;
> > +typedef unsigned long pmdval_t;
> > +typedef unsigned long pudval_t;
> > +typedef unsigned long pgdval_t;
>
> Is it possible to remove these macros/typedefs as well?
> I don't th

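To make the 52-bit PA change concrete, here is a sketch of the descriptor-to-address conversion, assuming 64K pages (at the time of this series the kernel only offered CONFIG_ARM64_PA_BITS_52 with the 64K granule). With ARMv8.2-LPA, PA bits [51:48] are held in descriptor bits [15:12], which is what the kernel's PTE_ADDR_HIGH and __pte_to_phys() encode. The function and macro names below are illustrative, not makedumpfile's.

```c
#include <stdint.h>

/* Sketch assumption: 64K pages (PAGE_SHIFT = 16). */
#define SKETCH_PAGE_SHIFT 16

/* Output-address bits [47:PAGE_SHIFT] sit in their natural position. */
#define PTE_ADDR_LOW \
	((((uint64_t)1 << (48 - SKETCH_PAGE_SHIFT)) - 1) << SKETCH_PAGE_SHIFT)
/* PA bits [51:48] are stored in descriptor bits [15:12]. */
#define PTE_ADDR_HIGH ((uint64_t)0xf << 12)
#define PTE_ADDR_MASK (PTE_ADDR_LOW | PTE_ADDR_HIGH)

/* 52-bit PA: move bits [15:12] back up to [51:48]. */
static uint64_t pte_to_phys_52(uint64_t pte)
{
	return (pte & PTE_ADDR_LOW) | ((pte & PTE_ADDR_HIGH) << 36);
}

/* Inverse: fold PA bits [51:48] down into descriptor bits [15:12]. */
static uint64_t phys_to_pte_52(uint64_t phys)
{
	return (phys | (phys >> 36)) & PTE_ADDR_MASK;
}

/* 48-bit PA layout (same 64K granule): only bits [47:16] are used. */
static uint64_t pte_to_phys_48(uint64_t pte)
{
	return pte & PTE_ADDR_LOW;
}
```

On 48-bit hardware bits [15:12] of a valid descriptor are zero, so the 52-bit conversion returns the same address as the 48-bit one. This is one way to keep a single code path working on both kinds of platform.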
Re: [PATCH v4 1/4] tree-wide: Retrieve 'MAX_PHYSMEM_BITS' from vmcoreinfo (if available)

2019-12-05 Thread Bhupesh Sharma
Hi Kazu,

On Wed, Dec 4, 2019 at 11:05 PM Kazuhito Hagio  wrote:
>
> Hi Bhupesh,
>
> Sorry for the late reply.

No problem.

> > -Original Message-
> > This patch adds a common feature for archs (except arm64, for which
> > similar support is added via subsequent patch) to retrieve
> > 'MAX_PHYSMEM_BITS' from vmcoreinfo (if available).
>
> We already have the calibrate_machdep_info() function, which sets
> info->max_physmem_bits from vmcoreinfo, so practically we don't need
> to add this patch for the benefit.

Since other user-space tools like crash use the 'MAX_PHYSMEM_BITS' value
as well, it was agreed with the arm64 maintainers that it would be a good
approach to export it in vmcoreinfo, rather than use different methods to
determine it in user-space.

Take the PPC makedumpfile implementation, for example. It uses the
following complex method of determining 'info->max_physmem_bits':

int
set_ppc64_max_physmem_bits(void)
{
	long array_len = ARRAY_LENGTH(mem_section);
	/*
	 * The older ppc64 kernels use _MAX_PHYSMEM_BITS as 42 and the
	 * newer kernels (3.7 onwards) use 46 bits.
	 */

	info->max_physmem_bits = _MAX_PHYSMEM_BITS_ORIG;
	if ((array_len == (NR_MEM_SECTIONS() / _SECTIONS_PER_ROOT_EXTREME()))
		|| (array_len == (NR_MEM_SECTIONS() / _SECTIONS_PER_ROOT())))
		return TRUE;

	info->max_physmem_bits = _MAX_PHYSMEM_BITS_3_7;
	if ((array_len == (NR_MEM_SECTIONS() / _SECTIONS_PER_ROOT_EXTREME()))
		|| (array_len == (NR_MEM_SECTIONS() / _SECTIONS_PER_ROOT())))
		return TRUE;

	info->max_physmem_bits = _MAX_PHYSMEM_BITS_4_19;
	if ((array_len == (NR_MEM_SECTIONS() / _SECTIONS_PER_ROOT_EXTREME()))
		|| (array_len == (NR_MEM_SECTIONS() / _SECTIONS_PER_ROOT())))
		return TRUE;

	info->max_physmem_bits = _MAX_PHYSMEM_BITS_4_20;
	if ((array_len == (NR_MEM_SECTIONS() / _SECTIONS_PER_ROOT_EXTREME()))
		|| (array_len == (NR_MEM_SECTIONS() / _SECTIONS_PER_ROOT())))
		return TRUE;

	return FALSE;
}

This might need modification and introduction of another
_MAX_PHYSMEM_BITS_x_y macro when this changes for a newer kernel
version.

I think this makes the code error-prone and hard to read. It's much
better to replace it with:

/* Check if we can get MAX_PHYSMEM_BITS from vmcoreinfo */
if (NUMBER(MAX_PHYSMEM_BITS) != NOT_FOUND_NUMBER) {
	info->max_physmem_bits = NUMBER(MAX_PHYSMEM_BITS);
	return TRUE;
} else {
	..
}

I think it will reduce future reworks (as per kernel versions) and
also reduce issues while backporting makedumpfile to older kernels.
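A small sketch of the "prefer vmcoreinfo, else fall back" approach argued for above. It works on the raw vmcoreinfo text instead of makedumpfile's NUMBER() machinery. `vmcoreinfo_number()` and its -1 "not found" value are hypothetical stand-ins for NUMBER()/NOT_FOUND_NUMBER. The SECTIONS_SHIFT arithmetic is the kernel's sparsemem definition, MAX_PHYSMEM_BITS - SECTION_SIZE_BITS.

```c
#include <stddef.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

/* Look up "NUMBER(name)=value" at the start of a line in a vmcoreinfo
 * text blob. Returns -1 if absent (stand-in for NOT_FOUND_NUMBER). */
static long vmcoreinfo_number(const char *blob, const char *name)
{
	char key[64];
	int n = snprintf(key, sizeof(key), "NUMBER(%s)=", name);
	const char *p = blob;

	if (n < 0 || (size_t)n >= sizeof(key))
		return -1;
	while ((p = strstr(p, key)) != NULL) {
		if (p == blob || p[-1] == '\n')   /* match only at line start */
			return strtol(p + n, NULL, 0); /* decimal or 0x... */
		p += n;
	}
	return -1;
}

/* Prefer the exported value; use the per-arch default for old kernels. */
static int get_max_physmem_bits(const char *blob, int arch_default)
{
	long v = vmcoreinfo_number(blob, "MAX_PHYSMEM_BITS");

	return v > 0 ? (int)v : arch_default;
}

/* SECTIONS_SHIFT = MAX_PHYSMEM_BITS - SECTION_SIZE_BITS (sparsemem). */
static int sections_shift(int max_physmem_bits, int section_size_bits)
{
	return max_physmem_bits - section_size_bits;
}
```

The fallback keeps older kernels working without the version-specific probing that set_ppc64_max_physmem_bits() needs.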

What do you think?

Regards,
Bhupesh
> > I recently posted a kernel patch (see [0]) which appends
> > 'MAX_PHYSMEM_BITS' to vmcoreinfo in the core code itself rather than
> > in arch-specific code, so that user-space code can also benefit from
> > this addition to the vmcoreinfo and use it as a standard way of
> > determining 'SECTIONS_SHIFT' value in 'makedumpfile' utility.
> >
> > This patch ensures backward compatibility for kernel versions in which
> > 'MAX_PHYSMEM_BITS' is not available in vmcoreinfo.
> >
> > [0]. http://lists.infradead.org/pipermail/kexec/2019-November/023960.html
> >
> > Cc: Kazuhito Hagio 
> > Cc: John Donnelly 
> > Cc: kexec@lists.infradead.org
> > Signed-off-by: Bhupesh Sharma 
> > ---
> >  arch/arm.c |  8 +++-
> >  arch/ia64.c|  7 ++-
> >  arch/ppc.c |  8 +++-
> >  arch/ppc64.c   | 49 -
> >  arch/s390x.c   | 29 ++---
> >  arch/sparc64.c |  9 +++--
> >  arch/x86.c | 34 --
> >  arch/x86_64.c  | 27 ---
> >  8 files changed, 109 insertions(+), 62 deletions(-)
> >
> > diff --git a/arch/arm.c b/arch/arm.c
> > index af7442ac70bf..33536fc4dfc9 100644
> > --- a/arch/arm.c
> > +++ b/arch/arm.c
> > @@ -81,7 +81,13 @@ int
> >  get_machdep_info_arm(void)
> >  {
> >   info->page_offset = SYMBOL(_stext) & 0xUL;
> > - info->max_physmem_bits = _MAX_PHYSMEM_BITS;
> > +
> > + /* Check if we can get MAX_PHYSMEM_BITS from vmcoreinfo */
> > + if (NUMBER(MAX_PHYSMEM_BITS) != NOT_FOUND_NUMBER)
> > + info->max_physmem_bits = NUMBER(MAX_PHYSMEM_BITS);
> > + else
> > + info->max_physmem_bits = _MAX_PHYSMEM_BITS;
> > +
> >   info->kernel_start = SYMBOL(_stext);
> >   info->section_size_bits = _SECTION_SIZE_BITS;
> >
> > diff --git a/arch/ia64.c b/arch/ia64.c
> > index 6c33cc7c8288..fb44dda47172 100644
> > --- a/arch/ia64.c
> > +++ b/arch/i

Re: [PATCH v4 3/4] makedumpfile/arm64: Add support for ARMv8.2-LVA (52-bit kernel VA support)

2019-12-05 Thread Bhupesh Sharma
Hi Kazu,

On Thu, Dec 5, 2019 at 9:00 PM Kazuhito Hagio  wrote:
>
> > -Original Message-
> > > -Original Message-
> > > With the ARMv8.2-LVA architecture extension, arm64 hardware
> > > which supports this extension can support up to 52-bit virtual
> > > addresses. It is especially useful for having a 52-bit user-space
> > > virtual address space while the kernel can still retain 48-bit/52-bit
> > > virtual addressing.
> > >
> > > Since at the moment we enable support for this extension in the
> > > kernel via a CONFIG flag (CONFIG_ARM64_VA_BITS_52), there is no
> > > clear mechanism in user-space to determine this CONFIG flag value
> > > and use it to determine the kernel-space VA address range values.
> > >
> > > 'makedumpfile' can instead use 'TCR_EL1.T1SZ' value from vmcoreinfo
> > > which indicates the size offset of the memory region addressed by
> > > TTBR1_EL1 (and hence can be used for determining the
> > > vabits_actual value).
> > >
> > > The user-space computation for determining whether an address lies in
> > > the linear map range is the same as we have in kernel-space:
> > >
> > >   #define __is_lm_address(addr) (!(((u64)addr) & BIT(vabits_actual - 1)))
> > >
> > > I have sent a kernel patch upstream to add 'TCR_EL1.T1SZ' to
> > > vmcoreinfo for arm64 (see [0]).
> > >
> > > This patch is in accordance with ARMv8 Architecture Reference Manual
> > > version D.a
> > >
> > > Note that with these changes the '--mem-usage' option will not work
> > > properly for arm64 (a subsequent patch in this series addresses
> > > this), and there is an ongoing discussion with the arm64 maintainers
> > > to find a way out (via standard kernel symbols like _stext).
> > >
> > > [0].http://lists.infradead.org/pipermail/kexec/2019-November/023962.html
> > >
> > > Cc: Kazuhito Hagio 
> > > Cc: John Donnelly 
> > > Cc: kexec@lists.infradead.org
> > > Signed-off-by: Bhupesh Sharma 
> > > ---
> > >  arch/arm64.c   | 148 
> > > +
> > >  makedumpfile.c |   2 +
> > >  makedumpfile.h |   3 +-
> > >  3 files changed, 122 insertions(+), 31 deletions(-)
> > >
> > > diff --git a/arch/arm64.c b/arch/arm64.c
> > > index ecb19139e178..094d73b8a60f 100644
> > > --- a/arch/arm64.c
> > > +++ b/arch/arm64.c
> > > @@ -47,6 +47,7 @@ typedef struct {
> > >  static int lpa_52_bit_support_available;
> > >  static int pgtable_level;
> > >  static int va_bits;
> > > +static int vabits_actual;
> > >  static unsigned long kimage_voffset;
> > >
> > >  #define SZ_4K  4096
> > > @@ -218,12 +219,19 @@ pmd_page_paddr(pmd_t pmd)
> > >  #define pte_index(vaddr)   (((vaddr) >> PAGESHIFT()) & (PTRS_PER_PTE - 1))
> > >  #define pte_offset(dir, vaddr) (pmd_page_paddr((*dir)) + pte_index(vaddr) * sizeof(pte_t))
> > >
> > > +/*
> > > + * The linear kernel range starts at the bottom of the virtual address
> > > + * space. Testing the top bit for the start of the region is a
> > > + * sufficient check and avoids having to worry about the tag.
> > > + */
> > > +#define is_linear_addr(addr)   (!(((unsigned long)addr) & (1UL << (vabits_actual - 1))))
> >
> > Does this check cover 5.3 or earlier kernels?
> > Is there no case where vabits_actual is zero?

We can set vabits_actual to va_bits for older kernels. That shouldn't
be a big change.
Will add it in v5. See more below ...

> As you know, 14c127c957c1 ("arm64: mm: Flip kernel VA space") changed
> the check for linear address:
>
> -#define __is_lm_address(addr)  (!!((addr) & BIT(VA_BITS - 1)))
> +#define __is_lm_address(addr)  (!((addr) & BIT(VA_BITS - 1)))
>
> so if we use the same check as kernel has, I think we will need the
> former one to support earlier kernels.

See above, we can use va_bits where vabits_actual is not present.

> > > +
> > >  static unsigned long long
> > >  __pa(unsigned long vaddr)
> > >  {
> > > if (kimage_voffset == NOT_FOUND_NUMBER ||
> > > -   (vaddr >= PAGE_OFFSET))
> > > -   return (vaddr - PAGE_OFFSET + info->phys_base);
> > > + 

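The two points above can be combined into one check: fall back to va_bits when vabits_actual is unavailable, and invert the test after the VA-space flip. A minimal sketch follows. The function signature and the `flipped_va_space` flag are illustrative assumptions, not makedumpfile's code. The bit tests come from the two __is_lm_address() definitions quoted above.

```c
#include <stdbool.h>
#include <stdint.h>

/*
 * Hypothetical helper. `flipped_va_space` says whether the dumped kernel
 * contains 14c127c957c1 ("arm64: mm: Flip kernel VA space"); how to detect
 * that (kernel version, presence of TCR_EL1.T1SZ in vmcoreinfo, ...) is
 * left to the caller.
 */
static bool is_linear_addr(uint64_t addr, int vabits_actual, int va_bits,
			   bool flipped_va_space)
{
	/* Older kernels do not provide vabits_actual: fall back to va_bits. */
	int bits = vabits_actual > 0 ? vabits_actual : va_bits;
	uint64_t top_bit = (uint64_t)1 << (bits - 1);

	if (flipped_va_space)
		return !(addr & top_bit);  /* new: linear map in the lower half */
	return !!(addr & top_bit);         /* old: linear map in the upper half */
}
```

For example, with 48-bit VAs, 0xffff800000001000 is a linear address for a pre-flip kernel but not for a post-flip one. For a post-flip kernel, 0xffff000000001000 is a linear address.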
Re: [PATCH v2 3/3] arm64: kexec_file: add crash dump support

2019-12-04 Thread Bhupesh Sharma

On 11/14/2019 10:45 AM, AKASHI Takahiro wrote:

Enabling crash dump (kdump) includes
* prepare contents of ELF header of a core dump file, /proc/vmcore,
   using crash_prepare_elf64_headers(), and
* add two device tree properties, "linux,usable-memory-range" and
   "linux,elfcorehdr", which represent respectively a memory range
   to be used by crash dump kernel and the header's location
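To illustrate the second bullet, here is a sketch of how an `<address size>` pair is laid out as a device tree property value. This is roughly what libfdt's fdt_appendprop_addrrange() appends, using the parent node's #address-cells/#size-cells. The helpers below are hypothetical and only show the big-endian cell encoding. They also show the inclusive-end size computation used for crashk_res.

```c
#include <stddef.h>
#include <stdint.h>

/* Write `val` as `cells` big-endian 32-bit cells (1 or 2 cells). */
static uint8_t *put_cells(uint8_t *p, uint64_t val, int cells)
{
	for (int i = cells - 1; i >= 0; i--) {
		uint32_t cell = (uint32_t)(val >> (32 * i));

		*p++ = (uint8_t)(cell >> 24);
		*p++ = (uint8_t)(cell >> 16);
		*p++ = (uint8_t)(cell >> 8);
		*p++ = (uint8_t)cell;
	}
	return p;
}

/* Encode one <addr size> pair; returns the number of bytes written. */
static size_t encode_addrrange(uint8_t *buf, int addr_cells, int size_cells,
			       uint64_t addr, uint64_t size)
{
	uint8_t *p = put_cells(buf, addr, addr_cells);

	p = put_cells(p, size, size_cells);
	return (size_t)(p - buf);
}

/* struct resource ends are inclusive, hence end - start + 1. */
static uint64_t size_from_inclusive(uint64_t start, uint64_t end)
{
	return end - start + 1;
}
```

With the common arm64 setting of two address cells and two size cells, each pair takes 16 bytes.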

Signed-off-by: AKASHI Takahiro 
Cc: Catalin Marinas 
Cc: Will Deacon 
Reviewed-by: James Morse 
---
  arch/arm64/include/asm/kexec.h |   4 +
  arch/arm64/kernel/kexec_image.c|   4 -
  arch/arm64/kernel/machine_kexec_file.c | 106 -
  3 files changed, 106 insertions(+), 8 deletions(-)

diff --git a/arch/arm64/include/asm/kexec.h b/arch/arm64/include/asm/kexec.h
index 12a561a54128..d24b527e8c00 100644
--- a/arch/arm64/include/asm/kexec.h
+++ b/arch/arm64/include/asm/kexec.h
@@ -96,6 +96,10 @@ static inline void crash_post_resume(void) {}
  struct kimage_arch {
void *dtb;
unsigned long dtb_mem;
+   /* Core ELF header buffer */
+   void *elf_headers;
+   unsigned long elf_headers_mem;
+   unsigned long elf_headers_sz;
  };
  
  extern const struct kexec_file_ops kexec_image_ops;

diff --git a/arch/arm64/kernel/kexec_image.c b/arch/arm64/kernel/kexec_image.c
index 29a9428486a5..af9987c154ca 100644
--- a/arch/arm64/kernel/kexec_image.c
+++ b/arch/arm64/kernel/kexec_image.c
@@ -47,10 +47,6 @@ static void *image_load(struct kimage *image,
struct kexec_segment *kernel_segment;
int ret;
  
-	/* We don't support crash kernels yet. */
-   if (image->type == KEXEC_TYPE_CRASH)
-   return ERR_PTR(-EOPNOTSUPP);
-
/*
 * We require a kernel with an unambiguous Image header. Per
 * Documentation/arm64/booting.rst, this is the case when image_size
diff --git a/arch/arm64/kernel/machine_kexec_file.c b/arch/arm64/kernel/machine_kexec_file.c
index 7b08bf9499b6..f1d1bb895fd2 100644
--- a/arch/arm64/kernel/machine_kexec_file.c
+++ b/arch/arm64/kernel/machine_kexec_file.c
@@ -17,12 +17,15 @@
  #include 
  #include 
  #include 
+#include 
  #include 
  #include 
  #include 
  #include 
  
  /* relevant device tree properties */
+#define FDT_PROP_KEXEC_ELFHDR  "linux,elfcorehdr"
+#define FDT_PROP_MEM_RANGE "linux,usable-memory-range"
  #define FDT_PROP_INITRD_START "linux,initrd-start"
  #define FDT_PROP_INITRD_END   "linux,initrd-end"
  #define FDT_PROP_BOOTARGS "bootargs"
@@ -40,6 +43,10 @@ int arch_kimage_file_post_load_cleanup(struct kimage *image)
vfree(image->arch.dtb);
image->arch.dtb = NULL;
  
+	vfree(image->arch.elf_headers);
+   image->arch.elf_headers = NULL;
+   image->arch.elf_headers_sz = 0;
+
return kexec_image_post_load_cleanup_default(image);
  }
  
@@ -55,6 +62,31 @@ static int setup_dtb(struct kimage *image,
  
  	off = ret;
  
+	ret = fdt_delprop(dtb, off, FDT_PROP_KEXEC_ELFHDR);
+   if (ret && ret != -FDT_ERR_NOTFOUND)
+   goto out;
+   ret = fdt_delprop(dtb, off, FDT_PROP_MEM_RANGE);
+   if (ret && ret != -FDT_ERR_NOTFOUND)
+   goto out;
+
+   if (image->type == KEXEC_TYPE_CRASH) {
+   /* add linux,elfcorehdr */
+   ret = fdt_appendprop_addrrange(dtb, 0, off,
+   FDT_PROP_KEXEC_ELFHDR,
+   image->arch.elf_headers_mem,
+   image->arch.elf_headers_sz);
+   if (ret)
+   return (ret == -FDT_ERR_NOSPACE ? -ENOMEM : -EINVAL);
+
+   /* add linux,usable-memory-range */
+   ret = fdt_appendprop_addrrange(dtb, 0, off,
+   FDT_PROP_MEM_RANGE,
+   crashk_res.start,
+   crashk_res.end - crashk_res.start + 1);
+   if (ret)
+   return (ret == -FDT_ERR_NOSPACE ? -ENOMEM : -EINVAL);
+   }
+
/* add bootargs */
if (cmdline) {
ret = fdt_setprop_string(dtb, off, FDT_PROP_BOOTARGS, cmdline);
@@ -125,8 +157,8 @@ static int setup_dtb(struct kimage *image,
  }
  
  /*
- * More space needed so that we can add initrd, bootargs, kaslr-seed, and
- * rng-seed.
+ * More space needed so that we can add initrd, bootargs, kaslr-seed,
+ * rng-seed, userable-memory-range and elfcorehdr.


nitpick:
s/userable-memory-range/usable-memory-range


   */
  #define DTB_EXTRA_SPACE 0x1000
  
@@ -174,6 +206,43 @@ static int create_dtb(struct kimage *image,
}
  }
  
+static int prepare_elf_headers(void **addr, unsigned long *sz)
+{
+   struct crash_mem *cmem;
+   unsigned int nr_ranges;
+   int ret;
+   u64 i;
+   phys_addr_t start, end;
+
+   nr_ranges = 1; /* for exclusion of crashkernel region */
+   for_each_mem_range(i, &memblock.memory, NULL, NUMA_NO_NODE,
+   MEMBLOCK_NONE, &start, &end, NULL)
+   

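The `nr_ranges = 1` spare slot exists because excluding the crashkernel region from a memory range that fully contains it splits that range in two. Below is an independent sketch of that exclusion step. It is similar in spirit to crash_exclude_mem_range() in kernel/kexec_file.c but is not that function. The `struct range` type and `exclude_range()` are hypothetical.

```c
#include <stddef.h>
#include <stdint.h>

struct range { uint64_t start, end; };  /* inclusive end, like struct resource */

/* Remove [ex_start, ex_end] from r[0..*nr). Returns 0, or -1 if a split
 * is needed but all `max` slots are already used. */
static int exclude_range(struct range *r, size_t *nr, size_t max,
			 uint64_t ex_start, uint64_t ex_end)
{
	size_t i = 0;

	while (i < *nr) {
		uint64_t s = r[i].start, e = r[i].end;

		if (ex_end < s || ex_start > e) {
			i++;                            /* no overlap */
		} else if (ex_start <= s && ex_end >= e) {
			for (size_t j = i + 1; j < *nr; j++)
				r[j - 1] = r[j];        /* fully covered: drop */
			(*nr)--;
		} else if (ex_start > s && ex_end < e) {
			if (*nr >= max)
				return -1;              /* no spare slot for the split */
			for (size_t j = *nr; j > i + 1; j--)
				r[j] = r[j - 1];
			r[i].end = ex_start - 1;        /* strictly inside: split */
			r[i + 1].start = ex_end + 1;
			r[i + 1].end = e;
			(*nr)++;
			i += 2;
		} else if (ex_start <= s) {
			r[i].start = ex_end + 1;        /* overlaps the head */
			i++;
		} else {
			r[i].end = ex_start - 1;        /* overlaps the tail */
			i++;
		}
	}
	return 0;
}
```

Since the crashkernel region is a single contiguous range, at most one split can occur, so one extra slot is enough.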
Re: [PATCH v2 1/3] libfdt: define UINT32_MAX in libfdt_env.h

2019-12-04 Thread Bhupesh Sharma

Hi Akashi,

On 11/14/2019 10:45 AM, AKASHI Takahiro wrote:

In the implementation of kexec_file_load-based kdump for arm64,
fdt_appendprop_addrrange() will be used, but fdt_addresses.c
will fail to compile due to missing UINT32_MAX.

So just define it in libfdt_env.h.

Signed-off-by: AKASHI Takahiro 
Cc: Rob Herring 
Cc: Frank Rowand 
---
  include/linux/libfdt_env.h | 3 +++
  1 file changed, 3 insertions(+)

diff --git a/include/linux/libfdt_env.h b/include/linux/libfdt_env.h
index edb0f0c30904..9ca00f11d9b1 100644
--- a/include/linux/libfdt_env.h
+++ b/include/linux/libfdt_env.h
@@ -3,6 +3,7 @@
  #define LIBFDT_ENV_H
  
  #include 	/* For INT_MAX */
+#include /* For UINT32_MAX */
  #include 
  
  #include 
@@ -11,6 +12,8 @@ typedef __be16 fdt16_t;
  typedef __be32 fdt32_t;
  typedef __be64 fdt64_t;
  
+#define UINT32_MAX U32_MAX
+
  #define fdt32_to_cpu(x) be32_to_cpu(x)
  #define cpu_to_fdt32(x) cpu_to_be32(x)
  #define fdt64_to_cpu(x) be64_to_cpu(x)



With the following upstream patches already accepted in Linus's tree (see
[0] and [1]), we can drop this patch from v3:


[0] 26ed19adbab1 ("libfdt: reduce the number of headers included from 
libfdt_env.h")

[1] a8de1304b7df ("libfdt: define INT32_MAX and UINT32_MAX in libfdt_env.h")

Thanks,
Bhupesh




Re: [PATCH v2 0/3] arm64: kexec_file: add kdump

2019-12-04 Thread Bhupesh Sharma

Hi Akashi,

Thanks for the patchset.

On 11/14/2019 10:45 AM, AKASHI Takahiro wrote:

This is the last piece of my kexec_file_load implementation for arm64.
It is now ready to be merged, as a relevant patch to dtc/libfdt[1]
has finally been integrated in v5.3-rc1.
(Nothing changed since kexec_file v16[2] except adding Patch#1 and #2.)

Patch#1 and #2 are preliminary patches for libfdt component.
Patch#3 is to add kdump support.

Bhupesh's patch[3] will be required for 52-bit VA support.
Once this patch is applied, whether or not CONFIG_ARM64_VA_BITS_52 is
enabled, a matching fix on the user-space side, in the crash utility,
will also be needed.

Anyway, I tested my patch, at least, with the following configuration:
1) CONFIG_ARM64_VA_BITS_48=y
2) CONFIG_ARM64_VA_BITS_52=y, but vabits_actual=48

(I don't have any platform to use for
3) CONFIG_ARM64_VA_BITS_52=y, and vabits_actual=52)

[1] commit 9bb9c6a110ea ("scripts/dtc: Update to upstream version
 v1.5.0-23-g87963ee20693"), in particular
7fcf8208b8a9 libfdt: add fdt_append_addrrange()
[2] 
http://lists.infradead.org/pipermail/linux-arm-kernel/2018-November/612641.html
[3] 
http://lists.infradead.org/pipermail/linux-arm-kernel/2019-November/693411.html

Changes in v2 (Nov 14, 2019)
* rebased to v5.4-rc7
* no functional changes


This looks like a step in the right direction. I have some minor 
nitpicks which I have mentioned in the individual patch reviews.


With those addressed (v3?):

Tested-and-Reviewed-by: Bhupesh Sharma 

Thanks,
Bhupesh


AKASHI Takahiro (3):
   libfdt: define UINT32_MAX in libfdt_env.h
   libfdt: include fdt_addresses.c
   arm64: kexec_file: add crash dump support

  arch/arm64/include/asm/kexec.h |   4 +
  arch/arm64/kernel/kexec_image.c|   4 -
  arch/arm64/kernel/machine_kexec_file.c | 106 -
  include/linux/libfdt_env.h |   3 +
  lib/Makefile   |   2 +-
  lib/fdt_addresses.c|   2 +
  6 files changed, 112 insertions(+), 9 deletions(-)
  create mode 100644 lib/fdt_addresses.c






Re: [PATCH] efi/memreserve: register reservations as 'reserved' in /proc/iomem

2019-12-04 Thread Bhupesh SHARMA
Hello Masa,

(+Cc Simon)

On Thu, Dec 5, 2019 at 12:27 AM Masayoshi Mizuma  wrote:
>
> On Wed, Dec 04, 2019 at 06:17:59PM +, James Morse wrote:
> > Hi Masa,
> >
> > On 04/12/2019 17:17, Masayoshi Mizuma wrote:
> > > Thank you for sending the patch, but unfortunately it doesn't work for 
> > > the issue...
> > >
> > > After applied your patch, the LPI tables are marked as reserved in
> > > /proc/iomem like as:
> > >
> > > 8030-a1fd : System RAM
> > >   8048-8134 : Kernel code
> > >   8135-817b : reserved
> > >   817c-82ac : Kernel data
> > >   830f-830f : reserved # Property table
> > >   8348-83480fff : reserved # Pending table
> > >   8349-8349 : reserved # Pending table
> > >
> > > However, kexec tries to allocate memory from System RAM and doesn't
> > > take into account the reserved regions within System RAM.
> >
> > > I'm not sure why kexec ignores the reserved regions within System RAM, however,
> >
> > Hmm, we added these to fix a problem with the UEFI memory map, and more 
> > recently ACPI
> > tables being overwritten by kexec.
> >
> > Which version of kexec-tools are you using? Could you try:
> > https://git.linaro.org/people/takahiro.akashi/kexec-tools.git/commit/?h=arm64/resv_mem
>
> Thanks a lot! It worked and the issue is gone with Ard's patch and
> the linaro kexec (arm64/resv_mem branch).
>
> Ard, please feel free to add:
>
> Tested-by: Masayoshi Mizuma 

Same results at my side, so:
Tested-and-Reviewed-by: Bhupesh Sharma

> >
> > > if the kexec behavior is right, the LPI tables should not belong to
> > > System RAM.
> >
> > > Like as:
> > >
> > > 8030-830e : System RAM
> > >   8048-8134 : Kernel code
> > >   8135-817b : reserved
> > >   817c-82ac : Kernel data
> > > 830f-830f : reserved # Property table
> > > 8348-83480fff : reserved # Pending table
> > > 8349-8349 : reserved # Pending table
> > > 834a-a1fd : System RAM
> > >
> > > I don't have any ideas for separating the LPI tables from System RAM...
> > > so I tried to add a new file to expose the LPI tables to userspace.
> >
> > This is how 'nomap' memory appears, we carve it out of System RAM. A side 
> > effect of this
> > is kdump can't touch it, as you've told it this isn't memory.
> >
> > As these tables are memory, mapped by the linear map, I think Ard's patch 
> > is the right
> > thing to do ... I suspect your kexec-tools doesn't have those patches from 
> > Akashi to make
> > it honour all second level entries.
>
> I used the kexec on the top of master branch:
> git://git.kernel.org/pub/scm/utils/kernel/kexec/kexec-tools.git
>
> Should we use the linaro kexec for aarch64 machine?
> Or will the arm64/resv_mem branch be merged to the kexec on
> git.kernel.org...?

Glad that Ard's patch fixes the issue for you.
Regarding Akashi's patch, I think it was sent to upstream kexec-tools
some time ago (see [0]) but seems not to have been integrated into
upstream kexec-tools (I just noticed that my Tested-by email for it
bounced due to some gmail msmtp setting issues at my end; sorry for
that). I have added Simon to the Cc list.

Hi Simon,

Can you please help pick [0] in upstream kexec-tools with Tested-by
from Masa and myself? Thanks a lot for your help.

[0]. http://lists.infradead.org/pipermail/kexec/2019-January/022201.html

Thanks,
Bhupesh



Re: [PATCH v5 0/5] Append new variables to vmcoreinfo (TCR_EL1.T1SZ for arm64 and MAX_PHYSMEM_BITS for all archs)

2019-11-29 Thread Bhupesh Sharma
Hi Will,

On Fri, Nov 29, 2019 at 3:54 PM Will Deacon  wrote:
>
> On Fri, Nov 29, 2019 at 01:53:36AM +0530, Bhupesh Sharma wrote:
> > Changes since v4:
> > 
> > - v4 can be seen here:
> >   http://lists.infradead.org/pipermail/kexec/2019-November/023961.html
> > - Addressed comments from Dave and added patches for documenting
> >   new variables appended to vmcoreinfo documentation.
> > - Added testing report shared by Akashi for PATCH 2/5.
>
> Please can you fix your mail setup? The last two times you've sent this
> series it seems to get split into two threads, which is really hard to
> track in my inbox:
>
> First thread:
>
> https://lore.kernel.org/lkml/1574972621-25750-1-git-send-email-bhsha...@redhat.com/
>
> Second thread:
>
> https://lore.kernel.org/lkml/1574972716-25858-1-git-send-email-bhsha...@redhat.com/

There seems to be some issue with my server's msmtp settings. I have
tried resending the v5 (see
<http://lists.infradead.org/pipermail/linux-arm-kernel/2019-November/696833.html>).

I hope the threading is ok this time.

Thanks for your patience.

Regards,
Bhupesh




[RESEND PATCH v5 5/5] Documentation/vmcoreinfo: Add documentation for 'TCR_EL1.T1SZ'

2019-11-29 Thread Bhupesh Sharma
Add documentation for TCR_EL1.T1SZ variable being added to
vmcoreinfo.

It indicates the size offset of the memory region addressed by TTBR1_EL1
and hence can be used for determining the vabits_actual value.

Cc: James Morse 
Cc: Mark Rutland 
Cc: Will Deacon 
Cc: Steve Capper 
Cc: Catalin Marinas 
Cc: Ard Biesheuvel 
Cc: Dave Anderson 
Cc: Kazuhito Hagio 
Cc: linux-arm-ker...@lists.infradead.org
Cc: linux-ker...@vger.kernel.org
Cc: kexec@lists.infradead.org
Signed-off-by: Bhupesh Sharma 
---
 Documentation/admin-guide/kdump/vmcoreinfo.rst | 6 ++
 1 file changed, 6 insertions(+)

diff --git a/Documentation/admin-guide/kdump/vmcoreinfo.rst b/Documentation/admin-guide/kdump/vmcoreinfo.rst
index 447b64314f56..f9349f9d3345 100644
--- a/Documentation/admin-guide/kdump/vmcoreinfo.rst
+++ b/Documentation/admin-guide/kdump/vmcoreinfo.rst
@@ -398,6 +398,12 @@ KERNELOFFSET
 The kernel randomization offset. Used to compute the page offset. If
 KASLR is disabled, this value is zero.
 
+TCR_EL1.T1SZ
+------------
+
+Indicates the size offset of the memory region addressed by TTBR1_EL1
+and hence can be used for determining the vabits_actual value.
+
 arm
 ===
 
-- 
2.7.4
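The documented relationship can be illustrated with a few lines of arithmetic. TCR_EL1.T1SZ occupies bits [21:16] of the register, and the TTBR1_EL1 region spans 2^(64 - T1SZ) bytes. So vabits_actual is 64 - T1SZ: T1SZ = 16 gives 48 bits, and T1SZ = 12 gives 52. This is a sketch only. The helper names are hypothetical, and a consumer reading vmcoreinfo would already have the T1SZ value rather than the raw register.

```c
#include <stdint.h>

/* TCR_EL1.T1SZ lives in bits [21:16] (6 bits wide). */
#define TCR_T1SZ_SHIFT 16
#define TCR_T1SZ_MASK  ((uint64_t)0x3f << TCR_T1SZ_SHIFT)

static int t1sz_from_tcr(uint64_t tcr_el1)
{
	return (int)((tcr_el1 & TCR_T1SZ_MASK) >> TCR_T1SZ_SHIFT);
}

/* The TTBR1_EL1 region covers 2^(64 - T1SZ) bytes of VA space. */
static int vabits_from_t1sz(int t1sz)
{
	return 64 - t1sz;
}
```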




[RESEND PATCH v5 3/5] Documentation/arm64: Fix a simple typo in memory.rst

2019-11-29 Thread Bhupesh Sharma
Fix a simple typo in arm64/memory.rst

Cc: Jonathan Corbet 
Cc: James Morse 
Cc: Mark Rutland 
Cc: Will Deacon 
Cc: Steve Capper 
Cc: Catalin Marinas 
Cc: Ard Biesheuvel 
Cc: linux-...@vger.kernel.org
Cc: linux-ker...@vger.kernel.org
Cc: linux-arm-ker...@lists.infradead.org
Signed-off-by: Bhupesh Sharma 
---
 Documentation/arm64/memory.rst | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/Documentation/arm64/memory.rst b/Documentation/arm64/memory.rst
index 02e02175e6f5..cf03b3290800 100644
--- a/Documentation/arm64/memory.rst
+++ b/Documentation/arm64/memory.rst
@@ -129,7 +129,7 @@ this logic.
 
 As a single binary will need to support both 48-bit and 52-bit VA
 spaces, the VMEMMAP must be sized large enough for 52-bit VAs and
-also must be sized large enought to accommodate a fixed PAGE_OFFSET.
+also must be sized large enough to accommodate a fixed PAGE_OFFSET.
 
 Most code in the kernel should not need to consider the VA_BITS, for
 code that does need to know the VA size the variables are
-- 
2.7.4



