Re: [PATCH] kexec: remove the 2GB size limit on initrd file

2020-09-02 Thread Robi Buranyi
On Wed, Sep 2, 2020 at 11:41 AM Robi Buranyi  wrote:
>
>
>
> On Wed, Sep 2, 2020 at 11:18 AM Bhupesh Sharma  wrote:
>>
>> Hi Robi,
>>
>> On Wed, Sep 2, 2020 at 1:05 PM Robi Buranyi  wrote:
>> >
>> > Enable loading initrd files exceeding the INT_MAX size. Remove the
>> > INT_MAX limit completely, and let any initrd load if it fits in the
>> > memory.
>> >
>> > Signed-off-by: Robi Buranyi 
>> > ---
>> >  kernel/kexec_file.c | 4 ++--
>> >  1 file changed, 2 insertions(+), 2 deletions(-)
>> >
>> > diff --git a/kernel/kexec_file.c b/kernel/kexec_file.c
>> > index ca40bef75a61..659a9d165198 100644
>> > --- a/kernel/kexec_file.c
>> > +++ b/kernel/kexec_file.c
>> > @@ -222,7 +222,7 @@ kimage_file_prepare_segments(struct kimage *image, int kernel_fd, int initrd_fd,
>> > loff_t size;
>> >
>> > ret = kernel_read_file_from_fd(kernel_fd, &image->kernel_buf,
>> > -  &size, INT_MAX, READING_KEXEC_IMAGE);
>> > +  &size, 0, READING_KEXEC_IMAGE);
>> > if (ret)
>> > return ret;
>> > image->kernel_buf_len = size;
>> > @@ -242,7 +242,7 @@ kimage_file_prepare_segments(struct kimage *image, int kernel_fd, int initrd_fd,
>> > /* It is possible that there no initramfs is being loaded */
>> > if (!(flags & KEXEC_FILE_NO_INITRAMFS)) {
>> > ret = kernel_read_file_from_fd(initrd_fd, &image->initrd_buf,
>> > -  &size, INT_MAX,
>> > +  &size, 0,
>> >READING_KEXEC_INITRAMFS);
>> > if (ret)
>> > goto out;
>> > --
>> > 2.28.0.402.g5ffc5be6b7-goog
>>
>> Can you share some background about this fix? For example why is it
>> needed or what is failing at your end?
>> I think a 2GB initramfs is a good enough size to accommodate while
>> loading it via kexec_file_load(). Eventually the initramfs to be
>> loaded is passed via user-space as a command line argument to the
>> kexec_file_load() syscall, so we should be careful about the file
>> sizes we might be loading here.
>>
>> I am just trying to understand what initramfs size limits you are
>> working with from a kexec_file_load() p-o-v.
>>
>> Thanks,
>> Bhupesh
>>
>
> Hello Bhupesh,
>
>  Sure, and thank you very much for the question. I will do my best to answer.
>
>  I am experimenting with netbooting larger images of various distros. I am
> looking for standalone configurations where the system boots without
> dependencies on NFS/CIFS shares and has 'everything' on ramdisk. In the
> netboot configuration I'm using, a u-root bootloader
> (https://github.com/u-root/u-root) kexecs into this netboot image. The
> machine HW configuration I have is typically several hundred GB of RAM with
> 100Gb+ network interface(s).
>  To sum up: I am netbooting (kexec-ing) from a 'tiny' 32MB linux bootloader 
> image into a huge 2GB+ linux image.
>
>  My problem with the 2GB limit in the current kexec is that, in my opinion,
> it's a somewhat arbitrary selection and no longer reflects any architectural
> limitation. If a user wants to use a bigger image (and has the necessary
> memory to load it), the kernel should not stand in the way. I was looking at
> other invocations of kernel_read_file(...) - which this
> kernel_read_file_from_fd comes down to - and there are also other use cases
> with no limit specified. I see your point about it being specified as a
> command line argument, but if, let's say, you have a system with several
> terabytes of DDR, what's the point of limiting the user to 2GB? We could
> increase the limit to 8GB for now (and that would certainly be sufficient
> for a while) - but then we would possibly have the same discussion again in
> a few years.
>
>  Does it answer your question?
>
> Thanks, Robi

Re: [PATCH] kexec: remove the 2GB size limit on initrd file

2020-09-02 Thread Bhupesh Sharma
Hi Robi,

On Wed, Sep 2, 2020 at 1:05 PM Robi Buranyi  wrote:
>
> Enable loading initrd files exceeding the INT_MAX size. Remove the
> INT_MAX limit completely, and let any initrd load if it fits in the
> memory.
>
> Signed-off-by: Robi Buranyi 
> ---
>  kernel/kexec_file.c | 4 ++--
>  1 file changed, 2 insertions(+), 2 deletions(-)
>
> diff --git a/kernel/kexec_file.c b/kernel/kexec_file.c
> index ca40bef75a61..659a9d165198 100644
> --- a/kernel/kexec_file.c
> +++ b/kernel/kexec_file.c
> @@ -222,7 +222,7 @@ kimage_file_prepare_segments(struct kimage *image, int kernel_fd, int initrd_fd,
> loff_t size;
>
> ret = kernel_read_file_from_fd(kernel_fd, &image->kernel_buf,
> -  &size, INT_MAX, READING_KEXEC_IMAGE);
> +  &size, 0, READING_KEXEC_IMAGE);
> if (ret)
> return ret;
> image->kernel_buf_len = size;
> @@ -242,7 +242,7 @@ kimage_file_prepare_segments(struct kimage *image, int kernel_fd, int initrd_fd,
> /* It is possible that there no initramfs is being loaded */
> if (!(flags & KEXEC_FILE_NO_INITRAMFS)) {
> ret = kernel_read_file_from_fd(initrd_fd, &image->initrd_buf,
> -  &size, INT_MAX,
> +  &size, 0,
>READING_KEXEC_INITRAMFS);
> if (ret)
> goto out;
> --
> 2.28.0.402.g5ffc5be6b7-goog

Can you share some background about this fix? For example why is it
needed or what is failing at your end?
I think a 2GB initramfs is a good enough size to accommodate while
loading it via kexec_file_load(). Eventually the initramfs to be
loaded is passed via user-space as a command line argument to the
kexec_file_load() syscall, so we should be careful about the file
sizes we might be loading here.

I am just trying to understand what initramfs size limits you are
working with from a kexec_file_load() p-o-v.

Thanks,
Bhupesh


___
kexec mailing list
kexec@lists.infradead.org
http://lists.infradead.org/mailman/listinfo/kexec


Re: [PATCH v11 5/5] kdump: update Documentation about crashkernel

2020-09-02 Thread Catalin Marinas
On Sat, Aug 01, 2020 at 09:08:56PM +0800, Chen Zhou wrote:
> diff --git a/Documentation/admin-guide/kdump/kdump.rst b/Documentation/admin-guide/kdump/kdump.rst
> index 2da65fef2a1c..4b58f97351d5 100644
> --- a/Documentation/admin-guide/kdump/kdump.rst
> +++ b/Documentation/admin-guide/kdump/kdump.rst
> @@ -299,7 +299,15 @@ Boot into System Kernel
> "crashkernel=64M@16M" tells the system kernel to reserve 64 MB of memory
> starting at physical address 0x01000000 (16MB) for the dump-capture kernel.
>  
> -   On x86 and x86_64, use "crashkernel=64M@16M".
> +   On x86 use "crashkernel=64M@16M".
> +
> +   On x86_64, use "crashkernel=X" to select a region under 4G first, and
> +   fall back to reserve region above 4G.
> +   We can also use "crashkernel=X,high" to select a region above 4G, which
> +   also tries to allocate at least 256M below 4G automatically and
> +   "crashkernel=Y,low" can be used to allocate specified size low memory.
> +   Use "crashkernel=Y@X" if you really have to reserve memory from specified
> +   start address X.
>  
> On ppc64, use "crashkernel=128M@32M".
>  
> @@ -316,8 +324,15 @@ Boot into System Kernel
> kernel will automatically locate the crash kernel image within the
> first 512MB of RAM if X is not given.
>  
> -   On arm64, use "crashkernel=Y[@X]".  Note that the start address of
> -   the kernel, X if explicitly specified, must be aligned to 2MiB (0x200000).
> +   On arm64, use "crashkernel=X" to try low allocation in ZONE_DMA, and
> +   fall back to high allocation if it fails. And go for high allocation
> +   directly if the required size is too large. If crash_base is outside

I wouldn't mention crash_base in the admin guide. That's an
implementation detail really and admins are not supposed to read the
source code to make sense of the documentation. ZONE_DMA is also a
kernel internal, so you'd need to define what it is for arm64. At least
the DMA and DMA32 zones are printed during kernel boot.

> +   ZONE_DMA, try to allocate at least 256M in ZONE_DMA automatically.
> +   "crashkernel=Y,low" can be used to allocate specified size low memory.
> +   For non-RPi4 platforms, change ZONE_DMA mentioned above to ZONE_DMA32.
> +   Use "crashkernel=Y@X" if you really have to reserve memory from
> +   specified start address X. Note that the start address of the kernel,
> +   X if explicitly specified, must be aligned to 2MiB (0x200000).

-- 
Catalin
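
The "crashkernel=Y@X" grammar described in the documentation above can be modeled in a few lines of user-space C. This is a rough sketch under stated assumptions: `memparse_sketch` and `parse_crashkernel_sketch` are invented stand-ins for the kernel's memparse()/parse_crashkernel(), which accept more suffixes and the ",high"/",low" forms.

```c
#include <stdint.h>
#include <stdlib.h>

/* Simplified stand-in for the kernel's memparse():
 * a number followed by an optional M or G suffix. */
static uint64_t memparse_sketch(const char *s, char **end)
{
	uint64_t v = strtoull(s, end, 0);

	switch (**end) {
	case 'M': v <<= 20; (*end)++; break;
	case 'G': v <<= 30; (*end)++; break;
	}
	return v;
}

/* Parse "Y" or "Y@X": size Y, optional fixed base X (0 when absent).
 * Returns 0 on success, -1 on trailing junk. */
static int parse_crashkernel_sketch(const char *arg, uint64_t *size,
				    uint64_t *base)
{
	char *cur;

	*size = memparse_sketch(arg, &cur);
	*base = 0;
	if (*cur == '@')
		*base = memparse_sketch(cur + 1, &cur);
	return *cur ? -1 : 0;
}
```

For example, "64M@16M" parses to a 64 MiB reservation at a fixed 16 MiB base, while "512M" leaves the base at 0, meaning the kernel picks the location.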



Re: [PATCH v11 3/5] arm64: kdump: reimplement crashkernel=X

2020-09-02 Thread Catalin Marinas
On Sat, Aug 01, 2020 at 09:08:54PM +0800, Chen Zhou wrote:
> There are the following issues in arm64 kdump:
> 1. We use crashkernel=X to reserve crashkernel below 4G, which
> will fail when there is not enough low memory.
> 2. If reserving crashkernel above 4G, in this case, the crash dump
> kernel will fail to boot because there is no low memory available
> for allocation.
> 3. Since commit 1a8e1cef7603 ("arm64: use both ZONE_DMA and ZONE_DMA32"),
> if the memory reserved for the crash dump kernel falls in ZONE_DMA32,
> the devices in the crash dump kernel that need ZONE_DMA will fail
> to allocate.
> 
> To solve these issues, change the behavior of crashkernel=X.
> crashkernel=X tries low allocation in ZONE_DMA, and fall back to
> high allocation if it fails.
> 
> If the required size X is too large and leads to very little free memory
> in ZONE_DMA after low allocation, the system may not work normally.
> So add a threshold and go for high allocation directly if the required
> size is too large. The value of threshold is set as the half of
> the low memory.
> 
> If crash_base is outside ZONE_DMA, try to allocate at least 256M in
> ZONE_DMA automatically. "crashkernel=Y,low" can be used to allocate
> specified size low memory.

Except for the threshold to keep zone ZONE_DMA memory,
reserve_crashkernel() looks very close to the x86 version. Shall we try
to make this generic as well? In the first instance, you could avoid the
threshold check if it takes an explicit ",high" option.

-- 
Catalin
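
The allocation policy described in the commit message above - try low first, go straight to high past a threshold, fall back to high on failure - can be captured as a tiny decision function. This is an illustrative model only (`pick_region` is invented; the real code operates on memblock, not on a pre-computed free-memory figure).

```c
#include <stdint.h>

enum region { REGION_LOW, REGION_HIGH };

/* Model of the proposed crashkernel=X policy: go straight to a high
 * allocation when the request would consume more than half of free low
 * memory (the threshold from the patch description); otherwise try low
 * first and fall back to high if the low attempt fails. */
static enum region pick_region(uint64_t size, uint64_t low_free,
			       int low_alloc_fails)
{
	if (size > low_free / 2)	/* threshold: half of low memory */
		return REGION_HIGH;
	if (low_alloc_fails)		/* low attempt failed: fall back */
		return REGION_HIGH;
	return REGION_LOW;
}
```

Making the threshold conditional on the absence of an explicit ",high" option, as suggested above, would add one more early-out to this function.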



Re: [PATCH v11 5/5] kdump: update Documentation about crashkernel

2020-09-02 Thread Catalin Marinas
On Tue, Aug 18, 2020 at 03:07:04PM +0800, chenzhou wrote:
> diff --git a/arch/arm64/mm/init.c b/arch/arm64/mm/init.c
> index a8e34d97a894..4df18c7ea438 100644
> --- a/arch/arm64/mm/init.c
> +++ b/arch/arm64/mm/init.c
> @@ -147,7 +147,7 @@ static void __init reserve_crashkernel(void)
> }
> memblock_reserve(crash_base, crash_size);
>  
> -   if (crash_base >= CRASH_ADDR_LOW_MAX) {
> +   if (memstart_addr < CRASH_ADDR_LOW_MAX && crash_base >= CRASH_ADDR_LOW_MAX) {
> const char *rename = "Crash kernel (low)";

Since CRASH_ADDR_LOW_MAX is defined as arm64_dma32_phys_limit and such
limit is always greater than memstart_addr, this additional check
doesn't do anything. See my other reply on how ZONE_DMA32 is created on
arm64.

-- 
Catalin



Re: [PATCH] arm: kdump: Add DT properties to crash dump kernel's DTB

2020-09-02 Thread Simon Horman
On Wed, Sep 02, 2020 at 05:48:00PM +0200, Geert Uytterhoeven wrote:
> On Wed, Sep 2, 2020 at 5:41 PM Geert Uytterhoeven
>  wrote:
> > Pass the following properties to the crash dump kernel, to provide a
> > modern DT interface between kexec and the crash dump kernel:
> >
> >   - linux,elfcorehdr: ELF core header segment, similar to the
> > "elfcorehdr=" kernel parameter.
> >   - linux,usable-memory-range: Usable memory reserved for the crash dump
> > kernel.
> > This makes the memory reservation explicit, so Linux no longer needs
> > to mask the program counter, and rely on the "mem=" kernel parameter
> > to obtain the start and size of usable memory.
> >
> > For backwards compatibility, the "elfcorehdr=" and "mem=" kernel
> > parameters are still appended to the kernel command line.
> >
> > Loosely based on the ARM64 version by Akashi Takahiro.
> >
> > Signed-off-by: Geert Uytterhoeven 
> 
> The counterpart patch for linux is "[PATCH] ARM: Parse kdump DT
> properties"
> (https://lore.kernel.org/r/20200902154538.6807-1-geert+rene...@glider.be)

Thanks Geert,

I think it would be best to hold-off on the user-space patch
until the kernel patch is accepted.



Re: [PATCH v2] kexec/kexec.c: Add missing close() call

2020-09-02 Thread Simon Horman
On Wed, Aug 26, 2020 at 09:53:34AM -0600, Khalid Aziz wrote:
> On 8/25/20 6:51 PM, Youling Tang wrote:
> > Add missing close() call.
> > 
> > Signed-off-by: Youling Tang 
> > ---
> >  kexec/kexec.c | 3 +++
> >  1 file changed, 3 insertions(+)
> > 
> > diff --git a/kexec/kexec.c b/kexec/kexec.c
> > index a62b362..bb88caa 100644
> > --- a/kexec/kexec.c
> > +++ b/kexec/kexec.c
> > @@ -585,6 +585,7 @@ static char *slurp_file_generic(const char *filename, off_t *r_size,
> > die("Read on %s ended before stat said it should\n", filename);
> >  
> > *r_size = size;
> > +   close(fd);
> > return buf;
> >  }
> >  
> > @@ -1257,12 +1258,14 @@ static int do_kexec_file_load(int fileind, int argc, char **argv,
> > if (i == file_types) {
> > fprintf(stderr, "Cannot determine the file type " "of %s\n",
> > kernel);
> > +   close(kernel_fd);
> > return EFAILED;
> > }
> >  
> > ret = file_type[i].load(argc, argv, kernel_buf, kernel_size, &info);
> > if (ret < 0) {
> > fprintf(stderr, "Cannot load %s\n", kernel);
> > +   close(kernel_fd);
> > return ret;
> > }
> >  
> > 
> 
> This looks good to me.
> 
> Reviewed-by: Khalid Aziz 

Thanks, applied.
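
The fix applied above follows the usual close-on-every-exit idiom. A minimal stand-alone sketch of the pattern (`load_file_sketch` is invented for illustration, not kexec-tools code):

```c
#include <fcntl.h>
#include <unistd.h>

/* Every early return after a successful open() must release the
 * descriptor; routing error paths through one cleanup label makes
 * that hard to forget. */
static int load_file_sketch(const char *path)
{
	char buf[16];
	int ret = -1;
	int fd = open(path, O_RDONLY);

	if (fd < 0)
		return -1;	/* nothing opened yet, nothing to close */

	if (read(fd, buf, sizeof(buf)) < 0)
		goto out;	/* error path still reaches close() */
	ret = 0;
out:
	close(fd);		/* single exit point releases the descriptor */
	return ret;
}
```

The patch above applies the same idea to slurp_file_generic() and the error returns of do_kexec_file_load().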



Re: [PATCH] AUTHORS: Update email address for Khalid

2020-09-02 Thread Simon Horman
On Tue, Aug 25, 2020 at 08:29:57AM -0600, Khalid Aziz wrote:
> New email address for Khalid Aziz.
> 
> Signed-off-by: Khalid Aziz 

Thanks Khalid,

applied.



Re: [PATCH] kexec-tools: Fix resource leaks in kexec.c

2020-09-02 Thread Simon Horman
On Tue, Aug 25, 2020 at 04:43:23PM +0800, Youling Tang wrote:
> Add free() call when handling some error returns.
> 
> Signed-off-by: Youling Tang 

Is there a value in cleaning up like this immediately before exiting?



Re: [PATCH] arm: kdump: Add DT properties to crash dump kernel's DTB

2020-09-02 Thread Geert Uytterhoeven
On Wed, Sep 2, 2020 at 5:41 PM Geert Uytterhoeven
 wrote:
> Pass the following properties to the crash dump kernel, to provide a
> modern DT interface between kexec and the crash dump kernel:
>
>   - linux,elfcorehdr: ELF core header segment, similar to the
> "elfcorehdr=" kernel parameter.
>   - linux,usable-memory-range: Usable memory reserved for the crash dump
> kernel.
> This makes the memory reservation explicit, so Linux no longer needs
> to mask the program counter, and rely on the "mem=" kernel parameter
> to obtain the start and size of usable memory.
>
> For backwards compatibility, the "elfcorehdr=" and "mem=" kernel
> parameters are still appended to the kernel command line.
>
> Loosely based on the ARM64 version by Akashi Takahiro.
>
> Signed-off-by: Geert Uytterhoeven 

The counterpart patch for linux is "[PATCH] ARM: Parse kdump DT
properties"
(https://lore.kernel.org/r/20200902154538.6807-1-geert+rene...@glider.be)

Gr{oetje,eeting}s,

Geert


--
Geert Uytterhoeven -- There's lots of Linux beyond ia32 -- ge...@linux-m68k.org

In personal conversations with technical people, I call myself a hacker. But
when I'm talking to journalists I just say "programmer" or something like that.
-- Linus Torvalds



[PATCH] ARM: Parse kdump DT properties

2020-09-02 Thread Geert Uytterhoeven
Parse the following DT properties in the crash dump kernel, to provide a
modern interface between kexec and the crash dump kernel:

  - linux,elfcorehdr: ELF core header segment, similar to the
"elfcorehdr=" kernel parameter.
  - linux,usable-memory-range: Usable memory reserved for the crash dump
kernel.
This makes the memory reservation explicit.  If present, Linux no
longer needs to mask the program counter, and rely on the "mem="
kernel parameter to obtain the start and size of usable memory.

For backwards compatibility, the traditional method to derive the start
of memory is still used if "linux,usable-memory-range" is absent, and
the "elfcorehdr=" and "mem=" kernel parameters are still parsed.

Loosely based on the ARM64 version by Akashi Takahiro.

Signed-off-by: Geert Uytterhoeven 
---
This depends on "[PATCH v9] ARM: boot: Validate start of physical memory
against DTB"
(https://lore.kernel.org/r/20200902153606.13652-1-geert+rene...@glider.be)

The counterpart patch for kexec-tools is "PATCH] arm: kdump: Add DT
properties to crash dump kernel's DTB"
(https://lore.kernel.org/r/20200902154129.6358-1-geert+rene...@glider.be)

 Documentation/devicetree/bindings/chosen.txt |  4 +-
 arch/arm/boot/compressed/fdt_get_mem_start.c | 18 +++-
 arch/arm/mm/init.c   | 90 
 3 files changed, 107 insertions(+), 5 deletions(-)

diff --git a/Documentation/devicetree/bindings/chosen.txt b/Documentation/devicetree/bindings/chosen.txt
index 45e79172a646c537..ba75c58413667760 100644
--- a/Documentation/devicetree/bindings/chosen.txt
+++ b/Documentation/devicetree/bindings/chosen.txt
@@ -79,7 +79,7 @@ a different secondary CPU release mechanism)
 linux,usable-memory-range
 -
 
-This property (arm64 only) holds a base address and size, describing a
+This property (arm and arm64 only) holds a base address and size, describing a
 limited region in which memory may be considered available for use by
 the kernel. Memory outside of this range is not available for use.
 
@@ -106,7 +106,7 @@ respectively, of the root node.
 linux,elfcorehdr
 
 
-This property (currently used only on arm64) holds the memory range,
+This property (currently used only on arm and arm64) holds the memory range,
 the address and the size, of the elf core header which mainly describes
 the panicked kernel's memory layout as PT_LOAD segments of elf format.
 e.g.
diff --git a/arch/arm/boot/compressed/fdt_get_mem_start.c b/arch/arm/boot/compressed/fdt_get_mem_start.c
index fd501ec3c14b4ae4..344fee8bfc7ab438 100644
--- a/arch/arm/boot/compressed/fdt_get_mem_start.c
+++ b/arch/arm/boot/compressed/fdt_get_mem_start.c
@@ -40,7 +40,7 @@ static uint32_t get_size(const void *fdt, const char *name)
 unsigned long fdt_get_mem_start(const void *fdt)
 {
uint32_t addr_size, size_size, mem_start, masked_pc;
-   const __be32 *memory;
+   const __be32 *usable, *memory;
 
if (!fdt)
return -1;
@@ -52,8 +52,16 @@ unsigned long fdt_get_mem_start(const void *fdt)
addr_size = get_size(fdt, "#address-cells");
size_size = get_size(fdt, "#size-cells");
 
-   /* Find the first memory node */
-   memory = get_prop(fdt, "/memory", "reg", addr_size + size_size);
+   /*
+* Find start of memory, either from "linux,usable-memory-range" (for
+* crashkernel), or from the first memory node
+*/
+   usable = get_prop(fdt, "/chosen", "linux,usable-memory-range",
+ addr_size + size_size);
+   if (usable)
+   memory = usable;
+   else
+   memory = get_prop(fdt, "/memory", "reg", addr_size + size_size);
if (!memory)
return -1;
 
@@ -62,6 +70,10 @@ unsigned long fdt_get_mem_start(const void *fdt)
/* Must be a multiple of 16 MiB for phys/virt patching */
mem_start = round_up(mem_start, SZ_16M);
 
+   /* "linux,usable-memory-range" is considered authoritative */
+   if (usable)
+   return mem_start;
+
/*
 * Traditionally, the start of memory is obtained by masking the
 * program counter.  Prefer that method, unless it would yield an
diff --git a/arch/arm/mm/init.c b/arch/arm/mm/init.c
index 000c1b48e9734f4e..407fd847dbfc9280 100644
--- a/arch/arm/mm/init.c
+++ b/arch/arm/mm/init.c
@@ -4,6 +4,7 @@
  *
  *  Copyright (C) 1995-2005 Russell King
  */
+#include 
 #include 
 #include 
 #include 
@@ -210,8 +211,95 @@ void check_cpu_icache_size(int cpuid)
 }
 #endif
 
+#ifdef CONFIG_OF_EARLY_FLATTREE
+static int __init early_init_dt_scan_usablemem(unsigned long node,
+const char *uname, int depth, void *data)
+{
+struct memblock_region *usablemem = data;
+const __be32 *reg;
+int len;
+
+if (depth != 1 || strcmp(uname, "chosen") != 0)
+return 0;
+
+reg = of_get_flat_dt_prop(node, "linux,usable-memory

[PATCH] arm: kdump: Add DT properties to crash dump kernel's DTB

2020-09-02 Thread Geert Uytterhoeven
Pass the following properties to the crash dump kernel, to provide a
modern DT interface between kexec and the crash dump kernel:

  - linux,elfcorehdr: ELF core header segment, similar to the
"elfcorehdr=" kernel parameter.
  - linux,usable-memory-range: Usable memory reserved for the crash dump
kernel.
This makes the memory reservation explicit, so Linux no longer needs
to mask the program counter, and rely on the "mem=" kernel parameter
to obtain the start and size of usable memory.

For backwards compatibility, the "elfcorehdr=" and "mem=" kernel
parameters are still appended to the kernel command line.

Loosely based on the ARM64 version by Akashi Takahiro.

Signed-off-by: Geert Uytterhoeven 
---
 kexec/arch/arm/crashdump-arm.c|  10 ++-
 kexec/arch/arm/crashdump-arm.h|   2 +
 kexec/arch/arm/kexec-zImage-arm.c | 137 ++
 3 files changed, 147 insertions(+), 2 deletions(-)

diff --git a/kexec/arch/arm/crashdump-arm.c b/kexec/arch/arm/crashdump-arm.c
index daa478868b4976d7..1ec1826c24b52ed9 100644
--- a/kexec/arch/arm/crashdump-arm.c
+++ b/kexec/arch/arm/crashdump-arm.c
@@ -54,7 +54,7 @@ struct memory_ranges usablemem_rgns = {
 };
 
 /* The boot-time physical memory range reserved for crashkernel region */
-static struct memory_range crash_kernel_mem;
+struct memory_range crash_kernel_mem;
 
 /* reserved regions */
 #define CRASH_MAX_RESERVED_RANGES 2
@@ -64,6 +64,8 @@ static struct memory_ranges crash_reserved_rgns = {
.ranges = crash_reserved_ranges,
 };
 
+struct memory_range elfcorehdr_mem;
+
 static struct crash_elf_info elf_info = {
.class  = ELFCLASS32,
.data   = ELFDATANATIVE,
@@ -307,7 +309,11 @@ int load_crashdump_segments(struct kexec_info *info, char *mod_cmdline)
  crash_kernel_mem.start,
  crash_kernel_mem.end, -1, 0);
 
-   dbgprintf("elfcorehdr: %#lx\n", elfcorehdr);
+   elfcorehdr_mem.start = elfcorehdr;
+   elfcorehdr_mem.end = elfcorehdr + bufsz - 1;
+
+   dbgprintf("elfcorehdr 0x%llx-0x%llx\n", elfcorehdr_mem.start,
+ elfcorehdr_mem.end);
cmdline_add_elfcorehdr(mod_cmdline, elfcorehdr);
 
/*
diff --git a/kexec/arch/arm/crashdump-arm.h b/kexec/arch/arm/crashdump-arm.h
index 6e87b13c4b8c86fe..bbdf8bf9de58eb5d 100644
--- a/kexec/arch/arm/crashdump-arm.h
+++ b/kexec/arch/arm/crashdump-arm.h
@@ -13,6 +13,8 @@ extern "C" {
 
 
 extern struct memory_ranges usablemem_rgns;
+extern struct memory_range crash_kernel_mem;
+extern struct memory_range elfcorehdr_mem;
 
 struct kexec_info;
 
diff --git a/kexec/arch/arm/kexec-zImage-arm.c b/kexec/arch/arm/kexec-zImage-arm.c
index 925a9be120aa401a..e056f79124240e40 100644
--- a/kexec/arch/arm/kexec-zImage-arm.c
+++ b/kexec/arch/arm/kexec-zImage-arm.c
@@ -4,6 +4,7 @@
  */
 #define _GNU_SOURCE
 #define _XOPEN_SOURCE
+#include 
 #include 
 #include 
 #include 
@@ -374,6 +375,103 @@ static const struct zimage_tag *find_extension_tag(const char *buf, off_t len,
return NULL;
 }
 
+static int get_cells_size(void *fdt, uint32_t *address_cells,
+ uint32_t *size_cells)
+{
+   int nodeoffset;
+   const uint32_t *prop = NULL;
+   int prop_len;
+
+   /* default values */
+   *address_cells = 1;
+   *size_cells = 1;
+
+   /* under root node */
+   nodeoffset = fdt_path_offset(fdt, "/");
+   if (nodeoffset < 0)
+   return -1;
+
+   prop = fdt_getprop(fdt, nodeoffset, "#address-cells", &prop_len);
+   if (prop) {
+   if (prop_len != sizeof(*prop))
+   return -1;
+
+   *address_cells = fdt32_to_cpu(*prop);
+   }
+
+   prop = fdt_getprop(fdt, nodeoffset, "#size-cells", &prop_len);
+   if (prop) {
+   if (prop_len != sizeof(*prop))
+   return -1;
+
+   *size_cells = fdt32_to_cpu(*prop);
+   }
+
+   dbgprintf("%s: #address-cells:%d #size-cells:%d\n", __func__,
+ *address_cells, *size_cells);
+   return 0;
+}
+
+static bool cells_size_fitted(uint32_t address_cells, uint32_t size_cells,
+ struct memory_range *range)
+{
+   dbgprintf("%s: %llx-%llx\n", __func__, range->start, range->end);
+
+   /* if *_cells >= 2, cells can hold 64-bit values anyway */
+   if ((address_cells == 1) && (range->start >= (1ULL << 32)))
+   return false;
+
+   if ((size_cells == 1) &&
+   ((range->end - range->start + 1) >= (1ULL << 32)))
+   return false;
+
+   return true;
+}
+
+static void fill_property(void *buf, uint64_t val, uint32_t cells)
+{
+   uint32_t val32;
+   int i;
+
+   if (cells == 1) {
+   val32 = cpu_to_fdt32((uint32_t)val);
+   memcpy(buf, &val32, sizeof(uint32_t));
+   } else {
+   for (i = 0;
+  

Re: [PATCH next v3 7/8] printk: reimplement log_cont using record extension

2020-09-02 Thread Petr Mladek
On Mon 2020-08-31 03:16:57, John Ogness wrote:
> Use the record extending feature of the ringbuffer to implement
> continuous messages. This preserves the existing continuous message
> behavior.
> 
> Signed-off-by: John Ogness 
> ---
>  kernel/printk/printk.c | 98 +-
>  1 file changed, 20 insertions(+), 78 deletions(-)

It looks simple from the printk() side ;-)

Reviewed-by: Petr Mladek 

Best Regards,
Petr



Re: state names: vas: Re: [PATCH next v3 6/8] printk: ringbuffer: add finalization/extension support

2020-09-02 Thread Petr Mladek
On Wed 2020-09-02 13:26:14, John Ogness wrote:
> On 2020-09-02, Petr Mladek  wrote:
> >> +static struct prb_desc *desc_reopen_last(struct prb_desc_ring *desc_ring,
> >> +   u32 caller_id, unsigned long *id_out)
> >> +{
> >> +  unsigned long prev_state_val;
> >> +  enum desc_state d_state;
> >> +  struct prb_desc desc;
> >> +  struct prb_desc *d;
> >> +  unsigned long id;
> >> +
> >> +  id = atomic_long_read(&desc_ring->head_id);
> >> +
> >> +  /*
> >> +   * To minimize unnecessarily reopening a descriptor, first check the
> >> +   * descriptor is in the correct state and has a matching caller ID.
> >> +   */
> >> +  d_state = desc_read(desc_ring, id, &desc);
> >> +  if (d_state != desc_reserved ||
> >> +  !(atomic_long_read(&desc.state_var) & DESC_COMMIT_MASK) ||
> >
> > First, define 5 desc_states, something like:
> >
> > enum desc_state {
> > desc_miss = -1, /* ID mismatch */
> > desc_modified =  0x0,   /* reserved, being modified by writer */
> 
> I prefer the "desc_reserved" name. It may or may not have be modified yet.

Yeah, "desc_reserved" sounds better. I probably just wanted to free my
fantasy from the current code ;-)


> > desc_committed = 0x1,   /* committed by writer, could get reopened */
> > desc_finalized = 0x2,   /* committed, could not longer get modified */
> > desc_reusable =  0x3,   /* free, not yet used by any writer */
> > };
> >
> > Second, only 4 variants of the 3 state bits are currently used.
> > It means that two bits are enough and they might use exactly
> > the above names:
> >
> > I mean to do something like:
> >
> > #define DESC_SV_BITS(sizeof(unsigned long) * 8)
> > #define DESC_SV(desc_state) ((unsigned long)desc_state << (DESC_SV_BITS - 2))
> > #define DESC_ST(state_val)  ((unsigned long)state_val >> (DESC_SV_BITS - 2))
> 
> This makes sense and will get us back the bit we lost because of
> finalization.

Yup. Which is good especially on 32-bit architectures.

> I am wondering if VMCOREINFO should include a DESC_FLAGS_MASK so that
> crash tools could at least successfully iterate the ID's, even if they
> didn't know what all the flag values mean (in the case that more bits
> are added later).

Good point. I am just not sure whether they should try to read all ids
or refuse to read anything when a new bit is added.

Well, I really hope that we will not need new states anytime soon.
It would need a really strong reason.

I personally can't think about any use case. pr_cont() was special
because it was the writer side. All other steps of the printk rework
are on the reader side.

I believe that we are getting close with all the ring buffer code.
And I have good feeling about it.

Best Regards,
Petr
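
The 2-bit encoding sketched in this thread can be checked with a tiny stand-alone model. The names mirror the discussion above, but this is a sketch, not the printk ringbuffer code; `make_sv` is invented for illustration.

```c
/* Four states fit in the top two bits of the state variable; the
 * remaining bits are free to carry the descriptor ID. */
enum desc_state {
	desc_reserved  = 0x0,	/* reserved, being modified by writer */
	desc_committed = 0x1,	/* committed, could get reopened */
	desc_finalized = 0x2,	/* committed, can no longer be modified */
	desc_reusable  = 0x3,	/* free, not yet used by any writer */
};

#define DESC_SV_BITS	(sizeof(unsigned long) * 8)
#define DESC_SV(st)	((unsigned long)(st) << (DESC_SV_BITS - 2))
#define DESC_ST(sv)	((unsigned long)(sv) >> (DESC_SV_BITS - 2))
#define DESC_ID_MASK	(~DESC_SV(3UL))	/* low bits hold the ID */

/* Pack a (state, id) pair the way the proposed macros would. */
static unsigned long make_sv(enum desc_state st, unsigned long id)
{
	return DESC_SV(st) | (id & DESC_ID_MASK);
}
```

Exporting a mask like DESC_ID_MASK via VMCOREINFO, as discussed above, would let crash tools recover the ID even if new state values were added later.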



misc: was: [PATCH next v3 6/8] printk: ringbuffer: add finalization/extension support

2020-09-02 Thread Petr Mladek
On Mon 2020-08-31 03:16:56, John Ogness wrote:
> Add support for extending the newest data block. For this, introduce
> a new finalization state flag (DESC_FINAL_MASK) that denotes when a
> descriptor may not be extended, i.e. is finalized.
> 
> Three new memory barrier pairs are introduced. Two of them are not
> significant because they are alternate path memory barriers that
> exactly correspond to existing memory barriers.
>
> But the third (_prb_commit:B / desc_reserve:D) is new and guarantees
> that descriptors will always be finalized, either because a
> descriptor setting DESC_COMMIT_MASK sees that there is a newer
> descriptor and so finalizes itself or because a new descriptor being
> reserved sees that the previous descriptor has DESC_COMMIT_MASK set
> and finalizes that descriptor.

Just for record. All the barriers look good to me.

There are always full barriers when we change DESC_COMMIT_MASK.
It guarantees consistency of the descriptor and data rings.

The only relaxed modification is adding DESC_FINAL_MASK. It is OK.
It can be done only after DESC_COMMIT_MASK has been set using a full
barrier. It means that all changes to the rings have already been
made visible before.


> --- a/kernel/printk/printk_ringbuffer.c
> +++ b/kernel/printk/printk_ringbuffer.c
> +
> +/**
> + * prb_reserve_in_last() - Re-reserve and extend the space in the ringbuffer
> + * used by the newest record.
> + *
> + * @e: The entry structure to setup.
> + * @rb:The ringbuffer to re-reserve and extend data in.
> + * @r: The record structure to allocate buffers for.
> + * @caller_id: The caller ID of the caller (reserving writer).
> + *
> + * This is the public function available to writers to re-reserve and extend
> + * data.
> + *
> + * The writer specifies the text size to extend (not the new total size) by
> + * setting the @text_buf_size field of @r. Dictionaries cannot be extended so

Better might be to say that extending dictionaries is not
supported.

> + * @dict_buf_size of @r should be set to 0. To ensure proper initialization of
> + * @r, prb_rec_init_wr() should be used.
> + *
> + * This function will fail if @caller_id does not match the caller ID of the
> + * newest record. In that case the caller must reserve new data using
> + * prb_reserve().
> + *
> + * Context: Any context. Disables local interrupts on success.
> + * Return: true if text data could be extended, otherwise false.
> + *
> + * On success:
> + *
> + *   - @r->text_buf points to the beginning of the entire text buffer.
> + *
> + *   - @r->text_buf_len is set to the new total size of the buffer.

s/buf_len/buf_size/

> + *   - @r->dict_buf and @r->dict_buf_len are cleared because extending
> + * the dict buffer is not supported.

Same here.

> + *
> + *   - @r->info is not touched so that @r->info->text_len could be used
> + * to append the text.
> + *
> + *   - prb_record_text_space() can be used on @e to query the new
> + * actually used space.
> + *
> + * Important: All @r->info fields will already be set with the current values
> + *for the record. I.e. @r->info->text_len will be less than
> + *@text_buf_size and @r->info->dict_len may be set, even though
> + *@dict_buf_size is 0. Writers can use @r->info->text_len to know
> + *where concatenation begins and writers should update
> + *@r->info->text_len after concatenating.
> + */
> +bool prb_reserve_in_last(struct prb_reserved_entry *e, struct printk_ringbuffer *rb,
> +  struct printk_record *r, u32 caller_id)
> +{
> + unsigned int data_size;
> + struct prb_desc *d;
> + unsigned long id;
> +
> + local_irq_save(e->irqflags);
> +
> + /* Transition the newest descriptor back to the reserved state. */
> + d = desc_reopen_last(&rb->desc_ring, caller_id, &id);
> + if (!d) {
> + local_irq_restore(e->irqflags);
> + goto fail_reopen;
> + }
> +
> + /* Now the writer has exclusive access: LMM(prb_reserve_in_last:A) */
> +
> + /*
> +  * Set the @e fields here so that prb_commit() can be used if
> +  * anything fails from now on.
> +  */
> + e->rb = rb;
> + e->id = id;
> +
> + /*
> +  * desc_reopen_last() checked the caller_id, but there was no
> +  * exclusive access at that point. The descriptor may have
> +  * changed since then.
> +  */
> + if (caller_id != d->info.caller_id)
> + goto fail;
> +
> + if (BLK_DATALESS(&d->text_blk_lpos)) {

We do not clear d->info when reopening the descriptor. I would at
least add here a consistency check similar to the one below:

	if (WARN_ON_ONCE(d->info.text_len)) {
		pr_warn_once("wrong data size (%u, expecting 0) for data\n",
			     d->info.text_len);
		d->info.text_len = 0;
	}


Re: state names: was: Re: [PATCH next v3 6/8] printk: ringbuffer: add finalization/extension support

2020-09-02 Thread John Ogness
On 2020-09-02, Petr Mladek  wrote:
>> +static struct prb_desc *desc_reopen_last(struct prb_desc_ring *desc_ring,
>> + u32 caller_id, unsigned long *id_out)
>> +{
>> +unsigned long prev_state_val;
>> +enum desc_state d_state;
>> +struct prb_desc desc;
>> +struct prb_desc *d;
>> +unsigned long id;
>> +
>> +id = atomic_long_read(&desc_ring->head_id);
>> +
>> +/*
>> + * To minimize unnecessarily reopening a descriptor, first check the
>> + * descriptor is in the correct state and has a matching caller ID.
>> + */
>> +d_state = desc_read(desc_ring, id, &desc);
>> +if (d_state != desc_reserved ||
>> +!(atomic_long_read(&desc.state_var) & DESC_COMMIT_MASK) ||
>
> This looks like a hack. And similar extra check of the bit is needed
> also in desc_read(), see
> https://lore.kernel.org/r/878sdvq8kd@jogness.linutronix.de

Agreed.

> I have actually been getting less and less happy with the inconsistency
> between names of the bits and states.
>
> ...
>
> First, define 5 desc_states, something like:
>
> enum desc_state {
>   desc_miss = -1, /* ID mismatch */
>   desc_modified =  0x0,   /* reserved, being modified by writer */

I prefer the "desc_reserved" name. It may or may not have be modified yet.

>   desc_committed = 0x1,   /* committed by writer, could get reopened */
>   desc_finalized = 0x2,   /* committed, can no longer get modified */
>   desc_reusable =  0x3,   /* free, not yet used by any writer */
> };
>
> Second, only 4 variants of the 3 state bits are currently used.
> It means that two bits are enough and they might use exactly
> the above names:
>
> I mean to do something like:
>
> #define DESC_SV_BITS  (sizeof(unsigned long) * 8)
> #define DESC_SV(desc_state)   ((unsigned long)desc_state << (DESC_SV_BITS - 2))
> #define DESC_ST(state_val)    ((unsigned long)state_val >> (DESC_SV_BITS - 2))

This makes sense and will get us back the bit we lost because of
finalization.

> I am sorry that I did not come up with this earlier. I know how
> painful it is to rework bigger patchsets. But it affects the format
> of the ring buffer, so we should do it early.

Agreed.

I am wondering if VMCOREINFO should include a DESC_FLAGS_MASK so that
crash tools could at least successfully iterate the IDs, even if they
didn't know what all the flag values mean (in the case that more bits
are added later).

> PS: I am still in the middle of the review. It looks good so far. I wanted
> to send this early and separately because it is a bigger change.

Thanks for the heads up.

John Ogness

___
kexec mailing list
kexec@lists.infradead.org
http://lists.infradead.org/mailman/listinfo/kexec


Re: [PATCH next v3 6/8] printk: ringbuffer: add finalization/extension support

2020-09-02 Thread Petr Mladek
On Mon 2020-08-31 03:16:56, John Ogness wrote:
> Add support for extending the newest data block. For this, introduce
> a new finalization state flag (DESC_FINAL_MASK) that denotes when a
> descriptor may not be extended, i.e. is finalized.
> 
> --- a/kernel/printk/printk_ringbuffer.c
> +++ b/kernel/printk/printk_ringbuffer.c
> +/*
> + * Try to resize an existing data block associated with the descriptor
> + * specified by @id. If the resized datablock should become wrapped, it
> + * copies the old data to the new data block.
> + *
> + * Fail if this is not the last allocated data block or if there is not
> + * enough space or it is not possible to make enough space.
> + *
> + * Return a pointer to the beginning of the entire data buffer or NULL on
> + * failure.
> + */
> +static char *data_realloc(struct printk_ringbuffer *rb,
> +   struct prb_data_ring *data_ring, unsigned int size,
> +   struct prb_data_blk_lpos *blk_lpos, unsigned long id)
> +{
> + struct prb_data_block *blk;
> + unsigned long head_lpos;
> + unsigned long next_lpos;
> + bool wrapped;
> +
> + /* Reallocation only works if @blk_lpos is the newest data block. */
> + head_lpos = atomic_long_read(&data_ring->head_lpos);
> + if (head_lpos != blk_lpos->next)
> + return NULL;
> +
> + /* Keep track if @blk_lpos was a wrapping data block. */
> + wrapped = (DATA_WRAPS(data_ring, blk_lpos->begin) != DATA_WRAPS(data_ring, blk_lpos->next));
> +
> + size = to_blk_size(size);
> +
> + next_lpos = get_next_lpos(data_ring, blk_lpos->begin, size);
> +
> + /* If the data block does not increase, there is nothing to do. */
> + if (next_lpos == head_lpos) {
> + blk = to_block(data_ring, blk_lpos->begin);
> + return &blk->data[0];
> + }

We might get here even when the data are shrunk, but the code below
is not fully ready for this.

> + if (!data_push_tail(rb, data_ring, next_lpos - DATA_SIZE(data_ring)))
> + return NULL;
> +
> + /* The memory barrier involvement is the same as data_alloc:A. */
> + if (!atomic_long_try_cmpxchg(&data_ring->head_lpos, &head_lpos,
> +  next_lpos)) { /* LMM(data_realloc:A) */
> + return NULL;
> + }
> +
> + blk = to_block(data_ring, blk_lpos->begin);
> +
> + if (DATA_WRAPS(data_ring, blk_lpos->begin) != DATA_WRAPS(data_ring, next_lpos)) {
> + struct prb_data_block *old_blk = blk;
> +
> + /* Wrapping data blocks store their data at the beginning. */
> + blk = to_block(data_ring, 0);
> +
> + /*
> +  * Store the ID on the wrapped block for consistency.
> +  * The printk_ringbuffer does not actually use it.
> +  */
> + blk->id = id;
> +
> + if (!wrapped) {
> + /*
> +  * Since the allocated space is now in the newly
> +  * created wrapping data block, copy the content
> +  * from the old data block.
> +  */
> + memcpy(&blk->data[0], &old_blk->data[0],
> +(blk_lpos->next - blk_lpos->begin) - sizeof(blk->id));

It took me quite some time to check whether this code is correct or not.

First, I wondered whether the size was correctly calculated.
It is, because the original block was not wrapped, so
lpos->next - lpos->begin defines the real data buffer size.

Second, I wondered whether the target block might be smaller
than the original (the above check allows shrinking). It can't
be smaller, because then the new block wouldn't be wrapped either.

Sigh, it is a bit tricky. And there is a 3rd possibility that
is not handled. The original block might be wrapped, but
the new shrunken one might no longer be wrapped.
Then we would need to copy the data the other way.


I know that this function is not currently used for shrinking.
But I would prefer to be on the safe side. Either make
the copying generic, e.g. by calculating the real data size
using the code from get_data(), or simply completely
refuse shrinking in the above check.

Best Regards,
Petr

> + }
> + }
> +
> + blk_lpos->next = next_lpos;
> +
> + return &blk->data[0];
> +}
> +
>  /* Return the number of bytes used by a data block. */
>  static unsigned int space_used(struct prb_data_ring *data_ring,
>  struct prb_data_blk_lpos *blk_lpos)



state names: was: Re: [PATCH next v3 6/8] printk: ringbuffer: add finalization/extension support

2020-09-02 Thread Petr Mladek
On Mon 2020-08-31 03:16:56, John Ogness wrote:
> Add support for extending the newest data block. For this, introduce
> a new finalization state flag (DESC_FINAL_MASK) that denotes when a
> descriptor may not be extended, i.e. is finalized.
 
> --- a/kernel/printk/printk_ringbuffer.c
> +++ b/kernel/printk/printk_ringbuffer.c
> @@ -49,14 +49,16 @@
>   * Descriptors have three states:
>   *
>   *   reserved
> - * A writer is modifying the record.
> + * A writer is modifying the record. Internally represented as either "0"
> + * or "DESC_COMMIT_MASK".

We should explain the difference between the two values. It might be
enough to add something like:

See "Descriptor Finalization" section for more details."

> @@ -79,6 +81,25 @@
>   * committed or reusable queried state. This makes it possible that a valid
>   * sequence number of the tail is always available.
>   *
> + * Descriptor Finalization
> + * ~~~
> + * When a writer calls the commit function prb_commit(), the record may still
> + * continue to be in the reserved queried state. In order for that record to
> + * enter into the committed queried state, that record also must be finalized.
> + * A record can be finalized by three different scenarios:
> + *
> + *   1) A writer can finalize its record immediately by calling
> + *  prb_final_commit() instead of prb_commit().
> + *
> + *   2) When a new record is reserved and the previous record has been
> + *  committed via prb_commit(), that previous record is finalized.
> + *
> + *   3) When a record is committed via prb_commit() and a newer record
> + *  already exists, the record being committed is finalized.
> + *
> + * Until a record is finalized (represented by "DESC_FINAL_MASK"), a writer
> + * may "reopen" that record and extend it with more data.
> + *
>   * Data Rings
>   * ~~
>   * The two data rings (text and dictionary) function identically. They exist

[...]

> +/*
> + * Attempt to remove the commit flag so that the record can be modified by a
> + * writer again. This is only possible if the descriptor is not yet finalized.
> + *
> + * Note that on success, the queried state did not change. A non-finalized
> + * record (even with the commit flag set) is in the reserved queried state.
> + */
> +static struct prb_desc *desc_reopen_last(struct prb_desc_ring *desc_ring,
> +  u32 caller_id, unsigned long *id_out)
> +{
> + unsigned long prev_state_val;
> + enum desc_state d_state;
> + struct prb_desc desc;
> + struct prb_desc *d;
> + unsigned long id;
> +
> + id = atomic_long_read(&desc_ring->head_id);
> +
> + /*
> +  * To minimize unnecessarily reopening a descriptor, first check the
> +  * descriptor is in the correct state and has a matching caller ID.
> +  */
> + d_state = desc_read(desc_ring, id, &desc);
> + if (d_state != desc_reserved ||
> + !(atomic_long_read(&desc.state_var) & DESC_COMMIT_MASK) ||

This looks like a hack. And similar extra check of the bit is needed
also in desc_read(), see
https://lore.kernel.org/r/878sdvq8kd@jogness.linutronix.de

I have actually been getting less and less happy with the inconsistency
between names of the bits and states.

Sigh, you will hate me because this would mean a bigger change.
IMHO, it would be much cleaner and help with long-term maintainability
when we do the following two changes:

First, define 5 desc_states, something like:

enum desc_state {
desc_miss = -1, /* ID mismatch */
desc_modified =  0x0,   /* reserved, being modified by writer */
desc_committed = 0x1,   /* committed by writer, could get reopened */
desc_finalized = 0x2,   /* committed, can no longer get modified */
desc_reusable =  0x3,   /* free, not yet used by any writer */
};

Second, only 4 variants of the 3 state bits are currently used.
It means that two bits are enough and they might use exactly
the above names:

I mean to do something like:

#define DESC_SV_BITS	(sizeof(unsigned long) * 8)
#define DESC_SV(desc_state)	((unsigned long)desc_state << (DESC_SV_BITS - 2))
#define DESC_ST(state_val)	((unsigned long)state_val >> (DESC_SV_BITS - 2))


Then we could have:

static enum desc_state get_desc_state(unsigned long id,
  unsigned long state_val)
{
if (id != DESC_ID(state_val))
return desc_miss;

return DESC_ST(state_val);
}

and use in the code:

unsigned long val_committed = id | DESC_SV(desc_committed);


or do

#define DESC_SV(id, desc_state)	(id | (unsigned long)desc_state << (DESC_SV_BITS - 2))

and then use DESC_SV(id, DESC_COMMITTED).


I am sorry that I did not come up with this earlier. I know how
painful it is to rework bigger patchsets. But it affects the format
of the ring buffer, so we should do it early.

Best Regards,
Petr

PS: I am still in the middle of the review. It looks good so far. I wanted
to send this early and separately because it is a bigger change.

[PATCH] kexec: remove the 2GB size limit on initrd file

2020-09-02 Thread Robi Buranyi
Enable loading initrd files exceeding the INT_MAX size. Remove the
INT_MAX limit completely, and let any initrd load if it fits in the
memory.

Signed-off-by: Robi Buranyi 
---
 kernel/kexec_file.c | 4 ++--
 1 file changed, 2 insertions(+), 2 deletions(-)

diff --git a/kernel/kexec_file.c b/kernel/kexec_file.c
index ca40bef75a61..659a9d165198 100644
--- a/kernel/kexec_file.c
+++ b/kernel/kexec_file.c
@@ -222,7 +222,7 @@ kimage_file_prepare_segments(struct kimage *image, int kernel_fd, int initrd_fd,
loff_t size;
 
ret = kernel_read_file_from_fd(kernel_fd, &image->kernel_buf,
-  &size, INT_MAX, READING_KEXEC_IMAGE);
+  &size, 0, READING_KEXEC_IMAGE);
if (ret)
return ret;
image->kernel_buf_len = size;
@@ -242,7 +242,7 @@ kimage_file_prepare_segments(struct kimage *image, int kernel_fd, int initrd_fd,
/* It is possible that there no initramfs is being loaded */
if (!(flags & KEXEC_FILE_NO_INITRAMFS)) {
ret = kernel_read_file_from_fd(initrd_fd, &image->initrd_buf,
-  &size, INT_MAX,
+  &size, 0,
   READING_KEXEC_INITRAMFS);
if (ret)
goto out;
-- 
2.28.0.402.g5ffc5be6b7-goog

