[PATCH] kexec: socket not released when error situation occur.

2016-08-24 Thread YoungHyun Yoo
Fix resourceleek in ifdown function when error occur.

Signed-off-by: YoungHyun Yoo 
---
 kexec/ifdown.c | 26 +++---
 1 file changed, 15 insertions(+), 11 deletions(-)

diff --git a/kexec/ifdown.c b/kexec/ifdown.c
index 46b7bef..9679ad7 100644
--- a/kexec/ifdown.c
+++ b/kexec/ifdown.c
@@ -1,6 +1,6 @@
 /*
- * ifdown.cFind all network interfaces on the system and
- * shut them down.
+ * ifdown.c Find all network interfaces on the system and
+ *  shut them down.
  *
  */
 char *v_ifdown = "@(#)ifdown.c  1.11  02-Jun-1998  miqu...@cistron.nl";
@@ -20,10 +20,10 @@ char *v_ifdown = "@(#)ifdown.c  1.11  02-Jun-1998  
miqu...@cistron.nl";
 #include 
 
 /*
- * First, we find all shaper devices and down them. Then we
- * down all real interfaces. This is because the comment in the
- * shaper driver says "if you down the shaper device before the
- * attached inerface your computer will follow".
+ *  First, we find all shaper devices and down them. Then we
+ *  down all real interfaces. This is because the comment in the
+ *  shaper driver says "if you down the shaper device before the
+ *  attached inerface your computer will follow".
  */
 int ifdown(void)
 {
@@ -34,13 +34,13 @@ int ifdown(void)
if ((fd = socket(AF_INET, SOCK_DGRAM, 0)) < 0) {
fprintf(stderr, "ifdown: ");
perror("socket");
-   return -1;
+   goto error;
}
 
if ((ifa = if_nameindex()) == NULL) {
fprintf(stderr, "ifdown: ");
perror("if_nameindex");
-   return -1;
+   goto error;
}
 
for (shaper = 1; shaper >= 0; shaper--) {
@@ -57,18 +57,22 @@ int ifdown(void)
if (ioctl(fd, SIOCGIFFLAGS, ) < 0) {
fprintf(stderr, "ifdown: shutdown ");
perror(ifp->if_name);
-   return -1;
+   goto error;
}
ifr.ifr_flags &= ~(IFF_UP);
if (ioctl(fd, SIOCSIFFLAGS, ) < 0) {
fprintf(stderr, "ifdown: shutdown ");
perror(ifp->if_name);
-   return -1;
+   goto error;
}
 
}
}
-   close(fd);
 
+   close(fd);
return 0;
+
+error:
+   close(fd);
+   return -1;
 }
-- 
2.9.0.GIT


___
kexec mailing list
kexec@lists.infradead.org
http://lists.infradead.org/mailman/listinfo/kexec


Re: [PATCH] kexec: Fix double-free when failing to relocate the purgatory.

2016-08-24 Thread Baoquan He
It's reasonable. Ack.

Acked-by: Baoquan He 

On 08/24/16 at 09:05pm, Thiago Jung Bauermann wrote:
> If kexec_apply_relocations fails, kexec_load_purgatory frees pi->sechdrs
> and pi->purgatory_buf. This is redundant, because in case of error
> kimage_file_prepare_segments calls kimage_file_post_load_cleanup,
> which will also free those buffers.
> 
> This causes two warnings like the following, one for pi->sechdrs and the
> other for pi->purgatory_buf:
> 
> [   18.112843] kexec-bzImage64: Loading purgatory failed
> [   18.113257] [ cut here ]
> [   18.113263] WARNING: CPU: 1 PID: 2119 at mm/vmalloc.c:1490 
> __vunmap+0xc1/0xd0
> [   18.113264] Trying to vfree() nonexistent vm area (c9e91000)
> [   18.113367] Modules linked in:
> [   18.113371] CPU: 1 PID: 2119 Comm: kexec Not tainted 4.8.0-rc3+ #5
> [   18.113372] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS 
> Bochs 01/01/2011
> [   18.113373]   88003bbb7dc8 8132eca8 
> 88003bbb7e18
> [   18.113376]   88003bbb7e08 8105f1bb 
> 05d281175bf8
> [   18.113377]  c9e91000  0001 
> 88003e5f8c00
> [   18.113379] Call Trace:
> [   18.113384]  [] dump_stack+0x4d/0x65
> [   18.113386]  [] __warn+0xcb/0xf0
> [   18.113388]  [] warn_slowpath_fmt+0x4f/0x60
> [   18.113390]  [] ? find_vmap_area+0x19/0x70
> [   18.113393]  [] ? kimage_file_post_load_cleanup+0x47/0xb0
> [   18.113394]  [] __vunmap+0xc1/0xd0
> [   18.113396]  [] vfree+0x2e/0x70
> [   18.113397]  [] kimage_file_post_load_cleanup+0x5e/0xb0
> [   18.113398]  [] SyS_kexec_file_load+0x448/0x680
> [   18.113401]  [] ? putname+0x54/0x60
> [   18.113403]  [] ? do_sys_open+0x190/0x1f0
> [   18.113407]  [] entry_SYSCALL_64_fastpath+0x13/0x8f
> [   18.113408] ---[ end trace 158bb74f5950ca2b ]---
> 
> Fix by setting pi->sechdrs an pi->purgatory_buf to NULL, since vfree
> won't try to free a NULL pointer.
> 
> Signed-off-by: Thiago Jung Bauermann 
> ---
>  kernel/kexec_file.c | 3 +++
>  1 file changed, 3 insertions(+)
> 
> diff --git a/kernel/kexec_file.c b/kernel/kexec_file.c
> index 503bc2d348e5..037c321c5618 100644
> --- a/kernel/kexec_file.c
> +++ b/kernel/kexec_file.c
> @@ -887,7 +887,10 @@ int kexec_load_purgatory(struct kimage *image, unsigned 
> long min,
>   return 0;
>  out:
>   vfree(pi->sechdrs);
> + pi->sechdrs = NULL;
> +
>   vfree(pi->purgatory_buf);
> + pi->purgatory_buf = NULL;
>   return ret;
>  }
>  
> -- 
> 1.9.1
> 
> 
> ___
> kexec mailing list
> kexec@lists.infradead.org
> http://lists.infradead.org/mailman/listinfo/kexec

___
kexec mailing list
kexec@lists.infradead.org
http://lists.infradead.org/mailman/listinfo/kexec


Re: [PATCH v24 5/9] arm64: kdump: add kdump support

2016-08-24 Thread Dave Young
On 08/24/16 at 11:25am, James Morse wrote:
> Hi Dave,
> 
> On 24/08/16 09:04, Dave Young wrote:
> > Looking the arm-init.c, I suspect it missed the some efi ranges as
> > reserved ranges like runtime code and runtime data etc. But I might be
> > wrong.
> 
> This had me confused for too... I think I get it, my understanding is:
> 
James, thanks for your clarification. 
> 
> > static __init int is_reserve_region(efi_memory_desc_t *md)
> > {
> > switch (md->type) {
> > case EFI_LOADER_CODE:
> > case EFI_LOADER_DATA:
> > case EFI_BOOT_SERVICES_CODE:
> > case EFI_BOOT_SERVICES_DATA:
> > case EFI_CONVENTIONAL_MEMORY:
> > case EFI_PERSISTENT_MEMORY:
> > return 0;
> 
> return false - this is the list of region-types to never reserve, regardless 
> of
> memory attributes.
> 
> 
> > default:
> > break;
> > }
> > return is_normal_ram(md);
> 
> If its not in the 'never reserve' list above, then we check if the region is
> 'normal' ram. If it is then it will end up in memblock.memory so we return 
> true,
> causing it to be marked nomap too.
> 
> reserve_regions() in that same file calls is_normal_ram() directly before 
> adding
> all regions with the WB attribute to memblock.memory via
> early_init_dt_add_memory_arch().
> 
> A runtime region with the WB attribute will be caught by is_reserve_region(),
> and is_normal_ram(), so it ends up in memblock.memory and memblock.nomap.

Hmm, It is not straitforward like the do_add_efi_memmap. I got it.

BTW, I believe there is same problem in arm as well as arm64, it also
need mark the runtime ranges as "reserved" /proc/iomem. 

> 
> 
> > }
> > 
> > Let's see the x86 do_add_efi_mem_map, the default case set all other
> > types as reserved. Shouldn't this be same in all arches though there's
> > no e820 in arm(64)?
> 
> > static void __init do_add_efi_memmap(void)
> > {
> > 
> > [snip]
> > switch (md->type) {
> > case EFI_LOADER_CODE:
> > case EFI_LOADER_DATA:
> > case EFI_BOOT_SERVICES_CODE:
> > case EFI_BOOT_SERVICES_DATA:
> > case EFI_CONVENTIONAL_MEMORY:
> > if (md->attribute & EFI_MEMORY_WB)
> > e820_type = E820_RAM;
> 
> In this case reserve_regions() will add the memory to memblock.memory because 
> it
> has the WB attribute, and not reserve it.
> 
> 
> > else
> > e820_type = E820_RESERVED;
> 
> Without the WB attribute, these regions are in neither memblock.memory nor
> memblock.nomap.
> 
> 
> > break;
> > [snip]
> > default:
> > /*
> >  * EFI_RESERVED_TYPE EFI_RUNTIME_SERVICES_CODE
> >  * EFI_RUNTIME_SERVICES_DATA
> >  * EFI_MEMORY_MAPPED_IO
> >  * EFI_MEMORY_MAPPED_IO_PORT_SPACE EFI_PAL_CODE
> >  */
> > e820_type = E820_RESERVED;
> > break;
> 
> If any other regions has the WB attribute, it will be added to memblock.memory
> and memblock.nomap. If it doesn't, it will appear in neither.
> 
> 
> > }
> > [snip]
> > }
> 
> Does this help at all?
> 

Yes, thanks a lot!
Dave

> 
> Thanks,
> 
> James

___
kexec mailing list
kexec@lists.infradead.org
http://lists.infradead.org/mailman/listinfo/kexec


[PATCH] kexec: Fix double-free when failing to relocate the purgatory.

2016-08-24 Thread Thiago Jung Bauermann
If kexec_apply_relocations fails, kexec_load_purgatory frees pi->sechdrs
and pi->purgatory_buf. This is redundant, because in case of error
kimage_file_prepare_segments calls kimage_file_post_load_cleanup,
which will also free those buffers.

This causes two warnings like the following, one for pi->sechdrs and the
other for pi->purgatory_buf:

[   18.112843] kexec-bzImage64: Loading purgatory failed
[   18.113257] [ cut here ]
[   18.113263] WARNING: CPU: 1 PID: 2119 at mm/vmalloc.c:1490 __vunmap+0xc1/0xd0
[   18.113264] Trying to vfree() nonexistent vm area (c9e91000)
[   18.113367] Modules linked in:
[   18.113371] CPU: 1 PID: 2119 Comm: kexec Not tainted 4.8.0-rc3+ #5
[   18.113372] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS 
Bochs 01/01/2011
[   18.113373]   88003bbb7dc8 8132eca8 
88003bbb7e18
[   18.113376]   88003bbb7e08 8105f1bb 
05d281175bf8
[   18.113377]  c9e91000  0001 
88003e5f8c00
[   18.113379] Call Trace:
[   18.113384]  [] dump_stack+0x4d/0x65
[   18.113386]  [] __warn+0xcb/0xf0
[   18.113388]  [] warn_slowpath_fmt+0x4f/0x60
[   18.113390]  [] ? find_vmap_area+0x19/0x70
[   18.113393]  [] ? kimage_file_post_load_cleanup+0x47/0xb0
[   18.113394]  [] __vunmap+0xc1/0xd0
[   18.113396]  [] vfree+0x2e/0x70
[   18.113397]  [] kimage_file_post_load_cleanup+0x5e/0xb0
[   18.113398]  [] SyS_kexec_file_load+0x448/0x680
[   18.113401]  [] ? putname+0x54/0x60
[   18.113403]  [] ? do_sys_open+0x190/0x1f0
[   18.113407]  [] entry_SYSCALL_64_fastpath+0x13/0x8f
[   18.113408] ---[ end trace 158bb74f5950ca2b ]---

Fix by setting pi->sechdrs an pi->purgatory_buf to NULL, since vfree
won't try to free a NULL pointer.

Signed-off-by: Thiago Jung Bauermann 
---
 kernel/kexec_file.c | 3 +++
 1 file changed, 3 insertions(+)

diff --git a/kernel/kexec_file.c b/kernel/kexec_file.c
index 503bc2d348e5..037c321c5618 100644
--- a/kernel/kexec_file.c
+++ b/kernel/kexec_file.c
@@ -887,7 +887,10 @@ int kexec_load_purgatory(struct kimage *image, unsigned 
long min,
return 0;
 out:
vfree(pi->sechdrs);
+   pi->sechdrs = NULL;
+
vfree(pi->purgatory_buf);
+   pi->purgatory_buf = NULL;
return ret;
 }
 
-- 
1.9.1


___
kexec mailing list
kexec@lists.infradead.org
http://lists.infradead.org/mailman/listinfo/kexec


Re: [integrity:next-restore-kexec 21/31] kernel/kexec_core.c:780:10: warning: passing argument 1 of '__va' makes integer from pointer without a cast

2016-08-24 Thread Thiago Jung Bauermann
Am Dienstag, 23 August 2016, 22:44:19 schrieb kbuild test robot:
> tree:  
> https://git.kernel.org/pub/scm/linux/kernel/git/zohar/linux-integrity.git
> next-restore-kexec head:   62bc4b565254de4796a0835f6f67569eb4835f9f
> commit: d03a46a7730822305a2264c9defa21c06d4ff861 [21/31] kexec_file: Add
> mechanism to update kexec segments. config: m68k-sun3_defconfig (attached
> as .config)
> compiler: m68k-linux-gcc (GCC) 4.9.0
> reproduce:
> wget
> https://git.kernel.org/cgit/linux/kernel/git/wfg/lkp-tests.git/plain/sbin
> /make.cross -O ~/bin/make.cross chmod +x ~/bin/make.cross
> git checkout d03a46a7730822305a2264c9defa21c06d4ff861
> # save the attached .config to linux build tree
> make.cross ARCH=m68k
> 
> All warnings (new ones prefixed by >>):
> 

>kernel/kexec_core.c: In function 'kexec_update_segment':
> >> kernel/kexec_core.c:780:10: warning: passing argument 1 of '__va' makes
> >> integer from pointer without a cast
>ptr = __va(addr);
>  ^
>In file included from arch/m68k/include/asm/page.h:46:0,
> from arch/m68k/include/asm/thread_info.h:5,
> from include/linux/thread_info.h:54,
> from include/asm-generic/preempt.h:4,
> from ./arch/m68k/include/generated/asm/preempt.h:1,
> from include/linux/preempt.h:59,
> from include/linux/spinlock.h:50,
> from include/linux/mmzone.h:7,
> from include/linux/gfp.h:5,
> from include/linux/mm.h:9,
> from kernel/kexec_core.c:12:
>arch/m68k/include/asm/page_mm.h:105:21: note: expected 'long unsigned
> int' but argument is of type 'void *' static inline void *__va(unsigned
> long x)
> ^

This doesn't produce a warning on x86 and powerpc because on both arches 
__va is a macro which does a cast to unsigned long, but on m68k __va is a 
function expecting an unsigned long argument.

>In file included from include/asm-generic/bug.h:13:0,
> from arch/m68k/include/asm/bug.h:28,
> from include/linux/bug.h:4,
> from include/linux/mmdebug.h:4,
> from include/linux/mm.h:8,
> from kernel/kexec_core.c:12:
>include/linux/kernel.h:742:17: warning: comparison of distinct pointer
> types lacks a cast (void) (&_min1 == &_min2);  \
> ^
>kernel/kexec_core.c:800:14: note: in expansion of macro 'min'
> uchunk = min(bufsz, mchunk);
>  ^

This is because bufsz is unsigned long but mchunk is size_t.

Both warnings are fixed by the changes below, which will be in my next 
revision of the kexec buffer hand-over series.

-- 
[]'s
Thiago Jung Bauermann
IBM Linux Technology Center


diff --git a/include/linux/kexec.h b/include/linux/kexec.h
index 6ec09e85efd9..ea2e5a7b9b69 100644
--- a/include/linux/kexec.h
+++ b/include/linux/kexec.h
@@ -261,8 +261,8 @@ extern int kexec_purgatory_get_set_symbol(struct kimage 
*image,
  unsigned int size, bool get_value);
 extern void *kexec_purgatory_get_symbol_addr(struct kimage *image,
 const char *name);
-int kexec_update_segment(const char *buffer, unsigned long bufsz,
-unsigned long load_addr, unsigned long memsz);
+int kexec_update_segment(const char *buffer, size_t bufsz,
+unsigned long load_addr, size_t memsz);
 extern void __crash_kexec(struct pt_regs *);
 extern void crash_kexec(struct pt_regs *);
 int kexec_should_crash(struct task_struct *);
diff --git a/kernel/kexec_core.c b/kernel/kexec_core.c
index 3740235d6819..11ca5f8678df 100644
--- a/kernel/kexec_core.c
+++ b/kernel/kexec_core.c
@@ -732,8 +732,8 @@ static struct page *kimage_alloc_page(struct kimage *image,
  *
  * Return: 0 on success, negative errno on error.
  */
-int kexec_update_segment(const char *buffer, unsigned long bufsz,
-unsigned long load_addr, unsigned long memsz)
+int kexec_update_segment(const char *buffer, size_t bufsz,
+unsigned long load_addr, size_t memsz)
 {
int i;
unsigned long entry;
@@ -763,7 +763,7 @@ int kexec_update_segment(const char *buffer, unsigned long 
bufsz,
break;
}
if (i == kexec_image->nr_segments) {
-   pr_err("Couldn't find segment to update: 0x%lx, size 0x%lx\n",
+   pr_err("Couldn't find segment to update: 0x%lx, size 0x%zx\n",
   load_addr, memsz);
return -EINVAL;
}
@@ -777,7 +777,7 @@ int kexec_update_segment(const char *buffer, unsigned long 
bufsz,
dest = addr;
break;
case IND_INDIRECTION:
-   ptr = __va(addr);
+   

Re: [PATCH v24 5/9] arm64: kdump: add kdump support

2016-08-24 Thread Ard Biesheuvel
On 9 August 2016 at 03:56, AKASHI Takahiro  wrote:
> On crash dump kernel, all the information about primary kernel's system
> memory (core image) is available in elf core header.
> The primary kernel will set aside this header with reserve_elfcorehdr()
> at boot time and inform crash dump kernel of its location via a new
> device-tree property, "linux,elfcorehdr".
>
> Please note that all other architectures use traditional "elfcorehdr="
> kernel parameter for this purpose.
>
> Then crash dump kernel will access the primary kernel's memory with
> copy_oldmem_page(), which reads one page by ioremap'ing it since it does
> not reside in linear mapping on crash dump kernel.
>
> We also need our own elfcorehdr_read() here since the header is placed
> within crash dump kernel's usable memory.
>
> Signed-off-by: AKASHI Takahiro 
> ---
>  arch/arm64/Kconfig | 11 +++
>  arch/arm64/kernel/Makefile |  1 +
>  arch/arm64/kernel/crash_dump.c | 71 
> ++
>  arch/arm64/mm/init.c   | 54 
>  4 files changed, 137 insertions(+)
>  create mode 100644 arch/arm64/kernel/crash_dump.c
>
> diff --git a/arch/arm64/Kconfig b/arch/arm64/Kconfig
> index 69c8787..9ede54b 100644
> --- a/arch/arm64/Kconfig
> +++ b/arch/arm64/Kconfig
> @@ -682,6 +682,17 @@ config KEXEC
>   but it is independent of the system firmware.   And like a reboot
>   you can start any kernel with it, not just Linux.
>
> +config CRASH_DUMP
> +   bool "Build kdump crash kernel"
> +   help
> + Generate crash dump after being started by kexec. This should
> + be normally only set in special crash dump kernels which are
> + loaded in the main kernel with kexec-tools into a specially
> + reserved region and then later executed after a crash by
> + kdump/kexec.
> +
> + For more details see Documentation/kdump/kdump.txt
> +
>  config XEN_DOM0
> def_bool y
> depends on XEN
> diff --git a/arch/arm64/kernel/Makefile b/arch/arm64/kernel/Makefile
> index 14f7b65..f1cbfc8 100644
> --- a/arch/arm64/kernel/Makefile
> +++ b/arch/arm64/kernel/Makefile
> @@ -48,6 +48,7 @@ arm64-obj-$(CONFIG_RANDOMIZE_BASE)+= kaslr.o
>  arm64-obj-$(CONFIG_HIBERNATION)+= hibernate.o hibernate-asm.o
>  arm64-obj-$(CONFIG_KEXEC)  += machine_kexec.o relocate_kernel.o  
>   \
>cpu-reset.o
> +arm64-obj-$(CONFIG_CRASH_DUMP) += crash_dump.o
>
>  obj-y  += $(arm64-obj-y) vdso/ probes/
>  obj-m  += $(arm64-obj-m)
> diff --git a/arch/arm64/kernel/crash_dump.c b/arch/arm64/kernel/crash_dump.c
> new file mode 100644
> index 000..2dc54d1
> --- /dev/null
> +++ b/arch/arm64/kernel/crash_dump.c
> @@ -0,0 +1,71 @@
> +/*
> + * Routines for doing kexec-based kdump
> + *
> + * Copyright (C) 2014 Linaro Limited
> + * Author: AKASHI Takahiro 
> + *
> + * This program is free software; you can redistribute it and/or modify
> + * it under the terms of the GNU General Public License version 2 as
> + * published by the Free Software Foundation.
> + */
> +
> +#include 
> +#include 
> +#include 
> +#include 
> +#include 
> +#include 
> +
> +/**
> + * copy_oldmem_page() - copy one page from old kernel memory
> + * @pfn: page frame number to be copied
> + * @buf: buffer where the copied page is placed
> + * @csize: number of bytes to copy
> + * @offset: offset in bytes into the page
> + * @userbuf: if set, @buf is in a user address space
> + *
> + * This function copies one page from old kernel memory into buffer pointed 
> by
> + * @buf. If @buf is in userspace, set @userbuf to %1. Returns number of bytes
> + * copied or negative error in case of failure.
> + */
> +ssize_t copy_oldmem_page(unsigned long pfn, char *buf,
> +size_t csize, unsigned long offset,
> +int userbuf)
> +{
> +   void *vaddr;
> +
> +   if (!csize)
> +   return 0;
> +
> +   vaddr = ioremap_cache(__pfn_to_phys(pfn), PAGE_SIZE);

Could we please use memremap() here?

> +   if (!vaddr)
> +   return -ENOMEM;
> +
> +   if (userbuf) {
> +   if (copy_to_user(buf, vaddr + offset, csize)) {
> +   iounmap(vaddr);
> +   return -EFAULT;
> +   }
> +   } else {
> +   memcpy(buf, vaddr + offset, csize);
> +   }
> +
> +   iounmap(vaddr);

... and memunmap here?

ioremap_cache() is not very well defined, and memremap() has been
introduced specifically to replace it, so I think we should use it in
new code.

Thanks,
Ard.


> +
> +   return csize;
> +}
> +
> +/**
> + * elfcorehdr_read - read from ELF core header
> + * @buf: buffer where the data is placed
> + * @csize: number of 

Re: [PATCH v2 1/2] kexec: Introduce "/sys/kernel/kexec_crash_low_size"

2016-08-24 Thread Xunlei Pang
On 2016/08/24 at 16:20, Dave Young wrote:
> On 08/23/16 at 06:11pm, Yinghai Lu wrote:
>> On Wed, Aug 17, 2016 at 1:20 AM, Dave Young  wrote:
>>> On 08/17/16 at 09:50am, Xunlei Pang wrote:
 "/sys/kernel/kexec_crash_size" only handles crashk_res, it
 is fine in most cases, but sometimes we have crashk_low_res.
 For example, when "crashkernel=size[KMG],high" combined with
 "crashkernel=size[KMG],low" is used for 64-bit x86.

 Like crashk_res, we introduce the corresponding sysfs file
 "/sys/kernel/kexec_crash_low_size" for crashk_low_res.

 So, the exact total reserved memory is the sum of the two.

 crashk_low_res can also be shrunk via this new interface,
 and users should be aware of what they are doing.
>> ...
 @@ -218,6 +238,7 @@ static struct attribute * kernel_attrs[] = {
  #ifdef CONFIG_KEXEC_CORE
   _loaded_attr.attr,
   _crash_loaded_attr.attr,
 + _crash_low_size_attr.attr,
   _crash_size_attr.attr,
   _attr.attr,
  #endif
>> would be better if you can use attribute_group .is_visible to control 
>> showing of
>> crash_low_size only when the crash_base is above 4G.
> I have same feeling that it looks odd to show low in sysfs in case no
> crashkernel=,high being used. Even if crashkernel=,high is used only in
> x86 the resource crashk_low is in common code. What do you think to move
> it to x86?

If want to put some restriction on it, I'd prefer to move crashk_low to arch 
x86, to make
it x86-specific.

We can show the interface unconditionally. If it isn't used, its size is 0, it 
doesn't matter.

Regards,
Xunlei

>
> Thanks
> Dave
>
>> Thanks
>>
>> Yinghai
>>
>> ___
>> kexec mailing list
>> kexec@lists.infradead.org
>> http://lists.infradead.org/mailman/listinfo/kexec


___
kexec mailing list
kexec@lists.infradead.org
http://lists.infradead.org/mailman/listinfo/kexec


Re: [PATCH v24 5/9] arm64: kdump: add kdump support

2016-08-24 Thread James Morse
Hi Dave,

On 24/08/16 09:04, Dave Young wrote:
> Looking the arm-init.c, I suspect it missed the some efi ranges as
> reserved ranges like runtime code and runtime data etc. But I might be
> wrong.

This had me confused for too... I think I get it, my understanding is:


> static __init int is_reserve_region(efi_memory_desc_t *md)
> {
> switch (md->type) {
> case EFI_LOADER_CODE:
> case EFI_LOADER_DATA:
> case EFI_BOOT_SERVICES_CODE:
> case EFI_BOOT_SERVICES_DATA:
> case EFI_CONVENTIONAL_MEMORY:
> case EFI_PERSISTENT_MEMORY:
> return 0;

return false - this is the list of region-types to never reserve, regardless of
memory attributes.


> default:
> break;
> }
> return is_normal_ram(md);

If its not in the 'never reserve' list above, then we check if the region is
'normal' ram. If it is then it will end up in memblock.memory so we return true,
causing it to be marked nomap too.

reserve_regions() in that same file calls is_normal_ram() directly before adding
all regions with the WB attribute to memblock.memory via
early_init_dt_add_memory_arch().

A runtime region with the WB attribute will be caught by is_reserve_region(),
and is_normal_ram(), so it ends up in memblock.memory and memblock.nomap.


> }
> 
> Let's see the x86 do_add_efi_mem_map, the default case set all other
> types as reserved. Shouldn't this be same in all arches though there's
> no e820 in arm(64)?

> static void __init do_add_efi_memmap(void)
> {
> 
> [snip]
> switch (md->type) {
> case EFI_LOADER_CODE:
> case EFI_LOADER_DATA:
> case EFI_BOOT_SERVICES_CODE:
> case EFI_BOOT_SERVICES_DATA:
> case EFI_CONVENTIONAL_MEMORY:
> if (md->attribute & EFI_MEMORY_WB)
> e820_type = E820_RAM;

In this case reserve_regions() will add the memory to memblock.memory because it
has the WB attribute, and not reserve it.


> else
> e820_type = E820_RESERVED;

Without the WB attribute, these regions are in neither memblock.memory nor
memblock.nomap.


> break;
> [snip]
> default:
> /*
>  * EFI_RESERVED_TYPE EFI_RUNTIME_SERVICES_CODE
>  * EFI_RUNTIME_SERVICES_DATA
>  * EFI_MEMORY_MAPPED_IO
>  * EFI_MEMORY_MAPPED_IO_PORT_SPACE EFI_PAL_CODE
>  */
> e820_type = E820_RESERVED;
> break;

If any other regions has the WB attribute, it will be added to memblock.memory
and memblock.nomap. If it doesn't, it will appear in neither.


> }
> [snip]
> }

Does this help at all?


Thanks,

James

___
kexec mailing list
kexec@lists.infradead.org
http://lists.infradead.org/mailman/listinfo/kexec


Re: [PATCH v2 1/2] kexec: Introduce "/sys/kernel/kexec_crash_low_size"

2016-08-24 Thread Dave Young
On 08/23/16 at 06:11pm, Yinghai Lu wrote:
> On Wed, Aug 17, 2016 at 1:20 AM, Dave Young  wrote:
> > On 08/17/16 at 09:50am, Xunlei Pang wrote:
> >> "/sys/kernel/kexec_crash_size" only handles crashk_res, it
> >> is fine in most cases, but sometimes we have crashk_low_res.
> >> For example, when "crashkernel=size[KMG],high" combined with
> >> "crashkernel=size[KMG],low" is used for 64-bit x86.
> >>
> >> Like crashk_res, we introduce the corresponding sysfs file
> >> "/sys/kernel/kexec_crash_low_size" for crashk_low_res.
> >>
> >> So, the exact total reserved memory is the sum of the two.
> >>
> >> crashk_low_res can also be shrunk via this new interface,
> >> and users should be aware of what they are doing.
> ...
> >> @@ -218,6 +238,7 @@ static struct attribute * kernel_attrs[] = {
> >>  #ifdef CONFIG_KEXEC_CORE
> >>   _loaded_attr.attr,
> >>   _crash_loaded_attr.attr,
> >> + _crash_low_size_attr.attr,
> >>   _crash_size_attr.attr,
> >>   _attr.attr,
> >>  #endif
> 
> would be better if you can use attribute_group .is_visible to control showing 
> of
> crash_low_size only when the crash_base is above 4G.

I have same feeling that it looks odd to show low in sysfs in case no
crashkernel=,high being used. Even if crashkernel=,high is used only in
x86 the resource crashk_low is in common code. What do you think to move
it to x86?

Thanks
Dave

> 
> Thanks
> 
> Yinghai
> 
> ___
> kexec mailing list
> kexec@lists.infradead.org
> http://lists.infradead.org/mailman/listinfo/kexec

___
kexec mailing list
kexec@lists.infradead.org
http://lists.infradead.org/mailman/listinfo/kexec


Re: [PATCH v24 5/9] arm64: kdump: add kdump support

2016-08-24 Thread Dave Young
Ccing uefi people.

On 08/23/16 at 04:53pm, Pratyush Anand wrote:
> On 23/08/2016:09:38:16 AM, AKASHI Takahiro wrote:
> > On Mon, Aug 22, 2016 at 02:47:30PM +0100, James Morse wrote:
> > > On 22/08/16 02:29, AKASHI Takahiro wrote:
> > > > On Fri, Aug 19, 2016 at 04:52:17PM +0530, Pratyush Anand wrote:
> > > >> It will help kexec-tools to prevent copying  of any unnecessary data. I
> > > >> think, then you also need to change phys_offset calculation in 
> > > >> kexec-tools. That
> > > >> should be start of either of first "reserved" or "System RAM" block.
> > > > 
> > > > Good point, but I'm not sure this is always true.
> > > 
> > > > Is there any system whose ACPI memory is *not* part of DRAM
> > > 
> > > From the spec, it looks like this is allowed.
> > > 
> > > What do you mean by 'DRAM'? Any ACPI region will be in the UEFI memory 
> > > map, so
> > > the question is what is its type and memory attributes?
> > 
> > Yes.
> > 
> > > The UEFI spec[0] says ACPI regions can have a type of 
> > > EfiACPIReclaimMemory or
> > > EfiACPIMemoryNVS, the memory attributes aren't specified, so are chosen 
> > > by the
> > > firmware.
> > > 
> > > It is possible these regions have to be mapped non-cacheable, page 40 has 
> > > a
> > > couple of:
> > > > If no information about the table location exists in the UEFI memory 
> > > > map or
> > > ACPI memory
> > > > descriptors, the table is assumed to be non-cached.
> > > 
> > > reserve_regions() in drivers/firmware/efi/arm-init.c will add any entry 
> > > in the
> > > memory map that has a 'WB' attribute to the memblock.memory list (via
> > > early_init_dt_add_memory_arch()), it will also mark as no-map regions 
> > > that have
> > > this attribute and aren't in the is_reserve_region() list.

Looking the arm-init.c, I suspect it missed the some efi ranges as
reserved ranges like runtime code and runtime data etc. But I might be
wrong.

Below is the is_reserve_region, it will regard any regions which is not
in the EFI_* below as normal memory, it does not include the runtime
ranges and other types.

static __init int is_reserve_region(efi_memory_desc_t *md)
{
switch (md->type) {
case EFI_LOADER_CODE:
case EFI_LOADER_DATA:
case EFI_BOOT_SERVICES_CODE:
case EFI_BOOT_SERVICES_DATA:
case EFI_CONVENTIONAL_MEMORY:
case EFI_PERSISTENT_MEMORY:
return 0;
default:
break;
}
return is_normal_ram(md);
}

Let's see the x86 do_add_efi_mem_map, the default case set all other
types as reserved. Shouldn't this be same in all arches though there's
no e820 in arm(64)?

static void __init do_add_efi_memmap(void)
{

[snip]
switch (md->type) {
case EFI_LOADER_CODE:
case EFI_LOADER_DATA:
case EFI_BOOT_SERVICES_CODE:
case EFI_BOOT_SERVICES_DATA:
case EFI_CONVENTIONAL_MEMORY:
if (md->attribute & EFI_MEMORY_WB)
e820_type = E820_RAM;
else
e820_type = E820_RESERVED;
break;
[snip]
default:
/*
 * EFI_RESERVED_TYPE EFI_RUNTIME_SERVICES_CODE
 * EFI_RUNTIME_SERVICES_DATA
 * EFI_MEMORY_MAPPED_IO
 * EFI_MEMORY_MAPPED_IO_PORT_SPACE EFI_PAL_CODE
 */
e820_type = E820_RESERVED;
break;
}
[snip]
}

> > > 
> > > If these ACPI regions have the 'WB' attribute, we add them as memory and 
> > > mark
> > > them nomap. These show up as either a hole, or 'reserved' in /proc/iomem.
> > > If they don't have the 'WB' attribute, then then they are left out of 
> > > memblock
> > > and aren't part of DRAM, I don't think these will show up in /proc/iomem 
> > > at all.
> > 
> > Let's say,
> > 0x1000-0x1fff: reserved (SRAM for UEFI, WB)
> > 0x8000-0x: System RAM (DRAM)
> 
> May be slightly more complicated:
> 0x8000-0x80001fff: System RAM (DRAM) for UEFI, WB
> 0x80002000-0x: System RAM (DRAM)
> 
> Kernel will have phys_offset 0x8000, however kexec-tools will calculate it
> as 0x80002000.
> 
> > 
> > If, as Pratyush suggested, "reserved" resources are added to phys_offset
> > calculation, the kernel linear mapping area starts at PAGE_OFFSET, but
> > there is no actual mapping around PAGE_OFFSET.
> > It won't hurt anything, but looks funny.
> > So we'd better not include "reserved" in phys_offset calculation anyway.
> > -> Pratyush
> 
> My only concern is that, then we will have different values of phys_offset in
> kernel and kexec-tools, which might lead to further confusion.
> 
> ~Pratyush

___
kexec mailing list
kexec@lists.infradead.org
http://lists.infradead.org/mailman/listinfo/kexec