[PATCHv2 7/7] x86/mm: isolate the bottom-up style to init_32.c

2019-01-10 Thread Pingfan Liu
The bottom-up mapping style is no longer needed on x86_64, so isolate it
in init_32.c. Later, it may be removed from x86 completely.

Signed-off-by: Pingfan Liu 
Cc: Thomas Gleixner 
Cc: Ingo Molnar 
Cc: Borislav Petkov 
Cc: "H. Peter Anvin" 
Cc: Dave Hansen 
Cc: Andy Lutomirski 
Cc: Peter Zijlstra 
Cc: "Rafael J. Wysocki" 
Cc: Len Brown 
Cc: Yinghai Lu 
Cc: Tejun Heo 
Cc: Chao Fan 
Cc: Baoquan He 
Cc: Juergen Gross 
Cc: Andrew Morton 
Cc: Mike Rapoport 
Cc: Vlastimil Babka 
Cc: Michal Hocko 
Cc: x...@kernel.org
Cc: linux-a...@vger.kernel.org
Cc: linux...@kvack.org
---
 arch/x86/mm/init.c| 153 +-
 arch/x86/mm/init_32.c | 147 
 arch/x86/mm/mm_internal.h |   8 ++-
 3 files changed, 155 insertions(+), 153 deletions(-)

diff --git a/arch/x86/mm/init.c b/arch/x86/mm/init.c
index 003ad77..6a853e4 100644
--- a/arch/x86/mm/init.c
+++ b/arch/x86/mm/init.c
@@ -502,7 +502,7 @@ unsigned long __ref init_memory_mapping(unsigned long start,
  * That range would have hole in the middle or ends, and only ram parts
  * will be mapped in init_range_memory_mapping().
  */
-static unsigned long __init init_range_memory_mapping(
+unsigned long __init init_range_memory_mapping(
   unsigned long r_start,
   unsigned long r_end)
 {
@@ -530,157 +530,6 @@ static unsigned long __init init_range_memory_mapping(
return mapped_ram_size;
 }
 
-#ifdef CONFIG_X86_32
-
-static unsigned long min_pfn_mapped;
-
-static unsigned long __init get_new_step_size(unsigned long step_size)
-{
-   /*
-* Initial mapped size is PMD_SIZE (2M).
-* We can not set step_size to be PUD_SIZE (1G) yet.
-* In worse case, when we cross the 1G boundary, and
-* PG_LEVEL_2M is not set, we will need 1+1+512 pages (2M + 8k)
-* to map 1G range with PTE. Hence we use one less than the
-* difference of page table level shifts.
-*
-* Don't need to worry about overflow in the top-down case, on 32bit,
-* when step_size is 0, round_down() returns 0 for start, and that
-* turns it into 0x1ULL.
-* In the bottom-up case, round_up(x, 0) returns 0 though too, which
-* needs to be taken into consideration by the code below.
-*/
-   return step_size << (PMD_SHIFT - PAGE_SHIFT - 1);
-}
-
-/**
- * memory_map_top_down - Map [map_start, map_end) top down
- * @map_start: start address of the target memory range
- * @map_end: end address of the target memory range
- *
- * This function will setup direct mapping for memory range
- * [map_start, map_end) in top-down. That said, the page tables
- * will be allocated at the end of the memory, and we map the
- * memory in top-down.
- */
-static void __init memory_map_top_down(unsigned long map_start,
-  unsigned long map_end)
-{
-   unsigned long real_end, start, last_start;
-   unsigned long step_size;
-   unsigned long addr;
-   unsigned long mapped_ram_size = 0;
-
-   /* xen has big range in reserved near end of ram, skip it at first.*/
-   addr = memblock_find_in_range(map_start, map_end, PMD_SIZE, PMD_SIZE);
-   real_end = addr + PMD_SIZE;
-
-   /* step_size need to be small so pgt_buf from BRK could cover it */
-   step_size = PMD_SIZE;
-   max_pfn_mapped = 0; /* will get exact value next */
-   min_pfn_mapped = real_end >> PAGE_SHIFT;
-   last_start = start = real_end;
-
-   /*
-* We start from the top (end of memory) and go to the bottom.
-* The memblock_find_in_range() gets us a block of RAM from the
-* end of RAM in [min_pfn_mapped, max_pfn_mapped) used as new pages
-* for page table.
-*/
-   while (last_start > map_start) {
-   if (last_start > step_size) {
-   start = round_down(last_start - 1, step_size);
-   if (start < map_start)
-   start = map_start;
-   } else
-   start = map_start;
-   mapped_ram_size += init_range_memory_mapping(start,
-   last_start);
-   set_alloc_range(min_pfn_mapped, max_pfn_mapped);
-   last_start = start;
-   min_pfn_mapped = last_start >> PAGE_SHIFT;
-   if (mapped_ram_size >= step_size)
-   step_size = get_new_step_size(step_size);
-   }
-
-   if (real_end < map_end) {
-   init_range_memory_mapping(real_end, map_end);
-   set_alloc_range(min_pfn_mapped, max_pfn_mapped);
-   }
-}
-
-/**
- * memory_map_bottom_up - Map [map_start, map_end) bottom up
- * @map_start: start address of the target memory range
- * @map_end: end address of the target memory range
- *
- * This function will setup direct mapping for memory range

[PATCHv2 6/7] x86/mm: remove bottom-up allocation style for x86_64

2019-01-10 Thread Pingfan Liu
Although a KASLR kernel can avoid staining the movable node [1], the
page tables can still stain it. The probability is low, but the problem
exists. This patch removes the uncertainty by allocating page tables on
an unmovable node instead of right after the kernel end.
This patch achieves two things:
-1st. Keep the page table subtree away from the movable node.
With the previous patch, by the time init_mem_mapping() runs, the
memblock allocator knows the ACPI information about hot-movable memory
and can avoid staining the movable node. As a result,
memory_map_bottom_up() is no longer needed.
The following figure shows the defect of the current bottom-up style:
  [startA, endA][startB, "kaslr kernel very close to" endB][startC, endC]
If nodeA and nodeB are unmovable while nodeC is movable, then
init_mem_mapping() can generate page tables on nodeC, which stains the
movable node.
For more background, please refer to the Background section.

-2nd. Simplify the logic of memory_map_top_down().
Thanks to early_make_pgtable(), x86_64 can set up the page table subtree
directly at any place, hence the careful iteration in
memory_map_top_down() can be discarded.

*Background section*
After [1], a KASLR kernel is guaranteed to sit inside an unmovable node.
But if the kernel is located near the end of that unmovable node, the
bottom-up allocator may create page tables that cross the boundary
between the unmovable node and the movable node. This is a matter of
probability and depends on two factors: -1. how big the gap is between
the kernel end and the end of the unmovable node; -2. how much memory
the system has.
An alternative fix is to increase the gap in boot/compressed/kaslr*.
But on a system with PB-level memory, the page tables take several MB
even with 1GB pages, and differing page attributes and fragmentation
make things worse. So it is hard to decide how much the gap should
grow.
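
The "several MB" estimate above can be checked with a rough calculation.
The sketch below is not kernel code: it assumes the usual x86_64 layout
of 4 KiB page-table pages with 512 eight-byte entries, counts only the
PUD level (one entry per 1 GiB page), and ignores the higher levels and
any range that needs smaller pages. The helper names are hypothetical.

```c
#include <assert.h>
#include <stdint.h>

/* Assumed x86_64 constants: 4 KiB table pages, 512 entries per table. */
#define PT_PAGE_BYTES 4096ULL
#define PT_ENTRIES    512ULL
#define GIB           (1ULL << 30)

/* Hypothetical helper: ceil(a / b) for b > 0. */
static uint64_t div_round_up(uint64_t a, uint64_t b)
{
	assert(b > 0);
	return (a + b - 1) / b;
}

/*
 * Hypothetical helper: bytes of PUD-level tables needed to direct-map
 * mem_bytes using only 1 GiB pages (one PUD entry per GiB).
 */
static uint64_t pud_table_bytes(uint64_t mem_bytes)
{
	uint64_t pud_entries = div_round_up(mem_bytes, GIB);
	uint64_t pud_pages = div_round_up(pud_entries, PT_ENTRIES);

	return pud_pages * PT_PAGE_BYTES;
}
```

For 1 PiB (1ULL << 50) this gives 2^20 PUD entries in 2048 table pages,
i.e. 8 MiB, before any 2 MiB or 4 KiB mappings are taken into account.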

[1]: https://lore.kernel.org/patchwork/patch/1029376/
Signed-off-by: Pingfan Liu 
Cc: Thomas Gleixner 
Cc: Ingo Molnar 
Cc: Borislav Petkov 
Cc: "H. Peter Anvin" 
Cc: Dave Hansen 
Cc: Andy Lutomirski 
Cc: Peter Zijlstra 
Cc: "Rafael J. Wysocki" 
Cc: Len Brown 
Cc: Yinghai Lu 
Cc: Tejun Heo 
Cc: Chao Fan 
Cc: Baoquan He 
Cc: Juergen Gross 
Cc: Andrew Morton 
Cc: Mike Rapoport 
Cc: Vlastimil Babka 
Cc: Michal Hocko 
Cc: x...@kernel.org
Cc: linux-a...@vger.kernel.org
Cc: linux...@kvack.org

---
 arch/x86/kernel/setup.c |  4 ++--
 arch/x86/mm/init.c  | 56 ++---
 2 files changed, 36 insertions(+), 24 deletions(-)

diff --git a/arch/x86/kernel/setup.c b/arch/x86/kernel/setup.c
index 9b57e01..00a1b84 100644
--- a/arch/x86/kernel/setup.c
+++ b/arch/x86/kernel/setup.c
@@ -827,7 +827,7 @@ static void early_acpi_parse(void)
early_acpi_boot_init();
initmem_init();
/* check whether memory is returned or not */
-   start = memblock_find_in_range(start, end, 1<<24, 1);
+   start = memblock_find_in_range(start, end, 1 << 24, 1);
if (!start)
pr_warn("the above acpi routines change and consume memory\n");
memblock_set_current_limit(orig_start, orig_end, enforcing);
@@ -1135,7 +1135,7 @@ void __init setup_arch(char **cmdline_p)
trim_platform_memory_ranges();
trim_low_memory_range();
 
-#ifdef CONFIG_MEMORY_HOTPLUG
+#if defined(CONFIG_MEMORY_HOTPLUG) && defined(CONFIG_X86_32)
/*
 * Memory used by the kernel cannot be hot-removed because Linux
 * cannot migrate the kernel pages. When memory hotplug is
diff --git a/arch/x86/mm/init.c b/arch/x86/mm/init.c
index 385b9cd..003ad77 100644
--- a/arch/x86/mm/init.c
+++ b/arch/x86/mm/init.c
@@ -72,8 +72,6 @@ static unsigned long __initdata pgt_buf_start;
 static unsigned long __initdata pgt_buf_end;
 static unsigned long __initdata pgt_buf_top;
 
-static unsigned long min_pfn_mapped;
-
 static bool __initdata can_use_brk_pgt = true;
 
 static unsigned long min_pfn_allowed;
@@ -532,6 +530,10 @@ static unsigned long __init init_range_memory_mapping(
return mapped_ram_size;
 }
 
+#ifdef CONFIG_X86_32
+
+static unsigned long min_pfn_mapped;
+
 static unsigned long __init get_new_step_size(unsigned long step_size)
 {
/*
@@ -653,6 +655,32 @@ static void __init memory_map_bottom_up(unsigned long 
map_start,
}
 }
 
+static unsigned long __init init_range_memory_mapping32(
+   unsigned long r_start, unsigned long r_end)
+{
+   /*
+* If the allocation is in bottom-up direction, we setup direct mapping
+* in bottom-up, otherwise we setup direct mapping in top-down.
+*/
+   if (memblock_bottom_up()) {
+   unsigned long kernel_end = __pa_symbol(_end);
+
+   /*
+* we need two separate calls here. This is because we want to
+* allocate page tables above the kernel. So we first map
+* [kernel_end, end) to make memory above the kernel be 

[PATCHv2 1/7] x86/mm: concentrate the code to memblock allocator enabled

2019-01-10 Thread Pingfan Liu
This patch identifies the point where memblock allocation starts. It
makes no functional change.

Signed-off-by: Pingfan Liu 
Cc: Thomas Gleixner 
Cc: Ingo Molnar 
Cc: Borislav Petkov 
Cc: "H. Peter Anvin" 
Cc: Dave Hansen 
Cc: Andy Lutomirski 
Cc: Peter Zijlstra 
Cc: "Rafael J. Wysocki" 
Cc: Len Brown 
Cc: Yinghai Lu 
Cc: Tejun Heo 
Cc: Chao Fan 
Cc: Baoquan He 
Cc: Juergen Gross 
Cc: Andrew Morton 
Cc: Mike Rapoport 
Cc: Vlastimil Babka 
Cc: Michal Hocko 
Cc: x...@kernel.org
Cc: linux-a...@vger.kernel.org
Cc: linux...@kvack.org
---
 arch/x86/kernel/setup.c | 54 -
 1 file changed, 26 insertions(+), 28 deletions(-)

diff --git a/arch/x86/kernel/setup.c b/arch/x86/kernel/setup.c
index d494b9b..ac432ae 100644
--- a/arch/x86/kernel/setup.c
+++ b/arch/x86/kernel/setup.c
@@ -962,29 +962,6 @@ void __init setup_arch(char **cmdline_p)
 
if (efi_enabled(EFI_BOOT))
efi_memblock_x86_reserve_range();
-#ifdef CONFIG_MEMORY_HOTPLUG
-   /*
-* Memory used by the kernel cannot be hot-removed because Linux
-* cannot migrate the kernel pages. When memory hotplug is
-* enabled, we should prevent memblock from allocating memory
-* for the kernel.
-*
-* ACPI SRAT records all hotpluggable memory ranges. But before
-* SRAT is parsed, we don't know about it.
-*
-* The kernel image is loaded into memory at very early time. We
-* cannot prevent this anyway. So on NUMA system, we set any
-* node the kernel resides in as un-hotpluggable.
-*
-* Since on modern servers, one node could have double-digit
-* gigabytes memory, we can assume the memory around the kernel
-* image is also un-hotpluggable. So before SRAT is parsed, just
-* allocate memory near the kernel image to try the best to keep
-* the kernel away from hotpluggable memory.
-*/
-   if (movable_node_is_enabled())
-   memblock_set_bottom_up(true);
-#endif
 
x86_report_nx();
 
@@ -1096,9 +1073,6 @@ void __init setup_arch(char **cmdline_p)
 
cleanup_highmap();
 
-   memblock_set_current_limit(ISA_END_ADDRESS);
-   e820__memblock_setup();
-
reserve_bios_regions();
 
if (efi_enabled(EFI_MEMMAP)) {
@@ -1113,6 +1087,8 @@ void __init setup_arch(char **cmdline_p)
efi_reserve_boot_services();
}
 
+   memblock_set_current_limit(0, ISA_END_ADDRESS, false);
+   e820__memblock_setup();
/* preallocate 4k for mptable mpc */
e820__memblock_alloc_reserved_mpc_new();
 
@@ -1130,7 +1106,31 @@ void __init setup_arch(char **cmdline_p)
trim_platform_memory_ranges();
trim_low_memory_range();
 
+#ifdef CONFIG_MEMORY_HOTPLUG
+   /*
+* Memory used by the kernel cannot be hot-removed because Linux
+* cannot migrate the kernel pages. When memory hotplug is
+* enabled, we should prevent memblock from allocating memory
+* for the kernel.
+*
+* ACPI SRAT records all hotpluggable memory ranges. But before
+* SRAT is parsed, we don't know about it.
+*
+* The kernel image is loaded into memory at very early time. We
+* cannot prevent this anyway. So on NUMA system, we set any
+* node the kernel resides in as un-hotpluggable.
+*
+* Since on modern servers, one node could have double-digit
+* gigabytes memory, we can assume the memory around the kernel
+* image is also un-hotpluggable. So before SRAT is parsed, just
+* allocate memory near the kernel image to try the best to keep
+* the kernel away from hotpluggable memory.
+*/
+   if (movable_node_is_enabled())
+   memblock_set_bottom_up(true);
+#endif
init_mem_mapping();
+   memblock_set_current_limit(get_max_mapped());
 
idt_setup_early_pf();
 
@@ -1145,8 +1145,6 @@ void __init setup_arch(char **cmdline_p)
 */
mmu_cr4_features = __read_cr4() & ~X86_CR4_PCIDE;
 
-   memblock_set_current_limit(get_max_mapped());
-
/*
 * NOTE: On x86-32, only from this point on, fixmaps are ready for use.
 */
-- 
2.7.4



[PATCHv2 3/7] mm/memblock: introduce allocation boundary for tracing purpose

2019-01-10 Thread Pingfan Liu
During boot, it is sometimes necessary to tell whether a series of
function calls consumes memory. A temporary memory range can be lent to
those functions through the memblock allocator, but at a checkpoint all
of the lent memory must have been returned.
Typical usage:
 -1. find a usable range with memblock_find_in_range(), say [A,B]
 -2. before calling the series of functions, call
     memblock_set_current_limit(A,B,true)
 -3. call the functions
 -4. call memblock_find_in_range(A,B,B-A,1); if it fails, some memory
     was not returned
 -5. restore the original limit

E.g. in the hot-movable memory case, some ACPI routines must be called,
and they are not allowed to own any movable memory. At present these
functions do not consume memory, but they may start to if they are
later changed without this constraint in mind. With the above method,
such an allocation is detected and pr_warn() asks people to resolve it.
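
The lend-then-check pattern in the steps above can be illustrated in
userspace. This is only a sketch under stated assumptions: the toy_*
names are hypothetical stand-ins and not the memblock API, and the
"window" is modelled as a simple byte counter rather than a real range
allocator. The final check corresponds to step 4: the whole window must
be free again after the functions have run.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Toy stand-in for the lent range [start, end) with a usage counter. */
struct toy_window {
	size_t start;
	size_t end;
	size_t used;	/* bytes currently handed out from the window */
};

/* Hand out size bytes from the window; false if it does not fit. */
static bool toy_alloc(struct toy_window *w, size_t size)
{
	if (size > (w->end - w->start) - w->used)
		return false;
	w->used += size;
	return true;
}

/* Return size bytes to the window. */
static void toy_free(struct toy_window *w, size_t size)
{
	assert(size <= w->used);
	w->used -= size;
}

typedef void (*toy_fn)(struct toy_window *);

/*
 * Steps 2-5: lend [a, b) to fn, run it, then check that the whole
 * window is free again. Returns true if nothing was kept.
 */
static bool toy_run_and_check(size_t a, size_t b, toy_fn fn)
{
	struct toy_window w = { .start = a, .end = b, .used = 0 };

	assert(a <= b);
	fn(&w);
	return w.used == 0;
}

/* Borrows memory and gives it back. */
static void well_behaved(struct toy_window *w)
{
	if (toy_alloc(w, 64))
		toy_free(w, 64);
}

/* Borrows memory and keeps it; the check should catch this. */
static void leaky(struct toy_window *w)
{
	(void)toy_alloc(w, 64);
}
```

In this model, toy_run_and_check(0, 4096, well_behaved) reports an
intact window, while the leaky variant fails the check, which is the
point where the patch would emit its pr_warn().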

Signed-off-by: Pingfan Liu 
Cc: Thomas Gleixner 
Cc: Ingo Molnar 
Cc: Borislav Petkov 
Cc: "H. Peter Anvin" 
Cc: Dave Hansen 
Cc: Andy Lutomirski 
Cc: Peter Zijlstra 
Cc: "Rafael J. Wysocki" 
Cc: Len Brown 
Cc: Yinghai Lu 
Cc: Tejun Heo 
Cc: Chao Fan 
Cc: Baoquan He 
Cc: Juergen Gross 
Cc: Andrew Morton 
Cc: Mike Rapoport 
Cc: Vlastimil Babka 
Cc: Michal Hocko 
Cc: x...@kernel.org
Cc: linux-a...@vger.kernel.org
Cc: linux...@kvack.org
---
 arch/arm/mm/init.c  |  3 ++-
 arch/arm/mm/mmu.c   |  4 ++--
 arch/arm/mm/nommu.c |  2 +-
 arch/csky/kernel/setup.c|  2 +-
 arch/microblaze/mm/init.c   |  2 +-
 arch/mips/kernel/setup.c|  2 +-
 arch/powerpc/mm/40x_mmu.c   |  6 --
 arch/powerpc/mm/44x_mmu.c   |  2 +-
 arch/powerpc/mm/8xx_mmu.c   |  2 +-
 arch/powerpc/mm/fsl_booke_mmu.c |  5 +++--
 arch/powerpc/mm/hash_utils_64.c |  4 ++--
 arch/powerpc/mm/init_32.c   |  2 +-
 arch/powerpc/mm/pgtable-radix.c |  2 +-
 arch/powerpc/mm/ppc_mmu_32.c|  8 ++--
 arch/powerpc/mm/tlb_nohash.c|  6 --
 arch/unicore32/mm/mmu.c |  2 +-
 arch/x86/kernel/setup.c |  2 +-
 arch/xtensa/mm/init.c   |  2 +-
 include/linux/memblock.h| 10 +++---
 mm/memblock.c   | 23 ++-
 20 files changed, 59 insertions(+), 32 deletions(-)

diff --git a/arch/arm/mm/init.c b/arch/arm/mm/init.c
index 32e4845..58a4342 100644
--- a/arch/arm/mm/init.c
+++ b/arch/arm/mm/init.c
@@ -93,7 +93,8 @@ __tagtable(ATAG_INITRD2, parse_tag_initrd2);
 static void __init find_limits(unsigned long *min, unsigned long *max_low,
   unsigned long *max_high)
 {
-   *max_low = PFN_DOWN(memblock_get_current_limit());
+   memblock_get_current_limit(NULL, max_low);
+   *max_low = PFN_DOWN(*max_low);
*min = PFN_UP(memblock_start_of_DRAM());
*max_high = PFN_DOWN(memblock_end_of_DRAM());
 }
diff --git a/arch/arm/mm/mmu.c b/arch/arm/mm/mmu.c
index f5cc1cc..9025418 100644
--- a/arch/arm/mm/mmu.c
+++ b/arch/arm/mm/mmu.c
@@ -1240,7 +1240,7 @@ void __init adjust_lowmem_bounds(void)
}
}
 
-   memblock_set_current_limit(memblock_limit);
+   memblock_set_current_limit(0, memblock_limit, false);
 }
 
 static inline void prepare_page_table(void)
@@ -1625,7 +1625,7 @@ void __init paging_init(const struct machine_desc *mdesc)
 
prepare_page_table();
map_lowmem();
-   memblock_set_current_limit(arm_lowmem_limit);
+   memblock_set_current_limit(0, arm_lowmem_limit, false);
dma_contiguous_remap();
early_fixmap_shutdown();
devicemaps_init(mdesc);
diff --git a/arch/arm/mm/nommu.c b/arch/arm/mm/nommu.c
index 7d67c70..721535c 100644
--- a/arch/arm/mm/nommu.c
+++ b/arch/arm/mm/nommu.c
@@ -138,7 +138,7 @@ void __init adjust_lowmem_bounds(void)
adjust_lowmem_bounds_mpu();
end = memblock_end_of_DRAM();
high_memory = __va(end - 1) + 1;
-   memblock_set_current_limit(end);
+   memblock_set_current_limit(0, end, false);
 }
 
 /*
diff --git a/arch/csky/kernel/setup.c b/arch/csky/kernel/setup.c
index dff8b89..e6f88bf 100644
--- a/arch/csky/kernel/setup.c
+++ b/arch/csky/kernel/setup.c
@@ -100,7 +100,7 @@ static void __init csky_memblock_init(void)
 
highend_pfn = max_pfn;
 #endif
-   memblock_set_current_limit(PFN_PHYS(max_low_pfn));
+   memblock_set_current_limit(0, PFN_PHYS(max_low_pfn), false);
 
dma_contiguous_reserve(0);
 
diff --git a/arch/microblaze/mm/init.c b/arch/microblaze/mm/init.c
index b17fd8a..cee99da 100644
--- a/arch/microblaze/mm/init.c
+++ b/arch/microblaze/mm/init.c
@@ -353,7 +353,7 @@ asmlinkage void __init mmu_init(void)
/* Shortly after that, the entire linear mapping will be available */
/* This will also cause that unflatten device tree will be allocated
 * inside 768MB limit */
-   memblock_set_current_limit(memory_start + lowmem_size - 1);
+   memblock_set_current_limit(0, memory_start + lowmem_size - 1, false);
 

[PATCHv2 0/7] x86_64/mm: remove bottom-up allocation style by pushing forward the parsing of mem hotplug info

2019-01-10 Thread Pingfan Liu
Background
After [1], a KASLR kernel is guaranteed to sit inside an unmovable node.
But if the kernel is located near the end of that unmovable node, the
bottom-up allocator may create page tables that cross the boundary
between the unmovable node and the movable node. This is a matter of
probability and depends on two factors: -1. how big the gap is between
the kernel end and the end of the unmovable node; -2. how much memory
the system has.
An alternative fix is to increase the gap in boot/compressed/kaslr*.
But on a system with PB-level memory, the page tables take several MB
even with 1GB pages, and differing page attributes and fragmentation
make things worse. So it is hard to decide how much the gap should
grow.
The following figure shows the defect of the current bottom-up style:
  [startA, endA][startB, "kaslr kernel very close to" endB][startC, endC]

If nodeA and nodeB are unmovable while nodeC is movable, then
init_mem_mapping() can generate page tables on nodeC, which stains the
movable node.

This series removes the uncertainty by parsing the memory hotplug info
before init_mem_mapping() runs.

Cc: Thomas Gleixner 
Cc: Ingo Molnar 
Cc: Borislav Petkov 
Cc: "H. Peter Anvin" 
Cc: Dave Hansen 
Cc: Andy Lutomirski 
Cc: Peter Zijlstra 
Cc: "Rafael J. Wysocki" 
Cc: Len Brown 
Cc: Yinghai Lu 
Cc: Tejun Heo 
Cc: Chao Fan 
Cc: Baoquan He 
Cc: Juergen Gross 
Cc: Andrew Morton 
Cc: Mike Rapoport 
Cc: Vlastimil Babka 
Cc: Michal Hocko 
Cc: x...@kernel.org
Cc: linux-a...@vger.kernel.org
Cc: linux...@kvack.org
Pingfan Liu (7):
  x86/mm: concentrate the code to memblock allocator enabled
  acpi: change the topo of acpi_table_upgrade()
  mm/memblock: introduce allocation boundary for tracing purpose
  x86/setup: parse acpi to get hotplug info before init_mem_mapping()
  x86/mm: set allowed range for memblock allocator
  x86/mm: remove bottom-up allocation style for x86_64
  x86/mm: isolate the bottom-up style to init_32.c

 arch/arm/mm/init.c  |   3 +-
 arch/arm/mm/mmu.c   |   4 +-
 arch/arm/mm/nommu.c |   2 +-
 arch/arm64/kernel/setup.c   |   2 +-
 arch/csky/kernel/setup.c|   2 +-
 arch/microblaze/mm/init.c   |   2 +-
 arch/mips/kernel/setup.c|   2 +-
 arch/powerpc/mm/40x_mmu.c   |   6 +-
 arch/powerpc/mm/44x_mmu.c   |   2 +-
 arch/powerpc/mm/8xx_mmu.c   |   2 +-
 arch/powerpc/mm/fsl_booke_mmu.c |   5 +-
 arch/powerpc/mm/hash_utils_64.c |   4 +-
 arch/powerpc/mm/init_32.c   |   2 +-
 arch/powerpc/mm/pgtable-radix.c |   2 +-
 arch/powerpc/mm/ppc_mmu_32.c|   8 +-
 arch/powerpc/mm/tlb_nohash.c|   6 +-
 arch/unicore32/mm/mmu.c |   2 +-
 arch/x86/kernel/setup.c |  93 ++-
 arch/x86/mm/init.c  | 163 +---
 arch/x86/mm/init_32.c   | 147 
 arch/x86/mm/mm_internal.h   |   8 +-
 arch/xtensa/mm/init.c   |   2 +-
 drivers/acpi/tables.c   |   4 +-
 include/linux/acpi.h|   5 +-
 include/linux/memblock.h|  10 ++-
 mm/memblock.c   |  23 --
 26 files changed, 290 insertions(+), 221 deletions(-)

-- 
2.7.4



Re: [alsa-devel] [PATCH v2] ASoC: soc-core: defer card probe until all component is added to list

2019-01-10 Thread Pierre-Louis Bossart
While debugging Skylake audio stuff, I came across a kernel oops 
introduced by this commit.


It's quite late here and my brain is fried, submitting as is but my 
money is on the use of link->platform->of_node which is quite unlikely 
to work on ACPI platforms.


and btw you may want to fix the typos, it's registration, not registartion.

-Pierre

8780cf1142a59568a3aa77959cbd76b2edb6fd81 is the first bad commit
commit 8780cf1142a59568a3aa77959cbd76b2edb6fd81
Author: Ajit Pandey 
Date:   Wed Jan 9 14:17:07 2019 +0530

    ASoC: soc-core: defer card probe until all component is added to list

    DAI component probe is not called if it is not present
    in component list during sound card registration.
    Check if component is available in component list for
    platform and cpu dai before soundcard registration.

    Signed-off-by: Ajit Pandey 
    Signed-off-by: Rohit kumar 
    Signed-off-by: Mark Brown 

:04 04 98da59b0a73551030a0c9030b8cd58114003c82b 
48f0618f37a16dcfea5999ecd9743edbb0763594 M    sound


[    2.686029] HDMI HDA Codec ehdaudio0D2: Max dais supported: 3
[    2.687854] BUG: unable to handle kernel NULL pointer dereference at 


[    2.687858] PGD 0 P4D 0
[    2.687862] Oops:  [#1] SMP PTI
[    2.687866] CPU: 1 PID: 1647 Comm: systemd-udevd Not tainted 
4.20.0-rc7-test+ #88
[    2.687867] Hardware name: Dell Inc. XPS 13 9350/07TYC2, BIOS 1.0.0 
09/10/2015

[    2.687872] RIP: 0010:strcmp+0xc/0x20
[    2.687875] Code: 75 f7 48 83 c6 01 0f b6 4e ff 48 83 c2 01 84 c9 88 
4a ff 75 ed f3 c3 0f 1f 80 00 00 00 00 48 83 c7 01 0f b6 47 ff 48 83 c6 
01 <3a> 46 ff 75 07 84 c0 75 eb 31 c0 c3 19 c0 83 c8 01 c3 66 90 48 85

[    2.687877] RSP: 0018:9fadc104bb18 EFLAGS: 00010202
[    2.687880] RAX: 0065 RBX: 9d6834ba5428 RCX: 
0001
[    2.687882] RDX: c0288d00 RSI: 0001 RDI: 
9d68351b5a61
[    2.687883] RBP:  R08: 0001 R09: 
9d6836dbfd80
[    2.687885] R10:  R11: 9d6835e65648 R12: 

[    2.687887] R13:  R14:  R15: 
9fadc104be98
[    2.687889] FS:  7f976806a8c0() GS:9d6838a8() 
knlGS:

[    2.687891] CS:  0010 DS:  ES:  CR0: 80050033
[    2.687893] CR2:  CR3: 0002b4286002 CR4: 
003606e0

[    2.687895] Call Trace:
[    2.687902]  soc_find_component+0x4c/0x70 [snd_soc_core]
[    2.687908]  soc_init_dai_link+0x124/0x280 [snd_soc_core]
[    2.687913]  snd_soc_register_card+0x6b/0x1f0 [snd_soc_core]
[    2.687918]  ? __devres_alloc_node+0x2c/0x60
[    2.687922]  devm_snd_soc_register_card+0x3e/0x80 [snd_soc_core]
[    2.687926]  platform_drv_probe+0x35/0x90
[    2.687930]  ? driver_sysfs_add+0x70/0xd0
[    2.687932]  really_probe+0xee/0x2e0
[    2.687935]  driver_probe_device+0x4a/0xe0
[    2.687938]  __driver_attach+0xac/0xb0
[    2.687941]  ? driver_probe_device+0xe0/0xe0
[    2.687943]  bus_for_each_dev+0x71/0xb0
[    2.687946]  bus_add_driver+0x191/0x210
[    2.687948]  ? 0xc01bf000
[    2.687951]  driver_register+0x56/0xe0
[    2.687953]  ? 0xc01bf000
[    2.687956]  do_one_initcall+0x41/0x1b8
[    2.687960]  ? kobject_uevent_env+0x101/0x680
[    2.687962]  ? _cond_resched+0x10/0x40
[    2.687966]  ? kmem_cache_alloc_trace+0x35/0x160
[    2.687969]  do_init_module+0x56/0x1db
[    2.687973]  load_module+0x1e7c/0x2560
[    2.687976]  ? vfs_read+0x10a/0x130
[    2.687979]  ? __do_sys_finit_module+0xba/0xe0
[    2.687983]  __do_sys_finit_module+0xba/0xe0
[    2.687988]  do_syscall_64+0x43/0xf0
[    2.687992]  entry_SYSCALL_64_after_hwframe+0x44/0xa9
[    2.687995] RIP: 0033:0x7f9768aef219
[    2.687998] Code: 00 c3 66 2e 0f 1f 84 00 00 00 00 00 0f 1f 44 00 00 
48 89 f8 48 89 f7 48 89 d6 48 89 ca 4d 89 c2 4d 89 c8 4c 8b 4c 24 08 0f 
05 <48> 3d 01 f0 ff ff 73 01 c3 48 8b 0d 47 fc 0c 00 f7 d8 64 89 01 48
[    2.688000] RSP: 002b:7ffccf3a4c98 EFLAGS: 0246 ORIG_RAX: 
0139
[    2.688003] RAX: ffda RBX: 55991cf57970 RCX: 
7f9768aef219
[    2.688006] RDX:  RSI: 7f97689d3cad RDI: 
000f
[    2.688008] RBP: 7f97689d3cad R08:  R09: 

[    2.688010] R10: 000f R11: 0246 R12: 

[    2.688012] R13: 55991cf49930 R14: 0002 R15: 
55991cf57970
[    2.688015] Modules linked in: snd_soc_skl_hda_dsp(+) 
snd_soc_hdac_hdmi snd_soc_dmic ax88179_178a(+) usbnet 
snd_hda_codec_realtek snd_hda_codec_generic snd_soc_skl snd_soc_hdac_hda 
snd_hda_ext_core snd_soc_skl_ipc x86_pkg_temp_thermal snd_soc_sst_ipc 
snd_soc_sst_dsp snd_soc_acpi_intel_match snd_soc_acpi snd_soc_core 
snd_compress snd_hda_codec snd_hwdep snd_hda_core snd_pcm efivarfs 
intel_lpss_pci xhci_pci intel_lpss mfd_core xhci_hcd

[    2.688031] CR2: 
[    2.688034] ---[ end trace 8b96d01935d9effd ]---
[    

[PATCH 1/1] iommu/vt-d: Support page request in scalable mode

2019-01-10 Thread Lu Baolu
From: Jacob Pan 

VT-d Rev3.0 has made a few changes to the page request interface:

1. widened PRQ descriptor from 128 bits to 256 bits;
2. removed streaming response type;
3. introduced private data that requires a page response even if the
   request is not the last request in group (LPIG).

This is a supplement to commit 1c4f88b7f1f92 ("iommu/vt-d: Shared
virtual address in scalable mode") and makes the svm code compliant
with VT-d Rev3.0.
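
As a quick sanity check of the widened descriptor, the layout from this
patch can be reproduced in userspace and its size verified. This is a
sketch, not the driver code: the struct name and the helper are
hypothetical, the field list is copied from the diff below, and it
assumes a compiler that accepts 64-bit bit-fields and C11 anonymous
structs and unions (GCC and Clang do).

```c
#include <assert.h>
#include <stdint.h>

/* Userspace copy of the 256-bit page request descriptor layout. */
struct prq_desc_sketch {
	union {
		struct {
			uint64_t type:8;
			uint64_t pasid_present:1;
			uint64_t priv_data_present:1;
			uint64_t rsvd:6;
			uint64_t rid:16;
			uint64_t pasid:20;
			uint64_t exe_req:1;
			uint64_t pm_req:1;
			uint64_t rsvd2:10;
		};
		uint64_t qw_0;
	};
	union {
		struct {
			uint64_t rd_req:1;
			uint64_t wr_req:1;
			uint64_t lpig:1;
			uint64_t prg_index:9;
			uint64_t addr:52;
		};
		uint64_t qw_1;
	};
	uint64_t priv_data[2];
};

/* Four 64-bit words: 256 bits in total. */
_Static_assert(sizeof(struct prq_desc_sketch) * 8 == 256,
	       "descriptor must be 256 bits wide");

/*
 * Hypothetical helper mirroring the condition used in the patch: a page
 * group response is needed for the last page in a group or whenever
 * private data is present.
 */
static int needs_page_group_response(const struct prq_desc_sketch *req)
{
	assert(req);
	return req->lpig || req->priv_data_present;
}
```

The bit-field widths in each union add up to 64, so each anonymous
struct exactly overlays its qw_0 or qw_1 word.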

Cc: Ashok Raj 
Cc: Liu Yi L 
Cc: Kevin Tian 
Signed-off-by: Jacob Pan 
Fixes: 1c4f88b7f1f92 ("iommu/vt-d: Shared virtual address in scalable mode")
Signed-off-by: Lu Baolu 
---
 drivers/iommu/intel-svm.c   | 77 ++---
 include/linux/intel-iommu.h | 21 +-
 include/linux/intel-svm.h   |  2 +-
 3 files changed, 55 insertions(+), 45 deletions(-)

diff --git a/drivers/iommu/intel-svm.c b/drivers/iommu/intel-svm.c
index a2a2aa4439aa..79add5716552 100644
--- a/drivers/iommu/intel-svm.c
+++ b/drivers/iommu/intel-svm.c
@@ -470,20 +470,31 @@ EXPORT_SYMBOL_GPL(intel_svm_is_pasid_valid);
 
 /* Page request queue descriptor */
 struct page_req_dsc {
-   u64 srr:1;
-   u64 bof:1;
-   u64 pasid_present:1;
-   u64 lpig:1;
-   u64 pasid:20;
-   u64 bus:8;
-   u64 private:23;
-   u64 prg_index:9;
-   u64 rd_req:1;
-   u64 wr_req:1;
-   u64 exe_req:1;
-   u64 priv_req:1;
-   u64 devfn:8;
-   u64 addr:52;
+   union {
+   struct {
+   u64 type:8;
+   u64 pasid_present:1;
+   u64 priv_data_present:1;
+   u64 rsvd:6;
+   u64 rid:16;
+   u64 pasid:20;
+   u64 exe_req:1;
+   u64 pm_req:1;
+   u64 rsvd2:10;
+   };
+   u64 qw_0;
+   };
+   union {
+   struct {
+   u64 rd_req:1;
+   u64 wr_req:1;
+   u64 lpig:1;
+   u64 prg_index:9;
+   u64 addr:52;
+   };
+   u64 qw_1;
+   };
+   u64 priv_data[2];
 };
 
 #define PRQ_RING_MASK ((0x1000 << PRQ_ORDER) - 0x10)
@@ -596,7 +607,7 @@ static irqreturn_t prq_event_thread(int irq, void *d)
/* Accounting for major/minor faults? */
rcu_read_lock();
list_for_each_entry_rcu(sdev, &svm->devs, list) {
-   if (sdev->sid == PCI_DEVID(req->bus, req->devfn))
+   if (sdev->sid == req->rid)
break;
}
/* Other devices can go away, but the drivers are not permitted
@@ -609,33 +620,35 @@ static irqreturn_t prq_event_thread(int irq, void *d)
 
if (sdev && sdev->ops && sdev->ops->fault_cb) {
int rwxp = (req->rd_req << 3) | (req->wr_req << 2) |
-   (req->exe_req << 1) | (req->priv_req);
-   sdev->ops->fault_cb(sdev->dev, req->pasid, req->addr, 
req->private, rwxp, result);
+   (req->exe_req << 1) | (req->pm_req);
+   sdev->ops->fault_cb(sdev->dev, req->pasid, req->addr,
+   req->priv_data, rwxp, result);
}
/* We get here in the error case where the PASID lookup failed,
   and these can be NULL. Do not use them below this point! */
sdev = NULL;
svm = NULL;
no_pasid:
-   if (req->lpig) {
-   /* Page Group Response */
+   if (req->lpig || req->priv_data_present) {
+   /*
+* Per VT-d spec. v3.0 ch7.7, system software must
+* respond with page group response if private data
+* is present (PDP) or last page in group (LPIG) bit
+* is set. This is an additional VT-d feature beyond
+* PCI ATS spec.
+*/
resp.qw0 = QI_PGRP_PASID(req->pasid) |
-   QI_PGRP_DID((req->bus << 8) | req->devfn) |
+   QI_PGRP_DID(req->rid) |
QI_PGRP_PASID_P(req->pasid_present) |
+   QI_PGRP_PDP(req->pasid_present) |
+   QI_PGRP_RESP_CODE(result) |
QI_PGRP_RESP_TYPE;
resp.qw1 = QI_PGRP_IDX(req->prg_index) |
-   QI_PGRP_PRIV(req->private) |
-   QI_PGRP_RESP_CODE(result);
-   } else if (req->srr) {
-   /* Page Stream Response */
-   resp.qw0 = QI_PSTRM_IDX(req->prg_index) |
-   

[PATCH INTERNAL V3 2/3] drivers: pwm: pwm-bcm-kona: Add pwm-kona-v2 support

2019-01-10 Thread Sheetal Tigadoli
From: Praveen Kumar B 

Add support for the new version of pwm-kona.
Add support for waiting until PWM changes are configured and stable.

Signed-off-by: Praveen Kumar B 
Reviewed-by: Ray Jui 
Reviewed-by: Scott Branden 
Signed-off-by: Sheetal Tigadoli 
---
 drivers/pwm/pwm-bcm-kona.c | 128 ++---
 1 file changed, 98 insertions(+), 30 deletions(-)

diff --git a/drivers/pwm/pwm-bcm-kona.c b/drivers/pwm/pwm-bcm-kona.c
index 09a95ae..2b44ad8 100644
--- a/drivers/pwm/pwm-bcm-kona.c
+++ b/drivers/pwm/pwm-bcm-kona.c
@@ -45,30 +45,39 @@
  *high or low depending on its state at that exact instant.
  */
 
-#define PWM_CONTROL_OFFSET (0x)
+#define PWM_CONTROL_OFFSET 0x
 #define PWM_CONTROL_SMOOTH_SHIFT(chan) (24 + (chan))
 #define PWM_CONTROL_TYPE_SHIFT(chan)   (16 + (chan))
 #define PWM_CONTROL_POLARITY_SHIFT(chan)   (8 + (chan))
 #define PWM_CONTROL_TRIGGER_SHIFT(chan)(chan)
 
-#define PRESCALE_OFFSET(0x0004)
+#define PRESCALE_OFFSET0x0004
 #define PRESCALE_SHIFT(chan)   ((chan) << 2)
 #define PRESCALE_MASK(chan)(0x7 << PRESCALE_SHIFT(chan))
-#define PRESCALE_MIN   (0x)
-#define PRESCALE_MAX   (0x0007)
+#define PRESCALE_MIN   0x
+#define PRESCALE_MAX   0x0007
 
 #define PERIOD_COUNT_OFFSET(chan)  (0x0008 + ((chan) << 3))
-#define PERIOD_COUNT_MIN   (0x0002)
-#define PERIOD_COUNT_MAX   (0x00ff)
+#define PERIOD_COUNT_MIN   0x0002
+#define PERIOD_COUNT_MAX   0x00ff
 
 #define DUTY_CYCLE_HIGH_OFFSET(chan)   (0x000c + ((chan) << 3))
-#define DUTY_CYCLE_HIGH_MIN(0x)
-#define DUTY_CYCLE_HIGH_MAX(0x00ff)
+#define DUTY_CYCLE_HIGH_MIN0x
+#define DUTY_CYCLE_HIGH_MAX0x00ff
+
+#define PWM_MONITOR_OFFSET 0xb0
+#define PWM_MONITOR_TIMEOUT_US 5
+
+enum kona_pwmc_ver {
+   KONA_PWM_V1 = 1,
+   KONA_PWM_V2
+};
 
 struct kona_pwmc {
struct pwm_chip chip;
void __iomem *base;
struct clk *clk;
+   enum kona_pwmc_ver version;
 };
 
 static inline struct kona_pwmc *to_kona_pwmc(struct pwm_chip *_chip)
@@ -76,11 +85,40 @@ static inline struct kona_pwmc *to_kona_pwmc(struct 
pwm_chip *_chip)
return container_of(_chip, struct kona_pwmc, chip);
 }
 
+static int kona_pwmc_wait_stable(struct pwm_chip *chip, unsigned int chan,
+unsigned int kona_ver)
+{
+   struct kona_pwmc *kp = to_kona_pwmc(chip);
+   unsigned int value;
+   unsigned int count = PWM_MONITOR_TIMEOUT_US * 1000;
+
+   switch (kona_ver) {
+   case KONA_PWM_V1:
+   /*
+* There must be a min 400ns delay between clearing trigger and
+* settingit. Failing to do this may result in no PWM signal.
+*/
+   ndelay(400);
+   return 0;
+   case KONA_PWM_V2:
+   do {
+   value = readl(kp->base + PWM_MONITOR_OFFSET);
+   if (!(value & (BIT(chan))))
+   return 0;
+   ndelay(1);
+   } while (count--);
+
+   return -ETIMEDOUT;
+   default:
+   return -ENODEV;
+   }
+}
+
 /*
  * Clear trigger bit but set smooth bit to maintain old output.
  */
-static void kona_pwmc_prepare_for_settings(struct kona_pwmc *kp,
-   unsigned int chan)
+static int kona_pwmc_prepare_for_settings(struct kona_pwmc *kp,
+ unsigned int chan)
 {
unsigned int value = readl(kp->base + PWM_CONTROL_OFFSET);
 
@@ -88,14 +126,10 @@ static void kona_pwmc_prepare_for_settings(struct 
kona_pwmc *kp,
value &= ~(1 << PWM_CONTROL_TRIGGER_SHIFT(chan));
writel(value, kp->base + PWM_CONTROL_OFFSET);
 
-   /*
-* There must be a min 400ns delay between clearing trigger and setting
-* it. Failing to do this may result in no PWM signal.
-*/
-   ndelay(400);
+   return kona_pwmc_wait_stable(&kp->chip, chan, kp->version);
 }
 
-static void kona_pwmc_apply_settings(struct kona_pwmc *kp, unsigned int chan)
+static int kona_pwmc_apply_settings(struct kona_pwmc *kp, unsigned int chan)
 {
unsigned int value = readl(kp->base + PWM_CONTROL_OFFSET);
 
@@ -104,8 +138,7 @@ static void kona_pwmc_apply_settings(struct kona_pwmc *kp, 
unsigned int chan)
value |= 1 << PWM_CONTROL_TRIGGER_SHIFT(chan);
writel(value, kp->base + PWM_CONTROL_OFFSET);
 
-   /* Trigger bit must be held high for at least 400 ns. */
- 

[PATCH INTERNAL V3 1/3] dt-bindings: pwm: kona: Add new compatible for new version pwm-kona

2019-01-10 Thread Sheetal Tigadoli
From: Praveen Kumar B 

Add new compatible string for new version of pwm-kona

Signed-off-by: Praveen Kumar B 
Reviewed-by: Ray Jui 
Reviewed-by: Scott Branden 
Signed-off-by: Sheetal Tigadoli 
---
 Documentation/devicetree/bindings/pwm/brcm,kona-pwm.txt | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/Documentation/devicetree/bindings/pwm/brcm,kona-pwm.txt 
b/Documentation/devicetree/bindings/pwm/brcm,kona-pwm.txt
index 8eae9fe..d37f697 100644
--- a/Documentation/devicetree/bindings/pwm/brcm,kona-pwm.txt
+++ b/Documentation/devicetree/bindings/pwm/brcm,kona-pwm.txt
@@ -3,7 +3,7 @@ Broadcom Kona PWM controller device tree bindings
 This controller has 6 channels.
 
 Required Properties :
-- compatible: should contain "brcm,kona-pwm"
+- compatible: should contain "brcm,kona-pwm" or "brcm,kona-pwm-v2"
 - reg: physical base address and length of the controller's registers
 - clocks: phandle + clock specifier pair for the external clock
 - #pwm-cells: Should be 3. See pwm.txt in this directory for a
-- 
1.9.1



[PATCH INTERNAL V3 3/3] ARM: dts: cygnus: Change pwm compatible to new version

2019-01-10 Thread Sheetal Tigadoli
From: Praveen Kumar B 

Change pwm compatible to new version of pwm-kona
Add new compatible to check pwm configure status

Signed-off-by: Praveen Kumar B 
Reviewed-by: Ray Jui 
Reviewed-by: Scott Branden 
Signed-off-by: Sheetal Tigadoli 
---
 arch/arm/boot/dts/bcm-cygnus.dtsi | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/arch/arm/boot/dts/bcm-cygnus.dtsi 
b/arch/arm/boot/dts/bcm-cygnus.dtsi
index 253df71..2a433e7 100644
--- a/arch/arm/boot/dts/bcm-cygnus.dtsi
+++ b/arch/arm/boot/dts/bcm-cygnus.dtsi
@@ -595,7 +595,7 @@
};
 
pwm: pwm@180aa500 {
-   compatible = "brcm,kona-pwm";
+   compatible = "brcm,kona-pwm-v2";
reg = <0x180aa500 0xc4>;
#pwm-cells = <3>;
clocks = <_clks BCM_CYGNUS_ASIU_PWM_CLK>;
-- 
1.9.1



[PATCH INTERNAL V3 0/3] Add support for PWM Configure and stablize for PWM kona

2019-01-10 Thread Sheetal Tigadoli
Hi,
This patchset contains support for letting PWM changes configure
and stabilize.
In brief, the changes are:
a. Add support for the version 2 compatible string
b. Change the PWM configure and stabilize delay in PWM Kona
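
The version-dependent wait in item (b) can be sketched in user space as
below. This is only an illustration under stated assumptions, not the driver
code: wait_stable(), read_reg_fn, fake_monitor() and the poll budget are
hypothetical names, and the register read is simulated instead of using
readl() on the monitor register.

```c
#include <assert.h>
#include <errno.h>
#include <stdint.h>

/* Hypothetical stand-in for readl() on the PWM monitor register. */
typedef uint32_t (*read_reg_fn)(void *ctx);

enum pwm_ver { PWM_V1 = 1, PWM_V2 };

/*
 * Wait until the channel's bit in the monitor register clears.
 * Returns 0 on success, -ETIMEDOUT if the bit is still set after
 * max_polls reads, and -ENODEV for an unknown controller version.
 */
static int wait_stable(enum pwm_ver ver, read_reg_fn read_reg, void *ctx,
                       unsigned int chan, unsigned int max_polls)
{
    assert(chan < 32);

    switch (ver) {
    case PWM_V1:
        /* v1 has no monitor register; the driver uses a fixed delay. */
        return 0;
    case PWM_V2:
        while (max_polls--) {
            if (!(read_reg(ctx) & (UINT32_C(1) << chan)))
                return 0;
        }
        return -ETIMEDOUT;
    default:
        return -ENODEV;
    }
}

/* Simulated register: reports "busy" for *ctx more reads, then idle. */
static uint32_t fake_monitor(void *ctx)
{
    unsigned int *remaining = ctx;

    if (*remaining == 0)
        return 0;
    (*remaining)--;
    return UINT32_MAX;
}
```

With a counter of 3 busy reads and a budget of 10 polls,
wait_stable(PWM_V2, fake_monitor, &busy, 2, 10) returns 0; with more busy
reads than the budget allows it returns -ETIMEDOUT. Returning an error
instead of waiting forever is what lets the callers in the patch change
from void to int.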

Praveen Kumar B (3):
  dt-bindings: pwm: kona: Add new compatible for new version pwm-kona
  drivers: pwm: pwm-bcm-kona: Add pwm-kona-v2 support
  ARM: dts: cygnus: Change pwm compatible to new version

 .../devicetree/bindings/pwm/brcm,kona-pwm.txt  |   2 +-
 arch/arm/boot/dts/bcm-cygnus.dtsi  |   2 +-
 drivers/pwm/pwm-bcm-kona.c | 128 -
 3 files changed, 100 insertions(+), 32 deletions(-)

-- 
1.9.1



Re: x86/fpu: Don't export __kernel_fpu_{begin,end}()

2019-01-10 Thread Lukas Wunner
On Thu, Jan 10, 2019 at 07:24:13PM +0100, Greg Kroah-Hartman wrote:
> My tolerance for ZFS is pretty non-existent.  Sun explicitly did not
> want their code to work on Linux, so why would we do extra work to get
> their code to work properly?

ZoL facilitates seamless r/w cross-mounting with macOS, something no
other filesystem allows, and that feature is critical for me to work
on Linux drivers for Mac hardware.  Please don't make life harder than
necessary for developers like me.  Your "extra work" argument seems
disingenuous to me, Sebastian's patch is causing extra work for
ZFS developers, not the kernel community.  The maintenance burden
for the kernel community to retain the export is zero.

I respectfully ask for 12209993e98c to be reverted, or alternatively
amended to keep ZoL working.

Thanks,

Lukas


Re: PROBLEM: syzkaller found / pool corruption-overwrite / page in user-area or NULL

2019-01-10 Thread Esme
‐‐‐ Original Message ‐‐‐
On Thursday, January 10, 2019 11:52 PM, Qian Cai  wrote:

> On 1/10/19 10:15 PM, Esme wrote:
>
> > > > [ 75.793150] RIP: 0010:rb_insert_color+0x189/0x1480
> > >
> > > What's in that line? Try,
> > > $ ./scripts/faddr2line vmlinux rb_insert_color+0x189/0x1480
> >
> > rb_insert_color+0x189/0x1480:
> > __rb_insert at /home/files/git/linux/lib/rbtree.c:131
> > (inlined by) rb_insert_color at /home/files/git/linux/lib/rbtree.c:452
>
> gparent = rb_red_parent(parent);
>
> tmp = gparent->rb_right; <-- GPF triggered here.
>
> It suggests gparent is NULL. Looks like it misses a check there because parent
> is the top node.
>
> > > What's steps to reproduce this?
> >
> > The steps is the kernel config provided (proc.config) and I double checked 
> > the attached C code from the qemu image (attached here). If the kernel does 
> > not immediately crash, a ^C will cause the fault to be noticed. The report 
> > from earlier is the report from the same code, my assumption was that the 
> > possible pool/redzone corruption is making it a bit tricky to pin down.
> > If you would like alternative kernel settings please let me know, I can do 
> > that, also, my current test-bench has about 256 core's on x64, 64 of them 
> > are bare metal and 32 are arm64. Any possible preferred configuration 
> > tweaks I'm all ears, I'll be including some of these steps you suggested to 
> > me in any/additional upcoming threads (Thank you for that so far and future 
> > suggestions).
> > Also, there is some occasionally varying stacks depending on the 
> > corruption, so this stack just now (another execution of test3.c);
>
> I am unable to reproduce any of those here. What is the output of
> /proc/cmdline in your guest when this happens?

console=ttyS0 root=/dev/sda debug earlyprintk=serial slub_debug=QUZ



Re: [PATCH] mm/mincore: allow for making sys_mincore() privileged

2019-01-10 Thread Dominique Martinet
Linus Torvalds wrote on Thu, Jan 10, 2019:
> On Thu, Jan 10, 2019 at 4:25 AM Dominique Martinet
>  wrote:
> > Linus Torvalds wrote on Thu, Jan 10, 2019:
> > > (Except, of course, if somebody actually notices outside of tests.
> > > Which may well happen and just force us to revert that commit. But
> > > that's a separate issue entirely).
> >
> > Both Dave and I pointed at a couple of utilities that break with
> > this. nocache can arguably work with the new behaviour but will behave
> > differently; vmtouch on the other hand is no longer able to display
> > what's in cache or not - people use that for example to "warm up" a
> > container in page cache based on how it appears after it had been
> > running for a while is a pretty valid usecase to me.
> 
> So honestly, the main reason I'm loath to revert is that yes, we know
> of theoretical differences, but they seem to all be
> performance-related.

I don't see what other use mincore could have, yes - even the
"debugging" use I gave is performance investigations and not hard
problems (and I probably would go straight to perf nowadays, you'd get
the info that the program doesn't use cache from the call graphs)

> It would be really good to hear numbers. Is the warm-up optimization
> something that changes things from 3ms to 3.5ms? Or does it change
> things from 3ms to half a second?

This depends heavily on the workload and storage hardware, so it is hard
to give an absolute value.

Trying with some big server, fast SSD, mysql and doing:
 # echo 3 > /proc/sys/vm/drop_caches
 # (optional) prefetch table and innodb files
 # systemctl restart mariadb
 # time mysql -q db "select * from mytable where id in $ENTRIES" > /dev/null
 # time mysql -q db "select * from mytable where id in $ENTRIES2" > /dev/null
 # time mysql -q db "select * from mytable where id in $ENTRIES3" > /dev/null
(where ENTRIES* are lists of 1000 id, and id is indexed; the table is 8GB
for 62590661 entries so 1000 entries is approx 128KB of data out of that
file)

I get on average over a few queries approximately a real time of 350ms,
230ms and 220ms immediately after drop cache and service restart, and
150ms, 60ms and 60ms after a prefetch (hand-wavy average over 3 runs, I
didn't have the patience to do proper testing).
(In both cases, user/sys are less than 10ms; I don't see much difference
there)

If I restart the service without dropping caches and redo the query I
get 60ms from the first query onwards so I must not be preloading
everything properly, some real script that would look all over a
container to properly restore the page cache would do better than me
blindly preloading a few files.

Either way, we're talking about a factor of 2-3 until the application has
looked at most of the entries, and I didn't try to see what that
would look like on spinning disks or the kind of slow storage one would
get on a VPS somewhere in the cloud - I'm sure someone with time to waste
could get much more impressive figures, but this already looks pretty
worthwhile to me.
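
For context, the residency check that tools such as vmtouch rely on can be
sketched with mincore(2). This is an illustrative sketch only, not code from
vmtouch or nocache: the helper names resident_pages() and touch_and_count()
are made up here, and it uses an anonymous private mapping rather than a
file so that it stays self-contained.

```c
#ifndef _DEFAULT_SOURCE
#define _DEFAULT_SOURCE
#endif
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>
#include <string.h>
#include <sys/mman.h>
#include <unistd.h>

/*
 * Count how many pages of [addr, addr + len) are resident in memory.
 * addr must be page-aligned, as mincore(2) requires.
 * Returns the count, or -1 on error.
 */
static long resident_pages(void *addr, size_t len)
{
    size_t page = (size_t)sysconf(_SC_PAGESIZE);
    size_t npages = (len + page - 1) / page;
    unsigned char *vec = malloc(npages);
    long count = 0;

    if (!vec)
        return -1;
    if (mincore(addr, len, vec) != 0) {
        free(vec);
        return -1;
    }
    for (size_t i = 0; i < npages; i++)
        count += vec[i] & 1;    /* bit 0 set: page is resident */
    free(vec);
    return count;
}

/* Map npages of anonymous memory, touch each page, report residency. */
static long touch_and_count(size_t npages)
{
    size_t len = npages * (size_t)sysconf(_SC_PAGESIZE);
    long resident;
    void *buf;

    assert(npages > 0);
    buf = mmap(NULL, len, PROT_READ | PROT_WRITE,
               MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);
    if (buf == MAP_FAILED)
        return -1;
    memset(buf, 0xab, len);     /* fault every page in */
    resident = resident_pages(buf, len);
    munmap(buf, len);
    return resident;
}
```

A cache-warming tool would call something like resident_pages() on a
file-backed mapping, record which pages were resident after the workload
had run for a while, and read those pages back in after a restart.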

-- 
Dominique Martinet | Asmadeus


Re: PROBLEM: syzkaller found / pool corruption-overwrite / page in user-area or NULL

2019-01-10 Thread Qian Cai



On 1/10/19 10:15 PM, Esme wrote:
>>> [ 75.793150] RIP: 0010:rb_insert_color+0x189/0x1480
>>
>> What's in that line? Try,
>>
>> $ ./scripts/faddr2line vmlinux rb_insert_color+0x189/0x1480
> 
> rb_insert_color+0x189/0x1480:
> __rb_insert at /home/files/git/linux/lib/rbtree.c:131
> (inlined by) rb_insert_color at /home/files/git/linux/lib/rbtree.c:452
> 

gparent = rb_red_parent(parent);

tmp = gparent->rb_right; <-- GPF triggered here.

It suggests gparent is NULL. Looks like it misses a check there because parent
is the top node.

>>
>> What's steps to reproduce this?
> 
> The steps is the kernel config provided (proc.config) and I double checked 
> the attached C code from the qemu image (attached here).  If the kernel does 
> not immediately crash, a ^C will cause the fault to be noticed.  The report 
> from earlier is the report from the same code, my assumption was that the 
> possible pool/redzone corruption is making it a bit tricky to pin down.
> 
> If you would like alternative kernel settings please let me know, I can do 
> that, also, my current test-bench has about 256 core's on x64, 64 of them are 
> bare metal and 32 are arm64.  Any possible preferred configuration tweaks I'm 
> all ears, I'll be including some of these steps you suggested to me in 
> any/additional upcoming threads (Thank you for that so far and future 
> suggestions).
> 
> Also, there is some occasionally varying stacks depending on the corruption, 
> so this stack just now (another execution of test3.c);

I am unable to reproduce any of those here. What is the output of
/proc/cmdline in your guest when this happens?


Re: [PATCH v11 03/15] tracing: Split up onmatch action data

2019-01-10 Thread Namhyung Kim
Hi Tom,

On Wed, Jan 09, 2019 at 01:49:10PM -0600, Tom Zanussi wrote:
> From: Tom Zanussi 
> 
> Currently, the onmatch action data binds the onmatch action to data
> related to synthetic event generation.  Since we want to allow the
> onmatch handler to potentially invoke a different action, and because
> we expect other handlers to generate synthetic events, we need to
> separate the data related to these two functions.
> 
> Also rename the onmatch data to something more descriptive, and create
> and use common action data destroy function.
> 
> Signed-off-by: Tom Zanussi 
> ---
[SNIP]
>  
> -static void onmax_destroy(struct action_data *data)
> +static void action_data_destroy(struct action_data *data)
>  {
>   unsigned int i;
>  
> - destroy_hist_field(data->onmax.max_var, 0);
> - destroy_hist_field(data->onmax.var, 0);
> + lockdep_assert_held(_mutex);
>  
> - kfree(data->onmax.var_str);
>   kfree(data->action_name);
>  
>   for (i = 0; i < data->n_params; i++)
>   kfree(data->params[i]);
>  
> + if (data->synth_event)
> + data->synth_event->ref--;
> +

I was wondering about the missing synth_event_mutex used to guard the
refcount.  Then I noticed that I totally missed Masami's dynamic event
work which removed it.  Nice job..

Thanks,
Namhyung


>   kfree(data);
>  }
>  
> +static void onmax_destroy(struct action_data *data)
> +{
> + destroy_hist_field(data->onmax.max_var, 0);
> + destroy_hist_field(data->onmax.var, 0);
> +
> + kfree(data->onmax.var_str);
> +
> + action_data_destroy(data);
> +}


Re: [PATCH] net/core/neighbour: tell kmemleak about hash tables

2019-01-10 Thread Konstantin Khlebnikov
On Thu, Jan 10, 2019 at 11:45 PM Cong Wang  wrote:
>
> On Tue, Jan 8, 2019 at 1:30 AM Konstantin Khlebnikov
>  wrote:
> > @@ -443,12 +444,14 @@ static struct neigh_hash_table 
> > *neigh_hash_alloc(unsigned int shift)
> > ret = kmalloc(sizeof(*ret), GFP_ATOMIC);
> > if (!ret)
> > return NULL;
> > -   if (size <= PAGE_SIZE)
> > +   if (size <= PAGE_SIZE) {
> > buckets = kzalloc(size, GFP_ATOMIC);
> > -   else
> > +   } else {
> > buckets = (struct neighbour __rcu **)
> >   __get_free_pages(GFP_ATOMIC | __GFP_ZERO,
> >get_order(size));
> > +   kmemleak_alloc(buckets, size, 0, GFP_ATOMIC);
>
> Why min_count is 0 rather than 1 here?

The API isn't clear and I misread the description.
So it should be 1 to report a leak of the hash table itself.
But 0 doesn't add any new issues.


Re: [PATCH 3/3] bitops.h: set_mask_bits() to return old value

2019-01-10 Thread Anthony Yznaga



On 1/10/19 4:26 PM, Vineet Gupta wrote:
> | > Also, set_mask_bits is used in fs quite a bit and we can possibly come up
> | > with a generic llsc based implementation (w/o the cmpxchg loop)
> |
> | May I also suggest changing the return value of set_mask_bits() to old.
> |
> | You can compute the new value given old, but you cannot compute the old
> | value given new, therefore old is the better return value. Also, no
> | current user seems to use the return value, so changing it is without
> | risk.
>
> Link: 
> http://lkml.kernel.org/g/20150807110955.gh16...@twins.programming.kicks-ass.net
> Suggested-by: Peter Zijlstra 
> Cc: Miklos Szeredi 
> Cc: Ingo Molnar 
> Cc: Jani Nikula 
> Cc: Chris Wilson 
> Cc: Andrew Morton 
> Cc: Will Deacon 
> Signed-off-by: Vineet Gupta 
>

Reviewed-by: Anthony Yznaga 
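
The argument quoted above (new can be computed from old, but not old from
new) can be sketched with C11 atomics. This is a user-space illustration
under stated assumptions, not the kernel macro: set_mask_bits_old() is a
hypothetical name, and atomic_compare_exchange_weak() stands in for the
kernel's cmpxchg() loop.

```c
#include <assert.h>
#include <stdatomic.h>

/*
 * Atomically clear the bits in 'mask', set the bits in 'bits',
 * and return the value that was stored before the update.
 */
static unsigned long set_mask_bits_old(_Atomic unsigned long *ptr,
                                       unsigned long mask,
                                       unsigned long bits)
{
    unsigned long old = atomic_load(ptr);
    unsigned long newval;

    do {
        newval = (old & ~mask) | bits;
        /* On failure 'old' is refreshed with the current value; retry. */
    } while (!atomic_compare_exchange_weak(ptr, &old, newval));

    return old;
}
```

A caller that needs the new value can recompute it as
(old & ~mask) | bits, while a caller that only received the new value
cannot tell which of the masked bits were set beforehand.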


Re: [PATCH 1/3] coredump: Replace opencoded set_mask_bits()

2019-01-10 Thread Anthony Yznaga



On 1/10/19 4:26 PM, Vineet Gupta wrote:
> Cc: Alexander Viro 
> Cc: Peter Zijlstra (Intel) 
> Cc: linux-fsde...@vger.kernel.org
> Cc: linux-kernel@vger.kernel.org
> Link: http://lkml.kernel.org/g/20150807115710.ga16...@redhat.com
> Acked-by: Oleg Nesterov 
> Signed-off-by: Vineet Gupta 

Reviewed-by: Anthony Yznaga 



Re: [PATCH 2/3] fs: inode_set_flags() replace opencoded set_mask_bits()

2019-01-10 Thread Anthony Yznaga



On 1/10/19 4:26 PM, Vineet Gupta wrote:
> It seems that 5f16f3225b0624 and 00a1a053ebe5, both with same commitlog
> ("ext4: atomically set inode->i_flags in ext4_set_inode_flags()")
> introduced the set_mask_bits API, but somehow missed not using it in
> ext4 in the end
>
> Also, set_mask_bits is used in fs quite a bit and we can possibly come up
> with a generic llsc based implementation (w/o the cmpxchg loop)
>
> Cc: Alexander Viro 
> Cc: Theodore Ts'o 
> Cc: Peter Zijlstra (Intel) 
> Cc: linux-fsde...@vger.kernel.org
> Cc: linux-kernel@vger.kernel.org
> Signed-off-by: Vineet Gupta 
>

Reviewed-by: Anthony Yznaga 


Re: [PATCH] vhost/vsock: fix vhost vsock cid hashing inconsistent

2019-01-10 Thread Jason Wang



On 2019/1/8 下午4:07, Zha Bin wrote:

The vsock core only supports 32-bit CIDs, but the Virtio-vsock spec defines
the CID (dst_cid and src_cid) as u64, with the upper 32 bits reserved as
zero. This inconsistency causes a bug in the vhost vsock driver. The
scenario is:

   1. A hash table (vhost_vsock_hash) is used to map a CID to a vsock
   object, and hash_min() is used to compute the hash key. hash_min() is
   defined as:
   (sizeof(val) <= 4 ? hash_32(val, bits) : hash_long(val, bits)).
   That means the hash algorithm depends on the size of the macro
   argument 'val'.
   2. In vhost_vsock_set_cid(), a 64-bit CID is passed to
   hash_min() to compute the hash key when inserting a vsock object into
   the hash table.
   3. In vhost_vsock_get(), a 32-bit CID is passed to hash_min()
   to compute the hash key when looking up the vsock for a CID.

Because the CIDs differ in size, hash_min() can return different hash
keys, so the lookup of the vsock object for a CID fails.

To fix this bug, we keep the CID as u64 in the IOCTLs and virtio message
headers, but explicitly convert u64 to u32 when dealing with the hash table
and the vsock core.

Fixes: 834e772c8db0 ("vhost/vsock: fix use-after-free in network stack callers")
Link: https://github.com/stefanha/virtio/blob/vsock/trunk/content.tex
Signed-off-by: Zha Bin 
Reviewed-by: Liu Jiang 
---
  drivers/vhost/vsock.c | 2 +-
  1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/drivers/vhost/vsock.c b/drivers/vhost/vsock.c
index bc42d38ae031..3fbc068eaa9b 100644
--- a/drivers/vhost/vsock.c
+++ b/drivers/vhost/vsock.c
@@ -642,7 +642,7 @@ static int vhost_vsock_set_cid(struct vhost_vsock *vsock, 
u64 guest_cid)
hash_del_rcu(&vsock->hash);
  
  	vsock->guest_cid = guest_cid;

-   hash_add_rcu(vhost_vsock_hash, &vsock->hash, guest_cid);
+   hash_add_rcu(vhost_vsock_hash, &vsock->hash, vsock->guest_cid);
mutex_unlock(_vsock_mutex);
  
  	return 0;



Acked-by: Jason Wang 
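
The size dependency of hash_min() described in the commit message can be
reproduced in user space. The sketch below is a simplified
re-implementation, not the kernel headers: the multiplier constants are
assumed to match include/linux/hash.h, hash_long() is taken to be the
64-bit variant (as on a 64-bit kernel), and the 8-bit table size is an
assumption made for the example.

```c
#include <assert.h>
#include <stdint.h>

/* Multipliers assumed to match the kernel's include/linux/hash.h. */
#define GOLDEN_RATIO_32 UINT32_C(0x61C88647)
#define GOLDEN_RATIO_64 UINT64_C(0x61C8864680B583EB)

/* Keep the top 'bits' bits of the 32-bit product. */
static uint32_t hash_32(uint32_t val, unsigned int bits)
{
    return (uint32_t)(val * GOLDEN_RATIO_32) >> (32 - bits);
}

/* Keep the top 'bits' bits of the 64-bit product. */
static uint32_t hash_64(uint64_t val, unsigned int bits)
{
    return (uint32_t)((val * GOLDEN_RATIO_64) >> (64 - bits));
}

/* Picks the hash function from the static type of 'val'. */
#define hash_min(val, bits) \
    (sizeof(val) <= 4 ? hash_32((uint32_t)(val), (bits)) \
                      : hash_64((uint64_t)(val), (bits)))
```

For the same numeric CID 0x80000000, hash_min() on a uint32_t variable
yields bucket 0x80 while hash_min() on a uint64_t variable yields bucket
0x40 in this sketch, so an insert keyed by the u64 and a lookup keyed by
the u32 search different buckets. Not every value differs between the two
variants, which may be why the problem is easy to miss with small CIDs.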




Re: [RFC PATCH kernel] powerpc/stack_protector: Fix external modules building

2019-01-10 Thread Alexey Kardashevskiy



On 11/01/2019 14:08, Masahiro Yamada wrote:
> On Thu, Jan 10, 2019 at 2:44 PM Alexey Kardashevskiy  wrote:
>>
>> c3ff2a519 "powerpc/32: add stack protector support" adds stack protector
>> support so now powerpc's "prepare" target depends on prepare0 (via
>> stack_protector_prepare target).
>>
>> It works fine until we try to build an external module, where it fails with:
>> Run: 'make -j128 SYSSRC=/home/aik/p/kernel 
>> SYSOUT=/home/aik/pbuild/kernel-le-pseries/ ARCH=powerpc'
>> make[1]: Entering directory '/home/aik/p/kernel'
>> make[2]: Entering directory '/home/aik/pbuild/kernel-le-pseries'
>> make[2]: *** No rule to make target 'prepare0', needed by 
>> 'stack_protector_prepare'.  Stop.
>>
>> The reason for that is that the main Linux Makefile defines "prepare0"
>> only if KBUILD_EXTMOD=="".
>>
>> This hacks powerpc's Makefile to make external modules build again.
>>
>> Fixes: c3ff2a519 "powerpc/32: add stack protector support"
>> Signed-off-by: Alexey Kardashevskiy 
>> ---
>>
>>
>> It has been suggested that there is a better way of fixing this hence RFC.
>>
>>
>> ---
>>  arch/powerpc/Makefile | 4 
>>  1 file changed, 4 insertions(+)
>>
>> diff --git a/arch/powerpc/Makefile b/arch/powerpc/Makefile
>> index 488c9ed..0492f62 100644
>> --- a/arch/powerpc/Makefile
>> +++ b/arch/powerpc/Makefile
>> @@ -419,7 +419,11 @@ archheaders:
>>  ifdef CONFIG_STACKPROTECTOR
>>  prepare: stack_protector_prepare
>>
>> +ifeq ($(KBUILD_EXTMOD),)
>>  stack_protector_prepare: prepare0
>> +else
>> +stack_protector_prepare:
>> +endif
> 
> 
> Honestly, I think this is ugly.
> 
> Do you want me to send an alternative solution?

YES! Thanks :)

> 
> 
> 
> 
>>  ifdef CONFIG_PPC64
>> $(eval KBUILD_CFLAGS += -mstack-protector-guard-offset=$(shell awk 
>> '{if ($$2 == "PACA_CANARY") print $$3;}' include/generated/asm-offsets.h))
>>  else
>> --
>> 2.17.1
>>
> 
> 

-- 
Alexey


Re: [PATCH] mm/mincore: allow for making sys_mincore() privileged

2019-01-10 Thread Andy Lutomirski



> On Jan 10, 2019, at 8:04 PM, Dave Chinner  wrote:
> 
>> On Thu, Jan 10, 2019 at 06:18:16PM -0800, Linus Torvalds wrote:
>>> On Thu, Jan 10, 2019 at 6:03 PM Dave Chinner  wrote:
>>> 
>>>> On Thu, Jan 10, 2019 at 02:11:01PM -0800, Linus Torvalds wrote:
>>>> And we *can* do sane things about RWF_NOWAIT. For example, we could
>>>> start async IO on RWF_NOWAIT, and suddenly it would go from "probe the
>>>> page cache" to "probe and fill", and be much harder to use as an
>>>> attack vector..
>>> 
>>> We can only do that if the application submits the read via AIO and
>>> has an async IO completion reporting mechanism.
>> 
>> Oh, no, you misunderstand.
>> 
>> RWF_NOWAIT has a lot of situations where it will potentially return
>> early (the DAX and direct IO ones have their own), but I was thinking
>> of the one in generic_file_buffered_read(), which triggers when you
>> don't find a page mapping. That looks like the obvious "probe page
>> cache" case.
>> 
>> But we could literally move that test down just a few lines. Let it
>> start read-ahead.
>> 
>> .. and then it will actually trigger on the *second* case instead, where we 
>> have
>> 
>>if (!PageUptodate(page)) {
>>if (iocb->ki_flags & IOCB_NOWAIT) {
>>put_page(page);
>>goto would_block;
>>}
>> 
>> and that's where RWF_NOWAIT would act.
>> 
>> It would still return EAGAIN.
>> 
>> But it would have started filling the page cache. So now the act of
>> probing would fill the page cache, and the attacker would be left high
>> and dry - the fact that the page cache now exists is because of the
>> attack, not because of whatever it was trying to measure.
>> 
>> See?
> 
> Except for fadvise(POSIX_FADV_RANDOM) which triggers this code in
> page_cache_sync_readahead():
> 
>/* be dumb */
>if (filp && (filp->f_mode & FMODE_RANDOM)) {
>force_page_cache_readahead(mapping, filp, offset, req_size);
>return;
>}
> 
> So it will only read the single page we tried to access and won't
> perturb the rest of the message encoded into subsequent pages in
> file.
> 

There are two types of attacks.  One is an intentional side channel where two 
cooperating processes communicate. This is, under some circumstances, a 
problem, but it’s not one we’re about to solve in general. The other is an 
attacker monitoring an unwilling process. I think we care a lot more about 
that, and Linus’ idea will help.

Re: [PATCH] mm/mincore: allow for making sys_mincore() privileged

2019-01-10 Thread Dave Chinner
On Thu, Jan 10, 2019 at 06:18:16PM -0800, Linus Torvalds wrote:
> On Thu, Jan 10, 2019 at 6:03 PM Dave Chinner  wrote:
> >
> > On Thu, Jan 10, 2019 at 02:11:01PM -0800, Linus Torvalds wrote:
> > > And we *can* do sane things about RWF_NOWAIT. For example, we could
> > > start async IO on RWF_NOWAIT, and suddenly it would go from "probe the
> > > page cache" to "probe and fill", and be much harder to use as an
> > > attack vector..
> >
> > We can only do that if the application submits the read via AIO and
> > has an async IO completion reporting mechanism.
> 
> Oh, no, you misunderstand.
> 
> RWF_NOWAIT has a lot of situations where it will potentially return
> early (the DAX and direct IO ones have their own), but I was thinking
> of the one in generic_file_buffered_read(), which triggers when you
> don't find a page mapping. That looks like the obvious "probe page
> cache" case.
> 
> But we could literally move that test down just a few lines. Let it
> start read-ahead.
> 
> .. and then it will actually trigger on the *second* case instead, where we 
> have
> 
> if (!PageUptodate(page)) {
> if (iocb->ki_flags & IOCB_NOWAIT) {
> put_page(page);
> goto would_block;
> }
> 
> and that's where RWF_NOWAIT would act.
> 
> It would still return EAGAIN.
> 
> But it would have started filling the page cache. So now the act of
> probing would fill the page cache, and the attacker would be left high
> and dry - the fact that the page cache now exists is because of the
> attack, not because of whatever it was trying to measure.
> 
> See?

Except for fadvise(POSIX_FADV_RANDOM) which triggers this code in
page_cache_sync_readahead():

/* be dumb */
if (filp && (filp->f_mode & FMODE_RANDOM)) {
force_page_cache_readahead(mapping, filp, offset, req_size);
return;
}

So it will only read the single page we tried to access and won't
perturb the rest of the message encoded into subsequent pages in
file.

Cheers,

Dave.
-- 
Dave Chinner
da...@fromorbit.com


[PATCH net V3] vhost: log dirty page correctly

2019-01-10 Thread Jason Wang
Vhost dirty page logging API is designed to sync through GPA. But we
try to log GIOVA when device IOTLB is enabled. This is wrong and may
lead to missing data after migration.

To solve this issue, when logging with device IOTLB enabled, we will:

1) reuse the device IOTLB translation result of the GIOVA->HVA mapping to
   get the HVA: for a writable descriptor, get the HVA through the iovec;
   for a used ring update, translate its GIOVA to an HVA
2) traverse the GPA->HVA mapping to get the possible GPA and log
   through GPA. Note that this reverse mapping is not guaranteed
   to be unique, so we should log each possible GPA in this case.

This fixes the failure of scp to the guest during migration. In -next, we
will probably support passing GIOVA->GPA instead of GIOVA->HVA.

Fixes: 6b1e6cc7855b ("vhost: new device IOTLB API")
Reported-by: Jintack Lim 
Cc: Jintack Lim 
Signed-off-by: Jason Wang 
---
Changes from V2:
- check and log the case of range overlap
- remove unnecessary u64 cast
- use smp_wmb() for the case of device IOTLB as well
Changes from V1:
- return error instead of warn
---
 drivers/vhost/net.c   |  3 +-
 drivers/vhost/vhost.c | 88 ---
 drivers/vhost/vhost.h |  3 +-
 3 files changed, 78 insertions(+), 16 deletions(-)

diff --git a/drivers/vhost/net.c b/drivers/vhost/net.c
index 36f3d0f49e60..bca86bf7189f 100644
--- a/drivers/vhost/net.c
+++ b/drivers/vhost/net.c
@@ -1236,7 +1236,8 @@ static void handle_rx(struct vhost_net *net)
if (nvq->done_idx > VHOST_NET_BATCH)
vhost_net_signal_used(nvq);
if (unlikely(vq_log))
-   vhost_log_write(vq, vq_log, log, vhost_len);
+   vhost_log_write(vq, vq_log, log, vhost_len,
+   vq->iov, in);
total_len += vhost_len;
if (unlikely(vhost_exceeds_weight(++recv_pkts, total_len))) {
vhost_poll_queue(&vq->poll);
diff --git a/drivers/vhost/vhost.c b/drivers/vhost/vhost.c
index 9f7942cbcbb2..55a2e8f9f8ca 100644
--- a/drivers/vhost/vhost.c
+++ b/drivers/vhost/vhost.c
@@ -1733,13 +1733,78 @@ static int log_write(void __user *log_base,
return r;
 }
 
+static int log_write_hva(struct vhost_virtqueue *vq, u64 hva, u64 len)
+{
+   struct vhost_umem *umem = vq->umem;
+   struct vhost_umem_node *u;
+   u64 start, end;
+   int r;
+   bool hit = false;
+
+   /* More than one GPAs can be mapped into a single HVA. So
+* iterate all possible umems here to be safe.
+*/
+   list_for_each_entry(u, &umem->umem_list, link) {
+   if (u->userspace_addr > hva - 1 + len ||
+   u->userspace_addr - 1 + u->size < hva)
+   continue;
+   start = max(u->userspace_addr, hva);
+   end = min(u->userspace_addr - 1 + u->size, hva - 1 + len);
+   r = log_write(vq->log_base,
+ u->start + start - u->userspace_addr,
+ end - start + 1);
+   if (r < 0)
+   return r;
+   hit = true;
+   }
+
+   if (!hit)
+   return -EFAULT;
+
+   return 0;
+}
+
+static int log_used(struct vhost_virtqueue *vq, u64 used_offset, u64 len)
+{
+   struct iovec iov[64];
+   int i, ret;
+
+   if (!vq->iotlb)
+   return log_write(vq->log_base, vq->log_addr + used_offset, len);
+
+   ret = translate_desc(vq, (uintptr_t)vq->used + used_offset,
+len, iov, 64, VHOST_ACCESS_WO);
+   if (ret)
+   return ret;
+
+   for (i = 0; i < ret; i++) {
+   ret = log_write_hva(vq, (uintptr_t)iov[i].iov_base,
+   iov[i].iov_len);
+   if (ret)
+   return ret;
+   }
+
+   return 0;
+}
+
 int vhost_log_write(struct vhost_virtqueue *vq, struct vhost_log *log,
-   unsigned int log_num, u64 len)
+   unsigned int log_num, u64 len, struct iovec *iov, int count)
 {
int i, r;
 
/* Make sure data written is seen before log. */
smp_wmb();
+
+   if (vq->iotlb) {
+   for (i = 0; i < count; i++) {
+   r = log_write_hva(vq, (uintptr_t)iov[i].iov_base,
+ iov[i].iov_len);
+   if (r < 0)
+   return r;
+   }
+   return 0;
+   }
+
for (i = 0; i < log_num; ++i) {
u64 l = min(log[i].len, len);
r = log_write(vq->log_base, log[i].addr, l);
@@ -1769,9 +1834,8 @@ static int vhost_update_used_flags(struct vhost_virtqueue 
*vq)
smp_wmb();
/* Log used flag write. */
used = &vq->used->flags;
-   log_write(vq->log_base, vq->log_addr +
- (used - (void 

Re: [PATCH net V2] vhost: log dirty page correctly

2019-01-10 Thread Jason Wang



On 2019/1/10 下午10:07, Michael S. Tsirkin wrote:

On Thu, Jan 10, 2019 at 08:37:17PM +0800, Jason Wang wrote:

On 2019/1/9 下午10:25, Michael S. Tsirkin wrote:

On Wed, Jan 09, 2019 at 03:29:47PM +0800, Jason Wang wrote:

Vhost dirty page logging API is designed to sync through GPA. But we
try to log GIOVA when device IOTLB is enabled. This is wrong and may
lead to missing data after migration.

To solve this issue, when logging with device IOTLB enabled, we will:

1) reuse the device IOTLB translation result of GIOVA->HVA mapping to
 get HVA, for writable descriptor, get HVA through iovec. For used
 ring update, translate its GIOVA to HVA
2) traverse the GPA->HVA mapping to get the possible GPA and log
 through GPA. Pay attention this reverse mapping is not guaranteed
 to be unique, so we should log each possible GPA in this case.

This fix the failure of scp to guest during migration. In -next, we
will probably support passing GIOVA->GPA instead of GIOVA->HVA.

Fixes: 6b1e6cc7855b ("vhost: new device IOTLB API")
Reported-by: Jintack Lim
Cc: Jintack Lim
Signed-off-by: Jason Wang
---
The patch is needed for stable.
Changes from V1:
- return error instead of warn
---
   drivers/vhost/net.c   |  3 +-
   drivers/vhost/vhost.c | 82 +++
   drivers/vhost/vhost.h |  3 +-
   3 files changed, 72 insertions(+), 16 deletions(-)

diff --git a/drivers/vhost/net.c b/drivers/vhost/net.c
index 36f3d0f49e60..bca86bf7189f 100644
--- a/drivers/vhost/net.c
+++ b/drivers/vhost/net.c
@@ -1236,7 +1236,8 @@ static void handle_rx(struct vhost_net *net)
if (nvq->done_idx > VHOST_NET_BATCH)
vhost_net_signal_used(nvq);
if (unlikely(vq_log))
-   vhost_log_write(vq, vq_log, log, vhost_len);
+   vhost_log_write(vq, vq_log, log, vhost_len,
+   vq->iov, in);
total_len += vhost_len;
if (unlikely(vhost_exceeds_weight(++recv_pkts, total_len))) {
vhost_poll_queue(&vq->poll);
diff --git a/drivers/vhost/vhost.c b/drivers/vhost/vhost.c
index 9f7942cbcbb2..ee095f08ffd4 100644
--- a/drivers/vhost/vhost.c
+++ b/drivers/vhost/vhost.c
@@ -1733,11 +1733,70 @@ static int log_write(void __user *log_base,
return r;
   }
+static int log_write_hva(struct vhost_virtqueue *vq, u64 hva, u64 len)
+{
+   struct vhost_umem *umem = vq->umem;
+   struct vhost_umem_node *u;
+   u64 gpa;
+   int r;
+   bool hit = false;
+
+   list_for_each_entry(u, &umem->umem_list, link) {
+   if (u->userspace_addr < hva &&
+   u->userspace_addr + u->size >=
+   hva + len) {

So this tries to see that the GPA range is completely within
the GVA region. Does this have to be the case?

You mean e.g a buffer that crosses the boundary of two memory regions?

Yes - where hva and gva could be contiguous.



Ok, let me add the overlap range logging in v3.






And if yes why not return 0 below instead of hit = true?

I think it's safe but not sure for the case like two GPAs can map to same
HVA?

Oh I see. Yes that's possible. Document the motivation?



Ok.

Thanks
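
The overlap handling discussed above (a buffer that crosses the boundary of
two memory regions, and one HVA reachable through several GPAs) can be
sketched in user space as follows. This is a hedged illustration, not the
vhost code: struct region, struct gpa_range and hva_overlap_to_gpa() are
hypothetical names, and the caller is expected to loop over every region
and log each hit.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* One guest memory region: 'size' bytes at guest-physical 'gpa',
 * mapped at host-virtual 'hva'. */
struct region {
    uint64_t gpa;
    uint64_t hva;
    uint64_t size;
};

struct gpa_range {
    uint64_t gpa;
    uint64_t len;
};

/*
 * Clamp the buffer [hva, hva + len) to region 'r' and translate the
 * overlapping part to a GPA range. Returns false if there is no overlap.
 * Inclusive end addresses are used so that a range ending at the top of
 * the address space does not overflow.
 */
static bool hva_overlap_to_gpa(const struct region *r, uint64_t hva,
                               uint64_t len, struct gpa_range *out)
{
    uint64_t region_last, buf_last, start, last;

    assert(len > 0 && r->size > 0);

    region_last = r->hva - 1 + r->size;
    buf_last = hva - 1 + len;

    if (r->hva > buf_last || region_last < hva)
        return false;

    start = r->hva > hva ? r->hva : hva;
    last = region_last < buf_last ? region_last : buf_last;

    out->gpa = r->gpa + (start - r->hva);
    out->len = last - start + 1;
    return true;
}
```

For a buffer at HVA 0x8800 of length 0x1000 and two regions mapped at HVA
0x7000 (size 0x2000) and HVA 0x9000 (size 0x1000), each region contributes
a 0x800-byte GPA range, so both halves of the buffer get logged rather than
only a region that contains it completely.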



Re: [PATCH v3] irqchip: gicv3-its: Use NUMA aware memory allocation for ITS tables

2019-01-10 Thread Ganapatrao Kulkarni
Hi Shameer,

Patch looks OK to me, please feel free to add,
Reviewed-by: Ganapatrao Kulkarni 

On Thu, Dec 13, 2018 at 5:25 PM Marc Zyngier  wrote:
>
> On 13/12/2018 10:59, Shameer Kolothum wrote:
> > From: Shanker Donthineni 
> >
> > The NUMA node information is visible to ITS driver but not being used
> > other than handling hardware errata. ITS/GICR hardware accesses to the
> > local NUMA node is usually quicker than the remote NUMA node. How slow
> > the remote NUMA accesses are depends on the implementation details.
> >
> > This patch allocates memory for ITS management tables and command
> > queue from the corresponding NUMA node using the appropriate NUMA
> > aware functions. This change improves the performance of the ITS
> > tables read latency on systems where it has more than one ITS block,
> > and with the slower inter node accesses.
> >
> > Apache Web server benchmarking using ab tool on a HiSilicon D06
> > board with multiple numa mem nodes shows Time per request and
> > Transfer rate improvements of ~3.6% with this patch.
> >
> > Signed-off-by: Shanker Donthineni 
> > Signed-off-by: Hanjun Guo 
> > Signed-off-by: Shameer Kolothum 
> > ---
> >
> > This is to revive the patch originally sent by Shanker[1] and
> > to back it up with a benchmark test. Any further testing of
> > this is most welcome.
> >
> > v2-->v3
> >  -Addressed comments to use page_address().
> >  -Added Benchmark results to commit log.
> >  -Removed T-by from Ganapatrao for now.
> >
> > v1-->v2
> >  -Edited commit text.
> >  -Added Ganapatrao's tested-by.
> >
> > Benchmark test details:
> > 
> > Test Setup:
> > -D06 with dimm on node 0(Sock#0) and 3 (Sock#1).
> > -ITS belongs to numa node 0.
> > -Filesystem mounted on a PCIe NVMe based disk.
> > -Apache server installed on D06.
> > -Running ab benchmark test in concurrency mode from a remote m/c
> >  connected to D06 via  hns3(PCIe) n/w port.
> >  "ab -k -c 750 -n 200 http://10.202.225.188/"
> >
> > Test results are avg. of 15 runs.
> >
> > For 4.20-rc1  Kernel,
> > 
> > Time per request(mean, concurrent)  = 0.02753[ms]
> > Transfer Rate = 416501[Kbytes/sec]
> >
> > For 4.20-rc1 +  this patch,
> > --
> > Time per request(mean, concurrent)  = 0.02653[ms]
> > Transfer Rate = 431954[Kbytes/sec]
> >
> > % improvement ~3.6%
> >
> > vmstat shows around 170K-200K interrupts per second.
> >
> > ~# vmstat 1 -w
> > procs  ----memory----  -system-
> >  r  b  swpd      free      in
> >  5  0     0  30166724  102794
> >  9  0     0  30141828  171148
> >  5  0     0  30150160  207185
> > 13  0     0  30145924  175691
> > 15  0     0  30140792  145250
> > 13  0     0  30135556  201879
> > 13  0     0  30134864  192391
> > 10  0     0  30133632  168880
> > 
> >
> > [1] https://patchwork.kernel.org/patch/989/
>
> The figures certainly look convincing. I'd need someone from Cavium to
> benchmark it on their hardware and come back with results so that we can
> make a decision on this.

Hi Marc,
My setup got altered during Lab migration from Cavium to Marvell office.
I don't think I will have the same setup anytime soon.

>
> Thanks,
>
> M.
> --
> Jazz is not dead. It just smells funny...

Thanks,
Ganapat
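
For reference, the "~3.6%" figure quoted above can be checked from the four benchmark numbers in the commit log. A small sketch, where `pct_change()` is a hypothetical helper written only for this check:

```c
/* Relative change from "before" to "after", in percent.
 * Negative means the value went down. */
double pct_change(double before, double after)
{
	return (after - before) / before * 100.0;
}
```

With the quoted figures, `pct_change(0.02753, 0.02653)` gives the change in time per request (a decrease of a little over 3.6%) and `pct_change(416501.0, 431954.0)` gives the change in transfer rate (an increase of about 3.7%).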


linux-next: Signed-off-by missing for commit in the net tree

2019-01-10 Thread Stephen Rothwell
Hi all,

Commit

  b19bce0335e2 ("net: ethernet: mediatek: fix warning in phy_start_aneg")

is missing a Signed-off-by from its author.

-- 
Cheers,
Stephen Rothwell




Re: [PATCH v1 7/7] arm64: dts: sdm845: wireup the thermal trip points to cpufreq

2019-01-10 Thread Viresh Kumar
On 10-01-19, 10:42, Matthias Kaehlcke wrote:
> Thanks for the pointer, there's always something new to learn!
> 
> Ok, so the policy CPU and hence the CPU registered as cooling
> device may vary. I understand that this requires to list all possible
> cooling devices,

I won't say that I changed DT because of a design issue with kernel,
rather the DT shall be complete by itself and that's why that change
was made.

And then we can have more things going on. For example with cpuidle
cooling, we can individually control each CPU (and force idle on that)
even if all CPUs are part of the same freq-domain. Each CPU shall
expose its capabilities.

> even though only one will be active at any given
> time. However I wonder if we could change this:

I won't say it that way. I see it as all the CPUs are active during a
cooling state, i.e. they are all participating.
 
> From 103703a46495ff210a521b5b6fbf32632053c64f Mon Sep 17 00:00:00 2001
> From: Matthias Kaehlcke 
> Date: Thu, 10 Jan 2019 09:48:38 -0800
> Subject: [PATCH] thermal: cpu_cooling: always use first CPU of a freq domain
>  as cooling device
> 
> For all CPUs of a frequency domain a single cooling device is
> registered, since the CPUs can't switch their frequencies
> independently from each other. The cpufreq policy CPU is used to
> represent cooling device of the frequency domain. Which CPU is the
> policy CPU may vary based on the order of initialization or CPU
> hotplug.
> 
> For device tree based platform the above implies that cooling maps
> must include a list of all possible cooling devices of a frequency
> domain, even though only one of them will exist at any given time.
> 
> For example:
> 
> cooling-maps {
>   map0 {
>   trip = <_alert0>;
>   cooling-device = < THERMAL_NO_LIMIT 4>,
>< THERMAL_NO_LIMIT 4>,
>< THERMAL_NO_LIMIT 4>,
>< THERMAL_NO_LIMIT 4>;
>   };
>   map1 {
>   trip = <_crit0>;
>   cooling-device = < THERMAL_NO_LIMIT THERMAL_NO_LIMIT>,
>< THERMAL_NO_LIMIT THERMAL_NO_LIMIT>,
>< THERMAL_NO_LIMIT THERMAL_NO_LIMIT>,
>< THERMAL_NO_LIMIT THERMAL_NO_LIMIT>;

This is the right thing to do hardware description wise, no matter
what the kernel does.

>   };
> };
> 
> This can be avoided by always using the first CPU of a frequency
> domain as cooling device. It may happen that the first CPU is offline
> when the cooling device is registered (e.g. CPU2 is initialized
> first in the above example), hence the nominal cooling device might
> be offline. This may seem odd, however it is not really different from
> the current behavior: when the policy CPU is taken offline the cooling
> device corresponding to it remains active, unless it is unregistered
> because all other CPUs of the frequency domain are offline too.
> 
> A single cooling device associated with a specific CPU of the frequency
> domain reduces redundant device tree clutter in CPU nodes and cooling
> maps.
> 
> Signed-off-by: Matthias Kaehlcke 
> ---
>  drivers/thermal/cpu_cooling.c | 7 ++++---
>  1 file changed, 4 insertions(+), 3 deletions(-)
> 
> diff --git a/drivers/thermal/cpu_cooling.c b/drivers/thermal/cpu_cooling.c
> index dfd23245f778a..bb5ea06f893a2 100644
> --- a/drivers/thermal/cpu_cooling.c
> +++ b/drivers/thermal/cpu_cooling.c
> @@ -758,13 +758,14 @@ EXPORT_SYMBOL_GPL(cpufreq_cooling_register);
>  struct thermal_cooling_device *
>  of_cpufreq_cooling_register(struct cpufreq_policy *policy)
>  {
> - struct device_node *np = of_get_cpu_node(policy->cpu, NULL);
> + unsigned int first_cpu = cpumask_first(policy->related_cpus);
> + struct device_node *np = of_get_cpu_node(first_cpu, NULL);
>   struct thermal_cooling_device *cdev = NULL;
>   u32 capacitance = 0;
>  
>   if (!np) {
>   pr_err("cpu_cooling: OF node not available for cpu%d\n",
> -policy->cpu);
> +first_cpu);
>   return NULL;
>   }
>  
> @@ -775,7 +776,7 @@ of_cpufreq_cooling_register(struct cpufreq_policy *policy)
>   cdev = __cpufreq_cooling_register(np, policy, capacitance);
>   if (IS_ERR(cdev)) {
>   pr_err("cpu_cooling: cpu%d is not running as cooling device: %ld\n",
> -policy->cpu, PTR_ERR(cdev));
> +first_cpu, PTR_ERR(cdev));
>   cdev = NULL;
>   }
>   }
> 
> 
> Would that make sense or is there something I'm overlooking?

I don't see any benefits of this to be honest. Even if we make this
change, the DT should remain in its current form.

-- 
viresh
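
To make the choice in the quoted patch concrete: the policy CPU can change with initialization order or hotplug, while the lowest-numbered CPU in the set of related CPUs is fixed. The following userspace sketch assumes a 64-bit mask as a stand-in for `related_cpus`; `first_cpu()` and `NO_CPU` are hypothetical names, not the kernel's cpumask API.

```c
#include <stdint.h>

#define NO_CPU 64u	/* returned when the mask is empty */

/* Return the lowest-numbered CPU set in mask, or NO_CPU if none is set. */
unsigned int first_cpu(uint64_t mask)
{
	unsigned int cpu;

	for (cpu = 0; cpu < 64; cpu++)
		if (mask & (UINT64_C(1) << cpu))
			return cpu;

	return NO_CPU;
}
```

For a frequency domain covering CPUs 4 to 7 (mask 0xF0), this always yields CPU 4, regardless of which of the four CPUs currently acts as the policy CPU.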


linux-next: Tree for Jan 11

2019-01-10 Thread Stephen Rothwell
Hi all,

Changes since 20190110:

New tree: cpufreq-arm

The vfs tree still had its build failure for which I applied a patch.

The drm-misc tree gained a conflict against Linus' tree.

The akpm-current tree gained conflicts against Linus' and the security
trees.

Non-merge commits (relative to Linus' tree): 1583
 1598 files changed, 47069 insertions(+), 24276 deletions(-)



I have created today's linux-next tree at
git://git.kernel.org/pub/scm/linux/kernel/git/next/linux-next.git
(patches at http://www.kernel.org/pub/linux/kernel/next/ ).  If you
are tracking the linux-next tree using git, you should not use "git pull"
to do so as that will try to merge the new linux-next release with the
old one.  You should use "git fetch" and checkout or reset to the new
master.

You can see which trees have been included by looking in the Next/Trees
file in the source.  There are also quilt-import.log and merge.log
files in the Next directory.  Between each merge, the tree was built
with a ppc64_defconfig for powerpc, an allmodconfig for x86_64, a
multi_v7_defconfig for arm and a native build of tools/perf. After
the final fixups (if any), I do an x86_64 modules_install followed by
builds for x86_64 allnoconfig, powerpc allnoconfig (32 and 64 bit),
ppc44x_defconfig, allyesconfig and pseries_le_defconfig and i386, sparc
and sparc64 defconfig. And finally, a simple boot test of the powerpc
pseries_le_defconfig kernel in qemu (with and without kvm enabled).

Below is a summary of the state of the merge.

I am currently merging 295 trees (counting Linus' and 69 trees of bug
fix patches pending for the current merge release).

Stats about the size of the tree over time can be seen at
http://neuling.org/linux-next-size.html .

Status of my local build tests will be at
http://kisskb.ellerman.id.au/linux-next .  If maintainers want to give
advice about cross compilers/configs that work, we are always open to add
more builds.

Thanks to Randy Dunlap for doing many randconfig builds.  And to Paul
Gortmaker for triage and bug fixes.

-- 
Cheers,
Stephen Rothwell

$ git checkout master
$ git reset --hard stable
Merging origin/master (1bdbe2274920 Merge tag 'vfio-v5.0-rc2' of git://github.com/awilliam/linux-vfio)
Merging fixes/master (bd88a29b15f8 Merge branch 'scsi-fallthru')
Merging kbuild-current/fixes (4064e47c8281 Merge tag 'csky-for-linus-5.0-rc1' of git://github.com/c-sky/csky-linux)
Merging arc-current/for-curr (5fac3149be6f ARC: adjust memblock_reserve of kernel memory)
Merging arm-current/fixes (c2a3831df6dc ARM: 8816/1: dma-mapping: fix potential uninitialized return)
Merging arm64-fixes/for-next/fixes (b89d82ef01b3 arm64: kpti: Avoid rewriting early page tables when KASLR is enabled)
Merging m68k-current/for-linus (bed1369f5190 m68k: Fix memblock-related crashes)
Merging powerpc-fixes/fixes (bfeffd155283 Linux 5.0-rc1)
Merging sparc/master (b71acb0e3721 Merge branch 'linus' of git://git.kernel.org/pub/scm/linux/kernel/git/herbert/crypto-2.6)
Merging fscrypt-current/for-stable (ae64f9bd1d36 Linux 4.15-rc2)
Merging net/master (001e465f09a1 bonding: update nest level on unlink)
Merging bpf/master (beaf3d1901f4 bpf: fix panic in stack_map_get_build_id() on i386 and arm32)
Merging ipsec/master (dd9ee3444014 vti4: Fix a ipip packet processing bug in 'IPCOMP' virtual tunnel)
Merging netfilter/master (a799aea0988e netfilter: nft_flow_offload: Fix reverse route lookup)
Merging ipvs/master (feb9f55c33e5 netfilter: nft_dynset: allow dynamic updates of non-anonymous set)
Merging wireless-drivers/master (bfeffd155283 Linux 5.0-rc1)
Merging mac80211/master (1d51b4b1d3f2 Merge tag 'm68k-for-v4.20-tag2' of git://git.kernel.org/pub/scm/linux/kernel/git/geert/linux-m68k)
Merging rdma-fixes/for-rc (917cb8a72a94 RDMA/cma: Add cm_id restrack resource based on kernel or user cm_id type)
Merging sound-current/for-linus (d1dd42110d27 ALSA: hda/realtek - Disable headset Mic VREF for headset mode of ALC225)
Merging sound-asoc-fixes/for-linus (d9c51542207a Merge branch 'asoc-5.0' into asoc-linus)
Merging regmap-fixes/for-linus (1cd824361eed Merge branch 'regmap-4.21' into regmap-5.0)
Merging regulator-fixes/for-linus (0ab66b3c326e regulator: max77620: Initialize values for DT properties)
Merging spi-fixes/for-linus (aa54c1c9d90e spi: fix initial SPI_SR value in spi-fsl-dspi)
Merging pci-current/for-linus (a3869d43c980 PCI: amlogic: Fix build failure due to missing gpio header)
Merging driver-core.current/driver-core-linus (735df0ff6ece Documentation: driver core: remove use of BUS_ATTR)
Merging tty.current/tty-linus (d3a28a53630e serial: lantiq: Do not swap register read/writes)
Merging usb.current/usb-linus (b9fcb0e6b705 usb: storage: Remove outdated URL from MAINTAINERS)
Merging usb-gadget-fixes/fixes (069caf5950df USB: omap_udc: fix rejection of out transfers when DMA is used)
Merging usb-serial-fixes/usb-linus (b81c2c33eab7

[PATCH v2] fsck.f2fs: check validity of nat journal

2019-01-10 Thread Chao Yu
As reported by Aravind:

I built f2fs tools from source (at tag v1.12.0) and was able to get this 
backtrace in gdb:

Program received signal SIGSEGV, Segmentation fault.
0x77f8eb54 in f2fs_set_bit (nr=1041170432,
addr=0x7f621010 ) at libf2fs.c:312
312         mask = 1 << (7 - (nr & 0x07));
(gdb) where
addr=0x7f621010 ) at libf2fs.c:312

> [ 5338.040024] nats:8781, sits:6
> [ 5338.040027] F2FS-fs (sda2): Failed to initialize F2FS segment manager
> [ 5338.128893] nats:8781, sits:6
> [ 5338.128895] F2FS-fs (sda2): Failed to initialize F2FS segment manager

The nat_count, nid and blkaddr values recorded in the journal may be
corrupted, so sanity-check them and skip loading invalid entries during
build_node_manager().

Reported-by: Aravind R S 
Signed-off-by: Chao Yu 
---
v2:
- truncate journal->n_nats if it exceeds max nat count in journal
- check blkaddr as well.
 fsck/mount.c | 16 ++++++++++++++++
 1 file changed, 16 insertions(+)

diff --git a/fsck/mount.c b/fsck/mount.c
index 3966525104a7..72f37a365a13 100644
--- a/fsck/mount.c
+++ b/fsck/mount.c
@@ -1066,11 +1066,27 @@ static int f2fs_init_nid_bitmap(struct f2fs_sb_info *sbi)
f2fs_set_bit(nid, nm_i->nid_bitmap);
}
 
+   if (nats_in_cursum(journal) > NAT_JOURNAL_ENTRIES) {
+   MSG(0, "\tError: f2fs_init_nid_bitmap truncate n_nats(%u) to "
+   "NAT_JOURNAL_ENTRIES(%lu)\n",
+   nats_in_cursum(journal), NAT_JOURNAL_ENTRIES);
+   journal->n_nats = cpu_to_le16(NAT_JOURNAL_ENTRIES);
+   }
+
for (i = 0; i < nats_in_cursum(journal); i++) {
block_t addr;
 
addr = le32_to_cpu(nat_in_journal(journal, i).block_addr);
+   if (!IS_VALID_BLK_ADDR(sbi, addr)) {
+   MSG(0, "\tError: f2fs_init_nid_bitmap: addr(%u) is invalid!!!\n", addr);
+   continue;
+   }
+
nid = le32_to_cpu(nid_in_journal(journal, i));
+   if (!IS_VALID_NID(sbi, nid)) {
+   MSG(0, "\tError: f2fs_init_nid_bitmap: nid(%u) is invalid!!!\n", nid);
+   continue;
+   }
if (addr != NULL_ADDR)
f2fs_set_bit(nid, nm_i->nid_bitmap);
}
-- 
2.18.0.rc1
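
The crash in the backtrace above comes from setting a bit whose index (nr=1041170432) lies far outside the bitmap. The idea of the fix, validating the index before touching the bitmap, can be sketched in userspace as follows. `checked_set_bit()` and its `nbits` parameter are hypothetical and only illustrate the check; the bit order follows the mask expression shown in the backtrace.

```c
#include <stdint.h>

/* Set bit nr in bitmap, MSB-first within each byte, but only if nr is
 * inside a bitmap of nbits bits. Returns 0 on success, -1 if nr is out
 * of range and the bitmap was left untouched. */
int checked_set_bit(uint32_t nr, uint8_t *bitmap, uint32_t nbits)
{
	if (nr >= nbits)
		return -1;

	bitmap[nr >> 3] |= (uint8_t)(1u << (7 - (nr & 0x07)));
	return 0;
}
```

With a 16-bit bitmap, an index such as 9 sets bit 0x40 of the second byte, while the corrupted index from the report is rejected instead of writing out of bounds.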



Re: [PATCH] x86/boot: drop memset from copy.S

2019-01-10 Thread Cao jin
On 1/8/19 4:46 PM, Cao jin wrote:
> One more question: in compressed/, for mem*(), it seems we both use the
> macros of boot/string.h, and the functions of compressed/string.c. Is
> that what we want?
> 
> compressed/ is compiled with -O2, so it cannot be told by objdump -d,
> but still can be confirmed by nm <*.o>, for example:
> 
> $nm arch/x86/boot/compressed/eboot.o
>  U memcpy
>  U memset
> 
> $nm arch/x86/boot/compressed/pgtable_64.o
>  # No entry of mem*()
> 
> Both eboot.c and pgtable_64.c #include "../string.h" and use some of the
> mem*() functions, which is counter-intuitive to me. I would very much
> appreciate it if someone could leave a hint.
> 

Well, I think HPA's previous answer also applies to this question: with
-O2, __builtin_mem*() is sometimes expanded into inline code, and sometimes
the compiler just emits a call to the corresponding self-defined mem*()
function.

-- 
Sincerely,
Cao jin




[PATCH] mm/gup: fix gup_pmd_range() for dax

2019-01-10 Thread Yu Zhao
For dax pmd, pmd_trans_huge() returns false but pmd_huge() returns
true on x86. So the function works as long as hugetlb is configured.
However, dax doesn't depend on hugetlb.

Signed-off-by: Yu Zhao 
---
 mm/gup.c | 3 ++-
 1 file changed, 2 insertions(+), 1 deletion(-)

diff --git a/mm/gup.c b/mm/gup.c
index 05acd7e2eb22..75029649baca 100644
--- a/mm/gup.c
+++ b/mm/gup.c
@@ -1674,7 +1674,8 @@ static int gup_pmd_range(pud_t pud, unsigned long addr, unsigned long end,
if (!pmd_present(pmd))
return 0;
 
-   if (unlikely(pmd_trans_huge(pmd) || pmd_huge(pmd))) {
+   if (unlikely(pmd_trans_huge(pmd) || pmd_huge(pmd) ||
+pmd_devmap(pmd))) {
/*
 * NUMA hinting faults need to be handled in the GUP
 * slowpath for accounting purposes and so that they
-- 
2.20.1.97.g81188d93c3-goog



Re: [PATCH ghak90 (was ghak32) V4 06/10] audit: add containerid support for tty_audit

2019-01-10 Thread Richard Guy Briggs
On 2019-01-10 20:12, Paul Moore wrote:
> On Thu, Jan 10, 2019 at 5:59 PM Richard Guy Briggs  wrote:
> > On 2019-01-03 15:11, Paul Moore wrote:
> > > On Wed, Oct 31, 2018 at 5:17 PM Richard Guy Briggs  
> > > wrote:
> > > > On 2018-10-19 19:17, Paul Moore wrote:
> > > > > On Sun, Aug 5, 2018 at 4:33 AM Richard Guy Briggs
> > >  wrote:
> > > > > > Add audit container identifier auxiliary record to tty logging rule
> > > > > > event standalone records.
> > > > > >
> > > > > > Signed-off-by: Richard Guy Briggs 
> > > > > > Acked-by: Serge Hallyn 
> > > > > > ---
> > > > > >  drivers/tty/tty_audit.c | 5 -
> > > > > >  1 file changed, 4 insertions(+), 1 deletion(-)
> > > > > >
> > > > > > diff --git a/drivers/tty/tty_audit.c b/drivers/tty/tty_audit.c
> > > > > > index 50f567b..3e21477 100644
> > > > > > --- a/drivers/tty/tty_audit.c
> > > > > > +++ b/drivers/tty/tty_audit.c
> > > > > > @@ -66,8 +66,9 @@ static void tty_audit_log(const char 
> > > > > > *description, dev_t dev,
> > > > > > uid_t uid = from_kuid(init_user_ns, task_uid(tsk));
> > > > > > uid_t loginuid = from_kuid(init_user_ns, 
> > > > > > audit_get_loginuid(tsk));
> > > > > > unsigned int sessionid = audit_get_sessionid(tsk);
> > > > > > +   struct audit_context *context = 
> > > > > > audit_alloc_local(GFP_KERNEL);
> > > > > >
> > > > > > -   ab = audit_log_start(NULL, GFP_KERNEL, AUDIT_TTY);
> > > > > > +   ab = audit_log_start(context, GFP_KERNEL, AUDIT_TTY);
> > > > > > if (ab) {
> > > > > > char name[sizeof(tsk->comm)];
> > > > > >
> > > > > > @@ -80,6 +81,8 @@ static void tty_audit_log(const char 
> > > > > > *description, dev_t dev,
> > > > > > audit_log_n_hex(ab, data, size);
> > > > > > audit_log_end(ab);
> > > > > > }
> > > > > > +   audit_log_contid(context, "tty", audit_get_contid(tsk));
> > > > > > +   audit_free_context(context);
> > > > > >  }
> > > > >
> > > > > Since I never polished up my task_struct/current fix patch enough to
> > > > > get it past RFC status during this development window (new job, stolen
> > > > > laptop, etc.) *and* it looks like you are going to need at least one
> > > > > more respin of this patchset, go ahead and fix this patch to use
> > > > > current instead of generating a local context.  I'll deal with the
> > > > > merge fallout if/when it happens.
> > > >
> > > > Sure, I will switch it to current in the call to audit_get_contid().
> > > >
> > > > The local context is a distinct issue.  Like USER records, I prefer
> > > > local due to potential record volume, it is already trackable as far as
> > > > Steve is concerned, and if it is to be connected with the syscall
> > > > record, it should be linked to syscall records in a separate new github
> > > > issue with its own patch.  It accumulates events until the buffer is
> > > > flushed to a record, so the timestamp only represents the flush (usually
> > > > user "CR/enter").
> > >
> > > Generally, yes, associating records is a separate issue, but in this
> > > particular case you are changing this record by making it a "local"
> > > record, which as we've discussed before, I view as a necessary evil
> > > and something that must be minimized.  A quick look at the
> > > tty_audit_log() callers shows tty_audit_tiocsti() which is an ioctl
> > > handler (and thus current should be valid and correct), and
> > > tty_audit_buf_push() whose callers all seem to have valid and correct
> > > current values; if you find that not to be the case please let me
> > > know.
> >
> > Unless I'm misunderstanding what "local" means, it already had a local
> > context by virtue of having a NULL context since it was never previously
> > connected to syscall events, so changing it to a local context doesn't
> > change that other than making it possible to associate an auxiliary
> > audit container identifier record.
> >
> > The reasoning I'm also applying here is that the contents of this record
> > don't all come from this one syscall, but most likely came in from an
> > entire line of individual keystrokes, so the syscall information is only
> > from the last one of those syscalls, though that syscall information
> > other than the timestamp should be the same.
> 
> Looking at the callers to tty_audit_log(), I think we can all agree
> that in the tty_audit_tiocsti() case it is correct to associate the
> tty record with current, as it is the current task which sent the
> ioctl with the data.  Do you not agree?

I'm fine with that, yes.

> With tty_audit_buf_push() we need to do a bit more work to track down
> all the callers.  Looking quickly it appears that all of the
> tty_audit_add_data() callers are copying data to/from userspace, so
> associating these tty records with their syscall would seem
> appropriate.  With tty_audit_push() it either appears to be
> tty_audit_tiocsti() (again) or more userspace copy routines.  I didn't
> bother looking at 

Re: [PATCH] staging: vboxvideo: vbox_main: Remove unnecessary local variable

2019-01-10 Thread Sidong Yang
On Thu, Jan 10, 2019 at 10:44:08PM +0300, Dan Carpenter wrote:
> On Thu, Jan 10, 2019 at 05:00:24PM +, Sidong Yang wrote:
> > I think you just point out that my code isn't obvious because the
> > function returns negative error codes. I agree with you. But what if
> > I change my code to if (hgsmi_query_conf() != 0)?
> > 
> 
> That's even worse!  :P
> 
> You should do comparisons with zero when you are talking about zero
> meaning the number zero.  In this case, hgsmi_query_conf() returns "zero
> meaning success" not "zero meaning the number zero".  How many bytes?
> Zero.  That is the number zero.
> 
> != zero is a double negative, because NOT and ZERO are negatives.  If
> double negatives simplified the code we would add four of them instead
> of just the one:
> 
>   if ((((hgsmi_query_conf() != 0) != 0) != 0) != 0) {
> 
> See?  Adding != 0 doesn't make it simpler...
> 
> The other place where != 0 is appropriate besides talking about the
> number is when you're using a strcmp() function because it works like
> this:
> 
>   if (strcmp(a, b) < 0) {  <-- means a < b
>   if (strcmp(a, b) == 0) { <-- means a == b
>   if (strcmp(a, b) != 0) { <-- means a != b
> 
> regards,
> dan carpenter
> 

You're right, that is even worse. I understand; thank you for pointing it out.

regards,
Sidong Yang
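
The convention discussed above can be sketched in plain C. This is an illustrative example only: `query_conf()` and `setup()` are hypothetical functions that follow the "0 on success, negative errno on failure" convention, not the vboxvideo code.

```c
#include <errno.h>
#include <string.h>

/* Hypothetical function: returns 0 on success, a negative errno on failure. */
int query_conf(int index, unsigned int *value)
{
	if (index < 0)
		return -EINVAL;

	*value = (unsigned int)index * 2u;
	return 0;
}

int setup(int index)
{
	unsigned int value;
	int ret;

	ret = query_conf(index, &value);
	if (ret)	/* reads as "if it failed"; no "!= 0" needed */
		return ret;

	return 0;
}

/* strcmp() returns an ordering, so an explicit comparison with 0 is natural. */
int same_name(const char *a, const char *b)
{
	return strcmp(a, b) == 0;
}
```

Here `if (ret)` tests success or failure, while `strcmp(a, b) == 0` tests the relation a == b, which matches the distinction drawn in the quoted reply.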


[PATCH V2] i2c: tegra: Add Bus Clear Master Support

2019-01-10 Thread Sowjanya Komatineni
The Bus Clear feature of the Tegra I2C controller helps to recover from a
bus hang when the I2C master loses bus arbitration because a slave device
holds SDA low continuously for an unknown reason.

Signed-off-by: Sowjanya Komatineni 
---
 drivers/i2c/busses/i2c-tegra.c | 66 ++
 1 file changed, 66 insertions(+)

diff --git a/drivers/i2c/busses/i2c-tegra.c b/drivers/i2c/busses/i2c-tegra.c
index e417ebf7628c..11bc43ed08e9 100644
--- a/drivers/i2c/busses/i2c-tegra.c
+++ b/drivers/i2c/busses/i2c-tegra.c
@@ -54,6 +54,7 @@
 #define I2C_FIFO_STATUS_RX_SHIFT   0
 #define I2C_INT_MASK   0x064
 #define I2C_INT_STATUS 0x068
+#define I2C_INT_BUS_CLR_DONE   BIT(11)
 #define I2C_INT_PACKET_XFER_COMPLETE   BIT(7)
 #define I2C_INT_ALL_PACKETS_XFER_COMPLETE  BIT(6)
 #define I2C_INT_TX_FIFO_OVERFLOW   BIT(5)
@@ -96,6 +97,15 @@
 #define I2C_HEADER_MASTER_ADDR_SHIFT   12
 #define I2C_HEADER_SLAVE_ADDR_SHIFT    1
 
+#define I2C_BUS_CLEAR_CNFG             0x084
+#define I2C_BC_SCLK_THRESHOLD          9
+#define I2C_BC_SCLK_THRESHOLD_SHIFT    16
+#define I2C_BC_STOP_COND               BIT(2)
+#define I2C_BC_TERMINATE               BIT(1)
+#define I2C_BC_ENABLE                  BIT(0)
+#define I2C_BUS_CLEAR_STATUS           0x088
+#define I2C_BC_STATUS                  BIT(0)
+
 #define I2C_CONFIG_LOAD                0x08C
 #define I2C_MSTR_CONFIG_LOAD           BIT(0)
 #define I2C_SLV_CONFIG_LOAD            BIT(1)
@@ -155,6 +165,8 @@ enum msg_end_type {
  * @has_mst_fifo: The I2C controller contains the new MST FIFO interface that
  * provides additional features and allows for longer messages to
  * be transferred in one go.
+ * @supports_bus_clear: Bus Clear support to recover from bus hang during
+ * SDA stuck low from device for some unknown reasons.
  */
 struct tegra_i2c_hw_feature {
bool has_continue_xfer_support;
@@ -167,6 +179,7 @@ struct tegra_i2c_hw_feature {
bool has_multi_master_mode;
bool has_slcg_override_reg;
bool has_mst_fifo;
+   bool supports_bus_clear;
 };
 
 /**
@@ -640,6 +653,9 @@ static irqreturn_t tegra_i2c_isr(int irq, void *dev_id)
goto err;
}
 
+   if (i2c_dev->hw->supports_bus_clear && (status & I2C_INT_BUS_CLR_DONE))
+   goto err;
+
if (i2c_dev->msg_read && (status & I2C_INT_RX_FIFO_DATA_REQ)) {
if (i2c_dev->msg_buf_remaining)
tegra_i2c_empty_rx_fifo(i2c_dev);
@@ -668,6 +684,8 @@ static irqreturn_t tegra_i2c_isr(int irq, void *dev_id)
tegra_i2c_mask_irq(i2c_dev, I2C_INT_NO_ACK | I2C_INT_ARBITRATION_LOST |
I2C_INT_PACKET_XFER_COMPLETE | I2C_INT_TX_FIFO_DATA_REQ |
I2C_INT_RX_FIFO_DATA_REQ);
+   if (i2c_dev->hw->supports_bus_clear)
+   tegra_i2c_mask_irq(i2c_dev, I2C_INT_BUS_CLR_DONE);
i2c_writel(i2c_dev, status, I2C_INT_STATUS);
if (i2c_dev->is_dvc)
dvc_writel(i2c_dev, DVC_STATUS_I2C_DONE_INTR, DVC_STATUS);
@@ -678,6 +696,42 @@ static irqreturn_t tegra_i2c_isr(int irq, void *dev_id)
return IRQ_HANDLED;
 }
 
+static int tegra_i2c_issue_bus_clear(struct tegra_i2c_dev *i2c_dev)
+{
+   int err;
+   unsigned long time_left;
+   u32 reg;
+
+   if (i2c_dev->hw->supports_bus_clear) {
+   reinit_completion(&i2c_dev->msg_complete);
+   reg = (I2C_BC_SCLK_THRESHOLD << I2C_BC_SCLK_THRESHOLD_SHIFT) |
+ I2C_BC_STOP_COND | I2C_BC_TERMINATE;
+   i2c_writel(i2c_dev, reg, I2C_BUS_CLEAR_CNFG);
+   if (i2c_dev->hw->has_config_load_reg) {
+   err = tegra_i2c_wait_for_config_load(i2c_dev);
+   if (err)
+   return err;
+   }
+   reg |= I2C_BC_ENABLE;
+   i2c_writel(i2c_dev, reg, I2C_BUS_CLEAR_CNFG);
+   tegra_i2c_unmask_irq(i2c_dev, I2C_INT_BUS_CLR_DONE);
+
+   time_left = wait_for_completion_timeout(&i2c_dev->msg_complete,
+   TEGRA_I2C_TIMEOUT);
+   if (time_left == 0) {
+   dev_err(i2c_dev->dev, "timed out for bus clear\n");
+   return -ETIMEDOUT;
+   }
+   reg = i2c_readl(i2c_dev, I2C_BUS_CLEAR_STATUS);
+   if (!(reg & I2C_BC_STATUS)) {
+   dev_err(i2c_dev->dev, "Un-recovered Arb lost\n");
+   return -EIO;
+   }
+   }
+
+   return -EAGAIN;
+}
+
 static int tegra_i2c_xfer_msg(struct tegra_i2c_dev *i2c_dev,
struct i2c_msg *msg, enum msg_end_type end_state)
 {
@@ -759,6 +813,12 @@ static int tegra_i2c_xfer_msg(struct 
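
For readers following the register programming in tegra_i2c_issue_bus_clear() above, the two values written to I2C_BUS_CLEAR_CNFG can be worked out from the macros in the patch. The sketch below copies those macro values into a userspace program; `BIT()` is redefined locally, and `bus_clear_config()` and `bus_clear_start()` are hypothetical helper names.

```c
#include <stdint.h>

#define BIT(n)				(UINT32_C(1) << (n))
#define I2C_BC_SCLK_THRESHOLD		9
#define I2C_BC_SCLK_THRESHOLD_SHIFT	16
#define I2C_BC_STOP_COND		BIT(2)
#define I2C_BC_TERMINATE		BIT(1)
#define I2C_BC_ENABLE			BIT(0)

/* First write: clock threshold and options, enable bit still clear. */
uint32_t bus_clear_config(void)
{
	return ((uint32_t)I2C_BC_SCLK_THRESHOLD << I2C_BC_SCLK_THRESHOLD_SHIFT) |
	       I2C_BC_STOP_COND | I2C_BC_TERMINATE;
}

/* Second write: the same configuration with the enable bit set. */
uint32_t bus_clear_start(void)
{
	return bus_clear_config() | I2C_BC_ENABLE;
}
```

The first value is 0x00090006 and the second is 0x00090007, so the only difference between the two writes is the enable bit.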

Re: [PATCH v1 6/7] arm64: dts: sdm845: Increase alert trip point to 95 degrees

2019-01-10 Thread Viresh Kumar
On 10-01-19, 12:00, Matthias Kaehlcke wrote:
> Viresh helped me understand that we currently need to add cooling
> device entries for all CPUs to the DT, even though at most one will be
> active per freq domain at any time (I wonder if this could be changed
> though).

Actually we were only adding cooling-cells in CPU0 until now and I
fixed that, so that is going to stay :)

The idea is that the hardware should be described properly and not
partially. Even if all the CPUs are part of the same freq-domain, they
are all capable of being a cooling device here and the DT should
describe that. The kernel will of course create a single cooling device.

-- 
viresh


Re: [PATCH 0/3] Fix virtio-blk issue with SWIOTLB

2019-01-10 Thread Jason Wang



On 2019/1/10 9:44 PM, Joerg Roedel wrote:

Hi,

there is a problem with virtio-blk driven devices when
virtio-ring uses SWIOTLB through the DMA-API. This happens
for example in AMD-SEV enabled guests, where the guest RAM
is mostly encrypted and all emulated DMA has to happen
to/from the SWIOTLB aperture.

The problem is a limitation of the SWIOTLB implementation,
which does not support allocations larger than 256kb. When
the virtio-blk driver tries to read/write a block larger
than that, the allocation of the dma-handle fails and an IO
error is reported.

This patch-set adds a check to the virtio-code whether it
might be using SWIOTLB bounce buffering and limits the
maximum segment size in the virtio-blk driver in this case,
so that it doesn't try to do larger reads/writes.



Just wondering if my understanding is correct: must IOMMU_PLATFORM be set
for all virtio devices in AMD-SEV guests?


Thanks




Please review.

Thanks,

Joerg

Joerg Roedel (3):
   swiotlb: Export maximum allocation size
   virtio: Introduce virtio_max_dma_size()
   virtio-blk: Consider virtio_max_dma_size() for maximum segment size

  drivers/block/virtio_blk.c   | 10 ++
  drivers/virtio/virtio_ring.c | 11 +++
  include/linux/swiotlb.h  | 12 
  include/linux/virtio.h   |  2 ++
  4 files changed, 31 insertions(+), 4 deletions(-)
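
The approach described in the cover letter, capping the block driver's maximum segment size when bounce buffering may be in use, can be sketched as follows. This is a userspace illustration under stated assumptions: `bounce_max_alloc()` and `limit_seg_size()` are hypothetical names rather than the functions added by the series, and the two constants are chosen so that their product is the 256 KiB limit stated above.

```c
#include <stddef.h>
#include <stdint.h>

#define IO_TLB_SHIFT	11	/* assumed slot size: 2 KiB */
#define IO_TLB_SEGSIZE	128	/* assumed max contiguous slots per mapping */

/* Largest single bounce-buffer allocation: 128 * 2 KiB = 256 KiB. */
size_t bounce_max_alloc(void)
{
	return (size_t)IO_TLB_SEGSIZE << IO_TLB_SHIFT;
}

/* Cap the requested segment size only when bounce buffering may be used. */
uint32_t limit_seg_size(uint32_t max_seg, int uses_bounce)
{
	size_t cap = bounce_max_alloc();

	if (uses_bounce && max_seg > cap)
		return (uint32_t)cap;

	return max_seg;
}
```

A device that advertises 1 MiB segments would be limited to 262144 bytes when bounce buffering is possible, and left unchanged otherwise.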



[PATCH 2/3] powerpc: remove redundant header search path additions

2019-01-10 Thread Masahiro Yamada
The same path -Iarch/$(ARCH) is passed to KBUILD_CPPFLAGS,
KBUILD_AFLAGS, and KBUILD_CFLAGS.

As you see in scripts/Makefile.lib, KBUILD_CPPFLAGS is passed
to c_flags and a_flags as well.

Passing it to KBUILD_CPPFLAGS is enough.

Signed-off-by: Masahiro Yamada 
---

 arch/powerpc/Makefile | 4 ++--
 1 file changed, 2 insertions(+), 2 deletions(-)

diff --git a/arch/powerpc/Makefile b/arch/powerpc/Makefile
index 488c9ed..ac03334 100644
--- a/arch/powerpc/Makefile
+++ b/arch/powerpc/Makefile
@@ -213,9 +213,9 @@ endif
 asinstr := $(call as-instr,lis 9$(comma)foo@high,-DHAVE_AS_ATHIGH=1)
 
 KBUILD_CPPFLAGS += -Iarch/$(ARCH) $(asinstr)
-KBUILD_AFLAGS  += -Iarch/$(ARCH) $(AFLAGS-y)
+KBUILD_AFLAGS  += $(AFLAGS-y)
 KBUILD_CFLAGS  += $(call cc-option,-msoft-float)
-KBUILD_CFLAGS  += -pipe -Iarch/$(ARCH) $(CFLAGS-y)
+KBUILD_CFLAGS  += -pipe $(CFLAGS-y)
 CPP = $(CC) -E $(KBUILD_CFLAGS)
 
 CHECKFLAGS += -m$(BITS) -D__powerpc__ -D__powerpc$(BITS)__
-- 
2.7.4



[PATCH 3/3] powerpc: math-emu: remove unneeded header search paths

2019-01-10 Thread Masahiro Yamada
The header search path -I. in kernel Makefiles is very suspicious;
it allows the compiler to search for headers in the top of $(srctree),
where obviously no header file exists.

-Iinclude/math-emu seems unnecessary because all files include headers
in the form of #include .

I was able to build without these header search paths.

Signed-off-by: Masahiro Yamada 
---

 arch/powerpc/math-emu/Makefile | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/arch/powerpc/math-emu/Makefile b/arch/powerpc/math-emu/Makefile
index 494df26..a879403 100644
--- a/arch/powerpc/math-emu/Makefile
+++ b/arch/powerpc/math-emu/Makefile
@@ -17,4 +17,4 @@ obj-$(CONFIG_SPE) += math_efp.o
 CFLAGS_fabs.o = -fno-builtin-fabs
 CFLAGS_math.o = -fno-builtin-fabs
 
-ccflags-y = -I. -Iinclude/math-emu -w
+ccflags-y = -w
-- 
2.7.4



[PATCH 0/3] powerpc: some header search path cleanups

2019-01-10 Thread Masahiro Yamada
I am trying to get rid of crappy magic from Kbuild core makefiles.

Before that, I want to drop as many useless paths as possible.
In fact, many Makefiles add pointless options.

This series cleans some powerpc Makefiles.
(only compile-tested by 0day bot)



Masahiro Yamada (3):
  KVM: powerpc: remove -I. header search paths
  powerpc: remove redundant header search path additions
  powerpc: math-emu: remove unneeded header search paths

 arch/powerpc/Makefile  | 4 ++--
 arch/powerpc/kvm/Makefile  | 5 -
 arch/powerpc/math-emu/Makefile | 2 +-
 3 files changed, 3 insertions(+), 8 deletions(-)

-- 
2.7.4



[PATCH 1/3] KVM: powerpc: remove -I. header search paths

2019-01-10 Thread Masahiro Yamada
The header search path -I. in kernel Makefiles is very suspicious;
it allows the compiler to search for headers in the top of $(srctree),
where obviously no header file exists.

Commit 46f43c6ee022 ("KVM: powerpc: convert marker probes to event
trace") first added these options, but they are completely useless.

Signed-off-by: Masahiro Yamada 
---

 arch/powerpc/kvm/Makefile | 5 -
 1 file changed, 5 deletions(-)

diff --git a/arch/powerpc/kvm/Makefile b/arch/powerpc/kvm/Makefile
index 64f1135..3223aec 100644
--- a/arch/powerpc/kvm/Makefile
+++ b/arch/powerpc/kvm/Makefile
@@ -10,11 +10,6 @@ common-objs-y = $(KVM)/kvm_main.o $(KVM)/eventfd.o
 common-objs-$(CONFIG_KVM_VFIO) += $(KVM)/vfio.o
 common-objs-$(CONFIG_KVM_MMIO) += $(KVM)/coalesced_mmio.o
 
-CFLAGS_e500_mmu.o := -I.
-CFLAGS_e500_mmu_host.o := -I.
-CFLAGS_emulate.o  := -I.
-CFLAGS_emulate_loadstore.o  := -I.
-
 common-objs-y += powerpc.o emulate_loadstore.o
 obj-$(CONFIG_KVM_EXIT_TIMING) += timing.o
 obj-$(CONFIG_KVM_BOOK3S_HANDLER) += book3s_exports.o
-- 
2.7.4



Re: Re: [PATCH] drm/mediatek: Add MTK Framebuffer-Device (mt7623)

2019-01-10 Thread CK Hu
Hi, Daniel:

On Thu, 2019-01-10 at 21:02 +0100, Daniel Vetter wrote:
> On Thu, Jan 10, 2019 at 08:01:37PM +0100, Frank Wunderlich wrote:
> > Hi Daniel,
> > 
> > > > Would be good to use the new generic fbdev emulation code here, for even
> > > > less code. Or at least know why this isn't possible to use for mtk (and
> > > > maybe address that in the core code). Hand-rolling fbdev code shouldn't 
> > > > be
> > > > needed anymore.
> > > 
> > > Back on the mailing list, no private replies please:
> > 
> > i don't wanted to spam all people with dumb questions ;)
> 
> There's no dumb questions, only insufficient documentation :-)
> 
> > > For examples please grep for drm_fbdev_generic_setup(). There's also a
> > > still in-flight series from Gerd Hoffmann to convert over bochs. That,
> > > plus all the kerneldoc linked from there should get you started.
> > > -Daniel
> > 
> > this is one of google best founds if i search for drm_fbdev_generic_setup:
> > 
> > https://lkml.org/lkml/2018/12/19/305
> > 
> > not very helpful...
> > 
> > so i tried kernel-doc
> > 
> > https://www.kernel.org/doc/html/latest/gpu/drm-kms-helpers.html?highlight=drm_fbdev_generic_setup#c.drm_fbdev_generic_setup
> > 
> > which is nice function-reference but i've found no generic workflow
> > 
> > as the posted driver is "only" a driver ported from kernel 4.4 by 
> > Alexander, i don't know if this new framework can be used and which parts 
> > need to be changed. I only try to bring his code Mainline
> > Maybe CK Hu can help here because driver is originally from him and he 
> > knows internals. Or maybe you can help here?
> > 
> > i personally make my first steps as spare-time kernel-developer :)
> 
> There's a ton of in-kernel users of that function already, I meant you can
> use those to serve as examples ... If those + the kerneldoc aren't
> good enough, then we need to improve them.
> -Daniel

I've tried drm_fbdev_generic_setup() and it works fine with a simple
modification. The patch is:

--- a/drivers/gpu/drm/mediatek/mtk_drm_drv.c
+++ b/drivers/gpu/drm/mediatek/mtk_drm_drv.c
@@ -16,6 +16,7 @@
 #include 
 #include 
 #include 
+#include 
 #include 
 #include 
 #include 
@@ -379,6 +380,7 @@ static void mtk_drm_kms_deinit(struct drm_device *drm)
.gem_prime_get_sg_table = mtk_gem_prime_get_sg_table,
.gem_prime_import_sg_table = mtk_gem_prime_import_sg_table,
.gem_prime_mmap = mtk_drm_gem_mmap_buf,
+   .gem_prime_vmap = mtk_drm_gem_prime_vmap,
.ioctls = mtk_ioctls,
.num_ioctls = ARRAY_SIZE(mtk_ioctls),
.fops = &mtk_drm_fops,
@@ -416,6 +418,8 @@ static int mtk_drm_bind(struct device *dev)
if (ret < 0)
goto err_deinit;

+   drm_fbdev_generic_setup(drm, 32);
+
return 0;


However, I implemented .gem_prime_vmap with a workaround:


--- a/drivers/gpu/drm/mediatek/mtk_drm_gem.c
+++ b/drivers/gpu/drm/mediatek/mtk_drm_gem.c
@@ -280,3 +280,8 @@ int mtk_gem_create_ioctl(struct drm_device *dev, void *data,
mtk_drm_gem_free_object(&mtk_gem->base);
return ret;
 }
+
+void *mtk_drm_gem_prime_vmap(struct drm_gem_object *obj)
+{
+   return (void *)1;
+}

The current drm_fb_helper depends on drm_client to create the fbdev. When
drm_client creates a drm_client_buffer, it needs to vmap the buffer to get a
kernel vaddr. But I think that for fbdev the vaddr is useless. Do you agree
that I temporarily implement .gem_prime_vmap in this way?

Regards,
CK




Re: x86/fpu: Don't export __kernel_fpu_{begin,end}()

2019-01-10 Thread Kash Pande


> On Thu, Jan 10, 2019 at 07:07:52PM +0100, Sebastian Andrzej Siewior wrote:
> > On 2019-01-10 17:32:58 [+], Hutter, Tony wrote:
> > > > But since when did out-of-tree modules use __kernel_fpu_begin? It's an
> > > > x86-only thing, and shouldn't really be used by anyone, right?
> > >
> > > ZFS on Linux uses it for checksums.  Its removal is currently breaking ZFS builds against 5.0:
> >
> > So btrfs uses crc32c() / kernel's crypto API for that and ZFS can't?
> > Well the crypto API is GPL only exported so that won't work. crc32c() is
> > EXPORT_SYMBOL() so it would work.
> > On the other hand it does not look right to provide a EXPORT_SYMBOL
> > wrapper around a GPL only interface…

> Yes, the "GPL condom" attempt doesn't work at all.  It's been shot down
> a long time ago in the courts.

SFLC maintains there is no kernel licensing issue[1].

As a side note, even Hellwig's suit against VMware was dismissed (he may
appeal)[2].

Debian and Canonical base their decision to ship DKMS source for ZFS on
Linux[3].

The GPL does not disqualify a user from compiling ZFS or Linux however
they see fit. It is only the users' distribution rights that come into
question.

No one is combining ZFS into Linux or even distributing binary modules here;
we're following the terms of the GPL.

> My tolerance for ZFS is pretty non-existant.  Sun explicitly did not
> want their code to work on Linux, so why would we do extra work to get
> their code to work properly?


1. Should your personal feelings affect the quality of the Linux kernel?
I say no.

2. Did Sun or Oracle ever release any statement of any kind that backs
your statement?

3. What extra work is being done here aside from the dropping of a
pseudo-protection, the "GPL ONLY" symbol export? Something tells me, even if
someone else did "the work" and submitted patches, you would find a reason to
tell them to get stuffed and leave it "as-is".


With all of that... why have ANY kind of tolerance for out of tree
kernel modules at all?


[1] https://www.softwarefreedom.org/resources/2016/linux-kernel-cddl.html

[2] https://opensource.com/law/16/8/gpl-enforcement-action-hellwig-v-vmware

[3] https://lists.debian.org/debian-devel-announce/2015/04/msg6.html







Re: [PATCH] ARM: imx: add i.MX7ULP cpuidle support

2019-01-10 Thread Shawn Guo
On Fri, Dec 14, 2018 at 08:23:25AM +, Anson Huang wrote:
> This patch adds cpuidle support for i.MX7ULP, 3 cpuidle
> states supported as below:
> 
> 1. WFI, just ARM wfi;
> 2. WAIT mode, mapped to SoC's partial stop mode #3;
> 3. STOP mode, mapped to SoC's partial stop mode #1.
> 
> In WAIT mode, system clock and bus clock will be enabled;
> In STOP mode, system clock and bus clock will be disabled.
> 
> Signed-off-by: Anson Huang 
> ---
>  arch/arm/mach-imx/Makefile  |  1 +
>  arch/arm/mach-imx/common.h  | 10 +++
>  arch/arm/mach-imx/cpuidle-imx7ulp.c | 60 
> +
>  arch/arm/mach-imx/cpuidle.h |  5 
>  arch/arm/mach-imx/mach-imx7ulp.c|  7 +
>  arch/arm/mach-imx/pm-imx7ulp.c  | 49 ++
>  6 files changed, 127 insertions(+), 5 deletions(-)
>  create mode 100644 arch/arm/mach-imx/cpuidle-imx7ulp.c
> 
> diff --git a/arch/arm/mach-imx/Makefile b/arch/arm/mach-imx/Makefile
> index 8af2f7e..12aa44a 100644
> --- a/arch/arm/mach-imx/Makefile
> +++ b/arch/arm/mach-imx/Makefile
> @@ -29,6 +29,7 @@ obj-$(CONFIG_SOC_IMX6SL) += cpuidle-imx6sl.o
>  obj-$(CONFIG_SOC_IMX6SLL) += cpuidle-imx6sx.o
>  obj-$(CONFIG_SOC_IMX6SX) += cpuidle-imx6sx.o
>  obj-$(CONFIG_SOC_IMX6UL) += cpuidle-imx6sx.o
> +obj-$(CONFIG_SOC_IMX7ULP) += cpuidle-imx7ulp.o
>  endif
>  
>  ifdef CONFIG_SND_IMX_SOC
> diff --git a/arch/arm/mach-imx/common.h b/arch/arm/mach-imx/common.h
> index bc915e5..a87fab1 100644
> --- a/arch/arm/mach-imx/common.h
> +++ b/arch/arm/mach-imx/common.h
> @@ -72,6 +72,15 @@ enum mxc_cpu_pwr_mode {
>   STOP_POWER_OFF, /* STOP + SRPG */
>  };
>  
> +enum ulp_cpu_pwr_mode {
> + HSRUN,/* High speed run mode */
> + RUN,  /* Run mode */
> + WAIT, /* Wait mode */
> + STOP, /* Stop mode */
> + VLPS, /* Very low power stop mode */
> + VLLS, /* very low leakage stop mode */

Can we prefix these enums a little bit, like ULP_PM_XXX?
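
Something along these lines, as a rough sketch only. The ULP_PM_ prefix and the exact member names are a suggestion derived from the enum's own "ulp" name, not taken from existing code; the members and their order are the ones in the patch:

```c
/*
 * Sketch (hypothetical naming): same members and order as the enum in the
 * patch, with a ULP_PM_ prefix so that short names such as RUN, WAIT and
 * STOP are not exposed to every file that includes common.h.
 */
enum ulp_cpu_pwr_mode {
	ULP_PM_HSRUN,	/* High speed run mode */
	ULP_PM_RUN,	/* Run mode */
	ULP_PM_WAIT,	/* Wait mode */
	ULP_PM_STOP,	/* Stop mode */
	ULP_PM_VLPS,	/* Very low power stop mode */
	ULP_PM_VLLS,	/* Very low leakage stop mode */
};
```

Callers would then read imx7ulp_set_lpm(ULP_PM_WAIT) and so on, which also makes it easier to tell these modes apart from the mxc_cpu_pwr_mode values.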

> +};
> +
>  void imx_enable_cpu(int cpu, bool enable);
>  void imx_set_cpu_jump(int cpu, void *jump_addr);
>  u32 imx_get_cpu_arg(int cpu);
> @@ -98,6 +107,7 @@ int imx6_set_lpm(enum mxc_cpu_pwr_mode mode);
>  void imx6_set_int_mem_clk_lpm(bool enable);
>  void imx6sl_set_wait_clk(bool enter);
>  int imx_mmdc_get_ddr_type(void);
> +int imx7ulp_set_lpm(enum ulp_cpu_pwr_mode mode);
>  
>  void imx_cpu_die(unsigned int cpu);
>  int imx_cpu_kill(unsigned int cpu);
> diff --git a/arch/arm/mach-imx/cpuidle-imx7ulp.c 
> b/arch/arm/mach-imx/cpuidle-imx7ulp.c
> new file mode 100644
> index 000..a59df93
> --- /dev/null
> +++ b/arch/arm/mach-imx/cpuidle-imx7ulp.c
> @@ -0,0 +1,60 @@
> +// SPDX-License-Identifier: GPL-2.0+
> +/*
> + * Copyright (C) 2016 Freescale Semiconductor, Inc.
> + * Copyright 2017-2018 NXP
> + *   Anson Huang 
> + */
> +
> +#include 
> +#include 
> +#include 
> +
> +#include "common.h"
> +#include "cpuidle.h"
> +
> +static int imx7ulp_enter_wait(struct cpuidle_device *dev,
> + struct cpuidle_driver *drv, int index)
> +{
> + if (index == 1)
> + imx7ulp_set_lpm(WAIT);
> + else
> + imx7ulp_set_lpm(STOP);
> +
> + cpu_do_idle();
> +
> + imx7ulp_set_lpm(RUN);
> +
> + return index;
> +}
> +
> +static struct cpuidle_driver imx7ulp_cpuidle_driver = {
> + .name = "imx7ulp_cpuidle",
> + .owner = THIS_MODULE,
> + .states = {
> + /* WFI */
> + ARM_CPUIDLE_WFI_STATE,
> + /* WAIT */
> + {
> + .exit_latency = 50,
> + .target_residency = 75,
> + .enter = imx7ulp_enter_wait,
> + .name = "WAIT",
> + .desc = "PSTOP2",
> + },
> + /* STOP */
> + {
> + .exit_latency = 100,
> + .target_residency = 150,
> + .enter = imx7ulp_enter_wait,
> + .name = "STOP",
> + .desc = "PSTOP1",
> + },
> + },
> + .state_count = 3,
> + .safe_state_index = 0,
> +};
> +
> +int __init imx7ulp_cpuidle_init(void)
> +{
> + return cpuidle_register(&imx7ulp_cpuidle_driver, NULL);
> +}
> diff --git a/arch/arm/mach-imx/cpuidle.h b/arch/arm/mach-imx/cpuidle.h
> index f914012..7694c8f 100644
> --- a/arch/arm/mach-imx/cpuidle.h
> +++ b/arch/arm/mach-imx/cpuidle.h
> @@ -15,6 +15,7 @@ extern int imx5_cpuidle_init(void);
>  extern int imx6q_cpuidle_init(void);
>  extern int imx6sl_cpuidle_init(void);
>  extern int imx6sx_cpuidle_init(void);
> +extern int imx7ulp_cpuidle_init(void);
>  #else
>  static inline int imx5_cpuidle_init(void)
>  {
> @@ -32,4 +33,8 @@ static inline int imx6sx_cpuidle_init(void)
>  {
>   return 0;
>  }
> +static inline int imx7ulp_cpuidle_init(void)
> +{
> + return 0;
> +}
>  #endif
> diff --git a/arch/arm/mach-imx/mach-imx7ulp.c 
> b/arch/arm/mach-imx/mach-imx7ulp.c
> index 

Re: PROBLEM: syzkaller found / pool corruption-overwrite / page in user-area or NULL

2019-01-10 Thread Esme
> > [ 75.793150] RIP: 0010:rb_insert_color+0x189/0x1480
>
> What's in that line? Try,
>
> $ ./scripts/faddr2line vmlinux rb_insert_color+0x189/0x1480

rb_insert_color+0x189/0x1480:
__rb_insert at /home/files/git/linux/lib/rbtree.c:131
(inlined by) rb_insert_color at /home/files/git/linux/lib/rbtree.c:452

>
> What's steps to reproduce this?

The steps are the kernel config provided (proc.config) and the attached C code,
which I double-checked from the qemu image (attached here).  If the kernel does
not crash immediately, a ^C will cause the fault to be noticed.  The earlier
report came from the same code; my assumption was that the possible
pool/redzone corruption is making it a bit tricky to pin down.

If you would like alternative kernel settings, please let me know and I can do
that. Also, my current test bench has about 256 cores on x64, 64 of them are
bare metal, and 32 are arm64.  I'm all ears for any preferred configuration
tweaks, and I'll be including some of the steps you suggested in any additional
upcoming threads (thank you for those so far, and for future suggestions).

Also, the stacks occasionally vary depending on the corruption; this is the
stack from just now (another execution of test3.c):

./scripts/faddr2line vmlinux rcu_process_callbacks+0xd45/0x1650
rcu_process_callbacks+0xd45/0x1650:
rcu_lock_release at include/linux/rcupdate.h:228
(inlined by) __rcu_reclaim at kernel/rcu/rcu.h:234
(inlined by) rcu_do_batch at kernel/rcu/tree.c:2452
(inlined by) invoke_rcu_callbacks at kernel/rcu/tree.c:2773
(inlined by) rcu_process_callbacks at kernel/rcu/tree.c:2754

(stack from just now)


[12580.358392] ==
[12580.360076] BUG: KASAN: double-free or invalid-free in rcu_process_callbacks+0xd45/0x1650
[12580.361738]
[12580.362068] CPU: 1 PID: 16 Comm: ksoftirqd/1 Not tainted 5.0.0-rc1+ #5
[12580.363383] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS 1.11.1-1ubuntu1 04/01/2014
[12580.365223] Call Trace:
[12580.365772]  dump_stack+0x1d3/0x2c2
[12580.366518]  ? dump_stack_print_info.cold.1+0x20/0x20
[12580.367608]  ? printk+0xad/0xd3
[12580.368278]  ? kmsg_dump_rewind_nolock+0xf0/0xf0
[12580.369261]  print_address_description.cold.5+0x9/0x208
[12580.370393]  ? rcu_process_callbacks+0xd45/0x1650
[12580.371376]  kasan_report_invalid_free+0x64/0xa0
[12580.372356]  ? rcu_process_callbacks+0xd45/0x1650
[12580.373358]  __kasan_slab_free+0x138/0x150
[12580.374196]  ? rcu_process_callbacks+0xd45/0x1650
[12580.375142]  kasan_slab_free+0xe/0x10
[12580.375905]  kfree+0xcf/0x220
[12580.376537]  rcu_process_callbacks+0xd45/0x1650
[12580.377464]  ? rcu_process_callbacks+0xcf8/0x1650
[12580.378431]  ? rcu_fwd_progress_check+0xf0/0xf0
[12580.379371]  ? compat_start_thread+0x80/0x80
[12580.380292]  ? kasan_check_write+0x14/0x20
[12580.381145]  ? finish_task_switch+0x2cb/0x880
[12580.382028]  ? finish_task_switch+0x189/0x880
[12580.382920]  ? preempt_notifier_register+0x210/0x210
[12580.383944]  ? lock_repin_lock+0x450/0x450
[12580.384808]  ? __do_softirq+0x27d/0xb6a
[12580.385618]  ? kasan_check_read+0x11/0x20
[12580.386461]  ? rcu_is_watching+0x9d/0x160
[12580.387341]  ? trace_hardirqs_on+0xce/0x310
[12580.388217]  ? rcu_pm_notify+0xd0/0xd0
[12580.389008]  __do_softirq+0x2eb/0xb6a
[12580.389816]  ? __irqentry_text_end+0x1f9d5b/0x1f9d5b
[12580.390838]  ? trace_hardirqs_off+0xc6/0x310
[12580.391729]  ? smpboot_thread_fn+0x419/0x900
[12580.392611]  ? trace_hardirqs_on+0x310/0x310
[12580.393503]  ? check_same_owner+0x350/0x350
[12580.394368]  ? takeover_tasklets+0xaa0/0xaa0
[12580.395268]  ? takeover_tasklets+0xaa0/0xaa0
[12580.396153]  run_ksoftirqd+0x8b/0x110
[12580.396922]  smpboot_thread_fn+0x419/0x900
[12580.397785]  ? sort_range+0x40/0x40
[12580.398513]  ? __sanitizer_cov_trace_const_cmp8+0x18/0x20
[12580.399697]  ? __kthread_parkme+0x106/0x1c0
[12580.400563]  ? sort_range+0x40/0x40
[12580.401270]  kthread+0x358/0x460
[12580.401956]  ? kthread_bind+0x40/0x40
[12580.402741]  ret_from_fork+0x24/0x30
[12580.403504]
[12580.403844] Allocated by task 0:
[12580.404524] (stack is not available)
[12580.405276]
[12580.405620] Freed by task 0:
[12580.406223] (stack is not available)
[12580.406955]
[12580.407273] The buggy address belongs to the object at 88805b13e4f8
[12580.407273]  which belongs to the cache kmemleak_object of size 360
[12580.409867] The buggy address is located 120 bytes inside of
[12580.409867]  360-byte region [88805b13e4f8, 88805b13e660)
[12580.412182] The buggy address belongs to the page:
[12580.413163] page:ea00016c4f80 count:1 mapcount:0 mapping:88800fc13e40 index:0x0
[12580.414798] flags: 0x1fffc000200(slab)
[12580.415653] raw: 01fffc000200 ea00016c7fc8 ea0001aba308 88800fc13e40
[12580.417245] raw:  88805b13e000 00010009 

[12580.418800] page dumped because: kasan: bad access detected
[12580.419969]
[12580.420300] 

Re: [PATCH v4 2/2] dts: arm64/sdm845: Add node for arm,mmu-500

2019-01-10 Thread Bjorn Andersson
On Tue 08 Jan 03:18 PST 2019, Vivek Gautam wrote:

> 
> On 1/8/2019 12:29 PM, Bjorn Andersson wrote:
> > On Thu 11 Oct 02:49 PDT 2018, Vivek Gautam wrote:
> > 
> > > Add device node for arm,mmu-500 available on sdm845.
> > > This MMU-500 with single TCU and multiple TBU architecture
> > > is shared among all the peripherals except gpu.
> > > 
> > Hi Vivek,
> > 
> > Applying this patch together with UFS ([1] and [2]) ontop of v5.0-rc1
> > causes my MTP reboot once the UFSHCD module is inserted and probed.
> > Independently the patches seems to work fine.
> > 
> > Do you have any suggestion to why this would be?
> 
> 
> Hi Bjorn,
> 
> Enabling SMMU on sdm845 when you have UFS also enabled, would need addition
> of 'iommus' property to ufs dt node.
> You will need to add the following with ufs:
> 
> iommus = <&apps_smmu 0x100 0xf>;
> 

Thanks, this does address the sudden restart of my MTP, but now results in a
fault:

[7.391117] arm-smmu 1500.iommu: Unhandled context fault: fsr=0x402, iova=0xdf3e0, fsynr=0x29, cb=0
[7.747406] ufshcd-qcom 1d84000.ufshc: ufshcd_verify_dev_init: NOP OUT failed -11

The only thing done on top of v5.0-rc1 is to take your patch adding
apps_smmu, add the ufs nodes as Evan proposed and specify iommus in the
ufshcd node.


With Coreboot UFS seems to work without specifying iommus, but with it
UFS fails to come up.

Regards,
Bjorn

> Thanks
> Vivek
> 
> > 
> > [1] 
> > https://lore.kernel.org/lkml/20181210192826.241350-4-evgr...@chromium.org/
> > [2] 
> > https://lore.kernel.org/lkml/20181210192826.241350-5-evgr...@chromium.org/
> > 
> > Regards,
> > Bjorn
> > 
> > > Signed-off-by: Vivek Gautam 
> > > ---
> > > 
> > > Changes since v3:
> > >   - none.
> > > 
> > >   arch/arm64/boot/dts/qcom/sdm845.dtsi | 72 
> > > 
> > >   1 file changed, 72 insertions(+)
> > > 
> > > diff --git a/arch/arm64/boot/dts/qcom/sdm845.dtsi 
> > > b/arch/arm64/boot/dts/qcom/sdm845.dtsi
> > > index b72bdb0a31a5..0aace729643d 100644
> > > --- a/arch/arm64/boot/dts/qcom/sdm845.dtsi
> > > +++ b/arch/arm64/boot/dts/qcom/sdm845.dtsi
> > > @@ -1297,6 +1297,78 @@
> > >   cell-index = <0>;
> > >   };
> > > + apps_smmu: iommu@1500 {
> > > + compatible = "qcom,sdm845-smmu-500", "arm,mmu-500";
> > > + reg = <0x1500 0x8>;
> > > + #iommu-cells = <2>;
> > > + #global-interrupts = <1>;
> > > + interrupts = ,
> > > +  ,
> > > +  ,
> > > +  ,
> > > +  ,
> > > +  ,
> > > +  ,
> > > +  ,
> > > +  ,
> > > +  ,
> > > +  ,
> > > +  ,
> > > +  ,
> > > +  ,
> > > +  ,
> > > +  ,
> > > +  ,
> > > +  ,
> > > +  ,
> > > +  ,
> > > +  ,
> > > +  ,
> > > +  ,
> > > +  ,
> > > +  ,
> > > +  ,
> > > +  ,
> > > +  ,
> > > +  ,
> > > +  ,
> > > +  ,
> > > +  ,
> > > +  ,
> > > +  ,
> > > +  ,
> > > +  ,
> > > +  ,
> > > +  ,
> > > +  ,
> > > +  ,
> > > +  ,
> > > +  ,
> > > +  ,
> > > +  ,
> > > +  ,
> > > +  ,
> > > +  ,
> > > +  ,
> > > +  ,
> > > +  ,
> > > +  ,
> > > +  ,
> > > +  ,
> > > +  ,
> > > +  ,
> > > +  ,
> > > +  ,
> > > +  ,
> > > +  ,
> > > +  ,
> > > +  ,
> > > +  ,
> > > +  ,
> > > +  ,
> > > +

Re: [PATCH] mm/alloc: fallback to first node if the wanted node offline

2019-01-10 Thread Pingfan Liu
On Tue, Jan 8, 2019 at 10:34 PM Michal Hocko  wrote:
>
> On Thu 20-12-18 10:19:34, Michal Hocko wrote:
> > On Thu 20-12-18 15:19:39, Pingfan Liu wrote:
> > > Hi Michal,
> > >
> > > WIth this patch applied on the old one, I got the following message.
> > > Please get it from attachment.
> > [...]
> > > [0.409637] NUMA: Node 1 [mem 0x-0x0009] + [mem 
> > > 0x0010-0x7fff] -> [mem 0x-0x7fff]
> > > [0.419858] NUMA: Node 1 [mem 0x-0x7fff] + [mem 
> > > 0x1-0x47fff] -> [mem 0x-0x47fff]
> > > [0.430356] NODE_DATA(0) allocated [mem 0x87efd4000-0x87effefff]
> > > [0.436325] NODE_DATA(0) on node 5
> > > [0.440092] Initmem setup node 0 [mem 
> > > 0x-0x]
> > > [0.447078] node[0] zonelist:
> > > [0.450106] NODE_DATA(1) allocated [mem 0x47ffd5000-0x47fff]
> > > [0.456114] NODE_DATA(2) allocated [mem 0x87efa9000-0x87efd3fff]
> > > [0.462064] NODE_DATA(2) on node 5
> > > [0.465852] Initmem setup node 2 [mem 
> > > 0x-0x]
> > > [0.472813] node[2] zonelist:
> > > [0.475846] NODE_DATA(3) allocated [mem 0x87ef7e000-0x87efa8fff]
> > > [0.481827] NODE_DATA(3) on node 5
> > > [0.485590] Initmem setup node 3 [mem 
> > > 0x-0x]
> > > [0.492575] node[3] zonelist:
> > > [0.495608] NODE_DATA(4) allocated [mem 0x87ef53000-0x87ef7dfff]
> > > [0.501587] NODE_DATA(4) on node 5
> > > [0.505349] Initmem setup node 4 [mem 
> > > 0x-0x]
> > > [0.512334] node[4] zonelist:
> > > [0.515370] NODE_DATA(5) allocated [mem 0x87ef28000-0x87ef52fff]
> > > [0.521384] NODE_DATA(6) allocated [mem 0x87eefd000-0x87ef27fff]
> > > [0.527329] NODE_DATA(6) on node 5
> > > [0.531091] Initmem setup node 6 [mem 
> > > 0x-0x]
> > > [0.538076] node[6] zonelist:
> > > [0.541109] NODE_DATA(7) allocated [mem 0x87eed2000-0x87eefcfff]
> > > [0.547090] NODE_DATA(7) on node 5
> > > [0.550851] Initmem setup node 7 [mem 
> > > 0x-0x]
> > > [0.557836] node[7] zonelist:
> >
> > OK, so it is clear that building zonelists this early is not going to
> > fly. We do not have the complete information yet. I am not sure when do
> > we get that at this moment but I suspect the we either need to move that
> > initialization to a sooner stage or we have to reconsider whether the
> > phase when we build zonelists really needs to consider only online numa
> > nodes.
> >
> > [...]
> > > [1.067658] percpu: Embedded 46 pages/cpu @(ptrval) s151552 
> > > r8192 d28672 u262144
> > > [1.075692] node[1] zonelist: 1:Normal 1:DMA32 1:DMA 5:Normal
> > > [1.081376] node[5] zonelist: 5:Normal 1:Normal 1:DMA32 1:DMA
> >
> > I hope to get to this before I leave for christmas vacation, if not I
> > will stare into it after then.
>
> I am sorry but I didn't get to this sooner. But I've got another idea. I
> concluded that the whole dance is simply bogus and we should treat
> memory less nodes, well, as nodes with no memory ranges rather than
> special case them. Could you give the following a spin please?
>
> ---
> diff --git a/arch/x86/mm/numa.c b/arch/x86/mm/numa.c
> index 1308f5408bf7..0e79445cfd85 100644
> --- a/arch/x86/mm/numa.c
> +++ b/arch/x86/mm/numa.c
> @@ -216,8 +216,6 @@ static void __init alloc_node_data(int nid)
>
> node_data[nid] = nd;
> memset(NODE_DATA(nid), 0, sizeof(pg_data_t));
> -
> -   node_set_online(nid);
>  }
>
>  /**
> @@ -535,6 +533,7 @@ static int __init numa_register_memblks(struct 
> numa_meminfo *mi)
> /* Account for nodes with cpus and no memory */
> node_possible_map = numa_nodes_parsed;
> numa_nodemask_from_meminfo(&node_possible_map, mi);
> +   pr_info("parsed=%*pbl, possible=%*pbl\n", 
> nodemask_pr_args(&numa_nodes_parsed), nodemask_pr_args(&node_possible_map));
> if (WARN_ON(nodes_empty(node_possible_map)))
> return -EINVAL;
>
> @@ -570,7 +569,7 @@ static int __init numa_register_memblks(struct 
> numa_meminfo *mi)
> return -EINVAL;
>
> /* Finally register nodes. */
> -   for_each_node_mask(nid, node_possible_map) {
> +   for_each_node_mask(nid, numa_nodes_parsed) {
> u64 start = PFN_PHYS(max_pfn);
> u64 end = 0;
>
> @@ -581,9 +580,6 @@ static int __init numa_register_memblks(struct 
> numa_meminfo *mi)
> end = max(mi->blk[i].end, end);
> }
>
> -   if (start >= end)
> -   continue;
> -
> /*
>  * Don't confuse VM with a node that doesn't have the
>  * minimum amount of memory:
> @@ -592,6 +588,8 @@ static int __init numa_register_memblks(struct 
> numa_meminfo *mi)
> continue;
>
>

[PATCH V8 2/3] arm64: dts: tegra: Add CQE Support for SDMMC4

2019-01-10 Thread Sowjanya Komatineni
Add CQE Support for Tegra186 and Tegra194 SDMMC4 controller

Signed-off-by: Sowjanya Komatineni 
---
 arch/arm64/boot/dts/nvidia/tegra186.dtsi | 1 +
 arch/arm64/boot/dts/nvidia/tegra194.dtsi | 1 +
 2 files changed, 2 insertions(+)

diff --git a/arch/arm64/boot/dts/nvidia/tegra186.dtsi 
b/arch/arm64/boot/dts/nvidia/tegra186.dtsi
index 22815db4a3ed..3dfdc4701934 100644
--- a/arch/arm64/boot/dts/nvidia/tegra186.dtsi
+++ b/arch/arm64/boot/dts/nvidia/tegra186.dtsi
@@ -319,6 +319,7 @@
nvidia,default-trim = <0x9>;
nvidia,dqs-trim = <63>;
mmc-hs400-1_8v;
+   supports-cqe;
status = "disabled";
};
 
diff --git a/arch/arm64/boot/dts/nvidia/tegra194.dtsi 
b/arch/arm64/boot/dts/nvidia/tegra194.dtsi
index 6dfa1ca0b851..9ce1c91d4596 100644
--- a/arch/arm64/boot/dts/nvidia/tegra194.dtsi
+++ b/arch/arm64/boot/dts/nvidia/tegra194.dtsi
@@ -325,6 +325,7 @@
clock-names = "sdhci";
resets = < TEGRA194_RESET_SDMMC4>;
reset-names = "sdhci";
+   supports-cqe;
status = "disabled";
};
 
-- 
2.7.4



Re: [RFC PATCH kernel] powerpc/stack_protector: Fix external modules building

2019-01-10 Thread Masahiro Yamada
On Thu, Jan 10, 2019 at 2:44 PM Alexey Kardashevskiy  wrote:
>
> c3ff2a519 "powerpc/32: add stack protector support" addes stack protector
> support so now powerpc's "prepare" target depends on prepare0 (via
> stack_protector_prepare target).
>
> It works fine until we try build an external module where it fails with:
> Run: 'make -j128 SYSSRC=/home/aik/p/kernel 
> SYSOUT=/home/aik/pbuild/kernel-le-pseries/ ARCH=powerpc'
> make[1]: Entering directory '/home/aik/p/kernel'
> make[2]: Entering directory '/home/aik/pbuild/kernel-le-pseries'
> make[2]: *** No rule to make target 'prepare0', needed by 
> 'stack_protector_prepare'.  Stop.
>
> The reason for that is that the main Linux Makefile defines "prepare0"
> only if KBUILD_EXTMOD=="".
>
> This hacks powerpc's Makefile to make external modules build again.
>
> Fixes: c3ff2a519 "powerpc/32: add stack protector support"
> Signed-off-by: Alexey Kardashevskiy 
> ---
>
>
> It has been suggested that there is a better way of fixing this hence RFC.
>
>
> ---
>  arch/powerpc/Makefile | 4 
>  1 file changed, 4 insertions(+)
>
> diff --git a/arch/powerpc/Makefile b/arch/powerpc/Makefile
> index 488c9ed..0492f62 100644
> --- a/arch/powerpc/Makefile
> +++ b/arch/powerpc/Makefile
> @@ -419,7 +419,11 @@ archheaders:
>  ifdef CONFIG_STACKPROTECTOR
>  prepare: stack_protector_prepare
>
> +ifeq ($(KBUILD_EXTMOD),)
>  stack_protector_prepare: prepare0
> +else
> +stack_protector_prepare:
> +endif


Honestly, I think this is ugly.

Do you want me to send an alternative solution?




>  ifdef CONFIG_PPC64
> $(eval KBUILD_CFLAGS += -mstack-protector-guard-offset=$(shell awk 
> '{if ($$2 == "PACA_CANARY") print $$3;}' include/generated/asm-offsets.h))
>  else
> --
> 2.17.1
>


-- 
Best Regards
Masahiro Yamada


[PATCH V8 1/3] dt-bindings: mmc: tegra: Add supports-cqe property

2019-01-10 Thread Sowjanya Komatineni
Add supports-cqe optional property for Tegra SDMMC.

Tegra186 and Tegra194 support the HW command queue only on the
SDMMC4 controller. This property is used to identify controllers
with command queue support in the Tegra SDHCI driver.

Signed-off-by: Sowjanya Komatineni 
---
 Documentation/devicetree/bindings/mmc/nvidia,tegra20-sdhci.txt | 4 
 1 file changed, 4 insertions(+)

diff --git a/Documentation/devicetree/bindings/mmc/nvidia,tegra20-sdhci.txt 
b/Documentation/devicetree/bindings/mmc/nvidia,tegra20-sdhci.txt
index 32b4b4e41923..fb14c2c8d7ee 100644
--- a/Documentation/devicetree/bindings/mmc/nvidia,tegra20-sdhci.txt
+++ b/Documentation/devicetree/bindings/mmc/nvidia,tegra20-sdhci.txt
@@ -72,6 +72,10 @@ Optional properties for Tegra210 and Tegra186:
 - nvidia,default-trim : Specify the default outbound clock trimmer
   value.
 - nvidia,dqs-trim : Specify DQS trim value for HS400 timing
+- supports-cqe : The presence of this property indicates that the
+  corresponding controller supports HW command queue feature.
+  Tegra186 and Tegra194 has 4 SDMMC Controllers and only SDMMC4
+  controller supports HW Command Queue with eMMC device.
 
   Notes on the pad calibration pull up and pulldown offset values:
 - The property values are drive codes which are programmed into the
-- 
2.7.4



[PATCH V8 3/3] mmc: tegra: HW Command Queue Support for Tegra SDMMC

2019-01-10 Thread Sowjanya Komatineni
This patch adds HW Command Queue support for the Tegra SDMMC
controllers that provide it.
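
As a rough illustration of the enable path in the diff below, here is a small user-space model of the sequence in sdhci_tegra_cqe_enable(): save the config value, clear the enable bit if it is set, run the generic enable step, then restore the saved value. This is only a sketch under stated assumptions. None of the model_* names exist in the kernel, the "register" is a plain struct field, and the value 512 written by the generic step is arbitrary.

```c
#include <stdbool.h>
#include <stdint.h>

#define MODEL_CQE_ENABLE (1u << 0)	/* stands in for CQHCI_ENABLE */

/* Hypothetical stand-in for the controller state touched by the driver. */
struct model_host {
	uint32_t cqcfg;		/* models the CQHCI_CFG register */
	uint32_t block_count;	/* models a BLOCK_COUNT register */
};

/* Model of the hardware rule: the write is refused while CQE is enabled. */
static bool model_write_block_count(struct model_host *h, uint32_t val)
{
	if (h->cqcfg & MODEL_CQE_ENABLE)
		return false;
	h->block_count = val;
	return true;
}

/* Stands in for the generic enable step, which needs that write to land. */
static bool model_generic_enable(struct model_host *h)
{
	return model_write_block_count(h, 512);
}

/* Same shape as the Tegra-specific enable callback in the patch. */
static bool model_tegra_cqe_enable(struct model_host *h)
{
	uint32_t cqcfg = h->cqcfg;
	bool ok;

	/* Temporarily drop the enable bit so the register write is accepted. */
	if (cqcfg & MODEL_CQE_ENABLE)
		h->cqcfg = cqcfg & ~MODEL_CQE_ENABLE;

	ok = model_generic_enable(h);

	/* Put the original configuration back. */
	if (cqcfg & MODEL_CQE_ENABLE)
		h->cqcfg = cqcfg;

	return ok;
}
```

In this model, calling model_generic_enable() directly on a host whose enable bit is already set fails, while model_tegra_cqe_enable() succeeds and leaves the enable bit set afterwards.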

Signed-off-by: Sowjanya Komatineni 
---
 drivers/mmc/host/Kconfig   |   1 +
 drivers/mmc/host/sdhci-tegra.c | 117 +++--
 drivers/mmc/host/sdhci.c   |   9 +++-
 drivers/mmc/host/sdhci.h   |   2 +
 4 files changed, 124 insertions(+), 5 deletions(-)

diff --git a/drivers/mmc/host/Kconfig b/drivers/mmc/host/Kconfig
index e26b8145efb3..0739d26c001b 100644
--- a/drivers/mmc/host/Kconfig
+++ b/drivers/mmc/host/Kconfig
@@ -250,6 +250,7 @@ config MMC_SDHCI_TEGRA
depends on ARCH_TEGRA
depends on MMC_SDHCI_PLTFM
select MMC_SDHCI_IO_ACCESSORS
+   select MMC_CQHCI
help
  This selects the Tegra SD/MMC controller. If you have a Tegra
  platform with SD or MMC devices, say Y or M here.
diff --git a/drivers/mmc/host/sdhci-tegra.c b/drivers/mmc/host/sdhci-tegra.c
index e6ace31e2a41..94ea14651d86 100644
--- a/drivers/mmc/host/sdhci-tegra.c
+++ b/drivers/mmc/host/sdhci-tegra.c
@@ -33,6 +33,7 @@
 #include 
 
 #include "sdhci-pltfm.h"
+#include "cqhci.h"
 
 /* Tegra SDHOST controller vendor register definitions */
 #define SDHCI_TEGRA_VENDOR_CLOCK_CTRL  0x100
@@ -89,6 +90,9 @@
 #define NVQUIRK_NEEDS_PAD_CONTROL  BIT(7)
 #define NVQUIRK_DIS_CARD_CLK_CONFIG_TAP    BIT(8)
 
+/* SDMMC CQE Base Address for Tegra Host Ver 4.1 and Higher */
+#define SDHCI_TEGRA_CQE_BASE_ADDR  0xF000
+
 struct sdhci_tegra_soc_data {
const struct sdhci_pltfm_data *pdata;
u32 nvquirks;
@@ -128,6 +132,7 @@ struct sdhci_tegra {
u32 default_tap;
u32 default_trim;
u32 dqs_trim;
+   bool enable_hwcq;
 };
 
 static u16 tegra_sdhci_readw(struct sdhci_host *host, int reg)
@@ -595,6 +600,20 @@ static void tegra_sdhci_parse_tap_and_trim(struct 
sdhci_host *host)
tegra_host->dqs_trim = 0x11;
 }
 
+static void tegra_sdhci_parse_dt(struct sdhci_host *host)
+{
+   struct sdhci_pltfm_host *pltfm_host = sdhci_priv(host);
+   struct sdhci_tegra *tegra_host = sdhci_pltfm_priv(pltfm_host);
+
+   if (device_property_read_bool(host->mmc->parent, "supports-cqe"))
+   tegra_host->enable_hwcq = true;
+   else
+   tegra_host->enable_hwcq = false;
+
+   tegra_sdhci_parse_pad_autocal_dt(host);
+   tegra_sdhci_parse_tap_and_trim(host);
+}
+
 static void tegra_sdhci_set_clock(struct sdhci_host *host, unsigned int clock)
 {
struct sdhci_pltfm_host *pltfm_host = sdhci_priv(host);
@@ -836,6 +855,49 @@ static void tegra_sdhci_voltage_switch(struct sdhci_host *host)
tegra_host->pad_calib_required = true;
 }
 
+static void sdhci_tegra_cqe_enable(struct mmc_host *mmc)
+{
+   struct cqhci_host *cq_host = mmc->cqe_private;
+   u32 cqcfg = 0;
+
+   /*
+* Tegra SDMMC Controller design prevents write access to BLOCK_COUNT
+* registers when CQE is enabled.
+*/
+   cqcfg = cqhci_readl(cq_host, CQHCI_CFG);
+   if (cqcfg & CQHCI_ENABLE)
+   cqhci_writel(cq_host, (cqcfg & ~CQHCI_ENABLE), CQHCI_CFG);
+
+   sdhci_cqe_enable(mmc);
+
+   if (cqcfg & CQHCI_ENABLE)
+   cqhci_writel(cq_host, cqcfg, CQHCI_CFG);
+}
+
+static void sdhci_tegra_dumpregs(struct mmc_host *mmc)
+{
+   sdhci_dumpregs(mmc_priv(mmc));
+}
+
+static u32 sdhci_tegra_cqhci_irq(struct sdhci_host *host, u32 intmask)
+{
+   int cmd_error = 0;
+   int data_error = 0;
+
+   if (!sdhci_cqe_irq(host, intmask, &cmd_error, &data_error))
+   return intmask;
+
+   cqhci_irq(host->mmc, intmask, cmd_error, data_error);
+
+   return 0;
+}
+
+static const struct cqhci_host_ops sdhci_tegra_cqhci_ops = {
+   .enable = sdhci_tegra_cqe_enable,
+   .disable = sdhci_cqe_disable,
+   .dumpregs = sdhci_tegra_dumpregs,
+};
+
 static const struct sdhci_ops tegra_sdhci_ops = {
.get_ro = tegra_sdhci_get_ro,
.read_w = tegra_sdhci_readw,
@@ -989,6 +1051,7 @@ static const struct sdhci_ops tegra186_sdhci_ops = {
.set_uhs_signaling = tegra_sdhci_set_uhs_signaling,
.voltage_switch = tegra_sdhci_voltage_switch,
.get_max_clock = tegra_sdhci_get_max_clock,
+   .irq = sdhci_tegra_cqhci_irq,
 };
 
 static const struct sdhci_pltfm_data sdhci_tegra186_pdata = {
@@ -1030,6 +1093,54 @@ static const struct of_device_id sdhci_tegra_dt_match[] = {
 };
 MODULE_DEVICE_TABLE(of, sdhci_tegra_dt_match);
 
+static int sdhci_tegra_add_host(struct sdhci_host *host)
+{
+   struct sdhci_pltfm_host *pltfm_host = sdhci_priv(host);
+   struct sdhci_tegra *tegra_host = sdhci_pltfm_priv(pltfm_host);
+   struct cqhci_host *cq_host;
+   bool dma64;
+   int ret;
+
+   if (!tegra_host->enable_hwcq)
+   return sdhci_add_host(host);
+
+   host->v4_mode = true;
+
+   ret = sdhci_setup_host(host);
+   if (ret)
+  

Re: [PATCH] binder: create node flag to request sender's security context

2019-01-10 Thread kbuild test robot
Hi Todd,

I love your patch! Perhaps something to improve:

[auto build test WARNING on linus/master]
[also build test WARNING on v5.0-rc1 next-20190110]
[if your patch is applied to the wrong git tree, please drop us a note to help 
improve the system]

url:
https://github.com/0day-ci/linux/commits/Todd-Kjos/binder-create-node-flag-to-request-sender-s-security-context/20190111-095225
config: i386-randconfig-x009-201901 (attached as .config)
compiler: gcc-7 (Debian 7.3.0-1) 7.3.0
reproduce:
# save the attached .config to linux build tree
make ARCH=i386 

All warnings (new ones prefixed by >>):

   drivers/android/binder.c: In function 'binder_transaction':
>> drivers/android/binder.c:3067:21: warning: cast from pointer to integer of 
>> different size [-Wpointer-to-int-cast]
  t->security_ctx = (binder_uintptr_t)kptr +
^
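
For readers unfamiliar with this warning, here is a minimal user-space sketch of one common way to avoid it. It assumes the destination type is wider than a pointer on this configuration, as the warning suggests for i386. The name wide_uintptr_t is a hypothetical stand-in for binder_uintptr_t, and the helpers are illustrative only, not the fix that was eventually applied to binder.c.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical stand-in for binder_uintptr_t, assumed to be 64 bits wide. */
typedef uint64_t wide_uintptr_t;

/*
 * Convert a pointer to the wide integer type. Casting through uintptr_t
 * (which is pointer-sized) first and widening afterwards avoids the
 * "cast from pointer to integer of different size" warning.
 */
static wide_uintptr_t ptr_to_wide(const void *p)
{
    /* The wide type must be able to hold every pointer value. */
    assert(sizeof(wide_uintptr_t) >= sizeof(uintptr_t));
    return (wide_uintptr_t)(uintptr_t)p;
}

/* Add a byte offset to a pointer, yielding a wide integer address. */
static wide_uintptr_t ptr_plus_offset(const void *p, uint64_t off)
{
    return ptr_to_wide(p) + off;
}
```

On a 64-bit build both casts are no-ops; on a 32-bit build the intermediate uintptr_t makes the widening explicit.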

vim +3067 drivers/android/binder.c

  2761  
  2762  static void binder_transaction(struct binder_proc *proc,
  2763 struct binder_thread *thread,
  2764 struct binder_transaction_data *tr, int reply,
  2765 binder_size_t extra_buffers_size)
  2766  {
  2767  int ret;
  2768  struct binder_transaction *t;
  2769  struct binder_work *w;
  2770  struct binder_work *tcomplete;
  2771  binder_size_t *offp, *off_end, *off_start;
  2772  binder_size_t off_min;
  2773  u8 *sg_bufp, *sg_buf_end;
  2774  struct binder_proc *target_proc = NULL;
  2775  struct binder_thread *target_thread = NULL;
  2776  struct binder_node *target_node = NULL;
  2777  struct binder_transaction *in_reply_to = NULL;
  2778  struct binder_transaction_log_entry *e;
  2779  uint32_t return_error = 0;
  2780  uint32_t return_error_param = 0;
  2781  uint32_t return_error_line = 0;
  2782  struct binder_buffer_object *last_fixup_obj = NULL;
  2783  binder_size_t last_fixup_min_off = 0;
  2784  struct binder_context *context = proc->context;
  2785  int t_debug_id = atomic_inc_return(&binder_last_id);
  2786  char *secctx = NULL;
  2787  u32 secctx_sz = 0;
  2788  
  2789  e = binder_transaction_log_add(&binder_transaction_log);
  2790  e->debug_id = t_debug_id;
  2791  e->call_type = reply ? 2 : !!(tr->flags & TF_ONE_WAY);
  2792  e->from_proc = proc->pid;
  2793  e->from_thread = thread->pid;
  2794  e->target_handle = tr->target.handle;
  2795  e->data_size = tr->data_size;
  2796  e->offsets_size = tr->offsets_size;
  2797  e->context_name = proc->context->name;
  2798  
  2799  if (reply) {
  2800  binder_inner_proc_lock(proc);
  2801  in_reply_to = thread->transaction_stack;
  2802  if (in_reply_to == NULL) {
  2803  binder_inner_proc_unlock(proc);
  2804  binder_user_error("%d:%d got reply transaction with no transaction stack\n",
  2805proc->pid, thread->pid);
  2806  return_error = BR_FAILED_REPLY;
  2807  return_error_param = -EPROTO;
  2808  return_error_line = __LINE__;
  2809  goto err_empty_call_stack;
  2810  }
  2811  if (in_reply_to->to_thread != thread) {
  2812  spin_lock(&in_reply_to->lock);
  2813  binder_user_error("%d:%d got reply transaction with bad transaction stack, transaction %d has target %d:%d\n",
  2814  proc->pid, thread->pid, in_reply_to->debug_id,
  2815  in_reply_to->to_proc ?
  2816  in_reply_to->to_proc->pid : 0,
  2817  in_reply_to->to_thread ?
  2818  in_reply_to->to_thread->pid : 0);
  2819  spin_unlock(&in_reply_to->lock);
  2820  binder_inner_proc_unlock(proc);
  2821  return_error = BR_FAILED_REPLY;
  2822  return_error_param = -EPROTO;
  2823  return_error_line = __LINE__;
  2824  in_reply_to = NULL;
  2825  goto err_bad_call_stack;
  2826  }
  2827  thread->transaction_stack = in_reply_to->to_parent;
  2828  binder_inner_proc_unlock(proc);
  2829  binder_set_nice(in_reply_to->saved_priority);
  2830  target_thread = 
binder_get_txn_fr

[PATCH][V2][resend] tty: fix race between flush_to_ldisc and tty_open

2019-01-10 Thread Li RongQing
A race window remains after commit b027e2298bd588
("tty: fix data race between tty_init_dev and flush of buf"):
if a receive_buf call arrives before tty initialization completes
in n_tty_open, tty->driver_data may still be NULL.

CPU0                                    CPU1

n_tty_open
  tty_init_dev
    tty_ldisc_unlock
  schedule
                                        flush_to_ldisc
                                          n_tty_receive_buf
                                            uart_flush_chars
                                              uart_start
                                              /*tty->driver_data is NULL*/
  tty->ops->open
  /*init tty->driver_data*/

Extend the ldisc semaphore hold taken in tty_init_dev until driver_data
is fully initialized, i.e. until after tty->ops->open().
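
The ordering described above can be illustrated with a small single-threaded user-space model. This is only a sketch under stated assumptions: struct model_tty, model_flush() and model_open_with_racing_flush() are hypothetical names, the ldisc semaphore is reduced to a boolean, and the "racing" flush is simply placed at the point where CPU1 would run in the diagram.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical, simplified model of a tty; not the kernel's struct tty_struct. */
struct model_tty {
    bool ldisc_locked;   /* stands in for the ldisc semaphore */
    void *driver_data;   /* set by the driver's open() */
};

enum flush_result { FLUSH_DEFERRED, FLUSH_OK, FLUSH_NULL_DEREF };

/* Models flush_to_ldisc(): it reaches the driver only if the ldisc is unlocked. */
static enum flush_result model_flush(const struct model_tty *tty)
{
    assert(tty != NULL);
    if (tty->ldisc_locked)
        return FLUSH_DEFERRED;
    return tty->driver_data ? FLUSH_OK : FLUSH_NULL_DEREF;
}

/*
 * Models the open path with a flush arriving between tty_init_dev() and
 * tty->ops->open(). With unlock_early the ldisc is unlocked inside init
 * (old ordering); otherwise it is unlocked only after open (new ordering).
 */
static enum flush_result model_open_with_racing_flush(bool unlock_early)
{
    static int dummy_port;
    struct model_tty tty = { .ldisc_locked = true, .driver_data = NULL };
    enum flush_result seen;

    if (unlock_early)
        tty.ldisc_locked = false;   /* unlock inside tty_init_dev() */

    seen = model_flush(&tty);       /* CPU1 runs here */

    tty.driver_data = &dummy_port;  /* tty->ops->open() */
    tty.ldisc_locked = false;       /* unlock after open */

    return seen;
}
```

In this model the old ordering lets the flush observe a NULL driver_data, while the extended hold defers the flush until open has finished.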

Signed-off-by: Zhang Yu 
Signed-off-by: Li RongQing 
---
 drivers/staging/speakup/spk_ttyio.c |  1 +
 drivers/tty/pty.c   |  2 ++
 drivers/tty/serdev/serdev-ttyport.c |  2 ++
 drivers/tty/tty_io.c| 14 +++---
 drivers/tty/tty_ldisc.c |  1 +
 5 files changed, 17 insertions(+), 3 deletions(-)

diff --git a/drivers/staging/speakup/spk_ttyio.c b/drivers/staging/speakup/spk_ttyio.c
index 979e3ae249c1..c31f08c98383 100644
--- a/drivers/staging/speakup/spk_ttyio.c
+++ b/drivers/staging/speakup/spk_ttyio.c
@@ -155,6 +155,7 @@ static int spk_ttyio_initialise_ldisc(struct spk_synth *synth)
else
ret = -ENODEV;
 
+   tty_ldisc_unlock(tty);
if (ret) {
tty_unlock(tty);
return ret;
diff --git a/drivers/tty/pty.c b/drivers/tty/pty.c
index 00099a8439d2..1b9684d4f718 100644
--- a/drivers/tty/pty.c
+++ b/drivers/tty/pty.c
@@ -873,9 +873,11 @@ static int ptmx_open(struct inode *inode, struct file *filp)
 
tty_debug_hangup(tty, "opening (count=%d)\n", tty->count);
 
+   tty_ldisc_unlock(tty);
tty_unlock(tty);
return 0;
 err_release:
+   tty_ldisc_unlock(tty);
tty_unlock(tty);
// This will also put-ref the fsi
tty_release(inode, filp);
diff --git a/drivers/tty/serdev/serdev-ttyport.c b/drivers/tty/serdev/serdev-ttyport.c
index fa1672993b4c..ce16cb303e28 100644
--- a/drivers/tty/serdev/serdev-ttyport.c
+++ b/drivers/tty/serdev/serdev-ttyport.c
@@ -123,6 +123,7 @@ static int ttyport_open(struct serdev_controller *ctrl)
if (ret)
goto err_close;
 
+   tty_ldisc_unlock(tty);
tty_unlock(serport->tty);
 
/* Bring the UART into a known 8 bits no parity hw fc state */
@@ -145,6 +146,7 @@ static int ttyport_open(struct serdev_controller *ctrl)
 err_close:
tty->ops->close(tty, NULL);
 err_unlock:
+   tty_ldisc_unlock(tty);
tty_unlock(tty);
tty_release_struct(tty, serport->tty_idx);
 
diff --git a/drivers/tty/tty_io.c b/drivers/tty/tty_io.c
index 687250ec8032..199f45e2e1b1 100644
--- a/drivers/tty/tty_io.c
+++ b/drivers/tty/tty_io.c
@@ -1351,7 +1351,6 @@ struct tty_struct *tty_init_dev(struct tty_driver *driver, int idx)
retval = tty_ldisc_setup(tty, tty->link);
if (retval)
goto err_release_tty;
-   tty_ldisc_unlock(tty);
/* Return the tty locked so that it cannot vanish under the caller */
return tty;
 
@@ -1926,7 +1925,7 @@ EXPORT_SYMBOL_GPL(tty_kopen);
  *   - concurrent tty removal from driver table
  */
 static struct tty_struct *tty_open_by_driver(dev_t device, struct inode *inode,
-struct file *filp)
+struct file *filp, bool *unlock)
 {
struct tty_struct *tty;
struct tty_driver *driver = NULL;
@@ -1970,6 +1969,7 @@ static struct tty_struct *tty_open_by_driver(dev_t device, struct inode *inode,
}
} else { /* Returns with the tty_lock held for now */
tty = tty_init_dev(driver, index);
+   *unlock = true;
mutex_unlock(&tty_mutex);
}
 out:
@@ -2007,6 +2007,7 @@ static int tty_open(struct inode *inode, struct file *filp)
int noctty, retval;
dev_t device = inode->i_rdev;
unsigned saved_flags = filp->f_flags;
+   bool unlock = false;
 
nonseekable_open(inode, filp);
 
@@ -2017,7 +2018,7 @@ static int tty_open(struct inode *inode, struct file *filp)
 
tty = tty_open_current_tty(device, filp);
if (!tty)
-   tty = tty_open_by_driver(device, inode, filp);
+   tty = tty_open_by_driver(device, inode, filp, &unlock);
 
if (IS_ERR(tty)) {
tty_free_file(filp);
@@ -2042,6 +2043,10 @@ static int tty_open(struct inode *inode, struct file *filp)
if (retval) {
tty_debug_hangup(tty, "open error %d, releasing\n", retval);
 
+   if (unlock) {
+   unlock = false;
+   

Re: [PATCH 1/2] mm: introduce put_user_page*(), placeholder versions

2019-01-10 Thread John Hubbard
On 1/3/19 6:44 AM, Jerome Glisse wrote:
> On Thu, Jan 03, 2019 at 10:26:54AM +0100, Jan Kara wrote:
>> On Wed 02-01-19 20:55:33, Jerome Glisse wrote:
>>> On Wed, Dec 19, 2018 at 12:08:56PM +0100, Jan Kara wrote:
 On Tue 18-12-18 21:07:24, Jerome Glisse wrote:
> On Tue, Dec 18, 2018 at 03:29:34PM -0800, John Hubbard wrote:
>> OK, so let's take another look at Jerome's _mapcount idea all by itself 
>> (using
>> *only* the tracking pinned pages aspect), given that it is the lightest 
>> weight
>> solution for that.  
>>
>> So as I understand it, this would use page->_mapcount to store both the 
>> real
>> mapcount, and the dma pinned count (simply added together), but only do 
>> so for
>> file-backed (non-anonymous) pages:
>>
>>
>> __get_user_pages()
>> {
>>  ...
>>  get_page(page);
>>
>>  if (!PageAnon)
>>  atomic_inc(page->_mapcount);
>>  ...
>> }
>>
>> put_user_page(struct page *page)
>> {
>>  ...
>>  if (!PageAnon)
>>  atomic_dec(&page->_mapcount);
>>
>>  put_page(page);
>>  ...
>> }
>>
>> ...and then in the various consumers of the DMA pinned count, we use 
>> page_mapped(page)
>> to see if any mapcount remains, and if so, we treat it as DMA pinned. Is 
>> that what you 
>> had in mind?
>
> Mostly, with the extra two observations:
> [1] We only need to know the pin count when a write back kicks in
> [2] We need to protect GUP code with wait_for_write_back() in case
> GUP is racing with a write back that might not see the
> elevated mapcount in time.
>
> So for [2]
>
> __get_user_pages()
> {
> get_page(page);
>
> if (!PageAnon) {
> atomic_inc(page->_mapcount);
> +   if (PageWriteback(page)) {
> +   // Assume we are racing and current write back will not see
> +   // the elevated mapcount so wait for current write back and
> +   // force page fault
> +   wait_on_page_writeback(page);
> +   // force slow path that will fault again
> +   }
> }
> }

 This is not needed AFAICT. __get_user_pages() gets page reference (and it
 should also increment page->_mapcount) under PTE lock. So at that point we
 are sure we have writeable PTE nobody can change. So page_mkclean() has to
 block on PTE lock to make PTE read-only and only after going through all
 PTEs like this, it can check page->_mapcount. So the PTE lock provides
 enough synchronization.

> For [1] only needing pin count during write back turns page_mkclean into
> the perfect spot to check for that so:
>
> int page_mkclean(struct page *page)
> {
> int cleaned = 0;
> +   int real_mapcount = 0;
> struct address_space *mapping;
> struct rmap_walk_control rwc = {
> .arg = (void *)&cleaned,
> .rmap_one = page_mkclean_one,
> .invalid_vma = invalid_mkclean_vma,
> +   .mapcount = &real_mapcount,
> };
>
> BUG_ON(!PageLocked(page));
>
> if (!page_mapped(page))
> return 0;
>
> mapping = page_mapping(page);
> if (!mapping)
> return 0;
>
> // rmap_walk need to change to count mapping and return value
> // in .mapcount easy one
> rmap_walk(page, &rwc);
>
> // Big fat comment to explain what is going on
> +   if ((page_mapcount(page) - real_mapcount) > 0) {
> +   SetPageDMAPined(page);
> +   } else {
> +   ClearPageDMAPined(page);
> +   }

 This is the detail I'm not sure about: Why cannot rmap_walk_file() race
 with e.g. zap_pte_range() which decrements page->_mapcount and thus the
 check we do in page_mkclean() is wrong?
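
To make the counting argument in this exchange concrete, here is a toy user-space model of the proposed scheme, in which one counter holds both real mappings and DMA pins. It is only a sketch: struct model_page and the model_*() helpers are hypothetical and do not correspond to kernel functions; real_mapcount plays the role of the value the rmap walk would return.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical model of a file-backed page; not the kernel's struct page. */
struct model_page {
    int mapcount;   /* real mappings plus DMA pins, added together */
};

/* A PTE starts or stops mapping the page. */
static void model_map(struct model_page *p)   { p->mapcount++; }
static void model_unmap(struct model_page *p)
{
    assert(p->mapcount > 0);
    p->mapcount--;
}

/* Pin taken in __get_user_pages() and dropped in put_user_page(). */
static void model_gup_pin(struct model_page *p) { p->mapcount++; }
static void model_put_user_page(struct model_page *p)
{
    assert(p->mapcount > 0);
    p->mapcount--;
}

/*
 * Models the check proposed for page_mkclean(): real_mapcount is the number
 * of mappings counted by the rmap walk; any excess is attributed to pins.
 */
static bool model_is_dma_pinned(const struct model_page *p, int real_mapcount)
{
    return p->mapcount - real_mapcount > 0;
}
```

With two mappings and one pin, a walk that counts 2 reports the page as pinned. If one mapping is dropped after the walk counted it but before the comparison, the same pinned page has a total of 2 and the pin goes unnoticed, which is the kind of race the question above is asking about.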

>>>
>>> Ok so i found a solution for that. First GUP must wait for racing
>>> write back. If GUP sees a valid write-able PTE and the page has
>>> write back flag set then it must back off as if the PTE was not
>>> valid to force fault. It is just a race with page_mkclean and we
>>> want ordering between the two. Note this is not strictly needed
>>> so we can relax that but i believe this ordering is better to do
>>> in GUP rather than having each single user of GUP test for this
>>> to avoid the race.
>>>
>>> GUP increases mapcount only after checking that it is not racing
>>> with writeback; it also sets a page flag (SetPageDMAPined(page)).
>>>
>>> When clearing a write-able pte we set a special entry inside the
>>> page table (might need a new special swap type for this) and change
>>> page_mkclean_one() to clear those special entries to 0.
>>>
>>>
>>> Now page_mkclean:
>>>
>>> int page_mkclean(struct page *page)
>>> {
>>> int cleaned = 0;
>>> +   int real_mapcount = 0;
>>> struct address_space *mapping;
>>> struct 

[PATCH] regulator: max14577: Remove redundant MODULE_ALIAS

2019-01-10 Thread Axel Lin
The modalias is already set by MODULE_DEVICE_TABLE, so remove the redundant
MODULE_ALIAS.

Signed-off-by: Axel Lin 
---
 drivers/regulator/max14577-regulator.c | 1 -
 1 file changed, 1 deletion(-)

diff --git a/drivers/regulator/max14577-regulator.c b/drivers/regulator/max14577-regulator.c
index bc7f4751bf9c..85a88a9e4d42 100644
--- a/drivers/regulator/max14577-regulator.c
+++ b/drivers/regulator/max14577-regulator.c
@@ -324,4 +324,3 @@ module_exit(max14577_regulator_exit);
 MODULE_AUTHOR("Krzysztof Kozlowski ");
 MODULE_DESCRIPTION("Maxim 14577/77836 regulator driver");
 MODULE_LICENSE("GPL");
-MODULE_ALIAS("platform:max14577-regulator");
-- 
2.17.1



linux-next: manual merge of the akpm-current tree with the security tree

2019-01-10 Thread Stephen Rothwell
Hi all,

Today's linux-next merge of the akpm-current tree got a conflict in:

  security/selinux/ss/services.c

between commit:

  3d252529480c ("SELinux: Remove unused selinux_is_enabled")

from the security tree and commit:

  dd4c293d6d4d ("selinux: convert to kvmalloc")

from the akpm-current tree.

I fixed it up (see below) and can carry the fix as necessary. This
is now fixed as far as linux-next is concerned, but any non trivial
conflicts should be mentioned to your upstream maintainer when your tree
is submitted for merging.  You may also want to consider cooperating
with the maintainer of the conflicting tree to minimise any particularly
complex conflicts.

-- 
Cheers,
Stephen Rothwell

diff --cc security/selinux/ss/services.c
index d6e7b4856d93,fac47b306b6b..
--- a/security/selinux/ss/services.c
+++ b/security/selinux/ss/services.c
@@@ -49,7 -49,7 +49,6 @@@
  #include 
  #include 
  #include 
- #include 
 -#include 
  #include 
  #include 
  




Re: [PATCH 1/2] ARM: dts: imx7ulp: add sim node

2019-01-10 Thread Shawn Guo
On Fri, Dec 14, 2018 at 08:20:22AM +, Anson Huang wrote:
> i.MX7ULP SoC revision info is inside the SIM mode's JTAG_ID
> register, add sim node to support SoC revision check.
> 
> Signed-off-by: Anson Huang 
> ---
>  arch/arm/boot/dts/imx7ulp.dtsi | 13 +
>  1 file changed, 13 insertions(+)
> 
> diff --git a/arch/arm/boot/dts/imx7ulp.dtsi b/arch/arm/boot/dts/imx7ulp.dtsi
> index b86daf7..fca6e50 100644
> --- a/arch/arm/boot/dts/imx7ulp.dtsi
> +++ b/arch/arm/boot/dts/imx7ulp.dtsi
> @@ -347,4 +347,17 @@
>   gpio-ranges = < 0 96 32>;
>   };
>   };
> +
> + m4aips1: bus@4108 {
> + compatible = "simple-bus";
> + #address-cells = <1>;
> + #size-cells = <1>;
> + reg = <0x4108 0x8>;
> + ranges;
> +
> + sim: sim@410a3000 {
> + compatible = "fsl,imx7ulp-sim", "syscon";

The compatible needs to be documented. Also, we generally order a series
as follows:

 - Bindings patch
 - Kernel patch
 - DTS patch

Shawn

> + reg = <0x410a3000 0x1000>;
> + };
> + };
>  };
> -- 
> 2.7.4


[PATCH] kconfig: clean generated *conf-cfg files

2019-01-10 Thread Masahiro Yamada
I accidentally dropped '*' in the previous renaming patch.

Revive it so that 'make mrproper' can clean the generated files.

Fixes: d86271af6460 ("kconfig: rename generated .*conf-cfg to *conf-cfg")
Signed-off-by: Masahiro Yamada 
---

 scripts/kconfig/Makefile | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/scripts/kconfig/Makefile b/scripts/kconfig/Makefile
index c05ab00..1819735 100644
--- a/scripts/kconfig/Makefile
+++ b/scripts/kconfig/Makefile
@@ -206,4 +206,4 @@ filechk_conf_cfg = $(CONFIG_SHELL) $<
 $(obj)/%conf-cfg: $(src)/%conf-cfg.sh FORCE
$(call filechk,conf_cfg)
 
-clean-files += conf-cfg
+clean-files += *conf-cfg
-- 
2.7.4



linux-next: manual merge of the akpm-current tree with Linus' tree

2019-01-10 Thread Stephen Rothwell
Hi all,

Today's linux-next merge of the akpm-current tree got a conflict in:

  mm/rmap.c

between commit:

  ba422731316d ("mm/mmu_notifier: mm/rmap.c: Fix a mmu_notifier range bug in try_to_unmap_one")

from Linus' tree and commit:

  f955d5dda846 ("mm/mmu_notifier: contextual information for event triggering invalidation v2")

from the akpm-current tree.

I fixed it up (see below) and can carry the fix as necessary. This
is now fixed as far as linux-next is concerned, but any non trivial
conflicts should be mentioned to your upstream maintainer when your tree
is submitted for merging.  You may also want to consider cooperating
with the maintainer of the conflicting tree to minimise any particularly
complex conflicts.

-- 
Cheers,
Stephen Rothwell

diff --cc mm/rmap.c
index 0454ecc29537,62e47f3462cf..
--- a/mm/rmap.c
+++ b/mm/rmap.c
@@@ -1371,9 -1372,10 +1372,10 @@@ static bool try_to_unmap_one(struct pag
 * Note that the page can not be free in this function as call of
 * try_to_unmap() must hold a reference on the page.
 */
 -  mmu_notifier_range_init(&range, vma->vm_mm, vma->vm_start,
 -  min(vma->vm_end, vma->vm_start +
 +  mmu_notifier_range_init(&range, vma->vm_mm, address,
 +  min(vma->vm_end, address +
-   (PAGE_SIZE << compound_order(page))));
+   (PAGE_SIZE << compound_order(page))),
+   MMU_NOTIFY_CLEAR);
if (PageHuge(page)) {
/*
 * If sharing is possible, start and end will be adjusted




Re: Interpreting /sys/block//{,}/discard_alignment

2019-01-10 Thread james harvey
On Thu, Jan 10, 2019 at 7:04 PM Martin K. Petersen
 wrote:
> James,
>
> > Q1 - I'm hoping you can clarify how this should be interpreted.
> >
> > I originally took this to mean the number of bytes into the first
> > discard_granularity block that the partition resides at.  i.e.  If
> > discard_granularity_block is 128MB, and partition 1 starts at sector
> > 2048 with 512 byte sectors, that this should return 2048*512=1048576
> > (1MB.)
>
> The alignment offset is the offset for the given block device. It
> doesn't matter whether the block device in question is a partition, DM
> device or a full device. A block device is a block device.
>
> The common alignment scenario is 3584 on a device with 4K physical
> blocks. That's because of the 63-sector legacy FAT partition table
> offset. Which essentially means that the first LBA is misaligned and the
> first aligned LBA is 7.

If I can double check I'm understanding you correctly, if:

* Block device "A" has 512 byte sectors
* A has a partition table with partition A1 starting at sector 2048
(1048576 bytes)
* A and A1 have discard_granularity of 128MB (134217728 bytes)
* A has discard_alignment of 0

Then A1 should have a discard_alignment of 1048576, not 133169152
(128MB - 512 bytes/sector * 2048 sectors)?
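
For reference, the two candidate values in the question are the two natural readings of "alignment": the offset of the start position into its granule, and the distance from the start position to the next granule boundary. A minimal sketch of the arithmetic follows; the function names are hypothetical, and the sketch does not claim which reading the kernel reports in sysfs.

```c
#include <assert.h>
#include <stdint.h>

/* Byte offset of a start position into its granule: start mod granularity. */
static uint64_t offset_into_granule(uint64_t start_bytes, uint64_t granularity)
{
    assert(granularity != 0);
    return start_bytes % granularity;
}

/* Bytes from a start position to the next granule boundary (0 if aligned). */
static uint64_t bytes_to_next_boundary(uint64_t start_bytes, uint64_t granularity)
{
    assert(granularity != 0);
    return (granularity - start_bytes % granularity) % granularity;
}
```

For a partition at sector 2048 with 512-byte sectors and a 128MB granularity, the first function gives 1048576 and the second gives 133169152. The same modulus applied to the legacy case quoted above, 63 sectors of 512 bytes against 4K physical blocks, gives 3584, which is 7 sectors.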

> Many of the first 512e drives shipped with that intentional misalignment
> as default. And you could switch it to 0-aligned via a jumper. These
> days all drives are 0-aligned.
>
> > Q2 - At https://lkml.org/lkml/2018/12/5/1693 --- I saw you recently
> > said "... there are not many devices that actually report a non-zero
> > discard alignment..."  Does this mean that every filesystem needs to
> > look at the partition table to determine its correct value on its own,
> > rather than using discard_alignment?
>
> No, it needs to look at the device topology for the block device it is
> on. I don't believe we ever wired up an ioctl for the discard alignment
> so you'll have to find your device in sysfs. There's an alignment ioctl
> for the "regular" block alignment, though.

Ahh, good.  I took that the wrong way, originally worried you were
saying the value of discard_alignment couldn't be trusted.


Re: [PATCH] soc: fsl: guts: us devm_kstrdup_const() for RO data

2019-01-10 Thread Nicholas Mc Guire
On Thu, Jan 10, 2019 at 01:43:01PM -0600, Li Yang wrote:
> On Sat, Dec 22, 2018 at 2:02 AM Nicholas Mc Guire  wrote:
> >
> > On Fri, Dec 21, 2018 at 08:29:56PM -0600, Scott Wood wrote:
> > > On Fri, 2018-12-07 at 09:22 +0100, Nicholas Mc Guire wrote:
> > > > devm_kstrdup() may return NULL if internal allocation failed, but
> > > > as  machine  is from the device tree, and thus RO, devm_kstrdup_const()
> > > > can be used here, which will only copy the reference.
> > >
> > > Is it really going to only copy the reference?  That would require that
> > > is_kernel_rodata(machine) be true, which it shouldn't be since it's not 
> > > part
> > > of the kernel image.
> > >
> > I had tried to figure out what is RO and what not but was not
> > able to determine that - from the discussion it seemed that the
> > assumption of RO is correct though I did not ask if it would
> > satisfy is_kernel_rodata() so that explains the incorrect assertion.
> > see https://lkml.org/lkml/2018/12/6/42
> > So then the only option is to check the return and cleanup
> > on allocation failure as the original patch proposed.
> 
> Thanks for the good discussion. I will drop the previous patch. But
> would it also be good to just have "soc_dev_attr.machine = machine"
> directly?
>
I think the intent is to switch to the managed devm API so that the
cleanup is handled properly. Currently you would get "machine" from
 of_property_read_string_index
  -> of_property_read_string_helper
   -> of_find_property
which does not do any allocation, so there would actually be nothing
to clean up here. I don't see why your solution would not be suitable
given the current API. The only advantage of devm_kstrdup() is that
internal changes to the underlying APIs would have no effect.

thx!
hofrat
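
As background to the is_kernel_rodata() point quoted above, here is a user-space sketch of the "copy only the reference if the string is read-only image data, otherwise duplicate it" behaviour. All names are hypothetical: model_rodata stands in for the kernel image's rodata section, and model_strdup_const()/model_free_const() only mimic the idea behind the kernel helpers, not their implementation.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical stand-in for the kernel image's read-only data section. */
static const char model_rodata[] = "built-in-machine-name";

/* Models is_kernel_rodata(): true only for addresses inside model_rodata. */
static bool model_is_rodata(const char *s)
{
    uintptr_t a = (uintptr_t)s;
    uintptr_t lo = (uintptr_t)model_rodata;

    return a >= lo && a < lo + sizeof(model_rodata);
}

/* Return s itself if it lives in "rodata", otherwise a heap copy (or NULL). */
static const char *model_strdup_const(const char *s)
{
    size_t len;
    char *copy;

    assert(s != NULL);
    if (model_is_rodata(s))
        return s;

    len = strlen(s) + 1;
    copy = malloc(len);
    if (copy)
        memcpy(copy, s, len);
    return copy;
}

/* Free only what model_strdup_const() actually allocated. */
static void model_free_const(const char *s)
{
    if (s && !model_is_rodata(s))
        free((void *)s);
}
```

A string that is not part of the image, such as one read from the device tree, takes the allocating path in this model, so the NULL check is still needed there.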


[PATCH] x86/fpu/init: Add __setup() functions back to fpu/init.c

2019-01-10 Thread Haoyu Tang
The __setup() functions were removed in:

  commit 4f81cbafcce2 ("x86/fpu: Fix early FPU command-line parsing")

As a result, the FPU parameters are passed to init as arguments. Dummy
__setup() functions avoid this.

Signed-off-by: Haoyu Tang 
---
 arch/x86/kernel/fpu/init.c | 36 
 1 file changed, 36 insertions(+)

diff --git a/arch/x86/kernel/fpu/init.c b/arch/x86/kernel/fpu/init.c
index 6abd835..df325d0 100644
--- a/arch/x86/kernel/fpu/init.c
+++ b/arch/x86/kernel/fpu/init.c
@@ -301,3 +301,39 @@ void __init fpu__init_system(struct cpuinfo_x86 *c)
 
fpu__init_system_ctx_switch();
 }
+
+/* The FPU parameters are already parsed early, before parse_early_param(),
+ * but __setup() handlers are still mandatory so that the parameters are
+ * not passed on to init as arguments. These handlers do not need to set
+ * any registers again; they only indicate to parse_args() that the
+ * parameter has been consumed.
+ */
+static int __init no_387(char *s)
+{
+   return 1;
+}
+__setup("no387", no_387);
+
+static int __init x86_noxsave_setup(char *s)
+{
+   return 1;
+}
+__setup("noxsave", x86_noxsave_setup);
+
+static int __init x86_noxsaveopt_setup(char *s)
+{
+   return 1;
+}
+__setup("noxsaveopt", x86_noxsaveopt_setup);
+
+static int __init x86_noxsaves_setup(char *s)
+{
+   return 1;
+}
+__setup("noxsaves", x86_noxsaves_setup);
+
+static int __init x86_nofxsr_setup(char *s)
+{
+   return 1;
+}
+__setup("nofxsr", x86_nofxsr_setup);
-- 
1.9.1



Re: [PATCHv5] x86/kdump: bugfix, make the behavior of crashkernel=X consistent with kaslr

2019-01-10 Thread Pingfan Liu
On Wed, Jan 9, 2019 at 10:25 PM Baoquan He  wrote:
>
> On 01/08/19 at 05:48pm, Mike Rapoport wrote:
> > On Tue, Jan 08, 2019 at 05:01:38PM +0800, Baoquan He wrote:
> > > Hi Mike,
> > >
> > > On 01/08/19 at 10:05am, Mike Rapoport wrote:
> > > > I'm not thrilled by duplicating this code (yet again).
> > > > I liked the v3 of this patch [1] more, assuming we allow bottom-up mode 
> > > > to
> > > > allocate [0, kernel_start) unconditionally.
> > > > I'd just replace you first patch in v3 [2] with something like:
> > >
> > > In initmem_init(), we will restore the top-down allocation style anyway.
> > > While reserve_crashkernel() is called after initmem_init(), it's not
> > > appropriate to adjust memblock_find_in_range_node(), and we really want
> > > to find region bottom up for crashkernel reservation, no matter where
> > > kernel is loaded, better call __memblock_find_range_bottom_up().
> > >
> > > Create a wrapper to do the necessary handling, then call
> > > __memblock_find_range_bottom_up() directly, looks better.
> >
> > What bothers me is 'the necessary handling' which is already done in
> > several places in memblock in a similar, but yet slightly different way.
>
> The page aligning for start and the mirror flag setting, I suppose.
> >
> > memblock_find_in_range() and memblock_phys_alloc_nid() retry with different
> > MEMBLOCK_MIRROR, but memblock_phys_alloc_try_nid() does that only when
> > allocating from the specified node and does not retry when it falls back to
> > any node. And memblock_alloc_internal() has yet another set of fallbacks.
>
> Get what you mean, seems they are trying to allocate within mirrorred
> memory region, if fail, try the non-mirrorred region. If kernel data
> allocation failed, no need to care about if it's movable or not, it need
> to live firstly. For the bottom-up allocation wrapper, maybe we need do
> like this too?
>
> >
> > So what should be the necessary handling in the wrapper for
> > __memblock_find_range_bottom_up() ?
> >
> > BTW, even without any memblock modifications, retrying allocation in
> > reserve_crashkerenel() for different ranges, like the proposal at [1] would
> > also work, wouldn't it?
>
> Yes, it also looks good. This patch only calls once, seems a simpler
> line adding.
>
> In fact, below one and this patch, both is fine to me, as long as it
> fixes the problem customers are complaining about.
>
It seems that opinions diverge here. It may be easier to fix this bug
with dyoung's patch, so I will repost his patch.
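
For context, the "retry the allocation for different ranges" idea quoted above can be sketched as a loop over candidate upper limits, using a toy free-region list. Everything here is hypothetical: struct model_region, the limit values a caller passes in, and both functions are illustration only and do not reflect the memblock API or the referenced patch. The sketch assumes the regions are sorted by address and that address 0 is never a valid result.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical free region [start, end) in a toy memory map. */
struct model_region {
    uint64_t start;
    uint64_t end;
};

/* Bottom-up search for size bytes inside [lo, hi); returns 0 if nothing fits. */
static uint64_t model_find_bottom_up(const struct model_region *free_list,
                                     size_t n, uint64_t lo, uint64_t hi,
                                     uint64_t size)
{
    size_t i;

    assert(size > 0);
    for (i = 0; i < n; i++) {
        uint64_t s = free_list[i].start > lo ? free_list[i].start : lo;
        uint64_t e = free_list[i].end < hi ? free_list[i].end : hi;

        if (s < e && e - s >= size)
            return s;
    }
    return 0;
}

/* Try each upper limit in turn and return the first successful placement. */
static uint64_t model_reserve_with_retry(const struct model_region *free_list,
                                         size_t n, const uint64_t *limits,
                                         size_t nlimits, uint64_t size)
{
    size_t i;

    for (i = 0; i < nlimits; i++) {
        uint64_t addr = model_find_bottom_up(free_list, n, 0, limits[i], size);

        if (addr)
            return addr;
    }
    return 0;
}
```

The caller keeps the policy (which limits, in which order), so no change to the underlying finder is needed in this model.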

Thanks and regards,
Pingfan
> >
> > [1] http://lists.infradead.org/pipermail/kexec/2017-October/019571.html
>
> Thanks
> Baoquan


Re: [PATCH v7 0/3] Fixes for Tegra soctherm

2019-01-10 Thread Wei Ni
Hi Eduardo,
Do you have any more comments? Will you take this series?

Thanks.
Wei.

On 3/1/2019 6:12 PM, Wei Ni wrote:
> This series fixed some issues for Tegra soctherm,
> and add get_trend().
> 
> Main changes from v6:
> 1. Per Eduardo's comment, we can remove the change:
> "thermal: tegra: parse sensor id before sensor register"
> 
> Main changes from v5:
> 1. Move the get_trend() patch https://lkml.org/lkml/2018/11/20/643
> into this serial.
> 
> Main changes from v4:
> 1. fixed for the parsing sensor id.
> 2. keep warning for missing critical trips.
> 
> Main changes from v3:
> 1. updated codes for parsing sensor id, per Thierry's comments
> 
> Main changes from v2:
> 1. add codes to parse sensor id to avoid registration
> failure.
> 
> Main changes from v1:
> 1. Acked by Thierry Reding  for the patch
> "thermal: tegra: fix memory allocation".
> 2. Print out the sensor name when register failed.
> 2. Remove patch "thermal: tegra: fix coverity defect"
> 
> Wei Ni (3):
>   thermal: tegra: remove unnecessary warnings
>   thermal: tegra: fix memory allocation
>   thermal: tegra: add get_trend ops
> 
>  drivers/thermal/tegra/soctherm.c | 40 
> +---
>  1 file changed, 37 insertions(+), 3 deletions(-)
> 


[PATCH] block/blk-core.c: Remove doc about request_count argument

2019-01-10 Thread Marcos Paulo de Souza
This argument was removed in 5f0ed774ed29.

Signed-off-by: Marcos Paulo de Souza 
---
 block/blk-core.c | 1 -
 1 file changed, 1 deletion(-)

diff --git a/block/blk-core.c b/block/blk-core.c
index c78042975737..eba494f528cb 100644
--- a/block/blk-core.c
+++ b/block/blk-core.c
@@ -661,7 +661,6 @@ bool bio_attempt_discard_merge(struct request_queue *q, struct request *req,
  * blk_attempt_plug_merge - try to merge with %current's plugged list
  * @q: request_queue new bio is being queued at
  * @bio: new bio being queued
- * @request_count: out parameter for number of traversed plugged requests
  * @same_queue_rq: pointer to &struct request that gets filled in when
  * another request associated with @q is found on the plug list
  * (optional, may be %NULL)
-- 
2.16.4



Re: [PATCHv5] x86/kdump: bugfix, make the behavior of crashkernel=X consistent with kaslr

2019-01-10 Thread Pingfan Liu
On Thu, Jan 10, 2019 at 3:57 PM Mike Rapoport  wrote:
>
> Hi Pingfan,
>
> On Wed, Jan 09, 2019 at 09:02:41PM +0800, Pingfan Liu wrote:
> > On Tue, Jan 8, 2019 at 11:49 PM Mike Rapoport  wrote:
> > >
> > > On Tue, Jan 08, 2019 at 05:01:38PM +0800, Baoquan He wrote:
> > > > Hi Mike,
> > > >
> > > > On 01/08/19 at 10:05am, Mike Rapoport wrote:
> > > > > I'm not thrilled by duplicating this code (yet again).
> > > > > I liked the v3 of this patch [1] more, assuming we allow bottom-up 
> > > > > mode to
> > > > > allocate [0, kernel_start) unconditionally.
> > > > > I'd just replace you first patch in v3 [2] with something like:
> > > >
> > > > In initmem_init(), we will restore the top-down allocation style anyway.
> > > > While reserve_crashkernel() is called after initmem_init(), it's not
> > > > appropriate to adjust memblock_find_in_range_node(), and we really want
> > > > to find region bottom up for crashkernel reservation, no matter where
> > > > kernel is loaded, better call __memblock_find_range_bottom_up().
> > > >
> > > > Create a wrapper to do the necessary handling, then call
> > > > __memblock_find_range_bottom_up() directly, looks better.
> > >
> > > What bothers me is 'the necessary handling' which is already done in
> > > several places in memblock in a similar, but yet slightly different way.
> > >
> > > memblock_find_in_range() and memblock_phys_alloc_nid() retry with 
> > > different
> > > MEMBLOCK_MIRROR, but memblock_phys_alloc_try_nid() does that only when
> > > allocating from the specified node and does not retry when it falls back 
> > > to
> > > any node. And memblock_alloc_internal() has yet another set of fallbacks.
> > >
> > > So what should be the necessary handling in the wrapper for
> > > __memblock_find_range_bottom_up() ?
> > >
> > Well, it is a hard choice.
> > > BTW, even without any memblock modifications, retrying allocation in
> > > reserve_crashkernel() for different ranges, like the proposal at [1]
> > > would
> > > also work, wouldn't it?
> > >
> > Yes, it can work. Then is it worth exposing the bottom-up allocation
> > style for anything besides the hot-movable case?
>
> Some architectures use bottom-up as a "compatibility" mode with bootmem.
> And, I believe, powerpc and s390 use bottom-up to make some of the
> allocations close to the kernel.
>
Ok, got it. Thanks.

Best regards,
Pingfan

> > Thanks,
> > Pingfan
> > > [1] http://lists.infradead.org/pipermail/kexec/2017-October/019571.html
> > >
> > > > Thanks
> > > > Baoquan
> > > >
> > > > >
> > > > > diff --git a/mm/memblock.c b/mm/memblock.c
> > > > > index 7df468c..d1b30b9 100644
> > > > > --- a/mm/memblock.c
> > > > > +++ b/mm/memblock.c
> > > > > @@ -274,24 +274,14 @@ phys_addr_t __init_memblock 
> > > > > memblock_find_in_range_node(phys_addr_t size,
> > > > >  * try bottom-up allocation only when bottom-up mode
> > > > >  * is set and @end is above the kernel image.
> > > > >  */
> > > > > -   if (memblock_bottom_up() && end > kernel_end) {
> > > > > -   phys_addr_t bottom_up_start;
> > > > > -
> > > > > -   /* make sure we will allocate above the kernel */
> > > > > -   bottom_up_start = max(start, kernel_end);
> > > > > -
> > > > > +   if (memblock_bottom_up()) {
> > > > > /* ok, try bottom-up allocation first */
> > > > > -   ret = __memblock_find_range_bottom_up(bottom_up_start, 
> > > > > end,
> > > > > +   ret = __memblock_find_range_bottom_up(start, end,
> > > > >   size, align, nid, 
> > > > > flags);
> > > > > if (ret)
> > > > > return ret;
> > > > >
> > > > > /*
> > > > > -* we always limit bottom-up allocation above the kernel,
> > > > > -* but top-down allocation doesn't have the limit, so
> > > > > -* retrying top-down allocation may succeed when bottom-up
> > > > > -* allocation failed.
> > > > > -*
> > > > >  * bottom-up allocation is expected to be fail very 
> > > > > rarely,
> > > > >  * so we use WARN_ONCE() here to see the stack trace if
> > > > >  * fail happens.
> > > > >
> > > > > [1] 
> > > > > https://lore.kernel.org/lkml/1545966002-3075-3-git-send-email-kernelf...@gmail.com/
> > > > > [2] 
> > > > > https://lore.kernel.org/lkml/1545966002-3075-2-git-send-email-kernelf...@gmail.com/
> > > > >
> > > > > > +
> > > > > > + return ret;
> > > > > > +}
> > > > > > +
> > > > > >  /**
> > > > > >   * __memblock_find_range_top_down - find free area utility, in 
> > > > > > top-down
> > > > > >   * @start: start of candidate range
> > > > > > --
> > > > > > 2.7.4
> > > > > >
> > > > >
> > > > > --
> > > > > Sincerely yours,
> > > > > Mike.
> > > > >
> > > >
> > >
> > > --
> > > Sincerely yours,
> > > Mike.
> > >
> >
>
> --
> Sincerely yours,
> Mike.
>
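
The difference between the existing bottom-up search (restricted to memory
above the kernel image) and the unconditional variant proposed in the quoted
diff can be sketched in userspace C. This is a minimal model, not memblock
code: the free-range table, the addresses, and the names find_bottom_up(),
find_top_down() and find_in_range() are all assumptions made for
illustration. Like memblock, the model returns 0 when nothing fits.

```c
#include <stddef.h>
#include <stdint.h>

typedef uint64_t phys_addr_t;

struct range {
	phys_addr_t base;
	phys_addr_t size;
};

/* Hypothetical free-memory map, sorted by base. Not real memblock data. */
static const struct range free_ranges[] = {
	{ 0x1000,  0x3000 },	/* [0x1000, 0x4000), below the kernel */
	{ 0x10000, 0x8000 },	/* [0x10000, 0x18000) */
};
#define NR_RANGES (sizeof(free_ranges) / sizeof(free_ranges[0]))

/* Lowest aligned fit in [start, end), or 0. align must be a power of two. */
static phys_addr_t find_bottom_up(phys_addr_t start, phys_addr_t end,
				  phys_addr_t size, phys_addr_t align)
{
	for (size_t i = 0; i < NR_RANGES; i++) {
		phys_addr_t lo = free_ranges[i].base;
		phys_addr_t hi = lo + free_ranges[i].size;

		if (lo < start)
			lo = start;
		if (hi > end)
			hi = end;
		if (lo >= hi)
			continue;

		phys_addr_t cand = (lo + align - 1) & ~(align - 1);
		if (cand < hi && hi - cand >= size)
			return cand;
	}
	return 0;
}

/* Highest aligned fit in [start, end), or 0. */
static phys_addr_t find_top_down(phys_addr_t start, phys_addr_t end,
				 phys_addr_t size, phys_addr_t align)
{
	for (size_t i = NR_RANGES; i-- > 0; ) {
		phys_addr_t lo = free_ranges[i].base;
		phys_addr_t hi = lo + free_ranges[i].size;

		if (lo < start)
			lo = start;
		if (hi > end)
			hi = end;
		if (lo >= hi || hi - lo < size)
			continue;

		phys_addr_t cand = (hi - size) & ~(align - 1);
		if (cand >= lo)
			return cand;
	}
	return 0;
}

/*
 * Bottom-up first, top-down as fallback. With above_kernel_only set the
 * bottom-up pass starts at max(start, kernel_end), as in the current code;
 * with it clear the pass starts at start, as in the proposed diff.
 */
static phys_addr_t find_in_range(int above_kernel_only, phys_addr_t kernel_end,
				 phys_addr_t start, phys_addr_t end,
				 phys_addr_t size, phys_addr_t align)
{
	phys_addr_t lo = start;

	if (above_kernel_only && kernel_end > lo)
		lo = kernel_end;

	phys_addr_t ret = find_bottom_up(lo, end, size, align);
	if (ret)
		return ret;
	return find_top_down(start, end, size, align);
}
```

With a hypothetical kernel_end of 0x12000, the restricted search returns
0x12000 for a 0x2000-byte request, while the unconditional one returns
0x1000, i.e. the region [0, kernel_start) becomes usable.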


[BUG] net: huawei: hinic: a possible sleep-in-atomic-context bug in msg_to_mgmt_async

2019-01-10 Thread Jia-Ju Bai

The driver may sleep in an interrupt handler.
The function call path (from bottom to top) in the directory 
"drivers/net/ethernet/huawei/hinic/" in Linux-4.17 is:


[FUNC] down
hinic_hw_mgmt.c, 324: down in msg_to_mgmt_async
hinic_hw_mgmt.c, 408: msg_to_mgmt_async in mgmt_recv_msg_handler
hinic_hw_mgmt.c, 464: mgmt_recv_msg_handler in recv_mgmt_msg_handler
hinic_hw_mgmt.c, 484: recv_mgmt_msg_handler in mgmt_msg_aeqe_handler
hinic_hw_eqs.c, 264: [FUNC_PTR]mgmt_msg_aeqe_handler in aeq_irq_handler
hinic_hw_eqs.c, 355: aeq_irq_handler in eq_irq_handler
hinic_hw_eqs.c, 383: eq_irq_handler in ceq_tasklet

Note that [FUNC_PTR] means a function pointer call.

This bug is found by my static analysis tool (DSAC-2) and checked by my
manual code review.

I do not know how to correctly fix this bug, so I just report it.
A possible way may be to replace up() and down() with spin_lock() and 
spin_unlock().



Best wishes,
Jia-Ju Bai
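
The pattern being reported can be modelled in a few lines of userspace C.
This is only a sketch of the idea: atomic_depth, sleeping_down() and the
*_model() functions are hypothetical stand-ins, not hinic or kernel code.
The counter plays the role of "we are in a tasklet, IRQ handler or
spinlock section", and sleeping_down() stands for a primitive such as
down() that may sleep.

```c
#include <stdbool.h>

static int atomic_depth;	/* > 0 models tasklet/IRQ/spinlock context */
static int sleep_violations;	/* how often a sleeping call ran in it */

static void enter_atomic(void) { atomic_depth++; }
static void leave_atomic(void) { atomic_depth--; }

/* Model of a primitive that may sleep, such as down() on a semaphore. */
static void sleeping_down(void)
{
	if (atomic_depth > 0)
		sleep_violations++;	/* the kernel would complain here */
}

/* Stand-in for msg_to_mgmt_async(): takes the semaphore unconditionally. */
static void msg_to_mgmt_async_model(void)
{
	sleeping_down();
}

/* Called from process context: sleeping is fine. */
static void process_context_model(void)
{
	msg_to_mgmt_async_model();
}

/* Called from a tasklet: the same call chain is now a bug. */
static void tasklet_model(void)
{
	enter_atomic();
	msg_to_mgmt_async_model();
	leave_atomic();
}
```

The same function is harmless from process_context_model() and a
violation from tasklet_model(), which is why the call path matters more
than the call site itself.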



[BUG] net: huawei: hinic: a possible sleep-in-atomic-context bug in hinic_get_stats64

2019-01-10 Thread Jia-Ju Bai

The driver may sleep while holding a RCU lock.
The function call path (from bottom to top) in Linux-4.17 is:

[FUNC] down
drivers/net/.../hinic/hinic_main.c, 775: down in hinic_get_stats64
net/core/dev.c, 8278: [FUNC_PTR]hinic_get_stats64 in dev_get_stats
net/core/net-sysfs.c, 568: dev_get_stats in netstat_show
net/core/net-sysfs.c, 565: _raw_read_lock in netstat_show

Note that [FUNC_PTR] means a function pointer call.

This bug is found by my static analysis tool (DSAC-2) and checked by my
manual code review.

I do not know how to correctly fix this bug, so I just report it.
A possible way may be to replace up() and down()
with spin_lock() and spin_unlock().


Best wishes,
Jia-Ju Bai


Re: [PATCH] mm/mincore: allow for making sys_mincore() privileged

2019-01-10 Thread Linus Torvalds
On Thu, Jan 10, 2019 at 6:03 PM Dave Chinner  wrote:
>
> On Thu, Jan 10, 2019 at 02:11:01PM -0800, Linus Torvalds wrote:
> > And we *can* do sane things about RWF_NOWAIT. For example, we could
> > start async IO on RWF_NOWAIT, and suddenly it would go from "probe the
> > page cache" to "probe and fill", and be much harder to use as an
> > attack vector..
>
> We can only do that if the application submits the read via AIO and
> has an async IO completion reporting mechanism.

Oh, no, you misunderstand.

RWF_NOWAIT has a lot of situations where it will potentially return
early (the DAX and direct IO ones have their own), but I was thinking
of the one in generic_file_buffered_read(), which triggers when you
don't find a page mapping. That looks like the obvious "probe page
cache" case.

But we could literally move that test down just a few lines. Let it
start read-ahead.

.. and then it will actually trigger on the *second* case instead, where we have

if (!PageUptodate(page)) {
if (iocb->ki_flags & IOCB_NOWAIT) {
put_page(page);
goto would_block;
}

and that's where RWF_NOWAIT would act.

It would still return EAGAIN.

But it would have started filling the page cache. So now the act of
probing would fill the page cache, and the attacker would be left high
and dry - the fact that the page cache now exists is because of the
attack, not because of whatever it was trying to measure.

See?

But obviously this kind of change only matters if we also have
mincore() not returning the probe data. mincore() obviously can't do
the same kind of read-ahead to defeat things.

  Linus
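
The reordering described above ("probe" versus "probe and fill") can be
sketched with a toy page cache. This is a userspace model under stated
assumptions, not generic_file_buffered_read(): the cache array, the page
states, and the names nowait_read(), start_readahead() and complete_io()
are all hypothetical.

```c
#include <errno.h>
#include <stdbool.h>

enum page_state { PAGE_ABSENT, PAGE_READING, PAGE_UPTODATE };

#define NR_PAGES 4
static enum page_state cache[NR_PAGES];	/* all PAGE_ABSENT initially */

/* Kick off IO for a missing page without waiting for it. */
static void start_readahead(int idx)
{
	if (cache[idx] == PAGE_ABSENT)
		cache[idx] = PAGE_READING;
}

/* Model of IO completion happening some time later. */
static void complete_io(void)
{
	for (int i = 0; i < NR_PAGES; i++)
		if (cache[i] == PAGE_READING)
			cache[i] = PAGE_UPTODATE;
}

/* Returns 0 if the data is available right now, -EAGAIN otherwise. */
static int nowait_read(int idx, bool fill_on_probe)
{
	if (cache[idx] == PAGE_ABSENT) {
		if (!fill_on_probe)
			return -EAGAIN;		/* pure probe: cache untouched */
		start_readahead(idx);		/* proposed: start IO first */
	}
	if (cache[idx] != PAGE_UPTODATE)
		return -EAGAIN;			/* the second, later check */
	return 0;
}
```

In both modes the caller gets -EAGAIN for a cold page. The difference is
the side effect: with fill_on_probe set, the probe itself changes the
cache state, so a later probe no longer reflects what other users did.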


Re: [PATCH] drivers/md.c: Make bio_alloc_mddev return bio_alloc_bioset

2019-01-10 Thread Marcos Paulo de Souza
ping?

On Sat, Dec 22, 2018 at 08:08:45AM -0200, Marcos Paulo de Souza wrote:
> bio_alloc_bioset returns a bio pointer or NULL, so we can avoid storing
> the returned data into a new variable.
> 
> Signed-off-by: Marcos Paulo de Souza 
> ---
>  drivers/md/md.c | 7 +--
>  1 file changed, 1 insertion(+), 6 deletions(-)
> 
> diff --git a/drivers/md/md.c b/drivers/md/md.c
> index fc488cb30a94..42e018f014cb 100644
> --- a/drivers/md/md.c
> +++ b/drivers/md/md.c
> @@ -207,15 +207,10 @@ static bool create_on_open = true;
>  struct bio *bio_alloc_mddev(gfp_t gfp_mask, int nr_iovecs,
>   struct mddev *mddev)
>  {
> - struct bio *b;
> -
>   if (!mddev || !bioset_initialized(&mddev->bio_set))
>   return bio_alloc(gfp_mask, nr_iovecs);
>  
> - b = bio_alloc_bioset(gfp_mask, nr_iovecs, &mddev->bio_set);
> - if (!b)
> - return NULL;
> - return b;
> + return bio_alloc_bioset(gfp_mask, nr_iovecs, &mddev->bio_set);
>  }
>  EXPORT_SYMBOL_GPL(bio_alloc_mddev);
>  
> -- 
> 2.16.4
> 

-- 
Thanks,
Marcos


Re: [PATCH] blk_types.h: Use REQ_OP_WRITE in op_is_write

2019-01-10 Thread Marcos Paulo de Souza
ping?

On Sat, Dec 22, 2018 at 08:03:54AM -0200, Marcos Paulo de Souza wrote:
> Instead of just using plain '1', as it improves readability.
> 
> Signed-off-by: Marcos Paulo de Souza 
> ---
>  include/linux/blk_types.h | 2 +-
>  1 file changed, 1 insertion(+), 1 deletion(-)
> 
> diff --git a/include/linux/blk_types.h b/include/linux/blk_types.h
> index 1dcf652ba0aa..905c666a0101 100644
> --- a/include/linux/blk_types.h
> +++ b/include/linux/blk_types.h
> @@ -377,7 +377,7 @@ static inline void bio_set_op_attrs(struct bio *bio, 
> unsigned op,
>  
>  static inline bool op_is_write(unsigned int op)
>  {
> - return (op & 1);
> + return (op & REQ_OP_WRITE);
>  }
>  
>  /*
> -- 
> 2.16.4
> 

-- 
Thanks,
Marcos
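
For reference, the change is behaviour-neutral only because REQ_OP_WRITE
has the value 1, so the test still checks bit 0 of the opcode. The sketch
below models that in userspace. The enum lists just the first four
opcodes with the values I understand them to have in the kernel's request
opcode enum; treat the listing as an assumption rather than a copy of
blk_types.h.

```c
#include <stdbool.h>

/* Partial model of the request opcodes; later opcodes are omitted. */
enum req_op_model {
	REQ_OP_READ	= 0,
	REQ_OP_WRITE	= 1,
	REQ_OP_FLUSH	= 2,
	REQ_OP_DISCARD	= 3,
};

/* Same body as the patched helper: bit 0 set means data goes to the device. */
static inline bool op_is_write(unsigned int op)
{
	return (op & REQ_OP_WRITE);
}
```

Note that the helper is a bit test, not an equality test: any odd opcode,
such as the discard opcode in this model, also counts as a write.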


Re: [PATCH] kbuild: Disable LD_DEAD_CODE_DATA_ELIMINATION with ftrace & GCC <= 4.7

2019-01-10 Thread Masahiro Yamada
On Fri, Jan 11, 2019 at 3:11 AM Paul Burton  wrote:
>
> Hi Masahiro,
>
> On Thu, Jan 10, 2019 at 11:00:49AM +0900, Masahiro Yamada wrote:
> > On Thu, Jan 10, 2019 at 8:16 AM Paul Burton  wrote:
> > > When building using GCC 4.7 or older, -ffunction-sections & the -pg flag
> > > used by ftrace are incompatible. This causes warnings or build failures
> > > (where -Werror applies) such as the following:
> > >
> > >   arch/mips/generic/init.c:
> > > error: -ffunction-sections disabled; it makes profiling impossible
> > >
> > > This used to be taken into account by the ordering of calls to cc-option
> > > from within the top-level Makefile, which was introduced by commit
> > > 90ad4052e85c ("kbuild: avoid conflict between -ffunction-sections and
> > > -pg on gcc-4.7"). Unfortunately this was broken when the
> > > CONFIG_LD_DEAD_CODE_DATA_ELIMINATION cc-option check was moved to
> > > Kconfig in commit e85d1d65cd8a ("kbuild: test dead code/data elimination
> > > support in Kconfig"), because the flags used by this check no longer
> > > include -pg.
> > >
> > > Fix this by not allowing CONFIG_LD_DEAD_CODE_DATA_ELIMINATION to be
> > > enabled at the same time as ftrace/CONFIG_FUNCTION_TRACER when building
> > > using GCC 4.7 or older.
> > >
> > > Signed-off-by: Paul Burton 
> > > Fixes: e85d1d65cd8a ("kbuild: test dead code/data elimination support in 
> > > Kconfig")
> > > Reported-by: Geert Uytterhoeven 
> > > Cc: Masahiro Yamada 
> > > Cc: Nicholas Piggin 
> > > Cc: sta...@vger.kernel.org # v4.19+
> > > ---
> > >  init/Kconfig | 1 +
> > >  1 file changed, 1 insertion(+)
> > >
> > > diff --git a/init/Kconfig b/init/Kconfig
> > > index d47cb77a220e..c787f782148d 100644
> > > --- a/init/Kconfig
> > > +++ b/init/Kconfig
> > > @@ -1124,6 +1124,7 @@ config LD_DEAD_CODE_DATA_ELIMINATION
> > > bool "Dead code and data elimination (EXPERIMENTAL)"
> > > depends on HAVE_LD_DEAD_CODE_DATA_ELIMINATION
> > > depends on EXPERT
> > > +   depends on !FUNCTION_TRACER || !CC_IS_GCC || GCC_VERSION >= 40800
> > > depends on $(cc-option,-ffunction-sections -fdata-sections)
> > > depends on $(ld-option,--gc-sections)
> > > help
> >
> > Thanks for the fix.
> >
> > I prefer this explicit 'depends on'.
> >
> > Relying on the order of $(call cc-option, ...) in Makefile is fragile.
> >
> > We raise the compiler minimum version from time to time.
> > So, this 'depends on' will eventually go away in the future.
> >
> > BTW, which one do you think more readable?
> >
> > depends on !FUNCTION_TRACER || !CC_IS_GCC || GCC_VERSION >= 40800
> >
> > OR
> >
> > depends on !(FUNCTION_TRACER && CC_IS_GCC && GCC_VERSION < 40800)
>
> Thanks - yes I agree it's nice that this is more explicit than the
> ordering we previously relied upon.
>
> I personally don't mind either of the 2 options above - let me know if
> you'd like me to submit a v2 using your second option.
>
> Thanks,
> Paul



Personally, I slightly prefer this:
   depends on !(FUNCTION_TRACER && CC_IS_GCC && GCC_VERSION < 40800)


It is more consistent with your patch title:
   "Disable LD_DEAD_CODE_DATA_ELIMINATION with ftrace & GCC <= 4.7"


May I ask for a v2?



-- 
Best Regards
Masahiro Yamada
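
The two spellings of the dependency are equivalent by De Morgan's law. A
small C check makes that concrete. It is a sketch that treats the Kconfig
symbols as plain booleans and GCC_VERSION as an integer, which is an
assumption about how the expression is evaluated; form_a(), form_b() and
count_mismatches() are names invented here.

```c
#include <stdbool.h>
#include <stddef.h>

/* depends on !FUNCTION_TRACER || !CC_IS_GCC || GCC_VERSION >= 40800 */
static bool form_a(bool tracer, bool is_gcc, int gcc_version)
{
	return !tracer || !is_gcc || gcc_version >= 40800;
}

/* depends on !(FUNCTION_TRACER && CC_IS_GCC && GCC_VERSION < 40800) */
static bool form_b(bool tracer, bool is_gcc, int gcc_version)
{
	return !(tracer && is_gcc && gcc_version < 40800);
}

/* Compare both forms over every flag combination and a few versions. */
static int count_mismatches(void)
{
	static const int versions[] = { 40600, 40700, 40799, 40800, 40900, 80200 };
	int mismatches = 0;

	for (int tracer = 0; tracer <= 1; tracer++)
		for (int is_gcc = 0; is_gcc <= 1; is_gcc++)
			for (size_t i = 0; i < sizeof(versions) / sizeof(versions[0]); i++)
				if (form_a(tracer, is_gcc, versions[i]) !=
				    form_b(tracer, is_gcc, versions[i]))
					mismatches++;
	return mismatches;
}
```

The only combination either form rejects is ftrace enabled, GCC in use,
and a version below 4.8, which matches the patch title.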


Re: [PATCH v2 1/2] Move ralink-gdma to its own directory

2019-01-10 Thread George Hilliard
On Thu, Jan 10, 2019, 6:21 PM NeilBrown 
> On Thu, Jan 10 2019, thirtythreefo...@gmail.com wrote:
>
> > From: George Hilliard 
> >
> > This is in preparation to allow it and the mt7621-dma drivers to be
> > built separately.  They are completely independent pieces of software,
> > and the Kconfig specifies very different requirements.
> >
> > Cc: linux-kernel@vger.kernel.org
> > Cc: de...@driverdev.osuosl.org
> > Cc: Neil Brown 
> > Signed-off-by: George Hilliard 
>
> Hi,
>  thanks for taking an interest in these drivers.
>  I originally submitted this code because I thought I needed it for my
>  mt7621 hardware, but I've subsequently realized that neither of these
>  dma drivers are used in this hardware.
>  Consequently I cannot test any changes you make.
>  But maybe you can - which would be excellent!
>
>  So this is just letting you and Greg know that despite my stated
>  interest, I cannot actually review or test this.
>
> Thanks,
> NeilBrown
>
>

Thanks for the heads up. Honestly I am not sure to what extent I can
test code changes either, at least with the DMA driver. I'm working
with the MT7688, and official docs for it and its cousin the MT7628
are pretty sparse, so I'm currently not even certain that the ralink
gdma driver works for my SoC.

Onion lists these drivers in their OpenWRT device trees' compatible
strings, so they're related. The SPI driver works out of the box at
least. The MMC driver wants to work but needs debugging. All the
drivers need to be better documented. etc. I hope I can improve them,
and I'll make sure I test any actual kernel code I change!

George



[git pull] drm fixes for 5.0-rc2

2019-01-10 Thread Dave Airlie
Hi Linus,

Not a huge amount for rc2, assume the usual quiet period, and rc3 will
be most of it.

amdgpu:
- Powerplay fixes
- Virtual display pinning fixes
- Golden register updates for Vega
- Pitch and gem size validation fixes
- SR-IOV init error fix
- Pagetables in system RAM disable for some Raven systems
- DP-MST resume fixes

tc358767 bridge:
- fix to work with displayport connector.

Dave.

drm-fixes-2019-01-11:
drm: amdgpu + tc358767 bridge + amd mst s/r fix
The following changes since commit bfeffd155283772bbe78c6a05dec7c0128ee500c:

  Linux 5.0-rc1 (2019-01-06 17:08:20 -0800)

are available in the Git repository at:

  git://anongit.freedesktop.org/drm/drm tags/drm-fixes-2019-01-11

for you to fetch changes up to f34c48e06ddcc197f2cf7cbc006ceb74e28e1ccf:

  Merge branch 'drm-fixes-5.0' of
git://people.freedesktop.org/~agd5f/linux into drm-fixes (2019-01-11
07:38:56 +1000)


drm: amdgpu + tc358767 bridge + amd mst s/r fix


Christian König (1):
  drm/amdgpu: disable system memory page tables for now

Dave Airlie (2):
  Merge tag 'drm-misc-fixes-2019-01-10' of
git://anongit.freedesktop.org/drm/drm-misc into drm-fixes
  Merge branch 'drm-fixes-5.0' of
git://people.freedesktop.org/~agd5f/linux into drm-fixes

Emily Deng (3):
  drm/amdgpu/virtual_dce: No need to pin the fb's bo
  drm/amdgpu/virtual_dce: No need to pin the cursor bo
  drm/amdgpu/sriov:Correct pfvf exchange logic

Evan Quan (5):
  drm/amd/powerplay: support BOOTUP_DEFAULT power profile mode
  drm/amd/powerplay: update OD support flag for SKU with no OD capabilities
  drm/amd/powerplay: create pp_od_clk_voltage device file under OD support
  drm/amd/powerplay: avoid possible buffer overflow
  drm/amd/powerplay: drop the unnecessary uclk hard min setting

Jim Qu (1):
  drm/amdgpu: set WRITE_BURST_LENGTH to 64B to workaround SDMA1 hang

Kent Russell (1):
  drm/amdgpu: Cleanup 2 compiler warnings

Likun Gao (1):
  drm/amdgpu: make gfx9 enter into rlc safe mode when set MGCG

Lyude Paul (3):
  drm/amdgpu: Don't ignore rc from drm_dp_mst_topology_mgr_resume()
  drm/amdgpu: Don't fail resume process if resuming atomic state fails
  drm/dp_mst: Add __must_check to drm_dp_mst_topology_mgr_resume()

Tao Zhou (1):
  drm/amdgpu: fix CPDMA hang in PRT mode for VEGA20

Tiecheng Zhou (1):
  drm/amdgpu/gfx_v8_0: Reorder the gfx, kiq and kcq rings test sequence

Tomi Valkeinen (7):
  drm/bridge: tc358767: add bus flags
  drm/bridge: tc358767: add defines for DP1_SRCCTRL & PHY_2LANE
  drm/bridge: tc358767: fix single lane configuration
  drm/bridge: tc358767: fix initial DP0/1_SRCCTRL value
  drm/bridge: tc358767: reject modes which require too much BW
  drm/bridge: tc358767: fix output H/V syncs
  drm/bridge: tc358767: use DP connector if no panel set

Yu Zhao (2):
  drm/amdgpu: validate user pitch alignment
  drm/amdgpu: validate user GEM object size

 drivers/gpu/drm/amd/amdgpu/amdgpu_device.c | 12 +++---
 drivers/gpu/drm/amd/amdgpu/amdgpu_display.c| 38 +
 drivers/gpu/drm/amd/amdgpu/amdgpu_pm.c | 22 ++
 drivers/gpu/drm/amd/amdgpu/amdgpu_vm.c |  3 --
 drivers/gpu/drm/amd/amdgpu/dce_virtual.c   | 17 ++--
 drivers/gpu/drm/amd/amdgpu/gfx_v8_0.c  | 48 +++---
 drivers/gpu/drm/amd/amdgpu/gfx_v9_0.c  | 14 ---
 drivers/gpu/drm/amd/amdgpu/mxgpu_ai.c  |  2 +-
 drivers/gpu/drm/amd/amdgpu/sdma_v4_0.c |  3 +-
 drivers/gpu/drm/amd/display/amdgpu_dm/amdgpu_dm.c  | 37 +++--
 drivers/gpu/drm/amd/include/kgd_pp_interface.h | 13 +++---
 drivers/gpu/drm/amd/powerplay/hwmgr/hwmgr.c| 24 ++-
 drivers/gpu/drm/amd/powerplay/hwmgr/smu7_hwmgr.c   |  8 ++--
 drivers/gpu/drm/amd/powerplay/hwmgr/vega10_hwmgr.c | 12 +++---
 drivers/gpu/drm/amd/powerplay/hwmgr/vega20_hwmgr.c | 34 ++-
 drivers/gpu/drm/amd/powerplay/inc/hwmgr.h  |  2 +-
 drivers/gpu/drm/bridge/tc358767.c  | 48 +-
 include/drm/drm_dp_mst_helper.h|  3 +-
 18 files changed, 221 insertions(+), 119 deletions(-)


Re: [PATCH v3 0/2] perf tests: Check for ARM [vectors] page

2019-01-10 Thread Namhyung Kim
Hello,

Sorry for being so late.

On Thu, Dec 27, 2018 at 05:35:17PM -0800, Florian Fainelli wrote:
> Le 12/27/18 à 2:55 AM, Namhyung Kim a écrit :
> > Hello,
> > 
> > On Thu, Dec 20, 2018 at 07:43:35PM -0800, Florian Fainelli wrote:
> >> Hi all,
> >>
> >> I just painfully learned that perf would segfault when
> >> CONFIG_KUSER_HELPERS is disabled because it unconditionally makes use of
> > 
> > Could you please elaborate?
> 
> Sure, I was debugging why perf was segfaulting on my systems and saw
> that the faulting address was within 0x_ (high vectors); and
> because CONFIG_KUSER_HELPERS was not enabled, nothing was mapped at that
> address so this was a legitimate crash. This was on a variety of ARMv7A
> systems, Cortex-A9, Cortex-A5 etc.
> 
> Later on, I found that in tools/arch/arm/include/asm/barrier.h the
> barriers are unconditionally defined to make use of the [vectors] page
> that the ARM kernel only sets up when CONFIG_KUSER_HELPERS is enabled
> and this is the reason for the crash.
> 
> Testing for the page itself is pretty harmless if you think we should
> make something more robust around checking for HAVE_AUXTRACE_SUPPORT
> (which appears to be the specific location making use of barriers), let
> me know.

Thanks for the explanation.

Is there anything we can do instead if CONFIG_KUSER_HELPERS is not
defined?

I think it'd be better making the barriers into functions (probably
with "static inline") and configurable depending on a result of
runtime checking of the availability (like you did).

The init routine of the auxtrace (or other future users of barriers)
should call an arch-specific function to check the availability then.

Thanks,
Namhyung
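
One way to read the suggestion above is a barrier wrapper that consults a
flag set once at init time. The sketch below is a portable userspace
model and makes several assumptions: arch_barrier_init(),
kuser_memory_barrier() and fallback_memory_barrier() are hypothetical
names, the helper call is simulated, and the C11 fence used as the
fallback is only a placeholder, since the thread leaves open what should
be done when CONFIG_KUSER_HELPERS is disabled.

```c
#include <stdatomic.h>
#include <stdbool.h>

static bool kuser_helpers_present;	/* result of the runtime check */
static int helper_calls;		/* counters for illustration only */
static int fallback_calls;

/* Stand-in for calling the kernel-provided helper in the [vectors] page. */
static void kuser_memory_barrier(void)
{
	helper_calls++;
	atomic_thread_fence(memory_order_seq_cst);
}

/* Placeholder fallback; the real choice is an open question. */
static void fallback_memory_barrier(void)
{
	fallback_calls++;
	atomic_thread_fence(memory_order_seq_cst);
}

/*
 * Hypothetical arch hook, called from the init routine of a barrier
 * user. Real code would detect whether the [vectors] page is mapped.
 */
static void arch_barrier_init(bool vectors_page_mapped)
{
	kuser_helpers_present = vectors_page_mapped;
}

/* The barrier as a function rather than an unconditional macro. */
static inline void mb(void)
{
	if (kuser_helpers_present)
		kuser_memory_barrier();
	else
		fallback_memory_barrier();
}
```

The point of the shape is that no code path jumps to the helper address
unless the init-time check succeeded, which avoids the segfault
described earlier in the thread.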


Re: use generic DMA mapping code in powerpc V4

2019-01-10 Thread Christian Zigotzky
Next step: 891dcc1072f1fa27a83da920d88daff6ca08fc02 (powerpc/dma: remove 
dma_nommu_dma_supported)


git clone git://git.infradead.org/users/hch/misc.git -b powerpc-dma.6 a

git checkout 891dcc1072f1fa27a83da920d88daff6ca08fc02

Output:

Note: checking out '891dcc1072f1fa27a83da920d88daff6ca08fc02'.

You are in 'detached HEAD' state. You can look around, make experimental
changes and commit them, and you can discard any commits you make in this
state without impacting any branches by performing another checkout.

If you want to create a new branch to retain commits you create, you may
do so (now or later) by using -b with the checkout command again. Example:

git checkout -b 

HEAD is now at 891dcc1... powerpc/dma: remove dma_nommu_dma_supported

---

Link to the Git: 
http://git.infradead.org/users/hch/misc.git/shortlog/refs/heads/powerpc-dma.6


Results: PASEMI onboard ethernet works and the X5000 (P5020 board) 
boots. I also successfully tested sound, hardware 3D acceleration, 
Bluetooth, network, booting with a label etc. The uImages work also in a 
virtual e5500 quad-core QEMU machine.


-- Christian


On 09 January 2019 at 10:31AM, Christian Zigotzky wrote:
Next step: a64e18ba191ba9102fb174f27d707485ffd9389c (powerpc/dma: 
remove dma_nommu_get_required_mask)


git clone git://git.infradead.org/users/hch/misc.git -b powerpc-dma.6 a

git checkout a64e18ba191ba9102fb174f27d707485ffd9389c

Link to the Git: 
http://git.infradead.org/users/hch/misc.git/shortlog/refs/heads/powerpc-dma.6


Results: PASEMI onboard ethernet works and the X5000 (P5020 board) 
boots. I also successfully tested sound, hardware 3D acceleration, 
Bluetooth, network, booting with a label etc. The uImages work also in 
a virtual e5500 quad-core QEMU machine.


-- Christian


On 05 January 2019 at 5:03PM, Christian Zigotzky wrote:
Next step: c446404b041130fbd9d1772d184f24715cf2362f (powerpc/dma: 
remove dma_nommu_mmap_coherent)


git clone git://git.infradead.org/users/hch/misc.git -b powerpc-dma.6 a

git checkout c446404b041130fbd9d1772d184f24715cf2362f

Output:

Note: checking out 'c446404b041130fbd9d1772d184f24715cf2362f'.

You are in 'detached HEAD' state. You can look around, make experimental
changes and commit them, and you can discard any commits you make in this
state without impacting any branches by performing another checkout.

If you want to create a new branch to retain commits you create, you may
do so (now or later) by using -b with the checkout command again. Example:

  git checkout -b 

HEAD is now at c446404... powerpc/dma: remove dma_nommu_mmap_coherent

-

Link to the Git: 
http://git.infradead.org/users/hch/misc.git/shortlog/refs/heads/powerpc-dma.6


Result: PASEMI onboard ethernet works and the X5000 (P5020 board) boots.

-- Christian








Re: [PATCHi v2] mm: put_and_wait_on_page_locked() while page is migrated

2019-01-10 Thread Hugh Dickins
On Thu, 10 Jan 2019, Vlastimil Babka wrote:
> 
> For the record, anyone backporting this to older kernels should make
> sure to also include 605ca5ede764 ("mm/huge_memory.c: reorder operations
> in __split_huge_page_tail()") or they are in for a lot of fun, like me.

Thanks a lot for alerting us all to this, Vlastimil.  Yes, I consider
Konstantin's 605ca5ede764 a must-have, and so had it already in all
the trees on which I was testing put_and_wait_on_page_locked(),
without being aware of the critical role it was playing.

But you do enjoy fun, don't you? So I shouldn't apologize :)

> 
> Long story [1] short, Konstantin was correct in 605ca5ede764 changelog,
> although it wasn't the main known issue he was fixing:
> 
>   clear_compound_head() also must be called before unfreezing page
>   reference because after successful get_page_unless_zero() might follow
>   put_page() which needs correct compound_head().
> 
> Which is exactly what happens in __migration_entry_wait():
> 
> if (!get_page_unless_zero(page))
> goto out;
> pte_unmap_unlock(ptep, ptl);
> put_and_wait_on_page_locked(page); -> does put_page(page)
> 
> while waiting on the THP split (which inserts those migration entries)
> to finish. Before put_and_wait_on_page_locked() it would wait first, and
> only then do put_page() on a page that's no longer tail page, so it
> would work out despite the dangerous get_page_unless_zero() on a tail
> page. Now it doesn't :)

It took me a while to follow there, but yes, agreed.

> 
> Now if only 605ca5ede764 had a CC:stable and a Fixes: tag... Machine
> Learning won this round though, because 605ca5ede764 was added to 4.14
> stable by Sasha...

I'm proud to have passed the Turing test in reverse, but actually
that was me, not ML.  My 173d9d9fd3dd ("mm/huge_memory: splitting set
mapping+index before unfreeze") in 4.20 built upon Konstantin's, so I
included his as a precursor when sending the stable guys pre-XArray
backports.  So Konstantin's is even in 4.9 stable now.

Hugh


Re: [PATCH] mm/mincore: allow for making sys_mincore() privileged

2019-01-10 Thread Dave Chinner
On Thu, Jan 10, 2019 at 02:11:01PM -0800, Linus Torvalds wrote:
> And we *can* do sane things about RWF_NOWAIT. For example, we could
> start async IO on RWF_NOWAIT, and suddenly it would go from "probe the
> page cache" to "probe and fill", and be much harder to use as an
> attack vector..

We can only do that if the application submits the read via AIO and
has an async IO completion reporting mechanism.  Otherwise we have
to wait for the IO to complete to copy the data into the user's
buffer. And given that the app is using RWF_NOWAIT to explicitly
avoid blocking on the IO

Also, keep in mind that RWF_NOWAIT also prevents blocking on
filesystem locks and full request queues. One of the prime drivers
of RWF_NOWAIT was to prevent AIO submission from blocking on
filesystem locks - it allows userspace to submit other IO while it
can't get all the access it requires to a single file or a single
block device is congested.

Hence I don't think there's a such a simple answer here - blocking
for IO breaks RWF_NOWAIT.

Cheers,

Dave.
-- 
Dave Chinner
da...@fromorbit.com


Re: [PATCH] vfio_pci: set TRACE_INCLUDE_PATH to fix the build error

2019-01-10 Thread Steven Rostedt
On Fri, 11 Jan 2019 12:13:35 +1100
Alexey Kardashevskiy  wrote:

> > The words in TRACE_INCLUDE_PATH can be updated by C preprocessor defines. 
> > For
> > example, if for some reason you had:
> > 
> > #define pci special_pci
> > 
> > The above would turn into:
> > 
> >  ../../drivers/vfio/special_pci
> > 
> > and it won't build, and you will be left scratching your head wondering why.
> >  
> 
> Lovely :) imho it is +1 for
> CFLAGS_vfio_pci_nvlink2.o += -I$(src)
> and a comment.

A more realistic example is:

#define pci 1

which I hit when I first tried to do it this way when I first
implemented this code (not with "pci" but a similar word).

I'll leave this up to the maintainers of the code to decide which way
they want to do it, as they are the ones that have to deal with the
fallout if something goes wrong ;-)

-- Steve
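
The gotcha is easy to reproduce outside the kernel. The sketch below uses
a two-level stringify macro of the usual shape; STRINGIFY, INCLUDE_PATH
and the helper functions are names chosen here for illustration, not the
tracing headers themselves. The path is built from bare preprocessor
tokens, so any token that happens to be a macro gets expanded before the
string is formed.

```c
#include <string.h>

#define STRINGIFY_1(x)	#x
#define STRINGIFY(x)	STRINGIFY_1(x)	/* expands x before stringifying */

#define INCLUDE_PATH ../../drivers/vfio/pci

/* No macro named "pci" exists yet, so the path comes out as written. */
static const char *path_plain(void)
{
	return STRINGIFY(INCLUDE_PATH);
}

/* Some unrelated header defines "pci" ... */
#define pci special_pci

/* ... and the same path macro now expands to something else. */
static const char *path_clobbered(void)
{
	return STRINGIFY(INCLUDE_PATH);
}

#undef pci

/* Non-zero when the stray define changed the resulting path. */
static int paths_differ(void)
{
	return strcmp(path_plain(), path_clobbered()) != 0;
}
```

Passing the directory with -I$(src) in the Makefile sidesteps this,
because the path is then never tokenized by the C preprocessor.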


Re: [PATCH v2] kernel/dma: Fix panic caused by passing swiotlb to command line

2019-01-10 Thread He Zhe



On 1/11/19 9:46 AM, He Zhe wrote:
>
> On 1/11/19 1:27 AM, Konrad Rzeszutek Wilk wrote:
>> Let's skip it. There was another patch that would allocate a default 4MB 
>> size if there was a misuse of swiotlb parameters.
> But this patch mainly fixes a crash. Could you please point me to the patch 
> you mentioned?

And the v2 is modified according to your suggestion.

Zhe

>
> Thanks,
> Zhe
>
>>
>>
>> On Mon, Jan 7, 2019, 4:07 AM Christoph Hellwig >  wrote:
>>
>> On Mon, Jan 07, 2019 at 04:46:51PM +0800, He Zhe wrote:
>> > Kindly ping.
>>
>> Konrad, I'll pick this up through the DMA mapping tree unless you
>> protest in the next few days.
>>
>



Re: [PATCH] mm/mincore: allow for making sys_mincore() privileged

2019-01-10 Thread Dave Chinner
On Wed, Jan 09, 2019 at 09:26:41PM -0800, Andy Lutomirski wrote:
> Since direct IO has been brought up, I have a question.  I've wondered
> for years why direct IO works the way it does.  If I were implementing
> it from scratch, my first inclination would be to use the page cache
> instead of fighting it.  To do a single-page direct read, I would look
> that page up in the page cache (i.e. i_pages these days).  If the page
> is there, I would do a normal buffered read.  If the page is not

Therein lies the problem. Copying data is prohibitively expensive,
and that's the primary reason for O_DIRECT existing.  i.e. O_DIRECT
is a low-overhead, zero-copy data movement interface.

The moment we switch from using CPU to dispatch IO to copying data,
performance goes down because we will be unable to keep storage
pipelines full.  IOWs, any rework of O_DIRECT that involves copying
data is a non-starter.

But let's bring this back to the issue at hand - observability of
page cache residency of file pages. If the page is cache resident,
then it will have a latency of copying that data out of the page
(i.e. very low latency). If the page is not resident, then it will
do IO and take much, much longer to complete. i.e. we have clear
timing differences between cache hit and cache miss IO.  This is
exactly the timing information needed for observing page cache
residency.

We need to work out how to make page cache residency less
observable, not add new, near perfect observation mechanisms that
third parties can easily exploit...

Cheers,

Dave.
-- 
Dave Chinner
da...@fromorbit.com


Re: [PATCH v3 0/6] Static calls

2019-01-10 Thread Nadav Amit
> On Jan 10, 2019, at 4:56 PM, Andy Lutomirski  wrote:
> 
> On Thu, Jan 10, 2019 at 3:02 PM Linus Torvalds
>  wrote:
>> On Thu, Jan 10, 2019 at 12:52 PM Josh Poimboeuf  wrote:
>>> Right, emulating a call instruction from the #BP handler is ugly,
>>> because you have to somehow grow the stack to make room for the return
>>> address.  Personally I liked the idea of shifting the iret frame by 16
>>> bytes in the #DB entry code, but others hated it.
>> 
>> Yeah, I hated it.
>> 
>> But I'm starting to think it's the simplest solution.
>> 
>> So still not loving it, but all the other models have had huge issues too.
> 
> Putting my maintainer hat on:
> 
> I'm okay-ish with shifting the stack by 16 bytes.  If this is done, I
> want an assertion in do_int3() or wherever the fixup happens that the
> write isn't overlapping pt_regs (which is easy to implement because
> that code has the relevant pt_regs pointer).  And I want some code
> that explicitly triggers the fixup when a CONFIG_DEBUG_ENTRY=y or
> similar kernel is built so that this whole mess actually gets
> exercised.  Because the fixup only happens when a
> really-quite-improbable race gets hit, and the issues depend on stack
> alignment, which is presumably why Josh was able to submit a buggy
> series without noticing.
> 
> BUT: this is going to be utterly gross whenever anyone tries to
> implement shadow stacks for the kernel, and we might need to switch to
> a longjmp-like approach if that happens.

Here is an alternative idea (although similar to Steven’s and my code).

Assume that we always clobber R10 and R11 explicitly on static calls, as the
calling convention should ensure anyway (and a gcc plugin should allow us to
enforce it). Also assume that we hold a table with every source RIP and its
matching target.

Now, in the int3 handler you can take the faulting RIP and search for it in
the “static-calls” table, writing RIP+5 (offset) into R10 (return address)
and the target into R11. You make the int3 handler divert code execution by
changing pt_regs->rip to point to a new function that does:

push R10
jmp __x86_indirect_thunk_r11

And then you are done. No?
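
A minimal userspace sketch of the lookup step described above, under stated
assumptions. The table layout and the names static_call_site, fake_regs,
static_call_int3_fixup and trampoline_addr are hypothetical and do not come
from any posted patch; a real handler would work on pt_regs, and how the
table gets built is left open here.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical table entry: a patched call site and the function it calls. */
struct static_call_site {
	uint64_t rip;     /* address of the 5-byte call instruction */
	uint64_t target;  /* function the call should reach */
};

/* Stand-in for the few pt_regs fields this idea touches. */
struct fake_regs {
	uint64_t ip;
	uint64_t r10;
	uint64_t r11;
};

#define CALL_INSN_SIZE 5

/*
 * Look up the call site that trapped. On a hit, put the return address in
 * r10 and the target in r11, then divert execution to a trampoline that
 * would do "push %r10; jmp __x86_indirect_thunk_r11".
 * Returns false if the address is not a known static call site.
 */
static bool static_call_int3_fixup(struct fake_regs *regs,
				   const struct static_call_site *tbl,
				   size_t n, uint64_t site,
				   uint64_t trampoline_addr)
{
	for (size_t i = 0; i < n; i++) {
		if (tbl[i].rip != site)
			continue;
		regs->r10 = site + CALL_INSN_SIZE;  /* return address */
		regs->r11 = tbl[i].target;
		regs->ip = trampoline_addr;
		return true;
	}
	return false;
}
```

The linear search is only for illustration; a sorted table with a binary
search would be the obvious choice if the table is large.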



Re: [PATCH v2] kernel/dma: Fix panic caused by passing swiotlb to command line

2019-01-10 Thread He Zhe



On 1/11/19 1:27 AM, Konrad Rzeszutek Wilk wrote:
> Let's skip it. There was another patch that would allocate a default 4MB size
> if there was a misuse of swiotlb parameters.

But this patch mainly fixes a crash. Could you please point me to the patch you 
mentioned?

Thanks,
Zhe

>
>
>
> On Mon, Jan 7, 2019, 4:07 AM Christoph Hellwig   wrote:
>
> On Mon, Jan 07, 2019 at 04:46:51PM +0800, He Zhe wrote:
> > Kindly ping.
>
> Konrad, I'll pick this up through the DMA mapping tree unless you
> protest in the next few days.
>



Re: [PATCH] vfio_pci: set TRACE_INCLUDE_PATH to fix the build error

2019-01-10 Thread Alex Williamson
On Fri, 11 Jan 2019 12:13:35 +1100
Alexey Kardashevskiy  wrote:

> On 11/01/2019 01:47, Steven Rostedt wrote:
> > On Tue, Jan 08, 2019 at 12:08:03PM +0900, Masahiro Yamada wrote:  
> >> ---
> >>
> >>  drivers/vfio/pci/trace.h | 2 +-
> >>  1 file changed, 1 insertion(+), 1 deletion(-)
> >>
> >> diff --git a/drivers/vfio/pci/trace.h b/drivers/vfio/pci/trace.h
> >> index 228ccdb..4d13e51 100644
> >> --- a/drivers/vfio/pci/trace.h
> >> +++ b/drivers/vfio/pci/trace.h
> >> @@ -94,7 +94,7 @@ TRACE_EVENT(vfio_pci_npu2_mmap,
> >>  #endif /* _TRACE_VFIO_PCI_H */
> >>  
> >>  #undef TRACE_INCLUDE_PATH
> >> -#define TRACE_INCLUDE_PATH .
> >> +#define TRACE_INCLUDE_PATH ../../drivers/vfio/pci  
> > 
> > Note, the reason why I did not show this method in samples/trace_events/
> > is that there's one "gotcha" that you need to be careful about. It may not
> > be an issue here, but please be aware of it.
> > 
> > The words in TRACE_INCLUDE_PATH can be updated by C preprocessor defines.
> > For example, if for some reason you had:
> > 
> > #define pci special_pci
> > 
> > The above would turn into:
> > 
> >  ../../drivers/vfio/special_pci
> > 
> > and it won't build, and you will be left scratching your head wondering why.

Thanks for the info Steve, that'd definitely be a head scratcher, but
it also seems really unlikely for this path.
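
For anyone who wants to see the gotcha in isolation, here is a small
standalone sketch. It assumes the include path is turned into a string by a
two-level stringify macro, which is the usual way to get macro expansion
before '#'. The names STRINGIFY, EXAMPLE_INCLUDE_PATH and
expanded_include_path are made up for this example and are not the tracing
macros themselves.

```c
/* Two-level stringification: the argument is macro-expanded before '#'. */
#define STRINGIFY_1(x) #x
#define STRINGIFY(x)   STRINGIFY_1(x)

/* The unlucky define from the example above. */
#define pci special_pci

/* A bare-token path, in the same style as TRACE_INCLUDE_PATH. */
#define EXAMPLE_INCLUDE_PATH ../../drivers/vfio/pci

/* Returns the path string after the preprocessor has rewritten 'pci'. */
static const char *expanded_include_path(void)
{
	return STRINGIFY(EXAMPLE_INCLUDE_PATH);
}
```

Because the path is made of ordinary preprocessing tokens, any component
that happens to match a macro name is replaced before the string is formed.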

> Lovely :) imho it is +1 for
> CFLAGS_vfio_pci_nvlink2.o += -I$(src)
> and a comment.

https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/commit/?id=d1fc1176c055c9ec9c6ec4d113a284e0bad9d09a

Obviously we can still refine further, but I don't see this new piece
of information making a meaningful difference in the choice.  Thanks,

Alex


Re: [PATCH v5 2/5] Bluetooth: hci_qca: Deassert RTS while baudrate change command

2019-01-10 Thread Matthias Kaehlcke
On Thu, Jan 10, 2019 at 08:22:12PM +0530, Balakrishna Godavarthi wrote:
> Hi Johan,
> 
> On 2019-01-10 20:09, Johan Hovold wrote:
> > On Thu, Jan 10, 2019 at 08:04:12PM +0530, Balakrishna Godavarthi wrote:
> > > Hi Johan,
> > > 
> > > On 2019-01-09 20:22, Johan Hovold wrote:
> > > > On Thu, Dec 20, 2018 at 08:16:36PM +0530, Balakrishna Godavarthi wrote:
> > > > >> This patch helps to stop frame reassembly errors while changing
> > > > >> the baudrate. The host sends a change baudrate request command
> > > > >> to the chip at 115200 bps, whereas the chip changes its UART
> > > > >> clocks for the new baudrate and sends the response to the change
> > > > >> request command at the new baudrate. The host side is still
> > > > >> operating at 115200 bps, which results in reading garbage data.
> > > > >> Here we pull the RTS line, so that the chip will wait to send
> > > > >> data to the host until the host has changed its baudrate.
> > 
> > > > >> +  /* Deassert RTS while changing the baudrate of chip and host.
> > > > >> +   * This will prevent chip from transmitting its response with
> > > > >> +   * the new baudrate while the host port is still operating at
> > > > >> +   * the old speed.
> > > > >> +   */
> > > >> +  qcadev = serdev_device_get_drvdata(hu->serdev);
> > > >> +  if (qcadev->btsoc_type == QCA_WCN3990)
> > > >> +  serdev_device_set_rts(hu->serdev, false);
> > > >> +
> > > >
> > > > This may not do what you want unless you also disable hardware flow
> > > > control.
> > 
> > > Here my requirement here is to block the chip to send its data before
> > > HOST changes it is baudrate. So if i disable flow control lines of
> > > HOST which will be in low state.  so that the chip will send it data
> > > before HOST change the baudrate of HOST. which results in frame
> > > reassembly error.
> > 
> > > Not sure I understand what you're trying to say above. My point is that
> > > you cannot reliably control RTS when you have automatic flow control
> > > enabled (i.e. it is managed by hardware and its state reflects whether
> > > there's room in the UART receive FIFO).
> > 
> > Johan
> 
> [Bala]: Yes i got your point, but our driver

I suppose with "our driver" you refer to a Qualcomm UART driver like
qcom_geni_serial.c. Unless the Bluetooth controller is really tied to
some specific SoC (e.g. because it is on-chip) you shouldn't make
assumptions about the UART driver or hardware beyond standard
behavior.

But even if we assume that the driver you mention is used, I think you
are confirming Johan's concern rather than dispelling it:

> will not support automatic flow control (based on the FIFO status)
> unless we explicitly enable it via software, i.e. if we enable the
> flow, hardware will look for it, else it will not look at the CTS or
> RTS line.

So we agree that the UART hardware may change RTS if hardware flow
control is enabled?

static int qca_send_power_pulse(struct hci_uart *hu, u8 cmd)
{
  ...
  hci_uart_set_flow_control(hu, false);
  ...
}

I still find it utterly confusing that set_flow_control(false) enables
flow control, but that's what it does, hence after
qca_send_power_pulse() flow control is (re-)enabled.

So far I haven't seen problems with qcom_geni_serial.c overriding the
level set with serdev_device_set_rts(), but I tend to agree with Johan
that this could be a problem (if not with this UART (driver) then with
another). I'm not keen about adding more flow control on/off clutter,
but if that is needed for the driver to operate reliably across
platforms so be it.

Cheers

Matthias


Re: [PATCH v3 5/6] x86/alternative: Use a single access in text_poke() where possible

2019-01-10 Thread Sean Christopherson
On Thu, Jan 10, 2019 at 04:59:55PM -0800, h...@zytor.com wrote:
> On January 10, 2019 9:42:57 AM PST, Sean Christopherson 
>  wrote:
> >On Thu, Jan 10, 2019 at 12:32:43PM -0500, Steven Rostedt wrote:
> >> On Thu, 10 Jan 2019 11:20:04 -0600
> >> Josh Poimboeuf  wrote:
> >> 
> >> 
> >> > > While I can't find a reason for hypervisors to emulate this instruction,
> >> > > smarter people might find ways to turn it into a security exploit.
> >> > 
> >> > Interesting point... but I wonder if it's a realistic concern.  BTW,
> >> > text_poke_bp() also relies on undocumented behavior.
> >> 
> >> But we did get an official OK from Intel that it will work. Took a bit
> >> of arm twisting to get them to do so, but they did. And it really is
> >> pretty robust.
> >
> >Did we (they?) list any caveats for this behavior?  E.g. I'm fairly
> >certain atomicity guarantees go out the window if WC memtype is used.
> 
> If you run code from non-WB memory, all bets are off and you better
> not be doing cross-modifying code.

I wasn't thinking of running code from non-WB, but rather running code
in WB while doing a CMC write via WC.


Re: x86/sgx: uapi change proposal

2019-01-10 Thread Sean Christopherson
On Thu, Jan 10, 2019 at 04:30:06PM -0800, Andy Lutomirski wrote:
> On Thu, Jan 10, 2019 at 3:54 PM Sean Christopherson
>  wrote:
> >
> > Sort of.  A guest that is running under KVM (i.e. VMX) is much more
> > contained than a random userspace program.  A rogue enclave in a VMX
> > guest can attack the guest kernel/OS, but barring a bug (or more likely,
> > several major bugs) elsewhere in the virtualization stack the enclave
> > can't do anything nasty to the host.  An enclave would let someone hide
> > code, but enclaves are even more restricted than cpl3, i.e. there's not
> > a lot it can do without coordinating with unencrypted code in the guest.
> >
> > And if someone has sufficient permissions to run a KVM guest, they're
> > much more likely to do something malicious in the guest kernel, not an
> > enclave.
> 
> Are you sure?  On my laptop, /dev/kvm is 0666, and that's the distro
> default.  I don't think this is at all unusual.

Wow, that's surprising.  A quick search suggests that this may be Debian
specific[1], e.g. my Ubuntu systems have:

crw-rw 1 root kvm 10, 232 Jan  9 09:30 /dev/kvm

[1] https://bugzilla.redhat.com/show_bug.cgi?id=1431876

>  I'm not particularly
> concerned about a guest attacking itself, but it's conceptually
> straightforward to bypass whatever restrictions the host has by simply
> opening /dev/kvm and sticking your enclave in a VM.

VMs by nature allow a user to bypass all sorts of restrictions, e.g. the
kernel doesn't let userspace run arbitrary cpl0 code, but launch a VM
and voila.  It's what you can do with the new privileges that matters.

> > All that aside, I don't see any justification for singling out SGX for
> > extra scrutiny, there are other ways for a user with KVM permissions to
> > hide malicious code in guest (and at cpl0!), e.g. AMD's SEV{-ES}.
> 
> I'm not singling out SGX.  I'm just saying that the KVM should not
> magically bypass host policy.  If you want to assign a virtual
> function on your NIC to a KVM guest, you need to give your QEMU
> process that privilege.  Similarly, if someone has a MAC policy that
> controls which processes can launch which enclaves and they want to
> run Windows with full SGX support in a VM guest, then they should
> authorize that in their MAC policy by giving QEMU unrestricted launch
> privileges.

MAC systems exist to protect assets, and IMO an enclave isn't an asset.
E.g. AppArmor (via LSM) isn't protecting files, it's protecting the
contents of the file or what can be done with the file.  And the MAC
is only part of the overall protection scheme, e.g. userspace is also
relying on the kernel to not screw up the page tables.

In SGX terms, an LSM hook might use enclave signatures to protect some
asset 'X', e.g. access to a persistent identifier.  But that doesn't mean
that whitelisting enclave signatures is the only way to protect 'X'.

> Similarly, if access to a persistent provisioning identifier is
> restricted, access to /dev/kvm shouldn't magically bypass it.  Just
> give the QEMU process the relevant privileges.

Agreed, but that's not the same as applying a host's whitelist against a
guest's enclaves.


Re: Aw: Re: [PATCH] drm/mediatek: Add MTK Framebuffer-Device (mt7623)

2019-01-10 Thread CK Hu
Hi, Frank:

On Thu, 2019-01-10 at 20:01 +0100, Frank Wunderlich wrote:
> Hi Daniel,
> 
> > > Would be good to use the new generic fbdev emulation code here, for even
> > > less code. Or at least know why this isn't possible to use for mtk (and
> > > maybe address that in the core code). Hand-rolling fbdev code shouldn't be
> > > needed anymore.
> > 
> > Back on the mailing list, no private replies please:
> 
> i didn't want to spam all people with dumb questions ;)
> 
> > For examples please grep for drm_fbdev_generic_setup(). There's also a
> > still in-flight series from Gerd Hoffmann to convert over bochs. That,
> > plus all the kerneldoc linked from there should get you started.
> > -Daniel
> 
> this is one of the best google results when i search for drm_fbdev_generic_setup:
> 
> https://lkml.org/lkml/2018/12/19/305
> 
> not very helpful...
> 
> so i tried kernel-doc
> 
> https://www.kernel.org/doc/html/latest/gpu/drm-kms-helpers.html?highlight=drm_fbdev_generic_setup#c.drm_fbdev_generic_setup
> 
> which is nice function-reference but i've found no generic workflow
> 
> as the posted driver is "only" a driver ported from kernel 4.4 by Alexander, 
> i don't know if this new framework can be used and which parts need to be 
> changed. I only try to bring his code Mainline
> Maybe CK Hu can help here because driver is originally from him and he knows 
> internals. Or maybe you can help here?

I could help on this but I'm a little busy now, so I'm not sure how long
this process takes.

Regards,
CK

> 
> i personally make my first steps as spare-time kernel-developer :)
> 
> regards Frank
> 
> ___
> Linux-mediatek mailing list
> linux-media...@lists.infradead.org
> http://lists.infradead.org/mailman/listinfo/linux-mediatek




Re: PROBLEM: syzkaller found / pool corruption-overwrite / page in user-area or NULL

2019-01-10 Thread Qian Cai



On 1/10/19 5:58 PM, Esme wrote:
> The console debug/stacks/info from just now.  The previous config, current
> kernel from github.
> --
> Esme
> 
> [   75.783231] kasan: CONFIG_KASAN_INLINE enabled
> [   75.785870] kasan: GPF could be caused by NULL-ptr deref or user memory access
> [   75.787695] general protection fault:  [#1] SMP KASAN
> [   75.789084] CPU: 0 PID: 3434 Comm: systemd-journal Not tainted 5.0.0-rc1+ #5
> [   75.790938] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS 1.11.1-1ubuntu1 04/01/2014
> [   75.793150] RIP: 0010:rb_insert_color+0x189/0x1480

What's in that line? Try,

$ ./scripts/faddr2line vmlinux rb_insert_color+0x189/0x1480

What are the steps to reproduce this?


Re: [PATCHv2 0/6] perf session: Add reader object

2019-01-10 Thread Namhyung Kim
Hi Jirka,

On Thu, Jan 10, 2019 at 11:12:55AM +0100, Jiri Olsa wrote:
> hi,
> this patchset adds reader object to interface event
> processing for any data. It's defined as:
> 
>   struct reader {
> int fd;
> u64 data_size;
> u64 data_offset;
>   };
> 
> Now we can simply define reader object for arbitrary file
> data portion and pass it to reader__process_events function
> to process its data.
> 
> It's preparation for multiple file storage under perf.data
> directory.

I'm looking forward to seeing it soon! :)

Acked-by: Namhyung Kim 

Thanks,
Namhyung
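
To make the "arbitrary file data portion" idea concrete, here is a rough
userspace sketch. It is not the perf implementation: the struct copies the
fields quoted in the cover letter, while reader_process(), chunk_cb and
count_bytes() are hypothetical names, and plain pread() stands in for
whatever the real code does to get at the data.

```c
#include <stddef.h>
#include <stdint.h>
#include <sys/types.h>
#include <unistd.h>

/* Same fields as the quoted struct, with userspace fixed-width types. */
struct reader {
	int      fd;
	uint64_t data_size;
	uint64_t data_offset;
};

/* Hypothetical callback: called once per chunk read from the window. */
typedef int (*chunk_cb)(const void *buf, size_t len, void *priv);

/* Example callback: add up how many bytes were delivered. */
static int count_bytes(const void *buf, size_t len, void *priv)
{
	(void)buf;
	*(uint64_t *)priv += len;
	return 0;
}

/*
 * Walk only [data_offset, data_offset + data_size) of rd->fd and hand each
 * chunk to cb. Returns 0 on success, -1 on a read error, a short file, or
 * a failing callback.
 */
static int reader_process(const struct reader *rd, chunk_cb cb, void *priv)
{
	char buf[4096];
	uint64_t done = 0;

	while (done < rd->data_size) {
		uint64_t want = rd->data_size - done;
		ssize_t n;

		if (want > sizeof(buf))
			want = sizeof(buf);
		n = pread(rd->fd, buf, (size_t)want,
			  (off_t)(rd->data_offset + done));
		if (n <= 0)
			return -1;
		if (cb(buf, (size_t)n, priv))
			return -1;
		done += (uint64_t)n;
	}
	return 0;
}
```

With the window carried in the object, several readers can describe
different portions of one file, or one portion each of several files,
without the processing loop knowing which.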


> 
> Available also in:
>   git://git.kernel.org/pub/scm/linux/kernel/git/jolsa/perf.git
>   perf/reader
> 
> Changes in v2:
>   - rebased on latest perf/core
>   - added dependency patch (1/6) that was omitted in the original post
> 
> thanks,
> jirka
> 
> 
> ---
> Jiri Olsa (6):
>   perf session: Rearrange perf_session__process_events function
>   perf session: Get rid of file_size variable
>   perf session: Add reader object
>   perf session: Add data_size to reader object
>   perf session: Add data_offset to reader object
>   perf session: Add reader__process_events function
> 
>  tools/perf/util/session.c | 85 ++---
>  1 file changed, 50 insertions(+), 35 deletions(-)


Re: [PATCH v15 4/6] x86/boot: Introduce bios_get_rsdp_addr() to search RSDP in memory

2019-01-10 Thread Chao Fan
On Thu, Jan 10, 2019 at 10:27:47PM +0100, Borislav Petkov wrote:
>On Mon, Jan 07, 2019 at 11:22:41AM +0800, Chao Fan wrote:
>> Memory information in SRAT table is necessary to fix the conflict
>> between KASLR and memory-hotremove. So RSDP and SRAT should be parsed.
>> 
>> When booting from KEXEC/EFI/BIOS, the methods to compute RSDP
>> are different. When booting from BIOS, there is no variable who can
>> point to RSDP directly, so scan memory for the RSDP and verify RSDP
>> by signature and checksum.
>> 
>> Signed-off-by: Chao Fan 
>> ---
>>  arch/x86/boot/compressed/acpi.c | 86 +
>>  1 file changed, 86 insertions(+)
>
>...
>
>> +/* Search RSDP address, based on acpi_find_root_pointer(). */
>> +static acpi_physical_address bios_get_rsdp_addr(void)
>> +{
>> +u8 *table_ptr;
>> +u32 address;
>> +u8 *rsdp;
>
>But those u8's together:
>
>   u8 *table_ptr, *rsdp;
>   u32 address;

Thanks, will change that.

>
>> +
>> +/* Get the location of the Extended BIOS Data Area (EBDA) */
>> +table_ptr = (u8 *)ACPI_EBDA_PTR_LOCATION;
>> +address = *(u16 *)table_ptr;
>> +address <<= 4;
>> +table_ptr = (u8 *)(long)address;
>> +
>> +/*
>> + * Search EBDA paragraphs (EBDA is required to be a minimum of
>> + * 1K length)
>> + */
>> +if (address > 0x400) {
>> +rsdp = scan_mem_for_rsdp(table_ptr, ACPI_EBDA_WINDOW_SIZE);
>> +if (rsdp) {
>> +address += (u32)ACPI_PTR_DIFF(rsdp, table_ptr);
>> +return address;
>> +}
>> +}
>> +
>> +/* Search upper memory: 16-byte boundaries in Eh-Fh */
>> +table_ptr = (u8 *)ACPI_HI_RSDP_WINDOW_BASE;
>> +rsdp = scan_mem_for_rsdp(table_ptr, ACPI_HI_RSDP_WINDOW_SIZE);
>> +
>
>Superfluous newline.

Will drop it.

Thanks,
Chao Fan

>
>> +if (rsdp) {
>> +address = (u32)(ACPI_HI_RSDP_WINDOW_BASE +
>> +ACPI_PTR_DIFF(rsdp, table_ptr));
>> +return address;
>> +}
>> +return 0;
>> +}
>
>-- 
>Regards/Gruss,
>Boris.
>
>Good mailing practices for 400: avoid top-posting and trim the reply.
>
>
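
For readers following along, the "verify RSDP by signature and checksum"
step from the commit message could look roughly like the sketch below. It
is a userspace approximation, not the patch's scan_mem_for_rsdp(): it only
checks the 8-byte "RSD PTR " signature and the ACPI 1.0 checksum over the
first 20 bytes, and leaves out the extended checksum that newer RSDP
revisions add. The helper names are invented for this example.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define RSDP_SIG       "RSD PTR "
#define RSDP_SIG_LEN   8
#define RSDP_V1_LEN    20  /* bytes covered by the ACPI 1.0 checksum */
#define RSDP_SCAN_STEP 16  /* candidates sit on 16-byte boundaries */

/* Real-mode segment read from the EBDA pointer -> physical address. */
static uint32_t ebda_segment_to_phys(uint16_t segment)
{
	return (uint32_t)segment << 4;
}

/* A valid structure sums to zero modulo 256. */
static uint8_t checksum8(const uint8_t *p, size_t len)
{
	uint8_t sum = 0;

	while (len--)
		sum += *p++;
	return sum;
}

/*
 * Return the first 16-byte-aligned candidate in [start, start + length)
 * that passes both checks, or NULL if there is none.
 */
static const uint8_t *scan_for_rsdp(const uint8_t *start, size_t length)
{
	for (size_t off = 0; off + RSDP_V1_LEN <= length;
	     off += RSDP_SCAN_STEP) {
		const uint8_t *p = start + off;

		if (memcmp(p, RSDP_SIG, RSDP_SIG_LEN))
			continue;
		if (checksum8(p, RSDP_V1_LEN))
			continue;
		return p;
	}
	return NULL;
}
```

The same scan routine serves both windows in the quoted function: first the
EBDA, then the upper memory area.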




Re: [PATCH v3] arm64: dts: meson: Fix mmc cd-gpios polarity

2019-01-10 Thread Kevin Hilman
Loys Ollivier  writes:

> Commit 89a5e15bcba8 ("gpio/mmc/of: Respect polarity in the device tree")
> changed the behavior of "cd-inverted" to follow the device tree bindings
> specification.
> Lines specifying "cd-inverted" are now "active high".
>
> Fix the SD card for meson by setting the cd-gpios as "active low" according
> to the boards specifications.
>
> Fixes: 89a5e15bcba8 ("gpio/mmc/of: Respect polarity in the device tree")
> Signed-off-by: Loys Ollivier 

Thanks for fixing this!

Queued as a fix for v5.1-rc,

I see that this is being fixed in the driver also, but I think fixing
the DTs is necessary also.

Kevin


Re: general protection fault in ebitmap_destroy

2019-01-10 Thread Paul Moore
On Wed, Jan 9, 2019 at 11:11 AM Stephen Smalley  wrote:
> On Wed, 2019-01-09 at 07:41 -0800, syzbot wrote:
> > Hello,
> >
> > syzbot found the following crash on:
> >
> > HEAD commit:a88cc8da0279 Merge branch 'akpm' (patches from
> > Andrew)
> > git tree:   upstream
> > console output:
> > https://syzkaller.appspot.com/x/log.txt?x=1722da4f40
> > kernel config:
> > https://syzkaller.appspot.com/x/.config?x=edf1c3031097c304
> > dashboard link:
> > https://syzkaller.appspot.com/bug?extid=6664500f0f18f07a5c0e
> > compiler:   gcc (GCC) 9.0.0 20181231 (experimental)
> > syz repro:
> > https://syzkaller.appspot.com/x/repro.syz?x=12d43580c0
> >
> > IMPORTANT: if you fix the bug, please add the following tag to the
> > commit:
> > Reported-by: syzbot+6664500f0f18f07a5...@syzkaller.appspotmail.com
> >
> > SELinux: failed to load policy
> > sel_write_load: 238 callbacks suppressed
> > SELinux: failed to load policy
> > kasan: CONFIG_KASAN_INLINE enabled
> > kasan: GPF could be caused by NULL-ptr deref or user memory access
> > general protection fault:  [#1] PREEMPT SMP KASAN
> > CPU: 0 PID: 9316 Comm: syz-executor2 Not tainted 5.0.0-rc1+ #16
> > Hardware name: Google Google Compute Engine/Google Compute Engine,
> > BIOS
> > Google 01/01/2011
> > RIP: 0010:ebitmap_destroy+0x32/0xf0 security/selinux/ss/ebitmap.c:334
> > Code: 49 89 fd 41 54 53 e8 9d e6 36 fe 4d 85 ed 0f 84 99 00 00 00 e8
> > 8f e6
> > 36 fe 4c 89 ea 48 b8 00 00 00 00 00 fc ff df 48 c1 ea 03 <80> 3c 02
> > 00 0f
> > 85 98 00 00 00 49 be 00 00 00 00 00 fc ff df 4d 8b
> > RSP: 0018:88808967f5c0 EFLAGS: 00010202
> > RAX: dc00 RBX: 88808967f6a8 RCX: dc00
> > RDX: 0001 RSI: 834b1081 RDI: 0008
> > RBP: 88808967f5e0 R08: 8880972a8140 R09: ed1015cc5b90
> > R10: ed1015cc5b8f R11: 8880ae62dc7b R12: 888099d993c0
> > R13: 0008 R14: 888099d993c0 R15: 88808967f648
> > FS:  7f70cd9e5700() GS:8880ae60()
> > knlGS:
> > CS:  0010 DS:  ES:  CR0: 80050033
> > CR2: 015e7938 CR3: 96c4a000 CR4: 001406f0
> > DR0:  DR1:  DR2: 
> > DR3:  DR6: fffe0ff0 DR7: 0400
> > Call Trace:
> >   sens_destroy+0x49/0xa0 security/selinux/ss/policydb.c:735
> >   sens_read+0x25d/0x460 security/selinux/ss/policydb.c:1636
> >   policydb_read+0xed9/0x60d0 security/selinux/ss/policydb.c:2430
> >   security_load_policy+0x423/0x1830
> > security/selinux/ss/services.c:2129
> >   sel_write_load+0x25a/0x470 security/selinux/selinuxfs.c:565
> >   __vfs_write+0x116/0xb40 fs/read_write.c:485
> >   vfs_write+0x20c/0x580 fs/read_write.c:549
> >   ksys_write+0x105/0x260 fs/read_write.c:598
> >   __do_sys_write fs/read_write.c:610 [inline]
> >   __se_sys_write fs/read_write.c:607 [inline]
> >   __x64_sys_write+0x73/0xb0 fs/read_write.c:607
> >   do_syscall_64+0x1a3/0x800 arch/x86/entry/common.c:290
> >   entry_SYSCALL_64_after_hwframe+0x49/0xbe
> > RIP: 0033:0x457ec9
> > Code: 6d b7 fb ff c3 66 2e 0f 1f 84 00 00 00 00 00 66 90 48 89 f8 48
> > 89 f7
> > 48 89 d6 48 89 ca 4d 89 c2 4d 89 c8 4c 8b 4c 24 08 0f 05 <48> 3d 01
> > f0 ff
> > ff 0f 83 3b b7 fb ff c3 66 2e 0f 1f 84 00 00 00 00
> > RSP: 002b:7f70cd9e4c78 EFLAGS: 0246 ORIG_RAX:
> > 0001
> > RAX: ffda RBX: 0003 RCX: 00457ec9
> > RDX: 005c RSI: 2000 RDI: 0003
> > RBP: 0073bf00 R08:  R09: 
> > R10:  R11: 0246 R12: 7f70cd9e56d4
> > R13: 004c720f R14: 004dc9a0 R15: 
> > Modules linked in:
> > ---[ end trace 78ea480790940b53 ]---
> > RIP: 0010:ebitmap_destroy+0x32/0xf0 security/selinux/ss/ebitmap.c:334
> > Code: 49 89 fd 41 54 53 e8 9d e6 36 fe 4d 85 ed 0f 84 99 00 00 00 e8
> > 8f e6
> > 36 fe 4c 89 ea 48 b8 00 00 00 00 00 fc ff df 48 c1 ea 03 <80> 3c 02
> > 00 0f
> > 85 98 00 00 00 49 be 00 00 00 00 00 fc ff df 4d 8b
> > RSP: 0018:88808967f5c0 EFLAGS: 00010202
> > RAX: dc00 RBX: 88808967f6a8 RCX: dc00
> > RDX: 0001 RSI: 834b1081 RDI: 0008
> > RBP: 88808967f5e0 R08: 8880972a8140 R09: ed1015cc5b90
> > R10: ed1015cc5b8f R11: 8880ae62dc7b R12: 888099d993c0
> > R13: 0008 R14: 888099d993c0 R15: 88808967f648
> > FS:  7f70cd9e5700() GS:8880ae70()
> > knlGS:
> > CS:  0010 DS:  ES:  CR0: 80050033
> > CR2: 0073c000 CR3: 96c4a000 CR4: 001406e0
> > DR0:  DR1:  DR2: 
> > DR3:  DR6: fffe0ff0 DR7: 0400
>
> Possible fix below
>
> From cc9324299f32db326447a28a836c462fc16bc945 Mon Sep 17 00:00:00 2001
> From: Stephen Smalley 
> Date: Wed, 9 
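
As background on the "kasan: GPF could be caused by NULL-ptr deref" line in
the report above: the Code: bytes show the inline KASAN check loading
0xdffffc0000000000 into RAX, shifting the pointer right by 3 and probing the
shadow byte at that sum. The sketch below redoes that arithmetic in
userspace. The shift and offset are the x86_64 KASAN values; the function
names are made up for the example.

```c
#include <stdbool.h>
#include <stdint.h>

#define KASAN_SHADOW_SCALE_SHIFT 3
#define KASAN_SHADOW_OFFSET      0xdffffc0000000000ULL  /* x86_64 */

/* Address of the shadow byte that describes 'addr'. */
static uint64_t kasan_shadow_addr(uint64_t addr)
{
	return (addr >> KASAN_SHADOW_SCALE_SHIFT) + KASAN_SHADOW_OFFSET;
}

/* 48-bit canonical check: bits 63..47 must all be equal. */
static bool is_canonical_48(uint64_t addr)
{
	uint64_t top = addr >> 47;

	return top == 0 || top == 0x1ffff;
}
```

For a near-NULL pointer such as 8 (R13 in the dump ends in 8, and RDX ends
in 1 after the shift), the shadow address is non-canonical, so the probe
itself raises the general protection fault instead of a page fault.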

Re: [PATCH v15 3/6] x86/boot: Introduce efi_get_rsdp_addr() to find RSDP from EFI table

2019-01-10 Thread Chao Fan
On Thu, Jan 10, 2019 at 10:15:23PM +0100, Borislav Petkov wrote:
>On Mon, Jan 07, 2019 at 11:22:40AM +0800, Chao Fan wrote:
>> Memory information in SRAT is necessary to fix the conflict between
>> KASLR and memory-hotremove. So RSDP and SRAT should be parsed.
>> 
>> When booting from KEXEC/EFI/BIOS, the methods to compute RSDP
>> are different. When booting from EFI, EFI table points to RSDP.
>> So parse the EFI table and find the RSDP.
>> 
>> Signed-off-by: Chao Fan 
>> ---
>>  arch/x86/boot/compressed/acpi.c | 83 +
>>  1 file changed, 83 insertions(+)
>> 
>> diff --git a/arch/x86/boot/compressed/acpi.c b/arch/x86/boot/compressed/acpi.c
>> index 7ca5001d7639..f74c5d033d79 100644
>> --- a/arch/x86/boot/compressed/acpi.c
>> +++ b/arch/x86/boot/compressed/acpi.c
>> @@ -5,6 +5,8 @@
>>  #include "../string.h"
>>  
>>  #include 
>> +#include 
>> +#include 
>>  
>>  /*
>>   * Max length of 64-bit hex address string is 19, prefix "0x" + 16 hex
>> @@ -28,3 +30,84 @@ static acpi_physical_address get_acpi_rsdp(void)
>>  #endif
>>  return 0;
>>  }
>> +
>> +/* Search EFI table for RSDP. */
>> +static acpi_physical_address efi_get_rsdp_addr(void)
>> +{
>> +acpi_physical_address rsdp_addr = 0;
>
>
>< newline here.
>

Will add it.

>> +#ifdef CONFIG_EFI
>> +efi_system_table_t *systab;
>> +struct efi_info *ei;
>> +bool efi_64;
>> +char *sig;
>> +int size;
>> +int i;
>> +
>> +ei = &boot_params->efi_info;
>> +sig = (char *)&ei->efi_loader_signature;
>> +
>> +if (!strncmp(sig, EFI64_LOADER_SIGNATURE, 4)) {
>> +efi_64 = true;
>> +} else if (!strncmp(sig, EFI32_LOADER_SIGNATURE, 4)) {
>> +efi_64 = false;
>> +} else {
>> +debug_putstr("Wrong EFI loader signature.\n");
>> +return 0;
>> +}
>> +
>> +/* Get systab from boot params. Based on efi_init(). */
>> +#ifdef CONFIG_X86_64
>> +systab = (efi_system_table_t *)(ei->efi_systab | ((__u64)ei->efi_systab_hi<<32));
>> +#else
>> +if (ei->efi_systab_hi || ei->efi_memmap_hi) {
>> +debug_putstr("Error getting RSDP address: EFI system table located above 4GB.\n");
>> +return 0;
>> +}
>> +systab = (efi_system_table_t *)ei->efi_systab;
>> +#endif
>> +
>> +if (!systab)
>> +error("EFI system table is not found.");
>
>s/is//

Will drop the 'is'.

>
>> +
>> +/*
>> + * Get EFI tables from systab. Based on efi_config_init() and
>> + * efi_config_parse_tables().
>> + */
>> +size = efi_64 ? sizeof(efi_config_table_64_t) :
>> +sizeof(efi_config_table_32_t);
>> +
>> +for (i = 0; i < systab->nr_tables; i++) {
>> +void *config_tables;
>> +unsigned long table;
>> +efi_guid_t guid;
>> +
>> +config_tables = (void *)(systab->tables + size * i);
>> +if (efi_64) {
>> +efi_config_table_64_t *tmp_table;
>> +u64 table64;
>> +
>> +tmp_table = config_tables;
>> +guid = tmp_table->guid;
>> +table64 = tmp_table->table;
>> +table = table64;
>
>That table64 looks superfluous.

Yes, 'table64' looks superfluous here, but after these lines, there is:
if (!IS_ENABLED(CONFIG_X86_64) && table64 >> 32) {
so 'table64' is needed for i386. 'table' is unsigned long, which is only
32 bits wide there, so it cannot hold the upper half and shifting it right
by 32 makes no sense. 'table64' is u64, so the right shift works on it.
Thanks,
Chao Fan
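
A small sketch of why the 64-bit temporary matters, assuming uint32_t as a
stand-in for 'unsigned long' on i386. The helper name is hypothetical; the
point is only that the "above 4GB" test has to run on the 64-bit value
before it is narrowed, because the narrowing assignment silently drops the
upper half.

```c
#include <stdbool.h>
#include <stdint.h>

/*
 * Returns true and stores the address if it fits in 32 bits, false if the
 * table lives above 4GB and cannot be used from a 32-bit kernel.
 */
static bool table_addr_to_ulong32(uint64_t table64, uint32_t *table)
{
	if (table64 >> 32)
		return false;
	*table = (uint32_t)table64;
	return true;
}
```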

>
>-- 
>Regards/Gruss,
>Boris.
>
>Good mailing practices for 400: avoid top-posting and trim the reply.
>
>




Re: [PATCH v2 1/2] Move ralink-gdma to its own directory

2019-01-10 Thread NeilBrown
On Thu, Jan 10 2019, thirtythreefo...@gmail.com wrote:

> From: George Hilliard 
>
> This is in preparation to allow it and the mt7621-dma drivers to be
> built separately.  They are completely independent pieces of software,
> and the Kconfig specifies very different requirements.
>
> Cc: linux-kernel@vger.kernel.org
> Cc: de...@driverdev.osuosl.org
> Cc: Neil Brown 
> Signed-off-by: George Hilliard 

Hi,
 thanks for taking an interest in these drivers.
 I originally submitted this code because I thought I needed it for my
 mt7621 hardware, but I've subsequently realized that neither of these
 dma drivers is used in this hardware.
 Consequently I cannot test any changes you make.
 But maybe you can - which would be excellent!

 So this is just letting you and Greg know that despite my stated
 interest, I cannot actually review or test this.

Thanks,
NeilBrown


> ---
>  drivers/staging/Kconfig   | 2 ++
>  drivers/staging/Makefile  | 1 +
>  drivers/staging/mt7621-dma/Kconfig| 6 --
>  drivers/staging/mt7621-dma/Makefile   | 1 -
>  drivers/staging/ralink-gdma/Kconfig   | 6 ++
>  drivers/staging/ralink-gdma/Makefile  | 3 +++
>  drivers/staging/{mt7621-dma => ralink-gdma}/ralink-gdma.c | 0
>  7 files changed, 12 insertions(+), 7 deletions(-)
>  create mode 100644 drivers/staging/ralink-gdma/Kconfig
>  create mode 100644 drivers/staging/ralink-gdma/Makefile
>  rename drivers/staging/{mt7621-dma => ralink-gdma}/ralink-gdma.c (100%)
>
> diff --git a/drivers/staging/Kconfig b/drivers/staging/Kconfig
> index e4f608815c05..b4cfde38e856 100644
> --- a/drivers/staging/Kconfig
> +++ b/drivers/staging/Kconfig
> @@ -110,6 +110,8 @@ source "drivers/staging/mt7621-spi/Kconfig"
>  
>  source "drivers/staging/mt7621-dma/Kconfig"
>  
> +source "drivers/staging/ralink-gdma/Kconfig"
> +
>  source "drivers/staging/mt7621-mmc/Kconfig"
>  
>  source "drivers/staging/mt7621-eth/Kconfig"
> diff --git a/drivers/staging/Makefile b/drivers/staging/Makefile
> index 5868631e8f1b..e095f427177c 100644
> --- a/drivers/staging/Makefile
> +++ b/drivers/staging/Makefile
> @@ -45,6 +45,7 @@ obj-$(CONFIG_SOC_MT7621)+= mt7621-pci/
>  obj-$(CONFIG_SOC_MT7621) += mt7621-pinctrl/
>  obj-$(CONFIG_SOC_MT7621) += mt7621-spi/
>  obj-$(CONFIG_SOC_MT7621) += mt7621-dma/
> +obj-$(CONFIG_SOC_MT7621) += ralink-gdma/
>  obj-$(CONFIG_SOC_MT7621) += mt7621-mmc/
>  obj-$(CONFIG_SOC_MT7621) += mt7621-eth/
>  obj-$(CONFIG_SOC_MT7621) += mt7621-dts/
> diff --git a/drivers/staging/mt7621-dma/Kconfig b/drivers/staging/mt7621-dma/Kconfig
> index 2423c40099d1..b6e48a682c44 100644
> --- a/drivers/staging/mt7621-dma/Kconfig
> +++ b/drivers/staging/mt7621-dma/Kconfig
> @@ -1,9 +1,3 @@
> -config DMA_RALINK
> - tristate "RALINK DMA support"
> - depends on RALINK && !SOC_RT288X
> - select DMA_ENGINE
> - select DMA_VIRTUAL_CHANNELS
> -
>  config MTK_HSDMA
>   tristate "MTK HSDMA support"
>   depends on RALINK && SOC_MT7621
> diff --git a/drivers/staging/mt7621-dma/Makefile b/drivers/staging/mt7621-dma/Makefile
> index d3152d45cf45..c9e3e1619ab0 100644
> --- a/drivers/staging/mt7621-dma/Makefile
> +++ b/drivers/staging/mt7621-dma/Makefile
> @@ -1,4 +1,3 @@
> -obj-$(CONFIG_DMA_RALINK) += ralink-gdma.o
>  obj-$(CONFIG_MTK_HSDMA) += mtk-hsdma.o
>  
>  ccflags-y += -I$(srctree)/drivers/dma
> diff --git a/drivers/staging/ralink-gdma/Kconfig b/drivers/staging/ralink-gdma/Kconfig
> new file mode 100644
> index ..a12b2c672d48
> --- /dev/null
> +++ b/drivers/staging/ralink-gdma/Kconfig
> @@ -0,0 +1,6 @@
> +config DMA_RALINK
> + tristate "RALINK DMA support"
> + depends on RALINK && !SOC_RT288X
> + select DMA_ENGINE
> + select DMA_VIRTUAL_CHANNELS
> +
> diff --git a/drivers/staging/ralink-gdma/Makefile b/drivers/staging/ralink-gdma/Makefile
> new file mode 100644
> index ..5d917e0729bb
> --- /dev/null
> +++ b/drivers/staging/ralink-gdma/Makefile
> @@ -0,0 +1,3 @@
> +obj-$(CONFIG_DMA_RALINK) += ralink-gdma.o
> +
> +ccflags-y += -I$(srctree)/drivers/dma
> diff --git a/drivers/staging/mt7621-dma/ralink-gdma.c b/drivers/staging/ralink-gdma/ralink-gdma.c
> similarity index 100%
> rename from drivers/staging/mt7621-dma/ralink-gdma.c
> rename to drivers/staging/ralink-gdma/ralink-gdma.c
> -- 
> 2.20.1




Re: [PATCH v15 2/6] x86/boot: Introduce get_acpi_rsdp() to parse RSDP in cmdline from KEXEC

2019-01-10 Thread Chao Fan
On Thu, Jan 10, 2019 at 06:01:03PM +0100, Borislav Petkov wrote:
>On Mon, Jan 07, 2019 at 11:22:39AM +0800, Chao Fan wrote:
>> KASLR may randomly choose some positions which are located in movable
>> memory regions. This will make the movable memory chosen by KASLR
>> can't be removed.
>> 
>> Memory information in SRAT is necessary to fix the conflict between
>> KASLR and memory-hotremove.
>> 
>> ACPI SRAT (System/Static Resource Affinity Table) shows the details
>> about memory ranges, including ranges of memory provided by hot-added
>> memory devices. SRAT is introduced by Root System Description
>> Pointer(RSDP). So RSDP should be found firstly.
>> 
>> When booting from KEXEC/EFI/BIOS, the methods to find RSDP
>> are different. When booting from KEXEC, 'acpi_rsdp=' may have been
>> added to cmdline, so parse cmdline to find RSDP.
>> 
>> Signed-off-by: Chao Fan 
>> ---
>>  arch/x86/boot/compressed/acpi.c | 30 ++
>>  1 file changed, 30 insertions(+)
>>  create mode 100644 arch/x86/boot/compressed/acpi.c
>> 
>> diff --git a/arch/x86/boot/compressed/acpi.c b/arch/x86/boot/compressed/acpi.c
>> new file mode 100644
>> index ..7ca5001d7639
>> --- /dev/null
>> +++ b/arch/x86/boot/compressed/acpi.c
>> @@ -0,0 +1,30 @@
>> +// SPDX-License-Identifier: GPL-2.0
>> +#define BOOT_CTYPE_H
>> +#include "misc.h"
>> +#include "error.h"
>> +#include "../string.h"
>> +
>> +#include 
>
>Ok, I corrected it to this, please replace in your version:
>
>/*
> * Max length of 64-bit hex address string is 19, prefix "0x" + 16 hex
> * digits, and '\0' for termination.
> */
>#define MAX_ADDR_LEN 19
>
>static acpi_physical_address get_acpi_rsdp(void)
>{
>   acpi_physical_address addr = 0;
>
>#ifdef CONFIG_KEXEC
>   char val[MAX_ADDR_LEN] = { };
>   int ret;
>
>   ret = cmdline_find_option("acpi_rsdp", val, MAX_ADDR_LEN);
>   if (ret < 0)
>   return 0;
>
>   if (kstrtoull(val, 16, &addr))
>   return 0;
>#endif
>   return addr;
>}
>

Thanks for your suggestion, will change it.

Thanks,
Chao Fan
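
For illustration only, the same "find acpi_rsdp= on the command line and
parse it as hex" logic can be tried in userspace with the sketch below. It
does not use cmdline_find_option() or kstrtoull(), which belong to the
boot code; parse_acpi_rsdp() is a made-up name, and strstr()/strtoull()
stand in for them under the assumption that options are separated by
single spaces.

```c
#include <ctype.h>
#include <errno.h>
#include <stdint.h>
#include <stdlib.h>
#include <string.h>

/*
 * Returns the address given by "acpi_rsdp=<hex>" in cmdline, or 0 if the
 * option is absent or malformed (mirroring the "return 0" paths above).
 */
static uint64_t parse_acpi_rsdp(const char *cmdline)
{
	static const char key[] = "acpi_rsdp=";
	const char *p = cmdline;
	unsigned long long val;
	char *end;

	/* Accept the key only at the start or right after a space. */
	while ((p = strstr(p, key)) != NULL) {
		if (p == cmdline || p[-1] == ' ')
			break;
		p += sizeof(key) - 1;
	}
	if (!p)
		return 0;
	p += sizeof(key) - 1;

	/* Reject empty values, signs and leading blanks up front. */
	if (!isxdigit((unsigned char)*p))
		return 0;

	errno = 0;
	val = strtoull(p, &end, 16);  /* base 16 also accepts a "0x" prefix */
	if (end == p || errno == ERANGE)
		return 0;
	if (*end != '\0' && *end != ' ')
		return 0;
	return (uint64_t)val;
}
```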

>
>-- 
>Regards/Gruss,
>Boris.
>
>Good mailing practices for 400: avoid top-posting and trim the reply.
>
>




Re: [PATCH v9 1/2] dmaengine: 8250_mtk_dma: add MediaTek uart DMA support

2019-01-10 Thread Long Cheng
On Thu, 2019-01-10 at 18:33 +0800, Long Cheng wrote:
Fix spelling errors.

> On Fri, 2019-01-04 at 22:49 +0530, Vinod Koul wrote:
> > On 02-01-19, 10:12, Long Cheng wrote:
> > > In DMA engine framework, add 8250 uart dma to support MediaTek uart.
> > > If MediaTek uart enabled(SERIAL_8250_MT6577), and want to improve
> > > the performance, can enable the function.
> > 
> > Is the DMA controller UART specific, can it work with other controllers
> > as well, if so you should get rid of uart name in patch
> > 
> 
> I don't know whether it can work with other controllers, but it is for
> MediaTek SoCs.
> 
> > > +#define MTK_UART_APDMA_CHANNELS  (CONFIG_SERIAL_8250_NR_UARTS * 2)
> > 
> > Why are the channels not coming from DT?
> > 
> 
> i will using dma-requests install of it.
> 
I will use 'dma-requests' instead.

> > > +
> > > +#define VFF_EN_B BIT(0)
> > > +#define VFF_STOP_B   BIT(0)
> > > +#define VFF_FLUSH_B  BIT(0)
> > > +#define VFF_4G_SUPPORT_B BIT(0)
> > > +#define VFF_RX_INT_EN0_B BIT(0)  /*rx valid size >=  vff thre*/
> > > +#define VFF_RX_INT_EN1_B BIT(1)
> > > +#define VFF_TX_INT_EN_B  BIT(0)  /*tx left size >= vff thre*/
> > 
> > space around /* space */ also run checkpatch to check for style errors
> > 
> 
> ok.
> 
> > > +static void mtk_uart_apdma_start_tx(struct mtk_chan *c)
> > > +{
> > > + unsigned int len, send, left, wpt, d_wpt, tmp;
> > > + int ret;
> > > +
> > > + left = mtk_uart_apdma_read(c, VFF_LEFT_SIZE);
> > > + if (!left) {
> > > + mtk_uart_apdma_write(c, VFF_INT_EN, VFF_TX_INT_EN_B);
> > > + return;
> > > + }
> > > +
> > > + /* Wait 1sec for flush,  can't sleep*/
> > > + ret = readx_poll_timeout(readl, c->base + VFF_FLUSH, tmp,
> > > + tmp != VFF_FLUSH_B, 0, 100);
> > > + if (ret)
> > > + dev_warn(c->vc.chan.device->dev, "tx: fail, debug=0x%x\n",
> > > + mtk_uart_apdma_read(c, VFF_DEBUG_STATUS));
> > > +
> > > + send = min_t(unsigned int, left, c->desc->avail_len);
> > > + wpt = mtk_uart_apdma_read(c, VFF_WPT);
> > > + len = mtk_uart_apdma_read(c, VFF_LEN);
> > > +
> > > + d_wpt = wpt + send;
> > > + if ((d_wpt & VFF_RING_SIZE) >= len) {
> > > + d_wpt = d_wpt - len;
> > > + d_wpt = d_wpt ^ VFF_RING_WRAP;
> > > + }
> > > + mtk_uart_apdma_write(c, VFF_WPT, d_wpt);
> > > +
> > > + c->desc->avail_len -= send;
> > > +
> > > + mtk_uart_apdma_write(c, VFF_INT_EN, VFF_TX_INT_EN_B);
> > > + if (mtk_uart_apdma_read(c, VFF_FLUSH) == 0U)
> > > + mtk_uart_apdma_write(c, VFF_FLUSH, VFF_FLUSH_B);
> > > +}
> > > +
> > > +static void mtk_uart_apdma_start_rx(struct mtk_chan *c)
> > > +{
> > > + struct mtk_uart_apdma_desc *d = c->desc;
> > > + unsigned int len, wg, rg, cnt;
> > > +
> > > + if ((mtk_uart_apdma_read(c, VFF_VALID_SIZE) == 0U) ||
> > > + !d || !vchan_next_desc(&c->vc))
> > > + return;
> > > +
> > > + len = mtk_uart_apdma_read(c, VFF_LEN);
> > > + rg = mtk_uart_apdma_read(c, VFF_RPT);
> > > + wg = mtk_uart_apdma_read(c, VFF_WPT);
> > > + if ((rg ^ wg) & VFF_RING_WRAP)
> > > + cnt = (wg & VFF_RING_SIZE) + len - (rg & VFF_RING_SIZE);
> > > + else
> > > + cnt = (wg & VFF_RING_SIZE) - (rg & VFF_RING_SIZE);
> > > +
> > > + c->rx_status = cnt;
> > > + mtk_uart_apdma_write(c, VFF_RPT, wg);
> > > +
> > > + list_del(&d->vd.node);
> > > + vchan_cookie_complete(&d->vd);
> > > +}
> > 
> > this looks odd, why do you have different rx and tx start routines?
> > 
> 
> Could you explain that in more detail? Thanks.
> The tx function waits for the previous flush to finish and then computes
> the size to send.
> The rx function computes the size that was received.
> Any way, in rx / tx, need andle WPT or RPT.
> 
Anyway, both rx and tx need to handle WPT or RPT.
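
The pointer arithmetic shared by the two quoted routines can be sketched as standalone C. This is an illustration only: the mask and wrap-bit values below are assumptions (the quoted hunks do not show the definitions of VFF_RING_SIZE and VFF_RING_WRAP), and ring_pending() and ring_advance() are hypothetical names. It also assumes len is small enough that offset + n never carries out of the offset field.

```c
#include <stdint.h>

#define RING_SIZE_MASK	0xffffU		/* assumed: bits holding the offset */
#define RING_WRAP_BIT	0x10000U	/* assumed: toggles on each wrap */

/* Bytes available between read pointer rpt and write pointer wpt. */
static uint32_t ring_pending(uint32_t rpt, uint32_t wpt, uint32_t len)
{
	uint32_t r = rpt & RING_SIZE_MASK;
	uint32_t w = wpt & RING_SIZE_MASK;

	/* Differing wrap bits mean the writer is one lap ahead. */
	if ((rpt ^ wpt) & RING_WRAP_BIT)
		return w + len - r;
	return w - r;
}

/* Advance ptr by n bytes in a ring of len bytes, toggling the wrap bit. */
static uint32_t ring_advance(uint32_t ptr, uint32_t n, uint32_t len)
{
	uint32_t next = ptr + n;

	if ((next & RING_SIZE_MASK) >= len) {
		next -= len;
		next ^= RING_WRAP_BIT;
	}
	return next;
}
```

In these terms, the rx path computes ring_pending(RPT, WPT, LEN) and then sets RPT to WPT, while the tx path writes ring_advance(WPT, send, LEN) back to WPT.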

> > > +static int mtk_uart_apdma_alloc_chan_resources(struct dma_chan *chan)
> > > +{
> > > + struct mtk_uart_apdmadev *mtkd = to_mtk_uart_apdma_dev(chan->device);
> > > + struct mtk_chan *c = to_mtk_uart_apdma_chan(chan);
> > > + u32 tmp;
> > > + int ret;
> > > +
> > > + pm_runtime_get_sync(mtkd->ddev.dev);
> > > +
> > > + mtk_uart_apdma_write(c, VFF_ADDR, 0);
> > > + mtk_uart_apdma_write(c, VFF_THRE, 0);
> > > + mtk_uart_apdma_write(c, VFF_LEN, 0);
> > > + mtk_uart_apdma_write(c, VFF_RST, VFF_WARM_RST_B);
> > > +
> > > + ret = readx_poll_timeout(readl, c->base + VFF_EN, tmp,
> > > + tmp == 0, 10, 100);
> > > + if (ret) {
> > > + dev_err(chan->device->dev, "dma reset: fail, timeout\n");
> > > + return ret;
> > > + }
> > 
> > register read does reset?
> > 
> 
> 'mtk_uart_apdma_write(c, VFF_RST, VFF_WARM_RST_B);' does the reset. The read
> just polls until the reset is done.
> 
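
The "write to trigger, then read until clear" pattern described above can be sketched in plain C. This is a simplified model under stated assumptions: poll_until_zero(), struct fake_reg, and fake_read() are hypothetical, and the loop is bounded by an iteration count, whereas readx_poll_timeout() in the kernel is bounded by a timeout in microseconds and returns -ETIMEDOUT on expiry.

```c
#include <stdint.h>

typedef uint32_t (*read_fn)(void *ctx);

/*
 * Poll rd(ctx) until it returns 0 or max_polls reads have been made.
 * Returns 0 when the register cleared, -1 when the poll budget ran out.
 */
static int poll_until_zero(read_fn rd, void *ctx, unsigned int max_polls)
{
	unsigned int i;

	for (i = 0; i < max_polls; i++) {
		if (rd(ctx) == 0)
			return 0;
	}
	return -1;
}

/* Fake "enable" register that reads non-zero for a fixed number of reads. */
struct fake_reg {
	unsigned int reads_until_clear;
};

static uint32_t fake_read(void *ctx)
{
	struct fake_reg *r = ctx;

	if (r->reads_until_clear > 0) {
		r->reads_until_clear--;
		return 1;	/* reset still in progress */
	}
	return 0;		/* reset done */
}
```

The reset itself is a separate write issued before polling starts; the read loop only observes completion, which is the distinction the reply above is making.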
> > > +
> > > + if (!c->requested) {
> > > + c->requested = true;
> > > + ret = request_irq(mtkd->dma_irq[chan->chan_id],
> > > +   mtk_uart_apdma_irq_handler, IRQF_TRIGGER_NONE,
> > > +   KBUILD_MODNAME, chan);
> > 
> > why is the irq not requested in 
