Re: [PATCH 1/3] mm/gup: introduce __put_user_pages()

2019-07-22 Thread Christoph Hellwig
On Mon, Jul 22, 2019 at 03:34:13PM -0700, john.hubb...@gmail.com wrote:
> +enum pup_flags_t {
> + PUP_FLAGS_CLEAN = 0,
> + PUP_FLAGS_DIRTY = 1,
> + PUP_FLAGS_LOCK  = 2,
> + PUP_FLAGS_DIRTY_LOCK = 3,
> +};

Well, the enum defeats the ease of just being able to pass a boolean
expression to the function, which would simplify a lot of the callers,
so if we need to support the !locked version I'd rather see that as
a separate helper.

But do we actually have callers where not using the _lock version is
not a bug?  set_page_dirty makes sense in the context of file systems
that have a reference to the inode the page hangs off, but that is
(almost?) never the case for get_user_pages.
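
To make the point about passing a boolean expression concrete, here is a
minimal userspace sketch, not kernel code. struct fake_page,
release_pages_dirty_lock(), finish_io() and the make_dirty parameter are
all hypothetical names invented for illustration; the real helper would
dirty pages under the page lock, which this sketch only mentions in a
comment.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-in for struct page; tracks only what the sketch needs. */
struct fake_page {
	bool dirty;
	int refcount;
};

/*
 * Hypothetical helper: optionally mark each page dirty, then drop one
 * reference. In the kernel the dirtying would happen under the page
 * lock; here it is a plain store.
 */
static void release_pages_dirty_lock(struct fake_page **pages, size_t npages,
				     bool make_dirty)
{
	for (size_t i = 0; i < npages; i++) {
		assert(pages[i]->refcount > 0);
		if (make_dirty)
			pages[i]->dirty = true;
		pages[i]->refcount--;
	}
}

/* A caller passes its own condition directly, with no flag translation. */
static void finish_io(struct fake_page **pages, size_t npages,
		      bool is_read, int bytes_done)
{
	release_pages_dirty_lock(pages, npages, is_read && bytes_done > 0);
}
```

With an enum, finish_io() would first have to map its condition onto
one of four flag values before making the call.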


Re: [PATCH] [RFC] dmaengine: add fifo_size member

2019-07-22 Thread Sameer Pujar



On 7/19/2019 10:34 AM, Vinod Koul wrote:

On 05-07-19, 11:45, Sameer Pujar wrote:

Hi Vinod,

What are your final thoughts regarding this?

Hi sameer,

Sorry for the delay in replying

On this, I am inclined to think that dma driver should not be involved.
The ADMAIF needs this configuration and we should take the path of
dma_router for this piece and add features like this to it


Hi Vinod,

The configuration is needed by both ADMA and ADMAIF. The size is configurable
on ADMAIF side. ADMA needs to know this info and program accordingly.
Not sure if dma_router can help to achieve this.

I checked on dma_router. It would have been useful when a configuration exported
via ADMA, had to be applied to ADMAIF. Please correct me if I am wrong here.


Thanks,
Sameer.


Where does ADMAIF driver reside in kernel, who configures it for normal
dma txns..?

Not yet, we are in the process of upstreaming ADMAIF driver.
To describe briefly, audio subsystem is using ALSA SoC (ASoC) layer. ADMAIF is
registered as platform driver and exports DMA functionality. It registers PCM
devices for each Rx/Tx ADMAIF channel. During PCM playback/capture operations,
ALSA callbacks configure DMA channel using API dmaengine_slave_config().
The proposed RFC patch is to help populate FIFO_SIZE value as well during the
above call, since ADMA requires it.

Also it would have helped the long discussion if that part was made clear
rather than talking about peripheral all this time :(

Thought it was clear, though should have avoided using 'peripheral' in the
discussions. Sorry for the confusion.


Re: [PATCHv2] mm: treewide: Clarify pgtable_page_{ctor,dtor}() naming

2019-07-22 Thread Mike Rapoport
On Mon, Jul 22, 2019 at 03:11:33PM +0100, Mark Rutland wrote:
> The naming of pgtable_page_{ctor,dtor}() seems to have confused a few
> people, and until recently arm64 used these erroneously/pointlessly for
> other levels of page table.
> 
> To make it incredibly clear that these only apply to the PTE level, and
> to align with the naming of pgtable_pmd_page_{ctor,dtor}(), let's rename
> them to pgtable_pte_page_{ctor,dtor}().
> 
> These changes were generated with the following shell script:
> 
> 
> git grep -lw 'pgtable_page_.tor' | while read FILE; do
> sed -i '{s/pgtable_page_ctor/pgtable_pte_page_ctor/}' $FILE;
> sed -i '{s/pgtable_page_dtor/pgtable_pte_page_dtor/}' $FILE;
> done
> 
> 
> ... with the documentation re-flowed to remain under 80 columns, and
> whitespace fixed up in macros to keep backslashes aligned.
> 
> There should be no functional change as a result of this patch.
> 
> Signed-off-by: Mark Rutland 
> Cc: Andrew Morton 
> Cc: Anshuman Khandual 
> Cc: Matthew Wilcox 
> Cc: Michal Hocko 
> Cc: Yu Zhao 
> Cc: linux...@kvack.org

Reviewed-by: Mike Rapoport 

> ---
>  Documentation/vm/split_page_table_lock.rst | 10 +-
>  arch/arc/include/asm/pgalloc.h |  4 ++--
>  arch/arm/include/asm/tlb.h |  2 +-
>  arch/arm/mm/mmu.c  |  2 +-
>  arch/arm64/include/asm/tlb.h   |  2 +-
>  arch/arm64/mm/mmu.c|  2 +-
>  arch/csky/include/asm/pgalloc.h|  2 +-
>  arch/hexagon/include/asm/pgalloc.h |  2 +-
>  arch/ia64/include/asm/pgalloc.h|  4 ++--
>  arch/m68k/include/asm/mcf_pgalloc.h|  6 +++---
>  arch/m68k/include/asm/motorola_pgalloc.h   |  6 +++---
>  arch/m68k/include/asm/sun3_pgalloc.h   |  2 +-
>  arch/microblaze/include/asm/pgalloc.h  |  4 ++--
>  arch/mips/include/asm/pgalloc.h|  2 +-
>  arch/nios2/include/asm/pgalloc.h   |  2 +-
>  arch/openrisc/include/asm/pgalloc.h|  6 +++---
>  arch/powerpc/mm/pgtable-frag.c |  6 +++---
>  arch/riscv/include/asm/pgalloc.h   |  2 +-
>  arch/s390/mm/pgalloc.c |  6 +++---
>  arch/sh/include/asm/pgalloc.h  |  6 +++---
>  arch/sparc/mm/init_64.c|  4 ++--
>  arch/sparc/mm/srmmu.c  |  4 ++--
>  arch/um/include/asm/pgalloc.h  |  2 +-
>  arch/unicore32/include/asm/tlb.h   |  2 +-
>  arch/x86/mm/pgtable.c  |  2 +-
>  arch/xtensa/include/asm/pgalloc.h  |  4 ++--
>  include/asm-generic/pgalloc.h  |  8 
>  include/linux/mm.h |  4 ++--
>  28 files changed, 54 insertions(+), 54 deletions(-)
> 
> Since v1 [1]:
> * Rebase to v5.3-rc1
> * Use shell rather than coccinelle
> 
> [1] https://lore.kernel.org/r/20190610163354.24835-1-mark.rutl...@arm.com
> 
> diff --git a/Documentation/vm/split_page_table_lock.rst b/Documentation/vm/split_page_table_lock.rst
> index 889b00be469f..ff51f4a5494d 100644
> --- a/Documentation/vm/split_page_table_lock.rst
> +++ b/Documentation/vm/split_page_table_lock.rst
> @@ -54,9 +54,9 @@ Hugetlb-specific helpers:
>  Support of split page table lock by an architecture
>  ===================================================
> 
> -There's no need in special enabling of PTE split page table lock:
> -everything required is done by pgtable_page_ctor() and pgtable_page_dtor(),
> -which must be called on PTE table allocation / freeing.
> +There's no need in special enabling of PTE split page table lock: everything
> +required is done by pgtable_pte_page_ctor() and pgtable_pte_page_dtor(), which
> +must be called on PTE table allocation / freeing.
> 
>  Make sure the architecture doesn't use slab allocator for page table
>  allocation: slab uses page->slab_cache for its pages.
> @@ -74,7 +74,7 @@ paths: i.e X86_PAE preallocate few PMDs on pgd_alloc().
> 
>  With everything in place you can set CONFIG_ARCH_ENABLE_SPLIT_PMD_PTLOCK.
> 
> -NOTE: pgtable_page_ctor() and pgtable_pmd_page_ctor() can fail -- it must
> +NOTE: pgtable_pte_page_ctor() and pgtable_pmd_page_ctor() can fail -- it must
>  be handled properly.
> 
>  page->ptl
> @@ -94,7 +94,7 @@ trick:
> split lock with enabled DEBUG_SPINLOCK or DEBUG_LOCK_ALLOC, but costs
> one more cache line for indirect access;
> 
> -The spinlock_t allocated in pgtable_page_ctor() for PTE table and in
> +The spinlock_t allocated in pgtable_pte_page_ctor() for PTE table and in
>  pgtable_pmd_page_ctor() for PMD table.
> 
>  Please, never access page->ptl directly -- use appropriate helper.
> diff --git a/arch/arc/include/asm/pgalloc.h b/arch/arc/include/asm/pgalloc.h
> index 9bdb8ed5b0db..c2b754b63846 100644
> --- a/arch/arc/include/asm/pgalloc.h
> +++ b/arch/arc/include/asm/pgalloc.h
> @@ -108,7 +108,7 @@ pte_alloc_one(struct mm_struct *mm)
>   return 0;
>   memzero((void *)pte_pg, PTRS_PER_PTE * sizeof(pte_t));
>   page = 

[PATCH v12 1/2] mm: page_alloc: introduce memblock_next_valid_pfn() (again) for arm64

2019-07-22 Thread Hanjun Guo
From: Jia He 

Commit b92df1de5d28 ("mm: page_alloc: skip over regions of invalid pfns
where possible") optimized the loop in memmap_init_zone(). But it could
cause a panic on x86, because the specific memory mapping on x86_64 made
it skip valid pfns as well, so Daniel Vacek reverted it later.

But as suggested by Daniel Vacek, it is fine to use memblock to skip
gaps and find the next valid frame with CONFIG_HAVE_ARCH_PFN_VALID.

Daniel said:
"On arm and arm64, memblock is used by default. But generic version of
pfn_valid() is based on mem sections and memblock_next_valid_pfn() does
not always return the next valid one but skips more resulting in some
valid frames to be skipped (as if they were invalid). And that's why
kernel was eventually crashing on some !arm machines."

Introduce a new config option CONFIG_HAVE_MEMBLOCK_PFN_VALID, selected
only for arm64, and use it to guard memblock_next_valid_pfn().

This was tested on a HiSilicon Kunpeng920 based ARM64 server, the speedup
is pretty impressive for bootmem_init() at boot:

with 384G memory,
before: 13310ms
after:  1415ms

with 1T memory,
before: 20s
after:  2s

Suggested-by: Daniel Vacek 
Signed-off-by: Jia He 
Signed-off-by: Hanjun Guo 
---
 arch/arm64/Kconfig |  1 +
 include/linux/mmzone.h |  9 +
 mm/Kconfig |  3 +++
 mm/memblock.c  | 31 +++
 mm/page_alloc.c|  4 +++-
 5 files changed, 47 insertions(+), 1 deletion(-)

diff --git a/arch/arm64/Kconfig b/arch/arm64/Kconfig
index 697ea0510729..058eb26579be 100644
--- a/arch/arm64/Kconfig
+++ b/arch/arm64/Kconfig
@@ -893,6 +893,7 @@ config ARCH_FLATMEM_ENABLE
 
 config HAVE_ARCH_PFN_VALID
def_bool y
+   select HAVE_MEMBLOCK_PFN_VALID
 
 config HW_PERF_EVENTS
def_bool y
diff --git a/include/linux/mmzone.h b/include/linux/mmzone.h
index 70394cabaf4e..24cb6bdb1759 100644
--- a/include/linux/mmzone.h
+++ b/include/linux/mmzone.h
@@ -1325,6 +1325,10 @@ static inline int pfn_present(unsigned long pfn)
 #endif
 
 #define early_pfn_valid(pfn)   pfn_valid(pfn)
+#ifdef CONFIG_HAVE_MEMBLOCK_PFN_VALID
+extern unsigned long memblock_next_valid_pfn(unsigned long pfn);
+#define next_valid_pfn(pfn)memblock_next_valid_pfn(pfn)
+#endif
 void sparse_init(void);
 #else
 #define sparse_init()  do {} while (0)
@@ -1347,6 +1351,11 @@ struct mminit_pfnnid_cache {
 #define early_pfn_valid(pfn)   (1)
 #endif
 
+/* fallback to default definitions */
+#ifndef next_valid_pfn
+#define next_valid_pfn(pfn)(pfn + 1)
+#endif
+
 void memory_present(int nid, unsigned long start, unsigned long end);
 
 /*
diff --git a/mm/Kconfig b/mm/Kconfig
index f0c76ba47695..c578374b6413 100644
--- a/mm/Kconfig
+++ b/mm/Kconfig
@@ -132,6 +132,9 @@ config HAVE_MEMBLOCK_NODE_MAP
 config HAVE_MEMBLOCK_PHYS_MAP
bool
 
+config HAVE_MEMBLOCK_PFN_VALID
+   bool
+
 config HAVE_GENERIC_GUP
bool
 
diff --git a/mm/memblock.c b/mm/memblock.c
index 7d4f61ae666a..d57ba51bb9cd 100644
--- a/mm/memblock.c
+++ b/mm/memblock.c
@@ -1251,6 +1251,37 @@ int __init_memblock memblock_set_node(phys_addr_t base, phys_addr_t size,
return 0;
 }
 #endif /* CONFIG_HAVE_MEMBLOCK_NODE_MAP */
+
+#ifdef CONFIG_HAVE_MEMBLOCK_PFN_VALID
+unsigned long __init_memblock memblock_next_valid_pfn(unsigned long pfn)
+{
+   struct memblock_type *type = 
+   unsigned int right = type->cnt;
+   unsigned int mid, left = 0;
+   phys_addr_t addr = PFN_PHYS(++pfn);
+
+   do {
+   mid = (right + left) / 2;
+
+   if (addr < type->regions[mid].base)
+   right = mid;
+   else if (addr >= (type->regions[mid].base +
+ type->regions[mid].size))
+   left = mid + 1;
+   else {
+   /* addr is within the region, so pfn is valid */
+   return pfn;
+   }
+   } while (left < right);
+
+   if (right == type->cnt)
+   return -1UL;
+   else
+   return PHYS_PFN(type->regions[right].base);
+}
+EXPORT_SYMBOL(memblock_next_valid_pfn);
+#endif /* CONFIG_HAVE_MEMBLOCK_PFN_VALID */
+
 #ifdef CONFIG_DEFERRED_STRUCT_PAGE_INIT
 /**
  * __next_mem_pfn_range_in_zone - iterator for for_each_*_range_in_zone()
diff --git a/mm/page_alloc.c b/mm/page_alloc.c
index d66bc8abe0af..70933c40380a 100644
--- a/mm/page_alloc.c
+++ b/mm/page_alloc.c
@@ -5811,8 +5811,10 @@ void __meminit memmap_init_zone(unsigned long size, int nid, unsigned long zone,
 * function.  They do not exist on hotplugged memory.
 */
if (context == MEMMAP_EARLY) {
-   if (!early_pfn_valid(pfn))
+   if (!early_pfn_valid(pfn)) {
+   pfn = next_valid_pfn(pfn) - 1;
continue;
+   }
if (!early_pfn_in_nid(pfn, nid))
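
The binary search in the patch can be tried out in userspace with the
sketch below. It is only an illustration under stated assumptions: the
sketch_* names, the explicit region array argument and the 4 KiB page
size are invented here, and a while loop is used so that an empty
region list is never indexed. It is not the kernel implementation.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define SKETCH_PAGE_SHIFT 12		/* assumption: 4 KiB pages */
#define SKETCH_NO_PFN UINT64_MAX	/* plays the role of -1UL in the patch */

/* Hypothetical stand-in for a memblock region. */
struct sketch_region {
	uint64_t base;	/* physical start address, page aligned */
	uint64_t size;	/* length in bytes, page aligned */
};

/*
 * Return the first valid pfn strictly greater than pfn, or SKETCH_NO_PFN
 * if there is none. regions[] must be sorted by base and non-overlapping.
 */
static uint64_t sketch_next_valid_pfn(const struct sketch_region *regions,
				      size_t cnt, uint64_t pfn)
{
	uint64_t addr = (pfn + 1) << SKETCH_PAGE_SHIFT;
	size_t left = 0, right = cnt;

	while (left < right) {
		size_t mid = left + (right - left) / 2;

		if (addr < regions[mid].base)
			right = mid;		/* look in the lower half */
		else if (addr >= regions[mid].base + regions[mid].size)
			left = mid + 1;		/* look in the upper half */
		else
			return pfn + 1;		/* pfn + 1 is inside a region */
	}

	/* pfn + 1 sits in a gap: jump to the next region, if there is one. */
	if (right == cnt)
		return SKETCH_NO_PFN;
	return regions[right].base >> SKETCH_PAGE_SHIFT;
}
```

In the hunk above, memmap_init_zone() assigns next_valid_pfn(pfn) - 1
before "continue", so that the loop's own increment lands on the
returned pfn.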
 

[PATCH v12 2/2] mm: page_alloc: reduce unnecessary binary search in memblock_next_valid_pfn

2019-07-22 Thread Hanjun Guo
From: Jia He 

After skipping some invalid pfns in memmap_init_zone(), there is still
some room for improvement.

E.g. if pfn and pfn+1 are in the same memblock region, we can simply pfn++
instead of doing the binary search in memblock_next_valid_pfn.

Furthermore, if the pfn is in a gap between two memory regions, skip to the
next region directly instead of doing the binary search.

Signed-off-by: Jia He 
Signed-off-by: Hanjun Guo 
---
 mm/memblock.c | 37 +++--
 1 file changed, 31 insertions(+), 6 deletions(-)

diff --git a/mm/memblock.c b/mm/memblock.c
index d57ba51bb9cd..95d5916716a0 100644
--- a/mm/memblock.c
+++ b/mm/memblock.c
@@ -1256,28 +1256,53 @@ int __init_memblock memblock_set_node(phys_addr_t base, phys_addr_t size,
 unsigned long __init_memblock memblock_next_valid_pfn(unsigned long pfn)
 {
struct memblock_type *type = 
+   struct memblock_region *regions = type->regions;
unsigned int right = type->cnt;
unsigned int mid, left = 0;
+   unsigned long start_pfn, end_pfn, next_start_pfn;
phys_addr_t addr = PFN_PHYS(++pfn);
+   static int early_region_idx __initdata_memblock = -1;
 
+   /* fast path, return pfn+1 if next pfn is in the same region */
+   if (early_region_idx != -1) {
+   start_pfn = PFN_DOWN(regions[early_region_idx].base);
+   end_pfn = PFN_DOWN(regions[early_region_idx].base +
+   regions[early_region_idx].size);
+
+   if (pfn >= start_pfn && pfn < end_pfn)
+   return pfn;
+
+   /* try slow path */
+   if (++early_region_idx == type->cnt)
+   goto slow_path;
+
+   next_start_pfn = PFN_DOWN(regions[early_region_idx].base);
+
+   if (pfn >= end_pfn && pfn <= next_start_pfn)
+   return next_start_pfn;
+   }
+
+slow_path:
+   /* slow path, do the binary searching */
do {
mid = (right + left) / 2;
 
-   if (addr < type->regions[mid].base)
+   if (addr < regions[mid].base)
right = mid;
-   else if (addr >= (type->regions[mid].base +
- type->regions[mid].size))
+   else if (addr >= (regions[mid].base + regions[mid].size))
left = mid + 1;
else {
-   /* addr is within the region, so pfn is valid */
+   early_region_idx = mid;
return pfn;
}
} while (left < right);
 
if (right == type->cnt)
return -1UL;
-   else
-   return PHYS_PFN(type->regions[right].base);
+
+   early_region_idx = right;
+
+   return PHYS_PFN(regions[early_region_idx].base);
 }
 EXPORT_SYMBOL(memblock_next_valid_pfn);
 #endif /* CONFIG_HAVE_MEMBLOCK_PFN_VALID */
-- 
2.19.1
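
The idea of remembering the last region hit, so that most calls avoid
the binary search, can be sketched in userspace as follows. This is an
illustration only: struct pfn_cursor, its searches counter and the
4 KiB page size are assumptions made here, and unlike the patch the
sketch keeps the cached index in a caller-owned struct and advances it
only when the gap check succeeds.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define CUR_PAGE_SHIFT 12	/* assumption: 4 KiB pages */
#define CUR_NO_PFN UINT64_MAX

struct pfn_region {
	uint64_t base;	/* physical start address, page aligned */
	uint64_t size;	/* length in bytes, page aligned */
};

/* Hypothetical cursor holding the cached region index. */
struct pfn_cursor {
	const struct pfn_region *regions;	/* sorted, non-overlapping */
	size_t cnt;
	size_t idx;		/* last region hit; cnt means nothing cached */
	unsigned long searches;	/* binary searches done, for illustration */
};

static uint64_t region_start_pfn(const struct pfn_region *r)
{
	return r->base >> CUR_PAGE_SHIFT;
}

static uint64_t region_end_pfn(const struct pfn_region *r)
{
	return (r->base + r->size) >> CUR_PAGE_SHIFT;
}

static uint64_t cursor_next_valid_pfn(struct pfn_cursor *c, uint64_t pfn)
{
	uint64_t next = pfn + 1;
	size_t left = 0, right = c->cnt;

	if (c->idx < c->cnt) {
		const struct pfn_region *cur = &c->regions[c->idx];

		/* Fast path 1: next is still inside the cached region. */
		if (next >= region_start_pfn(cur) && next < region_end_pfn(cur))
			return next;

		/* Fast path 2: next is in the gap right after that region. */
		if (c->idx + 1 < c->cnt) {
			uint64_t nstart = region_start_pfn(&c->regions[c->idx + 1]);

			if (next >= region_end_pfn(cur) && next <= nstart) {
				c->idx++;
				return nstart;
			}
		}
	}

	/* Slow path: binary search, then remember where we ended up. */
	c->searches++;
	while (left < right) {
		size_t mid = left + (right - left) / 2;

		if (next < region_start_pfn(&c->regions[mid])) {
			right = mid;
		} else if (next >= region_end_pfn(&c->regions[mid])) {
			left = mid + 1;
		} else {
			c->idx = mid;
			return next;
		}
	}

	if (right == c->cnt)
		return CUR_NO_PFN;
	c->idx = right;
	return region_start_pfn(&c->regions[right]);
}
```

Walking two regions in order needs one search to find the first region
and one more to discover the end of the list; every step in between
takes a fast path.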



[PATCH v12 0/2] introduce memblock_next_valid_pfn() (again) for arm64

2019-07-22 Thread Hanjun Guo
Here is a new version of "[PATCH v11 0/3] remain and optimize
memblock_next_valid_pfn on arm and arm64" from Jia He; Ard suggested
respinning this patch set [1].

In the new version, I squashed patch 1/3 and patch 2/3 in v11 into
one patch, fixed a bug with a possible out-of-bounds access to the
regions, and introduced memblock_next_valid_pfn() for arm64 only,
as I don't have an arm32 platform to test.

Ard asked that it come "with the new data points added for documentation,
and crystal clear about how the meaning of PFN validity differs between
ARM and other architectures, and why the assumptions that the
optimization is based on are guaranteed to hold". To be honest, I
didn't see how PFN validity differs between the ARM and x86 architectures,
but there is a bug in commit b92df1de5d28 ("mm: page_alloc: skip over
regions of invalid pfns where possible") which has a possible
out-of-bounds access to the regions as well, so I am not sure whether that
is the root cause.

Testing on a HiSilicon ARM64 server (a 4-socket system), I get a
significant speedup for bootmem_init() at boot:

with 384G memory,
before: 13310ms
after:  1415ms
   
with 1T memory,
before: 20s
after:  2s

[1]: https://lkml.org/lkml/2019/6/10/412

Jia He (2):
  mm: page_alloc: introduce memblock_next_valid_pfn() (again) for arm64
  mm: page_alloc: reduce unnecessary binary search in
memblock_next_valid_pfn

 arch/arm64/Kconfig |  1 +
 include/linux/mmzone.h |  9 +++
 mm/Kconfig |  3 +++
 mm/memblock.c  | 56 ++
 mm/page_alloc.c|  4 ++-
 5 files changed, 72 insertions(+), 1 deletion(-)

-- 
2.19.1



Re: [RESEND PATCH 1/2] clk: imx8mm: rename lcdif pixel clock

2019-07-22 Thread Shawn Guo
On Wed, Jul 10, 2019 at 04:13:37AM +, Fancy Fang wrote:
> Rename 'lcdif' pixel clock related names to 'disp' names, since:
> 
> First, the lcdif pixel clock is not supplied to LCDIF controller
> directly, but to some LPCG clock in display mix. So rename it to
> 'disp' pixel clock is more accurate.
> 
> Second, in the imx8mn CCM specification which is designed after
> imx8mm, this same pixel root clock name has been modified from
> 'LCDIF_PIXEL_CLK_ROOT' to 'DISPLAY_PIXEL_CLK_ROOT'.
> 
> Signed-off-by: Fancy Fang 
> ---

When you resend patches, please state the reason for resending.

Shawn

>  drivers/clk/imx/clk-imx8mm.c | 4 ++--
>  include/dt-bindings/clock/imx8mm-clock.h | 2 +-
>  2 files changed, 3 insertions(+), 3 deletions(-)
> 
> diff --git a/drivers/clk/imx/clk-imx8mm.c b/drivers/clk/imx/clk-imx8mm.c
> index 6b8e75df994d..42f1227a4952 100644
> --- a/drivers/clk/imx/clk-imx8mm.c
> +++ b/drivers/clk/imx/clk-imx8mm.c
> @@ -210,7 +210,7 @@ static const char *imx8mm_pcie1_aux_sels[] = {"osc_24m", "sys_pll2_200m", "sys_p
>  static const char *imx8mm_dc_pixel_sels[] = {"osc_24m", "video_pll1_out", "audio_pll2_out", "audio_pll1_out",
>    "sys_pll1_800m", "sys_pll2_1000m", "sys_pll3_out", "clk_ext4", };
>  
> -static const char *imx8mm_lcdif_pixel_sels[] = {"osc_24m", "video_pll1_out", "audio_pll2_out", "audio_pll1_out",
> +static const char *imx8mm_disp_pixel_sels[] = {"osc_24m", "video_pll1_out", "audio_pll2_out", "audio_pll1_out",
>   "sys_pll1_800m", "sys_pll2_1000m", "sys_pll3_out", "clk_ext4", };
>  
>  static const char *imx8mm_sai1_sels[] = {"osc_24m", "audio_pll1_out", "audio_pll2_out", "video_pll1_out",
> @@ -535,7 +535,7 @@ static int __init imx8mm_clocks_init(struct device_node *ccm_node)
>   clks[IMX8MM_CLK_PCIE1_PHY] = imx8m_clk_composite("pcie1_phy", imx8mm_pcie1_phy_sels, base + 0xa380);
>   clks[IMX8MM_CLK_PCIE1_AUX] = imx8m_clk_composite("pcie1_aux", imx8mm_pcie1_aux_sels, base + 0xa400);
>   clks[IMX8MM_CLK_DC_PIXEL] = imx8m_clk_composite("dc_pixel", imx8mm_dc_pixel_sels, base + 0xa480);
> - clks[IMX8MM_CLK_LCDIF_PIXEL] = imx8m_clk_composite("lcdif_pixel", imx8mm_lcdif_pixel_sels, base + 0xa500);
> + clks[IMX8MM_CLK_DISP_PIXEL] = imx8m_clk_composite("disp_pixel", imx8mm_disp_pixel_sels, base + 0xa500);
>   clks[IMX8MM_CLK_SAI1] = imx8m_clk_composite("sai1", imx8mm_sai1_sels, base + 0xa580);
>   clks[IMX8MM_CLK_SAI2] = imx8m_clk_composite("sai2", imx8mm_sai2_sels, base + 0xa600);
>   clks[IMX8MM_CLK_SAI3] = imx8m_clk_composite("sai3", imx8mm_sai3_sels, base + 0xa680);
> diff --git a/include/dt-bindings/clock/imx8mm-clock.h b/include/dt-bindings/clock/imx8mm-clock.h
> index 07e6c686f3ef..91ef77efebd9 100644
> --- a/include/dt-bindings/clock/imx8mm-clock.h
> +++ b/include/dt-bindings/clock/imx8mm-clock.h
> @@ -119,7 +119,7 @@
>  #define IMX8MM_CLK_PCIE1_PHY 104
>  #define IMX8MM_CLK_PCIE1_AUX 105
>  #define IMX8MM_CLK_DC_PIXEL  106
> -#define IMX8MM_CLK_LCDIF_PIXEL   107
> +#define IMX8MM_CLK_DISP_PIXEL107
>  #define IMX8MM_CLK_SAI1  108
>  #define IMX8MM_CLK_SAI2  109
>  #define IMX8MM_CLK_SAI3  110
> -- 
> 2.17.1
> 


Re: WARNING in __mmdrop

2019-07-22 Thread Jason Wang



On 2019/7/23 1:02 PM, Michael S. Tsirkin wrote:

On Tue, Jul 23, 2019 at 11:55:28AM +0800, Jason Wang wrote:

On 2019/7/22 4:02 PM, Michael S. Tsirkin wrote:

On Mon, Jul 22, 2019 at 01:21:59PM +0800, Jason Wang wrote:

On 2019/7/21 6:02 PM, Michael S. Tsirkin wrote:

On Sat, Jul 20, 2019 at 03:08:00AM -0700, syzbot wrote:

syzbot has bisected this bug to:

commit 7f466032dc9e5a61217f22ea34b2df932786bbfc
Author: Jason Wang 
Date:   Fri May 24 08:12:18 2019 +

   vhost: access vq metadata through kernel virtual address

bisection log:  https://syzkaller.appspot.com/x/bisect.txt?x=149a8a2060
start commit:   6d21a41b Add linux-next specific files for 20190718
git tree:   linux-next
final crash:https://syzkaller.appspot.com/x/report.txt?x=169a8a2060
console output: https://syzkaller.appspot.com/x/log.txt?x=129a8a2060
kernel config:  https://syzkaller.appspot.com/x/.config?x=3430a151e1452331
dashboard link: https://syzkaller.appspot.com/bug?extid=e58112d71f77113ddb7b
syz repro:  https://syzkaller.appspot.com/x/repro.syz?x=10139e6860

Reported-by: syzbot+e58112d71f77113dd...@syzkaller.appspotmail.com
Fixes: 7f466032dc9e ("vhost: access vq metadata through kernel virtual
address")

For information about bisection process see: https://goo.gl/tpsmEJ#bisection

OK I poked at this for a bit, I see several things that
we need to fix, though I'm not yet sure it's the reason for
the failures:


1. mmu_notifier_register shouldn't be called from vhost_vring_set_num_addr
  That's just a bad hack,

This is used to avoid holding the lock when checking whether the addresses
overlap. Otherwise we need to take the spinlock for each invalidation request
even if the VA range is of no interest to us. This will be very
slow, e.g. during guest boot.

KVM seems to do exactly that.
I tried and guest does not seem to boot any slower.
Do you observe any slowdown?


Yes I do.



Now I took a hard look at the uaddr hackery and it really makes
me nervous. So I think for this release we want something
safe, and optimizations on top. As an alternative revert the
optimization and try again for next merge window.


Will post a series of fixes, let me know if you're ok with that.

Thanks

I'd prefer you to take a hard look at the patch I posted
which makes code cleaner,



I did. But it looks to me like a series of only about 60 lines of code
can fix all the issues we found without reverting the uaddr optimization.




  and add optimizations on top.
But other ways could be ok too.



I'm waiting for the test result from syzbot and will post. Let's see if 
you are OK with that.


Thanks






Re: Linux 5.3-rc1

2019-07-22 Thread Christoph Hellwig
The fix was sent yesterday morning my time:

https://marc.info/?l=linux-scsi&m=156378725427719&w=2


Re: [PATCH -tip] lib/timerqueue: Rely on rbtree semantics for next timer

2019-07-22 Thread Davidlohr Bueso

ping (with the merge window now closed).


Re: [PATCH v5 1/5] mm: introduce MADV_COLD

2019-07-22 Thread Minchan Kim
On Wed, Jul 17, 2019 at 03:14:57PM -0700, Suren Baghdasaryan wrote:
> Hi Minchan,
> Couple comments inline.
> Thanks!
> 
> On Sun, Jul 14, 2019 at 4:34 PM Minchan Kim  wrote:
> >
> > When a process expects no accesses to a certain memory range, it could
> > give a hint to kernel that the pages can be reclaimed when memory pressure
> > happens but data should be preserved for future use.  This could reduce
> > workingset eviction so it ends up increasing performance.
> >
> > This patch introduces the new MADV_COLD hint to madvise(2) syscall.
> > MADV_COLD can be used by a process to mark a memory range as not expected
> > to be used in the near future. The hint can help kernel in deciding which
> > pages to evict early during memory pressure.
> >
> > It works for all LRU pages, like MADV_[DONTNEED|FREE]. IOW, it moves
> >
> > active file page -> inactive file LRU
> > active anon page -> inactive anon LRU
> >
> > Unlike MADV_FREE, it doesn't move active anonymous pages to inactive
> > file LRU's head because MADV_COLD has slightly different semantics.
> > MADV_FREE means it's okay to discard the pages under memory pressure because
> > the content of the page is *garbage*, so freeing such pages has almost zero
> > overhead since we don't need to swap out and a later access causes just a
> > minor fault. Thus, it would make sense to put those freeable pages in
> > inactive file LRU to compete with other used-once pages. It makes sense from
> > an implementation point of view, too, because it's not swapbacked memory any
> > longer until it would be re-dirtied. It could even give a bonus by letting
> > them be reclaimed on a swapless system. However, MADV_COLD doesn't mean
> > garbage, so reclaiming them requires swap-out/in in the end, which is a
> > bigger cost. Since we have designed VM LRU aging based on a cost model,
> > anonymous cold pages are better positioned on the inactive anon LRU list,
> > not the file LRU. Furthermore, it would help to avoid unnecessary scanning
> > if the system doesn't have a swap device. Let's start the simpler way
> > without adding complexity at this moment. However, keep in mind the caveat
> > that workloads with a lot of page cache are likely to ignore MADV_COLD
> > on anonymous memory because we rarely age anonymous LRU lists.
> >
> > * man-page material
> >
> > MADV_COLD (since Linux x.x)
> >
> > Pages in the specified regions will be treated as less-recently-accessed
> > compared to pages in the system with similar access frequencies.
> > In contrast to MADV_FREE, the contents of the region are preserved
> > regardless of subsequent writes to pages.
> >
> > MADV_COLD cannot be applied to locked pages, Huge TLB pages, or VM_PFNMAP
> > pages.
> >
> > * v2
> >  * add up the warn with lots of page cache workload - mhocko
> >  * add man page stuff - dave
> >
> > * v1
> >  * remove page_mapcount filter - hannes, mhocko
> >  * remove idle page handling - joelaf
> >
> > * RFCv2
> >  * add more description - mhocko
> >
> > * RFCv1
> >  * renaming from MADV_COOL to MADV_COLD - hannes
> >
> > * internal review
> >  * use clear_page_young in deactivate_page - joelaf
> >  * Revise the description - surenb
> >  * Renaming from MADV_WARM to MADV_COOL - surenb
> >
> > Acked-by: Michal Hocko 
> > Acked-by: Johannes Weiner 
> > Signed-off-by: Minchan Kim 
> > ---
> >  include/linux/swap.h   |   1 +
> >  include/uapi/asm-generic/mman-common.h |   1 +
> >  mm/internal.h  |   2 +-
> >  mm/madvise.c   | 180 -
> >  mm/oom_kill.c  |   2 +-
> >  mm/swap.c  |  42 ++
> >  6 files changed, 224 insertions(+), 4 deletions(-)
> >
> > diff --git a/include/linux/swap.h b/include/linux/swap.h
> > index de2c67a33b7e..0ce997edb8bb 100644
> > --- a/include/linux/swap.h
> > +++ b/include/linux/swap.h
> > @@ -340,6 +340,7 @@ extern void lru_add_drain_cpu(int cpu);
> >  extern void lru_add_drain_all(void);
> >  extern void rotate_reclaimable_page(struct page *page);
> >  extern void deactivate_file_page(struct page *page);
> > +extern void deactivate_page(struct page *page);
> >  extern void mark_page_lazyfree(struct page *page);
> >  extern void swap_setup(void);
> >
> > diff --git a/include/uapi/asm-generic/mman-common.h b/include/uapi/asm-generic/mman-common.h
> > index 63b1f506ea67..ef8a56927b12 100644
> > --- a/include/uapi/asm-generic/mman-common.h
> > +++ b/include/uapi/asm-generic/mman-common.h
> > @@ -45,6 +45,7 @@
> >  #define MADV_SEQUENTIAL 2   /* expect sequential page references */
> >  #define MADV_WILLNEED  3   /* will need these pages */
> >  #define MADV_DONTNEED  4   /* don't need these pages */
> > +#define MADV_COLD  5   /* deactivatie these pages */
> 
> s/deactivatie/deactivate

Fixed.

> 
> >
> >  /* common parameters: try to keep these consistent across architectures */
> >  #define 
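
For readers who want to see how the hint described in the man-page
material would be used from userspace, here is a minimal sketch. The
helper names hint_cold() and cold_hint_preserves_data() are invented
for illustration. The fallback value 20 is an assumption taken from the
mainline uapi headers (Linux 5.4 and later), not the value 5 used in
this revision of the patch, and kernels without MADV_COLD are expected
to fail the call with EINVAL.

```c
#define _DEFAULT_SOURCE
#include <assert.h>
#include <errno.h>
#include <stddef.h>
#include <string.h>
#include <sys/mman.h>
#include <unistd.h>

#ifndef MADV_COLD
#define MADV_COLD 20	/* assumption: value in mainline uapi headers */
#endif

/*
 * Hint that [addr, addr + len) is not expected to be used soon.
 * Returns 0 on success or a negative errno value.
 */
static int hint_cold(void *addr, size_t len)
{
	if (madvise(addr, len, MADV_COLD) == -1)
		return -errno;
	return 0;
}

/*
 * Map a few anonymous pages, fill them, hint them cold and check that
 * the contents are still there. Returns 1 if the data survived, 0 if
 * it did not, and -1 if the mapping could not be created.
 */
static int cold_hint_preserves_data(void)
{
	size_t len = 4 * (size_t)sysconf(_SC_PAGESIZE);
	unsigned char *buf = mmap(NULL, len, PROT_READ | PROT_WRITE,
				  MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);
	int ok = 1;

	if (buf == MAP_FAILED)
		return -1;

	memset(buf, 0xA5, len);
	(void)hint_cold(buf, len);	/* advisory: a failure is not fatal */

	for (size_t i = 0; i < len; i++) {
		if (buf[i] != 0xA5) {
			ok = 0;
			break;
		}
	}

	munmap(buf, len);
	return ok;
}
```

The check at the end reflects the stated contract: in contrast to
MADV_FREE, the contents of the region are preserved whether or not the
hint was accepted.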

Re: WARNING in __mmdrop

2019-07-22 Thread Jason Wang



On 2019/7/23 1:01 PM, Michael S. Tsirkin wrote:

On Tue, Jul 23, 2019 at 12:01:40PM +0800, Jason Wang wrote:

On 2019/7/22 4:08 PM, Michael S. Tsirkin wrote:

On Mon, Jul 22, 2019 at 01:24:24PM +0800, Jason Wang wrote:

On 2019/7/21 8:18 PM, Michael S. Tsirkin wrote:

On Sun, Jul 21, 2019 at 06:02:52AM -0400, Michael S. Tsirkin wrote:

On Sat, Jul 20, 2019 at 03:08:00AM -0700, syzbot wrote:

syzbot has bisected this bug to:

commit 7f466032dc9e5a61217f22ea34b2df932786bbfc
Author: Jason Wang
Date:   Fri May 24 08:12:18 2019 +

   vhost: access vq metadata through kernel virtual address

bisection log:  https://syzkaller.appspot.com/x/bisect.txt?x=149a8a2060
start commit:   6d21a41b Add linux-next specific files for 20190718
git tree:       linux-next
final crash:    https://syzkaller.appspot.com/x/report.txt?x=169a8a2060
console output: https://syzkaller.appspot.com/x/log.txt?x=129a8a2060
kernel config:  https://syzkaller.appspot.com/x/.config?x=3430a151e1452331
dashboard link: https://syzkaller.appspot.com/bug?extid=e58112d71f77113ddb7b
syz repro:      https://syzkaller.appspot.com/x/repro.syz?x=10139e6860

Reported-by: syzbot+e58112d71f77113dd...@syzkaller.appspotmail.com
Fixes: 7f466032dc9e ("vhost: access vq metadata through kernel virtual
address")

For information about bisection process see: https://goo.gl/tpsmEJ#bisection

OK I poked at this for a bit, I see several things that
we need to fix, though I'm not yet sure it's the reason for
the failures:


1. mmu_notifier_register shouldn't be called from vhost_vring_set_num_addr
  That's just a bad hack, in particular I don't think device
  mutex is taken and so poking at two VQs will corrupt
  memory.
  So what to do? How about a per vq notifier?
  Of course we also have synchronize_rcu
  in the notifier which is slow and is now going to be called twice.
  I think call_rcu would be more appropriate here.
  We then need rcu_barrier on module unload.
  OTOH if we make pages linear with map then we are good
  with kfree_rcu which is even nicer.

2. Doesn't map leak after vhost_map_unprefetch?
  And why does it poke at contents of the map?
  No one should use it right?

3. notifier unregister happens last in vhost_dev_cleanup,
  but register happens first. This looks wrong to me.

4. OK so we use the invalidate count to try and detect that
  some invalidate is in progress.
  I am not 100% sure why do we care.
  Assuming we do, uaddr can change between start and end
  and then the counter can get negative, or generally
  out of sync.

So what to do about all this?
I am inclined to say let's just drop the uaddr optimization
for now. E.g. kvm invalidates unconditionally.
3 should be fixed independently.

Above implements this but is only build-tested.
Jason, pls take a look. If you like the approach feel
free to take it from here.

One thing the below does not have is any kind of rate-limiting.
Given it's so easy to restart I'm thinking it makes sense
to add a generic infrastructure for this.
Can be a separate patch I guess.

I don't get why we must use kfree_rcu() instead of synchronize_rcu() here.

synchronize_rcu has very high latency on busy systems.
It is not something that should be used on a syscall path.
KVM had to switch to SRCU to keep it sane.
Otherwise one guest can trivially slow down another one.


I think you mean synchronize_rcu_expedited()? Rethinking the code, the
synchronize_rcu() in ioctl() could be removed, since it is serialized with
the memory accessors.


Really let's just use kfree_rcu. It's way cleaner: fire and forget.



It looks like it is not: you need to rate limit the fire, as you've figured
out? And in fact, the synchronization is not even needed. Does it help if I
leave a comment to explain?






Btw, the kvm ioctl still uses synchronize_rcu() in kvm_vcpu_ioctl()
(it is just a little bit harder to trigger):


AFAIK these never run in response to guest events.
So they can take very long and guests still won't crash.



What if guest manages to escape to qemu?

Thanks






     case KVM_RUN: {
...
         if (unlikely(oldpid != task_pid(current))) {
             /* The thread running this VCPU changed. */
             struct pid *newpid;

             r = kvm_arch_vcpu_run_pid_change(vcpu);
             if (r)
                 break;

             newpid = get_task_pid(current, PIDTYPE_PID);
             rcu_assign_pointer(vcpu->pid, newpid);
             if (oldpid)
                 synchronize_rcu();
             put_pid(oldpid);
         }
...
         break;



Signed-off-by: Michael S. Tsirkin

Let me try to figure out the root cause, then decide whether or not to go
this way.

Thanks

The root cause of the crash is relevant, but we still need
to fix issues 1-4.

More issues (my patch tries to fix them too):

5. page not dirtied when mappings are torn down outside
 of invalidate callback


Yes.



6. potential cross-VM DOS by one guest keeping system 

Re: [PATCH] lightnvm: introduce pr_fmt for the previx nvm

2019-07-22 Thread Javier González
> On 22 Jul 2019, at 06.31, Minwoo Im  wrote:
> 
>>> @@ -,27 +1112,27 @@ static int nvm_init(struct nvm_dev *dev)
>>> int ret = -EINVAL;
>>> 
>>> if (dev->ops->identity(dev)) {
>>> -   pr_err("nvm: device could not be identified\n");
>>> +   pr_err("device could not be identified\n");
>>> goto err;
>>> }
>>> 
>>> -   pr_debug("nvm: ver:%u.%u nvm_vendor:%x\n",
>>> +   pr_debug("ver:%u.%u nvm_vendor:%x\n",
>>> geo->major_ver_id, geo->minor_ver_id,
>>> geo->vmnt);
>> The above last 2 lines can be squashed and pr_debug call can be made in
>> 2 lines since you have removed the "nvm:" which shortens the first line.
> 
> Yeah Okay.  Will prepare V2 with this and also s/previx/prefix in the
> title.
> 
> Thanks for the review.
> 
> Minwoo Im

Besides Chaitanya’s comments, looks good. You can add my review on V2.

Reviewed-by: Javier González 




RE: [PATCH] selinux: convert struct sidtab count to refcount_t

2019-07-22 Thread Gote, Nitin R


> -Original Message-
> From: Ondrej Mosnacek [mailto:omosn...@redhat.com]
> Sent: Monday, July 22, 2019 6:48 PM
> To: Gote, Nitin R 
> Cc: Kees Cook ; kernel-
> harden...@lists.openwall.com; Paul Moore ;
> Stephen Smalley ; Eric Paris ;
> SElinux list ; Linux kernel mailing list  ker...@vger.kernel.org>
> Subject: Re: [PATCH] selinux: convert struct sidtab count to refcount_t
> 
> On Mon, Jul 22, 2019 at 1:35 PM NitinGote  wrote:
> > refcount_t type and corresponding API should be used instead of
> > atomic_t when the variable is used as a reference counter. This allows
> > to avoid accidental refcounter overflows that might lead to
> > use-after-free situations.
> >
> > Signed-off-by: NitinGote 
> 
> Nack.
> 
> The 'count' variable is not used as a reference counter here. It tracks the
> number of entries in sidtab, which is a very specific lookup table that can
> only grow (the count never decreases). I only made it atomic because the
> variable is read outside of the sidtab's spin lock and thus the reads and
> writes to it need to be guaranteed to be atomic. The counter is only updated
> under the spin lock, so insertions do not race with each other.

Agreed. Thanks for the clarification.
I'm going to discontinue this patch.
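
For readers following along, the pattern Ondrej describes (a count that only grows, updated under a lock but read without it) can be sketched in userspace C11 roughly as below. This is only an illustration under assumptions: grow_table, TABLE_MAX and the atomic_flag spin lock are made-up stand-ins, not the sidtab code, and release/acquire ordering stands in for the kernel's smp_wmb() pairing.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stddef.h>

#define TABLE_MAX 8 /* hypothetical capacity, stands in for SIDTAB_MAX */

struct grow_table {
	int entries[TABLE_MAX];
	atomic_uint count;  /* number of valid entries; only ever grows */
	atomic_flag lock;   /* tiny spin lock that serializes writers */
};

static void grow_table_init(struct grow_table *t)
{
	assert(t != NULL);
	atomic_init(&t->count, 0);
	atomic_flag_clear(&t->lock);
}

/* Lock-free reader: only touches slots below the published count. */
static const int *grow_table_lookup(struct grow_table *t, unsigned int index)
{
	unsigned int count = atomic_load_explicit(&t->count,
						  memory_order_acquire);

	if (index >= count)
		return NULL;
	return &t->entries[index];
}

/* Writer: fill the slot first, then publish the new count. */
static int grow_table_insert(struct grow_table *t, int value)
{
	unsigned int count;
	int rc = -1;

	while (atomic_flag_test_and_set_explicit(&t->lock,
						 memory_order_acquire))
		; /* spin until the lock is free */

	count = atomic_load_explicit(&t->count, memory_order_relaxed);
	if (count < TABLE_MAX) { /* guard against growing past the limit */
		t->entries[count] = value;
		/* write the entry before writing the new count */
		atomic_store_explicit(&t->count, count + 1,
				      memory_order_release);
		rc = (int)count;
	}

	atomic_flag_clear_explicit(&t->lock, memory_order_release);
	return rc;
}
```

Nothing here counts references, which is why an atomic counter rather than refcount_t fits: the value is an index bound, and the bounds check in the writer is the kind of overflow guard mentioned in the reply.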

> 
> Your patch, however, led me to realize that I forgot to guard against
> overflow above SIDTAB_MAX when a new entry is being inserted. It is
> extremely unlikely to happen in practice, but should be fixed anyway.
> I'll send a patch shortly.
> 

Thank you.

> > ---
> >  security/selinux/ss/sidtab.c | 16 
> > security/selinux/ss/sidtab.h |  2 +-
> >  2 files changed, 9 insertions(+), 9 deletions(-)
> >
> > diff --git a/security/selinux/ss/sidtab.c
> > b/security/selinux/ss/sidtab.c index e63a90ff2728..20fe235c6c71 100644
> > --- a/security/selinux/ss/sidtab.c
> > +++ b/security/selinux/ss/sidtab.c
> > @@ -29,7 +29,7 @@ int sidtab_init(struct sidtab *s)
> > for (i = 0; i < SECINITSID_NUM; i++)
> > s->isids[i].set = 0;
> >
> > -   atomic_set(>count, 0);
> > +   refcount_set(>count, 0);
> >
> > s->convert = NULL;
> >
> > @@ -130,7 +130,7 @@ static struct context *sidtab_do_lookup(struct
> > sidtab *s, u32 index, int alloc)
> >
> >  static struct context *sidtab_lookup(struct sidtab *s, u32 index)  {
> > -   u32 count = (u32)atomic_read(>count);
> > +   u32 count = refcount_read(>count);
> >
> > if (index >= count)
> > return NULL;
> > @@ -245,7 +245,7 @@ static int sidtab_reverse_lookup(struct sidtab *s,
> struct context *context,
> >  u32 *index)
> >  {
> > unsigned long flags;
> > -   u32 count = (u32)atomic_read(>count);
> > +   u32 count = (u32)refcount_read(>count);
> > u32 count_locked, level, pos;
> > struct sidtab_convert_params *convert;
> > struct context *dst, *dst_convert;
> > @@ -272,7 +272,7 @@ static int sidtab_reverse_lookup(struct sidtab *s,
> struct context *context,
> > spin_lock_irqsave(>lock, flags);
> >
> > convert = s->convert;
> > -   count_locked = (u32)atomic_read(>count);
> > +   count_locked = (u32)refcount_read(>count);
> > level = sidtab_level_from_count(count_locked);
> >
> > /* if count has changed before we acquired the lock, then catch up 
> > */
> > @@ -315,7 +315,7 @@ static int sidtab_reverse_lookup(struct sidtab *s,
> struct context *context,
> > }
> >
> > /* at this point we know the insert won't fail */
> > -   atomic_set(>target->count, count + 1);
> > +   refcount_set(>target->count, count + 1);
> > }
> >
> > if (context->len)
> > @@ -328,7 +328,7 @@ static int sidtab_reverse_lookup(struct sidtab *s,
> struct context *context,
> > /* write entries before writing new count */
> > smp_wmb();
> >
> > -   atomic_set(>count, count + 1);
> > +   refcount_set(>count, count + 1);
> >
> > rc = 0;
> >  out_unlock:
> > @@ -418,7 +418,7 @@ int sidtab_convert(struct sidtab *s, struct
> sidtab_convert_params *params)
> > return -EBUSY;
> > }
> >
> > -   count = (u32)atomic_read(>count);
> > +   count = (u32)refcount_read(>count);
> > level = sidtab_level_from_count(count);
> >
> > /* allocate last leaf in the new sidtab (to avoid race with
> > @@ -431,7 +431,7 @@ int sidtab_convert(struct sidtab *s, struct
> sidtab_convert_params *params)
> > }
> >
> > /* set count in case no new entries are added during conversion */
> > -   atomic_set(>target->count, count);
> > +   refcount_set(>target->count, count);
> >
> > /* enable live convert of new entries */
> > s->convert = params;
> > diff --git a/security/selinux/ss/sidtab.h b/security/selinux/ss/sidtab.h
> > index bbd5c0d1f3bd..68dd96a5beba 100644
> > --- a/security/selinux/ss/sidtab.h
> > +++ 

[PATCH v3] nvme: fix multipath crash when ANA deactivated

2019-07-22 Thread Marta Rybczynska
Fix a crash with multipath activated. It happens when the ANA log
page is larger than MDTS and ANA is disabled because of that.
The driver then tries to access an unallocated buffer when connecting
to an NVMe target. The signature is as follows:

[  300.433586] nvme nvme0: ANA log page size (8208) larger than MDTS (8192).
[  300.435387] nvme nvme0: disabling ANA support.
[  300.437835] nvme nvme0: creating 4 I/O queues.
[  300.459132] nvme nvme0: new ctrl: NQN "nqn.0.0.0", addr 10.91.0.1:8009
[  300.464609] BUG: unable to handle kernel NULL pointer dereference at 
0008
[  300.466342] #PF error: [normal kernel read fault]
[  300.467385] PGD 0 P4D 0
[  300.467987] Oops:  [#1] SMP PTI
[  300.468787] CPU: 3 PID: 50 Comm: kworker/u8:1 Not tainted 5.0.20kalray+ #4
[  300.470264] Hardware name: Red Hat KVM, BIOS 0.5.1 01/01/2011
[  300.471532] Workqueue: nvme-wq nvme_scan_work [nvme_core]
[  300.472724] RIP: 0010:nvme_parse_ana_log+0x21/0x140 [nvme_core]
[  300.474038] Code: 45 01 d2 d8 48 98 c3 66 90 0f 1f 44 00 00 41 57 41 56 41 
55 41 54 55 53 48 89 fb 48 83 ec 08 48 8b af 20 0a 00 00 48 89 34 24 <66> 83 7d 
08 00 0f 84 c6 00 00 00 44 8b 7d 14 49 89 d5 8b 55 10 48
[  300.477374] RSP: 0018:a50e80fd7cb8 EFLAGS: 00010296
[  300.478334] RAX: 0001 RBX: 9130f1872258 RCX: 
[  300.479784] RDX: c06c4c30 RSI: 9130edad4280 RDI: 9130f1872258
[  300.481488] RBP:  R08: 0001 R09: 0044
[  300.483203] R10: 0220 R11: 0040 R12: 9130f18722c0
[  300.484928] R13: 9130f18722d0 R14: 9130edad4280 R15: 9130f18722c0
[  300.486626] FS:  () GS:9130f7b8() 
knlGS:
[  300.488538] CS:  0010 DS:  ES:  CR0: 80050033
[  300.489907] CR2: 0008 CR3: 0002365e6000 CR4: 06e0
[  300.491612] DR0:  DR1:  DR2: 
[  300.493303] DR3:  DR6: fffe0ff0 DR7: 0400
[  300.494991] Call Trace:
[  300.495645]  nvme_mpath_add_disk+0x5c/0xb0 [nvme_core]
[  300.496880]  nvme_validate_ns+0x2ef/0x550 [nvme_core]
[  300.498105]  ? nvme_identify_ctrl.isra.45+0x6a/0xb0 [nvme_core]
[  300.499539]  nvme_scan_work+0x2b4/0x370 [nvme_core]
[  300.500717]  ? __switch_to_asm+0x35/0x70
[  300.501663]  process_one_work+0x171/0x380
[  300.502340]  worker_thread+0x49/0x3f0
[  300.503079]  kthread+0xf8/0x130
[  300.503795]  ? max_active_store+0x80/0x80
[  300.504690]  ? kthread_bind+0x10/0x10
[  300.505502]  ret_from_fork+0x35/0x40
[  300.506280] Modules linked in: nvme_tcp nvme_rdma rdma_cm iw_cm ib_cm 
ib_core nvme_fabrics nvme_core xt_physdev ip6table_raw ip6table_mangle 
ip6table_filter ip6_tables xt_comment iptable_nat nf_nat_ipv4 nf_nat 
nf_conntrack nf_defrag_ipv6 nf_defrag_ipv4 xt_CHECKSUM iptable_mangle 
iptable_filter veth ebtable_filter ebtable_nat ebtables iptable_raw vxlan 
ip6_udp_tunnel udp_tunnel sunrpc joydev pcspkr virtio_balloon br_netfilter 
bridge stp llc ip_tables xfs libcrc32c ata_generic pata_acpi virtio_net 
virtio_console net_failover virtio_blk failover ata_piix serio_raw libata 
virtio_pci virtio_ring virtio
[  300.514984] CR2: 0008
[  300.515569] ---[ end trace faa2eefad7e7f218 ]---
[  300.516354] RIP: 0010:nvme_parse_ana_log+0x21/0x140 [nvme_core]
[  300.517330] Code: 45 01 d2 d8 48 98 c3 66 90 0f 1f 44 00 00 41 57 41 56 41 
55 41 54 55 53 48 89 fb 48 83 ec 08 48 8b af 20 0a 00 00 48 89 34 24 <66> 83 7d 
08 00 0f 84 c6 00 00 00 44 8b 7d 14 49 89 d5 8b 55 10 48
[  300.520353] RSP: 0018:a50e80fd7cb8 EFLAGS: 00010296
[  300.521229] RAX: 0001 RBX: 9130f1872258 RCX: 
[  300.522399] RDX: c06c4c30 RSI: 9130edad4280 RDI: 9130f1872258
[  300.523560] RBP:  R08: 0001 R09: 0044
[  300.524734] R10: 0220 R11: 0040 R12: 9130f18722c0
[  300.525915] R13: 9130f18722d0 R14: 9130edad4280 R15: 9130f18722c0
[  300.527084] FS:  () GS:9130f7b8() 
knlGS:
[  300.528396] CS:  0010 DS:  ES:  CR0: 80050033
[  300.529440] CR2: 0008 CR3: 0002365e6000 CR4: 06e0
[  300.530739] DR0:  DR1:  DR2: 
[  300.531989] DR3:  DR6: fffe0ff0 DR7: 0400
[  300.533264] Kernel panic - not syncing: Fatal exception
[  300.534338] Kernel Offset: 0x17c0 from 0x8100 (relocation 
range: 0x8000-0xbfff)
[  300.536227] ---[ end Kernel panic - not syncing: Fatal exception ]---

Condition check refactoring from Christoph Hellwig.

Signed-off-by: Marta Rybczynska 
Tested-by: Jean-Baptiste Riaux 
---
 drivers/nvme/host/multipath.c | 8 ++--
 drivers/nvme/host/nvme.h  | 6 +-
 2 files changed, 7 insertions(+), 7 deletions(-)

diff --git 

Re: [PATCH 00/10] Add further support for PHYTEC phyBOARD-Segin

2019-07-22 Thread Shawn Guo
On Tue, Jul 09, 2019 at 09:19:17AM +0200, Stefan Riedmueller wrote:
> This patchstack adjusts the already existing naming for the PHYTEC
> phyBOARD-Segin to the PHYTEC naming scheme that is already used with the
> phyCORE-i.MX 6 and the phyBOARD-Mira.
> 
> Furthermore it introduces some small fixes and adds support for the PHYTEC
> phyCORE-i.MX 6ULL which also comes with the phyBOARD-Segin. It comes in a
> full featured option with either NAND flash or eMMC and a low cost option
> only with NAND flash.
> 
> Stefan Riedmueller (10):
>   ARM: dts: imx6ul: phyboard-segin: Rename dts to PHYTEC name scheme
>   ARM: dts: imx6ul: segin: Add boot media to dts filename
>   ARM: dts: imx6ul: segin: Reduce eth drive strength
>   ARM: dts: imx6ul: segin: Fix LED naming for phyCORE and PEB-EVAL-01
>   ARM: dts: imx6ul: segin: Make FEC and ethphy configurable in dts
>   ARM: dts: imx6ul: segin: Only enable NAND if it is populated
>   ARM: dts: imx6ul: phycore: Add eMMC at usdhc2
>   ARM: dts: imx6ul: segin: Move ECSPI interface to board include file
>   ARM: dts: imx6ul: segin: Move machine include to dts files
>   ARM: dts: imx6ull: Add support for PHYTEC phyBOARD-Segin with i.MX
> 6ULL

I applied the series, but please send a follow-up patch for those
undocumented board compatibles.

Shawn


Re: [PATCH 5/5] arm64: dts: qcom: sdm845-cheza: remove macro from unit name

2019-07-22 Thread Bjorn Andersson
On Mon 22 Jul 22:14 PDT 2019, Vinod Koul wrote:

> On 23-07-19, 10:38, Amit Kucheria wrote:
> > On Mon, Jul 22, 2019 at 6:06 PM Vinod Koul  wrote:
> > >
> > > Unit name is supposed to be a number, using a macro with hex value is
> > 
> > /s/name/address?
> 
> Right, will fix.
> 
> > > not recommended, so add the value in unit name.
> > >
> > > arch/arm64/boot/dts/qcom/sdm845-cheza.dtsi:966.16-969.4: Warning 
> > > (unit_address_format): /soc@0/spmi@c44/pmic@0/adc@3100/adc-chan@0x4d: 
> > > unit name should not have leading "0x"
> > > arch/arm64/boot/dts/qcom/sdm845-cheza.dtsi:971.16-974.4: Warning 
> > > (unit_address_format): /soc@0/spmi@c44/pmic@0/adc@3100/adc-chan@0x4e: 
> > > unit name should not have leading "0x"
> > > arch/arm64/boot/dts/qcom/sdm845-cheza.dtsi:976.16-979.4: Warning 
> > > (unit_address_format): /soc@0/spmi@c44/pmic@0/adc@3100/adc-chan@0x4f: 
> > > unit name should not have leading "0x"
> > > arch/arm64/boot/dts/qcom/sdm845-cheza.dtsi:981.16-984.4: Warning 
> > > (unit_address_format): /soc@0/spmi@c44/pmic@0/adc@3100/adc-chan@0x50: 
> > > unit name should not have leading "0x"
> > > arch/arm64/boot/dts/qcom/sdm845-cheza.dtsi:986.16-989.4: Warning 
> > > (unit_address_format): /soc@0/spmi@c44/pmic@0/adc@3100/adc-chan@0x51: 
> > > unit name should not have leading "0x"
> > >
> > > Signed-off-by: Vinod Koul 
> > > ---
> > >  arch/arm64/boot/dts/qcom/sdm845-cheza.dtsi | 10 +-
> > >  1 file changed, 5 insertions(+), 5 deletions(-)
> > >
> > > diff --git a/arch/arm64/boot/dts/qcom/sdm845-cheza.dtsi 
> > > b/arch/arm64/boot/dts/qcom/sdm845-cheza.dtsi
> > > index 1ebbd568dfd7..9b27b8346ba1 100644
> > > --- a/arch/arm64/boot/dts/qcom/sdm845-cheza.dtsi
> > > +++ b/arch/arm64/boot/dts/qcom/sdm845-cheza.dtsi
> > > @@ -963,27 +963,27 @@ ap_ts_i2c:  {
> > >  };
> > >
> > >  _adc {
> > > -   adc-chan@ADC5_AMUX_THM1_100K_PU {
> > > +   adc-chan@4d {
> > > reg = ;

When I read this define I instantly know which channel we're referring
to. The 4d above is simply there for syntactical purposes and needs only
to be cared about if the reg is ever changed.

So I like this form.

> > 
> > I'm a little conflicted about this change. If we're replacing the
> > address with actual values, perhaps we should do that same for the reg
> > property to keep them in sync? Admittedly though, it is a bit easier
> > to read the macro name and figure out its meaning.
> 
> Well this was how Bjorn suggested, am okay if we do in any
> other way. This fixes warning but keeps it bit readable too
> 
> Other way would be to make defines decimal values instead of hex
> 

While the ePAPR states that the unit address must match the first reg,
dtc enforces that the unit address string matches "%x" of the reg.
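
As a rough illustration of that "%x" rule, the comparison can be sketched in plain C as below. The helper name is hypothetical and this is not dtc's actual code; it only shows why "4d" is accepted while "0x4d" is not.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdio.h>
#include <string.h>

/*
 * Hypothetical helper, not taken from dtc: returns true when the unit
 * address part of a node name equals the "%x" rendering of the first
 * reg cell.
 */
static bool unit_address_matches_reg(const char *unit_addr, unsigned int reg)
{
	char expected[16];

	assert(unit_addr != NULL);
	snprintf(expected, sizeof(expected), "%x", reg);
	return strcmp(unit_addr, expected) == 0;
}
```

With a reg of 0x4d only the string "4d" matches; "0x4d", "4D" and the decimal "77" all differ from the lower-case hex rendering.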

Regards,
Bjorn

> Any better suggestions :)
> 
> -- 
> ~Vinod


[PATCH] mbcache: Speed up cache entry creation

2019-07-22 Thread Sultan Alsawaf
From: Sultan Alsawaf 

In order to prevent redundant entry creation by racing against itself,
mb_cache_entry_create scans through a hash-list of all current entries
in order to see if another allocation for the requested new entry has
been made. Furthermore, it allocates memory for a new entry before
scanning through this hash-list, which results in that allocated memory
being discarded when the requested new entry is already present.

Speed up cache entry creation by keeping a small linked list of
requested new entries in progress, and scanning through that first
instead of the large hash-list. And don't allocate memory for a new
entry until it's known that the allocated memory will be used.
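
To make the ordering easier to follow, here is a simplified userspace sketch of the idea, assuming plain singly linked lists and leaving the bucket lock out (comments mark where it would be held). The struct and function names are illustrative only and do not match the patch.

```c
#include <assert.h>
#include <errno.h>
#include <stdint.h>
#include <stdlib.h>

struct entry {               /* stands in for struct mb_cache_entry */
	struct entry *next;
	uint32_t key;
	uint64_t value;
};

struct pending {             /* stands in for an in-flight request */
	struct pending *next;
	uint32_t key;
	uint64_t value;
};

struct bucket {
	struct entry *entries;   /* committed entries */
	struct pending *pending; /* creations currently in progress */
};

static void pending_remove(struct bucket *b, struct pending *req)
{
	struct pending **pp;

	for (pp = &b->pending; *pp; pp = &(*pp)->next) {
		if (*pp == req) {
			*pp = req->next;
			return;
		}
	}
}

static int bucket_create(struct bucket *b, uint32_t key, uint64_t value)
{
	struct pending req, *p;
	struct entry *e;

	assert(b != NULL);

	/* Bucket lock would be taken here. */
	for (p = b->pending; p; p = p->next)
		if (p->key == key && p->value == value)
			return -EBUSY;
	for (e = b->entries; e; e = e->next)
		if (e->key == key && e->value == value)
			return -EBUSY;
	req.key = key;
	req.value = value;
	req.next = b->pending;
	b->pending = &req;
	/* Bucket lock would be dropped here, before allocating. */

	e = malloc(sizeof(*e));
	if (!e) {
		pending_remove(b, &req); /* under the lock in real code */
		return -ENOMEM;
	}
	e->key = key;
	e->value = value;

	/* Lock again: retire the request and publish the entry. */
	pending_remove(b, &req);
	e->next = b->entries;
	b->entries = e;
	return 0;
}
```

The allocation only happens once both duplicate checks have passed, so nothing is allocated and then thrown away when the entry already exists.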

Signed-off-by: Sultan Alsawaf 
---
 fs/mbcache.c | 82 
 1 file changed, 57 insertions(+), 25 deletions(-)

diff --git a/fs/mbcache.c b/fs/mbcache.c
index 97c54d3a2227..289f3664061e 100644
--- a/fs/mbcache.c
+++ b/fs/mbcache.c
@@ -25,9 +25,14 @@
  * size hash table is used for fast key lookups.
  */
 
+struct mb_bucket {
+   struct hlist_bl_head hash;
+   struct list_head req_list;
+};
+
 struct mb_cache {
/* Hash table of entries */
-   struct hlist_bl_head*c_hash;
+   struct mb_bucket*c_bucket;
/* log2 of hash table size */
int c_bucket_bits;
/* Maximum entries in cache to avoid degrading hash too much */
@@ -42,15 +47,21 @@ struct mb_cache {
struct work_struct  c_shrink_work;
 };
 
+struct mb_cache_req {
+   struct list_head lnode;
+   u32 key;
+   u64 value;
+};
+
 static struct kmem_cache *mb_entry_cache;
 
 static unsigned long mb_cache_shrink(struct mb_cache *cache,
 unsigned long nr_to_scan);
 
-static inline struct hlist_bl_head *mb_cache_entry_head(struct mb_cache *cache,
-   u32 key)
+static inline struct mb_bucket *mb_cache_entry_bucket(struct mb_cache *cache,
+ u32 key)
 {
-   return >c_hash[hash_32(key, cache->c_bucket_bits)];
+   return >c_bucket[hash_32(key, cache->c_bucket_bits)];
 }
 
 /*
@@ -77,6 +88,8 @@ int mb_cache_entry_create(struct mb_cache *cache, gfp_t mask, 
u32 key,
struct mb_cache_entry *entry, *dup;
struct hlist_bl_node *dup_node;
struct hlist_bl_head *head;
+   struct mb_cache_req *tmp_req, req;
+   struct mb_bucket *bucket;
 
/* Schedule background reclaim if there are too many entries */
if (cache->c_entry_count >= cache->c_max_entries)
@@ -85,9 +98,33 @@ int mb_cache_entry_create(struct mb_cache *cache, gfp_t 
mask, u32 key,
if (cache->c_entry_count >= 2*cache->c_max_entries)
mb_cache_shrink(cache, SYNC_SHRINK_BATCH);
 
+   bucket = mb_cache_entry_bucket(cache, key);
+   head = >hash;
+   hlist_bl_lock(head);
+   list_for_each_entry(tmp_req, >req_list, lnode) {
+   if (tmp_req->key == key && tmp_req->value == value) {
+   hlist_bl_unlock(head);
+   return -EBUSY;
+   }
+   }
+   hlist_bl_for_each_entry(dup, dup_node, head, e_hash_list) {
+   if (dup->e_key == key && dup->e_value == value) {
+   hlist_bl_unlock(head);
+   return -EBUSY;
+   }
+   }
+   req.key = key;
+   req.value = value;
+   list_add(, >req_list);
+   hlist_bl_unlock(head);
+
entry = kmem_cache_alloc(mb_entry_cache, mask);
-   if (!entry)
+   if (!entry) {
+   hlist_bl_lock(head);
+   list_del();
+   hlist_bl_unlock(head);
return -ENOMEM;
+   }
 
INIT_LIST_HEAD(>e_list);
/* One ref for hash, one ref returned */
@@ -96,15 +133,9 @@ int mb_cache_entry_create(struct mb_cache *cache, gfp_t 
mask, u32 key,
entry->e_value = value;
entry->e_reusable = reusable;
entry->e_referenced = 0;
-   head = mb_cache_entry_head(cache, key);
+
hlist_bl_lock(head);
-   hlist_bl_for_each_entry(dup, dup_node, head, e_hash_list) {
-   if (dup->e_key == key && dup->e_value == value) {
-   hlist_bl_unlock(head);
-   kmem_cache_free(mb_entry_cache, entry);
-   return -EBUSY;
-   }
-   }
+   list_del();
hlist_bl_add_head(>e_hash_list, head);
hlist_bl_unlock(head);
 
@@ -133,7 +164,7 @@ static struct mb_cache_entry *__entry_find(struct mb_cache 
*cache,
struct hlist_bl_node *node;
struct hlist_bl_head *head;
 
-   head = mb_cache_entry_head(cache, key);
+   head = _cache_entry_bucket(cache, key)->hash;
hlist_bl_lock(head);
if (entry && !hlist_bl_unhashed(>e_hash_list))
node = entry->e_hash_list.next;
@@ -202,7 +233,7 @@ 

Re: [v4 PATCH 2/2] mm: mempolicy: handle vma with unmovable pages mapped correctly in mbind

2019-07-22 Thread Yang Shi




On 7/22/19 6:02 PM, Andrew Morton wrote:

On Mon, 22 Jul 2019 09:25:09 +0200 Vlastimil Babka  wrote:


since there may be pages off LRU temporarily.  We should migrate other
pages if MPOL_MF_MOVE* is specified.  Set has_unmovable flag if some
pages could not be moved, then return -EIO for mbind() eventually.

With this change the above test would return -EIO as expected.

Cc: Vlastimil Babka 
Cc: Michal Hocko 
Cc: Mel Gorman 
Signed-off-by: Yang Shi 

Reviewed-by: Vlastimil Babka 

Thanks.

I'm a bit surprised that this doesn't have a cc:stable.  Did we
consider that?


The VM_BUG just happens on 4.9, and it is enabled only by CONFIG_VM. For
post-4.9 kernels, this fixes the semantics of mbind(), which should not
count as a regression IMHO.




Also, is this patch dependent upon "mm: mempolicy: make the behavior
consistent when MPOL_MF_MOVE* and MPOL_MF_STRICT were specified"?
Doesn't look that way..


No, it depends on patch #1.



Also, I have a note that you had concerns with "mm: mempolicy: make the
behavior consistent when MPOL_MF_MOVE* and MPOL_MF_STRICT were
specified".  What is the status now?


Vlastimil had given his Reviewed-by.




From: Yang Shi 
Subject: mm: mempolicy: make the behavior consistent when MPOL_MF_MOVE* and 
MPOL_MF_STRICT were specified

When both MPOL_MF_MOVE* and MPOL_MF_STRICT are specified, mbind() should
try its best to migrate misplaced pages; if some of the pages could not be
migrated, it should then return -EIO.

There are three different sub-cases:
1. vma is not migratable
2. vma is migratable, but there are unmovable pages
3. vma is migratable, pages are movable, but migrate_pages() fails

If #1 happens, kernel would just abort immediately, then return -EIO,
after the commit a7f40cfe3b7ada57af9b62fd28430eeb4a7cfcb7 ("mm: mempolicy:
make mbind() return -EIO when MPOL_MF_STRICT is specified").

If #3 happens, kernel would set policy and migrate pages with best-effort,
but won't rollback the migrated pages and reset the policy back.

Before that commit, they behaved in the same way.  It would be better to keep
their behavior consistent.  But rolling back the migrated pages and
resetting the policy does not sound feasible, so just make #1 behave the
same as #3.

Userspace will know that not everything was successfully migrated (via
-EIO), and can take whatever steps it deems necessary - attempt rollback,
determine which exact page(s) are violating the policy, etc.

Make queue_pages_range() return 1 to indicate there are unmovable pages or
vma is not migratable.

The #2 is not handled correctly in the current kernel, the following patch
will fix it.

Link: 
http://lkml.kernel.org/r/1561162809-59140-2-git-send-email-yang@linux.alibaba.com
Signed-off-by: Yang Shi 
Cc: Vlastimil Babka 
Cc: Michal Hocko 
Cc: Mel Gorman 
Signed-off-by: Andrew Morton 
---

  mm/mempolicy.c |   84 +--
  1 file changed, 60 insertions(+), 24 deletions(-)

--- 
a/mm/mempolicy.c~mm-mempolicy-make-the-behavior-consistent-when-mpol_mf_move-and-mpol_mf_strict-were-specified
+++ a/mm/mempolicy.c
@@ -429,11 +429,14 @@ static inline bool queue_pages_required(
  }
  
  /*

- * queue_pages_pmd() has three possible return values:
+ * queue_pages_pmd() has four possible return values:
+ * 2 - there is unmovable page, and MPOL_MF_MOVE* & MPOL_MF_STRICT were
+ * specified.
   * 1 - pages are placed on the right node or queued successfully.
   * 0 - THP was split.
- * -EIO - is migration entry or MPOL_MF_STRICT was specified and an existing
- *page was already on a node that does not follow the policy.
+ * -EIO - is migration entry or only MPOL_MF_STRICT was specified and an
+ *existing page was already on a node that does not follow the
+ *policy.
   */
  static int queue_pages_pmd(pmd_t *pmd, spinlock_t *ptl, unsigned long addr,
unsigned long end, struct mm_walk *walk)
@@ -463,7 +466,7 @@ static int queue_pages_pmd(pmd_t *pmd, s
/* go to thp migration */
if (flags & (MPOL_MF_MOVE | MPOL_MF_MOVE_ALL)) {
if (!vma_migratable(walk->vma)) {
-   ret = -EIO;
+   ret = 2;
goto unlock;
}
  
@@ -488,16 +491,29 @@ static int queue_pages_pte_range(pmd_t *

struct queue_pages *qp = walk->private;
unsigned long flags = qp->flags;
int ret;
+   bool has_unmovable = false;
pte_t *pte;
spinlock_t *ptl;
  
  	ptl = pmd_trans_huge_lock(pmd, vma);

if (ptl) {
ret = queue_pages_pmd(pmd, ptl, addr, end, walk);
-   if (ret > 0)
+   switch (ret) {
+   /* THP was split, fall through to pte walk */
+   case 0:
+   break;
+   /* Pages are placed on the right node or queued successfully */
+   case 1:
return 0;
-   else if (ret < 0)
+   /*

Re: [PATCH] memremap: move from kernel/ to mm/

2019-07-22 Thread Dan Williams
On Mon, Jul 22, 2019 at 2:42 AM Christoph Hellwig  wrote:
>
> memremap.c implements MM functionality for ZONE_DEVICE, so it really
> should be in the mm/ directory, not the kernel/ one.
>
> Signed-off-by: Christoph Hellwig 

Acked-by: Dan Williams 


Re: Linux 5.3-rc1

2019-07-22 Thread James Bottomley
On Mon, 2019-07-22 at 19:42 -0700, Guenter Roeck wrote:
> On 7/22/19 4:45 PM, James Bottomley wrote:
> > [linux-scsi added to cc]
> > On Mon, 2019-07-22 at 15:21 -0700, Guenter Roeck wrote:
> > > On Sun, Jul 21, 2019 at 02:33:38PM -0700, Linus Torvalds wrote:
> > > 
> > > [ ... ]
> > > > 
> > > > Go test,
> > > > 
> > > 
> > > Things looked pretty good until a few days ago. Unfortunately,
> > > the last few days brought in a couple of issues.
> > > 
> > > riscv:virt:defconfig:scsi[virtio]
> > > riscv:virt:defconfig:scsi[virtio-pci]
> > > 
> > > Boot tests crash with no useful backtrace. Bisect points to
> > > merge ac60602a6d8f ("Merge tag 'dma-mapping-5.3-1'"). Log is at
> > > https://kerneltests.org/builders/qemu-riscv64-master/builds/238/s
> > > teps
> > > /qemubuildcommand_1/logs/stdio
> > > 
> > > ppc:mpc8544ds:mpc85xx_defconfig:sata-sii3112
> > > ppc64:pseries:pseries_defconfig:sata-sii3112
> > > ppc64:pseries:pseries_defconfig:little:sata-sii3112
> > > ppc64:ppce500:corenet64_smp_defconfig:e5500:sata-sii3112
> > > 
> > > ata1: lost interrupt (Status 0x50)
> > > ata1.00: exception Emask 0x0 SAct 0x0 SErr 0x0 action 0x6 frozen
> > > ata1.00: failed command: READ DMA
> > > 
> > > and many similar errors. Boot ultimately times out. Bisect points
> > > to
> > > merge
> > > f65420df914a ("Merge tag 'scsi-fixes'").
> > > 
> > > Logs:
> > > https://kerneltests.org/builders/qemu-ppc64-master/builds/1212/st
> > > eps/
> > > qemubuildcommand/logs/stdio
> > > https://kerneltests.org/builders/qemu-ppc-master/builds/1255/step
> > > s/qe
> > > mubuildcommand/logs/stdio
> > > 
> > > Guenter
> > > 
> > > ---
> > > riscv bisect log
> > > 
> > > # bad: [5f9e832c137075045d15cd6899ab0505cfb2ca4b] Linus 5.3-rc1
> > > # good: [bdd17bdef7d8da4d8eee254abb4c92d8a566bdc1] scsi: core:
> > > take
> > > the DMA max mapping size into account
> > > git bisect start 'HEAD' 'bdd17bdef7d8'
> > > # good: [237f83dfbe668443b5e31c3c7576125871cca674] Merge
> > > git://git.kernel.org/pub/scm/linux/kernel/git/davem/net-next
> > > git bisect good 237f83dfbe668443b5e31c3c7576125871cca674
> > > # good: [be8454afc50f43016ca8b6130d9673bdd0bd56ec] Merge tag
> > > 'drm-
> > > next-2019-07-16' of git://anongit.freedesktop.org/drm/drm
> > > git bisect good be8454afc50f43016ca8b6130d9673bdd0bd56ec
> > > # good: [d4df33b0e9925c158b313a586fb1557cf29cfdf4] Merge branch
> > > 'for-
> > > linus-5.2' of
> > > git://git.kernel.org/pub/scm/linux/kernel/git/konrad/swiotlb
> > > git bisect good d4df33b0e9925c158b313a586fb1557cf29cfdf4
> > > # good: [f90b8fda3a9d72a9422ea80ae95843697f94ea4a] ARM: dts:
> > > gemini:
> > > Set DIR-685 SPI CS as active low
> > > git bisect good f90b8fda3a9d72a9422ea80ae95843697f94ea4a
> > > # good: [31cc088a4f5d83481c6f5041bd6eb06115b974af] Merge tag
> > > 'drm-
> > > next-2019-07-19' of git://anongit.freedesktop.org/drm/drm
> > > git bisect good 31cc088a4f5d83481c6f5041bd6eb06115b974af
> > > # good: [ad21a4ce040cc41b4a085417169b558e86af56b7] dt-bindings:
> > > pinctrl: aspeed: Fix 'compatible' schema errors
> > > git bisect good ad21a4ce040cc41b4a085417169b558e86af56b7
> > > # good: [e6023adc5c6af79ac8ac5b17939f58091fa0d870] Merge branch
> > > 'core-urgent-for-linus' of
> > > git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip
> > > git bisect good e6023adc5c6af79ac8ac5b17939f58091fa0d870
> > > # bad: [ac60602a6d8f6830dee89f4b87ee005f62eb7171] Merge tag 'dma-
> > > mapping-5.3-1' of git://git.infradead.org/users/hch/dma-mapping
> > > git bisect bad ac60602a6d8f6830dee89f4b87ee005f62eb7171
> > > # good: [6e67d77d673d785631b0c52314b60d3c68ebe809] perf vendor
> > > events
> > > s390: Add JSON files for machine type 8561
> > > git bisect good 6e67d77d673d785631b0c52314b60d3c68ebe809
> > > # good: [a0d14b8909de55139b8702fe0c7e80b69763dcfb] x86/mm,
> > > tracing:
> > > Fix CR2 corruption
> > > git bisect good a0d14b8909de55139b8702fe0c7e80b69763dcfb
> > > # good: [6879298bd0673840cadd1fb36d7225485504ceb4] x86/entry/64:
> > > Prevent clobbering of saved CR2 value
> > > git bisect good 6879298bd0673840cadd1fb36d7225485504ceb4
> > > # good: [449fa54d6815be8c2c1f68fa9dbbae9384a7c03e] dma-direct:
> > > correct the physical addr in dma_direct_sync_sg_for_cpu/device
> > > git bisect good 449fa54d6815be8c2c1f68fa9dbbae9384a7c03e
> > > # good: [e0c5c5e308ee9b3548844f0d88da937782b895ef] Merge tag
> > > 'perf-
> > > core-for-mingo-5.3-20190715' of
> > > git://git.kernel.org/pub/scm/linux/kernel/git/acme/linux into
> > > perf/urgent
> > > git bisect good e0c5c5e308ee9b3548844f0d88da937782b895ef
> > > # good: [c6dd78fcb8eefa15dd861889e0f59d301cb5230c] Merge branch
> > > 'x86-
> > > urgent-for-linus' of
> > > git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip
> > > git bisect good c6dd78fcb8eefa15dd861889e0f59d301cb5230c
> > > # first bad commit: [ac60602a6d8f6830dee89f4b87ee005f62eb7171]
> > > Merge
> > > tag 'dma-mapping-5.3-1' of git://git.infradead.org/users/hch/dma-
> > > mapping
> > > 
> > > -
> > > ppc/ppc64 bisect log
> > > 
> 

Re: Issue with sequence to switch to HS400

2019-07-22 Thread Adrian Hunter
On 23/07/19 1:31 AM, Alan Cooper wrote:
> I'm having a problem with a new SD/MMC controller and PHY in our
> latest SoCs. The issue I'm seeing is that I can't switch into HS400
> mode. This looks like something the driver is doing that doesn't meet
> the JEDEC spec. In the "HS400 timing mode selection" section of the
> JEDEC spec, in step 7 it states:
> 
> 7) Set the “Timing Interface” parameter in the HS_TIMING [185] field
> of the Extended CSD register to 0x1 to switch to High Speed mode and
> then set the clock frequency to a value not greater than 52 MHz.
> 
> In the function mmc_select_hs400() in mmc.c, I see that a switch
> command is done to set the eMMC device to HS mode and then
> mmc_set_timing(card->host, MMC_TIMING_MMC_HS) is used to change the
> controller to HS mode. The problem is that the "SD Host Controller
> Standard Specification" states that "UHS Mode Select" field of the
> "Host Control 2 Register" controls the mode when the "1.8V Signaling
> Enable" bit in the same register is set, so mmc_set_timing() is
> actually leaving the controller in SDR12 mode and mmc_select_hs400()
> will then set the clock to 52MHz. This causes our PHY to detect an
> illegal combination and return an error.
> 
> I think the easiest fix would be to change mmc_set_timing(card->host,
> MMC_TIMING_MMC_HS) to mmc_set_timing(card->host,
> MMC_TIMING_UHS_SDR25). The other possibility would be to change
> mmc_set_timing to handle the "1.8V Signaling Enable" bit properly.
> I'll submit a patch based on the feedback I get.

eMMC is governed by JEDEC specs not SD specs.

Please consider making a change in your driver instead.  For example, hook
->set_ios() and if 1.8V is enabled and timing is set to MMC_TIMING_MMC_HS
then change it to MMC_TIMING_UHS_SDR25.
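
A minimal sketch of that remapping, written as standalone C so it can be compiled outside the kernel. The enums and struct below are local stand-ins assumed for illustration, not the definitions from the kernel's MMC headers, and fixup_timing() is a hypothetical helper a driver's ->set_ios() hook might call.

```c
#include <assert.h>
#include <stddef.h>

/* Local stand-ins; the real definitions live in the kernel MMC headers. */
enum timing { TIMING_LEGACY, TIMING_MMC_HS, TIMING_UHS_SDR25 };
enum signal_voltage { SIGNAL_VOLTAGE_330, SIGNAL_VOLTAGE_180 };

struct ios {
	enum timing timing;
	enum signal_voltage signal_voltage;
};

/*
 * Timing the controller should actually be programmed with: at 1.8V
 * signaling, a request for MMC high speed is mapped to SDR25, every
 * other request is passed through unchanged.
 */
static enum timing fixup_timing(const struct ios *ios)
{
	assert(ios != NULL);
	if (ios->signal_voltage == SIGNAL_VOLTAGE_180 &&
	    ios->timing == TIMING_MMC_HS)
		return TIMING_UHS_SDR25;
	return ios->timing;
}
```

Keeping the remap in the driver leaves the core's eMMC sequence untouched for controllers that do not need the workaround.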


Re: [PATCH 5/5] arm64: dts: qcom: sdm845-cheza: remove macro from unit name

2019-07-22 Thread Vinod Koul
On 23-07-19, 10:38, Amit Kucheria wrote:
> On Mon, Jul 22, 2019 at 6:06 PM Vinod Koul  wrote:
> >
> > Unit name is supposed to be a number, using a macro with hex value is
> 
> /s/name/address?

Right, will fix.

> > not recommended, so add the value in unit name.
> >
> > arch/arm64/boot/dts/qcom/sdm845-cheza.dtsi:966.16-969.4: Warning 
> > (unit_address_format): /soc@0/spmi@c44/pmic@0/adc@3100/adc-chan@0x4d: 
> > unit name should not have leading "0x"
> > arch/arm64/boot/dts/qcom/sdm845-cheza.dtsi:971.16-974.4: Warning 
> > (unit_address_format): /soc@0/spmi@c44/pmic@0/adc@3100/adc-chan@0x4e: 
> > unit name should not have leading "0x"
> > arch/arm64/boot/dts/qcom/sdm845-cheza.dtsi:976.16-979.4: Warning 
> > (unit_address_format): /soc@0/spmi@c44/pmic@0/adc@3100/adc-chan@0x4f: 
> > unit name should not have leading "0x"
> > arch/arm64/boot/dts/qcom/sdm845-cheza.dtsi:981.16-984.4: Warning 
> > (unit_address_format): /soc@0/spmi@c44/pmic@0/adc@3100/adc-chan@0x50: 
> > unit name should not have leading "0x"
> > arch/arm64/boot/dts/qcom/sdm845-cheza.dtsi:986.16-989.4: Warning 
> > (unit_address_format): /soc@0/spmi@c44/pmic@0/adc@3100/adc-chan@0x51: 
> > unit name should not have leading "0x"
> >
> > Signed-off-by: Vinod Koul 
> > ---
> >  arch/arm64/boot/dts/qcom/sdm845-cheza.dtsi | 10 +-
> >  1 file changed, 5 insertions(+), 5 deletions(-)
> >
> > diff --git a/arch/arm64/boot/dts/qcom/sdm845-cheza.dtsi 
> > b/arch/arm64/boot/dts/qcom/sdm845-cheza.dtsi
> > index 1ebbd568dfd7..9b27b8346ba1 100644
> > --- a/arch/arm64/boot/dts/qcom/sdm845-cheza.dtsi
> > +++ b/arch/arm64/boot/dts/qcom/sdm845-cheza.dtsi
> > @@ -963,27 +963,27 @@ ap_ts_i2c:  {
> >  };
> >
> >  _adc {
> > -   adc-chan@ADC5_AMUX_THM1_100K_PU {
> > +   adc-chan@4d {
> > > reg = <ADC5_AMUX_THM1_100K_PU>;
> 
> I'm a little conflicted about this change. If we're replacing the
> address with actual values, perhaps we should do that same for the reg
> property to keep them in sync? Admittedly though, it is a bit easier
> to read the macro name and figure out its meaning.

Well, this is how Bjorn suggested doing it; I am okay with doing it any
other way. This fixes the warning but keeps it a bit readable too.

The other way would be to make the defines decimal values instead of hex.

Any better suggestions :)

-- 
~Vinod


[RFC] clk: Remove cached cores in parent map during unregister

2019-07-22 Thread Bjorn Andersson
As clocks are registered their parents are resolved and the parent_map
is updated to cache the clk_core objects of each existing parent.
But in the event of a clock being unregistered this cache will carry
dangling pointers if not invalidated, so do this for all children of the
clock being unregistered.

Signed-off-by: Bjorn Andersson 
---

This resolves the issue seen where the DSI PLL (and its provided clocks) is
being registered and unregistered multiple times due to probe deferral.

Marking it RFC because I don't fully understand the life of the clock yet.

 drivers/clk/clk.c | 18 +++---
 1 file changed, 15 insertions(+), 3 deletions(-)

diff --git a/drivers/clk/clk.c b/drivers/clk/clk.c
index c0990703ce54..8cd1ad977c50 100644
--- a/drivers/clk/clk.c
+++ b/drivers/clk/clk.c
@@ -2423,11 +2423,14 @@ bool clk_has_parent(struct clk *clk, struct clk *parent)
 EXPORT_SYMBOL_GPL(clk_has_parent);
 
 static int clk_core_set_parent_nolock(struct clk_core *core,
- struct clk_core *parent)
+ struct clk_core *parent,
+ bool invalidate_parent)
 {
+   struct clk_core *old_parent = core->parent;
int ret = 0;
int p_index = 0;
unsigned long p_rate = 0;
+   int i;
 
   lockdep_assert_held(&prepare_lock);
 
@@ -2481,6 +2484,14 @@ static int clk_core_set_parent_nolock(struct clk_core *core,
__clk_recalc_accuracies(core);
}
 
+   /* invalidate the parent cache */
+   if (!parent && invalidate_parent) {
+   for (i = 0; i < core->num_parents; i++) {
+   if (core->parents[i].core == old_parent)
+   core->parents[i].core = NULL;
+   }
+   }
+
 runtime_put:
clk_pm_runtime_put(core);
 
@@ -2517,7 +2528,8 @@ int clk_set_parent(struct clk *clk, struct clk *parent)
clk_core_rate_unprotect(clk->core);
 
ret = clk_core_set_parent_nolock(clk->core,
-parent ? parent->core : NULL);
+parent ? parent->core : NULL,
+false);
 
if (clk->exclusive_count)
clk_core_rate_protect(clk->core);
@@ -3772,7 +3784,7 @@ void clk_unregister(struct clk *clk)
/* Reparent all children to the orphan list. */
hlist_for_each_entry_safe(child, t, &clk->core->children,
  child_node)
-   clk_core_set_parent_nolock(child, NULL);
+   clk_core_set_parent_nolock(child, NULL, true);
}
 
hlist_del_init(&clk->core->child_node);
-- 
2.18.0



Re: [PATCH 5/5] arm64: dts: qcom: sdm845-cheza: remove macro from unit name

2019-07-22 Thread Amit Kucheria
On Mon, Jul 22, 2019 at 6:06 PM Vinod Koul  wrote:
>
> Unit name is supposed to be a number, using a macro with hex value is

/s/name/address?

> not recommended, so add the value in unit name.
>
> arch/arm64/boot/dts/qcom/sdm845-cheza.dtsi:966.16-969.4: Warning 
> (unit_address_format): /soc@0/spmi@c44/pmic@0/adc@3100/adc-chan@0x4d: 
> unit name should not have leading "0x"
> arch/arm64/boot/dts/qcom/sdm845-cheza.dtsi:971.16-974.4: Warning 
> (unit_address_format): /soc@0/spmi@c44/pmic@0/adc@3100/adc-chan@0x4e: 
> unit name should not have leading "0x"
> arch/arm64/boot/dts/qcom/sdm845-cheza.dtsi:976.16-979.4: Warning 
> (unit_address_format): /soc@0/spmi@c44/pmic@0/adc@3100/adc-chan@0x4f: 
> unit name should not have leading "0x"
> arch/arm64/boot/dts/qcom/sdm845-cheza.dtsi:981.16-984.4: Warning 
> (unit_address_format): /soc@0/spmi@c44/pmic@0/adc@3100/adc-chan@0x50: 
> unit name should not have leading "0x"
> arch/arm64/boot/dts/qcom/sdm845-cheza.dtsi:986.16-989.4: Warning 
> (unit_address_format): /soc@0/spmi@c44/pmic@0/adc@3100/adc-chan@0x51: 
> unit name should not have leading "0x"
>
> Signed-off-by: Vinod Koul 
> ---
>  arch/arm64/boot/dts/qcom/sdm845-cheza.dtsi | 10 +-
>  1 file changed, 5 insertions(+), 5 deletions(-)
>
> diff --git a/arch/arm64/boot/dts/qcom/sdm845-cheza.dtsi 
> b/arch/arm64/boot/dts/qcom/sdm845-cheza.dtsi
> index 1ebbd568dfd7..9b27b8346ba1 100644
> --- a/arch/arm64/boot/dts/qcom/sdm845-cheza.dtsi
> +++ b/arch/arm64/boot/dts/qcom/sdm845-cheza.dtsi
> @@ -963,27 +963,27 @@ ap_ts_i2c:  {
>  };
>
>  _adc {
> -   adc-chan@ADC5_AMUX_THM1_100K_PU {
> +   adc-chan@4d {
> reg = <ADC5_AMUX_THM1_100K_PU>;

I'm a little conflicted about this change. If we're replacing the
address with actual values, perhaps we should do that same for the reg
property to keep them in sync? Admittedly though, it is a bit easier
to read the macro name and figure out its meaning.

> label = "sdm_temp";
> };
>
> -   adc-chan@ADC5_AMUX_THM2_100K_PU {
> +   adc-chan@4e {
> reg = <ADC5_AMUX_THM2_100K_PU>;
> label = "quiet_temp";
> };
>
> -   adc-chan@ADC5_AMUX_THM3_100K_PU {
> +   adc-chan@4f {
> reg = <ADC5_AMUX_THM3_100K_PU>;
> label = "lte_temp_1";
> };
>
> -   adc-chan@ADC5_AMUX_THM4_100K_PU {
> +   adc-chan@50 {
> reg = <ADC5_AMUX_THM4_100K_PU>;
> label = "lte_temp_2";
> };
>
> -   adc-chan@ADC5_AMUX_THM5_100K_PU {
> +   adc-chan@51 {
> reg = <ADC5_AMUX_THM5_100K_PU>;
> label = "charger_temp";
> };
> --
> 2.20.1
>


Re: kernel BUG at mm/swap_state.c:170!

2019-07-22 Thread Huang, Ying
Mikhail Gavrilov  writes:

> On Mon, 22 Jul 2019 at 12:53, Huang, Ying  wrote:
>>
>> Yes.  This is quite complex.  Is the transparent huge page enabled in
>> your system?  You can check the output of
>>
>> $ cat /sys/kernel/mm/transparent_hugepage/enabled
>
> always [madvise] never
>
>> And is the swap device you use an SSD or NVMe disk (not an HDD)?
>
> NVMe INTEL Optane 905P SSDPE21D480GAM3

Thanks!  I have found another (easier) way to reproduce the panic.
Could you try the below patch on top of v5.2-rc2?  It can fix the panic
for me.

Best Regards,
Huang, Ying

---8<--
From 5e519c2de54b9fd4b32b7a59e47ce7f94beb8845 Mon Sep 17 00:00:00 2001
From: Huang Ying 
Date: Tue, 23 Jul 2019 08:49:57 +0800
Subject: [PATCH] dbg xa head

---
 mm/huge_memory.c | 18 --
 1 file changed, 16 insertions(+), 2 deletions(-)

diff --git a/mm/huge_memory.c b/mm/huge_memory.c
index 9f8bce9a6b32..c6ca1c7157ed 100644
--- a/mm/huge_memory.c
+++ b/mm/huge_memory.c
@@ -2482,6 +2482,8 @@ static void __split_huge_page(struct page *page, struct list_head *list,
struct page *head = compound_head(page);
pg_data_t *pgdat = page_pgdat(head);
struct lruvec *lruvec;
+   struct address_space *swap_cache = NULL;
+   unsigned long offset;
int i;
 
lruvec = mem_cgroup_page_lruvec(head, pgdat);
@@ -2489,6 +2491,14 @@ static void __split_huge_page(struct page *page, struct list_head *list,
/* complete memcg works before add pages to LRU */
mem_cgroup_split_huge_fixup(head);
 
+   if (PageAnon(head) && PageSwapCache(head)) {
+   swp_entry_t entry = { .val = page_private(head) };
+
+   offset = swp_offset(entry);
+   swap_cache = swap_address_space(entry);
+   xa_lock(&swap_cache->i_pages);
+   }
+
for (i = HPAGE_PMD_NR - 1; i >= 1; i--) {
__split_huge_page_tail(head, i, lruvec, list);
/* Some pages can be beyond i_size: drop them from page cache */
@@ -2501,6 +2511,9 @@ static void __split_huge_page(struct page *page, struct list_head *list,
} else if (!PageAnon(page)) {
__xa_store(&head->mapping->i_pages, head[i].index,
head + i, 0);
+   } else if (swap_cache) {
+   __xa_store(&swap_cache->i_pages, offset + i,
+  head + i, 0);
}
}
 
@@ -2508,9 +2521,10 @@ static void __split_huge_page(struct page *page, struct list_head *list,
/* See comment in __split_huge_page_tail() */
if (PageAnon(head)) {
/* Additional pin to swap cache */
-   if (PageSwapCache(head))
+   if (PageSwapCache(head)) {
page_ref_add(head, 2);
-   else
+   xa_unlock(&swap_cache->i_pages);
+   } else
page_ref_inc(head);
} else {
/* Additional pin to page cache */
-- 
2.20.1



Re: nl80211 wlcore regression in next

2019-07-22 Thread Johannes Berg
Hi,

> Looks like this one crept back as the fix is missing from v5.3-rc1.
> 
> Forgot to include in the pull request?

More like forgot to send the pull request, my bad. I eventually realized
a couple of days ago and it'll be coming upstream soon. Sorry about
that.

johannes



Re: WARNING in __mmdrop

2019-07-22 Thread Michael S. Tsirkin
On Tue, Jul 23, 2019 at 11:55:28AM +0800, Jason Wang wrote:
> 
> On 2019/7/22 下午4:02, Michael S. Tsirkin wrote:
> > On Mon, Jul 22, 2019 at 01:21:59PM +0800, Jason Wang wrote:
> > > On 2019/7/21 下午6:02, Michael S. Tsirkin wrote:
> > > > On Sat, Jul 20, 2019 at 03:08:00AM -0700, syzbot wrote:
> > > > > syzbot has bisected this bug to:
> > > > > 
> > > > > commit 7f466032dc9e5a61217f22ea34b2df932786bbfc
> > > > > Author: Jason Wang 
> > > > > Date:   Fri May 24 08:12:18 2019 +
> > > > > 
> > > > >   vhost: access vq metadata through kernel virtual address
> > > > > 
> > > > > bisection log:  
> > > > > https://syzkaller.appspot.com/x/bisect.txt?x=149a8a2060
> > > > > start commit:   6d21a41b Add linux-next specific files for 20190718
> > > > > git tree:   linux-next
> > > > > final crash:
> > > > > https://syzkaller.appspot.com/x/report.txt?x=169a8a2060
> > > > > console output: 
> > > > > https://syzkaller.appspot.com/x/log.txt?x=129a8a2060
> > > > > kernel config:  
> > > > > https://syzkaller.appspot.com/x/.config?x=3430a151e1452331
> > > > > dashboard link: 
> > > > > https://syzkaller.appspot.com/bug?extid=e58112d71f77113ddb7b
> > > > > syz repro:  
> > > > > https://syzkaller.appspot.com/x/repro.syz?x=10139e6860
> > > > > 
> > > > > Reported-by: syzbot+e58112d71f77113dd...@syzkaller.appspotmail.com
> > > > > Fixes: 7f466032dc9e ("vhost: access vq metadata through kernel virtual
> > > > > address")
> > > > > 
> > > > > For information about bisection process see: 
> > > > > https://goo.gl/tpsmEJ#bisection
> > > > OK I poked at this for a bit, I see several things that
> > > > we need to fix, though I'm not yet sure it's the reason for
> > > > the failures:
> > > > 
> > > > 
> > > > 1. mmu_notifier_register shouldn't be called from vhost_vring_set_num_addr
> > > >  That's just a bad hack,
> > > 
> > > This is used to avoid holding lock when checking whether the addresses are
> > > overlapped. Otherwise we need to take spinlock for each invalidation request
> > > even if it was the va range that is not interested for us. This will be very
> > > slow e.g during guest boot.
> > KVM seems to do exactly that.
> > I tried and guest does not seem to boot any slower.
> > Do you observe any slowdown?
> 
> 
> Yes I do.
> 
> 
> > 
> > Now I took a hard look at the uaddr hackery it really makes
> > me nervous. So I think for this release we want something
> > safe, and optimizations on top. As an alternative revert the
> > optimization and try again for next merge window.
> 
> 
> Will post a series of fixes, let me know if you're ok with that.
> 
> Thanks

I'd prefer you to take a hard look at the patch I posted
which makes code cleaner, and add optimizations on top.
But other ways could be ok too.

> 
> > 
> > 


Re: WARNING in __mmdrop

2019-07-22 Thread Michael S. Tsirkin
On Tue, Jul 23, 2019 at 12:01:40PM +0800, Jason Wang wrote:
> 
> On 2019/7/22 下午4:08, Michael S. Tsirkin wrote:
> > On Mon, Jul 22, 2019 at 01:24:24PM +0800, Jason Wang wrote:
> > > On 2019/7/21 下午8:18, Michael S. Tsirkin wrote:
> > > > On Sun, Jul 21, 2019 at 06:02:52AM -0400, Michael S. Tsirkin wrote:
> > > > > On Sat, Jul 20, 2019 at 03:08:00AM -0700, syzbot wrote:
> > > > > > syzbot has bisected this bug to:
> > > > > > 
> > > > > > commit 7f466032dc9e5a61217f22ea34b2df932786bbfc
> > > > > > Author: Jason Wang
> > > > > > Date:   Fri May 24 08:12:18 2019 +
> > > > > > 
> > > > > >   vhost: access vq metadata through kernel virtual address
> > > > > > 
> > > > > > bisection 
> > > > > > log:https://syzkaller.appspot.com/x/bisect.txt?x=149a8a2060
> > > > > > start commit:   6d21a41b Add linux-next specific files for 20190718
> > > > > > git tree:   linux-next
> > > > > > final 
> > > > > > crash:https://syzkaller.appspot.com/x/report.txt?x=169a8a2060
> > > > > > console 
> > > > > > output:https://syzkaller.appspot.com/x/log.txt?x=129a8a2060
> > > > > > kernel 
> > > > > > config:https://syzkaller.appspot.com/x/.config?x=3430a151e1452331
> > > > > > dashboard 
> > > > > > link:https://syzkaller.appspot.com/bug?extid=e58112d71f77113ddb7b
> > > > > > syz repro:https://syzkaller.appspot.com/x/repro.syz?x=10139e6860
> > > > > > 
> > > > > > Reported-by:syzbot+e58112d71f77113dd...@syzkaller.appspotmail.com
> > > > > > Fixes: 7f466032dc9e ("vhost: access vq metadata through kernel 
> > > > > > virtual
> > > > > > address")
> > > > > > 
> > > > > > For information about bisection process 
> > > > > > see:https://goo.gl/tpsmEJ#bisection
> > > > > OK I poked at this for a bit, I see several things that
> > > > > we need to fix, though I'm not yet sure it's the reason for
> > > > > the failures:
> > > > > 
> > > > > 
> > > > > > 1. mmu_notifier_register shouldn't be called from vhost_vring_set_num_addr
> > > > >  That's just a bad hack, in particular I don't think device
> > > > >  mutex is taken and so poking at two VQs will corrupt
> > > > >  memory.
> > > > >  So what to do? How about a per vq notifier?
> > > > >  Of course we also have synchronize_rcu
> > > > > >  in the notifier which is slow and is now going to be called twice.
> > > > >  I think call_rcu would be more appropriate here.
> > > > >  We then need rcu_barrier on module unload.
> > > > >  OTOH if we make pages linear with map then we are good
> > > > >  with kfree_rcu which is even nicer.
> > > > > 
> > > > > 2. Doesn't map leak after vhost_map_unprefetch?
> > > > >  And why does it poke at contents of the map?
> > > > >  No one should use it right?
> > > > > 
> > > > > 3. notifier unregister happens last in vhost_dev_cleanup,
> > > > >  but register happens first. This looks wrong to me.
> > > > > 
> > > > > 4. OK so we use the invalidate count to try and detect that
> > > > >  some invalidate is in progress.
> > > > >  I am not 100% sure why do we care.
> > > > >  Assuming we do, uaddr can change between start and end
> > > > >  and then the counter can get negative, or generally
> > > > >  out of sync.
> > > > > 
> > > > > So what to do about all this?
> > > > > I am inclined to say let's just drop the uaddr optimization
> > > > > for now. E.g. kvm invalidates unconditionally.
> > > > > 3 should be fixed independently.
> > > > Above implements this but is only build-tested.
> > > > Jason, pls take a look. If you like the approach feel
> > > > free to take it from here.
> > > > 
> > > > One thing the below does not have is any kind of rate-limiting.
> > > > Given it's so easy to restart I'm thinking it makes sense
> > > > to add a generic infrastructure for this.
> > > > Can be a separate patch I guess.
> > > 
> > > I don't get why must use kfree_rcu() instead of synchronize_rcu() here.
> > synchronize_rcu has very high latency on busy systems.
> > It is not something that should be used on a syscall path.
> > KVM had to switch to SRCU to keep it sane.
> > Otherwise one guest can trivially slow down another one.
> 
> 
> I think you mean the synchronize_rcu_expedited()? Rethink of the code, the
> synchronize_rcu() in ioctl() could be removed, since it was serialized with
> memory accessor.


Really let's just use kfree_rcu. It's way cleaner: fire and forget.

> 
> Btw, for kvm ioctl it still uses synchronize_rcu() in kvm_vcpu_ioctl(),
> (just a little bit more hard to trigger):


AFAIK these never run in response to guest events.
So they can take very long and guests still won't crash.


> 
>     case KVM_RUN: {
> ...
>         if (unlikely(oldpid != task_pid(current))) {
>             /* The thread running this VCPU changed. */
>             struct pid *newpid;
> 
>             r = kvm_arch_vcpu_run_pid_change(vcpu);
>             if (r)
>                 break;
> 
>             newpid = get_task_pid(current, PIDTYPE_PID);
>    

Re: [PATCH 1/2] string: Add stracpy and stracpy_pad mechanisms

2019-07-22 Thread Joe Perches
On Mon, 2019-07-22 at 21:35 -0700, Andrew Morton wrote:
> On Mon, 22 Jul 2019 17:38:15 -0700 Joe Perches  wrote:
> 
> > Several uses of strlcpy and strscpy have had defects because the
> > last argument of each function is misused or typoed.
> > 
> > Add macro mechanisms to avoid this defect.
> > 
> > stracpy (copy a string to a string array) must have a string
> > array as the first argument (to) and uses sizeof(to) as the
> > size.
> > 
> > These mechanisms verify that the to argument is an array of
> > char or other compatible types like u8 or unsigned char.
> > 
> > A BUILD_BUG is emitted when the type of to is not compatible.
> > 
> 
> It would be nice to include some conversions.  To demonstrate the need,
> to test the code, etc.

How about all the kernel/ ?
---
 kernel/acct.c  | 2 +-
 kernel/cgroup/cgroup-v1.c  | 3 +--
 kernel/debug/gdbstub.c | 4 ++--
 kernel/debug/kdb/kdb_support.c | 2 +-
 kernel/events/core.c   | 4 ++--
 kernel/module.c| 2 +-
 kernel/printk/printk.c | 2 +-
 kernel/time/clocksource.c  | 2 +-
 8 files changed, 10 insertions(+), 11 deletions(-)

diff --git a/kernel/acct.c b/kernel/acct.c
index 81f9831a7859..5ad29248b654 100644
--- a/kernel/acct.c
+++ b/kernel/acct.c
@@ -425,7 +425,7 @@ static void fill_ac(acct_t *ac)
memset(ac, 0, sizeof(acct_t));
 
ac->ac_version = ACCT_VERSION | ACCT_BYTEORDER;
-   strlcpy(ac->ac_comm, current->comm, sizeof(ac->ac_comm));
+   stracpy(ac->ac_comm, current->comm);
 
/* calculate run_time in nsec*/
run_time = ktime_get_ns();
diff --git a/kernel/cgroup/cgroup-v1.c b/kernel/cgroup/cgroup-v1.c
index 88006be40ea3..dd4f041e4179 100644
--- a/kernel/cgroup/cgroup-v1.c
+++ b/kernel/cgroup/cgroup-v1.c
@@ -571,8 +571,7 @@ static ssize_t cgroup_release_agent_write(struct kernfs_open_file *of,
if (!cgrp)
return -ENODEV;
spin_lock(&release_agent_path_lock);
-   strlcpy(cgrp->root->release_agent_path, strstrip(buf),
-   sizeof(cgrp->root->release_agent_path));
+   stracpy(cgrp->root->release_agent_path, strstrip(buf));
spin_unlock(&release_agent_path_lock);
cgroup_kn_unlock(of->kn);
return nbytes;
diff --git a/kernel/debug/gdbstub.c b/kernel/debug/gdbstub.c
index 4b280fc7dd67..a263f27f51ad 100644
--- a/kernel/debug/gdbstub.c
+++ b/kernel/debug/gdbstub.c
@@ -1095,10 +1095,10 @@ int gdbstub_state(struct kgdb_state *ks, char *cmd)
return error;
case 's':
case 'c':
-   strscpy(remcom_in_buffer, cmd, sizeof(remcom_in_buffer));
+   stracpy(remcom_in_buffer, cmd);
return 0;
case '$':
-   strscpy(remcom_in_buffer, cmd, sizeof(remcom_in_buffer));
+   stracpy(remcom_in_buffer, cmd);
gdbstub_use_prev_in_buf = strlen(remcom_in_buffer);
gdbstub_prev_in_buf_pos = 0;
return 0;
diff --git a/kernel/debug/kdb/kdb_support.c b/kernel/debug/kdb/kdb_support.c
index b8e6306e7e13..b49b6c3976c7 100644
--- a/kernel/debug/kdb/kdb_support.c
+++ b/kernel/debug/kdb/kdb_support.c
@@ -192,7 +192,7 @@ int kallsyms_symbol_complete(char *prefix_name, int max_len)
 
while ((name = kdb_walk_kallsyms())) {
if (strncmp(name, prefix_name, prefix_len) == 0) {
-   strscpy(ks_namebuf, name, sizeof(ks_namebuf));
+   stracpy(ks_namebuf, name);
/* Work out the longest name that matches the prefix */
if (++number == 1) {
prev_len = min_t(int, max_len-1,
diff --git a/kernel/events/core.c b/kernel/events/core.c
index 026a14541a38..25bd8c777270 100644
--- a/kernel/events/core.c
+++ b/kernel/events/core.c
@@ -7049,7 +7049,7 @@ static void perf_event_comm_event(struct perf_comm_event *comm_event)
unsigned int size;
 
memset(comm, 0, sizeof(comm));
-   strlcpy(comm, comm_event->task->comm, sizeof(comm));
+   stracpy(comm, comm_event->task->comm);
size = ALIGN(strlen(comm)+1, sizeof(u64));
 
comm_event->comm = comm;
@@ -7394,7 +7394,7 @@ static void perf_event_mmap_event(struct perf_mmap_event *mmap_event)
}
 
 cpy_name:
-   strlcpy(tmp, name, sizeof(tmp));
+   stracpy(tmp, name);
name = tmp;
 got_name:
/*
diff --git a/kernel/module.c b/kernel/module.c
index 5933395af9a0..39384b0c90b8 100644
--- a/kernel/module.c
+++ b/kernel/module.c
@@ -1021,7 +1021,7 @@ SYSCALL_DEFINE2(delete_module, const char __user *, name_user,
async_synchronize_full();
 
/* Store the name of the last unloaded module for diagnostic purposes */
-   strlcpy(last_unloaded_module, mod->name, sizeof(last_unloaded_module));
+   stracpy(last_unloaded_module, mod->name);
 
free_module(mod);
return 0;
diff --git a/kernel/printk/printk.c b/kernel/printk/printk.c
index 

Re: [PATCH 1/2] string: Add stracpy and stracpy_pad mechanisms

2019-07-22 Thread Andrew Morton
On Mon, 22 Jul 2019 17:38:15 -0700 Joe Perches  wrote:

> Several uses of strlcpy and strscpy have had defects because the
> last argument of each function is misused or typoed.
> 
> Add macro mechanisms to avoid this defect.
> 
> stracpy (copy a string to a string array) must have a string
> array as the first argument (to) and uses sizeof(to) as the
> size.
> 
> These mechanisms verify that the to argument is an array of
> char or other compatible types like u8 or unsigned char.
> 
> A BUILD_BUG is emitted when the type of to is not compatible.
> 

It would be nice to include some conversions.  To demonstrate the need,
to test the code, etc.
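
For illustration, a rough userspace sketch of the mechanism described in
the quoted text.  This is not the macro from the patch: stracpy_sketch()
and bounded_copy() are made-up names, bounded_copy() merely stands in for
the kernel's strscpy(), and the compile-time checks assume GCC or Clang
extensions (statement expressions, __typeof__,
__builtin_types_compatible_p).

```c
#include <stddef.h>

/*
 * Bounded copy that always NUL-terminates dst.  Returns the number of
 * characters copied, or -1 if src did not fit and was truncated.
 * Hypothetical userspace stand-in for strscpy().
 */
static long bounded_copy(char *dst, const char *src, size_t size)
{
	size_t i;

	if (size == 0)
		return -1;

	for (i = 0; i < size - 1 && src[i] != '\0'; i++)
		dst[i] = src[i];
	dst[i] = '\0';

	return src[i] != '\0' ? -1 : (long)i;
}

/*
 * Copy a string into a byte array, taking the size from the array
 * itself so the caller cannot pass a wrong length.  Passing a pointer
 * instead of an array fails to compile.
 */
#define stracpy_sketch(to, from)					\
({									\
	_Static_assert(!__builtin_types_compatible_p(__typeof__(to),	\
					__typeof__(&(to)[0])),		\
		       "destination must be an array, not a pointer");	\
	_Static_assert(sizeof((to)[0]) == 1,				\
		       "destination must be an array of bytes");	\
	bounded_copy((char *)(to), (from), sizeof(to));			\
})
```

With a destination declared as "char name[16]", stracpy_sketch(name, src)
replaces the three-argument call and its hand-written sizeof(); with a
"char *" destination the build stops at the first static assertion.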



Re: [PATCH] Revert "kvm: x86: Use task structs fpu field for user"

2019-07-22 Thread Sasha Levin
Hi,

[This is an automated email]

This commit has been processed because it contains a "Fixes:" tag,
fixing commit: 240c35a3783a kvm: x86: Use task structs fpu field for user.

The bot has tested the following trees: v5.2.2, v5.1.19.

v5.2.2: Build OK!
v5.1.19: Failed to apply! Possible dependencies:
0cecca9d03c9 ("x86/fpu: Eager switch PKRU state")
2722146eb784 ("x86/fpu: Remove fpu->initialized")
4ee91519e1dc ("x86/fpu: Add an __fpregs_load_activate() internal helper")
5f409e20b794 ("x86/fpu: Defer FPU state load until return to userspace")


NOTE: The patch will not be queued to stable trees until it is upstream.

How should we proceed with this patch?

--
Thanks,
Sasha


cgroups v2: issues faced attempting to setup cpu limiting

2019-07-22 Thread Kaiwan N Billimoria
Hi All,

I am facing some issues getting CPU limiting working with cgroups v2. Please read on
for the details; a solution is much appreciated!


Env: 5.0.0 Linux kernel on x86_64 Fedora 29

When attempting to set up CPU limiting using cgroups v2, I effectively
disabled cgroups v1 by passing

cgroup_no_v1=all

as a kernel cmdline option. That seems fine and now various controllers (cpu,
io, memory, ...) show up under /sys/fs/cgroup/unified/cgroup.controllers.

However, doing:

# mkdir /sys/fs/cgroup/unified/test1
# echo "+cpu " > /sys/fs/cgroup/unified/test1/cgroup.subtree_control
bash: echo: write error: No such file or directory
#

I understand that this is expected, as the man page on cgroups(7) mentions:

"... As at Linux 4.15, the cgroups v2 cpu controller does not support
control of realtime processes, and the controller can be enabled in the
root cgroup only if all realtime threads are in the root cgroup. (If there
are realtime processes in nonroot cgroups, then a write(2) of the string
"+cpu" to the cgroup.subtree_control file fails with the error EINVAL.
However, on some systems, systemd(1) places certain realtime processes in
nonroot cgroups in the v2 hierarchy. On such systems, these processes must
first be moved to the root cgroup before the cpu controller can be
enabled. ..."

My questions are (forgive them if too basic!): how exactly does one
'move realtime processes to the root cgroup'? What are the commands?

Next, how does one identify which processes? The ones that have sched policy
SCHED_FIFO or SCHED_RR? Would using a utility wrapper make this simpler?
(libcgroup, cgmanager, etc) - do they play well with cgroups2?
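
One possible approach to the identification part, sketched under
assumptions rather than offered as a tested recipe: walk /proc, ask
sched_getscheduler(2) for each PID's policy, and treat SCHED_FIFO and
SCHED_RR as realtime.  The function names are made up for this example.
The scan only looks at each process's main thread (a fuller version would
also walk /proc/<pid>/task), and moving a process is modelled as writing
its PID to a cgroup.procs file, which needs root.

```c
#include <ctype.h>
#include <dirent.h>
#include <sched.h>
#include <stdio.h>
#include <stdlib.h>
#include <sys/types.h>

/* Nonzero if the policy value is one of the realtime policies. */
int is_realtime_policy(int policy)
{
#ifdef SCHED_RESET_ON_FORK
	/* This flag may be OR-ed into the value the kernel returns. */
	policy &= ~SCHED_RESET_ON_FORK;
#endif
	return policy == SCHED_FIFO || policy == SCHED_RR;
}

/*
 * Print every PID under /proc whose main thread has a realtime policy.
 * Returns the number found, or -1 if /proc cannot be opened.
 */
int list_realtime_processes(void)
{
	DIR *proc = opendir("/proc");
	struct dirent *de;
	int found = 0;

	if (!proc)
		return -1;

	while ((de = readdir(proc)) != NULL) {
		long pid;
		int policy;

		/* Only the numeric entries in /proc are processes. */
		if (!isdigit((unsigned char)de->d_name[0]))
			continue;

		pid = strtol(de->d_name, NULL, 10);
		policy = sched_getscheduler((pid_t)pid);
		if (policy >= 0 && is_realtime_policy(policy)) {
			printf("%ld\n", pid);
			found++;
		}
	}

	closedir(proc);
	return found;
}

/*
 * Move one process by writing its PID to the given cgroup.procs file.
 * Returns 0 on success, -1 on failure.
 */
int move_pid_to_cgroup(const char *procs_path, long pid)
{
	FILE *f = fopen(procs_path, "w");
	int ok;

	if (!f)
		return -1;

	ok = fprintf(f, "%ld\n", pid) > 0;
	if (fclose(f) != 0)
		ok = 0;

	return ok ? 0 : -1;
}
```

With the mount point used above, the root cgroup's file would be
/sys/fs/cgroup/unified/cgroup.procs, so each PID printed by
list_realtime_processes() could be passed to move_pid_to_cgroup() with
that path.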

TIA!

Regards,
Kaiwan.
---
amazon author page: https://www.amazon.com/-/e/B07KNJSRJX


linux-next: Tree for Jul 23

2019-07-22 Thread Stephen Rothwell
Hi all,

Changes since 20190722:

The v4l-dvb tree gained a build failure for which I applied a patch.

The drm-intel tree gained a conflict against the kspp-gustavo tree.

Non-merge commits (relative to Linus' tree): 1453
 1509 files changed, 143742 insertions(+), 24819 deletions(-)



I have created today's linux-next tree at
git://git.kernel.org/pub/scm/linux/kernel/git/next/linux-next.git
(patches at http://www.kernel.org/pub/linux/kernel/next/ ).  If you
are tracking the linux-next tree using git, you should not use "git pull"
to do so as that will try to merge the new linux-next release with the
old one.  You should use "git fetch" and checkout or reset to the new
master.

You can see which trees have been included by looking in the Next/Trees
file in the source.  There are also quilt-import.log and merge.log
files in the Next directory.  Between each merge, the tree was built
with a ppc64_defconfig for powerpc, an allmodconfig for x86_64, a
multi_v7_defconfig for arm and a native build of tools/perf. After
the final fixups (if any), I do an x86_64 modules_install followed by
builds for x86_64 allnoconfig, powerpc allnoconfig (32 and 64 bit),
ppc44x_defconfig, allyesconfig and pseries_le_defconfig and i386, sparc
and sparc64 defconfig. And finally, a simple boot test of the powerpc
pseries_le_defconfig kernel in qemu (with and without kvm enabled).

Below is a summary of the state of the merge.

I am currently merging 299 trees (counting Linus' and 72 trees of bug
fix patches pending for the current merge release).

Stats about the size of the tree over time can be seen at
http://neuling.org/linux-next-size.html .

Status of my local build tests will be at
http://kisskb.ellerman.id.au/linux-next .  If maintainers want to give
advice about cross compilers/configs that work, we are always open to add
more builds.

Thanks to Randy Dunlap for doing many randconfig builds.  And to Paul
Gortmaker for triage and bug fixes.

-- 
Cheers,
Stephen Rothwell

$ git checkout master
$ git reset --hard stable
Merging origin/master (7b5cf701ea9c Merge branch 'sched-urgent-for-linus' of 
git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip)
Merging fixes/master (c309b6f24222 Merge tag 'docs/v5.3-1' of 
git://git.kernel.org/pub/scm/linux/kernel/git/mchehab/linux-media)
Merging kbuild-current/fixes (5f9e832c1370 Linus 5.3-rc1)
Merging arc-current/for-curr (24a20b0a443f ARC: [plat-hsdk]: Enable AXI DW DMAC 
in defconfig)
Merging arm-current/fixes (c5d0e49e8d8f ARM: 8867/1: vdso: pass --be8 to linker 
if necessary)
Merging arm-soc-fixes/arm/fixes (ae00fcc51e71 ARM: Delete netx a second time)
Merging arm64-fixes/for-next/fixes (40ca0ce56d4b arm64: entry: SP Alignment 
Fault doesn't write to FAR_EL1)
Merging m68k-current/for-linus (f28a1f16135c m68k: Don't select 
ARCH_HAS_DMA_PREP_COHERENT for nommu or coldfire)
Merging powerpc-fixes/fixes (f16d80b75a09 powerpc/tm: Fix oops on sigreturn on 
systems without TM)
Merging s390-fixes/fixes (9a159190414d s390/unwind: avoid int overflow in 
outside_of_stack)
Merging sparc/master (192f0f8e9db7 Merge tag 'powerpc-5.3-1' of 
git://git.kernel.org/pub/scm/linux/kernel/git/powerpc/linux)
Merging fscrypt-current/for-stable (ae64f9bd1d36 Linux 4.15-rc2)
Merging net/master (12185dfe4436 bonding: Force slave speed check after link 
state recovery for 802.3ad)
Merging bpf/master (c8eee4135a45 selftests/bpf: fix sendmsg6_prog on s390)
Merging ipsec/master (22d6552f827e xfrm interface: fix management of phydev)
Merging netfilter/master (15a78ba1844a netfilter: ebtables: fix a memory leak 
bug in compat)
Merging ipvs/master (58e8b37069ff Merge branch 'net-phy-dp83867-add-some-fixes')
Merging wireless-drivers/master (5f9e832c1370 Linus 5.3-rc1)
Merging mac80211/master (d2b3fe42bc62 mac80211: don't warn about CW params when 
not using them)
Merging rdma-fixes/for-rc (5f9e832c1370 Linus 5.3-rc1)
Merging sound-current/for-linus (e4091bdd2fd9 ALSA: line6: Fix a typo)
Merging sound-asoc-fixes/for-linus (5ee3c836a2ad Merge branch 'asoc-5.3' into 
asoc-linus)
Merging regmap-fixes/for-linus (5f9e832c1370 Linus 5.3-rc1)
Merging regulator-fixes/for-linus (b9131a51dc49 Merge branch 'regulator-5.3' 
into regulator-linus)
Merging spi-fixes/for-linus (29a603af8bc6 Merge branch 'spi-5.3' into spi-linus)
Merging pci-current/for-linus (6dbbd053e6ae PCI/P2PDMA: Ignore root complex 
whitelist when an IOMMU is present)
Merging driver-core.current/driver-core-linus (5f9e832c1370 Linus 5.3-rc1)
Merging tty.current/tty-linus (5f9e832c1370 Linus 5.3-rc1)
Merging usb.current/usb-linus (5f9e832c1370 Linus 5.3-rc1)
Merging usb-gadget-fixes/fixes (42de8afc40c9 usb: dwc2: Use generic PHY width 
in params setup)
Merging usb-serial-fixes/usb-linus (f8377eff5481 USB: serial: ftdi_sio: add ID 
for isodebug v1)
Merging usb-chipidea-fixes/ci-for-usb-stable (16009db47c51 usb: chipidea: udc: 
workaround for endpoint conflict issu

[PATCH] kbuild: remove unused objectify macro

2019-07-22 Thread Masahiro Yamada
Commit 415008af3219 ("docs-rst: convert lsm from DocBook to ReST")
removed the last users of this macro.

Signed-off-by: Masahiro Yamada 
---

 scripts/Kbuild.include | 3 ---
 1 file changed, 3 deletions(-)

diff --git a/scripts/Kbuild.include b/scripts/Kbuild.include
index 12666fc922ea..10ba926ae292 100644
--- a/scripts/Kbuild.include
+++ b/scripts/Kbuild.include
@@ -185,9 +185,6 @@ echo-cmd = $(if $($(quiet)cmd_$(1)),\
 # printing commands
 cmd = @set -e; $(echo-cmd) $(cmd_$(1))
 
-# Add $(obj)/ for paths that are not absolute
-objectify = $(foreach o,$(1),$(if $(filter /%,$(o)),$(o),$(obj)/$(o)))
-
 ###
 # if_changed  - execute command if any prerequisite is newer than
 #   target, or command line has changed
-- 
2.17.1



Re: [PATCH v6 14/14] arm64: dts: Add power controller device node of MT8183

2019-07-22 Thread Weiyi Lu
On Tue, 2019-07-16 at 09:50 +0800, CK Hu wrote:
> Hi, Weiyi:
> 
> On Mon, 2019-07-15 at 17:07 +0800, Weiyi Lu wrote:
> > On Mon, 2019-07-15 at 16:07 +0800, CK Hu wrote:
> > > Hi, Weiyi:
> > > 
> > > On Mon, 2019-07-01 at 16:57 +0800, CK Hu wrote:
> > > > Hi, Weiyi:
> > > > 
> > > > On Thu, 2019-06-20 at 10:38 +0800, Weiyi Lu wrote:
> > > > > Add power controller node and smi-common node for MT8183
> > > > > In scpsys node, it contains clocks and regmapping of
> > > > > infracfg and smi-common for bus protection.
> > > > > 
> > > > > Signed-off-by: Weiyi Lu 
> > > > > ---
> > > > >  arch/arm64/boot/dts/mediatek/mt8183.dtsi | 62 
> > > > > 
> > > > >  1 file changed, 62 insertions(+)
> > > > > 
> > > > > diff --git a/arch/arm64/boot/dts/mediatek/mt8183.dtsi 
> > > > > b/arch/arm64/boot/dts/mediatek/mt8183.dtsi
> > > > > index 08274bf..75c4881 100644
> > > > > --- a/arch/arm64/boot/dts/mediatek/mt8183.dtsi
> > > > > +++ b/arch/arm64/boot/dts/mediatek/mt8183.dtsi
> > > > > @@ -8,6 +8,7 @@
> > > > >  #include 
> > > > >  #include 
> > > > >  #include 
> > > > > +#include 
> > > > >  
> > > > >  / {
> > > > >   compatible = "mediatek,mt8183";
> > > > > @@ -196,6 +197,62 @@
> > > > >   #clock-cells = <1>;
> > > > >   };
> > > > >  
> > > > > + scpsys: syscon@10006000 {
> > > > > + compatible = "mediatek,mt8183-scpsys", "syscon";
> > > > > + #power-domain-cells = <1>;
> > > > > + reg = <0 0x10006000 0 0x1000>;
> > > > > + clocks = < CLK_TOP_MUX_AUD_INTBUS>,
> > > > > +  < CLK_INFRA_AUDIO>,
> > > > > +  < CLK_INFRA_AUDIO_26M_BCLK>,
> > > > > +  < CLK_TOP_MUX_MFG>,
> > > > > +  < CLK_TOP_MUX_MM>,
> > > > > +  < CLK_TOP_MUX_CAM>,
> > > > > +  < CLK_TOP_MUX_IMG>,
> > > > > +  < CLK_TOP_MUX_IPU_IF>,
> > > > > +  < CLK_TOP_MUX_DSP>,
> > > > > +  < CLK_TOP_MUX_DSP1>,
> > > > > +  < CLK_TOP_MUX_DSP2>,
> > > > > +  < CLK_MM_SMI_COMMON>,
> > > > > +  < CLK_MM_SMI_LARB0>,
> > > > > +  < CLK_MM_SMI_LARB1>,
> > > > > +  < CLK_MM_GALS_COMM0>,
> > > > > +  < CLK_MM_GALS_COMM1>,
> > > > > +  < CLK_MM_GALS_CCU2MM>,
> > > > > +  < CLK_MM_GALS_IPU12MM>,
> > > > > +  < CLK_MM_GALS_IMG2MM>,
> > > > > +  < CLK_MM_GALS_CAM2MM>,
> > > > > +  < CLK_MM_GALS_IPU2MM>,
> > > 
> > > I've removed all mmsys clock in scpsys node and display still works, so
> > > I think these subsys clock could be removed from scpsys node. It's
> > > reasonable that subsys clock is controlled by subsys device or the
> > > device use it. In MT2712 [1], the scpsys does not control subsys clock
> > > and it works, so I think you should remove subsys clock in scpsys device
> > > node.
> > > 
> > > [1]
> > > https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/tree/arch/arm64/boot/dts/mediatek/mt2712e.dtsi?h=v5.2
> > > 
> > > Regards,
> > > CK
> > > 
> > 
> > Hello CK,
> > 
> > Sorry, I can't agree with you at all.
> > I thought you just created an environment where the MM (DISP) power
> > domain could not be turned on and off properly.
> > If you delete those mmsys clocks listed, bus protection will not work.
> > These clocks are used for bus protection that I mentioned in patch [2].
> > I guess you are now trying to solve the problem that mmsys blocks are
> > used for probing two drivers. One for the display and another for the
> > clock. Right?
> > In the previous test you mentioned, you have affected the registration
> > of mmsys clock first. This is why you saw the boot failure. I think boot
> > failure is the real problem I should avoid if mmsys clock cannot probe.
> > 
> > [2] https://patchwork.kernel.org/patch/11005747/
> > 
> 
> OK, I'll try another way to fix the probe problem, but I still have a
> question about bus protection. I'm not sure how bus protection works,
> but I think that what mtk_scpsys_ext_clear_bus_protection() does could be
> moved into mtk_smi_clk_enable(). What do you think?
> 
> Regards,
> CK
> 

I think we need to consider the disable case as well.
And SMI may not be the only DISP power domain user. As far as I know, and
as has been requested, bus protection should only be set when the DISP power
domain is going to be turned OFF, and vice versa.
But if SMI will turn ON before all the other multimedia drivers and be
the last one to turn OFF DISP power domain, it might be worth trying.
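
To make the ordering constraint concrete, here is a minimal user-space sketch in C. It is not the mtk-scpsys or mtk-smi driver code: the struct, the function names and the use counter are hypothetical, and the exact order of "power on, then clear protection" is an assumption made for illustration. It only shows the idea that bus protection follows the first and last user of the domain rather than any single driver such as SMI.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical model of one power domain shared by several users. */
struct disp_domain {
	unsigned int users;	/* drivers currently keeping the domain on */
	bool powered;		/* domain power state */
	bool bus_protected;	/* true while bus protection is set */
};

/* A user (SMI, display, ...) asks for the domain to be on. */
static void domain_get(struct disp_domain *d)
{
	if (d->users++ == 0) {
		d->powered = true;		/* first user: power on ... */
		d->bus_protected = false;	/* ... then clear bus protection */
	}
}

/* A user releases the domain. */
static void domain_put(struct disp_domain *d)
{
	assert(d->users > 0);
	if (--d->users == 0) {
		d->bus_protected = true;	/* last user: set protection ... */
		d->powered = false;		/* ... then power off */
	}
}
```

With two users, the protection state changes only when the second one calls domain_put(), which is the "disable case" mentioned above.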

> > > 
> > > > 
> > > > Up to now, MT8183 mmsys 

Re: WARNING in __mmdrop

2019-07-22 Thread Jason Wang



On 2019/7/22 4:08 PM, Michael S. Tsirkin wrote:

On Mon, Jul 22, 2019 at 01:24:24PM +0800, Jason Wang wrote:

On 2019/7/21 8:18 PM, Michael S. Tsirkin wrote:

On Sun, Jul 21, 2019 at 06:02:52AM -0400, Michael S. Tsirkin wrote:

On Sat, Jul 20, 2019 at 03:08:00AM -0700, syzbot wrote:

syzbot has bisected this bug to:

commit 7f466032dc9e5a61217f22ea34b2df932786bbfc
Author: Jason Wang
Date:   Fri May 24 08:12:18 2019 +

  vhost: access vq metadata through kernel virtual address

bisection log:  https://syzkaller.appspot.com/x/bisect.txt?x=149a8a2060
start commit:   6d21a41b Add linux-next specific files for 20190718
git tree:       linux-next
final crash:    https://syzkaller.appspot.com/x/report.txt?x=169a8a2060
console output: https://syzkaller.appspot.com/x/log.txt?x=129a8a2060
kernel config:  https://syzkaller.appspot.com/x/.config?x=3430a151e1452331
dashboard link: https://syzkaller.appspot.com/bug?extid=e58112d71f77113ddb7b
syz repro:      https://syzkaller.appspot.com/x/repro.syz?x=10139e6860

Reported-by: syzbot+e58112d71f77113dd...@syzkaller.appspotmail.com
Fixes: 7f466032dc9e ("vhost: access vq metadata through kernel virtual
address")

For information about bisection process see: https://goo.gl/tpsmEJ#bisection

OK I poked at this for a bit, I see several things that
we need to fix, though I'm not yet sure it's the reason for
the failures:


1. mmu_notifier_register shouldn't be called from vhost_vring_set_num_addr
 That's just a bad hack, in particular I don't think device
 mutex is taken and so poking at two VQs will corrupt
 memory.
 So what to do? How about a per vq notifier?
 Of course we also have synchronize_rcu
 in the notifier which is slow and is now going to be called twice.
 I think call_rcu would be more appropriate here.
 We then need rcu_barrier on module unload.
 OTOH if we make pages linear with map then we are good
 with kfree_rcu which is even nicer.

2. Doesn't map leak after vhost_map_unprefetch?
 And why does it poke at contents of the map?
 No one should use it right?

3. notifier unregister happens last in vhost_dev_cleanup,
 but register happens first. This looks wrong to me.

4. OK so we use the invalidate count to try and detect that
 some invalidate is in progress.
 I am not 100% sure why do we care.
 Assuming we do, uaddr can change between start and end
 and then the counter can get negative, or generally
 out of sync.

So what to do about all this?
I am inclined to say let's just drop the uaddr optimization
for now. E.g. kvm invalidates unconditionally.
3 should be fixed independently.

Above implements this but is only build-tested.
Jason, pls take a look. If you like the approach feel
free to take it from here.

One thing the below does not have is any kind of rate-limiting.
Given it's so easy to restart I'm thinking it makes sense
to add a generic infrastructure for this.
Can be a separate patch I guess.


I don't get why we must use kfree_rcu() instead of synchronize_rcu() here.

synchronize_rcu has very high latency on busy systems.
It is not something that should be used on a syscall path.
KVM had to switch to SRCU to keep it sane.
Otherwise one guest can trivially slow down another one.



I think you mean synchronize_rcu_expedited()? Rethinking the code,
the synchronize_rcu() in ioctl() could be removed, since it is
serialized with the memory accessors.


Btw, the kvm ioctl path still uses synchronize_rcu() in kvm_vcpu_ioctl()
(it is just a little harder to trigger):



    case KVM_RUN: {
...
        if (unlikely(oldpid != task_pid(current))) {
            /* The thread running this VCPU changed. */
            struct pid *newpid;

            r = kvm_arch_vcpu_run_pid_change(vcpu);
            if (r)
                break;

            newpid = get_task_pid(current, PIDTYPE_PID);
            rcu_assign_pointer(vcpu->pid, newpid);
            if (oldpid)
                synchronize_rcu();
            put_pid(oldpid);
        }
...
        break;





Signed-off-by: Michael S. Tsirkin


Let me try to figure out the root cause, then decide whether or not to go
this way.

Thanks

The root cause of the crash is relevant, but we still need
to fix issues 1-4.

More issues (my patch tries to fix them too):

5. page not dirtied when mappings are torn down outside
of invalidate callback



Yes.




6. potential cross-VM DoS by one guest keeping the system busy
and increasing synchronize_rcu latency to the point where
another guest starts timing out and crashes





This will be addressed after I remove the synchronize_rcu() from ioctl path.

Thanks



Re: WARNING in __mmdrop

2019-07-22 Thread Jason Wang



On 2019/7/22 4:02 PM, Michael S. Tsirkin wrote:

On Mon, Jul 22, 2019 at 01:21:59PM +0800, Jason Wang wrote:

On 2019/7/21 6:02 PM, Michael S. Tsirkin wrote:

On Sat, Jul 20, 2019 at 03:08:00AM -0700, syzbot wrote:

syzbot has bisected this bug to:

commit 7f466032dc9e5a61217f22ea34b2df932786bbfc
Author: Jason Wang 
Date:   Fri May 24 08:12:18 2019 +

  vhost: access vq metadata through kernel virtual address

bisection log:  https://syzkaller.appspot.com/x/bisect.txt?x=149a8a2060
start commit:   6d21a41b Add linux-next specific files for 20190718
git tree:       linux-next
final crash:    https://syzkaller.appspot.com/x/report.txt?x=169a8a2060
console output: https://syzkaller.appspot.com/x/log.txt?x=129a8a2060
kernel config:  https://syzkaller.appspot.com/x/.config?x=3430a151e1452331
dashboard link: https://syzkaller.appspot.com/bug?extid=e58112d71f77113ddb7b
syz repro:  https://syzkaller.appspot.com/x/repro.syz?x=10139e6860

Reported-by: syzbot+e58112d71f77113dd...@syzkaller.appspotmail.com
Fixes: 7f466032dc9e ("vhost: access vq metadata through kernel virtual
address")

For information about bisection process see: https://goo.gl/tpsmEJ#bisection

OK I poked at this for a bit, I see several things that
we need to fix, though I'm not yet sure it's the reason for
the failures:


1. mmu_notifier_register shouldn't be called from vhost_vring_set_num_addr
 That's just a bad hack,


This is used to avoid holding a lock when checking whether the addresses
overlap. Otherwise we need to take a spinlock for each invalidation request
even if it covers a va range that is of no interest to us. This will be very
slow, e.g. during guest boot.

KVM seems to do exactly that.
I tried and guest does not seem to boot any slower.
Do you observe any slowdown?



Yes I do.




Now that I took a hard look at the uaddr hackery, it really makes
me nervous. So I think for this release we want something
safe, with optimizations on top. As an alternative, revert the
optimization and try again for the next merge window.



Will post a series of fixes, let me know if you're ok with that.

Thanks







Re: [PATCH] ktest: Fix some typos in config-bisect.pl

2019-07-22 Thread Randy Dunlap
On 7/22/19 8:24 PM, Masanari Iida wrote:
> This patch fixes some spelling typos in config-bisect.pl
> 
> Signed-off-by: Masanari Iida 

Acked-by: Randy Dunlap 

Thanks.

> ---
>  tools/testing/ktest/config-bisect.pl | 4 ++--
>  1 file changed, 2 insertions(+), 2 deletions(-)
> 
> diff --git a/tools/testing/ktest/config-bisect.pl 
> b/tools/testing/ktest/config-bisect.pl
> index 72525426654b..6fd864935319 100755
> --- a/tools/testing/ktest/config-bisect.pl
> +++ b/tools/testing/ktest/config-bisect.pl
> @@ -663,7 +663,7 @@ while ($#ARGV >= 0) {
>  }
>  
>  else {
> - die "Unknow option $opt\n";
> + die "Unknown option $opt\n";
>  }
>  }
>  
> @@ -732,7 +732,7 @@ if ($start) {
>   }
>  }
>  run_command "cp $good_start $good" or die "failed to copy to $good\n";
> -run_command "cp $bad_start $bad" or die "faield to copy to $bad\n";
> +run_command "cp $bad_start $bad" or die "failed to copy to $bad\n";
>  } else {
>  if ( ! -f $good ) {
>   die "Can not find file $good\n";
> 


-- 
~Randy


Re: [PATCH V6 16/21] soc/tegra: pmc: Add pmc wake support for tegra210

2019-07-22 Thread Dmitry Osipenko
23.07.2019 6:31, Sowjanya Komatineni wrote:
> 
> On 7/22/19 8:25 PM, Dmitry Osipenko wrote:
>> 23.07.2019 6:09, Sowjanya Komatineni wrote:
>>> On 7/22/19 8:03 PM, Dmitry Osipenko wrote:
 23.07.2019 4:52, Sowjanya Komatineni wrote:
> On 7/22/19 6:41 PM, Dmitry Osipenko wrote:
>> 23.07.2019 4:08, Dmitry Osipenko wrote:
>>> 23.07.2019 3:58, Dmitry Osipenko wrote:
 21.07.2019 22:40, Sowjanya Komatineni wrote:
> This patch implements PMC wakeup sequence for Tegra210 and defines
> common used RTC alarm wake event.
>
> Signed-off-by: Sowjanya Komatineni 
> ---
>    drivers/soc/tegra/pmc.c | 111
> 
>    1 file changed, 111 insertions(+)
>
> diff --git a/drivers/soc/tegra/pmc.c b/drivers/soc/tegra/pmc.c
> index 91c84d0e66ae..c556f38874e1 100644
> --- a/drivers/soc/tegra/pmc.c
> +++ b/drivers/soc/tegra/pmc.c
> @@ -57,6 +57,12 @@
>    #define  PMC_CNTRL_SYSCLK_OE    BIT(11) /* system clock
> enable */
>    #define  PMC_CNTRL_SYSCLK_POLARITY    BIT(10) /* sys clk
> polarity */
>    #define  PMC_CNTRL_MAIN_RST    BIT(4)
> +#define  PMC_CNTRL_LATCH_WAKEUPS    BIT(5)
>>> Please follow the TRM's bits naming.
>>>
>>> PMC_CNTRL_LATCHWAKE_EN
>>>
> +#define PMC_WAKE_MASK    0x0c
> +#define PMC_WAKE_LEVEL    0x10
> +#define PMC_WAKE_STATUS    0x14
> +#define PMC_SW_WAKE_STATUS    0x18
>      #define DPD_SAMPLE    0x020
>    #define  DPD_SAMPLE_ENABLE    BIT(0)
> @@ -87,6 +93,11 @@
>      #define PMC_SCRATCH41    0x140
>    +#define PMC_WAKE2_MASK    0x160
> +#define PMC_WAKE2_LEVEL    0x164
> +#define PMC_WAKE2_STATUS    0x168
> +#define PMC_SW_WAKE2_STATUS    0x16c
> +
>    #define PMC_SENSOR_CTRL    0x1b0
>    #define  PMC_SENSOR_CTRL_SCRATCH_WRITE    BIT(2)
>    #define  PMC_SENSOR_CTRL_ENABLE_RST    BIT(1)
> @@ -1922,6 +1933,55 @@ static const struct irq_domain_ops
> tegra_pmc_irq_domain_ops = {
>    .alloc = tegra_pmc_irq_alloc,
>    };
>    +static int tegra210_pmc_irq_set_wake(struct irq_data *data,
> unsigned int on)
> +{
> +    struct tegra_pmc *pmc = irq_data_get_irq_chip_data(data);
> +    unsigned int offset, bit;
> +    u32 value;
> +
> +    if (data->hwirq == ULONG_MAX)
> +    return 0;
> +
> +    offset = data->hwirq / 32;
> +    bit = data->hwirq % 32;
> +
> +    /*
> + * Latch wakeups to SW_WAKE_STATUS register to capture events
> + * that would not make it into wakeup event register during
> LP0 exit.
> + */
> +    value = tegra_pmc_readl(pmc, PMC_CNTRL);
> +    value |= PMC_CNTRL_LATCH_WAKEUPS;
> +    tegra_pmc_writel(pmc, value, PMC_CNTRL);
> +    udelay(120);
 Why it takes so much time to latch the values? Shouldn't some
 status-bit
 be polled for the completion of latching?

 Is this register-write really getting buffered in the PMC?

> +    value &= ~PMC_CNTRL_LATCH_WAKEUPS;
> +    tegra_pmc_writel(pmc, value, PMC_CNTRL);
> +    udelay(120);
 120 usecs to remove latching, really?

> +    tegra_pmc_writel(pmc, 0, PMC_SW_WAKE_STATUS);
> +    tegra_pmc_writel(pmc, 0, PMC_SW_WAKE2_STATUS);
> +
> +    tegra_pmc_writel(pmc, 0, PMC_WAKE_STATUS);
> +    tegra_pmc_writel(pmc, 0, PMC_WAKE2_STATUS);
> +
> +    /* enable PMC wake */
> +    if (data->hwirq >= 32)
> +    offset = PMC_WAKE2_MASK;
> +    else
> +    offset = PMC_WAKE_MASK;
> +
> +    value = tegra_pmc_readl(pmc, offset);
> +
> +    if (on)
> +    value |= 1 << bit;
> +    else
> +    value &= ~(1 << bit);
> +
> +    tegra_pmc_writel(pmc, value, offset);
 Why the latching is done *before* writing into the WAKE registers?
 What
 it is latching then?
>>> I'm looking at the TRM doc and it says that latching should be done
>>> *after* writing to the WAKE_MASK / LEVEL registers.
>>>
>>> Secondly it says that it's enough to do:
>>>
>>> value = tegra_pmc_readl(pmc, PMC_CNTRL);
>>> value |= PMC_CNTRL_LATCH_WAKEUPS;
>>> tegra_pmc_writel(pmc, value, PMC_CNTRL);
>>>
>>> in order to latch. There is no need for the delay and to remove the
>>> "LATCHWAKE_EN" bit, it should be a oneshot action.
>> Although, no. TRM says "stops latching on transition from 1

Re: [PATCH v4] mmc: host: sdhci-sprd: Fix the incorrect soft reset operation when runtime resuming

2019-07-22 Thread Baolin Wang
On Tue, 23 Jul 2019 at 11:21, Chunyan Zhang  wrote:
>
> On Tue, 23 Jul 2019 at 11:05, Baolin Wang  wrote:
> >
> > Hi Ulf,
> >
> > On Mon, 22 Jul 2019 at 19:54, Ulf Hansson  wrote:
> > >
> > > On Wed, 17 Jul 2019 at 04:29, Baolin Wang  wrote:
> > > >
> > > > In sdhci_runtime_resume_host() function, we will always do software 
> > > > reset
> > > > for all, which will cause Spreadtrum host controller work abnormally 
> > > > after
> > > > resuming.
> > >
> > > What does "software reset for all" mean?
> >
> > The SD host controller specification defines 3 types software reset:
> > software reset for data line, software reset for command line and
> > software reset for all.
> > Software reset for all means this reset affects the entire Host
> > controller except for the card detection circuit.
> >
> > >
> > > >
> > > > Thus for Spreadtrum platform that will not power down the SD/eMMC card 
> > > > during
> > > > runtime suspend, we should not do software reset for all.
> > >
> > > Normally, sdhci hosts that enters runtime suspend doesn't power off
> > > the card (there are some exceptions like PCI variants).
> >
> > Yes, same as our controller.
> >
> > >
> > > So, what's so special here and how does the reset come into play? I
> > > don't see sdhci doing a reset in sdhci_runtime_suspend|resume_host()
> > > and nor does the callback from the sdhci-sprd.c variant do it.
> >
> > In sdhci_runtime_resume_host(), it will issue sdhci_init(host, 0) to
> > issue software reset for all.
> >
> > >
> > > > To fix this
> > > > issue, adding a specific reset operation that adds one condition to 
> > > > validate
> > > > the power mode to decide if we can do software reset for all or just 
> > > > reset
> > > > command and data lines.
> > > >
> > > > Signed-off-by: Baolin Wang 
> > > > ---
> > > > Changes from v3:
> > > >  - Use ios.power_mode to validate if the card is power down or not.
> > > >
> > > > Changes from v2:
> > > >  - Simplify the sdhci_sprd_reset() by issuing sdhci_reset().
> > > >
> > > > Changes from v1:
> > > >  - Add a specific reset operation instead of changing the core to avoid
> > > >  affecting other hardware.
> > > > ---
> > > >  drivers/mmc/host/sdhci-sprd.c |   19 ++-
> > > >  1 file changed, 18 insertions(+), 1 deletion(-)
> > > >
> > > > diff --git a/drivers/mmc/host/sdhci-sprd.c 
> > > > b/drivers/mmc/host/sdhci-sprd.c
> > > > index 603a5d9..94f9726 100644
> > > > --- a/drivers/mmc/host/sdhci-sprd.c
> > > > +++ b/drivers/mmc/host/sdhci-sprd.c
> > > > @@ -373,6 +373,23 @@ static unsigned int 
> > > > sdhci_sprd_get_max_timeout_count(struct sdhci_host *host)
> > > > return 1 << 31;
> > > >  }
> > > >
> > > > +static void sdhci_sprd_reset(struct sdhci_host *host, u8 mask)
> > > > +{
> > > > +   struct mmc_host *mmc = host->mmc;
> > > > +
> > > > +   /*
> > > > +* When try to reset controller after runtime suspend, we 
> > > > should not
> > > > +* reset for all if the SD/eMMC card is not power down, just 
> > > > reset
> > > > +* command and data lines instead. Otherwise will meet some 
> > > > strange
> > > > +* behaviors for Spreadtrum host controller.
> > > > +*/
> > > > +   if (host->runtime_suspended && (mask & SDHCI_RESET_ALL) &&
> > > > +   mmc->ios.power_mode == MMC_POWER_ON)
> > > > +   mask = SDHCI_RESET_CMD | SDHCI_RESET_DATA;
> > >
> > > Can sdhci_sprd_reset() be called when the host is runtime suspended?
> >
> > When host tries to runtime resume in sdhci_runtime_resume_host(), it
> > will call reset operation to do software reset.
> >
> > > That sounds like a bug to me, no?
> >
> > Since our controller will meet some strange behaviors if we do
> > software reset for all in sdhci_runtime_resume_host(), and try to
> > avoid changing the core logic of sdhci_runtime_resume_host() used by
> > other hardware controllers, thus I introduced a specific reset ops and
> > added some condition to make sure we just do software reset command
> > and data lines from runtime suspend state.
>
> I can make a verification on sprd's SC9863A, but that would take a
> little time, since I need to make sd card registered with sdhci-sprd.c
> first :)

Great, you can try it on your board. Thanks.

-- 
Baolin Wang
Best Regards
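
The condition discussed in this thread can be tried outside the kernel with a small stand-alone C sketch. It assumes the three reset bits have the values 0x01, 0x02 and 0x04 and uses a local enum in place of mmc->ios.power_mode; the function name and its flat parameter list are hypothetical and are not part of sdhci-sprd.c. It only computes which mask would end up being used; it does not touch any hardware.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Reset bits as named in the patch; the numeric values are assumed here. */
#define SDHCI_RESET_ALL		0x01
#define SDHCI_RESET_CMD		0x02
#define SDHCI_RESET_DATA	0x04

/* Local stand-in for the card power state kept in mmc->ios.power_mode. */
enum power_mode { POWER_OFF, POWER_UP, POWER_ON };

/*
 * Return the reset mask that would actually be applied: a "reset for all"
 * requested while resuming from runtime suspend with the card still
 * powered is narrowed to a command-line plus data-line reset.
 */
static uint8_t sprd_effective_reset_mask(bool runtime_suspended,
					 enum power_mode mode, uint8_t mask)
{
	/* Only the three defined reset bits are expected. */
	assert((mask & ~(SDHCI_RESET_ALL | SDHCI_RESET_CMD |
			 SDHCI_RESET_DATA)) == 0);

	if (runtime_suspended && (mask & SDHCI_RESET_ALL) &&
	    mode == POWER_ON)
		return SDHCI_RESET_CMD | SDHCI_RESET_DATA;

	return mask;
}
```

A full reset still goes through unchanged when the host is not runtime suspended or when the card has been powered off, so only the runtime-resume path is affected.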


Re: [PATCH V6 16/21] soc/tegra: pmc: Add pmc wake support for tegra210

2019-07-22 Thread Sowjanya Komatineni



On 7/22/19 8:25 PM, Dmitry Osipenko wrote:

23.07.2019 6:09, Sowjanya Komatineni wrote:

On 7/22/19 8:03 PM, Dmitry Osipenko wrote:

23.07.2019 4:52, Sowjanya Komatineni wrote:

On 7/22/19 6:41 PM, Dmitry Osipenko wrote:

23.07.2019 4:08, Dmitry Osipenko wrote:

23.07.2019 3:58, Dmitry Osipenko wrote:

21.07.2019 22:40, Sowjanya Komatineni wrote:

This patch implements PMC wakeup sequence for Tegra210 and defines
common used RTC alarm wake event.

Signed-off-by: Sowjanya Komatineni 
---
   drivers/soc/tegra/pmc.c | 111

   1 file changed, 111 insertions(+)

diff --git a/drivers/soc/tegra/pmc.c b/drivers/soc/tegra/pmc.c
index 91c84d0e66ae..c556f38874e1 100644
--- a/drivers/soc/tegra/pmc.c
+++ b/drivers/soc/tegra/pmc.c
@@ -57,6 +57,12 @@
   #define  PMC_CNTRL_SYSCLK_OE    BIT(11) /* system clock
enable */
   #define  PMC_CNTRL_SYSCLK_POLARITY    BIT(10) /* sys clk
polarity */
   #define  PMC_CNTRL_MAIN_RST    BIT(4)
+#define  PMC_CNTRL_LATCH_WAKEUPS    BIT(5)

Please follow the TRM's bits naming.

PMC_CNTRL_LATCHWAKE_EN


+#define PMC_WAKE_MASK    0x0c
+#define PMC_WAKE_LEVEL    0x10
+#define PMC_WAKE_STATUS    0x14
+#define PMC_SW_WAKE_STATUS    0x18
     #define DPD_SAMPLE    0x020
   #define  DPD_SAMPLE_ENABLE    BIT(0)
@@ -87,6 +93,11 @@
     #define PMC_SCRATCH41    0x140
   +#define PMC_WAKE2_MASK    0x160
+#define PMC_WAKE2_LEVEL    0x164
+#define PMC_WAKE2_STATUS    0x168
+#define PMC_SW_WAKE2_STATUS    0x16c
+
   #define PMC_SENSOR_CTRL    0x1b0
   #define  PMC_SENSOR_CTRL_SCRATCH_WRITE    BIT(2)
   #define  PMC_SENSOR_CTRL_ENABLE_RST    BIT(1)
@@ -1922,6 +1933,55 @@ static const struct irq_domain_ops
tegra_pmc_irq_domain_ops = {
   .alloc = tegra_pmc_irq_alloc,
   };
   +static int tegra210_pmc_irq_set_wake(struct irq_data *data,
unsigned int on)
+{
+    struct tegra_pmc *pmc = irq_data_get_irq_chip_data(data);
+    unsigned int offset, bit;
+    u32 value;
+
+    if (data->hwirq == ULONG_MAX)
+    return 0;
+
+    offset = data->hwirq / 32;
+    bit = data->hwirq % 32;
+
+    /*
+ * Latch wakeups to SW_WAKE_STATUS register to capture events
+ * that would not make it into wakeup event register during
LP0 exit.
+ */
+    value = tegra_pmc_readl(pmc, PMC_CNTRL);
+    value |= PMC_CNTRL_LATCH_WAKEUPS;
+    tegra_pmc_writel(pmc, value, PMC_CNTRL);
+    udelay(120);

Why it takes so much time to latch the values? Shouldn't some
status-bit
be polled for the completion of latching?

Is this register-write really getting buffered in the PMC?


+    value &= ~PMC_CNTRL_LATCH_WAKEUPS;
+    tegra_pmc_writel(pmc, value, PMC_CNTRL);
+    udelay(120);

120 usecs to remove latching, really?


+    tegra_pmc_writel(pmc, 0, PMC_SW_WAKE_STATUS);
+    tegra_pmc_writel(pmc, 0, PMC_SW_WAKE2_STATUS);
+
+    tegra_pmc_writel(pmc, 0, PMC_WAKE_STATUS);
+    tegra_pmc_writel(pmc, 0, PMC_WAKE2_STATUS);
+
+    /* enable PMC wake */
+    if (data->hwirq >= 32)
+    offset = PMC_WAKE2_MASK;
+    else
+    offset = PMC_WAKE_MASK;
+
+    value = tegra_pmc_readl(pmc, offset);
+
+    if (on)
+    value |= 1 << bit;
+    else
+    value &= ~(1 << bit);
+
+    tegra_pmc_writel(pmc, value, offset);

Why the latching is done *before* writing into the WAKE registers?
What
it is latching then?

I'm looking at the TRM doc and it says that latching should be done
*after* writing to the WAKE_MASK / LEVEL registers.

Secondly it says that it's enough to do:

value = tegra_pmc_readl(pmc, PMC_CNTRL);
value |= PMC_CNTRL_LATCH_WAKEUPS;
tegra_pmc_writel(pmc, value, PMC_CNTRL);

in order to latch. There is no need for the delay and to remove the
"LATCHWAKE_EN" bit, it should be a oneshot action.

Although, no. TRM says "stops latching on transition from 1
to 0 (sequence - set to 1,set to 0)", so it's not a oneshot action.

Have you tested this code at all? I'm wondering how it happens to work
without a proper latching.

Yes, of course it's tested, and this sequence for the transition is the
recommendation from the Tegra designer.
Will check whether the TRM was not updated properly, or will re-confirm
internally on the delay time...

On any of the wake event PMC wakeup happens and WAKE_STATUS register
will have bits set for all events that triggered wake.
After wakeup PMC doesn't update SW_WAKE_STATUS register as per PMC
design.
The SW latch register added in the design provides a way to capture
those events that happen right during wakeup time and didn't make it to
the SW_WAKE_STATUS register.
So before next suspend entry, latching all prior wake events into SW
WAKE_STATUS and then clearing them.
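
For readers following the sequence, below is a user-space sketch in C that replays it against a simulated register array. The wake register offsets and the latch bit come from the quoted patch; the PMC_CNTRL offset of 0x0, the sim_* helpers and the array are assumptions of this sketch. The udelay() calls are left out, since whether and how long to wait is exactly what is being questioned here, and an unsigned constant is shifted so that bit 31 is well defined.

```c
#include <assert.h>
#include <stdint.h>

/* Offsets and latch bit from the quoted patch; PMC_CNTRL at 0x0 is assumed. */
#define PMC_CNTRL			0x000
#define PMC_CNTRL_LATCH_WAKEUPS		(UINT32_C(1) << 5)
#define PMC_WAKE_MASK			0x00c
#define PMC_WAKE_STATUS			0x014
#define PMC_SW_WAKE_STATUS		0x018
#define PMC_WAKE2_MASK			0x160
#define PMC_WAKE2_STATUS		0x168
#define PMC_SW_WAKE2_STATUS		0x16c

/* Simulated register file standing in for the memory-mapped PMC. */
static uint32_t sim_regs[0x200 / 4];

static uint32_t sim_readl(unsigned int offset)
{
	return sim_regs[offset / 4];
}

static void sim_writel(uint32_t value, unsigned int offset)
{
	sim_regs[offset / 4] = value;
}

static void sim_set_wake(unsigned long hwirq, int on)
{
	unsigned int mask_reg, bit;
	uint32_t value;

	assert(hwirq < 64);	/* two 32-bit wake mask registers */
	bit = hwirq % 32;

	/* Pulse the latch bit: set to 1, then back to 0. */
	value = sim_readl(PMC_CNTRL);
	sim_writel(value | PMC_CNTRL_LATCH_WAKEUPS, PMC_CNTRL);
	sim_writel(value & ~PMC_CNTRL_LATCH_WAKEUPS, PMC_CNTRL);

	/* Clear the status left over from the previous cycle. */
	sim_writel(0, PMC_SW_WAKE_STATUS);
	sim_writel(0, PMC_SW_WAKE2_STATUS);
	sim_writel(0, PMC_WAKE_STATUS);
	sim_writel(0, PMC_WAKE2_STATUS);

	/* Events 0..31 live in WAKE_MASK, 32..63 in WAKE2_MASK. */
	mask_reg = hwirq >= 32 ? PMC_WAKE2_MASK : PMC_WAKE_MASK;
	value = sim_readl(mask_reg);
	if (on)
		value |= UINT32_C(1) << bit;
	else
		value &= ~(UINT32_C(1) << bit);
	sim_writel(value, mask_reg);
}
```

The sketch keeps the order used in the patch (latch pulse first, mask update last); moving the pulse after the mask write, as the TRM reading above suggests, only means moving the three PMC_CNTRL lines to the end of the function.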

I'm now wondering whether the latching could be turned ON permanently
during the PMC's probe, for simplicity.

Latching should be done on every suspend-resume cycle, as wake events get
generated on every suspend-resume cycle.

You're saying that PMC "doesn't update SW_WAKE_STATUS" after wake-up,
then I don't 

[PATCH] sys_prctl(): simplify arg2 judgment when calling PR_SET_TIMERSLACK

2019-07-22 Thread Yang Xu
arg2 can never be < 0, because its type is 'unsigned long', so the
negative check is meaningless.

Signed-off-by: Yang Xu 
---
 kernel/sys.c | 6 +++---
 1 file changed, 3 insertions(+), 3 deletions(-)

diff --git a/kernel/sys.c b/kernel/sys.c
index 2969304c29fe..399457d26bef 100644
--- a/kernel/sys.c
+++ b/kernel/sys.c
@@ -2372,11 +2372,11 @@ SYSCALL_DEFINE5(prctl, int, option, unsigned long, 
arg2, unsigned long, arg3,
error = current->timer_slack_ns;
break;
case PR_SET_TIMERSLACK:
-   if (arg2 <= 0)
+   if (arg2)
+   current->timer_slack_ns = arg2;
+   else
current->timer_slack_ns =
current->default_timer_slack_ns;
-   else
-   current->timer_slack_ns = arg2;
break;
case PR_MCE_KILL:
if (arg4 | arg5)
-- 
2.18.1
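
The equivalence claimed by this patch can be checked in user space with a short C sketch. The struct below is a hypothetical stand-in for the two task_struct fields the hunk touches, and the two functions simply mirror the old and new branches; nothing here is the kernel's actual prctl() code.

```c
#include <assert.h>
#include <limits.h>

/* Hypothetical stand-in for the two task_struct fields used by the hunk. */
struct task_slack {
	unsigned long timer_slack_ns;
	unsigned long default_timer_slack_ns;
};

/* Old form: for an unsigned type, "<= 0" can only ever mean "== 0". */
static void set_timerslack_old(struct task_slack *t, unsigned long arg2)
{
	if (arg2 <= 0)
		t->timer_slack_ns = t->default_timer_slack_ns;
	else
		t->timer_slack_ns = arg2;
}

/* New form from the patch: same behaviour, without the misleading test. */
static void set_timerslack_new(struct task_slack *t, unsigned long arg2)
{
	if (arg2)
		t->timer_slack_ns = arg2;
	else
		t->timer_slack_ns = t->default_timer_slack_ns;
}
```

A value such as -1 passed from user space arrives as ULONG_MAX, so it never takes the "reset to default" branch in either version.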





Re: [PATCH v5 2/2] dt-bindings: mtd: Document Macronix raw NAND controller bindings

2019-07-22 Thread masonccyang


Hi Rob,


> 
> Re: [PATCH v5 2/2] dt-bindings: mtd: Document Macronix raw NAND controller bindings
> 
> On Wed, Jul 03, 2019 at 03:15:44PM +0800, Mason Yang wrote:
> > Document the bindings used by the Macronix raw NAND controller.
> > 
> > Signed-off-by: Mason Yang 
> > ---
> >  Documentation/devicetree/bindings/mtd/mxic-nand.txt | 20 
> >  1 file changed, 20 insertions(+)
> >  create mode 100644 Documentation/devicetree/bindings/mtd/mxic-nand.txt
> > 
> > diff --git a/Documentation/devicetree/bindings/mtd/mxic-nand.txt b/Documentation/devicetree/bindings/mtd/mxic-nand.txt
> > new file mode 100644
> > index 000..ddd7660
> > --- /dev/null
> > +++ b/Documentation/devicetree/bindings/mtd/mxic-nand.txt
> > @@ -0,0 +1,20 @@
> > +Macronix Raw NAND Controller Device Tree Bindings
> > +-
> > +
> > +Required properties:
> > +- compatible: should be "macronix,nand-controller"
> 
> That's not very specific. There's only 1 version of this h/w?

okay, will give it an appropriate name.

> 
> > +- reg: should contain 1 entrie for the registers
> 
> s/entrie/entry/

will fix it.

> 
> > +- interrupts: interrupt line connected to this raw NAND controller
> > +- clock-names: should contain "ps_clk", "send_clk" and "send_dly_clk"
> > +- clocks: should contain 3 phandles for the "ps_clk", "send_clk" and
> > +"send_dly_clk" clocks
> 
> You can drop '_clk' as that is redundant.

okay, got it.

> 
> > +
> > +Example:
> > +
> > +   nand: mxic-nfc@43c3 {
> > +  compatible = "macronix,nand-controller";
> > +  reg = <0x43c3 0x1>;
> > +  reg-names = "regs";
> 
> Not documented. You can drop as *-names is not generally useful when 
> there is only 1 entry.

okay, will fix it.

> 
> > +  clocks = < 0>, < 1>, < 15>;
> > +  clock-names = "send_clk", "send_dly_clk", "ps_clk";
> > +   };
> > -- 
> > 1.9.1
> > 

thanks for your time & review.
best regards,
Mason

CONFIDENTIALITY NOTE:

This e-mail and any attachments may contain confidential information 
and/or personal data, which is protected by applicable laws. Please be 
reminded that duplication, disclosure, distribution, or use of this e-mail 
(and/or its attachments) or any part thereof is prohibited. If you receive 
this e-mail in error, please notify us immediately and delete this mail as 
well as its attachment(s) from your system. In addition, please be 
informed that collection, processing, and/or use of personal data is 
prohibited unless expressly permitted by personal data protection laws. 
Thank you for your attention and cooperation.

Macronix International Co., Ltd.

=








Re: [PATCH V6 16/21] soc/tegra: pmc: Add pmc wake support for tegra210

2019-07-22 Thread Dmitry Osipenko
23.07.2019 6:09, Sowjanya Komatineni wrote:
> 
> On 7/22/19 8:03 PM, Dmitry Osipenko wrote:
>> 23.07.2019 4:52, Sowjanya Komatineni wrote:
>>> On 7/22/19 6:41 PM, Dmitry Osipenko wrote:
 23.07.2019 4:08, Dmitry Osipenko wrote:
> 23.07.2019 3:58, Dmitry Osipenko wrote:
>> 21.07.2019 22:40, Sowjanya Komatineni wrote:
>>> This patch implements PMC wakeup sequence for Tegra210 and defines
>>> common used RTC alarm wake event.
>>>
>>> Signed-off-by: Sowjanya Komatineni 
>>> ---
>>>   drivers/soc/tegra/pmc.c | 111
>>> 
>>>   1 file changed, 111 insertions(+)
>>>
>>> diff --git a/drivers/soc/tegra/pmc.c b/drivers/soc/tegra/pmc.c
>>> index 91c84d0e66ae..c556f38874e1 100644
>>> --- a/drivers/soc/tegra/pmc.c
>>> +++ b/drivers/soc/tegra/pmc.c
>>> @@ -57,6 +57,12 @@
>>>   #define  PMC_CNTRL_SYSCLK_OE    BIT(11) /* system clock
>>> enable */
>>>   #define  PMC_CNTRL_SYSCLK_POLARITY    BIT(10) /* sys clk
>>> polarity */
>>>   #define  PMC_CNTRL_MAIN_RST    BIT(4)
>>> +#define  PMC_CNTRL_LATCH_WAKEUPS    BIT(5)
> Please follow the TRM's bits naming.
>
> PMC_CNTRL_LATCHWAKE_EN
>
>>> +#define PMC_WAKE_MASK    0x0c
>>> +#define PMC_WAKE_LEVEL    0x10
>>> +#define PMC_WAKE_STATUS    0x14
>>> +#define PMC_SW_WAKE_STATUS    0x18
>>>     #define DPD_SAMPLE    0x020
>>>   #define  DPD_SAMPLE_ENABLE    BIT(0)
>>> @@ -87,6 +93,11 @@
>>>     #define PMC_SCRATCH41    0x140
>>>   +#define PMC_WAKE2_MASK    0x160
>>> +#define PMC_WAKE2_LEVEL    0x164
>>> +#define PMC_WAKE2_STATUS    0x168
>>> +#define PMC_SW_WAKE2_STATUS    0x16c
>>> +
>>>   #define PMC_SENSOR_CTRL    0x1b0
>>>   #define  PMC_SENSOR_CTRL_SCRATCH_WRITE    BIT(2)
>>>   #define  PMC_SENSOR_CTRL_ENABLE_RST    BIT(1)
>>> @@ -1922,6 +1933,55 @@ static const struct irq_domain_ops
>>> tegra_pmc_irq_domain_ops = {
>>>   .alloc = tegra_pmc_irq_alloc,
>>>   };
>>>   +static int tegra210_pmc_irq_set_wake(struct irq_data *data,
>>> unsigned int on)
>>> +{
>>> +    struct tegra_pmc *pmc = irq_data_get_irq_chip_data(data);
>>> +    unsigned int offset, bit;
>>> +    u32 value;
>>> +
>>> +    if (data->hwirq == ULONG_MAX)
>>> +    return 0;
>>> +
>>> +    offset = data->hwirq / 32;
>>> +    bit = data->hwirq % 32;
>>> +
>>> +    /*
>>> + * Latch wakeups to SW_WAKE_STATUS register to capture events
>>> + * that would not make it into wakeup event register during
>>> LP0 exit.
>>> + */
>>> +    value = tegra_pmc_readl(pmc, PMC_CNTRL);
>>> +    value |= PMC_CNTRL_LATCH_WAKEUPS;
>>> +    tegra_pmc_writel(pmc, value, PMC_CNTRL);
>>> +    udelay(120);
>> Why it takes so much time to latch the values? Shouldn't some
>> status-bit
>> be polled for the completion of latching?
>>
>> Is this register-write really getting buffered in the PMC?
>>
>>> +    value &= ~PMC_CNTRL_LATCH_WAKEUPS;
>>> +    tegra_pmc_writel(pmc, value, PMC_CNTRL);
>>> +    udelay(120);
>> 120 usecs to remove latching, really?
>>
>>> +    tegra_pmc_writel(pmc, 0, PMC_SW_WAKE_STATUS);
>>> +    tegra_pmc_writel(pmc, 0, PMC_SW_WAKE2_STATUS);
>>> +
>>> +    tegra_pmc_writel(pmc, 0, PMC_WAKE_STATUS);
>>> +    tegra_pmc_writel(pmc, 0, PMC_WAKE2_STATUS);
>>> +
>>> +    /* enable PMC wake */
>>> +    if (data->hwirq >= 32)
>>> +    offset = PMC_WAKE2_MASK;
>>> +    else
>>> +    offset = PMC_WAKE_MASK;
>>> +
>>> +    value = tegra_pmc_readl(pmc, offset);
>>> +
>>> +    if (on)
>>> +    value |= 1 << bit;
>>> +    else
>>> +    value &= ~(1 << bit);
>>> +
>>> +    tegra_pmc_writel(pmc, value, offset);
>> Why the latching is done *before* writing into the WAKE registers?
>> What
>> it is latching then?
> I'm looking at the TRM doc and it says that latching should be done
> *after* writing to the WAKE_MASK / LEVEL registers.
>
> Secondly it says that it's enough to do:
>
> value = tegra_pmc_readl(pmc, PMC_CNTRL);
> value |= PMC_CNTRL_LATCH_WAKEUPS;
> tegra_pmc_writel(pmc, value, PMC_CNTRL);
>
> in order to latch. There is no need for the delay and to remove the
> "LATCHWAKE_EN" bit, it should be a oneshot action.
 Although, no. TRM says "stops latching on transition from 1
 to 0 (sequence - set to 1,set to 0)", so it's not a oneshot action.

 Have you tested this code at all? I'm wondering how it happens to work
 without a proper latching.
>>> Yes, of course it's tested, and this sequence for the transition is the
>>> recommendation from the Tegra designer.
>>> Will check if TRM doesn't have update properly or 

[PATCH] ktest: Fix some typos in config-bisect.pl

2019-07-22 Thread Masanari Iida
This patch fixes some spelling typos in config-bisect.pl

Signed-off-by: Masanari Iida 
---
 tools/testing/ktest/config-bisect.pl | 4 ++--
 1 file changed, 2 insertions(+), 2 deletions(-)

diff --git a/tools/testing/ktest/config-bisect.pl 
b/tools/testing/ktest/config-bisect.pl
index 72525426654b..6fd864935319 100755
--- a/tools/testing/ktest/config-bisect.pl
+++ b/tools/testing/ktest/config-bisect.pl
@@ -663,7 +663,7 @@ while ($#ARGV >= 0) {
 }
 
 else {
-   die "Unknow option $opt\n";
+   die "Unknown option $opt\n";
 }
 }
 
@@ -732,7 +732,7 @@ if ($start) {
}
 }
 run_command "cp $good_start $good" or die "failed to copy to $good\n";
-run_command "cp $bad_start $bad" or die "faield to copy to $bad\n";
+run_command "cp $bad_start $bad" or die "failed to copy to $bad\n";
 } else {
 if ( ! -f $good ) {
die "Can not find file $good\n";
-- 
2.22.0.545.g9c9b961d7eb1



Re: [PATCH v4] mmc: host: sdhci-sprd: Fix the incorrect soft reset operation when runtime resuming

2019-07-22 Thread Chunyan Zhang
On Tue, 23 Jul 2019 at 11:05, Baolin Wang  wrote:
>
> Hi Ulf,
>
> On Mon, 22 Jul 2019 at 19:54, Ulf Hansson  wrote:
> >
> > On Wed, 17 Jul 2019 at 04:29, Baolin Wang  wrote:
> > >
> > > In sdhci_runtime_resume_host() function, we will always do software reset
> > > for all, which will cause Spreadtrum host controller work abnormally after
> > > resuming.
> >
> > What does "software reset for all" means?
>
> The SD host controller specification defines 3 types software reset:
> software reset for data line, software reset for command line and
> software reset for all.
> Software reset for all means this reset affects the entire Host
> controller except for the card detection circuit.
>
> >
> > >
> > > Thus for Spreadtrum platform that will not power down the SD/eMMC card 
> > > during
> > > runtime suspend, we should not do software reset for all.
> >
> > Normally, sdhci hosts that enters runtime suspend doesn't power off
> > the card (there are some exceptions like PCI variants).
>
> Yes, same as our controller.
>
> >
> > So, what's so special here and how does the reset come into play? I
> > don't see sdhci doing a reset in sdhci_runtime_suspend|resume_host()
> > and nor doesn the callback from the sdhci-sprd.c variant doing it.
>
> In sdhci_runtime_resume_host(), it will issue sdhci_init(host, 0) to
> issue software reset for all.
>
> >
> > > To fix this
> > > issue, adding a specific reset operation that adds one condition to 
> > > validate
> > > the power mode to decide if we can do software reset for all or just reset
> > > command and data lines.
> > >
> > > Signed-off-by: Baolin Wang 
> > > ---
> > > Changes from v3:
> > >  - Use ios.power_mode to validate if the card is power down or not.
> > >
> > > Changes from v2:
> > >  - Simplify the sdhci_sprd_reset() by issuing sdhci_reset().
> > >
> > > Changes from v1:
> > >  - Add a specific reset operation instead of changing the core to avoid
> > >  affecting other hardware.
> > > ---
> > >  drivers/mmc/host/sdhci-sprd.c |   19 ++-
> > >  1 file changed, 18 insertions(+), 1 deletion(-)
> > >
> > > diff --git a/drivers/mmc/host/sdhci-sprd.c b/drivers/mmc/host/sdhci-sprd.c
> > > index 603a5d9..94f9726 100644
> > > --- a/drivers/mmc/host/sdhci-sprd.c
> > > +++ b/drivers/mmc/host/sdhci-sprd.c
> > > @@ -373,6 +373,23 @@ static unsigned int 
> > > sdhci_sprd_get_max_timeout_count(struct sdhci_host *host)
> > > return 1 << 31;
> > >  }
> > >
> > > +static void sdhci_sprd_reset(struct sdhci_host *host, u8 mask)
> > > +{
> > > +   struct mmc_host *mmc = host->mmc;
> > > +
> > > +   /*
> > > +* When try to reset controller after runtime suspend, we should 
> > > not
> > > +* reset for all if the SD/eMMC card is not power down, just reset
> > > +* command and data lines instead. Otherwise will meet some 
> > > strange
> > > +* behaviors for Spreadtrum host controller.
> > > +*/
> > > +   if (host->runtime_suspended && (mask & SDHCI_RESET_ALL) &&
> > > +   mmc->ios.power_mode == MMC_POWER_ON)
> > > +   mask = SDHCI_RESET_CMD | SDHCI_RESET_DATA;
> >
> > Can sdhci_sprd_reset() be called when the host is runtime suspended?
>
> When host tries to runtime resume in sdhci_runtime_resume_host(), it
> will call reset operation to do software reset.
>
> > That sounds like a bug to me, no?
>
> Since our controller will meet some strange behaviors if we do
> software reset for all in sdhci_runtime_resume_host(), and try to
> avoid changing the core logic of sdhci_runtime_resume_host() used by
> other hardware controllers, thus I introduced a specific reset ops and
> added some condition to make sure we just do software reset command
> and data lines from runtime suspend state.

I can make a verification on sprd's SC9863A, but that would take a
little time, since I need to make sd card registered with sdhci-sprd.c
first :)

Thanks,
Chunyan
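
For readers following the thread, here is a minimal userspace sketch of the mask selection that the quoted sdhci_sprd_reset() hunk performs. It is an illustration only, not the driver code: the function name pick_reset_mask(), the enum and the RESET_* constants are made up for this sketch (the bit values are assumed to mirror the SDHCI software-reset register bits), and no register is actually written.

```c
#include <stdbool.h>
#include <stdint.h>

/* Assumed to mirror the SDHCI software-reset bits; hypothetical names. */
#define RESET_ALL  0x01
#define RESET_CMD  0x02
#define RESET_DATA 0x04

/* Hypothetical stand-in for mmc->ios.power_mode. */
enum power_mode { POWER_OFF, POWER_UP, POWER_ON };

/*
 * Downgrade "reset for all" to "reset command and data lines" when the
 * host is coming back from runtime suspend and the card is still powered.
 * Any other request is passed through unchanged.
 */
static uint8_t pick_reset_mask(bool runtime_suspended, enum power_mode mode,
                               uint8_t mask)
{
    if (runtime_suspended && (mask & RESET_ALL) && mode == POWER_ON)
        return RESET_CMD | RESET_DATA;

    return mask;
}
```

The point of keeping this a pure function in the sketch is that the three conditions from the patch (runtime-suspended host, a reset-for-all request, card still powered) are easy to check in isolation.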


RE: [PATCH] arm: xen: mm: use __GPF_DMA32 for arm64

2019-07-22 Thread Peng Fan
Hi Russell, Stefano

> Subject: [PATCH] arm: xen: mm: use __GPF_DMA32 for arm64

Any comments?

> 
> arm64 shares some code under arch/arm/xen, including mm.c.
> However ZONE_DMA is removed by commit
> ad67f5a6545("arm64: replace ZONE_DMA with ZONE_DMA32").
> So to ARM64, need use __GFP_DMA32.
> 
> Signed-off-by: Peng Fan 
> ---
>  arch/arm/xen/mm.c | 2 +-
>  1 file changed, 1 insertion(+), 1 deletion(-)
> 
> diff --git a/arch/arm/xen/mm.c b/arch/arm/xen/mm.c index
> e1d44b903dfc..a95e76d18bf9 100644
> --- a/arch/arm/xen/mm.c
> +++ b/arch/arm/xen/mm.c
> @@ -27,7 +27,7 @@ unsigned long xen_get_swiotlb_free_pages(unsigned int
> order)
> 
>   for_each_memblock(memory, reg) {
>   if (reg->base < (phys_addr_t)0x) {
> - flags |= __GFP_DMA;
> + flags |= __GFP_DMA | __GFP_DMA32;
>   break;
>   }
>   }

Thanks,
Peng.

> --
> 2.16.4



[PATCH RESEND v5] dmaengine: tegra-apb: Support per-burst residue granularity

2019-07-22 Thread Dmitry Osipenko
Tegra's APB DMA engine updates words counter after each transferred burst
of data, hence it can report transfer's residual with more fidelity which
may be required in cases like audio playback. In particular this fixes
audio stuttering during playback in a chromium web browser. The patch is
based on the original work that was made by Ben Dooks and a patch from
downstream kernel. It was tested on Tegra20 and Tegra30 devices.
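
As a rough illustration of the bookkeeping described above, here is a standalone sketch with the register reads replaced by plain arguments. It is not the driver code: struct sg_sketch, sketch_bytes_xferred() and sketch_residue() are names invented for this example, and the count passed in is assumed to be already converted to bytes.

```c
#include <stdbool.h>

/* Hypothetical stand-in for one scatter-gather request. */
struct sg_sketch {
    unsigned int req_len;       /* total request length in bytes */
    unsigned int words_xferred; /* last non-zero count seen for this request */
};

/*
 * eoc:    whether the end-of-cycle status bit was observed.
 * wcount: transferred byte count derived from the hardware counter.
 */
static unsigned int sketch_bytes_xferred(struct sg_sketch *sg, bool eoc,
                                         unsigned int wcount)
{
    if (eoc)
        return sg->req_len;

    if (!wcount) {
        /* Zero after progress was seen: treat as the very end of the transfer. */
        if (sg->words_xferred)
            wcount = sg->req_len - 4;
    } else if (wcount < sg->words_xferred) {
        /* Counter went backwards: assume the last burst is in flight. */
        wcount = sg->req_len - 4;
    } else {
        sg->words_xferred = wcount;
    }

    return wcount;
}

/* Residue is simply what is left of the request. */
static unsigned int sketch_residue(struct sg_sketch *sg, bool eoc,
                                   unsigned int wcount)
{
    return sg->req_len - sketch_bytes_xferred(sg, eoc, wcount);
}
```

Remembering the last observed count is what lets a zero reading be told apart as "not started yet" versus "about to complete".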

Link: 
https://lore.kernel.org/lkml/20190424162348.23692-1-ben.do...@codethink.co.uk/
Link: 
https://nv-tegra.nvidia.com/gitweb/?p=linux-4.4.git;a=commit;h=c7bba40c6846fbf3eaad35c4472dcc7d8bbc02e5
Inspired-by: Ben Dooks 
Reviewed-by: Jon Hunter 
Signed-off-by: Dmitry Osipenko 
---
 drivers/dma/tegra20-apb-dma.c | 75 +++
 1 file changed, 68 insertions(+), 7 deletions(-)

diff --git a/drivers/dma/tegra20-apb-dma.c b/drivers/dma/tegra20-apb-dma.c
index 79e9593815f1..3a45079d11ec 100644
--- a/drivers/dma/tegra20-apb-dma.c
+++ b/drivers/dma/tegra20-apb-dma.c
@@ -152,6 +152,7 @@ struct tegra_dma_sg_req {
boollast_sg;
struct list_headnode;
struct tegra_dma_desc   *dma_desc;
+   unsigned intwords_xferred;
 };
 
 /*
@@ -496,6 +497,7 @@ static void tegra_dma_configure_for_next(struct 
tegra_dma_channel *tdc,
tdc_write(tdc, TEGRA_APBDMA_CHAN_CSR,
nsg_req->ch_regs.csr | TEGRA_APBDMA_CSR_ENB);
nsg_req->configured = true;
+   nsg_req->words_xferred = 0;
 
tegra_dma_resume(tdc);
 }
@@ -511,6 +513,7 @@ static void tdc_start_head_req(struct tegra_dma_channel 
*tdc)
typeof(*sg_req), node);
tegra_dma_start(tdc, sg_req);
sg_req->configured = true;
+   sg_req->words_xferred = 0;
tdc->busy = true;
 }
 
@@ -638,6 +641,8 @@ static void handle_cont_sngl_cycle_dma_done(struct 
tegra_dma_channel *tdc,
list_add_tail(&dma_desc->cb_node, &tdc->cb_desc);
dma_desc->cb_count++;
 
+   sgreq->words_xferred = 0;
+
/* If not last req then put at end of pending list */
if (!list_is_last(&sgreq->node, &tdc->pending_sg_req)) {
list_move_tail(&sgreq->node, &tdc->pending_sg_req);
@@ -797,6 +802,65 @@ static int tegra_dma_terminate_all(struct dma_chan *dc)
return 0;
 }
 
+static unsigned int tegra_dma_sg_bytes_xferred(struct tegra_dma_channel *tdc,
+  struct tegra_dma_sg_req *sg_req)
+{
+   unsigned long status, wcount = 0;
+
+   if (!list_is_first(&sg_req->node, &tdc->pending_sg_req))
+   return 0;
+
+   if (tdc->tdma->chip_data->support_separate_wcount_reg)
+   wcount = tdc_read(tdc, TEGRA_APBDMA_CHAN_WORD_TRANSFER);
+
+   status = tdc_read(tdc, TEGRA_APBDMA_CHAN_STATUS);
+
+   if (!tdc->tdma->chip_data->support_separate_wcount_reg)
+   wcount = status;
+
+   if (status & TEGRA_APBDMA_STATUS_ISE_EOC)
+   return sg_req->req_len;
+
+   wcount = get_current_xferred_count(tdc, sg_req, wcount);
+
+   if (!wcount) {
+   /*
+* If wcount wasn't ever polled for this SG before, then
+* simply assume that transfer hasn't started yet.
+*
+* Otherwise it's the end of the transfer.
+*
+* The alternative would be to poll the status register
+* until EOC bit is set or wcount goes UP. That's so
+* because EOC bit is getting set only after the last
+* burst's completion and counter is less than the actual
+* transfer size by 4 bytes. The counter value wraps around
+* in a cyclic mode before EOC is set(!), so we can't easily
+* distinguish start of transfer from its end.
+*/
+   if (sg_req->words_xferred)
+   wcount = sg_req->req_len - 4;
+
+   } else if (wcount < sg_req->words_xferred) {
+   /*
+* This case will never happen for a non-cyclic transfer.
+*
+* For a cyclic transfer, although it is possible for the
+* next transfer to have already started (resetting the word
+* count), this case should still not happen because we should
+* have detected that the EOC bit is set and hence the transfer
+* was completed.
+*/
+   WARN_ON_ONCE(1);
+
+   wcount = sg_req->req_len - 4;
+   } else {
+   sg_req->words_xferred = wcount;
+   }
+
+   return wcount;
+}
+
 static enum dma_status tegra_dma_tx_status(struct dma_chan *dc,
dma_cookie_t cookie, struct dma_tx_state *txstate)
 {
@@ -806,6 +870,7 @@ static enum dma_status tegra_dma_tx_status(struct dma_chan 
*dc,
enum dma_status ret;

[PATCH v8 03/15] memory: tegra20-emc: Adapt for clock driver changes

2019-07-22 Thread Dmitry Osipenko
Now the Tegra20 and Tegra30 EMC drivers should provide clock-rounding
functionality using the new Tegra-CLK driver API.

Acked-by: Peter De Schrijver 
Signed-off-by: Dmitry Osipenko 
---
 drivers/memory/tegra/tegra20-emc.c | 50 --
 1 file changed, 48 insertions(+), 2 deletions(-)

diff --git a/drivers/memory/tegra/tegra20-emc.c 
b/drivers/memory/tegra/tegra20-emc.c
index da8fa592b071..b519f02b0ee9 100644
--- a/drivers/memory/tegra/tegra20-emc.c
+++ b/drivers/memory/tegra/tegra20-emc.c
@@ -6,6 +6,7 @@
  */
 
 #include 
+#include 
 #include 
 #include 
 #include 
@@ -421,6 +422,44 @@ static int emc_setup_hw(struct tegra_emc *emc)
return 0;
 }
 
+static long emc_round_rate(unsigned long rate,
+  unsigned long min_rate,
+  unsigned long max_rate,
+  void *arg)
+{
+   struct emc_timing *timing = NULL;
+   struct tegra_emc *emc = arg;
+   unsigned int i;
+
+   min_rate = min(min_rate, emc->timings[emc->num_timings - 1].rate);
+
+   for (i = 0; i < emc->num_timings; i++) {
+   if (emc->timings[i].rate < rate && i != emc->num_timings - 1)
+   continue;
+
+   if (emc->timings[i].rate > max_rate) {
+   i = max(i, 1u) - 1;
+
+   if (emc->timings[i].rate < min_rate)
+   break;
+   }
+
+   if (emc->timings[i].rate < min_rate)
+   continue;
+
+   timing = &emc->timings[i];
+   break;
+   }
+
+   if (!timing) {
+   dev_err(emc->dev, "no timing for rate %lu min %lu max %lu\n",
+   rate, min_rate, max_rate);
+   return -EINVAL;
+   }
+
+   return timing->rate;
+}
+
 static int tegra_emc_probe(struct platform_device *pdev)
 {
struct device_node *np;
@@ -477,21 +516,28 @@ static int tegra_emc_probe(struct platform_device *pdev)
return err;
}
 
+   tegra20_clk_set_emc_round_callback(emc_round_rate, emc);
+
emc->clk = devm_clk_get(&pdev->dev, "emc");
if (IS_ERR(emc->clk)) {
err = PTR_ERR(emc->clk);
dev_err(&pdev->dev, "failed to get emc clock: %d\n", err);
-   return err;
+   goto unset_cb;
}
 
err = clk_notifier_register(emc->clk, &emc->clk_nb);
if (err) {
dev_err(&pdev->dev, "failed to register clk notifier: %d\n",
err);
-   return err;
+   goto unset_cb;
}
 
return 0;
+
+unset_cb:
+   tegra20_clk_set_emc_round_callback(NULL, NULL);
+
+   return err;
 }
 
 static const struct of_device_id tegra_emc_of_match[] = {
-- 
2.22.0
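
For illustration, the rounding policy of emc_round_rate() in the patch above can be tried out in userspace with a sketch like the one below. The assumptions are mine: the timing table is a plain array of rates sorted in ascending order, sketch_round_rate() is a made-up name, and -1 stands in for the -EINVAL the patch returns.

```c
#include <stddef.h>

/*
 * Pick the lowest table rate that is >= the requested rate, fall back to
 * the highest entry if the request exceeds the table, and step down one
 * entry when the candidate is above max_rate. Returns -1 if nothing fits
 * the [min_rate, max_rate] window. rates[] must be ascending and n > 0.
 */
static long sketch_round_rate(const unsigned long *rates, size_t n,
                              unsigned long rate,
                              unsigned long min_rate,
                              unsigned long max_rate)
{
    size_t i;

    /* Never demand more than the table can offer. */
    if (min_rate > rates[n - 1])
        min_rate = rates[n - 1];

    for (i = 0; i < n; i++) {
        if (rates[i] < rate && i != n - 1)
            continue;

        if (rates[i] > max_rate) {
            i = (i > 0 ? i : 1) - 1;

            if (rates[i] < min_rate)
                break;
        }

        if (rates[i] < min_rate)
            continue;

        return (long)rates[i];
    }

    return -1;
}
```

With a table of 100, 200 and 400, a request for 150 rounds up to 200, a request for 500 is capped at 400, and a request for 300 with max_rate 250 steps down to 200.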



[PATCH v8 07/15] memory: tegra20-emc: Increase handshake timeout

2019-07-22 Thread Dmitry Osipenko
It turned out that the handshake could take over a millisecond under some
circumstances, like running on a very low CPU/memory frequency. The TRM says
that the handshake happens when there is a "safe" moment, but does not explain
exactly what that moment is. Apparently at least the memory should be idling,
and thus a low frequency is a plausible cause for a longer handshake delay.

Signed-off-by: Dmitry Osipenko 
---
 drivers/memory/tegra/tegra20-emc.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/drivers/memory/tegra/tegra20-emc.c 
b/drivers/memory/tegra/tegra20-emc.c
index 25a6aad6a7a9..da75efc632c7 100644
--- a/drivers/memory/tegra/tegra20-emc.c
+++ b/drivers/memory/tegra/tegra20-emc.c
@@ -236,7 +236,7 @@ static int emc_complete_timing_change(struct tegra_emc 
*emc, bool flush)
}
 
timeout = wait_for_completion_timeout(&emc->clk_handshake_complete,
- usecs_to_jiffies(100));
+ msecs_to_jiffies(100));
if (timeout == 0) {
dev_err(emc->dev, "EMC-CAR handshake failed\n");
return -EIO;
-- 
2.22.0
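
To put numbers on the change from usecs_to_jiffies(100) to msecs_to_jiffies(100), here is a simplified model of the conversion. It assumes, as I understand the kernel helpers to behave, that the result is rounded up to whole ticks; the function names and the explicit hz parameter (standing in for CONFIG_HZ) are invented for the sketch.

```c
/* Microseconds to ticks, rounded up; hz is the tick rate per second. */
static unsigned long sketch_us_to_ticks(unsigned long us, unsigned long hz)
{
    return (unsigned long)(((unsigned long long)us * hz + 999999ULL) /
                           1000000ULL);
}

/* Milliseconds to ticks, rounded up. */
static unsigned long sketch_ms_to_ticks(unsigned long ms, unsigned long hz)
{
    return (unsigned long)(((unsigned long long)ms * hz + 999ULL) / 1000ULL);
}
```

Under this model, 100 us is a single tick at any common tick rate, i.e. at most 1 ms of waiting at HZ=1000, whereas 100 ms is 100 ticks at HZ=1000 and 10 ticks at HZ=100.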



[PATCH v8 08/15] memory: tegra20-emc: wait_for_completion_timeout() doesn't return error

2019-07-22 Thread Dmitry Osipenko
Only the "interruptible" variant may return an error;
wait_for_completion_timeout() does not, so the negative-value check is
dead code.

Signed-off-by: Dmitry Osipenko 
---
 drivers/memory/tegra/tegra20-emc.c | 6 +-
 1 file changed, 1 insertion(+), 5 deletions(-)

diff --git a/drivers/memory/tegra/tegra20-emc.c 
b/drivers/memory/tegra/tegra20-emc.c
index da75efc632c7..1b23b1c34476 100644
--- a/drivers/memory/tegra/tegra20-emc.c
+++ b/drivers/memory/tegra/tegra20-emc.c
@@ -224,7 +224,7 @@ static int emc_prepare_timing_change(struct tegra_emc *emc, 
unsigned long rate)
 
 static int emc_complete_timing_change(struct tegra_emc *emc, bool flush)
 {
-   long timeout;
+   unsigned long timeout;
 
dev_dbg(emc->dev, "%s: flush %d\n", __func__, flush);
 
@@ -240,10 +240,6 @@ static int emc_complete_timing_change(struct tegra_emc 
*emc, bool flush)
if (timeout == 0) {
dev_err(emc->dev, "EMC-CAR handshake failed\n");
return -EIO;
-   } else if (timeout < 0) {
-   dev_err(emc->dev, "failed to wait for EMC-CAR handshake: %ld\n",
-   timeout);
-   return timeout;
}
 
return 0;
-- 
2.22.0
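
A small sketch of the return-value handling after this patch, assuming the usual convention that a timed wait returns 0 on timeout and the remaining ticks otherwise. SKETCH_EIO and sketch_handshake_result() are placeholders made up for the example.

```c
#define SKETCH_EIO 5 /* placeholder for the kernel's EIO */

/*
 * remaining: value returned by the timed wait. It is unsigned, so a
 * "remaining < 0" branch could never be taken; only zero means failure.
 */
static int sketch_handshake_result(unsigned long remaining)
{
    if (remaining == 0)
        return -SKETCH_EIO;

    return 0;
}
```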



[PATCH v8 05/15] memory: tegra20-emc: Pre-configure debug register

2019-07-22 Thread Dmitry Osipenko
The driver expects certain debug features to be disabled in order to
work properly. Let's disable them explicitly for consistency and to not
rely on a boot state.

Acked-by: Peter De Schrijver 
Signed-off-by: Dmitry Osipenko 
---
 drivers/memory/tegra/tegra20-emc.c | 17 -
 1 file changed, 16 insertions(+), 1 deletion(-)

diff --git a/drivers/memory/tegra/tegra20-emc.c 
b/drivers/memory/tegra/tegra20-emc.c
index 1ce351dd5461..85c24f285fd4 100644
--- a/drivers/memory/tegra/tegra20-emc.c
+++ b/drivers/memory/tegra/tegra20-emc.c
@@ -22,6 +22,7 @@
 
 #define EMC_INTSTATUS  0x000
 #define EMC_INTMASK0x004
+#define EMC_DBG0x008
 #define EMC_TIMING_CONTROL 0x028
 #define EMC_RC 0x02c
 #define EMC_RFC0x030
@@ -80,6 +81,12 @@
 #define EMC_REFRESH_OVERFLOW_INT   BIT(3)
 #define EMC_CLKCHANGE_COMPLETE_INT BIT(4)
 
+#define EMC_DBG_READ_MUX_ASSEMBLY  BIT(0)
+#define EMC_DBG_WRITE_MUX_ACTIVE   BIT(1)
+#define EMC_DBG_FORCE_UPDATE   BIT(2)
+#define EMC_DBG_READ_DQM_CTRL  BIT(9)
+#define EMC_DBG_CFG_PRIORITY   BIT(24)
+
 static const u16 emc_timing_registers[] = {
EMC_RC,
EMC_RFC,
@@ -396,7 +403,7 @@ tegra_emc_find_node_by_ram_code(struct device *dev)
 static int emc_setup_hw(struct tegra_emc *emc)
 {
u32 intmask = EMC_REFRESH_OVERFLOW_INT | EMC_CLKCHANGE_COMPLETE_INT;
-   u32 emc_cfg;
+   u32 emc_cfg, emc_dbg;
 
emc_cfg = readl_relaxed(emc->regs + EMC_CFG_2);
 
@@ -419,6 +426,14 @@ static int emc_setup_hw(struct tegra_emc *emc)
writel_relaxed(intmask, emc->regs + EMC_INTMASK);
writel_relaxed(intmask, emc->regs + EMC_INTSTATUS);
 
+   /* ensure that unwanted debug features are disabled */
+   emc_dbg = readl_relaxed(emc->regs + EMC_DBG);
+   emc_dbg |= EMC_DBG_CFG_PRIORITY;
+   emc_dbg &= ~EMC_DBG_READ_MUX_ASSEMBLY;
+   emc_dbg &= ~EMC_DBG_WRITE_MUX_ACTIVE;
+   emc_dbg &= ~EMC_DBG_FORCE_UPDATE;
+   writel_relaxed(emc_dbg, emc->regs + EMC_DBG);
+
return 0;
 }
 
-- 
2.22.0
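
The read-modify-write added to emc_setup_hw() above can be checked in isolation with a sketch like this one. The bit positions are taken from the patch; the SK_ prefix and the sketch_emc_dbg_sanitize() helper are made up here, and no hardware register is touched.

```c
#include <stdint.h>

#define SK_BIT(n) (UINT32_C(1) << (n))

/* Bit positions as defined in the patch. */
#define SK_EMC_DBG_READ_MUX_ASSEMBLY SK_BIT(0)
#define SK_EMC_DBG_WRITE_MUX_ACTIVE  SK_BIT(1)
#define SK_EMC_DBG_FORCE_UPDATE      SK_BIT(2)
#define SK_EMC_DBG_READ_DQM_CTRL     SK_BIT(9)
#define SK_EMC_DBG_CFG_PRIORITY      SK_BIT(24)

/* Set the priority bit, clear the three debug bits, keep everything else. */
static uint32_t sketch_emc_dbg_sanitize(uint32_t dbg)
{
    dbg |= SK_EMC_DBG_CFG_PRIORITY;
    dbg &= ~(SK_EMC_DBG_READ_MUX_ASSEMBLY |
             SK_EMC_DBG_WRITE_MUX_ACTIVE |
             SK_EMC_DBG_FORCE_UPDATE);

    return dbg;
}
```

Note that READ_DQM_CTRL is defined by the patch but left untouched by the sequence, so the sketch preserves it as well.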



[PATCH v8 01/15] clk: tegra20/30: Add custom EMC clock implementation

2019-07-22 Thread Dmitry Osipenko
Proper External Memory Controller clock rounding and parent selection
functionality is required by the EMC drivers. It is not available using
the generic clock implementation because only the Memory Controller driver
is aware of what clock rates are actually available for a particular
device. EMC drivers will have to register a Tegra-specific CLK-API
callback which will perform rounding of a requested rate. EMC clock users
will get -EPROBE_DEFER when requesting the EMC clock until the EMC driver
is probed and the callback is set up.

The functionality is somewhat similar to clk-emc.c, which serves
Tegra124+ SoCs. The later HW generations support more parent clock sources
and the HW configuration / integration with the EMC drivers differs a tad
from the older gens, hence it's not really worth trying to squash
everything into a single source file.
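
As an illustration of the callback arrangement described above, here is a userspace sketch of a clock object that defers until a rounding callback is registered. Everything in it is hypothetical: the struct, the function names and the example callback are invented, and SKETCH_EPROBE_DEFER merely copies the kernel-internal value as an assumption. Only the callback signature follows the one used by the EMC driver patch in this series.

```c
#include <stddef.h>

#define SKETCH_EPROBE_DEFER 517 /* assumed value of the kernel's EPROBE_DEFER */

/* Same shape as the rounding callback used by the EMC driver. */
typedef long (sketch_round_cb)(unsigned long rate, unsigned long min_rate,
                               unsigned long max_rate, void *arg);

struct sketch_emc_clk {
    sketch_round_cb *round_cb;
    void *cb_arg;
};

/* Called by the memory driver once it knows the available rates. */
static void sketch_set_round_callback(struct sketch_emc_clk *clk,
                                      sketch_round_cb *cb, void *arg)
{
    clk->round_cb = cb;
    clk->cb_arg = arg;
}

/* Clock users are deferred until the callback has been set up. */
static long sketch_request_rate(const struct sketch_emc_clk *clk,
                                unsigned long rate,
                                unsigned long min_rate,
                                unsigned long max_rate)
{
    if (!clk->round_cb)
        return -SKETCH_EPROBE_DEFER;

    return clk->round_cb(rate, min_rate, max_rate, clk->cb_arg);
}

/* Trivial example callback: clamp the request into [min_rate, max_rate]. */
static long sketch_clamp_cb(unsigned long rate, unsigned long min_rate,
                            unsigned long max_rate, void *arg)
{
    (void)arg;

    if (rate < min_rate)
        return (long)min_rate;
    if (rate > max_rate)
        return (long)max_rate;

    return (long)rate;
}
```

Passing NULL to sketch_set_round_callback() puts the clock back into the deferring state, which matches how the EMC driver unsets the callback on its probe error path.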

Acked-by: Peter De Schrijver 
Signed-off-by: Dmitry Osipenko 
---
 drivers/clk/tegra/Makefile  |   2 +
 drivers/clk/tegra/clk-tegra20-emc.c | 293 
 drivers/clk/tegra/clk-tegra20.c |  55 ++
 drivers/clk/tegra/clk-tegra30.c |  38 ++--
 drivers/clk/tegra/clk.h |   3 +
 include/linux/clk/tegra.h   |  11 ++
 6 files changed, 350 insertions(+), 52 deletions(-)
 create mode 100644 drivers/clk/tegra/clk-tegra20-emc.c

diff --git a/drivers/clk/tegra/Makefile b/drivers/clk/tegra/Makefile
index 4812e45c2214..df966ca06788 100644
--- a/drivers/clk/tegra/Makefile
+++ b/drivers/clk/tegra/Makefile
@@ -17,7 +17,9 @@ obj-y += clk-tegra-fixed.o
 obj-y  += clk-tegra-super-gen4.o
 obj-$(CONFIG_TEGRA_CLK_EMC)+= clk-emc.o
 obj-$(CONFIG_ARCH_TEGRA_2x_SOC) += clk-tegra20.o
+obj-$(CONFIG_ARCH_TEGRA_2x_SOC)+= clk-tegra20-emc.o
 obj-$(CONFIG_ARCH_TEGRA_3x_SOC) += clk-tegra30.o
+obj-$(CONFIG_ARCH_TEGRA_3x_SOC)+= clk-tegra20-emc.o
 obj-$(CONFIG_ARCH_TEGRA_114_SOC)   += clk-tegra114.o
 obj-$(CONFIG_ARCH_TEGRA_124_SOC)   += clk-tegra124.o
 obj-$(CONFIG_TEGRA_CLK_DFLL)   += clk-tegra124-dfll-fcpu.o
diff --git a/drivers/clk/tegra/clk-tegra20-emc.c 
b/drivers/clk/tegra/clk-tegra20-emc.c
new file mode 100644
index ..03bf0009a33c
--- /dev/null
+++ b/drivers/clk/tegra/clk-tegra20-emc.c
@@ -0,0 +1,293 @@
+// SPDX-License-Identifier: GPL-2.0+
+/*
+ * Based on drivers/clk/tegra/clk-emc.c
+ * Copyright (c) 2014, NVIDIA CORPORATION.  All rights reserved.
+ *
+ * Author: Dmitry Osipenko 
+ * Copyright (C) 2019 GRATE-DRIVER project
+ */
+
+#define pr_fmt(fmt)"tegra-emc-clk: " fmt
+
+#include 
+#include 
+#include 
+#include 
+#include 
+#include 
+#include 
+
+#include "clk.h"
+
+#define CLK_SOURCE_EMC_2X_CLK_DIVISOR_MASK GENMASK(7, 0)
+#define CLK_SOURCE_EMC_2X_CLK_SRC_MASK GENMASK(31, 30)
+#define CLK_SOURCE_EMC_2X_CLK_SRC_SHIFT30
+
+#define MC_EMC_SAME_FREQ   BIT(16)
+#define USE_PLLM_UDBIT(29)
+
+#define EMC_SRC_PLL_M  0
+#define EMC_SRC_PLL_C  1
+#define EMC_SRC_PLL_P  2
+#define EMC_SRC_CLK_M  3
+
+static const char * const emc_parent_clk_names[] = {
+   "pll_m", "pll_c", "pll_p", "clk_m",
+};
+
+struct tegra_clk_emc {
+   struct clk_hw hw;
+   void __iomem *reg;
+   bool mc_same_freq;
+   bool want_low_jitter;
+
+   tegra20_clk_emc_round_cb *round_cb;
+   void *cb_arg;
+};
+
+static inline struct tegra_clk_emc *to_tegra_clk_emc(struct clk_hw *hw)
+{
+   return container_of(hw, struct tegra_clk_emc, hw);
+}
+
+static unsigned long emc_recalc_rate(struct clk_hw *hw,
+unsigned long parent_rate)
+{
+   struct tegra_clk_emc *emc = to_tegra_clk_emc(hw);
+   u32 val, div;
+
+   val = readl_relaxed(emc->reg);
+   div = val & CLK_SOURCE_EMC_2X_CLK_DIVISOR_MASK;
+
+   return DIV_ROUND_UP(parent_rate * 2, div + 2);
+}
+
+static u8 emc_get_parent(struct clk_hw *hw)
+{
+   struct tegra_clk_emc *emc = to_tegra_clk_emc(hw);
+
+   return readl_relaxed(emc->reg) >> CLK_SOURCE_EMC_2X_CLK_SRC_SHIFT;
+}
+
+static int emc_set_parent(struct clk_hw *hw, u8 index)
+{
+   struct tegra_clk_emc *emc = to_tegra_clk_emc(hw);
+   u32 val, div;
+
+   val = readl_relaxed(emc->reg);
+   val &= ~CLK_SOURCE_EMC_2X_CLK_SRC_MASK;
+   val |= index << CLK_SOURCE_EMC_2X_CLK_SRC_SHIFT;
+
+   div = val & CLK_SOURCE_EMC_2X_CLK_DIVISOR_MASK;
+
+   if (index == EMC_SRC_PLL_M && div == 0 && emc->want_low_jitter)
+   val |= USE_PLLM_UD;
+   else
+   val &= ~USE_PLLM_UD;
+
+   if (emc->mc_same_freq)
+   val |= MC_EMC_SAME_FREQ;
+   else
+   val &= ~MC_EMC_SAME_FREQ;
+
+   writel_relaxed(val, emc->reg);
+
+   fence_udelay(1, emc->reg);
+
+   return 0;
+}
+
+static int emc_set_rate(struct clk_hw *hw, unsigned long rate,
+   unsigned 

[PATCH v8 10/15] dt-bindings: memory: Add binding for NVIDIA Tegra30 Memory Controller

2019-07-22 Thread Dmitry Osipenko
Add binding for the NVIDIA Tegra30 SoC Memory Controller.

Signed-off-by: Dmitry Osipenko 
---
 .../memory-controllers/nvidia,tegra30-mc.yaml | 173 ++
 1 file changed, 173 insertions(+)
 create mode 100644 
Documentation/devicetree/bindings/memory-controllers/nvidia,tegra30-mc.yaml

diff --git 
a/Documentation/devicetree/bindings/memory-controllers/nvidia,tegra30-mc.yaml 
b/Documentation/devicetree/bindings/memory-controllers/nvidia,tegra30-mc.yaml
new file mode 100644
index ..40e63cdf836b
--- /dev/null
+++ 
b/Documentation/devicetree/bindings/memory-controllers/nvidia,tegra30-mc.yaml
@@ -0,0 +1,173 @@
+# SPDX-License-Identifier: (GPL-2.0)
+%YAML 1.2
+---
+$id: http://devicetree.org/schemas/memory-controllers/nvidia,tegra30-mc.yaml#
+$schema: http://devicetree.org/meta-schemas/core.yaml#
+
+title: NVIDIA Tegra30 SoC Memory Controller
+
+maintainers:
+  - Dmitry Osipenko 
+  - Jon Hunter 
+  - Thierry Reding 
+
+description: |
+  Tegra30 Memory Controller architecturally consists of the following parts:
+
+Arbitration Domains, which can handle a single request or response per
+clock from a group of clients. Typically, a system has a single Arbitration
+Domain, but an implementation may divide the client space into multiple
+Arbitration Domains to increase the effective system bandwidth.
+
+Protocol Arbiter, which manage a related pool of memory devices. A system
+may have a single Protocol Arbiter or multiple Protocol Arbiters.
+
+Memory Crossbar, which routes request and responses between Arbitration
+Domains and Protocol Arbiters. In the simplest version of the system, the
+Memory Crossbar is just a pass through between a single Arbitration Domain
+and a single Protocol Arbiter.
+
+Global Resources, which include things like configuration registers which
+are shared across the Memory Subsystem.
+
+  The Tegra30 Memory Controller handles memory requests from internal clients
+  and arbitrates among them to allocate memory bandwidth for DDR3L and LPDDR2
+  SDRAMs.
+
+properties:
+  compatible:
+const: nvidia,tegra30-mc
+
+  reg:
+maxItems: 1
+description:
+  Physical base address.
+
+  clocks:
+maxItems: 1
+description:
+  Memory Controller clock.
+
+  clock-names:
+items:
+  - const: mc
+
+  interrupts:
+maxItems: 1
+description:
+  Memory Controller interrupt.
+
+  "#reset-cells":
+const: 1
+
+  "#iommu-cells":
+const: 1
+
+patternProperties:
+  "^emc-timings-[0-9]+$":
+type: object
+properties:
+  nvidia,ram-code:
+$ref: /schemas/types.yaml#/definitions/uint32
+description:
+  Value of RAM_CODE this timing set is used for.
+
+patternProperties:
+  "^timing-[0-9]+$":
+type: object
+properties:
+  clock-frequency:
+description:
+  Memory clock rate in Hz.
+minimum: 100
+maximum: 9
+
+  nvidia,emem-configuration:
+$ref: /schemas/types.yaml#/definitions/uint32-array
+description: |
+  Values to be written to the EMEM register block. See section
+  "18.13.1 MC Registers" in the TRM.
+items:
+  - description: MC_EMEM_ARB_CFG
+  - description: MC_EMEM_ARB_OUTSTANDING_REQ
+  - description: MC_EMEM_ARB_TIMING_RCD
+  - description: MC_EMEM_ARB_TIMING_RP
+  - description: MC_EMEM_ARB_TIMING_RC
+  - description: MC_EMEM_ARB_TIMING_RAS
+  - description: MC_EMEM_ARB_TIMING_FAW
+  - description: MC_EMEM_ARB_TIMING_RRD
+  - description: MC_EMEM_ARB_TIMING_RAP2PRE
+  - description: MC_EMEM_ARB_TIMING_WAP2PRE
+  - description: MC_EMEM_ARB_TIMING_R2R
+  - description: MC_EMEM_ARB_TIMING_W2W
+  - description: MC_EMEM_ARB_TIMING_R2W
+  - description: MC_EMEM_ARB_TIMING_W2R
+  - description: MC_EMEM_ARB_DA_TURNS
+  - description: MC_EMEM_ARB_DA_COVERS
+  - description: MC_EMEM_ARB_MISC0
+  - description: MC_EMEM_ARB_RING1_THROTTLE
+
+required:
+  - clock-frequency
+  - nvidia,emem-configuration
+
+additionalProperties: false
+
+required:
+  - nvidia,ram-code
+
+additionalProperties: false
+
+required:
+  - compatible
+  - reg
+  - interrupts
+  - clocks
+  - clock-names
+  - "#reset-cells"
+  - "#iommu-cells"
+
+additionalProperties: false
+
+examples:
+  - |
+memory-controller@7000f000 {
+compatible = "nvidia,tegra30-mc";
+reg = <0x7000f000 0x400>;
+clocks = <_car 32>;
+clock-names = "mc";
+
+interrupts = <0 77 4>;
+
+#iommu-cells = <1>;
+#reset-cells = <1>;
+
+emc-timings-1 {
+nvidia,ram-code = <1>;
+
+timing-66700 {
+clock-frequency = 

[PATCH v8 09/15] dt-bindings: memory: tegra30: Convert to Tegra124 YAML

2019-07-22 Thread Dmitry Osipenko
The Tegra30 binding will actually differ from the Tegra124 one a tad, in
particular the EMEM configuration description. Hence rename the binding
to Tegra124 during the conversion to YAML.

Signed-off-by: Dmitry Osipenko 
---
 .../nvidia,tegra124-mc.yaml   | 156 ++
 .../memory-controllers/nvidia,tegra30-mc.txt  | 123 --
 2 files changed, 156 insertions(+), 123 deletions(-)
 create mode 100644 
Documentation/devicetree/bindings/memory-controllers/nvidia,tegra124-mc.yaml
 delete mode 100644 
Documentation/devicetree/bindings/memory-controllers/nvidia,tegra30-mc.txt

diff --git 
a/Documentation/devicetree/bindings/memory-controllers/nvidia,tegra124-mc.yaml 
b/Documentation/devicetree/bindings/memory-controllers/nvidia,tegra124-mc.yaml
new file mode 100644
index ..eed9ed8ee111
--- /dev/null
+++ 
b/Documentation/devicetree/bindings/memory-controllers/nvidia,tegra124-mc.yaml
@@ -0,0 +1,156 @@
+# SPDX-License-Identifier: (GPL-2.0)
+%YAML 1.2
+---
+$id: http://devicetree.org/schemas/memory-controllers/nvidia,tegra124-mc.yaml#
+$schema: http://devicetree.org/meta-schemas/core.yaml#
+
+title: NVIDIA Tegra124 SoC Memory Controller
+
+maintainers:
+  - Jon Hunter 
+  - Thierry Reding 
+
+description: |
+  Tegra124 SoC features a hybrid 2x32-bit / 1x64-bit memory controller.
+  These are interleaved to provide high performance with the load shared across
+  two memory channels. The Tegra124 Memory Controller handles memory requests
+  from internal clients and arbitrates among them to allocate memory bandwidth
+  for DDR3L and LPDDR3 SDRAMs.
+
+properties:
+  compatible:
+const: nvidia,tegra124-mc
+
+  reg:
+maxItems: 1
+description:
+  Physical base address.
+
+  clocks:
+maxItems: 1
+description:
+  Memory Controller clock.
+
+  clock-names:
+items:
+  - const: mc
+
+  interrupts:
+maxItems: 1
+description:
+  Memory Controller interrupt.
+
+  "#reset-cells":
+const: 1
+
+  "#iommu-cells":
+const: 1
+
+patternProperties:
+  "^emc-timings-[0-9]+$":
+properties:
+  nvidia,ram-code:
+$ref: /schemas/types.yaml#/definitions/uint32
+description:
+  Value of RAM_CODE this timing set is used for.
+
+patternProperties:
+  "^timing-[0-9]+$":
+properties:
+  clock-frequency:
+description:
+  Memory clock rate in Hz.
+minimum: 100
+maximum: 106600
+
+  nvidia,emem-configuration:
+$ref: /schemas/types.yaml#/definitions/uint32-array
+description: |
+  Values to be written to the EMEM register block. See section
+  "15.6.1 MC Registers" in the TRM.
+items:
+  - description: MC_EMEM_ARB_CFG
+  - description: MC_EMEM_ARB_OUTSTANDING_REQ
+  - description: MC_EMEM_ARB_TIMING_RCD
+  - description: MC_EMEM_ARB_TIMING_RP
+  - description: MC_EMEM_ARB_TIMING_RC
+  - description: MC_EMEM_ARB_TIMING_RAS
+  - description: MC_EMEM_ARB_TIMING_FAW
+  - description: MC_EMEM_ARB_TIMING_RRD
+  - description: MC_EMEM_ARB_TIMING_RAP2PRE
+  - description: MC_EMEM_ARB_TIMING_WAP2PRE
+  - description: MC_EMEM_ARB_TIMING_R2R
+  - description: MC_EMEM_ARB_TIMING_W2W
+  - description: MC_EMEM_ARB_TIMING_R2W
+  - description: MC_EMEM_ARB_TIMING_W2R
+  - description: MC_EMEM_ARB_DA_TURNS
+  - description: MC_EMEM_ARB_DA_COVERS
+  - description: MC_EMEM_ARB_MISC0
+  - description: MC_EMEM_ARB_MISC1
+  - description: MC_EMEM_ARB_RING1_THROTTLE
+
+required:
+  - clock-frequency
+  - nvidia,emem-configuration
+
+additionalProperties: false
+
+required:
+  - nvidia,ram-code
+
+additionalProperties: false
+
+required:
+  - compatible
+  - reg
+  - interrupts
+  - clocks
+  - clock-names
+  - "#reset-cells"
+  - "#iommu-cells"
+
+additionalProperties: false
+
+examples:
+  - |
+memory-controller@70019000 {
+compatible = "nvidia,tegra124-mc";
+reg = <0x0 0x70019000 0x0 0x1000>;
+clocks = <_car 32>;
+clock-names = "mc";
+
+interrupts = <0 77 4>;
+
+#iommu-cells = <1>;
+#reset-cells = <1>;
+
+emc-timings-3 {
+nvidia,ram-code = <3>;
+
+timing-1275 {
+clock-frequency = <1275>;
+
+nvidia,emem-configuration = <
+0x40040001 /* MC_EMEM_ARB_CFG */
+0x800a /* MC_EMEM_ARB_OUTSTANDING_REQ */
+0x0001 /* MC_EMEM_ARB_TIMING_RCD */
+0x0001 /* MC_EMEM_ARB_TIMING_RP */
+0x0002 /* MC_EMEM_ARB_TIMING_RC */
+0x /* MC_EMEM_ARB_TIMING_RAS */
+

[PATCH v8 13/15] memory: tegra: Ensure timing control debug features are disabled

2019-07-22 Thread Dmitry Osipenko
Timing control debug features should be disabled at boot time, but you
never know, and hence it's better to disable them explicitly because some of
those features are crucial for the driver to do the proper thing.

Acked-by: Peter De Schrijver 
Signed-off-by: Dmitry Osipenko 
---
 drivers/memory/tegra/mc.c | 3 +++
 drivers/memory/tegra/mc.h | 2 ++
 2 files changed, 5 insertions(+)

diff --git a/drivers/memory/tegra/mc.c b/drivers/memory/tegra/mc.c
index 43819e8df95c..1bad7f238881 100644
--- a/drivers/memory/tegra/mc.c
+++ b/drivers/memory/tegra/mc.c
@@ -657,6 +657,9 @@ static int tegra_mc_probe(struct platform_device *pdev)
} else
 #endif
{
+   /* ensure that debug features are disabled */
+   mc_writel(mc, 0x, MC_TIMING_CONTROL_DBG);
+
err = tegra_mc_setup_latency_allowance(mc);
if (err < 0) {
dev_err(>dev,
diff --git a/drivers/memory/tegra/mc.h b/drivers/memory/tegra/mc.h
index 410efc4d7e7b..cd52628c2b96 100644
--- a/drivers/memory/tegra/mc.h
+++ b/drivers/memory/tegra/mc.h
@@ -30,6 +30,8 @@
 #define MC_EMEM_ARB_OVERRIDE   0xe8
 #define MC_EMEM_ARB_OVERRIDE_EACK_MASK 0x3
 
+#define MC_TIMING_CONTROL_DBG  0xf8
+
 #define MC_TIMING_CONTROL  0xfc
 #define MC_TIMING_UPDATE   BIT(0)
 
-- 
2.22.0



[PATCH v8 12/15] memory: tegra: Introduce Tegra30 EMC driver

2019-07-22 Thread Dmitry Osipenko
Introduce a driver for the External Memory Controller (EMC) found on Tegra30
chips. It controls the external DRAM on the board. The purpose of this
driver is to program memory timings for external memory on EMC clock
rate changes.

Acked-by: Peter De Schrijver 
Signed-off-by: Dmitry Osipenko 
---
 drivers/memory/tegra/Kconfig   |   10 +
 drivers/memory/tegra/Makefile  |1 +
 drivers/memory/tegra/mc.c  |9 +-
 drivers/memory/tegra/mc.h  |   30 +-
 drivers/memory/tegra/tegra30-emc.c | 1230 
 drivers/memory/tegra/tegra30.c |   42 +
 include/soc/tegra/mc.h |2 +-
 7 files changed, 1309 insertions(+), 15 deletions(-)
 create mode 100644 drivers/memory/tegra/tegra30-emc.c

diff --git a/drivers/memory/tegra/Kconfig b/drivers/memory/tegra/Kconfig
index 4680124ddcab..fbfbaada61a2 100644
--- a/drivers/memory/tegra/Kconfig
+++ b/drivers/memory/tegra/Kconfig
@@ -17,6 +17,16 @@ config TEGRA20_EMC
  This driver is required to change memory timings / clock rate for
  external memory.
 
+config TEGRA30_EMC
+   bool "NVIDIA Tegra30 External Memory Controller driver"
+   default y
+   depends on TEGRA_MC && ARCH_TEGRA_3x_SOC
+   help
+ This driver is for the External Memory Controller (EMC) found on
+ Tegra30 chips. The EMC controls the external DRAM on the board.
+ This driver is required to change memory timings / clock rate for
+ external memory.
+
 config TEGRA124_EMC
bool "NVIDIA Tegra124 External Memory Controller driver"
default y
diff --git a/drivers/memory/tegra/Makefile b/drivers/memory/tegra/Makefile
index 3971a6b7c487..3d23c4261104 100644
--- a/drivers/memory/tegra/Makefile
+++ b/drivers/memory/tegra/Makefile
@@ -11,5 +11,6 @@ tegra-mc-$(CONFIG_ARCH_TEGRA_210_SOC) += tegra210.o
 obj-$(CONFIG_TEGRA_MC) += tegra-mc.o
 
 obj-$(CONFIG_TEGRA20_EMC)  += tegra20-emc.o
+obj-$(CONFIG_TEGRA30_EMC)  += tegra30-emc.o
 obj-$(CONFIG_TEGRA124_EMC) += tegra124-emc.o
 obj-$(CONFIG_ARCH_TEGRA_186_SOC) += tegra186.o
diff --git a/drivers/memory/tegra/mc.c b/drivers/memory/tegra/mc.c
index 3d8d322511c5..43819e8df95c 100644
--- a/drivers/memory/tegra/mc.c
+++ b/drivers/memory/tegra/mc.c
@@ -48,9 +48,6 @@
 #define MC_EMEM_ADR_CFG 0x54
 #define MC_EMEM_ADR_CFG_EMEM_NUMDEV BIT(0)
 
-#define MC_TIMING_CONTROL  0xfc
-#define MC_TIMING_UPDATE   BIT(0)
-
 static const struct of_device_id tegra_mc_of_match[] = {
 #ifdef CONFIG_ARCH_TEGRA_2x_SOC
{ .compatible = "nvidia,tegra20-mc-gart", .data = _mc_soc },
@@ -307,7 +304,7 @@ static int tegra_mc_setup_latency_allowance(struct tegra_mc 
*mc)
return 0;
 }
 
-void tegra_mc_write_emem_configuration(struct tegra_mc *mc, unsigned long rate)
+int tegra_mc_write_emem_configuration(struct tegra_mc *mc, unsigned long rate)
 {
unsigned int i;
struct tegra_mc_timing *timing = NULL;
@@ -322,11 +319,13 @@ void tegra_mc_write_emem_configuration(struct tegra_mc 
*mc, unsigned long rate)
if (!timing) {
dev_err(mc->dev, "no memory timing registered for rate %lu\n",
rate);
-   return;
+   return -EINVAL;
}
 
for (i = 0; i < mc->soc->num_emem_regs; ++i)
mc_writel(mc, timing->emem_data[i], mc->soc->emem_regs[i]);
+
+   return 0;
 }
 
 unsigned int tegra_mc_get_emem_device_count(struct tegra_mc *mc)
diff --git a/drivers/memory/tegra/mc.h b/drivers/memory/tegra/mc.h
index f9353494b708..410efc4d7e7b 100644
--- a/drivers/memory/tegra/mc.h
+++ b/drivers/memory/tegra/mc.h
@@ -6,20 +6,32 @@
 #ifndef MEMORY_TEGRA_MC_H
 #define MEMORY_TEGRA_MC_H
 
+#include 
 #include 
 #include 
 
 #include 
 
-#define MC_INT_DECERR_MTS (1 << 16)
-#define MC_INT_SECERR_SEC (1 << 13)
-#define MC_INT_DECERR_VPR (1 << 12)
-#define MC_INT_INVALID_APB_ASID_UPDATE (1 << 11)
-#define MC_INT_INVALID_SMMU_PAGE (1 << 10)
-#define MC_INT_ARBITRATION_EMEM (1 << 9)
-#define MC_INT_SECURITY_VIOLATION (1 << 8)
-#define MC_INT_INVALID_GART_PAGE (1 << 7)
-#define MC_INT_DECERR_EMEM (1 << 6)
+#define MC_INT_DECERR_MTS  BIT(16)
+#define MC_INT_SECERR_SEC  BIT(13)
+#define MC_INT_DECERR_VPR  BIT(12)
+#define MC_INT_INVALID_APB_ASID_UPDATE BIT(11)
+#define MC_INT_INVALID_SMMU_PAGE   BIT(10)
+#define MC_INT_ARBITRATION_EMEMBIT(9)
+#define MC_INT_SECURITY_VIOLATION  BIT(8)
+#define MC_INT_INVALID_GART_PAGE   BIT(7)
+#define MC_INT_DECERR_EMEM BIT(6)
+
+#define MC_EMEM_ARB_OUTSTANDING_REQ0x94
+#define MC_EMEM_ARB_OUTSTANDING_REQ_MAX_MASK   0x1ff
+#define MC_EMEM_ARB_OUTSTANDING_REQ_HOLDOFF_OVERRIDE   BIT(30)
+#define MC_EMEM_ARB_OUTSTANDING_REQ_LIMIT_ENABLE   BIT(31)
+
+#define MC_EMEM_ARB_OVERRIDE  

[PATCH v8 06/15] memory: tegra20-emc: Print a brief info message about the timings

2019-07-22 Thread Dmitry Osipenko
During boot, print how many memory timings the driver got and what the
RAM code is. This is very useful information when something is wrong with
a board's memory timings.

Suggested-by: Marc Dietrich 
Signed-off-by: Dmitry Osipenko 
---
 drivers/memory/tegra/tegra20-emc.c | 7 +++
 1 file changed, 7 insertions(+)

diff --git a/drivers/memory/tegra/tegra20-emc.c 
b/drivers/memory/tegra/tegra20-emc.c
index 85c24f285fd4..25a6aad6a7a9 100644
--- a/drivers/memory/tegra/tegra20-emc.c
+++ b/drivers/memory/tegra/tegra20-emc.c
@@ -368,6 +368,13 @@ static int tegra_emc_load_timings_from_dt(struct tegra_emc 
*emc,
sort(emc->timings, emc->num_timings, sizeof(*timing), cmp_timings,
 NULL);
 
+   dev_info(emc->dev,
+"got %u timings for RAM code %u (min %luMHz max %luMHz)\n",
+emc->num_timings,
+tegra_read_ram_code(),
+emc->timings[0].rate / 1000000,
+emc->timings[emc->num_timings - 1].rate / 1000000);
+
return 0;
 }
 
-- 
2.22.0
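
For readers who want to try the message format outside the kernel, here is a
minimal user-space sketch of what the patch above prints. It is not the driver
code: struct toy_timing, toy_cmp_timings() and toy_format_summary() are
hypothetical names, and it assumes rates are stored in Hz, so the MHz values
are obtained by dividing by 1000000.

```c
#include <assert.h>
#include <stdio.h>
#include <stdlib.h>

/* Hypothetical stand-in for the driver's per-rate timing entry. */
struct toy_timing {
	unsigned long rate;	/* EMC clock rate in Hz (assumption) */
};

/* Ascending-by-rate comparator for qsort(). */
static int toy_cmp_timings(const void *a, const void *b)
{
	const struct toy_timing *ta = a, *tb = b;

	if (ta->rate < tb->rate)
		return -1;
	if (ta->rate > tb->rate)
		return 1;
	return 0;
}

/*
 * Sort the table by rate and format a one-line summary into buf.
 * After sorting, the first entry is the minimum rate and the last
 * entry is the maximum. Returns whatever snprintf() returns.
 */
static int toy_format_summary(struct toy_timing *timings, unsigned int num,
			      unsigned int ram_code, char *buf, size_t len)
{
	assert(num > 0);

	qsort(timings, num, sizeof(*timings), toy_cmp_timings);

	return snprintf(buf, len,
			"got %u timings for RAM code %u (min %luMHz max %luMHz)",
			num, ram_code,
			timings[0].rate / 1000000,
			timings[num - 1].rate / 1000000);
}
```

The min/max values only make sense because the table is sorted first, which
is why the real message is printed right after the sort() call.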



[PATCH v8 15/15] ARM: dts: tegra30: Add External Memory Controller node

2019-07-22 Thread Dmitry Osipenko
Add External Memory Controller node to the device-tree.

Acked-by: Peter De Schrijver 
Signed-off-by: Dmitry Osipenko 
---
 arch/arm/boot/dts/tegra30.dtsi | 9 +
 1 file changed, 9 insertions(+)

diff --git a/arch/arm/boot/dts/tegra30.dtsi b/arch/arm/boot/dts/tegra30.dtsi
index e074258d4518..8355264e2265 100644
--- a/arch/arm/boot/dts/tegra30.dtsi
+++ b/arch/arm/boot/dts/tegra30.dtsi
@@ -732,6 +732,15 @@
#reset-cells = <1>;
};
 
+   memory-controller@7000f400 {
+   compatible = "nvidia,tegra30-emc";
+   reg = <0x7000f400 0x400>;
+   interrupts = ;
+   clocks = <_car TEGRA30_CLK_EMC>;
+
+   nvidia,memory-controller = <>;
+   };
+
fuse@7000f800 {
compatible = "nvidia,tegra30-efuse";
reg = <0x7000f800 0x400>;
-- 
2.22.0



[PATCH v8 02/15] memory: tegra20-emc: Drop setting EMC rate to max on probe

2019-07-22 Thread Dmitry Osipenko
The memory frequency scaling will be managed by the tegra20-devfreq driver
and PM QoS once all the prerequisite patches are upstreamed.
The parent clock is now managed by the clock driver, and we should also
assume that the PLLM rate can't be changed on some devices (Galaxy Tab 10.1,
for example). Altogether, there is no point in touching the clock's rate
from the EMC driver.

Acked-by: Peter De Schrijver 
Signed-off-by: Dmitry Osipenko 
---
 drivers/memory/tegra/tegra20-emc.c | 78 +-
 1 file changed, 1 insertion(+), 77 deletions(-)

diff --git a/drivers/memory/tegra/tegra20-emc.c 
b/drivers/memory/tegra/tegra20-emc.c
index 9ee5bef49e47..da8fa592b071 100644
--- a/drivers/memory/tegra/tegra20-emc.c
+++ b/drivers/memory/tegra/tegra20-emc.c
@@ -137,9 +137,6 @@ struct tegra_emc {
struct device *dev;
struct completion clk_handshake_complete;
struct notifier_block clk_nb;
-   struct clk *backup_clk;
-   struct clk *emc_mux;
-   struct clk *pll_m;
struct clk *clk;
void __iomem *regs;
 
@@ -424,41 +421,6 @@ static int emc_setup_hw(struct tegra_emc *emc)
return 0;
 }
 
-static int emc_init(struct tegra_emc *emc, unsigned long rate)
-{
-   int err;
-
-   err = clk_set_parent(emc->emc_mux, emc->backup_clk);
-   if (err) {
-   dev_err(emc->dev,
-   "failed to reparent to backup source: %d\n", err);
-   return err;
-   }
-
-   err = clk_set_rate(emc->pll_m, rate);
-   if (err) {
-   dev_err(emc->dev,
-   "failed to change pll_m rate: %d\n", err);
-   return err;
-   }
-
-   err = clk_set_parent(emc->emc_mux, emc->pll_m);
-   if (err) {
-   dev_err(emc->dev,
-   "failed to reparent to pll_m: %d\n", err);
-   return err;
-   }
-
-   err = clk_set_rate(emc->clk, rate);
-   if (err) {
-   dev_err(emc->dev,
-   "failed to change emc rate: %d\n", err);
-   return err;
-   }
-
-   return 0;
-}
-
 static int tegra_emc_probe(struct platform_device *pdev)
 {
struct device_node *np;
@@ -522,52 +484,14 @@ static int tegra_emc_probe(struct platform_device *pdev)
return err;
}
 
-   emc->pll_m = clk_get_sys(NULL, "pll_m");
-   if (IS_ERR(emc->pll_m)) {
-   err = PTR_ERR(emc->pll_m);
-   dev_err(>dev, "failed to get pll_m clock: %d\n", err);
-   return err;
-   }
-
-   emc->backup_clk = clk_get_sys(NULL, "pll_p");
-   if (IS_ERR(emc->backup_clk)) {
-   err = PTR_ERR(emc->backup_clk);
-   dev_err(>dev, "failed to get pll_p clock: %d\n", err);
-   goto put_pll_m;
-   }
-
-   emc->emc_mux = clk_get_parent(emc->clk);
-   if (IS_ERR(emc->emc_mux)) {
-   err = PTR_ERR(emc->emc_mux);
-   dev_err(>dev, "failed to get emc_mux clock: %d\n", err);
-   goto put_backup;
-   }
-
err = clk_notifier_register(emc->clk, >clk_nb);
if (err) {
dev_err(>dev, "failed to register clk notifier: %d\n",
err);
-   goto put_backup;
-   }
-
-   /* set DRAM clock rate to maximum */
-   err = emc_init(emc, emc->timings[emc->num_timings - 1].rate);
-   if (err) {
-   dev_err(>dev, "failed to initialize EMC clock rate: %d\n",
-   err);
-   goto unreg_notifier;
+   return err;
}
 
return 0;
-
-unreg_notifier:
-   clk_notifier_unregister(emc->clk, >clk_nb);
-put_backup:
-   clk_put(emc->backup_clk);
-put_pll_m:
-   clk_put(emc->pll_m);
-
-   return err;
 }
 
 static const struct of_device_id tegra_emc_of_match[] = {
-- 
2.22.0



[PATCH v8 14/15] memory: tegra: Consolidate registers definition into common header

2019-07-22 Thread Dmitry Osipenko
The Memory Controller register definitions are sparse and duplicated;
let's consolidate everything into a common place for consistency.

Acked-by: Peter De Schrijver 
Signed-off-by: Dmitry Osipenko 
---
 drivers/memory/tegra/mc.c   | 30 ---
 drivers/memory/tegra/mc.h   | 52 +
 drivers/memory/tegra/tegra124.c | 20 -
 drivers/memory/tegra/tegra30.c  | 19 
 4 files changed, 47 insertions(+), 74 deletions(-)

diff --git a/drivers/memory/tegra/mc.c b/drivers/memory/tegra/mc.c
index 1bad7f238881..955f1d3f6b6a 100644
--- a/drivers/memory/tegra/mc.c
+++ b/drivers/memory/tegra/mc.c
@@ -18,36 +18,6 @@
 
 #include "mc.h"
 
-#define MC_INTSTATUS 0x000
-
-#define MC_INTMASK 0x004
-
-#define MC_ERR_STATUS 0x08
-#define  MC_ERR_STATUS_TYPE_SHIFT 28
-#define  MC_ERR_STATUS_TYPE_INVALID_SMMU_PAGE (6 << MC_ERR_STATUS_TYPE_SHIFT)
-#define  MC_ERR_STATUS_TYPE_MASK (0x7 << MC_ERR_STATUS_TYPE_SHIFT)
-#define  MC_ERR_STATUS_READABLE (1 << 27)
-#define  MC_ERR_STATUS_WRITABLE (1 << 26)
-#define  MC_ERR_STATUS_NONSECURE (1 << 25)
-#define  MC_ERR_STATUS_ADR_HI_SHIFT 20
-#define  MC_ERR_STATUS_ADR_HI_MASK 0x3
-#define  MC_ERR_STATUS_SECURITY (1 << 17)
-#define  MC_ERR_STATUS_RW (1 << 16)
-
-#define MC_ERR_ADR 0x0c
-
-#define MC_GART_ERROR_REQ  0x30
-#define MC_DECERR_EMEM_OTHERS_STATUS   0x58
-#define MC_SECURITY_VIOLATION_STATUS   0x74
-
-#define MC_EMEM_ARB_CFG 0x90
-#define  MC_EMEM_ARB_CFG_CYCLES_PER_UPDATE(x)  (((x) & 0x1ff) << 0)
-#define  MC_EMEM_ARB_CFG_CYCLES_PER_UPDATE_MASK0x1ff
-#define MC_EMEM_ARB_MISC0 0xd8
-
-#define MC_EMEM_ADR_CFG 0x54
-#define MC_EMEM_ADR_CFG_EMEM_NUMDEV BIT(0)
-
 static const struct of_device_id tegra_mc_of_match[] = {
 #ifdef CONFIG_ARCH_TEGRA_2x_SOC
{ .compatible = "nvidia,tegra20-mc-gart", .data = _mc_soc },
diff --git a/drivers/memory/tegra/mc.h b/drivers/memory/tegra/mc.h
index cd52628c2b96..957c6eb74ff9 100644
--- a/drivers/memory/tegra/mc.h
+++ b/drivers/memory/tegra/mc.h
@@ -12,6 +12,37 @@
 
 #include 
 
+#define MC_INTSTATUS   0x00
+#define MC_INTMASK 0x04
+#define MC_ERR_STATUS  0x08
+#define MC_ERR_ADR 0x0c
+#define MC_GART_ERROR_REQ  0x30
+#define MC_EMEM_ADR_CFG0x54
+#define MC_DECERR_EMEM_OTHERS_STATUS   0x58
+#define MC_SECURITY_VIOLATION_STATUS   0x74
+#define MC_EMEM_ARB_CFG0x90
+#define MC_EMEM_ARB_OUTSTANDING_REQ0x94
+#define MC_EMEM_ARB_TIMING_RCD 0x98
+#define MC_EMEM_ARB_TIMING_RP  0x9c
+#define MC_EMEM_ARB_TIMING_RC  0xa0
+#define MC_EMEM_ARB_TIMING_RAS 0xa4
+#define MC_EMEM_ARB_TIMING_FAW 0xa8
+#define MC_EMEM_ARB_TIMING_RRD 0xac
+#define MC_EMEM_ARB_TIMING_RAP2PRE 0xb0
+#define MC_EMEM_ARB_TIMING_WAP2PRE 0xb4
+#define MC_EMEM_ARB_TIMING_R2R 0xb8
+#define MC_EMEM_ARB_TIMING_W2W 0xbc
+#define MC_EMEM_ARB_TIMING_R2W 0xc0
+#define MC_EMEM_ARB_TIMING_W2R 0xc4
+#define MC_EMEM_ARB_DA_TURNS   0xd0
+#define MC_EMEM_ARB_DA_COVERS  0xd4
+#define MC_EMEM_ARB_MISC0  0xd8
+#define MC_EMEM_ARB_MISC1  0xdc
+#define MC_EMEM_ARB_RING1_THROTTLE 0xe0
+#define MC_EMEM_ARB_OVERRIDE   0xe8
+#define MC_TIMING_CONTROL_DBG  0xf8
+#define MC_TIMING_CONTROL  0xfc
+
 #define MC_INT_DECERR_MTS  BIT(16)
 #define MC_INT_SECERR_SEC  BIT(13)
 #define MC_INT_DECERR_VPR  BIT(12)
@@ -22,17 +53,28 @@
 #define MC_INT_INVALID_GART_PAGE   BIT(7)
 #define MC_INT_DECERR_EMEM BIT(6)
 
-#define MC_EMEM_ARB_OUTSTANDING_REQ0x94
+#define MC_ERR_STATUS_TYPE_SHIFT   28
+#define MC_ERR_STATUS_TYPE_INVALID_SMMU_PAGE   (0x6 << 28)
+#define MC_ERR_STATUS_TYPE_MASK(0x7 << 28)
+#define MC_ERR_STATUS_READABLE BIT(27)
+#define MC_ERR_STATUS_WRITABLE BIT(26)
+#define MC_ERR_STATUS_NONSECUREBIT(25)
+#define MC_ERR_STATUS_ADR_HI_SHIFT 20
+#define MC_ERR_STATUS_ADR_HI_MASK  0x3
+#define MC_ERR_STATUS_SECURITY BIT(17)
+#define MC_ERR_STATUS_RW   BIT(16)
+
+#define 

[PATCH v8 11/15] dt-bindings: memory: Add binding for NVIDIA Tegra30 External Memory Controller

2019-07-22 Thread Dmitry Osipenko
Add device-tree binding for NVIDIA Tegra30 External Memory Controller.
The binding is based on the Tegra124 EMC binding since the hardware is
similar, although there are a couple of significant differences.

Note that the memory timing description is given in a platform-specific
form because there is no detailed information on how to convert
common DDR timings into the register values. The timing format is
borrowed from the downstream kernel, hence there is no hurdle to
upstreaming memory timings for the boards.

Acked-by: Peter De Schrijver 
Signed-off-by: Dmitry Osipenko 
---
 .../nvidia,tegra30-emc.yaml   | 341 ++
 1 file changed, 341 insertions(+)
 create mode 100644 
Documentation/devicetree/bindings/memory-controllers/nvidia,tegra30-emc.yaml

diff --git 
a/Documentation/devicetree/bindings/memory-controllers/nvidia,tegra30-emc.yaml 
b/Documentation/devicetree/bindings/memory-controllers/nvidia,tegra30-emc.yaml
new file mode 100644
index ..6865cfb16e59
--- /dev/null
+++ 
b/Documentation/devicetree/bindings/memory-controllers/nvidia,tegra30-emc.yaml
@@ -0,0 +1,341 @@
+# SPDX-License-Identifier: (GPL-2.0)
+%YAML 1.2
+---
+$id: http://devicetree.org/schemas/memory-controllers/nvidia,tegra30-emc.yaml#
+$schema: http://devicetree.org/meta-schemas/core.yaml#
+
+title: NVIDIA Tegra30 SoC External Memory Controller
+
+maintainers:
+  - Dmitry Osipenko 
+  - Jon Hunter 
+  - Thierry Reding 
+
+description: |
+  The EMC interfaces with the off-chip SDRAM to service the request stream
+  sent from Memory Controller. The EMC also has various performance-affecting
+  settings beyond the obvious SDRAM configuration parameters and initialization
+  settings. Tegra30 EMC supports multiple JEDEC standard protocols: LPDDR2,
+  LPDDR3, and DDR3.
+
+properties:
+  compatible:
+const: nvidia,tegra30-emc
+
+  reg:
+maxItems: 1
+description:
+  Physical base address.
+
+  clocks:
+maxItems: 1
+description:
+  EMC clock.
+
+  interrupts:
+maxItems: 1
+description:
+  EMC General interrupt.
+
+  nvidia,memory-controller:
+$ref: /schemas/types.yaml#/definitions/phandle
+description:
+  Phandle of the Memory Controller node.
+
+patternProperties:
+  "^emc-timings-[0-9]+$":
+type: object
+properties:
+  nvidia,ram-code:
+$ref: /schemas/types.yaml#/definitions/uint32
+description:
+  Value of RAM_CODE this timing set is used for.
+
+patternProperties:
+  "^timing-[0-9]+$":
+type: object
+properties:
+  clock-frequency:
+description:
+  Memory clock rate in Hz.
+minimum: 100
+maximum: 9
+
+  nvidia,emc-auto-cal-interval:
+$ref: /schemas/types.yaml#/definitions/uint32
+description:
+  Pad calibration interval.
+
+  nvidia,emc-mode-1:
+$ref: /schemas/types.yaml#/definitions/uint32
+description:
+  Mode Register 1.
+
+  nvidia,emc-mode-2:
+$ref: /schemas/types.yaml#/definitions/uint32
+description:
+  Mode Register 2.
+
+  nvidia,emc-mode-reset:
+$ref: /schemas/types.yaml#/definitions/uint32
+description:
+  Mode Register 0.
+
+  nvidia,emc-zcal-cnt-long:
+$ref: /schemas/types.yaml#/definitions/uint32
+description:
+  Number of EMC clocks to wait before issuing any commands after
+  sending ZCAL_MRW_CMD.
+
+  nvidia,emc-cfg-dyn-self-ref:
+$ref: /schemas/types.yaml#/definitions/uint32
+description:
+  Dynamic self-refresh enabled.
+
+  nvidia,emc-cfg-periodic-qrst:
+$ref: /schemas/types.yaml#/definitions/uint32
+description:
+  FBIO "read" FIFO periodic resetting enabled.
+
+  nvidia,emc-configuration:
+$ref: /schemas/types.yaml#/definitions/uint32-array
+description:
+  EMC timing characterization data. These are the registers
+  (see section "18.13.2 EMC Registers" in the TRM) whose values
+  need to be specified, according to the board documentation.
+items:
+  - description: EMC_RC
+  - description: EMC_RFC
+  - description: EMC_RAS
+  - description: EMC_RP
+  - description: EMC_R2W
+  - description: EMC_W2R
+  - description: EMC_R2P
+  - description: EMC_W2P
+  - description: EMC_RD_RCD
+  - description: EMC_WR_RCD
+  - description: EMC_RRD
+  - description: EMC_REXT
+  - description: EMC_WEXT
+  - description: EMC_WDV
+  - description: EMC_QUSE
+  - description: EMC_QRST
+  - description: EMC_QSAFE
+   

[PATCH v8 04/15] memory: tegra20-emc: Include io.h instead of iopoll.h

2019-07-22 Thread Dmitry Osipenko
The register polling code is gone, but the corresponding header change was
missed. Fix it up for consistency.

Acked-by: Peter De Schrijver 
Signed-off-by: Dmitry Osipenko 
---
 drivers/memory/tegra/tegra20-emc.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/drivers/memory/tegra/tegra20-emc.c 
b/drivers/memory/tegra/tegra20-emc.c
index b519f02b0ee9..1ce351dd5461 100644
--- a/drivers/memory/tegra/tegra20-emc.c
+++ b/drivers/memory/tegra/tegra20-emc.c
@@ -10,7 +10,7 @@
 #include 
 #include 
 #include 
-#include 
+#include 
 #include 
 #include 
 #include 
-- 
2.22.0



[PATCH v8 00/15] memory: tegra: Introduce Tegra30 EMC driver

2019-07-22 Thread Dmitry Osipenko
Hello,

This series introduces a driver for the External Memory Controller (EMC)
found on Tegra30 chips. The EMC controls the external DRAM on the board. The
purpose of this driver is to program the memory timings for external memory
when the EMC clock rate changes. The driver was tested using the ACTMON devfreq
driver, which performs memory frequency scaling based on memory-usage load.

Changelog:

v8: - Added two new patches:

memory: tegra20-emc: Increase handshake timeout
memory: tegra20-emc: wait_for_completion_timeout() doesn't return error

  It turned out that the memory-clk handshake may take much more time under
  some circumstances. The second patch is a minor cleanup. The same
  changes are also applied to the Tegra30 EMC driver addition patch.

  The pattern-properties of YAML bindings gained "type: object", for
  consistency.

v7: - Addressed review comments that were made by Rob Herring to v6 by
  removing the old Tegra30 Memory Controller binding once it's converted
  to YAML, by using explicit patterns for the sub-nodes and specifying
  min/max clock rates in the YAML.

- Two patches that were added in v6 are removed from the series:

clk: tegra20: emc: Add tegra20_clk_emc_on_pllp()
ARM: tegra30: cpuidle: Don't enter LP2 on CPU0 when EMC runs off PLLP

  This is because the problem with the PLLP is resolved now; it turned out
  to be a bug in the CPU-suspend code.

- The "Introduce Tegra30 EMC driver" patch got a fix for the "Same Freq"
  bit typo: it's bit 27, not 16.

v6: - Tegra124 Memory Controller binding factored out into a standalone
  binding because it requires specifying MC_EMEM_ARB_MISC1 for EMEM
  programming, which is not required for Tegra30. This makes the
  upstream MC registers specification match downstream exactly,
  easing the porting of boards' memory timing configuration to upstream.

- Tegra30/124 Memory Controller binding converted to YAML.

- Tegra30 External Memory Controller binding now is in YAML format.

- Added a workaround for hanging during LP2 when EMC runs off PLLP on
  Tegra30 in these new patches:

clk: tegra20: emc: Add tegra20_clk_emc_on_pllp()
ARM: tegra30: cpuidle: Don't enter LP2 on CPU0 when EMC runs off PLLP

- Added an info message to the Tegra20/30 EMC drivers, reporting the
  RAM code and the number of available timings:

memory: tegra20-emc: Print a brief info message about the timings

v5: - Addressed review comments that were made by Thierry Reding to v4 by
  adding appropriate copyrights to the source code headers and making
  the Tegra30 EMC driver use the common Tegra20 CLK API directly instead
  of having dummy proxy functions specifically for Tegra30.

- Addressed review comments that were made by Stephen Boyd to v4 by
  rewording commit message of the "Add custom EMC clock implementation"
  patch and adding clarifying comment (to that patch as well) which
  tells why EMC is a critical clock.

- Added suspend-resume to Tegra30 EMC driver to error out if EMC driver
  is in a "bad state" as it will likely cause a hang on entering suspend.

- Dropped patch "tegra20-emc: Replace clk_get_sys with devm_clk_get"
  because the replaced clocks should actually be removed altogether
  in the "Drop setting EMC rate to max on probe" patch, and that was
  missed by accident.

- Added "tegra20-emc: Pre-configure debug register" patch which ensures
  that inappropriate HW debug features are disabled at probe time.
  The same change is also made in the "Introduce Tegra30 EMC driver"
  patch.

- Added ACKs to the patches from Peter De Schrijver that he gave to v4
  since all of the v5 changes are actually very minor.

v4: - Addressed review comments that were made by Peter De Schrijver to v3
  by adding fence_udelay() after writes in the "Add custom EMC clock
  implementation" patch.

- Added two new minor patches:

memory: tegra: Ensure timing control debug features are disabled
memory: tegra: Consolidate registers definition into one place

  The first one is needed to ensure that EMC driver will work
  properly regardless of hardware configuration left after boot.
  The second patch is just a minor code cleanup.

- The "Introduce Tegra30 EMC driver" patch also got a few very minor changes.
  Now every possible error case is handled; nothing is ignored.
  The EMC_DBG register is explicitly initialized during probe to be
  on the safe side.

v3: - Addressed review comments that were made by Stephen Boyd to v2 by
  adding explicit typing for the callback variable, by including
  "clk-provider.h" directly in the code and by dropping __clk_lookup
  usage where possible.

- Added more patches into this series:

memory: tegra20-emc: Drop setting EMC rate to max on probe
memory: tegra20-emc: Adapt 

Re: [PATCH 1/2] printk/panic: Access the main printk log in panic() only when safe

2019-07-22 Thread Sergey Senozhatsky
On (07/19/19 14:57), Petr Mladek wrote:
[..]
> > Where do nested printk()-s come from? Which one of the following
> > scenarios you cover in commit message:
> > 
> > scenario 1
> > 
> > - we have CPUB which holds logbuf_lock
> > - we have CPUA which panic()-s the system, but can't bring CPUB down,
> >   so logbuf_lock stays locked on remote CPU
> 
> No, this scenario is not affected by this patch. It would always lead to
> a deadlock.

Agreed, in many cases we won't be able to panic() the system properly,
deadlocking somewhere in smp_send_stop().

> > scenario 2
> > 
> > - we have CPUA which holds logbuf_lock
> > - we have panic() on CPUA, but it cannot bring down some other CPUB
> >   so logbuf_lock stays locked on local CPU, and it cannot re-init
> >   logbuf.
[..]
>   + Before:
>   + printk_safe_flush_on_panic() will keep logbuf_lock locked
>   and do nothing.
> 
>   + kmsg_dump(), console_unblank(), or console_flush_on_panic()
>   will deadlock when they try to get logbuf_lock(). They will
>   not be able to process any single line.
> 
>   + After:
>   + printk_bust_lock_safe() will keep logbuf_lock locked
> 
>   + All functions using logbuf_lock will not get called.
>   We will not see the messages (as previously) but the
>   system will not deadlock.
> 
> 
> But there is one more scenario 3:

Yes!

>   - we have CPUB which loops or is deadlocked in IRQ context
> 
>   - we have CPUA which panic()-s the system, but can't bring CPUB down,
> so logbuf_lock might be takes and release from time to time
> by CPUB

Great!

This is the only case when we actually need to pay attention to
num_online_cpus(), because there is an active logbuf_lock owner;
in any other case we can unconditionally re-init printk() locks.

But there is more to it.

Note that the problem in scenario 3 is bigger than just logbuf_lock.
Regardless of the logbuf implementation, we will not be able to panic()
the system.

If we have a never-ending source of printk() messages coming from a
misbehaving CPU which is stuck in a printing loop in IRQ context, then
flush_on_panic() will never end or kmsg dump will never stop, etc.
We need to cut off misbehaving CPUs. The panic CPU waits (for up to 1
second?) in smp_send_stop() for secondary CPUs to die; if some
secondary CPUs are still chatty after that, then most likely those
CPUs don't have anything good to say, just a pointless flood of the same
messages over and over again, which, however, will not let the panic
CPU proceed.

And this is where the idea of "disconnecting" those CPUs from the main
logbuf comes from.

So what we can do:
- smp_send_stop()
- disconnect all-but-self from logbuf (via printk-safe)
- wait for 1 or 2 more extra seconds for secondary CPUs to leave
  console_unlock() and to redirect printks to per-CPU buffers
- after that we are sort of good-to-go: re-init printk locks
  and do kmsg_dump, flush_on_panic().
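
To make the steps above concrete, here is a toy user-space model of the
"disconnect" idea. It is only a sketch under stated assumptions, not kernel
code: struct toy_log, toy_printk() and toy_panic_takeover() are hypothetical
names, a plain bool stands in for logbuf_lock, and the extra wait for CPUs to
leave console_unlock() is reduced to a comment.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdio.h>
#include <string.h>

#define TOY_NR_CPUS	4
#define TOY_BUF_SIZE	256

/* Hypothetical model: one shared log plus one private buffer per CPU. */
struct toy_log {
	char main_buf[TOY_BUF_SIZE];
	char percpu_buf[TOY_NR_CPUS][TOY_BUF_SIZE];
	bool disconnected[TOY_NR_CPUS];
	bool main_locked;	/* stands in for logbuf_lock */
};

/* Append msg to a NUL-terminated buffer, truncating if it is full. */
static void toy_append(char *buf, const char *msg)
{
	size_t used = strlen(buf);

	snprintf(buf + used, TOY_BUF_SIZE - used, "%s", msg);
}

/* Messages from disconnected CPUs never touch the shared log. */
static void toy_printk(struct toy_log *log, int cpu, const char *msg)
{
	assert(cpu >= 0 && cpu < TOY_NR_CPUS);

	if (log->disconnected[cpu])
		toy_append(log->percpu_buf[cpu], msg);
	else
		toy_append(log->main_buf, msg);
}

/* Panic path: cut off every other CPU, then take over the shared log. */
static void toy_panic_takeover(struct toy_log *log, int panic_cpu)
{
	int cpu;

	for (cpu = 0; cpu < TOY_NR_CPUS; cpu++)
		log->disconnected[cpu] = (cpu != panic_cpu);

	/* The proposal waits here for other CPUs to leave console_unlock(). */

	/* In this model no disconnected CPU can take the lock again. */
	log->main_locked = false;
}
```

In this model a flooding CPU keeps filling only its own buffer after the
takeover, so the panic CPU can dump the shared log without competing for it.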

Note that misbehaving CPUs will write to per-CPU buffers; they are not
expected to be able to flush per-CPU buffers to the main logbuf. That
would require enabled IRQs, which should deliver the stop IPI. But we can
do even more: just disable printk-safe irq work on disconnected CPUs.

So, shall we try one more time with the "disconnect" misbehaving CPUs
approach? I can send an RFC patch.

-ss


RE: linux-5.2.2/drivers/net/wireless/realtek/rtlwifi/rtl8*/sw.c: many redundant assignments ?

2019-07-22 Thread Pkshih


> -Original Message-
> From: David Binderman [mailto:dcb...@hotmail.com]
> Sent: Monday, July 22, 2019 4:12 PM
> To: Pkshih; kv...@codeaurora.org; da...@davemloft.net; 
> linux-wirel...@vger.kernel.org;
> net...@vger.kernel.org; linux-kernel@vger.kernel.org
> Subject: linux-5.2.2/drivers/net/wireless/realtek/rtlwifi/rtl8*/sw.c: many 
> redundant assignments ?
> 
> Hello there,
> 
> > [linux-5.2.2/drivers/net/wireless/realtek/rtlwifi/rtl8188ee/sw.c:120]: 
> > (warning) Redundant
> assignment of 'rtlpriv->cfg->mod_params->disable_watchdog' to itself.
> > [linux-5.2.2/drivers/net/wireless/realtek/rtlwifi/rtl8723ae/sw.c:134]: 
> > (warning) Redundant
> assignment of 'rtlpriv->cfg->mod_params->disable_watchdog' to itself.
> > [linux-5.2.2/drivers/net/wireless/realtek/rtlwifi/rtl8723be/sw.c:133]: 
> > (warning) Redundant
> assignment of 'rtlpriv->cfg->mod_params->disable_watchdog' to itself.
> > [linux-5.2.2/drivers/net/wireless/realtek/rtlwifi/rtl8821ae/sw.c:150]: 
> > (warning) Redundant
> assignment of 'rtlpriv->cfg->mod_params->disable_watchdog' to itself.
> > [linux-5.2.2/drivers/net/wireless/realtek/rtlwifi/rtl8188ee/sw.c:118]: 
> > (warning) Redundant
> assignment of 'rtlpriv->cfg->mod_params->sw_crypto' to itself.
> > [linux-5.2.2/drivers/net/wireless/realtek/rtlwifi/rtl8192ce/sw.c:116]: 
> > (warning) Redundant
> assignment of 'rtlpriv->cfg->mod_params->sw_crypto' to itself.
> > [linux-5.2.2/drivers/net/wireless/realtek/rtlwifi/rtl8192cu/sw.c:42]: 
> > (warning) Redundant
> assignment of 'rtlpriv->cfg->mod_params->sw_crypto' to itself.
> > [linux-5.2.2/drivers/net/wireless/realtek/rtlwifi/rtl8192se/sw.c:164]: 
> > (warning) Redundant
> assignment of 'rtlpriv->cfg->mod_params->sw_crypto' to itself.
> > [linux-5.2.2/drivers/net/wireless/realtek/rtlwifi/rtl8723ae/sw.c:132]: 
> > (warning) Redundant
> assignment of 'rtlpriv->cfg->mod_params->sw_crypto' to itself.
> > [linux-5.2.2/drivers/net/wireless/realtek/rtlwifi/rtl8723be/sw.c:131]: 
> > (warning) Redundant
> assignment of 'rtlpriv->cfg->mod_params->sw_crypto' to itself.
> > [linux-5.2.2/drivers/net/wireless/realtek/rtlwifi/rtl8821ae/sw.c:148]: 
> > (warning) Redundant
> assignment of 'rtlpriv->cfg->mod_params->sw_crypto' to itself.
> 
> Might be worth a look
> 

I sent a patch to fix it.
https://patchwork.kernel.org/patch/11053745/

Thank you.

---
PK



Re: [PATCH 02/10] ARM: dts: imx6ul: segin: Add boot media to dts filename

2019-07-22 Thread Shawn Guo
On Tue, Jul 09, 2019 at 09:19:19AM +0200, Stefan Riedmueller wrote:
> There is now a PHYTEC phyCORE-i.MX 6UL with eMMC instead of NAND flash
> available. The dts filename needs to reflect that to differentiate both.
> 
> Signed-off-by: Stefan Riedmueller 
> ---
>  arch/arm/boot/dts/Makefile   | 2 +-
>  ...l-phytec-segin-ff-rdk.dts => imx6ul-phytec-segin-ff-rdk-nand.dts} | 5 
> +++--
>  2 files changed, 4 insertions(+), 3 deletions(-)
>  rename arch/arm/boot/dts/{imx6ul-phytec-segin-ff-rdk.dts => 
> imx6ul-phytec-segin-ff-rdk-nand.dts} (85%)
> 
> diff --git a/arch/arm/boot/dts/Makefile b/arch/arm/boot/dts/Makefile
> index e1924b06f3cb..668b57c8cc57 100644
> --- a/arch/arm/boot/dts/Makefile
> +++ b/arch/arm/boot/dts/Makefile
> @@ -573,7 +573,7 @@ dtb-$(CONFIG_SOC_IMX6UL) += \
>   imx6ul-opos6uldev.dtb \
>   imx6ul-pico-hobbit.dtb \
>   imx6ul-pico-pi.dtb \
> - imx6ul-phytec-segin-ff-rdk.dtb \
> + imx6ul-phytec-segin-ff-rdk-nand.dtb \
>   imx6ul-tx6ul-0010.dtb \
>   imx6ul-tx6ul-0011.dtb \
>   imx6ul-tx6ul-mainboard.dtb \
> diff --git a/arch/arm/boot/dts/imx6ul-phytec-segin-ff-rdk.dts 
> b/arch/arm/boot/dts/imx6ul-phytec-segin-ff-rdk-nand.dts
> similarity index 85%
> rename from arch/arm/boot/dts/imx6ul-phytec-segin-ff-rdk.dts
> rename to arch/arm/boot/dts/imx6ul-phytec-segin-ff-rdk-nand.dts
> index 1e59183a2f7c..dc06029c5701 100644
> --- a/arch/arm/boot/dts/imx6ul-phytec-segin-ff-rdk.dts
> +++ b/arch/arm/boot/dts/imx6ul-phytec-segin-ff-rdk-nand.dts
> @@ -10,8 +10,9 @@
>  #include "imx6ul-phytec-segin-peb-eval-01.dtsi"
>  
>  / {
> - model = "PHYTEC phyBOARD-Segin i.MX6 UltraLite Full Featured";
> - compatible = "phytec,imx6ul-pbacd10", "phytec,imx6ul-pcl063", 
> "fsl,imx6ul";
> + model = "PHYTEC phyBOARD-Segin i.MX6 UltraLite Full Featured with NAND";
> + compatible = "phytec,imx6ul-pbacd10-nand", "phytec,imx6ul-pbacd10",

The board compatibles need to be documented.

Shawn

> +  "phytec,imx6ul-pcl063", "fsl,imx6ul";
>  };
>  
>   {
> -- 
> 2.7.4
> 
> 


[PATCH 7/8] pipe: add pipe_buf_get() helper

2019-07-22 Thread Ajay Kaher
From: Miklos Szeredi 

commit 7bf2d1df80822ec056363627e2014990f068f7aa upstream.

Signed-off-by: Miklos Szeredi 
Signed-off-by: Al Viro 
Signed-off-by: Ajay Kaher 
Reviewed-by: Srivatsa S. Bhat (VMware) 
---
 fs/fuse/dev.c |  2 +-
 fs/splice.c   |  4 ++--
 include/linux/pipe_fs_i.h | 11 +++
 3 files changed, 14 insertions(+), 3 deletions(-)

diff --git a/fs/fuse/dev.c b/fs/fuse/dev.c
index f5d2d23..36a5df9 100644
--- a/fs/fuse/dev.c
+++ b/fs/fuse/dev.c
@@ -2052,7 +2052,7 @@ static ssize_t fuse_dev_splice_write(struct 
pipe_inode_info *pipe,
pipe->curbuf = (pipe->curbuf + 1) & (pipe->buffers - 1);
pipe->nrbufs--;
} else {
-   ibuf->ops->get(pipe, ibuf);
+   pipe_buf_get(pipe, ibuf);
*obuf = *ibuf;
obuf->flags &= ~PIPE_BUF_FLAG_GIFT;
obuf->len = rem;
diff --git a/fs/splice.c b/fs/splice.c
index 8398974..fde1263 100644
--- a/fs/splice.c
+++ b/fs/splice.c
@@ -1876,7 +1876,7 @@ retry:
 * Get a reference to this pipe buffer,
 * so we can copy the contents over.
 */
-   ibuf->ops->get(ipipe, ibuf);
+   pipe_buf_get(ipipe, ibuf);
*obuf = *ibuf;
 
/*
@@ -1948,7 +1948,7 @@ static int link_pipe(struct pipe_inode_info *ipipe,
 * Get a reference to this pipe buffer,
 * so we can copy the contents over.
 */
-   ibuf->ops->get(ipipe, ibuf);
+   pipe_buf_get(ipipe, ibuf);
 
obuf = opipe->bufs + nbuf;
*obuf = *ibuf;
diff --git a/include/linux/pipe_fs_i.h b/include/linux/pipe_fs_i.h
index 24f5470..10876f3 100644
--- a/include/linux/pipe_fs_i.h
+++ b/include/linux/pipe_fs_i.h
@@ -115,6 +115,17 @@ struct pipe_buf_operations {
void (*get)(struct pipe_inode_info *, struct pipe_buffer *);
 };
 
+/**
+ * pipe_buf_get - get a reference to a pipe_buffer
+ * @pipe:  the pipe that the buffer belongs to
+ * @buf:   the buffer to get a reference to
+ */
+static inline void pipe_buf_get(struct pipe_inode_info *pipe,
+   struct pipe_buffer *buf)
+{
+   buf->ops->get(pipe, buf);
+}
+
 /* Differs from PIPE_BUF in that PIPE_SIZE is the length of the actual
memory allocation, whereas PIPE_BUF makes atomicity guarantees.  */
 #define PIPE_SIZE  PAGE_SIZE
-- 
2.7.4



[PATCH 8/8] fs: prevent page refcount overflow in pipe_buf_get

2019-07-22 Thread Ajay Kaher
From: Matthew Wilcox 

commit 15fab63e1e57be9fdb5eec1bbc5916e9825e9acb upstream.

Change pipe_buf_get() to return a bool indicating whether it succeeded
in raising the refcount of the page (if the thing in the pipe is a page).
This removes another mechanism for overflowing the page refcount.  All
callers converted to handle a failure.
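
As a rough illustration of what "converted to handle a failure" means for a caller, here is a minimal userspace sketch. The names struct buf, buf_get() and dup_bufs() are hypothetical stand-ins, not kernel APIs. The error handling only mirrors the "if (ret == 0) ret = -EFAULT; break;" pattern from the fs/splice.c hunks of this patch.

```c
#include <errno.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-in for a pipe buffer that owns a reference count. */
struct buf {
	int refs;
};

/* Take a reference only if the count is still positive. */
static bool buf_get(struct buf *b)
{
	if (b->refs <= 0)
		return false;
	b->refs++;
	return true;
}

/*
 * Take a reference on up to n buffers. Returns how many references were
 * taken, or -EFAULT if the very first get fails. A failure after some
 * progress reports the partial count, as in the splice hunks.
 */
static int dup_bufs(struct buf *in, size_t n)
{
	int ret = 0;

	for (size_t i = 0; i < n; i++) {
		if (!buf_get(&in[i])) {
			if (ret == 0)
				ret = -EFAULT;
			break;
		}
		ret++;
	}
	return ret;
}
```

The point of the sketch is only that a failed get stops the loop and that earlier progress is still reported to the caller.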

Reported-by: Jann Horn 
Signed-off-by: Matthew Wilcox 
Cc: sta...@kernel.org
Signed-off-by: Linus Torvalds 
[ 4.4.y backport notes:
  Regarding the change in generic_pipe_buf_get(), note that
  page_cache_get() is the same as get_page(). See mainline commit
  09cbfeaf1a5a6 "mm, fs: get rid of PAGE_CACHE_* and
  page_cache_{get,release} macros" for context. ]
Signed-off-by: Ajay Kaher 
Reviewed-by: Srivatsa S. Bhat (VMware) 
---
 fs/fuse/dev.c | 12 ++--
 fs/pipe.c |  4 ++--
 fs/splice.c   | 12 ++--
 include/linux/pipe_fs_i.h | 10 ++
 kernel/trace/trace.c  |  6 +-
 5 files changed, 29 insertions(+), 15 deletions(-)

diff --git a/fs/fuse/dev.c b/fs/fuse/dev.c
index 36a5df9..16891f5 100644
--- a/fs/fuse/dev.c
+++ b/fs/fuse/dev.c
@@ -2031,10 +2031,8 @@ static ssize_t fuse_dev_splice_write(struct 
pipe_inode_info *pipe,
rem += pipe->bufs[(pipe->curbuf + idx) & (pipe->buffers - 
1)].len;
 
ret = -EINVAL;
-   if (rem < len) {
-   pipe_unlock(pipe);
-   goto out;
-   }
+   if (rem < len)
+   goto out_free;
 
rem = len;
while (rem) {
@@ -2052,7 +2050,9 @@ static ssize_t fuse_dev_splice_write(struct 
pipe_inode_info *pipe,
pipe->curbuf = (pipe->curbuf + 1) & (pipe->buffers - 1);
pipe->nrbufs--;
} else {
-   pipe_buf_get(pipe, ibuf);
+   if (!pipe_buf_get(pipe, ibuf))
+   goto out_free;
+
*obuf = *ibuf;
obuf->flags &= ~PIPE_BUF_FLAG_GIFT;
obuf->len = rem;
@@ -2075,13 +2075,13 @@ static ssize_t fuse_dev_splice_write(struct 
pipe_inode_info *pipe,
ret = fuse_dev_do_write(fud, &cs, len);
 
pipe_lock(pipe);
+out_free:
for (idx = 0; idx < nbuf; idx++) {
struct pipe_buffer *buf = &bufs[idx];
buf->ops->release(pipe, buf);
}
pipe_unlock(pipe);
 
-out:
kfree(bufs);
return ret;
 }
diff --git a/fs/pipe.c b/fs/pipe.c
index 1e7263b..6534470 100644
--- a/fs/pipe.c
+++ b/fs/pipe.c
@@ -178,9 +178,9 @@ EXPORT_SYMBOL(generic_pipe_buf_steal);
  * in the tee() system call, when we duplicate the buffers in one
  * pipe into another.
  */
-void generic_pipe_buf_get(struct pipe_inode_info *pipe, struct pipe_buffer 
*buf)
+bool generic_pipe_buf_get(struct pipe_inode_info *pipe, struct pipe_buffer 
*buf)
 {
-   page_cache_get(buf->page);
+   return try_get_page(buf->page);
 }
 EXPORT_SYMBOL(generic_pipe_buf_get);
 
diff --git a/fs/splice.c b/fs/splice.c
index fde1263..57ccc58 100644
--- a/fs/splice.c
+++ b/fs/splice.c
@@ -1876,7 +1876,11 @@ retry:
 * Get a reference to this pipe buffer,
 * so we can copy the contents over.
 */
-   pipe_buf_get(ipipe, ibuf);
+   if (!pipe_buf_get(ipipe, ibuf)) {
+   if (ret == 0)
+   ret = -EFAULT;
+   break;
+   }
*obuf = *ibuf;
 
/*
@@ -1948,7 +1952,11 @@ static int link_pipe(struct pipe_inode_info *ipipe,
 * Get a reference to this pipe buffer,
 * so we can copy the contents over.
 */
-   pipe_buf_get(ipipe, ibuf);
+   if (!pipe_buf_get(ipipe, ibuf)) {
+   if (ret == 0)
+   ret = -EFAULT;
+   break;
+   }
 
obuf = opipe->bufs + nbuf;
*obuf = *ibuf;
diff --git a/include/linux/pipe_fs_i.h b/include/linux/pipe_fs_i.h
index 10876f3..0b28b65 100644
--- a/include/linux/pipe_fs_i.h
+++ b/include/linux/pipe_fs_i.h
@@ -112,18 +112,20 @@ struct pipe_buf_operations {
/*
 * Get a reference to the pipe buffer.
 */
-   void (*get)(struct pipe_inode_info *, struct pipe_buffer *);
+   bool (*get)(struct pipe_inode_info *, struct pipe_buffer *);
 };
 
 /**
  * pipe_buf_get - get a reference to a pipe_buffer
  * @pipe:  the pipe that the buffer belongs to
  * @buf:   the buffer to get a reference to
+ *
+ * Return: %true if the reference was successfully obtained.
  */
-static inline void pipe_buf_get(struct pipe_inode_info *pipe,
+static inline __must_check bool pipe_buf_get(struct pipe_inode_info *pipe,
struct pipe_buffer 

[PATCH 6/8] mm: prevent get_user_pages() from overflowing page refcount

2019-07-22 Thread Ajay Kaher
From: Linus Torvalds 

commit 8fde12ca79aff9b5ba951fce1a2641901b8d8e64 upstream.

If the page refcount wraps around past zero, it will be freed while
there are still four billion references to it.  One of the possible
avenues for an attacker to try to make this happen is by doing direct IO
on a page multiple times.  This patch makes get_user_pages() refuse to
take a new page reference if there are already more than two billion
references to the page.

Reported-by: Jann Horn 
Acked-by: Matthew Wilcox 
Cc: sta...@kernel.org
Signed-off-by: Linus Torvalds 
[ 4.4.y backport notes:
  Ajay: Added local variable 'err' with-in follow_hugetlb_page()
from 2be7cfed995e, to resolve compilation error
  Srivatsa: Replaced call to get_page_foll() with try_get_page_foll() ]
Signed-off-by: Srivatsa S. Bhat (VMware) 
Signed-off-by: Ajay Kaher 
---
 mm/gup.c | 43 ---
 mm/hugetlb.c | 16 +++-
 2 files changed, 47 insertions(+), 12 deletions(-)

diff --git a/mm/gup.c b/mm/gup.c
index fae4d1e..171b460 100644
--- a/mm/gup.c
+++ b/mm/gup.c
@@ -126,8 +126,12 @@ retry:
}
}
 
-   if (flags & FOLL_GET)
-   get_page_foll(page);
+   if (flags & FOLL_GET) {
+   if (unlikely(!try_get_page_foll(page))) {
+   page = ERR_PTR(-ENOMEM);
+   goto out;
+   }
+   }
if (flags & FOLL_TOUCH) {
if ((flags & FOLL_WRITE) &&
!pte_dirty(pte) && !PageDirty(page))
@@ -289,7 +293,10 @@ static int get_gate_page(struct mm_struct *mm, unsigned 
long address,
goto unmap;
*page = pte_page(*pte);
}
-   get_page(*page);
+   if (unlikely(!try_get_page(*page))) {
+   ret = -ENOMEM;
+   goto unmap;
+   }
 out:
ret = 0;
 unmap:
@@ -1053,6 +1060,20 @@ struct page *get_dump_page(unsigned long addr)
  */
 #ifdef CONFIG_HAVE_GENERIC_RCU_GUP
 
+/*
+ * Return the compund head page with ref appropriately incremented,
+ * or NULL if that failed.
+ */
+static inline struct page *try_get_compound_head(struct page *page, int refs)
+{
+   struct page *head = compound_head(page);
+   if (WARN_ON_ONCE(atomic_read(&head->_count) < 0))
+   return NULL;
+   if (unlikely(!page_cache_add_speculative(head, refs)))
+   return NULL;
+   return head;
+}
+
 #ifdef __HAVE_ARCH_PTE_SPECIAL
 static int gup_pte_range(pmd_t pmd, unsigned long addr, unsigned long end,
 int write, struct page **pages, int *nr)
@@ -1082,9 +1103,9 @@ static int gup_pte_range(pmd_t pmd, unsigned long addr, 
unsigned long end,
 
VM_BUG_ON(!pfn_valid(pte_pfn(pte)));
page = pte_page(pte);
-   head = compound_head(page);
 
-   if (!page_cache_get_speculative(head))
+   head = try_get_compound_head(page, 1);
+   if (!head)
goto pte_unmap;
 
if (unlikely(pte_val(pte) != pte_val(*ptep))) {
@@ -1141,8 +1162,8 @@ static int gup_huge_pmd(pmd_t orig, pmd_t *pmdp, unsigned 
long addr,
refs++;
} while (addr += PAGE_SIZE, addr != end);
 
-   head = compound_head(pmd_page(orig));
-   if (!page_cache_add_speculative(head, refs)) {
+   head = try_get_compound_head(pmd_page(orig), refs);
+   if (!head) {
*nr -= refs;
return 0;
}
@@ -1187,8 +1208,8 @@ static int gup_huge_pud(pud_t orig, pud_t *pudp, unsigned 
long addr,
refs++;
} while (addr += PAGE_SIZE, addr != end);
 
-   head = compound_head(pud_page(orig));
-   if (!page_cache_add_speculative(head, refs)) {
+   head = try_get_compound_head(pud_page(orig), refs);
+   if (!head) {
*nr -= refs;
return 0;
}
@@ -1229,8 +1250,8 @@ static int gup_huge_pgd(pgd_t orig, pgd_t *pgdp, unsigned 
long addr,
refs++;
} while (addr += PAGE_SIZE, addr != end);
 
-   head = compound_head(pgd_page(orig));
-   if (!page_cache_add_speculative(head, refs)) {
+   head = try_get_compound_head(pgd_page(orig), refs);
+   if (!head) {
*nr -= refs;
return 0;
}
diff --git a/mm/hugetlb.c b/mm/hugetlb.c
index fd932e7..3a1501e 100644
--- a/mm/hugetlb.c
+++ b/mm/hugetlb.c
@@ -3886,6 +3886,7 @@ long follow_hugetlb_page(struct mm_struct *mm, struct 
vm_area_struct *vma,
unsigned long vaddr = *position;
unsigned long remainder = *nr_pages;
struct hstate *h = hstate_vma(vma);
+   int err = -EFAULT;
 
while (vaddr < vma->vm_end && remainder) {
pte_t *pte;
@@ -3957,6 +3958,19 @@ long follow_hugetlb_page(struct mm_struct *mm, struct 
vm_area_struct *vma,
 
pfn_offset = (vaddr & ~huge_page_mask(h)) >> PAGE_SHIFT;
page = 

Re: [PATCH V6 16/21] soc/tegra: pmc: Add pmc wake support for tegra210

2019-07-22 Thread Sowjanya Komatineni



On 7/22/19 8:03 PM, Dmitry Osipenko wrote:

23.07.2019 4:52, Sowjanya Komatineni wrote:

On 7/22/19 6:41 PM, Dmitry Osipenko wrote:

23.07.2019 4:08, Dmitry Osipenko wrote:

23.07.2019 3:58, Dmitry Osipenko wrote:

21.07.2019 22:40, Sowjanya Komatineni wrote:

This patch implements PMC wakeup sequence for Tegra210 and defines
common used RTC alarm wake event.

Signed-off-by: Sowjanya Komatineni 
---
  drivers/soc/tegra/pmc.c | 111 
  1 file changed, 111 insertions(+)

diff --git a/drivers/soc/tegra/pmc.c b/drivers/soc/tegra/pmc.c
index 91c84d0e66ae..c556f38874e1 100644
--- a/drivers/soc/tegra/pmc.c
+++ b/drivers/soc/tegra/pmc.c
@@ -57,6 +57,12 @@
  #define  PMC_CNTRL_SYSCLK_OE  BIT(11) /* system clock enable */
  #define  PMC_CNTRL_SYSCLK_POLARITYBIT(10) /* sys clk polarity */
  #define  PMC_CNTRL_MAIN_RST   BIT(4)
+#define  PMC_CNTRL_LATCH_WAKEUPS   BIT(5)

Please follow the TRM's bits naming.

PMC_CNTRL_LATCHWAKE_EN


+#define PMC_WAKE_MASK  0x0c
+#define PMC_WAKE_LEVEL 0x10
+#define PMC_WAKE_STATUS0x14
+#define PMC_SW_WAKE_STATUS 0x18
  
  #define DPD_SAMPLE			0x020

  #define  DPD_SAMPLE_ENABLEBIT(0)
@@ -87,6 +93,11 @@
  
  #define PMC_SCRATCH41			0x140
  
+#define PMC_WAKE2_MASK			0x160

+#define PMC_WAKE2_LEVEL0x164
+#define PMC_WAKE2_STATUS   0x168
+#define PMC_SW_WAKE2_STATUS0x16c
+
  #define PMC_SENSOR_CTRL   0x1b0
  #define  PMC_SENSOR_CTRL_SCRATCH_WRITEBIT(2)
  #define  PMC_SENSOR_CTRL_ENABLE_RST   BIT(1)
@@ -1922,6 +1933,55 @@ static const struct irq_domain_ops 
tegra_pmc_irq_domain_ops = {
.alloc = tegra_pmc_irq_alloc,
  };
  
+static int tegra210_pmc_irq_set_wake(struct irq_data *data, unsigned int on)

+{
+   struct tegra_pmc *pmc = irq_data_get_irq_chip_data(data);
+   unsigned int offset, bit;
+   u32 value;
+
+   if (data->hwirq == ULONG_MAX)
+   return 0;
+
+   offset = data->hwirq / 32;
+   bit = data->hwirq % 32;
+
+   /*
+* Latch wakeups to SW_WAKE_STATUS register to capture events
+* that would not make it into wakeup event register during LP0 exit.
+*/
+   value = tegra_pmc_readl(pmc, PMC_CNTRL);
+   value |= PMC_CNTRL_LATCH_WAKEUPS;
+   tegra_pmc_writel(pmc, value, PMC_CNTRL);
+   udelay(120);

Why does it take so much time to latch the values? Shouldn't some status bit
be polled for the completion of latching?

Is this register-write really getting buffered in the PMC?


+   value &= ~PMC_CNTRL_LATCH_WAKEUPS;
+   tegra_pmc_writel(pmc, value, PMC_CNTRL);
+   udelay(120);

120 usecs to remove latching, really?


+   tegra_pmc_writel(pmc, 0, PMC_SW_WAKE_STATUS);
+   tegra_pmc_writel(pmc, 0, PMC_SW_WAKE2_STATUS);
+
+   tegra_pmc_writel(pmc, 0, PMC_WAKE_STATUS);
+   tegra_pmc_writel(pmc, 0, PMC_WAKE2_STATUS);
+
+   /* enable PMC wake */
+   if (data->hwirq >= 32)
+   offset = PMC_WAKE2_MASK;
+   else
+   offset = PMC_WAKE_MASK;
+
+   value = tegra_pmc_readl(pmc, offset);
+
+   if (on)
+   value |= 1 << bit;
+   else
+   value &= ~(1 << bit);
+
+   tegra_pmc_writel(pmc, value, offset);

Why is the latching done *before* writing into the WAKE registers? What
is it latching then?

I'm looking at the TRM doc and it says that latching should be done
*after* writing to the WAKE_MASK / LEVEL registers.

Secondly it says that it's enough to do:

value = tegra_pmc_readl(pmc, PMC_CNTRL);
value |= PMC_CNTRL_LATCH_WAKEUPS;
tegra_pmc_writel(pmc, value, PMC_CNTRL);

in order to latch. There is no need for the delay and to remove the
"LATCHWAKE_EN" bit, it should be a oneshot action.

Although, no. TRM says "stops latching on transition from 1
to 0 (sequence - set to 1,set to 0)", so it's not a oneshot action.

Have you tested this code at all? I'm wondering how it happens to work
without a proper latching.

Yes, of course it's tested, and this sequence for the transition is the
recommendation from the Tegra designer.
I will check whether the TRM has not been updated properly, or re-confirm
the delay time internally...

On any wake event, PMC wakeup happens and the WAKE_STATUS register
will have bits set for all events that triggered the wake.
After wakeup, PMC doesn't update the SW_WAKE_STATUS register, as per the PMC design.
The SW latch register added in the design provides a way to capture
those events that happen right during wakeup time and didn't make it to
the SW_WAKE_STATUS register.
So before the next suspend entry, we latch all prior wake events into SW
WAKE_STATUS and then clear them.

I'm now wondering whether the latching could be turned ON permanently
during the PMC's probe, for simplicity.
Latching should be done on every suspend-resume cycle, as wake events are
generated on every suspend-resume cycle.
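
For reference, the register and bit selection in the quoted tegra210_pmc_irq_set_wake() can be sketched in plain C as below. This is a userspace sketch only: the two register offsets are copied from the patch, while struct wake_target, wake_target_for() and wake_mask_update() are hypothetical helper names that do not exist in the driver.

```c
#include <stdbool.h>
#include <stdint.h>

/* Offsets as defined in the quoted patch. */
#define PMC_WAKE_MASK	0x0c
#define PMC_WAKE2_MASK	0x160

/* Hypothetical helper type: which mask register and which bit to touch. */
struct wake_target {
	uint32_t reg;
	uint32_t bit;
};

/* Wake events 0..31 live in WAKE_MASK, 32..63 in WAKE2_MASK. */
static struct wake_target wake_target_for(unsigned long hwirq)
{
	struct wake_target t;

	t.reg = hwirq >= 32 ? PMC_WAKE2_MASK : PMC_WAKE_MASK;
	t.bit = (uint32_t)(hwirq % 32);
	return t;
}

/* Set or clear one bit of a mask value that was read from the register. */
static uint32_t wake_mask_update(uint32_t value, uint32_t bit, bool on)
{
	if (on)
		return value | (UINT32_C(1) << bit);
	return value & ~(UINT32_C(1) << bit);
}
```

The sketch shifts an unsigned constant, which keeps the bit 31 case well defined in C; the quoted patch shifts a plain int 1.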


Re: [PATCH] be2net: fix adapter->big_page_size miscaculation

2019-07-22 Thread Qian Cai
The original issue,

https://lore.kernel.org/netdev/1562959401-19815-1-git-send-email-...@lca.pw/

The debugging so far seems to point to the compilers getting confused by the
module sections. During module_param(), it stores "__param_rx_frag_size"
as a "struct kernel_param" in the __param section. Later, load_module()
obtains all "kernel_param" entries from the __param section and compares them
against the user-input module parameters from the command line.  If there is a
match, it calls params[i].ops->set(&params[i]) to replace the value.  If
compilers can't see that params[i].ops->set(&params[i]) could potentially
change the value of rx_frag_size, they will wrongly optimize it as a constant.


For example (it is not
compilable yet, as I have not been able to extract variables from the __param
section like find_module_sections() does),

#include <stdio.h>
#include <string.h>

#define __module_param_call(name, ops, arg) \
static struct kernel_param __param_##name \
 __attribute__ ((unused,__section__ ("__param"),aligned(sizeof(void *)))) = { \
#name, ops, { arg } }

struct kernel_param {
const char *name;
const struct kernel_param_ops *ops;
union {
int *arg;
};
};

struct kernel_param_ops {
int (*set)(const struct kernel_param *kp);
};

#define STANDARD_PARAM_DEF(name) \
int param_set_##name(const struct kernel_param *kp) \
{ \
*kp->arg = 2; \
return 0; \
} \
const struct kernel_param_ops param_ops_##name = { \
.set = param_set_##name, \
};

STANDARD_PARAM_DEF(ushort);
static int rx_frag_size = 1;
__module_param_call(rx_frag_size, &param_ops_ushort, &rx_frag_size);

int main(int argc, char *argv[])
{
const struct kernel_param *params = <<< Get all kernel_param from the 
__param section >>>;
int i;

if (__builtin_constant_p(rx_frag_size))
printf("rx_frag_size is a const.\n");

for (i = 0; i < num_param; i++) {
if (!strcmp(params[i].name, argv[1])) {
params[i].ops->set(&params[i]);
break;
}
}

printf("rx_frag_size = %d\n", rx_frag_size);

return 0;
}



[PATCH 4/8] mm, gup: remove broken VM_BUG_ON_PAGE compound check for hugepages

2019-07-22 Thread Ajay Kaher
From: Will Deacon 

commit a3e328556d41bb61c55f9dfcc62d6a826ea97b85 upstream.

When operating on hugepages with DEBUG_VM enabled, the GUP code checks
the compound head for each tail page prior to calling
page_cache_add_speculative.  This is broken, because on the fast-GUP
path (where we don't hold any page table locks) we can be racing with a
concurrent invocation of split_huge_page_to_list.

split_huge_page_to_list deals with this race by using page_ref_freeze to
freeze the page and force concurrent GUPs to fail whilst the component
pages are modified.  This modification includes clearing the
compound_head field for the tail pages, so checking this prior to a
successful call to page_cache_add_speculative can lead to false
positives: In fact, page_cache_add_speculative *already* has this check
once the page refcount has been successfully updated, so we can simply
remove the broken calls to VM_BUG_ON_PAGE.

Link: http://lkml.kernel.org/r/20170522133604.11392-2-punit.agra...@arm.com
Signed-off-by: Will Deacon 
Signed-off-by: Punit Agrawal 
Acked-by: Steve Capper 
Acked-by: Kirill A. Shutemov 
Cc: Aneesh Kumar K.V 
Cc: Catalin Marinas 
Cc: Naoya Horiguchi 
Cc: Mark Rutland 
Cc: Hillf Danton 
Cc: Michal Hocko 
Cc: Mike Kravetz 
Signed-off-by: Andrew Morton 
Signed-off-by: Linus Torvalds 
Signed-off-by: Srivatsa S. Bhat (VMware) 
Signed-off-by: Ajay Kaher 
---
 mm/gup.c | 3 ---
 1 file changed, 3 deletions(-)

diff --git a/mm/gup.c b/mm/gup.c
index 45c544b..6e7cfaa 100644
--- a/mm/gup.c
+++ b/mm/gup.c
@@ -1136,7 +1136,6 @@ static int gup_huge_pmd(pmd_t orig, pmd_t *pmdp, unsigned 
long addr,
page = head + ((addr & ~PMD_MASK) >> PAGE_SHIFT);
tail = page;
do {
-   VM_BUG_ON_PAGE(compound_head(page) != head, page);
pages[*nr] = page;
(*nr)++;
page++;
@@ -1183,7 +1182,6 @@ static int gup_huge_pud(pud_t orig, pud_t *pudp, unsigned 
long addr,
page = head + ((addr & ~PUD_MASK) >> PAGE_SHIFT);
tail = page;
do {
-   VM_BUG_ON_PAGE(compound_head(page) != head, page);
pages[*nr] = page;
(*nr)++;
page++;
@@ -1226,7 +1224,6 @@ static int gup_huge_pgd(pgd_t orig, pgd_t *pgdp, unsigned 
long addr,
page = head + ((addr & ~PGDIR_MASK) >> PAGE_SHIFT);
tail = page;
do {
-   VM_BUG_ON_PAGE(compound_head(page) != head, page);
pages[*nr] = page;
(*nr)++;
page++;
-- 
2.7.4



[PATCH 3/8] mm: handle PTE-mapped tail pages in gerneric fast gup implementaiton

2019-07-22 Thread Ajay Kaher
From: "Kirill A. Shutemov" 

commit 7aef4172c7957d7e65fc172be4c99becaef855d4 upstream.

With new refcounting we are going to see THP tail pages mapped with PTE.
Generic fast GUP rely on page_cache_get_speculative() to obtain
reference on page.  page_cache_get_speculative() always fails on tail
pages, because ->_count on tail pages is always zero.

Let's handle tail pages in gup_pte_range().

New split_huge_page() will rely on migration entries to freeze page's
counts.  Recheck PTE value after page_cache_get_speculative() on head
page should be enough to serialize against split.

Signed-off-by: Kirill A. Shutemov 
Tested-by: Sasha Levin 
Tested-by: Aneesh Kumar K.V 
Acked-by: Jerome Marchand 
Acked-by: Vlastimil Babka 
Cc: Andrea Arcangeli 
Cc: Hugh Dickins 
Cc: Dave Hansen 
Cc: Mel Gorman 
Cc: Rik van Riel 
Cc: Naoya Horiguchi 
Cc: Steve Capper 
Cc: Johannes Weiner 
Cc: Michal Hocko 
Cc: Christoph Lameter 
Cc: David Rientjes 
Signed-off-by: Andrew Morton 
Signed-off-by: Linus Torvalds 
Signed-off-by: Ajay Kaher 
---
 mm/gup.c | 8 +---
 1 file changed, 5 insertions(+), 3 deletions(-)

diff --git a/mm/gup.c b/mm/gup.c
index 2cd3b31..45c544b 100644
--- a/mm/gup.c
+++ b/mm/gup.c
@@ -1070,7 +1070,7 @@ static int gup_pte_range(pmd_t pmd, unsigned long addr, 
unsigned long end,
 * for an example see gup_get_pte in arch/x86/mm/gup.c
 */
pte_t pte = READ_ONCE(*ptep);
-   struct page *page;
+   struct page *head, *page;
 
/*
 * Similar to the PMD case below, NUMA hinting must take slow
@@ -1082,15 +1082,17 @@ static int gup_pte_range(pmd_t pmd, unsigned long addr, 
unsigned long end,
 
VM_BUG_ON(!pfn_valid(pte_pfn(pte)));
page = pte_page(pte);
+   head = compound_head(page);
 
-   if (!page_cache_get_speculative(page))
+   if (!page_cache_get_speculative(head))
goto pte_unmap;
 
if (unlikely(pte_val(pte) != pte_val(*ptep))) {
-   put_page(page);
+   put_page(head);
goto pte_unmap;
}
 
+   VM_BUG_ON_PAGE(compound_head(page) != head, page);
pages[*nr] = page;
(*nr)++;
 
-- 
2.7.4



[PATCH 5/8] mm, gup: ensure real head page is ref-counted when using hugepages

2019-07-22 Thread Ajay Kaher
From: Punit Agrawal 

commit d63206ee32b6e64b0e12d46e5d6004afd9913713 upstream.

When speculatively taking references to a hugepage using
page_cache_add_speculative() in gup_huge_pmd(), it is assumed that the
page returned by pmd_page() is the head page.  Although normally true,
this assumption doesn't hold when the hugepage comprises of successive
page table entries such as when using contiguous bit on arm64 at PTE or
PMD levels.

This can be addressed by ensuring that the page passed to
page_cache_add_speculative() is the real head or by de-referencing the
head page within the function.

We take the first approach to keep the usage pattern aligned with
page_cache_get_speculative() where users already pass the appropriate
page, i.e., the de-referenced head.

Apply the same logic to fix gup_huge_[pud|pgd]() as well.

[punit.agra...@arm.com: fix arm64 ltp failure]
  Link: http://lkml.kernel.org/r/20170619170145.25577-5-punit.agra...@arm.com
Link: http://lkml.kernel.org/r/20170522133604.11392-3-punit.agra...@arm.com
Signed-off-by: Punit Agrawal 
Acked-by: Steve Capper 
Cc: Michal Hocko 
Cc: "Kirill A. Shutemov" 
Cc: Aneesh Kumar K.V 
Cc: Catalin Marinas 
Cc: Will Deacon 
Cc: Naoya Horiguchi 
Cc: Mark Rutland 
Cc: Hillf Danton 
Cc: Mike Kravetz 
Signed-off-by: Andrew Morton 
Signed-off-by: Linus Torvalds 
Signed-off-by: Ajay Kaher 
Reviewed-by: Srivatsa S. Bhat (VMware) 
---
 mm/gup.c | 12 ++--
 1 file changed, 6 insertions(+), 6 deletions(-)

diff --git a/mm/gup.c b/mm/gup.c
index 6e7cfaa..fae4d1e 100644
--- a/mm/gup.c
+++ b/mm/gup.c
@@ -1132,8 +1132,7 @@ static int gup_huge_pmd(pmd_t orig, pmd_t *pmdp, unsigned 
long addr,
return 0;
 
refs = 0;
-   head = pmd_page(orig);
-   page = head + ((addr & ~PMD_MASK) >> PAGE_SHIFT);
+   page = pmd_page(orig) + ((addr & ~PMD_MASK) >> PAGE_SHIFT);
tail = page;
do {
pages[*nr] = page;
@@ -1142,6 +1141,7 @@ static int gup_huge_pmd(pmd_t orig, pmd_t *pmdp, unsigned 
long addr,
refs++;
} while (addr += PAGE_SIZE, addr != end);
 
+   head = compound_head(pmd_page(orig));
if (!page_cache_add_speculative(head, refs)) {
*nr -= refs;
return 0;
@@ -1178,8 +1178,7 @@ static int gup_huge_pud(pud_t orig, pud_t *pudp, unsigned 
long addr,
return 0;
 
refs = 0;
-   head = pud_page(orig);
-   page = head + ((addr & ~PUD_MASK) >> PAGE_SHIFT);
+   page = pud_page(orig) + ((addr & ~PUD_MASK) >> PAGE_SHIFT);
tail = page;
do {
pages[*nr] = page;
@@ -1188,6 +1187,7 @@ static int gup_huge_pud(pud_t orig, pud_t *pudp, unsigned 
long addr,
refs++;
} while (addr += PAGE_SIZE, addr != end);
 
+   head = compound_head(pud_page(orig));
if (!page_cache_add_speculative(head, refs)) {
*nr -= refs;
return 0;
@@ -1220,8 +1220,7 @@ static int gup_huge_pgd(pgd_t orig, pgd_t *pgdp, unsigned 
long addr,
return 0;
 
refs = 0;
-   head = pgd_page(orig);
-   page = head + ((addr & ~PGDIR_MASK) >> PAGE_SHIFT);
+   page = pgd_page(orig) + ((addr & ~PGDIR_MASK) >> PAGE_SHIFT);
tail = page;
do {
pages[*nr] = page;
@@ -1230,6 +1229,7 @@ static int gup_huge_pgd(pgd_t orig, pgd_t *pgdp, unsigned 
long addr,
refs++;
} while (addr += PAGE_SIZE, addr != end);
 
+   head = compound_head(pgd_page(orig));
if (!page_cache_add_speculative(head, refs)) {
*nr -= refs;
return 0;
-- 
2.7.4



[PATCH 2/8] mm: add 'try_get_page()' helper function

2019-07-22 Thread Ajay Kaher
From: Linus Torvalds 

commit 88b1a17dfc3ed7728316478fae0f5ad508f50397 upstream.

This is the same as the traditional 'get_page()' function, but instead
of unconditionally incrementing the reference count of the page, it only
does so if the count was "safe".  It returns whether the reference count
was incremented (and is marked __must_check, since the caller obviously
has to be aware of it).

Also like 'get_page()', you can't use this function unless you already
had a reference to the page.  The intent is that you can use this
exactly like get_page(), but in situations where you want to limit the
maximum reference count.

The code currently does an unconditional WARN_ON_ONCE() if we ever hit
the reference count issues (either zero or negative), as a notification
that the conditional non-increment actually happened.

NOTE! The count access for the "safety" check is inherently racy, but
that doesn't matter since the buffer we use is basically half the range
of the reference count (ie we look at the sign of the count).
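
A minimal userspace sketch of the sign check described above, assuming C11 atomics in place of the kernel's atomic_t. struct fake_page and try_get_ref() are hypothetical names, and the tail-page handling of the real helper is left out entirely.

```c
#include <stdatomic.h>
#include <stdbool.h>

/* Hypothetical stand-in for struct page, reduced to its reference count. */
struct fake_page {
	atomic_int count;
};

/*
 * Take a reference unless the count is zero or has already run into the
 * negative half of its range. The read and the increment are separate
 * operations, so the check is racy by design; the safety margin is the
 * whole negative half of the counter.
 */
static bool try_get_ref(struct fake_page *p)
{
	if (atomic_load(&p->count) <= 0)
		return false;
	atomic_fetch_add(&p->count, 1);
	return true;
}
```

A caller that gets false back must not touch the page as if it held a reference, which is why the real helper is marked __must_check.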

Acked-by: Matthew Wilcox 
Cc: Jann Horn 
Cc: sta...@kernel.org
Signed-off-by: Linus Torvalds 
[ 4.4.y backport notes:
  Srivatsa:
  - Adapted try_get_page() to match the get_page()
implementation in 4.4.y, except for the refcount check.
  - Added try_get_page_foll() which will be needed
in a subsequent patch. ]
Signed-off-by: Srivatsa S. Bhat (VMware) 
Signed-off-by: Ajay Kaher 
---
 include/linux/mm.h | 12 
 mm/internal.h  | 23 +++
 2 files changed, 35 insertions(+)

diff --git a/include/linux/mm.h b/include/linux/mm.h
index 701088e..52edaf1 100644
--- a/include/linux/mm.h
+++ b/include/linux/mm.h
@@ -505,6 +505,18 @@ static inline void get_page(struct page *page)
atomic_inc(&page->_count);
 }
 
+static inline __must_check bool try_get_page(struct page *page)
+{
+   if (unlikely(PageTail(page)))
+   if (likely(__get_page_tail(page)))
+   return true;
+
+   if (WARN_ON_ONCE(atomic_read(&page->_count) <= 0))
+   return false;
+   atomic_inc(&page->_count);
+   return true;
+}
+
 static inline struct page *virt_to_head_page(const void *x)
 {
struct page *page = virt_to_page(x);
diff --git a/mm/internal.h b/mm/internal.h
index 67015e5..d83afc9 100644
--- a/mm/internal.h
+++ b/mm/internal.h
@@ -112,6 +112,29 @@ static inline void get_page_foll(struct page *page)
}
 }
 
+static inline __must_check bool try_get_page_foll(struct page *page)
+{
+   if (unlikely(PageTail(page))) {
+   if (WARN_ON_ONCE(atomic_read(&compound_head(page)->_count) <= 0))
+   return false;
+   /*
+* This is safe only because
+* __split_huge_page_refcount() can't run under
+* get_page_foll() because we hold the proper PT lock.
+*/
+   __get_page_tail_foll(page, true);
+   } else {
+   /*
+* Getting a normal page or the head of a compound page
+* requires to already have an elevated page->_count.
+*/
+   if (WARN_ON_ONCE(atomic_read(&page->_count) <= 0))
+   return false;
+   atomic_inc(&page->_count);
+   }
+   return true;
+}
+
 extern unsigned long highest_memmap_pfn;
 
 /*
-- 
2.7.4



[PATCH 0/8] Backported fixes for 4.4 stable tree

2019-07-22 Thread Ajay Kaher
These patches include a few backported fixes for the 4.4 stable
tree.
I would appreciate it if you could kindly consider including them in the
next release.

Ajay

---
[PATCH 1/8]:
Backporting of upstream commit f958d7b528b1:
mm: make page ref count overflow check tighter and more explicit

[PATCH 2/8]:
Backporting of upstream commit 88b1a17dfc3e:
mm: add 'try_get_page()' helper function

[PATCH 3/8]:
Backporting of upstream commit 7aef4172c795:
mm: handle PTE-mapped tail pages in gerneric fast gup implementaiton

[PATCH 4/8]:
Backporting of upstream commit a3e328556d41:
mm, gup: remove broken VM_BUG_ON_PAGE compound check for hugepages

[PATCH 5/8]:
Backporting of upstream commit d63206ee32b6:
mm, gup: ensure real head page is ref-counted when using hugepages

[PATCH 6/8]:
Backporting of upstream commit 8fde12ca79af:
mm: prevent get_user_pages() from overflowing page refcount

[PATCH 7/8]:
Backporting of upstream commit 7bf2d1df8082:
pipe: add pipe_buf_get() helper

[PATCH 8/8]:
Backporting of upstream commit 15fab63e1e57:
fs: prevent page refcount overflow in pipe_buf_get



[PATCH 1/8] mm: make page ref count overflow check tighter and more explicit

2019-07-22 Thread Ajay Kaher
From: Linus Torvalds 

commit f958d7b528b1b40c44cfda5eabe2d82760d868c3 upstream.

We have a VM_BUG_ON() to check that the page reference count doesn't
underflow (or get close to overflow) by checking the sign of the count.

That's all fine, but we actually want to allow people to use a "get page
ref unless it's already very high" helper function, and we want that one
to use the sign of the page ref (without triggering this VM_BUG_ON).

Change the VM_BUG_ON to only check for small underflows (or _very_ close
to overflowing), and ignore overflows which have strayed into negative
territory.
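
The arithmetic behind the new check can be tried out in userspace. The sketch below assumes a 32-bit int and uses a hypothetical function name, ref_zero_or_close_to_overflow(), that takes the count directly instead of a page. Adding 127 as an unsigned value maps the range -127..0 onto 0..127, so one unsigned comparison covers both "zero" and "slightly negative".

```c
#include <limits.h>
#include <stdbool.h>

/*
 * True only for counts in [-127, 0]: zero, a small underflow, or a count
 * that is a few increments away from wrapping back to zero. Counts deep
 * in negative territory (an overflow past INT_MAX) are deliberately not
 * flagged.
 */
static bool ref_zero_or_close_to_overflow(int count)
{
	return (unsigned int)count + 127u <= 127u;
}
```

For example, a count of -128 becomes 0xffffffff after the addition and is not flagged, while -127 wraps to exactly 0 and is.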

Acked-by: Matthew Wilcox 
Cc: Jann Horn 
Cc: sta...@kernel.org
Signed-off-by: Linus Torvalds 
[ 4.4.y backport notes:
  Ajay: Open-coded atomic refcount access due to missing
  page_ref_count() helper in 4.4.y
  Srivatsa: Added overflow check to get_page_foll() and related code. ]
Signed-off-by: Srivatsa S. Bhat (VMware) 
Signed-off-by: Ajay Kaher 
---
 include/linux/mm.h | 6 +-
 mm/internal.h  | 5 +++--
 2 files changed, 8 insertions(+), 3 deletions(-)

diff --git a/include/linux/mm.h b/include/linux/mm.h
index ed653ba..701088e 100644
--- a/include/linux/mm.h
+++ b/include/linux/mm.h
@@ -488,6 +488,10 @@ static inline void get_huge_page_tail(struct page *page)
 
 extern bool __get_page_tail(struct page *page);
 
+/* 127: arbitrary random number, small enough to assemble well */
+#define page_ref_zero_or_close_to_overflow(page) \
+   ((unsigned int) atomic_read(&page->_count) + 127u <= 127u)
+
 static inline void get_page(struct page *page)
 {
if (unlikely(PageTail(page)))
@@ -497,7 +501,7 @@ static inline void get_page(struct page *page)
 * Getting a normal page or the head of a compound page
 * requires to already have an elevated page->_count.
 */
-   VM_BUG_ON_PAGE(atomic_read(&page->_count) <= 0, page);
+   VM_BUG_ON_PAGE(page_ref_zero_or_close_to_overflow(page), page);
atomic_inc(&page->_count);
 }
 
diff --git a/mm/internal.h b/mm/internal.h
index f63f439..67015e5 100644
--- a/mm/internal.h
+++ b/mm/internal.h
@@ -81,7 +81,8 @@ static inline void __get_page_tail_foll(struct page *page,
 * speculative page access (like in
 * page_cache_get_speculative()) on tail pages.
 */
-   VM_BUG_ON_PAGE(atomic_read(&compound_head(page)->_count) <= 0, page);
+   VM_BUG_ON_PAGE(page_ref_zero_or_close_to_overflow(compound_head(page)),
+  page);
if (get_page_head)
atomic_inc(&compound_head(page)->_count);
get_huge_page_tail(page);
@@ -106,7 +107,7 @@ static inline void get_page_foll(struct page *page)
 * Getting a normal page or the head of a compound page
 * requires to already have an elevated page->_count.
 */
-   VM_BUG_ON_PAGE(atomic_read(&page->_count) <= 0, page);
+   VM_BUG_ON_PAGE(page_ref_zero_or_close_to_overflow(page), page);
atomic_inc(&page->_count);
}
 }
-- 
2.7.4



Re: [PATCH v4] mmc: host: sdhci-sprd: Fix the incorrect soft reset operation when runtime resuming

2019-07-22 Thread Baolin Wang
Hi Ulf,

On Mon, 22 Jul 2019 at 19:54, Ulf Hansson  wrote:
>
> On Wed, 17 Jul 2019 at 04:29, Baolin Wang  wrote:
> >
> > In sdhci_runtime_resume_host() function, we will always do software reset
> > for all, which will cause Spreadtrum host controller work abnormally after
> > resuming.
>
> What does "software reset for all" mean?

The SD host controller specification defines three types of software reset:
software reset for the data line, software reset for the command line, and
software reset for all.
Software reset for all means the reset affects the entire host
controller except for the card detection circuit.

>
> >
> > Thus for Spreadtrum platform that will not power down the SD/eMMC card 
> > during
> > runtime suspend, we should not do software reset for all.
>
> Normally, sdhci hosts that enter runtime suspend don't power off
> the card (there are some exceptions like PCI variants).

Yes, same as our controller.

>
> So, what's so special here and how does the reset come into play? I
> don't see sdhci doing a reset in sdhci_runtime_suspend|resume_host(),
> nor does the callback from the sdhci-sprd.c variant do it.

sdhci_runtime_resume_host() calls sdhci_init(host, 0), which issues a
software reset for all.

>
> > To fix this
> > issue, adding a specific reset operation that adds one condition to validate
> > the power mode to decide if we can do software reset for all or just reset
> > command and data lines.
> >
> > Signed-off-by: Baolin Wang 
> > ---
> > Changes from v3:
> >  - Use ios.power_mode to validate if the card is power down or not.
> >
> > Changes from v2:
> >  - Simplify the sdhci_sprd_reset() by issuing sdhci_reset().
> >
> > Changes from v1:
> >  - Add a specific reset operation instead of changing the core to avoid
> >  affecting other hardware.
> > ---
> >  drivers/mmc/host/sdhci-sprd.c |   19 ++-
> >  1 file changed, 18 insertions(+), 1 deletion(-)
> >
> > diff --git a/drivers/mmc/host/sdhci-sprd.c b/drivers/mmc/host/sdhci-sprd.c
> > index 603a5d9..94f9726 100644
> > --- a/drivers/mmc/host/sdhci-sprd.c
> > +++ b/drivers/mmc/host/sdhci-sprd.c
> > @@ -373,6 +373,23 @@ static unsigned int 
> > sdhci_sprd_get_max_timeout_count(struct sdhci_host *host)
> > return 1 << 31;
> >  }
> >
> > +static void sdhci_sprd_reset(struct sdhci_host *host, u8 mask)
> > +{
> > +   struct mmc_host *mmc = host->mmc;
> > +
> > +   /*
> > +* When try to reset controller after runtime suspend, we should not
> > +* reset for all if the SD/eMMC card is not power down, just reset
> > +* command and data lines instead. Otherwise will meet some strange
> > +* behaviors for Spreadtrum host controller.
> > +*/
> > +   if (host->runtime_suspended && (mask & SDHCI_RESET_ALL) &&
> > +   mmc->ios.power_mode == MMC_POWER_ON)
> > +   mask = SDHCI_RESET_CMD | SDHCI_RESET_DATA;
>
> Can sdhci_sprd_reset() be called when the host is runtime suspended?

When host tries to runtime resume in sdhci_runtime_resume_host(), it
will call reset operation to do software reset.

> That sounds like a bug to me, no?

Our controller behaves strangely if we do a software reset for all in
sdhci_runtime_resume_host(), and I wanted to avoid changing the core
logic of sdhci_runtime_resume_host(), which other hardware controllers
also use. So I introduced a specific reset op and added a condition to
make sure we only reset the command and data lines when resuming from
the runtime-suspended state.

>
> > +
> > +   sdhci_reset(host, mask);
> > +}
> > +
> >  static struct sdhci_ops sdhci_sprd_ops = {
> > .read_l = sdhci_sprd_readl,
> > .write_l = sdhci_sprd_writel,
> > @@ -381,7 +398,7 @@ static unsigned int 
> > sdhci_sprd_get_max_timeout_count(struct sdhci_host *host)
> > .get_max_clock = sdhci_sprd_get_max_clock,
> > .get_min_clock = sdhci_sprd_get_min_clock,
> > .set_bus_width = sdhci_set_bus_width,
> > -   .reset = sdhci_reset,
> > +   .reset = sdhci_sprd_reset,
> > .set_uhs_signaling = sdhci_sprd_set_uhs_signaling,
> > .hw_reset = sdhci_sprd_hw_reset,
> > .get_max_timeout_count = sdhci_sprd_get_max_timeout_count,
> > --
> > 1.7.9.5
> >
>
> Kind regards
> Uffe



-- 
Baolin Wang
Best Regards
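
The mask selection discussed in this thread can be sketched in isolation as
follows. This is only an illustration under stated assumptions, not the driver
code: effective_reset_mask() and enum power_mode are hypothetical stand-ins for
the host state and mmc->ios.power_mode, and only the three reset bits are
modeled.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Software-reset bits; values follow the SDHCI register layout. */
#define SDHCI_RESET_ALL   0x01
#define SDHCI_RESET_CMD   0x02
#define SDHCI_RESET_DATA  0x04

/* Hypothetical stand-in for the mmc->ios.power_mode states. */
enum power_mode { POWER_OFF, POWER_UP, POWER_ON };

/*
 * Hypothetical helper: return the mask that would be handed to the
 * generic reset routine. A full reset requested while the host is
 * runtime suspended and the card is still powered is narrowed to the
 * command and data lines; every other request passes through unchanged.
 */
static uint8_t effective_reset_mask(bool runtime_suspended,
                                    enum power_mode mode, uint8_t mask)
{
    /* Only the three reset bits above are modeled in this sketch. */
    assert((mask & ~(SDHCI_RESET_ALL | SDHCI_RESET_CMD | SDHCI_RESET_DATA)) == 0);

    if (runtime_suspended && (mask & SDHCI_RESET_ALL) && mode == POWER_ON)
        return SDHCI_RESET_CMD | SDHCI_RESET_DATA;

    return mask;
}
```

With these assumptions, a full reset during runtime resume of a powered card
becomes a command/data reset, while the same request at probe time or with the
card powered off is left alone.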


Re: [PATCH V6 16/21] soc/tegra: pmc: Add pmc wake support for tegra210

2019-07-22 Thread Dmitry Osipenko
23.07.2019 4:52, Sowjanya Komatineni wrote:
> 
> On 7/22/19 6:41 PM, Dmitry Osipenko wrote:
>> 23.07.2019 4:08, Dmitry Osipenko wrote:
>>> 23.07.2019 3:58, Dmitry Osipenko wrote:
 21.07.2019 22:40, Sowjanya Komatineni wrote:
> This patch implements PMC wakeup sequence for Tegra210 and defines
> common used RTC alarm wake event.
>
> Signed-off-by: Sowjanya Komatineni 
> ---
>  drivers/soc/tegra/pmc.c | 111 
> 
>  1 file changed, 111 insertions(+)
>
> diff --git a/drivers/soc/tegra/pmc.c b/drivers/soc/tegra/pmc.c
> index 91c84d0e66ae..c556f38874e1 100644
> --- a/drivers/soc/tegra/pmc.c
> +++ b/drivers/soc/tegra/pmc.c
> @@ -57,6 +57,12 @@
>  #define  PMC_CNTRL_SYSCLK_OE BIT(11) /* system clock enable */
>  #define  PMC_CNTRL_SYSCLK_POLARITY   BIT(10) /* sys clk polarity */
>  #define  PMC_CNTRL_MAIN_RST  BIT(4)
> +#define  PMC_CNTRL_LATCH_WAKEUPS BIT(5)
>>> Please follow the TRM's bits naming.
>>>
>>> PMC_CNTRL_LATCHWAKE_EN
>>>
> +#define PMC_WAKE_MASK    0x0c
> +#define PMC_WAKE_LEVEL   0x10
> +#define PMC_WAKE_STATUS  0x14
> +#define PMC_SW_WAKE_STATUS   0x18
>  
>  #define DPD_SAMPLE   0x020
>  #define  DPD_SAMPLE_ENABLE   BIT(0)
> @@ -87,6 +93,11 @@
>  
>  #define PMC_SCRATCH41    0x140
>  
> +#define PMC_WAKE2_MASK   0x160
> +#define PMC_WAKE2_LEVEL  0x164
> +#define PMC_WAKE2_STATUS 0x168
> +#define PMC_SW_WAKE2_STATUS  0x16c
> +
>  #define PMC_SENSOR_CTRL  0x1b0
>  #define  PMC_SENSOR_CTRL_SCRATCH_WRITE   BIT(2)
>  #define  PMC_SENSOR_CTRL_ENABLE_RST  BIT(1)
> @@ -1922,6 +1933,55 @@ static const struct irq_domain_ops 
> tegra_pmc_irq_domain_ops = {
>   .alloc = tegra_pmc_irq_alloc,
>  };
>  
> +static int tegra210_pmc_irq_set_wake(struct irq_data *data, unsigned int 
> on)
> +{
> + struct tegra_pmc *pmc = irq_data_get_irq_chip_data(data);
> + unsigned int offset, bit;
> + u32 value;
> +
> + if (data->hwirq == ULONG_MAX)
> + return 0;
> +
> + offset = data->hwirq / 32;
> + bit = data->hwirq % 32;
> +
> + /*
> +  * Latch wakeups to SW_WAKE_STATUS register to capture events
> +  * that would not make it into wakeup event register during LP0 exit.
> +  */
> + value = tegra_pmc_readl(pmc, PMC_CNTRL);
> + value |= PMC_CNTRL_LATCH_WAKEUPS;
> + tegra_pmc_writel(pmc, value, PMC_CNTRL);
> + udelay(120);
 Why it takes so much time to latch the values? Shouldn't some status-bit
 be polled for the completion of latching?

 Is this register-write really getting buffered in the PMC?

> + value &= ~PMC_CNTRL_LATCH_WAKEUPS;
> + tegra_pmc_writel(pmc, value, PMC_CNTRL);
> + udelay(120);
 120 usecs to remove latching, really?

> + tegra_pmc_writel(pmc, 0, PMC_SW_WAKE_STATUS);
> + tegra_pmc_writel(pmc, 0, PMC_SW_WAKE2_STATUS);
> +
> + tegra_pmc_writel(pmc, 0, PMC_WAKE_STATUS);
> + tegra_pmc_writel(pmc, 0, PMC_WAKE2_STATUS);
> +
> + /* enable PMC wake */
> + if (data->hwirq >= 32)
> + offset = PMC_WAKE2_MASK;
> + else
> + offset = PMC_WAKE_MASK;
> +
> + value = tegra_pmc_readl(pmc, offset);
> +
> + if (on)
> + value |= 1 << bit;
> + else
> + value &= ~(1 << bit);
> +
> + tegra_pmc_writel(pmc, value, offset);
 Why the latching is done *before* writing into the WAKE registers? What
 it is latching then?
>>> I'm looking at the TRM doc and it says that latching should be done
>>> *after* writing to the WAKE_MASK / LEVEL registers.
>>>
>>> Secondly it says that it's enough to do:
>>>
>>> value = tegra_pmc_readl(pmc, PMC_CNTRL);
>>> value |= PMC_CNTRL_LATCH_WAKEUPS;
>>> tegra_pmc_writel(pmc, value, PMC_CNTRL);
>>>
>>> in order to latch. There is no need for the delay and to remove the
>>> "LATCHWAKE_EN" bit, it should be a oneshot action.
>> Although, no. TRM says "stops latching on transition from 1
>> to 0 (sequence - set to 1,set to 0)", so it's not a oneshot action.
>>
>> Have you tested this code at all? I'm wondering how it happens to work
>> without a proper latching.
> Yes, of course it's tested, and this transition sequence is the
> recommendation from the Tegra designer.
> Will check whether the TRM was not updated properly, or will re-confirm
> internally on the delay time...
> 
> On any of the wake event PMC wakeup happens and WAKE_STATUS register
> will have bits set for all events that triggered wake.
> After wakeup PMC doesn't update SW_WAKE_STATUS register as per PMC design.
> SW latch register added in design helps to provide a way to capture
> those 

Re: [PATCH V4 1/2] dt-bindings: reset: imx7: Add support for i.MX8MM

2019-07-22 Thread Shawn Guo
On Fri, Jul 05, 2019 at 04:54:05PM +0800, anson.hu...@nxp.com wrote:
> From: Anson Huang 
> 
> i.MX8MM can reuse i.MX8MQ's reset driver, update the compatible
> property and related info to support i.MX8MM.
> 
> Signed-off-by: Anson Huang 

Hi Philipp,

Let me know if you want me to pick this up.

Shawn

> ---
> Changes since V3:
>   - Add comments to those reset indices to indicate which are NOT 
> supported on i.MX8MM.
> ---
>  .../devicetree/bindings/reset/fsl,imx7-src.txt |  6 +++--
>  include/dt-bindings/reset/imx8mq-reset.h   | 28 
> +++---
>  2 files changed, 18 insertions(+), 16 deletions(-)
> 
> diff --git a/Documentation/devicetree/bindings/reset/fsl,imx7-src.txt 
> b/Documentation/devicetree/bindings/reset/fsl,imx7-src.txt
> index 13e0951..c2489e4 100644
> --- a/Documentation/devicetree/bindings/reset/fsl,imx7-src.txt
> +++ b/Documentation/devicetree/bindings/reset/fsl,imx7-src.txt
> @@ -8,6 +8,7 @@ Required properties:
>  - compatible:
>   - For i.MX7 SoCs should be "fsl,imx7d-src", "syscon"
>   - For i.MX8MQ SoCs should be "fsl,imx8mq-src", "syscon"
> + - For i.MX8MM SoCs should be "fsl,imx8mm-src", "fsl,imx8mq-src", 
> "syscon"
>  - reg: should be register base and length as documented in the
>datasheet
>  - interrupts: Should contain SRC interrupt
> @@ -46,5 +47,6 @@ Example:
>  
>  
>  For list of all valid reset indices see
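> -<dt-bindings/reset/imx7-reset.h> for i.MX7 and
> -<dt-bindings/reset/imx8mq-reset.h> for i.MX8MQ
> +<dt-bindings/reset/imx7-reset.h> for i.MX7,
> +<dt-bindings/reset/imx8mq-reset.h> for i.MX8MQ and
> +<dt-bindings/reset/imx8mq-reset.h> for i.MX8MM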
> diff --git a/include/dt-bindings/reset/imx8mq-reset.h 
> b/include/dt-bindings/reset/imx8mq-reset.h
> index 57c5924..f17ef2a 100644
> --- a/include/dt-bindings/reset/imx8mq-reset.h
> +++ b/include/dt-bindings/reset/imx8mq-reset.h
> @@ -38,26 +38,26 @@
>  #define IMX8MQ_RESET_PCIEPHY_PERST   27
>  #define IMX8MQ_RESET_PCIE_CTRL_APPS_EN   28
>  #define IMX8MQ_RESET_PCIE_CTRL_APPS_TURNOFF  29
> -#define IMX8MQ_RESET_HDMI_PHY_APB_RESET       30
> +#define IMX8MQ_RESET_HDMI_PHY_APB_RESET       30  /* i.MX8MM does NOT support */
>  #define IMX8MQ_RESET_DISP_RESET               31
>  #define IMX8MQ_RESET_GPU_RESET                32
>  #define IMX8MQ_RESET_VPU_RESET                33
> -#define IMX8MQ_RESET_PCIEPHY2                 34
> -#define IMX8MQ_RESET_PCIEPHY2_PERST           35
> -#define IMX8MQ_RESET_PCIE2_CTRL_APPS_EN       36
> -#define IMX8MQ_RESET_PCIE2_CTRL_APPS_TURNOFF  37
> -#define IMX8MQ_RESET_MIPI_CSI1_CORE_RESET     38
> -#define IMX8MQ_RESET_MIPI_CSI1_PHY_REF_RESET  39
> -#define IMX8MQ_RESET_MIPI_CSI1_ESC_RESET      40
> -#define IMX8MQ_RESET_MIPI_CSI2_CORE_RESET     41
> -#define IMX8MQ_RESET_MIPI_CSI2_PHY_REF_RESET  42
> -#define IMX8MQ_RESET_MIPI_CSI2_ESC_RESET      43
> +#define IMX8MQ_RESET_PCIEPHY2                 34  /* i.MX8MM does NOT support */
> +#define IMX8MQ_RESET_PCIEPHY2_PERST           35  /* i.MX8MM does NOT support */
> +#define IMX8MQ_RESET_PCIE2_CTRL_APPS_EN       36  /* i.MX8MM does NOT support */
> +#define IMX8MQ_RESET_PCIE2_CTRL_APPS_TURNOFF  37  /* i.MX8MM does NOT support */
> +#define IMX8MQ_RESET_MIPI_CSI1_CORE_RESET     38  /* i.MX8MM does NOT support */
> +#define IMX8MQ_RESET_MIPI_CSI1_PHY_REF_RESET  39  /* i.MX8MM does NOT support */
> +#define IMX8MQ_RESET_MIPI_CSI1_ESC_RESET      40  /* i.MX8MM does NOT support */
> +#define IMX8MQ_RESET_MIPI_CSI2_CORE_RESET     41  /* i.MX8MM does NOT support */
> +#define IMX8MQ_RESET_MIPI_CSI2_PHY_REF_RESET  42  /* i.MX8MM does NOT support */
> +#define IMX8MQ_RESET_MIPI_CSI2_ESC_RESET      43  /* i.MX8MM does NOT support */
>  #define IMX8MQ_RESET_DDRC1_PRST               44
>  #define IMX8MQ_RESET_DDRC1_CORE_RESET         45
>  #define IMX8MQ_RESET_DDRC1_PHY_RESET          46
> -#define IMX8MQ_RESET_DDRC2_PRST               47
> -#define IMX8MQ_RESET_DDRC2_CORE_RESET         48
> -#define IMX8MQ_RESET_DDRC2_PHY_RESET          49
> +#define IMX8MQ_RESET_DDRC2_PRST               47  /* i.MX8MM does NOT support */
> +#define IMX8MQ_RESET_DDRC2_CORE_RESET         48  /* i.MX8MM does NOT support */
> +#define IMX8MQ_RESET_DDRC2_PHY_RESET          49  /* i.MX8MM does NOT support */
>  
>  #define IMX8MQ_RESET_NUM 50
>  
> -- 
> 2.7.4
> 


Re: [PATCH V4 2/2] arm64: dts: imx8mm: Add "fsl,imx8mq-src" as src's fallback compatible

2019-07-22 Thread Shawn Guo
On Fri, Jul 05, 2019 at 04:54:06PM +0800, anson.hu...@nxp.com wrote:
> From: Anson Huang 
> 
> i.MX8MM can reuse i.MX8MQ's src driver, add "fsl,imx8mq-src" as
> src's fallback compatible to enable it.
> 
> Signed-off-by: Anson Huang 
> Reviewed-by: Philipp Zabel 

Applied this one, thanks.


Re: [PATCH v8 13/19] locking/rwsem: Make rwsem->owner an atomic_long_t

2019-07-22 Thread Waiman Long
On 7/21/19 4:49 PM, Luis Henriques wrote:
> Waiman Long  writes:
>
>> On 7/20/19 4:41 AM, Luis Henriques wrote:
>>> "Linus Torvalds"  writes:
>>>
 On Fri, Jul 19, 2019 at 12:32 PM Waiman Long  wrote:
> This patch shouldn't change the behavior of the rwsem code. The code
> only access data within the rw_semaphore structures. I don't know why it
> will cause a KASAN error. I will have to reproduce it and figure out
> exactly which statement is doing the invalid access.
 The stack traces should show line numbers if you run them through
 scripts/decode_stacktrace.sh.

 You need to have debug info enabled for that, though.

 Luis?

  Linus
>>> Yep, sure.  And I should have done this in the initial report.  It's a
>>> different trace, I had to recompile the kernel.
>>>
>>> (I'm also adding Jeff to the CC list.)
>>>
>>> Cheers,
>> Thanks for the information. I think I know where the problem is. Would
>> you mind applying the attached patch to see if it can fix the KASAN error.
> Yep, that seems to work -- I can't reproduce the error anymore (and
> sorry for the delay).  Thanks!  And feel free to add my Tested-by.
>
> Cheers,

Thanks for the testing. I will post the official patch tomorrow.

Cheers,
Longman



[PATCH v2 1/2] clk: tegra: divider: Add missing check for enable-bit on rate's recalculation

2019-07-22 Thread Dmitry Osipenko
Unset "enable" bit means that divider is in bypass mode, hence it doesn't
have any effect in that case. Please note that there are no known bugs
caused by the missing check.

Signed-off-by: Dmitry Osipenko 
---

Changelog:

v2: Changed the commit's description from 'Fix' to 'Add' in response to
Stephen Boyd's question about whether the patch needs to be backported to
stable kernels. Backporting is not really needed.

 drivers/clk/tegra/clk-divider.c | 9 +++--
 1 file changed, 7 insertions(+), 2 deletions(-)

diff --git a/drivers/clk/tegra/clk-divider.c b/drivers/clk/tegra/clk-divider.c
index e76731fb7d69..f33c19045386 100644
--- a/drivers/clk/tegra/clk-divider.c
+++ b/drivers/clk/tegra/clk-divider.c
@@ -40,8 +40,13 @@ static unsigned long clk_frac_div_recalc_rate(struct clk_hw 
*hw,
int div, mul;
u64 rate = parent_rate;
 
-   reg = readl_relaxed(divider->reg) >> divider->shift;
-   div = reg & div_mask(divider);
+   reg = readl_relaxed(divider->reg);
+
+   if ((divider->flags & TEGRA_DIVIDER_UART) &&
+   !(reg & PERIPH_CLK_UART_DIV_ENB))
+   return rate;
+
+   div = (reg >> divider->shift) & div_mask(divider);
 
mul = get_mul(divider);
div += mul;
-- 
2.22.0
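
The effect of the added check can be sketched outside the kernel as below. This
is a rough model under assumptions, not the driver's code: frac_div_recalc_rate()
is a hypothetical standalone helper, the register value is passed in instead of
being read from hardware, and the rounding-up fractional formula is assumed from
the surrounding divider code.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

#define PERIPH_CLK_UART_DIV_ENB (1u << 24)

/*
 * Hypothetical model of the rate recalculation: "reg" is the raw
 * register value, "shift"/"width" locate the divider field and
 * "frac_width" is the number of fractional bits of the divider.
 */
static uint64_t frac_div_recalc_rate(uint32_t reg, unsigned int shift,
                                     unsigned int width,
                                     unsigned int frac_width,
                                     bool has_enable_bit,
                                     uint64_t parent_rate)
{
    uint64_t mul = 1ull << frac_width;
    uint64_t div;

    assert(width > 0 && width < 32);

    /* Enable bit clear: the divider is bypassed, the parent rate passes through. */
    if (has_enable_bit && !(reg & PERIPH_CLK_UART_DIV_ENB))
        return parent_rate;

    div = ((reg >> shift) & ((1u << width) - 1)) + mul;

    /* Divide by (div / mul), rounding up. */
    return (parent_rate * mul + div - 1) / div;
}
```

For a 408 MHz parent and a divider field of 2 with one fractional bit, the
model yields 204 MHz when the enable bit is set and the unchanged 408 MHz when
it is clear.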



[PATCH v2 2/2] clk: tegra: divider: Support enable-bit for Super clocks

2019-07-22 Thread Dmitry Osipenko
All Super clocks have a divider that has the enable bit.

Signed-off-by: Dmitry Osipenko 
---

Changelog:

v2: Improved commit's message.

 drivers/clk/tegra/clk-divider.c | 12 
 drivers/clk/tegra/clk-super.c   |  1 +
 drivers/clk/tegra/clk.h |  4 
 3 files changed, 17 insertions(+)

diff --git a/drivers/clk/tegra/clk-divider.c b/drivers/clk/tegra/clk-divider.c
index f33c19045386..a980b9bddecd 100644
--- a/drivers/clk/tegra/clk-divider.c
+++ b/drivers/clk/tegra/clk-divider.c
@@ -17,6 +17,7 @@
 #define get_max_div(d) div_mask(d)
 
 #define PERIPH_CLK_UART_DIV_ENB BIT(24)
+#define SUPER_CLK_DIV_ENB BIT(31)
 
 static int get_div(struct tegra_clk_frac_div *divider, unsigned long rate,
   unsigned long parent_rate)
@@ -46,6 +47,10 @@ static unsigned long clk_frac_div_recalc_rate(struct clk_hw 
*hw,
!(reg & PERIPH_CLK_UART_DIV_ENB))
return rate;
 
+   if ((divider->flags & TEGRA_DIVIDER_SUPER) &&
+   !(reg & SUPER_CLK_DIV_ENB))
+   return rate;
+
div = (reg >> divider->shift) & div_mask(divider);
 
mul = get_mul(divider);
@@ -96,6 +101,13 @@ static int clk_frac_div_set_rate(struct clk_hw *hw, 
unsigned long rate,
val &= ~(div_mask(divider) << divider->shift);
val |= div << divider->shift;
 
+   if (divider->flags & TEGRA_DIVIDER_SUPER) {
+   if (div)
+   val |= SUPER_CLK_DIV_ENB;
+   else
+   val &= ~SUPER_CLK_DIV_ENB;
+   }
+
if (divider->flags & TEGRA_DIVIDER_UART) {
if (div)
val |= PERIPH_CLK_UART_DIV_ENB;
diff --git a/drivers/clk/tegra/clk-super.c b/drivers/clk/tegra/clk-super.c
index 39ef31b46df5..4d8e36b04f03 100644
--- a/drivers/clk/tegra/clk-super.c
+++ b/drivers/clk/tegra/clk-super.c
@@ -220,6 +220,7 @@ struct clk *tegra_clk_register_super_clk(const char *name,
super->frac_div.width = 8;
super->frac_div.frac_width = 1;
super->frac_div.lock = lock;
+   super->frac_div.flags = TEGRA_DIVIDER_SUPER;
super->div_ops = &tegra_clk_frac_div_ops;
 
/* Data in .init is copied by clk_register(), so stack variable OK */
diff --git a/drivers/clk/tegra/clk.h b/drivers/clk/tegra/clk.h
index 905bf1096558..a4fbf55930aa 100644
--- a/drivers/clk/tegra/clk.h
+++ b/drivers/clk/tegra/clk.h
@@ -53,6 +53,9 @@ struct clk *tegra_clk_register_sync_source(const char *name,
  * TEGRA_DIVIDER_UART - UART module divider has additional enable bit which is
  *  set when divider value is not 0. This flags indicates that the divider
  *  is for UART module.
+ * TEGRA_DIVIDER_SUPER - Super clock divider has additional enable bit which
+ *  is set when divider value is not 0. This flags indicates that the
+ *  divider is for super clock.
  */
 struct tegra_clk_frac_div {
struct clk_hw   hw;
@@ -70,6 +73,7 @@ struct tegra_clk_frac_div {
 #define TEGRA_DIVIDER_FIXED BIT(1)
 #define TEGRA_DIVIDER_INT BIT(2)
 #define TEGRA_DIVIDER_UART BIT(3)
+#define TEGRA_DIVIDER_SUPER BIT(4)
 
 extern const struct clk_ops tegra_clk_frac_div_ops;
 struct clk *tegra_clk_register_divider(const char *name,
-- 
2.22.0
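
How the enable bit follows the divider value on the set_rate path can be
sketched as below. This is an illustration only: program_divider() is a
hypothetical helper operating on a plain register value, and the field position
(shift 16) used in the usage note is an assumption, not taken from the patch.

```c
#include <assert.h>
#include <stdint.h>

#define SUPER_CLK_DIV_ENB (1u << 31)

/*
 * Hypothetical model of programming a Super clock divider: replace the
 * divider field in "val" and keep the enable bit in sync with it.
 */
static uint32_t program_divider(uint32_t val, unsigned int shift,
                                unsigned int width, uint32_t div)
{
    uint32_t mask;

    assert(width > 0 && width < 32);
    mask = (1u << width) - 1;
    assert(div <= mask);

    /* Clear the old divider field, then write the new value. */
    val &= ~(mask << shift);
    val |= div << shift;

    /* A zero divider means bypass, so the enable bit is cleared. */
    if (div)
        val |= SUPER_CLK_DIV_ENB;
    else
        val &= ~SUPER_CLK_DIV_ENB;

    return val;
}
```

Assuming an 8-bit field at shift 16, program_divider(0, 16, 8, 3) sets both the
field and bit 31, and writing a divider of 0 afterwards clears both again.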



Re: [RFC PATCH v3 00/16] Core scheduling v3

2019-07-22 Thread Aubrey Li
On Mon, Jul 22, 2019 at 6:43 PM Aaron Lu  wrote:
>
> On 2019/7/22 18:26, Aubrey Li wrote:
> > The granularity period of util_avg seems too large to decide task priority
> > during pick_task(), at least it is in my case, cfs_prio_less() always picked
> > core max task, so pick_task() eventually picked idle, which causes this 
> > change
> > not very helpful for my case.
> >
> >  -0 [057] dN..83.716973: __schedule: max: sysbench/2578
> > 889050f68600
> >  -0 [057] dN..83.716974: __schedule:
> > (swapper/5/0;140,0,0) ?< (mysqld/2511;119,1042118143,0)
> >  -0 [057] dN..83.716975: __schedule:
> > (sysbench/2578;119,96449836,0) ?< (mysqld/2511;119,1042118143,0)
> >  -0 [057] dN..83.716975: cfs_prio_less: picked
> > sysbench/2578 util_avg: 20 527 -507 <=== here===
> >  -0 [057] dN..83.716976: __schedule: pick_task cookie
> > pick swapper/5/0 889050f68600
>
> Can you share your setup of the test? I would like to try it locally.

My setup is a co-location of AVX512 tasks (gemmbench) and non-AVX512 tasks
(sysbench MySQL). Let me simplify it and send it offline.

Thanks,
-Aubrey


Re: [PATCH 6/6] arm64: dts: imx8mq: Add clock for TMU node

2019-07-22 Thread Shawn Guo
On Fri, Jul 05, 2019 at 12:56:12PM +0800, anson.hu...@nxp.com wrote:
> From: Anson Huang 
> 
> i.MX8MQ has clock gate for TMU module, add clock info to TMU
> node for clock management.
> 
> Signed-off-by: Anson Huang 

Applied, thanks.


Re: [PATCH 5/6] clk: imx8mq: Remove CLK_IS_CRITICAL flag for IMX8MQ_CLK_TMU_ROOT

2019-07-22 Thread Shawn Guo
On Fri, Jul 05, 2019 at 12:56:11PM +0800, anson.hu...@nxp.com wrote:
> From: Anson Huang 
> 
> IMX8MQ_CLK_TMU_ROOT is ONLY used for thermal module, the driver
> should manage this clock, so no need to have CLK_IS_CRITICAL flag
> set.
> 
> Signed-off-by: Anson Huang 

Applied, thanks.


[PATCH] RDMA/hns: Fix build error for hip08

2019-07-22 Thread YueHaibing
If INFINIBAND_HNS_HIP08 is selected and HNS3 is m,
but INFINIBAND_HNS is y, building fails:

drivers/infiniband/hw/hns/hns_roce_hw_v2.o: In function `hns_roce_hw_v2_exit':
hns_roce_hw_v2.c:(.exit.text+0xd): undefined reference to 
`hnae3_unregister_client'
drivers/infiniband/hw/hns/hns_roce_hw_v2.o: In function `hns_roce_hw_v2_init':
hns_roce_hw_v2.c:(.init.text+0xd): undefined reference to 
`hnae3_register_client'

Reported-by: Hulk Robot 
Fixes: dd74282df573 ("RDMA/hns: Initialize the PCI device for hip08 RoCE")
Signed-off-by: YueHaibing 
---
 drivers/infiniband/hw/hns/Kconfig | 3 ++-
 1 file changed, 2 insertions(+), 1 deletion(-)

diff --git a/drivers/infiniband/hw/hns/Kconfig 
b/drivers/infiniband/hw/hns/Kconfig
index b59da5d..4371c80 100644
--- a/drivers/infiniband/hw/hns/Kconfig
+++ b/drivers/infiniband/hw/hns/Kconfig
@@ -23,7 +23,8 @@ config INFINIBAND_HNS_HIP06
 
 config INFINIBAND_HNS_HIP08
bool "Hisilicon Hip08 Family RoCE support"
-   depends on INFINIBAND_HNS && PCI && HNS3
+   depends on INFINIBAND_HNS && (INFINIBAND_HNS = HNS3)
+   depends on PCI
---help---
  RoCE driver support for Hisilicon RoCE engine in Hisilicon Hip08 SoC.
  The RoCE engine is a PCI device.
-- 
2.7.4
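
Why the old dependency let the broken configuration through can be sketched with
a small model of Kconfig's tristate logic. This is an illustration under
assumptions: the helper names are hypothetical, "&&" is modeled as a minimum,
"=" as yielding y or n, and a bool symbol is treated as selectable whenever its
dependency is not n.

```c
#include <assert.h>
#include <stdbool.h>

enum tristate { TRI_N = 0, TRI_M = 1, TRI_Y = 2 };

/* Kconfig "&&" on tristates is the minimum of both operands. */
static enum tristate tri_and(enum tristate a, enum tristate b)
{
    return a < b ? a : b;
}

/* Kconfig "=" compares two symbols and yields y or n. */
static enum tristate tri_eq(enum tristate a, enum tristate b)
{
    return a == b ? TRI_Y : TRI_N;
}

/* A bool symbol may be set to y whenever its dependency is not n. */
static bool bool_can_be_y(enum tristate dep)
{
    return dep != TRI_N;
}

/* Old: depends on INFINIBAND_HNS && PCI && HNS3 */
static bool hip08_allowed_old(enum tristate hns, enum tristate pci,
                              enum tristate hns3)
{
    return bool_can_be_y(tri_and(tri_and(hns, pci), hns3));
}

/* New: depends on INFINIBAND_HNS && (INFINIBAND_HNS = HNS3), and on PCI */
static bool hip08_allowed_new(enum tristate hns, enum tristate pci,
                              enum tristate hns3)
{
    return bool_can_be_y(tri_and(tri_and(hns, tri_eq(hns, hns3)), pci));
}

/* The failing combination from the report: INFINIBAND_HNS=y, HNS3=m. */
void kconfig_dep_examples(void)
{
    assert(hip08_allowed_old(TRI_Y, TRI_Y, TRI_M));
    assert(!hip08_allowed_new(TRI_Y, TRI_Y, TRI_M));
}
```

In this model the equality form accepts only matching settings (both y or both
m), so it also rejects INFINIBAND_HNS=m with HNS3=y.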




Re: [PATCH RESEND 1/1] dt-bindings: serial: lpuart: add the clock requirement for imx8qxp

2019-07-22 Thread Shawn Guo
On Thu, Jul 04, 2019 at 09:43:55PM +0800, fugang.d...@nxp.com wrote:
> From: Fugang Duan 
> 
> Add the baud clock requirement for imx8qxp.
> 
> Signed-off-by: Fugang Duan 

Applied, thanks.


[PATCH] power/supply: ingenic-battery: Don't change scale if there's only one

2019-07-22 Thread Paul Cercueil
The ADC in the JZ4740 can work either in high-precision mode with a 2.5V
range, or in low-precision mode with a 7.5V range. The code in place in
this driver will select the proper scale according to the maximum
voltage of the battery.

The JZ4770 however only has one mode, with a 6.6V range. If only one
scale is available, there's no need to change it (and nothing to change
it to), and trying to do so will fail with -EINVAL.

Fixes commit fb24ccfbe1e0 ("power: supply: add Ingenic JZ47xx battery
driver.")

Signed-off-by: Paul Cercueil 
Cc: sta...@vger.kernel.org
---
 drivers/power/supply/ingenic-battery.c | 4 
 1 file changed, 4 insertions(+)

diff --git a/drivers/power/supply/ingenic-battery.c 
b/drivers/power/supply/ingenic-battery.c
index 35816d4b3012..5a53057b4f64 100644
--- a/drivers/power/supply/ingenic-battery.c
+++ b/drivers/power/supply/ingenic-battery.c
@@ -80,6 +80,10 @@ static int ingenic_battery_set_scale(struct ingenic_battery 
*bat)
if (ret != IIO_AVAIL_LIST || scale_type != IIO_VAL_FRACTIONAL_LOG2)
return -EINVAL;
 
+   /* Only one (fractional) entry - nothing to change */
+   if (scale_len == 2)
+   return 0;
+
max_mV = bat->info.voltage_max_design_uv / 1000;
 
for (i = 0; i < scale_len; i += 2) {
-- 
2.21.0.593.g511ec345e18
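
The selection logic and the new early return can be sketched in simplified form
as below. This is a model under assumptions, not the driver: pick_range() is a
hypothetical helper that works on ranges already converted to millivolts, while
the real code walks (value, shift) pairs reported by the ADC. The range values
are the ones quoted in the commit message.

```c
#include <assert.h>
#include <stddef.h>

/* ADC ranges in mV, as described in the commit message. */
static const unsigned int jz4740_ranges_mv[] = { 2500, 7500 };
static const unsigned int jz4770_ranges_mv[] = { 6600 };

/*
 * Hypothetical model: return the index of the smallest range that still
 * covers max_mv, or -1 if none does. With a single range there is
 * nothing to choose, so the current index is kept.
 */
static int pick_range(const unsigned int *ranges_mv, size_t n,
                      unsigned int max_mv, int current)
{
    int best = -1;
    size_t i;

    assert(n > 0);

    /* Only one scale available: nothing to change. */
    if (n == 1)
        return current;

    for (i = 0; i < n; i++) {
        if (ranges_mv[i] < max_mv)
            continue;   /* range too small for this battery */
        if (best >= 0 && ranges_mv[i] > ranges_mv[best])
            continue;   /* a tighter range was already found */
        best = (int)i;
    }

    return best;
}
```

With a 4.2 V battery, the JZ4740 table selects the 7.5 V range, while the
single-entry JZ4770 table returns immediately without attempting a change.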



[PATCH] RDMA/hns: Fix build error for hip06

2019-07-22 Thread YueHaibing
If INFINIBAND_HNS_HIP06 is selected and HNS_DSAF
is m, but INFINIBAND_HNS is y, building fails:

drivers/infiniband/hw/hns/hns_roce_hw_v1.o: In function `hns_roce_v1_reset':
hns_roce_hw_v1.c:(.text+0x39fa): undefined reference to `hns_dsaf_roce_reset'
hns_roce_hw_v1.c:(.text+0x3a25): undefined reference to `hns_dsaf_roce_reset'

Reported-by: Hulk Robot 
Fixes: 08805fdbeb2d ("RDMA/hns: Split hw v1 driver from hns roce driver")
Signed-off-by: YueHaibing 
---
 drivers/infiniband/hw/hns/Kconfig | 3 ++-
 1 file changed, 2 insertions(+), 1 deletion(-)

diff --git a/drivers/infiniband/hw/hns/Kconfig 
b/drivers/infiniband/hw/hns/Kconfig
index 8bf847b..b59da5d 100644
--- a/drivers/infiniband/hw/hns/Kconfig
+++ b/drivers/infiniband/hw/hns/Kconfig
@@ -12,7 +12,8 @@ config INFINIBAND_HNS
 
 config INFINIBAND_HNS_HIP06
bool "Hisilicon Hip06 Family RoCE support"
-   depends on INFINIBAND_HNS && HNS && HNS_DSAF && HNS_ENET
+   depends on INFINIBAND_HNS && HNS && (INFINIBAND_HNS = HNS_DSAF)
+   depends on HNS_ENET
---help---
  RoCE driver support for Hisilicon RoCE engine in Hisilicon Hip06 and
  Hip07 SoC. These RoCE engines are platform devices.
-- 
2.7.4




Re: [PATCH 0/3] arm64: Allow early timestamping of kernel log

2019-07-22 Thread Hanjun Guo
On 2019/7/22 18:33, Marc Zyngier wrote:
> So far, we've let the arm64 kernel start its meaningful time stamping
> of the kernel log pretty late, which is caused by sched_clock() being
> initialised rather late compared to other architectures.
> 
> Pavel Tatashin proposed[1] to move the initialisation of sched_clock
> much earlier, which I had objections to. The reason for initialising
> sched_clock late is that a number of systems have broken counters, and
> we need to apply all kind of terrifying workarounds to avoid time
> going backward on the affected platforms. Being able to identify the
> right workaround comes pretty late in the kernel boot, and providing
> an unreliable sched_clock, even for a short period of time, isn't an
> appealing prospect.
> 
> To address this, I'm proposing that we allow an architecture to choose
> to (1) divorce time stamping and sched_clock during the early phase of
> booting, and (2) inherit the time stamping clock as the new epoch the
> first time a sched_clock gets registered.
> 
> (1) would allow arm64 to provide a time stamping clock, however
> unreliable it might be, while (2) would allow sched_clock to provide
> time stamps that are continuous with the time-stamping clock.
> 
> The last patch in the series adds the necessary logic to arm64,
> allowing the (potentially unreliable) time stamping of early kernel
> messages.
> 
> Tested on a bunch of arm64 systems, both bare-metal and in VMs. Boot
> tested on a x86 guest.

This makes the boot log more useful, and it lets me debug time-consuming
issues that occur before the arch timer is initialized more easily. Tested
on my ARM64 server; I can see the timestamps from the start [1].

Tested-by: Hanjun Guo 

Thanks
Hanjun

[1]:
[0.00] Booting Linux on physical CPU 0x08 [0x481fd010]
[0.00] Linux version 5.2.0+ (root@localhost.localdomain) (gcc version 
9.0.1 20190312 (Red Hat 9.0.1-0.10) (GCC)) #45 SMP Tue Jul 23 09:17:48 CST 2019
[0.00] Using timestamp clock @100MHz
[0.74] efi: Getting EFI parameters from FDT:
[0.82] efi: EFI v2.70 by EDK II
[0.83] efi:  ACPI 2.0=0x3a30  SMBIOS 3.0=0x39f8  
MEMATTR=0x30996018  MEMRESERVE=0x30997e18
[0.000122] crashkernel reserved: 0x0ba0 - 0x2ba0 
(512 MB)
[0.000126] cma: Failed to reserve 512 MiB
[0.185111] ACPI: Early table checksum verification disabled
[0.185115] ACPI: RSDP 0x3A30 24 (v02 HISI  )
[0.185120] ACPI: XSDT 0x3A27 9C (v01 HISI   HIP08
  0113)
[0.185127] ACPI: FACP 0x39B1 000114 (v06 HISI   HIP08
 HISI 20151124)
[0.185134] ACPI: DSDT 0x39AB 0084E4 (v02 HISI   HIP08
 INTL 20181213)
[0.185139] ACPI: PCCT 0x39FB 8A (v01 HISI   HIP08
 HISI 20151124)
[0.185143] ACPI: SSDT 0x39F9 01021A (v02 HISI   HIP07
 INTL 20181213)
[0.185147] ACPI: BERT 0x39F5 30 (v01 HISI   HIP08
 HISI 20151124)
[0.185150] ACPI: HEST 0x39F3 000308 (v01 HISI   HIP08
 HISI 20151124)
[0.185154] ACPI: ERST 0x39EF 000230 (v01 HISI   HIP08
 HISI 20151124)
[0.185158] ACPI: EINJ 0x39EE 000170 (v01 HISI   HIP08
 HISI 20151124)
[0.185162] ACPI: SLIT 0x39B3 3C (v01 HISI   HIP08
 HISI 20151124)
[0.185166] ACPI: GTDT 0x39B0 7C (v02 HISI   HIP08
 HISI 20151124)
[0.185169] ACPI: MCFG 0x39AF 3C (v01 HISI   HIP08
 HISI 20151124)
[0.185173] ACPI: SPCR 0x39AE 50 (v02 HISI   HIP08
 HISI 20151124)
[0.185177] ACPI: SRAT 0x39AD 0007D0 (v03 HISI   HIP08
 HISI 20151124)
[0.185181] ACPI: APIC 0x39AC 001E6C (v04 HISI   HIP08
 HISI 20151124)
[0.185185] ACPI: IORT 0x39AA 001310 (v00 HISI   HIP08
 INTL 20181213)
[0.185189] ACPI: PPTT 0x3097 0031B0 (v01 HISI   HIP08
 HISI 20151124)
[0.185196] ACPI: SPCR: console: pl011,mmio32,0x9408,115200
[0.185208] ACPI: SRAT: Node 0 PXM 0 [mem 0x208000-0x2f]
[0.185210] ACPI: SRAT: Node 1 PXM 1 [mem 0x30-0x3f]
[0.185212] ACPI: SRAT: Node 0 PXM 0 [mem 0x-0x7fff]
[0.185213] ACPI: SRAT: Node 2 PXM 2 [mem 0x2020-0x202f]
[0.185215] ACPI: SRAT: Node 3 PXM 3 [mem 0x2030-0x203f]
[0.185221] NUMA: NODE_DATA [mem 0x2fe3c0-0x2f]
[0.185224] NUMA: NODE_DATA [mem 0x3fe3c0-0x3f]
[0.185226] NUMA: NODE_DATA [mem 0x202fe3c0-0x202f]
[0.185229] NUMA: NODE_DATA [mem 0x203ffdfde3c0-0x203ffdfd]
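
The epoch hand-over described in the quoted cover letter, points (1) and (2),
can be sketched as below. This is a conceptual model only: struct ts_state and
both functions are hypothetical names, and sched_clock is assumed to start
counting from zero when it is registered.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical state for log time stamping. */
struct ts_state {
    bool sched_clock_ready;  /* has a sched_clock been registered yet? */
    uint64_t epoch_ns;       /* early-clock value at first registration */
};

/*
 * The first registration inherits the early time-stamping clock as the
 * epoch; later registrations leave the epoch untouched.
 */
static void register_sched_clock(struct ts_state *st, uint64_t early_now_ns)
{
    if (!st->sched_clock_ready) {
        st->epoch_ns = early_now_ns;
        st->sched_clock_ready = true;
    }
}

/*
 * Before registration the (possibly unreliable) early clock is used;
 * afterwards sched_clock continues from the inherited epoch.
 */
static uint64_t log_timestamp_ns(const struct ts_state *st,
                                 uint64_t early_ns, uint64_t sched_ns)
{
    assert(st != NULL);
    return st->sched_clock_ready ? st->epoch_ns + sched_ns : early_ns;
}
```

In this model the log time stamps never restart from zero when sched_clock
takes over, which matches the continuity the cover letter asks for.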




Re: Linux 5.3-rc1

2019-07-22 Thread Guenter Roeck

On 7/22/19 4:45 PM, James Bottomley wrote:

[linux-scsi added to cc]
On Mon, 2019-07-22 at 15:21 -0700, Guenter Roeck wrote:

On Sun, Jul 21, 2019 at 02:33:38PM -0700, Linus Torvalds wrote:

[ ... ]


Go test,



Things looked pretty good until a few days ago. Unfortunately,
the last few days brought in a couple of issues.

riscv:virt:defconfig:scsi[virtio]
riscv:virt:defconfig:scsi[virtio-pci]

Boot tests crash with no useful backtrace. Bisect points to
merge ac60602a6d8f ("Merge tag 'dma-mapping-5.3-1'"). Log is at
https://kerneltests.org/builders/qemu-riscv64-master/builds/238/steps
/qemubuildcommand_1/logs/stdio

ppc:mpc8544ds:mpc85xx_defconfig:sata-sii3112
ppc64:pseries:pseries_defconfig:sata-sii3112
ppc64:pseries:pseries_defconfig:little:sata-sii3112
ppc64:ppce500:corenet64_smp_defconfig:e5500:sata-sii3112

ata1: lost interrupt (Status 0x50)
ata1.00: exception Emask 0x0 SAct 0x0 SErr 0x0 action 0x6 frozen
ata1.00: failed command: READ DMA

and many similar errors. Boot ultimately times out. Bisect points to
merge
f65420df914a ("Merge tag 'scsi-fixes'").

Logs:
https://kerneltests.org/builders/qemu-ppc64-master/builds/1212/steps/
qemubuildcommand/logs/stdio
https://kerneltests.org/builders/qemu-ppc-master/builds/1255/steps/qe
mubuildcommand/logs/stdio

Guenter

---
riscv bisect log

# bad: [5f9e832c137075045d15cd6899ab0505cfb2ca4b] Linus 5.3-rc1
# good: [bdd17bdef7d8da4d8eee254abb4c92d8a566bdc1] scsi: core: take
the DMA max mapping size into account
git bisect start 'HEAD' 'bdd17bdef7d8'
# good: [237f83dfbe668443b5e31c3c7576125871cca674] Merge
git://git.kernel.org/pub/scm/linux/kernel/git/davem/net-next
git bisect good 237f83dfbe668443b5e31c3c7576125871cca674
# good: [be8454afc50f43016ca8b6130d9673bdd0bd56ec] Merge tag 'drm-
next-2019-07-16' of git://anongit.freedesktop.org/drm/drm
git bisect good be8454afc50f43016ca8b6130d9673bdd0bd56ec
# good: [d4df33b0e9925c158b313a586fb1557cf29cfdf4] Merge branch 'for-
linus-5.2' of
git://git.kernel.org/pub/scm/linux/kernel/git/konrad/swiotlb
git bisect good d4df33b0e9925c158b313a586fb1557cf29cfdf4
# good: [f90b8fda3a9d72a9422ea80ae95843697f94ea4a] ARM: dts: gemini:
Set DIR-685 SPI CS as active low
git bisect good f90b8fda3a9d72a9422ea80ae95843697f94ea4a
# good: [31cc088a4f5d83481c6f5041bd6eb06115b974af] Merge tag 'drm-
next-2019-07-19' of git://anongit.freedesktop.org/drm/drm
git bisect good 31cc088a4f5d83481c6f5041bd6eb06115b974af
# good: [ad21a4ce040cc41b4a085417169b558e86af56b7] dt-bindings:
pinctrl: aspeed: Fix 'compatible' schema errors
git bisect good ad21a4ce040cc41b4a085417169b558e86af56b7
# good: [e6023adc5c6af79ac8ac5b17939f58091fa0d870] Merge branch
'core-urgent-for-linus' of
git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip
git bisect good e6023adc5c6af79ac8ac5b17939f58091fa0d870
# bad: [ac60602a6d8f6830dee89f4b87ee005f62eb7171] Merge tag 'dma-
mapping-5.3-1' of git://git.infradead.org/users/hch/dma-mapping
git bisect bad ac60602a6d8f6830dee89f4b87ee005f62eb7171
# good: [6e67d77d673d785631b0c52314b60d3c68ebe809] perf vendor events
s390: Add JSON files for machine type 8561
git bisect good 6e67d77d673d785631b0c52314b60d3c68ebe809
# good: [a0d14b8909de55139b8702fe0c7e80b69763dcfb] x86/mm, tracing:
Fix CR2 corruption
git bisect good a0d14b8909de55139b8702fe0c7e80b69763dcfb
# good: [6879298bd0673840cadd1fb36d7225485504ceb4] x86/entry/64:
Prevent clobbering of saved CR2 value
git bisect good 6879298bd0673840cadd1fb36d7225485504ceb4
# good: [449fa54d6815be8c2c1f68fa9dbbae9384a7c03e] dma-direct:
correct the physical addr in dma_direct_sync_sg_for_cpu/device
git bisect good 449fa54d6815be8c2c1f68fa9dbbae9384a7c03e
# good: [e0c5c5e308ee9b3548844f0d88da937782b895ef] Merge tag 'perf-
core-for-mingo-5.3-20190715' of
git://git.kernel.org/pub/scm/linux/kernel/git/acme/linux into
perf/urgent
git bisect good e0c5c5e308ee9b3548844f0d88da937782b895ef
# good: [c6dd78fcb8eefa15dd861889e0f59d301cb5230c] Merge branch 'x86-
urgent-for-linus' of
git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip
git bisect good c6dd78fcb8eefa15dd861889e0f59d301cb5230c
# first bad commit: [ac60602a6d8f6830dee89f4b87ee005f62eb7171] Merge
tag 'dma-mapping-5.3-1' of git://git.infradead.org/users/hch/dma-
mapping

-
ppc/ppc64 bisect log

# bad: [5f9e832c137075045d15cd6899ab0505cfb2ca4b] Linus 5.3-rc1
# good: [abdfd52a295fb5731ab07b5c9013e2e39f4d1cbe] Merge tag 'armsoc-defconfig' of git://git.kernel.org/pub/scm/linux/kernel/git/soc/soc
git bisect start 'HEAD' 'abdfd52a295f'
# bad: [e6023adc5c6af79ac8ac5b17939f58091fa0d870] Merge branch 'core-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip
git bisect bad e6023adc5c6af79ac8ac5b17939f58091fa0d870
# bad: [f65420df914a85e33b2c8b1cab310858b2abb7c0] Merge tag 'scsi-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/jejb/scsi
git bisect bad f65420df914a85e33b2c8b1cab310858b2abb7c0
# good: [168c79971b4a7be7011e73bf488b740a8e1135c8] Merge tag 'kbuild-v5.3-2' of

[PATCH v2 2/2] soc/tegra: pmc: Remove unnecessary memory barrier

2019-07-22 Thread Dmitry Osipenko
The removed barrier isn't needed because the writes/reads are strictly
ordered, and even if the PMC had separate ports for the writes, it wouldn't
matter since the hardware logic takes effect only after the CPU's
power-gating is triggered, and at that point all CPU accesses are guaranteed
to be completed. Hence remove the barrier to eliminate the confusion.

Signed-off-by: Dmitry Osipenko 
---

Changelog:

v2: New patch that was added after Jon Hunter pointed out that it's better
not to change the barrier's placement in the code. In fact, the barrier
is not needed at all.

 drivers/soc/tegra/pmc.c | 2 --
 1 file changed, 2 deletions(-)

diff --git a/drivers/soc/tegra/pmc.c b/drivers/soc/tegra/pmc.c
index aba3396b2e73..3044809f1c10 100644
--- a/drivers/soc/tegra/pmc.c
+++ b/drivers/soc/tegra/pmc.c
@@ -1457,8 +1457,6 @@ void tegra_pmc_enter_suspend_mode(enum tegra_suspend_mode mode)
do_div(ticks, USEC_PER_SEC);
tegra_pmc_writel(pmc, ticks, PMC_CPUPWROFF_TIMER);
 
-   wmb();
-
value = tegra_pmc_readl(pmc, PMC_CNTRL);
value &= ~PMC_CNTRL_SIDE_EFFECT_LP0;
value |= PMC_CNTRL_CPU_PWRREQ_OE;
-- 
2.22.0



Re: scsi_debug module panic

2019-07-22 Thread Randy Dunlap
[add linux-scsi]

On 7/22/19 4:39 PM, Murphy Zhou wrote:
> 
> Hi,
> 
> It reproduces every time. It's ok on v5.2. So it's a regression in v5.3-rc1.
> 
> Thanks,
> M
> 
> [root@7u ~]# modprobe scsi_debug
> [  244.084203] scsi host2: scsi_debug: version 0188 [20190125]
> [  244.084203]   dev_size_mb=8, opts=0x0, submit_queues=1, statistics=0
> [  244.093098] BUG: kernel NULL pointer dereference, address: 
> [  244.097625] #PF: supervisor read access in kernel mode
> [  244.101175] #PF: error_code(0x) - not-present page
> [  244.104670] PGD 0 P4D 0
> [  244.106381] Oops:  [#1] SMP PTI
> [  244.108738] CPU: 17 PID: 182 Comm: kworker/u64:1 Not tainted 
> 5.3.0-rc1-master-5f9e832 #112
> [  244.114161] Hardware name: Red Hat KVM, BIOS 0.5.1 01/01/2011
> [  244.117854] Workqueue: events_unbound async_run_entry_fn
> [  244.121025] RIP: 0010:dma_direct_max_mapping_size+0x2b/0x65
> [  244.124324] Code: 66 66 66 90 55 53 48 89 fb e8 f1 14 00 00 84 c0 75 0a 5b 
> 48 c7 c0 ff ff ff ff 5d c3 48 8b 83 28 02 00 00 48 8b ab 38 02 00 00 <48> 8b 
> 00 48 89 ea 48 85 c0 74 0f 48 85 d2 48 89 c5 74 07 48 39 d0
> [  244.135752] RSP: 0018:b3bd40733bf8 EFLAGS: 00010202
> [  244.139237] RAX:  RBX: a027feb50c18 RCX: 
> 
> [  244.143966] RDX: 0800 RSI: 0800 RDI: 
> a027feb50c18
> [  244.148748] RBP:  R08: 000300e0 R09: 
> a028104dd280
> [  244.153399] R10: a028104dd280 R11: ffa0 R12: 
> a027feb50c18
> [  244.157982] R13:  R14: a0280513c828 R15: 
> 
> [  244.162375] FS:  () GS:a0289464() 
> knlGS:
> [  244.167286] CS:  0010 DS:  ES:  CR0: 80050033
> [  244.170876] CR2:  CR3: 3c20a000 CR4: 
> 06e0
> [  244.175116] Call Trace:
> [  244.176622]  __scsi_init_queue+0x7a/0x130
> [  244.178788]  scsi_mq_alloc_queue+0x34/0x50
> [  244.181015]  scsi_alloc_sdev+0x1e4/0x2b0
> [  244.183150]  scsi_probe_and_add_lun+0x8af/0xd60
> [  244.185628]  ? kobject_set_name_vargs+0x6e/0x90
> [  244.188168]  ? dev_set_name+0x53/0x70
> [  244.190258]  ? _cond_resched+0x15/0x30
> [  244.192416]  ? mutex_lock+0xe/0x30
> [  244.194284]  __scsi_scan_target+0xf4/0x250
> [  244.196527]  scsi_scan_channel.part.13+0x52/0x70
> [  244.198830]  scsi_scan_host_selected+0xe3/0x190
> [  244.201159]  ? __switch_to_asm+0x40/0x70
> [  244.203124]  do_scan_async+0x17/0x180
> [  244.204961]  async_run_entry_fn+0x39/0x160
> [  244.207012]  process_one_work+0x171/0x380
> [  244.209007]  worker_thread+0x49/0x3f0
> [  244.210840]  kthread+0xf8/0x130
> [  244.212419]  ? max_active_store+0x80/0x80
> [  244.214426]  ? kthread_bind+0x10/0x10
> [  244.216264]  ret_from_fork+0x35/0x40
> [  244.218075] Modules linked in: scsi_debug sunrpc snd_hda_codec_generic 
> ledtrig_audio snd_hda_intel snd_hda_codec crct10dif_pclmul snd_hda_core 
> crc32_pclmul snd_hwdep ghash_clmulni_intel snd_seq snd_seq_device snd_pcm 
> aesni_intel crypto_simd snd_timer cryptd snd glue_helper sg pcspkr soundcore 
> joydev virtio_balloon i2c_piix4 ip_tables xfs libcrc32c qxl drm_kms_helper 
> syscopyarea sysfillrect sd_mod sysimgblt fb_sys_fops ttm ata_generic 
> pata_acpi drm virtio_console 8139too ata_piix libata virtio_pci 8139cp 
> virtio_ring crc32c_intel serio_raw mii virtio floppy dm_mirror dm_region_hash 
> dm_log dm_mod
> [  244.243647] CR2: 
> [  244.245274] ---[ end trace 1209311dc64cb7fa ]---
> [  244.247399] RIP: 0010:dma_direct_max_mapping_size+0x2b/0x65
> [  244.250145] Code: 66 66 66 90 55 53 48 89 fb e8 f1 14 00 00 84 c0 75 0a 5b 
> 48 c7 c0 ff ff ff ff 5d c3 48 8b 83 28 02 00 00 48 8b ab 38 02 00 00 <48> 8b 
> 00 48 89 ea 48 85 c0 74 0f 48 85 d2 48 89 c5 74 07 48 39 d0
> [  244.258533] RSP: 0018:b3bd40733bf8 EFLAGS: 00010202
> [  244.260749] RAX:  RBX: a027feb50c18 RCX: 
> 
> [  244.263777] RDX: 0800 RSI: 0800 RDI: 
> a027feb50c18
> [  244.266798] RBP:  R08: 000300e0 R09: 
> a028104dd280
> [  244.269901] R10: a028104dd280 R11: ffa0 R12: 
> a027feb50c18
> [  244.272899] R13:  R14: a0280513c828 R15: 
> 
> [  244.275909] FS:  () GS:a0289464() 
> knlGS:
> [  244.279131] CS:  0010 DS:  ES:  CR0: 80050033
> [  244.281655] CR2:  CR3: 3c20a000 CR4: 
> 06e0
> [  244.284554] Kernel panic - not syncing: Fatal exception
> [  244.287052] Kernel Offset: 0x22c0 from 0x8100 (relocation 
> range: 0x8000-0xbfff)
> [  244.291412] ---[ end Kernel panic - not syncing: Fatal exception ]---
> 


-- 
~Randy


[PATCH v2 1/2] soc/tegra: pmc: Query PCLK clock rate at probe time

2019-07-22 Thread Dmitry Osipenko
The PCLK clock is running off SCLK, which is a critical clock that is
very unlikely to randomly change its rate. It's also a bit clumsy (and
apparently incorrect) to query the clock's rate with interrupts disabled,
which is the case when entering suspend/cpuidle, because clk_get_rate()
takes a mutex.

Signed-off-by: Dmitry Osipenko 
---

Changelog:

v2: Addressed review comments that were made by Jon Hunter to v1 by
not moving the memory barrier, replacing one missed clk_get_rate()
with pmc->rate, handling a possible clk_get_rate() error on probe and
slightly adjusting the commit message.

 drivers/soc/tegra/pmc.c | 34 --
 1 file changed, 16 insertions(+), 18 deletions(-)

diff --git a/drivers/soc/tegra/pmc.c b/drivers/soc/tegra/pmc.c
index 9f9c1c677cf4..aba3396b2e73 100644
--- a/drivers/soc/tegra/pmc.c
+++ b/drivers/soc/tegra/pmc.c
@@ -1192,7 +1192,7 @@ static int tegra_io_pad_prepare(struct tegra_pmc *pmc, enum tegra_io_pad id,
return err;
 
if (pmc->clk) {
-   rate = clk_get_rate(pmc->clk);
+   rate = pmc->rate;
if (!rate) {
dev_err(pmc->dev, "failed to get clock rate\n");
return -ENODEV;
@@ -1433,6 +1433,7 @@ void tegra_pmc_set_suspend_mode(enum tegra_suspend_mode mode)
 void tegra_pmc_enter_suspend_mode(enum tegra_suspend_mode mode)
 {
unsigned long long rate = 0;
+   u64 ticks;
u32 value;
 
switch (mode) {
@@ -1441,31 +1442,22 @@ void tegra_pmc_enter_suspend_mode(enum tegra_suspend_mode mode)
break;
 
case TEGRA_SUSPEND_LP2:
-   rate = clk_get_rate(pmc->clk);
+   rate = pmc->rate;
break;
 
default:
break;
}
 
-   if (WARN_ON_ONCE(rate == 0))
-   rate = 1;
+   ticks = pmc->cpu_good_time * rate + USEC_PER_SEC - 1;
+   do_div(ticks, USEC_PER_SEC);
+   tegra_pmc_writel(pmc, ticks, PMC_CPUPWRGOOD_TIMER);
 
-   if (rate != pmc->rate) {
-   u64 ticks;
+   ticks = pmc->cpu_off_time * rate + USEC_PER_SEC - 1;
+   do_div(ticks, USEC_PER_SEC);
+   tegra_pmc_writel(pmc, ticks, PMC_CPUPWROFF_TIMER);
 
-   ticks = pmc->cpu_good_time * rate + USEC_PER_SEC - 1;
-   do_div(ticks, USEC_PER_SEC);
-   tegra_pmc_writel(pmc, ticks, PMC_CPUPWRGOOD_TIMER);
-
-   ticks = pmc->cpu_off_time * rate + USEC_PER_SEC - 1;
-   do_div(ticks, USEC_PER_SEC);
-   tegra_pmc_writel(pmc, ticks, PMC_CPUPWROFF_TIMER);
-
-   wmb();
-
-   pmc->rate = rate;
-   }
+   wmb();
 
value = tegra_pmc_readl(pmc, PMC_CNTRL);
value &= ~PMC_CNTRL_SIDE_EFFECT_LP0;
@@ -2082,8 +2074,14 @@ static int tegra_pmc_probe(struct platform_device *pdev)
pmc->clk = NULL;
}
 
+   pmc->rate = clk_get_rate(pmc->clk);
pmc->dev = &pdev->dev;
 
+   if (!pmc->rate) {
+   dev_err(&pdev->dev, "failed to get pclk rate\n");
+   pmc->rate = 1;
+   }
+
tegra_pmc_init(pmc);
 
tegra_pmc_init_tsense_reset(pmc);
-- 
2.22.0
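
For illustration, the microsecond-to-tick conversion in the hunk above can be
sketched in user-space C as follows. This is a minimal sketch, not the driver
code: struct pmc_sketch, pmc_sketch_probe(), usecs_to_ticks() and the example
rate of 32768 Hz are hypothetical, and plain 64-bit division stands in for
the kernel's do_div(). It shows the two ideas in the patch: the rate is
cached once (with a fallback of 1 if the query returned 0), and timer values
are rounded up to whole ticks.

```c
#include <assert.h>
#include <stdint.h>

#define USEC_PER_SEC 1000000ULL

/* Hypothetical stand-in for the driver state; only the fields used here. */
struct pmc_sketch {
	uint64_t rate;          /* clock rate in Hz, cached once at "probe" */
	uint32_t cpu_good_time; /* microseconds */
	uint32_t cpu_off_time;  /* microseconds */
};

/*
 * Cache the rate once. A zero rate (failed query) falls back to 1,
 * mirroring the probe-time error handling in the patch.
 */
static void pmc_sketch_probe(struct pmc_sketch *pmc, uint64_t queried_rate)
{
	pmc->rate = queried_rate ? queried_rate : 1;
}

/* Convert microseconds to clock ticks, rounding up to a whole tick. */
static uint64_t usecs_to_ticks(uint64_t usecs, uint64_t rate)
{
	return (usecs * rate + USEC_PER_SEC - 1) / USEC_PER_SEC;
}

/* Quick self-check of the rounding behaviour. */
static void usecs_to_ticks_selftest(void)
{
	assert(usecs_to_ticks(0, 32768) == 0);           /* nothing to wait for */
	assert(usecs_to_ticks(100, 32768) == 4);         /* 3.2768 rounds up to 4 */
	assert(usecs_to_ticks(1000000, 32768) == 32768); /* exactly one second */
}
```

Rounding up means the programmed timer never expires earlier than the
requested time; with the fallback rate of 1, any non-zero time still yields
at least one tick.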



RE: [PATCH V5 3/5] arm64: dts: imx8mm: Add system counter node

2019-07-22 Thread Anson Huang
Hi, Shawn

> On Wed, Jul 10, 2019 at 02:30:54PM +0800, anson.hu...@nxp.com wrote:
> > From: Anson Huang 
> >
> > Add i.MX8MM system counter node to enable timer-imx-sysctr broadcast
> > timer driver.
> >
> > Signed-off-by: Anson Huang 
> 
> Do I need to wait for patch #1 landing before I apply #3 ~ #5, or can they be
> applied independently (no breaking on anything)?

Without #1, the system can boot up, but the system counter's frequency will be
incorrect, although that does NOT impact normal function. So I think it is better
to wait for #1 to land. @daniel.lezc...@linaro.org, can you help review patch #1,
since I use a different, simpler way to fix the clock issue?

Anson



Re: [PATCH v3 5/6] dt-bindings: interconnect: Add interconnect-opp-table property

2019-07-22 Thread Viresh Kumar
On 22-07-19, 17:39, Rob Herring wrote:
> On Tue, Jul 02, 2019 at 06:10:19PM -0700, Saravana Kannan wrote:
> > Add support for listing bandwidth OPP tables for each interconnect path
> > listed using the interconnects property.
> > 
> > Signed-off-by: Saravana Kannan 
> > ---
> >  .../devicetree/bindings/interconnect/interconnect.txt | 8 
> >  1 file changed, 8 insertions(+)
> > 
> > diff --git 
> > a/Documentation/devicetree/bindings/interconnect/interconnect.txt 
> > b/Documentation/devicetree/bindings/interconnect/interconnect.txt
> > index 6f5d23a605b7..fc5b75b76a2c 100644
> > --- a/Documentation/devicetree/bindings/interconnect/interconnect.txt
> > +++ b/Documentation/devicetree/bindings/interconnect/interconnect.txt
> > @@ -55,10 +55,18 @@ interconnect-names : List of interconnect path name 
> > strings sorted in the same
> >  * dma-mem: Path from the device to the main memory of
> > the system
> >  
> > +interconnect-opp-table: List of phandles to OPP tables (bandwidth OPP 
> > tables)
> > +   that specify the OPPs for the interconnect paths listed
> > +   in the interconnects property. This property can only
> > +   point to OPP tables that belong to the device and are
> > +   listed in the device's operating-points-v2 property.
> > +
> 
> IMO, there's no need for this property. Which OPP is which should be 
> defined already as part of the device's binding. That's enough for the 
> driver to know which OPP applies to the interconnect.

And if there is confusion, we can actually use the compatible property
to have another string which highlights that it is an interconnect OPP?

-- 
viresh

